Python Text Processing with NLTK Cookbook Chapter 2 Errata

It has come to my attention that there are two errors in Chapter 2, Replacing and Correcting Words, of Python Text Processing with NLTK Cookbook. My thanks to the reader who went out of their way to verify my mistakes and send in corrections.

In Lemmatizing words with WordNet, on page 29, under How it works…, I said that “cooking” is not a noun and does not have a lemma. In fact, cooking is a noun, and as such is its own lemma. Of course, “cooking” is also a verb, and the verb form has the lemma “cook”.

In Removing repeating characters, on page 35, under How it works…, I explained the repeat_regexp match groups incorrectly. The actual match grouping of the word “looooove” is (looo)(o)o(ve) because the pattern matching is greedy. The end result is still correct.
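
The grouping can be checked directly with Python's re module. This is only a minimal sketch: the pattern and replacement string below are assumptions about what repeat_regexp looks like (this errata does not quote them), and remove_repeats is a hypothetical helper name. Any dictionary lookup that would protect real words is left out, so legitimate doubles such as "book" would also be collapsed.

```python
import re

# Assumed pattern: a greedy prefix, one character, that same character
# again (backreference \2), then the rest of the word.
repeat_regexp = re.compile(r'(\w*)(\w)\2(\w*)')
# Assumed replacement: keep everything except the duplicated character.
repl = r'\1\2\3'


def remove_repeats(word):
    """Drop one repeated character per pass until nothing changes."""
    while True:
        shorter = repeat_regexp.sub(repl, word)
        if shorter == word:
            return word
        word = shorter


match = repeat_regexp.match("looooove")
# The greedy first group takes as many characters as it can while still
# leaving a doubled character for (\w)\2 to match.
print(match.groups())              # ('looo', 'o', 've')
print(remove_repeats("looooove"))  # love
```

Because the first group is greedy, each pass removes the last duplicated character rather than the first, but repeating the substitution still ends at the same word.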

  • Sean Upton

    Out of curiosity, how does Packt handle errata in e-books? Do they integrate corrections into future output for folks who bought the e-book format?

    Unrelated note: I emailed Packt customer support about leading-indentation and multiple adjoining spaces being stripped in ePub format for code samples of your book and other Python books they publish. I am under the impression that they sent my note to whoever is responsible for writing output filters for epub/html format output.

  • Jacob Perkins

    Hi Sean – I think Packt integrates corrections, but I’m not sure. I’m afraid I can’t help with the ePub formatting, though you can download the code at

  • Sean Upton

    Thanks. I am enjoying the book so far. It is a nice complement to the O’Reilly NLTK book.

  • Deoren

    Hi Sean,

    I asked them before and they do NOT incorporate errata into their ebooks. Instead you have to use the original ebook + keep a separate page of errata handy for reading. Manning Publications is the same way.

    Here is a list for those that are curious:

  • Jacob Perkins

    The errata has now been officially posted at

  • Skyheights

    Hi Jacob,
    Really enjoying and appreciating the book. Ran into this error message on p 59 bottom.
    >>> from nltk.corpus.reader import CategorizedPlaintextCorpusReader
    >>> reader = CategorizedPlaintextCorpusReader('.', r'movie_.*\.txt', cat_pattern=r'movie_(\w+)\.txt')
    >>> reader.categories()
    ['neg', 'pos']
    >>> reader.fileids(categories=['neg'])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.5/site-packages/nltk/corpus/reader/", line 354, in fileids
        return sorted(set.union(*[self._c2f[c] for c in categories]))
    TypeError: union() takes exactly one argument (0 given)

    Removing the [] surrounding ‘neg’ fixed it. Just thought you’d want to know. It’s the first erratum I’ve encountered as I’ve worked through the book thus far.

  • Jacob Perkins

    What version of NLTK do you have? In 2.0b9, you should be able to pass a list as the categories argument, but that may have been a recent change.
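
The failing line in the traceback can be looked at without NLTK. The path in the traceback points at Python 2.5, and the Python documentation notes that set.union only accepts multiple input iterables from version 2.6 on, so unpacking a one-element list into set.union may be what fails there, independent of the NLTK version. The sketch below is an illustration under that assumption, not NLTK's code: c2f, its file names, and fileids_for are hypothetical stand-ins for the reader's category-to-file mapping.

```python
from functools import reduce

# Hypothetical stand-in for the reader's category -> file ids mapping.
c2f = {
    "neg": {"movie_neg.txt"},
    "pos": {"movie_pos.txt"},
}


def fileids_for(categories):
    """Return the sorted file ids for a list of category names."""
    sets = [c2f[c] for c in categories]
    # Union the sets two at a time, starting from an empty set, so that
    # set.union is always called with exactly one other set.
    return sorted(reduce(set.union, sets, set()))


print(fileids_for(["neg"]))         # ['movie_neg.txt']
print(fileids_for(["neg", "pos"]))  # ['movie_neg.txt', 'movie_pos.txt']
```

Folding the sets pairwise avoids relying on a variadic set.union, which is why it behaves the same for one category or several.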