Python Text Processing with NLTK Cookbook Chapter 2 Errata

It has come to my attention that there are two errors in Chapter 2, Replacing and Correcting Words, of Python Text Processing with NLTK Cookbook. My thanks to the reader who went out of their way to verify my mistakes and send in corrections.

In Lemmatizing words with WordNet, on page 29, under How it works…, I said that “cooking” is not a noun and does not have a lemma. In fact, cooking is a noun, and as such is its own lemma. Of course, “cooking” is also a verb, and the verb form has the lemma “cook”.

In Removing repeating characters, on page 35, under How it works…, I explained the repeat_regexp match groups incorrectly. The actual match grouping of the word “looooove” is (looo)(o)o(ve) because the pattern matching is greedy. The end result is still correct.
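The corrected grouping can be checked with a short sketch. It assumes the pattern has the form (\w*)(\w)\2(\w*) with the replacement \1\2\3; the errata above does not quote the pattern, so treat both as assumptions. The helper remove_repeats is a hypothetical name used only for illustration, not the book's class.

```python
import re

# Assumed pattern (not quoted in the errata): a greedy prefix, one character,
# a backreference to that same character, then the rest of the word.
repeat_regexp = re.compile(r'(\w*)(\w)\2(\w*)')
repl = r'\1\2\3'  # keep everything except the backreferenced duplicate

# The greedy first group takes as many characters as it can while still
# leaving a doubled character for (\w)\2 to match.
groups = repeat_regexp.match('looooove').groups()
print(groups)


def remove_repeats(word):
    """Hypothetical helper: drop one repeated character per pass until stable."""
    reduced = repeat_regexp.sub(repl, word)
    return word if reduced == word else remove_repeats(reduced)


result = remove_repeats('looooove')
print(result)
```

Each pass removes exactly one duplicated character, which is why the end result is the same whichever way the groups are split. This sketch has no dictionary check, so legitimately doubled letters are collapsed too.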

  • Out of curiosity, how does Packt handle errata in e-books? Do they integrate corrections and such into future output for folks who bought in e-book format?

    Unrelated note: I emailed Packt customer support about leading-indentation and multiple adjoining spaces being stripped in ePub format for code samples of your book and other Python books they publish. I am under the impression that they sent my note to whoever is responsible for writing output filters for epub/html format output.

  • Hi Sean – I think Packt integrates corrections, but I’m not sure. I’m afraid I can’t help with the ePub formatting, though you can download the code at

  • Thanks. I am enjoying the book so far. It is a nice complement to the O’Reilly NLTK book.

  • Hi Sean,

    I asked them before and they do NOT incorporate errata into their ebooks. Instead you have to use the original ebook + keep a separate page of errata handy for reading. Manning Publications is the same way.

    Here is a list for those that are curious:

  • The errata has now been officially posted at

  • Skyheights

    Hi Jacob,
    Really enjoying and appreciating the book. Ran into this error message on p 59 bottom.
    >>> from nltk.corpus.reader import CategorizedPlaintextCorpusReader
    >>> reader = CategorizedPlaintextCorpusReader('.', r'movie_.*\.txt', cat_pattern=r'movie_(\w+)\.txt')
    >>> reader.categories()
    ['neg', 'pos']
    >>> reader.fileids(categories=['neg'])
    Traceback (most recent call last):
      File "", line 1, in
      File "/usr/lib/python2.5/site-packages/nltk/corpus/reader/", line 354, in fileids
        return sorted(set.union(*[self._c2f[c] for c in categories]))
    TypeError: union() takes exactly one argument (0 given)

    Removing the [] surrounding 'neg' fixed it. Just thought you’d want to know. It’s the first erratum I’ve encountered in my work so far.

  • What version of NLTK do you have? In 2.0b9, you should be able to pass a list as the categories argument, but that may have been a recent change.

  • jamal ahmed

    Hey Jacob, I am working on correcting noun errors. Can you suggest a good book, or do you have any code for that?

  • I’m not sure what you mean by noun errors. Spelling correction? Part of speech tagging?

  • jamal ahmed

    POS tagging

  • Take a look at my answer here, for how to override POS tags: