StreamHacker Weotta be Hacking

2 Aug 2010 (5 comments)

Announcing Python NLTK Demos

If you want to see what NLTK can do but don't want to go through the effort of installing it and learning how to use it, check out my Python NLTK demos.

It currently demonstrates the following functionality:

If you like it, please share it. If you want to see more, leave a comment below. And if you are interested in a service that could apply these processes to your own data, please fill out this NLTK services survey.

Other Natural Language Processing Demos

Here's a list of similar resources on the web:

  • Pingback: Tweets that mention Announcing Python NLTK Demos «streamhacker.com -- Topsy.com

  • phun

    Where do you find the “ACE corpus” that you used?

  • http://streamhacker.com/ Jacob Perkins

    The ACE website is here: http://www.itl.nist.gov/iad/mig//tests/ace/
    I didn't train the default chunker, though, and am not sure how or where to get the corpus data.

  • phun

    Yeah, I looked it up on the LDC website, where it is distributed, and it costs $1000. I thought you might have found it elsewhere.

  • phun

    Are you just using the default commands:

    text = nltk.word_tokenize(sent)
    tagged = nltk.pos_tag(text)
    named_entities = nltk.ne_chunk(tagged)

    ?

    Running these takes over 3 seconds on my system (four 3.2 GHz cores, 8 GB RAM), while your site does it instantly.

  • http://streamhacker.com/ Jacob Perkins

    I run sent_tokenize and word_tokenize, then batch_pos_tag and batch_ne_chunk. Tagging and chunking should only be slow the first time, because the default tagger and chunker are pickle files that have to be loaded first. Once they're loaded, subsequent calls should be much faster, so I make sure to pre-load them any time I restart the Django server.
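
The pre-loading approach described in the last comment could be sketched roughly as follows. This is only an illustration, not the code behind the demo site: `NltkPipeline`, `load_nltk_pipeline`, `timed`, and the warm-up sentence are hypothetical names chosen for this example. It also assumes NLTK 3, where `batch_pos_tag` and `batch_ne_chunk` were renamed to `pos_tag_sents` and `ne_chunk_sents`, and that the required NLTK data packages are already installed.

```python
import time


class NltkPipeline:
    """Hypothetical helper: holds the four steps so they are wired up once and reused."""

    def __init__(self, sent_tokenize, word_tokenize, tag_sents, chunk_sents):
        self.sent_tokenize = sent_tokenize
        self.word_tokenize = word_tokenize
        self.tag_sents = tag_sents
        self.chunk_sents = chunk_sents

    def process(self, text):
        """Return one chunked result per sentence in `text`."""
        sentences = self.sent_tokenize(text)
        tokenized = [self.word_tokenize(s) for s in sentences]
        tagged = self.tag_sents(tokenized)
        # list() forces evaluation in case the chunker returns a lazy iterator.
        return list(self.chunk_sents(tagged))


def load_nltk_pipeline(warmup_text="NLTK is a Python library."):
    """Build the pipeline from NLTK and run it once so the pickled models get loaded."""
    import nltk  # imported lazily; assumes NLTK 3 and its model data are installed

    pipeline = NltkPipeline(
        nltk.sent_tokenize,
        nltk.word_tokenize,
        nltk.pos_tag_sents,   # named batch_pos_tag in NLTK 2
        nltk.ne_chunk_sents,  # named batch_ne_chunk in NLTK 2
    )
    pipeline.process(warmup_text)  # the first call pays the model-loading cost
    return pipeline


def timed(func, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

In a long-running server process, one option is to call `load_nltk_pipeline()` once at startup and keep the returned object around, so that each request only calls `process`. Wrapping a call in `timed` is a simple way to compare the first (cold) call against later ones on your own machine.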
