I've given a few talks & presentations recently, so for anyone who doesn't follow japerk on Twitter, here are some links:
- Weotta's MongoDB presentation from Tuesday, Feb 21 at the SF MongoDB meetup
- Corpus Bootstrapping with NLTK from Tuesday, Feb 28, during the Deep Data session at Strata
- PyCon NLTK Tutorial code from Thursday, March 8 at PyCon 2012
I also want to recommend 2 books that helped me mentally prepare for these talks:
At the end of February and the beginning of March, I'll be giving 3 talks in the SF Bay Area and one in St Louis, MO. In chronological order...
How Weotta uses MongoDB
Grant and I will be helping 10gen celebrate the opening of their new San Francisco office on Tuesday, February 21, by talking about How Weotta uses MongoDB. We'll cover some of our favorite features of MongoDB and how we use it for local place & events search. Then we'll finish with a preview of Weotta's upcoming MongoDB-powered local search APIs.
NLTK Jam Session at NICAR 2012
On Thursday, February 23, in St Louis, MO, I'll be demonstrating how to use NLTK as part of the NewsCamp workshop at NICAR 2012. This will be a version of my PyCon NLTK Tutorial with a focus on news text and corpora like treebank.
Corpus Bootstrapping with NLTK at Strata 2012
As part of the Strata 2012 Deep Data program, I'll talk about Corpus Bootstrapping with NLTK on Tuesday, February 28. The premise of this talk is that while there are plenty of great algorithms and methods for natural language processing, most of them require a training corpus, and chances are the training corpus you really need doesn't exist. So how can you quickly create a quality corpus at minimal cost? I'll cover specific real-world examples to answer this question.
NLTK Tutorial at PyCon 2012
Introduction to NLTK will be a 3-hour tutorial at PyCon on Thursday, March 8th. You'll get to know NLTK in depth, learn about corpus organization, and train your own models manually & with nltk-trainer. My goal is that you'll walk out with at least one new NLP superpower that you can put to use immediately.
For those who missed it, my company, Weotta, launched at TechCrunch Disrupt NY 2011. The experience was by turns exciting, stressful, and fun. We met many cool people (like the teams from Skylines and Rexly) and had some delicious food at restaurants like Song, Fatty Crab, and Momofuku.
On the first day, we gave our demo, and I nearly swore on stage when I saw the big red X's that you get when the Google Static Maps API rate limits your IP address. I had checked before the session started, and everything seemed okay, but I guess you can't escape Murphy's Law (especially when hundreds of people are sharing the same IP address). Afterwards, we immediately scrambled to get the site ready to allow people in. So many people were sharing our beta invite link on Facebook that the Weotta Facebook App was temporarily disabled due to unusual behavior. Luckily, our excellent advisor Mike Hart connected us with some great people at the Facebook API team, and they quickly got us back online.
The next day we discovered, and quickly fixed, an inaccurate geocode that was causing certain plans not to generate. Then I found out that anonymized Facebook emails are much longer than Django's 75-character default EmailField max_length. Not wanting to do a database migration while so many people were using the site, I waited until getting back home to fix this issue. But despite these small problems, hundreds of people were able to get into Weotta, make plans, and discover fun things to do with their friends.
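To make the EmailField length problem concrete, here's a minimal sketch in plain Python (no Django required) that checks which addresses would overflow a 75-character column. The helper name `emails_exceeding`, the constants, and the sample addresses are all hypothetical and made up for illustration; the long address only imitates the general shape of an anonymized proxy email, not Facebook's actual format. The 254-character limit is an assumption based on the commonly cited maximum address length, which later Django versions adopted as the EmailField default.

```python
# Default EmailField max_length at the time, as described in the post.
OLD_EMAIL_MAX_LENGTH = 75
# Assumed safer limit: the commonly cited maximum email address length.
SAFER_EMAIL_MAX_LENGTH = 254


def emails_exceeding(emails, max_length=OLD_EMAIL_MAX_LENGTH):
    """Return the addresses that are too long to fit in max_length characters."""
    return [email for email in emails if len(email) > max_length]


# Hypothetical sample data, not real addresses or a real proxy format.
short_email = "user@example.com"
long_proxy_email = "apps+" + "1" * 15 + "." + "a" * 60 + "@proxymail.example.com"

too_long = emails_exceeding([short_email, long_proxy_email])
still_too_long = emails_exceeding([short_email, long_proxy_email], SAFER_EMAIL_MAX_LENGTH)
```

Running a check like this against incoming signups before launch would have flagged the long addresses; the actual fix still requires widening the column with a database migration.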
Weotta has been running smoothly ever since, and now that the conference craziness is over, we can start focusing on our #1 feedback: when will Weotta be in my city? We got requests for everywhere from Chicago and Denver to Sydney and Singapore. We hear you, and will be expanding outside of SF and NY as fast as we can. While our methods are very algorithmic and we don't depend on UGC, it still takes human effort to give you focused, localized, highly relevant content so you can easily discover and plan amazing occasions. And if you'd like to help us expand and improve Weotta, get in touch. On the technical side, we're looking for at least 2 developers: a crawler/content person familiar with Scrapy, and a Django/jQuery web developer. If you're interested, contact me on GitHub, LinkedIn, or directly at firstname.lastname@example.org.
We hope that everyone who signed up for the beta has received their invite; if you haven't (or want one), then you can sign up for Weotta here (only a limited number will get in). And if you want to learn more about Weotta, then check out the Weotta press coverage. Weotta currently covers San Francisco and New York, so if you're interested in a "personal concierge"-like service that can provide recommended plans/itineraries of things to do in a city, then sign up for Weotta here.