Text Classification for Sentiment Analysis – Naive Bayes Classifier

Sentiment analysis is a popular area of research in text mining and social media analysis, especially around user reviews and tweets. It is a special case of text classification generally focused on identifying opinion polarity, and while it’s often not very accurate, it can still be useful. For simplicity (and because the training data is easily accessible) I’ll focus on 2 possible sentiment classifications: positive and negative.

NLTK Naive Bayes Classification

NLTK comes with all the pieces you need to get started on sentiment analysis: a movie reviews corpus with reviews categorized into pos and neg categories, and a number of trainable classifiers. We’ll start with a simple NaiveBayesClassifier as a baseline, using boolean word feature extraction.

Bag of Words Feature Extraction

All of the NLTK classifiers work with featuresets, which can be simple dictionaries mapping a feature name to a feature value. For text, we’ll use a simplified bag of words model where every word is a feature name with a value of True. Here’s the feature extraction method:

def word_feats(words):
    return {word: True for word in words}
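For example, on a short tokenized sentence (a toy example, not from the corpus):

```python
def word_feats(words):
    return {word: True for word in words}

# every word becomes a feature name with the value True
feats = word_feats(['the', 'plot', 'was', 'great'])
print(feats)  # {'the': True, 'plot': True, 'was': True, 'great': True}
```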

Training Set vs Test Set and Accuracy

The movie reviews corpus has 1000 positive files and 1000 negative files. We’ll use 3/4 of them as the training set, and the rest as the test set. This gives us 1500 training instances and 500 test instances. The classifier training method expects to be given a list of tuples of the form [(feats, label)], where feats is a feature dictionary and label is the classification label. In our case, feats will be of the form {word: True} and label will be one of ‘pos’ or ‘neg’. For accuracy evaluation, we can use nltk.classify.util.accuracy with the test set as the gold standard.

Training and Testing the Naive Bayes Classifier

Here’s the complete Python code for training and testing a Naive Bayes classifier on the movie reviews corpus.

import nltk.classify.util
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews

def word_feats(words):
    return {word: True for word in words}

negids = movie_reviews.fileids('neg')
posids = movie_reviews.fileids('pos')

negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids]
posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in posids]

# use integer division so the cutoffs are valid list indices
negcutoff = len(negfeats) * 3 // 4
poscutoff = len(posfeats) * 3 // 4

trainfeats = negfeats[:negcutoff] + posfeats[:poscutoff]
testfeats = negfeats[negcutoff:] + posfeats[poscutoff:]
print('train on %d instances, test on %d instances' % (len(trainfeats), len(testfeats)))

classifier = NaiveBayesClassifier.train(trainfeats)
print('accuracy:', nltk.classify.util.accuracy(classifier, testfeats))
classifier.show_most_informative_features()

And the output is:

train on 1500 instances, test on 500 instances
accuracy: 0.728
Most Informative Features
         magnificent = True              pos : neg    =     15.0 : 1.0
         outstanding = True              pos : neg    =     13.6 : 1.0
           insulting = True              neg : pos    =     13.0 : 1.0
          vulnerable = True              pos : neg    =     12.3 : 1.0
           ludicrous = True              neg : pos    =     11.8 : 1.0
              avoids = True              pos : neg    =     11.7 : 1.0
         uninvolving = True              neg : pos    =     11.7 : 1.0
          astounding = True              pos : neg    =     10.3 : 1.0
         fascination = True              pos : neg    =     10.3 : 1.0
             idiotic = True              neg : pos    =      9.8 : 1.0

As you can see, the 10 most informative features are, for the most part, highly descriptive adjectives. The only 2 words that seem a bit odd are “vulnerable” and “avoids”. Perhaps these words refer to important plot points or character development that signify a good movie. Whatever the case, with simple assumptions and very little code we’re able to get almost 73% accuracy. This is somewhat near human accuracy, as apparently people agree on sentiment only around 80% of the time. Future articles in this series will cover precision & recall metrics, alternative classifiers, and techniques for improving accuracy.

  • anonymous

    Can you tell me how we would compute cosine similarity for these reviews when we are creating a dictionary of these features?

  • It’s not exactly cosine similarity, but I wrote about using information gain to eliminate low information features at http://streamhacker.com/2010/06/16/text-classification-sentiment-analysis-eliminate-low-information-features/.

  • Consider that I have a sentence containing a word with multiple senses, and using SentiWordNet I get multiple scores for that word. How can I then calculate whether it is positive or negative? Can you elaborate?

  • You need to look into Word Sense Disambiguation: https://en.wikipedia.org/wiki/Word_sense_disambiguation

  • Dear Jacob, thank you for such a great intro to NLTK. I am reading your book closely too, to get a better understanding of text analysis. I would like to know if there is already a pre-existing corpus for news (TV transcript or print) which has been classified by positive and negative. I would like to do some sentiment analysis of TV news transcripts and wanted to start from an existing database before I create my own classifications (as a first pass).

  • Hi Amar,

    I don’t know of any news sentiment corpus, but you might want to look into “corpus bootstrapping”, which is a way to create your own custom corpus based on existing corpora and/or models. Here’s a presentation I gave on the topic: http://www.slideshare.net/japerk/corpus-bootstrapping-with-nltk

  • Dear Jacob,

    I have tried working with the code you wrote in your book but get stuck on one point which I am not sure why will not execute. When I try to run the negation replacer, I get the following message:

    ‘AntonymReplacer’ object has no attribute ‘replace_negations’

    I am sure I have copied everything exactly as your code.


  • On Page 42, the AntonymReplacer class is defined with 2 methods: replace & replace_negations. Based on the error message, you either did not define the replace_negations method, or defined it incorrectly.

  • chaoprokia

    Is it possible for me to use this to train 3 classes?

    negids = movie_reviews.fileids('neg')

    posids = movie_reviews.fileids('pos')

    neuids = movie_reviews.fileids('neu')

  • Sure, NLTK classifiers work with any number of classes, but most classifiers tend to get less accurate as you go beyond 2 classes.

  • Praveen Gr

    Can anyone help me with sentiment analysis code, or a link which gives considerably good results?

  • tarik setia

    How can I use NLTK to calculate a priori probabilities and the probability of each word in the feature set?

  • The probability module has many useful functions & classes for calculating probabilities: http://nltk.org/api/nltk.html#module-nltk.probability
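As a small illustration (with made-up label counts), the expected likelihood estimation that NaiveBayesClassifier uses by default can be computed directly from a FreqDist:

```python
from nltk import FreqDist
from nltk.probability import ELEProbDist

# made-up label counts: 2 pos instances, 1 neg instance
label_fd = FreqDist(['pos', 'pos', 'neg'])

# expected likelihood estimation adds 0.5 to every count
label_pd = ELEProbDist(label_fd, bins=2)
print(label_pd.prob('pos'))  # (2 + 0.5) / (3 + 2*0.5) = 0.625
```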

  • Rojin


    I’m developing a new application using NLTK. I want to classify mails into different buckets such as query, feedback, etc., so I wish to learn more about NLTK. How can I develop such an application? Please suggest some tutorials or links.

  • Rojin

    Hi, thanks, but this is not what I meant. I have gone through these links. What I need is NLTK example code with a tutorial, for example sentiment analysis with different conditions. Please do reply.

  • Every article in my Text Classification for Sentiment Analysis series has example code for sentiment analysis. The NLTK book and my book both have many NLTK code examples for all sorts of different uses. I can’t help you more than that without a much more specific question.

  • Andres

    I tried with pickle but, once I reload/unpickle the classifier, it classifies wrongly. I used the following code to save and load the files:

    def save_fichB(fich1, data):
        f = open(fich, 'wb')
        pickle.dump(data, f)

    def load_fichB(fich1):
        f = open(fich, 'rb')
        data = pickle.load(f)
        return data

    I’m stuck on it!

  • Not sure how to diagnose this. Maybe the code has changed? Or the classifier didn’t classify correctly in the first place?

  • Rojin

    Hello, I am developing a mail classification system using NLP. I have developed a classifier with the Naive Bayes algorithm. The problem I’m facing now is classification of a single mail into different categories. Suppose one mail has information for three categories, but Naive Bayes only classifies a text into one category. How can I classify a text into multiple categories? Please help me.

  • What you need to do is train a classifier for each category. Every classifier should have 2 logical labels: yes or no. Then, to classify into multiple categories, you run each classifier over the text and keep the categories where the classifier label is yes. This technique is called “multiple binary classifiers”.
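A minimal sketch of the multiple binary classifiers technique, using invented mail categories and tiny hand-built training data (the category names, words, and helper names below are illustrative only):

```python
from nltk.classify import NaiveBayesClassifier

def word_feats(words):
    return {word: True for word in words}

# hypothetical multi-label training data: each mail lists its categories
docs = [
    (['refund', 'charge'], ['billing']),
    (['refund', 'late'], ['billing']),
    (['crash', 'error'], ['bugs']),
    (['slow', 'lag'], ['performance']),
]
categories = ['billing', 'bugs', 'performance']

# train one binary yes/no classifier per category
classifiers = {}
for cat in categories:
    train = [(word_feats(words), 'yes' if cat in labels else 'no')
             for words, labels in docs]
    classifiers[cat] = NaiveBayesClassifier.train(train)

def classify_multi(words):
    # keep every category whose classifier answers 'yes'
    feats = word_feats(words)
    return [cat for cat in categories
            if classifiers[cat].classify(feats) == 'yes']

print(classify_multi(['refund']))  # ['billing']
```

With real data, each per-category training set would come from its own pos/neg files, but the yes/no structure stays the same.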

  • Rojin

    If it is possible please give an example python code for this. So that I can understand this properly. Thanks

  • I’ve implemented this in train_classifier.py from https://github.com/japerk/nltk-trainer. Use the options --multi --binary to train on a corpus with multiple labels.

  • ruby

    Hi, this is a great tutorial! One thing I would add as an improvement is that the training and test data should really be randomized, which here I don’t think it is. One way to do this is to randomize the lists of neg and pos feats before cutting off the lists accordingly. By doing this, you could then repeat the tests multiple times to apply cross-validation.
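That randomized split can be sketched like this (with placeholder featuresets standing in for the real negfeats/posfeats, and a fixed seed for reproducibility):

```python
import random

# placeholder featuresets standing in for the real negfeats/posfeats
negfeats = [({'w%d' % i: True}, 'neg') for i in range(1000)]
posfeats = [({'w%d' % i: True}, 'pos') for i in range(1000)]

random.seed(42)          # fix the seed so a given split is reproducible
random.shuffle(negfeats)
random.shuffle(posfeats)

negcutoff = len(negfeats) * 3 // 4
poscutoff = len(posfeats) * 3 // 4

trainfeats = negfeats[:negcutoff] + posfeats[:poscutoff]
testfeats = negfeats[negcutoff:] + posfeats[poscutoff:]
print('train on %d instances, test on %d instances' % (len(trainfeats), len(testfeats)))
```

Repeating this with different seeds (or rotating which quarter is held out) gives a simple form of cross-validation.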

  • Rojin

    Hi, for multi-class classification I have trained each classifier with logical labels ‘yes’ or ‘no’. Then I ran all classifiers on the same text, but it’s not giving good accuracy. I used sklearn (scikit-learn) for the classification. Please guide me on how I can improve the accuracy.

  • There’s very little advice I can offer when I know nothing about the data. Try different algorithms. Try filtering the features using information gain: http://streamhacker.com/2010/06/16/text-classification-sentiment-analysis-eliminate-low-information-features/. Try using different features.

  • venkatesh M

    Mr. Jacob, if you don’t mind, can you kindly let me know how to get a dataset (for example, some tweets from Twitter) so that I can use it for my project work? I am new to this, so kindly help me.

  • venkatesh M

    Hello Mr. Kellegher, can you kindly let me know how you obtained the corpus of 1850 files? Requesting you to reply.


  • I recommend the movie_reviews corpus that comes with NLTK. It’s very simple to work with, because NLTK already has a corpus reader for reading the file contents in various ways. http://nltk.org/data.html

  • Anon92115

    Hi, thank you very much for this great site you have! I am building a sentiment classifier for product reviews, so can you please direct me to some resource where I can find a training set for product reviews?
    Thank you!

  • There’s a corpus of Amazon reviews online somewhere. I don’t remember where, but try searching for “product review corpus” or “review sentiment”.

  • Anon92115

    Thank you!

  • Naveed

    I have one more question. I have the training data as a whole in one file, and I have five different categories that are labelled as positive and negative examples in the training data.

    In order to train a classifier on these five categories, I need to create five pairs of files (pos and neg) for each category and then train a classifier on each category using its pos and neg examples. Is this the right way?

  • Yes, if you want to train a classifier for each category, then the simplest thing to do is create separate pos & neg training files for each category.

  • rwanda

    Hi! How can we determine the cases of classification failure? Besides that, I would also be interested in applying SVM and Random Forest for text classification; how can this be achieved?

  • Naveed

    Thanks for the reply, Jacob. One more thing I want to ask: while preparing separate pos and neg training files for each category, should I treat all the sentences where no category is mentioned as neutral, rather than positive or negative? I have positive, negative, and neutral labels. Is this right?

  • That sounds right

  • Selva Saravanakumar

    Hi, I’m using both the NLTK NaiveBayesClassifier and SklearnClassifier for classification of sentences. Is there a way to find which classification is best? For example, if I give “You are looking not so great”, one classifies it as “Positive” and the other as “Negative”. I just want to know which is correct, because I will automate for more than 2k sentences, where manual checking is tedious.


  • If you can’t do manual checking, then you could at least look at the class probabilities and choose the most probable. Or combine the probabilities and choose based on that. But if your two classifiers disagree, you may want to save those disagreements separately from the text where both agree, then use those disagreements, along with manual classifications, to update/fix your training data & classifiers.
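Getting the class probabilities out of an NLTK classifier with prob_classify looks roughly like this (a toy classifier trained on four hand-built instances, just to have something to query):

```python
from nltk.classify import NaiveBayesClassifier

# tiny hand-built training set, just to have a classifier to query
train = [
    ({'great': True}, 'pos'), ({'fun': True}, 'pos'),
    ({'awful': True}, 'neg'), ({'boring': True}, 'neg'),
]
classifier = NaiveBayesClassifier.train(train)

# prob_classify returns a probability distribution over the labels
dist = classifier.prob_classify({'great': True})
print(dist.max())                        # the most probable label
print(dist.prob('pos'), dist.prob('neg'))
```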

  • Selva Saravanakumar

    Thanks for the reply.

    I was able to find the probability of classification for NaiveBayesClassifier, but not for SklearnClassifier.
    So I decided to stick with NaiveBayesClassifier, whose accuracy was higher than SklearnClassifier’s.

  • geetika

    Can you please tell me, if I have a Twitter dataset, how can I find accuracy and recall on the Twitter test data? The movie review corpus is imported here, but how can I use my own data?

  • You need a gold-standard test set in order to calculate accuracy, precision, and recall. If your Twitter dataset is already categorized, then that’s what you compare against. If it’s not, then you need to construct a test set in order to calculate metrics.
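Once you have gold labels and predictions, per-class precision and recall can be computed with nltk.metrics by collecting item indices into reference and test sets (the labels below are made up):

```python
import collections
from nltk.metrics import precision, recall

# made-up gold labels and classifier predictions
gold  = ['pos', 'pos', 'neg', 'neg', 'pos']
preds = ['pos', 'neg', 'neg', 'neg', 'pos']

# nltk.metrics compares sets of item indices per label
refsets = collections.defaultdict(set)
testsets = collections.defaultdict(set)
for i, (g, p) in enumerate(zip(gold, preds)):
    refsets[g].add(i)
    testsets[p].add(i)

print('pos precision:', precision(refsets['pos'], testsets['pos']))
print('pos recall:', recall(refsets['pos'], testsets['pos']))
```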

  • Vandana

    Hi Jacob,
    I am trying to use NLTK for Hindi language tagging, and it works fine with the given corpus (hindi.pos in the indian corpus) in NLTK. But when I try to tag my own Hindi sentences from Hindi-language websites, the tagging is wrong, i.e. the tags are not correct. Please provide any help if possible.

  • Text in the wild is often much different and harder to tag than a training corpus. The most reliable fix I know of is to create your own tagged corpus and train a new tagger on it.
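Training a new tagger on a custom tagged corpus can be sketched with NLTK's UnigramTagger (the sentences and tags below are invented placeholders, not a real Hindi corpus):

```python
from nltk.tag import DefaultTagger, UnigramTagger

# invented placeholder sentences standing in for a real hand-tagged corpus
tagged_sents = [
    [('yeh', 'PRP'), ('kitaab', 'NN'), ('acchi', 'JJ'), ('hai', 'VB')],
    [('woh', 'PRP'), ('ghar', 'NN'), ('gaya', 'VB')],
]

backoff = DefaultTagger('NN')               # guess NN for unseen words
tagger = UnigramTagger(tagged_sents, backoff=backoff)
print(tagger.tag(['yeh', 'ghar']))          # [('yeh', 'PRP'), ('ghar', 'NN')]
```

A real corpus would need far more sentences, and chaining bigram/trigram taggers over the unigram tagger usually improves accuracy.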

  • Manjunath Nadagouda N

    We are working on Hindi data (an Indian language) using the NLTK tool. I want to know how to create our own tagged corpus; is there any link which explains the steps? Thanks in advance.

  • I think my book may help you; there’s an entire chapter about working with and creating custom corpora. I also recommend looking at the included NLTK corpora, specifically the indian corpus. This is a tagged corpus for various Indian languages, and if you use the same format, you can use the IndianCorpusReader included in NLTK.

  • Manjunath Nadagouda N

    I will see that chapter, Jacob. Thanks.

  • Vandana

    Why am I getting the following error while calculating accuracy for my program?

    Traceback (most recent call last):
      File "/home/vandana/Desktop/tweet.py", line 49, in <module>
        print nltk.classify.util.accuracy(classifier, test_tweets)
      File "/usr/local/lib/python2.7/dist-packages/nltk/classify/util.py", line 85, in accuracy
        results = classifier.batch_classify([fs for (fs, l) in gold])
      File "/usr/local/lib/python2.7/dist-packages/nltk/classify/api.py", line 77, in batch_classify
        return [self.classify(fs) for fs in featuresets]
      File "/usr/local/lib/python2.7/dist-packages/nltk/classify/naivebayes.py", line 88, in classify
        return self.prob_classify(featureset).max()
      File "/usr/local/lib/python2.7/dist-packages/nltk/classify/naivebayes.py", line 94, in prob_classify
        featureset = featureset.copy()
    AttributeError: 'str' object has no attribute 'copy'