Hierarchical Classification

Hierarchical classification is an obscure but simple concept. The idea is to arrange two or more classifiers in a hierarchy, such that a classifier lower in the hierarchy is only used when a classifier above it returns an appropriate result.

For example, the text-processing.com sentiment analysis demo uses hierarchical classification by combining a subjectivity classifier and a polarity classifier. The subjectivity classifier runs first and determines whether the text is objective or subjective. If the text is objective, then a label of neutral is returned, and the polarity classifier is not used. However, if the text is subjective (or polar), then the polarity classifier is used to determine whether the text is positive or negative.

Hierarchical Sentiment Classification Model

Hierarchical classification is a useful way to combine multiple binary classifiers, if you have a hierarchy of labels that can be modeled as a binary tree. In this model, each branch of the tree either continues on to a new pair of branches or stops, and at each branching point you use a classifier to determine which branch to take.
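As a minimal sketch of this idea: the classifier class and label names below are hypothetical stand-ins (not the demo's actual code), standing in for any trained classifiers that expose a `classify()` method. The second classifier only runs when the first one says the text is subjective.

```python
# Minimal sketch of a two-level hierarchical classifier.
# KeywordClassifier is a toy stand-in for a real trained classifier
# (e.g. an NLTK NaiveBayesClassifier) with a classify() method.

class KeywordClassifier:
    """Toy classifier: returns match_label if any keyword is present."""
    def __init__(self, keywords, match_label, default_label):
        self.keywords = keywords
        self.match_label = match_label
        self.default_label = default_label

    def classify(self, words):
        if any(w in self.keywords for w in words):
            return self.match_label
        return self.default_label

# Level 1 decides objective vs subjective; level 2 decides pos vs neg.
subjectivity = KeywordClassifier({'love', 'hate', 'great', 'awful'},
                                 'subjective', 'objective')
polarity = KeywordClassifier({'love', 'great'}, 'pos', 'neg')

def hierarchical_classify(words):
    # The polarity classifier is only used for subjective text;
    # objective text short-circuits to 'neutral'.
    if subjectivity.classify(words) == 'objective':
        return 'neutral'
    return polarity.classify(words)

print(hierarchical_classify(['the', 'movie', 'was', 'great']))  # pos
print(hierarchical_classify(['i', 'hate', 'mondays']))          # neg
print(hierarchical_classify(['the', 'sky', 'is', 'blue']))      # neutral
```

A deeper label tree works the same way: each internal node gets its own classifier, and classification is a walk from the root to a leaf.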

  • While “objective” text is plausibly conceived to be “neutral”, I wonder if the mapping holds in both directions. A sentence where many polar words appear is determined to be “subjective”, but its overall sentiment might still be balanced, and therefore “neutral”, as in “I love you so bad”. Do you agree?

    IMO there should also be an arrow from the subjective branch to the neutral in the sentiment taxonomy above.

    BTW, congratulations on your book!

  • You’re right that the taxonomy is not quite right, and subjective text can be neutral (and objective text is not always neutral). But as I don’t know of a source for “neutral” movie text, I had to figure out a way to use the categories available to provide a “neutral” label. And because of the way the text is categorized, binary classifiers are the most accurate, forcing the need for a hierarchy. At some point I’ll revisit this method, as it has plenty of obvious flaws 🙂 (not least is its terrible handling of phrases like “not great”).

  • Ben

    Another good way to leverage many binary classifiers is round-robin voting, where a classifier is trained to decide between every two possible classes. http://jmlr.csail.mit.edu/papers/volume2/fuernkranz02a/html/

  • Thanks for the link Ben, looks like a great explanation of the technique.

  • John

    Have you posted any example for implementing a hierarchical classification?

  • John

    …what I meant was some example source code.

  • No, but here’s a simple example that’s similar to how http://text-processing.com/demo/sentiment/ works:

    label = level1_classifier.classify(feats)

    if label == "level2":
        label = level2_classifier.classify(feats)

  • Jon

    Which corpus did you use to train the subjectivity classifier?

  • The subjectivity dataset from http://www.cs.cornell.edu/people/pabo/movie-review-data/

  • Praveen Gr

    Where can I get Python code for hierarchical classification?

  • Praveen Gr

    Where can I get the code for this … please help me

  • The code is simply two classifiers with an if statement, as described in the second paragraph above.

  • Praveen Gr

    Thanks Jacob… How do I find the polarity of a sentence? I am completely new to this field. Can you please help with some Python code to find polarity? Even a link to an example would be a great help…

  • There are a lot of code examples in my sentiment classification series of articles, starting with http://streamhacker.com/2010/05/10/text-classification-sentiment-analysis-naive-bayes-classifier/

  • Praveen Gr

    For most sentences, the subjectivity classifier gives the result as objective. I downloaded this code from http://www.jaist.ac.jp/~s1010205/sentiment_classifier/#online-demo/

    Is this working properly? Or should I use another classifier? Please help me

  • max

    Interesting idea. How does stacking the subjectivity classifier up front affect precision/recall on sentiment?

  • It doesn’t change the individual classifier measurements, but it does cause error propagation when the subjectivity classifier gets it wrong. I’m not sure if there’s a standard measurement for that, but I do know it’s an inherent issue in any data pipeline.

  • akram

    Is it possible to use a hierarchical classification setup that consists of a sequence of two cascading SVM classifiers using word and pattern features? And how can I implement this in Java or the Weka tool?

  • I’m sure it’s possible, but I can’t help with Java or Weka. You may want to look at OpenNLP for Java.

  • akram

    Thank you. Can you help me get the code in Python?

  • Hierarchical classification is really just one or more if statements based on rules for which classifier to use & when. Once you have your classifiers, you figure out how to put them together in the right way to get the results you expect. It should only be a few lines of code, but I’m not sure anyone else can do that for you.

  • Eduan

    Jaboc, thank you for the great examples! Do you perhaps have sample Python code for how one would implement subjectivity classification? As I understand from the literature, it is best to do this first, then filter out all objective tweets, and then use the remaining corpus for sentiment analysis.

  • Eduan

    Apologies, meant to say Jacob.

  • Hi Eduan,

    I used the train_classifier.py script from https://github.com/japerk/nltk-trainer on the subjectivity dataset from https://www.cs.cornell.edu/people/pabo/movie-review-data/ to add subjectivity to http://text-processing.com/demo/sentiment/