StreamHacker Weotta be Hacking

5 Jan 2011 (17 comments)

Hierarchical Classification

Hierarchical classification is an obscure but simple concept. The idea is that you arrange two or more classifiers in a hierarchy such that the classifiers lower in the hierarchy are only used if a higher classifier returns an appropriate result.

For example, the text-processing.com sentiment analysis demo uses hierarchical classification by combining a subjectivity classifier and a polarity classifier. The subjectivity classifier runs first and determines whether the text is objective or subjective. If the text is objective, a label of neutral is returned and the polarity classifier is not used. If the text is subjective (or polar), the polarity classifier is used to determine whether it is positive or negative.
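Here's a minimal sketch of that two-stage flow. It is not the demo's actual code: `KeywordClassifier` is a hypothetical keyword-matching stand-in for a trained classifier, and the function names, label strings ("objective", "subjective", "pos", "neg") and keyword lists are all assumptions made for illustration. The only thing it relies on is that each classifier has a `classify(feats)` method that returns a label.

```python
class KeywordClassifier:
    """Hypothetical stand-in for a trained classifier with a classify(feats) method."""

    def __init__(self, keywords, hit_label, miss_label):
        self.keywords = set(keywords)
        self.hit_label = hit_label
        self.miss_label = miss_label

    def classify(self, feats):
        # feats is a bag-of-words dict of {word: True}
        if self.keywords & set(feats):
            return self.hit_label
        return self.miss_label


def bag_of_words(text):
    """Turn raw text into a {word: True} feature dict."""
    return {word: True for word in text.lower().split()}


def classify_sentiment(feats, subjectivity_classifier, polarity_classifier):
    """Return 'neutral' for objective text, otherwise defer to the polarity classifier."""
    if subjectivity_classifier.classify(feats) == "objective":
        return "neutral"
    return polarity_classifier.classify(feats)


# Toy classifiers; real ones would be trained on labeled data.
subjectivity = KeywordClassifier(
    ["great", "terrible", "love", "hate"], "subjective", "objective")
polarity = KeywordClassifier(["great", "love"], "pos", "neg")

for text in ("the movie is two hours long", "a great movie", "a terrible movie"):
    print(text, "->", classify_sentiment(bag_of_words(text), subjectivity, polarity))
```

The polarity classifier is only ever called on text the subjectivity classifier has already labeled subjective, which is the whole point of the hierarchy.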

[Figure: Hierarchical Sentiment Classification Model]

Hierarchical classification is a useful way to combine multiple binary classifiers if you have a hierarchy of labels that can be modeled as a binary tree. In this model, each branch of the tree either continues on to a new pair of branches or stops, and at each branching you use a classifier to determine which branch to take.
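The binary tree model can be sketched as a small data structure. This is only one possible way to represent it, under my own assumptions: the `Branch` class, the `classify_hierarchy` function, and the toy lambda classifiers are all hypothetical names invented for this example. Each branch holds a binary classifier plus a mapping from each of its labels to either another branch or a final label.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class Branch:
    # Binary classifier for this branching: feature dict -> label
    classify: Callable[[dict], str]
    # Each label leads either to another Branch or to a final label (a string)
    children: Dict[str, Union["Branch", str]]


def classify_hierarchy(node, feats):
    """Walk down the tree, asking one classifier per branching, until a leaf is reached."""
    while isinstance(node, Branch):
        label = node.classify(feats)
        node = node.children[label]
    return node


# Toy tree; the lambdas stand in for trained binary classifiers.
polarity = Branch(
    classify=lambda feats: "pos" if "great" in feats else "neg",
    children={"pos": "positive", "neg": "negative"},
)
root = Branch(
    classify=lambda feats: "subjective" if feats.keys() & {"great", "awful"} else "objective",
    children={"objective": "neutral", "subjective": polarity},
)

print(classify_hierarchy(root, {"great": True}))
print(classify_hierarchy(root, {"awful": True}))
print(classify_hierarchy(root, {"long": True}))
```

Adding a deeper level is just a matter of replacing a leaf string with another `Branch`; the traversal loop doesn't change.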

  • http://www.salle.url.edu/~atrilla/ atrilla

    While “objective” text is plausibly conceived to be “neutral”, I wonder if this statement is bijective. A sentence where many polar words appear is determined to be “subjective”, but its overall sentiment might still be balanced, and therefore be “neutral”, like in “I love you so bad”. Do you agree?

    IMO there should also be an arrow from the subjective branch to the neutral in the sentiment taxonomy above.

    BTW, congratulations on your book!

  • http://streamhacker.com/ Jacob Perkins

    You’re right that the taxonomy is not quite right, and subjective text can be neutral (and objective text is not always neutral). But as I don’t know of a source for “neutral” movie text, I had to figure out a way to use the categories available to provide a “neutral” label. And because of the way the text is categorized, binary classifiers are the most accurate, forcing the need for a hierarchy. At some point I’ll revisit this method, as it has plenty of obvious flaws :) (not least of which is its terrible handling of phrases like “not great”).

  • Ben

    Another good way to leverage many binary classifiers is round-robin voting, where a classifier is trained to decide between every two possible classes. http://jmlr.csail.mit.edu/papers/volume2/fuernkranz02a/html/

  • http://streamhacker.com/ Jacob Perkins

    Thanks for the link Ben, looks like a great explanation of the technique.

  • John

    Have you posted any example for implementing a hierarchical classification?

  • John

    …what I meant was some example source code.

  • http://streamhacker.com/ Jacob Perkins

    No, but here’s a simple example that’s similar to how http://text-processing.com/demo/sentiment/ works:

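    # First level: decide which branch of the hierarchy applies.
    label = level1_classifier.classify(feats)

    # Only consult the second classifier when the first one routes there.
    if label == "level2":
        label = level2_classifier.classify(feats)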

  • Jon

    Which corpus did you use to train the subjectivity classifier?

  • http://streamhacker.com/ Jacob Perkins

    The subjectivity dataset from http://www.cs.cornell.edu/people/pabo/movie-review-data/

  • Praveen Gr

    Where can I get Python code for hierarchical classification?

  • Praveen Gr

    Where can I get the code for this? Please help me.

  • http://streamhacker.com/ Jacob Perkins

    The code is simply 2 classifiers with an if statement, as described in the second paragraph above.

  • Praveen Gr

    Thanks Jacob… How do I find the polarity of a sentence? I am completely new to this field. Can you please help with some Python code to find polarity? Even a link to an example would be a great help.

  • http://streamhacker.com/ Jacob Perkins

    There are a lot of code examples in my sentiment classification series of articles, starting with http://streamhacker.com/2010/05/10/text-classification-sentiment-analysis-naive-bayes-classifier/

  • Praveen Gr

    For most sentences, the subjectivity classifier returns objective. I downloaded this code from http://www.jaist.ac.jp/~s1010205/sentiment_classifier/#online-demo/

    Is this working properly, or should I use another classifier? Please help me.

  • max

    Interesting idea. How does stacking a subjectivity classifier up front affect precision/recall on sentiment?

  • http://streamhacker.com/ Jacob Perkins

    It doesn’t change the individual classifier measurements, but it does cause error propagation when the subjectivity classifier gets it wrong. I’m not sure if there’s a standard measurement for that, but I do know it’s an inherent issue in any data pipeline.
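To make the error propagation point concrete, here's a rough back-of-the-envelope sketch. It is not a standard measurement, and it rests on simplifying assumptions of my own: truly objective text counts as neutral, any text sent down the wrong branch is counted as an error, and polarity errors are independent of subjectivity errors. The function name, parameter names and all the numbers are hypothetical, not measurements from the demo.

```python
def pipeline_accuracy(p_subjective, acc_on_subjective, acc_on_objective, acc_polarity):
    """Estimate end-to-end accuracy of a subjectivity -> polarity pipeline.

    p_subjective:      fraction of input text that is truly subjective
    acc_on_subjective: how often the subjectivity classifier is right on subjective text
    acc_on_objective:  how often the subjectivity classifier is right on objective text
    acc_polarity:      accuracy of the polarity classifier on subjective text
    """
    # Objective text is correct only if the first classifier routes it to "neutral".
    objective_part = (1.0 - p_subjective) * acc_on_objective
    # Subjective text must survive both stages: routed correctly, then labeled correctly.
    subjective_part = p_subjective * acc_on_subjective * acc_polarity
    return objective_part + subjective_part


# Hypothetical numbers: each classifier looks fine on its own,
# but subjective text only comes out right 0.9 * 0.8 = 72% of the time.
print(pipeline_accuracy(0.5, 0.9, 0.9, 0.8))  # roughly 0.81
```

Under these assumptions the polarity classifier's own 80% accuracy is untouched, but its effective accuracy inside the pipeline drops because some subjective text never reaches it.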
