<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Text Classification for Sentiment Analysis &#8211; Eliminate Low Information Features</title>
	<atom:link href="http://streamhacker.com/2010/06/16/text-classification-sentiment-analysis-eliminate-low-information-features/feed/" rel="self" type="application/rss+xml" />
	<link>http://streamhacker.com/2010/06/16/text-classification-sentiment-analysis-eliminate-low-information-features/#utm_source=feed&#038;utm_medium=feed&#038;utm_campaign=feed</link>
	<description>Weotta be Hacking</description>
	<lastBuildDate>Sun, 05 Feb 2012 22:47:34 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com" />
	<atom:link rel="hub" href="http://superfeedr.com/hubbub" />
		<item>
		<title>By: Alex Leykin</title>
		<link>http://streamhacker.com/2010/06/16/text-classification-sentiment-analysis-eliminate-low-information-features/comment-page-1/#comment-912</link>
		<dc:creator>Alex Leykin</dc:creator>
		<pubDate>Wed, 12 Oct 2011 21:39:00 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=1246#comment-912</guid>
		<description>Yes, this looks like lookahead bias. Bestwords should be computed only on the training set (the first 3/4 of the reviews).</description>
		<content:encoded><![CDATA[<p>Yes, this looks like lookahead bias. Bestwords should be computed only on the training set (the first 3/4 of the reviews).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sriram</title>
		<link>http://streamhacker.com/2010/06/16/text-classification-sentiment-analysis-eliminate-low-information-features/comment-page-1/#comment-900</link>
		<dc:creator>Sriram</dc:creator>
		<pubDate>Sun, 18 Sep 2011 05:19:00 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=1246#comment-900</guid>
		<description>Thanks a lot, Jacob. Let me try it out.</description>
		<content:encoded><![CDATA[<p>Thanks a lot, Jacob. Let me try it out.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jacob Perkins</title>
		<link>http://streamhacker.com/2010/06/16/text-classification-sentiment-analysis-eliminate-low-information-features/comment-page-1/#comment-899</link>
		<dc:creator>Jacob Perkins</dc:creator>
		<pubDate>Sat, 17 Sep 2011 16:53:00 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=1246#comment-899</guid>
		<description>You can average the probabilities from the NaiveBayes and Maxent classifiers, but DecisionTree doesn&#039;t support prob_classify, so you&#039;ll have to hardcode a check and use 100% or 0% in your averaging.</description>
		<content:encoded><![CDATA[<p>You can average the probabilities from the NaiveBayes and Maxent classifiers, but DecisionTree doesn&#8217;t support prob_classify, so you&#8217;ll have to hardcode a check and use 100% or 0% in your averaging.</p>
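<p>A minimal sketch of that averaging, offered as an illustration rather than code from this thread. It assumes each classifier exposes <code>classify()</code> and <code>prob_classify()</code>, that the returned distribution has a <code>prob(label)</code> method, and that a classifier without probability support raises <code>NotImplementedError</code>. The names <code>average_label_probs</code>, <code>FixedDist</code>, <code>SoftStub</code>, and <code>HardStub</code> are hypothetical stand-ins so the example runs without any trained models.</p>

```python
def average_label_probs(classifiers, featureset, labels):
    """Average per-label probabilities across several classifiers.

    A classifier whose prob_classify() raises NotImplementedError
    (assumed behaviour for classifiers without probability support)
    contributes 1.0 for its predicted label and 0.0 for every other label.
    """
    totals = {label: 0.0 for label in labels}
    for clf in classifiers:
        try:
            dist = clf.prob_classify(featureset)
            probs = {label: dist.prob(label) for label in labels}
        except NotImplementedError:
            predicted = clf.classify(featureset)
            probs = {label: 1.0 if label == predicted else 0.0 for label in labels}
        for label in labels:
            totals[label] += probs[label]
    return {label: totals[label] / len(classifiers) for label in labels}


class FixedDist:
    """Hypothetical stand-in for a probability distribution object."""
    def __init__(self, probs):
        self._probs = probs

    def prob(self, label):
        return self._probs.get(label, 0.0)


class SoftStub:
    """Hypothetical classifier that supports prob_classify()."""
    def __init__(self, probs):
        self._probs = probs

    def prob_classify(self, featureset):
        return FixedDist(self._probs)


class HardStub:
    """Hypothetical classifier that only returns a hard label."""
    def __init__(self, label):
        self._label = label

    def classify(self, featureset):
        return self._label

    def prob_classify(self, featureset):
        raise NotImplementedError()


averaged = average_label_probs(
    [SoftStub({"pos": 0.9, "neg": 0.1}),
     SoftStub({"pos": 0.6, "neg": 0.4}),
     HardStub("pos")],
    {"great": True},
    ["pos", "neg"],
)
print(averaged)
```

<p>The hard-label classifier pulls the average strongly toward its own prediction, so it may be worth weighting it lower than the probabilistic classifiers.</p>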
]]></content:encoded>
	</item>
	<item>
		<title>By: Sriram</title>
		<link>http://streamhacker.com/2010/06/16/text-classification-sentiment-analysis-eliminate-low-information-features/comment-page-1/#comment-898</link>
		<dc:creator>Sriram</dc:creator>
		<pubDate>Sat, 17 Sep 2011 12:47:00 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=1246#comment-898</guid>
		<description>Hi Jacob,
              I am trying to implement a prob_classify() method in the MaxVoteClassifier class, which combines three classifiers (naive Bayes, decision tree, maxent) and does voting. I added the definition of prob_classify to the MaxVoteClassifier class, but I am not sure how to pass feature_probdist to it. I would appreciate your help.</description>
		<content:encoded><![CDATA[<p>Hi Jacob,<br />
              I am trying to implement a prob_classify() method in the MaxVoteClassifier class, which combines three classifiers (naive Bayes, decision tree, maxent) and does voting. I added the definition of prob_classify to the MaxVoteClassifier class, but I am not sure how to pass feature_probdist to it. I would appreciate your help.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sriram</title>
		<link>http://streamhacker.com/2010/06/16/text-classification-sentiment-analysis-eliminate-low-information-features/comment-page-1/#comment-874</link>
		<dc:creator>Sriram</dc:creator>
		<pubDate>Mon, 22 Aug 2011 11:40:00 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=1246#comment-874</guid>
		<description>Thanks a lot, Jacob. I already had separate classifiers, but I did not know how to use prob_classify() to calculate the positive and negative scores. It works now. Thanks once again.</description>
		<content:encoded><![CDATA[<p>Thanks a lot, Jacob. I already had separate classifiers, but I did not know how to use prob_classify() to calculate the positive and negative scores. It works now. Thanks once again.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jacob Perkins</title>
		<link>http://streamhacker.com/2010/06/16/text-classification-sentiment-analysis-eliminate-low-information-features/comment-page-1/#comment-873</link>
		<dc:creator>Jacob Perkins</dc:creator>
		<pubDate>Sat, 20 Aug 2011 15:52:00 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=1246#comment-873</guid>
		<description>You&#039;d need to train separate classifiers, one for each method, then instead of using the classify() function, call prob_classify() to get a ProbDist, which you can use to get the probability of each label.</description>
		<content:encoded><![CDATA[<p>You&#8217;d need to train separate classifiers, one for each method, then instead of using the classify() function, call prob_classify() to get a ProbDist, which you can use to get the probability of each label.</p>
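<p>One way this could look, as a hedged sketch rather than code from the article: ask each trained classifier for its distribution and keep the answer from whichever one is most sure of its top label. It assumes the distribution object offers <code>max()</code> and <code>prob(label)</code>. The names <code>most_confident_label</code>, <code>DictDist</code>, and <code>StubClassifier</code>, along with the probabilities used, are hypothetical and exist only to make the example runnable.</p>

```python
def most_confident_label(named_classifiers, featureset):
    """Return (name, label, probability) from the classifier that is
    most confident in its own top label for this featureset."""
    best = None
    for name, clf in named_classifiers.items():
        dist = clf.prob_classify(featureset)
        label = dist.max()          # most probable label for this classifier
        prob = dist.prob(label)     # how confident it is in that label
        if best is None or prob > best[2]:
            best = (name, label, prob)
    return best


class DictDist:
    """Hypothetical stand-in for a probability distribution object."""
    def __init__(self, probs):
        self._probs = probs

    def max(self):
        return max(self._probs, key=self._probs.get)

    def prob(self, label):
        return self._probs.get(label, 0.0)


class StubClassifier:
    """Hypothetical classifier that always returns the same distribution."""
    def __init__(self, probs):
        self._probs = probs

    def prob_classify(self, featureset):
        return DictDist(self._probs)


# Made-up numbers, one stub per feature-extraction method.
classifiers = {
    "all_words": StubClassifier({"pos": 0.55, "neg": 0.45}),
    "best_words": StubClassifier({"pos": 0.20, "neg": 0.80}),
    "best_bigrams": StubClassifier({"pos": 0.70, "neg": 0.30}),
}
choice = most_confident_label(classifiers, {"great": True})
print(choice)
```

<p>Probabilities from different models are not necessarily calibrated against each other, so treat the comparison as a rough tie-breaker.</p>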
]]></content:encoded>
	</item>
	<item>
		<title>By: Sriram</title>
		<link>http://streamhacker.com/2010/06/16/text-classification-sentiment-analysis-eliminate-low-information-features/comment-page-1/#comment-872</link>
		<dc:creator>Sriram</dc:creator>
		<pubDate>Sat, 20 Aug 2011 14:40:00 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=1246#comment-872</guid>
		<description>Hi,
   Thanks a lot for all the information. For a given test string, for example &quot;This is a great movie&quot;, how do I measure the confidence level of the classification from each of the above three methods (best word features, bigrams, etc.)? That way I can determine which output to take when I get different classification results from each method. Thanks once again for all the knowledge sharing.</description>
		<content:encoded><![CDATA[<p>Hi,<br />
   Thanks a lot for all the information. For a given test string, for example &#8220;This is a great movie&#8221;, how do I measure the confidence level of the classification from each of the above three methods (best word features, bigrams, etc.)? That way I can determine which output to take when I get different classification results from each method. Thanks once again for all the knowledge sharing.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jacob Perkins</title>
		<link>http://streamhacker.com/2010/06/16/text-classification-sentiment-analysis-eliminate-low-information-features/comment-page-1/#comment-811</link>
		<dc:creator>Jacob Perkins</dc:creator>
		<pubDate>Mon, 14 Mar 2011 16:04:00 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=1246#comment-811</guid>
		<description>Thanks for that paper reference, looks interesting.

Perhaps it would be more correct to only use the training set for info gain stats, but I see those stats as a kind of global filter that allows you to determine significant features beforehand (as you can often do in non-text classification). The exact numbers aren&#039;t used directly, but the relative values are used to determine which words are more significant, and that significance should theoretically be similar in the training set, testing set, and both combined. And that assumption appears to be correct based on your tests.</description>
		<content:encoded><![CDATA[<p>Thanks for that paper reference, looks interesting.</p>
<p>Perhaps it would be more correct to only use the training set for info gain stats, but I see those stats as a kind of global filter that allows you to determine significant features beforehand (as you can often do in non-text classification). The exact numbers aren&#8217;t used directly, but the relative values are used to determine which words are more significant, and that significance should theoretically be similar in the training set, testing set, and both combined. And that assumption appears to be correct based on your tests.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Henrik Nordvik</title>
		<link>http://streamhacker.com/2010/06/16/text-classification-sentiment-analysis-eliminate-low-information-features/comment-page-1/#comment-809</link>
		<dc:creator>Henrik Nordvik</dc:creator>
		<pubDate>Mon, 14 Mar 2011 15:29:00 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=1246#comment-809</guid>
		<description>When calculating information gain, you&#039;re using the whole data set, including the test set. Aren&#039;t you supposed to use only data from the training set when training?

I tested using only the training set to gather statistics, but it didn&#039;t seem to affect the results much, just a tad lower accuracy.

Also, check out &quot;Delta TFIDF: An Improved Feature Space for Sentiment Analysis&quot; for a similar method.</description>
		<content:encoded><![CDATA[<p>When calculating information gain, you&#8217;re using the whole data set, including the test set. Aren&#8217;t you supposed to use only data from the training set when training?</p>
<p>I tested using only the training set to gather statistics, but it didn&#8217;t seem to affect the results much, just a tad lower accuracy.</p>
<p>Also, check out &#8220;Delta TFIDF: An Improved Feature Space for Sentiment Analysis&#8221; for a similar method.</p>
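<p>A minimal sketch of gathering the statistics from the training split only, so that no word seen only in the test reviews can influence feature selection. A plain chi-square score stands in for the scoring statistic here, which is an assumption, and <code>chi_square</code>, <code>best_words_from_training</code>, and <code>split_then_select</code> are hypothetical names. The 3/4 split is only a default parameter.</p>

```python
from collections import Counter


def chi_square(n_ii, n_ix, n_xi, n_xx):
    """Chi-square for a 2x2 table built from: word count within the label,
    word count overall, total words in the label, total words overall."""
    n_io = n_ix - n_ii                 # this word, other labels
    n_oi = n_xi - n_ii                 # other words, this label
    n_oo = n_xx - n_ix - n_xi + n_ii   # other words, other labels
    denom = n_ix * n_xi * (n_xx - n_ix) * (n_xx - n_xi)
    if denom == 0:
        return 0.0
    return n_xx * (n_ii * n_oo - n_io * n_oi) ** 2 / denom


def best_words_from_training(train_docs, top_n):
    """train_docs is a list of (words, label) pairs.
    Returns the set of top_n words by summed per-label chi-square."""
    word_counts = Counter()
    label_word_counts = {}
    for words, label in train_docs:
        word_counts.update(words)
        label_word_counts.setdefault(label, Counter()).update(words)
    total = sum(word_counts.values())
    label_totals = {lab: sum(c.values()) for lab, c in label_word_counts.items()}
    scores = {}
    for word, count in word_counts.items():
        scores[word] = sum(
            chi_square(by_label[word], count, label_totals[lab], total)
            for lab, by_label in label_word_counts.items()
        )
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return set(ranked[:top_n])


def split_then_select(docs, top_n, train_fraction=0.75):
    """Split first, then select features from the training part only."""
    cutoff = int(len(docs) * train_fraction)
    train, test = docs[:cutoff], docs[cutoff:]
    return train, test, best_words_from_training(train, top_n)


docs = [
    (["great", "movie"], "pos"),
    (["great", "film"], "pos"),
    (["awful", "movie"], "neg"),
    (["unseen", "awful"], "neg"),   # lands in the test split
]
train, test, best = split_then_select(docs, top_n=2)
print(sorted(best))
```

<p>With real data the reviews would need to be shuffled or interleaved by label before splitting, otherwise the training part may contain only one class.</p>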
]]></content:encoded>
	</item>
</channel>
</rss>

