<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments for StreamHacker</title>
	<atom:link href="http://streamhacker.com/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://streamhacker.com</link>
	<description>Weotta be Hacking</description>
	<lastBuildDate>Wed, 08 Feb 2012 15:44:00 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<atom:link rel="hub" href="http://pubsubhubbub.appspot.com" />
	<atom:link rel="hub" href="http://superfeedr.com/hubbub" />
		<item>
		<title>Comment on Text Classification for Sentiment Analysis &#8211; Stopwords and Collocations by Jacob Perkins</title>
		<link>http://streamhacker.com/2010/05/24/text-classification-sentiment-analysis-stopwords-collocations/comment-page-1/#comment-944</link>
		<dc:creator>Jacob Perkins</dc:creator>
		<pubDate>Wed, 08 Feb 2012 15:44:00 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=1227#comment-944</guid>
		<description>words is not explicitly defined above, but it&#039;s a function parameter that is expected to be a list of strings. featx is also a function parameter, but it&#039;s expected to be a function that accepts words and returns a dict. This way, you can pass different featx functions to evaluate_classifier to see the different results.</description>
		<content:encoded><![CDATA[<p>words is not explicitly defined above, but it&#8217;s a function parameter that is expected to be a list of strings. featx is also a function parameter, but it&#8217;s expected to be a function that accepts words and returns a dict. This way, you can pass different featx functions to evaluate_classifier to see the different results.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Text Classification for Sentiment Analysis &#8211; Stopwords and Collocations by Fredrik</title>
		<link>http://streamhacker.com/2010/05/24/text-classification-sentiment-analysis-stopwords-collocations/comment-page-1/#comment-943</link>
		<dc:creator>Fredrik</dc:creator>
		<pubDate>Wed, 08 Feb 2012 14:37:00 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=1227#comment-943</guid>
		<description>I am quite new to Python, and some parts of the code seem more or less like magic to me... I understand that functions are just ordinary objects/values in Python, and I guess that this is the trick, but can you explain, or suggest a good link explaining, how the following parts of the code work? The name word_feats seems to be bound to the function word_feats, but what is words bound to? I guess it is bound to featx through the function evaluate_classifier, but I really don&#039;t get how featx is assigned a value in
negfeats = [(featx(movie_reviews.words(fileids=[f])), &#039;neg&#039;) for f in negids] (to me it looks like featx is a function here, but I guess it is not?). I guess that I should do some basic reading about Python, but any clarification would be helpful.</description>
		<content:encoded><![CDATA[<p>I am quite new to Python, and some parts of the code seem more or less like magic to me&#8230; I understand that functions are just ordinary objects/values in Python, and I guess that this is the trick, but can you explain, or suggest a good link explaining, how the following parts of the code work? The name word_feats seems to be bound to the function word_feats, but what is words bound to? I guess it is bound to featx through the function evaluate_classifier, but I really don&#8217;t get how featx is assigned a value in<br />
negfeats = [(featx(movie_reviews.words(fileids=[f])), 'neg') for f in negids] (to me it looks like featx is a function here, but I guess it is not?). I guess that I should do some basic reading about Python, but any clarification would be helpful.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Text Classification for Sentiment Analysis &#8211; Naive Bayes Classifier by &#187; A Text Analysis of Supreme Court Oral Arguments jarv.org</title>
		<link>http://streamhacker.com/2010/05/10/text-classification-sentiment-analysis-naive-bayes-classifier/comment-page-1/#comment-942</link>
		<dc:creator>&#187; A Text Analysis of Supreme Court Oral Arguments jarv.org</dc:creator>
		<pubDate>Sun, 05 Feb 2012 22:47:34 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=1180#comment-942</guid>
		<description>[...] For more information about sentiment analysis there is some good information here and in these two articles. Applying this to oral arguments? Well let&#8217;s leave it as just one way to look at [...]</description>
		<content:encoded><![CDATA[<p>[...] For more information about sentiment analysis there is some good information here and in these two articles. Applying this to oral arguments? Well let&#8217;s leave it as just one way to look at [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Fuzzy String Matching in Python by Jacob Perkins</title>
		<link>http://streamhacker.com/2011/10/31/fuzzy-string-matching-python/comment-page-1/#comment-940</link>
		<dc:creator>Jacob Perkins</dc:creator>
		<pubDate>Sun, 25 Dec 2011 15:55:00 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=1815#comment-940</guid>
		<description>I&#039;ve never heard of anyone doing that, and I&#039;m not sure how well it would work, because you can&#039;t necessarily break a string down into pieces and match those pieces to match the whole string. For larger strings that might make sense, but the smaller the string, the more exact you should be.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve never heard of anyone doing that, and I&#8217;m not sure how well it would work, because you can&#8217;t necessarily break a string down into pieces and match those pieces to match the whole string. For larger strings that might make sense, but the smaller the string, the more exact you should be.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Fuzzy String Matching in Python by Christopher Stoll</title>
		<link>http://streamhacker.com/2011/10/31/fuzzy-string-matching-python/comment-page-1/#comment-939</link>
		<dc:creator>Christopher Stoll</dc:creator>
		<pubDate>Sun, 25 Dec 2011 05:46:00 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=1815#comment-939</guid>
		<description>You might be able to replace these techniques (normalization, regex, etc.) with a single dynamic programming algorithm, no? I guess it probably depends upon what you are trying to accomplish though.</description>
		<content:encoded><![CDATA[<p>You might be able to replace these techniques (normalization, regex, etc.) with a single dynamic programming algorithm, no? I guess it probably depends upon what you are trying to accomplish though.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Text Classification for Sentiment Analysis &#8211; Stopwords and Collocations by ???? ?????? &#171; ?????</title>
		<link>http://streamhacker.com/2010/05/24/text-classification-sentiment-analysis-stopwords-collocations/comment-page-1/#comment-938</link>
		<dc:creator>???? ?????? &#171; ?????</dc:creator>
		<pubDate>Thu, 01 Dec 2011 16:21:37 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=1227#comment-938</guid>
		<description>[...] 1. NLTK: http://streamhacker.com/2010/05/24/text-classification-sentiment-analysis-stopwords-collocations/ [...]</description>
		<content:encoded><![CDATA[<p>[...] 1. NLTK: http://streamhacker.com/2010/05/24/text-classification-sentiment-analysis-stopwords-collocations/ [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Text Classification for Sentiment Analysis &#8211; Stopwords and Collocations by Schillermika</title>
		<link>http://streamhacker.com/2010/05/24/text-classification-sentiment-analysis-stopwords-collocations/comment-page-1/#comment-937</link>
		<dc:creator>Schillermika</dc:creator>
		<pubDate>Mon, 21 Nov 2011 05:54:00 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=1227#comment-937</guid>
		<description>thanks...def need to polish up my python skills</description>
		<content:encoded><![CDATA[<p>thanks&#8230;def need to polish up my python skills</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Text Classification for Sentiment Analysis &#8211; Stopwords and Collocations by Jacob Perkins</title>
		<link>http://streamhacker.com/2010/05/24/text-classification-sentiment-analysis-stopwords-collocations/comment-page-1/#comment-936</link>
		<dc:creator>Jacob Perkins</dc:creator>
		<pubDate>Mon, 21 Nov 2011 04:52:00 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=1227#comment-936</guid>
		<description>featuresets = [(bag_of_words([word.lower() for word in sentence]), label) for (sentence, label) in raw_dataset]

or

raw_dataset = [([word.lower() for word in sentence], &quot;physics&quot;) for sentence in physics.sents()]</description>
		<content:encoded><![CDATA[<p>featuresets = [(bag_of_words([word.lower() for word in sentence]), label) for (sentence, label) in raw_dataset]</p>
<p>or</p>
<p>raw_dataset = [([word.lower() for word in sentence], "physics") for sentence in physics.sents()]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Text Classification for Sentiment Analysis &#8211; Stopwords and Collocations by Schillermika</title>
		<link>http://streamhacker.com/2010/05/24/text-classification-sentiment-analysis-stopwords-collocations/comment-page-1/#comment-935</link>
		<dc:creator>Schillermika</dc:creator>
		<pubDate>Mon, 21 Nov 2011 04:39:00 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=1227#comment-935</guid>
		<description>Here&#039;s the problem I&#039;m having. I&#039;ll use a test corpus I play around with to demonstrate. So, first, the corpus reader object and bag of words function

physics = LazyCorpusLoader(&#039;cookbook&#039;, PlaintextCorpusReader, [&#039;physics.txt&#039;])
def bag_of_words(sentence): return dict([(word, True) for word in sentence])


Then I label the training data

raw_dataset = [(sentence, &quot;physics&quot;) for sentence in physics.sents()]

I would have preferred that raw_dataset be this instead:

raw_dataset2 = [(word.lower(), &quot;physics&quot;) for word in physics.words()]

But the problem is that if I use raw_dataset2 to create my featuresets to train the classifier like this:

featuresets = [(bag_of_words(word), label) for (word, label) in raw_dataset2]

Then I get this:

[({&#039;h&#039;: True, &#039;e&#039;: True, &#039;T&#039;: True}, &#039;physics&#039;), ({&#039;a&#039;: True, &#039;c&#039;: True, &#039;i&#039;: True, &#039;h&#039;: True, &#039;l&#039;: True, &#039;p&#039;: True, &#039;s&#039;: True, &#039;y&#039;: True}, &#039;physics&#039;)

Not what I want. But with plain old raw_dataset:

raw_dataset = [(sentence, &quot;physics&quot;) for sentence in physics.sents()]

featuresets = [(bag_of_words(sentence), label) for (sentence, label) in raw_dataset]

It returns whole words as I want:

({&#039;and&#039;: True, &#039;distances&#039;: True, &#039;scales&#039;: True, &#039;subatomic&#039;: True, &#039;over&#039;: True, &#039;challenges&#039;: True, &#039;meters&#039;: True}, &#039;physics&#039;)

So my dilemma is that I&#039;m stuck with physics.sents() so that bag_of_words returns whole words rather than letters. But I can&#039;t lowercase sentences, so a list comprehension like [word.lower() for word in physics.sents()] is not an option. And that&#039;s why I put word.lower() in the bag_of_words() function. I&#039;m having trouble seeing where I can apply word.lower(). I tried converting raw_dataset to a string so I could lowercase the words and then convert back to a list, but I should have known it&#039;s inane. Any insights?

thnx</description>
		<content:encoded><![CDATA[<p>Here&#8217;s the problem I&#8217;m having. I&#8217;ll use a test corpus I play around with to demonstrate. So, first, the corpus reader object and bag of words function</p>
<p>physics = LazyCorpusLoader('cookbook', PlaintextCorpusReader, ['physics.txt'])<br />
def bag_of_words(sentence): return dict([(word, True) for word in sentence])</p>
<p>Then I label the training data</p>
<p>raw_dataset = [(sentence, "physics") for sentence in physics.sents()]</p>
<p>I would have preferred that raw_dataset be this instead:</p>
<p>raw_dataset2 = [(word.lower(), "physics") for word in physics.words()]</p>
<p>But the problem is that if I use raw_dataset2 to create my featuresets to train the classifier like this:</p>
<p>featuresets = [(bag_of_words(word), label) for (word, label) in raw_dataset2]</p>
<p>Then I get this:</p>
<p>[({'h': True, 'e': True, 'T': True}, 'physics'), ({'a': True, 'c': True, 'i': True, 'h': True, 'l': True, 'p': True, 's': True, 'y': True}, 'physics')</p>
<p>Not what I want. But with plain old raw_dataset:</p>
<p>raw_dataset = [(sentence, "physics") for sentence in physics.sents()]</p>
<p>featuresets = [(bag_of_words(sentence), label) for (sentence, label) in raw_dataset]</p>
<p>It returns whole words as I want:</p>
<p>({'and': True, 'distances': True, 'scales': True, 'subatomic': True, 'over': True, 'challenges': True, 'meters': True}, 'physics')</p>
<p>So my dilemma is that I&#8217;m stuck with physics.sents() so that bag_of_words returns whole words rather than letters. But I can&#8217;t lowercase sentences, so a list comprehension like [word.lower() for word in physics.sents()] is not an option. And that&#8217;s why I put word.lower() in the bag_of_words() function. I&#8217;m having trouble seeing where I can apply word.lower(). I tried converting raw_dataset to a string so I could lowercase the words and then convert back to a list, but I should have known it&#8217;s inane. Any insights?</p>
<p>thnx</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Text Classification for Sentiment Analysis &#8211; Stopwords and Collocations by Jacob Perkins</title>
		<link>http://streamhacker.com/2010/05/24/text-classification-sentiment-analysis-stopwords-collocations/comment-page-1/#comment-934</link>
		<dc:creator>Jacob Perkins</dc:creator>
		<pubDate>Mon, 21 Nov 2011 01:10:00 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=1227#comment-934</guid>
		<description>You should remove word.lower() from bag_of_words(), and instead lowercase everything yourself. The best way to do this would be to lowercase every word in the sentence first, before finding bigrams or calling bag_of_words(). This is a simple list comprehension, like sentence = [word.lower() for word in sentence]</description>
		<content:encoded><![CDATA[<p>You should remove word.lower() from bag_of_words(), and instead lowercase everything yourself. The best way to do this would be to lowercase every word in the sentence first, before finding bigrams or calling bag_of_words(). This is a simple list comprehension, like sentence = [word.lower() for word in sentence]</p>
]]></content:encoded>
	</item>
</channel>
</rss>

