<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Execnet vs Disco for Distributed NLTK</title>
	<atom:link href="http://streamhacker.com/2009/12/14/execnet-disco-distributed-nltk/feed/" rel="self" type="application/rss+xml" />
	<link>http://streamhacker.com/2009/12/14/execnet-disco-distributed-nltk/#utm_source=feed&#038;utm_medium=feed&#038;utm_campaign=feed</link>
	<description>Weotta be Hacking</description>
	<lastBuildDate>Thu, 19 Apr 2012 12:53:00 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com" />
	<atom:link rel="hub" href="http://superfeedr.com/hubbub" />
		<item>
		<title>By: Hadoop MapReduce</title>
		<link>http://streamhacker.com/2009/12/14/execnet-disco-distributed-nltk/comment-page-1/#comment-951</link>
		<dc:creator>Hadoop MapReduce</dc:creator>
		<pubDate>Thu, 19 Apr 2012 12:53:00 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=765#comment-951</guid>
		<description>Nice Post dude.. Keep it up.</description>
		<content:encoded><![CDATA[<p>Nice Post dude.. Keep it up.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jacob Perkins</title>
		<link>http://streamhacker.com/2009/12/14/execnet-disco-distributed-nltk/comment-page-1/#comment-641</link>
		<dc:creator>Jacob Perkins</dc:creator>
		<pubDate>Thu, 09 Sep 2010 02:46:00 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=765#comment-641</guid>
		<description>Thanks for the comment. I&#039;m not really a fan of visual programming tools, but I have been meaning to check out Stackless. It could be a good combination with execnet, where execnet handles the distribution and Stackless handles processing on each node.</description>
		<content:encoded><![CDATA[<p>Thanks for the comment. I&#8217;m not really a fan of visual programming tools, but I have been meaning to check out Stackless. It could be a good combination with execnet, where execnet handles the distribution and Stackless handles processing on each node.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Eric Gaumer</title>
		<link>http://streamhacker.com/2009/12/14/execnet-disco-distributed-nltk/comment-page-1/#comment-640</link>
		<dc:creator>Eric Gaumer</dc:creator>
		<pubDate>Tue, 07 Sep 2010 02:28:00 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=765#comment-640</guid>
		<description>Great blog for NLTK users. Lots of helpful insight. We do a lot of NLP work as a precursor to indexing content for search. It&#039;s pretty typical for us to process tens of millions of documents, so we&#039;ve designed a highly scalable pipeline framework that allows you to build document processing clusters.

In terms of the problem you&#039;ve described here, we&#039;re using Stackless Python to model pipeline stages (referred to as Components). Since tasklets are true coroutines, they&#039;re instantiated once and run forever, making them ideal for loading data models and then using those models across huge volumes of data.

Check it out; it might fit your needs at some point.

http://www.pypes.org/

A simple example of writing a component:

http://bitbucket.org/diji/pypes/wiki/Reverse_Field

We also provide a visual designer that lets you graphically design data flow graphs from the components you write.

http://bitbucket.org/diji/pypes/wiki/Screenshots</description>
		<content:encoded><![CDATA[<p>Great blog for NLTK users. Lots of helpful insight. We do a lot of NLP work as a precursor to indexing content for search. It&#8217;s pretty typical for us to process tens of millions of documents, so we&#8217;ve designed a highly scalable pipeline framework that allows you to build document processing clusters.</p>
<p>In terms of the problem you&#8217;ve described here, we&#8217;re using Stackless Python to model pipeline stages (referred to as Components). Since tasklets are true coroutines, they&#8217;re instantiated once and run forever, making them ideal for loading data models and then using those models across huge volumes of data.</p>
<p>Check it out; it might fit your needs at some point.</p>
<p><a href="http://www.pypes.org/" rel="nofollow">http://www.pypes.org/</a></p>
<p>A simple example of writing a component:</p>
<p><a href="http://bitbucket.org/diji/pypes/wiki/Reverse_Field" rel="nofollow">http://bitbucket.org/diji/pypes/wiki/Reverse_Field</a></p>
<p>We also provide a visual designer that lets you graphically design data flow graphs from the components you write.</p>
<p><a href="http://bitbucket.org/diji/pypes/wiki/Screenshots" rel="nofollow">http://bitbucket.org/diji/pypes/wiki/Screenshots</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jacob</title>
		<link>http://streamhacker.com/2009/12/14/execnet-disco-distributed-nltk/comment-page-1/#comment-299</link>
		<dc:creator>Jacob</dc:creator>
		<pubDate>Sat, 19 Dec 2009 20:16:58 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=765#comment-299</guid>
		<description>That&#039;s what I thought at first too, but it turns out the Params object (and any objects attached to it) is unpickled before every map call. And when unpickling takes 2 seconds, that&#039;s prohibitively expensive.</description>
		<content:encoded><![CDATA[<p>That&#8217;s what I thought at first too, but it turns out the Params object (and any objects attached to it) is unpickled before every map call. And when unpickling takes 2 seconds, that&#8217;s prohibitively expensive.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Justin</title>
		<link>http://streamhacker.com/2009/12/14/execnet-disco-distributed-nltk/comment-page-1/#comment-298</link>
		<dc:creator>Justin</dc:creator>
		<pubDate>Sat, 19 Dec 2009 19:45:03 +0000</pubDate>
		<guid isPermaLink="false">http://streamhacker.com/?p=765#comment-298</guid>
		<description>Can&#039;t you just use this to cache the object?

http://discoproject.org/doc/faq.html#how-to-maintain-state-across-many-map-reduce-calls</description>
		<content:encoded><![CDATA[<p>Can&#8217;t you just use this to cache the object?</p>
<p><a href="http://discoproject.org/doc/faq.html#how-to-maintain-state-across-many-map-reduce-calls" rel="nofollow">http://discoproject.org/doc/faq.html#how-to-maintain-state-across-many-map-reduce-calls</a></p>
]]></content:encoded>
	</item>
</channel>
</rss>

