<?xml version="1.0" encoding="UTF-8"?><rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
> <channel><title>Comments on: Execnet vs Disco for Distributed NLTK</title> <atom:link href="http://streamhacker.com/2009/12/14/execnet-disco-distributed-nltk/feed/" rel="self" type="application/rss+xml" /><link>http://streamhacker.com/2009/12/14/execnet-disco-distributed-nltk/#utm_source=feed&amp;utm_medium=feed&amp;utm_campaign=feed</link> <description>Weotta be Hacking</description> <lastBuildDate>Fri, 10 Sep 2010 02:31:00 +0000</lastBuildDate> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.0.1</generator> <atom:link rel="hub" href="http://pubsubhubbub.appspot.com" /> <atom:link rel="hub" href="http://superfeedr.com/hubbub" /> <item><title>By: Jacob Perkins</title><link>http://streamhacker.com/2009/12/14/execnet-disco-distributed-nltk/comment-page-1/#comment-641</link> <dc:creator>Jacob Perkins</dc:creator> <pubDate>Thu, 09 Sep 2010 02:46:00 +0000</pubDate> <guid
isPermaLink="false">http://streamhacker.com/?p=765#comment-641</guid> <description>Thanks for the comment. I&#039;m not really a fan of visual programming tools, but I have been meaning to check out Stackless. It could be a good combination with execnet, where execnet handles the distribution and Stackless handles processing on each node.</description> <content:encoded><![CDATA[<p>Thanks for the comment. I&#8217;m not really a fan of visual programming tools, but I have been meaning to check out Stackless. It could be a good combination with execnet, where execnet handles the distribution and Stackless handles processing on each node.</p> ]]></content:encoded> </item> <item><title>By: Eric Gaumer</title><link>http://streamhacker.com/2009/12/14/execnet-disco-distributed-nltk/comment-page-1/#comment-640</link> <dc:creator>Eric Gaumer</dc:creator> <pubDate>Tue, 07 Sep 2010 02:28:00 +0000</pubDate> <guid
isPermaLink="false">http://streamhacker.com/?p=765#comment-640</guid> <description>Great blog for NLTK users. Lots of helpful insight. We do a lot of NLP work as a precursor to indexing content for search. It&#039;s pretty typical for us to process tens of millions of documents, so we&#039;ve designed a highly scalable pipeline framework that allows you to build document processing clusters. In terms of the problem you&#039;ve described here, we&#039;re using Stackless Python to model pipeline stages (referred to as Components). Since tasklets are true co-routines, they&#039;re instantiated once and run forever, making them ideal for loading data models and then using those models across huge volumes of data. Check it out, it might fit your needs at some point. http://www.pypes.org/ A simple example of writing a component: http://bitbucket.org/diji/pypes/wiki/Reverse_Field We also provide a visual designer that allows you to graphically design data flow graphs from the components you write: http://bitbucket.org/diji/pypes/wiki/Screenshots</description> <content:encoded><![CDATA[<p>Great blog for NLTK users. Lots of helpful insight. We do a lot of NLP work as a precursor to indexing content for search. It&#8217;s pretty typical for us to process tens of millions of documents, so we&#8217;ve designed a highly scalable pipeline framework that allows you to build document processing clusters.</p><p>In terms of the problem you&#8217;ve described here, we&#8217;re using Stackless Python to model pipeline stages (referred to as Components). Since tasklets are true co-routines, they&#8217;re instantiated once and run forever, making them ideal for loading data models and then using those models across huge volumes of data.</p><p>Check it out, it might fit your needs at some point.</p><p><a
href="http://www.pypes.org/" rel="nofollow">http://www.pypes.org/</a></p><p>A simple example of writing a component.</p><p><a
href="http://bitbucket.org/diji/pypes/wiki/Reverse_Field" rel="nofollow">http://bitbucket.org/diji/pypes/wiki/Reverse_Field</a></p><p>We also provide a visual designer that allows you to graphically design data flow graphs from the components you write.</p><p><a
href="http://bitbucket.org/diji/pypes/wiki/Screenshots" rel="nofollow">http://bitbucket.org/diji/pypes/wiki/Screenshots</a></p> ]]></content:encoded> </item> <item><title>By: Jacob</title><link>http://streamhacker.com/2009/12/14/execnet-disco-distributed-nltk/comment-page-1/#comment-299</link> <dc:creator>Jacob</dc:creator> <pubDate>Sat, 19 Dec 2009 20:16:58 +0000</pubDate> <guid
isPermaLink="false">http://streamhacker.com/?p=765#comment-299</guid> <description>That&#039;s what I thought at first too, but it turns out the Params object (and any objects attached to it) is unpickled before every map call. And when unpickling takes 2 seconds, it&#039;s prohibitively expensive.</description> <content:encoded><![CDATA[<p>That&#8217;s what I thought at first too, but it turns out the Params object (and any objects attached to it) is unpickled before every map call. And when unpickling takes 2 seconds, it&#8217;s prohibitively expensive.</p> ]]></content:encoded> </item> <item><title>By: Justin</title><link>http://streamhacker.com/2009/12/14/execnet-disco-distributed-nltk/comment-page-1/#comment-298</link> <dc:creator>Justin</dc:creator> <pubDate>Sat, 19 Dec 2009 19:45:03 +0000</pubDate> <guid
isPermaLink="false">http://streamhacker.com/?p=765#comment-298</guid> <description>Can&#039;t you just use this to cache the object? http://discoproject.org/doc/faq.html#how-to-maintain-state-across-many-map-reduce-calls</description> <content:encoded><![CDATA[<p>Can&#8217;t you just use this to cache the object?</p><p><a
href="http://discoproject.org/doc/faq.html#how-to-maintain-state-across-many-map-reduce-calls" rel="nofollow">http://discoproject.org/doc/faq.html#how-to-maintain-state-across-many-map-reduce-calls</a></p> ]]></content:encoded> </item> </channel> </rss>