<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>James' Geek Blog</title>
	<atom:link href="http://djeems.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://djeems.wordpress.com</link>
	<description>Who needs a tagline when the title says it all.</description>
	<lastBuildDate>Mon, 30 May 2011 07:29:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='djeems.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>James' Geek Blog</title>
		<link>http://djeems.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://djeems.wordpress.com/osd.xml" title="James&#039; Geek Blog" />
	<atom:link rel='hub' href='http://djeems.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Extracting a relevant sentence</title>
		<link>http://djeems.wordpress.com/2007/02/13/extracting-a-relevant-sentence/</link>
		<comments>http://djeems.wordpress.com/2007/02/13/extracting-a-relevant-sentence/#comments</comments>
		<pubDate>Tue, 13 Feb 2007 11:59:28 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[Web Development]]></category>

		<guid isPermaLink="false">http://djeems.wordpress.com/2007/02/13/extracting-a-relevant-sentence/</guid>
		<description><![CDATA[Say you&#8217;ve built a PHP / MySQL FULL-TEXT search functionality, and want to display a relevant sentence for each result, in a way like Google does. I&#8217;ve written a function that does just that, and thought I&#8217;d share it here. The function takes a multiple sentence text string as input, and a search string. The [...]]]></description>
			<content:encoded><![CDATA[<p>Say you&#8217;ve built a PHP/MySQL FULLTEXT search feature and want to display a relevant sentence for each result, the way Google does. I&#8217;ve written a function that does just that, and thought I&#8217;d share it here.</p>
<p>The function takes a multi-sentence text string and a search string as input. The text is broken up into sentences of a specified minimum character length (to avoid sentences that consist merely of &#8216;Mr.&#8217;, &#8216;Hi!&#8217;, or &#8216;What?&#8217; ;) ). The first sentence that matches the search string is returned. If no match is found, the first sentence of the text is returned instead.</p>
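<p>The downloadable function is written in PHP. To make the approach a bit more concrete, here is a minimal sketch of the same idea in Python. It is not the original code: the name <code>extract_sentence</code>, the default minimum length of 20 characters, and the choice to glue short fragments onto the text that follows are all assumptions made for illustration.</p>

```python
import re

def extract_sentence(text, query, min_length=20):
    """Return the first sufficiently long sentence that contains the query."""
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())

    # Glue fragments shorter than min_length onto the text that follows,
    # so that 'Mr.' or 'Hi!' never becomes a sentence on its own.
    sentences, buffer = [], ""
    for part in parts:
        buffer = f"{buffer} {part}".strip()
        if len(buffer) >= min_length:
            sentences.append(buffer)
            buffer = ""
    if buffer:
        sentences.append(buffer)

    # Case-insensitive substring match; fall back to the first sentence.
    needle = query.lower()
    for sentence in sentences:
        if needle in sentence.lower():
            return sentence
    return sentences[0] if sentences else ""
```

<p>Under these assumptions, a query that matches nothing simply yields the opening sentence, and an empty text yields an empty string.</p>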
<p>Although it&#8217;s not perfect, this function generally does the job quite nicely. Download <a href='http://djeems.files.wordpress.com/2007/02/extract_sentencephp.txt' title='extract_sentencephp.txt'>extract_sentencephp.txt</a> here.</p>
<p>For more information on building your own search functionality, have a look at this <a href="http://www.onlamp.com/pub/a/onlamp/2003/06/26/fulltext.html">MySQL FULLTEXT Searching tutorial</a>, and of course at <a href="http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html">the official documentation</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://djeems.wordpress.com/2007/02/13/extracting-a-relevant-sentence/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/078149ea5ec5e517d175e290e3b9d54d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">djeems</media:title>
		</media:content>
	</item>
		<item>
		<title>Lightbox</title>
		<link>http://djeems.wordpress.com/2007/01/20/lightbox/</link>
		<comments>http://djeems.wordpress.com/2007/01/20/lightbox/#comments</comments>
		<pubDate>Sat, 20 Jan 2007 17:20:51 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Web 2.0]]></category>
		<category><![CDATA[Web Design]]></category>

		<guid isPermaLink="false">http://djeems.wordpress.com/2007/01/20/lightbox/</guid>
		<description><![CDATA[Lightbox is a very useful JavaScript implementation that allows you to overlay images on the current page in Web 2.0 style. Implementing it is easy: you simply include three JavaScript files and one CSS file in the &#60;head&#62;&#60;/head&#62; section of the pages you want to use it on. After that, add a rel="lightbox" attribute to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.huddletogether.com/projects/lightbox2/">Lightbox</a> is a very useful JavaScript implementation that allows you to overlay images on the current page in Web 2.0 style. Implementing it is easy: you simply include three JavaScript files and one CSS file in the <code>&lt;head&gt;&lt;/head&gt;</code> section of the pages you want to use it on. After that, add a <code>rel="lightbox"</code> attribute to the links you want to use Lightbox on, and you&#8217;re done!</p>
<p>Initially I was worried about the combined size of these included files, which is around 70 kB. But if you deliver your JavaScript and CSS files gzipped to the requesting browsers, the total file size is only 17 kB. I find that number quite acceptable, especially considering <a href="http://www.fuzzytravel.com/about.php#screenshots">the cool effects you get in return</a>.</p>
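<p>If you want to check what gzip would do for your own files, a rough sketch in Python could look like this. The helper name <code>gzip_savings</code> and the sample data are made up for illustration; real script files will compress less dramatically than this repetitive stand-in.</p>

```python
import gzip

def gzip_savings(data):
    """Return (raw size, gzipped size) in bytes for a bytes object."""
    return len(data), len(gzip.compress(data))

# Stand-in for a real script file: repetitive text compresses very well.
sample = b"function show(el) { el.style.display = 'block'; }\n" * 500
raw, packed = gzip_savings(sample)
print(f"{raw} bytes raw, {packed} bytes gzipped")
```

<p>To measure a real file, read it in binary mode and pass its contents to the same function.</p>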
]]></content:encoded>
			<wfw:commentRss>http://djeems.wordpress.com/2007/01/20/lightbox/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/078149ea5ec5e517d175e290e3b9d54d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">djeems</media:title>
		</media:content>
	</item>
		<item>
		<title>Geotagging the blogosphere</title>
		<link>http://djeems.wordpress.com/2007/01/14/geotagging-the-blogosphere/</link>
		<comments>http://djeems.wordpress.com/2007/01/14/geotagging-the-blogosphere/#comments</comments>
		<pubDate>Sun, 14 Jan 2007 22:20:01 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Information Retrieval]]></category>

		<guid isPermaLink="false">http://djeems.wordpress.com/2007/01/14/geotagging-the-blogosphere/</guid>
		<description><![CDATA[If you were to identify and pinpoint blog posts&#8217; location data (i.e. mentioned locations in blog posts), a process known as geotagging or geocoding, these visualizations are the result. In the first and fourth picture, the color represents the popularity of a location (red is highest frequency). In the second and third picture, the size [...]]]></description>
			<content:encoded><![CDATA[<p>If you identify and pinpoint the locations mentioned in blog posts, a process known as <a href="http://en.wikipedia.org/wiki/GeoTagging">geotagging</a> or <a href="http://en.wikipedia.org/wiki/Geocoding">geocoding</a>, you get visualizations like the ones below. In the first and fourth pictures, the color represents the popularity of a location (red is the highest frequency). In the second and third pictures, the size of the square represents the popularity of a location.</p>
<p>The original data set consisted of about 800 randomly chosen blogs with around 80,000 posts. Geotagging was done with <a href="http://sws.clearforest.com/">ClearForest</a>, <a href="http://maps.google.com/">Google Maps</a> and <a href="http://www.world-gazetteer.com/">World Gazetteer</a>.</p>
<p><a href='http://djeems.files.wordpress.com/2007/01/world.gif' title='world.gif'><img src='http://djeems.files.wordpress.com/2007/01/world.gif?w=450' alt='world.gif' /></a></p>
<p><a href='http://djeems.files.wordpress.com/2007/01/europe-2.gif' title='europe-2.gif'><img src='http://djeems.files.wordpress.com/2007/01/europe-2.gif?w=450' alt='europe-2.gif' /></a></p>
<p><a href='http://djeems.files.wordpress.com/2007/01/united-states-2.gif' title='united-states-2.gif'><img src='http://djeems.files.wordpress.com/2007/01/united-states-2.gif?w=450' alt='united-states-2.gif' /></a></p>
<p><a href='http://djeems.files.wordpress.com/2007/01/new-york.gif' title='new-york.gif'><img src='http://djeems.files.wordpress.com/2007/01/new-york.gif?w=450' alt='new-york.gif' /></a></p>
<p>Unfortunately, the meaning of this location data is relatively arbitrary.</p>
]]></content:encoded>
			<wfw:commentRss>http://djeems.wordpress.com/2007/01/14/geotagging-the-blogosphere/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/078149ea5ec5e517d175e290e3b9d54d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">djeems</media:title>
		</media:content>

		<media:content url="http://djeems.files.wordpress.com/2007/01/world.gif" medium="image">
			<media:title type="html">world.gif</media:title>
		</media:content>

		<media:content url="http://djeems.files.wordpress.com/2007/01/europe-2.gif" medium="image">
			<media:title type="html">europe-2.gif</media:title>
		</media:content>

		<media:content url="http://djeems.files.wordpress.com/2007/01/united-states-2.gif" medium="image">
			<media:title type="html">united-states-2.gif</media:title>
		</media:content>

		<media:content url="http://djeems.files.wordpress.com/2007/01/new-york.gif" medium="image">
			<media:title type="html">new-york.gif</media:title>
		</media:content>
	</item>
		<item>
		<title>RSS Mime type</title>
		<link>http://djeems.wordpress.com/2007/01/06/rss-mime-type/</link>
		<comments>http://djeems.wordpress.com/2007/01/06/rss-mime-type/#comments</comments>
		<pubDate>Fri, 05 Jan 2007 23:23:03 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Web Development]]></category>

		<guid isPermaLink="false">http://djeems.wordpress.com/2007/01/06/rss-mime-type/</guid>
		<description><![CDATA[What&#8217;s the most correct MIME type for RSS feeds? For many developers this matter is a bit of a headache, as there are many differences with regard to application and browser support for application/rss+xml. For that reason, there seems to be a general preference for text/xml, as it&#8217;s the safest bet. So after reading Dave [...]]]></description>
			<content:encoded><![CDATA[<p>What&#8217;s the most correct <a href="http://en.wikipedia.org/wiki/Mime_type">MIME type</a> for RSS feeds? For many developers this matter is a bit of a headache, as there are many differences with regard to application and browser support for <code>application/rss+xml</code>. For that reason, there seems to be a general preference for <code>text/xml</code>, as it&#8217;s the safest bet. So after reading <a href="http://blogs.law.harvard.edu/crimson1/2004/05/06#a1519">Dave Winer&#8217;s advice on the matter</a>, I decided to use the <code>text/xml</code> MIME type for my website&#8217;s RSS feeds as well.</p>
<p>My view on the most correct MIME type has changed recently though. The cause is that, for whatever reason, Google started ranking my website&#8217;s RSS feeds higher than the actual pages, which is obviously not desirable, neither for the visitor nor for me.</p>
<p>My initial response was to block Googlebot from indexing any of the RSS feeds via the robots.txt file. A bit later, realizing that this &#8216;solution&#8217; would also quite effectively erase all feeds from Google Blogsearch, I decided to switch to the <code>application/rss+xml</code> MIME type after all.</p>
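<p>How you set the MIME type depends on your server setup. As one illustration, here is a small sketch using Python&#8217;s standard <code>http.server</code> module. The names <code>feed_headers</code> and <code>FeedHandler</code> and the tiny placeholder feed are my own inventions for this example, not part of any existing setup.</p>

```python
from http.server import BaseHTTPRequestHandler

FEED_MIME_TYPE = "application/rss+xml"

def feed_headers(body, charset="utf-8"):
    """Build the response headers for an RSS feed body (bytes)."""
    return [
        ("Content-Type", f"{FEED_MIME_TYPE}; charset={charset}"),
        ("Content-Length", str(len(body))),
    ]

class FeedHandler(BaseHTTPRequestHandler):
    # Placeholder feed; a real handler would generate or read the feed.
    feed = (b'<?xml version="1.0" encoding="UTF-8"?>'
            b'<rss version="2.0"><channel></channel></rss>')

    def do_GET(self):
        self.send_response(200)
        for name, value in feed_headers(self.feed):
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(self.feed)
```

<p>To try it locally, pass <code>FeedHandler</code> to <code>http.server.HTTPServer</code> and call <code>serve_forever()</code>.</p>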
<p>Luckily, the <a href="http://www.rssboard.org/news/53/vote-board-supports-rss-mime-type">RSS Advisory Board agrees</a> with my action.</p>
]]></content:encoded>
			<wfw:commentRss>http://djeems.wordpress.com/2007/01/06/rss-mime-type/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/078149ea5ec5e517d175e290e3b9d54d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">djeems</media:title>
		</media:content>
	</item>
		<item>
		<title>Crazy Egg</title>
		<link>http://djeems.wordpress.com/2007/01/05/crazy-egg/</link>
		<comments>http://djeems.wordpress.com/2007/01/05/crazy-egg/#comments</comments>
		<pubDate>Fri, 05 Jan 2007 16:14:32 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Web Design]]></category>

		<guid isPermaLink="false">http://djeems.wordpress.com/2007/01/05/crazy-egg/</guid>
		<description><![CDATA[Crazy Egg is a wonderful free tool to visualize your visitors&#8217; clicking behavior on your website, and in my opinion a must for every web designer.]]></description>
			<content:encoded><![CDATA[<p><a href="http://crazyegg.com/">Crazy Egg</a> is a wonderful free tool to visualize your visitors&#8217; clicking behavior on your website, and in my opinion a must for every web designer.</p>
<p><a href='http://djeems.files.wordpress.com/2007/01/crazyegg-fuzzytravel.jpg' title='crazyegg-fuzzytravel.jpg'><img src='http://djeems.files.wordpress.com/2007/01/crazyegg-fuzzytravel.jpg?w=450' alt='crazyegg-fuzzytravel.jpg' /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://djeems.wordpress.com/2007/01/05/crazy-egg/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/078149ea5ec5e517d175e290e3b9d54d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">djeems</media:title>
		</media:content>

		<media:content url="http://djeems.files.wordpress.com/2007/01/crazyegg-fuzzytravel.jpg" medium="image">
			<media:title type="html">crazyegg-fuzzytravel.jpg</media:title>
		</media:content>
	</item>
		<item>
		<title>Digitizing my CD collection</title>
		<link>http://djeems.wordpress.com/2007/01/05/digitizing-my-cd-collection/</link>
		<comments>http://djeems.wordpress.com/2007/01/05/digitizing-my-cd-collection/#comments</comments>
		<pubDate>Fri, 05 Jan 2007 13:29:37 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://djeems.wordpress.com/2007/01/05/digitizing-my-cd-collection/</guid>
		<description><![CDATA[With the recent acquisition of an external 250 GB hard disk, I&#8217;ve finally decided to take the plunge and digitize my entire audio CD collection. In this blog post I just wanted to share how I go about doing that. The tool I use for converting audio to digital is GoldWave (together with LAME). They [...]]]></description>
			<content:encoded><![CDATA[<p>With the recent acquisition of an external 250 GB hard disk, I&#8217;ve finally decided to take the plunge and digitize my entire audio CD collection. In this blog post I just wanted to share how I go about doing that.</p>
<p>The tool I use for ripping audio CDs is <a href="http://www.goldwave.com/">GoldWave</a> (together with <a href="http://www.mp3dev.org/">LAME</a>). GoldWave comes as a fully functional, free &#8216;try-out&#8217; version that includes access to the wonderful <a href="http://www.goldwave.com/features.php#CDReader">CD Reader tool</a>, which can convert CD audio tracks to a file format of choice.</p>
<p>I convert my CD tracks to <a href="http://en.wikipedia.org/wiki/Mp3">MP3</a> files (I think the choice for this format is quite obvious). In the settings I select a sampling rate of 44,100 Hz and a bitrate of 160 kbps, in stereo. These settings produce MP3 files of good quality that do not become too large.</p>
<p>One thing I really love about the CD Reader tool is the &#8216;Get Titles&#8217; button. When clicking this, GoldWave will contact <a href="http://www.freedb.org/">freedb.org</a> and retrieve the correct album name and song titles, which &#8212; needless to say &#8212; saves you a lot of typing.</p>
<p>Going about it this way, converting any CD collection to MP3 is a relatively easy and quick process. An entire CD will be digitized in about 15-20 minutes and ends up at ~60 MB in file size.</p>
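<p>The ~60 MB figure follows directly from the bitrate. A quick back-of-the-envelope check in Python, assuming a constant bitrate and an album of about 50 minutes (the album length is my assumption; the function name <code>mp3_size_mb</code> is made up):</p>

```python
def mp3_size_mb(minutes, bitrate_kbps=160):
    """Approximate size of a constant-bitrate MP3 in megabytes (10^6 bytes)."""
    bits = bitrate_kbps * 1000 * minutes * 60  # kilobits/s -> total bits
    return bits / 8 / 1_000_000                # bits -> bytes -> megabytes

print(mp3_size_mb(1))   # 1.2
print(mp3_size_mb(50))  # 60.0
```

<p>So at 160 kbps every minute of audio costs about 1.2 MB, ignoring tags and other small overhead.</p>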
]]></content:encoded>
			<wfw:commentRss>http://djeems.wordpress.com/2007/01/05/digitizing-my-cd-collection/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/078149ea5ec5e517d175e290e3b9d54d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">djeems</media:title>
		</media:content>
	</item>
		<item>
		<title>Fetching Wikipedia content</title>
		<link>http://djeems.wordpress.com/2006/12/21/fetching-wikipedia-content/</link>
		<comments>http://djeems.wordpress.com/2006/12/21/fetching-wikipedia-content/#comments</comments>
		<pubDate>Thu, 21 Dec 2006 19:40:00 +0000</pubDate>
		<dc:creator>James</dc:creator>
				<category><![CDATA[Information Retrieval]]></category>
		<category><![CDATA[PHP]]></category>

		<guid isPermaLink="false">http://djeems.wordpress.com/2006/12/21/fetching-wikipedia-content/</guid>
		<description><![CDATA[So here was the challenge. For my travel website I wanted to have a script that automatically retrieves snippets of information about destinations, so that they can be displayed as a bit of additional info on the appropriate pages. The source of the information is obvious: Wikipedia. Although Wikitravel may seem a better choice at [...]]]></description>
			<content:encoded><![CDATA[<p>So here was the challenge. For my travel website I wanted a script that automatically retrieves snippets of information about destinations, so that they can be displayed as a bit of additional info on the appropriate pages. The source of the information is obvious: <a href="http://en.wikipedia.org/wiki/Main_Page">Wikipedia</a>. Although <a href="http://wikitravel.org/en/Main_Page">Wikitravel</a> may seem a better choice for this project at first sight, Wikitravel&#8217;s purpose is mostly to provide travel tips and the like; for general information, Wikipedia is certainly the best choice.</p>
<p>Wikipedia text can be used freely (<a href="http://en.wikipedia.org/wiki/Wikipedia:Copyright">under the GFDL license</a>), and Wikipedia also provides several methods for using its information. For power users, there are complete <a href="http://en.wikipedia.org/wiki/Wikipedia:Database_download">database dumps</a> available. For this small project, however, that would be utter overkill. I have neither the space nor the need to download and run scripts on several gigabytes of data merely to get article snippets for about a hundred destinations. ;)</p>
<p>A more targeted approach is therefore advisable in this particular case. Luckily, a tool for fetching individual pages is available as well, namely the <a href="http://en.wikipedia.org/wiki/Special:Export">Special:Export</a> function. This method is always preferred over fetching the actual HTML pages, because of the strain put on Wikipedia&#8217;s servers when they parse wikicode and convert it to HTML.</p>
<p>For my project, the Special:Export function would do nicely. It returns an XML document that contains the article text in <a href="http://en.wikipedia.org/wiki/Wikipedia:Syntax">wikicode</a> inside the <code>&lt;text&gt;</code> element. Automatically identifying the information snippet (i.e. the first paragraph) is an interesting task, as Wikipedia articles in wikicode may contain many elements before the actual text even starts. Some of these are template tags, information tables, images and definitions. So first, all of these elements have to be removed, which requires writing over a dozen regular expressions, some of them quite elaborate. After that, the first paragraph of information sits at the very start of the resulting text.</p>
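<p>My own script is written in PHP, but pulling the wikicode out of an export document can be sketched in a few lines of Python. The sample document below is a hand-written, heavily simplified stand-in for a real export, and the function name <code>extract_wikicode</code> is made up for this example.</p>

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified stand-in for a Special:Export response.
SAMPLE_EXPORT = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Utrecht</title>
    <revision>
      <text>{{Infobox city}}
'''Utrecht''' is a city in the [[Netherlands]].</text>
    </revision>
  </page>
</mediawiki>"""

def extract_wikicode(export_xml):
    """Return the contents of the first <text> element, or None."""
    root = ET.fromstring(export_xml)
    for element in root.iter():
        # Tags look like '{namespace}text'; compare only the local name
        # so the export schema version does not matter.
        if element.tag.rsplit("}", 1)[-1] == "text":
            return element.text or ""
    return None

print(extract_wikicode(SAMPLE_EXPORT))
```

<p>Matching on the local tag name is a deliberate shortcut here, since the namespace in the root element changes between export schema versions.</p>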
<p>The script then finds the first paragraph by locating a block of text of at least 200 characters that&#8217;s followed by two line breaks. So, there we go: the long-sought information snippet has been identified. However, the story doesn&#8217;t end here. Wikipedia&#8217;s article text is still in wikicode, which means that it contains a lot of markup that doesn&#8217;t look very nice on web pages without further parsing. So all of Wikipedia&#8217;s markup has to be either removed or replaced by its equivalents in HTML or BBCode. When that&#8217;s all done, the information snippet can be saved locally and is ready for display on the web page! :D</p>
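<p>To give an impression of these two steps, here is a heavily reduced sketch in Python. It handles only three kinds of markup (non-nested templates, internal links and bold/italic quotes), whereas the real script needs over a dozen expressions. The function names are made up, and reading &#8220;200 characters followed by two line breaks&#8221; as a single line of that length followed by a blank line is my simplification.</p>

```python
import re

def strip_wikicode(text):
    """Remove or unwrap a few common wikicode constructs (far from all)."""
    # Drop {{templates}} that contain no nested braces.
    text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # Unwrap internal links: keep the label if present, else the target.
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    # Drop the quote runs used for ''italic'' and '''bold'''.
    text = re.sub(r"'{2,}", "", text)
    return text.strip()

def first_paragraph(text, min_length=200):
    """Return the first line of at least min_length characters that is
    followed by a blank line (two line breaks), or None."""
    match = re.search(r"([^\n]{%d,})\n\n" % min_length, text)
    return match.group(1) if match else None
```

<p>Calling <code>first_paragraph(strip_wikicode(wikicode))</code> then skips short leftovers at the top of the article and returns the first substantial paragraph, or <code>None</code> if there is none.</p>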
<p><strong>Dealing with redirection and disambiguation pages</strong></p>
<p>The script looks for pages with information on countries, states/provinces and cities. Given a place name as input, it assumes that an article with that name exists and tries to fetch the corresponding URL (e.g. wikipedia.org/wiki/New_York). That only works right away in some cases. Sometimes an article name redirects to another page. On the Wikipedia website the redirect happens immediately, but in the XML export it does not: the only text you will find is &#8220;#REDIRECT [[Page Name]]&#8221;. The script has to recognize this, and then fetch and parse the correct Wikipedia page instead.</p>
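<p>Recognizing and following a redirect might look like this in Python (a sketch, not the post&#8217;s PHP code). The names <code>redirect_target</code> and <code>resolve</code> are hypothetical, <code>fetch</code> stands for any function that returns the wikicode for a title, and the limit of three hops is an arbitrary safety net.</p>

```python
import re
from typing import Callable, Optional, Tuple

# Matches '#REDIRECT [[Page Name]]' and captures the page name,
# leaving out any '#Section' or '|label' part.
REDIRECT_PATTERN = re.compile(r"^\s*#REDIRECT\s*:?\s*\[\[([^\]|#]+)", re.IGNORECASE)


def redirect_target(wikicode: str) -> Optional[str]:
    """Return the page a redirect stub points to, or None for a normal page."""
    match = REDIRECT_PATTERN.match(wikicode)
    return match.group(1).strip() if match else None


def resolve(title: str, fetch: Callable[[str], str], max_hops: int = 3) -> Tuple[str, str]:
    """Follow redirects and return (final title, wikicode of that page)."""
    for _ in range(max_hops):
        wikicode = fetch(title)
        target = redirect_target(wikicode)
        if target is None:
            return title, wikicode
        title = target
    raise RuntimeError("too many redirects for " + title)
```

<p>Passing <code>fetch</code> in as a parameter keeps the redirect logic testable without touching the network.</p>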
<p>Then there are the disambiguation pages. These can be distinguished from article pages because their wikicode contains either {{disambig}} or {{geodis}}, preceded or followed by a list of possible articles. The next question is: which page is the correct one? One could tackle this with semantic analysis, but that&#8217;s quite an information retrieval (IR) challenge.</p>
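<p>Spotting a disambiguation page and reading its list of candidates can be sketched as follows (Python, hypothetical names, and the assumption that each candidate sits on a list line starting with an asterisk).</p>

```python
import re
from typing import List

# The two templates named above, optionally with parameters.
DISAMBIG_PATTERN = re.compile(r"\{\{\s*(?:disambig|geodis)\s*(?:\|[^{}]*)?\}\}", re.IGNORECASE)


def is_disambiguation(wikicode: str) -> bool:
    """True if the page carries a {{disambig}} or {{geodis}} template."""
    return DISAMBIG_PATTERN.search(wikicode) is not None


def listed_articles(wikicode: str) -> List[str]:
    """Return the first link target of every list line on the page."""
    targets = []
    for line in wikicode.splitlines():
        if line.lstrip().startswith("*"):
            match = re.search(r"\[\[([^\]|#]+)", line)
            if match:
                targets.append(match.group(1).strip())
    return targets
```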
<p>Luckily, disambiguation within this project can be handled much more easily. Articles on cities are usually named &#8220;City&#8221;, &#8220;City, Province&#8221; or &#8220;City, Country&#8221;. Since the state and country of each destination are already available in my data set, the script can get past a disambiguation page by fetching and analyzing these alternative article names.</p>
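<p>A minimal Python sketch of that fallback is shown below. The function names are hypothetical, <code>fetch</code> is assumed to return the wikicode for a title or <code>None</code> when the page does not exist, and the order in which the names are tried is an assumption too.</p>

```python
from typing import Callable, List, Optional


def candidate_titles(city: str, province: Optional[str] = None,
                     country: Optional[str] = None) -> List[str]:
    """Build the article names to try: 'City', 'City, Province', 'City, Country'."""
    titles = [city]
    if province:
        titles.append(f"{city}, {province}")
    if country:
        titles.append(f"{city}, {country}")
    return titles


def pick_article(titles: List[str],
                 fetch: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the first title that exists and is not a disambiguation page."""
    for title in titles:
        wikicode = fetch(title)
        if not wikicode:
            continue  # page does not exist
        lowered = wikicode.lower()
        if "{{disambig" in lowered or "{{geodis" in lowered:
            continue  # disambiguation page, try the next name
        return title
    return None
```

<p>For example, <code>candidate_titles("Springfield", "Illinois", "United States")</code> yields the three names to try, in order.</p>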
<p><strong>In conclusion</strong></p>
<p>In this post I&#8217;ve described my method for automatically retrieving a relevant Wikipedia information snippet for destinations (cities, provinces, countries) from a set of names. I&#8217;ve built a generic PHP function that does this by itself; it simply has to be fed location names. The script now runs once a week on my web server as a cron job and fetches the Wikipedia snippets that are displayed on new destination pages of the travel website. It also refreshes the snippets on existing pages, so that they stay in sync with the latest changes to the Wikipedia articles. The success rate of this method is quite satisfactory: I estimate that for about 95% of all locations the information snippet is identified correctly. The script took me a good afternoon to program, but it will keep retrieving relevant information snippets no matter how many new destination pages are added to my travel website.</p>
<p><ins datetime="2007-01-23T20:40:14+00:00">Update. A live example of this PHP function can be found at <a href="http://www.fuzzytravel.com/sandbox/wikithis.php">http://www.fuzzytravel.com/sandbox/wikithis.php</a>.</ins></p>
<p><ins datetime="2007-02-13T12:07:39+00:00">Update 2. I&#8217;ve made the function&#8217;s code available: <a href='http://djeems.files.wordpress.com/2007/02/wikithisphp.txt' title='wikithisphp.txt'>wikithisphp.txt</a>.</ins></p>
]]></content:encoded>
			<wfw:commentRss>http://djeems.wordpress.com/2006/12/21/fetching-wikipedia-content/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/078149ea5ec5e517d175e290e3b9d54d?s=96&#38;d=identicon" medium="image">
			<media:title type="html">djeems</media:title>
		</media:content>
	</item>
	</channel>
</rss>
