<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Search Marketing &#187; Andy Webb &#8211; Epiphany Solutions Digital Marketing Blog Author</title>
	<atom:link href="http://www.epiphanysolutions.co.uk/blog/author/andy-webb/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.epiphanysolutions.co.uk/blog</link>
	<description>Just another WordPress weblog</description>
	<lastBuildDate>Tue, 07 Feb 2012 09:21:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Measuring Keyword Density with Ruby</title>
		<link>http://www.epiphanysolutions.co.uk/blog/measuring-keyword-density-with-ruby/</link>
		<comments>http://www.epiphanysolutions.co.uk/blog/measuring-keyword-density-with-ruby/#comments</comments>
		<pubDate>Tue, 06 Dec 2011 15:00:46 +0000</pubDate>
		<dc:creator>Andy Webb</dc:creator>
				<category><![CDATA[Useful Scripts and Utilities]]></category>

		<guid isPermaLink="false">http://www.epiphanysolutions.co.uk/blog/?p=10235</guid>
		<description><![CDATA[So, keyword density... a slightly dated metric but an interesting challenge nonetheless. Here is my response to Drew's first challenge: for this one we had to take a random URL and parse its text content to check for keyword density. Additional parameters to make the tool a little more useful were a ‘minimum [...]]]></description>
			<content:encoded><![CDATA[<p>So, keyword density... a slightly dated metric but an interesting challenge nonetheless. Here is my response to Drew's first challenge: for this one we had to take a random URL and parse its text content to check for keyword density. <span id="more-10235"></span></p>
<p>Additional parameters to make the tool a little more useful were a ‘minimum word length’ and a ‘minimum word occurrences’ parameter. These helped separate the chaff from the real density results. As usual these scripts are a bit of fun and a proof of concept rather than a robust and usable utility, so be ready to tweak it to meet your needs!</p>
<p>I ran the solution using Ruby 1.9.2 but the library includes are minimal so you should be able to get it going on most earlier versions.</p>
<p>Enjoy!</p>
<pre class="brush: ruby;">
require 'open-uri'

URL_TO_PARSE = ARGV[0]
MINIMUM_OCCURANCES = ARGV[1].to_i
MINIMUM_WORD_LENGTH = ARGV[2].to_i

# list of stopwords
STOP_WORDS = [&quot;a&quot;,&quot;able&quot;,&quot;about&quot;,&quot;above&quot;,&quot;abroad&quot;,&quot;according&quot;] # etc

# read the page in
puts &quot;\n\nOpening #{URL_TO_PARSE}...&quot;
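# (URI.open needs Ruby 2.5 or later; on older versions such as 1.9.2 use open(URL_TO_PARSE).read)
page = URI.open(URL_TO_PARSE).read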

# First remove the script tags...
puts &quot;Removing script content...&quot;
page.gsub!(/&lt;script.*?&gt;[\s\S]*?&lt;\/script&gt;/i, &quot;&quot;)

# then remove the markup...
puts &quot;Removing mark-up...&quot;
page.gsub!(/&lt;\/?[^&gt;]*&gt;/, &quot; &quot;)

# trim the whitespace off the start and end of the lines...
puts &quot;Tidying text...&quot;
page.gsub!(/^[ \t]+|[ \t]+$/, &quot; &quot;)

# and the excess newlines...
page.gsub!(/\n{2,}/,&quot;\n&quot;)

# then pull it all onto one line...
page.gsub!(&quot;\n&quot;,&quot; &quot;)

# now, strip out all punctuation (inside [] a | is a literal pipe, so it only needs listing once)...
puts &quot;Removing punctuation...&quot;
page.gsub!(/[.,@!?\-'()|]/, '')

# remove the excess spaces...
puts &quot;Further tidying...&quot;
page.gsub!(/ {2,}/,&quot; &quot;)

# drop everything to the same case
page.downcase!

# now split the string using spaces...
page_words = page.split(&quot; &quot;)

# how many words do we have?
puts &quot;Found #{page_words.length} words, removing all words less than #{MINIMUM_WORD_LENGTH} characters in length.&quot;

# remove entries which have fewer letters than our parameter
page_words.delete_if{|word| word.length &lt; MINIMUM_WORD_LENGTH}

# how many did we end up with?
puts &quot;Ended up with #{page_words.length} words.&quot;

# remove the stop words from the list
puts &quot;Removing stop words from list...&quot;
cleaned_page_words = page_words - STOP_WORDS
puts &quot;Ended up with #{cleaned_page_words.length} words.&quot;

# create a hash to hold the words and number of occurrences, with each count starting at 0
words_and_occurances = Hash.new(0)

# go through all the cleaned words adding up the number of occurrences
cleaned_page_words.each{|word| words_and_occurances[word] += 1 }

# sort the list by number of occurrences
puts &quot;Sorting by frequency...&quot;
sorted_words_and_occurances = words_and_occurances.sort_by{|k,v| -v }

# output the end result...
puts &quot;Filtering words that occur less than #{MINIMUM_OCCURANCES} times.\n\n&quot;
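puts &quot;Here are your most frequently occurring words...\n\n&quot;
sorted_words_and_occurances.each do |word,occurances|
	# &gt;= so that words appearing exactly MINIMUM_OCCURANCES times are kept
	if occurances &gt;= MINIMUM_OCCURANCES then
		puts &quot;#{occurances} occurrences of #{word}&quot;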
	end
end
</pre>
<p>And when we run the script on our homepage, looking for words that appear at least 3 times and have a minimum length of 4 letters&#8230;</p>
<pre class="brush: plain;">
C:\path\to\script&gt;ruby keyword_density.rb http://www.epiphanysolutions.co.uk 3 4

Opening http://www.epiphanysolutions.co.uk...
Removing script content...
Removing mark-up...
Tidying text...
Removing punctuation...
Further tidying...
Found 421 words, removing all words less than 4 characters in length.
Ended up with 289 words.
Removing stop words from list...
Ended up with 240 words.
Sorting by frequency...
Filtering words that occur less than 3 times.

Here are your most frequently occurring words...

14 occurrences of search
5 occurrences of google
5 occurrences of conversion
5 occurrences of media
5 occurrences of agency
5 occurrences of social
4 occurrences of marketing
4 occurrences of optimisation
3 occurrences of partner
3 occurrences of december
3 occurrences of london
3 occurrences of certified
3 occurrences of development
3 occurrences of leeds
3 occurrences of 2011
3 occurrences of paid
3 occurrences of organic
3 occurrences of dont
3 occurrences of analytics
3 occurrences of contact
3 occurrences of clients
</pre>
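<p>The script above reports raw counts. To turn those into a density figure, each count can be divided by the total number of words in the text. The following is only a rough sketch of that idea: the helper name <code>keyword_density</code> and its keyword arguments are made up for illustration (keyword arguments need Ruby 2.0 or later), and it assumes density is measured against all words found, before the length and stop word filters are applied.</p>

```ruby
# Hypothetical helper (not part of the script above): returns rows of
# [word, count, percentage of all words], most frequent first.
def keyword_density(text, min_length: 4, min_occurrences: 3, stop_words: [])
  # lower-case the text and pull out runs of letters, digits and apostrophes
  all_words = text.downcase.scan(/[a-z0-9']+/)
  return [] if all_words.empty?

  # count only the words that pass the length and stop word filters
  counts = Hash.new(0)
  all_words.each do |word|
    next if word.length < min_length || stop_words.include?(word)
    counts[word] += 1
  end

  # keep words that occur often enough, sort by count (then alphabetically),
  # and work out each word's share of the total word count
  counts.select { |_word, n| n >= min_occurrences }
        .sort_by { |word, n| [-n, word] }
        .map { |word, n| [word, n, (100.0 * n / all_words.length).round(1)] }
end

sample = 'search engines love search marketing and search agencies love marketing'
rows = keyword_density(sample, min_length: 4, min_occurrences: 2, stop_words: ['love'])
rows.each do |word, n, pct|
  puts format('%-10s %3d  %5.1f%%', word, n, pct)
end
```

<p>Measuring against the unfiltered word total keeps the percentages comparable between runs with different filter settings. Dividing by the filtered total instead would be an equally reasonable choice, it just answers a slightly different question.</p>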



]]></content:encoded>
			<wfw:commentRss>http://www.epiphanysolutions.co.uk/blog/measuring-keyword-density-with-ruby/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Throwing Down the Gauntlet</title>
		<link>http://www.epiphanysolutions.co.uk/blog/throwing-down-the-gauntlet/</link>
		<comments>http://www.epiphanysolutions.co.uk/blog/throwing-down-the-gauntlet/#comments</comments>
		<pubDate>Tue, 06 Sep 2011 08:43:52 +0000</pubDate>
		<dc:creator>Andy Webb</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[Web Development]]></category>

		<guid isPermaLink="false">http://www.epiphanysolutions.co.uk/blog/?p=8481</guid>
		<description><![CDATA[So we&#8217;ve been having a bit of banter around the Web Development table about some of the features of the latest programming frameworks and which ones have the best libraries for SEO/HTML tools. We&#8217;ve decided to put our money where our mouth is and go head to head on a number of quick and easy [...]]]></description>
			<content:encoded><![CDATA[<p>So we&#8217;ve been having a bit of banter around the Web Development table about some of the features of the latest programming frameworks and which ones have the best libraries for SEO/HTML tools.</p>
<p>We&#8217;ve decided to put our money where our mouth is and go head to head on a number of quick and easy parsing tools which automate a lot of the reporting and repetitive tasks of your everyday SEO analyst. We&#8217;re comparing C# on Microsoft&#8217;s .NET framework and its extensive range of web libraries with the more popular *nix based languages such as Python, PHP and Ruby. <span id="more-8481"></span></p>
<p>Each month we will be presenting a number of simple utility scripts which should be easy to bundle, install and run but, more importantly, easy to modify for tying into databases, attaching email alerts and so on. Who knows, we may even venture into threading, making the tools seriously powerful.</p>
<p>Keep an eye on the blog over the next few months to pick up some nice little titbits of functionality to help automate your daily chores, whatever your platform!</p>
<p>Contenders will initially be myself, Andy Webb, technical architect and Drew Brigham, senior .NET developer here at Epiphany. No doubt we&#8217;ll pick up a few more contenders along the way and if you feel you want to contribute a solution, feel free to get involved!</p>
<p>Keep an eye open this month for the first challenge.</p>
<p><a href="http://www.epiphanysolutions.co.uk/blog/wp-content/uploads/2011/09/frameworks.png" rel="lightbox[8481]"><img src="http://www.epiphanysolutions.co.uk/blog/wp-content/uploads/2011/09/frameworks.png" alt="" title="frameworks" width="600" height="131" class="alignnone size-full wp-image-8485" /></a></p>



]]></content:encoded>
			<wfw:commentRss>http://www.epiphanysolutions.co.uk/blog/throwing-down-the-gauntlet/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>HTML Parsing with Ruby and Nokogiri</title>
		<link>http://www.epiphanysolutions.co.uk/blog/html-parsing-with-ruby-and-nokogiri/</link>
		<comments>http://www.epiphanysolutions.co.uk/blog/html-parsing-with-ruby-and-nokogiri/#comments</comments>
		<pubDate>Tue, 30 Aug 2011 09:23:36 +0000</pubDate>
		<dc:creator>Andy Webb</dc:creator>
				<category><![CDATA[Link Building]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[Web Development]]></category>

		<guid isPermaLink="false">http://www.epiphanysolutions.co.uk/blog/?p=8401</guid>
		<description><![CDATA[I&#8217;ve been working on a few health check/utility scripts and since I&#8217;ve been swaying towards Ruby as my language of choice recently (I&#8217;ve put Python on the shelf briefly), I decided to check out Nokogiri. Stupid name I know, but Nokogiri is a pretty powerful XML/HTML parsing library for Ruby. It was written in the [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been working on a few health check/utility scripts and since I&#8217;ve been swaying towards Ruby as my language of choice recently (I&#8217;ve put Python on the shelf briefly), I decided to check out Nokogiri. Stupid name I know, but Nokogiri is a pretty powerful XML/HTML parsing library for Ruby. It was written in the C programming language (which means it&#8217;s quick) and supports XPath and jQuery-style CSS selectors, meaning no more complicated regular expressions! As with all the high level interpreted languages out there nowadays, Ruby/Nokogiri doesn&#8217;t fail to deliver excellent results within a few short lines of code. <span id="more-8401"></span></p>
<p>Let&#8217;s take a scenario where we have (generously) been given a couple of links from a few prolific sites in our industry. To make sure we stay in favour with these sites, we run a weekly check to make sure they&#8217;re still happy to link to us. Now weekly checks aren&#8217;t that much of an effort to carry out manually but it&#8217;s best to automate this check for a number of reasons:</p>
<p>1.) Who wants to be setting reminders and reading through web pages manually?</p>
<p>2.) If it&#8217;s not in the usual place for whatever reason, we&#8217;ve got to trawl source code to see where they moved it to.</p>
<p>3.) As small a point as this is, if we needed to check more frequently, we don&#8217;t really want to be triggering their analytics tracking, just in case they&#8217;re funny about robots giving a false impression of traffic. So step in Ruby and Nokogiri.</p>
<p>We want to:</p>
<p>1.) Visit the pages in question.<br />
2.) Parse the HTML.<br />
3.) Pull out all the links.<br />
4.) Find ours and read the anchor text.<br />
5.) Make sure it&#8217;s there and they haven&#8217;t no-followed it!<br />
6.) Report back the status of the links.</p>
<p><strong>Step in Ruby/Nokogiri:</strong></p>
<pre class="brush: ruby;">
require 'rubygems'
require 'open-uri'
require 'nokogiri'

our_url = &quot;http://www.epiphanysolutions.co.uk&quot;
urls = [&quot;http://www.27bslash6.com&quot;,&quot;http://www.ford.com&quot;,&quot;http://www.seomoz.org/blog/improving-reporting-efficiency-and-relevance&quot;,&quot;http://csshook.com&quot;]
today = Time.now.localtime.strftime(&quot;%d-%m-%Y&quot;)

# cycle through the URLs in the list
urls.each do |url|
	begin
		# read the page and parse it into Nokogiri
		# (URI.open needs Ruby 2.5 or later; on older versions use open(url))
		page = Nokogiri::HTML(URI.open(url))
		# select any anchors that link to our domain. (in 1 line! Good old XPath.)
		links_to_us = page.xpath(&quot;//a[substring(@href, 1, #{our_url.size}) = '#{our_url}']&quot;)
		if (links_to_us.size &gt; 0) then
			links_to_us.each do |anchor|
				# tell the user about the link we found
				printf &quot;On %s we found a link to %s from %s with anchor text \&quot;%s\&quot;&quot;, today, anchor.attr('href'), url, anchor.text()
				# and tell them whether or not it's nofollowed (rel can hold several space-separated values)..
				puts anchor.attr('rel').to_s.split.include?(&quot;nofollow&quot;) ? &quot; but it's no followed!&quot; : &quot;. All good on that link.&quot;
			end
		else
			printf &quot;On %s we couldn't find a link on %s to %s\n&quot;, today, url, our_url
		end
	rescue StandardError =&gt; e
		# ... if we can't open or parse this page for whatever reason, tell us why and carry on with the next one
		puts &quot;Problem checking #{url}: #{e.message}&quot;
	end
end
</pre>
<p><strong>And what do we get?</strong></p>
<pre class="brush: plain;">
On 30-08-2011 we found a link to http://www.epiphanysolutions.co.uk/ from http://www.27bslash6.com with anchor text &quot;SEO - Epiphany Solutions&quot;. All good on that link.

On 30-08-2011 we couldn't find a link on http://www.ford.com to http://www.epiphanysolutions.co.uk

On 30-08-2011 we found a link to http://www.epiphanysolutions.co.uk from http://www.seomoz.org/blog/improving-reporting-efficiency-and-relevance with anchor text &quot;Epiphany Solutions&quot; but it's no followed!

On 30-08-2011 we found a link to http://www.epiphanysolutions.co.uk/ from http://csshook.com with anchor text &quot;SEO&quot;. All good on that link.
</pre>
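<p>One thing to watch: the XPath above matches on an exact URL prefix, so a link written with https or without the www would be missed, and a lookalike host that merely starts with the same text would be counted. A rough sketch of a looser check using only Ruby&#8217;s standard uri library is below. The helper names <code>links_to_host?</code> and <code>nofollowed?</code> are made up for illustration and are not part of Nokogiri.</p>

```ruby
require 'uri'

# Hypothetical helper: true when href points at the given host,
# ignoring the scheme (http/https) and a leading "www.".
def links_to_host?(href, host)
  link_host = URI.parse(href.to_s.strip).host
  # relative links, mailto: links and the like have no host
  return false if link_host.nil?

  normalise = ->(h) { h.downcase.sub(/\Awww\./, '') }
  normalise.call(link_host) == normalise.call(host)
rescue URI::InvalidURIError
  # anything that is not a parseable URI is treated as "not a link to us"
  false
end

# Hypothetical helper: rel can hold several space-separated values,
# e.g. rel="external nofollow", so look at each token.
def nofollowed?(rel)
  rel.to_s.downcase.split.include?('nofollow')
end

puts links_to_host?('https://epiphanysolutions.co.uk/blog', 'www.epiphanysolutions.co.uk')
puts nofollowed?('external nofollow')
```

<p>In the script these could be applied to each anchor&#8217;s <code>href</code> and <code>rel</code> attributes after selecting every anchor on the page, at the cost of doing the filtering in Ruby rather than in a single XPath expression.</p>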
<p>I&#8217;m sure you can see the potential of this tool if you start threading its functionality into databases, logs and alert mailing systems, and the XPath AND jQuery style selectors give it an extra layer of accessibility and flexibility. Say what you want about these slightly more obscure open source programming languages, but they get the job done quickly and reliably! They have also come up trumps for me many a time as the right tool for the job. They come with an array of libraries to do most day to day tasks and, with them being open source, people even compete with each other building libraries that do the same job &#8211; meaning even more simplicity and power!</p>
<p>What are your thoughts?</p>



]]></content:encoded>
			<wfw:commentRss>http://www.epiphanysolutions.co.uk/blog/html-parsing-with-ruby-and-nokogiri/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Programmatic Web Browsing and Healthchecking with Mechanize</title>
		<link>http://www.epiphanysolutions.co.uk/blog/programmatic-web-browsing-and-healthchecking-with-mechanize/</link>
		<comments>http://www.epiphanysolutions.co.uk/blog/programmatic-web-browsing-and-healthchecking-with-mechanize/#comments</comments>
		<pubDate>Tue, 09 Nov 2010 09:15:11 +0000</pubDate>
		<dc:creator>Andy Webb</dc:creator>
				<category><![CDATA[Tools]]></category>
		<category><![CDATA[Web Development]]></category>

		<guid isPermaLink="false">http://www.epiphanysolutions.co.uk/blog/?p=3360</guid>
		<description><![CDATA[A quick little golden nugget for all you automated site scrapers out there, check out Mechanize! Mechanize is a completely programmable browser implemented (in this instance) purely in Python. This tool is priceless for automated site health checks for everything from link presence, robots.txt rules, automated form submissions, cookie handling, you name it. Whereas mostly [...]]]></description>
			<content:encoded><![CDATA[<p>A quick little golden nugget for all you automated site scrapers out there: check out Mechanize!</p>
<p>Mechanize is a completely programmable browser implemented (in this instance) purely in Python. This tool is priceless for automated site health checks covering everything from link presence and robots.txt rules to automated form submissions and cookie handling, you name it. <span id="more-3360"></span></p>
<p>Where we would normally parse a page as text, look for links, then write our own logic to follow each link and pull out information about the page, Mechanize lets you do all this at a much higher level. Here is just one neat trick:</p>
<p><code>import mechanize<br />
# Make a browser.<br />
browser = mechanize.Browser()<br />
# Open a page.<br />
browser.open("http://www.epiphanysolutions.co.uk/")<br />
# Follow the second link for the keyword 'analytics' (nr counts from 0).<br />
myResponse = browser.follow_link(text_regex=r"analytics", nr=1)<br />
# Print the title of the page.<br />
assert browser.viewing_html()<br />
print(browser.title())</code></p>
<p>&#8230;returns &#8220;Google Analytics IQ Consultants &#8211; Google Analytics Consulting Services&#8221;, the title of the page linked to by the second &#8216;analytics&#8217; link.</p>
<p>The browser also has a wealth of other features such as robots.txt handling, proxy handling and redirect detection. This makes short work of automated sitemap building, checking for the presence and sanity of acquired links and ensuring our pages aren&#8217;t blocked by robots.txt. With a little extra work it can also submit forms, handle cookies and deal with various types of errors and exceptions, which makes it invaluable for throwing onto a regular scheduled task for the daily weeding out of quirks and problems that manual surfing may not immediately uncover.</p>
<p>I&#8217;m aware I&#8217;m in a bit of a niche here but if you write programs to programmatically interact with websites, give it a whirl! You&#8217;ll not be disappointed.</p>



]]></content:encoded>
			<wfw:commentRss>http://www.epiphanysolutions.co.uk/blog/programmatic-web-browsing-and-healthchecking-with-mechanize/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Rise and Fall of the MP3?</title>
		<link>http://www.epiphanysolutions.co.uk/blog/a-musical-challenge-for-apple/</link>
		<comments>http://www.epiphanysolutions.co.uk/blog/a-musical-challenge-for-apple/#comments</comments>
		<pubDate>Tue, 10 Aug 2010 14:18:49 +0000</pubDate>
		<dc:creator>Andy Webb</dc:creator>
				<category><![CDATA[Digital Innovation]]></category>

		<guid isPermaLink="false">http://www.epiphanysolutions.co.uk/blog/?p=2674</guid>
		<description><![CDATA[I&#8217;ve always been what could be considered a bit of an Apple &#8216;fanboi&#8217;. I&#8217;ve not really had much of a taste for their consumer gadgets (iPods, iPhones, iPads, etc.), but I&#8217;ll admit I really like their laptops, mice and monitors. Even more than their hardware, I love their software – it’s intuitive, easy to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.epiphanysolutions.co.uk/blog/wp-content/uploads/2010/08/spotify.jpg" rel="lightbox[2674]"><img class="alignright size-full wp-image-2675" title="spotify" src="http://www.epiphanysolutions.co.uk/blog/wp-content/uploads/2010/08/spotify.jpg" alt="" width="76" height="73" /></a>  I&#8217;ve always been what could be considered a bit of an Apple &#8216;fanboi&#8217;. I&#8217;ve not really had much of a taste for their consumer gadgets (iPods, iPhones, iPads, etc.), but I&#8217;ll admit I really like their laptops, mice and monitors. Even more than their hardware, I love their software – it’s intuitive, easy to use, robust and very often bleeding edge.<span id="more-2674"></span></p>
<p>This made me think about my brand loyalty. Hardcore Mac users very rarely migrate to PCs, usually for the above reasons. I can&#8217;t think of any Mac product or program that I&#8217;ve abandoned for a better alternative on another platform, except iTunes. This software went from being horrible, clunky and overcomplicated in the days of Winamp to pretty much the most popular music program available today. (I may have made that up.) The iTunes boom also added to the already rife office one-upmanship of &#8220;Oh, I have over 84 gigs of music on my computer&#8221;. &#8220;Really? I have 92 gigs of indie alone on mine.&#8221;</p>
<p>Those days have been cut short in our office by the introduction and popularisation of Spotify. I&#8217;m still not 100% sure how Spotify manages its flawless sound quality and uninterrupted streaming of the most eclectic mix of music you&#8217;d ever want (for free), but Apple are sure taking a long time to react to it. One can only assume the cost of royalties is covered by the advertisements played on Spotify, but can you see Apple&#8217;s response running similar advertising campaigns? Knowing their rigorous brand and image protection, I can imagine third-party advertising would somehow cheapen their product, which, with Apple, is the biggest of the big no-nos.</p>
<p>How are Apple going to quash Spotify&#8217;s growing popularity without going down a similar route and making free music economically viable? How are they going to continue earning so many billions from the humble compressed audio format (not just iTunes Music Store sales but iPod sales too)? As more and more record companies publish their catalogues to Spotify, it’s going to be interesting to see how Apple continues to make the format pay!</p>



]]></content:encoded>
			<wfw:commentRss>http://www.epiphanysolutions.co.uk/blog/a-musical-challenge-for-apple/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Cold Hard Cash to use our Search Engine!</title>
		<link>http://www.epiphanysolutions.co.uk/blog/cold-hard-cash-to-use-our-search-engine/</link>
		<comments>http://www.epiphanysolutions.co.uk/blog/cold-hard-cash-to-use-our-search-engine/#comments</comments>
		<pubDate>Thu, 22 May 2008 09:32:08 +0000</pubDate>
		<dc:creator>Andy Webb</dc:creator>
				<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://www.epiphanysolutions.co.uk/blog/?p=147</guid>
		<description><![CDATA[Good old Microsoft, always the bridesmaid and never the bride in the field of innovation, has taken extraordinary measures to pull some business from the online advertising giant Google. With its recent bid to take over Yahoo ending in failure, the usual tactic of embrace, extend and extinguish, which Microsoft has used with much success in [...]]]></description>
			<content:encoded><![CDATA[<p>Good old Microsoft, always the bridesmaid and never the bride in the field of innovation, has taken extraordinary measures to pull some business from the online advertising giant Google. With its recent bid to take over Yahoo ending in failure, the usual tactic of embrace, extend and extinguish, which Microsoft has used with much success in previous years, is not paying off this time round.</p>
<p><span id="more-147"></span></p>
<p>This has caused the OS giant to resort to offering cash discounts to US shoppers who make purchases after arriving at a site through a Live Search advert. The initiative offers partial refunds on purchases through a new &#8220;cashback&#8221; system, where money is deposited directly into customers&#8217; bank accounts or, optionally, their PayPal accounts.</p>
<p>Big hitters who have agreed to adopt the cashback scheme include book retailer Barnes &amp; Noble, home improvement and construction products retailer Home Depot and online clearance warehouse Overstock.com, amongst others.</p>
<p>Speaking from the annual digital ad conference in Redmond, Bill Gates said &#8220;2008 is the year that search got competitive,&#8221; and promised &#8220;big announcements around search to happen every six months&#8221; from the Microsoft camp.</p>



]]></content:encoded>
			<wfw:commentRss>http://www.epiphanysolutions.co.uk/blog/cold-hard-cash-to-use-our-search-engine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

