<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Si Dawson . Com &#187; Algorithms</title>
	<atom:link href="http://sidawson.com/category/algorithms/feed" rel="self" type="application/rss+xml" />
	<link>http://sidawson.com</link>
	<description>Self Improving Software. Evolutionary Algorithms. Weak AI.</description>
	<lastBuildDate>Thu, 09 Feb 2012 09:00:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>The Importance Of Pipes</title>
		<link>http://sidawson.com/2009/03/importance-of-pipes.html</link>
		<comments>http://sidawson.com/2009/03/importance-of-pipes.html#comments</comments>
		<pubDate>Mon, 30 Mar 2009 04:51:00 +0000</pubDate>
		<dc:creator>Si</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Software-Engineering]]></category>

		<guid isPermaLink="false">http://sidawson.com/?p=14</guid>
		<description><![CDATA[There&#8217;s a very subtle, often overlooked thing in Unix, the pipeline, or &#124; character (often just called a pipe). This is perhaps the most important thing in the entire operating system, with the possible exception of the &#8220;everything is a file&#8221; concept. In case you&#8217;re unfamiliar (or didn&#8217;t feel like reading the wiki page above), [...]]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s a very subtle, often overlooked thing in Unix, the <a href="http://en.wikipedia.org/wiki/Pipeline_(Unix)">pipeline</a>, or | character (often just called a pipe).</p>
<p>This is perhaps the most important thing in the entire operating system, with the possible exception of the &#8220;everything is a file&#8221; concept.</p>
<p>In case you&#8217;re unfamiliar (or didn&#8217;t feel like reading the wiki page above), here&#8217;s the basic concept:</p>
<p>A pipe allows you to link the output of one program to the input of another.</p>
<p>eg foo | bar &#8211; this takes the output from foo, and feeds whatever-it-is into bar &#8211; rather than, say, having to point bar at a specific file to make it do anything useful.</p>
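<p>To make the plumbing concrete, here&#8217;s a minimal Python sketch of what a shell roughly does for foo | bar. The two child programs (FOO and BAR) and the function name run_foo_pipe_bar are made up for illustration; only the subprocess calls are standard library.</p>
<blockquote>
<pre style="FONT-SIZE: 12px">
```python
import subprocess
import sys

# Hypothetical stand-ins for "foo" and "bar": two tiny Python child programs.
FOO = "print('hello from foo')"
BAR = "import sys; print(sys.stdin.read().upper(), end='')"


def run_foo_pipe_bar():
    # foo writes to a pipe instead of the terminal...
    foo = subprocess.Popen([sys.executable, "-c", FOO], stdout=subprocess.PIPE)
    # ...and bar reads its standard input from the other end of that pipe.
    bar = subprocess.Popen(
        [sys.executable, "-c", BAR],
        stdin=foo.stdout,
        stdout=subprocess.PIPE,
        text=True,
    )
    foo.stdout.close()  # the parent lets go of its copy so bar sees end-of-file
    output, _ = bar.communicate()
    foo.wait()
    return output


if __name__ == "__main__":
    print(run_foo_pipe_bar(), end="")
```
</pre>
</blockquote>
<p>Neither child knows (or cares) what is on the other end of the pipe, which is the whole point.</p>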
<p>Why are pipes so awesome?</p>
<p>Well, the following reasons:</p>
<ol>
<li>Each program only has to do one thing, &amp; do it well</li>
<li>As such, development of those programs can be split up &#8211; even to the point where a thousand people can independently write a thousand programs, &amp; they&#8217;ll all still be useful</li>
<li>Each of those programs is very simple, thus faster to develop, easier to debug, etc</li>
<li>Extremely complex behaviour can be created by linking different programs together in different ways</li>
<li>None of that higher level behaviour has to be pre-thought or designed for</li>
</ol>
<p>So, Unix has ended up with a ton of small but powerful programs. For example:</p>
<ul>
<li>ls &#8211; lists a directory</li>
<li>cat &#8211; displays stuff</li>
<li>sort &#8211; sorts stuff</li>
<li>grep &#8211; finds things in stuff</li>
<li>tr &#8211; translates stuff (eg, upper to lower case)</li>
<li>wc &#8211; counts words or lines</li>
<li>less &#8211; pauses stuff, allowing forwards &amp; backwards scrolling</li>
</ul>
<p>I&#8217;ve been deliberately vague with the descriptions. Why? Because &#8216;stuff&#8217; can mean a file, if we specify one, or it can mean whatever we pass in by putting a pipe in front of the command.</p>
<p>So here&#8217;s an example. The file we&#8217;ll use is /usr/share/dict/words &#8211; ie, the dictionary.</p>
<blockquote><p>cat /usr/share/dict/words</p>
</blockquote>
<p>displays the dictionary</p>
<blockquote><p>cat /usr/share/dict/words | grep eft</p>
</blockquote>
<p>displays dict, but only shows words with &#8216;eft&#8217; in them</p>
<blockquote><p>cat /usr/share/dict/words | grep eft | sort</p>
</blockquote>
<p>displays &#8216;eft&#8217; words, sorted</p>
<blockquote><p>cat /usr/share/dict/words | grep eft | sort | less</p>
</blockquote>
<p>displays sorted &#8216;eft&#8217; words, but paused so we can see what the hell we&#8217;ve got before it scrolls madly off the screen</p>
<blockquote><p>cat /usr/share/dict/words | grep eft | grep -ve &#39;^[A-Z]&#39; | sort | less</p>
</blockquote>
<p>displays paused sorted &#8216;eft&#8217; words, but removes any that start with capital letters (ie, all the Proper Nouns)</p>
<blockquote><p>cat /usr/share/dict/words | grep eft | grep -ve &#39;^[A-Z]&#39; | wc -l</p>
</blockquote>
<p>gives us the count of how many non-proper-noun &#8216;eft&#8217; words there are in the dictionary (in the huge British English dictionary? 149, since I know you&#8217;re curious)</p>
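<p>The same build-it-up-a-stage-at-a-time idea can be sketched in Python with generators. This is only an analogy, not how the real tools are written: the stage functions (cat, grep, wc_l) are hypothetical stand-ins, and WORDS is a tiny made-up list in place of /usr/share/dict/words.</p>
<blockquote>
<pre style="FONT-SIZE: 12px">
```python
import re

# A tiny made-up word list standing in for /usr/share/dict/words.
WORDS = ["Left", "cleft", "apple", "theft", "Heft", "left", "banana", "deft"]


def cat(lines):
    # cat: pass every line through unchanged
    yield from lines


def grep(pattern, lines, invert=False):
    # grep: keep lines that match (or, like -v, lines that do not)
    regex = re.compile(pattern)
    for line in lines:
        if bool(regex.search(line)) != invert:
            yield line


def wc_l(lines):
    # wc -l: count the lines that reach the end of the pipeline
    return sum(1 for _ in lines)


def sorted_eft_words(words):
    # cat words | grep eft | grep -ve '^[A-Z]' | sort
    return sorted(grep("^[A-Z]", grep("eft", cat(words)), invert=True))


def count_eft_words(words):
    # cat words | grep eft | grep -ve '^[A-Z]' | wc -l
    return wc_l(grep("^[A-Z]", grep("eft", cat(words)), invert=True))


if __name__ == "__main__":
    print(sorted_eft_words(WORDS))
    print(count_eft_words(WORDS))
```
</pre>
</blockquote>
<p>With the made-up list above, four words survive both filters (cleft, deft, left, theft). Each stage only knows about lines coming in and lines going out, so swapping the last stage from sort to a count changes nothing upstream.</p>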
<p>So there&#8217;s an additional benefit which is probably obvious. Debugging a complex set of interactions with pipes is incredibly straightforward. You can simply build up what you think you need, experimenting a little at each stage, &amp; viewing the output. When it looks like what you want, you just remove the output-to-screen, and voila!</p>
<p>For the end-users, this means that the operating cost of using the system in a complex manner is drastically reduced.</p>
<p>What would happen without pipes? You&#8217;d end up with monolithic programs for every imaginable combination of user need. Ie, an unmitigated disaster. You can see elements of this in, umm, &#8216;certain other&#8217; operating systems. *cough*</p>
<p>Most importantly of all, there is a meta benefit. A combination of all of the above benefits.</p>
<p>Pipes enable incredibly complex higher level behaviours to emerge without being designed in. It&#8217;s a spontaneous emergent behaviour of the system. There&#8217;s no onus on the system development programmers to be demi-gods; all they need to do is tackle one simple problem at a time &#8211; display a file, sort a file, and so on. The system as a whole benefits exponentially from every small piece of added functionality, as pipes then enable it to be used in every possible permutation.</p>
<p>It&#8217;s as if an anthill full of differently talented ants was suddenly building space ships.</p>
<p>Perhaps a better bits-vs-atoms metaphor is money. Specifically, the exchange of goods (atoms) for money allows the conversion of those atoms into other atoms, via money. In the same way, pipes allow different programs to seamlessly interact via streamed data, in infinitely variable ways.</p>
<p>You don&#8217;t need to know how to make a car, since you can do what you&#8217;re good at, get paid, &amp; exchange that money for a car. Or a boat. Or a computer. Society as a whole is vastly better off as each person can specialize &amp; everybody benefits. Think how basic our world would be if we only had things that everybody knew how to build or do. Same thing with computers &amp; pipes.</p>
<p>What seems like an almost ridiculously simple concept, pipes, has allowed an unimaginably sophisticated system to emerge from simple, relatively easily built pieces.</p>
<p>It&#8217;s not quite the holy grail of systems design, but it&#8217;s bloody close.</p>
]]></content:encoded>
			<wfw:commentRss>http://sidawson.com/2009/03/importance-of-pipes.html/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>A Nifty Non-Replacing Selection Algorithm</title>
		<link>http://sidawson.com/2008/12/nifty-non-replacing-selection-algorithm.html</link>
		<comments>http://sidawson.com/2008/12/nifty-non-replacing-selection-algorithm.html#comments</comments>
		<pubDate>Tue, 16 Dec 2008 02:40:00 +0000</pubDate>
		<dc:creator>Si</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Software-Engineering]]></category>

		<guid isPermaLink="false">http://sidawson.com/?p=13</guid>
		<description><![CDATA[Algorithms are awesome fun, so I was super pleased when my little bro asked me to help him with a toy problem he had. The description is this: It&#8217;s a secret santa chooser. A group of people, where each person has to be matched up with one other person, but not themselves. He&#8217;s setup an [...]]]></description>
			<content:encoded><![CDATA[<p>Algorithms are awesome fun, so I was super pleased when my little bro asked me to help him with a toy problem he had.</p>
<p>The description is this: It&#8217;s a secret santa chooser. A group of people, where each person has to be matched up with one other person, but not themselves.</p>
<p>He&#8217;s set up an array that has an id for each person.</p>
<p>His initial shot was something like this (pseudo, obviously):</p>
<blockquote>
<pre style="FONT-SIZE: 12px">
foreach $array as $key =&gt; $subarr {
  do {
      // $count is set to count($array)
      $var = rand(0, $count - 1)
      // keep retrying while the pick is ourselves or already taken
  } while $var == $key or $var is already assigned
  $array[$key][$assign] = $var
}
</pre>
</blockquote>
<p>Initially he was mostly concerned that rand would get called a lot of times (it&#8217;s inefficient in the language he&#8217;s using).</p>
<p>However, there&#8217;s a ton of neat (non-obvious) problems with this algorithm:</p>
<ol>
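<li>By the time we&#8217;re trying to match the last person, only one valid pick is left, so we&#8217;ll be calling rand (on average) roughly N times for that person alone</li>
<li>As a result, it&#8217;s inefficient as hell (summed over everyone, the expected number of rand calls is roughly N ln N, rather than N)</li>
<li>There is a small chance that on the last call we&#8217;ll actually lock up and loop forever &#8211; since the only unassigned person left may be the last person themselves, so there&#8217;s no non-dupe to match with</li>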
<li>Not obvious above, but he also considered recreating the array on every iteration of the loop *wince*</li>
</ol>
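<p>To make problems 1 and 3 concrete, here&#8217;s a minimal Python sketch of the retry-until-valid approach. It is not my brother&#8217;s actual code: naive_assign and its rng parameter are made up for illustration, and the sketch checks for the lock up front (returning None) rather than actually spinning forever.</p>
<blockquote>
<pre style="FONT-SIZE: 12px">
```python
import random


def naive_assign(n, rng):
    # Returns (assignment, calls). assignment[p] is who person p is matched
    # with, or assignment is None if the retry loop would never terminate.
    taken = set()
    assignment = []
    calls = 0
    for person in range(n):
        # Detect the lock: nobody is left except (possibly) this person.
        candidates = set(range(n)) - taken - {person}
        if not candidates:
            return None, calls
        while True:
            calls += 1
            pick = rng.randrange(n)  # 0 .. n-1
            if pick != person and pick not in taken:
                break
        taken.add(pick)
        assignment.append(pick)
    return assignment, calls


if __name__ == "__main__":
    runs = [naive_assign(3, random.Random(seed)) for seed in range(200)]
    locked = sum(1 for assignment, _ in runs if assignment is None)
    print("locked in", locked, "of", len(runs), "runs")
```
</pre>
</blockquote>
<p>With three people the lock happens whenever persons 0 and 1 pick each other, which leaves person 2 with only themselves. Counting calls for larger n shows how the retries pile up towards the end.</p>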
<p>Add to this some interesting aspects of the language &#8211; immutable arrays (ie, there&#8217;s no inbuilt linked lists, so you can&#8217;t del from the middle of an array/list) &amp; it becomes an interesting problem.</p>
<p>The key trick was to have two arrays:</p>
<p>One, 2-dimensional array (first dim holding keys, second the matches) <br/>and one 1-dimensional array (which will only hold keys, in order).</p>
<p>Let&#8217;s call the first one &#8220;$list&#8221; and the second &#8220;$valid&#8221;.</p>
<p>The trick is this &#8211; $valid holds a list of all remaining valid keys, in the first N positions of the array, where initially N = $valid length. Both $list &amp; $valid are initially loaded with all keys, in order.</p>
<p>So, to pick a valid key, we just select $valid[rand(N)] and make sure it&#8217;s not equal to the key we&#8217;re assigning to. <br/>Then, we do two things:</p>
<ol>
<li>Swap the item at position rand(N) (which we just selected) with the Nth item in the $valid array, &amp;</li>
<li>Decrement N ($key_to_process).</li>
</ol>
<p>This has the neat effect of ensuring that the item we just selected is always at position N+1. So, next time we rand(N), since N is now one smaller, we can be sure it&#8217;s impossible to re-select the just selected item.</p>
<p>Put another way, by the time we finish, $valid will still hold all the keys, just in the reverse of the order in which we selected them.</p>
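<p>Here&#8217;s the swap-and-shrink trick on its own as a minimal Python sketch, leaving out the not-yourself check. The names draw_all and rng are made up for illustration, and n here is a count of selectable slots (so the top slot is n - 1), since Python&#8217;s randrange excludes its upper bound. It is essentially a Fisher-Yates shuffle.</p>
<blockquote>
<pre style="FONT-SIZE: 12px">
```python
import random


def draw_all(keys, rng):
    # Draw every key exactly once, in random order, without deleting
    # from the middle of the list and without building new lists.
    valid = list(keys)
    n = len(valid)  # the first n slots are still selectable
    drawn = []
    while n != 0:
        i = rng.randrange(n)  # 0 .. n-1
        drawn.append(valid[i])
        # Park the pick in the top selectable slot, then shrink the area
        # so that slot can never be selected again.
        valid[i], valid[n - 1] = valid[n - 1], valid[i]
        n -= 1
    return drawn, valid


if __name__ == "__main__":
    drawn, valid = draw_all(range(10), random.Random(1))
    print(drawn)
    print(valid)
```
</pre>
</blockquote>
<p>The second list printed is the first one backwards, which is the reverse-order property described above.</p>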
<p>It also means we don&#8217;t have to do any array creation. There&#8217;s still a 1/N chance that we&#8217;ll self-select of course, but there&#8217;s no simple way of avoiding that.</p>
<p>Note that below we don&#8217;t do the swap (since really, why bother with two extra lines of code?) we simply ensure that position rand(N) (ie, $key_no) now holds the key we <strong>didn&#8217;t</strong> select &#8211; ie, the one that is just off the top of the selectable area.</p>
<p>Oh, and in this rand implementation rand(0, N) includes both 0 AND N (most only go 0-&gt;N-1 inclusive).</p>
<blockquote>
<pre style="FONT-SIZE: 12px">
$valid = array_keys($list);
$key_to_process = count($valid) - 1;
do {
  $key_no = rand(0, $key_to_process);
  if ($key_to_process != $valid[$key_no]) {
    $list[$key_to_process][2] = $valid[$key_no];
    $valid[$key_no] = $valid[$key_to_process];
    $key_to_process--;
  }
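  # deal with the horrid edge case where the last
  # $list key is equal to the last available
  # $valid key: key 0 takes over $key_no's match,
  # and $key_no gets matched with key 0 instead
  # (so every key is still matched exactly once)
  if ($key_to_process == 0 and $valid[0] == 0) {
    $key_no = rand(1, count($list) - 1);
    $list[0][2] = $list[$key_no][2];
    $list[$key_no][2] = 0;
    $key_to_process--;
  }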
} while ($key_to_process &gt;= 0);
</pre>
</blockquote>
<p>Without the edge-case code, this results in a super fast, slick little 10-or-so-line algorithm (depending on how/if you count {}&#8217;s) :)</p>
<p>Elegant, I dig it.</p>
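<p>For anyone who wants to poke at it, here&#8217;s a rough Python port of the whole thing, plus a check that nobody gets themselves and nobody gets picked twice. Treat it as a sketch under assumptions: the names secret_santa, gives_to and rng are made up, keys are assumed to be 0 to n-1, and the edge case is handled by having person 0 take over a random person&#8217;s match while that person is matched with 0, which is one way (not the only way) to keep everything one-to-one.</p>
<blockquote>
<pre style="FONT-SIZE: 12px">
```python
import random


def secret_santa(n, rng):
    # Returns gives_to, where gives_to[p] is the person that p is matched with.
    if n == 1:
        raise ValueError("need at least two people")
    valid = list(range(n))  # valid[0..k] holds the keys nobody has been given yet
    gives_to = [None] * n
    k = n - 1  # person being processed, and also the top selectable slot
    while k != -1:
        if k == 0 and valid[0] == 0:
            # Edge case: the only key left for person 0 is 0 itself.
            # Person 0 takes over someone else's match, and that someone
            # is matched with 0 instead.
            other = rng.randrange(1, n)  # 1 .. n-1
            gives_to[0] = gives_to[other]
            gives_to[other] = 0
            break
        i = rng.randrange(k + 1)  # 0 .. k inclusive
        if valid[i] != k:
            gives_to[k] = valid[i]
            valid[i] = valid[k]  # overwrite the pick with the top key
            k -= 1
    return gives_to


if __name__ == "__main__":
    result = secret_santa(6, random.Random())
    assert sorted(result) == list(range(6))  # everyone is picked exactly once
    assert all(result[p] != p for p in range(6))  # nobody gets themselves
    print(result)
```
</pre>
</blockquote>
<p>The only retries left are the occasional self-selections, so the number of rand calls stays close to one per person.</p>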
]]></content:encoded>
			<wfw:commentRss>http://sidawson.com/2008/12/nifty-non-replacing-selection-algorithm.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cuil Really Isn&#8217;t (Yet)</title>
		<link>http://sidawson.com/2008/08/cuil-really-isn-yet.html</link>
		<comments>http://sidawson.com/2008/08/cuil-really-isn-yet.html#comments</comments>
		<pubDate>Sat, 02 Aug 2008 13:11:00 +0000</pubDate>
		<dc:creator>Si</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://sidawson.com/?p=8</guid>
		<description><![CDATA[There&#8217;s been a lot of grumping about Cuil lately, so I thought I&#8217;d add to it (hopefully in interesting new ways). I&#8217;ll talk in context of a site that I have *cough* some familiarity with: galadarling.com (hint: I built &#38; managed it for the last two years). This site gets over 100k uniques a week, [...]]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s been <a href="http://www.google.com/search?num=100&amp;hl=en&amp;safe=off&amp;q=cuil+sucks&amp;btnG=Search">a lot of grumping</a> about Cuil lately, so I thought I&#8217;d add to it (hopefully in interesting new ways).</p>
<p>I&#8217;ll talk in the context of a site that I have *cough* some familiarity with: galadarling.com (hint: I built &amp; managed it for the last two years). This site gets over 100k uniques a week, has a Google PageRank of 6, and a Technorati rank in the mid-5000s. I.e., it&#8217;s not Yahoo, but it is a significant, popular smaller site.</p>
<p>The obvious test &#8211; surely searching for <em>gala darling</em> would return the site? It&#8217;s not complicated. Her name is the url. But no:</p>
<p><a href="http://sidawson.com/images/2008/08/cuil_safe.jpg"><img src="http://sidawson.com/images/2008/08/cuil_safe.jpg" alt="cuil_safe.jpg" height="823" width="450"/> (click for a clearer view)</a></p>
<p>Maybe it&#8217;s the safe search? No, flick that off, and exactly the same results:</p>
<p><a href="http://sidawson.com/images/2008/08/cuil_safe_off.jpg"><img src="http://sidawson.com/images/2008/08/cuil_safe_off.jpg" alt="cuil_safe_off.jpg" height="823" width="450"/> (click for a clearer view)</a></p>
<p>Putting in <em>&#8220;gala darling&#8221;</em> in quotes (just like that) results in the exact same result set. Huh?</p>
<p>Even scarier, putting in <em>galadarling</em> (all one word) doesn&#8217;t even return the site. How is this possible?</p>
<p>Even worse than that &#8211; gala.vox.com is the second returned result. This is a single page that was set up once and then ignored. It&#8217;s not just failing to find the correct result, it&#8217;s actively returning junk.</p>
<p>None of these result sets includes Gala&#8217;s LiveJournal, which is updated every couple of weeks, let alone the actual site that has her name on it.</p>
<p>With the exception of her Twitter account, all the results on the front page are other sites, talking about her&#8230; and this is useful.. how?</p>
<p>I looked at the first 20 pages of results &#8211; couldn&#8217;t find either her LiveJournal or main site. vox.com somehow managed to get several hundred mentions. A single collegecandy page appeared at least 5 times.</p>
<p>Ok, it&#8217;s common knowledge that the Cuil results are crap. How about some other things?</p>
<ul>
<li>When you first load the page, you have to actually click to get into the textbox to enter your search terms.</li>
<li>For some reason I&#8217;m asked to accept cookies both from cuil.com (fair enough), and cuilimg.com (what? why?)</li>
<li>When paging through the results, there&#8217;s no way to go back to the start, as once you get past page 10, the earlier pages scroll off to the left, so you have to go backwards in chunks of 4 or 5.</li>
<li>There&#8217;s no way to get more results on a screen. Even on my 1900 pixel wide screen &amp; a tiny font, I still only get 10ish results per page. Google allows me 100 &#8211; why should I click-wait-click-wait just to use Cuil?</li>
</ul>
<p>To their credit, Cuil&#8217;s bot is not hitting the site anywhere near as much as it used to, so that&#8217;s one good thing. Considering galadarling is typically updated once a day, plus maybe a hundred comments, the Cuil bot (Twiceler) used to hit the site about a thousand times a day (resulting in it <a href="http://www.kirps.com/web/main/_blog/all/why-i-hate-cuil.shtml">getting blocked by damn near everyone</a>). For comparison, Google&#8217;s bots hit it 400 times a day (partly that will be because there is context-sensitive advertising on the site, so Google needs to scan for that). Now Twiceler is visiting about a hundred times a day &#8211; much more reasonable given the update frequency.</p>
<p>I&#8217;ve talked to the guys running Cuil (back when it still had two &#8216;l&#8217;s in its name). They&#8217;re obviously very smart cookies and they definitely care about what they do. If they can shake Google up &#8211; well, great &#8211; and I say that as a Google shareholder. They definitely have a lot of bugs to iron out though, and the reliability of those results needs to be right at the top of the list. Without trust, what do they have?</p>
]]></content:encoded>
			<wfw:commentRss>http://sidawson.com/2008/08/cuil-really-isn-yet.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

