<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Si Dawson . Com &#187; Software-Engineering</title>
	<atom:link href="http://sidawson.com/category/software-engineering/feed" rel="self" type="application/rss+xml" />
	<link>http://sidawson.com</link>
	<description>Self Improving Software. Evolutionary Algorithms. Weak AI.</description>
	<lastBuildDate>Thu, 09 Feb 2012 09:00:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>How to do a multi-table update with a limit in MySQL</title>
		<link>http://sidawson.com/2012/02/how-to-do-a-multi-table-update-with-a-limit-in-mysql.html</link>
		<comments>http://sidawson.com/2012/02/how-to-do-a-multi-table-update-with-a-limit-in-mysql.html#comments</comments>
		<pubDate>Thu, 09 Feb 2012 05:09:02 +0000</pubDate>
		<dc:creator>Si</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Software-Engineering]]></category>

		<guid isPermaLink="false">http://sidawson.com/?p=79</guid>
		<description><![CDATA[According to the MySQL documentation, you can&#8217;t do a multi-table UPDATE with a LIMIT. What&#8217;s a multi-table update with a limit? Well, something like this: UPDATE &#160;&#160;&#160;&#160;foo &#160;&#160;&#160;&#160;, bar SET &#160;&#160;&#160;&#160;foo.baz=bar.baz WHERE &#160;&#160;&#160;&#160;foo.baz IS NULL &#160;&#160;&#160;&#160;AND foo.id=bar.id LIMIT &#160;&#160;&#160;&#160;1000 ; (Which doesn&#8217;t work. Of course, you can do single table UPDATEs with a LIMIT just [...]]]></description>
			<content:encoded><![CDATA[<p>According to <a href="http://dev.mysql.com/doc/refman/5.0/en/update.html">the MySQL documentation</a>, you can&#8217;t do a multi-table UPDATE with a LIMIT.</p>
<p>What&#8217;s a multi-table update with a limit? Well, something like this:</p>
<blockquote><p>UPDATE<br />
&nbsp;&nbsp;&nbsp;&nbsp;foo<br />
&nbsp;&nbsp;&nbsp;&nbsp;, bar<br />
SET<br />
&nbsp;&nbsp;&nbsp;&nbsp;foo.baz=bar.baz<br />
WHERE<br />
&nbsp;&nbsp;&nbsp;&nbsp;foo.baz IS NULL<br />
&nbsp;&nbsp;&nbsp;&nbsp;AND foo.id=bar.id<br />
LIMIT<br />
&nbsp;&nbsp;&nbsp;&nbsp;1000<br />
;</p></blockquote>
<p>(Which doesn&#8217;t work. Of course, you can do single table UPDATEs with a LIMIT just fine)</p>
<p>Why would you even want to do this?</p>
<p>Well, anytime you have monster sized tables, and you don&#8217;t want to lock everybody else while you either read (from bar) or write (to foo). If you can put a limit on the update, you can call it repeatedly, in small chunks, and not choke everything for all other users.</p>
<p>For example, if bar happens to have, say, ohhh, 30 million rows in it, foo happens to have ooh, 2 million rows and they&#8217;re both used by everything, all the time.</p>
<p>So, here&#8217;s a sneaky way to get around this limitation. I did promise one, right there in the title, after all.</p>
<blockquote><p>UPDATE<br />
&nbsp;&nbsp;&nbsp;&nbsp;foo<br />
&nbsp;&nbsp;&nbsp;&nbsp;, (SELECT<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;bar.id<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;, bar.baz<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;FROM<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;foo<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;, bar<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;WHERE<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;foo.id=bar.id<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;AND foo.baz IS NULL<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;LIMIT<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1000<br />
&nbsp;&nbsp;&nbsp;&nbsp;) tmp<br />
SET<br />
&nbsp;&nbsp;&nbsp;&nbsp;foo.baz=tmp.baz<br />
WHERE<br />
&nbsp;&nbsp;&nbsp;&nbsp;foo.id=tmp.id<br />
;</p></blockquote>
<p>Some important notes:</p>
<ul>
<li>The update conditions (foo.baz IS NULL) go inside the subquery, along with the LIMIT.</li>
<li>We have to match ids twice &#8211; once for the subquery, and once against the created temporary table. That&#8217;s why we make sure we SELECT both the id and baz from bar in the subquery.</li>
<li>There are no conditions (other than the id match) in the outer WHERE, since we&#8217;ve done them all in the subquery.</li>
<li>MySQL also has a limitation of not allowing you to UPDATE a table while SELECTing from the same table in a subquery. Notice that this sneakily avoids it: the subquery sits in the table list as a derived table (tmp), which MySQL builds as a temporary result before the update runs, and the only columns we pull out of it are the id and baz from bar.</li>
</ul>
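<p>Since the whole point is to call the limited update repeatedly, here&#8217;s a rough sketch of a driver loop in Python. Everything around the SQL is an assumption on my part: conn is taken to be any Python DB-API connection to the MySQL server, run_in_chunks and max_rounds are made-up names, and it assumes the driver reports changed rows in cursor.rowcount.</p>

```python
# The limited multi-table update from above, unchanged apart from layout.
LIMITED_UPDATE = """
UPDATE foo,
       (SELECT bar.id, bar.baz
          FROM foo, bar
         WHERE foo.id = bar.id
           AND foo.baz IS NULL
         LIMIT 1000) tmp
   SET foo.baz = tmp.baz
 WHERE foo.id = tmp.id
"""


def run_in_chunks(conn, sql=LIMITED_UPDATE, max_rounds=10000):
    """Run the limited update until a round changes no rows.

    `conn` is assumed to be a DB-API connection (hypothetical here).
    Returns the total number of rows changed across all rounds.
    """
    total = 0
    for _ in range(max_rounds):
        cursor = conn.cursor()
        cursor.execute(sql)
        changed = cursor.rowcount  # rows changed by this chunk
        conn.commit()              # let other users in between chunks
        cursor.close()
        if changed == 0:
            break
        total += changed
    return total
```

<p>The max_rounds cap is just a safety net. Rows that can never be filled (say, bar.baz is itself NULL) stay NULL in foo and get picked up again on every round, so it pays not to loop forever.</p>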
<p>So, how about that? You can now do limited multi-table updates.</p>
<p>Oh, except for one. Minor. Problem.</p>
<p>This doesn&#8217;t work with temp tables (eg if foo was created with a CREATE TEMPORARY TABLE statement), since MySQL won&#8217;t let you refer to a TEMPORARY table more than once in the same query.</p>
<p>Bugger.</p>
<p>However, here&#8217;s a sneaky way around that limitation too.</p>
<p>First of all, give your temp table another column that starts out as 0 for every row; in the example below, &#8220;ref_no INT NOT NULL DEFAULT 0&#8221;.</p>
<p>Make sure you have an index on the id, otherwise it&#8217;ll be dog slow.</p>
<p>Then do this:</p>
<blockquote><p># do this in chunks of 1k<br />
SET @counter = 1;</p>
<p>REPEAT</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;UPDATE<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;tmp_foo<br />
&nbsp;&nbsp;&nbsp;&nbsp;SET<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ref_no=@counter<br />
&nbsp;&nbsp;&nbsp;&nbsp;WHERE<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ref_no=0<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;AND baz IS NULL<br />
&nbsp;&nbsp;&nbsp;&nbsp;LIMIT<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1000<br />
&nbsp;&nbsp;&nbsp;&nbsp;;<br />
&nbsp;&nbsp;&nbsp;&nbsp;COMMIT;</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;UPDATE<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;tmp_foo<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;, bar<br />
&nbsp;&nbsp;&nbsp;&nbsp;SET<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;tmp_foo.baz=bar.baz<br />
&nbsp;&nbsp;&nbsp;&nbsp;WHERE<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;tmp_foo.ref_no=@counter<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;AND tmp_foo.id=bar.id<br />
&nbsp;&nbsp;&nbsp;&nbsp;;<br />
&nbsp;&nbsp;&nbsp;&nbsp;COMMIT;</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;SET @counter = @counter + 1;</p>
<p>UNTIL (SELECT COUNT(id) FROM tmp_foo WHERE ref_no=0 AND baz IS NULL) = 0<br />
END REPEAT;</p></blockquote>
<p>Some important notes:</p>
<ul>
<li>We&#8217;re basically flagging a thousand rows at a time, then matching only against those rows &#8211; pretty simple concept really.</li>
<li>The commits are in there because MySQL can be a bit weird about not propagating changes to the database if you don&#8217;t commit inside your stored proc. This ensures that updates are passed out, which also means I can run multiple copies of this stored proc concurrently with moderate safety (if I replace @counter with a suitably large RAND() value) &#8211; well, as much as you can normally expect with MySQL anyway.</li>
<li>If you want to reuse the temp table (say, to update something else from &#8211; a reverse update to that shown above) you&#8217;ll need to reset all the ref_no&#8217;s to 0.</li>
<li>Whatever conditions are in the initial WHERE need to be mirrored in the final SELECT COUNT.</li>
<li>Obviously just drop the table when you&#8217;re finished.</li>
</ul>
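<p>If you want to play with the flag-then-match idea without a MySQL server handy, here&#8217;s a small sketch in Python. It uses SQLite (from the standard library) purely as a stand-in, so the SQL is SQLite&#8217;s dialect rather than MySQL&#8217;s: the LIMIT goes in an id subquery and the second update uses a correlated subquery instead of a multi-table UPDATE. The function names and the sample data are invented for the example.</p>

```python
import sqlite3


def fill_in_batches(conn, batch_size=1000):
    """Flag a batch of rows, update only the flagged rows, repeat."""
    counter = 1
    while True:
        # Step 1: flag up to batch_size untouched rows with this batch number.
        flagged = conn.execute(
            "UPDATE tmp_foo SET ref_no = ? WHERE id IN ("
            " SELECT id FROM tmp_foo"
            " WHERE ref_no = 0 AND baz IS NULL LIMIT ?)",
            (counter, batch_size),
        ).rowcount
        if flagged == 0:
            return counter - 1  # number of batches actually processed
        # Step 2: copy baz across, but only for the rows flagged in step 1.
        conn.execute(
            "UPDATE tmp_foo"
            " SET baz = (SELECT bar.baz FROM bar WHERE bar.id = tmp_foo.id)"
            " WHERE ref_no = ?",
            (counter,),
        )
        conn.commit()
        counter += 1


def demo():
    """Build two small tables in memory and fill tmp_foo.baz from bar."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bar (id INTEGER PRIMARY KEY, baz TEXT)")
    conn.execute(
        "CREATE TABLE tmp_foo (id INTEGER PRIMARY KEY, baz TEXT,"
        " ref_no INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO bar VALUES (?, ?)",
        [(i, f"value-{i}") for i in range(2500)],
    )
    conn.executemany(
        "INSERT INTO tmp_foo (id) VALUES (?)", [(i,) for i in range(2500)]
    )
    batches = fill_in_batches(conn, batch_size=1000)
    remaining = conn.execute(
        "SELECT COUNT(*) FROM tmp_foo WHERE baz IS NULL"
    ).fetchone()[0]
    return batches, remaining
```

<p>Because every row gets flagged at most once, the loop always finishes, even if some rows have no match in bar and stay NULL.</p>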
<p>As a bonus, I&#8217;ve found this is actually quicker than doing one single large scale update. Why? Less memory is used.</p>
<p>So look at that. TWO ways to get multi-table updates with a limit. Nifty.</p>
]]></content:encoded>
			<wfw:commentRss>http://sidawson.com/2012/02/how-to-do-a-multi-table-update-with-a-limit-in-mysql.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to automatically position PuTTY terminal</title>
		<link>http://sidawson.com/2012/02/how-to-automatically-position-putty-terminal.html</link>
		<comments>http://sidawson.com/2012/02/how-to-automatically-position-putty-terminal.html#comments</comments>
		<pubDate>Thu, 09 Feb 2012 04:24:52 +0000</pubDate>
		<dc:creator>Si</dc:creator>
				<category><![CDATA[Software-Engineering]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://sidawson.com/?p=65</guid>
		<description><![CDATA[PuTTY is arguably the best Windows terminal app. I&#8217;ve used it for years. One thing that has always felt missing is the ability to automatically position it on the screen on startup. For example, I always have a couple of windows where I watch certain processes. They&#8217;re always in the same position on my screen, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.chiark.greenend.org.uk/~sgtatham/putty/">PuTTY</a> is arguably the best Windows terminal app. I&#8217;ve used it for years.</p>
<p>One thing that has always felt missing is the ability to automatically position it on the screen on startup. For example, I always have a couple of windows where I watch certain processes. They&#8217;re always in the same position on my screen, and always running the same scripts inside them. How do I do that?</p>
<p>So, you can imagine my glee when I found a patch out there that will do exactly that. <a href="http://www.bradgoodman.com/puttypatch.html">Props go to Brad Goodman</a> who wrote it. Follow that link to get both an exe (his patch applied to a 2005 build) and the source so you can patch it yourself. I&#8217;ve also put a copy of the exe <a href="http://sidawson.com/misc/putty-bkg.exe">here</a>, in case his site disappears.</p>
<p>How to use it? Very simple.</p>
<ol>
<li>Save your session settings from inside PuTTY, as normal</li>
<li>Start up putty-bkg.exe (instead of the usual putty.exe)</li>
<li>Now when you load your sessions you&#8217;ll also see the new &#8220;Position &amp; Icon&#8221; options, under &#8220;Window&#8221;:
<p><div id="attachment_67" class="wp-caption alignnone" style="width: 466px"><a href="http://sidawson.com/wp-content/uploads/2012/02/putty_1.jpg"><img class="size-full wp-image-67" src="http://sidawson.com/wp-content/uploads/2012/02/putty_1.jpg" alt="" width="456" height="437" /></a><p class="wp-caption-text">Enter the desired top, left (&amp; icon, if you feel inspired)</p></div></li>
<li>Alter them to your heart&#8217;s content (don&#8217;t forget to also adjust the rows &amp; columns under &#8220;Window&#8221; to get it the right size too), save your session and voila.</li>
</ol>
<p>Even more useful is if you combine this with setting a remote command to be run at initial connection.</p>
<ol>
<li>Click on &#8220;SSH&#8221; under &#8220;Connection&#8221; (you can&#8217;t do this if the session is already active)</li>
<li>Enter the command or script you want run in the &#8220;Remote Command&#8221; textbox:
<p><div id="attachment_68" class="wp-caption alignnone" style="width: 466px"><a href="http://sidawson.com/wp-content/uploads/2012/02/putty_2.jpg"><img class="size-full wp-image-68" title="" src="http://sidawson.com/wp-content/uploads/2012/02/putty_2.jpg" alt="" width="456" height="437" /></a><p class="wp-caption-text">Enter the startup command</p></div></li>
<li>Click the &#8220;Save&#8221; button (under &#8220;Session&#8221;) as usual.</li>
</ol>
<p>Now, to run all of the above, just set up a shortcut to:</p>
<blockquote><p>&#8220;C:\Program Files\putty\putty-bkg.exe&#8221; -load [session name]</p></blockquote>
<p>(where [session name] is whatever you saved your session as). Obviously adjust the path to wherever you put putty-bkg.exe.</p>
<p>Next time you click the shortcut, not only will PuTTY be positioned correctly on the screen, but it&#8217;ll also automatically be running whatever script you desire. Just like magic! (but with less jiggery pokery)</p>
]]></content:encoded>
			<wfw:commentRss>http://sidawson.com/2012/02/how-to-automatically-position-putty-terminal.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Importance Of Pipes</title>
		<link>http://sidawson.com/2009/03/importance-of-pipes.html</link>
		<comments>http://sidawson.com/2009/03/importance-of-pipes.html#comments</comments>
		<pubDate>Mon, 30 Mar 2009 04:51:00 +0000</pubDate>
		<dc:creator>Si</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Software-Engineering]]></category>

		<guid isPermaLink="false">http://sidawson.com/?p=14</guid>
		<description><![CDATA[There&#8217;s a very subtle, often overlooked thing in Unix, the pipeline, or &#124; character (often just called a pipe). This is perhaps the most important thing in the entire operating system, with the possible exception of the &#8220;everything is a file&#8221; concept. In case you&#8217;re unfamiliar (or didn&#8217;t feel like reading the wiki page above), [...]]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s a very subtle, often overlooked thing in Unix, the <a href="http://en.wikipedia.org/wiki/Pipeline_(Unix)">pipeline</a>, or | character (often just called a pipe).</p>
<p>This is perhaps the most important thing in the entire operating system, with the possible exception of the &#8220;everything is a file&#8221; concept.</p>
<p>In case you&#8217;re unfamiliar (or didn&#8217;t feel like reading the wiki page above), here&#8217;s the basic concept:</p>
<p>A pipe allows you to link the output of one program to the input of another.</p>
<p>eg foo | bar &#8211; this takes the output from foo, and feeds whatever-it-is into bar &#8211; rather than, say, having to point bar at a specific file to make it do anything useful.</p>
<p>Why are pipes so awesome?</p>
<p>Well, the following reasons:</p>
<ol>
<li>Each program only has to do one thing, &amp; do it well</li>
<li>As such, development of those programs can be split up &#8211; even to the point where a thousand people can independently write a thousand programs, &amp; they&#8217;ll all still be useful</li>
<li>Each of those programs is very simple, thus faster to develop, easier to debug, etc</li>
<li>Extremely complex behaviour can be created by linking different programs together in different ways</li>
<li>None of that higher level behaviour has to be pre-thought or designed for</li>
</ol>
<p>So, Unix has ended up with a ton of small but powerful programs. For example:</p>
<ul>
<li>ls &#8211; lists a directory</li>
<li>cat &#8211; displays stuff</li>
<li>sort &#8211; sorts stuff</li>
<li>grep &#8211; finds things in stuff</li>
<li>tr &#8211; translates stuff (eg, upper to lower case)</li>
<li>wc &#8211; counts words or lines</li>
<li>less &#8211; pauses stuff, allowing forwards &amp; backwards scrolling</li>
</ul>
<p>I&#8217;ve been deliberately vague with the descriptions. Why? Because &#8216;stuff&#8217; can mean a file &#8211; if we specify it, or, it can mean whatever we pass in by putting a pipe in front of it.</p>
<p>So here&#8217;s an example. The file we&#8217;ll use is /usr/share/dict/words &#8211; ie, the dictionary.</p>
<blockquote><p>cat /usr/share/dict/words</p>
</blockquote>
<p>displays the dictionary</p>
<blockquote><p>cat /usr/share/dict/words | grep eft</p>
</blockquote>
<p>displays dict, but only shows words with &#8216;eft&#8217; in them</p>
<blockquote><p>cat /usr/share/dict/words | grep eft | sort</p>
</blockquote>
<p>displays &#8216;eft&#8217; words, sorted</p>
<blockquote><p>cat /usr/share/dict/words | grep eft | sort | less</p>
</blockquote>
<p>displays sorted &#8216;eft&#8217; words, but paused so we can see what the hell we&#8217;ve got before it scrolls madly off the screen</p>
<blockquote><p>cat /usr/share/dict/words | grep eft | grep -ve '^[A-Z]' | sort | less</p>
</blockquote>
<p>displays paused sorted &#8216;eft&#8217; words, but removes any that start with capital letters (ie, all the Proper Nouns)</p>
<blockquote><p>cat /usr/share/dict/words | grep eft | grep -ve '^[A-Z]' | wc -l</p>
</blockquote>
<p>gives us the count of how many non proper-noun &#8216;eft&#8217; words there are in the dictionary (in the huge British English dictionary? 149, since I know you&#8217;re curious)</p>
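<p>The same build-it-up-a-stage-at-a-time idea can be sketched outside the shell too. Below is a toy version in Python, where each &#8220;program&#8221; is a tiny function that takes lines in and passes lines out, and pipe() plays the role of the | character. All the names here (grep, sort, count, pipe) and the short word list are made up for the illustration; they only mimic the real tools.</p>

```python
import re
from functools import reduce


def grep(pattern, invert=False):
    """Keep lines that match the pattern (or that don't, like grep -v)."""
    regex = re.compile(pattern)

    def stage(lines):
        return (line for line in lines if bool(regex.search(line)) != invert)

    return stage


def sort():
    """Sort whatever comes in."""
    return lambda lines: iter(sorted(lines))


def count():
    """Count lines, like wc -l."""
    return lambda lines: sum(1 for _ in lines)


def pipe(source, *stages):
    """Feed the output of each stage into the next one."""
    return reduce(lambda data, stage: stage(data), stages, source)


# A tiny stand-in for /usr/share/dict/words.
words = ["left", "Eftimie", "theft", "apple", "Heft", "cleft", "deft"]

eft_words = list(pipe(words, grep("eft"), grep("^[A-Z]", invert=True), sort()))
how_many = pipe(words, grep("eft"), grep("^[A-Z]", invert=True), count())
```

<p>None of the stages knows anything about the others, which is the whole point: swapping sort() for count() at the end is the same move as swapping &#8220;| sort | less&#8221; for &#8220;| wc -l&#8221; above.</p>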
<p>So there&#8217;s an additional benefit which is probably obvious. Debugging a complex set of interactions with pipes is incredibly straightforward. You can simply build up what you think you need, experimenting a little at each stage, &amp; viewing the output. When it looks like what you want, you just remove the output-to-screen, and voila!</p>
<p>For the end-users, this means that the operating cost of using the system in a complex manner is drastically reduced.</p>
<p>What would happen without pipes? You&#8217;d end up with monolithic programs for every imaginable combination of user need. Ie, an unmitigated disaster. You can see elements of this in, umm, &#8216;certain other&#8217; operating systems. *cough*</p>
<p>Most importantly of all, there is a meta benefit. A combination of all of the above benefits.</p>
<p>Pipes enable incredibly complex higher level behaviours to emerge without being designed in. It&#8217;s a spontaneous emergent behaviour of the system. There&#8217;s no onus on the system development programmers to be demi-gods, all they need to do is tackle one simple problem at a time &#8211; display a file, sort a file, and so on. The system as a whole benefits exponentially from every small piece of added functionality, as pipes then enable them to be used in every possible permutation.</p>
<p>It&#8217;s as if an anthill full of differently talented ants was suddenly building space ships.</p>
<p>Perhaps a better bits-vs-atoms metaphor is of money. Specifically the exchange of goods (atoms) for money, allows the conversion of those atoms into other atoms, via money. In the same way, pipes allow different programs to seamlessly interact via streamed data, in infinitely variable ways.</p>
<p>You don&#8217;t need to know how to make a car, since you can do what you&#8217;re good at, get paid, &amp; exchange that money for a car. Or a boat. Or a computer. Society as a whole is vastly better off as each person can specialize &amp; everybody benefits. Think how basic our world would be if we only had things that everybody knew how to build or do. Same thing with computers &amp; pipes.</p>
<p>What seems like an almost ridiculously simple concept, pipes, has allowed an unimaginably sophisticated system to emerge from simple, relatively easily built pieces.</p>
<p>It&#8217;s not quite the holy grail of systems design, but it&#8217;s bloody close.</p>
]]></content:encoded>
			<wfw:commentRss>http://sidawson.com/2009/03/importance-of-pipes.html/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>A Nifty Non-Replacing Selection Algorithm</title>
		<link>http://sidawson.com/2008/12/nifty-non-replacing-selection-algorithm.html</link>
		<comments>http://sidawson.com/2008/12/nifty-non-replacing-selection-algorithm.html#comments</comments>
		<pubDate>Tue, 16 Dec 2008 02:40:00 +0000</pubDate>
		<dc:creator>Si</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Software-Engineering]]></category>

		<guid isPermaLink="false">http://sidawson.com/?p=13</guid>
		<description><![CDATA[Algorithms are awesome fun, so I was super pleased when my little bro asked me to help him with a toy problem he had. The description is this: It&#8217;s a secret santa chooser. A group of people, where each person has to be matched up with one other person, but not themselves. He&#8217;s setup an [...]]]></description>
			<content:encoded><![CDATA[<p>Algorithms are awesome fun, so I was super pleased when my little bro asked me to help him with a toy problem he had.</p>
<p>The description is this: It&#8217;s a secret santa chooser. A group of people, where each person has to be matched up with one other person, but not themselves.</p>
<p>He&#8217;s set up an array that has an id for each person.</p>
<p>His initial shot was something like this (pseudo, obviously):</p>
<blockquote>
<pre style="FONT-SIZE: 12px">
foreach $array as $key =&gt; $subarr {
  do {
      // $count is set to count($array)
      $var = rand(0, $count - 1)
  } while $var == $key or $var is already assigned
  $array[$key][$assign] = $var
}
</pre>
</blockquote>
<p>Initially he was mostly concerned that rand would get called a lot of times (it&#8217;s inefficient in the language he&#8217;s using).</p>
<p>However, there&#8217;s a ton of neat (non-obvious) problems with this algorithm:</p>
<ol>
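<li>By the time we&#8217;re trying to match the last person, we&#8217;ll be calling rand (on average) about N times, since only one of the N possible values is still free</li>
<li>As a result, it&#8217;s inefficient as hell (the expected total number of rand calls grows like N log N rather than N, much like the coupon collector problem)</li>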
<li>There is a small chance that on the last call we&#8217;ll actually lock &#8211; since we won&#8217;t have a non-dupe to match with</li>
<li>Not obvious above, but he also considered recreating the array on every iteration of the loop *wince*</li>
</ol>
<p>Add to this some interesting aspects of the language &#8211; immutable arrays (ie, there&#8217;s no inbuilt linked lists, so you can&#8217;t del from the middle of an array/list) &amp; it becomes an interesting problem.</p>
<p>The key trick was to have two arrays:</p>
<p>One, 2-dimensional array (first dim holding keys, second the matches) <br/>and one 1-dimensional array (which will only hold keys, in order).</p>
<p>Let&#8217;s call the first one &#8220;$list&#8221; and the second &#8220;$valid&#8221;.</p>
<p>The trick is this &#8211; $valid holds a list of all remaining valid keys, in the first N positions of the array, where initially N = $valid length. Both $list &amp; $valid are initially loaded with all keys, in order.</p>
<p>So, to pick a valid key, we just select $valid[rand(N)] and make sure it&#8217;s not equal to the key we&#8217;re assigning to. <br/>Then, we do two things:</p>
<ol>
<li>Swap the item at position rand(N) (which we just selected) with the Nth item in the $valid array, &amp;</li>
<li>Decrement N ($key_to_process).</li>
</ol>
<p>This has the neat effect of ensuring that the item we just selected is always at position N+1. So, next time we rand(N), since N is now one smaller, we can be sure it&#8217;s impossible to re-select the just selected item.</p>
<p>Put another way, by the time we finish, $valid will still hold all the keys, just in the reverse of the order we selected them in.</p>
<p>It also means we don&#8217;t have to do any array creation. There&#8217;s still a 1/N chance that we&#8217;ll self-select of course, but there&#8217;s no simple way of avoiding that.</p>
<p>Note that below we don&#8217;t do the swap (since really, why bother with two extra lines of code?) we simply ensure that position rand(N) (ie, $key_no) now holds the key we <strong>didn&#8217;t</strong> select &#8211; ie, the one that is just off the top of the selectable area.</p>
<p>Oh, and in this rand implementation rand(0, N) includes both 0 AND N (most only go 0-&gt;N-1 inclusive).</p>
<blockquote>
<pre style="FONT-SIZE: 12px">
$valid = array_keys($list);
$key_to_process = count($valid) - 1;
do {
  $key_no = rand(0, $key_to_process);
  if ($key_to_process != $valid[$key_no]) {
    $list[$key_to_process][2] = $valid[$key_no];
    $valid[$key_no] = $valid[$key_to_process];
    $key_to_process--;
  }
  # deal with the horrid edge case where the last
  # $list key is equal to the last available
  # $valid key
  if ($key_to_process == 0 and $valid[0] == 0) {
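    $key_no = rand(1, count($list) - 1);
    # person 0 takes over $key_no's match, and $key_no
    # now gives to person 0, so everybody still gets
    # picked exactly once
    $list[0][2] = $list[$key_no][2];
    $list[$key_no][2] = 0;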
    $key_to_process--;
  }
} while ($key_to_process &gt;= 0);
</pre>
</blockquote>
<p><br/>Without the edge-case code, this results in a super fast, nice slick little 10 or so line algorithm (depending on how/if you count {}&#8217;s <img src='http://sidawson.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />)</p>
<p>Elegant, I dig it.</p>
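<p>For anyone who wants to poke at the trick in another language, here&#8217;s a rough Python sketch of the same idea. It isn&#8217;t a line-for-line port: it does the actual swap described above, it assumes the people are simply numbered 0 to n-1, and the names (secret_santa, match, last, pick) are mine. The edge case is handled by letting person 0 take over somebody else&#8217;s match while that somebody gives to person 0.</p>

```python
import random


def secret_santa(n, rng=random):
    """Return a list where match[i] is the person that person i gives to.

    Nobody is matched with themselves and everybody is picked exactly once.
    """
    if n == 1:
        raise ValueError("one person cannot be matched with anyone else")
    valid = list(range(n))  # slots 0..last hold the keys not yet picked
    match = [None] * n
    last = n - 1            # the person being matched AND the top of the valid area
    while last != -1:
        if last == 0 and valid[0] == 0:
            # Horrid edge case: only person 0 is left, and the only key
            # left to hand out is 0. Splice person 0 into an existing pair.
            other = rng.randrange(1, n)
            match[0] = match[other]
            match[other] = 0
            break
        pick = rng.randint(0, last)  # randint includes both ends
        if valid[pick] != last:      # reject self-selection and try again
            match[last] = valid[pick]
            # Swap the picked key to the top, then shrink the valid area past it.
            valid[pick], valid[last] = valid[last], valid[pick]
            last -= 1
    return match
```

<p>Passing in a seeded random.Random(...) as rng makes the result repeatable, which is handy for checking that the output really is a permutation with nobody matched to themselves.</p>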
]]></content:encoded>
			<wfw:commentRss>http://sidawson.com/2008/12/nifty-non-replacing-selection-algorithm.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Trouble With Ratios</title>
		<link>http://sidawson.com/2008/09/trouble-with-ratios.html</link>
		<comments>http://sidawson.com/2008/09/trouble-with-ratios.html#comments</comments>
		<pubDate>Tue, 16 Sep 2008 14:18:00 +0000</pubDate>
		<dc:creator>Si</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Software-Engineering]]></category>

		<guid isPermaLink="false">http://sidawson.com/?p=9</guid>
		<description><![CDATA[Ratios are used all over the place. No huge surprise there &#8211; they are, after all, just one number divided by another. The well known problem case is when the denominator (the bottom bit) is zero, or very near zero. However, there are other subtler issues to consider. Here&#8217;s a chart that has a ratio [...]]]></description>
			<content:encoded><![CDATA[<p>Ratios are used all over the place. No huge surprise there &#8211; they are, after all, just one number divided by another.</p>
<p>The well known problem case is when the denominator (the bottom bit) is zero, or very near zero. However, there are other subtler issues to consider.</p>
<p>Here&#8217;s a chart that has a ratio as the X axis:</p>
<p><img src="http://sidawson.com/images/2008/09/ratio_pre.gif" alt="ratio_pre.gif" height="394" width="500"/></p>
<p>Don&#8217;t sweat the details, they&#8217;re not terribly important &#8211; just the rough distribution.</p>
<p>The X axis in this case is what&#8217;s called a Calmar &#8211; ie, the total dollar return of a system divided by its maximum drawdown. Or, in English &#8211; how much you make proportional to how big your pockets need to be. This gives a non-dollar based (ie, &#8220;pure&#8221;) number that can then be compared across markets, systems, products, whatever.</p>
<p>This graph is actually a bit trickier than that, since there&#8217;s actually 3 dimensions of data there &#8211; it&#8217;s just the third dimension isn&#8217;t plotted &#8211; but we&#8217;ll get back to that.</p>
<p>Where this gets ugly is when, in the case of the Calmar above, the drawdown drops to, or near to, zero. For example, if you have a system that only trades once &#8211; and it&#8217;s a winning trade &#8211; the calmar will be very, very large. Even if you chuck out systems that are obviously a bit nutty like that, you can still end up with situations where the ratio has just blown out of all proportion.</p>
<p>Which results in this:</p>
<p><img src="http://sidawson.com/images/2008/09/ratio_post.gif" alt="ratio_post.gif" height="400" width="500"/></p>
<p>See how everything is in a vertical line on the left?</p>
<p>Well, it&#8217;s not. Those points are actually quite well spread out &#8211; it&#8217;s just that instead of the X axis going from 0-&gt;50 as in the first case, it now goes from 0-&gt;22 million &#8211; of which only a small number are greater than a hundred (you can see them spread out on the right, very close to the Y axis)</p>
<p>In this example, we can see the problem, so we&#8217;re aware of it. However, what if the ratio had been the unplotted third dimension? We might never have known.</p>
<p>Now, the way that I&#8217;m using these ratios internally, I&#8217;m protected from these sorts of blowouts &#8211; I simply compare sets of ratios. If one is bigger, it doesn&#8217;t matter if it&#8217;s bigger by 2 or by 2 billion.</p>
<p>However, there are many situations where you might want proportional representation. If one value is twice as big, say, it should occur twice as often. In this case, ratios that explode out by orders of magnitude quickly swamp results, and drive the whole thing into the ground.</p>
<p>You swiftly end up with a monoculture. One result eats all the others, and instead of a room full of happy spiders doing their thing, you end up with one fat angry spider in the middle of the room. Umm, so to speak.</p>
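<p>To put some numbers on the fat angry spider, here&#8217;s a small Python sketch. The three systems and their dollar figures are completely made up, and calmar() is just the return-over-drawdown definition from earlier; the point is only to compare proportional shares with a plain ranking.</p>

```python
def calmar(total_return, max_drawdown):
    """Total dollar return divided by maximum drawdown."""
    return total_return / max_drawdown


# Three invented systems: (total return, max drawdown) in dollars.
systems = {
    "steady": (5000.0, 1000.0),
    "choppy": (8000.0, 2000.0),
    "one_lucky_trade": (2200.0, 0.0001),  # drawdown of nearly zero
}

ratios = {name: calmar(*numbers) for name, numbers in systems.items()}

# Proportional representation: each system's share of the total.
total = sum(ratios.values())
shares = {name: ratio / total for name, ratio in ratios.items()}

# Rank comparison: only the ordering matters, not the size of the gap.
ranking = sorted(ratios, key=ratios.get, reverse=True)

for name in ranking:
    print(f"{name:16} calmar={ratios[name]:14.1f} share={shares[name]:.7f}")
```

<p>With proportional shares, the one lucky trade takes practically the whole room. With the ranking it is merely first in line, and the other two systems are still there to be compared.</p>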
<p>Ratios can be dangerous, kids. Watch out!</p>
]]></content:encoded>
			<wfw:commentRss>http://sidawson.com/2008/09/trouble-with-ratios.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Unit Testing &#8211; Necessary, but Not Enough</title>
		<link>http://sidawson.com/2008/07/unit-testing-necessary-but-not-enough.html</link>
		<comments>http://sidawson.com/2008/07/unit-testing-necessary-but-not-enough.html#comments</comments>
		<pubDate>Wed, 02 Jul 2008 11:59:00 +0000</pubDate>
		<dc:creator>Si</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Software-Engineering]]></category>

		<guid isPermaLink="false">http://sidawson.com/?p=6</guid>
		<description><![CDATA[I realised recently that I&#8217;d hit a point of diminishing returns. My overall code base was now so complex that any change I introduced in certain areas was taking exponentially longer to debug &#38; ensure accuracy. Of course, I had a test rig &#8211; otherwise how would I know what I was doing was correct [...]]]></description>
			<content:encoded><![CDATA[<p>I realised recently that I&#8217;d hit a point of diminishing returns. My overall code base was now so complex that any change I introduced in certain areas was taking exponentially longer to debug &amp; ensure accuracy.</p>
<p>Of course, I had a test rig &#8211; otherwise how would I know what I was doing was correct in the first place?</p>
<p>The central core of all my systems is a rebuild of a now antiquated black box trading platform. I don&#8217;t have the source, but I need to duplicate the behaviour.</p>
<p>The test rig is pretty sophisticated &#8211; it didn&#8217;t start that way, and it shouldn&#8217;t really have needed to be, buuuuut</p>
<p>The old system:</p>
<p><strong>1. Calculates using single precision floating point math. <br/></strong> If I need to explain why this is painful, <a href="http://www.office-excel.com/articles/microsoft-excel-fails-simple-math-multiplication.html">check this out</a> &#8211; if even the guys running Excel get occasionally tripped up by floating point math, what hope is there for the rest of us? Single precision means there are only half as many bits (32) to do maths in vs the default double (64 bits). Rough shorthand: single precision gives you about 7 significant digits in total, not 7 decimal places. A number like &#8216;12000.257&#8217; gets stored as roughly 12000.2568, and the bigger the number, the fewer digits are left over for after the decimal point. This means lots of rounding errors, and the more calculations you do, the more errors. The systems I&#8217;m working with do a LOT of calculations.</p>
<p><strong>2. Rounds incoming numbers non deterministically</strong> <br/>Mostly you can guess correctly what it&#8217;s going to decide a market price becomes, but particularly with markets that move in 1/32&#8217;s or 1/64&#8217;s (ie, not simple decimals), this rounding becomes arbitrary if not damn ornery (rounded? no. up? no. down? no. truncated? no. based on equivalent string length? maybe)</p>
<p><strong>3. Makes &#8216;interesting&#8217; assumptions</strong> <br/>Things like the order that prices get hit, how numbers are calculated internally (eg X = function(A/B) often returns a different result from Y = A/B; X = function(Y)), that slippage only occurs in some situations and not others, and so on. Some make sense, in a way, many we don&#8217;t want. So now we have two modes of operation &#8220;old, broken, compatible, testable&#8221; and &#8220;new, not-broken, different numbers, untestable&#8221;.</p>
<p><strong>4. Has &#8216;chains&#8217; of internal dependencies. <br/></strong>So, unsurprisingly, any of the above errors will then cascade through the output, fundamentally changing large chunks of the results.</p>
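<p>If you&#8217;d like to see the single precision problem for yourself, here&#8217;s a small Python sketch. Python floats are doubles, so it squeezes values through a 32 bit float using the standard struct module to imitate what the old system does. The helper names (to_single, add_repeatedly) and the sample numbers are mine, not anything from the trading platform.</p>

```python
import struct


def to_single(x):
    """Round a Python float (a 64 bit double) to the nearest 32 bit single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def add_repeatedly(step, times, single):
    """Add `step` to a running total `times` times.

    With single=True every intermediate result is rounded to single
    precision, which is how the rounding errors get a chance to pile up.
    """
    total = 0.0
    if single:
        step = to_single(step)
    for _ in range(times):
        total = total + step
        if single:
            total = to_single(total)  # throw away the extra bits after every sum
    return total


price = 12000.257
print(repr(to_single(price)))             # the trailing digits have changed
print(add_repeatedly(0.1, 10000, True))   # drifts visibly away from 1000
print(add_repeatedly(0.1, 10000, False))  # stays very close to 1000
```

<p>One rounded price is no big deal. Ten thousand additions in a row is where it starts to hurt, and that&#8217;s before any of it cascades through dependent calculations.</p>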
<p><br/>So, the test rig allows for all this. Understands where common rounding problems occur, and how they cascade. Sorts by seriousness of the discrepancies, and so forth. Oh, and it does this by automatically tracking 60 or 70 internal variables for each calculation set across 7000 days on 60 markets. Ie, filtering &amp; matching its way through 20-30 million data points.</p>
<p>But this still isn&#8217;t enough.</p>
<p>And this is where I see the light, and realise that this unit testing stuff that people have been raving about might actually be useful. So far, it has been. It&#8217;s enabled me to auto-scan a ton of possible problems, keep things in alignment as the system adjusts to changing requirements &#8211; all the palaver you&#8217;ve read about.</p>
<p>But I&#8217;ve been thinking. No amount of unit testing would catch the errors my test rig will. Not that the rig is that amazing &#8211; just that they&#8217;re operating at fundamentally different levels. Unit testing won&#8217;t tell me:</p>
<p><strong>a)</strong> If I&#8217;ve made a mistake in my logic <br/><strong>b)</strong> If I understand the problem space correctly <br/><strong>c)</strong> If my implementation is correct (in the &#8220;are these answers right?&#8221; sense) <br/><strong>d)</strong> If I understand the problem space <strong>thoroughly</strong> (obscure, hard-to-find &amp; subtle edge cases are very common) <br/><strong>e)</strong> If my unit tests are reliable &amp; complete &#8211; have they caught everything?</p>
<p>Unfortunately, thinking about this more, I&#8217;m not convinced that even unit testing PLUS my test rigs (yes, rigs. I lied before. I actually have two, no three, that grill the system from subtly different angles) are going to catch everything.</p>
<p>Of course, it&#8217;s a game of diminishing returns. How much time do I spend testing vs actually delivering results?</p>
<p>Shifting to a higher level language helps &#8211; fewer lines of code = fewer bugs. It&#8217;s still a stop gap though. Programs are only getting larger &amp; more complex.</p>
<p>Better architecture always helps of course &#8211; lower coupling = fewer cascading problems across sub-domains, but when we&#8217;re juggling tens, hundreds, or thousands of subsystems in a larger overall system?</p>
<p>I&#8217;m not convinced there&#8217;s an easy answer. And as software gets more complex, I only see the overall problem spiralling at some high power of that complexity. No matter how clever our test rigs, how well covered in tests our code is.. How do we move forward efficiently without getting bogged down in &#8220;Can we trust the results?&#8221;?</p>
<p>Right now, I just don&#8217;t know.</p>
]]></content:encoded>
			<wfw:commentRss>http://sidawson.com/2008/07/unit-testing-necessary-but-not-enough.html/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

