<?xml version="1.0"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="http://goddard.net.nz/feed" rel="self" type="application/rss+xml" />
    <title>goddard.net.nz</title>
    <link>http://goddard.net.nz/</link>
    <description>goddard.net.nz - Blog</description>
    <language>en-nz</language>
    <pubDate>Thu, 26 Apr 2012 17:43:23 +1200</pubDate>
    <generator>guidedoglady.org.nz</generator>
      <item>
        <title>Where &quot;Simple&quot; DB Design Misses the Mark</title>
        <link>http://goddard.net.nz/blog/2012/04/26/12/where-simple-db-design-misses-the-mark</link>
        <description><![CDATA[<p>There&#8217;s a strong tendency among web developers at the moment, encouraged by frameworks such as Rails and Django, to throw away the textbook on database design and just treat a database as a bucket of records. The main characteristics of this are:</p>
<ul>
	<li>All tables have an &#8220;id&#8221; field which serves as its key.</li>
	<li>The database is used through an <span class="caps">ORM</span> which uses this to retrieve/update/delete records.</li>
	<li>These systems discourage or disallow multi-column primary keys.</li>
</ul>
<p>The upside is that this method requires little thought, and is often marketed as &#8220;easy&#8221;, &#8220;simple&#8221;, etc. This approach may require little work at first, but it can bite you later. I hope to show how such a structure can become hard to maintain and make it hard to ensure data quality through a few examples.</p>
<h3>ID-per-table design example</h3>
<p>We want to record competitors in a series of dance competitions. We also want to have a simple checkbox to say they&#8217;ve paid entry fees.</p>
<pre><code>
CREATE TABLE competition (
  competition_id INTEGER PRIMARY KEY,
  friendly_name TEXT NOT NULL
);

CREATE TABLE competitor (
  competitor_id INTEGER PRIMARY KEY,
  first_name TEXT NOT NULL,
  surname TEXT NOT NULL,
  phone_number TEXT NOT NULL
);

CREATE TABLE entrance (
  entrance_id INTEGER PRIMARY KEY,
  competition_id INTEGER NOT NULL REFERENCES competition (competition_id),
  competitor_id INTEGER NOT NULL REFERENCES competitor (competitor_id),
  paid_fee BOOLEAN NOT NULL
);
</code></pre>
<p>We&#8217;ve already got our first issue &#8211; what does it mean for a competitor to be entered in the competition twice? We probably don&#8217;t want to allow that (or, if we do, we need to be clear about what it means). We can stop that fairly easily &#8211; slap a <span class="caps">UNIQUE</span> constraint on those fields:</p>
<pre><code>
CREATE TABLE entrance (
  entrance_id INTEGER PRIMARY KEY,
  competition_id INTEGER NOT NULL REFERENCES competition (competition_id),
  competitor_id INTEGER NOT NULL REFERENCES competitor (competitor_id),
  paid_fee BOOLEAN NOT NULL,
  UNIQUE (competition_id, competitor_id)
);
</code></pre>
<p>Now we want to track the different divisions within the competition and awards:</p>
<pre><code>
CREATE TABLE division (
  division_id INTEGER PRIMARY KEY,
  competition_id INTEGER NOT NULL REFERENCES competition (competition_id),
  division_name TEXT NOT NULL,
  UNIQUE (competition_id, division_name)
);

CREATE TABLE award (
  award_id INTEGER PRIMARY KEY,
  division_id INTEGER NOT NULL REFERENCES division (division_id),
  competitor_id INTEGER NOT NULL REFERENCES competitor (competitor_id),
  place INTEGER NOT NULL,
  UNIQUE (division_id, place),
  UNIQUE (competitor_id, division_id)
);
</code></pre>
<p>What&#8217;s wrong with this, you may ask? Nothing ties an award to an entrance record, so if systems get out of sync then someone could have an award record without being entered in the competition.</p>
<p>You can try to check for this in application code, but unless you check everything everywhere and very carefully manage locking of database rows (<span class="caps">SELECT</span> &#8230; <span class="caps">FOR</span> <span class="caps">UPDATE</span> and such) then you can still get in trouble. One process could kick someone out of the competition for failing to pay their fees shortly before another records an award. This is especially dangerous when information gets stale over time, for example because it comes from a printout, is copied to an offline system or is held in a cache. We really want the database to catch this!</p>
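<p>A minimal sketch of that race, using plain in-memory objects rather than a real database. The data, the key format and the simplified award shape are all made up for illustration; the point is only that a check made in application code can be stale by the time the write happens.</p>

```javascript
// In-memory stand-ins for the tables (hypothetical data, no real database).
// Entrance records are keyed by "competition_id:competitor_id".
var entrance = { '999:42': { paid_fee: false } };
var award = [];

// Process A, step 1: check in application code that the competitor is entered.
var entered = ('999:42' in entrance);

// Process B, step 2: runs in between and removes the unpaid entrant.
delete entrance['999:42'];

// Process A, step 3: acts on its now-stale check and records the award.
if (entered) {
  award.push({ competition_id: 999, competitor_id: 42, place: 1 });
}

// Any award whose entrance record no longer exists is orphaned.
var orphaned = award.filter(function (a) {
  return !((a.competition_id + ':' + a.competitor_id) in entrance);
});
```

<p>After these three steps <code>orphaned</code> holds the award for competitor 42, which is exactly the state we want the database to refuse to store.</p>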
<h3>A bad fix</h3>
<p>We could make award point to the entrance record instead of directly to the competitor, storing an entrance_id in place of the competitor_id. That guarantees an entrance record exists for every award, but the award now only reaches the competitor by going through another record.</p>
<pre><code>
CREATE TABLE award (
  award_id INTEGER PRIMARY KEY,
  division_id INTEGER NOT NULL REFERENCES division (division_id),
  entrance_id INTEGER NOT NULL REFERENCES entrance (entrance_id),
  place INTEGER NOT NULL,
  UNIQUE (division_id, place),
  UNIQUE (entrance_id, division_id)
);
</code></pre>
<p>I would call the entrance_id here an opaque identifier &#8211; it means nothing without looking at the table it refers to. Every query has to link through that table even if it uses none of the non-key data &#8211; performance degrades and opportunities for mistakes increase. It&#8217;s also plain counter-intuitive &#8211; we issue an award to a person, not their entrance record.</p>
<p>Imagine also that you can enter freely, but can only claim your award if you submit blood samples for a drug check. The award needs to reference both the drug check record and the entrance record to check they exist.</p>
<pre><code>
CREATE TABLE award (
  award_id INTEGER PRIMARY KEY,
  division_id INTEGER NOT NULL REFERENCES division (division_id),
  entrance_id INTEGER NOT NULL REFERENCES entrance (entrance_id),
  drug_check_id INTEGER NOT NULL REFERENCES drug_check (drug_check_id),
  place INTEGER NOT NULL,
  UNIQUE (division_id, place),
  UNIQUE (entrance_id, division_id)
);
</code></pre>
<p>Do we know that the entrance record and the drug check record point to the same person? Same goes for competitions &#8211; was the drug check done at the same competition we&#8217;re issuing the award for? Probably going to be OK in this case, but we have one record used mostly for checking fees and another used mostly for rules enforcement. The people managing those are probably different and work differently so we&#8217;d be better off making sure things can&#8217;t get out of sync.</p>
<h3>How would we fix this with compound keys?</h3>
<p>To clarify, this is not a rant against using any artificial keys. Sometimes there are two Rob Smiths and you need to ask &#8220;are you the Rob Smith from &#8230;&#8221; instead of making the system assume they&#8217;re the same. I&#8217;m going to keep the competitors and competitions identified by numbers, because people are hard to key (not everyone even has two names) and we could want to reuse competition names.</p>
<p>Firstly, as is often the case, the places where we&#8217;ve been using <span class="caps">UNIQUE</span> are actually the natural identifiers of the table. Let&#8217;s make them the keys:</p>
<pre><code>
CREATE TABLE entrance (
  competition_id INTEGER NOT NULL REFERENCES competition (competition_id),
  competitor_id INTEGER NOT NULL REFERENCES competitor (competitor_id),
  paid_fee BOOLEAN NOT NULL,
  PRIMARY KEY (competition_id, competitor_id)
);

CREATE TABLE division (
  competition_id INTEGER NOT NULL REFERENCES competition (competition_id),
  division_name TEXT NOT NULL,
  PRIMARY KEY (competition_id, division_name)
);
</code></pre>
<p>Now on to the award record:</p>
<pre><code>
CREATE TABLE award (
  competition_id INTEGER NOT NULL,
  division_name TEXT NOT NULL,
  competitor_id  INTEGER NOT NULL,
  place INTEGER NOT NULL,
  PRIMARY KEY (competition_id, division_name, place),
  FOREIGN KEY (competition_id, division_name) REFERENCES division (competition_id, division_name),
  FOREIGN KEY (competition_id, competitor_id) REFERENCES entrance (competition_id, competitor_id)
);
</code></pre>
<p>We have all the columns in this table that we need to be able to enforce the constraint that the competitor is entered in the competition &#8211; we have some record of whether they&#8217;ve paid before we give them any trophies. The database will enforce these constraints between transactions and cannot store data which breaks that rule.</p>
<p>We can extend this &#8211; require that people actually danced in that division and add in the drug check example from the bad fix:</p>
<pre><code>
CREATE TABLE entered_division (
  competition_id INTEGER NOT NULL,
  division_name TEXT NOT NULL,
  competitor_id  INTEGER NOT NULL,
  judging_number INTEGER NOT NULL,
  PRIMARY KEY (competition_id, division_name, competitor_id),
  UNIQUE (competition_id, division_name, judging_number),
  FOREIGN KEY (competition_id, division_name) REFERENCES division,
  FOREIGN KEY (competition_id, competitor_id) REFERENCES entrance
);

CREATE TABLE drug_check (
  competition_id INTEGER NOT NULL,
  competitor_id INTEGER NOT NULL,
  lab_sample_number INTEGER NOT NULL,
  PRIMARY KEY (competition_id, competitor_id),
  UNIQUE (competition_id, lab_sample_number),
  FOREIGN KEY (competition_id, competitor_id) REFERENCES entrance (competition_id, competitor_id)
);

CREATE TABLE award (
  competition_id INTEGER NOT NULL,
  division_name TEXT NOT NULL,
  competitor_id  INTEGER NOT NULL,
  place INTEGER NOT NULL,
  PRIMARY KEY (competition_id, division_name, place),
  FOREIGN KEY (competition_id, division_name, competitor_id) REFERENCES entered_division,
  FOREIGN KEY (competition_id, competitor_id) REFERENCES drug_check
);
</code></pre>
<p>(P.S. <span class="caps">REFERENCES</span> by default targets the primary key columns of the referenced table, so you can often leave off the column list.)</p>
<p>This means that:</p>
<ul>
	<li>You have to enter the competition to enter any divisions.</li>
	<li>You have to enter a division, being allocated a judging number (number pinned to back/leg &#8211; alternate key used by the judges) in that division before you can win an award.</li>
	<li>We can never give an award to somebody who hasn&#8217;t had a drug check. The drug check record we refer to can&#8217;t be for a different person or a different competition – we use the same competitor_id and competition_id in multiple foreign keys.</li>
</ul>
<p>The tables have many duplicated columns, but the ability to enforce these constraints is valuable. As I&#8217;ll discuss next, we also gain some performance advantages by duplicating these columns.</p>
<h3>What about performance?</h3>
<p>In relational database engines these sorts of designs will also tend to use simpler queries which perform much better &#8211; we don&#8217;t need to join through lots of tables to find who or what a record relates to. An example task here would be to find the winners of a competition and enter them in an invitation-only event.</p>
<p>Everybody who placed in competition 999 (any division) is automatically entered in competition 1000:</p>
<pre><code>
INSERT INTO competition (competition_id, friendly_name)
    VALUES (1000, 'Invitational Jam 2020');

INSERT INTO entrance (competition_id, competitor_id, paid_fee)
    SELECT DISTINCT 1000, competitor_id, FALSE  -- paid_fee is NOT NULL; assume invitees have not paid yet
    FROM award
    WHERE competition_id = 999 AND place &lt; 4;
</code></pre>
<p>We could easily create an index on award(competition_id) or even award(competition_id, place) to make those queries cut right down to the records they&#8217;re likely to need quickly. With the id-per-table style you might have:</p>
<pre><code>
INSERT INTO competition (competition_id, friendly_name)
    VALUES (1000, 'Invitational Jam 2020');

INSERT INTO entrance (competition_id, competitor_id, paid_fee)
    SELECT DISTINCT 1000, entrance.competitor_id, FALSE
    FROM entrance
    JOIN division_entrance ON (entrance.entrance_id = division_entrance.entrance_id)
    JOIN award ON (award.division_entrance_id = division_entrance.division_entrance_id)
    WHERE entrance.competition_id = 999 AND award.place &lt; 4;
</code></pre>
<p>There are a couple of ways the database system could evaluate that, and some indexes that would help, but no way of avoiding joining those tables to each other. On tables of any real size that will generally be a lot slower.</p>
<h3>Conclusion</h3>
<p>The id-per-table design pattern may reduce the amount you have to think about when making your database, but it can make it much harder to enforce constraints in the database. While there is often a way to do so, it can make enforcing uniqueness of records or requiring records to share a common relationship difficult. This tends to force integrity checking into application code, where it may require additional queries and requires careful locking to make these checks safe between transactions.</p>
<p>It also often has serious performance implications. Creating opaque identifiers for tables requires us to link through those tables even when not using their data, just to get to the tables we actually want. This can turn queries into twisted monsters after a few rounds of iterative development.</p>
<p>I strongly encourage people to pick up some of the &#8220;traditional&#8221; theory about databases and use compound keys on any record which implies or represents a relationship between other entities.</p>]]></description>
        <pubDate>Thu, 26 Apr 2012 17:43:23 +1200</pubDate>
        <guid>http://goddard.net.nz/blog/2012/04/26/12/where-simple-db-design-misses-the-mark</guid>
      </item>
      <item>
        <title>Javascript is a real language</title>
        <link>http://goddard.net.nz/blog/2012/03/02/11/javascript-is-a-real-language</link>
        <description><![CDATA[<p>I figure it&#8217;s about time for a rant on Javascript. Javascript is <em>the</em> language of web sites &#8211; almost every web browser runs it, and certainly all modern browsers are capable of doing so. Embedded media like Flash/Java and efforts like Google&#8217;s <a href="http://www.dartlang.org/">Dart</a> aside, if you&#8217;re writing code to work in a web site then it&#8217;ll be Javascript.</p>
<p>Javascript, while capable of much more, was originally used to add small pieces of intelligent behaviour to web sites. It might be used to validate a form as you fill it in or to hide a block until you click a button. Nowadays, Javascript is used for a lot more, but is still saddled with being treated as a language for &#8220;decorating&#8221; web sites rather than serious work.</p>
<p>All too often, Javascript is still treated as if it&#8217;s just doing those little bits and pieces, when we ask much more of it. Code meant for validating forms is adapted to send the form via <span class="caps">AJAX</span> and completely override the normal form submission mechanism. Instead of a web site having a couple of pieces of Javascript from the site itself, we import Javascript from advertising networks, social media sites, analytics companies and other third parties and end up with many more, possibly interfering scripts.</p>
<p>Over the last few years I&#8217;ve gone from doing your standard copy-and-paste Javascript to really learning to treat it as a real language.</p>
<h3>Javascript should be modular</h3>
<p>One classic example of where the copy-paste script falls apart is when we have clashes between those snippets. An example I&#8217;ve seen in real life is when one script tried to set the global value &#8220;random&#8221; to a function for generating random numbers, while another script set the global value &#8220;random&#8221; to be an actual random number. One script was then trying to use a number as a function, which didn&#8217;t work very well.</p>
<p>It went something like:</p>
<p>In one file&#8230;<br />
<pre><code>
function createAdCall(location, params) {
  random = Math.floor(Math.random() * 10000000)
  ...
  ad.innerHTML = "&lt;script type=\"text/javascript\" src=\"http://ad.server/give/me/ad?cache_buster=" + random +"\"&gt;&lt;/sc" + "ript&gt;"
}
</code></pre></p>
<p>In another file&#8230;<br />
<pre><code>
function random() {
  return 3;
}
function blah(...) {
  ...
  var foo = random();
  ...
}
</code></pre></p>
<p>Whenever the createAdCall was run it would stomp on &#8220;random&#8221;, replacing the function. <strong>Both parties were to blame</strong>. The party setting random globally should have been giving it a local scope, but the other party shouldn&#8217;t be using the global &#8220;random&#8221; either &#8211; it&#8217;s a very common term.</p>
<p>In almost every other language, it&#8217;s taken for granted that using global variables or common names in the global namespace is an excellent way to tell your peers that you&#8217;re clueless, but people do this in Javascript all the time.</p>
<p>In Javascript it&#8217;s extremely easy to create modules of functions. Here are two different ways of creating modules:</p>
<p>In one file&#8230;<br />
<pre><code>
var ExtraAwesomeAdNetwork = {}
ExtraAwesomeAdNetwork.ads_requested = 0
ExtraAwesomeAdNetwork.createAdCall = function(location, params) {
  var random = Math.floor(Math.random() * 10000000)
  ExtraAwesomeAdNetwork.ads_requested++
  ...
  ad.innerHTML = "&lt;script type=\"text/javascript\" src=\"http://ad.server/give/me/ad?cache_buster=" + random +"\"&gt;&lt;/sc" + "ript&gt;"
}
</code></pre></p>
<p>In the other file&#8230;<br />
<pre><code>
(function() { // Start private scope by wrapping code in a function
  // Create our random function. Will be preferred over global "random" by
  // anything in this block, and invisible outside it.
  function random() {
    return 3
  }
  function blah() {
    ...
    var foo = random();
    ...
  }
  // Export functions
  window.MyAnalyticsToolkit = {
    "blah": blah
  }
})() // End private scope - calls the function immediately
</code></pre></p>
<p>We now call the functions in the form ExtraAwesomeAdNetwork.createAdCall(&#8230;) and MyAnalyticsToolkit.blah(&#8230;). Even if both modules define functions with the same name, they are kept separate. Each module can store its own private data without fear that anything else will trample on it.</p>
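<p>As a rough sketch of that isolation, here are two cut-down modules that each keep a private &#8220;random&#8221;, one a number and one a function. The module and method names are hypothetical and only there for illustration.</p>

```javascript
// Hypothetical module names, for illustration only.
var AdNetwork = (function () {
  // Private to this module: "random" is a number here.
  var random = Math.floor(Math.random() * 10000000);
  return {
    cacheBuster: function () { return random; }
  };
})();

var Analytics = (function () {
  // Private to this module: "random" is a function here.
  function random() { return 3; }
  return {
    sample: function () { return random(); }
  };
})();
```

<p>AdNetwork.cacheBuster() returns a number and Analytics.sample() returns 3. Neither module can see or overwrite the other&#8217;s &#8220;random&#8221;, and no global called &#8220;random&#8221; is created.</p>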
<h3>Javascript supports objects</h3>
<p>Javascript is entirely capable of doing full OO-style development. This can be used to enhance the modular approach above:</p>
<pre><code>
function Car(fuel_type) {
  this.fuel_type = fuel_type
}

Car.prototype.start = function() {
  if (this.fuel_type == 'petrol') {
    alert("Brrrrrrooom...")
  } else if (this.fuel_type == 'diesel') {
    alert("BrrrrrroooOOOOOOOOOOOOOOOOOOOOOOMMMMMMM!")
  } else {
    alert("Your " + this.fuel_type + " powered car is too hard to start.")
  }
}

var petrol_car = new Car('petrol')
petrol_car.start()
var diesel_car = new Car('diesel')
diesel_car.start()
var pedal_car = new Car('pedal')
pedal_car.start()
</code></pre>
<p>Especially when combined with modularisation as above, this can lead to much cleaner code.</p>
<h3>Events</h3>
<p>Javascript used to be run by putting code in <span class="caps">HTML</span> attributes, but this has a few issues:</p>
<pre><code>
&lt;form onSubmit="return validateName(document.getElementById('name').value)"&gt;
&lt;input id="name" type="text" value="Tony" /&gt;
&lt;/form&gt;
</code></pre>
<p>There are a few issues with that, but really it&#8217;s just plain ugly. It mixes your Javascript logic in with the <span class="caps">HTML</span> and content.</p>
<p>Much better to attach an event listener. You can do this in native Javascript using addEventListener, but I do prefer to use a library to simplify this. Here&#8217;s an example with jQuery:</p>
<pre><code>
&lt;form id="name_form"&gt;
  &lt;input id="name_form_name" type="text" value="Tony" /&gt;
&lt;/form&gt;
</code></pre>
<p>In your Javascript file:<br />
<pre><code>
$(document).ready(function() {
  $('#name_form').submit(function() {
    return validateName($('#name_form_name').val())
  })
})
</code></pre></p>
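<p>For comparison, a sketch of the same idea with the native addEventListener method. In a page you would call it on the form element itself; here a bare EventTarget stands in for the form so the sketch can run outside a browser (this assumes a runtime that provides EventTarget and Event as globals). The validateName function and the currentName variable are hypothetical stand-ins.</p>

```javascript
// Stand-in for the form element: EventTarget is the interface DOM nodes implement.
var form = new EventTarget();

// Hypothetical validator, for illustration only.
function validateName(name) {
  return name !== '';
}

// Stand-in for the value of the name field.
var currentName = '';

form.addEventListener('submit', function (event) {
  if (!validateName(currentName)) {
    event.preventDefault(); // cancel the submission
  }
});

// Simulate a submission. dispatchEvent returns false when a listener
// called preventDefault() on a cancelable event.
function submit() {
  return form.dispatchEvent(new Event('submit', { cancelable: true }));
}
```

<p>With an empty name submit() returns false; once currentName is set to something non-empty it returns true.</p>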
<h3>Keeping references</h3>
<p>I get annoyed when I see code creating <span class="caps">HTML</span> with IDs then relying on them having some structure to find them again:</p>
<pre><code>
&lt;script&gt;
  function highlightRow() {
    var id = this.id.replace(/^highlight/, '')
    $("#row" + id).css('background', 'red')
  }

  var nextRow = 0
  function createRow(container) {
    var myRowId = nextRow++
    var row = $('&lt;div id="row' + myRowId + '"&gt; ... &lt;button id="highlight' + myRowId + '"&gt;Highlight&lt;/button&gt;&lt;/div&gt;')
    container.append(row)
    // Only works once the row is in the document, because the button is looked up by ID
    $('#highlight' + myRowId).click(highlightRow)
  }
  
  $(document).ready(function() {
    for (var i = 0; i &lt; 25; i++) {
      createRow($('#rowContainer'))
    }
  })
&lt;/script&gt;
...
&lt;div id="rowContainer"&gt;&lt;/div&gt;
</code></pre>
<p>Things are simpler and more reliable when you keep references to the <span class="caps">DOM</span> nodes around and use them. Javascript functions are <em>closures</em> &#8211; they can keep accessing variables from the scope they were created in.</p>
<pre><code>
&lt;script&gt;
  function highlight(node) {
    node.css('background', 'red')
  }
  
  function createRow(container) {
    var block = $('&lt;div&gt; ... &lt;button class="highlight"&gt;Highlight&lt;/button&gt;&lt;/div&gt;')
    block.find('.highlight').click(function() {
      // The closure keeps its own reference to this row's block
      highlight(block)
    })
    container.append(block)
  }

  $(document).ready(function() {
    var rowContainer = $('#rowContainer')
    for (var i = 0; i &lt; 25; i++) {
      createRow(rowContainer)
    }
  })
&lt;/script&gt;
...
&lt;div id="rowContainer"&gt;&lt;/div&gt;
</code></pre>
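<p>The closure behaviour is easier to see with the <span class="caps">DOM</span> stripped away. In this small sketch (makeCounter is a made-up name) each returned function keeps its own private count, just as each click handler above keeps its own block:</p>

```javascript
// Hypothetical helper, for illustration only.
function makeCounter(label) {
  var count = 0; // stays alive after makeCounter returns
  return function () {
    count++;
    return label + ' clicked ' + count + ' time(s)';
  };
}

var rowA = makeCounter('row A');
var rowB = makeCounter('row B');
```

<p>Calling rowA() twice counts up to 2 without affecting rowB, whose first call still reports 1.</p>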
<h3>Final words</h3>
<p>Javascript in the wild tends to be a messy, hideous beast, but it doesn&#8217;t have to be that way. Modularise your code rather than polluting the global namespace, use objects to store persistent state, include a library to simplify <span class="caps">DOM</span> access, attach event handlers in JS rather than embed in the <span class="caps">HTML</span>, keep references to <span class="caps">DOM</span> elements rather than abuse IDs and above all, <strong>treat Javascript as a real language</strong>. Trust me, it&#8217;ll soon be a lot less painful.</p>]]></description>
        <pubDate>Fri, 02 Mar 2012 16:03:05 +1300</pubDate>
        <guid>http://goddard.net.nz/blog/2012/03/02/11/javascript-is-a-real-language</guid>
      </item>
      <item>
        <title>Messing around dealing with random packet loss</title>
        <link>http://goddard.net.nz/blog/2012/01/13/10/messing-around-dealing-with-random-packet-loss</link>
        <description><![CDATA[<p>One of my interests recently has been around network quality and how well protocols such as <span class="caps">TCP</span> cope with noise. Having just spent a week trying to use internet in various backpackers, network quality has come to the front of my brain again.</p>
<p>When a router can&#8217;t forward traffic as quickly as it&#8217;s getting it, it starts dropping packets. <span class="caps">TCP</span> takes advantage of this to measure link capacity &#8211; it increases the amount of traffic being sent until it starts losing packets then reduces the speed again. This generally works very well, but has an issue with packet loss not related to the data being sent. Many networks just have noise or lose packets randomly, and when this happens <span class="caps">TCP</span> will tend to back right off to a trickle, completely unnecessarily.</p>
<p>I&#8217;ve been having a play with some simple erasure correction over networks. The result so far is a <a href="http://goddard.net.nz/files/projects/fectun.rb">messy Ruby script</a> which will set up a tunnel with some erasure tolerance between Linux boxes. <strong><span class="caps">EDIT</span>: script updated. Was insanely buggy and kept losing link when client changed source port. Now repairs connection when that happens, editing this now over the link.</strong> It breaks each packet it gets on the tun interface into 3 <span class="caps">UDP</span> packets, and can reassemble the original from any 2 of those. It uses just over one and a half times the traffic (half as much again, plus some headers) but is a bit more resilient to random packet loss.</p>
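<p>One simple scheme with the &#8220;any 2 of 3&#8221; property is <span class="caps">XOR</span> parity: send the two halves of the packet plus the <span class="caps">XOR</span> of those halves. The sketch below illustrates that idea only; it is not taken from the script, which may well code things differently. It is written against Node.js Buffers, and the function names and the way the original length is carried are assumptions.</p>

```javascript
// XOR two equal-length buffers byte by byte.
function xor(x, y) {
  var out = Buffer.alloc(x.length);
  for (var i = 0; i !== x.length; i++) out[i] = x[i] ^ y[i];
  return out;
}

// Split a packet into halves A and B plus a parity piece P = A XOR B.
function encode(packet) {
  var half = Math.ceil(packet.length / 2);
  var a = Buffer.alloc(half);
  var b = Buffer.alloc(half);
  packet.copy(a, 0, 0, half);
  packet.copy(b, 0, half); // stays zero-padded if the length is odd
  return { length: packet.length, pieces: [a, b, xor(a, b)] };
}

// Rebuild the packet from [A, B, P] where at most one entry is null (lost).
// With two or more pieces missing there is not enough data to recover.
function decode(length, pieces) {
  var a = pieces[0], b = pieces[1], p = pieces[2];
  if (!a) a = xor(b, p); // A = B XOR P
  if (!b) b = xor(a, p); // B = A XOR P
  return Buffer.concat([a, b]).subarray(0, length);
}
```

<p>Each piece is half the size of the original, so three pieces come to one and a half times the payload before headers, and losing any single piece still lets decode rebuild the packet.</p>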
<p>You can set up a server instance like:</p>
<blockquote>
<p>you@server$ sudo ruby fectun.rb 0.0.0.0 4943 server <span class="caps">PASSWORD</span></p>
</blockquote>
<p>It will bind to the IP and port given and wait for a client to connect. The word &#8220;server&#8221; is magic &#8211; no decent command line options yet. The password should be chosen and kept the same between them &#8211; the server can only handle one client at once and any client with the password can take over the connection.</p>
<p>To connect with a client:</p>
<blockquote>
<p>you@client$ sudo ruby fectun.rb 0.0.0.0 9000 my.other.computer 4943 <span class="caps">PASSWORD</span></p>
</blockquote>
<p>If all goes well, the server end should tell you it&#8217;s switching its destination to the client that just connected. The port may be mangled if the client is behind a <span class="caps">NAT</span> router.</p>
<p>Each end now has a &#8220;tun&#8221; interface connected to each other. These need to be given IP addresses:</p>
<blockquote>
<p>you@server$ sudo ifconfig tun0 10.93.0.1/24</p>
</blockquote>
<blockquote>
<p>you@client$ sudo ifconfig tun0 10.93.0.2/24</p>
</blockquote>
<p>Then from the client you should be able to:</p>
<blockquote>
<p>you@client$ ping 10.93.0.1</p>
</blockquote>
<p>or set up a <span class="caps">SOCKS</span> proxy by running <span class="caps">SSH</span> over the link:</p>
<blockquote>
<p>you@client$ ssh you@10.93.0.1 -D 8080</p>
</blockquote>
<p>It&#8217;s all a little sketchy at the moment but I have managed to get a link to my server going from home. I hope to extend this to batch packets together within the erasure code so that it doesn&#8217;t send 3 times as many packets.</p>
<p>P.S. Please use this nicely. Could be quite anti-social on busy networks if you sent a lot of traffic over it.</p>]]></description>
        <pubDate>Fri, 13 Jan 2012 02:57:27 +1300</pubDate>
        <guid>http://goddard.net.nz/blog/2012/01/13/10/messing-around-dealing-with-random-packet-loss</guid>
      </item>
      <item>
        <title>Embedding foundation in frames</title>
        <link>http://goddard.net.nz/blog/2011/11/02/9/embedding-foundation-in-frames</link>
        <description><![CDATA[<p>As mentioned on some other sources, I&#8217;m picking up beekeeping and have been assembling hives. One of the steps for this is the need to embed the wax foundation which the bees will build their comb on into the frames. After discussions with Will at work about how they used to do this, I ended up using this method, which worked beautifully.</p>
<p>Requires:</p>
<ul>
	<li>Metal framing wire &#8211; a very thin wire.</li>
	<li>Tacks to attach wire to frame.</li>
	<li>Battery from 6 to 12 volts &#8211; lower generally better as gives you a bit more time and reduces risk of cutting foundation in to pieces. I used a 7.2V NiCD battery pack intended for RC cars.</li>
	<li>Clips to attach battery to a nail or wire. &#8220;Crocodile clips&#8221; would be perfect.</li>
	<li>Board or block that is about the size of a sheet of foundation. Must be possible to fit this through the frame, but must also be close to the full length of the foundation and at least as wide as the spacing between the top and bottom wires. Have used a pair of hive-width runners successfully.</li>
</ul>
<ol>
	<li>Nail tacks half-way in where the wire will start and finish going through the holes of the frame.</li>
	<li>Wire the frame with thin metal &#8220;framing wire&#8221;, wrapping around the tacks. Add tension by unwrapping from each end, pulling tight and wrapping around the tack again.</li>
	<li>Once wire is tensioned, nail the tacks fully in. If possible don&#8217;t clip off the left-over wire at each end yet &#8211; having a tail there makes it easier to clip.</li>
	<li>Place the foundation sheet in the groove in the frame. Leave the sheet on one side of the wires &#8211; don&#8217;t try to weave it through.</li>
	<li>Place frame on the block so that the wires are on top and the foundation sheet is below. The foundation sheet should rest on the block, and the frame around it should be held up by the wires running over the foundation.</li>
	<li>Attach one terminal of the battery with a clip to one of the tacks or the tail sticking out from it.</li>
	<li>Attach the clip for the other terminal to the other tack or tail. The wire will heat up &#8211; be ready to unclip after a few seconds!</li>
	<li>The weight of the frame pushes the wire in to the foundation. If you have any loose wire, may need to push down on it to get it to embed. At lower voltages you can touch the wire without burning yourself to help this along.</li>
	<li>When the wax surrounds the wire (a few seconds), unclip the nearest clip from the frame. This generally creates a visible band of clearer wax where the foundation has been heated &#8211; can judge visually quite easily when to disconnect.</li>
</ol>
<p>This allowed me to do my first 20 frames quite quickly. Had a few dodgy embeddings at first due to wires in the first frames I wired being too loose, but the majority came out with no exposed wire.</p>]]></description>
        <pubDate>Wed, 02 Nov 2011 13:23:24 +1300</pubDate>
        <guid>http://goddard.net.nz/blog/2011/11/02/9/embedding-foundation-in-frames</guid>
      </item>
      <item>
        <title>Integration Branches – Reusable Resolution of Code Conflicts</title>
        <link>http://goddard.net.nz/blog/2011/07/07/8/integration-branches-–-reusable-resolution-of-code-conflicts</link>
        <description><![CDATA[<p>A key issue with managing concurrent versions of a project is that the more you separate changes, the more likely it is that the computer can&#8217;t work out how to combine changes. Often two separate streams of work will change the same code, generating a &#8220;conflict&#8221; – the version control system needs a human to tell it how to combine those changes.</p>
<p>You can either make the changes in the same place and not be able to separate them down the track, or you can do the work in different branches and have to integrate them down the line. The more you split your work up, the more you have to merge and the more likely it is that conflicts will happen. In one of the projects I work on, we&#8217;ve come up with a few interesting ways of keeping work separate and resolving these conflicts when they occur.</p>
<p>This project has a lot of concurrent work going on at any given point. Each piece of work is, in Catalyst&#8217;s normal way, entered into our work management system as a &#8220;request&#8221;. A request might be for a piece of work which takes a couple of hours, a day or a week or more.</p>
<p>We use <a href="http://git-scm.com">git</a> for source control, but the techniques here apply to any <span class="caps">VCS</span> with usable branching and merging. To manage these, we use a branch for each request and work whenever possible to keep the changes for requests separate. By keeping changes apart, we allow requests to proceed through stages of development, testing and eventually release independently of each other, so that an issue with one request doesn&#8217;t hold others up.</p>
<p>The project has one branch which represents the code running on the &#8220;production&#8221; servers:</p>
<pre>
--o production
</pre>
<p>Whenever we start work on a request, it starts at the latest &#8220;production&#8221; code and changes are added to it:</p>
<pre>
--o production
   \
     ---o request 1234
</pre>
<p>The developer keeps all the changes for that ticket in its own branch. Come time to test, a bunch of requests are merged together to create a testing version:</p>
<pre>
request 9999 --------o
request 1111 -----o   \
request 1234 --o   \   \
                \   \   \
                 o---o---o---o testing
</pre>
<p>This is tested by our QA people, and the tickets are individually passed or failed. If we have tickets 1111, 1234 and 9999 then each ticket could be released, or not, at any given time. We can elect to include 9999 and 1234 in a build, but decide that 1111 isn&#8217;t stable enough yet.</p>
<p>Where this gets tricky is where two requests alter the same piece of code. Our attempt to merge them then generates a conflict. If you just resolved that conflict in the testing branch when it happened, you&#8217;d have to do it again every time &#8211; the &#8220;resolved&#8221; combination isn&#8217;t reusable, since the testing branch contains dozens of requests&#8217; changes, any of which could be withdrawn at any time.</p>
<p>When a conflict happens with a ticket, our standard approach is to:</p>
<ul>
	<li>Merge in the latest &#8220;production&#8221; code and ensure the request doesn&#8217;t clash with that.</li>
	<li>Look for other branches modifying the files where the conflict is happening, to find which branch or branches are clashing with it.</li>
	<li>Create an &#8220;integration branch&#8221; to tell the <span class="caps">VCS</span> how to integrate them.</li>
</ul>
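<p>The second step above can be sketched in Python. This is only an illustration, not the tooling we actually use: the branch names and file paths are made up, and it assumes you have already collected the set of files each branch changes relative to production (in git, something like <code>git diff --name-only production...branch</code> would give you that list).</p>

```python
def find_clashing_branches(target, changed_files):
    """Return the other branches that touch any file changed by `target`.

    `changed_files` maps a branch name to the set of paths it modifies
    relative to production (hypothetical data for this sketch).
    """
    target_files = changed_files[target]
    clashes = {}
    for branch, files in changed_files.items():
        if branch == target:
            continue
        shared = target_files.intersection(files)
        if shared:
            clashes[branch] = sorted(shared)
    return clashes


# Made-up example data: only request_5678 overlaps with request_1234.
changed_files = {
    "request_1234": {"app/models/order.py", "app/views/cart.py"},
    "request_5678": {"app/models/order.py"},
    "request_9999": {"docs/README"},
}
print(find_clashing_branches("request_1234", changed_files))
```

<p>Sharing a file does not guarantee a conflict, so this only narrows down the candidates to try merging with.</p>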
<p>The integration branch is yet another branch, the whole purpose of which is to tell the <span class="caps">VCS</span> how to resolve a conflict. If requests 1234 and 5678 are clashing, we would create a branch containing both of them:</p>
<pre>
request 5678 -----o
request 1234 --o   \
                \   \
                 o---X---o int_1234_5678
</pre>
<p>When we merge in the second one, we&#8217;ll get a conflict, which will have to be resolved by hand. Once we&#8217;ve done this, however, we can avoid further conflicts when merging branches. We merge in the integration branch first, then any subsequent changes on the individual requests:</p>
<pre>
request 5678  --------o
request 1234  -----o   \
int_1234_5678 --o   \   \
                 \   \   \
                  o---o---o---o testing
</pre>
<p>Because int_1234_5678 already contains the clashing changes from both, merging this first avoids the conflict reappearing. We still try to merge the individual requests in, because they may contain further changes since the integration branch was created.</p>
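<p>The merge order described above can be sketched as a small Python function. The names are hypothetical, and it assumes (my reading of the process, not a rule stated anywhere) that an integration branch is only merged when every request it combines is part of the build.</p>

```python
def merge_order(requests, integration_branches):
    """List the branches to merge into a fresh testing branch, in order.

    `requests` is the list of request branches chosen for this build.
    `integration_branches` maps an integration branch name to the set of
    requests it combines. An integration branch is skipped unless all of
    its requests are in the build (an assumption of this sketch).
    """
    chosen = set(requests)
    usable = [
        name
        for name, members in sorted(integration_branches.items())
        if members.issubset(chosen)
    ]
    # Integration branches go first so their hand-made resolutions are
    # already in place when the individual requests are merged.
    return usable + list(requests)


integration = {"int_1234_5678": {"request_1234", "request_5678"}}
print(merge_order(["request_1234", "request_5678"], integration))
print(merge_order(["request_1234", "request_9999"], integration))
```

<p>In the second call request 5678 has been left out of the build, so the integration branch is skipped and only the individual requests are merged.</p>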
<p>This approach works pretty well for clashes between two branches, but it gets messy quickly if three branches change the same code at the same time. For that level of complexity, you would have to create 4 integration branches to allow any combination: one for each of the three pairs, plus one for all three together. At this point we tend to give up on this approach and combine changes irreversibly.</p>]]></description>
        <pubDate>Thu, 07 Jul 2011 11:33:39 +1200</pubDate>
        <guid>http://goddard.net.nz/blog/2011/07/07/8/integration-branches-–-reusable-resolution-of-code-conflicts</guid>
      </item>
      <item>
        <title>Hat Stand Warmer</title>
        <link>http://goddard.net.nz/blog/2010/08/06/7/hat-stand-warmer</link>
        <description><![CDATA[<p>Yesterday I encountered a very cool instructable for making a small knitting loom. The original author of this is using it to make covers for audio cables, but the idea is very clearly generalisable:</p>
<p><a href="http://www.instructables.com/id/Making-a-Small-Knitting-Loom/">http://www.instructables.com/id/Making-a-Small-Knitting-Loom/</a></p>
<p>The only materials needed were some 16 gauge wire, a crochet hook and some wool. I picked up the wire and made a couple of these &#8211; one small one and a slightly larger one.</p>
<p><img src="http://goddard.net.nz/files/projects/hatstandwarmer/4864172657_76d52bc463.jpg" alt="" /></p>
<p>A bit wonky but they seem to work. One of the great things here is that they have a gap in them which allows them to snap around a post/pipe/cable without needing to slip them over an end. These can be used just with some wool and a crochet hook.</p>
<p>My first project with this was to decorate a hat stand at work, with the aim of producing something interesting and different and confusing colleagues &#8211; I wanted this just to appear overnight and obviously be impossible to slip on or off. I came back after work with the crochet hook and some multi-coloured wool and started shortly before 7pm. With a brief break to grab dinner, this fairly large first attempt took me until 12:30 to complete.</p>
<p><img src="http://goddard.net.nz/files/projects/hatstandwarmer/4864133399_e4724b49c3.jpg" alt="" /> <img src="http://goddard.net.nz/files/projects/hatstandwarmer/4864753422_cf53c6b6c5.jpg" alt="" /></p>
<p>Apparently the goal of puzzling colleagues succeeded, but I was tired enough that I got in to work today at the last possible moment and missed all the reactions.</p>
<p>I definitely sped up as I went &#8211; it wouldn&#8217;t take as long again, but working on something that solidly is one of the most tiring things I&#8217;ve done. I highly recommend trying this technique &#8211; maybe not in such a rush though!</p>]]></description>
        <pubDate>Fri, 06 Aug 2010 12:21:21 +1200</pubDate>
        <guid>http://goddard.net.nz/blog/2010/08/06/7/hat-stand-warmer</guid>
      </item>
      <item>
        <title>Indexed Materialised Paths in PostgreSQL</title>
        <link>http://goddard.net.nz/blog/2010/04/24/6/indexed-materialised-paths-in-postgresql</link>
        <description><![CDATA[<p>A while ago I had a need to manage a series of nested categories in Postgres and needed it to support efficient lookup of descendants of a given category. In the process of doing this I came across a few approaches to the problem. The first approach I ever tried for this was the nested sets approach recommended by the MySQL documentation<sup class="footnote" id="fnr1"><a href="#fn1">1</a></sup>. This is fairly efficient to query, but is difficult to update safely.</p>
<p>Another approach to this is to create and use a &#8220;materialised path&#8221; &#8211; a record on each category which lists all of its ancestors. In Postgres, it seems that materialised paths are quite easy to use and lend themselves naturally to more conventional indexing and querying. First, let&#8217;s create an example data structure:</p>
<pre>
CREATE TABLE public.category (
  category_id SERIAL PRIMARY KEY,
  parent_id INTEGER NULL REFERENCES category(category_id),
  category_name TEXT NOT NULL,
  UNIQUE(parent_id, category_name)
);
</pre>
<p>This describes a tree of named categories. Each category contains a reference to its parent category. It also has a name, and no two categories under the same parent share a name. This can be visualised like a directory structure &#8211; under each category we have zero or more uniquely named sub-categories. There can be any number of &#8220;root&#8221; categories &#8211; designated by setting the parent_id field to <span class="caps">NULL</span>. Note that PostgreSQL treats <span class="caps">NULL</span> values as distinct in a unique constraint by default, so this constraint does not stop two root categories from sharing a name.</p>
<p>The problem we face is that doing queries on the whole tree is difficult because each entry only contains local context, not overall position in the tree. We have to iterate over large portions of the tree to perform many queries. A very good solution to this is to have a materialised path:</p>
<pre>
ALTER TABLE category ADD COLUMN path INTEGER[] NULL;
</pre>
<p>This will contain an array of category id values from the root down to the current one. For example, if we have category 1234 and its parent is 1000 and 1000&#8217;s parent is 500 and 500&#8217;s parent is 10 and 10 is a root, our path will be {10,500,1000,1234}. This structure allows us straight away to do queries over large portions of the tree. To get category 3 and all of its descendants we could run the query:</p>
<pre>
SELECT * FROM category WHERE 3 = ANY(path);
</pre>
<p>This is obviously convenient, but the path has to be kept up to date for this to work. The most reliable and efficient way to do so is by writing some functions in the database.</p>
<pre>
CREATE FUNCTION public.find_category_path(start_id INTEGER) RETURNS INTEGER[] AS $$
  DECLARE
    current_id INTEGER;
    result INTEGER[];
  BEGIN
    IF NOT EXISTS(SELECT 1 FROM public.category WHERE category_id = start_id) THEN
      -- No category exists with that ID. We will return NULL - could also raise error.
      RETURN NULL;
    END IF;
    
    current_id := start_id;
    WHILE current_id IS NOT NULL LOOP
      result = current_id || result;
      SELECT INTO current_id parent_id FROM public.category WHERE category_id = current_id;
      
      IF current_id = ANY(result) THEN
        -- We have a loop in parent IDs
        RETURN NULL;
      END IF;
    END LOOP;
    RETURN result;
  END;
$$ LANGUAGE 'plpgsql' RETURNS NULL ON NULL INPUT;
</pre>
<p>This will find the path for any given category, provided a valid ID is given and we don&#8217;t have a loop of categories all referring to each other as parents. This may sound like a strange thing to have, but I&#8217;ve seen it happen.</p>
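<p>For readers who find the logic easier to follow outside PL/pgSQL, here is a rough Python equivalent of the same walk. It is only a sketch: the <code>parents</code> dictionary stands in for the category table and is an assumption of the example.</p>

```python
def find_category_path(parents, start_id):
    """Walk parent links from `start_id` up to a root.

    `parents` maps a category id to its parent id (None for a root).
    Returns the ids from the root down to `start_id`, or None if the id
    is unknown or the parent links form a loop.
    """
    if start_id not in parents:
        return None
    path = []
    current = start_id
    while current is not None:
        if current in path:
            return None  # loop in parent ids
        path.insert(0, current)
        current = parents.get(current)
    return path


# 1234 sits under 1000, 500 and 10; categories 7 and 8 point at each other.
parents = {10: None, 500: 10, 1000: 500, 1234: 1000, 7: 8, 8: 7}
print(find_category_path(parents, 1234))
print(find_category_path(parents, 7))
```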
<p>This is fine for new categories, but if we update the parent of an existing category then we also have to update its children, and their children, and so on. We need a method to find its children and do this efficiently:</p>
<pre>
-- This does most of the work but shouldn't be called directly.
-- Use the wrapper below to call this with the proper starting values.
CREATE FUNCTION
  public._update_category_path(target_id INTEGER, path_to INTEGER[], set_path INTEGER[])
RETURNS VOID AS $$
  DECLARE
  BEGIN
    UPDATE public.category SET path = set_path WHERE category_id = target_id;
    IF set_path IS NULL THEN
      -- This means that we have a loop in parent IDs
      -- We have to set all children to also have a NULL path
      PERFORM public._update_category_path(category_id, path_to || category_id, NULL)
        FROM public.category WHERE parent_id = target_id AND NOT category_id = ANY(path_to);
    ELSE
      -- Update our children
      PERFORM public._update_category_path(category_id, path_to || category_id, set_path || category_id)
        FROM public.category WHERE parent_id = target_id AND NOT category_id = ANY(path_to);
    END IF;
  END;
$$ LANGUAGE 'plpgsql' CALLED ON NULL INPUT;

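-- This is a wrapper around the above function, starting it off with the right path.
-- If the category is in or below a loop, find_category_path returns NULL. We then
-- seed path_to with the category itself, because "= ANY(NULL)" is never true or
-- false and the children would otherwise be skipped.
CREATE FUNCTION public.update_category_path(INTEGER) RETURNS VOID AS $$
  SELECT
    public._update_category_path(
      $1,
      COALESCE(public.find_category_path($1), ARRAY[$1]),
      public.find_category_path($1)
    )
  ;
$$ LANGUAGE 'sql';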
</pre>
<p>This floods downwards, updating each child based on its parent&#8217;s new path. The separate parameters for path to it and value to set are to cover the possibility of having a loop in the parent references, in which case we set all their paths to <span class="caps">NULL</span> &#8211; they don&#8217;t have a meaningful path.</p>
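<p>The same flood can be sketched in Python. This version is iterative rather than recursive and tracks the categories it has already visited instead of carrying a separate path, so it is an illustration of the idea and not a translation of the functions above. The <code>parents</code> and <code>paths</code> dictionaries are stand-ins for the table.</p>

```python
def update_paths(parents, paths, target_id, new_path):
    """Set the path of `target_id`, then recompute every descendant's path."""
    # Index children by parent so each level can be found directly.
    children = {}
    for category_id, parent_id in parents.items():
        children.setdefault(parent_id, []).append(category_id)

    pending = [(target_id, new_path)]
    seen = set()
    while pending:
        category_id, path = pending.pop()
        if category_id in seen:
            continue  # guards against loops in the parent links
        seen.add(category_id)
        paths[category_id] = path
        for child_id in children.get(category_id, []):
            # A missing (None) path is passed down unchanged.
            child_path = None if path is None else path + [child_id]
            pending.append((child_id, child_path))


parents = {1: None, 2: 1, 3: 2, 4: 2, 5: None}
paths = {}
update_paths(parents, paths, 1, [1])
print(paths)

# Move category 2 (and everything under it) beneath category 5.
parents[2] = 5
update_paths(parents, paths, 2, [5, 2])
print(paths[3], paths[4])
```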
<p>Next, we get these to run automatically when a category is created or updated:</p>
<pre>
CREATE FUNCTION public.category_path_trigger() RETURNS TRIGGER AS $$
  DECLARE
  BEGIN
    IF TG_OP = 'INSERT' THEN
      PERFORM public.update_category_path(NEW.category_id);
    ELSIF NEW.parent_id IS DISTINCT FROM OLD.parent_id THEN
      -- IS DISTINCT FROM treats NULL as an ordinary value, so this also
      -- catches the parent changing to or from NULL.
      PERFORM public.update_category_path(NEW.category_id);
    END IF;
    RETURN NEW;
  END;
$$ LANGUAGE 'plpgsql';

CREATE TRIGGER maintain_category_path
  AFTER INSERT OR UPDATE ON public.category
  FOR EACH ROW EXECUTE PROCEDURE public.category_path_trigger();
</pre>
<p>Finally we update any existing categories:</p>
<pre>
SELECT public.update_category_path(category_id)
  FROM category
  WHERE parent_id IS NULL;
</pre>
<p>So now we have a materialised path &#8211; what can we do with it? One of the best features around this in PostgreSQL is that we can build an index on the path!</p>
<pre>
CREATE INDEX category_path_index ON public.category(path);
</pre>
<p>Not only do we now have a list of ancestors on each category, but we can now do some really efficient queries.</p>
<p>We can display the category tree:</p>
<pre>
SELECT repeat('  ', array_upper(path, 1) - 1) || category_name, category_id
  FROM category
  ORDER BY path;
</pre>
<p>To get all descendant categories:</p>
<pre>
-- Add one to the last element of an array
CREATE FUNCTION public.increment_last_array_component(INTEGER[]) RETURNS INTEGER[] AS
  'SELECT $1[1:(array_upper($1, 1) - 1)] || ($1[array_upper($1, 1)] + 1)' 
LANGUAGE 'sql' IMMUTABLE RETURNS NULL ON NULL INPUT;

SELECT sub.* FROM category sub, category top
  WHERE sub.path &gt; top.path
  AND sub.path &lt; increment_last_array_component(top.path)
  AND top.category_id = ?;
</pre>
<p>This finds the same descendants as the query from earlier (excluding the category itself, since the comparison is strict), but is capable of using the index. This works because if our category has the path {1, 2, 3} then everything under that, such as {1, 2, 3, 12}, will come before {1, 2, 4}. With a helper function to add one to the last element of the array, we can turn this into a range query which the index can accelerate. All of Postgres&#8217; index functionality works here &#8211; getting part of the tree in order can be done with an index scan.</p>
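<p>The ordering trick is easy to try outside the database. Python compares lists in much the same way, element by element with a prefix sorting before anything that extends it, so a sorted list plus a binary search behaves like a small stand-in for the index. The paths here are made up for the sketch.</p>

```python
from bisect import bisect_left, bisect_right


def descendants_by_range(sorted_paths, top_path):
    """Return the paths strictly below `top_path` in the tree.

    `sorted_paths` must already be sorted, like the entries of an index.
    """
    # Add one to the last element, as the SQL helper function does.
    upper = top_path[:-1] + [top_path[-1] + 1]
    start = bisect_right(sorted_paths, top_path)  # skip top_path itself
    end = bisect_left(sorted_paths, upper)
    return sorted_paths[start:end]


paths = sorted([
    [1], [1, 2], [1, 2, 3], [1, 2, 3, 12], [1, 2, 3, 12, 40],
    [1, 2, 4], [1, 5], [7],
])
print(descendants_by_range(paths, [1, 2, 3]))
```

<p>Only the two binary searches touch the sorted list, which is roughly why the range form can use an index while the <code>ANY(path)</code> form cannot.</p>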
<p>Materialised paths in PostgreSQL are both easy to create and well supported by an index on the field. As well as the queries here you could:</p>
<ul>
	<li>Create a second path using the names. This would allow you to quickly look up categories based on a path or <span class="caps">URL</span> (depending on application).</li>
	<li>Allow for easier collection of aggregate statistics &#8211; find all descendant categories and add/average information from them.</li>
	<li>Convert complicated application-layer tree traversal into single queries.</li>
	<li>Allow triggers and routines altering children to also easily update their ancestors.</li>
</ul>
<p>P.S. If you use this technique, please let me know by posting a comment. It&#8217;d be great to know the different uses people find for this.</p>
<p class="footnote" id="fn1"><a href="#fnr1"><sup>1</sup></a> http://dev.mysql.com/tech-resources/articles/hierarchical-data.html</p>
<p><span class="caps">UPDATE</span>: There was an error in the trigger when the parent_id used to be or was becoming <span class="caps">NULL</span> (comparison issues). Fixed now.</p>]]></description>
        <pubDate>Sat, 24 Apr 2010 14:08:37 +1200</pubDate>
        <guid>http://goddard.net.nz/blog/2010/04/24/6/indexed-materialised-paths-in-postgresql</guid>
      </item>
      <item>
        <title>IPv6 Enabled</title>
        <link>http://goddard.net.nz/blog/2010/04/11/5/ipv6-enabled</link>
        <description><![CDATA[<p>The last few days I&#8217;ve been engaging in some serious geekery learning about and setting up IPv6 on both my home network and my server. In the process I&#8217;ve set up connections in a couple of different ways and learned both good and bad things about it. The end result is that we can now browse IPv6 networks from home, and the sites on this server<sup class="footnote" id="fnr1"><a href="#fn1">1</a></sup> are fully IPv6 enabled.</p>
<p>Setting up the connection at home was interesting – I&#8217;m using <a href="http://openwrt.org">OpenWRT</a> on the router (highly recommended!) and set up tunneling using 6to4, one of the transition mechanisms in place to help people get IPv6 enabled. OpenWRT requires a few packages for IPv6 networking:</p>
<pre>opkg install kmod-ipv6 kmod-ip6tables ip6tables kmod-sit kmod-iptunnel6</pre>
<p>Telstraclear, who I connect with, run a local 6to4 relay. 6to4 sends IPv6 traffic encapsulated inside IPv4 packets to the closest 6to4 relay. If you have a good and well connected 6to4 relay nearby, this is a really good option. In my case, with a router on my ISP&#8217;s network, this seems to be the best option.</p>
<p>My router&#8217;s script to set up the tunnel, based on a sample from the OpenWRT wiki, is available <a href="http://goddard.net.nz/files/projects/ipv6/tun6to4">here</a>. One neat thing is that with 6to4, you get allocated a whole /48 of address space. In other words, your local network could support 1,208,925,819,614,629,174,706,176 public addresses<sup class="footnote" id="fnr2"><a href="#fn2">2</a></sup>. This may seem a little wasteful, but it allows all sorts of cool things to happen, and IPv6 has space to spare.</p>
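<p>To see where that /48 comes from: a 6to4 prefix is the fixed 16 bits 2002 followed by the 32 bits of your public IPv4 address. A short Python sketch using the standard <code>ipaddress</code> module shows the derivation and the address count. The IPv4 address is a documentation address chosen for the example, and this is separate from the router script linked above.</p>

```python
import ipaddress


def sixto4_prefix(public_ipv4):
    """Return the 6to4 /48 network derived from a public IPv4 address."""
    v4 = ipaddress.IPv4Address(public_ipv4)
    # 2002::/16, then the 32 bits of the IPv4 address, then zeros.
    raw = bytes([0x20, 0x02]) + v4.packed + bytes(10)
    return ipaddress.IPv6Network(f"{ipaddress.IPv6Address(raw)}/48")


network = sixto4_prefix("192.0.2.1")  # documentation address, not a real host
print(network)
print(network.num_addresses)
```

<p>The second line printed is 2 to the power of 80, which is the figure quoted above.</p>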
<p>Once you have IPv6 on the router, the next task is to extend it to the network. On Linux routers you can do this by installing and running radvd. Under a recent OpenWRT this is packaged and easy to install. This sends out advertisements to the <span class="caps">LAN</span>, telling clients the IP range used by the network. If the IP range is the right size – a 64-bit prefix – then the clients on the network will assign themselves an address within it based on their <span class="caps">MAC</span> address. This is the &#8220;stateless autoconfiguration&#8221; method and is pretty cool &#8211; much better than <span class="caps">DHCP</span>. To make the network larger, just repeat the router advertisements – you&#8217;ll never have collisions.</p>
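<p>As a rough illustration of how a client turns its <span class="caps">MAC</span> address into the last 64 bits of its address, here is the classic &#8220;modified EUI-64&#8221; construction in Python. The prefix and <span class="caps">MAC</span> address are made-up documentation values, and many current operating systems use randomised interface identifiers instead, so treat this as a sketch of the traditional scheme only.</p>

```python
import ipaddress


def slaac_address(prefix, mac):
    """Combine a /64 prefix with a MAC address using modified EUI-64."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("stateless autoconfiguration expects a /64 prefix")
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC address.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return network.network_address + int.from_bytes(interface_id, "big")


print(slaac_address("2001:db8:1:2::/64", "00:11:22:33:44:55"))
```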
<p>Unfortunately it seems the Telstraclear 6to4 server isn&#8217;t quite as well connected as I would hope &#8211; I&#8217;ve had trouble connecting to a few sites. One of the issues with setting up an IPv6 network is that (I&#8217;m guessing) about 5% of sites which advertise IPv6 support are actually inaccessible, either because the site is misconfigured or because the routes are much sparser than for IPv4. Getting to servers at my office is really ridiculous &#8211; it goes all the way to Los Angeles and back and some of the links seem unreliable. Since it doesn&#8217;t fall back to IPv4 properly, these become hard to get at. This hasn&#8217;t proven too problematic so far, but I&#8217;ll be keeping an eye on it.</p>
<p>One of the results of setting up IPv6 in this way is that suddenly all the computers on the network have a global address. This is actually an interesting consequence, as a lot of computers probably shouldn&#8217;t be directly connected to the net without a firewall. My flatmates in particular run Windows boxes and have been known to not have an antivirus installed, let alone understand firewalls and services. I didn&#8217;t want to expose them, so I blocked access from outside unless you&#8217;re on a whitelist of addresses. To make it easier to configure this, I set up a ruby script<sup class="footnote" id="fnr3"><a href="#fn3">3</a></sup> which runs as a <span class="caps">CGI</span> script on the router and allows any computer on the network to add itself to this whitelist. It also allows a computer to reserve some IPv4 ports to forward, for example to run peer-to-peer software on.</p>
<p>My server required a completely different approach. The 6to4 server near it is really bad and sometimes can&#8217;t be reached at all. As a result, I had to use a tunnel to the Hurricane Electric <a href="http://tunnelbroker.net">tunnel broker</a>. They provide free tunnels to their routers and, being a major IPv6 backbone operator, are very well connected. This works in a really similar way to 6to4, but rather than tunneling to the closest 6to4 server you tunnel to one of their routers, and rather than using a /48 of address space derived from your IP you use a /64 allocated by the tunnel broker.</p>
<p><a href="http://ipv6.he.net/certification/scoresheet.php?pass_name=pruby"><img src="http://ipv6.he.net/certification/create_badge.php?pass_name=pruby&amp;badge=1" style="float: right; padding-left: 1em; border: 0;" alt="" /></a> An interesting side benefit here was that Hurricane Electric also provide a &#8220;certification&#8221; system with a series of tests. Each test involves practical set-up and checks that you can set up both IPv6 clients and servers. It does this by making you prove that you control a site, then testing that it supports particular features. You can follow through them as you set up a server with IPv6 and they&#8217;ll act as a pretty good guide of the steps to get fully IPv6 compatible. The &#8220;badges&#8221; are a bit naff and the instructions could use a brush up, but it helps to answer the &#8220;what next?&#8221; question.</p>
<p>One thing I found odd was that I needed to make the server respond to the addresses, but you can&#8217;t add them to the tunnel itself. I ended up setting up a bridge as a dummy device and adding the addresses to that. This made the server accept connections to those addresses rather than sending back the &#8220;no route&#8221; response. It feels really nasty and dirty &#8211; there&#8217;s a bit of that still involved I&#8217;m afraid.</p>
<p>Once you have IPv6 set up on a server and some services configured to bind to IPv6 addresses the next thing is to get <span class="caps">DNS</span> going. Whereas IPv4 addresses of servers are in the &#8220;A&#8221; record, you need to add an &#8220;<span class="caps">AAAA</span>&#8221; record with the IPv6 address. Be sure to test that you can access the machine using IPv6 from outside first &#8211; if you add the <span class="caps">DNS</span> entry but the IPv6 connection fails, clients won&#8217;t fall back to IPv4. If you can get reverse <span class="caps">DNS</span> delegation (Hurricane Electric will do this on their tunnels) then setting up reverse <span class="caps">DNS</span> is also a good idea and isn&#8217;t any harder than forward <span class="caps">DNS</span>. Just make sure you get the reverse <span class="caps">DNS</span> name right &#8211; the easiest way is to dig -x on your IP and take it from the output.</p>
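<p>If you want to double-check the reverse <span class="caps">DNS</span> name without dig, Python&#8217;s standard <code>ipaddress</code> module can produce it. This is just a convenience sketch, and the address is a documentation address used for the example.</p>

```python
import ipaddress


def reverse_dns_name(address):
    """Return the ip6.arpa (or in-addr.arpa) name for an IP address."""
    return ipaddress.ip_address(address).reverse_pointer


address = ipaddress.ip_address("2001:db8::1")  # documentation address
print(address.exploded)
print(reverse_dns_name("2001:db8::1"))
```

<p>The name is the 32 hex digits of the fully expanded address in reverse order, separated by dots, with ip6.arpa on the end.</p>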
<p>The conclusion of all this is that it took a while, but I now have IPv6 at the flat and on my server. For those who do run web sites, I highly encourage doing all this early, before you have to for work. It&#8217;ll take a while and isn&#8217;t easy (I&#8217;ve brushed over the hours of wandering around dead ends) but in doing so you&#8217;ll be more prepared for the imminent rise in use. I&#8217;m sure they&#8217;ll find some way to slow the address space exhaustion, but even so people will be forced to update within half a decade.</p>
<p class="footnote" id="fn1"><a href="#fnr1"><sup>1</sup></a> The <a href="http://goddard.net.nz">main site</a>, <a href="http://guidedoglady.org.nz">my grandmother&#8217;s site</a>, mail server, name servers.</p>
<p class="footnote" id="fn2"><a href="#fnr2"><sup>2</sup></a> OK, some of those are reserved.</p>
<p class="footnote" id="fn3"><a href="#fnr3"><sup>3</sup></a> The <a href="http://goddard.net.nz/files/projects/ipv6/access.rb">main script</a> and an <a href="http://goddard.net.nz/files/projects/ipv6/ipv6_helper.rb">included file</a>, <a href="http://goddard.net.nz/files/projects/ipv6/firewall.user">firewall setup</a> for OpenWRT, and <a href="http://goddard.net.nz/files/projects/ipv6/port_allocations.rb">firewall updater</a> for the reserved ports.</p>]]></description>
        <pubDate>Sun, 11 Apr 2010 19:18:12 +1200</pubDate>
        <guid>http://goddard.net.nz/blog/2010/04/11/5/ipv6-enabled</guid>
      </item>
      <item>
        <title>Order Encoding: Repeated Elements</title>
        <link>http://goddard.net.nz/blog/2010/04/03/4/order-encoding-repeated-elements</link>
        <description><![CDATA[<p>I&#8217;ve updated the script from my <a href="http://goddard.net.nz/blog/2010/04/03/3/encoding-a-message-in-a-deck-of-cards">last post</a> (read that first to understand the point of this) to support repeated elements in the series. This allows us to encode information in the ordering of elements, even if some of those elements are indistinguishable from each other. With the example of the deck of cards from before, we can now use two decks of cards to encode our message.</p>
<p>This is only really a test of multiple indistinguishable elements, which is harder to work with than it would appear. It doesn&#8217;t store close to the actual capacity &#8211; it only lets you use the amount of information that it could encode with worst-case selection. The result of this is that encoding with two decks at once only gives us as much information as encoding with 2 decks individually. It also probably leaks information about the starting order and is likely to be distinguishable from a random shuffling since there will be some amount of unusable capacity.</p>
<p>Still, this does have the advantage of being able to scale it up, or work with orderings of small numbers of elements. I&#8217;ll be thinking about whether this can be done more efficiently.</p>
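<p>For comparison, the theoretical capacity with repeated elements can be worked out directly: the number of distinguishable orderings is the factorial of the total count divided by the factorial of each repeat count. The Python sketch below computes that upper bound. It is not how permute_multi.rb works, and the card labels are made up for the example.</p>

```python
from collections import Counter
from math import factorial


def ordering_capacity_bytes(series):
    """Whole bytes storable in the order of `series`, repeats allowed."""
    arrangements = factorial(len(series))
    for count in Counter(series).values():
        # Swapping identical items changes nothing, so divide them out.
        arrangements //= factorial(count)
    # k bits need 2**k distinct arrangements.
    return (arrangements.bit_length() - 1) // 8


# Hypothetical labels: suit letter followed by rank.
one_deck = [suit + rank for suit in "CHDS" for rank in "A23456789TJQK"]
print(ordering_capacity_bytes(one_deck))
print(ordering_capacity_bytes(one_deck * 2))
```

<p>This prints 28 for one deck and 62 for two decks shuffled together, against the 56 bytes you get from two decks encoded individually.</p>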
<p>Script: <a href="http://goddard.net.nz/files/projects/permute/permute_multi.rb">permute_multi.rb</a><br />
Series file with 2 decks: <a href="http://goddard.net.nz/files/projects/permute/cards_2decks.series">cards_2decks.series</a></p>]]></description>
        <pubDate>Sat, 03 Apr 2010 19:15:44 +1300</pubDate>
        <guid>http://goddard.net.nz/blog/2010/04/03/4/order-encoding-repeated-elements</guid>
      </item>
      <item>
        <title>Encoding a Message in a Deck of Cards</title>
        <link>http://goddard.net.nz/blog/2010/04/03/3/encoding-a-message-in-a-deck-of-cards</link>
        <description><![CDATA[<p>Having recently read Cryptonomicon, I decided to try out the Solitaire cipher which Bruce Schneier invented for it. This is a hand-cipher carried out with a deck of cards. When I was using a deck yesterday to try out the technique, a relative (of too complicated a connection to put here) was interested in what I was doing. What he seemed to be more interested in though was the idea of information being stored in the ordering of cards.</p>
<p>The reason you might want to do this is that a deck of cards appears entirely innocent and is expected to be in some unpredictable order. The message is also quite easy to destroy &#8211; shuffling a deck can be done quickly and doesn&#8217;t look as suspicious as burning a slip of paper. The question was how much information can you store in the ordering of a deck of cards and how you can get it in and out.</p>
<p>It turns out that you can actually fit a small message in the order of a deck, though it would be extremely hard to design a means of encoding and decoding by hand. You would be limited in this case to a well defined series of signals &#8211; e.g. the 3 of spades as the 8th card could signal that the secret police were on to you.</p>
<p>With a computer on either end, we can achieve close to the theoretical information capacity of the ordering &#8211; enough to hold 28 bytes of data in a standard deck of cards. I&#8217;ve written a script that can encode a series of bytes into the ordering of a series of elements and decode on the other end. It&#8217;s not limited to cards, but both ends need to know the series of elements they&#8217;re re-ordering.</p>
<p>My script is available <a href="http://goddard.net.nz/files/projects/permute/permute.rb">here</a>. The script should run on any system with a Ruby interpreter. The examples given below are for use on *nix systems with the bash shell &#8211; you may need to change the commands to run them on a different system.</p>
<p>You start off with a series file. This should contain a number of unique entries, one per line. For example, each line could represent a card from the deck. The minimum length is 6 entries &#8211; enough to encode a single byte. Both ends will need the same series file to encode and decode. An example file with cards is available as <a href="http://goddard.net.nz/files/projects/permute/cards.series">cards.series</a>. I&#8217;ve used &#8220;C&#8221; for clubs, &#8220;H&#8221; for hearts, &#8220;D&#8221; for diamonds, and &#8220;S&#8221; for spades, followed by the card within that suit.</p>
<p>To see how much information can be stored in the ordering of those elements, you can ask it for the capacity:<br />
<pre>ruby permute.rb -c -s cards.series</pre></p>
<p>In the case of the cards in a deck, we can store 28 bytes. If both jokers were added, we could store 29.</p>
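<p>These figures follow from counting orderings: n distinct elements can be arranged in n! ways, and storing k bits needs at least 2 to the power of k arrangements. A short Python sketch of that arithmetic (independent of permute.rb, which may calculate it differently):</p>

```python
from math import factorial


def capacity_bytes(element_count):
    """Whole bytes that fit in the ordering of distinct elements."""
    # The largest k with 2**k no bigger than n! is bit_length() - 1.
    usable_bits = factorial(element_count).bit_length() - 1
    return usable_bits // 8


for count in (5, 6, 52, 54):
    print(count, capacity_bytes(count))
```

<p>This prints 0 bytes for 5 elements, 1 for 6, 28 for 52 and 29 for 54, matching the minimum series length and the deck capacities given above.</p>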
<p>To use it you type something like:<br />
<pre>ruby permute.rb -e -s cards.series &lt;&lt;&lt; "My message" &gt; encoded</pre></p>
<p>Then arrange the cards in the order in the file &#8220;encoded&#8221;. Put the deck in a box, pass it along to the recipient, and get the person on the other side to enter the entries into a file in that order. They must exactly reproduce the original &#8220;encoded&#8221; order and create a file with one entry per line identical to the original encoded file. They can then decode:<br />
<pre>ruby permute.rb -d -s cards.series &lt; encoded</pre></p>
<p>The original message should be reproduced, padded with null bytes to the capacity of the series. At the moment there isn&#8217;t a way of storing the length &#8211; people can add their own convention if they&#8217;re happy to sacrifice the storage space.</p>
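<p>For anyone curious how bytes can be mapped to an ordering at all, here is a minimal Python sketch of one way to do it: treat the message as a large number and write it in the factorial number system, where each digit picks the next element from those remaining. This is an assumption-laden illustration and not necessarily the method permute.rb uses, so orderings produced by the two will not be interchangeable.</p>

```python
from math import factorial


def capacity_bytes(series):
    """Whole bytes that fit in the ordering of a series of distinct items."""
    return (factorial(len(series)).bit_length() - 1) // 8


def encode(message, series):
    """Return `series` reordered so that the order carries `message`."""
    capacity = capacity_bytes(series)
    if len(message) > capacity:
        raise ValueError("message is longer than the series can hold")
    # Pad with null bytes, then read the whole message as one integer.
    value = int.from_bytes(message.ljust(capacity, b"\0"), "big")
    remaining = list(series)
    ordering = []
    for place in range(len(series) - 1, -1, -1):
        # Each factorial-base digit selects one of the remaining items.
        index, value = divmod(value, factorial(place))
        ordering.append(remaining.pop(index))
    return ordering


def decode(ordering, series):
    """Recover the null-padded message from an ordering made by `encode`."""
    remaining = list(series)
    value = 0
    for place, item in zip(range(len(series) - 1, -1, -1), ordering):
        index = remaining.index(item)
        value += index * factorial(place)
        remaining.pop(index)
    return value.to_bytes(capacity_bytes(series), "big")


# Hypothetical labels: suit letter followed by rank.
deck = [suit + rank for suit in "CHDS" for rank in "A23456789TJQK"]
shuffled = encode(b"My message", deck)
print(shuffled[:5])
print(decode(shuffled, deck))
```

<p>As with the script, the decoded message comes back padded with null bytes to the full capacity of the series.</p>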
<p>Questions to be pursued from here:</p>
<ul>
	<li>If we use a randomly shuffled deck as the order of the shared base series, how well does this protect the content of the message?</li>
	<li>If we encrypt the data so that the information appears random before encoding, does this process leave any evidence that the deck contains a message? In other words, can you demonstrate that the deck is not just in a random order? Any unused part-byte of capacity will be fixed at the moment &#8211; should I randomise those bits?</li>
</ul>]]></description>
        <pubDate>Sat, 03 Apr 2010 17:59:50 +1300</pubDate>
        <guid>http://goddard.net.nz/blog/2010/04/03/3/encoding-a-message-in-a-deck-of-cards</guid>
      </item>
  </channel>
</rss>

