Wednesday, April 30, 2008

Relationalizing Nexus files with Ruby and mx

OK, so "relationalizing" isn't really a word, but I kind of like how it sounds. For the past couple of weeks I've been writing a Mesquite (i.e. Nexus) file parser in Ruby. It uses the same basic lexer/parser engine I mentioned in a previous post for reading Newick-formatted trees, and it produces a Ruby Nexus file object with all the good bits easily accessible. (Well, most of them: some blocks are not parsed yet, but that's just a matter of extending the parser.) With this Nexus file object it was relatively trivial to write a conversion to mx (see fig.), i.e. to a fully relational format.
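To give a feel for the idea, here is a toy sketch of the two steps: read a Nexus file into a Ruby object, then flatten that object into relational rows. This is not the mx parser (which is a proper lexer/parser, not a pile of regular expressions), and everything named here is made up for illustration: the `NexusFile` struct, the `parse_nexus` and `to_rows` methods, and the `otus`/`codings` table layout are assumptions, not the real mx schema. It also only understands simple TAXA and CHARACTERS blocks with one-symbol states.

```ruby
# Hypothetical, minimal Nexus reader. NOT the mx parser: a regex-based
# illustration that handles only simple TAXA and CHARACTERS blocks.
NexusFile = Struct.new(:taxa, :matrix)

# Split a Nexus string into { "BLOCKNAME" => body_text }.
def nexus_blocks(text)
  blocks = {}
  text.scan(/^\s*begin\s+(\w+)\s*;(.*?)^\s*end\s*;/mi) do |name, body|
    blocks[name.upcase] = body
  end
  blocks
end

# Build a NexusFile from the TAXLABELS and MATRIX commands.
def parse_nexus(text)
  blocks = nexus_blocks(text)
  taxa = blocks.fetch("TAXA", "")[/taxlabels\s+(.*?);/mi, 1].to_s.split
  rows = blocks.fetch("CHARACTERS", "")[/matrix\s+(.*?);/mi, 1].to_s.lines
  matrix = {}
  rows.each do |line|
    name, states = line.split
    matrix[name] = states.chars if name && states
  end
  NexusFile.new(taxa, matrix)
end

# Flatten the object into relational rows (assumed table layout):
# one row per taxon, and one row per taxon x character cell.
def to_rows(nexus)
  otus = nexus.taxa.each_with_index.map { |name, i| { id: i + 1, name: name } }
  codings = []
  otus.each do |otu|
    nexus.matrix.fetch(otu[:name], []).each_with_index do |state, j|
      codings << { otu_id: otu[:id], character_id: j + 1, state: state }
    end
  end
  { otus: otus, codings: codings }
end

SAMPLE = <<~NEXUS
  #NEXUS
  BEGIN TAXA;
    DIMENSIONS NTAX=2;
    TAXLABELS Aus_bus Cus_dus;
  END;
  BEGIN CHARACTERS;
    DIMENSIONS NCHAR=3;
    FORMAT DATATYPE=STANDARD;
    MATRIX
      Aus_bus 010
      Cus_dus 11?
    ;
  END;
NEXUS

nexus  = parse_nexus(SAMPLE)
tables = to_rows(nexus)
```

Once the matrix is in row form like `tables[:codings]`, each cell is addressable by taxon and character, which is the point of going relational in the first place.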

The Ruby file parsing code is currently a plugin/library in the mx source, but it can easily be extracted for use in other projects. Look for the code in mx 0.2.1540 and onwards when it makes it to SourceForge, or contact me directly if you're really keen to get your hands on it.

Tuesday, April 1, 2008


Slashdot and the BBC have picked up a story on the use of X-ray radiography to look at fossil insects. I've seen some pre-press examples of the same technology on extant bugs, and the results are incredible, in part because all the soft tissues are left intact and you can get any cross section you want. I'm really curious as to 1) the resolution on the fossils (on extant critters it is apparently very close to SEM calibre); and 2) the cost. This could be a huge boon to making fossil data available to phylogenetic studies. The kicker at the end: the researchers suggest that the printed plastic insect could be designated as a type specimen. In many ways this would have huge advantages, as anybody with the technology could print their own types. The real suggestion underlying this is not that the plastic itself is the type, but rather that a set of 1s and 0s can be typified, the plastic being generated from the digital data. This of course opens up a whole can of (prehistoric?) worms. How many 1s and 0s are needed before a (meaningful) type can be designated? Can I use a CoolPix for imaging, bundle my images with some 1s and 0s that encode some specimen data, and typify the resulting zip file? From a pragmatic standpoint, why not?