Sunday, September 21, 2008

ping (visualization)

Popping up to note that, judging from Rod Page's post on the recent Nascent meeting, visualization is a hot topic. If you aren't following Moritz Stefaner's well formed data blog, and you are interested in visualization, then you are really missing out. His latest post brings up all sorts of possibilities for phylogenetic or taxonomic implementations. For kids or students, why not print up a collectible card game <cough>was a long time ago</cough> with a biodiversity theme, embed interesting data in the cards, and marvel as phylogenetic (or ecological, etc.) relationships appear when the cards are placed on that ridiculously cool table?

Wednesday, April 30, 2008

Relationalizing Nexus files with Ruby and mx

Ok, so "relationalizing" isn't really a word, but I kind of like how it sounds. For the past couple of weeks I've been writing a Mesquite (i.e. Nexus) file parser in Ruby. It uses the same basic lexer/parser engine, mentioned in a previous post, that reads Newick-formatted trees, and it creates a Ruby Nexus file object with all the good bits easily accessible from the object (well, most of them: some blocks are not parsed yet, but that's just a matter of extending the parser). With this Nexus file object it was relatively trivial to write a conversion to mx (see fig.), i.e. to a fully relational format.

The Ruby file parsing code is currently a plugin/library in the mx source, but it can be easily extracted for use in other projects. Look for the code in mx 0.2.1540 and onwards when it makes it to Sourceforge, or contact me directly if you're really keen to get your hands on it.
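For anyone who has not looked inside a Nexus file, here is a rough sketch of the first step any such parser has to take: splitting the file into its named blocks and the commands inside each. This is not the mx parser. The names `NexusBlock` and `parse_nexus_blocks` are made up for illustration, and the sketch assumes no nested comments and no quoted labels containing semicolons.

```ruby
# Hypothetical, minimal Nexus reader: splits a file into named blocks.
# NOT the mx parser; all names here are invented for illustration.
NexusBlock = Struct.new(:name, :commands)

def parse_nexus_blocks(text)
  raise ArgumentError, "missing #NEXUS header" unless text.lstrip =~ /\A#NEXUS/i

  # Drop [bracketed comments]; nested comments are not handled in this sketch.
  body = text.gsub(/\[[^\]]*\]/, "")

  # Each block runs from "BEGIN <name>;" to the next "END;".
  body.scan(/\bBEGIN\s+(\w+)\s*;(.*?)\bEND\s*;/im).map do |name, content|
    # Commands are semicolon-terminated; collapse internal whitespace.
    commands = content.split(";").map { |c| c.strip.gsub(/\s+/, " ") }
    NexusBlock.new(name.upcase, commands.reject(&:empty?))
  end
end

SAMPLE = <<~NEXUS
  #NEXUS
  [a comment that should be ignored]
  BEGIN TAXA;
    DIMENSIONS NTAX=3;
    TAXLABELS Aus Bus Cus;
  END;
  BEGIN TREES;
    TREE t1 = ((Aus,Bus),Cus);
  END;
NEXUS

blocks = parse_nexus_blocks(SAMPLE)
blocks.each { |b| puts "#{b.name}: #{b.commands.inspect}" }
```

Once the blocks are plain Ruby objects like this, mapping each one onto rows in relational tables becomes ordinary bookkeeping rather than parsing.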

Tuesday, April 1, 2008


Slashdot and the BBC have picked up a story on the use of X-ray radiography to look at fossil insects. I've seen some pre-press examples of the same technology on extant bugs, and the results are incredible, in part because all the soft tissues are left intact and you can get any cross section you want. I'm really curious as to 1) the resolution on the fossils (on extant critters it is apparently very close to SEM calibre); and 2) the cost. This could be a huge boon to making fossil data available to phylogenetic studies. The kicker at the end: the researchers suggest that the printed plastic insect could be designated as a type specimen. In many ways this would have huge advantages, as anybody with the technology could print their own types. The real suggestion underlying this is not that the plastic itself is the type, but rather that a set of 1s and 0s can be typified, the plastic of course being generated from digital data. This of course opens up a whole can of (prehistoric?) worms. How many 1s and 0s are needed before a (meaningful) type can be designated? Can I use a CoolPix for imaging and maybe bundle my images with some 1s and 0s that encode some specimen data, and typify the resulting zip file? From a pragmatic standpoint, why not?

Friday, February 29, 2008

barcoding (dna) Google tech talk

I haven't seen this pointed at yet: a little state of the union from Hebert and Janzen that was posted on Google's tech talks. "It [barcoding] works with startling clarity." ;). There were a number of interesting insights during the question period, with a general focus on scale, diminishing returns, and legal issues (curious that, coming from Google and all).

Sunday, January 27, 2008


Annotating images with user-created overlays is a must for defining morphological characters, or enhancing ontologies. This seems to be a relatively tricky thing to do over the web. Inputdraw is a SWF widget that is free for noncommercial use. It allows you to save overlays drawn on images into forms; the data are then saved as SVG text. The OpenCollections software also mentions an annotation system that is in the works, though the bottom line there is that it's not quite ready for prime time. At a recent Morphbank meeting someone (apologies for not remembering who) mentioned that another solution might involve using the Google Maps API to create polylines or points on custom "maps", which would in fact be your images. The Beginning Google Maps Applications with Rails and Ajax book is a decent starting point for implementing this approach. Note that you would need .gif or .png formatted images if you attempt this. I used the Beginning Google Maps book to finally implement maps "natively" within mx (we still have hooks to BerkeleyMapper, which is a great service), but not without several hours of frustration. While the book is quite clearly written, the code provided in Chapter 3 is incomplete or erroneous in several very frustrating ways, so make sure to download the updated code from the website if you have the book.
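To make the "saved as SVG text" idea concrete, here is a minimal sketch of serializing one overlay, stored as a list of pixel coordinates, into an SVG polyline that could be written to a form field or a database column. The helper name `overlay_to_svg` and its parameters are hypothetical; this says nothing about how Inputdraw itself structures its output.

```ruby
# Hypothetical helper: serialize one overlay (a list of [x, y] pixel points,
# measured on the image) as a small SVG document containing one polyline.
def overlay_to_svg(points, width:, height:, stroke: "red")
  coords = points.map { |x, y| "#{x},#{y}" }.join(" ")
  %(<svg xmlns="http://www.w3.org/2000/svg" width="#{width}" height="#{height}">) +
    %(<polyline points="#{coords}" fill="none" stroke="#{stroke}" stroke-width="2"/>) +
    %(</svg>)
end

# Trace three points along, say, a wing vein on a 640x480 image.
svg = overlay_to_svg([[10, 20], [40, 25], [60, 80]], width: 640, height: 480)
puts svg
```

Because the overlay is just text, it can be stored alongside the image record and layered back over the image whenever it is displayed.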

As I add new tables to mx I'm starting to add fewer and fewer columns, with the idea that tagging can be used as the primary means of extending the basic objects. Tagging essentially allows you to extend your records to as many fields as you want, and is therefore a very simple way to provide extensibility. A new book on the tagging phenomenon by Gene Smith looks to be a must-read; I think I'll order mine now.
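A minimal sketch of why tagging works as an extension mechanism: one generic tag record can point at any row in any table, so a new "field" is just a new keyword rather than a new column. Everything here (`Tag`, `TagStore`, the method names) is hypothetical and kept in plain Ruby; it is not mx's schema.

```ruby
# Hypothetical names throughout; this is not mx's schema.
# One tag links a keyword (and optional value) to a record in any table.
Tag = Struct.new(:record_type, :record_id, :keyword, :value)

class TagStore
  def initialize
    @tags = []
  end

  # Attach a keyword to any record, whatever table it lives in.
  def tag(record_type, record_id, keyword, value = nil)
    @tags << Tag.new(record_type, record_id, keyword, value)
    self
  end

  # All tags on one record, as a keyword => value hash ("extra columns").
  def fields_for(record_type, record_id)
    matching = @tags.select do |t|
      t.record_type == record_type && t.record_id == record_id
    end
    matching.map { |t| [t.keyword, t.value] }.to_h
  end

  # Ids of records of a given type that carry a keyword.
  def tagged_with(record_type, keyword)
    matching = @tags.select do |t|
      t.record_type == record_type && t.keyword == keyword
    end
    matching.map(&:record_id).uniq
  end
end

store = TagStore.new
store.tag("Specimen", 1, "host plant", "Quercus alba").tag("Specimen", 1, "needs imaging")
store.tag("Specimen", 2, "needs imaging")

puts store.fields_for("Specimen", 1).inspect
puts store.tagged_with("Specimen", "needs imaging").inspect
```

The trade-off is the usual one: tagged values are untyped and unconstrained, so anything that needs validation or fast querying probably still deserves a real column.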

Finally, while somewhat older news, I keep thinking about social annotation with respect to taxonomy, and also things like scoring phylogenetic characters. How might things discussed in this Google talk from Luis von Ahn be applied to systematics research? Within taxonomy one approach could be to simply photograph many specimens and then allow the general public to point out the similarities and differences. These could then be vetted by the "experts" as a starting point. This approach, albeit with a greatly simplified "taxonomy" and character set, is being used at GalaxyZoo. In the GalaxyZoo example the galaxies have already been classified by computer algorithms into various types, so there is an excellent comparative dataset for testing things like the trustworthiness of the public contributions. Games like those discussed by von Ahn could also be used in developing hypotheses of character homology. In systematics we present homology hypotheses that are then further tested using phylogenetic analysis. What if part of this testing required that the definition of these hypotheses be agreed upon by two or more experts, using games like those discussed by von Ahn? In theory this agreement is already required, to some degree, as implemented in the review process that occurs prior to publication. It could, however, be made more explicit (and fun?). Given the right framework for playing these games there would be many beneficial spin-offs, including obvious things like annotated ontologies of morphological characters.
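The agreement rule at the heart of this idea is simple enough to sketch: a label counts only when at least two scorers proposed it independently. The function name `agreed_labels`, the scorer names, and the example character labels are all invented for illustration, and matching on lowercased text is a stand-in for whatever normalization a real system would need.

```ruby
require "set"

# Returns the labels that at least `min_agree` distinct scorers proposed
# independently for the same image. Labels are compared case-insensitively.
def agreed_labels(proposals, min_agree = 2)
  scorers_by_label = Hash.new { |hash, key| hash[key] = Set.new }
  proposals.each do |scorer, labels|
    labels.each { |label| scorers_by_label[label.strip.downcase] << scorer }
  end
  scorers_by_label.select { |_, scorers| scorers.size >= min_agree }.keys.sort
end

# Hypothetical proposals from three scorers looking at the same specimen image.
proposals = {
  "expert_a" => ["Notauli present", "propodeum areolate"],
  "expert_b" => ["notauli present", "wings infuscate"],
  "expert_c" => ["Propodeum areolate", "notauli present"]
}

agreed = agreed_labels(proposals)
puts agreed.inspect
```

Labels that only one scorer proposed are not thrown away in spirit; they are the natural candidates to show to the next pair of players.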

Sunday, January 13, 2008

FigTree v1.1

Andrew Rambaut has posted a message to the beast-users listserv: he's just released a new version of FigTree. Among other things, this new version allows you to re-root trees (something I bugged him about a while back, as I'm sure others did as well). Perhaps even better, the source code has been released. With these new features, and now that others can conceivably add new bells and whistles, I suspect FigTree will remain the premier tree rendering software for some time.