Appendix 3: RNA Modification Subsystems in the SEED Database
Valérie de Crécy-Lagard and Gary Olsen
With over 800 genome sequences available and thousands more in the pipeline (see the Genome Online Database, www.genomeonline.org, for up-to-date numbers), the genetic information used by most biologists and biochemists is now derived mainly from genomic sequences that have been annotated in silico. Functional inferences based on comparative sequence analysis are the established foundation of genome annotation. For well-studied gene families, in which the initial annotation has been experimentally verified, these homology-based methods predict function quite accurately. However, low sequence similarity, multi-domain proteins, gene duplications and non-orthologous displacements have all led to incorrect or missing annotations. This has been a major problem for RNA modification enzymes, because many belong to large paralogous families, and transferring functional annotations on the basis of BLAST scores alone can be very misleading, particularly between kingdoms. Cases in which the closest homologs in two genomes do not catalyze the same reaction are numerous in the RNA modification field, with the added complication that tRNA, rRNA and/or snRNA can all be potential substrates (see references 6-10 for specific examples). The annotation of RNA modification genes is so complex that identifying the complete set of modification genes in a given genome requires thorough and extensive analysis, a process that to date has been carried out for only a few organisms. Even these analyses are incomplete, and many RNA modification genes are still missing in the most extensively studied model organisms. Databases such as Modomics (see Appendix 1 by Rother et al.) provide a reference compilation of all the modification pathways and the corresponding enzymes, but they are not designed to give easy access to the corresponding genetic information.
To link the modifications to the corresponding genes, we and other members of the SEED community have started to encode the RNA modification and processing genes in the SEED database.