Using whole genome presence/absence data to untangle function in 12 Drosophila genomes
Volume 2, Issue 6
Pages 291 - 299
Authors: Jeffrey A. Rosenfeld, Ernest K. Lee, Patrick M. O'Grady and Rob DeSalle
The Drosophila 12 genome data set was used to construct whole-genome gene family presence/absence matrices, using a broad range of E-value cutoffs as criteria for gene family inclusion. The resulting matrices behave differently in phylogenetic analyses as a function of the E-value cutoff employed. Based on an optimality criterion that maximizes internal corroboration of information, we show that E-value cutoffs of e-105 to e-125 extract the most internally consistent phylogenetic signal. The functional class of most genes and gene families can be accurately determined from the D. melanogaster genome annotation. We used the Gene Ontology (GO) system to create partitions based on gene function. Several measures of phylogenetic congruence (diagnosis, consistency, partitioned support, hidden support) were computed for different higher- and lower-level GO categories and used to mine the data set for genes and gene families that show strong agreement or disagreement with the overall combined phylogenetic hypothesis. We propose that measures of phylogenetic congruence can be used as criteria to identify loci with related GO terms that have a significant impact on cladogenesis.
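As a rough illustration of how an E-value cutoff can turn similarity-search hits into a presence/absence matrix, the following minimal sketch may help. It is not the authors' pipeline: the function name `presence_absence`, the `(family, species, evalue)` hit format, and the example values are all assumptions made for illustration.

```python
def presence_absence(hits, species, cutoff):
    """Build a gene family presence/absence matrix (hypothetical helper).

    hits    -- iterable of (family, species, evalue) tuples (assumed format)
    species -- ordered list of species names defining the matrix columns
    cutoff  -- E-value cutoff; a hit counts only if evalue <= cutoff
    Returns {family: [0 or 1 per species]}.
    """
    column = {name: i for i, name in enumerate(species)}
    matrix = {}
    for family, sp, evalue in hits:
        # Every family seen gets a row, even if no hit passes the cutoff.
        row = matrix.setdefault(family, [0] * len(species))
        if evalue <= cutoff:
            row[column[sp]] = 1
    return matrix


# Illustrative (made-up) hits for two species and two families.
species = ["D. melanogaster", "D. simulans"]
hits = [
    ("fam1", "D. melanogaster", 1e-130),
    ("fam1", "D. simulans", 1e-110),
    ("fam2", "D. melanogaster", 1e-50),
]

loose = presence_absence(hits, species, 1e-105)   # fam1 present in both species
strict = presence_absence(hits, species, 1e-125)  # fam1 present in D. melanogaster only
```

Tightening the cutoff from 1e-105 to 1e-125 changes which cells are scored as present, which is why matrices built at different cutoffs can behave differently in downstream phylogenetic analysis.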
Received: July 15, 2008; Accepted: November 24, 2008