Using whole genome presence/absence data to untangle function in 12 Drosophila genomes

 Abstract

The Drosophila 12 genome data set was used to construct whole genome, gene family presence/absence matrices, using a broad range of E value cutoffs as criteria for gene family inclusion. The resulting matrices behave differently in phylogenetic analyses depending on the E value cutoff employed. Based on an optimality criterion that maximizes internal corroboration of information, we show that cutoffs of e-105 to e-125 extract the most internally consistent phylogenetic signal. The functional class of most genes and gene families can be accurately determined from the D. melanogaster genome annotation. We used the Gene Ontology (GO) system to create partitions based on gene function. Several measures of phylogenetic congruence (diagnosis, consistency, partitioned support, hidden support), computed for different higher and lower level GO categories, were used to mine the data set for genes and gene families that show strong agreement or disagreement with the overall combined phylogenetic hypothesis. We propose that measures of phylogenetic congruence can be used as criteria to identify loci with related GO terms that have a significant impact on cladogenesis.
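The matrix construction described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the hit layout (gene family, species, best E value), the function names, and the toy values are all assumptions made for illustration. It only shows how the same set of similarity hits yields different presence/absence matrices as the E value cutoff is tightened.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (gene_family, species, best_e_value).
Hit = Tuple[str, str, float]


def presence_absence_matrix(
    hits: Iterable[Hit], species: List[str], e_cutoff: float
) -> Dict[str, List[int]]:
    """Score a family as present (1) in a species if its best hit
    has an E value at or below the cutoff, otherwise absent (0)."""
    column = {name: i for i, name in enumerate(species)}
    matrix: Dict[str, List[int]] = {}
    for family, sp, e_value in hits:
        row = matrix.setdefault(family, [0] * len(species))
        if e_value <= e_cutoff:
            row[column[sp]] = 1
    return matrix


def informative_families(matrix: Dict[str, List[int]]) -> List[str]:
    """Keep families that are parsimony informative: present in at
    least two species and absent in at least two species."""
    return [f for f, row in matrix.items() if 2 <= sum(row) <= len(row) - 2]


# Toy data (invented values, four of the twelve species only).
species = ["D. melanogaster", "D. simulans", "D. yakuba", "D. virilis"]
hits = [
    ("famA", "D. melanogaster", 1e-150),
    ("famA", "D. simulans", 1e-140),
    ("famA", "D. yakuba", 1e-110),
    ("famA", "D. virilis", 1e-60),
    ("famB", "D. melanogaster", 1e-130),
    ("famB", "D. simulans", 1e-128),
    ("famB", "D. yakuba", 1e-20),
    ("famB", "D. virilis", 1e-10),
]

# Build one matrix per cutoff and report which families remain informative.
for cutoff in (1e-105, 1e-125):
    m = presence_absence_matrix(hits, species, cutoff)
    print(cutoff, m, informative_families(m))
```

In this toy case, tightening the cutoff from e-105 to e-125 changes the scoring of famA in D. yakuba from present to absent, which is the kind of cutoff dependence the abstract refers to.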

Pages: 291 - 299
Type: Research Paper