Chapter Category: Bioinformatics

From the book Discovering Biomolecular Mechanisms with Computational Biology

Correlations between Quantitative Measures of Genome Evolution, Expression and Function

Yuri I. Wolf, Liran Carmel and Eugene V. Koonin

In addition to multiple, complete genome sequences, genome-wide data on biological properties of genes, such as knockout effect, expression levels, protein-protein interactions, and others, are rapidly accumulating. Many groups have attempted to examine connections between these properties and quantitative measures of gene evolution. The questions addressed pertain to the most fundamental aspects of biology: what determines the effect of the knockout of a given gene on the phenotype (in particular, whether the gene is essential), what determines the rate of a gene’s evolution, and how are the phenotypic properties and evolution connected? Many significant correlations were detected, e.g., a positive correlation between the tendency of a gene to be lost during evolution and its sequence evolution rate, and negative correlations between each of these measures of evolutionary variability and expression level or the phenotypic effect of gene knockout. However, most of these correlations are relatively weak and explain only a small fraction of the variation present in the data. We propose that the majority of the relationships between the phenotypic (“input”) and evolutionary (“output”) variables can be described with a single, composite variable, the gene’s “social status in the genomic community”, which reflects the biological role of the gene and its mode of evolution. “High-status” genes, involved in house-keeping processes, are more likely to be more highly and broadly expressed, to have more interaction partners, and to produce lethal or severely impaired knockout mutants. These genes also tend to evolve more slowly and are less prone to gene loss across various taxonomic groups. “Low-status” genes are expected to be weakly expressed, have fewer interaction partners, and exhibit narrower (and less coherent) phyletic distribution. On average, these genes evolve faster and are lost more often during evolution than high-status genes.
The “gene status” notion may serve as a generator of null hypotheses regarding the connections between phenotypic and evolutionary parameters associated with genes. Any deviation from the expected pattern calls for attention—to the quality of the data, the nature of the analyzed relationship, or both.
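The idea that one composite variable can produce many weak pairwise correlations can be illustrated with a toy simulation. The sketch below is not the authors' analysis: it assumes a hypothetical latent "status" score per gene, from which a noisy expression level and a noisy evolution rate are derived. The function names, the noise level and the number of genes are all assumptions chosen for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulate(n_genes=5000, noise_sd=2.0, seed=0):
    """Toy model: a hidden 'status' drives two noisy observed variables."""
    rng = random.Random(seed)
    status = [rng.gauss(0, 1) for _ in range(n_genes)]
    # Higher status -> higher expression (plus measurement noise).
    expression = [s + rng.gauss(0, noise_sd) for s in status]
    # Higher status -> slower evolution (plus measurement noise).
    evol_rate = [-s + rng.gauss(0, noise_sd) for s in status]
    return status, expression, evol_rate


status, expression, evol_rate = simulate()

r_obs = pearson(expression, evol_rate)   # correlation between observables
r_expr = pearson(status, expression)     # latent status vs. expression
r_rate = pearson(status, evol_rate)      # latent status vs. evolution rate

print(f"expression vs. rate:   r = {r_obs:.3f}, r^2 = {r_obs ** 2:.3f}")
print(f"status vs. expression: r = {r_expr:.3f}")
print(f"status vs. rate:       r = {r_rate:.3f}")
```

In this toy setting the observed variables are negatively correlated, but the squared correlation stays small because each variable is only a noisy readout of the shared latent score. That is the qualitative pattern the abstract describes: significant correlations that explain only a small fraction of the variation.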

Taken from the book

Discovering Biomolecular Mechanisms with Computational Biology

Edited by: Yuri I. Wolf, Liran Carmel and Eugene V. Koonin

More chapters from the book:

Single nucleotide polymorphisms (SNPs) are the major source of human genetic variation, and the functional subset of SNPs, predominantly in protein coding regions, contributes to phenotypic variation. However, much of the variation in coding regions may not produce any functional effects....


A new theory of early molecular evolution is described, proceeding from original speculations to specific predictions and their confirmations. This classical cycle is then repeated generating the earliest picture of evolving Life. First, a consensus temporal order (“chronology”) of...


A recent series of publications demonstrated that identification of genomic regions subjected to positive selection (hitchhiking mapping) is possible and could be applied in an ecological context. This review focuses on the use of microsatellite markers in genome scans for the identification of...


Cytochrome P450 is a focus of attention as it comprises one of the largest superfamilies of enzyme proteins. Metabolization of many drugs is affected by cytochrome P450. It is an attractive drug target, e.g., cytochrome P450s of Mycobacterium tuberculosis are promising targets in the fight...


Since the 1960s, the mathematical modelling of intracellular systems, such as metabolic pathways, signal transduction cascades and transport processes, has been an ever-growing field of research. The results of most modelling studies in this field are in good qualitative or even quantitative...


The analysis of uncharacterized biomolecular sequences obtained as a result of genetic screens, expression profile studies, etc. is a standard task in a life science research environment. The understanding of protein function is typically the main difficulty. This chapter intends to give...


The development of DNA microarray technology has made it possible to monitor the mRNA abundance of all genes simultaneously (the transcriptome) for a variety of cellular conditions. In addition, microarray-based genomewide measurements of promoter occupancy (the occupome) are now available for...


Extracting Information for Meaningful Function Inference Through Text-Mining
Hong Pan, Li Zuo, Rajaraman Kanagasabai, Zhuo Zhang, Vidhu Choudhary, Bijayalaxmi Mohanty, Sin Lam Tan, S.P.T. Krishnan, Pardha Sarathi Veladandi, Archana Meka, Weng Keong Choy, Sanjay Swarup and Vladimir B. Bajic*

One of the emerging technologies in computational biology is text-mining, which includes natural language processing. This technology enables the automated extraction of relevant biological knowledge from a large volume of scientific documents. We present several systems...


Completely sequenced genomes and other types of genomics data provide us with new information to predict protein function. While classical, homology-based function prediction provides information about a protein’s molecular function (what does the protein do at a molecular scale?), the...


Literature and Genome Data Mining for Prioritizing Disease-Associated Genes
Carolina Perez-Iratxeta, Peer Bork and Miguel A. Andrade

The first step in understanding the molecular biology of an inherited disease is to identify which gene or genes are carrying variants. This process starts with locating the mutations in a chromosomal band, as narrow as possible, and follows with the manual analysis of all the genes mapping in...

