Chapter Category: Bioinformatics

From the book Discovering Biomolecular Mechanisms with Computational Biology

Reliable and Specific Protein Function Prediction by Combining Homology with Genomic(s) Context

Martijn A. Huynen, Berend Snel and Toni Gabaldón

Completely sequenced genomes and other types of genomics data provide us with new information to predict protein function. While classical, homology-based function prediction provides information about a protein’s molecular function (what does the protein do at a molecular scale?), the analysis of the sequence in the context of its genome or in other types of genomics data provides information about its functional context (what are the protein’s interaction partners, and in which biological process does it play a role?). Genomic context data are, however, inherently noisy. Only by combining different types of genomic(s) context data (vertical comparative genomics) or by combining the same type of genomics data from different species (horizontal comparative genomics) do they become sufficiently reliable to be used for protein function prediction. Homology-based and context-based function prediction provide complementary information about a protein’s function and can be combined to make predictions that are specific enough for experimental testing. Here we discuss the genomic coverage and reliability of combining genomics data for protein function prediction and survey predictions that have actually led to experimental confirmation. Using a number of examples, we illustrate how combining information from various types of genomics data can lead to specific protein function predictions. These include the prediction that the ribonuclease L inhibitor (RLI) is involved in the maturation of ribosomal RNA.
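The central claim above, that individually noisy genomic context signals become reliable only in combination, can be illustrated with a small sketch. It assumes a naive Bayes scheme in which each type of context evidence contributes a likelihood ratio and the types are treated as independent (vertical combination), plus a simple fraction of species in which a gene pair stays adjacent (horizontal combination). This is not presented as the authors' method; the likelihood ratios, the prior, and every function name here are hypothetical placeholders.

```python
import math

# Hypothetical likelihood ratios: how much more often each kind of genomic
# context evidence is seen for functionally linked gene pairs than for
# unlinked ones. Illustrative placeholders only, not measured values.
LIKELIHOOD_RATIOS = {
    "gene_neighborhood": 8.0,
    "gene_fusion": 15.0,
    "phylogenetic_profile": 3.0,
    "coexpression": 2.0,
}


def combined_log_lr(observed: list[str]) -> float:
    """Vertical combination: sum the log-likelihood ratios of observed evidence types.

    Assumes conditional independence between evidence types (naive Bayes),
    so their likelihood ratios multiply, i.e. their logarithms add.
    """
    return sum(math.log(LIKELIHOOD_RATIOS[kind]) for kind in observed)


def posterior_probability(observed: list[str], prior: float) -> float:
    """Update a prior probability of a functional link with the combined evidence."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * math.exp(combined_log_lr(observed))
    return posterior_odds / (1.0 + posterior_odds)


def conserved_fraction(adjacent_by_species: dict[str, bool]) -> float:
    """Horizontal combination: fraction of species in which a gene pair is adjacent."""
    if not adjacent_by_species:
        return 0.0
    return sum(adjacent_by_species.values()) / len(adjacent_by_species)


if __name__ == "__main__":
    prior = 0.01  # hypothetical prior that a random gene pair is functionally linked
    for evidence in (
        ["coexpression"],
        ["gene_neighborhood"],
        ["gene_neighborhood", "phylogenetic_profile"],
    ):
        print(evidence, round(posterior_probability(evidence, prior), 3))
```

Summing logarithms instead of multiplying ratios directly is a conventional choice for numerical stability. The independence assumption is the weak point of such a sketch, since different context methods can be correlated and would then be over-counted.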


More chapters from the book:

Single nucleotide polymorphisms (SNPs) are the major source of human genetic variation, and the functional subset of SNPs, predominantly in protein coding regions, contributes to phenotypic variation. However, much of the variation in coding regions may not produce any functional effects....


A new theory of early molecular evolution is described, proceeding from original speculations to specific predictions and their confirmations. This classical cycle is then repeated, generating the earliest picture of evolving Life. First, a consensus temporal order (“chronology”) of...


In addition to multiple, complete genome sequences, genome-wide data on biological properties of genes, such as knockout effects, expression levels, protein-protein interactions and others, are rapidly accumulating. Many groups have attempted to examine connections between...


A recent series of publications demonstrated that identification of genomic regions subjected to positive selection (hitchhiking mapping) is possible and could be applied in an ecological context. This review focuses on the use of microsatellite markers in genome scans for the identification of...


Cytochrome P450 is a focus of attention as it comprises one of the largest superfamilies of enzyme proteins. The metabolism of many drugs is affected by cytochrome P450. It is an attractive drug target; e.g., cytochrome P450s of Mycobacterium tuberculosis are promising targets in the fight...


Since the 1960s, the mathematical modelling of intracellular systems, such as metabolic pathways, signal transduction cascades and transport processes, has been an ever-growing field of research. The results of most modelling studies in this field are in good qualitative or even quantitative...


The analysis of uncharacterized biomolecular sequences obtained as a result of genetic screens, expression profile studies, etc. is a standard task in a life science research environment. The understanding of protein function is typically the main difficulty. This chapter intends to give...


The development of DNA microarray technology has made it possible to monitor the mRNA abundance of all genes simultaneously (the transcriptome) for a variety of cellular conditions. In addition, microarray-based genomewide measurements of promoter occupancy (the occupome) are now available for...


Extracting Information for Meaningful Function Inference Through Text-Mining
Hong Pan, Li Zuo, Rajaraman Kanagasabai, Zhuo Zhang, Vidhu Choudhary, Bijayalaxmi Mohanty, Sin Lam Tan, S.P.T. Krishnan, Pardha Sarathi Veladandi, Archana Meka, Weng Keong Choy, Sanjay Swarup and Vladimir B. Bajic*

One of the emerging technologies in computational biology is text-mining, which includes natural language processing. This technology enables the automated extraction of relevant biological knowledge from large volumes of scientific documents. We present several systems...



Literature and Genome Data Mining for Prioritizing Disease-Associated Genes
Carolina Perez-Iratxeta, Peer Bork and Miguel A. Andrade

The first step in understanding the molecular biology of an inherited disease is to identify which gene or genes are carrying variants. This process starts with locating the mutations in a chromosomal band, as narrow as possible, and follows with the manual analysis of all the genes mapping in...

