MRC Prion Unit
From fundamental research to prevention and cure


The Human Genome Project, which set out to identify all human genes and was completed in 2003, generated a vast amount of data of different types, such as sequence data, protein structure information and information stored in the literature. In addition, new techniques were developed to examine the emerging data. Examples are microarray experiments (in which thousands of minuscule spots of DNA are made visible according to their abundance at a certain snapshot in the cell) and genome-wide association studies (the examination of genetic variation across an entire genome).

The increasing scale of information available via the internet and in databases requires computational methods for the extraction, storage, visualisation and analysis of these data.

Bioinformatics is therefore a relatively new, rapidly developing and interdisciplinary branch of science, which merges molecular biology, computer science, information technology and statistics into one discipline. Bioinformatics research includes the development of new algorithms (sequences of instructions to the computer for automatic analysis and data processing) and statistical methods to assess and investigate relationships among elements of large data sets and to study and interpret various types of data.

Here in the Prion Unit these kinds of experiments are evaluated using packages from Bioconductor, an open source software project for the analysis and comprehension of genomic data based on the R statistical programming language. The results of microarray experiments, for example, are lists of genes, which can be annotated in terms of function (Gene Ontology) and interaction in the cell (KEGG) to draw conclusions about how they work together in the cell.
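The annotation step can be sketched as a simple lookup from genes to GO terms and KEGG pathways. This is a minimal illustration only: the real analyses use Bioconductor/R packages and curated annotation databases, and the gene names, GO terms and pathways below are entirely made up.

```python
# Hypothetical annotation tables (gene -> terms); illustrative only
go_terms = {
    "GENE_A": ["protein folding"],
    "GENE_B": ["signal transduction"],
    "GENE_C": ["protein folding", "apoptosis"],
}
kegg_pathways = {
    "GENE_A": ["Prion disease pathway"],
    "GENE_C": ["MAPK signalling"],
}

def annotate(genes):
    """Attach GO and KEGG annotations to each gene in a result list."""
    return {
        g: {
            "GO": go_terms.get(g, []),
            "KEGG": kegg_pathways.get(g, []),
        }
        for g in genes
    }

result = annotate(["GENE_A", "GENE_C", "GENE_X"])
print(result["GENE_A"]["GO"])   # ['protein folding']
print(result["GENE_X"])         # {'GO': [], 'KEGG': []}
```

Genes with no entry in either table simply receive empty annotation lists, which in practice flags them for manual follow-up.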

Usually it is not possible to pinpoint exactly which genes and proteins are of biological importance (which genes are involved in prion disease) if these gene lists are long, contain differently regulated genes of disparate function and are not visibly prominent in the data set. The decision as to which proteins warrant further experimental analysis can be aided by combining all the primary data and introducing additional evidence. This could include information about protein-protein interaction, functional similarity or co-citation in the literature. Combining data sets (data integration) can mutually strengthen the significance of genes through their appearance in multiple data sets.
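The simplest form of this integration idea is a vote count: each independent data set nominates a set of genes, and genes supported by several sources rank higher. The sources and gene names below are illustrative assumptions, not real prion data sets.

```python
from collections import Counter

# Hypothetical evidence sources, each nominating a set of genes
microarray_hits   = {"PRNP", "GENE_A", "GENE_B"}
ppi_neighbours    = {"PRNP", "GENE_B", "GENE_C"}
literature_cocite = {"PRNP", "GENE_C"}

def integrate(*sources):
    """Count in how many independent data sets each gene appears."""
    votes = Counter()
    for source in sources:
        votes.update(source)
    return votes

votes = integrate(microarray_hits, ppi_neighbours, literature_cocite)
ranked = sorted(votes, key=votes.get, reverse=True)
print(ranked[0], votes[ranked[0]])  # PRNP 3
```

Real integration schemes weight sources by reliability rather than counting equally, but the principle of mutual reinforcement across data sets is the same.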

A computer program developed here in the Unit, with a graphical user interface, allows the storage (in the PostgreSQL database management system), analysis and integration of prion-specific data sets. Links between proteins are visualised using Cytoscape, an open source Java program able to build molecular and genetic interaction maps, to visualise high-throughput data and to evaluate it using a graph-theoretical approach, thereby identifying densely interconnected regions in a network that indicate regions of increased interest. An example of such a network between proteins is shown below:
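One common graph-theoretical measure of dense interconnection is the local clustering coefficient: for each protein, the fraction of its neighbour pairs that also interact with each other. The tiny interaction network below is an assumed toy example, not real prion data.

```python
from itertools import combinations

# Illustrative undirected interaction network: a dense triangle plus a sparse tail
edges = [
    ("A", "B"), ("A", "C"), ("B", "C"),  # densely interconnected region
    ("C", "D"), ("D", "E"),              # sparse tail
]

# Build an adjacency map from the edge list
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def clustering(node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = adj[node]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for x, y in combinations(nbrs, 2) if y in adj[x])
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)

for node in sorted(adj):
    print(node, round(clustering(node), 2))
# Nodes inside the triangle (A, B) score 1.0; the tail (D, E) scores 0.0
```

Tools such as Cytoscape plugins apply more sophisticated clustering algorithms to large networks, but they build on the same intuition: proteins embedded in tightly linked neighbourhoods are of increased interest.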

Network between proteins

Additional approaches to integrating data have involved a more declarative approach: building graphical probabilistic models such as Bayesian networks, based on conditional probability (see this explanatory publication).
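The core mechanism of such models is updating a belief with evidence via Bayes' rule. A minimal sketch with a single hidden variable ("the gene is disease-relevant") and one observed evidence variable ("the gene appears in a data set") is shown below; all probabilities are made-up numbers for illustration.

```python
# Assumed, illustrative probabilities for a two-node Bayesian network
p_relevant = 0.1       # prior P(R): gene is disease-relevant
p_e_given_r = 0.8      # likelihood P(E | R): relevant gene appears in the data set
p_e_given_not_r = 0.2  # P(E | not R): irrelevant gene appears anyway

def posterior(prior, like, false_pos):
    """P(R | E) via Bayes' rule for a binary hidden variable."""
    evidence = like * prior + false_pos * (1 - prior)
    return like * prior / evidence

p = posterior(p_relevant, p_e_given_r, p_e_given_not_r)
print(round(p, 3))  # observing the evidence raises the prior 0.1 to ~0.308
```

A full Bayesian network chains many such conditional distributions together, so that several weak pieces of evidence (expression change, interaction partners, co-citation) can jointly produce a strong posterior.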

Bioinformatics service

In a secondary, multi-faceted support role, my position in this Unit is that of a general adviser on bioinformatic requirements. This demands interaction with the majority of groups/programmes in the Unit on a variety of levels, ranging from day-to-day bioinformatics support (bioinformatics-related queries of any kind, extended database searches, data manipulation/extraction or statistical queries) to major projects. Only the use of computational methods enables researchers to discover relevant information in the vast information pool, whether in databases, on the internet or in other large data collections, and greatly facilitates and accelerates analysis and information extraction. The requirements of the Unit have necessitated the development of a number of specific bioinformatics tools written in a variety of computer languages, the installation of bioinformatics software on a variety of operating-system platforms and the maintenance of the bioinformatics server. Additionally, a number of in-house databases are maintained and curated.