Ensembl

This MedLibrary.org supplementary page on Ensembl is provided directly from the open source Wikipedia as a service to our readers. Please see the note below on authorship of this content, as well as the Wikipedia usage guidelines.

Ensembl is a joint project between the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) and the Wellcome Trust Sanger Institute (WTSI) to develop a software system that produces and maintains automatic annotation of selected eukaryotic genomes.

The project consists of:

  • A database schema and associated API to store genomic information;
  • Extension databases to represent functional, comparative and variational genomics;
  • A "genebuild pipeline" which takes sequence data and builds gene models;
  • Databases containing information for approximately 40 genomes;
  • A website on which users can browse this information;
  • An FTP site which stores the release data along with dumps of genomic sequence and its associated annotation;
  • Public MySQL instances containing copies of the databases behind the Ensembl website.
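
As a rough illustration of how the public MySQL instances might be reached, the sketch below composes a database name and an argument list for the standard `mysql` command-line client. The host, user, port, the `<species>_core_<release>_<assembly>` naming pattern, the release and assembly values, and the `gene` table are all assumptions that are not stated in this article and should be checked against the current Ensembl documentation. The code only builds the command; it does not connect to anything.

```python
from typing import List

# Assumed connection details for a public Ensembl MySQL instance.
# None of these values come from this article; verify them before use.
ASSUMED_HOST = "ensembldb.ensembl.org"
ASSUMED_USER = "anonymous"
ASSUMED_PORT = 3306


def core_database_name(species: str, release: int, assembly: str) -> str:
    """Compose a core database name.

    Assumes the pattern <species>_core_<release>_<assembly>.
    """
    normalised = species.strip().lower().replace(" ", "_")
    return f"{normalised}_core_{release}_{assembly}"


def mysql_command(database: str, sql: str) -> List[str]:
    """Build (but do not run) an argument list for the mysql client."""
    return [
        "mysql",
        f"--host={ASSUMED_HOST}",
        f"--user={ASSUMED_USER}",
        f"--port={ASSUMED_PORT}",
        database,
        f"--execute={sql}",
    ]


# Illustrative values only: release 50 and assembly suffix "36l" are assumed.
database = core_database_name("Homo sapiens", 50, "36l")
command = mysql_command(database, "SELECT COUNT(*) FROM gene")
print(" ".join(command))
```

The resulting list could be passed to `subprocess.run` on a machine with the `mysql` client installed and network access.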

The project is based at the Wellcome Trust Genome Campus, Hinxton, United Kingdom.

Goals of Ensembl

The Ensembl project aims to provide:

  • Accurate, automatic analysis of genome data.
  • Analysis and annotation maintained on the current data.
  • Presentation of the analysis to all via the web.
  • Distribution of the analysis to other bioinformatics laboratories.

The original Ensembl project concentrates on vertebrate genomes. A new project, Ensembl Genomes, based at the EBI, will extend this work to plants, bacteria, protists and metazoa.

In addition, a number of other projects use some or all of the Ensembl components to represent their own data.

Commitments of Ensembl

The commitments of the Ensembl project are:

  • Release of data and analysis into the public domain immediately.
  • Open, collaborative software development: Ensembl imposes no restrictions on access to, or use of, the data provided and the software used to analyse and present it. For more details see the code licence and disclaimer.
  • Collaboration on agreed standards for distribution.
  • Timely development.

Software and data

The project is open source: all data and all software produced by the project can be freely accessed and used.

Most of the software produced and used is written in Perl and is based on the BioPerl infrastructure. The Perl API can readily be used in other genomics projects, e.g. to annotate lists of genes or clones.

The website code uses an extensible plugin system that allows groups to adapt the website to their own data sets, e.g. Vega, which stores and displays manual annotation, and Gramene, which stores plant genomes.

Ensembl API

Ensembl website

The Ensembl website is currently undergoing a significant redesign, with the release of the fifth incarnation of the site design.

Like most of the Ensembl code, the website is written in the Perl scripting language and runs using mod_perl under Apache. The main websites all run on Unix (Linux) systems.

The web code is extensible: a plugin-based system allows it to be easily extended. The code is organised around a Factory/Object/Configuration/Component model (a modified MVC).
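
The general idea of a plugin-based, Factory/Object/Configuration/Component arrangement can be sketched in a few lines. The example below is not Ensembl's web code (which is Perl) and does not reproduce its classes; every name in it (`Plugin`, `build_configuration`, `make_object`, `render`, and the two sample plugins) is hypothetical. It only assumes that plugins are merged in order, so that a later plugin can override a component supplied by an earlier one.

```python
from typing import Callable, Dict, List

# A component renders one fragment of a page from a data object.
Component = Callable[[dict], str]


class Plugin:
    """Hypothetical plugin: a named bundle of page components."""

    def __init__(self, name: str, components: Dict[str, Component]) -> None:
        self.name = name
        self.components = components


def build_configuration(plugins: List[Plugin]) -> Dict[str, Component]:
    """Merge plugins in order; later plugins override earlier ones."""
    configuration: Dict[str, Component] = {}
    for plugin in plugins:
        configuration.update(plugin.components)
    return configuration


def make_object(gene_id: str) -> dict:
    """Stand-in for the Factory step: create the data object to display."""
    return {"id": gene_id, "description": "example gene"}


def render(configuration: Dict[str, Component], data_object: dict) -> List[str]:
    """Run every configured component, ordered by component name."""
    return [configuration[key](data_object) for key in sorted(configuration)]


# A base plugin and a site-specific plugin that replaces one component.
core = Plugin("core", {
    "summary": lambda gene: f"Gene {gene['id']}",
    "detail": lambda gene: gene["description"],
})
site = Plugin("site", {
    "summary": lambda gene: f"Manually annotated gene {gene['id']}",
})

page = render(build_configuration([core, site]), make_object("G1"))
print(page)
```

Here the `site` plugin changes the summary without touching the base plugin, which is the kind of local customisation a plugin system is meant to allow.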

Ensembl pipeline

Current species

The annotated genomes include most fully sequenced vertebrates and selected model organisms. All of them are eukaryotes; no prokaryotes are included. The current list is:

  • Chordates
    • Mammals
      • Primates: Bush baby, Chimp, Human, Macaque, Mouse Lemur, Orangutan
      • Rodents etc.: Guinea Pig, Mouse, Pika, Rabbit, Rat, Squirrel, Tree shrew
      • Laurasiatheria: Cat, Cow, Dog, Hedgehog, Horse, Microbat, Shrew, Pig (pre)
      • Afrotheria: Elephant, Lesser hedgehog tenrec
      • Xenarthra: Armadillo
      • Marsupials & Monotremes: Opossum, Platypus
    • Birds: Chicken
    • Fish: Takifugu rubripes (Fugu), Tetraodon nigroviridis (Green spotted pufferfish), Danio rerio (Zebrafish), Oryzias latipes (Medaka), Gasterosteus aculeatus (Stickleback), Petromyzon marinus (Sea lamprey) (pre)
    • Reptiles & Amphibians: Xenopus tropicalis, Anole Lizard (pre)
    • Ancient relatives: Ciona intestinalis, Ciona savignyi
  • Invertebrates
  • Yeast: Saccharomyces cerevisiae (Baker's yeast)

Usage

The service is used by molecular biologists and bioinformaticians around the world who work with genome data from the above organisms. The predictions of coding, regulatory and other elements in the genomes can be compared with primary research data and with common repositories of current genomic knowledge (biological databases).

The comparison of organisms (comparative genomics, also called intergenomics) with respect to their gene structures and encoded proteins is of special interest. The synteny view can also serve as educational material for school classes.

Wikipedia content modification information:

  • This page was last modified on 23 June 2008, at 13:25.

Wikipedia Authorship and Review

Wikipedia content provided here is not reviewed directly by MedLibrary.org. Wikipedia content is authored by an open community of volunteers and is not produced by or in any way affiliated with MedLibrary.org.

Wikipedia Usage Guidelines

This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article on "Ensembl".
