The Fifth International Conference on Intelligent Systems for Molecular Biology, held June 21-25, 1997, in Porto Carras, Greece, ended with a workshop on Automatic Annotation of Genome Sequence Data.
Automatic annotation of large amounts of genomic DNA sequence clearly is and will continue to be a formidable challenge. When completed, the human genome sequence will consist of 24 strings of As, Ts, Cs, and Gs with a combined length of 3 billion characters. Unless the locations of biologically important parts of the sequence, such as the genes and their regulatory elements, are marked, this string of characters is of little use. Annotating the genome sequence in parallel with its determination is therefore critical.
Attendees felt that this problem will be addressed properly only by developing very efficient computational tools for initial sequence annotation, treating the annotations as hypotheses, and testing and verifying them in the laboratory. Additionally, for maximum usefulness, the generated annotation results must be stored in an easily retrievable and queryable form in well-curated databases. The “If you sequence it, the community will annotate it” approach is unlikely to produce the desired results, and new paradigms and possibly new organizational models will be needed to present genomic sequence in its most useful form.
Workshop Speakers
Eight workshop speakers addressed the challenges and technologies in automatic annotation and the most efficient division of labor between biology and computer science.
Introductory remarks by session chairman Chris Sander [European Molecular Biology Laboratory–European Bioinformatics Institute (EBI)] made clear that no one yet has the experience to know the right way to proceed with automatic annotation. Richard Durbin (Sanger Centre) stressed an often-repeated theme: proper annotation will require wet-laboratory work as well as computational annotation. He also emphasized the need for curated databases. Michael Ashburner (EBI) discussed his experience in annotating Drosophila sequences and the need for hierarchical controlled vocabularies. He suggested the possibility of an annotation database that would be separate from but seamlessly linked to the sequence databases.
Three other speakers addressed general problems in genomic-sequence annotation: Antoine Danchin (Institut Pasteur) discussed annotation of the Bacillus subtilis genome, Terry Gaasterland (Argonne National Laboratory) described annotating microbial genomes, and Chris Overton (University of Pennsylvania) shared experiences from a project to annotate genomic sequence from human chromosome 22. Other speakers discussed annotation efforts and tools being developed in the bioinformatics industry.