World Library  

Expressed sequence tag

Article Id: WHEBN0000477426

Title: Expressed sequence tag  
Author: World Heritage Encyclopedia
Language: English
Subject: Plant evolutionary developmental biology, Sequence assembly, GENCODE, Genomics, Gene prediction
Collection: Dna, Gene Expression, Genomics
Publisher: World Heritage Encyclopedia

Expressed sequence tag

An expressed sequence tag (EST) is a short sub-sequence of a cDNA sequence.[1] ESTs may be used to identify gene transcripts and are instrumental in gene discovery and gene sequence determination.[2] The identification of ESTs has proceeded rapidly: approximately 74.2 million ESTs were available in public databases as of 1 January 2013 (e.g. GenBank, all species).

An EST results from one-shot sequencing of a cloned cDNA. The cDNAs used for EST generation are typically individual clones from a cDNA library. The resulting sequence is a relatively low-quality fragment whose length is limited by current technology to approximately 500 to 800 nucleotides. Because these clones consist of DNA that is complementary to mRNA, ESTs represent portions of expressed genes. They may be represented in databases either as the cDNA/mRNA sequence or as the reverse complement of the mRNA (the template strand).
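
Converting an EST between these two database orientations is a reverse-complement operation. The following Python sketch illustrates it. The function name `reverse_complement` and the example sequence are hypothetical, and the sketch assumes DNA sequences written with A, C, G, T and the ambiguity code N only.

```python
# Map each nucleotide to its Watson-Crick complement; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence.

    Converts an EST stored in mRNA-sense orientation into the
    template-strand orientation, and vice versa.
    """
    return seq.translate(_COMPLEMENT)[::-1]


est = "ATGGCCAAGTTT"  # hypothetical EST fragment, mRNA sense
template = reverse_complement(est)
print(template)  # AAACTTGGCCAT
# Applying the operation twice restores the original orientation.
assert reverse_complement(template) == est
```

Because the operation is its own inverse, a database entry stored in either orientation can be compared with a sequence in the other orientation after a single conversion.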

ESTs can be mapped to specific chromosome locations using physical mapping techniques, such as radiation hybrid mapping, HAPPY mapping, or fluorescence in situ hybridization (FISH). Alternatively, if the genome of the organism from which the EST originated has been sequenced, the EST sequence can be aligned to that genome computationally.
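
Computational alignment of an EST to a genomic sequence can be illustrated with a minimal Smith-Waterman local alignment in Python. This is a sketch under simplifying assumptions: the scoring values (match +2, mismatch -1, gap -2) are arbitrary, and the code does not model introns, so unlike dedicated spliced aligners it can only fully place an EST that lies within a single exon. The function name `local_align` and the example sequences are hypothetical.

```python
def local_align(est: str, genome: str, match: int = 2, mismatch: int = -1,
                gap: int = -2) -> tuple[int, int]:
    """Smith-Waterman local alignment score with a linear gap penalty.

    Returns (best_score, end), where `end` is the 0-based index just past the
    last genome base of the best-scoring local alignment. Introns are not
    modeled, so an EST spanning several exons will only partially align.
    """
    prev = [0] * (len(genome) + 1)  # previous row of the DP matrix
    best_score, best_end = 0, 0
    for i in range(1, len(est) + 1):
        curr = [0] * (len(genome) + 1)
        for j in range(1, len(genome) + 1):
            diag = prev[j - 1] + (match if est[i - 1] == genome[j - 1] else mismatch)
            up = prev[j] + gap        # EST base aligned to a gap in the genome
            left = curr[j - 1] + gap  # genome base aligned to a gap in the EST
            curr[j] = max(0, diag, up, left)  # floor at 0: alignment may start anywhere
            if curr[j] > best_score:
                best_score, best_end = curr[j], j
        prev = curr
    return best_score, best_end


# Hypothetical EST and genomic fragment.
score, end = local_align("GATTACA", "TTTTGATTACAGGGG")
print(score, end)  # 14 11
```

The dynamic-programming table takes time proportional to the product of the two sequence lengths, which is why practical genome-scale aligners first use indexing or seed heuristics to find candidate regions before doing a detailed alignment.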

The circumstances in which an EST was obtained (tissue, organ, or disease state, e.g. cancer) give information on the conditions under which the corresponding gene is active. ESTs also contain enough information to permit the design of precise probes for DNA microarrays, which can then be used to measure gene expression.

Some authors use the term "EST" to describe genes for which little or no further information exists besides the tag.[3]

The significance of ESTs, their properties, methods for analyzing EST datasets, and their applications in various areas of biology have been reviewed by Nagaraj et al. (2007).[4]


  • History
  • Sources of data and annotations
    • dbEST
    • EST contigs
    • Tissue information
  • See also
  • References
  • External links


History

In 1979, teams at Harvard and Caltech extended the basic idea of making DNA copies of mRNAs in vitro to amplifying a library of such copies in bacterial plasmids.[5]

In 1982, the idea of selecting random or semi-random clones from such a cDNA library for sequencing was explored by Greg Sutcliffe and coworkers.[6]

In 1983, Putney et al. sequenced 178 clones from a rabbit muscle cDNA library.[7]

In 1991, Adams and co-workers coined the term EST and initiated more systematic sequencing as a project (starting with 600 brain cDNAs).[2]

Sources of data and annotations


dbEST

dbEST is a division of GenBank established in 1992. As with GenBank, data in dbEST are submitted directly by laboratories worldwide and are not curated.

EST contigs

Because of the way ESTs are sequenced, many distinct ESTs are often partial sequences that correspond to the same mRNA of an organism. To reduce the number of ESTs for downstream gene discovery analyses, several groups assembled ESTs into EST contigs. Examples of resources that provide EST contigs include:

  • TIGR gene indices [8]
  • UniGene [9]
  • STACK [10]
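
The idea of assembling overlapping ESTs into contigs can be illustrated with a greedy overlap-merge sketch in Python. This is not the method used by any of the resources listed above; it assumes error-free ESTs that are all in the same (mRNA-sense) orientation and requires exact suffix-prefix overlaps of at least `min_len` bases. The function names, the `min_len` default, and the example sequences are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0 if shorter than min_len."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(ests: list[str], min_len: int = 5) -> list[str]:
    """Greedily merge ESTs into contigs using exact suffix-prefix overlaps.

    Assumes all ESTs are error-free and in the same (mRNA-sense) orientation.
    """
    # Longest reads first, so shorter reads contained in them can be discarded.
    reads = sorted(set(ests), key=lambda s: (-len(s), s))
    contigs: list[str] = []
    for read in reads:
        if not any(read in c for c in contigs):
            contigs.append(read)

    while True:
        # Find the ordered pair (i, j) with the largest suffix-prefix overlap.
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no remaining overlaps: each entry is one contig
        merged = contigs[best_i] + contigs[best_j][best_len:]
        # Replace the two merged contigs; drop any others now contained in the result.
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j) and c not in merged]
        contigs.append(merged)


ests = ["ATGGCCAAG", "CCAAGTTTGAC", "TTTGACCTGA"]  # hypothetical ESTs from one mRNA
print(greedy_assemble(ests))  # ['ATGGCCAAGTTTGACCTGA']
```

The choice of `min_len` is a trade-off: a small value lets unrelated ESTs merge through short chance or repeat overlaps, producing artifact contigs that join two distinct gene products, while a large value leaves ESTs from the same mRNA unjoined.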

Constructing EST contigs is not trivial and may yield artifacts (contigs that contain two distinct gene products). When the complete genome sequence of an organism is available and transcripts are annotated, it is possible to bypass contig assembly and directly match transcripts with ESTs. This approach is used in the TissueInfo system (see below) and makes it easy to link annotations in the genomic database to tissue information provided by EST data.
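
Directly matching ESTs to annotated transcripts and accumulating tissue counts can be sketched in Python. This is an illustration only, not the TissueInfo implementation. It assumes exact substring matching in either orientation, whereas real pipelines would need error-tolerant alignment because EST reads are low quality. The transcript IDs, sequences, tissues, and function names are hypothetical.

```python
from collections import Counter, defaultdict

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tissue_profiles(transcripts: dict[str, str],
                    ests: list[tuple[str, str]]) -> dict[str, Counter]:
    """Count, for each annotated transcript, the matching ESTs per tissue.

    transcripts: transcript ID -> transcript sequence (mRNA sense).
    ests: (EST sequence, tissue of the source library) pairs.
    An EST matches when it, or its reverse complement, occurs exactly in the
    transcript; an EST matching several transcripts is counted for each.
    """
    profiles: dict[str, Counter] = defaultdict(Counter)
    for est_seq, tissue in ests:
        for tid, tseq in transcripts.items():
            if est_seq in tseq or revcomp(est_seq) in tseq:
                profiles[tid][tissue] += 1
    return dict(profiles)


# Hypothetical transcripts and EST records.
transcripts = {"geneA-001": "ATGGCCAAGTTTGACCTGA", "geneB-001": "ATGCGTCGTAGCTAGGAT"}
ests = [("GCCAAGTTTG", "liver"), ("TCAGGTCAAA", "liver"),
        ("CGTCGTAGCT", "brain"), ("GCCAAGTTTG", "brain")]
for tid, counts in sorted(tissue_profiles(transcripts, ests).items()):
    print(tid, dict(counts))
# geneA-001 {'liver': 2, 'brain': 1}
# geneB-001 {'brain': 1}
```

The second EST only matches geneA-001 after reverse complementing, which reflects the point that database entries may be stored in either orientation. Comparing every EST with every transcript is quadratic; a real system would index the transcripts first.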

Tissue information

High-throughput analyses of ESTs often encounter similar data management challenges. A first challenge is that the tissue provenance of EST libraries is described in plain English in dbEST.[11] This makes it difficult to write programs that can unambiguously determine that two EST libraries were sequenced from the same tissue. Similarly, disease conditions for the tissue are not annotated in a computationally friendly manner. For instance, the cancer origin of a library is often mixed with the tissue name (e.g., the tissue name "glioblastoma" indicates that the EST library was sequenced from brain tissue and that the disease condition is cancer).[12] With the notable exception of cancer, the disease condition is often not recorded in dbEST entries.

The TissueInfo project was started in 2000 to help with these challenges. The project provides curated data (updated daily) to disambiguate tissue origin and disease state (cancer/non-cancer), offers a tissue ontology that links tissues and organs by "is part of" relationships (i.e., it formalizes the knowledge that the hypothalamus is part of the brain, and that the brain is part of the central nervous system), and distributes open-source software for linking transcript annotations from sequenced genomes to tissue expression profiles calculated with data in dbEST.[13]
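
How "is part of" relationships let a query for one structure include its parts can be shown with a small Python sketch. The tissue names follow the example in the text, but the data structure (a child-to-parent map), the function names, and the counts are assumptions for illustration and do not reflect the format of the TissueInfo ontology. The sketch also assumes each tissue has at most one parent and that the relation is acyclic; an ontology in which a tissue can be part of several structures would need a general graph traversal instead.

```python
from collections import defaultdict

# Hypothetical "is part of" edges (child -> parent), following the example in the text.
PART_OF = {
    "hypothalamus": "brain",
    "brain": "central nervous system",
    "spinal cord": "central nervous system",
}


def ancestors(tissue: str, part_of: dict[str, str]) -> list[str]:
    """All structures that `tissue` is transitively part of, nearest first.

    Assumes each tissue has at most one parent and the relation is acyclic.
    """
    chain = []
    while tissue in part_of:
        tissue = part_of[tissue]
        chain.append(tissue)
    return chain


def is_part_of(child: str, whole: str, part_of: dict[str, str]) -> bool:
    """True if `child` is (transitively) part of `whole`."""
    return whole in ancestors(child, part_of)


def roll_up(counts: dict[str, int], part_of: dict[str, str]) -> dict[str, int]:
    """Add each tissue's EST count to every structure that contains it."""
    totals: dict[str, int] = defaultdict(int)
    for tissue, n in counts.items():
        totals[tissue] += n
        for whole in ancestors(tissue, part_of):
            totals[whole] += n
    return dict(totals)


print(is_part_of("hypothalamus", "central nervous system", PART_OF))  # True
# Hypothetical per-tissue EST counts for one transcript.
print(roll_up({"hypothalamus": 3, "brain": 5, "spinal cord": 2}, PART_OF))
# {'hypothalamus': 3, 'brain': 8, 'central nervous system': 10, 'spinal cord': 2}
```

Rolling counts up the hierarchy means that ESTs from a hypothalamus library also count toward the brain and the central nervous system, which is the kind of inference a plain-text tissue description cannot support.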

See also


References

  1. ^ ESTs Factsheet. National Center for Biotechnology Information.
  2. ^ a b Adams MD, Kelley JM, Gocayne JD, et al. (Jun 1991). "Complementary DNA sequencing: expressed sequence tags and human genome project". Science 252 (5013): 1651–6.
  3. ^ dbEST
  4. ^ Nagaraj SH, Gasser RB, Ranganathan S (Jan 2007). "A hitchhiker's guide to expressed sequence tag (EST) analysis". Brief. Bioinformatics 8 (1): 6–21.
  5. ^ Sim GK, Kafatos FC, Jones CW, Koehler MD, Efstratiadis A, Maniatis T (Dec 1979). "Use of a cDNA library for studies on evolution and developmental expression of the chorion multigene families". Cell 18 (4): 1303–16.
  6. ^ Sutcliffe JG, Milner RJ, Bloom FE, Lerner RA (Aug 1982). "Common 82-nucleotide sequence unique to brain RNA". Proc Natl Acad Sci U S A 79 (16): 4942–6.
  7. ^ Putney SD, Herlihy WC, Schimmel P (1983). "A new troponin T and cDNA clones for 13 different muscle proteins, found by shotgun sequencing". Nature 302: 718–21.
  8. ^ Lee Y, Tsai J, Sunkara S, et al. (Jan 2005). "The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes". Nucleic Acids Res. 33 (Database issue): D71–4.
  9. ^ Stanton JA, Macgregor AB, Green DP (2003). "Identifying tissue-enriched gene expression in mouse tissues using the NIH UniGene database". Appl Bioinformatics 2 (3 Suppl): S65–73.
  10. ^ Christoffels A, van Gelder A, Greyling G, Miller R, Hide T, Hide W (Jan 2001). "STACK: Sequence Tag Alignment and Consensus Knowledgebase". Nucleic Acids Res. 29 (1): 234–8.
  11. ^ Skrabanek L, Campagne F (Nov 2001). "TissueInfo: high-throughput identification of tissue expression profiles and specificity". Nucleic Acids Res. 29 (21): e102.
  12. ^ Campagne F, Skrabanek L (2006). "Mining expressed sequence tags identifies cancer markers of clinical interest". BMC Bioinformatics 7: 481.
  13. ^ Institute for Computational Biomedicine: TissueInfo

External links

  • ESTs Factsheet from NCBI, a good, easy-to-read introduction to ESTs.
  • The NCBI Handbook, Part 3, Chapter 21, has a useful overview.
  • ECLAT, a server for the classification of ESTs from mixed EST pools (from fungus-infected plants) using codon usage.
  • The current number of EST sequences in the GenBank division dbEST.
  • Web Resources for EST data and analysis
  • TissueInfo project: Curated EST tissue provenance, tissue ontology, open-source software.
  • A web resource containing all publicly available ESTs, processed through several cleaning steps that remove contaminating DNA (e.g., vector and E. coli sequences) and short sequences (<100 bp).
This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply.