dbSNP—database for single nucleotide polymorphisms and other classes of minor genetic variation

ST Sherry, M Ward, K Sirotkin - Genome Research, 1999 - genome.cshlp.org
A key aspect of research in genetics is associating sequence variations with heritable phenotypes. The most common variations are single nucleotide polymorphisms (SNPs), which occur approximately once every 500–1000 bases in a large sample of aligned human sequence. Because SNPs are expected to facilitate large-scale association genetics studies, there has recently been great interest in SNP discovery and detection. In collaboration with the National Human Genome Research Institute (NHGRI), the National Center for Biotechnology Information (NCBI) has established the dbSNP database (http://www.ncbi.nlm.nih.gov/SNP) to serve as a central repository for molecular variation. Because dbSNP is designed as a general catalog of molecular variation that supplements GenBank (Benson et al. 1999), submissions can include a broad range of molecular polymorphisms: single base nucleotide substitutions, short deletion and insertion polymorphisms, microsatellite markers, and polymorphic insertion elements such as retrotransposons. Although the name dbSNP is a slight misnomer given the range of variation represented, SNPs are the largest class of variation in the database, and the name dbSNP, selected at the request of NHGRI, reflects this fact. For brevity, we use the term SNP as shorthand for “variation” in the database notation and documentation (http://www.ncbi.nlm.nih.gov/SNP/get_html.cgi?whichHtml=how_to_submit). Thus, documentation terms such as “submitted SNP” and “reference SNP” refer to all classes of variation in the database and should be read as “a submitted report of variation” and “a reference report of variation.” Furthermore, in serving as the variation complement to GenBank, dbSNP does not restrict submissions to neutral polymorphisms. Submissions are welcome on all classes of simple molecular variation, including those that cause rare clinical phenotypes.
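The central point here is that a dbSNP “submitted SNP” need not be a single-base change. A minimal sketch in Python, assuming alleles arrive as plain strings with "-" marking an absent allele (a convention chosen for this example, not taken from dbSNP's submission format), models the classes of variation listed above and tells the two simplest classes apart from the alleles alone. The names `VariationClass` and `classify_alleles` are hypothetical.

```python
from enum import Enum


class VariationClass(Enum):
    """Classes of variation named in the dbSNP description (labels are illustrative)."""
    SINGLE_BASE_SUBSTITUTION = "single base nucleotide substitution"
    DELETION_INSERTION = "short deletion/insertion polymorphism"
    MICROSATELLITE = "microsatellite marker"
    INSERTION_ELEMENT = "polymorphic insertion element"


def classify_alleles(ref: str, alt: str) -> VariationClass:
    """Naively classify a two-allele report from its allele strings.

    Assumption: "-" denotes an absent allele (hypothetical convention).
    Only substitutions and short indels are inferred; microsatellites and
    insertion elements such as retrotransposons need submitter annotation.
    """
    # Normalize case and drop the "-" placeholder for an absent allele.
    ref, alt = ref.upper().replace("-", ""), alt.upper().replace("-", "")
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return VariationClass.SINGLE_BASE_SUBSTITUTION
    if len(ref) != len(alt):
        # One allele is longer than the other: a deletion/insertion.
        return VariationClass.DELETION_INSERTION
    raise ValueError(f"cannot classify {ref}/{alt} from allele strings alone")


print(classify_alleles("A", "G").value)
print(classify_alleles("AT", "-").value)
```

Microsatellites and insertion elements stay in the enum but are never returned, because recognizing them requires repeat or sequence-family context that two allele strings do not carry; in this sketch such records would have to be labeled by the submitter.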
Submissions to dbSNP come from a variety of sources, including individual laboratories, collaborative polymorphism discovery efforts, large-scale genome sequencing centers, and private industry. The data range from tightly focused characterizations of particular genes to broadly sampled surveys of variation in random genomic sequence. Reported marker density is therefore expected to vary across the genome, with an expected minimum density of one marker per 3000 bases in regions of random genomic sequence and locally higher densities around well-characterized genes. Each variation submitted to dbSNP must carry an identifier provided by the submitter (called a “local” identifier by dbSNP), and dbSNP issues each submission a unique identifier formatted as an integer prefixed with ss (for submitted SNP), for example, ss334. An ss number is thus permanently associated with the submitter’s identifier, and it can be treated as a formal accession number by the scientific publishing community.
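The ss identifier scheme can be sketched as a small Python helper that formats and parses accessions such as ss334 and permanently binds each submitter's local identifier to one ss number. It assumes ss numbers are positive integers issued sequentially and that a local identifier is unique only within one submitter, so the registry keys on the (submitter, local identifier) pair; `format_ss`, `parse_ss`, and `SubmissionRegistry` are hypothetical names, not part of any NCBI tool.

```python
import re

# An accession is "ss" followed by a positive integer with no leading zeros.
SS_PATTERN = re.compile(r"ss([1-9][0-9]*)")


def format_ss(number: int) -> str:
    """Render a submitted-SNP accession: the integer prefixed with 'ss'."""
    if number < 1:
        raise ValueError("ss numbers are assumed to be positive integers")
    return f"ss{number}"


def parse_ss(accession: str) -> int:
    """Recover the integer from an accession such as 'ss334'."""
    match = SS_PATTERN.fullmatch(accession)
    if match is None:
        raise ValueError(f"not an ss accession: {accession!r}")
    return int(match.group(1))


class SubmissionRegistry:
    """Hypothetical registry binding (submitter, local id) to a permanent ss number.

    Assumptions: numbers are issued sequentially, and local ids are unique
    only within a single submitter.
    """

    def __init__(self, first_number: int = 1) -> None:
        self._next = first_number
        self._by_local: dict[tuple[str, str], int] = {}

    def register(self, submitter: str, local_id: str) -> str:
        key = (submitter, local_id)
        if key not in self._by_local:
            # Issue a new number only the first time this local id is seen,
            # so the association stays permanent on resubmission.
            self._by_local[key] = self._next
            self._next += 1
        return format_ss(self._by_local[key])


registry = SubmissionRegistry(first_number=334)
print(registry.register("LAB_A", "marker-17"))
print(parse_ss("ss334"))
```

Keying on the submitter as well as the local identifier lets two laboratories reuse the same local name without colliding, while repeated registration of the same pair always returns the same accession.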