Glossary of 'Omics Terminology

  • AMPLICON: An amplicon is a short DNA fragment amplified by a polymerase chain reaction (PCR). Often, amplicons are linear PCR products that target a particular barcode and are subsequently sequenced. (1)

  • AMPLICON SEQUENCING: Amplicon sequencing is a highly targeted approach that enables researchers to analyze genetic variation in specific genomic regions. The ultra-deep sequencing of PCR products (amplicons) allows efficient variant identification and characterization. This method uses oligonucleotide probes designed to target and capture regions of interest, followed by next-generation sequencing (NGS). (1)

  • AMPLICON SEQUENCE VARIANT (ASV): An amplicon sequence variant (ASV) is a single DNA sequence inferred from high-throughput analysis of marker genes. These sequences are obtained after removing erroneous sequences generated during PCR and sequencing. ASVs allow for precise identification of sequence variations down to a single nucleotide change. ASVs are particularly useful in eDNA research and genetic studies because they provide higher resolution compared to traditional methods like operational taxonomic units (OTUs). (5)

  • ASSEMBLY: In the context of genomes, assembly refers to the process of taking a large number of DNA sequences and putting them back together to create a representation of the original chromosomes from which the DNA originated. Genome assemblies facilitate many downstream genomic analyses. (1)

  • BIOINFORMATICS: Bioinformatics, as related to genetics and genomics, is a scientific subdiscipline that involves using computer technology to collect, store, analyze, and disseminate biological data and information, such as DNA, RNA, and amino acid sequences or annotations about those sequences. Scientists and clinicians use databases that organize and index such biological information to increase our understanding of human and environmental health. (1)

  • BLAST: The Basic Local Alignment Search Tool that finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and identifies potential matches between the two, providing statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families. (1)

  • CONTIG: A contig is a set of DNA segments or sequences that overlap in a way that provides a contiguous representation of a genomic region. For example, a clone contig provides a physical map of a set of cloned segments of DNA across a genomic region, while a sequence contig provides the actual DNA sequence of a genomic region. In genome assembly, a contig is the assembled overlapping contiguous DNA sequences. (1)

  • DNA BARCODE: “DNA barcodes consist of a standardized short sequence of DNA (400–800 bp) that in principle should be easily generated and characterized for all species on the planet…DNA barcoding aims to use the information of one or a few gene regions to identify all species of life” (4)

  • DNA EXTRACTION: This process purifies the DNA from everything else in the sample. This is typically done with commercially available DNA extraction kits that use basic chemistry principles to separate DNA from lipid membranes, proteins, and other cellular components. (3)

  • ENVIRONMENTAL DNA (eDNA): Environmental DNA (eDNA) is organismal DNA that can be found in the environment. This DNA originates from cellular material shed by larger organisms (via skin, excrement, etc.) into aquatic or terrestrial environments, or can represent the entirety of smaller organisms present within the sample. (1)

  • GENOME: The genome is the entire set of DNA instructions found in a cell. A genome contains all the information needed for an individual to develop and function. (1)

  • GENOTYPING: Genotyping is the process of determining differences in the genetic make-up (genotype) of an individual by examining the individual’s DNA sequence. This involves identifying specific variations or mutations in the genome, such as single nucleotide polymorphisms (SNPs), insertions, deletions, and other genetic markers. (1)

  • MARKERS: A marker (largely synonymous with the word “landmark” and often referred to as a genomic marker or a genetic marker) is a DNA sequence, typically with a known location in a genome. Markers can reflect random sequences, genomic variants, or genes. (2)

  • METABARCODING: DNA metabarcoding is an approach that identifies multiple species from a mixed sample (bulk DNA or eDNA) based on high-throughput sequencing (HTS) of a specific DNA marker. It differs from conventional DNA barcoding (usually based on Sanger sequencing of individual specimens) because it sequences and analyses DNA originating from many different individuals and species, allowing taxonomy to be rapidly assigned to many DNA fragments present in a sample. (1)

  • METAGENOMICS: The analysis of all DNA and/or RNA sequences isolated from the organisms in a bulk sample, typically recovered directly from environmental samples. These analyses are typically used to study microbes. Two types of metagenomic analysis are commonly used: 1. Targeted metagenomics amplifies certain conserved regions (16S rRNA, 18S rRNA, ITS regions) with PCR primers and sequences them. These conserved regions contain variable regions that allow different groups of organisms to be identified. 2. Shotgun metagenomics is untargeted: it sequences all genetic material in an environmental sample. (1)

  • MITOGENOME: The entire DNA sequence, or genome, contained within the mitochondria that codes for part of the proteins constituting the organelle. Many genes present on the mitogenome are used as metabarcoding markers. Mitogenomes are double-stranded DNA molecules of variable size that generally are found as circular, linear, or branched forms. Because a cell can have many mitochondria, each cell may contain more than 1,000 copies of a single mitogenome haplotype. (1)

  • OMICS: The word omics refers to a field of study in biological sciences that ends with -omics, such as genomics, transcriptomics, proteomics, or metabolomics. The ending -ome is used to address the objects of study of such fields, such as the genome, transcriptome, proteome, or metabolome, respectively. In common language, -omics typically implies a high throughput method resulting in large amounts of biomolecular data. More specifically, genomics is the science that studies the structure, function, evolution, and mapping of genomes and aims at characterization and quantification of genes, which direct the production of proteins with the assistance of enzymes and messenger molecules. (1)

  • OPERATIONAL TAXONOMIC UNIT (OTU): An OTU is a cluster of sequences whose sequence identity is above a given threshold, usually expressed as a percent similarity. OTUs are commonly used as proxies for species in metabarcoding or metagenomics. (1)

  • POLYMERASE CHAIN REACTION (PCR): Polymerase chain reaction (abbreviated PCR) is a laboratory technique for rapidly producing (amplifying) millions to billions of copies of a specific segment of DNA, which can then be studied in greater detail. PCR involves using short synthetic DNA fragments called primers to select a segment of the genome to be amplified, and then multiple rounds of DNA synthesis to amplify that segment. (1)

  • PRIMERS: A primer, as related to genomics, is a short single-stranded DNA fragment used in certain laboratory techniques, such as the polymerase chain reaction (PCR). In the PCR method, a pair of primers hybridizes with the sample DNA and defines the region that will be amplified, resulting in millions and millions of copies in a very short timeframe. Primers are also used in DNA sequencing and other experimental processes. (2)

  • QUANTITATIVE POLYMERASE CHAIN REACTION (qPCR): Quantitative PCR, also known as real-time PCR, is a method of DNA amplification that allows the absolute quantity of target DNA in a sample to be determined from a calibration curve built from serially diluted standard samples with known concentrations or copy numbers. It quantifies amplicons in real time, during amplification, rather than after amplification as in conventional PCR. (1)

  • TRANSCRIPTOME: The transcriptome is the complete set of transcripts in a cell, and their quantity, for a specific developmental stage or physiological condition. Understanding the transcriptome is essential for interpreting the functional elements of the genome and revealing the molecular constituents of cells and tissues, and also for understanding development and disease. (1)
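
The ASSEMBLY and CONTIG entries describe joining overlapping sequences into a contiguous representation. The sketch below illustrates that idea with a simple greedy merge of error-free reads from one strand. It is a teaching example under those assumptions, not how production assemblers work (they must handle sequencing errors, both strands, and repeats). The function names and the `min_overlap` parameter are hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for k in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Assumes error-free reads from a single strand (illustrative only).
    Returns the list of contigs left when no further merge is possible.
    """
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap_length(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        # Join the two sequences, keeping the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Made-up reads that tile a short region
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
contigs = greedy_assemble(reads)
print(contigs)
```

Reads that share no sufficiently long overlap stay as separate contigs, which mirrors how gaps in coverage leave an assembly fragmented.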
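
The OTU and ASV entries differ mainly in resolution: OTUs group sequences above an identity threshold, while ASVs keep sequences apart even when they differ by a single nucleotide. The sketch below contrasts the two on made-up reads. It assumes equal-length, pre-aligned sequences and a 97% threshold (a common convention, not a value given in this glossary), and the exact-match grouping stands in for ASVs only loosely because it performs no error removal. All function names are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_otus(sequences, threshold=0.97):
    """Greedy clustering: each sequence joins the first cluster whose
    centroid (its first member) it matches at or above `threshold`;
    otherwise it starts a new cluster."""
    centroids = []
    clusters = []
    for seq in sequences:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[i].append(seq)
                break
        else:
            centroids.append(seq)
            clusters.append([seq])
    return clusters


def exact_variants(sequences):
    """Count identical reads; any single-base difference gives a new key.
    Real ASV inference also removes PCR and sequencing errors first."""
    counts = {}
    for seq in sequences:
        counts[seq] = counts.get(seq, 0) + 1
    return counts


# Made-up 40 bp reads
reference = "ACGT" * 10
one_base_variant = "T" + reference[1:]   # differs at one position (97.5% identity)
distant = "TTTT" * 10                    # differs at most positions
reads = [reference, reference, one_base_variant, distant]

print(len(cluster_otus(reads)), "OTU-like clusters")
print(len(exact_variants(reads)), "exact sequence variants")
```

The one-base variant is absorbed into the reference cluster at the 97% threshold but remains distinct under exact matching.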
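
The PCR and PRIMERS entries describe a primer pair defining the region to be amplified, followed by repeated rounds of synthesis. A minimal sketch of both ideas follows. It assumes exact primer matches on a single template strand and perfect doubling in every cycle, neither of which holds for a real reaction. The sequences are made up and the function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def in_silico_amplicon(template, forward_primer, reverse_primer):
    """Return the template region bounded by the primer pair, primers
    included, or None if either primer site is not found.

    The reverse primer anneals to the opposite strand, so on the given
    strand we search for its reverse complement downstream of the
    forward primer.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


def ideal_copies(initial_copies, cycles):
    """Copy number after `cycles` rounds, assuming perfect doubling."""
    return initial_copies * 2 ** cycles


# Made-up template and primers
template = "GGGATGCCATTTTAAAACCCCGATTCGAAA"
amplicon = in_silico_amplicon(template, "ATGCCA", "CGAATC")
print(amplicon)
print(ideal_copies(1, 30))
```

Under the perfect-doubling assumption, 30 cycles starting from one molecule already exceeds a billion copies, consistent with the "millions to billions" range in the PCR entry.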
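
The qPCR entry mentions a calibration curve built from serially diluted standards. One common way to use such a curve, sketched here, is to fit a straight line of threshold cycle (Ct) against log10 of the known copy number and then invert it for an unknown sample. The dilution series and Ct values are invented for illustration, and the function names are hypothetical.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_copies(ct, slope, intercept):
    """Invert the standard curve to estimate copy number from a Ct value."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1 / slope) - 1


# Invented ten-fold dilution series of a standard (copies per reaction)
standard_copies = [1e6, 1e5, 1e4, 1e3, 1e2]
standard_ct = [15.0, 18.3, 21.6, 24.9, 28.2]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)
unknown_ct = 23.0  # invented Ct for an unknown sample
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("estimated copies:", round(estimate_copies(unknown_ct, slope, intercept)))
print("efficiency:", round(amplification_efficiency(slope), 3))
```

Estimates are only meaningful for Ct values that fall within the range spanned by the standards.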

References:

1) NASA GCMD Glossary

2) NIH Talking Glossary of Genomic and Genetic Terms

3) USGS The Process of eDNA

4) Kress, W.J., & D.L. Erickson. 2008. DNA barcodes: Genes, genomics, and bioinformatics. Proc. Natl. Acad. Sci. U.S.A., 105 (8).

5) Callahan, B.J., McMurdie, P.J., & S.P. Holmes. 2017. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. The ISME Journal, 11 (12).