The process of determining the nucleotide sequence of a DNA fragment or of an entire genome (i.e. genome sequencing). Sequencing and analysing genomic DNA (or complementary DNA synthesized from RNA) now lies at the heart of medicine, genetics, and many other fields, providing a rapid means of pinpointing genes and their disease-causing mutations, and enabling comparison of gene sequences from different species. A landmark was the publication of the first finished sequence of the human genome in 2003 (see human genome project); since then the genomes of organisms from every domain of life have been sequenced. First-generation sequencing relies chiefly on the Sanger method (named after Frederick Sanger), also called the dideoxy method or chain termination method, which was introduced in 1977. This involves synthesizing a new DNA strand using as template single-stranded DNA from the sample being sequenced. Synthesis of the new strand is stopped at any of the four bases by adding the corresponding dideoxy (dd) derivative of the deoxyribonucleoside triphosphates; for example, adding ddATP terminates synthesis at an adenosine, adding ddGTP terminates it at a guanosine, and so on. Because the dideoxy derivative is present in only a small proportion, termination occurs at random among the positions of that base, so the reaction yields fragments ending at every such position. The fragments, which carry fluorescently labelled nucleotides, are separated according to length by electrophoresis and scanned by a fluorescence detector. The Sanger method can easily be adapted to sequencing RNA, by making single-stranded DNA from the RNA template using the enzyme reverse transcriptase. This enables, for example, sequencing of ribosomal RNA for use in molecular systematics. Furthermore, by carrying out electrophoresis in capillaries (instead of on slab gels) and using fluorescent dyes as labels (instead of radioisotopes, as originally), the Sanger method has been fully automated. After separation of the fragments, the products of all four reactions are detected by fluorescence spectroscopy and analysed by computer, which outputs the base sequence.
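The logic of reading a sequence by chain termination can be illustrated with a minimal Python sketch. It assumes an idealized, error-free experiment in which every possible terminated fragment is produced and perfectly resolved by length; the function names are hypothetical and chosen only for this illustration.

```python
# Idealized model of Sanger (chain termination) sequencing.
# Assumption: every terminated fragment is formed and resolved by length.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template):
    """Return the new strand (5'->3') complementary to a template given 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def termination_lengths(new_strand, base):
    """Lengths of the fragments that end where the dideoxy form of `base` was added."""
    return [i + 1 for i, b in enumerate(new_strand) if b == base]


def read_sequence(template):
    """Reconstruct the new strand from fragment lengths in the four reactions."""
    new_strand = synthesized_strand(template)
    fragments = []
    for base in "ACGT":  # one reaction per dideoxy nucleotide
        for length in termination_lengths(new_strand, base):
            fragments.append((length, base))
    fragments.sort()  # electrophoresis orders fragments from shortest to longest
    return "".join(base for _, base in fragments)


if __name__ == "__main__":
    print(read_sequence("ATGC"))
```

Each fragment length identifies a position in the new strand, and the reaction in which the fragment arose identifies the base at that position, so sorting the fragments by length spells out the sequence.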
Advances in miniaturization and more sophisticated separation, labelling, and detection techniques, coupled with greater computing power, led to the development of second-generation (or next-generation) sequencing (NGS) methods, introduced from around 2005 onwards. These high-throughput methods dramatically sped up the process and reduced the cost, so that an entire small genome can be sequenced in a day. A key factor is their ability to read millions or even billions of DNA fragments in parallel. Although different approaches to sequencing are employed, they can generally be characterized as ‘sequencing by synthesis’. The DNA sample is broken into relatively short fragments, typically several hundred bp long, and adaptor sequences are ligated to the ends of the fragments. This library of fragments is then amplified (e.g. using a form of polymerase chain reaction) to form clusters of identical fragments immobilized on a substrate, such as a micrometre-sized bead or a ‘flow cell’. The fragments in each cluster then serve as templates for the assembly of new DNA strands having a complementary base sequence, through sequential flooding and removal of known nucleotides. Incorporation of each nucleotide into the growing strand is detected in various ways, for instance by generation of light, fluorescence, or a pH change. After adaptor sequences, low-quality reads, and other extraneous information have been eliminated, a computer program identifies overlapping sequences among the fragment reads and assembles them into contiguous sequences (contigs). The contigs can then be compared with sequences from other individuals (see alignment) or organisms, enabling features such as open reading frames, regulatory regions, and mutations to be identified.
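The assembly of overlapping reads into contigs can be sketched as a simple greedy merge. This is only an illustration under strong assumptions: the reads are error-free, all come from the same strand, and no read lies wholly inside another. Production assemblers use overlap graphs or de Bruijn graphs and must cope with sequencing errors, repeats, and reverse complements. The function names and the `min_len` threshold are hypothetical.

```python
# Greedy overlap assembly: repeatedly merge the pair of reads with the
# longest suffix-prefix overlap. Illustrative only; assumes error-free reads.


def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads until no pair overlaps by at least `min_len`; return the contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return reads  # nothing left to merge: these are the contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))
```

In this example the three reads overlap by four bases each and collapse into one contig; reads with no sufficient overlap would remain as separate contigs.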
These techniques are now being supplanted by third-generation sequencing methods, which rely not on amplification of short DNA fragments but on sequencing much longer single DNA molecules in real time, in some cases up to 1000 kb long. For example, one method uses microfabricated chips of multiple nanowells, with each well containing a single DNA polymerase molecule. The incorporation of each nucleotide into the strand being synthesized on the template is detected as a fluorescent signal from that specific well. Another technology rapidly gaining ground is nanopore sequencing (see illustration). Double-stranded DNA is denatured and a single strand is made to pass through a nano-scale pore in a membrane, across which a voltage is applied. The ion current through the nanopore changes in a characteristic way according to the size and shape of each nucleotide as it traverses the channel. Hence the sequence of bases can be inferred from changes in the current. This simplified approach can be incorporated in small portable devices (the size of a mobile phone) that link to a computer via a USB port, which opens up vast new potential for DNA sequencing in small institutions and in the field. See also genome project.
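The principle of inferring bases from changes in the nanopore current can be shown with a toy Python sketch. The reference current levels below are invented for illustration and are not measured values; the sketch also assumes that the trace has already been divided into one mean current value per nucleotide. Real base callers are considerably more complex, because the signal depends on several neighbouring nucleotides at once.

```python
# Toy nanopore base caller: assign each current reading to the base whose
# reference level is nearest. All numeric levels are hypothetical.

REFERENCE_LEVELS = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}  # arbitrary units, invented


def call_base(current):
    """Return the base whose hypothetical reference level is closest to `current`."""
    return min(REFERENCE_LEVELS, key=lambda base: abs(REFERENCE_LEVELS[base] - current))


def call_sequence(trace):
    """Convert a list of per-nucleotide current readings into a base sequence."""
    return "".join(call_base(value) for value in trace)


if __name__ == "__main__":
    print(call_sequence([1.1, 3.9, 2.2, 2.8]))
```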