Next-generation sequencing (NGS) enables sequencing with remarkable throughput and scale, generating hundreds of billions of bases per day. The high throughput of NGS lowers the cost of sequencing while delivering rapid, accurate and reproducible data sets, which opens the door to new research areas. NGS applications include genome sequencing, genome resequencing, de novo sequencing, transcriptome sequencing, detection of DNA-protein interactions, and epigenome characterization, to name a few. The exponentially growing volume of sequencing data brings challenges such as computational analysis bottlenecks, interpretation and data storage.
Depending on the application and starting material, several commercial NGS platforms are currently available. They use distinct chemistries to allow massively parallel sequencing of many millions to billions of template DNA molecules. For several NGS platforms, the starting material must first be converted into a sequencing library.
NGS also faces significant challenges, in particular in data processing and analysis. It is worth keeping in mind that third-generation technologies, not discussed here, may further revolutionize genomics research!
NGS Applications:
- Whole genome sequencing
- De novo sequencing
- Targeted sequencing
- Exome sequencing
- Transcriptome sequencing
- Genome sequencing
- Mitochondrial sequencing
- DNA-protein interactions (ChIP-seq)
- Variant detection
- Genome finishing
NGS in Research Areas:
- Reproductive health
- Forensic genomics
- Complex diseases
- Microbial genomics
- Food and environmental genomics
- Genomics in drug development – personalized medicine
Terminology in NGS
- Read
- a single contiguous stretch of sequence obtained from the instrument
- Fragment read
- a read from a fragment library. Depending on the sequencing platform, a read is typically around 100-300 bp long
- Fragment paired-end reads
- two reads from each end of a DNA fragment coming from a fragment library
- Mate-paired read
- two reads from each end of a large DNA fragment (usually a pre-defined size-range)
- Coverage (example)
- 30× coverage means that each base pair in the reference genome was covered by 30 reads on average
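The terms above can be made concrete with a short Python sketch. It is illustrative only: the function names, the toy fragment and the read counts are assumptions chosen for the example, and the coverage estimate uses the simple average (number of reads × read length ÷ genome size) rather than any platform-specific calculation.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def paired_end_reads(fragment, read_length):
    """Return the two reads obtained from the two ends of one fragment.

    Read 1 starts at the 5' end of the given strand; read 2 starts at the
    5' end of the opposite strand, so it is a prefix of the reverse complement.
    """
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment)[:read_length]
    return read1, read2


def mean_coverage(num_reads, read_length, genome_size):
    """Average number of reads covering each base of the reference."""
    return num_reads * read_length / genome_size


# Hypothetical 25 bp fragment, read 8 bases from each end.
fragment = "ACGTTGCAAGGCTTAACCGGTTAGC"
read1, read2 = paired_end_reads(fragment, read_length=8)
print(read1, read2)

# Hypothetical run: 600 million reads of 150 bp on a 3 Gb genome.
coverage = mean_coverage(num_reads=600_000_000, read_length=150,
                         genome_size=3_000_000_000)
print(f"{coverage:.0f}x coverage")
```

With these assumed numbers the estimate comes out at 30×, matching the coverage example in the list.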
Illumina uses sequencing-by-synthesis technology with fluorescently labelled reversible chain-terminator nucleotides, which are incorporated on clonally amplified DNA templates (clusters). The DNA clusters are immobilized on the surface of a glass flowcell. The workflow consists of repeated cycles: addition of all four nucleotides (each labelled with a different fluorescent dye), of which one is incorporated per template strand, four-color imaging, cleavage of the dye and terminating group, and then incorporation and imaging again, and so on. All clusters on the flowcell are sequenced in parallel. Because only a single fluorescently labelled nucleotide is incorporated per cycle, this strategy largely avoids errors in mononucleotide runs. Read lengths are usually around 100-150 bp.
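The cycle described above can be sketched as a toy simulation for a single cluster. This is a simplified model under stated assumptions, not the instrument's actual base-calling: the function name is hypothetical, the template is written in the order in which the polymerase reads it, and imaging is assumed to be error-free.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(template, cycles):
    """Simulate reversible-terminator sequencing of one cluster.

    `template` lists the template bases in the order the polymerase
    encounters them. Each cycle yields exactly one base call.
    """
    read = []
    for cycle in range(min(cycles, len(template))):
        # 1. Incorporation: all four labelled nucleotides are offered, but
        #    only the complementary one is added, and its terminating group
        #    blocks any further extension in this cycle.
        incorporated = COMPLEMENT[template[cycle]]
        # 2. Imaging: the dye colour identifies the incorporated base.
        read.append(incorporated)
        # 3. Cleavage: dye and terminator are removed, so the next cycle
        #    can extend the strand by one more base.
    return "".join(read)


# A run of four identical template bases is still read one base per cycle.
print(sequence_by_synthesis("AAAATTCG", cycles=5))
```

Because the terminator limits extension to one base per cycle, the number of cycles sets the read length, and a homopolymer is resolved as separate calls rather than as one larger signal.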
Ion Torrent is based on semiconductor chips and detects the protons released when nucleotides are incorporated during synthesis. DNA fragments with specific adapters attached are amplified by emulsion PCR (emPCR) on the surface of beads called Ion Sphere Particles. Each bead is covered with copies of a single DNA fragment. Beads carrying different DNA fragments are then deposited into the proton-sensing wells of a chip. The chip is flooded with one of the four nucleotides at a time, and the process is repeated every 15 seconds with a different nucleotide. During sequencing, the four bases are thus introduced one by one; when a base is incorporated, protons are released and a voltage signal proportional to the number of incorporated bases is detected.
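To illustrate how flooding the chip with one nucleotide at a time produces a signal proportional to incorporation, here is a minimal sketch for a single well. The function name and the flow order "TACG" are assumptions made for the example, and the signal is idealized as an exact count of incorporated bases with no noise.

```python
def flowgram(synthesized, flow_order="TACG", num_flows=8):
    """Return (nucleotide, signal) pairs for successive nucleotide flows.

    `synthesized` is the strand being built in the well. In each flow the
    polymerase incorporates as many copies of the offered nucleotide as
    match the next positions, so the signal equals the homopolymer length.
    """
    signals = []
    position = 0
    for flow in range(num_flows):
        nucleotide = flow_order[flow % len(flow_order)]
        incorporated = 0
        while position < len(synthesized) and synthesized[position] == nucleotide:
            incorporated += 1
            position += 1
        # Idealized voltage signal: one unit per released proton.
        signals.append((nucleotide, incorporated))
    return signals


for nucleotide, signal in flowgram("TTACCCG", num_flows=4):
    print(nucleotide, signal)
```

A flow with no matching base gives a signal of zero, while a run of identical bases gives a single, larger signal in one flow.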
Pacific Biosciences enables the observation of structural and cell-type variation by single-molecule real-time (SMRT) sequencing with ultra-long read lengths of >20 kb. On this platform, ultra-long double-stranded DNA (dsDNA) fragments are generated either by random shearing with a Diagenode device such as the Megaruptor® or by amplification of target regions of interest. A SMRTbell library is generated by ligating universal hairpin adapters to each end of the DNA fragments. After washing steps under size-selective conditions, sequencing primers are annealed to the SMRTbell templates, and sequencing, carried out by a DNA polymerase bound to the DNA template, begins in the presence of fluorescently labelled nucleotides. As each base is incorporated, a distinct pulse of fluorescence is detected in real time.
Oxford Nanopore has developed a technology based on single-molecule DNA sequencing, in which a biological molecule such as DNA passes through or near nanoscale pores (nanopores) set in an electrically resistant polymer membrane and changes an ionic current. This change in current is translated into information about the molecule, for example by distinguishing the four nucleotides (A, G, C and T) as well as modified ones. A flow cell of the MinION sequencing device contains a sensor array of several hundred nanopore channels. The method requires ultra-long DNA fragments, which can be generated by random shearing with a Diagenode device such as the Megaruptor®.
SOLiD, with its unique chemistry, enables simultaneous sequencing of thousands of individual DNA molecules. It starts with library generation by ligation of adapters to sheared genomic DNA (fragment or mate-pair libraries are suitable). In the next step, emulsion PCR (emPCR) is carried out to clonally amplify individual template DNA molecules on the surface of beads. In emPCR, individual DNA templates are mixed with PCR reagents and primer-coated beads inside aqueous droplets, each surrounded by a hydrophobic shell within a water-in-oil emulsion. After amplification, the beads are randomly attached to the surface of a glass slide, which is loaded onto the instrument for sequencing. This technology uses a set of four fluorescently labelled di-base probes that compete for ligation to the sequencing primer.
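Because each probe interrogates two adjacent bases, a SOLiD read is commonly described as a series of colors, one per overlapping base pair, rather than as bases directly. The sketch below illustrates that idea under assumptions not stated above: the function names are hypothetical, colors are numbered 0 to 3 instead of naming dyes, and the mapping is the usual two-base scheme in which the color equals the XOR of the 2-bit codes of the two bases.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = {code: base for base, code in BASE_CODE.items()}


def to_color_space(sequence):
    """Encode each overlapping pair of bases as one of four colors (0-3)."""
    return [BASE_CODE[first] ^ BASE_CODE[second]
            for first, second in zip(sequence, sequence[1:])]


def from_color_space(first_base, colors):
    """Decode colors back to bases, given the known first base."""
    bases = [first_base]
    for color in colors:
        # XOR is its own inverse, so previous base + color gives next base.
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)


colors = to_color_space("ATGGCA")
print(colors)
print(from_color_space("A", colors))
```

Decoding needs one known starting base, and every base contributes to two consecutive colors, which is what makes this encoding useful for separating measurement errors from true variants.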
454 uses large-scale parallel pyrosequencing. It starts with library preparation from 300-800 bp fragments of either whole-genome DNA or targeted gene fragments. The next step involves attachment of adapters to the DNA fragments and separation into single strands. The adapter-ligated DNA fragments are then clonally amplified by emulsion PCR (emPCR), which immobilizes the library fragments on micron-sized beads. Each DNA-bound bead is placed into a well on a fiber-optic chip, which is inserted into the instrument. During a sequencing run, the four nucleotides are added sequentially in a fixed order, and all wells are sequenced in parallel.
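Since the nucleotides are added in a fixed order, the light signal recorded for each flow in a well can be turned back into a sequence. The following is a minimal sketch of that idea, not the instrument's base-caller: the function name, the flow order "TACG" and the intensity values are assumptions, and each intensity is simply rounded to the nearest whole number of incorporated bases.

```python
def call_bases(intensities, flow_order="TACG"):
    """Convert per-flow signal intensities into a base sequence.

    An intensity near 0 means no incorporation, near 1 a single base,
    near 2 a run of two identical bases, and so on.
    """
    sequence = []
    for flow, intensity in enumerate(intensities):
        nucleotide = flow_order[flow % len(flow_order)]
        # Round half up; intensities are assumed to be non-negative.
        run_length = int(intensity + 0.5)
        sequence.append(nucleotide * run_length)
    return "".join(sequence)


# Hypothetical normalized intensities for eight flows (T, A, C, G, T, A, C, G).
print(call_bases([2.05, 0.10, 0.97, 1.10, 0.00, 1.02, 2.90, 0.08]))
```

Rounding becomes less reliable as runs of identical bases get longer, because the difference between, say, seven and eight incorporations is small relative to the signal noise.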