SMaSH: Sample matching using SNPs in humans.

Westphal M, Frankhouser D, Sonzone C, Shields PG, Yan P, Bundschuh R

BACKGROUND: Inadvertent sample swaps are a real threat to data quality in any medium- to large-scale omics study. While matches between samples from the same individual can in principle be identified from a few well-characterized single nucleotide polymorphisms (SNPs), omics data types often provide only low to moderate coverage, so evidence from a large number of SNPs must be integrated to determine whether two samples derive from the same individual.

METHODS: We select about six thousand SNPs in the human genome and develop a Bayesian framework that robustly identifies sample matches between next-generation sequencing data sets.

RESULTS: We validate our approach on a variety of data sets. Most importantly, we show that our approach can establish identity between different omics data types such as exome sequencing, RNA-Seq, and MethylCap-Seq. We demonstrate how identity detection degrades with sample quality and read coverage, but show that twenty million reads of a fairly low-quality RNA-Seq sample are still sufficient for reliable sample identification.

CONCLUSION: Our tool, SMaSH, identifies sample mismatches in next-generation sequencing data sets across different sequencing modalities and for low-quality sequencing data.
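
The abstract does not spell out the Bayesian model, so the following is only a minimal sketch of the general idea of pooling weak per-SNP evidence, not the published SMaSH method. It assumes biallelic SNPs, Hardy-Weinberg genotype priors, a binomial read model with a fixed sequencing error rate, and independence between SNPs. All function names, parameters, and the error rate are hypothetical.

```python
import math

def genotype_priors(alt_freq):
    """Hardy-Weinberg priors for 0, 1, or 2 copies of the alternate allele (assumption)."""
    p = alt_freq
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]

def read_likelihoods(ref, alt, error=0.01):
    """Binomial likelihood of the observed read counts under each genotype.

    The per-read error rate is an illustrative assumption.
    """
    alt_read_prob = [error, 0.5, 1 - error]
    coeff = math.comb(ref + alt, alt)
    return [coeff * q ** alt * (1 - q) ** ref for q in alt_read_prob]

def log_bayes_factor(counts_a, counts_b, alt_freqs, error=0.01):
    """Sum over SNPs of log P(data | same individual) / P(data | different individuals).

    counts_a and counts_b hold one (ref_reads, alt_reads) pair per SNP.
    Positive totals favor a match, negative totals favor a mismatch.
    """
    total = 0.0
    for (ref_a, alt_a), (ref_b, alt_b), p in zip(counts_a, counts_b, alt_freqs):
        prior = genotype_priors(p)
        lik_a = read_likelihoods(ref_a, alt_a, error)
        lik_b = read_likelihoods(ref_b, alt_b, error)
        # Same individual: both samples share a single unknown genotype.
        same = sum(pr * a * b for pr, a, b in zip(prior, lik_a, lik_b))
        # Different individuals: genotypes are drawn independently.
        marginal_a = sum(pr * a for pr, a in zip(prior, lik_a))
        marginal_b = sum(pr * b for pr, b in zip(prior, lik_b))
        total += math.log(same) - math.log(marginal_a * marginal_b)
    return total

# Made-up counts for three SNPs, each with alternate allele frequency 0.5.
freqs = [0.5, 0.5, 0.5]
sample_a = [(0, 3), (4, 0), (2, 2)]
consistent_b = [(0, 2), (3, 0), (1, 1)]
conflicting_b = [(3, 0), (0, 3), (4, 0)]
score_match = log_bayes_factor(sample_a, consistent_b, freqs)
score_mismatch = log_bayes_factor(sample_a, conflicting_b, freqs)
```

A SNP with no reads in either sample contributes exactly zero to the total, which illustrates why low-coverage data needs many SNPs: each covered site adds only a little evidence, and the decision comes from the sum.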

IP-Star Compact


December, 2019


Products used in this publication

  • MethylCap kit




