#+title: Bisonex

* Diagnostic Impact and Cost-effectiveness of Whole-Exome Sequencing for Ambulant Children With Suspected Monogenic Conditions
:PROPERTIES:
:TITLE: Diagnostic Impact and Cost-effectiveness of Whole-Exome Sequencing for Ambulant Children With Suspected Monogenic Conditions
:BTYPE: article
:CUSTOM_ID: Tan2017
:AUTHOR: Tan, Tiong Yang and Dillon, Oliver James and Stark, Zornitza and Schofield, Deborah and Alam, Khurshid and Shrestha, Rupendra and Chong, Belinda and Phelan, Dean and Brett, Gemma R. and Creed, Emma and Jarmolowicz, Anna and Yap, Patrick and Walsh, Maie and Downie, Lilian and Amor, David J. and Savarirayan, Ravi and McGillivray, George and Yeung, Alison and Peters, Heidi and Robertson, Susan J. and Robinson, Aaron J. and Macciocca, Ivan and Sadedin, Simon and Bell, Katrina and Oshlack, Alicia and Georgeson, Peter and Thorne, Natalie and Gaff, Clara and White, Susan M.
:DOI: 10.1001/jamapediatrics.2017.1755
:ISSUE: 9
:JOURNAL: JAMA Pediatrics
:LANGUAGE: en
:MONTH: 9
:PAGES: 855
:PUBLISHER: American Medical Association (AMA)
:URL: http://dx.doi.org/10.1001/jamapediatrics.2017.1755
:VOLUME: 171
:YEAR: 2017
:END:

* Meta-analysis of the diagnostic and clinical utility of genome and exome sequencing and chromosomal microarray in children with suspected genetic diseases
:PROPERTIES:
:TITLE: Meta-analysis of the diagnostic and clinical utility of genome and exome sequencing and chromosomal microarray in children with suspected genetic diseases
:BTYPE: article
:CUSTOM_ID: Clark2018
:ABSTRACT: Genetic diseases are leading causes of childhood mortality. Whole-genome sequencing (WGS) and whole-exome sequencing (WES) are relatively new methods for diagnosing genetic diseases, whereas chromosomal microarray (CMA) is well established. Here we compared the diagnostic utility (rate of causative, pathogenic, or likely pathogenic genotypes in known disease genes) and clinical utility (proportion in whom medical or surgical management was changed by diagnosis) of WGS, WES, and CMA in children with suspected genetic diseases by systematic review of the literature (January 2011\textendash{}August 2017) and meta-analysis, following MOOSE/PRISMA guidelines. In 37 studies, comprising 20,068 children, diagnostic utility of WGS (0.41, 95\% CI 0.34\textendash{}0.48, $I^2$ = 44\%) and WES (0.36, 95\% CI 0.33\textendash{}0.40, $I^2$ = 83\%) were qualitatively greater than CMA (0.10, 95\% CI 0.08\textendash{}0.12, $I^2$ = 81\%). Among studies published in 2017, the diagnostic utility of WGS was significantly greater than CMA ($P < 0.0001$, $I^2$ = 13\% and $I^2$ = 40\%, respectively). Among studies featuring within-cohort comparisons, the diagnostic utility of WES was significantly greater than CMA ($P < 0.001$, $I^2$ = 36\%). The diagnostic utility of WGS and WES were not significantly different. In studies featuring within-cohort comparisons of WGS/WES, the likelihood of diagnosis was significantly greater for trios than singletons (odds ratio 2.04, 95\% CI 1.62\textendash{}2.56, $I^2$ = 12\%; $P < 0.0001$). Diagnostic utility of WGS/WES with hospital-based interpretation (0.42, 95\% CI 0.38\textendash{}0.45, $I^2$ = 48\%) was qualitatively higher than that of reference laboratories (0.29, 95\% CI 0.27\textendash{}0.31, $I^2$ = 49\%); this difference was significant among studies published in 2017 ($P < .0001$, $I^2$ = 22\% and $I^2$ = 26\%, respectively). The clinical utility of WGS (0.27, 95\% CI 0.17\textendash{}0.40, $I^2$ = 54\%) and WES (0.17, 95\% CI 0.12\textendash{}0.24, $I^2$ = 76\%) were higher than CMA (0.06, 95\% CI 0.05\textendash{}0.07, $I^2$ = 42\%); this difference was significant for WGS vs CMA ($P < 0.0001$). In conclusion, in children with suspected genetic diseases, the diagnostic and clinical utility of WGS/WES were greater than CMA. Subgroups with higher WGS/WES diagnostic utility were trios and those receiving hospital-based interpretation. WGS/WES should be considered a first-line genomic test for children with suspected genetic diseases.
:AUTHOR: Clark, Michelle M. and Stark, Zornitza and Farnaes, Lauge and Tan, Tiong Y. and White, Susan M. and Dimmock, David and Kingsmore, Stephen F.
:DOI: 10.1038/s41525-018-0053-8
:ISSUE: 1
:JOURNAL: npj Genomic Medicine
:LANGUAGE: en
:MONTH: 7
:PUBLISHER: Springer Science and Business Media LLC
:URL: http://dx.doi.org/10.1038/s41525-018-0053-8
:VOLUME: 3
:YEAR: 2018
:END:

* Recommendations for next generation sequencing data reanalysis of unsolved cases with suspected Mendelian disorders: A systematic review and meta-analysis
:PROPERTIES:
:TITLE: Recommendations for next generation sequencing data reanalysis of unsolved cases with suspected Mendelian disorders: A systematic review and meta-analysis
:BTYPE: article
:CUSTOM_ID: Dai2022
:AUTHOR: Dai, Pei and Honda, Andrew and Ewans, Lisa and McGaughran, Julie and Burnett, Leslie and Law, Matthew and Phan, Tri Giang
:DOI: 10.1016/j.gim.2022.04.021
:ISSUE: 8
:JOURNAL: Genetics in Medicine
:LANGUAGE: en
:MONTH: 8
:PAGES: 1618--1629
:PUBLISHER: Elsevier BV
:URL: http://dx.doi.org/10.1016/j.gim.2022.04.021
:VOLUME: 24
:YEAR: 2022
:END:

* Combining globally search for a regular expression and print matching lines with bibliographic monitoring of genomic database improves diagnosis
:PROPERTIES:
:TITLE: Combining globally search for a regular expression and print matching lines with bibliographic monitoring of genomic database improves diagnosis
:BTYPE: article
:CUSTOM_ID: TranMauThem2023
:ABSTRACT: \textbf{Introduction:} Exome sequencing has a diagnostic yield ranging from 25\% to 70\% in rare diseases and regularly implicates genes in novel disorders. Retrospective data reanalysis has demonstrated strong efficacy in improving diagnosis, but poses organizational difficulties for clinical laboratories. \textbf{Patients and methods:} We applied a reanalysis strategy based on intensive prospective bibliographic monitoring along with direct application of the GREP command-line tool (to \textquotedblleft{}globally search for a regular expression and print matching lines\textquotedblright{}) in a large ES database. For 18 months, we submitted the same five keywords of interest [(\textit{intellectual disability}, (\textit{neuro})\textit{developmental delay}, and (\textit{neuro})\textit{developmental disorder})] to PubMed on a daily basis to identify recently published novel disease\textendash{}gene associations or new phenotypes in genes already implicated in human pathology. We used the Linux GREP tool and an in-house script to collect all variants of these genes from our 5,459 exome database. \textbf{Results:} After GREP queries and variant filtration, we identified 128 genes of interest and collected 56 candidate variants from 53 individuals. We confirmed causal diagnosis for 19/128 genes (15\%) in 21 individuals and identified variants of unknown significance for 19/128 genes (15\%) in 23 individuals. Altogether, GREP queries for only 128 genes over a period of 18 months permitted a causal diagnosis to be established in 21/2875 undiagnosed affected probands (0.7\%). \textbf{Conclusion:} The GREP query strategy is efficient and less tedious than complete periodic reanalysis. It is an interesting reanalysis strategy to improve diagnosis.
:AUTHOR: Tran Mau-Them, Fr\'{e}d\'{e}ric and Overs, Alexis and Bruel, Ange-Line and Duquet, Romain and Thareau, Mylene and Denomm\'{e}-Pichon, Anne-Sophie and Vitobello, Antonio and Sorlin, Arthur and Safraou, Hana and Nambot, Sophie and Delanne, Julian and Moutton, Sebastien and Racine, Caroline and Engel, Camille and De Giraud d'Agay, Melchior and Lehalle, Daphne and Goldenberg, Alice and Willems, Marjolaine and Coubes, Christine and Genevieve, David and Verloes, Alain and Capri, Yline and Perrin, Laurence and Jacquemont, Marie-Line and Lambert, Laetitia and Lacaze, Elodie and Thevenon, Julien and Hana, Nadine and Van-Gils, Julien and Dubucs, Charlotte and Bizaoui, Varoona and Gerard-Blanluet, Marion and Lespinasse, James and Mercier, Sandra and Guerrot, Anne-Marie and Maystadt, Isabelle and Tisserant, Emilie and Faivre, Laurence and Philippe, Christophe and Duffourd, Yannis and Thauvin-Robinet, Christel
:DOI: 10.3389/fgene.2023.1122985
:JOURNAL: Frontiers in Genetics
:MONTH: 4
:PUBLISHER: Frontiers Media SA
:URL: http://dx.doi.org/10.3389/fgene.2023.1122985
:VOLUME: 14
:YEAR: 2023
:END:

* SimuSCoP: reliably simulate Illumina sequencing data based on position and context dependent profiles
:PROPERTIES:
:TITLE: SimuSCoP: reliably simulate Illumina sequencing data based on position and context dependent profiles
:BTYPE: article
:CUSTOM_ID: Yu2020
:ABSTRACT: \textbf{Background:} A number of simulators have been developed for emulating next-generation sequencing data by incorporating known errors such as base substitutions and indels. However, their practicality may be degraded by functional and runtime limitations. Particularly, the positional and genomic contextual information is not effectively utilized for reliably characterizing base substitution patterns, as well as the positional and contextual difference of Phred quality scores is not fully investigated. Thus, a more effective and efficient bioinformatics tool is sorely required. \textbf{Results:} Here, we introduce a novel tool, SimuSCoP, to reliably emulate complex DNA sequencing data. The base substitution patterns and the statistical behavior of quality scores in Illumina sequencing data are fully explored and integrated into the simulation model for reliably emulating datasets for different applications. In addition, an integrated and easy-to-use pipeline is employed in SimuSCoP to facilitate end-to-end simulation of complex samples, and high runtime efficiency is achieved by implementing the tool to run in multithreading with low memory consumption. These features enable SimuSCoP to gets substantial improvements in reliability, functionality, practicality and runtime efficiency. The tool is comprehensively evaluated in multiple aspects including consistency of profiles, simulation of genomic variations and complex tumor samples, and the results demonstrate the advantages of SimuSCoP over existing tools. \textbf{Conclusions:} SimuSCoP, a new bioinformatics tool is developed to learn informative profiles from real sequencing data and reliably mimic complex data by introducing various genomic variations. We believe that the presented work will catalyse new development of downstream bioinformatics methods for analyzing sequencing data.
:AUTHOR: Yu, Zhenhua and Du, Fang and Ban, Rongjun and Zhang, Yuanwei
:DOI: 10.1186/s12859-020-03665-5
:ISSUE: 1
:JOURNAL: BMC Bioinformatics
:LANGUAGE: en
:MONTH: 12
:PUBLISHER: Springer Science and Business Media LLC
:URL: http://dx.doi.org/10.1186/s12859-020-03665-5
:VOLUME: 21
:YEAR: 2020
:END:

* Genome measures used for quality control are dependent on gene function and ancestry
:PROPERTIES:
:TITLE: Genome measures used for quality control are dependent on gene function and ancestry
:BTYPE: article
:CUSTOM_ID: Wang2015
:ABSTRACT: Motivation: The transition/transversion (Ti/Tv) ratio and heterozygous/nonreference-homozygous (het/nonref-hom) ratio have been commonly computed in genetic studies as a quality control (QC) measurement. Additionally, these two ratios are helpful in our understanding of the patterns of DNA sequence evolution. Results: To thoroughly understand these two genomic measures, we performed a study using 1000 Genomes Project (1000G) released genotype data (N = 1092). An additional two datasets (N = 581 and N = 6) were used to validate our findings from the 1000G dataset. We compared the two ratios among continental ancestry, genome regions and gene functionality. We found that the Ti/Tv ratio can be used as a quality indicator for single nucleotide polymorphisms inferred from high-throughput sequencing data. The Ti/Tv ratio varies greatly by genome region and functionality, but not by ancestry. The het/nonref-hom ratio varies greatly by ancestry, but not by genome regions and functionality. Furthermore, extreme guanine + cytosine content (either high or low) is negatively associated with the Ti/Tv ratio magnitude. Thus, when performing QC assessment using these two measures, care must be taken to apply the correct thresholds based on ancestry and genome region. Failure to take these considerations into account at the QC stage will bias any following analysis. Contact: yan.guo@vanderbilt.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
:AUTHOR: Wang, Jing and Raskin, Leon and Samuels, David C. and Shyr, Yu and Guo, Yan
:DOI: 10.1093/bioinformatics/btu668
:ISSUE: 3
:JOURNAL: Bioinformatics
:LANGUAGE: en
:MONTH: 2
:PAGES: 318--323
:PUBLISHER: Oxford University Press (OUP)
:URL: http://dx.doi.org/10.1093/bioinformatics/btu668
:VOLUME: 31
:YEAR: 2015
:END:

* Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection
:PROPERTIES:
:TITLE: Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection
:BTYPE: article
:CUSTOM_ID: Ewing2015
:AUTHOR: Ewing, Adam D and Houlahan, Kathleen E and Hu, Yin and Ellrott, Kyle and Caloian, Cristian and Yamaguchi, Takafumi N and Bare, J Christopher and P'ng, Christine and Waggott, Daryl and Sabelnykova, Veronica Y and Kellen, Michael R and Norman, Thea C and Haussler, David and Friend, Stephen H and Stolovitzky, Gustavo and Margolin, Adam A and Stuart, Joshua M and Boutros, Paul C
:DOI: 10.1038/nmeth.3407
:ISSUE: 7
:JOURNAL: Nature Methods
:LANGUAGE: en
:MONTH: 7
:PAGES: 623--630
:PUBLISHER: Springer Science and Business Media LLC
:URL: http://dx.doi.org/10.1038/nmeth.3407
:VOLUME: 12
:YEAR: 2015
:END:

* PrecisionFDA Truth Challenge V2: Calling variants from short and long reads in difficult-to-map regions
:PROPERTIES:
:TITLE: PrecisionFDA Truth Challenge V2: Calling variants from short and long reads in difficult-to-map regions
:BTYPE: article
:CUSTOM_ID: Olson2022
:AUTHOR: Olson, Nathan D. and Wagner, Justin and McDaniel, Jennifer and Stephens, Sarah H. and Westreich, Samuel T. and Prasanna, Anish G. and Johanson, Elaine and Boja, Emily and Maier, Ezekiel J. and Serang, Omar and J\'{a}spez, David and Lorenzo-Salazar, Jos\'{e} M. and Mu\~{n}oz-Barrera, Adri\'{a}n and Rubio-Rodr\'{\i}guez, Luis A. and Flores, Carlos and Kyriakidis, Konstantinos and Malousi, Andigoni and Shafin, Kishwar and Pesout, Trevor and Jain, Miten and Paten, Benedict and Chang, Pi-Chuan and Kolesnikov, Alexey and Nattestad, Maria and Baid, Gunjan and Goel, Sidharth and Yang, Howard and Carroll, Andrew and Eveleigh, Robert and Bourgey, Mathieu and Bourque, Guillaume and Li, Gen and Ma, ChouXian and Tang, LinQi and Du, YuanPing and Zhang, ShaoWei and Morata, Jordi and Tonda, Ra\'{u}l and Parra, Gen\'{\i}s and Trotta, Jean-R\'{e}mi and Brueffer, Christian and Demirkaya-Budak, Sinem and Kabakci-Zorlu, Duygu and Turgut, Deniz and Kalay, \"{O}zem and Budak, Gungor and Narc\i{}, K\"{u}bra and Arslan, Elif and Brown, Richard and Johnson, Ivan J. and Dolgoborodov, Alexey and Semenyuk, Vladimir and Jain, Amit and Tetikol, H. Serhat and Jain, Varun and Ruehle, Mike and Lajoie, Bryan and Roddey, Cooper and Catreux, Severine and Mehio, Rami and Ahsan, Mian Umair and Liu, Qian and Wang, Kai and Ebrahim Sahraeian, Sayed Mohammad and Fang, Li Tai and Mohiyuddin, Marghoob and Hung, Calvin and Jain, Chirag and Feng, Hanying and Li, Zhipan and Chen, Luoqi and Sedlazeck, Fritz J. and Zook, Justin M.
:DOI: 10.1016/j.xgen.2022.100129
:ISSUE: 5
:JOURNAL: Cell Genomics
:LANGUAGE: en
:MONTH: 5
:PAGES: 100129
:PUBLISHER: Elsevier BV
:URL: http://dx.doi.org/10.1016/j.xgen.2022.100129
:VOLUME: 2
:YEAR: 2022
:END:

* Library preparation methods for next-generation sequencing: Tone down the bias
:PROPERTIES:
:TITLE: Library preparation methods for next-generation sequencing: Tone down the bias
:BTYPE: article
:CUSTOM_ID: VanDijk2014
:AUTHOR: van Dijk, Erwin L. and Jaszczyszyn, Yan and Thermes, Claude
:DOI: 10.1016/j.yexcr.2014.01.008
:ISSUE: 1
:JOURNAL: Experimental Cell Research
:LANGUAGE: en
:MONTH: 3
:PAGES: 12--20
:PUBLISHER: Elsevier BV
:URL: http://dx.doi.org/10.1016/j.yexcr.2014.01.008
:VOLUME: 322
:YEAR: 2014
:END:

* Library construction for next-generation sequencing: Overviews and challenges
:PROPERTIES:
:TITLE: Library construction for next-generation sequencing: Overviews and challenges
:BTYPE: article
:CUSTOM_ID: Head2014
:ABSTRACT: High-throughput sequencing, also known as next-generation sequencing (NGS), has revolutionized genomic research. In recent years, NGS technology has steadily improved, with costs dropping and the number and range of sequencing applications increasing exponentially. Here, we examine the critical role of sequencing library quality and consider important challenges when preparing NGS libraries from DNA and RNA sources. Factors such as the quantity and physical characteristics of the RNA or DNA source material as well as the desired application (i.e., genome sequencing, targeted sequencing, RNA-seq, ChIP-seq, RIP-seq, and methylation) are addressed in the context of preparing high quality sequencing libraries. In addition, the current methods for preparing NGS libraries from single cells are also discussed.
:AUTHOR: Head, Steven R. and Komori, H. Kiyomi and LaMere, Sarah A. and Whisenant, Thomas and Van Nieuwerburgh, Filip and Salomon, Daniel R. and Ordoukhanian, Phillip
:DOI: 10.2144/000114133
:ISSUE: 2
:JOURNAL: BioTechniques
:LANGUAGE: en
:MONTH: 2
:PAGES: 61--77
:PUBLISHER: Future Science Ltd
:URL: http://dx.doi.org/10.2144/000114133
:VOLUME: 56
:YEAR: 2014
:END:

* Splicing mutations in human genetic disorders: examples, detection, and confirmation
:PROPERTIES:
:TITLE: Splicing mutations in human genetic disorders: examples, detection, and confirmation
:BTYPE: article
:CUSTOM_ID: Anna2018
:ABSTRACT: Precise pre-mRNA splicing, essential for appropriate protein translation, depends on the presence of consensus \textquotedblleft{}cis\textquotedblright{} sequences that define exon-intron boundaries and regulatory sequences recognized by splicing machinery. Point mutations at these consensus sequences can cause improper exon and intron recognition and may result in the formation of an aberrant transcript of the mutated gene. The splicing mutation may occur in both introns and exons and disrupt existing splice sites or splicing regulatory sequences (intronic and exonic splicing silencers and enhancers), create new ones, or activate the cryptic ones. Usually such mutations result in errors during the splicing process and may lead to improper intron removal and thus cause alterations of the open reading frame. Recent research has underlined the abundance and importance of splicing mutations in the etiology of inherited diseases. The application of modern techniques allowed to identify synonymous and nonsynonymous variants as well as deep intronic mutations that affected pre-mRNA splicing. The bioinformatic algorithms can be applied as a tool to assess the possible effect of the identified changes. However, it should be underlined that the results of such tests are only predictive, and the exact effect of the specific mutation should be verified in functional studies. This article summarizes the current knowledge about the \textquotedblleft{}splicing mutations\textquotedblright{} and methods that help to identify such changes in clinical diagnosis.
:AUTHOR: Abramowicz, Anna and Gos, Monika
:DOI: 10.1007/s13353-018-0444-7
:ISSUE: 3
:JOURNAL: Journal of Applied Genetics
:LANGUAGE: en
:MONTH: 8
:PAGES: 253--268
:PUBLISHER: Springer Science and Business Media LLC
:URL: http://dx.doi.org/10.1007/s13353-018-0444-7
:VOLUME: 59
:YEAR: 2018
:END:

* A robust model for read count data in exome sequencing experiments and implications for copy number variant calling
:PROPERTIES:
:TITLE: A robust model for read count data in exome sequencing experiments and implications for copy number variant calling
:BTYPE: article
:CUSTOM_ID: Plagnol2012
:ABSTRACT: Motivation: Exome sequencing has proven to be an effective tool to discover the genetic basis of Mendelian disorders. It is well established that copy number variants (CNVs) contribute to the etiology of these disorders. However, calling CNVs from exome sequence data is challenging. A typical read depth strategy consists of using another sample (or a combination of samples) as a reference to control for the variability at the capture and sequencing steps. However, technical variability between samples complicates the analysis and can create spurious CNV calls. Results: Here, we introduce ExomeDepth, a new CNV calling algorithm designed to control for this technical variability. ExomeDepth uses a robust model for the read count data and uses this model to build an optimized reference set in order to maximize the power to detect CNVs. As a result, ExomeDepth is effective across a wider range of exome datasets than the previously existing tools, even for small (e.g. one to two exons) and heterozygous deletions. We used this new approach to analyse exome data from 24 patients with primary immunodeficiencies. Depending on data quality and the exact target region, we find between 170 and 250 exonic CNV calls per sample. Our analysis identified two novel causative deletions in the genes GATA2 and DOCK8. Availability: The code used in this analysis has been implemented into an R package called ExomeDepth and is available at the Comprehensive R Archive Network (CRAN). Contact: v.plagnol@ucl.ac.uk. Supplementary Information: Supplementary data are available at Bioinformatics online.
:AUTHOR: Plagnol, Vincent and Curtis, James and Epstein, Michael and Mok, Kin Y. and Stebbings, Emma and Grigoriadou, Sofia and Wood, Nicholas W. and Hambleton, Sophie and Burns, Siobhan O. and Thrasher, Adrian J. and Kumararatne, Dinakantha and Doffinger, Rainer and Nejentsev, Sergey
:DOI: 10.1093/bioinformatics/bts526
:ISSUE: 21
:JOURNAL: Bioinformatics
:LANGUAGE: en
:MONTH: 11
:PAGES: 2747--2754
:PUBLISHER: Oxford University Press (OUP)
:URL: http://dx.doi.org/10.1093/bioinformatics/bts526
:VOLUME: 28
:YEAR: 2012
:END:

* Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment
:PROPERTIES:
:TITLE: Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment
:BTYPE: article
:CUSTOM_ID: Betschart2022
:ABSTRACT: Rapid advances in high-throughput DNA sequencing technologies have enabled the conduct of whole genome sequencing (WGS) studies, and several bioinformatics pipelines have become available. The aim of this study was the comparison of 6 WGS data pre-processing pipelines, involving two mapping and alignment approaches (GATK utilizing BWA-MEM2 2.2.1, and DRAGEN 3.8.4) and three variant calling pipelines (GATK 4.2.4.1, DRAGEN 3.8.4 and DeepVariant 1.1.0). We sequenced one genome in a bottle (GIAB) sample 70 times in different runs, and one GIAB trio in triplicate. The truth set of the GIABs was used for comparison, and performance was assessed by computation time, F$_1$ score, precision, and recall. In the mapping and alignment step, the DRAGEN pipeline was faster than the GATK with BWA-MEM2 pipeline. DRAGEN showed systematically higher F$_1$ score, precision, and recall values than GATK for single nucleotide variations (SNVs) and Indels in simple-to-map, complex-to-map, coding and non-coding regions. In the variant calling step, DRAGEN was fastest. In terms of accuracy, DRAGEN and DeepVariant performed similarly and both superior to GATK, with slight advantages for DRAGEN for Indels and for DeepVariant for SNVs. The DRAGEN pipeline showed the lowest Mendelian inheritance error fraction for the GIAB trios. Mapping and alignment played a key role in variant calling of WGS, with the DRAGEN outperforming GATK.
:AUTHOR: Betschart, Raphael O. and Thi\'{e}ry, Alexandre and Aguilera-Garcia, Domingo and Zoche, Martin and Moch, Holger and Twerenbold, Raphael and Zeller, Tanja and Blankenberg, Stefan and Ziegler, Andreas
:DOI: 10.1038/s41598-022-26181-3
:ISSUE: 1
:JOURNAL: Scientific Reports
:LANGUAGE: en
:MONTH: 12
:PUBLISHER: Springer Science and Business Media LLC
:URL: http://dx.doi.org/10.1038/s41598-022-26181-3
:VOLUME: 12
:YEAR: 2022
:END:

* Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants
:PROPERTIES:
:TITLE: Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants
:BTYPE: article
:CUSTOM_ID: Belkadi2015
:ABSTRACT: \textbf{Significance:} Whole-exome sequencing (WES) is gradually being optimized to identify mutations in increasing proportions of the protein-coding exome, but whole-genome sequencing (WGS) is becoming an attractive alternative. WGS is currently more expensive than WES, but its cost should decrease more rapidly than that of WES. We compared WES and WGS on six unrelated individuals. The distribution of quality parameters for single-nucleotide variants (SNVs) and insertions/deletions (indels) was more uniform for WGS than for WES. The vast majority of SNVs and indels were identified by both techniques, but an estimated 650 high-quality coding SNVs ($\sim$3\% of coding variants) were detected by WGS and missed by WES. WGS is therefore slightly more efficient than WES for detecting mutations in the targeted exome.
:AUTHOR: Belkadi, Aziz and Bolze, Alexandre and Itan, Yuval and Cobat, Aur\'{e}lie and Vincent, Quentin B. and Antipenko, Alexander and Shang, Lei and Boisson, Bertrand and Casanova, Jean-Laurent and Abel, Laurent
:DOI: 10.1073/pnas.1418631112
:ISSUE: 17
:JOURNAL: Proceedings of the National Academy of Sciences
:LANGUAGE: en
:MONTH: 4
:PAGES: 5473--5478
:PUBLISHER: Proceedings of the National Academy of Sciences
:URL: http://dx.doi.org/10.1073/pnas.1418631112
:VOLUME: 112
:YEAR: 2015
:END:

* VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing
:PROPERTIES:
:TITLE: VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing
:BTYPE: article
:CUSTOM_ID: Koboldt2012
:ABSTRACT: Cancer is a disease driven by genetic variation and mutation. Exome sequencing can be utilized for discovering these variants and mutations across hundreds of tumors. Here we present an analysis tool, VarScan 2, for the detection of somatic mutations and copy number alterations (CNAs) in exome data from tumor\textendash{}normal pairs. Unlike most current approaches, our algorithm reads data from both samples simultaneously; a heuristic and statistical algorithm detects sequence variants and classifies them by somatic status (germline, somatic, or LOH); while a comparison of normalized read depth delineates relative copy number changes. We apply these methods to the analysis of exome sequence data from 151 high-grade ovarian tumors characterized as part of the Cancer Genome Atlas (TCGA). We validated some 7790 somatic coding mutations, achieving 93\% sensitivity and 85\% precision for single nucleotide variant (SNV) detection. Exome-based CNA analysis identified 29 large-scale alterations and 619 focal events per tumor on average. As in our previous analysis of these data, we observed frequent amplification of oncogenes (e.g., \textit{CCNE1}, \textit{MYC}) and deletion of tumor suppressors (\textit{NF1}, \textit{PTEN}, and \textit{CDKN2A}). We searched for additional recurrent focal CNAs using the correlation matrix diagonal segmentation (CMDS) algorithm, which identified 424 significant events affecting 582 genes. Taken together, our results demonstrate the robust performance of VarScan 2 for somatic mutation and CNA detection and shed new light on the landscape of genetic alterations in ovarian cancer.
:AUTHOR: Koboldt, Daniel C. and Zhang, Qunyuan and Larson, David E. and Shen, Dong and McLellan, Michael D. and Lin, Ling and Miller, Christopher A. and Mardis, Elaine R. and Ding, Li and Wilson, Richard K.
:DOI: 10.1101/gr.129684.111
:ISSUE: 3
:JOURNAL: Genome Research
:LANGUAGE: en
:MONTH: 3
:PAGES: 568--576
:PUBLISHER: Cold Spring Harbor Laboratory
:URL: http://dx.doi.org/10.1101/gr.129684.111
:VOLUME: 22
:YEAR: 2012
:END:

* Curated variation benchmarks for challenging medically relevant autosomal genes
:PROPERTIES:
:TITLE: Curated variation benchmarks for challenging medically relevant autosomal genes
:BTYPE: article
:CUSTOM_ID: Wagner2022gene
:AUTHOR: Wagner, Justin and Olson, Nathan D. and Harris, Lindsay and McDaniel, Jennifer and Cheng, Haoyu and Fungtammasan, Arkarachai and Hwang, Yih-Chii and Gupta, Richa and Wenger, Aaron M. and Rowell, William J. and Khan, Ziad M. and Farek, Jesse and Zhu, Yiming and Pisupati, Aishwarya and Mahmoud, Medhat and Xiao, Chunlin and Yoo, Byunggil and Sahraeian, Sayed Mohammad Ebrahim and Miller, Danny E. and J\'{a}spez, David and Lorenzo-Salazar, Jos\'{e} M. and Mu\~{n}oz-Barrera, Adri\'{a}n and Rubio-Rodr\'{\i}guez, Luis A. and Flores, Carlos and Narzisi, Giuseppe and Evani, Uday Shanker and Clarke, Wayne E. and Lee, Joyce and Mason, Christopher E. and Lincoln, Stephen E. and Miga, Karen H. and Ebbert, Mark T. W. and Shumate, Alaina and Li, Heng and Chin, Chen-Shan and Zook, Justin M. and Sedlazeck, Fritz J.
:DOI: 10.1038/s41587-021-01158-1
:ISSUE: 5
:JOURNAL: Nature Biotechnology
:LANGUAGE: en
:MONTH: 5
:PAGES: 672--680
:PUBLISHER: Springer Science and Business Media LLC
:URL: http://dx.doi.org/10.1038/s41587-021-01158-1
:VOLUME: 40
:YEAR: 2022
:END:

* Exome variant discrepancies due to reference-genome differences
:PROPERTIES:
:TITLE: Exome variant discrepancies due to reference-genome differences
:BTYPE: article
:CUSTOM_ID: Li2021
:AUTHOR: Li, He and Dawood, Moez and Khayat, Michael M. and Farek, Jesse R. and Jhangiani, Shalini N. and Khan, Ziad M. and Mitani, Tadahiro and Coban-Akdemir, Zeynep and Lupski, James R. and Venner, Eric and Posey, Jennifer E. and Sabo, Aniko and Gibbs, Richard A.
:DOI: 10.1016/j.ajhg.2021.05.011 :ISSUE: 7 :JOURNAL: The American Journal of Human Genetics :LANGUAGE: en :MONTH: 7 :PAGES: 1239--1250 :PUBLISHER: Elsevier BV :URL: http://dx.doi.org/10.1016/j.ajhg.2021.05.011 :VOLUME: 108 :YEAR: 2021 :END: * Evaluation of an optimized germline exomes pipeline using BWA-MEM2 and Dragen-GATK tools :PROPERTIES: :TITLE: Evaluation of an optimized germline exomes pipeline using BWA-MEM2 and Dragen-GATK tools :BTYPE: article :CUSTOM_ID: Alganmi2023 :ABSTRACT: <jats:p>The next-generation sequencing (NGS) technology represents a significant advance in genomics and medical diagnosis. Nevertheless, the time it takes to perform sequencing, data analysis, and variant interpretation is a bottleneck in using next-generation sequencing in precision medicine. For accurate and efficient performance in clinical diagnostic lab practice, a consistent data analysis pipeline is necessary to avoid false variant calls and achieve optimum accuracy. This study aims to compare the performance of two NGS data analysis pipeline compartments, including short-read mapping (BWA-MEM and BWA-MEM2) and variant calling (GATK-HaplotypeCaller and DRAGEN-GATK). On Whole Exome Sequencing (WES) data, computational performance was assessed using several criteria, including mapping efficiency, variant calling performance, false positive calls rate, and time. We examined four gold-standard WES data sets: Ashkenazim father (NA24149), Ashkenazim mother (NA24143), Ashkenazim son (NA24385), and Asian son (NA25631). In addition, eighteen exome samples were analyzed based on different read counts, and coverage was used precisely in the run-time assessment. By using BWA-MEM 2 and Dragen-GATK, this study achieved faster and more accurate detection for SNVs and indels than the standard GATK Best Practices workflow. 
This systematic comparison will enable the bioinformatics community to develop a more efficient and faster solution for analyzing NGS data.</jats:p> :AUTHOR: Alganmi, Nofe and Abusamra, Heba :DOI: 10.1371/journal.pone.0288371 :ISSUE: 8 :JOURNAL: PLOS ONE :LANGUAGE: en :MONTH: 8 :PAGES: e0288371 :PUBLISHER: Public Library of Science (PLoS) :URL: http://dx.doi.org/10.1371/journal.pone.0288371 :VOLUME: 18 :YEAR: 2023 :END: * Molecular genetic studies of complete hydatidiform moles :PROPERTIES: :TITLE: Molecular genetic studies of complete hydatidiform moles :BTYPE: article :CUSTOM_ID: Carey :ABSTRACT: Complete hydatidiform moles (CHM) are abnormal pregnancies with no fetal development resulting from having two paternal genomes with no maternal contribution. It is important to distinguish CHM from partial hydatidiform moles, and non-molar abortuses, ... :AUTHOR: Carey, Louise and Nash, Benjamin M. and Wright, Dale C. :DATE: 2015 Apr :DOI: 10.3978/j.issn.2224-4336.2015.04.02 :ISSUE: 2 :JOURNAL: Translational Pediatrics :LANGUAGE: en :PUBLISHER: AME Publications :URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4729092/ :VOLUME: 4 :END: * Nix based fully automated workflows and ecosystem to guarantee scientific result reproducibility across software environments and systems :PROPERTIES: :TITLE: Nix based fully automated workflows and ecosystem to guarantee scientific result reproducibility across software environments and systems :BTYPE: inproceedings :CUSTOM_ID: Devresse2015 :AUTHOR: Devresse, Adrien and Delalondre, Fabien and Sch\"{u}rmann, Felix :BOOKTITLE: SC15: The International Conference for High Performance Computing, Networking, Storage and Analysis :DOI: 10.1145/2830168.2830172 :JOURNAL: Proceedings of the 3rd International Workshop on Software Engineering for High Performance Computing in Computational Science and Engineering :MONTH: 11 :PUBLISHER: ACM :URL: http://dx.doi.org/10.1145/2830168.2830172 :VENUE: Austin Texas :YEAR: 2015 :END: * A universal SNP and small-indel variant caller
using deep neural networks :PROPERTIES: :TITLE: A universal SNP and small-indel variant caller using deep neural networks :BTYPE: article :CUSTOM_ID: Poplin2018 :AUTHOR: Poplin, Ryan and Chang, Pi-Chuan and Alexander, David and Schwartz, Scott and Colthurst, Thomas and Ku, Alexander and Newburger, Dan and Dijamco, Jojo and Nguyen, Nam and Afshar, Pegah T and Gross, Sam S and Dorfman, Lizzie and McLean, Cory Y and DePristo, Mark A :DOI: 10.1038/nbt.4235 :ISSUE: 10 :JOURNAL: Nature Biotechnology :LANGUAGE: en :MONTH: 11 :PAGES: 983--987 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/nbt.4235 :VOLUME: 36 :YEAR: 2018 :END: * PCR-Free Shallow Whole Genome Sequencing for Chromosomal Copy Number Detection from Plasma of Cancer Patients Is an Efficient Alternative to the Conventional PCR-Based Approach :PROPERTIES: :TITLE: PCR-Free Shallow Whole Genome Sequencing for Chromosomal Copy Number Detection from Plasma of Cancer Patients Is an Efficient Alternative to the Conventional PCR-Based Approach :BTYPE: article :CUSTOM_ID: Beagan2021 :ABSTRACT: Somatic copy number alterations can be detected in cell-free DNA (cfDNA) by shallow whole genome sequencing (sWGS). PCR is typically included in libra\ldots{} :AUTHOR: Beagan, Jamie J. and Drees, Esther E.E. and Stathi, Phylicia and Eijk, Paul P. and Meulenbroeks, Laura and Kessler, Floortje and Middeldorp, Jaap M. and Pegtel, D. Michiel and Zijlstra, Jos\'{e}e M. and Sie, Daoud and Heideman, Dani\"{e}lle A.M. and Thunnissen, Erik and Smit, Linda and de Jong, Daphne and Mouliere, Florent and Ylstra, Bauke and Roemer, Margaretha G.M. 
and van Dijk, Erik :DOI: 10.1016/j.jmoldx.2021.08.008 :ISSN: 1525-1578 :ISSUE: 11 :JOURNAL: The Journal of Molecular Diagnostics :PAGES: 1553--1563 :PUBLISHER: Elsevier :URL: https://www.sciencedirect.com/science/article/pii/S1525157821002646 :VOLUME: 23 :YEAR: 2021 :END: * GENCODE: The reference human genome annotation for The ENCODE Project :PROPERTIES: :TITLE: GENCODE: The reference human genome annotation for The ENCODE Project :BTYPE: article :CUSTOM_ID: Harrow :AUTHOR: Harrow, Jennifer and Frankish, Adam and Gonzalez, Jose M. and Tapanari, Electra and Diekhans, Mark and Kokocinski, Felix and Aken, Bronwen L. and Barrell, Daniel and Zadissa, Amonida and Searle, Stephen and Barnes, If and Bignell, Alexandra and Boychenko, Veronika and Hunt, Toby and Kay, Mike and Mukherjee, Gaurab and Rajan, Jeena and Despacio-Reyes, Gloria and Saunders, Gary and Steward, Charles and Harte, Rachel and Lin, Michael and Howald, C\'{e}dric and Tanzer, Andrea and Derrien, Thomas and Chrast, Jacqueline and Walters, Nathalie and Balasubramanian, Suganthi and Pei, Baikang and Tress, Michael and Rodriguez, Jose Manuel and Ezkurdia, Iakes and van Baren, Jeltje and Brent, Michael and Haussler, David and Kellis, Manolis and Valencia, Alfonso and Reymond, Alexandre and Gerstein, Mark and Guig\'{o}, Roderic and Hubbard, Tim J.
:DATE: 2012-09-01 :DOI: 10.1101/gr.135350.111 :ISSN: 1088-9051 :ISSUE: 9 :JOURNAL: Genome Research :LANGUAGE: en :PUBLISHER: Cold Spring Harbor Laboratory Press :URL: https://genome.cshlp.org/content/22/9/1760.full :VOLUME: 22 :END: * Comparative analysis of whole-genome sequencing pipelines to minimize false negative findings :PROPERTIES: :TITLE: Comparative analysis of whole-genome sequencing pipelines to minimize false negative findings :BTYPE: article :CUSTOM_ID: Hwang2019 :ABSTRACT: Comprehensive and accurate detection of variants from whole-genome sequencing (WGS) is a strong prerequisite for translational genomic medicine; however, low concordance between analytic pipelines is an outstanding challenge. We processed a European and an African WGS samples with 70 analytic pipelines comprising the combination of 7 short-read aligners and 10 variant calling algorithms (VCAs), and observed remarkable differences in the number of variants called by different pipelines (max/min ratio: 1.3\textasciitilde{}3.4). The similarity between variant call sets was more closely determined by VCAs rather than by short-read aligners. Remarkably, reported minor allele frequency had a substantial effect on concordance between pipelines (concordance rate ratio: 0.11\textasciitilde{}0.92; Wald tests, P\hspace{0.167em}\<\hspace{0.167em}0.001), entailing more discordant results for rare and novel variants. We compared the performance of analytic pipelines and pipeline ensembles using gold-standard variant call sets and the catalog of variants from the 1000 Genomes Project. Notably, a single pipeline using BWA-MEM and GATK-HaplotypeCaller performed comparable to the pipeline ensembles for `callable' regions (\textasciitilde{}97\%) of the human reference genome. While a single pipeline is capable of analyzing common variants in most genomic regions, our findings demonstrated the limitations and challenges in analyzing rare or novel variants, especially for non-European genomes. 
:AUTHOR: Hwang, Kyu-Baek and Lee, In-Hee and Li, Honglan and Won, Dhong-Geon and Hernandez-Ferrer, Carles and Negron, Jose Alberto and Kong, Sek Won :DATE: 2019-03-01 :DOI: 10.1038/s41598-019-39108-2 :ISSN: 2045-2322 :ISSUE: 1 :JOURNAL: Scientific Reports :KEYWORDS: Genetics research :LANGUAGE: En :PUBLISHER: Nature Publishing Group :URL: https://www.nature.com/articles/s41598-019-39108-2 :VOLUME: 9 :YEAR: 2019 :END: * Frequent nonallelic gene conversion on the human lineage and its effect on the divergence of gene duplicates :PROPERTIES: :TITLE: Frequent nonallelic gene conversion on the human lineage and its effect on the divergence of gene duplicates :BTYPE: article :CUSTOM_ID: Harpak2017 :ABSTRACT: <jats:title>Significance</jats:title> <jats:p>Nonallelic gene conversion (NAGC) is a driver of more than 20 diseases. It is also thought to drive the \textquotedblleft{}concerted evolution\textquotedblright{} of gene duplicates because it acts to eliminate any differences that accumulate between them. Despite its importance, the parameters that govern NAGC are not well characterized. We developed statistical tools to study NAGC and its consequences for human gene duplicates. We find that the baseline rate of NAGC in humans is 20 times faster than the point mutation rate. Despite this high rate, NAGC has a surprisingly small effect on the average sequence divergence of human duplicates\textemdash{}and concerted evolution is not as pervasive as previously thought.</jats:p> :AUTHOR: Harpak, Arbel and Lan, Xun and Gao, Ziyue and Pritchard, Jonathan K. 
:DOI: 10.1073/pnas.1708151114 :ISSUE: 48 :JOURNAL: Proceedings of the National Academy of Sciences :LANGUAGE: en :MONTH: 11 :PAGES: 12779--12784 :PUBLISHER: Proceedings of the National Academy of Sciences :URL: http://dx.doi.org/10.1073/pnas.1708151114 :VOLUME: 114 :YEAR: 2017 :END: * Benchmarking challenging small variants with linked and long reads :PROPERTIES: :TITLE: Benchmarking challenging small variants with linked and long reads :BTYPE: misc :CUSTOM_ID: Wagner2022 :ABSTRACT: <jats:title>Summary</jats:title><jats:p>Genome in a Bottle (GIAB) benchmarks have been widely used to help validate clinical sequencing pipelines and develop new variant calling and sequencing methods. Here, we use accurate linked reads and long reads to expand the prior benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are not readily accessible to short reads. Our new benchmark adds more than 300,000 SNVs, 50,000 indels, and 16\% new exonic variants, many in challenging, clinically relevant genes not previously covered (e.g., <jats:italic>PMS2</jats:italic>). For HG002, we include 92\% of the autosomal GRCh38 assembly, while excluding problematic regions for benchmarking small variants (e.g., copy number variants and reference errors) that should not have been in the previous version, which included 85\% of GRCh38.
By including difficult-to-map regions, this benchmark identifies eight times more false negatives in a short read variant call set relative to our previous benchmark. We have demonstrated the utility of this benchmark to reliably identify false positives and false negatives across technologies in more challenging regions, which enables continued technology and bioinformatics development.</jats:p> :AUTHOR: Wagner, Justin and Olson, Nathan D and Harris, Lindsay and McDaniel, Jennifer and Khan, Ziad and Farek, Jesse and Mahmoud, Medhat and Stankovic, Ana and Kovacevic, Vladimir and Yoo, Byunggil and Miller, Neil and Rosenfeld, Jeffrey A. and Ni, Bohan and Zarate, Samantha and Kirsche, Melanie and Aganezov, Sergey and Schatz, Michael and Narzisi, Giuseppe and Byrska-Bishop, Marta and Clarke, Wayne and Evani, Uday S. and Markello, Charles and Shafin, Kishwar and Zhou, Xin and Sidow, Arend and Bansal, Vikas and Ebert, Peter and Marschall, Tobias and Lansdorp, Peter and Hanlon, Vincent and Mattsson, Carl-Adam and Barrio, Alvaro Martinez and Fiddes, Ian T and Xiao, Chunlin and Fungtammasan, Arkarachai and Chin, Chen-Shan and Wenger, Aaron M and Rowell, William J and Sedlazeck, Fritz J and Carroll, Andrew and Salit, Marc and Zook, Justin M :DOI: 10.1101/2020.07.24.212712 :MONTH: 7 :PUBLISHER: Cold Spring Harbor Laboratory :URL: http://dx.doi.org/10.1101/2020.07.24.212712 :YEAR: 2022 :END: * Alignment-free approaches for predicting novel Nuclear Mitochondrial Segments (NUMTs) in the human genome :PROPERTIES: :TITLE: Alignment-free approaches for predicting novel Nuclear Mitochondrial Segments (NUMTs) in the human genome :BTYPE: article :CUSTOM_ID: Li2019 :AUTHOR: Li, Wentian and Freudenberg, Jerome and Freudenberg, Jan :DOI: 10.1016/j.gene.2018.12.040 :JOURNAL: Gene :LANGUAGE: en :MONTH: 4 :PAGES: 141--152 :PUBLISHER: Elsevier BV :URL: http://dx.doi.org/10.1016/j.gene.2018.12.040 :VOLUME: 691 :YEAR: 2019 :END: * Enhanced copy number variants detection from whole-exome
sequencing data using EXCAVATOR2 :PROPERTIES: :TITLE: Enhanced copy number variants detection from whole-exome sequencing data using EXCAVATOR2 :BTYPE: article :CUSTOM_ID: DAurizio2016 :AUTHOR: D'Aurizio, Romina and Pippucci, Tommaso and Tattini, Lorenzo and Giusti, Betti and Pellegrini, Marco and Magi, Alberto :DOI: 10.1093/nar/gkw695 :JOURNAL: Nucleic Acids Research :LANGUAGE: en :MONTH: 8 :PAGES: gkw695 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/nar/gkw695 :YEAR: 2016 :END: * Scaling accurate genetic variant discovery to tens of thousands of samples :PROPERTIES: :TITLE: Scaling accurate genetic variant discovery to tens of thousands of samples :BTYPE: misc :CUSTOM_ID: Poplin2017 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>Comprehensive disease gene discovery in both common and rare diseases will require the efficient and accurate detection of all classes of genetic variation across tens to hundreds of thousands of human samples. We describe here a novel assembly-based approach to variant calling, the GATK HaplotypeCaller (HC) and Reference Confidence Model (RCM), that determines genotype likelihoods independently per-sample but performs joint calling across all samples within a project simultaneously. We show by calling over 90,000 samples from the Exome Aggregation Consortium (ExAC) that, in contrast to other algorithms, the HC-RCM scales efficiently to very large sample sizes without loss in accuracy; and that the accuracy of indel variant calling is superior in comparison to other algorithms. More importantly, the HC-RCM produces a fully squared-off matrix of genotypes across all samples at every genomic position being investigated. The HC-RCM is a novel, scalable, assembly-based algorithm with abundant applications for population genetics and clinical studies.</jats:p> :AUTHOR: Poplin, Ryan and Ruano-Rubio, Valentin and DePristo, Mark A. and Fennell, Tim J. and Carneiro, Mauricio O. and Van der Auwera, Geraldine A. 
and Kling, David E. and Gauthier, Laura D. and Levy-Moonshine, Ami and Roazen, David and Shakir, Khalid and Thibault, Joel and Chandran, Sheila and Whelan, Chris and Lek, Monkol and Gabriel, Stacey and Daly, Mark J and Neale, Ben and MacArthur, Daniel G. and Banks, Eric :DOI: 10.1101/201178 :MONTH: 11 :PUBLISHER: Cold Spring Harbor Laboratory :URL: http://dx.doi.org/10.1101/201178 :YEAR: 2017 :END: * Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR :PROPERTIES: :TITLE: Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR :BTYPE: article :CUSTOM_ID: Yang :ABSTRACT: This protocol describes how to annotate genomic variants using either the ANNOVAR software or the web-based wANNOVAR tool. Recent developments in sequencing techniques have enabled rapid and high-throughput generation of sequence data, democratizing the ability to compile information on large amounts of genetic variations in individual laboratories. However, there is a growing gap between the generation of raw sequencing data and the extraction of meaningful biological information. Here, we describe a protocol to use the ANNOVAR (ANNOtate VARiation) software to facilitate fast and easy variant annotations, including gene-based, region-based and filter-based annotations on a variant call format (VCF) file generated from human genomes. We further describe a protocol for gene-based annotation of a newly sequenced nonhuman species. Finally, we describe how to use a user-friendly and easily accessible web server called wANNOVAR to prioritize candidate genes for a Mendelian disease. The variant annotation protocols take 5\textendash{}30 min of computer time, depending on the size of the variant file, and 5\textendash{}10 min of hands-on time. In summary, through the command-line tool and the web server, these protocols provide a convenient means to analyze genetic variants generated in humans and other species. 
:AUTHOR: Yang, Hui and Wang, Kai :DATE: 2015-09-17 :DOI: 10.1038/nprot.2015.105 :ISSN: 1750-2799 :ISSUE: 10 :JOURNAL: Nature Protocols :KEYWORDS: Genetic variation :LANGUAGE: En :PUBLISHER: Nature Publishing Group :URL: https://www.nature.com/articles/nprot.2015.105 :VOLUME: 10 :YEAR: 2015 :END: * Similarities and differences between variants called with human reference genome HG19 or HG38 :PROPERTIES: :TITLE: Similarities and differences between variants called with human reference genome HG19 or HG38 :BTYPE: article :CUSTOM_ID: Pan :ABSTRACT: Reference genome selection is a prerequisite for successful analysis of next generation sequencing (NGS) data. Current practice employs one of the two most recent human reference genome versions: HG19 or HG38. To date, the impact of genome version on SNV identification has not been rigorously assessed. We conducted analysis comparing the SNVs identified based on HG19 vs HG38, leveraging whole genome sequencing (WGS) data from the genome-in-a-bottle (GIAB) project. First, SNVs were called using 26 different bioinformatics pipelines with either HG19 or HG38. Next, two tools were used to convert the called SNVs between HG19 and HG38. Lastly we calculated conversion rates, analyzed discordant rates between SNVs called with HG19 or HG38, and characterized the discordant SNVs. The conversion rates from HG38 to HG19 (average 95\%) were lower than the conversion rates from HG19 to HG38 (average 99\%). The conversion rates varied slightly among the various calling pipelines. Around 1.5\% SNVs were discordantly converted between HG19 or HG38. The conversions from HG38 to HG19 had more SNVs which failed conversion and more discordant SNVs than the opposite conversion (HG19 to HG38). Most of the discordant SNVs had low read depth, were low confidence SNVs as defined by GIAB, and/or were predominated by G/C alleles (52\% observed versus 42\% expected). A significant number of SNVs could not be converted between HG19 and HG38. 
Based on careful review of our comparisons, we recommend HG38 (the newer version) for NGS SNV analysis. To summarize, our findings suggest caution when translating identified SNVs between different versions of the human reference genome. :AUTHOR: Pan, Bohu and Kusko, Rebecca and Xiao, Wenming and Zheng, Yuanting and Liu, Zhichao and Xiao, Chunlin and Sakkiah, Sugunadevi and Guo, Wenjing and Gong, Ping and Zhang, Chaoyang and Ge, Weigong and Shi, Leming and Tong, Weida and Hong, Huixiao :DATE: 2019-03-14 :DOI: 10.1186/s12859-019-2620-0 :ISSN: 1471-2105 :ISSUE: 2 :JOURNAL: BMC Bioinformatics :KEYWORDS: Bioinformatics :LANGUAGE: En :PUBLISHER: BioMed Central :URL: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2620-0 :VOLUME: 20 :END: * Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects :PROPERTIES: :TITLE: Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects :BTYPE: article :CUSTOM_ID: Regier2018 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>Hundreds of thousands of human whole genome sequencing (WGS) datasets will be generated over the next few years. These data are more valuable in aggregate: joint analysis of genomes from many sources increases sample size and statistical power. A central challenge for joint analysis is that different WGS data processing pipelines cause substantial differences in variant calling in combined datasets, necessitating computationally expensive reprocessing. This approach is no longer tenable given the scale of current studies and data volumes. Here, we define WGS data processing standards that allow different groups to produce functionally equivalent (FE) results, yet still innovate on data processing pipelines. 
We present initial FE pipelines developed at five genome centers and show that they yield similar variant calling results and produce significantly less variability than sequencing replicates. This work alleviates a key technical bottleneck for genome aggregation and helps lay the foundation for community-wide human genetics studies.</jats:p> :AUTHOR: Regier, Allison A. and Farjoun, Yossi and Larson, David E. and Krasheninina, Olga and Kang, Hyun Min and Howrigan, Daniel P. and Chen, Bo-Juen and Kher, Manisha and Banks, Eric and Ames, Darren C. and English, Adam C. and Li, Heng and Xing, Jinchuan and Zhang, Yeting and Matise, Tara and Abecasis, Goncalo R. and Salerno, Will and Zody, Michael C. and Neale, Benjamin M. and Hall, Ira M. :DOI: 10.1038/s41467-018-06159-4 :ISSUE: 1 :JOURNAL: Nature Communications :LANGUAGE: en :MONTH: 10 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41467-018-06159-4 :VOLUME: 9 :YEAR: 2018 :END: * Unifying package managers, workflow engines, and containers: Computational reproducibility with BioNix :PROPERTIES: :TITLE: Unifying package managers, workflow engines, and containers: Computational reproducibility with BioNix :BTYPE: article :CUSTOM_ID: Bedo2020 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:sec> <jats:title>Motivation</jats:title> <jats:p>A challenge for computational biologists is to make our analyses reproducible\textemdash{}i.e. to rerun, combine, and share, with the assurance that equivalent runs will generate identical results. Current best practice aims at this using a combination of package managers, workflow engines, and containers.</jats:p> </jats:sec> <jats:sec> <jats:title>Results</jats:title> <jats:p>We present BioNix, a lightweight library built on the Nix deployment system. BioNix manages software dependencies, computational environments, and workflow stages together using a single abstraction: pure functions. 
This lets users specify workflows in a clean, uniform way, with strong reproducibility guarantees.</jats:p> </jats:sec> <jats:sec> <jats:title>Availability and Implementation</jats:title> <jats:p>BioNix is implemented in the Nix expression language and is released on GitHub under the 3-clause BSD license: https://github.com/PapenfussLab/bionix (biotools:BioNix) (BioNix, RRID:SCR\_017662).</jats:p> </jats:sec> :AUTHOR: Bed\H{o}, Justin and Di Stefano, Leon and Papenfuss, Anthony T :DOI: 10.1093/gigascience/giaa121 :ISSUE: 11 :JOURNAL: GigaScience :LANGUAGE: en :MONTH: 11 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/gigascience/giaa121 :VOLUME: 9 :YEAR: 2020 :END: * Mobster: accurate detection of mobile element insertions in next generation sequencing data :PROPERTIES: :TITLE: Mobster: accurate detection of mobile element insertions in next generation sequencing data :BTYPE: article :CUSTOM_ID: Thung2014 :AUTHOR: Thung, Djie Tjwan and de Ligt, Joep and Vissers, Lisenka EM and Steehouwer, Marloes and Kroon, Mark and de Vries, Petra and Slagboom, Eline P and Ye, Kai and Veltman, Joris A and Hehir-Kwa, Jayne Y :DOI: 10.1186/s13059-014-0488-x :ISSUE: 10 :JOURNAL: Genome Biology :LANGUAGE: en :MONTH: 10 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1186/s13059-014-0488-x :VOLUME: 15 :YEAR: 2014 :END: * ANNOVAR documentation :PROPERTIES: :TITLE: ANNOVAR documentation :BTYPE: article :CUSTOM_ID: AnnovarDoc :ABSTRACT: Documentation for ANNOVAR software :AUTHOR: ANNOVAR :DOI: 10.1093/nar/gkz923/5603227 :URL: https://annovar.openbioinformatics.org/en/latest/ :YEAR: 2023 :END: * Variant calling: Considerations, practices, and developments :PROPERTIES: :TITLE: Variant calling: Considerations, practices, and developments :BTYPE: article :CUSTOM_ID: Zverinova2022 :AUTHOR: Zverinova, Stepanka and Guryev, Victor :DOI: 10.1002/humu.24311 :ISSUE: 8 :JOURNAL: Human Mutation :LANGUAGE: en :MONTH: 8 :PAGES: 976--985 
:PUBLISHER: Hindawi Limited :URL: http://dx.doi.org/10.1002/humu.24311 :VOLUME: 43 :YEAR: 2022 :END: * De novo genome assembly: what every biologist should know :PROPERTIES: :TITLE: De novo genome assembly: what every biologist should know :BTYPE: article :CUSTOM_ID: Baker2012 :AUTHOR: Baker, Monya :DOI: 10.1038/nmeth.1935 :ISSUE: 4 :JOURNAL: Nature Methods :LANGUAGE: en :MONTH: 4 :PAGES: 333--337 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/nmeth.1935 :VOLUME: 9 :YEAR: 2012 :END: * Variant calling and benchmarking in an era of complete human genome sequences :PROPERTIES: :TITLE: Variant calling and benchmarking in an era of complete human genome sequences :BTYPE: article :CUSTOM_ID: Olson :ABSTRACT: Genetic variant calling from DNA sequencing has enabled understanding of germline variation in hundreds of thousands of humans. Sequencing technologies and variant-calling methods have advanced rapidly, routinely providing reliable variant calls in most of the human genome. We describe how advances in long reads, deep learning, de novo assembly and pangenomes have expanded access to variant calls in increasingly challenging, repetitive genomic regions, including medically relevant regions, and how new benchmark sets and benchmarking methods illuminate their strengths and limitations. Finally, we explore the possible future of more complete characterization of human genome variation in light of the recent completion of a telomere-to-telomere human genome reference assembly and human pangenomes, and we consider the innovations needed to benchmark their newly accessible repetitive regions and complex variants. Variant calling is the process of identifying genetic variants, which is important for characterizing population genetic diversity and for identifying disease-associated variants in clinical sequencing projects. 
In this Review, the authors discuss the state-of-the-art in variant calling, focusing on challenging types of genetic variants, advances in both sequencing technologies and computational pipelines, and benchmarking strategies to assess the robustness of variant-calling strategies. :AUTHOR: Olson, Nathan D. and Wagner, Justin and Dwarshuis, Nathan and Miga, Karen H. and Sedlazeck, Fritz J. and Salit, Marc and Zook, Justin M. :DATE: 2023-04-14 :DOI: 10.1038/s41576-023-00590-0 :ISSN: 1471-0064 :ISSUE: 7 :JOURNAL: Nature Reviews Genetics :KEYWORDS: DNA sequencing :LANGUAGE: En :PUBLISHER: Nature Publishing Group :URL: http://dx.doi.org/10.1038/s41576-023-00590-0 :END: * Detection of long repeat expansions from PCR-free whole-genome sequence data :PROPERTIES: :TITLE: Detection of long repeat expansions from PCR-free whole-genome sequence data :BTYPE: article :CUSTOM_ID: Dolzhenko2017 :ABSTRACT: <jats:p>Identifying large expansions of short tandem repeats (STRs), such as those that cause amyotrophic lateral sclerosis (ALS) and fragile X syndrome, is challenging for short-read whole-genome sequencing (WGS) data. A solution to this problem is an important step toward integrating WGS into precision medicine. We developed a software tool called ExpansionHunter that, using PCR-free WGS short-read data, can genotype repeats at the locus of interest, even if the expanded repeat is larger than the read length. We applied our algorithm to WGS data from 3001 ALS patients who have been tested for the presence of the <jats:italic>C9orf72</jats:italic> repeat expansion with repeat-primed PCR (RP-PCR).
Compared against this truth data, ExpansionHunter correctly classified all (212/212, 95\% CI [0.98, 1.00]) of the expanded samples as either expansions (208) or potential expansions (4). Additionally, 99.9\% (2786/2789, 95\% CI [0.997, 1.00]) of the wild-type samples were correctly classified as wild type by this method with the remaining three samples identified as possible expansions. We further applied our algorithm to a set of 152 samples in which every sample had one of eight different pathogenic repeat expansions, including those associated with fragile X syndrome, Friedreich's ataxia, and Huntington's disease, and correctly flagged all but one of the known repeat expansions. Thus, ExpansionHunter can be used to accurately detect known pathogenic repeat expansions and provides researchers with a tool that can be used to identify new pathogenic repeat expansions.</jats:p> :AUTHOR: Dolzhenko, Egor and van Vugt, Joke J.F.A. and Shaw, Richard J. and Bekritsky, Mitchell A. and van Blitterswijk, Marka and Narzisi, Giuseppe and Ajay, Subramanian S. and Rajan, Vani and Lajoie, Bryan R. and Johnson, Nathan H. and Kingsbury, Zoya and Humphray, Sean J. and Schellevis, Raymond D. and Brands, William J. and Baker, Matt and Rademakers, Rosa and Kooyman, Maarten and Tazelaar, Gijs H.P. and van Es, Michael A. and McLaughlin, Russell and Sproviero, William and Shatunov, Aleksey and Jones, Ashley and Al Khleifat, Ahmad and Pittman, Alan and Morgan, Sarah and Hardiman, Orla and Al-Chalabi, Ammar and Shaw, Chris and Smith, Bradley and Neo, Edmund J. and Morrison, Karen and Shaw, Pamela J. and Reeves, Catherine and Winterkorn, Lara and Wexler, Nancy S. and Housman, David E. and Ng, Christopher W. and Li, Alina L. and Taft, Ryan J. and van den Berg, Leonard H. and Bentley, David R. and Veldink, Jan H. and Eberle, Michael A. 
:DOI: 10.1101/gr.225672.117 :ISSUE: 11 :JOURNAL: Genome Research :LANGUAGE: en :MONTH: 11 :PAGES: 1895--1903 :PUBLISHER: Cold Spring Harbor Laboratory :URL: http://dx.doi.org/10.1101/gr.225672.117 :VOLUME: 27 :YEAR: 2017 :END: * Centromere reference models for human chromosomes X and Y satellite arrays :PROPERTIES: :TITLE: Centromere reference models for human chromosomes X and Y satellite arrays :BTYPE: article :CUSTOM_ID: Miga2014 :ABSTRACT: <jats:p>The human genome sequence remains incomplete, with multimegabase-sized gaps representing the endogenous centromeres and other heterochromatic regions. Available sequence-based studies within these sites in the genome have demonstrated a role in centromere function and chromosome pairing, necessary to ensure proper chromosome segregation during cell division. A common genomic feature of these regions is the enrichment of long arrays of near-identical tandem repeats, known as satellite DNAs, which offer a limited number of variant sites to differentiate individual repeat copies across millions of bases. This substantial sequence homogeneity challenges available assembly strategies and, as a result, centromeric regions are omitted from ongoing genomic studies. To address this problem, we utilize monomer sequence and ordering information obtained from whole-genome shotgun reads to model two haploid human satellite arrays on chromosomes X and Y, resulting in an initial characterization of 3.83 Mb of centromeric DNA within an individual genome. To further expand the utility of each centromeric reference sequence model, we evaluate sites within the arrays for short-read mappability and chromosome specificity. Because satellite DNAs evolve in a concerted manner, we use these centromeric assemblies to assess the extent of sequence variation among 366 individuals from distinct human populations.
We thus identify two satellite array variants in both X and Y centromeres, as determined by array length and sequence composition. This study provides an initial sequence characterization of a regional centromere and establishes a foundation to extend genomic characterization to these sites as well as to other repeat-rich regions within complex genomes.</jats:p> :AUTHOR: Miga, Karen H. and Newton, Yulia and Jain, Miten and Altemose, Nicolas and Willard, Huntington F. and Kent, W. James :DOI: 10.1101/gr.159624.113 :ISSUE: 4 :JOURNAL: Genome Research :LANGUAGE: en :MONTH: 4 :PAGES: 697--707 :PUBLISHER: Cold Spring Harbor Laboratory :URL: http://dx.doi.org/10.1101/gr.159624.113 :VOLUME: 24 :YEAR: 2014 :END: * DNA Sequencing Costs: Data :PROPERTIES: :TITLE: DNA Sequencing Costs: Data :BTYPE: article :CUSTOM_ID: Wetterstrand :ABSTRACT: Data used to estimate the cost of sequencing the human genome over time since the Human Genome Project. :AUTHOR: Wetterstrand, Kris A. :URL: https://www.genome.gov/about-genomics/fact-sheets/DNA-Sequencing-Costs-Data :END: * Sustainable packaging of quantum chemistry software with the Nix package manager :PROPERTIES: :TITLE: Sustainable packaging of quantum chemistry software with the Nix package manager :BTYPE: article :CUSTOM_ID: Kowalewski2022 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>The installation of quantum chemistry software packages is commonly done manually and can be a time-consuming and complicated process. An update of the underlying Linux system requires a reinstallation in many cases and can quietly break software installed on the system. In this paper, we present an approach that allows for an easy installation of quantum chemistry software packages, which is also independent of operating system updates. The use of the Nix package manager allows building software in a reproducible manner, which allows for a reconstruction of the software for later reproduction of scientific results. 
The build recipes that are provided can be readily used by anyone to avoid complex installation procedures.</jats:p> :AUTHOR: Kowalewski, Markus and Seeber, Phillip :DOI: 10.1002/qua.26872 :ISSUE: 9 :JOURNAL: International Journal of Quantum Chemistry :LANGUAGE: en :MONTH: 5 :PUBLISHER: Wiley :URL: http://dx.doi.org/10.1002/qua.26872 :VOLUME: 122 :YEAR: 2022 :END: * Snakemake\textemdash{}a scalable bioinformatics workflow engine :PROPERTIES: :TITLE: Snakemake\textemdash{}a scalable bioinformatics workflow engine :BTYPE: article :CUSTOM_ID: Koster2012 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:p>Summary: Snakemake is a workflow engine that provides a readable Python-based workflow definition language and a powerful execution environment that scales from single-core workstations to compute clusters without modifying the workflow. It is the first system to support the use of automatically inferred multiple named wildcards (or variables) in input and output filenames.</jats:p> <jats:p>Availability: http://snakemake.googlecode.com.</jats:p> <jats:p>Contact: johannes.koester@uni-due.de</jats:p> :AUTHOR: K\"{o}ster, Johannes and Rahmann, Sven :DOI: 10.1093/bioinformatics/bts480 :ISSUE: 19 :JOURNAL: Bioinformatics :LANGUAGE: en :MONTH: 10 :PAGES: 2520--2522 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/bioinformatics/bts480 :VOLUME: 28 :YEAR: 2012 :END: * An open resource for accurately benchmarking small variant and reference calls :PROPERTIES: :TITLE: An open resource for accurately benchmarking small variant and reference calls :BTYPE: article :CUSTOM_ID: Zook2019 :AUTHOR: Zook, Justin M. and McDaniel, Jennifer and Olson, Nathan D. and Wagner, Justin and Parikh, Hemang and Heaton, Haynes and Irvine, Sean A. and Trigg, Len and Truty, Rebecca and McLean, Cory Y. and De La Vega, Francisco M. 
and Xiao, Chunlin and Sherry, Stephen and Salit, Marc :DOI: 10.1038/s41587-019-0074-6 :ISSUE: 5 :JOURNAL: Nature Biotechnology :LANGUAGE: en :MONTH: 5 :PAGES: 561--566 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41587-019-0074-6 :VOLUME: 37 :YEAR: 2019 :END: * Comparison of Short-Read Sequence Aligners Indicates Strengths and Weaknesses for Biologists to Consider :PROPERTIES: :TITLE: Comparison of Short-Read Sequence Aligners Indicates Strengths and Weaknesses for Biologists to Consider :BTYPE: article :CUSTOM_ID: Musich2021 :ABSTRACT: <jats:p>Aligning short-read sequences is the foundational step to most genomic and transcriptomic analyses, but not all tools perform equally, and choosing among the growing body of available tools can be daunting. Here, in order to increase awareness in the research community, we discuss the merits of common algorithms and programs in a way that should be approachable to biologists with limited experience in bioinformatics. We will only in passing consider the effects of data cleanup, a precursor analysis to most alignment tools, and no consideration will be given to downstream processing of the aligned fragments. To compare aligners [Bowtie2, Burrows Wheeler Aligner (BWA), HISAT2, MUMmer4, STAR, and TopHat2], an RNA-seq dataset was used containing data from 48 geographically distinct samples of the grapevine powdery mildew fungus <jats:italic>Erysiphe necator</jats:italic>. Based on alignment rate and gene coverage, all aligners performed well with the exception of TopHat2, which HISAT2 superseded. BWA perhaps had the best performance in these metrics, except for longer transcripts (\textgreater{}500 bp) for which HISAT2 and STAR performed well. HISAT2 was \textasciitilde{}3-fold faster than the next fastest aligner in runtime, which we consider a secondary factor in most alignments.
At the end, this direct comparison of commonly used aligners illustrates key considerations when choosing which tool to use for the specific sequencing data and objectives. No single tool meets all needs for every user, and there are many quality aligners available.</jats:p> :AUTHOR: Musich, Ryan and Cadle-Davidson, Lance and Osier, Michael V. :DOI: 10.3389/fpls.2021.657240 :JOURNAL: Frontiers in Plant Science :MONTH: 4 :PUBLISHER: Frontiers Media SA :URL: http://dx.doi.org/10.3389/fpls.2021.657240 :VOLUME: 12 :YEAR: 2021 :END: * Recommendations for whole genome sequencing in diagnostics for rare diseases :PROPERTIES: :TITLE: Recommendations for whole genome sequencing in diagnostics for rare diseases :BTYPE: article :CUSTOM_ID: Souche2022 :ABSTRACT: In 2016, guidelines for diagnostic Next Generation Sequencing (NGS) have been published by EuroGentest in order to assist laboratories in the implementation and accreditation of NGS in a diagnostic setting. These guidelines mainly focused on Whole Exome Sequencing (WES) and targeted (gene panels) sequencing detecting small germline variants (Single Nucleotide Variants (SNVs) and insertions/deletions (indels)). Since then, Whole Genome Sequencing (WGS) has been increasingly introduced in the diagnosis of rare diseases as WGS allows the simultaneous detection of SNVs, Structural Variants (SVs) and other types of variants such as repeat expansions. The use of WGS in diagnostics warrants the re-evaluation and update of previously published guidelines. This work was jointly initiated by EuroGentest and the Horizon2020 project Solve-RD. Statements from the 2016 guidelines have been reviewed in the context of WGS and updated where necessary. The aim of these recommendations is primarily to list the points to consider for clinical (laboratory) geneticists, bioinformaticians, and (non-)geneticists, to provide technical advice, aid clinical decision-making and the reporting of the results. 
:AUTHOR: Souche, Erika and Beltran, Sergi and Brosens, Erwin and Belmont, John W. and Fossum, Magdalena and Riess, Olaf and Gilissen, Christian and Ardeshirdavani, Amin and Houge, Gunnar and van Gijn, Marielle and Clayton-Smith, Jill and Synofzik, Matthis and de Leeuw, Nicole and Deans, Zandra C. and Dincer, Yasemin and Eck, Sebastian H. and van der Crabben, Saskia and Balasubramanian, Meena and Graessner, Holm and Sturm, Marc and Firth, Helen and Ferlini, Alessandra and Nabbout, Rima and De Baere, Elfride and Liehr, Thomas and Macek, Milan and Matthijs, Gert and Scheffer, Hans and Bauer, Peter and Yntema, Helger G. and Weiss, Marjan M. :DATE: 2022-05-16 :DOI: 10.1038/s41431-022-01113-x :ISSN: 1476-5438 :ISSUE: 9 :JOURNAL: European Journal of Human Genetics :KEYWORDS: Medical genetics :LANGUAGE: En :PUBLISHER: Nature Publishing Group :URL: https://www.nature.com/articles/s41431-022-01113-x :VOLUME: 30 :YEAR: 2022 :END: * SMIM1 variants rs1175550 and rs143702418 independently modulate Vel blood group antigen expression :PROPERTIES: :TITLE: SMIM1 variants rs1175550 and rs143702418 independently modulate Vel blood group antigen expression :BTYPE: article :CUSTOM_ID: Christophersen2017 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>The Vel blood group antigen is expressed on the red blood cells of most individuals. Recently, we described that homozygosity for inactivating mutations in <jats:italic>SMIM1</jats:italic> defines the rare Vel-negative phenotype. Still, Vel-positive individuals show great variability in Vel antigen expression, creating a risk for Vel blood typing errors and transfusion reactions. We fine-mapped the regulatory region located in <jats:italic>SMIM1</jats:italic> intron 2 in Swedish blood donors, and observed a strong correlation between expression and rs1175550 as well as with a previously unreported tri-nucleotide insertion (rs143702418; C\hspace{0.167em}\>\hspace{0.167em}CGCA). 
While the two variants are tightly linked in Caucasians, we separated their effects in African Americans, and found that rs1175550G and to a lesser extent rs143702418C independently increase <jats:italic>SMIM1</jats:italic> and Vel antigen expression. Gel shift and luciferase assays indicate that both variants are transcriptionally active, and we identified binding of the transcription factor TAL1 as a potential mediator of the increased expression associated with rs1175550G. Our results provide insight into the regulatory logic of Vel antigen expression, and extend the set of markers for genetic Vel blood group typing.</jats:p> :AUTHOR: Christophersen, Mikael K. and J\"{o}ud, Magnus and Ajore, Ram and Vege, Sunitha and Ljungdahl, Klara W. and Westhoff, Connie M. and Olsson, Martin L. and Storry, Jill R. and Nilsson, Bj\"{o}rn :DOI: 10.1038/srep40451 :ISSUE: 1 :JOURNAL: Scientific Reports :LANGUAGE: en :MONTH: 1 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/srep40451 :VOLUME: 7 :YEAR: 2017 :END: * Comparing Variant Call Files for Performance Benchmarking of Next-Generation Sequencing Variant Calling Pipelines :PROPERTIES: :TITLE: Comparing Variant Call Files for Performance Benchmarking of Next-Generation Sequencing Variant Calling Pipelines :BTYPE: misc :CUSTOM_ID: Cleary2015 :ABSTRACT: <jats:p>To evaluate and compare the performance of variant calling methods and their confidence scores, comparisons between a test call set and a `gold standard' need to be carried out. Unfortunately, these comparisons are not straightforward with the current Variant Call Files (VCF), which are the standard output of most variant calling algorithms for high-throughput sequencing data. Comparisons of VCFs are often confounded by the different representations of indels, MNPs, and combinations thereof with SNVs in complex regions of the genome, resulting in misleading results.
A variant caller is inherently a classification method designed to score putative variants with confidence scores that could permit controlling the rate of false positives (FP) or false negatives (FN) for a given application. Receiver operator curves (ROC) and the area under the ROC (AUC) are efficient metrics to evaluate a test call set versus a gold standard. However, in the case of VCF data this also requires a special accounting to deal with discrepant representations. We developed a novel algorithm for comparing variant call sets that deals with complex call representation discrepancies and through a dynamic programing method that minimizes false positives and negatives globally across the entire call sets for accurate performance evaluation of VCFs.</jats:p> :AUTHOR: Cleary, John G. and Braithwaite, Ross and Gaastra, Kurt and Hilbush, Brian S and Inglis, Stuart and Irvine, Sean A and Jackson, Alan and Littin, Richard and Rathod, Mehul and Ware, David and Zook, Justin M. and Trigg, Len and De La Vega, Francisco M. :DOI: 10.1101/023754 :MONTH: 8 :PUBLISHER: Cold Spring Harbor Laboratory :URL: http://dx.doi.org/10.1101/023754 :YEAR: 2015 :END: * Systematic comparison of germline variant calling pipelines cross multiple next-generation sequencers :PROPERTIES: :TITLE: Systematic comparison of germline variant calling pipelines cross multiple next-generation sequencers :BTYPE: article :CUSTOM_ID: Chen2019 :ABSTRACT: The development and innovation of next generation sequencing (NGS) and the subsequent analysis tools have gain popularity in scientific researches and clinical diagnostic applications. Hence, a systematic comparison of the sequencing platforms and variant calling pipelines could provide significant guidance to NGS-based scientific and clinical genomics. 
In this study, we compared the performance, concordance and operating efficiency of 27 combinations of sequencing platforms and variant calling pipelines, testing three variant calling pipelines\textemdash{}Genome Analysis Tool Kit HaplotypeCaller, Strelka2 and Samtools-Varscan2 for nine data sets for the NA12878 genome sequenced by different platforms including BGISEQ500, MGISEQ2000, HiSeq4000, NovaSeq and HiSeq Xten. For the variants calling performance of 12 combinations in WES datasets, all combinations displayed good performance in calling SNPs, with their F-scores entirely higher than 0.96, and their performance in calling INDELs varies from 0.75 to 0.91. And all 15 combinations in WGS datasets also manifested good performance, with F-scores in calling SNPs were entirely higher than 0.975 and their performance in calling INDELs varies from 0.71 to 0.93. All of these combinations manifested high concordance in variant identification, while the divergence of variants identification in WGS datasets were larger than that in WES datasets. We also down-sampled the original WES and WGS datasets at a series of gradient coverage across multiple platforms, then the variants calling period consumed by the three pipelines at each coverage were counted, respectively. For the GIAB datasets on both BGI and Illumina platforms, Strelka2 manifested its ultra-performance in detecting accuracy and processing efficiency compared with other two pipelines on each sequencing platform, which was recommended in the further promotion and application of next generation sequencing technology. The results of our researches will provide useful and comprehensive guidelines for personal or organizational researchers in reliable and consistent variants identification. 
:AUTHOR: Chen, Jiayun and Li, Xingsong and Zhong, Hongbin and Meng, Yuhuan and Du, Hongli :DATE: 2019-06-27 :DOI: 10.1038/s41598-019-45835-3 :ISSN: 2045-2322 :ISSUE: 1 :JOURNAL: Scientific Reports :KEYWORDS: DNA sequencing :LANGUAGE: En :PUBLISHER: Nature Publishing Group :URL: https://www.nature.com/articles/s41598-019-45835-3 :VOLUME: 9 :YEAR: 2019 :END: * Strelka2: fast and accurate calling of germline and somatic variants :PROPERTIES: :TITLE: Strelka2: fast and accurate calling of germline and somatic variants :BTYPE: article :CUSTOM_ID: Kim2018 :AUTHOR: Kim, Sangtae and Scheffler, Konrad and Halpern, Aaron L. and Bekritsky, Mitchell A. and Noh, Eunho and K\"{a}llberg, Morten and Chen, Xiaoyu and Kim, Yeonbin and Beyter, Doruk and Krusche, Peter and Saunders, Christopher T. :DOI: 10.1038/s41592-018-0051-x :ISSUE: 8 :JOURNAL: Nature Methods :LANGUAGE: en :MONTH: 8 :PAGES: 591--594 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41592-018-0051-x :VOLUME: 15 :YEAR: 2018 :END: * Systematic Evaluation of Sanger Validation of Next-Generation Sequencing Variants :PROPERTIES: :TITLE: Systematic Evaluation of Sanger Validation of Next-Generation Sequencing Variants :BTYPE: article :CUSTOM_ID: Beck2016 :ABSTRACT: BACKGROUND: Next-generation sequencing (NGS) data are used for both clinical care and clinical research.
DNA sequence variants identified using NGS are :AUTHOR: Beck, Tyler F and Mullikin, James C and {NISC Comparative Sequencing Program} and Biesecker, Leslie G :DOI: 10.1373/clinchem.2015.249623 :ISSN: 0009-9147 :ISSUE: 4 :JOURNAL: Clinical Chemistry :PUBLISHER: Oxford Academic :URL: https://dx.doi.org/10.1373/clinchem.2015.249623 :VOLUME: 62 :YEAR: 2016 :END: * Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications :PROPERTIES: :TITLE: Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications :BTYPE: article :CUSTOM_ID: Chen2016 :ABSTRACT: <jats:p>Summary: We describe Manta, a method to discover structural variants and indels from next generation sequencing data. Manta is optimized for rapid germline and somatic analysis, calling structural variants, medium-sized indels and large insertions on standard compute hardware in less than a tenth of the time that comparable methods require to identify only subsets of these variant types: for example NA12878 at 50\texttimes{} genomic coverage is analyzed in less than 20\hspace{0.167em}min. Manta can discover and score variants based on supporting paired and split-read evidence, with scoring models optimized for germline analysis of diploid individuals and somatic analysis of tumor-normal sample pairs. Call quality is similar to or better than comparable methods, as determined by pedigree consistency of germline calls and comparison of somatic calls to COSMIC database variants. Manta consistently assembles a higher fraction of its calls to base-pair resolution, allowing for improved downstream annotation and analysis of clinical significance. We provide Manta as a community resource to facilitate practical and routine structural variant analysis in clinical and research sequencing scenarios.</jats:p> <jats:p>Availability and implementation: Manta is released under the open-source GPLv3 license.
Source code, documentation and Linux binaries are available from https://github.com/Illumina/manta.</jats:p> <jats:p>Contact: csaunders@illumina.com</jats:p> <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.</jats:p> :AUTHOR: Chen, Xiaoyu and Schulz-Trieglaff, Ole and Shaw, Richard and Barnes, Bret and Schlesinger, Felix and K\"{a}llberg, Morten and Cox, Anthony J. and Kruglyak, Semyon and Saunders, Christopher T. :DOI: 10.1093/bioinformatics/btv710 :ISSUE: 8 :JOURNAL: Bioinformatics :LANGUAGE: en :MONTH: 4 :PAGES: 1220--1222 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/bioinformatics/btv710 :VOLUME: 32 :YEAR: 2016 :END: * Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications :PROPERTIES: :TITLE: Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications :BTYPE: article :CUSTOM_ID: Rimmer2014 :AUTHOR: Rimmer, Andy and Phan, Hang and Mathieson, Iain and Iqbal, Zamin and Twigg, Stephen R F and Wilkie, Andrew O M and McVean, Gil and Lunter, Gerton :DOI: 10.1038/ng.3036 :ISSUE: 8 :JOURNAL: Nature Genetics :LANGUAGE: en :MONTH: 8 :PAGES: 912--918 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/ng.3036 :VOLUME: 46 :YEAR: 2014 :END: * New evaluation methods of read mapping by 17 aligners on simulated and empirical NGS data: an updated comparison of DNA- and RNA-Seq data from Illumina and Ion Torrent technologies :PROPERTIES: :TITLE: New evaluation methods of read mapping by 17 aligners on simulated and empirical NGS data: an updated comparison of DNA- and RNA-Seq data from Illumina and Ion Torrent technologies :BTYPE: article :CUSTOM_ID: Donato2021 :AUTHOR: Donato, Luigi and Scimone, Concetta and Rinaldi, Carmela and D'Angelo, Rosalia and Sidoti, Antonina :DOI: 10.1007/s00521-021-06188-z :ISSUE: 22 :JOURNAL: Neural Computing
and Applications :LANGUAGE: en :MONTH: 11 :PAGES: 15669--15692 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1007/s00521-021-06188-z :VOLUME: 33 :YEAR: 2021 :END: * Systematic comparison of variant calling pipelines using gold standard personal exome variants :PROPERTIES: :TITLE: Systematic comparison of variant calling pipelines using gold standard personal exome variants :BTYPE: article :CUSTOM_ID: Hwang2015 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>The success of clinical genomics using next generation sequencing (NGS) requires the accurate and consistent identification of personal genome variants. Assorted variant calling methods have been developed, which show low concordance between their calls. Hence, a systematic comparison of the variant callers could give important guidance to NGS-based clinical genomics. Recently, a set of high-confident variant calls for one individual (NA12878) has been published by the Genome in a Bottle (GIAB) consortium, enabling performance benchmarking of different variant calling pipelines. Based on the gold standard reference variant calls from GIAB, we compared the performance of thirteen variant calling pipelines, testing combinations of three read aligners\textemdash{}BWA-MEM, Bowtie2 and Novoalign\textemdash{}and four variant callers\textemdash{}Genome Analysis Tool Kit HaplotypeCaller (GATK-HC), Samtools mpileup, Freebayes and Ion Proton Variant Caller (TVC), for twelve data sets for the NA12878 genome sequenced by different platforms including Illumina2000, Illumina2500 and Ion Proton, with various exome capture systems and exome coverage. We observed different biases toward specific types of SNP genotyping errors by the different variant callers. The results of our study provide useful guidelines for reliable variant identification from deep sequencing of personal genomes.</jats:p> :AUTHOR: Hwang, Sohyun and Kim, Eiru and Lee, Insuk and Marcotte, Edward M. 
:DOI: 10.1038/srep17875 :ISSUE: 1 :JOURNAL: Scientific Reports :LANGUAGE: en :MONTH: 12 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/srep17875 :VOLUME: 5 :YEAR: 2015 :END: * Extensive sequencing of seven human genomes to characterize benchmark reference materials :PROPERTIES: :TITLE: Extensive sequencing of seven human genomes to characterize benchmark reference materials :BTYPE: article :CUSTOM_ID: Zook2016 :ABSTRACT: The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST) is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a ... :AUTHOR: Zook, Justin M. and Catoe, David and McDaniel, Jennifer and Vang, Lindsay and Spies, Noah and Sidow, Arend and Weng, Ziming and Liu, Yuling and Mason, Christopher E. and Alexander, Noah and Henaff, Elizabeth and McIntyre, Alexa B.R. and Chandramohan, Dhruva and Chen, Feng and Jaeger, Erich and Moshrefi, Ali and Pham, Khoa and Stedman, William and Liang, Tiffany and Saghbini, Michael and Dzakula, Zeljko and Hastie, Alex and Cao, Han and Deikus, Gintaras and Schadt, Eric and Sebra, Robert and Bashir, Ali and Truty, Rebecca M. and Chang, Christopher C. and Gulbahce, Natali and Zhao, Keyan and Ghosh, Srinka and Hyland, Fiona and Fu, Yutao and Chaisson, Mark and Xiao, Chunlin and Trow, Jonathan and Sherry, Stephen T. and Zaranek, Alexander W. and Ball, Madeleine and Bobe, Jason and Estep, Preston and Church, George M. and Marks, Patrick and Kyriazopoulou-Panagiotopoulou, Sofia and Zheng, Grace X.Y. and Schnall-Levin, Michael and Ordonez, Heather S. and Mudivarti, Patrice A. 
and Giorda, Kristina and Sheng, Ying and Rypdal, Karoline Bjarnesdatter and Salit, Marc :DATE: 2016 :DOI: 10.1038/sdata.2016.25 :JOURNAL: Scientific Data :LANGUAGE: en :PUBLISHER: Nature Publishing Group :URL: http://dx.doi.org/10.1038/sdata.2016.25 :VOLUME: 3 :YEAR: 2016 :END: * A robust benchmark for detection of germline large deletions and insertions :PROPERTIES: :TITLE: A robust benchmark for detection of germline large deletions and insertions :BTYPE: article :CUSTOM_ID: Zook2020 :AUTHOR: Zook, Justin M. and Hansen, Nancy F. and Olson, Nathan D. and Chapman, Lesley and Mullikin, James C. and Xiao, Chunlin and Sherry, Stephen and Koren, Sergey and Phillippy, Adam M. and Boutros, Paul C. and Sahraeian, Sayed Mohammad E. and Huang, Vincent and Rouette, Alexandre and Alexander, Noah and Mason, Christopher E. and Hajirasouliha, Iman and Ricketts, Camir and Lee, Joyce and Tearle, Rick and Fiddes, Ian T. and Barrio, Alvaro Martinez and Wala, Jeremiah and Carroll, Andrew and Ghaffari, Noushin and Rodriguez, Oscar L. and Bashir, Ali and Jackman, Shaun and Farrell, John J. and Wenger, Aaron M. and Alkan, Can and Soylev, Arda and Schatz, Michael C. and Garg, Shilpa and Church, George and Marschall, Tobias and Chen, Ken and Fan, Xian and English, Adam C. and Rosenfeld, Jeffrey A. and Zhou, Weichen and Mills, Ryan E. and Sage, Jay M. and Davis, Jennifer R. and Kaiser, Michael D. and Oliver, John S. and Catalano, Anthony P. and Chaisson, Mark J. P. and Spies, Noah and Sedlazeck, Fritz J.
and Salit, Marc :DOI: 10.1038/s41587-020-0538-8 :ISSUE: 11 :JOURNAL: Nature Biotechnology :LANGUAGE: en :MONTH: 11 :PAGES: 1347--1355 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41587-020-0538-8 :VOLUME: 38 :YEAR: 2020 :END: * Next-generation sequencing technologies: An overview :PROPERTIES: :TITLE: Next-generation sequencing technologies: An overview :BTYPE: article :CUSTOM_ID: Hu2021 :AUTHOR: Hu, Taishan and Chitnis, Nilesh and Monos, Dimitri and Dinh, Anh :DOI: 10.1016/j.humimm.2021.02.012 :ISSUE: 11 :JOURNAL: Human Immunology :LANGUAGE: en :MONTH: 11 :PAGES: 801--811 :PUBLISHER: Elsevier BV :URL: http://dx.doi.org/10.1016/j.humimm.2021.02.012 :VOLUME: 82 :YEAR: 2021 :END: * Guidelines for diagnostic next-generation sequencing :PROPERTIES: :TITLE: Guidelines for diagnostic next-generation sequencing :BTYPE: article :CUSTOM_ID: Matthijs2015 :ABSTRACT: We present, on behalf of EuroGentest and the European Society of Human Genetics, guidelines for the evaluation and validation of next-generation sequencing (NGS) applications for the diagnosis of genetic disorders. The work was performed by a group of laboratory geneticists and bioinformaticians, and discussed with clinical geneticists, industry and patients' representatives, and other stakeholders in the field of human genetics. The statements that were written during the elaboration of the guidelines are presented here. The background document and full guidelines are available as supplementary material. They include many examples to assist the laboratories in the implementation of NGS and accreditation of this service. The work and ideas presented by others in guidelines that have emerged elsewhere in the course of the past few years were also considered and are acknowledged in the full text. Interestingly, a few new insights that have not been cited before have emerged during the preparation of the guidelines. 
The most important new feature is the presentation of a `rating system' for NGS-based diagnostic tests. The guidelines and statements have been applauded by the genetic diagnostic community, and thus seem to be valuable for the harmonization and quality assurance of NGS diagnostics in Europe. :AUTHOR: Matthijs, Gert and Souche, Erika and Alders, Mari\"{e}lle and Corveleyn, Anniek and Eck, Sebastian and Feenstra, Ilse and Race, Val\'{e}rie and Sistermans, Erik and Sturm, Marc and Weiss, Marjan and Yntema, Helger and Bakker, Egbert and Scheffer, Hans and Bauer, Peter :DATE: 2015-10-28 :DOI: 10.1038/ejhg.2015.226 :ISSN: 1476-5438 :ISSUE: 1 :JOURNAL: European Journal of Human Genetics :KEYWORDS: Genetic testing :LANGUAGE: En :PUBLISHER: Nature Publishing Group :URL: https://www.nature.com/articles/ejhg2015226 :VOLUME: 24 :YEAR: 2015 :END: * Ruffus: a lightweight Python library for computational pipelines :PROPERTIES: :TITLE: Ruffus: a lightweight Python library for computational pipelines :BTYPE: article :CUSTOM_ID: Goodstadt :ABSTRACT: Abstract. Summary: Computational pipelines are common place in scientific research. However, most of the resources for constructing pipelines are heavyweight sy :AUTHOR: Goodstadt, Leo :DOI: 10.1093/bioinformatics/btq524 :ISSN: 1367-4803 :ISSUE: 21 :JOURNAL: Bioinformatics :PUBLISHER: Oxford Academic :URL: https://dx.doi.org/10.1093/bioinformatics/btq524 :VOLUME: 26 :END: * A synthetic-diploid benchmark for accurate variant-calling evaluation :PROPERTIES: :TITLE: A synthetic-diploid benchmark for accurate variant-calling evaluation :BTYPE: article :CUSTOM_ID: Li2018 :AUTHOR: Li, Heng and Bloom, Jonathan M. 
and Farjoun, Yossi and Fleharty, Mark and Gauthier, Laura and Neale, Benjamin and MacArthur, Daniel :DOI: 10.1038/s41592-018-0054-7 :ISSUE: 8 :JOURNAL: Nature Methods :LANGUAGE: en :MONTH: 8 :PAGES: 595--597 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41592-018-0054-7 :VOLUME: 15 :YEAR: 2018 :END: * VC@Scale: Scalable and high-performance variant calling on cluster environments :PROPERTIES: :TITLE: VC@Scale: Scalable and high-performance variant calling on cluster environments :BTYPE: article :CUSTOM_ID: Ahmad2021 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:sec> <jats:title>Background</jats:title> <jats:p>Recently many new deep learning\textendash{}based variant-calling methods like DeepVariant have emerged as more accurate compared with conventional variant-calling algorithms such as GATK HaplotypeCaller, Sterlka2, and Freebayes albeit at higher computational costs. Therefore, there is a need for more scalable and higher performance workflows of these deep learning methods. Almost all existing cluster-scaled variant-calling workflows that use Apache Spark/Hadoop as big data frameworks loosely integrate existing single-node pre-processing and variant-calling applications. Using Apache Spark just for distributing/scheduling data among loosely coupled applications or using I/O-based storage for storing the output of intermediate applications does not exploit the full benefit of Apache Spark in-memory processing. To achieve this, we propose a native Spark-based workflow that uses Python and Apache Arrow to enable efficient transfer of data between different workflow stages. This benefits from the ease of programmability of Python and the high efficiency of Arrow's columnar in-memory data transformations.</jats:p> </jats:sec> <jats:sec> <jats:title>Results</jats:title> <jats:p>Here we present a scalable, parallel, and efficient implementation of next-generation sequencing data pre-processing and variant-calling workflows. 
Our design tightly integrates most pre-processing workflow stages, using Spark built-in functions to sort reads by coordinates and mark duplicates efficiently. Our approach outperforms state-of-the-art implementations by \&gt;2 times for the pre-processing stages, creating a scalable and high-performance solution for DeepVariant for both CPU-only and CPU + GPU clusters.</jats:p> </jats:sec> <jats:sec> <jats:title>Conclusions</jats:title> <jats:p>We show the feasibility and easy scalability of our approach to achieve high performance and efficient resource utilization for variant-calling analysis on high-performance computing clusters using the standardized Apache Arrow data representations. All codes, scripts, and configurations used to run our implementations are publicly available and open sourced; see https://github.com/abs-tudelft/variant-calling-at-scale.</jats:p> </jats:sec> :AUTHOR: Ahmad, Tanveer and Al Ars, Zaid and Hofstee, H Peter :DOI: 10.1093/gigascience/giab057 :ISSUE: 9 :JOURNAL: GigaScience :LANGUAGE: en :MONTH: 9 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/gigascience/giab057 :VOLUME: 10 :YEAR: 2021 :END: * Nextflow enables reproducible computational workflows :PROPERTIES: :TITLE: Nextflow enables reproducible computational workflows :BTYPE: article :CUSTOM_ID: DiTommaso2017 :AUTHOR: Di Tommaso, Paolo and Chatzou, Maria and Floden, Evan W and Barja, Pablo Prieto and Palumbo, Emilio and Notredame, Cedric :DOI: 10.1038/nbt.3820 :ISSUE: 4 :JOURNAL: Nature Biotechnology :LANGUAGE: en :MONTH: 4 :PAGES: 316--319 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/nbt.3820 :VOLUME: 35 :YEAR: 2017 :END: * Heterozygosity Ratio, a Robust Global Genomic Measure of Autozygosity and Its Association with Height and Disease Risk :PROPERTIES: :TITLE: Heterozygosity Ratio, a Robust Global Genomic Measure of Autozygosity and Its Association with Height and Disease Risk :BTYPE: article :CUSTOM_ID: 
Samuels2016 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:p>Greater genetic variability in an individual is protective against recessive disease. However, existing quantifications of autozygosity, such as runs of homozygosity (ROH), have proved highly sensitive to genotyping density and have yielded inconclusive results about the relationship of diversity and disease risk. Using genotyping data from three data sets with \>43,000 subjects, we demonstrated that an alternative approach to quantifying genetic variability, the heterozygosity ratio, is a robust measure of diversity and is positively associated with the nondisease trait height and several disease phenotypes in subjects of European ancestry. The heterozygosity ratio is the number of heterozygous sites in an individual divided by the number of nonreference homozygous sites and is strongly affected by the degree of genetic admixture of the population and varies across human populations. Unlike quantifications of ROH, the heterozygosity ratio is not sensitive to the density of genotyping performed. 
Our results establish the heterozygosity ratio as a powerful new statistic for exploring the patterns and phenotypic effects of different levels of genetic variation in populations.</jats:p> :AUTHOR: Samuels, David C and Wang, Jing and Ye, Fei and He, Jing and Levinson, Rebecca T and Sheng, Quanhu and Zhao, Shilin and Capra, John A and Shyr, Yu and Zheng, Wei and Guo, Yan :DOI: 10.1534/genetics.116.189936 :ISSUE: 3 :JOURNAL: Genetics :LANGUAGE: en :MONTH: 11 :PAGES: 893--904 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1534/genetics.116.189936 :VOLUME: 204 :YEAR: 2016 :END: * Declining transition/transversion ratios through time reveal limitations to the accuracy of nucleotide substitution models :PROPERTIES: :TITLE: Declining transition/transversion ratios through time reveal limitations to the accuracy of nucleotide substitution models :BTYPE: article :CUSTOM_ID: Duchene2015 :AUTHOR: Duch\^{e}ne, Sebasti\'{a}n and Ho, Simon YW and Holmes, Edward C :DOI: 10.1186/s12862-015-0312-6 :ISSUE: 1 :JOURNAL: BMC Evolutionary Biology :LANGUAGE: en :MONTH: 12 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1186/s12862-015-0312-6 :VOLUME: 15 :YEAR: 2015 :END: * Coming of age: ten years of next-generation sequencing technologies :PROPERTIES: :TITLE: Coming of age: ten years of next-generation sequencing technologies :BTYPE: article :CUSTOM_ID: Goodwin2016 :AUTHOR: Goodwin, Sara and McPherson, John D. and McCombie, W. 
Richard :DOI: 10.1038/nrg.2016.49 :ISSUE: 6 :JOURNAL: Nature Reviews Genetics :LANGUAGE: en :MONTH: 6 :PAGES: 333--351 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/nrg.2016.49 :VOLUME: 17 :YEAR: 2016 :END: * SPiP: Splicing Prediction Pipeline, a machine learning tool for massive detection of exonic and intronic variant effects on mRNA splicing :PROPERTIES: :TITLE: SPiP: Splicing Prediction Pipeline, a machine learning tool for massive detection of exonic and intronic variant effects on mRNA splicing :BTYPE: article :CUSTOM_ID: Leman2022 :AUTHOR: Leman, Rapha\"{e}l and Parfait, B\'{e}atrice and Vidaud, Dominique and Girodon, Emmanuelle and Pacot, Laurence and Le Gac, G\'{e}rald and Ka, Chandran and Ferec, Claude and Fichou, Yann and Quesnelle, C\'{e}line and Aucouturier, Camille and Muller, Etienne and Vaur, Dominique and Castera, Laurent and Boulouard, Flavie and Ricou, Agathe and Tubeuf, H\'{e}l\`{e}ne and Soukarieh, Omar and Gaildrat, Pascaline and Riant, Florence and Guillaud-Bataille, Marine and Caputo, Sandrine M. and Caux-Moncoutier, Virginie and Boutry-Kryza, Nadia and Bonnet-Dorion, Fran\c{c}oise and Schultz, Ines and Rossing, Maria and Quenez, Olivier and Goldenberg, Louis and Harter, Valentin and Parsons, Michael T. and Spurdle, Amanda B. and Fr\'{e}bourg, Thierry and Martins, Alexandra and Houdayer, Claude and Krieger, Sophie :DOI: 10.1002/humu.24491 :ISSUE: 12 :JOURNAL: Human Mutation :LANGUAGE: en :MONTH: 12 :PAGES: 2308--2323 :PUBLISHER: Hindawi Limited :URL: http://dx.doi.org/10.1002/humu.24491 :VOLUME: 43 :YEAR: 2022 :END: * Recommendations for the Use of in Silico Approaches for Next-Generation Sequencing Bioinformatic Pipeline Validation :PROPERTIES: :TITLE: Recommendations for the Use of in Silico Approaches for Next-Generation Sequencing Bioinformatic Pipeline Validation :BTYPE: article :CUSTOM_ID: Duncavage2023 :AUTHOR: Duncavage, Eric J. and Coleman, Joshua F. and de Baca, Monica E. 
and Kadri, Sabah and Leon, Annette and Routbort, Mark and Roy, Somak and Suarez, Carlos J. and Vanderbilt, Chad and Zook, Justin M. :DOI: 10.1016/j.jmoldx.2022.09.007 :ISSUE: 1 :JOURNAL: The Journal of Molecular Diagnostics :LANGUAGE: en :MONTH: 1 :PAGES: 3--16 :PUBLISHER: Elsevier BV :URL: http://dx.doi.org/10.1016/j.jmoldx.2022.09.007 :VOLUME: 25 :YEAR: 2023 :END: * VarBen: Generating in Silico Reference Data Sets for Clinical Next-Generation Sequencing Bioinformatics Pipeline Evaluation :PROPERTIES: :TITLE: VarBen: Generating in Silico Reference Data Sets for Clinical Next-Generation Sequencing Bioinformatics Pipeline Evaluation :BTYPE: article :CUSTOM_ID: Li2021Varben :ABSTRACT: Next-generation sequencing is increasingly being adopted as a valuable method for the detection of somatic variants in clinical oncology. However, it \ldots{} :AUTHOR: Li, Ziyang and Fang, Shuangsang and Zhang, Rui and Yu, Lijia and Zhang, Jiawei and Bu, Dechao and Sun, Liang and Zhao, Yi and Li, Jinming :DOI: 10.1016/j.jmoldx.2020.11.010 :ISSN: 1525-1578 :ISSUE: 3 :JOURNAL: The Journal of Molecular Diagnostics :PAGES: 285--299 :PUBLISHER: Elsevier :URL: https://www.sciencedirect.com/science/article/pii/S1525157820305857 :VOLUME: 23 :YEAR: 2021 :END: * Firing patterns in the adaptive exponential integrate-and-fire model :PROPERTIES: :TITLE: Firing patterns in the adaptive exponential integrate-and-fire model :BTYPE: article :CUSTOM_ID: Naud2008 :AUTHOR: Naud, Richard and Marcille, Nicolas and Clopath, Claudia and Gerstner, Wulfram :DOI: 10.1007/s00422-008-0264-7 :ISSUE: 4-5 :JOURNAL: Biological Cybernetics :LANGUAGE: en :MONTH: 11 :PAGES: 335--347 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1007/s00422-008-0264-7 :VOLUME: 99 :YEAR: 2008 :END: * Protocol for unbiased, consolidated variant calling from whole exome sequencing data :PROPERTIES: :TITLE: Protocol for unbiased, consolidated variant calling from whole exome sequencing data :BTYPE: article 
:CUSTOM_ID: Verrou2022 :AUTHOR: Verrou, Kleio-Maria and Pavlopoulos, Georgios A. and Moulos, Panagiotis :DOI: 10.1016/j.xpro.2022.101418 :ISSUE: 2 :JOURNAL: STAR Protocols :LANGUAGE: en :MONTH: 6 :PAGES: 101418 :PUBLISHER: Elsevier BV :URL: http://dx.doi.org/10.1016/j.xpro.2022.101418 :VOLUME: 3 :YEAR: 2022 :END: * Accuracy and efficiency of germline variant calling pipelines for human genome data :PROPERTIES: :TITLE: Accuracy and efficiency of germline variant calling pipelines for human genome data :BTYPE: article :CUSTOM_ID: Zhao2020 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>Advances in next-generation sequencing technology have enabled whole genome sequencing (WGS) to be widely used for identification of causal variants in a spectrum of genetic-related disorders, and provided new insight into how genetic polymorphisms affect disease phenotypes. The development of different bioinformatics pipelines has continuously improved the variant analysis of WGS data. However, there is a necessity for a systematic performance comparison of these pipelines to provide guidance on the application of WGS-based scientific and clinical genomics. In this study, we evaluated the performance of three variant calling pipelines (GATK, DRAGEN and DeepVariant) using the Genome in a Bottle Consortium, \textquotedblleft{}synthetic-diploid\textquotedblright{} and simulated WGS datasets. DRAGEN and DeepVariant show better accuracy in SNP and indel calling, with no significant differences in their F1-score. DRAGEN platform offers accuracy, flexibility and a highly-efficient execution speed, and therefore superior performance in the analysis of WGS data on a large scale. The combination of DRAGEN and DeepVariant also suggests a good balance of accuracy and efficiency as an alternative solution for germline variant detection in further applications. 
Our results facilitate the standardization of benchmarking analysis of bioinformatics pipelines for reliable variant detection, which is critical in genetics-based medical research and clinical applications.</jats:p> :AUTHOR: Zhao, Sen and Agafonov, Oleg and Azab, Abdulrahman and Stokowy, Tomasz and Hovig, Eivind :DOI: 10.1038/s41598-020-77218-4 :ISSUE: 1 :JOURNAL: Scientific Reports :LANGUAGE: en :MONTH: 11 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41598-020-77218-4 :VOLUME: 10 :YEAR: 2020 :END: * A global reference for human genetic variation :PROPERTIES: :TITLE: A global reference for human genetic variation :BTYPE: article :CUSTOM_ID: 1000Genomes :AUTHOR: The 1000 Genomes Project Consortium :DOI: 10.1038/nature15393 :ISSUE: 7571 :JOURNAL: Nature :LANGUAGE: en :MONTH: 10 :PAGES: 68--74 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/nature15393 :VOLUME: 526 :YEAR: 2015 :END: * Genomics in the cloud: using Docker, GATK, and WDL in Terra :PROPERTIES: :TITLE: Genomics in the cloud: using Docker, GATK, and WDL in Terra :BTYPE: article :CUSTOM_ID: Auwera2020 :AUTHOR: Van der Auwera, Geraldine A and O'Connor, Brian D. :PUBLISHER: O'Reilly Media :YEAR: 2020 :END: * A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree :PROPERTIES: :TITLE: A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree :BTYPE: article :CUSTOM_ID: Eberle2017 :ABSTRACT: <jats:p>Improvement of variant calling in next-generation sequence data requires a comprehensive, genome-wide catalog of high-confidence variants called in a set of genomes for use as a benchmark. We generated deep, whole-genome sequence data of 17 individuals in a three-generation pedigree and called variants in each genome using a range of currently available algorithms. 
We used haplotype transmission information to create a phased \textquotedblleft{}Platinum\textquotedblright{} variant catalog of 4.7 million single-nucleotide variants (SNVs) plus 0.7 million small (1\textendash{}50 bp) insertions and deletions (indels) that are consistent with the pattern of inheritance in the parents and 11 children of this pedigree. Platinum genotypes are highly concordant with the current catalog of the National Institute of Standards and Technology for both SNVs (\>99.99\%) and indels (99.92\%) and add a validated truth catalog that has 26\% more SNVs and 45\% more indels. Analysis of 334,652 SNVs that were consistent between informatics pipelines yet inconsistent with haplotype transmission (\textquotedblleft{}nonplatinum\textquotedblright{}) revealed that the majority of these variants are de novo and cell-line mutations or reside within previously unidentified duplications and deletions. The reference materials from this study are a resource for objective assessment of the accuracy of variant calls throughout genomes.</jats:p> :AUTHOR: Eberle, Michael A. and Fritzilas, Epameinondas and Krusche, Peter and K\"{a}llberg, Morten and Moore, Benjamin L. and Bekritsky, Mitchell A. and Iqbal, Zamin and Chuang, Han-Yu and Humphray, Sean J. and Halpern, Aaron L. and Kruglyak, Semyon and Margulies, Elliott H. and McVean, Gil and Bentley, David R. :DOI: 10.1101/gr.210500.116 :ISSUE: 1 :JOURNAL: Genome Research :LANGUAGE: en :MONTH: 1 :PAGES: 157--164 :PUBLISHER: Cold Spring Harbor Laboratory :URL: http://dx.doi.org/10.1101/gr.210500.116 :VOLUME: 27 :YEAR: 2017 :END: * A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff :PROPERTIES: :TITLE: A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff :BTYPE: article :CUSTOM_ID: Cingolani2012 :AUTHOR: Cingolani, Pablo and Platts, Adrian and Wang, Le Lily and Coon, Melissa and Nguyen, Tung and Wang, Luan and Land, Susan J. 
and Lu, Xiangyi and Ruden, Douglas M. :DOI: 10.4161/fly.19695 :ISSUE: 2 :JOURNAL: Fly :LANGUAGE: en :MONTH: 4 :PAGES: 80--92 :PUBLISHER: Informa UK Limited :URL: http://dx.doi.org/10.4161/fly.19695 :VOLUME: 6 :YEAR: 2012 :END: * Best practices for the interpretation and reporting of clinical whole genome sequencing :PROPERTIES: :TITLE: Best practices for the interpretation and reporting of clinical whole genome sequencing :BTYPE: article :CUSTOM_ID: AustinTse2022 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>Whole genome sequencing (WGS) shows promise as a first-tier diagnostic test for patients with rare genetic disorders. However, standards addressing the definition and deployment practice of a best-in-class test are lacking. To address these gaps, the Medical Genome Initiative, a consortium of leading health care and research organizations in the US and Canada, was formed to expand access to high quality clinical WGS by convening experts and publishing best practices. Here, we present best practice recommendations for the interpretation and reporting of clinical diagnostic WGS, including discussion of challenges and emerging approaches that will be critical to harness the full potential of this comprehensive test.</jats:p> :AUTHOR: Austin-Tse, Christina A. and Jobanputra, Vaidehi and Perry, Denise L. and Bick, David and Taft, Ryan J. and Venner, Eric and Gibbs, Richard A. and Young, Ted and Barnett, Sarah and Belmont, John W. and Boczek, Nicole and Chowdhury, Shimul and Ellsworth, Katarzyna A. and Guha, Saurav and Kulkarni, Shashikant and Marcou, Cherisse and Meng, Linyan and Murdock, David R. and Rehman, Atteeq U. and Spiteri, Elizabeth and Thomas-Wilson, Amanda and Kearney, Hutton M. and Rehm, Heidi L. 
:DOI: 10.1038/s41525-022-00295-z :ISSUE: 1 :JOURNAL: npj Genomic Medicine :LANGUAGE: en :MONTH: 4 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41525-022-00295-z :VOLUME: 7 :YEAR: 2022 :END: * Whole Exome Sequencing Achieved a Definite Diagnosis of Kaufman Oculocerebrofacial Syndrome in a Bahraini Family: A Case Report :PROPERTIES: :TITLE: Whole Exome Sequencing Achieved a Definite Diagnosis of Kaufman Oculocerebrofacial Syndrome in a Bahraini Family: A Case Report :BTYPE: article :CUSTOM_ID: Fida2023 :ABSTRACT: <jats:p> A 1\hspace{0.167em}year and 7\hspace{0.167em}months old girl presented to the medical genetic clinic as a referral from the pediatrics clinic. Upon examining the patient and assessing past medical history, an autosomal recessive disorder was suspected. The family underwent whole exome sequencing, which resulted in the diagnosis of Kaufman oculocerebrofacial syndrome (OMIM \#244450) in the patient due to the fact that both parents were heterozygous carriers of a novel pathogenic variant in the gene UBE3B that lies on 12q24. It has been recommended for the family that preimplantation genetic testing should be considered for future pregnancies. In this case report, we present a novel variant of the gene and highlight the support of whole exome sequencing in the unveiling of genetic disorders. 
</jats:p> :AUTHOR: Fida, Mariam and Sinan, Israa and Finan, Alan :DOI: 10.1177/11795565231200130 :JOURNAL: Clinical Medicine Insights: Pediatrics :LANGUAGE: en :MONTH: 1 :PUBLISHER: SAGE Publications :URL: http://dx.doi.org/10.1177/11795565231200130 :VOLUME: 17 :YEAR: 2023 :END: * On genomic repeats and reproducibility :PROPERTIES: :TITLE: On genomic repeats and reproducibility :BTYPE: article :CUSTOM_ID: Firtina2016 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:p>Results: Here, we present a comprehensive analysis on the reproducibility of computational characterization of genomic variants using high throughput sequencing data. We reanalyzed the same datasets twice, using the same tools with the same parameters, where we only altered the order of reads in the input (i.e. FASTQ file). Reshuffling caused the reads from repetitive regions being mapped to different locations in the second alignment, and we observed similar results when we only applied a scatter/gather approach for read mapping\textemdash{}without prior shuffling. Our results show that, some of the most common variation discovery algorithms do not handle the ambiguous read mappings accurately when random locations are selected. In addition, we also observed that even when the exact same alignment is used, the GATK HaplotypeCaller generates slightly different call sets, which we pinpoint to the variant filtration step. 
We conclude that, algorithms at each step of genomic variation discovery and characterization need to treat ambiguous mappings in a deterministic fashion to ensure full replication of results.</jats:p> <jats:p>Availability and Implementation: Code, scripts and the generated VCF files are available at DOI:10.5281/zenodo.32611.</jats:p> <jats:p>Contact: calkan@cs.bilkent.edu.tr</jats:p> <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.</jats:p> :AUTHOR: Firtina, Can and Alkan, Can :DOI: 10.1093/bioinformatics/btw139 :ISSUE: 15 :JOURNAL: Bioinformatics :LANGUAGE: en :MONTH: 8 :PAGES: 2243--2247 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/bioinformatics/btw139 :VOLUME: 32 :YEAR: 2016 :END: * OMIM.org: Online Mendelian Inheritance in Man (OMIM\textregistered{}), an online catalog of human genes and genetic disorders :PROPERTIES: :TITLE: OMIM.org: Online Mendelian Inheritance in Man (OMIM\textregistered{}), an online catalog of human genes and genetic disorders :BTYPE: article :CUSTOM_ID: Amberger2015 :AUTHOR: Amberger, Joanna S. and Bocchini, Carol A. and Schiettecatte, Fran\c{c}ois and Scott, Alan F. and Hamosh, Ada :DOI: 10.1093/nar/gku1205 :ISSUE: D1 :JOURNAL: Nucleic Acids Research :LANGUAGE: en :MONTH: 1 :PAGES: D789--D798 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/nar/gku1205 :VOLUME: 43 :YEAR: 2015 :END: * Sarek: A portable workflow for whole-genome sequencing analysis of germline and somatic variants :PROPERTIES: :TITLE: Sarek: A portable workflow for whole-genome sequencing analysis of germline and somatic variants :BTYPE: article :CUSTOM_ID: Garcia2020 :ABSTRACT: <ns4:p>Whole-genome sequencing (WGS) is a fundamental technology for research to advance precision medicine, but the limited availability of portable and user-friendly workflows for WGS analyses poses a major challenge for many research groups and hampers scientific progress. 
Here we present Sarek, an open-source workflow to detect germline variants and somatic mutations based on sequencing data from WGS, whole-exome sequencing (WES), or gene panels. Sarek features (i) easy installation, (ii) robust portability across different computer environments, (iii) comprehensive documentation, (iv) transparent and easy-to-read code, and (v) extensive quality metrics reporting. Sarek is implemented in the Nextflow workflow language and supports both Docker and Singularity containers as well as Conda environments, making it ideal for easy deployment on any POSIX-compatible computers and cloud compute environments. Sarek follows the GATK best-practice recommendations for read alignment and pre-processing, and includes a wide range of software for the identification and annotation of germline and somatic single-nucleotide variants, insertion and deletion variants, structural variants, tumour sample purity, and variations in ploidy and copy number. Sarek offers easy, efficient, and reproducible WGS analyses, and can readily be used both as a production workflow at sequencing facilities and as a powerful stand-alone tool for individual research groups. The Sarek source code, documentation and installation instructions are freely available at <ns4:ext-link xmlns:ns3="http://www.w3.org/1999/xlink" ext-link-type="uri" ns3:href="https://github.com/nf-core/sarek">https://github.com/nf-core/sarek</ns4:ext-link> and at <ns4:ext-link xmlns:ns3="http://www.w3.org/1999/xlink" ext-link-type="uri" ns3:href="https://nf-co.re/sarek/">https://nf-co.re/sarek/</ns4:ext-link>.</ns4:p> :AUTHOR: Garcia, Maxime and Juhos, Szilveszter and Larsson, Malin and Olason, Pall I. 
and Martin, Marcel and Eisfeldt, Jesper and DiLorenzo, Sebastian and Sandgren, Johanna and D\'{\i}az De St\aa{}hl, Teresita and Ewels, Philip and Wirta, Valtteri and Nist\'{e}r, Monica and K\"{a}ller, Max and Nystedt, Bj\"{o}rn :DOI: 10.12688/f1000research.16665.2 :JOURNAL: F1000Research :LANGUAGE: en :MONTH: 9 :PAGES: 63 :PUBLISHER: F1000 Research Ltd :URL: http://dx.doi.org/10.12688/f1000research.16665.2 :VOLUME: 9 :YEAR: 2020 :END: * Predicting Splicing from Primary Sequence with Deep Learning :PROPERTIES: :TITLE: Predicting Splicing from Primary Sequence with Deep Learning :BTYPE: article :CUSTOM_ID: Jaganathan2019 :AUTHOR: Jaganathan, Kishore and Kyriazopoulou Panagiotopoulou, Sofia and McRae, Jeremy F. and Darbandi, Siavash Fazel and Knowles, David and Li, Yang I. and Kosmicki, Jack A. and Arbelaez, Juan and Cui, Wenwu and Schwartz, Grace B. and Chow, Eric D. and Kanterakis, Efstathios and Gao, Hong and Kia, Amirali and Batzoglou, Serafim and Sanders, Stephan J. and Farh, Kyle Kai-How :DOI: 10.1016/j.cell.2018.12.015 :ISSUE: 3 :JOURNAL: Cell :LANGUAGE: en :MONTH: 1 :PAGES: 535--548.e24 :PUBLISHER: Elsevier BV :URL: http://dx.doi.org/10.1016/j.cell.2018.12.015 :VOLUME: 176 :YEAR: 2019 :END: * A draft human pangenome reference :PROPERTIES: :TITLE: A draft human pangenome reference :BTYPE: article :CUSTOM_ID: Liao2023 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals<jats:sup>1</jats:sup>. These assemblies cover more than 99\% of the expected sequence in each genome and are more than 99\% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. 
We also add 119\hspace{0.167em}million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90\hspace{0.167em}million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34\% and increased the number of structural variants detected per haplotype by 104\% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.</jats:p> :AUTHOR: Liao, Wen-Wei and Asri, Mobin and Ebler, Jana and Doerr, Daniel and Haukness, Marina and Hickey, Glenn and Lu, Shuangjia and Lucas, Julian K. and Monlong, Jean and Abel, Haley J. and Buonaiuto, Silvia and Chang, Xian H. and Cheng, Haoyu and Chu, Justin and Colonna, Vincenza and Eizenga, Jordan M. and Feng, Xiaowen and Fischer, Christian and Fulton, Robert S. and Garg, Shilpa and Groza, Cristian and Guarracino, Andrea and Harvey, William T. and Heumos, Simon and Howe, Kerstin and Jain, Miten and Lu, Tsung-Yu and Markello, Charles and Martin, Fergal J. and Mitchell, Matthew W. and Munson, Katherine M. and Mwaniki, Moses Njagi and Novak, Adam M. and Olsen, Hugh E. and Pesout, Trevor and Porubsky, David and Prins, Pjotr and Sibbesen, Jonas A. and Sir\'{e}n, Jouni and Tomlinson, Chad and Villani, Flavia and Vollger, Mitchell R. and Antonacci-Fulton, Lucinda L. and Baid, Gunjan and Baker, Carl A. and Belyaeva, Anastasiya and Billis, Konstantinos and Carroll, Andrew and Chang, Pi-Chuan and Cody, Sarah and Cook, Daniel E. and Cook-Deegan, Robert M. and Cornejo, Omar E. and Diekhans, Mark and Ebert, Peter and Fairley, Susan and Fedrigo, Olivier and Felsenfeld, Adam L. and Formenti, Giulio and Frankish, Adam and Gao, Yan and Garrison, Nanibaa' A. and Giron, Carlos Garcia and Green, Richard E. and Haggerty, Leanne and Hoekzema, Kendra and Hourlier, Thibaut and Ji, Hanlee P. and Kenny, Eimear E. 
and Koenig, Barbara A. and Kolesnikov, Alexey and Korbel, Jan O. and Kordosky, Jennifer and Koren, Sergey and Lee, HoJoon and Lewis, Alexandra P. and Magalh\~{a}es, Hugo and Marco-Sola, Santiago and Marijon, Pierre and McCartney, Ann and McDaniel, Jennifer and Mountcastle, Jacquelyn and Nattestad, Maria and Nurk, Sergey and Olson, Nathan D. and Popejoy, Alice B. and Puiu, Daniela and Rautiainen, Mikko and Regier, Allison A. and Rhie, Arang and Sacco, Samuel and Sanders, Ashley D. and Schneider, Valerie A. and Schultz, Baergen I. and Shafin, Kishwar and Smith, Michael W. and Sofia, Heidi J. and Abou Tayoun, Ahmad N. and Thibaud-Nissen, Fran\c{c}oise and Tricomi, Francesca Floriana and Wagner, Justin and Walenz, Brian and Wood, Jonathan M. D. and Zimin, Aleksey V. and Bourque, Guillaume and Chaisson, Mark J. P. and Flicek, Paul and Phillippy, Adam M. and Zook, Justin M. and Eichler, Evan E. and Haussler, David and Wang, Ting and Jarvis, Erich D. and Miga, Karen H. and Garrison, Erik and Marschall, Tobias and Hall, Ira M. and Li, Heng and Paten, Benedict :DOI: 10.1038/s41586-023-05896-x :ISSUE: 7960 :JOURNAL: Nature :LANGUAGE: en :MONTH: 5 :PAGES: 312--324 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41586-023-05896-x :VOLUME: 617 :YEAR: 2023 :END: * Standards and Guidelines for Validating Next-Generation Sequencing Bioinformatics Pipelines :PROPERTIES: :TITLE: Standards and Guidelines for Validating Next-Generation Sequencing Bioinformatics Pipelines :BTYPE: article :CUSTOM_ID: Roy2018 :AUTHOR: Roy, Somak and Coldren, Christopher and Karunamurthy, Arivarasan and Kip, Nefize S. and Klee, Eric W. and Lincoln, Stephen E. and Leon, Annette and Pullambhatla, Mrudula and Temple-Smolkin, Robyn L. and Voelkerding, Karl V. and Wang, Chen and Carter, Alexis B. 
:DOI: 10.1016/j.jmoldx.2017.11.003 :ISSUE: 1 :JOURNAL: The Journal of Molecular Diagnostics :LANGUAGE: en :MONTH: 1 :PAGES: 4--27 :PUBLISHER: Elsevier BV :URL: http://dx.doi.org/10.1016/j.jmoldx.2017.11.003 :VOLUME: 20 :YEAR: 2018 :END: * Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly :PROPERTIES: :TITLE: Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly :BTYPE: article :CUSTOM_ID: Schneider2017 :ABSTRACT: <jats:p>The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. 
We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.</jats:p> :AUTHOR: Schneider, Valerie A. and Graves-Lindsay, Tina and Howe, Kerstin and Bouk, Nathan and Chen, Hsiu-Chuan and Kitts, Paul A. and Murphy, Terence D. and Pruitt, Kim D. and Thibaud-Nissen, Fran\c{c}oise and Albracht, Derek and Fulton, Robert S. and Kremitzki, Milinn and Magrini, Vincent and Markovic, Chris and McGrath, Sean and Steinberg, Karyn Meltz and Auger, Kate and Chow, William and Collins, Joanna and Harden, Glenn and Hubbard, Timothy and Pelan, Sarah and Simpson, Jared T. and Threadgold, Glen and Torrance, James and Wood, Jonathan M. and Clarke, Laura and Koren, Sergey and Boitano, Matthew and Peluso, Paul and Li, Heng and Chin, Chen-Shan and Phillippy, Adam M. and Durbin, Richard and Wilson, Richard K. and Flicek, Paul and Eichler, Evan E. and Church, Deanna M. :DOI: 10.1101/gr.213611.116 :ISSUE: 5 :JOURNAL: Genome Research :LANGUAGE: en :MONTH: 5 :PAGES: 849--864 :PUBLISHER: Cold Spring Harbor Laboratory :URL: http://dx.doi.org/10.1101/gr.213611.116 :VOLUME: 27 :YEAR: 2017 :END: * Haplotype-based variant detection from short-read sequencing :PROPERTIES: :TITLE: Haplotype-based variant detection from short-read sequencing :BTYPE: article :CUSTOM_ID: Garrison2012 :ABSTRACT: The direct detection of haplotypes from short-read DNA sequencing data requires changes to existing small-variant detection methods. Here, we develop a Bayesian statistical framework which is capable of modeling multiallelic loci in sets of individuals with non-uniform copy number. We then describe our implementation of this framework in a haplotype-based variant detector, FreeBayes. 
:ARCHIVEPREFIX: arXiv :AUTHOR: Garrison, Erik and Marth, Gabor :EPRINT: 1207.3907v2 :FILE: 1207.3907v2.pdf :MONTH: 7 :PRIMARYCLASS: q-bio.GN :URL: http://arxiv.org/abs/1207.3907v2 :YEAR: 2012 :END: * Assembling and Validating Bioinformatic Pipelines for Next-Generation Sequencing Clinical Assays :PROPERTIES: :TITLE: Assembling and Validating Bioinformatic Pipelines for Next-Generation Sequencing Clinical Assays :BTYPE: article :CUSTOM_ID: Sorelle2020 :ABSTRACT: <jats:sec> <jats:title>Context.\textemdash{}</jats:title> <jats:p>Clinical next-generation sequencing (NGS) is being rapidly adopted, but analysis and interpretation of large data sets prompt new challenges for a clinical laboratory setting. Clinical NGS results rely heavily on the bioinformatics pipeline for identifying genetic variation in complex samples. The choice of bioinformatics algorithms, genome assembly, and genetic annotation databases are important for determining genetic alterations associated with disease. The analysis methods are often tuned to the assay to maximize accuracy. Once a pipeline has been developed, it must be validated to determine accuracy and reproducibility for samples similar to real-world cases. 
In silico proficiency testing or institutional data exchange will ensure consistency among clinical laboratories.</jats:p> </jats:sec> <jats:sec> <jats:title>Objective.\textemdash{}</jats:title> <jats:p>To provide molecular pathologists a step-by-step guide to bioinformatics analysis and validation design in order to navigate the regulatory and validation standards of implementing a bioinformatic pipeline as a part of a new clinical NGS assay.</jats:p> </jats:sec> <jats:sec> <jats:title>Data Sources.\textemdash{}</jats:title> <jats:p>This guide uses published studies on genomic analysis, bioinformatics methods, and methods comparison studies to inform the reader on what resources, including open source software tools and databases, are available for genetic variant detection and interpretation.</jats:p> </jats:sec> <jats:sec> <jats:title>Conclusions.\textemdash{}</jats:title> <jats:p>This review covers 4 key concepts: (1) bioinformatic analysis design for detecting genetic variation, (2) the resources for assessing genetic effects, (3) analysis validation assessment experiments and data sets, including a diverse set of samples to mimic real-world challenges that assess accuracy and reproducibility, and (4) if concordance between clinical laboratories will be improved by proficiency testing designed to test bioinformatic pipelines.</jats:p> </jats:sec> :AUTHOR: SoRelle, Jeffrey A and Wachsmann, Megan and Cantarel, Brandi L. 
:DOI: 10.5858/arpa.2019-0476-ra :ISSUE: 9 :JOURNAL: Archives of Pathology \& Laboratory Medicine :LANGUAGE: en :MONTH: 9 :PAGES: 1118--1130 :PUBLISHER: Archives of Pathology and Laboratory Medicine :URL: http://dx.doi.org/10.5858/arpa.2019-0476-ra :VOLUME: 144 :YEAR: 2020 :END: * SpliceAI-visual: a free online tool to improve SpliceAI splicing variant interpretation :PROPERTIES: :TITLE: SpliceAI-visual: a free online tool to improve SpliceAI splicing variant interpretation :BTYPE: article :CUSTOM_ID: DeSainteAgathe :ABSTRACT: SpliceAI is an open-source deep learning splicing prediction algorithm that has demonstrated in the past few years its high ability to predict splicing defects caused by DNA variations. However, its outputs present several drawbacks: (1) although the numerical values are very convenient for batch filtering, their precise interpretation can be difficult, (2) the outputs are delta scores which can sometimes mask a severe consequence, and (3) complex delins are most often not handled. We present here SpliceAI-visual, a free online tool based on the SpliceAI algorithm, and show how it complements the traditional SpliceAI analysis. First, SpliceAI-visual manipulates raw scores and not delta scores, as the latter can be misleading in certain circumstances. Second, the outcome of SpliceAI-visual is user-friendly thanks to the graphical presentation. Third, SpliceAI-visual is currently one of the only SpliceAI-derived implementations able to annotate complex variants (e.g., complex delins). We report here the benefits of using SpliceAI-visual and demonstrate its relevance in the assessment/modulation of the PVS1 classification criteria. We also show how SpliceAI-visual can elucidate several complex splicing defects taken from the literature but also from unpublished cases. 
SpliceAI-visual is available as a Google Colab notebook and has also been fully integrated in a free online variant interpretation tool, MobiDetails ( https://mobidetails.iurc.montp.inserm.fr/MD ). :AUTHOR: de Sainte Agathe, Jean-Madeleine and Filser, Mathilde and Isidor, Bertrand and Besnard, Thomas and Gueguen, Paul and Perrin, Aur\'{e}lien and Van Goethem, Charles and Verebi, Camille and Masingue, Marion and Rendu, John and Coss\'{e}e, Mireille and Bergougnoux, Anne and Frobert, Laurent and Buratti, Julien and Lejeune, \'{E}lodie and Le Guern, \'{E}ric and Pasquier, Florence and Clot, Fabienne and Kalatzis, Vasiliki and Roux, Anne-Fran\c{c}oise and Cogn\'{e}, Benjamin and Baux, David :DATE: 2023-02-10 :DOI: 10.1186/s40246-023-00451-1 :ISSN: 1479-7364 :ISSUE: 1 :JOURNAL: Human Genomics :KEYWORDS: Human Genetics :LANGUAGE: En :PUBLISHER: BioMed Central :URL: https://humgenomics.biomedcentral.com/articles/10.1186/s40246-023-00451-1 :VOLUME: 17 :END: * Toil enables reproducible, open source, big biomedical data analyses :PROPERTIES: :TITLE: Toil enables reproducible, open source, big biomedical data analyses :BTYPE: article :CUSTOM_ID: Vivian2017 :AUTHOR: Vivian, John and Rao, Arjun Arkal and Nothaft, Frank Austin and Ketchum, Christopher and Armstrong, Joel and Novak, Adam and Pfeil, Jacob and Narkizian, Jake and Deran, Alden D and Musselman-Brown, Audrey and Schmidt, Hannes and Amstutz, Peter and Craft, Brian and Goldman, Mary and Rosenbloom, Kate and Cline, Melissa and O'Connor, Brian and Hanna, Megan and Birger, Chet and Kent, W James and Patterson, David A and Joseph, Anthony D and Zhu, Jingchun and Zaranek, Sasha and Getz, Gad and Haussler, David and Paten, Benedict :DOI: 10.1038/nbt.3772 :ISSUE: 4 :JOURNAL: Nature Biotechnology :LANGUAGE: en :MONTH: 4 :PAGES: 314--316 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/nbt.3772 :VOLUME: 35 :YEAR: 2017 :END: * Split-Read Indel and Structural Variant Calling Using PINDEL 
:PROPERTIES: :TITLE: Split-Read Indel and Structural Variant Calling Using PINDEL :BTYPE: inbook :CUSTOM_ID: Ye2018 :AUTHOR: Ye, Kai and Guo, Li and Yang, Xiaofei and Lamijer, Eric-Wubbo and Raine, Keiran and Ning, Zemin :DOI: 10.1007/978-1-4939-8666-8_7 :ISBN: 9781493986651, 9781493986668 :JOURNAL: Methods in Molecular Biology :MONTH: 7 :PAGES: 95--105 :PUBLISHER: Springer New York :URL: http://dx.doi.org/10.1007/978-1-4939-8666-8_7 :YEAR: 2018 :END: * VarScan: variant detection in massively parallel sequencing of individual and pooled samples :PROPERTIES: :TITLE: VarScan: variant detection in massively parallel sequencing of individual and pooled samples :BTYPE: article :CUSTOM_ID: Koboldt2009 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:p>Summary: Massively parallel sequencing technologies hold incredible promise for the study of DNA sequence variation, particularly the identification of variants affecting human disease. The unprecedented throughput and relatively short read lengths of Roche/454, Illumina/Solexa, and other platforms have spurred development of a new generation of sequence alignment algorithms. Yet detection of sequence variants based on short read alignments remains challenging, and most currently available tools are limited to a single platform or aligner type. We present VarScan, an open source tool for variant detection that is compatible with several short read aligners. 
We demonstrate VarScan's ability to detect SNPs and indels with high sensitivity and specificity, in both Roche/454 sequencing of individuals and deep Illumina/Solexa sequencing of pooled samples.</jats:p> <jats:p>Availability and Implementation: Source code and documentation freely available at http://genome.wustl.edu/tools/cancer-genomics implemented as a Perl package and supported on Linux/UNIX, MS Windows and Mac OSX.</jats:p> <jats:p>Contact: dkoboldt@genome.wustl.edu</jats:p> <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.</jats:p> :AUTHOR: Koboldt, Daniel C. and Chen, Ken and Wylie, Todd and Larson, David E. and McLellan, Michael D. and Mardis, Elaine R. and Weinstock, George M. and Wilson, Richard K. and Ding, Li :DOI: 10.1093/bioinformatics/btp373 :ISSUE: 17 :JOURNAL: Bioinformatics :LANGUAGE: en :MONTH: 9 :PAGES: 2283--2285 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/bioinformatics/btp373 :VOLUME: 25 :YEAR: 2009 :END: * Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples :PROPERTIES: :TITLE: Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples :BTYPE: article :CUSTOM_ID: Cibulskis2013 :AUTHOR: Cibulskis, Kristian and Lawrence, Michael S and Carter, Scott L and Sivachenko, Andrey and Jaffe, David and Sougnez, Carrie and Gabriel, Stacey and Meyerson, Matthew and Lander, Eric S and Getz, Gad :DOI: 10.1038/nbt.2514 :ISSUE: 3 :JOURNAL: Nature Biotechnology :LANGUAGE: en :MONTH: 3 :PAGES: 213--219 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/nbt.2514 :VOLUME: 31 :YEAR: 2013 :END: * Semi-automated assembly of high-quality diploid human reference genomes :PROPERTIES: :TITLE: Semi-automated assembly of high-quality diploid human reference genomes :BTYPE: article :CUSTOM_ID: Jarvis2022 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>The current human reference genome, 
GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society<jats:sup>1,2</jats:sup>. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals<jats:sup>3,4</jats:sup>. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome<jats:sup>5</jats:sup>. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity<jats:sup>6</jats:sup>. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent\textendash{}child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within \pm{}1\% of the length of CHM13. Nearly 48\% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.</jats:p> :AUTHOR: Jarvis, Erich D. and Formenti, Giulio and Rhie, Arang and Guarracino, Andrea and Yang, Chentao and Wood, Jonathan and Tracey, Alan and Thibaud-Nissen, Francoise and Vollger, Mitchell R. 
and Porubsky, David and Cheng, Haoyu and Asri, Mobin and Logsdon, Glennis A. and Carnevali, Paolo and Chaisson, Mark J. P. and Chin, Chen-Shan and Cody, Sarah and Collins, Joanna and Ebert, Peter and Escalona, Merly and Fedrigo, Olivier and Fulton, Robert S. and Fulton, Lucinda L. and Garg, Shilpa and Gerton, Jennifer L. and Ghurye, Jay and Granat, Anastasiya and Green, Richard E. and Harvey, William and Hasenfeld, Patrick and Hastie, Alex and Haukness, Marina and Jaeger, Erich B. and Jain, Miten and Kirsche, Melanie and Kolmogorov, Mikhail and Korbel, Jan O. and Koren, Sergey and Korlach, Jonas and Lee, Joyce and Li, Daofeng and Lindsay, Tina and Lucas, Julian and Luo, Feng and Marschall, Tobias and Mitchell, Matthew W. and McDaniel, Jennifer and Nie, Fan and Olsen, Hugh E. and Olson, Nathan D. and Pesout, Trevor and Potapova, Tamara and Puiu, Daniela and Regier, Allison and Ruan, Jue and Salzberg, Steven L. and Sanders, Ashley D. and Schatz, Michael C. and Schmitt, Anthony and Schneider, Valerie A. and Selvaraj, Siddarth and Shafin, Kishwar and Shumate, Alaina and Stitziel, Nathan O. and Stober, Catherine and Torrance, James and Wagner, Justin and Wang, Jianxin and Wenger, Aaron and Xiao, Chuanle and Zimin, Aleksey V. and Zhang, Guojie and Wang, Ting and Li, Heng and Garrison, Erik and Haussler, David and Hall, Ira and Zook, Justin M. and Eichler, Evan E. and Phillippy, Adam M. and Paten, Benedict and Howe, Kerstin and Miga, Karen H. :DOI: 10.1038/s41586-022-05325-5 :ISSUE: 7936 :JOURNAL: Nature :LANGUAGE: en :MONTH: 11 :PAGES: 519--531 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41586-022-05325-5 :VOLUME: 611 :YEAR: 2022 :END: * The complete sequence of a human Y chromosome :PROPERTIES: :TITLE: The complete sequence of a human Y chromosome :BTYPE: article :CUSTOM_ID: Rhie2023 :AUTHOR: Rhie, Arang and Nurk, Sergey and Cechova, Monika and Hoyt, Savannah J. and Taylor, Dylan J. 
and Altemose, Nicolas and Hook, Paul W. and Koren, Sergey and Rautiainen, Mikko and Alexandrov, Ivan A. and Allen, Jamie and Asri, Mobin and Bzikadze, Andrey V. and Chen, Nae-Chyun and Chin, Chen-Shan and Diekhans, Mark and Flicek, Paul and Formenti, Giulio and Fungtammasan, Arkarachai and Garcia Giron, Carlos and Garrison, Erik and Gershman, Ariel and Gerton, Jennifer L. and Grady, Patrick G. S. and Guarracino, Andrea and Haggerty, Leanne and Halabian, Reza and Hansen, Nancy F. and Harris, Robert and Hartley, Gabrielle A. and Harvey, William T. and Haukness, Marina and Heinz, Jakob and Hourlier, Thibaut and Hubley, Robert M. and Hunt, Sarah E. and Hwang, Stephen and Jain, Miten and Kesharwani, Rupesh K. and Lewis, Alexandra P. and Li, Heng and Logsdon, Glennis A. and Lucas, Julian K. and Makalowski, Wojciech and Markovic, Christopher and Martin, Fergal J. and Mc Cartney, Ann M. and McCoy, Rajiv C. and McDaniel, Jennifer and McNulty, Brandy M. and Medvedev, Paul and Mikheenko, Alla and Munson, Katherine M. and Murphy, Terence D. and Olsen, Hugh E. and Olson, Nathan D. and Paulin, Luis F. and Porubsky, David and Potapova, Tamara and Ryabov, Fedor and Salzberg, Steven L. and Sauria, Michael E. G. and Sedlazeck, Fritz J. and Shafin, Kishwar and Shepelev, Valery A. and Shumate, Alaina and Storer, Jessica M. and Surapaneni, Likhitha and Taravella Oill, Angela M. and Thibaud-Nissen, Fran\c{c}oise and Timp, Winston and Tomaszkiewicz, Marta and Vollger, Mitchell R. and Walenz, Brian P. and Watwood, Allison C. and Weissensteiner, Matthias H. and Wenger, Aaron M. and Wilson, Melissa A. and Zarate, Samantha and Zhu, Yiming and Zook, Justin M. and Eichler, Evan E. and O'Neill, Rachel J. and Schatz, Michael C. and Miga, Karen H. and Makova, Kateryna D. and Phillippy, Adam M. 
:DOI: 10.1038/s41586-023-06457-y :ISSUE: 7978 :JOURNAL: Nature :LANGUAGE: en :MONTH: 9 :PAGES: 344--354 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41586-023-06457-y :VOLUME: 621 :YEAR: 2023 :END: * The Ensembl Variant Effect Predictor :PROPERTIES: :TITLE: The Ensembl Variant Effect Predictor :BTYPE: article :CUSTOM_ID: Mclaren :ABSTRACT: The Ensembl Variant Effect Predictor is a powerful toolset for the analysis, annotation, and prioritization of genomic variants in coding and non-coding regions. It provides access to an extensive collection of genomic annotation, with a variety of interfaces to suit different requirements, and simple options for configuring and extending analysis. It is open source, free to use, and supports full reproducibility of results. The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs. :AUTHOR: McLaren, William and Gil, Laurent and Hunt, Sarah E. and Riat, Harpreet Singh and Ritchie, Graham R. S. and Thormann, Anja and Flicek, Paul and Cunningham, Fiona :DATE: 2016-06-06 :DOI: 10.1186/s13059-016-0974-4 :ISSN: 1474-760X :ISSUE: 1 :JOURNAL: Genome Biology :KEYWORDS: Animal Genetics and Genomics :LANGUAGE: En :PUBLISHER: BioMed Central :URL: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0974-4 :VOLUME: 17 :END: * RefSeq: an update on mammalian reference sequences :PROPERTIES: :TITLE: RefSeq: an update on mammalian reference sequences :BTYPE: article :CUSTOM_ID: Pruitt :ABSTRACT: Abstract. The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and pro :AUTHOR: Pruitt, Kim D. and Brown, Garth R. and Hiatt, Susan M. and Thibaud-Nissen, Fran\c{c}oise and Astashyn, Alexander and Ermolaeva, Olga and Farrell, Catherine M. and Hart, Jennifer and Landrum, Melissa J. and McGarvey, Kelly M. and Murphy, Michael R. and O'Leary, Nuala A. 
and Pujar, Shashikant and Rajput, Bhanu and Rangwala, Sanjida H. and Riddick, Lillian D. and Shkeda, Andrei and Sun, Hanzhen and Tamez, Pamela and Tully, Raymond E. and Wallin, Craig and Webb, David and Weber, Janet and Wu, Wendy and DiCuccio, Michael and Kitts, Paul and Maglott, Donna R. and Murphy, Terence D. and Ostell, James M. :DOI: 10.1093/nar/gkt1114 :ISSN: 0305-1048 :ISSUE: D1 :JOURNAL: Nucleic Acids Research :PUBLISHER: Oxford Academic :URL: https://dx.doi.org/10.1093/nar/gkt1114 :VOLUME: 42 :END: * Reference standards for next-generation sequencing :PROPERTIES: :TITLE: Reference standards for next-generation sequencing :BTYPE: article :CUSTOM_ID: Hardwick2017 :AUTHOR: Hardwick, Simon A. and Deveson, Ira W. and Mercer, Tim R. :DOI: 10.1038/nrg.2017.44 :ISSUE: 8 :JOURNAL: Nature Reviews Genetics :LANGUAGE: en :MONTH: 8 :PAGES: 473--484 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/nrg.2017.44 :VOLUME: 18 :YEAR: 2017 :END: * Twelve years of SAMtools and BCFtools :PROPERTIES: :TITLE: Twelve years of SAMtools and BCFtools :BTYPE: article :CUSTOM_ID: Danecek2021 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:sec> <jats:title>Background</jats:title> <jats:p>SAMtools and BCFtools are widely used programs for processing and analysing high-throughput sequencing data. They include tools for file format conversion and manipulation, sorting, querying, statistics, variant calling, and effect analysis amongst other methods.</jats:p> </jats:sec> <jats:sec> <jats:title>Findings</jats:title> <jats:p>The first version appeared online 12 years ago and has been maintained and further developed ever since, with many new features and improvements added over the years. 
The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines.</jats:p> </jats:sec> <jats:sec> <jats:title>Conclusion</jats:title> <jats:p>Both SAMtools and BCFtools are freely available on GitHub under the permissive MIT licence, free for both non-commercial and commercial use. Both packages have been installed \textgreater{}1 million times via Bioconda. The source code and documentation are available from https://www.htslib.org.</jats:p> </jats:sec> :AUTHOR: Danecek, Petr and Bonfield, James K and Liddle, Jennifer and Marshall, John and Ohan, Valeriu and Pollard, Martin O and Whitwham, Andrew and Keane, Thomas and McCarthy, Shane A and Davies, Robert M and Li, Heng :DOI: 10.1093/gigascience/giab008 :ISSUE: 2 :JOURNAL: GigaScience :LANGUAGE: en :MONTH: 1 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/gigascience/giab008 :VOLUME: 10 :YEAR: 2021 :END: * sideRETRO: a pipeline for identifying somatic and polymorphic insertions of processed pseudogenes or retrocopies :PROPERTIES: :TITLE: sideRETRO: a pipeline for identifying somatic and polymorphic insertions of processed pseudogenes or retrocopies :BTYPE: article :CUSTOM_ID: Miller2021 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:sec> <jats:title>Motivation</jats:title> <jats:p>Retrocopies or processed pseudogenes are gene copies resulting from mRNA retrotransposition. These gene duplicates can be fixed, somatically inserted or polymorphic in the genome. 
However, knowledge regarding unfixed retrocopies (retroCNVs) is still limited, and the development of computational tools for effectively identifying and genotyping them is an urgent need.</jats:p> </jats:sec> <jats:sec> <jats:title>Results</jats:title> <jats:p>Here, we present sideRETRO, a pipeline dedicated not only to detecting retroCNVs in whole-genome or whole-exome sequencing data but also to revealing their insertion sites, zygosity and genomic context and classifying them as somatic or polymorphic events. We show that sideRETRO can identify novel retroCNVs and genotype them, in addition to finding polymorphic retroCNVs in whole-genome and whole-exome data. Therefore, sideRETRO fills a gap in the literature and presents an efficient and straightforward algorithm to accelerate the study of bona fide retroCNVs.</jats:p> </jats:sec> <jats:sec> <jats:title>Availability and implementation</jats:title> <jats:p>sideRETRO is available at https://github.com/galantelab/sideRETRO</jats:p> </jats:sec> <jats:sec> <jats:title>Supplementary information</jats:title> <jats:p>Supplementary data are available at Bioinformatics online.</jats:p> </jats:sec> :AUTHOR: Miller, Thiago L A and Orpinelli Rego, Fernanda and Buzzo, Jos\'{e} Leonel L and Galante, Pedro A F :DOI: 10.1093/bioinformatics/btaa689 :ISSUE: 3 :JOURNAL: Bioinformatics :LANGUAGE: en :MONTH: 4 :PAGES: 419--421 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/bioinformatics/btaa689 :VOLUME: 37 :YEAR: 2021 :END: * precisionFDA Truth Challenge V2: Calling variants from short- and long-reads in difficult-to-map regions :PROPERTIES: :TITLE: precisionFDA Truth Challenge V2: Calling variants from short- and long-reads in difficult-to-map regions :BTYPE: misc :CUSTOM_ID: Olson2020 :ABSTRACT: <jats:title>Summary</jats:title><jats:p>The precisionFDA Truth Challenge V2 aimed to assess the state-of-the-art of variant calling in difficult-to-map regions and the Major Histocompatibility Complex (MHC). 
Starting with FASTQ files, 20 challenge participants applied their variant calling pipelines and submitted 64 variant callsets for one or more sequencing technologies (\textasciitilde{}35X Illumina, \textasciitilde{}35X PacBio HiFi, and \textasciitilde{}50X Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with the new GIAB benchmark sets and genome stratifications. Challenge submissions included a number of innovative methods for all three technologies, with graph-based and machine-learning methods scoring best for short-read and long-read datasets, respectively. New methods out-performed the 2016 Truth Challenge winners, and new machine-learning approaches combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.</jats:p> :AUTHOR: Olson, Nathan D. and Wagner, Justin and McDaniel, Jennifer and Stephens, Sarah H. and Westreich, Samuel T. and Prasanna, Anish G. and Johanson, Elaine and Boja, Emily and Maier, Ezekiel J. and Serang, Omar and J\'{a}spez, David and Lorenzo-Salazar, Jos\'{e} M. and Mu\~{n}oz-Barrera, Adri\'{a}n and Rubio-Rodr\'{\i}guez, Luis A. 
and Flores, Carlos and Kyriakidis, Konstantinos and Malousi, Andigoni and Shafin, Kishwar and Pesout, Trevor and Jain, Miten and Paten, Benedict and Chang, Pi-Chuan and Kolesnikov, Alexey and Nattestad, Maria and Baid, Gunjan and Goel, Sidharth and Yang, Howard and Carroll, Andrew and Eveleigh, Robert and Bourgey, Mathieu and Bourque, Guillaume and Li, Gen and ChouXian, MA and Tang, LinQi and YuanPing, DU and Zhang, ShaoWei and Morata, Jordi and Tonda, Ra\'{u}l and Parra, Gen\'{\i}s and Trotta, Jean-R\'{e}mi and Brueffer, Christian and Demirkaya-Budak, Sinem and Kabakci-Zorlu, Duygu and Turgut, Deniz and Kalay, \"{O}zem and Budak, Gungor and Narc\i{}, K\"{u}bra and Arslan, Elif and Brown, Richard and Johnson, Ivan J and Dolgoborodov, Alexey and Semenyuk, Vladimir and Jain, Amit and Tetikol, H. Serhat and Jain, Varun and Ruehle, Mike and Lajoie, Bryan and Roddey, Cooper and Catreux, Severine and Mehio, Rami and Ahsan, Mian Umair and Liu, Qian and Wang, Kai and Sahraeian, Sayed Mohammad Ebrahim and Fang, Li Tai and Mohiyuddin, Marghoob and Hung, Calvin and Jain, Chirag and Feng, Hanying and Li, Zhipan and Chen, Luoqi and Sedlazeck, Fritz J. and Zook, Justin M. 
:DOI: 10.1101/2020.11.13.380741 :MONTH: 11 :PUBLISHER: Cold Spring Harbor Laboratory :URL: http://dx.doi.org/10.1101/2020.11.13.380741 :YEAR: 2020 :END: * Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls :PROPERTIES: :TITLE: Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls :BTYPE: article :CUSTOM_ID: Zook2014 :AUTHOR: Zook, Justin M and Chapman, Brad and Wang, Jason and Mittelman, David and Hofmann, Oliver and Hide, Winston and Salit, Marc :DOI: 10.1038/nbt.2835 :ISSUE: 3 :JOURNAL: Nature Biotechnology :LANGUAGE: en :MONTH: 3 :PAGES: 246--251 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/nbt.2835 :VOLUME: 32 :YEAR: 2014 :END: * A complete reference genome improves analysis of human genetic variation :PROPERTIES: :TITLE: A complete reference genome improves analysis of human genetic variation :BTYPE: article :CUSTOM_ID: Aganezov2022 :ABSTRACT: <jats:p>Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 million base pairs of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for clinical and functional study. We show how this reference universally improves read mapping and variant calling for 3202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of variants per sample in previously unresolved regions, showcasing the promise of the T2T-CHM13 reference for evolutionary and biomedical discovery. Simultaneously, this reference eliminates tens of thousands of spurious variants per sample, including reduction of false positives in 269 medically relevant genes by up to a factor of 12. 
Because of these improvements in variant discovery coupled with population and functional genomic resources, T2T-CHM13 is positioned to replace GRCh38 as the prevailing reference for human genetics.</jats:p> :AUTHOR: Aganezov, Sergey and Yan, Stephanie M. and Soto, Daniela C. and Kirsche, Melanie and Zarate, Samantha and Avdeyev, Pavel and Taylor, Dylan J. and Shafin, Kishwar and Shumate, Alaina and Xiao, Chunlin and Wagner, Justin and McDaniel, Jennifer and Olson, Nathan D. and Sauria, Michael E. G. and Vollger, Mitchell R. and Rhie, Arang and Meredith, Melissa and Martin, Skylar and Lee, Joyce and Koren, Sergey and Rosenfeld, Jeffrey A. and Paten, Benedict and Layer, Ryan and Chin, Chen-Shan and Sedlazeck, Fritz J. and Hansen, Nancy F. and Miller, Danny E. and Phillippy, Adam M. and Miga, Karen H. and McCoy, Rajiv C. and Dennis, Megan Y. and Zook, Justin M. and Schatz, Michael C. :DOI: 10.1126/science.abl3533 :ISSUE: 6588 :JOURNAL: Science :LANGUAGE: en :MONTH: 4 :PUBLISHER: American Association for the Advancement of Science (AAAS) :URL: http://dx.doi.org/10.1126/science.abl3533 :VOLUME: 376 :YEAR: 2022 :END: * Nix: A Safe and Policy-Free System for Software Deployment :PROPERTIES: :TITLE: Nix: A Safe and Policy-Free System for Software Deployment :BTYPE: inbook :CUSTOM_ID: Dolstra2004 :AUTHOR: Eelco Dolstra and Merijn de Jonge and Eelco Visser :URL: https://edolstra.github.io/pubs/nspfssd-lisa2004-final.pdf :YEAR: 2004 :END: * An Extensive Sequence Dataset of Gold-Standard Samples for Benchmarking and Development :PROPERTIES: :TITLE: An Extensive Sequence Dataset of Gold-Standard Samples for Benchmarking and Development :BTYPE: misc :CUSTOM_ID: Baid2020 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>Accurate standards and extensive development datasets are the foundation of technical progress. 
To facilitate benchmarking and development, we sequence 9 samples, covering the Genome in a Bottle truth sets on multiple instruments (NovaSeq, HiSeqX, HiSeq4000, PacBio Sequel II System) and sample preparations (PCR-Free, PCR-Positive) for both whole genome and multiple exome kits. We benchmark pipelines, quantifying strengths and limitations for sequencing and analysis methods. We identify variability within and between instruments, preparation methods, and analytical pipelines, across various sequencing depths. We discuss the relevance of this variability to downstream analyses, and strategies to reduce variability.</jats:p> :AUTHOR: Baid, Gunjan and Nattestad, Maria and Kolesnikov, Alexey and Goel, Sidharth and Yang, Howard and Chang, Pi-Chuan and Carroll, Andrew :DOI: 10.1101/2020.12.11.422022 :MONTH: 12 :PUBLISHER: Cold Spring Harbor Laboratory :URL: http://dx.doi.org/10.1101/2020.12.11.422022 :YEAR: 2020 :END: * Best practices for benchmarking germline small-variant calls in human genomes :PROPERTIES: :TITLE: Best practices for benchmarking germline small-variant calls in human genomes :BTYPE: article :CUSTOM_ID: Krusche2019 :AUTHOR: Krusche, Peter and Trigg, Len and Boutros, Paul C. and Mason, Christopher E. and De La Vega, Francisco M. and Moore, Benjamin L. and Gonzalez-Porta, Mar and Eberle, Michael A. and Tezak, Zivana and Lababidi, Samir and Truty, Rebecca and Asimenos, George and Funke, Birgit and Fleharty, Mark and Chapman, Brad A. and Salit, Marc and Zook, Justin M. 
:DOI: 10.1038/s41587-019-0054-x :ISSUE: 5 :JOURNAL: Nature Biotechnology :LANGUAGE: en :MONTH: 5 :PAGES: 555--560 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41587-019-0054-x :VOLUME: 37 :YEAR: 2019 :END: * ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data :PROPERTIES: :TITLE: ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data :BTYPE: article :CUSTOM_ID: Wang2010 :ABSTRACT: Abstract. High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpo :AUTHOR: Wang, Kai and Li, Mingyao and Hakonarson, Hakon :DOI: 10.1093/nar/gkq603 :ISSN: 0305-1048 :ISSUE: 16 :JOURNAL: Nucleic Acids Research :PUBLISHER: Oxford Academic :URL: https://dx.doi.org/10.1093/nar/gkq603 :VOLUME: 38 :YEAR: 2010 :END: * Benchmarking variant callers in next-generation and third-generation sequencing analysis :PROPERTIES: :TITLE: Benchmarking variant callers in next-generation and third-generation sequencing analysis :BTYPE: article :CUSTOM_ID: Pei2021 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>DNA variants represent an important source of genetic variations among individuals. Next-generation sequencing (NGS) is the most popular technology for genome-wide variant calling. Third-generation sequencing (TGS) has also recently been used in genetic studies. Although many variant callers are available, no single caller can call both types of variants on NGS or TGS data with high sensitivity and specificity. In this study, we systematically evaluated 11 variant callers on 12 NGS and TGS datasets. For germline variant calling, we tested DNAseq and DNAscope modes from Sentieon, HaplotypeCaller mode from GATK and WGS mode from DeepVariant. All the four callers had comparable performance on NGS data and 30\texttimes{} coverage of WGS data was recommended. 
For germline variant calling on TGS data, we tested DNAseq mode from Sentieon, HaplotypeCaller mode from GATK and PACBIO mode from DeepVariant. All the three callers had similar performance in SNP calling, while DeepVariant outperformed the others in InDel calling. TGS detected more variants than NGS, particularly in complex and repetitive regions. For somatic variant calling on NGS, we tested TNscope and TNseq modes from Sentieon, MuTect2 mode from GATK, NeuSomatic, VarScan2, and Strelka2. TNscope and Mutect2 outperformed the other callers. A higher proportion of tumor sample purity (from 10 to 20\%) significantly increased the recall value of calling. Finally, computational costs of the callers were compared and Sentieon required the least computational cost. These results suggest that careful selection of a tool and parameters is needed for accurate SNP or InDel calling under different scenarios.</jats:p> :AUTHOR: Pei, Surui and Liu, Tao and Ren, Xue and Li, Weizhong and Chen, Chongjian and Xie, Zhi :DOI: 10.1093/bib/bbaa148 :ISSUE: 3 :JOURNAL: Briefings in Bioinformatics :LANGUAGE: en :MONTH: 5 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/bib/bbaa148 :VOLUME: 22 :YEAR: 2021 :END: * dbSNP: the NCBI database of genetic variation :PROPERTIES: :TITLE: dbSNP: the NCBI database of genetic variation :BTYPE: article :CUSTOM_ID: Sherry2001 :AUTHOR: Sherry, S. T. 
:DOI: 10.1093/nar/29.1.308 :ISSUE: 1 :JOURNAL: Nucleic Acids Research :MONTH: 1 :PAGES: 308--311 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/nar/29.1.308 :VOLUME: 29 :YEAR: 2001 :END: * GRIDSS2: comprehensive characterisation of somatic structural variation using single breakend variants and structural variant phasing :PROPERTIES: :TITLE: GRIDSS2: comprehensive characterisation of somatic structural variation using single breakend variants and structural variant phasing :BTYPE: article :CUSTOM_ID: Cameron2021 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>GRIDSS2 is the first structural variant caller to explicitly report single breakends\textemdash{}breakpoints in which only one side can be unambiguously determined. By treating single breakends as a fundamental genomic rearrangement signal on par with breakpoints, GRIDSS2 can explain 47\% of somatic centromere copy number changes using single breakends to non-centromere sequence. On a cohort of 3782 deeply sequenced metastatic cancers, GRIDSS2 achieves an unprecedented 3.1\% false negative rate and 3.3\% false discovery rate and identifies a novel 32\textendash{}100\hspace{0.167em}bp duplication signature. GRIDSS2 simplifies complex rearrangement interpretation through phasing of structural variants with 16\% of somatic calls phasable using paired-end sequencing.</jats:p> :AUTHOR: Cameron, Daniel L. and Baber, Jonathan and Shale, Charles and Valle-Inclan, Jose Espejo and Besselink, Nicolle and van Hoeck, Arne and Janssen, Roel and Cuppen, Edwin and Priestley, Peter and Papenfuss, Anthony T. 
:DOI: 10.1186/s13059-021-02423-x :ISSUE: 1 :JOURNAL: Genome Biology :LANGUAGE: en :MONTH: 12 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1186/s13059-021-02423-x :VOLUME: 22 :YEAR: 2021 :END: * Creation of an Expert Curated Variant List for Clinical Genomic Test Development and Validation :PROPERTIES: :TITLE: Creation of an Expert Curated Variant List for Clinical Genomic Test Development and Validation :BTYPE: article :CUSTOM_ID: Wilcox2021 :AUTHOR: Wilcox, Emma and Harrison, Steven M. and Lockhart, Edward and Voelkerding, Karl and Lubin, Ira M. and Rehm, Heidi L. and Kalman, Lisa V. and Funke, Birgit :DOI: 10.1016/j.jmoldx.2021.07.018 :ISSUE: 11 :JOURNAL: The Journal of Molecular Diagnostics :LANGUAGE: en :MONTH: 11 :PAGES: 1500--1505 :PUBLISHER: Elsevier BV :URL: http://dx.doi.org/10.1016/j.jmoldx.2021.07.018 :VOLUME: 23 :YEAR: 2021 :END: * DELLY: structural variant discovery by integrated paired-end and split-read analysis :PROPERTIES: :TITLE: DELLY: structural variant discovery by integrated paired-end and split-read analysis :BTYPE: article :CUSTOM_ID: Rausch2012 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:p>Motivation: The discovery of genomic structural variants (SVs) at high sensitivity and specificity is an essential requirement for characterizing naturally occurring variation and for understanding pathological somatic rearrangements in personal genome sequencing data. Of particular interest are integrated methods that accurately identify simple and complex rearrangements in heterogeneous sequencing datasets at single-nucleotide resolution, as an optimal basis for investigating the formation mechanisms and functional consequences of SVs.</jats:p> <jats:p>Results: We have developed an SV discovery method, called DELLY, that integrates short insert paired-ends, long-range mate-pairs and split-read alignments to accurately delineate genomic rearrangements at single-nucleotide resolution. 
DELLY is suitable for detecting copy-number variable deletion and tandem duplication events as well as balanced rearrangements such as inversions or reciprocal translocations. DELLY, thus, enables to ascertain the full spectrum of genomic rearrangements, including complex events. On simulated data, DELLY compares favorably to other SV prediction methods across a wide range of sequencing parameters. On real data, DELLY reliably uncovers SVs from the 1000 Genomes Project and cancer genomes, and validation experiments of randomly selected deletion loci show a high specificity.</jats:p> <jats:p>Availability: DELLY is available at www.korbel.embl.de/software.html</jats:p> <jats:p>Contact: tobias.rausch@embl.de</jats:p> :AUTHOR: Rausch, Tobias and Zichner, Thomas and Schlattl, Andreas and St\"{u}tz, Adrian M. and Benes, Vladimir and Korbel, Jan O. :DOI: 10.1093/bioinformatics/bts378 :ISSUE: 18 :JOURNAL: Bioinformatics :LANGUAGE: en :MONTH: 9 :PAGES: i333--i339 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/bioinformatics/bts378 :VOLUME: 28 :YEAR: 2012 :END: * Les ph\'{e}notypes \'{e}rythrocytaires rares : un enjeu de sant\'{e} publique :PROPERTIES: :TITLE: Les ph\'{e}notypes \'{e}rythrocytaires rares : un enjeu de sant\'{e} publique :BTYPE: article :CUSTOM_ID: Peyrard2008 :AUTHOR: Peyrard, T. and Pham, B.-N. and Le Pennec, P.-Y. and Rouger, P. :DOI: 10.1016/j.tracli.2008.02.001 :ISSUE: 3 :JOURNAL: Transfusion Clinique et Biologique :LANGUAGE: fr :MONTH: 6 :PAGES: 109--119 :PUBLISHER: Elsevier BV :URL: http://dx.doi.org/10.1016/j.tracli.2008.02.001 :VOLUME: 15 :YEAR: 2008 :END: * Predicting RNA splicing from DNA sequence using Pangolin :PROPERTIES: :TITLE: Predicting RNA splicing from DNA sequence using Pangolin :BTYPE: article :CUSTOM_ID: Zeng2022 :ABSTRACT: Recent progress in deep learning has greatly improved the prediction of RNA splicing from DNA sequence. 
Here, we present Pangolin, a deep learning model to predict splice site strength in multiple tissues. Pangolin outperforms state-of-the-art methods for predicting RNA splicing on a variety of prediction tasks. Pangolin improves prediction of the impact of genetic variants on RNA splicing, including common, rare, and lineage-specific genetic variation. In addition, Pangolin identifies loss-of-function mutations with high accuracy and recall, particularly for mutations that are not missense or nonsense, demonstrating remarkable potential for identifying pathogenic variants. :AUTHOR: Zeng, Tony and Li, Yang I :DATE: 2022-04-21 :DOI: 10.1186/s13059-022-02664-4 :ISSN: 1474-760X :ISSUE: 1 :JOURNAL: Genome Biology :KEYWORDS: Animal Genetics and Genomics :LANGUAGE: en :PUBLISHER: BioMed Central :URL: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-022-02664-4 :VOLUME: 23 :YEAR: 2022 :END: * From FastQ Data to High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline :PROPERTIES: :TITLE: From FastQ Data to High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline :BTYPE: article :CUSTOM_ID: VanDerAuwera2013 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>This unit describes how to use BWA and the Genome Analysis Toolkit (GATK) to map genome sequencing data to a reference and produce high-quality variant calls that can be used in downstream analyses. The complete workflow includes the core NGS data-processing steps that are necessary to make the raw data suitable for analysis by the GATK, as well as the key methods involved in variant discovery using the GATK. <jats:italic>Curr. Protoc. Bioinform</jats:italic>. 43:11.10.1-11.10.33. \textcopyright{} 2013 by John Wiley \& Sons, Inc.</jats:p> :AUTHOR: Van der Auwera, Geraldine A. and Carneiro, Mauricio O.
and Hartl, Christopher and Poplin, Ryan and del Angel, Guillermo and Levy-Moonshine, Ami and Jordan, Tadeusz and Shakir, Khalid and Roazen, David and Thibault, Joel and Banks, Eric and Garimella, Kiran V. and Altshuler, David and Gabriel, Stacey and DePristo, Mark A. :DOI: 10.1002/0471250953.bi1110s43 :ISSUE: 1 :JOURNAL: Current Protocols in Bioinformatics :LANGUAGE: en :MONTH: 10 :PUBLISHER: Wiley :URL: http://dx.doi.org/10.1002/0471250953.bi1110s43 :VOLUME: 43 :YEAR: 2013 :END: * ClinVar: public archive of interpretations of clinically relevant variants :PROPERTIES: :TITLE: ClinVar: public archive of interpretations of clinically relevant variants :BTYPE: article :CUSTOM_ID: Landrum2016 :AUTHOR: Landrum, Melissa J. and Lee, Jennifer M. and Benson, Mark and Brown, Garth and Chao, Chen and Chitipiralla, Shanmuga and Gu, Baoshan and Hart, Jennifer and Hoffman, Douglas and Hoover, Jeffrey and Jang, Wonhee and Katz, Kenneth and Ovetsky, Michael and Riley, George and Sethi, Amanjeev and Tully, Ray and Villamarin-Salomon, Ricardo and Rubinstein, Wendy and Maglott, Donna R. 
:DOI: 10.1093/nar/gkv1222 :ISSUE: D1 :JOURNAL: Nucleic Acids Research :LANGUAGE: en :MONTH: 1 :PAGES: D862--D868 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/nar/gkv1222 :VOLUME: 44 :YEAR: 2016 :END: * Guide technique d'accr\'{e}ditation de la technologie de s\'{e}quen\c{c}age \`{a} haut d\'{e}bit :PROPERTIES: :TITLE: Guide technique d'accr\'{e}ditation de la technologie de s\'{e}quen\c{c}age \`{a} haut d\'{e}bit :BTYPE: article :CUSTOM_ID: CofracSHGTA16 :AUTHOR: COFRAC :URLDATE: 2024-01-13-19:59:59 :YEAR: 2019 :END: * Systematic benchmark of state-of-the-art variant calling pipelines identifies major factors affecting accuracy of coding sequence variant discovery :PROPERTIES: :TITLE: Systematic benchmark of state-of-the-art variant calling pipelines identifies major factors affecting accuracy of coding sequence variant discovery :BTYPE: article :CUSTOM_ID: Barbitoff2022 :ABSTRACT: <jats:title>Abstract</jats:title><jats:sec><jats:title>Background</jats:title><jats:p>Accurate variant detection in the coding regions of the human genome is a key requirement for molecular diagnostics of Mendelian disorders. Efficiency of variant discovery from next-generation sequencing (NGS) data depends on multiple factors, including reproducible coverage biases of NGS methods and the performance of read alignment and variant calling software.
Although variant caller benchmarks are published constantly, no previous publications have leveraged the full extent of available gold standard whole-genome (WGS) and whole-exome (WES) sequencing datasets.</jats:p></jats:sec><jats:sec><jats:title>Results</jats:title><jats:p>In this work, we systematically evaluated the performance of 4 popular short read aligners (Bowtie2, BWA, Isaac, and Novoalign) and 9 novel and well-established variant calling and filtering methods (Clair3, DeepVariant, Octopus, GATK, FreeBayes, and Strelka2) using a set of 14 \textquotedblleft{}gold standard\textquotedblright{} WES and WGS datasets available from Genome In A Bottle (GIAB) consortium. Additionally, we have indirectly evaluated each pipeline's performance using a set of 6 non-GIAB samples of African and Russian ethnicity. In our benchmark, Bowtie2 performed significantly worse than other aligners, suggesting it should not be used for medical variant calling. When other aligners were considered, the accuracy of variant discovery mostly depended on the variant caller and not the read aligner. Among the tested variant callers, DeepVariant consistently showed the best performance and the highest robustness. Other actively developed tools, such as Clair3, Octopus, and Strelka2, also performed well, although their efficiency had greater dependence on the quality and type of the input data. We have also compared the consistency of variant calls in GIAB and non-GIAB samples. With few important caveats, best-performing tools have shown little evidence of overfitting.</jats:p></jats:sec><jats:sec><jats:title>Conclusions</jats:title><jats:p>The results show surprisingly large differences in the performance of cutting-edge tools even in high confidence regions of the coding genome. This highlights the importance of regular benchmarking of quickly evolving tools and pipelines. 
We also discuss the need for a more diverse set of gold standard genomes that would include samples of African, Hispanic, or mixed ancestry. Additionally, there is also a need for better variant caller assessment in the repetitive regions of the coding genome.</jats:p></jats:sec> :AUTHOR: Barbitoff, Yury A. and Abasov, Ruslan and Tvorogova, Varvara E. and Glotov, Andrey S. and Predeus, Alexander V. :DOI: 10.1186/s12864-022-08365-3 :ISSUE: 1 :JOURNAL: BMC Genomics :LANGUAGE: en :MONTH: 12 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1186/s12864-022-08365-3 :VOLUME: 23 :YEAR: 2022 :END: * Performance assessment of variant calling pipelines using human whole exome sequencing and simulated data :PROPERTIES: :TITLE: Performance assessment of variant calling pipelines using human whole exome sequencing and simulated data :BTYPE: article :CUSTOM_ID: Kumaran2019 :AUTHOR: Kumaran, Manojkumar and Subramanian, Umadevi and Devarajan, Bharanidharan :DOI: 10.1186/s12859-019-2928-9 :ISSUE: 1 :JOURNAL: BMC Bioinformatics :LANGUAGE: en :MONTH: 12 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1186/s12859-019-2928-9 :VOLUME: 20 :YEAR: 2019 :END: * Benchmarking short sequence mapping tools :PROPERTIES: :TITLE: Benchmarking short sequence mapping tools :BTYPE: article :CUSTOM_ID: Hatem2013 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:sec> <jats:title>Background</jats:title> <jats:p>The development of next-generation sequencing instruments has led to the generation of millions of short sequences in a single run. The process of aligning these reads to a reference genome is time consuming and demands the development of fast and accurate alignment tools. However, the current proposed tools make different compromises between the accuracy and the speed of mapping. Moreover, many important aspects are overlooked while comparing the performance of a newly developed tool to the state of the art. 
Therefore, there is a need for an objective evaluation method that covers all the aspects. In this work, we introduce a benchmarking suite to extensively analyze sequencing tools with respect to various aspects and provide an objective comparison.</jats:p> </jats:sec> <jats:sec> <jats:title>Results</jats:title> <jats:p>We applied our benchmarking tests on 9 well known mapping tools, namely, Bowtie, Bowtie2, BWA, SOAP2, MAQ, RMAP, GSNAP, Novoalign, and mrsFAST (mrFAST) using synthetic data and real RNA-Seq data. MAQ and RMAP are based on building hash tables for the reads, whereas the remaining tools are based on indexing the reference genome. The benchmarking tests reveal the strengths and weaknesses of each tool. The results show that no single tool outperforms all others in all metrics. However, Bowtie maintained the best throughput for most of the tests while BWA performed better for longer read lengths. The benchmarking tests are not restricted to the mentioned tools and can be further applied to others.</jats:p> </jats:sec> <jats:sec> <jats:title>Conclusion</jats:title> <jats:p>The mapping process is still a hard problem that is affected by many factors. In this work, we provided a benchmarking suite that reveals and evaluates the different factors affecting the mapping process. Still, there is no tool that outperforms all of the others in all the tests. 
Therefore, the end user should clearly specify his needs in order to choose the tool that provides the best results.</jats:p> </jats:sec> :AUTHOR: Hatem, Ayat and Bozda\u{g}, Doruk and Toland, Amanda E and \c{C}ataly\"{u}rek, \"{U}mit V :DOI: 10.1186/1471-2105-14-184 :ISSUE: 1 :JOURNAL: BMC Bioinformatics :LANGUAGE: en :MONTH: 12 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1186/1471-2105-14-184 :VOLUME: 14 :YEAR: 2013 :END: * The use of next-generation sequencing for the determination of rare blood group genotypes :PROPERTIES: :TITLE: The use of next-generation sequencing for the determination of rare blood group genotypes :BTYPE: article :CUSTOM_ID: Jakobsen2019 :ABSTRACT: <jats:title>SUMMARY</jats:title><jats:sec><jats:title>Objectives</jats:title><jats:p>Next-generation sequencing (NGS) for the determination of rare blood group genotypes was tested in 72 individuals from different ethnicities.</jats:p></jats:sec><jats:sec><jats:title>Background</jats:title><jats:p>Traditional serological-based antigen detection methods, as well as genotyping based on specific single nucleotide polymorphisms (SNPs) or single nucleotide variants (SNVs), are limited to detecting only a limited number of known antigens or alleles. NGS methods do not have this limitation.</jats:p></jats:sec><jats:sec><jats:title>Methods</jats:title><jats:p>NGS using Ion torrent Personal Genome Machine (PGM) was performed with a customised Ampliseq panel targeting 15 different blood group systems on 72 blood donors of various ethnicities (Caucasian, Hispanic, Asian, Middle Eastern and Black).</jats:p></jats:sec><jats:sec><jats:title>Results</jats:title><jats:p>Blood group genotypes for 70 of 72 samples could be obtained for 15 blood group systems in one step using the NGS assay and, for common SNPs, are consistent with previously determined genotypes using commercial SNP assays. 
However, particularly for the Kidd, Duffy and Lutheran blood group systems, several SNVs were detected by the NGS assay that revealed additional coding information compared to other methods. Furthermore, the NGS assay allowed for the detection of genotypes related to VEL, Knops, Gerbich, Globoside, P1PK and Landsteiner-Wiener blood group systems.</jats:p></jats:sec><jats:sec><jats:title>Conclusions</jats:title><jats:p>The NGS assay enables a comprehensive genotype analysis of many blood group systems and is capable of detecting common and rare alleles, including alleles not currently detected by commercial assays.</jats:p></jats:sec> :AUTHOR: Jakobsen, M. A. and Dellgren, C. and Sheppard, C. and Yazer, M. and Sprog\o{}e, U. :DOI: 10.1111/tme.12496 :ISSUE: 3 :JOURNAL: Transfusion Medicine :LANGUAGE: en :MONTH: 6 :PAGES: 162--168 :PUBLISHER: Wiley :URL: http://dx.doi.org/10.1111/tme.12496 :VOLUME: 29 :YEAR: 2019 :END: * The complete sequence of a human genome :PROPERTIES: :TITLE: The complete sequence of a human genome :BTYPE: article :CUSTOM_ID: Nurk2022 :ABSTRACT: <jats:p>Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8\% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion\textendash{}base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. 
The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.</jats:p> :AUTHOR: Nurk, Sergey and Koren, Sergey and Rhie, Arang and Rautiainen, Mikko and Bzikadze, Andrey V. and Mikheenko, Alla and Vollger, Mitchell R. and Altemose, Nicolas and Uralsky, Lev and Gershman, Ariel and Aganezov, Sergey and Hoyt, Savannah J. and Diekhans, Mark and Logsdon, Glennis A. and Alonge, Michael and Antonarakis, Stylianos E. and Borchers, Matthew and Bouffard, Gerard G. and Brooks, Shelise Y. and Caldas, Gina V. and Chen, Nae-Chyun and Cheng, Haoyu and Chin, Chen-Shan and Chow, William and de Lima, Leonardo G. and Dishuck, Philip C. and Durbin, Richard and Dvorkina, Tatiana and Fiddes, Ian T. and Formenti, Giulio and Fulton, Robert S. and Fungtammasan, Arkarachai and Garrison, Erik and Grady, Patrick G. S. and Graves-Lindsay, Tina A. and Hall, Ira M. and Hansen, Nancy F. and Hartley, Gabrielle A. and Haukness, Marina and Howe, Kerstin and Hunkapiller, Michael W. and Jain, Chirag and Jain, Miten and Jarvis, Erich D. and Kerpedjiev, Peter and Kirsche, Melanie and Kolmogorov, Mikhail and Korlach, Jonas and Kremitzki, Milinn and Li, Heng and Maduro, Valerie V. and Marschall, Tobias and McCartney, Ann M. and McDaniel, Jennifer and Miller, Danny E. and Mullikin, James C. and Myers, Eugene W. and Olson, Nathan D. and Paten, Benedict and Peluso, Paul and Pevzner, Pavel A. and Porubsky, David and Potapova, Tamara and Rogaev, Evgeny I. and Rosenfeld, Jeffrey A. and Salzberg, Steven L. and Schneider, Valerie A. and Sedlazeck, Fritz J. and Shafin, Kishwar and Shew, Colin J. and Shumate, Alaina and Sims, Ying and Smit, Arian F. A. and Soto, Daniela C. and Sovi\'{c}, Ivan and Storer, Jessica M. and Streets, Aaron and Sullivan, Beth A. 
and Thibaud-Nissen, Fran\c{c}oise and Torrance, James and Wagner, Justin and Walenz, Brian P. and Wenger, Aaron and Wood, Jonathan M. D. and Xiao, Chunlin and Yan, Stephanie M. and Young, Alice C. and Zarate, Samantha and Surti, Urvashi and McCoy, Rajiv C. and Dennis, Megan Y. and Alexandrov, Ivan A. and Gerton, Jennifer L. and O'Neill, Rachel J. and Timp, Winston and Zook, Justin M. and Schatz, Michael C. and Eichler, Evan E. and Miga, Karen H. and Phillippy, Adam M. :DOI: 10.1126/science.abj6987 :ISSUE: 6588 :JOURNAL: Science :LANGUAGE: en :MONTH: 4 :PAGES: 44--53 :PUBLISHER: American Association for the Advancement of Science (AAAS) :URL: http://dx.doi.org/10.1126/science.abj6987 :VOLUME: 376 :YEAR: 2022 :END: * A variant by any name: quantifying annotation discordance across tools and clinical databases :PROPERTIES: :TITLE: A variant by any name: quantifying annotation discordance across tools and clinical databases :BTYPE: article :CUSTOM_ID: Yen2017 :ABSTRACT: Clinical genomic testing is dependent on the robust identification and reporting of variant-level information in relation to disease. With the shift to high-throughput sequencing, a major challenge for clinical diagnostics is the cross-identification of variants called on their genomic position to resources that rely on transcript- or protein-based descriptions. We evaluated the accuracy of three tools (SnpEff, Variant Effect Predictor, and Variation Reporter) that generate transcript and protein-based variant nomenclature from genomic coordinates according to guidelines by the Human Genome Variation Society (HGVS). Our evaluation was based on transcript-controlled comparisons to a manually curated set of 126 test variants of various types drawn from data sources, each with HGVS-compliant transcript and protein descriptors. 
We further evaluated the concordance between annotations generated by Snpeff and Variant Effect Predictor and those in major germline and cancer databases: ClinVar and COSMIC, respectively. We find that there is substantial discordance between the annotation tools and databases in the description of insertions and/or deletions. Using our ground truth set of variants, constructed specifically to identify challenging events, accuracy was between 80 and 90\% for coding and 50 and 70\% for protein changes for 114 to 126 variants. Exact concordance for SNV syntax was over 99.5\% between ClinVar and Variant Effect Predictor and SnpEff, but less than 90\% for non-SNV variants. For COSMIC, exact concordance for coding and protein SNVs was between 65 and 88\% and less than 15\% for insertions. Across the tools and datasets, there was a wide range of different but equivalent expressions describing protein variants. Our results reveal significant inconsistency in variant representation across tools and databases. While some of these syntax differences may be clear to a clinician, they can confound variant matching, an important step in variant classification. These results highlight the urgent need for the adoption and adherence to uniform standards in variant annotation, with consistent reporting on the genomic reference, to enable accurate and efficient data-driven clinical care. :AUTHOR: Yen, Jennifer L. and Garcia, Sarah and Montana, Aldrin and Harris, Jason and Chervitz, Stephen and Morra, Massimo and West, John and Chen, Richard and Church, Deanna M. 
:DATE: 2017-01-26 :DOI: 10.1186/s13073-016-0396-7 :ISSN: 1756-994X :ISSUE: 1 :JOURNAL: Genome Medicine :KEYWORDS: Human Genetics :LANGUAGE: en :PUBLISHER: BioMed Central :URL: https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-016-0396-7 :VOLUME: 9 :YEAR: 2017 :END: * Toward practical transparent verifiable and long-term reproducible research using Guix :PROPERTIES: :TITLE: Toward practical transparent verifiable and long-term reproducible research using Guix :BTYPE: article :CUSTOM_ID: Vallet2022 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>Reproducibility crisis urge scientists to promote transparency which allows peers to draw same conclusions after performing identical steps from hypothesis to results. Growing resources are developed to open the access to methods, data and source codes. Still, the computational environment, an interface between data and source code running analyses, is not addressed. Environments are usually described with software and library names associated with version labels or provided as an opaque container image. This is not enough to describe the complexity of the dependencies on which they rely to operate on. We describe this issue and illustrate how open tools like Guix can be used by any scientist to share their environment and allow peers to reproduce it. Some steps of research might not be fully reproducible, but at least, transparency for computation is technically addressable.
These tools should be considered by scientists willing to promote transparency and open science.</jats:p> :AUTHOR: Vallet, Nicolas and Michonneau, David and Tournier, Simon :DOI: 10.1038/s41597-022-01720-9 :ISSUE: 1 :JOURNAL: Scientific Data :LANGUAGE: en :MONTH: 10 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41597-022-01720-9 :VOLUME: 9 :YEAR: 2022 :END: * Technology dictates algorithms: recent developments in read alignment :PROPERTIES: :TITLE: Technology dictates algorithms: recent developments in read alignment :BTYPE: article :CUSTOM_ID: Alser2021 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>Aligning sequencing reads onto a reference is an essential step of the majority of genomic analysis pipelines. Computational algorithms for read alignment have evolved in accordance with technological advances, leading to today's diverse array of alignment methods. We provide a systematic survey of algorithmic foundations and methodologies across 107 alignment methods, for both short and long reads. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on speed and efficiency of read alignment. We discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology.</jats:p> :AUTHOR: Alser, Mohammed and Rotman, Jeremy and Deshpande, Dhrithi and Taraszka, Kodi and Shi, Huwenbo and Baykal, Pelin Icer and Yang, Harry Taegyun and Xue, Victor and Knyazev, Sergey and Singer, Benjamin D. 
and Balliu, Brunilda and Koslicki, David and Skums, Pavel and Zelikovsky, Alex and Alkan, Can and Mutlu, Onur and Mangul, Serghei :DOI: 10.1186/s13059-021-02443-7 :ISSUE: 1 :JOURNAL: Genome Biology :LANGUAGE: en :MONTH: 12 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1186/s13059-021-02443-7 :VOLUME: 22 :YEAR: 2021 :END: * Qualification des solutions bioinformatiques : note technique :PROPERTIES: :TITLE: Qualification des solutions bioinformatiques : note technique :BTYPE: article :CUSTOM_ID: ngsdiag2019 :AUTHOR: NGS-Diag :YEAR: 2019 :END: * The Sequence Alignment/Map format and SAMtools :PROPERTIES: :TITLE: The Sequence Alignment/Map format and SAMtools :BTYPE: article :CUSTOM_ID: Li2009 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:p>Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released.
SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.</jats:p> <jats:p>Availability: http://samtools.sourceforge.net</jats:p> <jats:p>Contact: rd@sanger.ac.uk</jats:p> :AUTHOR: Li, Heng and Handsaker, Bob and Wysoker, Alec and Fennell, Tim and Ruan, Jue and Homer, Nils and Marth, Gabor and Abecasis, Goncalo and Durbin, Richard :DOI: 10.1093/bioinformatics/btp352 :ISSUE: 16 :JOURNAL: Bioinformatics :LANGUAGE: en :MONTH: 8 :PAGES: 2078--2079 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/bioinformatics/btp352 :VOLUME: 25 :YEAR: 2009 :END:
* Recommendations for next generation sequencing data reanalysis of unsolved cases with suspected Mendelian disorders: A systematic review and meta-analysis :PROPERTIES: :TITLE: Recommendations for next generation sequencing data reanalysis of unsolved cases with suspected Mendelian disorders: A systematic review and meta-analysis :BTYPE: article :CUSTOM_ID: Dai2022 :AUTHOR: Dai, Pei and Honda, Andrew and Ewans, Lisa and McGaughran, Julie and Burnett, Leslie and Law, Matthew and Phan, Tri Giang :DOI: 10.1016/j.gim.2022.04.021 :ISSUE: 8 :JOURNAL: Genetics in Medicine :LANGUAGE: en :MONTH: 8 :PAGES: 1618--1629 :PUBLISHER: Elsevier BV :URL: http://dx.doi.org/10.1016/j.gim.2022.04.021 :VOLUME: 24 :YEAR: 2022 :END: * Combining globally search for a regular expression and print matching lines with bibliographic monitoring of genomic database improves diagnosis :PROPERTIES: :TITLE: Combining globally search for a regular expression and print matching lines with bibliographic monitoring of genomic database improves diagnosis :BTYPE: article :CUSTOM_ID: TranMauThem2023 :ABSTRACT: <jats:p><jats:bold>Introduction:</jats:bold> Exome sequencing has a diagnostic yield ranging from 25\% to 70\% in rare diseases and regularly implicates genes in novel disorders.
Retrospective data reanalysis has demonstrated strong efficacy in improving diagnosis, but poses organizational difficulties for clinical laboratories.</jats:p><jats:p><jats:bold>Patients and methods:</jats:bold> We applied a reanalysis strategy based on intensive prospective bibliographic monitoring along with direct application of the GREP command-line tool (to \textquotedblleft{}globally search for a regular expression and print matching lines\textquotedblright{}) in a large ES database. For 18 months, we submitted the same five keywords of interest [(<jats:italic>intellectual disability</jats:italic>, (<jats:italic>neuro</jats:italic>)<jats:italic>developmental delay</jats:italic>, and (<jats:italic>neuro</jats:italic>)<jats:italic>developmental disorder</jats:italic>)] to PubMed on a daily basis to identify recently published novel disease\textendash{}gene associations or new phenotypes in genes already implicated in human pathology. We used the Linux GREP tool and an in-house script to collect all variants of these genes from our 5,459 exome database.</jats:p><jats:p><jats:bold>Results:</jats:bold> After GREP queries and variant filtration, we identified 128 genes of interest and collected 56 candidate variants from 53 individuals. We confirmed causal diagnosis for 19/128 genes (15\%) in 21 individuals and identified variants of unknown significance for 19/128 genes (15\%) in 23 individuals. Altogether, GREP queries for only 128 genes over a period of 18 months permitted a causal diagnosis to be established in 21/2875 undiagnosed affected probands (0.7\%).</jats:p><jats:p><jats:bold>Conclusion:</jats:bold> The GREP query strategy is efficient and less tedious than complete periodic reanalysis. 
It is an interesting reanalysis strategy to improve diagnosis.</jats:p> :AUTHOR: Tran Mau-Them, Fr\'{e}d\'{e}ric and Overs, Alexis and Bruel, Ange-Line and Duquet, Romain and Thareau, Mylene and Denomm\'{e}-Pichon, Anne-Sophie and Vitobello, Antonio and Sorlin, Arthur and Safraou, Hana and Nambot, Sophie and Delanne, Julian and Moutton, Sebastien and Racine, Caroline and Engel, Camille and De Giraud d'Agay, Melchior and Lehalle, Daphne and Goldenberg, Alice and Willems, Marjolaine and Coubes, Christine and Genevieve, David and Verloes, Alain and Capri, Yline and Perrin, Laurence and Jacquemont, Marie-Line and Lambert, Laetitia and Lacaze, Elodie and Thevenon, Julien and Hana, Nadine and Van-Gils, Julien and Dubucs, Charlotte and Bizaoui, Varoona and Gerard-Blanluet, Marion and Lespinasse, James and Mercier, Sandra and Guerrot, Anne-Marie and Maystadt, Isabelle and Tisserant, Emilie and Faivre, Laurence and Philippe, Christophe and Duffourd, Yannis and Thauvin-Robinet, Christel :DOI: 10.3389/fgene.2023.1122985 :JOURNAL: Frontiers in Genetics :MONTH: 4 :PUBLISHER: Frontiers Media SA :URL: http://dx.doi.org/10.3389/fgene.2023.1122985 :VOLUME: 14 :YEAR: 2023 :END: * SimuSCoP: reliably simulate Illumina sequencing data based on position and context dependent profiles :PROPERTIES: :TITLE: SimuSCoP: reliably simulate Illumina sequencing data based on position and context dependent profiles :BTYPE: article :CUSTOM_ID: Yu2020 :ABSTRACT: <jats:title>Abstract</jats:title><jats:sec> <jats:title>Background</jats:title> <jats:p>A number of simulators have been developed for emulating next-generation sequencing data by incorporating known errors such as base substitutions and indels. However, their practicality may be degraded by functional and runtime limitations. 
Particularly, the positional and genomic contextual information is not effectively utilized for reliably characterizing base substitution patterns, as well as the positional and contextual difference of Phred quality scores is not fully investigated. Thus, a more effective and efficient bioinformatics tool is sorely required.</jats:p> </jats:sec><jats:sec> <jats:title>Results</jats:title> <jats:p>Here, we introduce a novel tool, SimuSCoP, to reliably emulate complex DNA sequencing data. The base substitution patterns and the statistical behavior of quality scores in Illumina sequencing data are fully explored and integrated into the simulation model for reliably emulating datasets for different applications. In addition, an integrated and easy-to-use pipeline is employed in SimuSCoP to facilitate end-to-end simulation of complex samples, and high runtime efficiency is achieved by implementing the tool to run in multithreading with low memory consumption. These features enable SimuSCoP to gets substantial improvements in reliability, functionality, practicality and runtime efficiency. The tool is comprehensively evaluated in multiple aspects including consistency of profiles, simulation of genomic variations and complex tumor samples, and the results demonstrate the advantages of SimuSCoP over existing tools.</jats:p> </jats:sec><jats:sec> <jats:title>Conclusions</jats:title> <jats:p>SimuSCoP, a new bioinformatics tool is developed to learn informative profiles from real sequencing data and reliably mimic complex data by introducing various genomic variations. 
We believe that the presented work will catalyse new development of downstream bioinformatics methods for analyzing sequencing data.</jats:p> </jats:sec> :AUTHOR: Yu, Zhenhua and Du, Fang and Ban, Rongjun and Zhang, Yuanwei :DOI: 10.1186/s12859-020-03665-5 :ISSUE: 1 :JOURNAL: BMC Bioinformatics :LANGUAGE: en :MONTH: 12 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1186/s12859-020-03665-5 :VOLUME: 21 :YEAR: 2020 :END: * Genome measures used for quality control are dependent on gene function and ancestry :PROPERTIES: :TITLE: Genome measures used for quality control are dependent on gene function and ancestry :BTYPE: article :CUSTOM_ID: Wang2015 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:p>Motivation: The transition/transversion (Ti/Tv) ratio and heterozygous/nonreference-homozygous (het/nonref-hom) ratio have been commonly computed in genetic studies as a quality control (QC) measurement. Additionally, these two ratios are helpful in our understanding of the patterns of DNA sequence evolution.</jats:p> <jats:p>Results: To thoroughly understand these two genomic measures, we performed a study using 1000 Genomes Project (1000G) released genotype data ( N = 1092). An additional two datasets ( N = 581 and N = 6) were used to validate our findings from the 1000G dataset. We compared the two ratios among continental ancestry, genome regions and gene functionality. We found that the Ti/Tv ratio can be used as a quality indicator for single nucleotide polymorphisms inferred from high-throughput sequencing data. The Ti/Tv ratio varies greatly by genome region and functionality, but not by ancestry. The het/nonref-hom ratio varies greatly by ancestry, but not by genome regions and functionality. Furthermore, extreme guanine + cytosine content (either high or low) is negatively associated with the Ti/Tv ratio magnitude. 
Thus, when performing QC assessment using these two measures, care must be taken to apply the correct thresholds based on ancestry and genome region. Failure to take these considerations into account at the QC stage will bias any following analysis.</jats:p> <jats:p>Contact: yan.guo@vanderbilt.edu</jats:p> <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.</jats:p> :AUTHOR: Wang, Jing and Raskin, Leon and Samuels, David C. and Shyr, Yu and Guo, Yan :DOI: 10.1093/bioinformatics/btu668 :ISSUE: 3 :JOURNAL: Bioinformatics :LANGUAGE: en :MONTH: 2 :PAGES: 318--323 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/bioinformatics/btu668 :VOLUME: 31 :YEAR: 2015 :END: * Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection :PROPERTIES: :TITLE: Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection :BTYPE: article :CUSTOM_ID: Ewing2015 :AUTHOR: Ewing, Adam D and Houlahan, Kathleen E and Hu, Yin and Ellrott, Kyle and Caloian, Cristian and Yamaguchi, Takafumi N and Bare, J Christopher and P'ng, Christine and Waggott, Daryl and Sabelnykova, Veronica Y and Kellen, Michael R and Norman, Thea C and Haussler, David and Friend, Stephen H and Stolovitzky, Gustavo and Margolin, Adam A and Stuart, Joshua M and Boutros, Paul C :DOI: 10.1038/nmeth.3407 :ISSUE: 7 :JOURNAL: Nature Methods :LANGUAGE: en :MONTH: 7 :PAGES: 623--630 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/nmeth.3407 :VOLUME: 12 :YEAR: 2015 :END: * PrecisionFDA Truth Challenge V2: Calling variants from short and long reads in difficult-to-map regions :PROPERTIES: :TITLE: PrecisionFDA Truth Challenge V2: Calling variants from short and long reads in difficult-to-map regions :BTYPE: article :CUSTOM_ID: Olson2022 :AUTHOR: Olson, Nathan D.
and Wagner, Justin and McDaniel, Jennifer and Stephens, Sarah H. and Westreich, Samuel T. and Prasanna, Anish G. and Johanson, Elaine and Boja, Emily and Maier, Ezekiel J. and Serang, Omar and J\'{a}spez, David and Lorenzo-Salazar, Jos\'{e} M. and Mu\~{n}oz-Barrera, Adri\'{a}n and Rubio-Rodr\'{\i}guez, Luis A. and Flores, Carlos and Kyriakidis, Konstantinos and Malousi, Andigoni and Shafin, Kishwar and Pesout, Trevor and Jain, Miten and Paten, Benedict and Chang, Pi-Chuan and Kolesnikov, Alexey and Nattestad, Maria and Baid, Gunjan and Goel, Sidharth and Yang, Howard and Carroll, Andrew and Eveleigh, Robert and Bourgey, Mathieu and Bourque, Guillaume and Li, Gen and Ma, ChouXian and Tang, LinQi and Du, YuanPing and Zhang, ShaoWei and Morata, Jordi and Tonda, Ra\'{u}l and Parra, Gen\'{\i}s and Trotta, Jean-R\'{e}mi and Brueffer, Christian and Demirkaya-Budak, Sinem and Kabakci-Zorlu, Duygu and Turgut, Deniz and Kalay, \"{O}zem and Budak, Gungor and Narc\i{}, K\"{u}bra and Arslan, Elif and Brown, Richard and Johnson, Ivan J. and Dolgoborodov, Alexey and Semenyuk, Vladimir and Jain, Amit and Tetikol, H. Serhat and Jain, Varun and Ruehle, Mike and Lajoie, Bryan and Roddey, Cooper and Catreux, Severine and Mehio, Rami and Ahsan, Mian Umair and Liu, Qian and Wang, Kai and Ebrahim Sahraeian, Sayed Mohammad and Fang, Li Tai and Mohiyuddin, Marghoob and Hung, Calvin and Jain, Chirag and Feng, Hanying and Li, Zhipan and Chen, Luoqi and Sedlazeck, Fritz J. and Zook, Justin M. :DOI: 10.1016/j.xgen.2022.100129 :ISSUE: 5 :JOURNAL: Cell Genomics :LANGUAGE: en :MONTH: 5 :PAGES: 100129 :PUBLISHER: Elsevier BV :URL: http://dx.doi.org/10.1016/j.xgen.2022.100129 :VOLUME: 2 :YEAR: 2022 :END: * Library preparation methods for next-generation sequencing: Tone down the bias :PROPERTIES: :TITLE: Library preparation methods for next-generation sequencing: Tone down the bias :BTYPE: article :CUSTOM_ID: VanDijk2014 :AUTHOR: van Dijk, Erwin L.
and Jaszczyszyn, Yan and Thermes, Claude :DOI: 10.1016/j.yexcr.2014.01.008 :ISSUE: 1 :JOURNAL: Experimental Cell Research :LANGUAGE: en :MONTH: 3 :PAGES: 12--20 :PUBLISHER: Elsevier BV :URL: http://dx.doi.org/10.1016/j.yexcr.2014.01.008 :VOLUME: 322 :YEAR: 2014 :END: * Library construction for next-generation sequencing: Overviews and challenges :PROPERTIES: :TITLE: Library construction for next-generation sequencing: Overviews and challenges :BTYPE: article :CUSTOM_ID: Head2014 :ABSTRACT: <jats:p> High-throughput sequencing, also known as next-generation sequencing (NGS), has revolutionized genomic research. In recent years, NGS technology has steadily improved, with costs dropping and the number and range of sequencing applications increasing exponentially. Here, we examine the critical role of sequencing library quality and consider important challenges when preparing NGS libraries from DNA and RNA sources. Factors such as the quantity and physical characteristics of the RNA or DNA source material as well as the desired application (i.e., genome sequencing, targeted sequencing, RNA-seq, ChIP-seq, RIP-seq, and methylation) are addressed in the context of preparing high quality sequencing libraries. In addition, the current methods for preparing NGS libraries from single cells are also discussed. </jats:p> :AUTHOR: Head, Steven R. and Komori, H. Kiyomi and LaMere, Sarah A. and Whisenant, Thomas and Van Nieuwerburgh, Filip and Salomon, Daniel R. 
and Ordoukhanian, Phillip :DOI: 10.2144/000114133 :ISSUE: 2 :JOURNAL: BioTechniques :LANGUAGE: en :MONTH: 2 :PAGES: 61--77 :PUBLISHER: Future Science Ltd :URL: http://dx.doi.org/10.2144/000114133 :VOLUME: 56 :YEAR: 2014 :END: * Splicing mutations in human genetic disorders: examples, detection, and confirmation :PROPERTIES: :TITLE: Splicing mutations in human genetic disorders: examples, detection, and confirmation :BTYPE: article :CUSTOM_ID: Anna2018 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:p>Precise pre-mRNA splicing, essential for appropriate protein translation, depends on the presence of consensus \textquotedblleft{}cis\textquotedblright{} sequences that define exon-intron boundaries and regulatory sequences recognized by splicing machinery. Point mutations at these consensus sequences can cause improper exon and intron recognition and may result in the formation of an aberrant transcript of the mutated gene. The splicing mutation may occur in both introns and exons and disrupt existing splice sites or splicing regulatory sequences (intronic and exonic splicing silencers and enhancers), create new ones, or activate the cryptic ones. Usually such mutations result in errors during the splicing process and may lead to improper intron removal and thus cause alterations of the open reading frame. Recent research has underlined the abundance and importance of splicing mutations in the etiology of inherited diseases. The application of modern techniques allowed to identify synonymous and nonsynonymous variants as well as deep intronic mutations that affected pre-mRNA splicing. The bioinformatic algorithms can be applied as a tool to assess the possible effect of the identified changes. However, it should be underlined that the results of such tests are only predictive, and the exact effect of the specific mutation should be verified in functional studies. 
This article summarizes the current knowledge about the \textquotedblleft{}splicing mutations\textquotedblright{} and methods that help to identify such changes in clinical diagnosis.</jats:p> :AUTHOR: Abramowicz, Anna and Gos, Monika :DOI: 10.1007/s13353-018-0444-7 :ISSUE: 3 :JOURNAL: Journal of Applied Genetics :LANGUAGE: en :MONTH: 8 :PAGES: 253--268 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1007/s13353-018-0444-7 :VOLUME: 59 :YEAR: 2018 :END: * A robust model for read count data in exome sequencing experiments and implications for copy number variant calling :PROPERTIES: :TITLE: A robust model for read count data in exome sequencing experiments and implications for copy number variant calling :BTYPE: article :CUSTOM_ID: Plagnol2012 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:p>Motivation: Exome sequencing has proven to be an effective tool to discover the genetic basis of Mendelian disorders. It is well established that copy number variants (CNVs) contribute to the etiology of these disorders. However, calling CNVs from exome sequence data is challenging. A typical read depth strategy consists of using another sample (or a combination of samples) as a reference to control for the variability at the capture and sequencing steps. However, technical variability between samples complicates the analysis and can create spurious CNV calls.</jats:p> <jats:p>Results: Here, we introduce ExomeDepth, a new CNV calling algorithm designed to control for this technical variability. ExomeDepth uses a robust model for the read count data and uses this model to build an optimized reference set in order to maximize the power to detect CNVs. As a result, ExomeDepth is effective across a wider range of exome datasets than the previously existing tools, even for small (e.g. one to two exons) and heterozygous deletions. We used this new approach to analyse exome data from 24 patients with primary immunodeficiencies.
Depending on data quality and the exact target region, we find between 170 and 250 exonic CNV calls per sample. Our analysis identified two novel causative deletions in the genes GATA2 and DOCK8.</jats:p> <jats:p>Availability: The code used in this analysis has been implemented into an R package called ExomeDepth and is available at the Comprehensive R Archive Network (CRAN).</jats:p> <jats:p>Contact: v.plagnol@ucl.ac.uk</jats:p> <jats:p>Supplementary Information: Supplementary data are available at Bioinformatics online.</jats:p> :AUTHOR: Plagnol, Vincent and Curtis, James and Epstein, Michael and Mok, Kin Y. and Stebbings, Emma and Grigoriadou, Sofia and Wood, Nicholas W. and Hambleton, Sophie and Burns, Siobhan O. and Thrasher, Adrian J. and Kumararatne, Dinakantha and Doffinger, Rainer and Nejentsev, Sergey :DOI: 10.1093/bioinformatics/bts526 :ISSUE: 21 :JOURNAL: Bioinformatics :LANGUAGE: en :MONTH: 11 :PAGES: 2747--2754 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/bioinformatics/bts526 :VOLUME: 28 :YEAR: 2012 :END: * Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment :PROPERTIES: :TITLE: Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment :BTYPE: article :CUSTOM_ID: Betschart2022 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>Rapid advances in high-throughput DNA sequencing technologies have enabled the conduct of whole genome sequencing (WGS) studies, and several bioinformatics pipelines have become available. The aim of this study was the comparison of 6 WGS data pre-processing pipelines, involving two mapping and alignment approaches (GATK utilizing BWA-MEM2 2.2.1, and DRAGEN 3.8.4) and three variant calling pipelines (GATK 4.2.4.1, DRAGEN 3.8.4 and DeepVariant 1.1.0). 
We sequenced one genome in a bottle (GIAB) sample 70 times in different runs, and one GIAB trio in triplicate. The truth set of the GIABs was used for comparison, and performance was assessed by computation time, F<jats:sub>1</jats:sub> score, precision, and recall. In the mapping and alignment step, the DRAGEN pipeline was faster than the GATK with BWA-MEM2 pipeline. DRAGEN showed systematically higher F<jats:sub>1</jats:sub> score, precision, and recall values than GATK for single nucleotide variations (SNVs) and Indels in simple-to-map, complex-to-map, coding and non-coding regions. In the variant calling step, DRAGEN was fastest. In terms of accuracy, DRAGEN and DeepVariant performed similarly and both superior to GATK, with slight advantages for DRAGEN for Indels and for DeepVariant for SNVs. The DRAGEN pipeline showed the lowest Mendelian inheritance error fraction for the GIAB trios. Mapping and alignment played a key role in variant calling of WGS, with the DRAGEN outperforming GATK.</jats:p> :AUTHOR: Betschart, Raphael O. and Thi\'{e}ry, Alexandre and Aguilera-Garcia, Domingo and Zoche, Martin and Moch, Holger and Twerenbold, Raphael and Zeller, Tanja and Blankenberg, Stefan and Ziegler, Andreas :DOI: 10.1038/s41598-022-26181-3 :ISSUE: 1 :JOURNAL: Scientific Reports :LANGUAGE: en :MONTH: 12 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41598-022-26181-3 :VOLUME: 12 :YEAR: 2022 :END: * Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants :PROPERTIES: :TITLE: Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants :BTYPE: article :CUSTOM_ID: Belkadi2015 :ABSTRACT: <jats:title>Significance</jats:title> <jats:p>Whole-exome sequencing (WES) is gradually being optimized to identify mutations in increasing proportions of the protein-coding exome, but whole-genome sequencing (WGS) is becoming an attractive alternative. 
WGS is currently more expensive than WES, but its cost should decrease more rapidly than that of WES. We compared WES and WGS on six unrelated individuals. The distribution of quality parameters for single-nucleotide variants (SNVs) and insertions/deletions (indels) was more uniform for WGS than for WES. The vast majority of SNVs and indels were identified by both techniques, but an estimated 650 high-quality coding SNVs (\sim{}3\% of coding variants) were detected by WGS and missed by WES. WGS is therefore slightly more efficient than WES for detecting mutations in the targeted exome.</jats:p> :AUTHOR: Belkadi, Aziz and Bolze, Alexandre and Itan, Yuval and Cobat, Aur\'{e}lie and Vincent, Quentin B. and Antipenko, Alexander and Shang, Lei and Boisson, Bertrand and Casanova, Jean-Laurent and Abel, Laurent :DOI: 10.1073/pnas.1418631112 :ISSUE: 17 :JOURNAL: Proceedings of the National Academy of Sciences :LANGUAGE: en :MONTH: 4 :PAGES: 5473--5478 :PUBLISHER: Proceedings of the National Academy of Sciences :URL: http://dx.doi.org/10.1073/pnas.1418631112 :VOLUME: 112 :YEAR: 2015 :END: * VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing :PROPERTIES: :TITLE: VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing :BTYPE: article :CUSTOM_ID: Koboldt2012 :ABSTRACT: <jats:p>Cancer is a disease driven by genetic variation and mutation. Exome sequencing can be utilized for discovering these variants and mutations across hundreds of tumors. Here we present an analysis tool, VarScan 2, for the detection of somatic mutations and copy number alterations (CNAs) in exome data from tumor\textendash{}normal pairs. 
Unlike most current approaches, our algorithm reads data from both samples simultaneously; a heuristic and statistical algorithm detects sequence variants and classifies them by somatic status (germline, somatic, or LOH); while a comparison of normalized read depth delineates relative copy number changes. We apply these methods to the analysis of exome sequence data from 151 high-grade ovarian tumors characterized as part of the Cancer Genome Atlas (TCGA). We validated some 7790 somatic coding mutations, achieving 93\% sensitivity and 85\% precision for single nucleotide variant (SNV) detection. Exome-based CNA analysis identified 29 large-scale alterations and 619 focal events per tumor on average. As in our previous analysis of these data, we observed frequent amplification of oncogenes (e.g., <jats:italic>CCNE1</jats:italic>, <jats:italic>MYC</jats:italic>) and deletion of tumor suppressors (<jats:italic>NF1</jats:italic>, <jats:italic>PTEN</jats:italic>, and <jats:italic>CDKN2A</jats:italic>). We searched for additional recurrent focal CNAs using the correlation matrix diagonal segmentation (CMDS) algorithm, which identified 424 significant events affecting 582 genes. Taken together, our results demonstrate the robust performance of VarScan 2 for somatic mutation and CNA detection and shed new light on the landscape of genetic alterations in ovarian cancer.</jats:p> :AUTHOR: Koboldt, Daniel C. and Zhang, Qunyuan and Larson, David E. and Shen, Dong and McLellan, Michael D. and Lin, Ling and Miller, Christopher A. and Mardis, Elaine R. and Ding, Li and Wilson, Richard K. 
:DOI: 10.1101/gr.129684.111 :ISSUE: 3 :JOURNAL: Genome Research :LANGUAGE: en :MONTH: 3 :PAGES: 568--576 :PUBLISHER: Cold Spring Harbor Laboratory :URL: http://dx.doi.org/10.1101/gr.129684.111 :VOLUME: 22 :YEAR: 2012 :END: * Curated variation benchmarks for challenging medically relevant autosomal genes :PROPERTIES: :TITLE: Curated variation benchmarks for challenging medically relevant autosomal genes :BTYPE: article :CUSTOM_ID: Wagner2022gene :AUTHOR: Wagner, Justin and Olson, Nathan D. and Harris, Lindsay and McDaniel, Jennifer and Cheng, Haoyu and Fungtammasan, Arkarachai and Hwang, Yih-Chii and Gupta, Richa and Wenger, Aaron M. and Rowell, William J. and Khan, Ziad M. and Farek, Jesse and Zhu, Yiming and Pisupati, Aishwarya and Mahmoud, Medhat and Xiao, Chunlin and Yoo, Byunggil and Sahraeian, Sayed Mohammad Ebrahim and Miller, Danny E. and J\'{a}spez, David and Lorenzo-Salazar, Jos\'{e} M. and Mu\~{n}oz-Barrera, Adri\'{a}n and Rubio-Rodr\'{\i}guez, Luis A. and Flores, Carlos and Narzisi, Giuseppe and Evani, Uday Shanker and Clarke, Wayne E. and Lee, Joyce and Mason, Christopher E. and Lincoln, Stephen E. and Miga, Karen H. and Ebbert, Mark T. W. and Shumate, Alaina and Li, Heng and Chin, Chen-Shan and Zook, Justin M. and Sedlazeck, Fritz J. :DOI: 10.1038/s41587-021-01158-1 :ISSUE: 5 :JOURNAL: Nature Biotechnology :LANGUAGE: en :MONTH: 5 :PAGES: 672--680 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41587-021-01158-1 :VOLUME: 40 :YEAR: 2022 :END: * Exome variant discrepancies due to reference-genome differences :PROPERTIES: :TITLE: Exome variant discrepancies due to reference-genome differences :BTYPE: article :CUSTOM_ID: Li2021 :AUTHOR: Li, He and Dawood, Moez and Khayat, Michael M. and Farek, Jesse R. and Jhangiani, Shalini N. and Khan, Ziad M. and Mitani, Tadahiro and Coban-Akdemir, Zeynep and Lupski, James R. and Venner, Eric and Posey, Jennifer E. and Sabo, Aniko and Gibbs, Richard A.
:DOI: 10.1016/j.ajhg.2021.05.011 :ISSUE: 7 :JOURNAL: The American Journal of Human Genetics :LANGUAGE: en :MONTH: 7 :PAGES: 1239--1250 :PUBLISHER: Elsevier BV :URL: http://dx.doi.org/10.1016/j.ajhg.2021.05.011 :VOLUME: 108 :YEAR: 2021 :END: * Evaluation of an optimized germline exomes pipeline using BWA-MEM2 and Dragen-GATK tools :PROPERTIES: :TITLE: Evaluation of an optimized germline exomes pipeline using BWA-MEM2 and Dragen-GATK tools :BTYPE: article :CUSTOM_ID: Alganmi2023 :ABSTRACT: <jats:p>The next-generation sequencing (NGS) technology represents a significant advance in genomics and medical diagnosis. Nevertheless, the time it takes to perform sequencing, data analysis, and variant interpretation is a bottleneck in using next-generation sequencing in precision medicine. For accurate and efficient performance in clinical diagnostic lab practice, a consistent data analysis pipeline is necessary to avoid false variant calls and achieve optimum accuracy. This study aims to compare the performance of two NGS data analysis pipeline compartments, including short-read mapping (BWA-MEM and BWA-MEM2) and variant calling (GATK-HaplotypeCaller and DRAGEN-GATK). On Whole Exome Sequencing (WES) data, computational performance was assessed using several criteria, including mapping efficiency, variant calling performance, false positive calls rate, and time. We examined four gold-standard WES data sets: Ashkenazim father (NA24149), Ashkenazim mother (NA24143), Ashkenazim son (NA24385), and Asian son (NA25631). In addition, eighteen exome samples were analyzed based on different read counts, and coverage was used precisely in the run-time assessment. By using BWA-MEM 2 and Dragen-GATK, this study achieved faster and more accurate detection for SNVs and indels than the standard GATK Best Practices workflow. 
This systematic comparison will enable the bioinformatics community to develop a more efficient and faster solution for analyzing NGS data.</jats:p> :AUTHOR: Alganmi, Nofe and Abusamra, Heba :DOI: 10.1371/journal.pone.0288371 :ISSUE: 8 :JOURNAL: PLOS ONE :LANGUAGE: en :MONTH: 8 :PAGES: e0288371 :PUBLISHER: Public Library of Science (PLoS) :URL: http://dx.doi.org/10.1371/journal.pone.0288371 :VOLUME: 18 :YEAR: 2023 :END: * Molecular genetic studies of complete hydatidiform moles :PROPERTIES: :TITLE: Molecular genetic studies of complete hydatidiform moles :BTYPE: article :CUSTOM_ID: Carey :ABSTRACT: Complete hydatidiform moles (CHM) are abnormal pregnancies with no fetal development resulting from having two paternal genomes with no maternal contribution. It is important to distinguish CHM from partial hydatidiform moles, and non-molar abortuses, ... :AUTHOR: Carey, Louise and Nash, Benjamin M. and Wright, Dale C. :DATE: 2015 Apr :DOI: 10.3978/j.issn.2224-4336.2015.04.02 :ISSUE: 2 :JOURNAL: Translational Pediatrics :LANGUAGE: en :PUBLISHER: AME Publications :URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4729092/ :VOLUME: 4 :END: * Nix based fully automated workflows and ecosystem to guarantee scientific result reproducibility across software environments and systems :PROPERTIES: :TITLE: Nix based fully automated workflows and ecosystem to guarantee scientific result reproducibility across software environments and systems :BTYPE: inproceedings :CUSTOM_ID: Devresse2015 :AUTHOR: Devresse, Adrien and Delalondre, Fabien and Sch\"{u}rmann, Felix :BOOKTITLE: SC15: The International Conference for High Performance Computing, Networking, Storage and Analysis :DOI: 10.1145/2830168.2830172 :JOURNAL: Proceedings of the 3rd International Workshop on Software Engineering for High Performance Computing in Computational Science and Engineering :MONTH: 11 :PUBLISHER: ACM :URL: http://dx.doi.org/10.1145/2830168.2830172 :VENUE: Austin Texas :YEAR: 2015 :END: * A universal SNP and small-indel variant caller
using deep neural networks :PROPERTIES: :TITLE: A universal SNP and small-indel variant caller using deep neural networks :BTYPE: article :CUSTOM_ID: Poplin2018 :AUTHOR: Poplin, Ryan and Chang, Pi-Chuan and Alexander, David and Schwartz, Scott and Colthurst, Thomas and Ku, Alexander and Newburger, Dan and Dijamco, Jojo and Nguyen, Nam and Afshar, Pegah T and Gross, Sam S and Dorfman, Lizzie and McLean, Cory Y and DePristo, Mark A :DOI: 10.1038/nbt.4235 :ISSUE: 10 :JOURNAL: Nature Biotechnology :LANGUAGE: en :MONTH: 11 :PAGES: 983--987 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/nbt.4235 :VOLUME: 36 :YEAR: 2018 :END: * PCR-Free Shallow Whole Genome Sequencing for Chromosomal Copy Number Detection from Plasma of Cancer Patients Is an Efficient Alternative to the Conventional PCR-Based Approach :PROPERTIES: :TITLE: PCR-Free Shallow Whole Genome Sequencing for Chromosomal Copy Number Detection from Plasma of Cancer Patients Is an Efficient Alternative to the Conventional PCR-Based Approach :BTYPE: article :CUSTOM_ID: Beagan2021 :ABSTRACT: Somatic copy number alterations can be detected in cell-free DNA (cfDNA) by shallow whole genome sequencing (sWGS). PCR is typically included in libra\ldots{} :AUTHOR: Beagan, Jamie J. and Drees, Esther E.E. and Stathi, Phylicia and Eijk, Paul P. and Meulenbroeks, Laura and Kessler, Floortje and Middeldorp, Jaap M. and Pegtel, D. Michiel and Zijlstra, Jos\'{e}e M. and Sie, Daoud and Heideman, Dani\"{e}lle A.M. and Thunnissen, Erik and Smit, Linda and de Jong, Daphne and Mouliere, Florent and Ylstra, Bauke and Roemer, Margaretha G.M. 
and van Dijk, Erik :DOI: 10.1016/j.jmoldx.2021.08.008 :ISSN: 1525-1578 :ISSUE: 11 :JOURNAL: The Journal of Molecular Diagnostics :PAGES: 1553-1563 :PUBLISHER: Elsevier :URL: https://www.sciencedirect.com/science/article/pii/S1525157821002646 :VOLUME: 23 :YEAR: 2021 :END: * GENCODE: The reference human genome annotation for The ENCODE Project :PROPERTIES: :TITLE: GENCODE: The reference human genome annotation for The ENCODE Project :BTYPE: article :CUSTOM_ID: Harrow :AUTHOR: Harrow, Jennifer and Frankish, Adam and Gonzalez, Jose M. and Tapanari, Electra and Diekhans, Mark and Kokocinski, Felix and Aken, Bronwen L. and Barrell, Daniel and Zadissa, Amonida and Searle, Stephen and Barnes, If and Bignell, Alexandra and Boychenko, Veronika and Hunt, Toby and Kay, Mike and Mukherjee, Gaurab and Rajan, Jeena and Despacio-Reyes, Gloria and Saunders, Gary and Steward, Charles and Harte, Rachel and Lin, Michael and Howald, C\'{e}dric and Tanzer, Andrea and Derrien, Thomas and Chrast, Jacqueline and Walters, Nathalie and Balasubramanian, Suganthi and Pei, Baikang and Tress, Michael and Rodriguez, Jose Manuel and Ezkurdia, Iakes and van Baren, Jeltje and Brent, Michael and Haussler, David and Kellis, Manolis and Valencia, Alfonso and Reymond, Alexandre and Gerstein, Mark and Guig\'{o}, Roderic and Hubbard, Tim J.
:DATE: 2012-09-01 :DOI: 10.1101/gr.135350.111 :ISSN: 1088-9051 :ISSUE: 9 :JOURNAL: Genome Research :LANGUAGE: en :PUBLISHER: Cold Spring Harbor Laboratory Press :URL: https://genome.cshlp.org/content/22/9/1760.full :VOLUME: 22 :END: * Comparative analysis of whole-genome sequencing pipelines to minimize false negative findings :PROPERTIES: :TITLE: Comparative analysis of whole-genome sequencing pipelines to minimize false negative findings :BTYPE: article :CUSTOM_ID: Hwang2019 :ABSTRACT: Comprehensive and accurate detection of variants from whole-genome sequencing (WGS) is a strong prerequisite for translational genomic medicine; however, low concordance between analytic pipelines is an outstanding challenge. We processed a European and an African WGS samples with 70 analytic pipelines comprising the combination of 7 short-read aligners and 10 variant calling algorithms (VCAs), and observed remarkable differences in the number of variants called by different pipelines (max/min ratio: 1.3\textasciitilde{}3.4). The similarity between variant call sets was more closely determined by VCAs rather than by short-read aligners. Remarkably, reported minor allele frequency had a substantial effect on concordance between pipelines (concordance rate ratio: 0.11\textasciitilde{}0.92; Wald tests, P\hspace{0.167em}\<\hspace{0.167em}0.001), entailing more discordant results for rare and novel variants. We compared the performance of analytic pipelines and pipeline ensembles using gold-standard variant call sets and the catalog of variants from the 1000 Genomes Project. Notably, a single pipeline using BWA-MEM and GATK-HaplotypeCaller performed comparable to the pipeline ensembles for `callable' regions (\textasciitilde{}97\%) of the human reference genome. While a single pipeline is capable of analyzing common variants in most genomic regions, our findings demonstrated the limitations and challenges in analyzing rare or novel variants, especially for non-European genomes. 
:AUTHOR: Hwang, Kyu-Baek and Lee, In-Hee and Li, Honglan and Won, Dhong-Geon and Hernandez-Ferrer, Carles and Negron, Jose Alberto and Kong, Sek Won :DATE: 2019-03-01 :DOI: 10.1038/s41598-019-39108-2 :ISSN: 2045-2322 :ISSUE: 1 :JOURNAL: Scientific Reports :KEYWORDS: Genetics research :LANGUAGE: En :PUBLISHER: Nature Publishing Group :URL: https://www.nature.com/articles/s41598-019-39108-2 :VOLUME: 9 :YEAR: 2019 :END: * Frequent nonallelic gene conversion on the human lineage and its effect on the divergence of gene duplicates :PROPERTIES: :TITLE: Frequent nonallelic gene conversion on the human lineage and its effect on the divergence of gene duplicates :BTYPE: article :CUSTOM_ID: Harpak2017 :ABSTRACT: <jats:title>Significance</jats:title> <jats:p>Nonallelic gene conversion (NAGC) is a driver of more than 20 diseases. It is also thought to drive the \textquotedblleft{}concerted evolution\textquotedblright{} of gene duplicates because it acts to eliminate any differences that accumulate between them. Despite its importance, the parameters that govern NAGC are not well characterized. We developed statistical tools to study NAGC and its consequences for human gene duplicates. We find that the baseline rate of NAGC in humans is 20 times faster than the point mutation rate. Despite this high rate, NAGC has a surprisingly small effect on the average sequence divergence of human duplicates\textemdash{}and concerted evolution is not as pervasive as previously thought.</jats:p> :AUTHOR: Harpak, Arbel and Lan, Xun and Gao, Ziyue and Pritchard, Jonathan K. 
:DOI: 10.1073/pnas.1708151114 :ISSUE: 48 :JOURNAL: Proceedings of the National Academy of Sciences :LANGUAGE: en :MONTH: 11 :PAGES: 12779--12784 :PUBLISHER: Proceedings of the National Academy of Sciences :URL: http://dx.doi.org/10.1073/pnas.1708151114 :VOLUME: 114 :YEAR: 2017 :END: * Benchmarking challenging small variants with linked and long reads :PROPERTIES: :TITLE: Benchmarking challenging small variants with linked and long reads :BTYPE: misc :CUSTOM_ID: Wagner2022 :ABSTRACT: <jats:title>Summary</jats:title><jats:p>Genome in a Bottle (GIAB) benchmarks have been widely used to help validate clinical sequencing pipelines and develop new variant calling and sequencing methods. Here, we use accurate linked reads and long reads to expand the prior benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are not readily accessible to short reads. Our new benchmark adds more than 300,000 SNVs, 50,000 indels, and 16 \% new exonic variants, many in challenging, clinically relevant genes not previously covered (e.g., <jats:italic>PMS2</jats:italic>). For HG002, we include 92\% of the autosomal GRCh38 assembly, while excluding problematic regions for benchmarking small variants (e.g., copy number variants and reference errors) that should not have been in the previous version, which included 85\% of GRCh38. 
By including difficult-to-map regions, this benchmark identifies eight times more false negatives in a short read variant call set relative to our previous benchmark. We have demonstrated the utility of this benchmark to reliably identify false positives and false negatives across technologies in more challenging regions, which enables continued technology and bioinformatics development.</jats:p> :AUTHOR: Wagner, Justin and Olson, Nathan D and Harris, Lindsay and McDaniel, Jennifer and Khan, Ziad and Farek, Jesse and Mahmoud, Medhat and Stankovic, Ana and Kovacevic, Vladimir and Yoo, Byunggil and Miller, Neil and Rosenfeld, Jeffrey A. and Ni, Bohan and Zarate, Samantha and Kirsche, Melanie and Aganezov, Sergey and Schatz, Michael and Narzisi, Giuseppe and Byrska-Bishop, Marta and Clarke, Wayne and Evani, Uday S. and Markello, Charles and Shafin, Kishwar and Zhou, Xin and Sidow, Arend and Bansal, Vikas and Ebert, Peter and Marschall, Tobias and Lansdorp, Peter and Hanlon, Vincent and Mattsson, Carl-Adam and Barrio, Alvaro Martinez and Fiddes, Ian T and Xiao, Chunlin and Fungtammasan, Arkarachai and Chin, Chen-Shan and Wenger, Aaron M and Rowell, William J and Sedlazeck, Fritz J and Carroll, Andrew and Salit, Marc and Zook, Justin M :DOI: 10.1101/2020.07.24.212712 :MONTH: 7 :PUBLISHER: Cold Spring Harbor Laboratory :URL: http://dx.doi.org/10.1101/2020.07.24.212712 :YEAR: 2022 :END: * Alignment-free approaches for predicting novel Nuclear Mitochondrial Segments (NUMTs) in the human genome :PROPERTIES: :TITLE: Alignment-free approaches for predicting novel Nuclear Mitochondrial Segments (NUMTs) in the human genome :BTYPE: article :CUSTOM_ID: Li2019 :AUTHOR: Li, Wentian and Freudenberg, Jerome and Freudenberg, Jan :DOI: 10.1016/j.gene.2018.12.040 :JOURNAL: Gene :LANGUAGE: en :MONTH: 4 :PAGES: 141--152 :PUBLISHER: Elsevier BV :URL: http://dx.doi.org/10.1016/j.gene.2018.12.040 :VOLUME: 691 :YEAR: 2019 :END: * Enhanced copy number variants detection from whole-exome
sequencing data using EXCAVATOR2 :PROPERTIES: :TITLE: Enhanced copy number variants detection from whole-exome sequencing data using EXCAVATOR2 :BTYPE: article :CUSTOM_ID: DAurizio2016 :AUTHOR: D'Aurizio, Romina and Pippucci, Tommaso and Tattini, Lorenzo and Giusti, Betti and Pellegrini, Marco and Magi, Alberto :DOI: 10.1093/nar/gkw695 :JOURNAL: Nucleic Acids Research :LANGUAGE: en :MONTH: 8 :PAGES: gkw695 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/nar/gkw695 :YEAR: 2016 :END: * Scaling accurate genetic variant discovery to tens of thousands of samples :PROPERTIES: :TITLE: Scaling accurate genetic variant discovery to tens of thousands of samples :BTYPE: misc :CUSTOM_ID: Poplin2017 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>Comprehensive disease gene discovery in both common and rare diseases will require the efficient and accurate detection of all classes of genetic variation across tens to hundreds of thousands of human samples. We describe here a novel assembly-based approach to variant calling, the GATK HaplotypeCaller (HC) and Reference Confidence Model (RCM), that determines genotype likelihoods independently per-sample but performs joint calling across all samples within a project simultaneously. We show by calling over 90,000 samples from the Exome Aggregation Consortium (ExAC) that, in contrast to other algorithms, the HC-RCM scales efficiently to very large sample sizes without loss in accuracy; and that the accuracy of indel variant calling is superior in comparison to other algorithms. More importantly, the HC-RCM produces a fully squared-off matrix of genotypes across all samples at every genomic position being investigated. The HC-RCM is a novel, scalable, assembly-based algorithm with abundant applications for population genetics and clinical studies.</jats:p> :AUTHOR: Poplin, Ryan and Ruano-Rubio, Valentin and DePristo, Mark A. and Fennell, Tim J. and Carneiro, Mauricio O. and Van der Auwera, Geraldine A. 
and Kling, David E. and Gauthier, Laura D. and Levy-Moonshine, Ami and Roazen, David and Shakir, Khalid and Thibault, Joel and Chandran, Sheila and Whelan, Chris and Lek, Monkol and Gabriel, Stacey and Daly, Mark J and Neale, Ben and MacArthur, Daniel G. and Banks, Eric :DOI: 10.1101/201178 :MONTH: 11 :PUBLISHER: Cold Spring Harbor Laboratory :URL: http://dx.doi.org/10.1101/201178 :YEAR: 2017 :END: * Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR :PROPERTIES: :TITLE: Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR :BTYPE: article :CUSTOM_ID: Yang2015 :ABSTRACT: This protocol describes how to annotate genomic variants using either the ANNOVAR software or the web-based wANNOVAR tool. Recent developments in sequencing techniques have enabled rapid and high-throughput generation of sequence data, democratizing the ability to compile information on large amounts of genetic variations in individual laboratories. However, there is a growing gap between the generation of raw sequencing data and the extraction of meaningful biological information. Here, we describe a protocol to use the ANNOVAR (ANNOtate VARiation) software to facilitate fast and easy variant annotations, including gene-based, region-based and filter-based annotations on a variant call format (VCF) file generated from human genomes. We further describe a protocol for gene-based annotation of a newly sequenced nonhuman species. Finally, we describe how to use a user-friendly and easily accessible web server called wANNOVAR to prioritize candidate genes for a Mendelian disease. The variant annotation protocols take 5\textendash{}30 min of computer time, depending on the size of the variant file, and 5\textendash{}10 min of hands-on time. In summary, through the command-line tool and the web server, these protocols provide a convenient means to analyze genetic variants generated in humans and other species. 
:AUTHOR: Yang, Hui and Wang, Kai :DATE: 2015-09-17 :DOI: 10.1038/nprot.2015.105 :ISSN: 1750-2799 :ISSUE: 10 :JOURNAL: Nature Protocols :KEYWORDS: Genetic variation :LANGUAGE: En :PUBLISHER: Nature Publishing Group :URL: https://www.nature.com/articles/nprot.2015.105 :VOLUME: 10 :YEAR: 2015 :END: * Similarities and differences between variants called with human reference genome HG19 or HG38 :PROPERTIES: :TITLE: Similarities and differences between variants called with human reference genome HG19 or HG38 :BTYPE: article :CUSTOM_ID: Pan :ABSTRACT: Reference genome selection is a prerequisite for successful analysis of next generation sequencing (NGS) data. Current practice employs one of the two most recent human reference genome versions: HG19 or HG38. To date, the impact of genome version on SNV identification has not been rigorously assessed. We conducted analysis comparing the SNVs identified based on HG19 vs HG38, leveraging whole genome sequencing (WGS) data from the genome-in-a-bottle (GIAB) project. First, SNVs were called using 26 different bioinformatics pipelines with either HG19 or HG38. Next, two tools were used to convert the called SNVs between HG19 and HG38. Lastly we calculated conversion rates, analyzed discordant rates between SNVs called with HG19 or HG38, and characterized the discordant SNVs. The conversion rates from HG38 to HG19 (average 95\%) were lower than the conversion rates from HG19 to HG38 (average 99\%). The conversion rates varied slightly among the various calling pipelines. Around 1.5\% SNVs were discordantly converted between HG19 or HG38. The conversions from HG38 to HG19 had more SNVs which failed conversion and more discordant SNVs than the opposite conversion (HG19 to HG38). Most of the discordant SNVs had low read depth, were low confidence SNVs as defined by GIAB, and/or were predominated by G/C alleles (52\% observed versus 42\% expected). A significant number of SNVs could not be converted between HG19 and HG38. 
Based on careful review of our comparisons, we recommend HG38 (the newer version) for NGS SNV analysis. To summarize, our findings suggest caution when translating identified SNVs between different versions of the human reference genome. :AUTHOR: Pan, Bohu and Kusko, Rebecca and Xiao, Wenming and Zheng, Yuanting and Liu, Zhichao and Xiao, Chunlin and Sakkiah, Sugunadevi and Guo, Wenjing and Gong, Ping and Zhang, Chaoyang and Ge, Weigong and Shi, Leming and Tong, Weida and Hong, Huixiao :DATE: 2019-03-14 :DOI: 10.1186/s12859-019-2620-0 :ISSN: 1471-2105 :ISSUE: 2 :JOURNAL: BMC Bioinformatics :KEYWORDS: Bioinformatics :LANGUAGE: En :PUBLISHER: BioMed Central :URL: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2620-0 :VOLUME: 20 :END: * Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects :PROPERTIES: :TITLE: Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects :BTYPE: article :CUSTOM_ID: Regier2018 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>Hundreds of thousands of human whole genome sequencing (WGS) datasets will be generated over the next few years. These data are more valuable in aggregate: joint analysis of genomes from many sources increases sample size and statistical power. A central challenge for joint analysis is that different WGS data processing pipelines cause substantial differences in variant calling in combined datasets, necessitating computationally expensive reprocessing. This approach is no longer tenable given the scale of current studies and data volumes. Here, we define WGS data processing standards that allow different groups to produce functionally equivalent (FE) results, yet still innovate on data processing pipelines. 
We present initial FE pipelines developed at five genome centers and show that they yield similar variant calling results and produce significantly less variability than sequencing replicates. This work alleviates a key technical bottleneck for genome aggregation and helps lay the foundation for community-wide human genetics studies.</jats:p> :AUTHOR: Regier, Allison A. and Farjoun, Yossi and Larson, David E. and Krasheninina, Olga and Kang, Hyun Min and Howrigan, Daniel P. and Chen, Bo-Juen and Kher, Manisha and Banks, Eric and Ames, Darren C. and English, Adam C. and Li, Heng and Xing, Jinchuan and Zhang, Yeting and Matise, Tara and Abecasis, Goncalo R. and Salerno, Will and Zody, Michael C. and Neale, Benjamin M. and Hall, Ira M. :DOI: 10.1038/s41467-018-06159-4 :ISSUE: 1 :JOURNAL: Nature Communications :LANGUAGE: en :MONTH: 10 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41467-018-06159-4 :VOLUME: 9 :YEAR: 2018 :END: * Unifying package managers, workflow engines, and containers: Computational reproducibility with BioNix :PROPERTIES: :TITLE: Unifying package managers, workflow engines, and containers: Computational reproducibility with BioNix :BTYPE: article :CUSTOM_ID: Bedo2020 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:sec> <jats:title>Motivation</jats:title> <jats:p>A challenge for computational biologists is to make our analyses reproducible\textemdash{}i.e. to rerun, combine, and share, with the assurance that equivalent runs will generate identical results. Current best practice aims at this using a combination of package managers, workflow engines, and containers.</jats:p> </jats:sec> <jats:sec> <jats:title>Results</jats:title> <jats:p>We present BioNix, a lightweight library built on the Nix deployment system. BioNix manages software dependencies, computational environments, and workflow stages together using a single abstraction: pure functions. 
This lets users specify workflows in a clean, uniform way, with strong reproducibility guarantees.</jats:p> </jats:sec> <jats:sec> <jats:title>Availability and Implementation</jats:title> <jats:p>BioNix is implemented in the Nix expression language and is released on GitHub under the 3-clause BSD license: https://github.com/PapenfussLab/bionix (biotools:BioNix) (BioNix, RRID:SCR\_017662).</jats:p> </jats:sec> :AUTHOR: Bed\H{o}, Justin and Di Stefano, Leon and Papenfuss, Anthony T :DOI: 10.1093/gigascience/giaa121 :ISSUE: 11 :JOURNAL: GigaScience :LANGUAGE: en :MONTH: 11 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/gigascience/giaa121 :VOLUME: 9 :YEAR: 2020 :END: * Mobster: accurate detection of mobile element insertions in next generation sequencing data :PROPERTIES: :TITLE: Mobster: accurate detection of mobile element insertions in next generation sequencing data :BTYPE: article :CUSTOM_ID: Thung2014 :AUTHOR: Thung, Djie Tjwan and de Ligt, Joep and Vissers, Lisenka EM and Steehouwer, Marloes and Kroon, Mark and de Vries, Petra and Slagboom, Eline P and Ye, Kai and Veltman, Joris A and Hehir-Kwa, Jayne Y :DOI: 10.1186/s13059-014-0488-x :ISSUE: 10 :JOURNAL: Genome Biology :LANGUAGE: en :MONTH: 10 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1186/s13059-014-0488-x :VOLUME: 15 :YEAR: 2014 :END: * ANNOVAR documentation :PROPERTIES: :TITLE: ANNOVAR documentation :BTYPE: article :CUSTOM_ID: AnnovarDoc :ABSTRACT: Documentation for ANNOVAR software :AUTHOR: ANNOVAR :DOI: 10.1093/nar/gkz923/5603227 :URL: https://annovar.openbioinformatics.org/en/latest/ :YEAR: 2023 :END: * Variant calling: Considerations, practices, and developments :PROPERTIES: :TITLE: Variant calling: Considerations, practices, and developments :BTYPE: article :CUSTOM_ID: Zverinova2022 :AUTHOR: Zverinova, Stepanka and Guryev, Victor :DOI: 10.1002/humu.24311 :ISSUE: 8 :JOURNAL: Human Mutation :LANGUAGE: en :MONTH: 8 :PAGES: 976--985 
:PUBLISHER: Hindawi Limited :URL: http://dx.doi.org/10.1002/humu.24311 :VOLUME: 43 :YEAR: 2022 :END: * De novo genome assembly: what every biologist should know :PROPERTIES: :TITLE: De novo genome assembly: what every biologist should know :BTYPE: article :CUSTOM_ID: Baker2012 :AUTHOR: Baker, Monya :DOI: 10.1038/nmeth.1935 :ISSUE: 4 :JOURNAL: Nature Methods :LANGUAGE: en :MONTH: 4 :PAGES: 333--337 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/nmeth.1935 :VOLUME: 9 :YEAR: 2012 :END: * Variant calling and benchmarking in an era of complete human genome sequences :PROPERTIES: :TITLE: Variant calling and benchmarking in an era of complete human genome sequences :BTYPE: article :CUSTOM_ID: Olson :ABSTRACT: Genetic variant calling from DNA sequencing has enabled understanding of germline variation in hundreds of thousands of humans. Sequencing technologies and variant-calling methods have advanced rapidly, routinely providing reliable variant calls in most of the human genome. We describe how advances in long reads, deep learning, de novo assembly and pangenomes have expanded access to variant calls in increasingly challenging, repetitive genomic regions, including medically relevant regions, and how new benchmark sets and benchmarking methods illuminate their strengths and limitations. Finally, we explore the possible future of more complete characterization of human genome variation in light of the recent completion of a telomere-to-telomere human genome reference assembly and human pangenomes, and we consider the innovations needed to benchmark their newly accessible repetitive regions and complex variants. Variant calling is the process of identifying genetic variants, which is important for characterizing population genetic diversity and for identifying disease-associated variants in clinical sequencing projects. 
In this Review, the authors discuss the state-of-the-art in variant calling, focusing on challenging types of genetic variants, advances in both sequencing technologies and computational pipelines, and benchmarking strategies to assess the robustness of variant-calling strategies. :AUTHOR: Olson, Nathan D. and Wagner, Justin and Dwarshuis, Nathan and Miga, Karen H. and Sedlazeck, Fritz J. and Salit, Marc and Zook, Justin M. :DATE: 2023-04-14 :DOI: 10.1038/s41576-023-00590-0 :ISSN: 1471-0064 :ISSUE: 7 :JOURNAL: Nature Reviews Genetics :KEYWORDS: DNA sequencing :LANGUAGE: En :PUBLISHER: Nature Publishing Group :URL: https://www.nature.com/articles/s41576-023-00590-0 :END: * Detection of long repeat expansions from PCR-free whole-genome sequence data :PROPERTIES: :TITLE: Detection of long repeat expansions from PCR-free whole-genome sequence data :BTYPE: article :CUSTOM_ID: Dolzhenko2017 :ABSTRACT: <jats:p>Identifying large expansions of short tandem repeats (STRs), such as those that cause amyotrophic lateral sclerosis (ALS) and fragile X syndrome, is challenging for short-read whole-genome sequencing (WGS) data. A solution to this problem is an important step toward integrating WGS into precision medicine. We developed a software tool called ExpansionHunter that, using PCR-free WGS short-read data, can genotype repeats at the locus of interest, even if the expanded repeat is larger than the read length. We applied our algorithm to WGS data from 3001 ALS patients who have been tested for the presence of the <jats:italic>C9orf72</jats:italic> repeat expansion with repeat-primed PCR (RP-PCR).
Compared against this truth data, ExpansionHunter correctly classified all (212/212, 95\% CI [0.98, 1.00]) of the expanded samples as either expansions (208) or potential expansions (4). Additionally, 99.9\% (2786/2789, 95\% CI [0.997, 1.00]) of the wild-type samples were correctly classified as wild type by this method with the remaining three samples identified as possible expansions. We further applied our algorithm to a set of 152 samples in which every sample had one of eight different pathogenic repeat expansions, including those associated with fragile X syndrome, Friedreich's ataxia, and Huntington's disease, and correctly flagged all but one of the known repeat expansions. Thus, ExpansionHunter can be used to accurately detect known pathogenic repeat expansions and provides researchers with a tool that can be used to identify new pathogenic repeat expansions.</jats:p> :AUTHOR: Dolzhenko, Egor and van Vugt, Joke J.F.A. and Shaw, Richard J. and Bekritsky, Mitchell A. and van Blitterswijk, Marka and Narzisi, Giuseppe and Ajay, Subramanian S. and Rajan, Vani and Lajoie, Bryan R. and Johnson, Nathan H. and Kingsbury, Zoya and Humphray, Sean J. and Schellevis, Raymond D. and Brands, William J. and Baker, Matt and Rademakers, Rosa and Kooyman, Maarten and Tazelaar, Gijs H.P. and van Es, Michael A. and McLaughlin, Russell and Sproviero, William and Shatunov, Aleksey and Jones, Ashley and Al Khleifat, Ahmad and Pittman, Alan and Morgan, Sarah and Hardiman, Orla and Al-Chalabi, Ammar and Shaw, Chris and Smith, Bradley and Neo, Edmund J. and Morrison, Karen and Shaw, Pamela J. and Reeves, Catherine and Winterkorn, Lara and Wexler, Nancy S. and Housman, David E. and Ng, Christopher W. and Li, Alina L. and Taft, Ryan J. and van den Berg, Leonard H. and Bentley, David R. and Veldink, Jan H. and Eberle, Michael A. 
:DOI: 10.1101/gr.225672.117 :ISSUE: 11 :JOURNAL: Genome Research :LANGUAGE: en :MONTH: 11 :PAGES: 1895--1903 :PUBLISHER: Cold Spring Harbor Laboratory :URL: http://dx.doi.org/10.1101/gr.225672.117 :VOLUME: 27 :YEAR: 2017 :END: * Centromere reference models for human chromosomes X and Y satellite arrays :PROPERTIES: :TITLE: Centromere reference models for human chromosomes X and Y satellite arrays :BTYPE: article :CUSTOM_ID: Miga2014 :ABSTRACT: <jats:p>The human genome sequence remains incomplete, with multimegabase-sized gaps representing the endogenous centromeres and other heterochromatic regions. Available sequence-based studies within these sites in the genome have demonstrated a role in centromere function and chromosome pairing, necessary to ensure proper chromosome segregation during cell division. A common genomic feature of these regions is the enrichment of long arrays of near-identical tandem repeats, known as satellite DNAs, which offer a limited number of variant sites to differentiate individual repeat copies across millions of bases. This substantial sequence homogeneity challenges available assembly strategies and, as a result, centromeric regions are omitted from ongoing genomic studies. To address this problem, we utilize monomer sequence and ordering information obtained from whole-genome shotgun reads to model two haploid human satellite arrays on chromosomes X and Y, resulting in an initial characterization of 3.83 Mb of centromeric DNA within an individual genome. To further expand the utility of each centromeric reference sequence model, we evaluate sites within the arrays for short-read mappability and chromosome specificity. Because satellite DNAs evolve in a concerted manner, we use these centromeric assemblies to assess the extent of sequence variation among 366 individuals from distinct human populations.
We thus identify two satellite array variants in both X and Y centromeres, as determined by array length and sequence composition. This study provides an initial sequence characterization of a regional centromere and establishes a foundation to extend genomic characterization to these sites as well as to other repeat-rich regions within complex genomes.</jats:p> :AUTHOR: Miga, Karen H. and Newton, Yulia and Jain, Miten and Altemose, Nicolas and Willard, Huntington F. and Kent, W. James :DOI: 10.1101/gr.159624.113 :ISSUE: 4 :JOURNAL: Genome Research :LANGUAGE: en :MONTH: 4 :PAGES: 697--707 :PUBLISHER: Cold Spring Harbor Laboratory :URL: http://dx.doi.org/10.1101/gr.159624.113 :VOLUME: 24 :YEAR: 2014 :END: * DNA Sequencing Costs: Data :PROPERTIES: :TITLE: DNA Sequencing Costs: Data :BTYPE: article :CUSTOM_ID: Wetterstrand :ABSTRACT: Data used to estimate the cost of sequencing the human genome over time since the Human Genome Project. :AUTHOR: Wetterstrand, Kris A. :URL: https://www.genome.gov/about-genomics/fact-sheets/DNA-Sequencing-Costs-Data :END: * Sustainable packaging of quantum chemistry software with the Nix package manager :PROPERTIES: :TITLE: Sustainable packaging of quantum chemistry software with the Nix package manager :BTYPE: article :CUSTOM_ID: Kowalewski2022 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>The installation of quantum chemistry software packages is commonly done manually and can be a time-consuming and complicated process. An update of the underlying Linux system requires a reinstallation in many cases and can quietly break software installed on the system. In this paper, we present an approach that allows for an easy installation of quantum chemistry software packages, which is also independent of operating system updates. The use of the Nix package manager allows building software in a reproducible manner, which allows for a reconstruction of the software for later reproduction of scientific results. 
The build recipes that are provided can be readily used by anyone to avoid complex installation procedures.</jats:p> :AUTHOR: Kowalewski, Markus and Seeber, Phillip :DOI: 10.1002/qua.26872 :ISSUE: 9 :JOURNAL: International Journal of Quantum Chemistry :LANGUAGE: en :MONTH: 5 :PUBLISHER: Wiley :URL: http://dx.doi.org/10.1002/qua.26872 :VOLUME: 122 :YEAR: 2022 :END: * Snakemake\textemdash{}a scalable bioinformatics workflow engine :PROPERTIES: :TITLE: Snakemake\textemdash{}a scalable bioinformatics workflow engine :BTYPE: article :CUSTOM_ID: Koster2012 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:p>Summary: Snakemake is a workflow engine that provides a readable Python-based workflow definition language and a powerful execution environment that scales from single-core workstations to compute clusters without modifying the workflow. It is the first system to support the use of automatically inferred multiple named wildcards (or variables) in input and output filenames.</jats:p> <jats:p>Availability: http://snakemake.googlecode.com.</jats:p> <jats:p>Contact: johannes.koester@uni-due.de</jats:p> :AUTHOR: K\"{o}ster, Johannes and Rahmann, Sven :DOI: 10.1093/bioinformatics/bts480 :ISSUE: 19 :JOURNAL: Bioinformatics :LANGUAGE: en :MONTH: 10 :PAGES: 2520--2522 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/bioinformatics/bts480 :VOLUME: 28 :YEAR: 2012 :END: * An open resource for accurately benchmarking small variant and reference calls :PROPERTIES: :TITLE: An open resource for accurately benchmarking small variant and reference calls :BTYPE: article :CUSTOM_ID: Zook2019 :AUTHOR: Zook, Justin M. and McDaniel, Jennifer and Olson, Nathan D. and Wagner, Justin and Parikh, Hemang and Heaton, Haynes and Irvine, Sean A. and Trigg, Len and Truty, Rebecca and McLean, Cory Y. and De La Vega, Francisco M. 
and Xiao, Chunlin and Sherry, Stephen and Salit, Marc :DOI: 10.1038/s41587-019-0074-6 :ISSUE: 5 :JOURNAL: Nature Biotechnology :LANGUAGE: en :MONTH: 5 :PAGES: 561--566 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41587-019-0074-6 :VOLUME: 37 :YEAR: 2019 :END: * Comparison of Short-Read Sequence Aligners Indicates Strengths and Weaknesses for Biologists to Consider :PROPERTIES: :TITLE: Comparison of Short-Read Sequence Aligners Indicates Strengths and Weaknesses for Biologists to Consider :BTYPE: article :CUSTOM_ID: Musich2021 :ABSTRACT: <jats:p>Aligning short-read sequences is the foundational step to most genomic and transcriptomic analyses, but not all tools perform equally, and choosing among the growing body of available tools can be daunting. Here, in order to increase awareness in the research community, we discuss the merits of common algorithms and programs in a way that should be approachable to biologists with limited experience in bioinformatics. We will only in passing consider the effects of data cleanup, a precursor analysis to most alignment tools, and no consideration will be given to downstream processing of the aligned fragments. To compare aligners [Bowtie2, Burrows Wheeler Aligner (BWA), HISAT2, MUMmer4, STAR, and TopHat2], an RNA-seq dataset was used containing data from 48 geographically distinct samples of the grapevine powdery mildew fungus <jats:italic>Erysiphe necator</jats:italic>. Based on alignment rate and gene coverage, all aligners performed well with the exception of TopHat2, which HISAT2 superseded. BWA perhaps had the best performance in these metrics, except for longer transcripts (\textgreater{}500 bp) for which HISAT2 and STAR performed well. HISAT2 was \textasciitilde{}3-fold faster than the next fastest aligner in runtime, which we consider a secondary factor in most alignments.
At the end, this direct comparison of commonly used aligners illustrates key considerations when choosing which tool to use for the specific sequencing data and objectives. No single tool meets all needs for every user, and there are many quality aligners available.</jats:p> :AUTHOR: Musich, Ryan and Cadle-Davidson, Lance and Osier, Michael V. :DOI: 10.3389/fpls.2021.657240 :JOURNAL: Frontiers in Plant Science :MONTH: 4 :PUBLISHER: Frontiers Media SA :URL: http://dx.doi.org/10.3389/fpls.2021.657240 :VOLUME: 12 :YEAR: 2021 :END: * Recommendations for whole genome sequencing in diagnostics for rare diseases :PROPERTIES: :TITLE: Recommendations for whole genome sequencing in diagnostics for rare diseases :BTYPE: article :CUSTOM_ID: Souche2022 :ABSTRACT: In 2016, guidelines for diagnostic Next Generation Sequencing (NGS) have been published by EuroGentest in order to assist laboratories in the implementation and accreditation of NGS in a diagnostic setting. These guidelines mainly focused on Whole Exome Sequencing (WES) and targeted (gene panels) sequencing detecting small germline variants (Single Nucleotide Variants (SNVs) and insertions/deletions (indels)). Since then, Whole Genome Sequencing (WGS) has been increasingly introduced in the diagnosis of rare diseases as WGS allows the simultaneous detection of SNVs, Structural Variants (SVs) and other types of variants such as repeat expansions. The use of WGS in diagnostics warrants the re-evaluation and update of previously published guidelines. This work was jointly initiated by EuroGentest and the Horizon2020 project Solve-RD. Statements from the 2016 guidelines have been reviewed in the context of WGS and updated where necessary. The aim of these recommendations is primarily to list the points to consider for clinical (laboratory) geneticists, bioinformaticians, and (non-)geneticists, to provide technical advice, aid clinical decision-making and the reporting of the results. 
:AUTHOR: Souche, Erika and Beltran, Sergi and Brosens, Erwin and Belmont, John W. and Fossum, Magdalena and Riess, Olaf and Gilissen, Christian and Ardeshirdavani, Amin and Houge, Gunnar and van Gijn, Marielle and Clayton-Smith, Jill and Synofzik, Matthis and de Leeuw, Nicole and Deans, Zandra C. and Dincer, Yasemin and Eck, Sebastian H. and van der Crabben, Saskia and Balasubramanian, Meena and Graessner, Holm and Sturm, Marc and Firth, Helen and Ferlini, Alessandra and Nabbout, Rima and De Baere, Elfride and Liehr, Thomas and Macek, Milan and Matthijs, Gert and Scheffer, Hans and Bauer, Peter and Yntema, Helger G. and Weiss, Marjan M. :DATE: 2022-05-16 :DOI: 10.1038/s41431-022-01113-x :ISSN: 1476-5438 :ISSUE: 9 :JOURNAL: European Journal of Human Genetics :KEYWORDS: Medical genetics :LANGUAGE: En :PUBLISHER: Nature Publishing Group :URL: https://www.nature.com/articles/s41431-022-01113-x :VOLUME: 30 :YEAR: 2022 :END: * SMIM1 variants rs1175550 and rs143702418 independently modulate Vel blood group antigen expression :PROPERTIES: :TITLE: SMIM1 variants rs1175550 and rs143702418 independently modulate Vel blood group antigen expression :BTYPE: article :CUSTOM_ID: Christophersen2017 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>The Vel blood group antigen is expressed on the red blood cells of most individuals. Recently, we described that homozygosity for inactivating mutations in <jats:italic>SMIM1</jats:italic> defines the rare Vel-negative phenotype. Still, Vel-positive individuals show great variability in Vel antigen expression, creating a risk for Vel blood typing errors and transfusion reactions. We fine-mapped the regulatory region located in <jats:italic>SMIM1</jats:italic> intron 2 in Swedish blood donors, and observed a strong correlation between expression and rs1175550 as well as with a previously unreported tri-nucleotide insertion (rs143702418; C\hspace{0.167em}\>\hspace{0.167em}CGCA). 
While the two variants are tightly linked in Caucasians, we separated their effects in African Americans, and found that rs1175550G and to a lesser extent rs143702418C independently increase <jats:italic>SMIM1</jats:italic> and Vel antigen expression. Gel shift and luciferase assays indicate that both variants are transcriptionally active, and we identified binding of the transcription factor TAL1 as a potential mediator of the increased expression associated with rs1175550G. Our results provide insight into the regulatory logic of Vel antigen expression, and extend the set of markers for genetic Vel blood group typing.</jats:p> :AUTHOR: Christophersen, Mikael K. and J\"{o}ud, Magnus and Ajore, Ram and Vege, Sunitha and Ljungdahl, Klara W. and Westhoff, Connie M. and Olsson, Martin L. and Storry, Jill R. and Nilsson, Bj\"{o}rn :DOI: 10.1038/srep40451 :ISSUE: 1 :JOURNAL: Scientific Reports :LANGUAGE: en :MONTH: 1 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/srep40451 :VOLUME: 7 :YEAR: 2017 :END: * Comparing Variant Call Files for Performance Benchmarking of Next-Generation Sequencing Variant Calling Pipelines :PROPERTIES: :TITLE: Comparing Variant Call Files for Performance Benchmarking of Next-Generation Sequencing Variant Calling Pipelines :BTYPE: misc :CUSTOM_ID: Cleary2015 :ABSTRACT: <jats:p>To evaluate and compare the performance of variant calling methods and their confidence scores, comparisons between a test call set and a \textquotedblleft{}gold standard\textquotedblright{} need to be carried out. Unfortunately, these comparisons are not straightforward with the current Variant Call Files (VCF), which are the standard output of most variant calling algorithms for high-throughput sequencing data. Comparisons of VCFs are often confounded by the different representations of indels, MNPs, and combinations thereof with SNVs in complex regions of the genome, resulting in misleading results.
A variant caller is inherently a classification method designed to score putative variants with confidence scores that could permit controlling the rate of false positives (FP) or false negatives (FN) for a given application. Receiver operator curves (ROC) and the area under the ROC (AUC) are efficient metrics to evaluate a test call set versus a gold standard. However, in the case of VCF data this also requires a special accounting to deal with discrepant representations. We developed a novel algorithm for comparing variant call sets that deals with complex call representation discrepancies and through a dynamic programming method that minimizes false positives and negatives globally across the entire call sets for accurate performance evaluation of VCFs.</jats:p> :AUTHOR: Cleary, John G. and Braithwaite, Ross and Gaastra, Kurt and Hilbush, Brian S and Inglis, Stuart and Irvine, Sean A and Jackson, Alan and Littin, Richard and Rathod, Mehul and Ware, David and Zook, Justin M. and Trigg, Len and De La Vega, Francisco M. :DOI: 10.1101/023754 :MONTH: 8 :PUBLISHER: Cold Spring Harbor Laboratory :URL: http://dx.doi.org/10.1101/023754 :YEAR: 2015 :END: * Systematic comparison of germline variant calling pipelines cross multiple next-generation sequencers :PROPERTIES: :TITLE: Systematic comparison of germline variant calling pipelines cross multiple next-generation sequencers :BTYPE: article :CUSTOM_ID: Chen2019 :ABSTRACT: The development and innovation of next generation sequencing (NGS) and the subsequent analysis tools have gain popularity in scientific researches and clinical diagnostic applications. Hence, a systematic comparison of the sequencing platforms and variant calling pipelines could provide significant guidance to NGS-based scientific and clinical genomics.
In this study, we compared the performance, concordance and operating efficiency of 27 combinations of sequencing platforms and variant calling pipelines, testing three variant calling pipelines\textemdash{}Genome Analysis Tool Kit HaplotypeCaller, Strelka2 and Samtools-Varscan2 for nine data sets for the NA12878 genome sequenced by different platforms including BGISEQ500, MGISEQ2000, HiSeq4000, NovaSeq and HiSeq Xten. For the variants calling performance of 12 combinations in WES datasets, all combinations displayed good performance in calling SNPs, with their F-scores entirely higher than 0.96, and their performance in calling INDELs varies from 0.75 to 0.91. And all 15 combinations in WGS datasets also manifested good performance, with F-scores in calling SNPs were entirely higher than 0.975 and their performance in calling INDELs varies from 0.71 to 0.93. All of these combinations manifested high concordance in variant identification, while the divergence of variants identification in WGS datasets were larger than that in WES datasets. We also down-sampled the original WES and WGS datasets at a series of gradient coverage across multiple platforms, then the variants calling period consumed by the three pipelines at each coverage were counted, respectively. For the GIAB datasets on both BGI and Illumina platforms, Strelka2 manifested its ultra-performance in detecting accuracy and processing efficiency compared with other two pipelines on each sequencing platform, which was recommended in the further promotion and application of next generation sequencing technology. The results of our researches will provide useful and comprehensive guidelines for personal or organizational researchers in reliable and consistent variants identification. 
:AUTHOR: Chen, Jiayun and Li, Xingsong and Zhong, Hongbin and Meng, Yuhuan and Du, Hongli :DATE: 2019-06-27 :DOI: 10.1038/s41598-019-45835-3 :ISSN: 2045-2322 :ISSUE: 1 :JOURNAL: Scientific Reports :KEYWORDS: DNA sequencing :LANGUAGE: En :PUBLISHER: Nature Publishing Group :URL: https://www.nature.com/articles/s41598-019-45835-3 :VOLUME: 9 :YEAR: 2019 :END: * Strelka2: fast and accurate calling of germline and somatic variants :PROPERTIES: :TITLE: Strelka2: fast and accurate calling of germline and somatic variants :BTYPE: article :CUSTOM_ID: Kim2018 :AUTHOR: Kim, Sangtae and Scheffler, Konrad and Halpern, Aaron L. and Bekritsky, Mitchell A. and Noh, Eunho and K\"{a}llberg, Morten and Chen, Xiaoyu and Kim, Yeonbin and Beyter, Doruk and Krusche, Peter and Saunders, Christopher T. :DOI: 10.1038/s41592-018-0051-x :ISSUE: 8 :JOURNAL: Nature Methods :LANGUAGE: en :MONTH: 8 :PAGES: 591--594 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41592-018-0051-x :VOLUME: 15 :YEAR: 2018 :END: * Systematic Evaluation of Sanger Validation of Next-Generation Sequencing Variants :PROPERTIES: :TITLE: Systematic Evaluation of Sanger Validation of Next-Generation Sequencing Variants :BTYPE: article :CUSTOM_ID: Beck2016 :ABSTRACT: Abstract. BACKGROUND: Next-generation sequencing (NGS) data are used for both clinical care and clinical research.
DNA sequence variants identified using NGS are :AUTHOR: Beck, Tyler F and Mullikin, James C and the NISC Comparative Sequencing Program, and Biesecker, Leslie G :DOI: 10.1373/clinchem.2015.249623 :ISSN: 0009-9147 :ISSUE: 4 :JOURNAL: Clinical Chemistry :PUBLISHER: Oxford Academic :URL: https://dx.doi.org/10.1373/clinchem.2015.249623 :VOLUME: 62 :YEAR: 2016 :END: * Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications :PROPERTIES: :TITLE: Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications :BTYPE: article :CUSTOM_ID: Chen2016 :ABSTRACT: <jats:p>Summary: We describe Manta, a method to discover structural variants and indels from next generation sequencing data. Manta is optimized for rapid germline and somatic analysis, calling structural variants, medium-sized indels and large insertions on standard compute hardware in less than a tenth of the time that comparable methods require to identify only subsets of these variant types: for example NA12878 at 50\texttimes{} genomic coverage is analyzed in less than 20\hspace{0.167em}min. Manta can discover and score variants based on supporting paired and split-read evidence, with scoring models optimized for germline analysis of diploid individuals and somatic analysis of tumor-normal sample pairs. Call quality is similar to or better than comparable methods, as determined by pedigree consistency of germline calls and comparison of somatic calls to COSMIC database variants. Manta consistently assembles a higher fraction of its calls to base-pair resolution, allowing for improved downstream annotation and analysis of clinical significance. We provide Manta as a community resource to facilitate practical and routine structural variant analysis in clinical and research sequencing scenarios.</jats:p> <jats:p>Availability and implementation: Manta is released under the open-source GPLv3 license. 
Source code, documentation and Linux binaries are available from https://github.com/Illumina/manta.</jats:p> <jats:p>Contact: csaunders@illumina.com</jats:p> <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.</jats:p> :AUTHOR: Chen, Xiaoyu and Schulz-Trieglaff, Ole and Shaw, Richard and Barnes, Bret and Schlesinger, Felix and K\"{a}llberg, Morten and Cox, Anthony J. and Kruglyak, Semyon and Saunders, Christopher T. :DOI: 10.1093/bioinformatics/btv710 :ISSUE: 8 :JOURNAL: Bioinformatics :LANGUAGE: en :MONTH: 4 :PAGES: 1220--1222 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/bioinformatics/btv710 :VOLUME: 32 :YEAR: 2016 :END: * Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications :PROPERTIES: :TITLE: Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications :BTYPE: article :CUSTOM_ID: Rimmer2014 :AUTHOR: Rimmer, Andy and Phan, Hang and Mathieson, Iain and Iqbal, Zamin and Twigg, Stephen R F and Wilkie, Andrew O M and McVean, Gil and Lunter, Gerton :DOI: 10.1038/ng.3036 :ISSUE: 8 :JOURNAL: Nature Genetics :LANGUAGE: en :MONTH: 8 :PAGES: 912--918 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/ng.3036 :VOLUME: 46 :YEAR: 2014 :END: * New evaluation methods of read mapping by 17 aligners on simulated and empirical NGS data: an updated comparison of DNA- and RNA-Seq data from Illumina and Ion Torrent technologies :PROPERTIES: :TITLE: New evaluation methods of read mapping by 17 aligners on simulated and empirical NGS data: an updated comparison of DNA- and RNA-Seq data from Illumina and Ion Torrent technologies :BTYPE: article :CUSTOM_ID: Donato2021 :AUTHOR: Donato, Luigi and Scimone, Concetta and Rinaldi, Carmela and D'Angelo, Rosalia and Sidoti, Antonina :DOI: 10.1007/s00521-021-06188-z :ISSUE: 22 :JOURNAL: Neural Computing
and Applications :LANGUAGE: en :MONTH: 11 :PAGES: 15669--15692 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1007/s00521-021-06188-z :VOLUME: 33 :YEAR: 2021 :END: * Systematic comparison of variant calling pipelines using gold standard personal exome variants :PROPERTIES: :TITLE: Systematic comparison of variant calling pipelines using gold standard personal exome variants :BTYPE: article :CUSTOM_ID: Hwang2015 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>The success of clinical genomics using next generation sequencing (NGS) requires the accurate and consistent identification of personal genome variants. Assorted variant calling methods have been developed, which show low concordance between their calls. Hence, a systematic comparison of the variant callers could give important guidance to NGS-based clinical genomics. Recently, a set of high-confident variant calls for one individual (NA12878) has been published by the Genome in a Bottle (GIAB) consortium, enabling performance benchmarking of different variant calling pipelines. Based on the gold standard reference variant calls from GIAB, we compared the performance of thirteen variant calling pipelines, testing combinations of three read aligners\textemdash{}BWA-MEM, Bowtie2 and Novoalign\textemdash{}and four variant callers\textemdash{}Genome Analysis Tool Kit HaplotypeCaller (GATK-HC), Samtools mpileup, Freebayes and Ion Proton Variant Caller (TVC), for twelve data sets for the NA12878 genome sequenced by different platforms including Illumina2000, Illumina2500 and Ion Proton, with various exome capture systems and exome coverage. We observed different biases toward specific types of SNP genotyping errors by the different variant callers. The results of our study provide useful guidelines for reliable variant identification from deep sequencing of personal genomes.</jats:p> :AUTHOR: Hwang, Sohyun and Kim, Eiru and Lee, Insuk and Marcotte, Edward M. 
:DOI: 10.1038/srep17875 :ISSUE: 1 :JOURNAL: Scientific Reports :LANGUAGE: en :MONTH: 12 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/srep17875 :VOLUME: 5 :YEAR: 2015 :END: * Extensive sequencing of seven human genomes to characterize benchmark reference materials :PROPERTIES: :TITLE: Extensive sequencing of seven human genomes to characterize benchmark reference materials :BTYPE: article :CUSTOM_ID: Zook2016 :ABSTRACT: The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST) is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a ... :AUTHOR: Zook, Justin M. and Catoe, David and McDaniel, Jennifer and Vang, Lindsay and Spies, Noah and Sidow, Arend and Weng, Ziming and Liu, Yuling and Mason, Christopher E. and Alexander, Noah and Henaff, Elizabeth and McIntyre, Alexa B.R. and Chandramohan, Dhruva and Chen, Feng and Jaeger, Erich and Moshrefi, Ali and Pham, Khoa and Stedman, William and Liang, Tiffany and Saghbini, Michael and Dzakula, Zeljko and Hastie, Alex and Cao, Han and Deikus, Gintaras and Schadt, Eric and Sebra, Robert and Bashir, Ali and Truty, Rebecca M. and Chang, Christopher C. and Gulbahce, Natali and Zhao, Keyan and Ghosh, Srinka and Hyland, Fiona and Fu, Yutao and Chaisson, Mark and Xiao, Chunlin and Trow, Jonathan and Sherry, Stephen T. and Zaranek, Alexander W. and Ball, Madeleine and Bobe, Jason and Estep, Preston and Church, George M. and Marks, Patrick and Kyriazopoulou-Panagiotopoulou, Sofia and Zheng, Grace X.Y. and Schnall-Levin, Michael and Ordonez, Heather S. and Mudivarti, Patrice A. 
and Giorda, Kristina and Sheng, Ying and Rypdal, Karoline Bjarnesdatter and Salit, Marc :DATE: 2016 :DOI: 10.1038/sdata.2016.25 :JOURNAL: Scientific Data :LANGUAGE: en :PUBLISHER: Nature Publishing Group :URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4896128/ :VOLUME: 3 :YEAR: 2016 :END: * A robust benchmark for detection of germline large deletions and insertions :PROPERTIES: :TITLE: A robust benchmark for detection of germline large deletions and insertions :BTYPE: article :CUSTOM_ID: Zook2020 :AUTHOR: Zook, Justin M. and Hansen, Nancy F. and Olson, Nathan D. and Chapman, Lesley and Mullikin, James C. and Xiao, Chunlin and Sherry, Stephen and Koren, Sergey and Phillippy, Adam M. and Boutros, Paul C. and Sahraeian, Sayed Mohammad E. and Huang, Vincent and Rouette, Alexandre and Alexander, Noah and Mason, Christopher E. and Hajirasouliha, Iman and Ricketts, Camir and Lee, Joyce and Tearle, Rick and Fiddes, Ian T. and Barrio, Alvaro Martinez and Wala, Jeremiah and Carroll, Andrew and Ghaffari, Noushin and Rodriguez, Oscar L. and Bashir, Ali and Jackman, Shaun and Farrell, John J. and Wenger, Aaron M. and Alkan, Can and Soylev, Arda and Schatz, Michael C. and Garg, Shilpa and Church, George and Marschall, Tobias and Chen, Ken and Fan, Xian and English, Adam C. and Rosenfeld, Jeffrey A. and Zhou, Weichen and Mills, Ryan E. and Sage, Jay M. and Davis, Jennifer R. and Kaiser, Michael D. and Oliver, John S. and Catalano, Anthony P. and Chaisson, Mark J. P. and Spies, Noah and Sedlazeck, Fritz J.
and Salit, Marc :DOI: 10.1038/s41587-020-0538-8 :ISSUE: 11 :JOURNAL: Nature Biotechnology :LANGUAGE: en :MONTH: 11 :PAGES: 1347--1355 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41587-020-0538-8 :VOLUME: 38 :YEAR: 2020 :END: * Next-generation sequencing technologies: An overview :PROPERTIES: :TITLE: Next-generation sequencing technologies: An overview :BTYPE: article :CUSTOM_ID: Hu2021 :AUTHOR: Hu, Taishan and Chitnis, Nilesh and Monos, Dimitri and Dinh, Anh :DOI: 10.1016/j.humimm.2021.02.012 :ISSUE: 11 :JOURNAL: Human Immunology :LANGUAGE: en :MONTH: 11 :PAGES: 801--811 :PUBLISHER: Elsevier BV :URL: http://dx.doi.org/10.1016/j.humimm.2021.02.012 :VOLUME: 82 :YEAR: 2021 :END: * Guidelines for diagnostic next-generation sequencing :PROPERTIES: :TITLE: Guidelines for diagnostic next-generation sequencing :BTYPE: article :CUSTOM_ID: Matthijs2015 :ABSTRACT: We present, on behalf of EuroGentest and the European Society of Human Genetics, guidelines for the evaluation and validation of next-generation sequencing (NGS) applications for the diagnosis of genetic disorders. The work was performed by a group of laboratory geneticists and bioinformaticians, and discussed with clinical geneticists, industry and patients' representatives, and other stakeholders in the field of human genetics. The statements that were written during the elaboration of the guidelines are presented here. The background document and full guidelines are available as supplementary material. They include many examples to assist the laboratories in the implementation of NGS and accreditation of this service. The work and ideas presented by others in guidelines that have emerged elsewhere in the course of the past few years were also considered and are acknowledged in the full text. Interestingly, a few new insights that have not been cited before have emerged during the preparation of the guidelines. 
The most important new feature is the presentation of a `rating system' for NGS-based diagnostic tests. The guidelines and statements have been applauded by the genetic diagnostic community, and thus seem to be valuable for the harmonization and quality assurance of NGS diagnostics in Europe. :AUTHOR: Matthijs, Gert and Souche, Erika and Alders, Mari\"{e}lle and Corveleyn, Anniek and Eck, Sebastian and Feenstra, Ilse and Race, Val\'{e}rie and Sistermans, Erik and Sturm, Marc and Weiss, Marjan and Yntema, Helger and Bakker, Egbert and Scheffer, Hans and Bauer, Peter :DATE: 2015-10-28 :DOI: 10.1038/ejhg.2015.226 :ISSN: 1476-5438 :ISSUE: 1 :JOURNAL: European Journal of Human Genetics :KEYWORDS: Genetic testing :LANGUAGE: En :PUBLISHER: Nature Publishing Group :URL: https://www.nature.com/articles/ejhg2015226 :VOLUME: 24 :YEAR: 2015 :END: * Ruffus: a lightweight Python library for computational pipelines :PROPERTIES: :TITLE: Ruffus: a lightweight Python library for computational pipelines :BTYPE: article :CUSTOM_ID: Goodstadt :ABSTRACT: Abstract. Summary: Computational pipelines are common place in scientific research. However, most of the resources for constructing pipelines are heavyweight sy :AUTHOR: Goodstadt, Leo :DOI: 10.1093/bioinformatics/btq524 :ISSN: 1367-4803 :ISSUE: 21 :JOURNAL: Bioinformatics :PUBLISHER: Oxford Academic :URL: https://dx.doi.org/10.1093/bioinformatics/btq524 :VOLUME: 26 :END: * A synthetic-diploid benchmark for accurate variant-calling evaluation :PROPERTIES: :TITLE: A synthetic-diploid benchmark for accurate variant-calling evaluation :BTYPE: article :CUSTOM_ID: Li2018 :AUTHOR: Li, Heng and Bloom, Jonathan M. 
and Farjoun, Yossi and Fleharty, Mark and Gauthier, Laura and Neale, Benjamin and MacArthur, Daniel :DOI: 10.1038/s41592-018-0054-7 :ISSUE: 8 :JOURNAL: Nature Methods :LANGUAGE: en :MONTH: 8 :PAGES: 595--597 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41592-018-0054-7 :VOLUME: 15 :YEAR: 2018 :END: * VC@Scale: Scalable and high-performance variant calling on cluster environments :PROPERTIES: :TITLE: VC@Scale: Scalable and high-performance variant calling on cluster environments :BTYPE: article :CUSTOM_ID: Ahmad2021 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:sec> <jats:title>Background</jats:title> <jats:p>Recently many new deep learning\textendash{}based variant-calling methods like DeepVariant have emerged as more accurate compared with conventional variant-calling algorithms such as GATK HaplotypeCaller, Strelka2, and Freebayes albeit at higher computational costs. Therefore, there is a need for more scalable and higher performance workflows of these deep learning methods. Almost all existing cluster-scaled variant-calling workflows that use Apache Spark/Hadoop as big data frameworks loosely integrate existing single-node pre-processing and variant-calling applications. Using Apache Spark just for distributing/scheduling data among loosely coupled applications or using I/O-based storage for storing the output of intermediate applications does not exploit the full benefit of Apache Spark in-memory processing. To achieve this, we propose a native Spark-based workflow that uses Python and Apache Arrow to enable efficient transfer of data between different workflow stages. This benefits from the ease of programmability of Python and the high efficiency of Arrow's columnar in-memory data transformations.</jats:p> </jats:sec> <jats:sec> <jats:title>Results</jats:title> <jats:p>Here we present a scalable, parallel, and efficient implementation of next-generation sequencing data pre-processing and variant-calling workflows.
Our design tightly integrates most pre-processing workflow stages, using Spark built-in functions to sort reads by coordinates and mark duplicates efficiently. Our approach outperforms state-of-the-art implementations by \>2 times for the pre-processing stages, creating a scalable and high-performance solution for DeepVariant for both CPU-only and CPU + GPU clusters.</jats:p> </jats:sec> <jats:sec> <jats:title>Conclusions</jats:title> <jats:p>We show the feasibility and easy scalability of our approach to achieve high performance and efficient resource utilization for variant-calling analysis on high-performance computing clusters using the standardized Apache Arrow data representations. All codes, scripts, and configurations used to run our implementations are publicly available and open sourced; see https://github.com/abs-tudelft/variant-calling-at-scale.</jats:p> </jats:sec> :AUTHOR: Ahmad, Tanveer and Al Ars, Zaid and Hofstee, H Peter :DOI: 10.1093/gigascience/giab057 :ISSUE: 9 :JOURNAL: GigaScience :LANGUAGE: en :MONTH: 9 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/gigascience/giab057 :VOLUME: 10 :YEAR: 2021 :END: * Nextflow enables reproducible computational workflows :PROPERTIES: :TITLE: Nextflow enables reproducible computational workflows :BTYPE: article :CUSTOM_ID: DiTommaso2017 :AUTHOR: Di Tommaso, Paolo and Chatzou, Maria and Floden, Evan W and Barja, Pablo Prieto and Palumbo, Emilio and Notredame, Cedric :DOI: 10.1038/nbt.3820 :ISSUE: 4 :JOURNAL: Nature Biotechnology :LANGUAGE: en :MONTH: 4 :PAGES: 316--319 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/nbt.3820 :VOLUME: 35 :YEAR: 2017 :END: * Heterozygosity Ratio, a Robust Global Genomic Measure of Autozygosity and Its Association with Height and Disease Risk :PROPERTIES: :TITLE: Heterozygosity Ratio, a Robust Global Genomic Measure of Autozygosity and Its Association with Height and Disease Risk :BTYPE: article :CUSTOM_ID: Samuels2016
:ABSTRACT: <jats:title>Abstract</jats:title> <jats:p>Greater genetic variability in an individual is protective against recessive disease. However, existing quantifications of autozygosity, such as runs of homozygosity (ROH), have proved highly sensitive to genotyping density and have yielded inconclusive results about the relationship of diversity and disease risk. Using genotyping data from three data sets with \>43,000 subjects, we demonstrated that an alternative approach to quantifying genetic variability, the heterozygosity ratio, is a robust measure of diversity and is positively associated with the nondisease trait height and several disease phenotypes in subjects of European ancestry. The heterozygosity ratio is the number of heterozygous sites in an individual divided by the number of nonreference homozygous sites and is strongly affected by the degree of genetic admixture of the population and varies across human populations. Unlike quantifications of ROH, the heterozygosity ratio is not sensitive to the density of genotyping performed.
Our results establish the heterozygosity ratio as a powerful new statistic for exploring the patterns and phenotypic effects of different levels of genetic variation in populations.</jats:p> :AUTHOR: Samuels, David C and Wang, Jing and Ye, Fei and He, Jing and Levinson, Rebecca T and Sheng, Quanhu and Zhao, Shilin and Capra, John A and Shyr, Yu and Zheng, Wei and Guo, Yan :DOI: 10.1534/genetics.116.189936 :ISSUE: 3 :JOURNAL: Genetics :LANGUAGE: en :MONTH: 11 :PAGES: 893--904 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1534/genetics.116.189936 :VOLUME: 204 :YEAR: 2016 :END: * Declining transition/transversion ratios through time reveal limitations to the accuracy of nucleotide substitution models :PROPERTIES: :TITLE: Declining transition/transversion ratios through time reveal limitations to the accuracy of nucleotide substitution models :BTYPE: article :CUSTOM_ID: Duchene2015 :AUTHOR: Duch\^{e}ne, Sebasti\'{a}n and Ho, Simon YW and Holmes, Edward C :DOI: 10.1186/s12862-015-0312-6 :ISSUE: 1 :JOURNAL: BMC Evolutionary Biology :LANGUAGE: en :MONTH: 12 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1186/s12862-015-0312-6 :VOLUME: 15 :YEAR: 2015 :END: * Coming of age: ten years of next-generation sequencing technologies :PROPERTIES: :TITLE: Coming of age: ten years of next-generation sequencing technologies :BTYPE: article :CUSTOM_ID: Goodwin2016 :AUTHOR: Goodwin, Sara and McPherson, John D. and McCombie, W. 
Richard :DOI: 10.1038/nrg.2016.49 :ISSUE: 6 :JOURNAL: Nature Reviews Genetics :LANGUAGE: en :MONTH: 6 :PAGES: 333--351 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/nrg.2016.49 :VOLUME: 17 :YEAR: 2016 :END: * SPiP: Splicing Prediction Pipeline, a machine learning tool for massive detection of exonic and intronic variant effects on mRNA splicing :PROPERTIES: :TITLE: SPiP: Splicing Prediction Pipeline, a machine learning tool for massive detection of exonic and intronic variant effects on mRNA splicing :BTYPE: article :CUSTOM_ID: Leman2022 :AUTHOR: Leman, Rapha\"{e}l and Parfait, B\'{e}atrice and Vidaud, Dominique and Girodon, Emmanuelle and Pacot, Laurence and Le Gac, G\'{e}rald and Ka, Chandran and Ferec, Claude and Fichou, Yann and Quesnelle, C\'{e}line and Aucouturier, Camille and Muller, Etienne and Vaur, Dominique and Castera, Laurent and Boulouard, Flavie and Ricou, Agathe and Tubeuf, H\'{e}l\`{e}ne and Soukarieh, Omar and Gaildrat, Pascaline and Riant, Florence and Guillaud-Bataille, Marine and Caputo, Sandrine M. and Caux-Moncoutier, Virginie and Boutry-Kryza, Nadia and Bonnet-Dorion, Fran\c{c}oise and Schultz, Ines and Rossing, Maria and Quenez, Olivier and Goldenberg, Louis and Harter, Valentin and Parsons, Michael T. and Spurdle, Amanda B. and Fr\'{e}bourg, Thierry and Martins, Alexandra and Houdayer, Claude and Krieger, Sophie :DOI: 10.1002/humu.24491 :ISSUE: 12 :JOURNAL: Human Mutation :LANGUAGE: en :MONTH: 12 :PAGES: 2308--2323 :PUBLISHER: Hindawi Limited :URL: http://dx.doi.org/10.1002/humu.24491 :VOLUME: 43 :YEAR: 2022 :END: * Recommendations for the Use of in Silico Approaches for Next-Generation Sequencing Bioinformatic Pipeline Validation :PROPERTIES: :TITLE: Recommendations for the Use of in Silico Approaches for Next-Generation Sequencing Bioinformatic Pipeline Validation :BTYPE: article :CUSTOM_ID: Duncavage2023 :AUTHOR: Duncavage, Eric J. and Coleman, Joshua F. and de Baca, Monica E. 
and Kadri, Sabah and Leon, Annette and Routbort, Mark and Roy, Somak and Suarez, Carlos J. and Vanderbilt, Chad and Zook, Justin M. :DOI: 10.1016/j.jmoldx.2022.09.007 :ISSUE: 1 :JOURNAL: The Journal of Molecular Diagnostics :LANGUAGE: en :MONTH: 1 :PAGES: 3--16 :PUBLISHER: Elsevier BV :URL: http://dx.doi.org/10.1016/j.jmoldx.2022.09.007 :VOLUME: 25 :YEAR: 2023 :END: * VarBen: Generating in Silico Reference Data Sets for Clinical Next-Generation Sequencing Bioinformatics Pipeline Evaluation :PROPERTIES: :TITLE: VarBen: Generating in Silico Reference Data Sets for Clinical Next-Generation Sequencing Bioinformatics Pipeline Evaluation :BTYPE: article :CUSTOM_ID: Li2021Varben :ABSTRACT: Next-generation sequencing is increasingly being adopted as a valuable method for the detection of somatic variants in clinical oncology. However, it \ldots{} :AUTHOR: Li, Ziyang and Fang, Shuangsang and Zhang, Rui and Yu, Lijia and Zhang, Jiawei and Bu, Dechao and Sun, Liang and Zhao, Yi and Li, Jinming :DOI: 10.1016/j.jmoldx.2020.11.010 :ISSN: 1525-1578 :ISSUE: 3 :JOURNAL: The Journal of Molecular Diagnostics :PAGES: 285-299 :PUBLISHER: Elsevier :URL: https://www.sciencedirect.com/science/article/pii/S1525157820305857 :VOLUME: 23 :YEAR: 2021 :END: * Firing patterns in the adaptive exponential integrate-and-fire model :PROPERTIES: :TITLE: Firing patterns in the adaptive exponential integrate-and-fire model :BTYPE: article :CUSTOM_ID: Naud2008 :AUTHOR: Naud, Richard and Marcille, Nicolas and Clopath, Claudia and Gerstner, Wulfram :DOI: 10.1007/s00422-008-0264-7 :ISSUE: 4-5 :JOURNAL: Biological Cybernetics :LANGUAGE: en :MONTH: 11 :PAGES: 335--347 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1007/s00422-008-0264-7 :VOLUME: 99 :YEAR: 2008 :END: * Protocol for unbiased, consolidated variant calling from whole exome sequencing data :PROPERTIES: :TITLE: Protocol for unbiased, consolidated variant calling from whole exome sequencing data :BTYPE: article 
:CUSTOM_ID: Verrou2022 :AUTHOR: Verrou, Kleio-Maria and Pavlopoulos, Georgios A. and Moulos, Panagiotis :DOI: 10.1016/j.xpro.2022.101418 :ISSUE: 2 :JOURNAL: STAR Protocols :LANGUAGE: en :MONTH: 6 :PAGES: 101418 :PUBLISHER: Elsevier BV :URL: http://dx.doi.org/10.1016/j.xpro.2022.101418 :VOLUME: 3 :YEAR: 2022 :END: * Accuracy and efficiency of germline variant calling pipelines for human genome data :PROPERTIES: :TITLE: Accuracy and efficiency of germline variant calling pipelines for human genome data :BTYPE: article :CUSTOM_ID: Zhao2020 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>Advances in next-generation sequencing technology have enabled whole genome sequencing (WGS) to be widely used for identification of causal variants in a spectrum of genetic-related disorders, and provided new insight into how genetic polymorphisms affect disease phenotypes. The development of different bioinformatics pipelines has continuously improved the variant analysis of WGS data. However, there is a necessity for a systematic performance comparison of these pipelines to provide guidance on the application of WGS-based scientific and clinical genomics. In this study, we evaluated the performance of three variant calling pipelines (GATK, DRAGEN and DeepVariant) using the Genome in a Bottle Consortium, \textquotedblleft{}synthetic-diploid\textquotedblright{} and simulated WGS datasets. DRAGEN and DeepVariant show better accuracy in SNP and indel calling, with no significant differences in their F1-score. DRAGEN platform offers accuracy, flexibility and a highly-efficient execution speed, and therefore superior performance in the analysis of WGS data on a large scale. The combination of DRAGEN and DeepVariant also suggests a good balance of accuracy and efficiency as an alternative solution for germline variant detection in further applications. 
Our results facilitate the standardization of benchmarking analysis of bioinformatics pipelines for reliable variant detection, which is critical in genetics-based medical research and clinical applications.</jats:p> :AUTHOR: Zhao, Sen and Agafonov, Oleg and Azab, Abdulrahman and Stokowy, Tomasz and Hovig, Eivind :DOI: 10.1038/s41598-020-77218-4 :ISSUE: 1 :JOURNAL: Scientific Reports :LANGUAGE: en :MONTH: 11 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41598-020-77218-4 :VOLUME: 10 :YEAR: 2020 :END: * A global reference for human genetic variation :PROPERTIES: :TITLE: A global reference for human genetic variation :BTYPE: article :CUSTOM_ID: 1000Genomes :AUTHOR: The 1000 Genomes Project Consortium :DOI: 10.1038/nature15393 :ISSUE: 7571 :JOURNAL: Nature :LANGUAGE: en :MONTH: 10 :PAGES: 68--74 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/nature15393 :VOLUME: 526 :YEAR: 2015 :END: * Genomics in the cloud: using Docker, GATK, and WDL in Terra :PROPERTIES: :TITLE: Genomics in the cloud: using Docker, GATK, and WDL in Terra :BTYPE: book :CUSTOM_ID: Auwera2020 :AUTHOR: Van der Auwera, Geraldine A and O'Connor, Brian D. :PUBLISHER: O'Reilly Media :YEAR: 2020 :END: * A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree :PROPERTIES: :TITLE: A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree :BTYPE: article :CUSTOM_ID: Eberle2017 :ABSTRACT: <jats:p>Improvement of variant calling in next-generation sequence data requires a comprehensive, genome-wide catalog of high-confidence variants called in a set of genomes for use as a benchmark. We generated deep, whole-genome sequence data of 17 individuals in a three-generation pedigree and called variants in each genome using a range of currently available algorithms. 
We used haplotype transmission information to create a phased \textquotedblleft{}Platinum\textquotedblright{} variant catalog of 4.7 million single-nucleotide variants (SNVs) plus 0.7 million small (1\textendash{}50 bp) insertions and deletions (indels) that are consistent with the pattern of inheritance in the parents and 11 children of this pedigree. Platinum genotypes are highly concordant with the current catalog of the National Institute of Standards and Technology for both SNVs (\>99.99\%) and indels (99.92\%) and add a validated truth catalog that has 26\% more SNVs and 45\% more indels. Analysis of 334,652 SNVs that were consistent between informatics pipelines yet inconsistent with haplotype transmission (\textquotedblleft{}nonplatinum\textquotedblright{}) revealed that the majority of these variants are de novo and cell-line mutations or reside within previously unidentified duplications and deletions. The reference materials from this study are a resource for objective assessment of the accuracy of variant calls throughout genomes.</jats:p> :AUTHOR: Eberle, Michael A. and Fritzilas, Epameinondas and Krusche, Peter and K\"{a}llberg, Morten and Moore, Benjamin L. and Bekritsky, Mitchell A. and Iqbal, Zamin and Chuang, Han-Yu and Humphray, Sean J. and Halpern, Aaron L. and Kruglyak, Semyon and Margulies, Elliott H. and McVean, Gil and Bentley, David R. :DOI: 10.1101/gr.210500.116 :ISSUE: 1 :JOURNAL: Genome Research :LANGUAGE: en :MONTH: 1 :PAGES: 157--164 :PUBLISHER: Cold Spring Harbor Laboratory :URL: http://dx.doi.org/10.1101/gr.210500.116 :VOLUME: 27 :YEAR: 2017 :END: * A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff :PROPERTIES: :TITLE: A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff :BTYPE: article :CUSTOM_ID: Cingolani2012 :AUTHOR: Cingolani, Pablo and Platts, Adrian and Wang, Le Lily and Coon, Melissa and Nguyen, Tung and Wang, Luan and Land, Susan J. 
and Lu, Xiangyi and Ruden, Douglas M. :DOI: 10.4161/fly.19695 :ISSUE: 2 :JOURNAL: Fly :LANGUAGE: en :MONTH: 4 :PAGES: 80--92 :PUBLISHER: Informa UK Limited :URL: http://dx.doi.org/10.4161/fly.19695 :VOLUME: 6 :YEAR: 2012 :END: * Best practices for the interpretation and reporting of clinical whole genome sequencing :PROPERTIES: :TITLE: Best practices for the interpretation and reporting of clinical whole genome sequencing :BTYPE: article :CUSTOM_ID: AustinTse2022 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>Whole genome sequencing (WGS) shows promise as a first-tier diagnostic test for patients with rare genetic disorders. However, standards addressing the definition and deployment practice of a best-in-class test are lacking. To address these gaps, the Medical Genome Initiative, a consortium of leading health care and research organizations in the US and Canada, was formed to expand access to high quality clinical WGS by convening experts and publishing best practices. Here, we present best practice recommendations for the interpretation and reporting of clinical diagnostic WGS, including discussion of challenges and emerging approaches that will be critical to harness the full potential of this comprehensive test.</jats:p> :AUTHOR: Austin-Tse, Christina A. and Jobanputra, Vaidehi and Perry, Denise L. and Bick, David and Taft, Ryan J. and Venner, Eric and Gibbs, Richard A. and Young, Ted and Barnett, Sarah and Belmont, John W. and Boczek, Nicole and Chowdhury, Shimul and Ellsworth, Katarzyna A. and Guha, Saurav and Kulkarni, Shashikant and Marcou, Cherisse and Meng, Linyan and Murdock, David R. and Rehman, Atteeq U. and Spiteri, Elizabeth and Thomas-Wilson, Amanda and Kearney, Hutton M. and Rehm, Heidi L. 
:DOI: 10.1038/s41525-022-00295-z :ISSUE: 1 :JOURNAL: npj Genomic Medicine :LANGUAGE: en :MONTH: 4 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41525-022-00295-z :VOLUME: 7 :YEAR: 2022 :END: * Whole Exome Sequencing Achieved a Definite Diagnosis of Kaufman Oculocerebrofacial Syndrome in a Bahraini Family: A Case Report :PROPERTIES: :TITLE: Whole Exome Sequencing Achieved a Definite Diagnosis of Kaufman Oculocerebrofacial Syndrome in a Bahraini Family: A Case Report :BTYPE: article :CUSTOM_ID: Fida2023 :ABSTRACT: <jats:p> A 1\hspace{0.167em}year and 7\hspace{0.167em}months old girl presented to the medical genetic clinic as a referral from the pediatrics clinic. Upon examining the patient and assessing past medical history, an autosomal recessive disorder was suspected. The family underwent whole exome sequencing, which resulted in the diagnosis of Kaufman oculocerebrofacial syndrome (OMIM \#244450) in the patient due to the fact that both parents were heterozygous carriers of a novel pathogenic variant in the gene UBE3B that lies on 12q24. It has been recommended for the family that preimplantation genetic testing should be considered for future pregnancies. In this case report, we present a novel variant of the gene and highlight the support of whole exome sequencing in the unveiling of genetic disorders. 
</jats:p> :AUTHOR: Fida, Mariam and Sinan, Israa and Finan, Alan :DOI: 10.1177/11795565231200130 :JOURNAL: Clinical Medicine Insights: Pediatrics :LANGUAGE: en :MONTH: 1 :PUBLISHER: SAGE Publications :URL: http://dx.doi.org/10.1177/11795565231200130 :VOLUME: 17 :YEAR: 2023 :END: * On genomic repeats and reproducibility :PROPERTIES: :TITLE: On genomic repeats and reproducibility :BTYPE: article :CUSTOM_ID: Firtina2016 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:p>Results: Here, we present a comprehensive analysis on the reproducibility of computational characterization of genomic variants using high throughput sequencing data. We reanalyzed the same datasets twice, using the same tools with the same parameters, where we only altered the order of reads in the input (i.e. FASTQ file). Reshuffling caused the reads from repetitive regions being mapped to different locations in the second alignment, and we observed similar results when we only applied a scatter/gather approach for read mapping\textemdash{}without prior shuffling. Our results show that, some of the most common variation discovery algorithms do not handle the ambiguous read mappings accurately when random locations are selected. In addition, we also observed that even when the exact same alignment is used, the GATK HaplotypeCaller generates slightly different call sets, which we pinpoint to the variant filtration step. 
We conclude that, algorithms at each step of genomic variation discovery and characterization need to treat ambiguous mappings in a deterministic fashion to ensure full replication of results.</jats:p> <jats:p>Availability and Implementation: Code, scripts and the generated VCF files are available at DOI:10.5281/zenodo.32611.</jats:p> <jats:p>Contact: calkan@cs.bilkent.edu.tr</jats:p> <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.</jats:p> :AUTHOR: Firtina, Can and Alkan, Can :DOI: 10.1093/bioinformatics/btw139 :ISSUE: 15 :JOURNAL: Bioinformatics :LANGUAGE: en :MONTH: 8 :PAGES: 2243--2247 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/bioinformatics/btw139 :VOLUME: 32 :YEAR: 2016 :END: * OMIM.org: Online Mendelian Inheritance in Man (OMIM\textregistered{}), an online catalog of human genes and genetic disorders :PROPERTIES: :TITLE: OMIM.org: Online Mendelian Inheritance in Man (OMIM\textregistered{}), an online catalog of human genes and genetic disorders :BTYPE: article :CUSTOM_ID: Amberger2015 :AUTHOR: Amberger, Joanna S. and Bocchini, Carol A. and Schiettecatte, Fran\c{c}ois and Scott, Alan F. and Hamosh, Ada :DOI: 10.1093/nar/gku1205 :ISSUE: D1 :JOURNAL: Nucleic Acids Research :LANGUAGE: en :MONTH: 1 :PAGES: D789--D798 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/nar/gku1205 :VOLUME: 43 :YEAR: 2015 :END: * Sarek: A portable workflow for whole-genome sequencing analysis of germline and somatic variants :PROPERTIES: :TITLE: Sarek: A portable workflow for whole-genome sequencing analysis of germline and somatic variants :BTYPE: article :CUSTOM_ID: Garcia2020 :ABSTRACT: <ns4:p>Whole-genome sequencing (WGS) is a fundamental technology for research to advance precision medicine, but the limited availability of portable and user-friendly workflows for WGS analyses poses a major challenge for many research groups and hampers scientific progress. 
Here we present Sarek, an open-source workflow to detect germline variants and somatic mutations based on sequencing data from WGS, whole-exome sequencing (WES), or gene panels. Sarek features (i) easy installation, (ii) robust portability across different computer environments, (iii) comprehensive documentation, (iv) transparent and easy-to-read code, and (v) extensive quality metrics reporting. Sarek is implemented in the Nextflow workflow language and supports both Docker and Singularity containers as well as Conda environments, making it ideal for easy deployment on any POSIX-compatible computers and cloud compute environments. Sarek follows the GATK best-practice recommendations for read alignment and pre-processing, and includes a wide range of software for the identification and annotation of germline and somatic single-nucleotide variants, insertion and deletion variants, structural variants, tumour sample purity, and variations in ploidy and copy number. Sarek offers easy, efficient, and reproducible WGS analyses, and can readily be used both as a production workflow at sequencing facilities and as a powerful stand-alone tool for individual research groups. The Sarek source code, documentation and installation instructions are freely available at <ns4:ext-link xmlns:ns3="http://www.w3.org/1999/xlink" ext-link-type="uri" ns3:href="https://github.com/nf-core/sarek">https://github.com/nf-core/sarek</ns4:ext-link> and at <ns4:ext-link xmlns:ns3="http://www.w3.org/1999/xlink" ext-link-type="uri" ns3:href="https://nf-co.re/sarek/">https://nf-co.re/sarek/</ns4:ext-link>.</ns4:p> :AUTHOR: Garcia, Maxime and Juhos, Szilveszter and Larsson, Malin and Olason, Pall I. 
and Martin, Marcel and Eisfeldt, Jesper and DiLorenzo, Sebastian and Sandgren, Johanna and D\'{\i}az De St\aa{}hl, Teresita and Ewels, Philip and Wirta, Valtteri and Nist\'{e}r, Monica and K\"{a}ller, Max and Nystedt, Bj\"{o}rn :DOI: 10.12688/f1000research.16665.2 :JOURNAL: F1000Research :LANGUAGE: en :MONTH: 9 :PAGES: 63 :PUBLISHER: F1000 Research Ltd :URL: http://dx.doi.org/10.12688/f1000research.16665.2 :VOLUME: 9 :YEAR: 2020 :END: * Predicting Splicing from Primary Sequence with Deep Learning :PROPERTIES: :TITLE: Predicting Splicing from Primary Sequence with Deep Learning :BTYPE: article :CUSTOM_ID: Jaganathan2019 :AUTHOR: Jaganathan, Kishore and Kyriazopoulou Panagiotopoulou, Sofia and McRae, Jeremy F. and Darbandi, Siavash Fazel and Knowles, David and Li, Yang I. and Kosmicki, Jack A. and Arbelaez, Juan and Cui, Wenwu and Schwartz, Grace B. and Chow, Eric D. and Kanterakis, Efstathios and Gao, Hong and Kia, Amirali and Batzoglou, Serafim and Sanders, Stephan J. and Farh, Kyle Kai-How :DOI: 10.1016/j.cell.2018.12.015 :ISSUE: 3 :JOURNAL: Cell :LANGUAGE: en :MONTH: 1 :PAGES: 535--548.e24 :PUBLISHER: Elsevier BV :URL: http://dx.doi.org/10.1016/j.cell.2018.12.015 :VOLUME: 176 :YEAR: 2019 :END: * A draft human pangenome reference :PROPERTIES: :TITLE: A draft human pangenome reference :BTYPE: article :CUSTOM_ID: Liao2023 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals<jats:sup>1</jats:sup>. These assemblies cover more than 99\% of the expected sequence in each genome and are more than 99\% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. 
We also add 119\hspace{0.167em}million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90\hspace{0.167em}million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34\% and increased the number of structural variants detected per haplotype by 104\% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.</jats:p> :AUTHOR: Liao, Wen-Wei and Asri, Mobin and Ebler, Jana and Doerr, Daniel and Haukness, Marina and Hickey, Glenn and Lu, Shuangjia and Lucas, Julian K. and Monlong, Jean and Abel, Haley J. and Buonaiuto, Silvia and Chang, Xian H. and Cheng, Haoyu and Chu, Justin and Colonna, Vincenza and Eizenga, Jordan M. and Feng, Xiaowen and Fischer, Christian and Fulton, Robert S. and Garg, Shilpa and Groza, Cristian and Guarracino, Andrea and Harvey, William T. and Heumos, Simon and Howe, Kerstin and Jain, Miten and Lu, Tsung-Yu and Markello, Charles and Martin, Fergal J. and Mitchell, Matthew W. and Munson, Katherine M. and Mwaniki, Moses Njagi and Novak, Adam M. and Olsen, Hugh E. and Pesout, Trevor and Porubsky, David and Prins, Pjotr and Sibbesen, Jonas A. and Sir\'{e}n, Jouni and Tomlinson, Chad and Villani, Flavia and Vollger, Mitchell R. and Antonacci-Fulton, Lucinda L. and Baid, Gunjan and Baker, Carl A. and Belyaeva, Anastasiya and Billis, Konstantinos and Carroll, Andrew and Chang, Pi-Chuan and Cody, Sarah and Cook, Daniel E. and Cook-Deegan, Robert M. and Cornejo, Omar E. and Diekhans, Mark and Ebert, Peter and Fairley, Susan and Fedrigo, Olivier and Felsenfeld, Adam L. and Formenti, Giulio and Frankish, Adam and Gao, Yan and Garrison, Nanibaa' A. and Giron, Carlos Garcia and Green, Richard E. and Haggerty, Leanne and Hoekzema, Kendra and Hourlier, Thibaut and Ji, Hanlee P. and Kenny, Eimear E. 
and Koenig, Barbara A. and Kolesnikov, Alexey and Korbel, Jan O. and Kordosky, Jennifer and Koren, Sergey and Lee, HoJoon and Lewis, Alexandra P. and Magalh\~{a}es, Hugo and Marco-Sola, Santiago and Marijon, Pierre and McCartney, Ann and McDaniel, Jennifer and Mountcastle, Jacquelyn and Nattestad, Maria and Nurk, Sergey and Olson, Nathan D. and Popejoy, Alice B. and Puiu, Daniela and Rautiainen, Mikko and Regier, Allison A. and Rhie, Arang and Sacco, Samuel and Sanders, Ashley D. and Schneider, Valerie A. and Schultz, Baergen I. and Shafin, Kishwar and Smith, Michael W. and Sofia, Heidi J. and Abou Tayoun, Ahmad N. and Thibaud-Nissen, Fran\c{c}oise and Tricomi, Francesca Floriana and Wagner, Justin and Walenz, Brian and Wood, Jonathan M. D. and Zimin, Aleksey V. and Bourque, Guillaume and Chaisson, Mark J. P. and Flicek, Paul and Phillippy, Adam M. and Zook, Justin M. and Eichler, Evan E. and Haussler, David and Wang, Ting and Jarvis, Erich D. and Miga, Karen H. and Garrison, Erik and Marschall, Tobias and Hall, Ira M. and Li, Heng and Paten, Benedict :DOI: 10.1038/s41586-023-05896-x :ISSUE: 7960 :JOURNAL: Nature :LANGUAGE: en :MONTH: 5 :PAGES: 312--324 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41586-023-05896-x :VOLUME: 617 :YEAR: 2023 :END: * Standards and Guidelines for Validating Next-Generation Sequencing Bioinformatics Pipelines :PROPERTIES: :TITLE: Standards and Guidelines for Validating Next-Generation Sequencing Bioinformatics Pipelines :BTYPE: article :CUSTOM_ID: Roy2018 :AUTHOR: Roy, Somak and Coldren, Christopher and Karunamurthy, Arivarasan and Kip, Nefize S. and Klee, Eric W. and Lincoln, Stephen E. and Leon, Annette and Pullambhatla, Mrudula and Temple-Smolkin, Robyn L. and Voelkerding, Karl V. and Wang, Chen and Carter, Alexis B. 
:DOI: 10.1016/j.jmoldx.2017.11.003 :ISSUE: 1 :JOURNAL: The Journal of Molecular Diagnostics :LANGUAGE: en :MONTH: 1 :PAGES: 4--27 :PUBLISHER: Elsevier BV :URL: http://dx.doi.org/10.1016/j.jmoldx.2017.11.003 :VOLUME: 20 :YEAR: 2018 :END: * Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly :PROPERTIES: :TITLE: Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly :BTYPE: article :CUSTOM_ID: Schneider2017 :ABSTRACT: <jats:p>The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. 
We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.</jats:p> :AUTHOR: Schneider, Valerie A. and Graves-Lindsay, Tina and Howe, Kerstin and Bouk, Nathan and Chen, Hsiu-Chuan and Kitts, Paul A. and Murphy, Terence D. and Pruitt, Kim D. and Thibaud-Nissen, Fran\c{c}oise and Albracht, Derek and Fulton, Robert S. and Kremitzki, Milinn and Magrini, Vincent and Markovic, Chris and McGrath, Sean and Steinberg, Karyn Meltz and Auger, Kate and Chow, William and Collins, Joanna and Harden, Glenn and Hubbard, Timothy and Pelan, Sarah and Simpson, Jared T. and Threadgold, Glen and Torrance, James and Wood, Jonathan M. and Clarke, Laura and Koren, Sergey and Boitano, Matthew and Peluso, Paul and Li, Heng and Chin, Chen-Shan and Phillippy, Adam M. and Durbin, Richard and Wilson, Richard K. and Flicek, Paul and Eichler, Evan E. and Church, Deanna M. :DOI: 10.1101/gr.213611.116 :ISSUE: 5 :JOURNAL: Genome Research :LANGUAGE: en :MONTH: 5 :PAGES: 849--864 :PUBLISHER: Cold Spring Harbor Laboratory :URL: http://dx.doi.org/10.1101/gr.213611.116 :VOLUME: 27 :YEAR: 2017 :END: * Haplotype-based variant detection from short-read sequencing :PROPERTIES: :TITLE: Haplotype-based variant detection from short-read sequencing :BTYPE: article :CUSTOM_ID: Garrison2012 :ABSTRACT: The direct detection of haplotypes from short-read DNA sequencing data requires changes to existing small-variant detection methods. Here, we develop a Bayesian statistical framework which is capable of modeling multiallelic loci in sets of individuals with non-uniform copy number. We then describe our implementation of this framework in a haplotype-based variant detector, FreeBayes. 
:ARCHIVEPREFIX: arXiv :AUTHOR: Garrison, Erik and Marth, Gabor :EPRINT: 1207.3907v2 :FILE: 1207.3907v2.pdf :MONTH: Jul :PRIMARYCLASS: q-bio.GN :URL: http://arxiv.org/abs/1207.3907v2 :YEAR: 2012 :END: * Assembling and Validating Bioinformatic Pipelines for Next-Generation Sequencing Clinical Assays :PROPERTIES: :TITLE: Assembling and Validating Bioinformatic Pipelines for Next-Generation Sequencing Clinical Assays :BTYPE: article :CUSTOM_ID: Sorelle2020 :ABSTRACT: <jats:sec> <jats:title>Context.\textemdash{}</jats:title> <jats:p>Clinical next-generation sequencing (NGS) is being rapidly adopted, but analysis and interpretation of large data sets prompt new challenges for a clinical laboratory setting. Clinical NGS results rely heavily on the bioinformatics pipeline for identifying genetic variation in complex samples. The choice of bioinformatics algorithms, genome assembly, and genetic annotation databases are important for determining genetic alterations associated with disease. The analysis methods are often tuned to the assay to maximize accuracy. Once a pipeline has been developed, it must be validated to determine accuracy and reproducibility for samples similar to real-world cases. 
In silico proficiency testing or institutional data exchange will ensure consistency among clinical laboratories.</jats:p> </jats:sec> <jats:sec> <jats:title>Objective.\textemdash{}</jats:title> <jats:p>To provide molecular pathologists a step-by-step guide to bioinformatics analysis and validation design in order to navigate the regulatory and validation standards of implementing a bioinformatic pipeline as a part of a new clinical NGS assay.</jats:p> </jats:sec> <jats:sec> <jats:title>Data Sources.\textemdash{}</jats:title> <jats:p>This guide uses published studies on genomic analysis, bioinformatics methods, and methods comparison studies to inform the reader on what resources, including open source software tools and databases, are available for genetic variant detection and interpretation.</jats:p> </jats:sec> <jats:sec> <jats:title>Conclusions.\textemdash{}</jats:title> <jats:p>This review covers 4 key concepts: (1) bioinformatic analysis design for detecting genetic variation, (2) the resources for assessing genetic effects, (3) analysis validation assessment experiments and data sets, including a diverse set of samples to mimic real-world challenges that assess accuracy and reproducibility, and (4) if concordance between clinical laboratories will be improved by proficiency testing designed to test bioinformatic pipelines.</jats:p> </jats:sec> :AUTHOR: SoRelle, Jeffrey A and Wachsmann, Megan and Cantarel, Brandi L. 
:DOI: 10.5858/arpa.2019-0476-ra :ISSUE: 9 :JOURNAL: Archives of Pathology \& Laboratory Medicine :LANGUAGE: en :MONTH: 9 :PAGES: 1118--1130 :PUBLISHER: Archives of Pathology and Laboratory Medicine :URL: http://dx.doi.org/10.5858/arpa.2019-0476-ra :VOLUME: 144 :YEAR: 2020 :END: * SpliceAI-visual: a free online tool to improve SpliceAI splicing variant interpretation :PROPERTIES: :TITLE: SpliceAI-visual: a free online tool to improve SpliceAI splicing variant interpretation :BTYPE: article :CUSTOM_ID: DeSainteAgathe :ABSTRACT: SpliceAI is an open-source deep learning splicing prediction algorithm that has demonstrated in the past few years its high ability to predict splicing defects caused by DNA variations. However, its outputs present several drawbacks: (1) although the numerical values are very convenient for batch filtering, their precise interpretation can be difficult, (2) the outputs are delta scores which can sometimes mask a severe consequence, and (3) complex delins are most often not handled. We present here SpliceAI-visual, a free online tool based on the SpliceAI algorithm, and show how it complements the traditional SpliceAI analysis. First, SpliceAI-visual manipulates raw scores and not delta scores, as the latter can be misleading in certain circumstances. Second, the outcome of SpliceAI-visual is user-friendly thanks to the graphical presentation. Third, SpliceAI-visual is currently one of the only SpliceAI-derived implementations able to annotate complex variants (e.g., complex delins). We report here the benefits of using SpliceAI-visual and demonstrate its relevance in the assessment/modulation of the PVS1 classification criteria. We also show how SpliceAI-visual can elucidate several complex splicing defects taken from the literature but also from unpublished cases. 
SpliceAI-visual is available as a Google Colab notebook and has also been fully integrated in a free online variant interpretation tool, MobiDetails ( https://mobidetails.iurc.montp.inserm.fr/MD ). :AUTHOR: de Sainte Agathe, Jean-Madeleine and Filser, Mathilde and Isidor, Bertrand and Besnard, Thomas and Gueguen, Paul and Perrin, Aur\'{e}lien and Van Goethem, Charles and Verebi, Camille and Masingue, Marion and Rendu, John and Coss\'{e}e, Mireille and Bergougnoux, Anne and Frobert, Laurent and Buratti, Julien and Lejeune, \'{E}lodie and Le Guern, \'{E}ric and Pasquier, Florence and Clot, Fabienne and Kalatzis, Vasiliki and Roux, Anne-Fran\c{c}oise and Cogn\'{e}, Benjamin and Baux, David :DATE: 2023-02-10 :DOI: 10.1186/s40246-023-00451-1 :ISSN: 1479-7364 :ISSUE: 1 :JOURNAL: Human Genomics :KEYWORDS: Human Genetics :LANGUAGE: en :PUBLISHER: BioMed Central :URL: https://humgenomics.biomedcentral.com/articles/10.1186/s40246-023-00451-1 :VOLUME: 17 :END: * Toil enables reproducible, open source, big biomedical data analyses :PROPERTIES: :TITLE: Toil enables reproducible, open source, big biomedical data analyses :BTYPE: article :CUSTOM_ID: Vivian2017 :AUTHOR: Vivian, John and Rao, Arjun Arkal and Nothaft, Frank Austin and Ketchum, Christopher and Armstrong, Joel and Novak, Adam and Pfeil, Jacob and Narkizian, Jake and Deran, Alden D and Musselman-Brown, Audrey and Schmidt, Hannes and Amstutz, Peter and Craft, Brian and Goldman, Mary and Rosenbloom, Kate and Cline, Melissa and O'Connor, Brian and Hanna, Megan and Birger, Chet and Kent, W James and Patterson, David A and Joseph, Anthony D and Zhu, Jingchun and Zaranek, Sasha and Getz, Gad and Haussler, David and Paten, Benedict :DOI: 10.1038/nbt.3772 :ISSUE: 4 :JOURNAL: Nature Biotechnology :LANGUAGE: en :MONTH: 4 :PAGES: 314--316 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/nbt.3772 :VOLUME: 35 :YEAR: 2017 :END: * Split-Read Indel and Structural Variant Calling Using PINDEL
:PROPERTIES: :TITLE: Split-Read Indel and Structural Variant Calling Using PINDEL :BTYPE: inbook :CUSTOM_ID: Ye2018 :AUTHOR: Ye, Kai and Guo, Li and Yang, Xiaofei and Lamijer, Eric-Wubbo and Raine, Keiran and Ning, Zemin :DOI: 10.1007/978-1-4939-8666-8\_7 :ISBN: ['9781493986651', '9781493986668'] :JOURNAL: Methods in Molecular Biology :MONTH: 7 :PAGES: 95--105 :PUBLISHER: Springer New York :URL: http://dx.doi.org/10.1007/978-1-4939-8666-8\_7 :YEAR: 2018 :END: * VarScan: variant detection in massively parallel sequencing of individual and pooled samples :PROPERTIES: :TITLE: VarScan: variant detection in massively parallel sequencing of individual and pooled samples :BTYPE: article :CUSTOM_ID: Koboldt2009 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:p>Summary: Massively parallel sequencing technologies hold incredible promise for the study of DNA sequence variation, particularly the identification of variants affecting human disease. The unprecedented throughput and relatively short read lengths of Roche/454, Illumina/Solexa, and other platforms have spurred development of a new generation of sequence alignment algorithms. Yet detection of sequence variants based on short read alignments remains challenging, and most currently available tools are limited to a single platform or aligner type. We present VarScan, an open source tool for variant detection that is compatible with several short read aligners. 
We demonstrate VarScan's ability to detect SNPs and indels with high sensitivity and specificity, in both Roche/454 sequencing of individuals and deep Illumina/Solexa sequencing of pooled samples.</jats:p> <jats:p>Availability and Implementation: Source code and documentation freely available at http://genome.wustl.edu/tools/cancer-genomics implemented as a Perl package and supported on Linux/UNIX, MS Windows and Mac OSX.</jats:p> <jats:p>Contact: dkoboldt@genome.wustl.edu</jats:p> <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.</jats:p> :AUTHOR: Koboldt, Daniel C. and Chen, Ken and Wylie, Todd and Larson, David E. and McLellan, Michael D. and Mardis, Elaine R. and Weinstock, George M. and Wilson, Richard K. and Ding, Li :DOI: 10.1093/bioinformatics/btp373 :ISSUE: 17 :JOURNAL: Bioinformatics :LANGUAGE: en :MONTH: 9 :PAGES: 2283--2285 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/bioinformatics/btp373 :VOLUME: 25 :YEAR: 2009 :END: * Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples :PROPERTIES: :TITLE: Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples :BTYPE: article :CUSTOM_ID: Cibulskis2013 :AUTHOR: Cibulskis, Kristian and Lawrence, Michael S and Carter, Scott L and Sivachenko, Andrey and Jaffe, David and Sougnez, Carrie and Gabriel, Stacey and Meyerson, Matthew and Lander, Eric S and Getz, Gad :DOI: 10.1038/nbt.2514 :ISSUE: 3 :JOURNAL: Nature Biotechnology :LANGUAGE: en :MONTH: 3 :PAGES: 213--219 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/nbt.2514 :VOLUME: 31 :YEAR: 2013 :END: * Semi-automated assembly of high-quality diploid human reference genomes :PROPERTIES: :TITLE: Semi-automated assembly of high-quality diploid human reference genomes :BTYPE: article :CUSTOM_ID: Jarvis2022 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>The current human reference genome, 
GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society<jats:sup>1,2</jats:sup>. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals<jats:sup>3,4</jats:sup>. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome<jats:sup>5</jats:sup>. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity<jats:sup>6</jats:sup>. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent\textendash{}child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within \pm{}1\% of the length of CHM13. Nearly 48\% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.</jats:p> :AUTHOR: Jarvis, Erich D. and Formenti, Giulio and Rhie, Arang and Guarracino, Andrea and Yang, Chentao and Wood, Jonathan and Tracey, Alan and Thibaud-Nissen, Francoise and Vollger, Mitchell R. 
and Porubsky, David and Cheng, Haoyu and Asri, Mobin and Logsdon, Glennis A. and Carnevali, Paolo and Chaisson, Mark J. P. and Chin, Chen-Shan and Cody, Sarah and Collins, Joanna and Ebert, Peter and Escalona, Merly and Fedrigo, Olivier and Fulton, Robert S. and Fulton, Lucinda L. and Garg, Shilpa and Gerton, Jennifer L. and Ghurye, Jay and Granat, Anastasiya and Green, Richard E. and Harvey, William and Hasenfeld, Patrick and Hastie, Alex and Haukness, Marina and Jaeger, Erich B. and Jain, Miten and Kirsche, Melanie and Kolmogorov, Mikhail and Korbel, Jan O. and Koren, Sergey and Korlach, Jonas and Lee, Joyce and Li, Daofeng and Lindsay, Tina and Lucas, Julian and Luo, Feng and Marschall, Tobias and Mitchell, Matthew W. and McDaniel, Jennifer and Nie, Fan and Olsen, Hugh E. and Olson, Nathan D. and Pesout, Trevor and Potapova, Tamara and Puiu, Daniela and Regier, Allison and Ruan, Jue and Salzberg, Steven L. and Sanders, Ashley D. and Schatz, Michael C. and Schmitt, Anthony and Schneider, Valerie A. and Selvaraj, Siddarth and Shafin, Kishwar and Shumate, Alaina and Stitziel, Nathan O. and Stober, Catherine and Torrance, James and Wagner, Justin and Wang, Jianxin and Wenger, Aaron and Xiao, Chuanle and Zimin, Aleksey V. and Zhang, Guojie and Wang, Ting and Li, Heng and Garrison, Erik and Haussler, David and Hall, Ira and Zook, Justin M. and Eichler, Evan E. and Phillippy, Adam M. and Paten, Benedict and Howe, Kerstin and Miga, Karen H. :DOI: 10.1038/s41586-022-05325-5 :ISSUE: 7936 :JOURNAL: Nature :LANGUAGE: en :MONTH: 11 :PAGES: 519--531 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41586-022-05325-5 :VOLUME: 611 :YEAR: 2022 :END: * The complete sequence of a human Y chromosome :PROPERTIES: :TITLE: The complete sequence of a human Y chromosome :BTYPE: article :CUSTOM_ID: Rhie2023 :AUTHOR: Rhie, Arang and Nurk, Sergey and Cechova, Monika and Hoyt, Savannah J. and Taylor, Dylan J. 
and Altemose, Nicolas and Hook, Paul W. and Koren, Sergey and Rautiainen, Mikko and Alexandrov, Ivan A. and Allen, Jamie and Asri, Mobin and Bzikadze, Andrey V. and Chen, Nae-Chyun and Chin, Chen-Shan and Diekhans, Mark and Flicek, Paul and Formenti, Giulio and Fungtammasan, Arkarachai and Garcia Giron, Carlos and Garrison, Erik and Gershman, Ariel and Gerton, Jennifer L. and Grady, Patrick G. S. and Guarracino, Andrea and Haggerty, Leanne and Halabian, Reza and Hansen, Nancy F. and Harris, Robert and Hartley, Gabrielle A. and Harvey, William T. and Haukness, Marina and Heinz, Jakob and Hourlier, Thibaut and Hubley, Robert M. and Hunt, Sarah E. and Hwang, Stephen and Jain, Miten and Kesharwani, Rupesh K. and Lewis, Alexandra P. and Li, Heng and Logsdon, Glennis A. and Lucas, Julian K. and Makalowski, Wojciech and Markovic, Christopher and Martin, Fergal J. and Mc Cartney, Ann M. and McCoy, Rajiv C. and McDaniel, Jennifer and McNulty, Brandy M. and Medvedev, Paul and Mikheenko, Alla and Munson, Katherine M. and Murphy, Terence D. and Olsen, Hugh E. and Olson, Nathan D. and Paulin, Luis F. and Porubsky, David and Potapova, Tamara and Ryabov, Fedor and Salzberg, Steven L. and Sauria, Michael E. G. and Sedlazeck, Fritz J. and Shafin, Kishwar and Shepelev, Valery A. and Shumate, Alaina and Storer, Jessica M. and Surapaneni, Likhitha and Taravella Oill, Angela M. and Thibaud-Nissen, Fran\c{c}oise and Timp, Winston and Tomaszkiewicz, Marta and Vollger, Mitchell R. and Walenz, Brian P. and Watwood, Allison C. and Weissensteiner, Matthias H. and Wenger, Aaron M. and Wilson, Melissa A. and Zarate, Samantha and Zhu, Yiming and Zook, Justin M. and Eichler, Evan E. and O'Neill, Rachel J. and Schatz, Michael C. and Miga, Karen H. and Makova, Kateryna D. and Phillippy, Adam M. 
:DOI: 10.1038/s41586-023-06457-y :ISSUE: 7978 :JOURNAL: Nature :LANGUAGE: en :MONTH: 9 :PAGES: 344--354 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41586-023-06457-y :VOLUME: 621 :YEAR: 2023 :END: * The Ensembl Variant Effect Predictor :PROPERTIES: :TITLE: The Ensembl Variant Effect Predictor :BTYPE: article :CUSTOM_ID: Mclaren :ABSTRACT: The Ensembl Variant Effect Predictor is a powerful toolset for the analysis, annotation, and prioritization of genomic variants in coding and non-coding regions. It provides access to an extensive collection of genomic annotation, with a variety of interfaces to suit different requirements, and simple options for configuring and extending analysis. It is open source, free to use, and supports full reproducibility of results. The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs. :AUTHOR: McLaren, William and Gil, Laurent and Hunt, Sarah E. and Riat, Harpreet Singh and Ritchie, Graham R. S. and Thormann, Anja and Flicek, Paul and Cunningham, Fiona :DATE: 2016-06-06 :DOI: 10.1186/s13059-016-0974-4 :ISSN: 1474-760X :ISSUE: 1 :JOURNAL: Genome Biology :KEYWORDS: Animal Genetics and Genomics :LANGUAGE: En :PUBLISHER: BioMed Central :URL: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0974-4 :VOLUME: 17 :END: * RefSeq: an update on mammalian reference sequences :PROPERTIES: :TITLE: RefSeq: an update on mammalian reference sequences :BTYPE: article :CUSTOM_ID: Pruitt :ABSTRACT: Abstract. The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and pro :AUTHOR: Pruitt, Kim D. and Brown, Garth R. and Hiatt, Susan M. and Thibaud-Nissen, Fran\c{c}oise and Astashyn, Alexander and Ermolaeva, Olga and Farrell, Catherine M. and Hart, Jennifer and Landrum, Melissa J. and McGarvey, Kelly M. and Murphy, Michael R. and O'Leary, Nuala A. 
and Pujar, Shashikant and Rajput, Bhanu and Rangwala, Sanjida H. and Riddick, Lillian D. and Shkeda, Andrei and Sun, Hanzhen and Tamez, Pamela and Tully, Raymond E. and Wallin, Craig and Webb, David and Weber, Janet and Wu, Wendy and DiCuccio, Michael and Kitts, Paul and Maglott, Donna R. and Murphy, Terence D. and Ostell, James M. :DOI: 10.1093/nar/gkt1114 :ISSN: 0305-1048 :ISSUE: D1 :JOURNAL: Nucleic Acids Research :PUBLISHER: Oxford Academic :URL: https://dx.doi.org/10.1093/nar/gkt1114 :VOLUME: 42 :END: * Reference standards for next-generation sequencing :PROPERTIES: :TITLE: Reference standards for next-generation sequencing :BTYPE: article :CUSTOM_ID: Hardwick2017 :AUTHOR: Hardwick, Simon A. and Deveson, Ira W. and Mercer, Tim R. :DOI: 10.1038/nrg.2017.44 :ISSUE: 8 :JOURNAL: Nature Reviews Genetics :LANGUAGE: en :MONTH: 8 :PAGES: 473--484 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/nrg.2017.44 :VOLUME: 18 :YEAR: 2017 :END: * Twelve years of SAMtools and BCFtools :PROPERTIES: :TITLE: Twelve years of SAMtools and BCFtools :BTYPE: article :CUSTOM_ID: Danecek2021 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:sec> <jats:title>Background</jats:title> <jats:p>SAMtools and BCFtools are widely used programs for processing and analysing high-throughput sequencing data. They include tools for file format conversion and manipulation, sorting, querying, statistics, variant calling, and effect analysis amongst other methods.</jats:p> </jats:sec> <jats:sec> <jats:title>Findings</jats:title> <jats:p>The first version appeared online 12 years ago and has been maintained and further developed ever since, with many new features and improvements added over the years. 
The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines.</jats:p> </jats:sec> <jats:sec> <jats:title>Conclusion</jats:title> <jats:p>Both SAMtools and BCFtools are freely available on GitHub under the permissive MIT licence, free for both non-commercial and commercial use. Both packages have been installed \textgreater{}1 million times via Bioconda. The source code and documentation are available from https://www.htslib.org.</jats:p> </jats:sec> :AUTHOR: Danecek, Petr and Bonfield, James K and Liddle, Jennifer and Marshall, John and Ohan, Valeriu and Pollard, Martin O and Whitwham, Andrew and Keane, Thomas and McCarthy, Shane A and Davies, Robert M and Li, Heng :DOI: 10.1093/gigascience/giab008 :ISSUE: 2 :JOURNAL: GigaScience :LANGUAGE: en :MONTH: 1 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/gigascience/giab008 :VOLUME: 10 :YEAR: 2021 :END: * sideRETRO: a pipeline for identifying somatic and polymorphic insertions of processed pseudogenes or retrocopies :PROPERTIES: :TITLE: sideRETRO: a pipeline for identifying somatic and polymorphic insertions of processed pseudogenes or retrocopies :BTYPE: article :CUSTOM_ID: Miller2021 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:sec> <jats:title>Motivation</jats:title> <jats:p>Retrocopies or processed pseudogenes are gene copies resulting from mRNA retrotransposition. These gene duplicates can be fixed, somatically inserted or polymorphic in the genome. 
However, knowledge regarding unfixed retrocopies (retroCNVs) is still limited, and the development of computational tools for effectively identifying and genotyping them is an urgent need.</jats:p> </jats:sec> <jats:sec> <jats:title>Results</jats:title> <jats:p>Here, we present sideRETRO, a pipeline dedicated not only to detecting retroCNVs in whole-genome or whole-exome sequencing data but also to revealing their insertion sites, zygosity and genomic context and classifying them as somatic or polymorphic events. We show that sideRETRO can identify novel retroCNVs and genotype them, in addition to finding polymorphic retroCNVs in whole-genome and whole-exome data. Therefore, sideRETRO fills a gap in the literature and presents an efficient and straightforward algorithm to accelerate the study of bona fide retroCNVs.</jats:p> </jats:sec> <jats:sec> <jats:title>Availability and implementation</jats:title> <jats:p>sideRETRO is available at https://github.com/galantelab/sideRETRO</jats:p> </jats:sec> <jats:sec> <jats:title>Supplementary information</jats:title> <jats:p>Supplementary data are available at Bioinformatics online.</jats:p> </jats:sec> :AUTHOR: Miller, Thiago L A and Orpinelli Rego, Fernanda and Buzzo, Jos\'{e} Leonel L and Galante, Pedro A F :DOI: 10.1093/bioinformatics/btaa689 :ISSUE: 3 :JOURNAL: Bioinformatics :LANGUAGE: en :MONTH: 4 :PAGES: 419--421 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/bioinformatics/btaa689 :VOLUME: 37 :YEAR: 2021 :END: * precisionFDA Truth Challenge V2: Calling variants from short- and long-reads in difficult-to-map regions :PROPERTIES: :TITLE: precisionFDA Truth Challenge V2: Calling variants from short- and long-reads in difficult-to-map regions :BTYPE: misc :CUSTOM_ID: Olson2020 :ABSTRACT: <jats:title>Summary</jats:title><jats:p>The precisionFDA Truth Challenge V2 aimed to assess the state-of-the-art of variant calling in difficult-to-map regions and the Major Histocompatibility Complex (MHC). 
Starting with FASTQ files, 20 challenge participants applied their variant calling pipelines and submitted 64 variant callsets for one or more sequencing technologies (\textasciitilde{}35X Illumina, \textasciitilde{}35X PacBio HiFi, and \textasciitilde{}50X Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with the new GIAB benchmark sets and genome stratifications. Challenge submissions included a number of innovative methods for all three technologies, with graph-based and machine-learning methods scoring best for short-read and long-read datasets, respectively. New methods out-performed the 2016 Truth Challenge winners, and new machine-learning approaches combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.</jats:p> :AUTHOR: Olson, Nathan D. and Wagner, Justin and McDaniel, Jennifer and Stephens, Sarah H. and Westreich, Samuel T. and Prasanna, Anish G. and Johanson, Elaine and Boja, Emily and Maier, Ezekiel J. and Serang, Omar and J\'{a}spez, David and Lorenzo-Salazar, Jos\'{e} M. and Mu\~{n}oz-Barrera, Adri\'{a}n and Rubio-Rodr\'{\i}guez, Luis A. 
and Flores, Carlos and Kyriakidis, Konstantinos and Malousi, Andigoni and Shafin, Kishwar and Pesout, Trevor and Jain, Miten and Paten, Benedict and Chang, Pi-Chuan and Kolesnikov, Alexey and Nattestad, Maria and Baid, Gunjan and Goel, Sidharth and Yang, Howard and Carroll, Andrew and Eveleigh, Robert and Bourgey, Mathieu and Bourque, Guillaume and Li, Gen and ChouXian, MA and Tang, LinQi and YuanPing, DU and Zhang, ShaoWei and Morata, Jordi and Tonda, Ra\'{u}l and Parra, Gen\'{\i}s and Trotta, Jean-R\'{e}mi and Brueffer, Christian and Demirkaya-Budak, Sinem and Kabakci-Zorlu, Duygu and Turgut, Deniz and Kalay, \"{O}zem and Budak, Gungor and Narc\i{}, K\"{u}bra and Arslan, Elif and Brown, Richard and Johnson, Ivan J and Dolgoborodov, Alexey and Semenyuk, Vladimir and Jain, Amit and Tetikol, H. Serhat and Jain, Varun and Ruehle, Mike and Lajoie, Bryan and Roddey, Cooper and Catreux, Severine and Mehio, Rami and Ahsan, Mian Umair and Liu, Qian and Wang, Kai and Sahraeian, Sayed Mohammad Ebrahim and Fang, Li Tai and Mohiyuddin, Marghoob and Hung, Calvin and Jain, Chirag and Feng, Hanying and Li, Zhipan and Chen, Luoqi and Sedlazeck, Fritz J. and Zook, Justin M. 
:DOI: 10.1101/2020.11.13.380741 :MONTH: 11 :PUBLISHER: Cold Spring Harbor Laboratory :URL: http://dx.doi.org/10.1101/2020.11.13.380741 :YEAR: 2020 :END: * Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls :PROPERTIES: :TITLE: Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls :BTYPE: article :CUSTOM_ID: Zook2014 :AUTHOR: Zook, Justin M and Chapman, Brad and Wang, Jason and Mittelman, David and Hofmann, Oliver and Hide, Winston and Salit, Marc :DOI: 10.1038/nbt.2835 :ISSUE: 3 :JOURNAL: Nature Biotechnology :LANGUAGE: en :MONTH: 3 :PAGES: 246--251 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/nbt.2835 :VOLUME: 32 :YEAR: 2014 :END: * A complete reference genome improves analysis of human genetic variation :PROPERTIES: :TITLE: A complete reference genome improves analysis of human genetic variation :BTYPE: article :CUSTOM_ID: Aganezov2022 :ABSTRACT: <jats:p>Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 million base pairs of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for clinical and functional study. We show how this reference universally improves read mapping and variant calling for 3202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of variants per sample in previously unresolved regions, showcasing the promise of the T2T-CHM13 reference for evolutionary and biomedical discovery. Simultaneously, this reference eliminates tens of thousands of spurious variants per sample, including reduction of false positives in 269 medically relevant genes by up to a factor of 12. 
Because of these improvements in variant discovery coupled with population and functional genomic resources, T2T-CHM13 is positioned to replace GRCh38 as the prevailing reference for human genetics.</jats:p> :AUTHOR: Aganezov, Sergey and Yan, Stephanie M. and Soto, Daniela C. and Kirsche, Melanie and Zarate, Samantha and Avdeyev, Pavel and Taylor, Dylan J. and Shafin, Kishwar and Shumate, Alaina and Xiao, Chunlin and Wagner, Justin and McDaniel, Jennifer and Olson, Nathan D. and Sauria, Michael E. G. and Vollger, Mitchell R. and Rhie, Arang and Meredith, Melissa and Martin, Skylar and Lee, Joyce and Koren, Sergey and Rosenfeld, Jeffrey A. and Paten, Benedict and Layer, Ryan and Chin, Chen-Shan and Sedlazeck, Fritz J. and Hansen, Nancy F. and Miller, Danny E. and Phillippy, Adam M. and Miga, Karen H. and McCoy, Rajiv C. and Dennis, Megan Y. and Zook, Justin M. and Schatz, Michael C. :DOI: 10.1126/science.abl3533 :ISSUE: 6588 :JOURNAL: Science :LANGUAGE: en :MONTH: 4 :PUBLISHER: American Association for the Advancement of Science (AAAS) :URL: http://dx.doi.org/10.1126/science.abl3533 :VOLUME: 376 :YEAR: 2022 :END: * Nix: A Safe and Policy-Free System for Software Deployment :PROPERTIES: :TITLE: Nix: A Safe and Policy-Free System for Software Deployment :BTYPE: inbook :CUSTOM_ID: Dolstra2004 :AUTHOR: Eelco Dolstra and Merijn de Jonge and Eelco Visser :URL: https://edolstra.github.io/pubs/nspfssd-lisa2004-final.pdf :YEAR: 2004 :END: * An Extensive Sequence Dataset of Gold-Standard Samples for Benchmarking and Development :PROPERTIES: :TITLE: An Extensive Sequence Dataset of Gold-Standard Samples for Benchmarking and Development :BTYPE: misc :CUSTOM_ID: Baid2020 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>Accurate standards and extensive development datasets are the foundation of technical progress. 
To facilitate benchmarking and development, we sequence 9 samples, covering the Genome in a Bottle truth sets on multiple instruments (NovaSeq, HiSeqX, HiSeq4000, PacBio Sequel II System) and sample preparations (PCR-Free, PCR-Positive) for both whole genome and multiple exome kits. We benchmark pipelines, quantifying strengths and limitations for sequencing and analysis methods. We identify variability within and between instruments, preparation methods, and analytical pipelines, across various sequencing depths. We discuss the relevance of this variability to downstream analyses, and strategies to reduce variability.</jats:p> :AUTHOR: Baid, Gunjan and Nattestad, Maria and Kolesnikov, Alexey and Goel, Sidharth and Yang, Howard and Chang, Pi-Chuan and Carroll, Andrew :DOI: 10.1101/2020.12.11.422022 :MONTH: 12 :PUBLISHER: Cold Spring Harbor Laboratory :URL: http://dx.doi.org/10.1101/2020.12.11.422022 :YEAR: 2020 :END: * Best practices for benchmarking germline small-variant calls in human genomes :PROPERTIES: :TITLE: Best practices for benchmarking germline small-variant calls in human genomes :BTYPE: article :CUSTOM_ID: Krusche2019 :AUTHOR: Krusche, Peter and Trigg, Len and Boutros, Paul C. and Mason, Christopher E. and De La Vega, Francisco M. and Moore, Benjamin L. and Gonzalez-Porta, Mar and Eberle, Michael A. and Tezak, Zivana and Lababidi, Samir and Truty, Rebecca and Asimenos, George and Funke, Birgit and Fleharty, Mark and Chapman, Brad A. and Salit, Marc and Zook, Justin M. 
:DOI: 10.1038/s41587-019-0054-x :ISSUE: 5 :JOURNAL: Nature Biotechnology :LANGUAGE: en :MONTH: 5 :PAGES: 555--560 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41587-019-0054-x :VOLUME: 37 :YEAR: 2019 :END: * ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data :PROPERTIES: :TITLE: ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data :BTYPE: article :CUSTOM_ID: Wang :ABSTRACT: Abstract. High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpo :AUTHOR: Wang, Kai and Li, Mingyao and Hakonarson, Hakon :DOI: 10.1093/nar/gkq603 :ISSN: 0305-1048 :ISSUE: 16 :JOURNAL: Nucleic Acids Research :PUBLISHER: Oxford Academic :URL: https://dx.doi.org/10.1093/nar/gkq603 :VOLUME: 38 :YEAR: 2010 :END: * Benchmarking variant callers in next-generation and third-generation sequencing analysis :PROPERTIES: :TITLE: Benchmarking variant callers in next-generation and third-generation sequencing analysis :BTYPE: article :CUSTOM_ID: Pei2021 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>DNA variants represent an important source of genetic variations among individuals. Next-generation sequencing (NGS) is the most popular technology for genome-wide variant calling. Third-generation sequencing (TGS) has also recently been used in genetic studies. Although many variant callers are available, no single caller can call both types of variants on NGS or TGS data with high sensitivity and specificity. In this study, we systematically evaluated 11 variant callers on 12 NGS and TGS datasets. For germline variant calling, we tested DNAseq and DNAscope modes from Sentieon, HaplotypeCaller mode from GATK and WGS mode from DeepVariant. All the four callers had comparable performance on NGS data and 30\texttimes{} coverage of WGS data was recommended. 
For germline variant calling on TGS data, we tested DNAseq mode from Sentieon, HaplotypeCaller mode from GATK and PACBIO mode from DeepVariant. All the three callers had similar performance in SNP calling, while DeepVariant outperformed the others in InDel calling. TGS detected more variants than NGS, particularly in complex and repetitive regions. For somatic variant calling on NGS, we tested TNscope and TNseq modes from Sentieon, MuTect2 mode from GATK, NeuSomatic, VarScan2, and Strelka2. TNscope and Mutect2 outperformed the other callers. A higher proportion of tumor sample purity (from 10 to 20\%) significantly increased the recall value of calling. Finally, computational costs of the callers were compared and Sentieon required the least computational cost. These results suggest that careful selection of a tool and parameters is needed for accurate SNP or InDel calling under different scenarios.</jats:p> :AUTHOR: Pei, Surui and Liu, Tao and Ren, Xue and Li, Weizhong and Chen, Chongjian and Xie, Zhi :DOI: 10.1093/bib/bbaa148 :ISSUE: 3 :JOURNAL: Briefings in Bioinformatics :LANGUAGE: en :MONTH: 5 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/bib/bbaa148 :VOLUME: 22 :YEAR: 2021 :END: * dbSNP: the NCBI database of genetic variation :PROPERTIES: :TITLE: dbSNP: the NCBI database of genetic variation :BTYPE: article :CUSTOM_ID: Sherry2001 :AUTHOR: Sherry, S. T. 
:DOI: 10.1093/nar/29.1.308 :ISSUE: 1 :JOURNAL: Nucleic Acids Research :MONTH: 1 :PAGES: 308--311 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/nar/29.1.308 :VOLUME: 29 :YEAR: 2001 :END: * GRIDSS2: comprehensive characterisation of somatic structural variation using single breakend variants and structural variant phasing :PROPERTIES: :TITLE: GRIDSS2: comprehensive characterisation of somatic structural variation using single breakend variants and structural variant phasing :BTYPE: article :CUSTOM_ID: Cameron2021 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>GRIDSS2 is the first structural variant caller to explicitly report single breakends\textemdash{}breakpoints in which only one side can be unambiguously determined. By treating single breakends as a fundamental genomic rearrangement signal on par with breakpoints, GRIDSS2 can explain 47\% of somatic centromere copy number changes using single breakends to non-centromere sequence. On a cohort of 3782 deeply sequenced metastatic cancers, GRIDSS2 achieves an unprecedented 3.1\% false negative rate and 3.3\% false discovery rate and identifies a novel 32\textendash{}100\hspace{0.167em}bp duplication signature. GRIDSS2 simplifies complex rearrangement interpretation through phasing of structural variants with 16\% of somatic calls phasable using paired-end sequencing.</jats:p> :AUTHOR: Cameron, Daniel L. and Baber, Jonathan and Shale, Charles and Valle-Inclan, Jose Espejo and Besselink, Nicolle and van Hoeck, Arne and Janssen, Roel and Cuppen, Edwin and Priestley, Peter and Papenfuss, Anthony T. 
:DOI: 10.1186/s13059-021-02423-x :ISSUE: 1 :JOURNAL: Genome Biology :LANGUAGE: en :MONTH: 12 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1186/s13059-021-02423-x :VOLUME: 22 :YEAR: 2021 :END: * Creation of an Expert Curated Variant List for Clinical Genomic Test Development and Validation :PROPERTIES: :TITLE: Creation of an Expert Curated Variant List for Clinical Genomic Test Development and Validation :BTYPE: article :CUSTOM_ID: Wilcox2021 :AUTHOR: Wilcox, Emma and Harrison, Steven M. and Lockhart, Edward and Voelkerding, Karl and Lubin, Ira M. and Rehm, Heidi L. and Kalman, Lisa V. and Funke, Birgit :DOI: 10.1016/j.jmoldx.2021.07.018 :ISSUE: 11 :JOURNAL: The Journal of Molecular Diagnostics :LANGUAGE: en :MONTH: 11 :PAGES: 1500--1505 :PUBLISHER: Elsevier BV :URL: http://dx.doi.org/10.1016/j.jmoldx.2021.07.018 :VOLUME: 23 :YEAR: 2021 :END: * DELLY: structural variant discovery by integrated paired-end and split-read analysis :PROPERTIES: :TITLE: DELLY: structural variant discovery by integrated paired-end and split-read analysis :BTYPE: article :CUSTOM_ID: Rausch2012 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:p>Motivation: The discovery of genomic structural variants (SVs) at high sensitivity and specificity is an essential requirement for characterizing naturally occurring variation and for understanding pathological somatic rearrangements in personal genome sequencing data. Of particular interest are integrated methods that accurately identify simple and complex rearrangements in heterogeneous sequencing datasets at single-nucleotide resolution, as an optimal basis for investigating the formation mechanisms and functional consequences of SVs.</jats:p> <jats:p>Results: We have developed an SV discovery method, called DELLY, that integrates short insert paired-ends, long-range mate-pairs and split-read alignments to accurately delineate genomic rearrangements at single-nucleotide resolution. 
DELLY is suitable for detecting copy-number variable deletion and tandem duplication events as well as balanced rearrangements such as inversions or reciprocal translocations. DELLY, thus, enables to ascertain the full spectrum of genomic rearrangements, including complex events. On simulated data, DELLY compares favorably to other SV prediction methods across a wide range of sequencing parameters. On real data, DELLY reliably uncovers SVs from the 1000 Genomes Project and cancer genomes, and validation experiments of randomly selected deletion loci show a high specificity.</jats:p> <jats:p>Availability: DELLY is available at www.korbel.embl.de/software.html</jats:p> <jats:p>Contact: tobias.rausch@embl.de</jats:p> :AUTHOR: Rausch, Tobias and Zichner, Thomas and Schlattl, Andreas and St\"{u}tz, Adrian M. and Benes, Vladimir and Korbel, Jan O. :DOI: 10.1093/bioinformatics/bts378 :ISSUE: 18 :JOURNAL: Bioinformatics :LANGUAGE: en :MONTH: 9 :PAGES: i333--i339 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/bioinformatics/bts378 :VOLUME: 28 :YEAR: 2012 :END: * Les ph\'{e}notypes \'{e}rythrocytaires rares : un enjeu de sant\'{e} publique :PROPERTIES: :TITLE: Les ph\'{e}notypes \'{e}rythrocytaires rares : un enjeu de sant\'{e} publique :BTYPE: article :CUSTOM_ID: Peyrard2008 :AUTHOR: Peyrard, T. and Pham, B.-N. and Le Pennec, P.-Y. and Rouger, P. :DOI: 10.1016/j.tracli.2008.02.001 :ISSUE: 3 :JOURNAL: Transfusion Clinique et Biologique :LANGUAGE: fr :MONTH: 6 :PAGES: 109--119 :PUBLISHER: Elsevier BV :URL: http://dx.doi.org/10.1016/j.tracli.2008.02.001 :VOLUME: 15 :YEAR: 2008 :END: * Predicting RNA splicing from DNA sequence using Pangolin :PROPERTIES: :TITLE: Predicting RNA splicing from DNA sequence using Pangolin :BTYPE: article :CUSTOM_ID: Zeng2022 :ABSTRACT: Recent progress in deep learning has greatly improved the prediction of RNA splicing from DNA sequence. 
Here, we present Pangolin, a deep learning model to predict splice site strength in multiple tissues. Pangolin outperforms state-of-the-art methods for predicting RNA splicing on a variety of prediction tasks. Pangolin improves prediction of the impact of genetic variants on RNA splicing, including common, rare, and lineage-specific genetic variation. In addition, Pangolin identifies loss-of-function mutations with high accuracy and recall, particularly for mutations that are not missense or nonsense, demonstrating remarkable potential for identifying pathogenic variants. :AUTHOR: Zeng, Tony and Li, Yang I :DATE: 2022-04-21 :DOI: 10.1186/s13059-022-02664-4 :ISSN: 1474-760X :ISSUE: 1 :JOURNAL: Genome Biology :KEYWORDS: Animal Genetics and Genomics :LANGUAGE: En :PUBLISHER: BioMed Central :URL: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-022-02664-4 :VOLUME: 23 :YEAR: 2022 :END: * From FastQ Data to High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline :PROPERTIES: :TITLE: From FastQ Data to High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline :BTYPE: article :CUSTOM_ID: VanDerAuwera2013 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>This unit describes how to use BWA and the Genome Analysis Toolkit (GATK) to map genome sequencing data to a reference and produce high-quality variant calls that can be used in downstream analyses. The complete workflow includes the core NGS data-processing steps that are necessary to make the raw data suitable for analysis by the GATK, as well as the key methods involved in variant discovery using the GATK. <jats:italic>Curr. Protoc. Bioinform</jats:italic>. 43:11.10.1-11.10.33. \textcopyright{} 2013 by John Wiley \& Sons, Inc.</jats:p> :AUTHOR: Van der Auwera, Geraldine A. and Carneiro, Mauricio O. 
and Hartl, Christopher and Poplin, Ryan and del Angel, Guillermo and Levy-Moonshine, Ami and Jordan, Tadeusz and Shakir, Khalid and Roazen, David and Thibault, Joel and Banks, Eric and Garimella, Kiran V. and Altshuler, David and Gabriel, Stacey and DePristo, Mark A. :DOI: 10.1002/0471250953.bi1110s43 :ISSUE: 1 :JOURNAL: Current Protocols in Bioinformatics :LANGUAGE: en :MONTH: 10 :PUBLISHER: Wiley :URL: http://dx.doi.org/10.1002/0471250953.bi1110s43 :VOLUME: 43 :YEAR: 2013 :END: * ClinVar: public archive of interpretations of clinically relevant variants :PROPERTIES: :TITLE: ClinVar: public archive of interpretations of clinically relevant variants :BTYPE: article :CUSTOM_ID: Landrum2016 :AUTHOR: Landrum, Melissa J. and Lee, Jennifer M. and Benson, Mark and Brown, Garth and Chao, Chen and Chitipiralla, Shanmuga and Gu, Baoshan and Hart, Jennifer and Hoffman, Douglas and Hoover, Jeffrey and Jang, Wonhee and Katz, Kenneth and Ovetsky, Michael and Riley, George and Sethi, Amanjeev and Tully, Ray and Villamarin-Salomon, Ricardo and Rubinstein, Wendy and Maglott, Donna R. 
:DOI: 10.1093/nar/gkv1222 :ISSUE: D1 :JOURNAL: Nucleic Acids Research :LANGUAGE: en :MONTH: 1 :PAGES: D862--D868 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/nar/gkv1222 :VOLUME: 44 :YEAR: 2016 :END: * Guide technique d'accr\'{e}ditation de la technologie de s\'{e}quen\c{c}age \`{a} haut d\'{e}bit :PROPERTIES: :TITLE: Guide technique d'accr\'{e}ditation de la technologie de s\'{e}quen\c{c}age \`{a} haut d\'{e}bit :BTYPE: article :CUSTOM_ID: CofracSHGTA16 :AUTHOR: COFRAC :URLDATE: 2024-01-13-19:59:59 :YEAR: 2019 :END: * Systematic benchmark of state-of-the-art variant calling pipelines identifies major factors affecting accuracy of coding sequence variant discovery :PROPERTIES: :TITLE: Systematic benchmark of state-of-the-art variant calling pipelines identifies major factors affecting accuracy of coding sequence variant discovery :BTYPE: article :CUSTOM_ID: Barbitoff2022 :ABSTRACT: <jats:title>Abstract</jats:title><jats:sec><jats:title>Background</jats:title><jats:p>Accurate variant detection in the coding regions of the human genome is a key requirement for molecular diagnostics of Mendelian disorders. Efficiency of variant discovery from next-generation sequencing (NGS) data depends on multiple factors, including reproducible coverage biases of NGS methods and the performance of read alignment and variant calling software.
Although variant caller benchmarks are published constantly, no previous publications have leveraged the full extent of available gold standard whole-genome (WGS) and whole-exome (WES) sequencing datasets.</jats:p></jats:sec><jats:sec><jats:title>Results</jats:title><jats:p>In this work, we systematically evaluated the performance of 4 popular short read aligners (Bowtie2, BWA, Isaac, and Novoalign) and 9 novel and well-established variant calling and filtering methods (Clair3, DeepVariant, Octopus, GATK, FreeBayes, and Strelka2) using a set of 14 \textquotedblleft{}gold standard\textquotedblright{} WES and WGS datasets available from Genome In A Bottle (GIAB) consortium. Additionally, we have indirectly evaluated each pipeline's performance using a set of 6 non-GIAB samples of African and Russian ethnicity. In our benchmark, Bowtie2 performed significantly worse than other aligners, suggesting it should not be used for medical variant calling. When other aligners were considered, the accuracy of variant discovery mostly depended on the variant caller and not the read aligner. Among the tested variant callers, DeepVariant consistently showed the best performance and the highest robustness. Other actively developed tools, such as Clair3, Octopus, and Strelka2, also performed well, although their efficiency had greater dependence on the quality and type of the input data. We have also compared the consistency of variant calls in GIAB and non-GIAB samples. With few important caveats, best-performing tools have shown little evidence of overfitting.</jats:p></jats:sec><jats:sec><jats:title>Conclusions</jats:title><jats:p>The results show surprisingly large differences in the performance of cutting-edge tools even in high confidence regions of the coding genome. This highlights the importance of regular benchmarking of quickly evolving tools and pipelines. 
We also discuss the need for a more diverse set of gold standard genomes that would include samples of African, Hispanic, or mixed ancestry. Additionally, there is also a need for better variant caller assessment in the repetitive regions of the coding genome.</jats:p></jats:sec> :AUTHOR: Barbitoff, Yury A. and Abasov, Ruslan and Tvorogova, Varvara E. and Glotov, Andrey S. and Predeus, Alexander V. :DOI: 10.1186/s12864-022-08365-3 :ISSUE: 1 :JOURNAL: BMC Genomics :LANGUAGE: en :MONTH: 12 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1186/s12864-022-08365-3 :VOLUME: 23 :YEAR: 2022 :END: * Performance assessment of variant calling pipelines using human whole exome sequencing and simulated data :PROPERTIES: :TITLE: Performance assessment of variant calling pipelines using human whole exome sequencing and simulated data :BTYPE: article :CUSTOM_ID: Kumaran2019 :AUTHOR: Kumaran, Manojkumar and Subramanian, Umadevi and Devarajan, Bharanidharan :DOI: 10.1186/s12859-019-2928-9 :ISSUE: 1 :JOURNAL: BMC Bioinformatics :LANGUAGE: en :MONTH: 12 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1186/s12859-019-2928-9 :VOLUME: 20 :YEAR: 2019 :END: * Benchmarking short sequence mapping tools :PROPERTIES: :TITLE: Benchmarking short sequence mapping tools :BTYPE: article :CUSTOM_ID: Hatem2013 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:sec> <jats:title>Background</jats:title> <jats:p>The development of next-generation sequencing instruments has led to the generation of millions of short sequences in a single run. The process of aligning these reads to a reference genome is time consuming and demands the development of fast and accurate alignment tools. However, the current proposed tools make different compromises between the accuracy and the speed of mapping. Moreover, many important aspects are overlooked while comparing the performance of a newly developed tool to the state of the art. 
Therefore, there is a need for an objective evaluation method that covers all the aspects. In this work, we introduce a benchmarking suite to extensively analyze sequencing tools with respect to various aspects and provide an objective comparison.</jats:p> </jats:sec> <jats:sec> <jats:title>Results</jats:title> <jats:p>We applied our benchmarking tests on 9 well known mapping tools, namely, Bowtie, Bowtie2, BWA, SOAP2, MAQ, RMAP, GSNAP, Novoalign, and mrsFAST (mrFAST) using synthetic data and real RNA-Seq data. MAQ and RMAP are based on building hash tables for the reads, whereas the remaining tools are based on indexing the reference genome. The benchmarking tests reveal the strengths and weaknesses of each tool. The results show that no single tool outperforms all others in all metrics. However, Bowtie maintained the best throughput for most of the tests while BWA performed better for longer read lengths. The benchmarking tests are not restricted to the mentioned tools and can be further applied to others.</jats:p> </jats:sec> <jats:sec> <jats:title>Conclusion</jats:title> <jats:p>The mapping process is still a hard problem that is affected by many factors. In this work, we provided a benchmarking suite that reveals and evaluates the different factors affecting the mapping process. Still, there is no tool that outperforms all of the others in all the tests. 
Therefore, the end user should clearly specify his needs in order to choose the tool that provides the best results.</jats:p> </jats:sec> :AUTHOR: Hatem, Ayat and Bozda\u{g}, Doruk and Toland, Amanda E and \c{C}ataly\"{u}rek, \"{U}mit V :DOI: 10.1186/1471-2105-14-184 :ISSUE: 1 :JOURNAL: BMC Bioinformatics :LANGUAGE: en :MONTH: 12 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1186/1471-2105-14-184 :VOLUME: 14 :YEAR: 2013 :END: * The use of next-generation sequencing for the determination of rare blood group genotypes :PROPERTIES: :TITLE: The use of next-generation sequencing for the determination of rare blood group genotypes :BTYPE: article :CUSTOM_ID: Jakobsen2019 :ABSTRACT: <jats:title>SUMMARY</jats:title><jats:sec><jats:title>Objectives</jats:title><jats:p>Next-generation sequencing (NGS) for the determination of rare blood group genotypes was tested in 72 individuals from different ethnicities.</jats:p></jats:sec><jats:sec><jats:title>Background</jats:title><jats:p>Traditional serological-based antigen detection methods, as well as genotyping based on specific single nucleotide polymorphisms (SNPs) or single nucleotide variants (SNVs), are limited to detecting only a limited number of known antigens or alleles. NGS methods do not have this limitation.</jats:p></jats:sec><jats:sec><jats:title>Methods</jats:title><jats:p>NGS using Ion torrent Personal Genome Machine (PGM) was performed with a customised Ampliseq panel targeting 15 different blood group systems on 72 blood donors of various ethnicities (Caucasian, Hispanic, Asian, Middle Eastern and Black).</jats:p></jats:sec><jats:sec><jats:title>Results</jats:title><jats:p>Blood group genotypes for 70 of 72 samples could be obtained for 15 blood group systems in one step using the NGS assay and, for common SNPs, are consistent with previously determined genotypes using commercial SNP assays. 
However, particularly for the Kidd, Duffy and Lutheran blood group systems, several SNVs were detected by the NGS assay that revealed additional coding information compared to other methods. Furthermore, the NGS assay allowed for the detection of genotypes related to VEL, Knops, Gerbich, Globoside, P1PK and Landsteiner-Wiener blood group systems.</jats:p></jats:sec><jats:sec><jats:title>Conclusions</jats:title><jats:p>The NGS assay enables a comprehensive genotype analysis of many blood group systems and is capable of detecting common and rare alleles, including alleles not currently detected by commercial assays.</jats:p></jats:sec> :AUTHOR: Jakobsen, M. A. and Dellgren, C. and Sheppard, C. and Yazer, M. and Sprog\o{}e, U. :DOI: 10.1111/tme.12496 :ISSUE: 3 :JOURNAL: Transfusion Medicine :LANGUAGE: en :MONTH: 6 :PAGES: 162--168 :PUBLISHER: Wiley :URL: http://dx.doi.org/10.1111/tme.12496 :VOLUME: 29 :YEAR: 2019 :END: * The complete sequence of a human genome :PROPERTIES: :TITLE: The complete sequence of a human genome :BTYPE: article :CUSTOM_ID: Nurk2022 :ABSTRACT: <jats:p>Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8\% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion\textendash{}base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. 
The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.</jats:p> :AUTHOR: Nurk, Sergey and Koren, Sergey and Rhie, Arang and Rautiainen, Mikko and Bzikadze, Andrey V. and Mikheenko, Alla and Vollger, Mitchell R. and Altemose, Nicolas and Uralsky, Lev and Gershman, Ariel and Aganezov, Sergey and Hoyt, Savannah J. and Diekhans, Mark and Logsdon, Glennis A. and Alonge, Michael and Antonarakis, Stylianos E. and Borchers, Matthew and Bouffard, Gerard G. and Brooks, Shelise Y. and Caldas, Gina V. and Chen, Nae-Chyun and Cheng, Haoyu and Chin, Chen-Shan and Chow, William and de Lima, Leonardo G. and Dishuck, Philip C. and Durbin, Richard and Dvorkina, Tatiana and Fiddes, Ian T. and Formenti, Giulio and Fulton, Robert S. and Fungtammasan, Arkarachai and Garrison, Erik and Grady, Patrick G. S. and Graves-Lindsay, Tina A. and Hall, Ira M. and Hansen, Nancy F. and Hartley, Gabrielle A. and Haukness, Marina and Howe, Kerstin and Hunkapiller, Michael W. and Jain, Chirag and Jain, Miten and Jarvis, Erich D. and Kerpedjiev, Peter and Kirsche, Melanie and Kolmogorov, Mikhail and Korlach, Jonas and Kremitzki, Milinn and Li, Heng and Maduro, Valerie V. and Marschall, Tobias and McCartney, Ann M. and McDaniel, Jennifer and Miller, Danny E. and Mullikin, James C. and Myers, Eugene W. and Olson, Nathan D. and Paten, Benedict and Peluso, Paul and Pevzner, Pavel A. and Porubsky, David and Potapova, Tamara and Rogaev, Evgeny I. and Rosenfeld, Jeffrey A. and Salzberg, Steven L. and Schneider, Valerie A. and Sedlazeck, Fritz J. and Shafin, Kishwar and Shew, Colin J. and Shumate, Alaina and Sims, Ying and Smit, Arian F. A. and Soto, Daniela C. and Sovi\'{c}, Ivan and Storer, Jessica M. and Streets, Aaron and Sullivan, Beth A. 
and Thibaud-Nissen, Fran\c{c}oise and Torrance, James and Wagner, Justin and Walenz, Brian P. and Wenger, Aaron and Wood, Jonathan M. D. and Xiao, Chunlin and Yan, Stephanie M. and Young, Alice C. and Zarate, Samantha and Surti, Urvashi and McCoy, Rajiv C. and Dennis, Megan Y. and Alexandrov, Ivan A. and Gerton, Jennifer L. and O'Neill, Rachel J. and Timp, Winston and Zook, Justin M. and Schatz, Michael C. and Eichler, Evan E. and Miga, Karen H. and Phillippy, Adam M. :DOI: 10.1126/science.abj6987 :ISSUE: 6588 :JOURNAL: Science :LANGUAGE: en :MONTH: 4 :PAGES: 44--53 :PUBLISHER: American Association for the Advancement of Science (AAAS) :URL: http://dx.doi.org/10.1126/science.abj6987 :VOLUME: 376 :YEAR: 2022 :END: * A variant by any name: quantifying annotation discordance across tools and clinical databases :PROPERTIES: :TITLE: A variant by any name: quantifying annotation discordance across tools and clinical databases :BTYPE: article :CUSTOM_ID: Yen2017 :ABSTRACT: Clinical genomic testing is dependent on the robust identification and reporting of variant-level information in relation to disease. With the shift to high-throughput sequencing, a major challenge for clinical diagnostics is the cross-identification of variants called on their genomic position to resources that rely on transcript- or protein-based descriptions. We evaluated the accuracy of three tools (SnpEff, Variant Effect Predictor, and Variation Reporter) that generate transcript and protein-based variant nomenclature from genomic coordinates according to guidelines by the Human Genome Variation Society (HGVS). Our evaluation was based on transcript-controlled comparisons to a manually curated set of 126 test variants of various types drawn from data sources, each with HGVS-compliant transcript and protein descriptors. 
We further evaluated the concordance between annotations generated by Snpeff and Variant Effect Predictor and those in major germline and cancer databases: ClinVar and COSMIC, respectively. We find that there is substantial discordance between the annotation tools and databases in the description of insertions and/or deletions. Using our ground truth set of variants, constructed specifically to identify challenging events, accuracy was between 80 and 90\% for coding and 50 and 70\% for protein changes for 114 to 126 variants. Exact concordance for SNV syntax was over 99.5\% between ClinVar and Variant Effect Predictor and SnpEff, but less than 90\% for non-SNV variants. For COSMIC, exact concordance for coding and protein SNVs was between 65 and 88\% and less than 15\% for insertions. Across the tools and datasets, there was a wide range of different but equivalent expressions describing protein variants. Our results reveal significant inconsistency in variant representation across tools and databases. While some of these syntax differences may be clear to a clinician, they can confound variant matching, an important step in variant classification. These results highlight the urgent need for the adoption and adherence to uniform standards in variant annotation, with consistent reporting on the genomic reference, to enable accurate and efficient data-driven clinical care. :AUTHOR: Yen, Jennifer L. and Garcia, Sarah and Montana, Aldrin and Harris, Jason and Chervitz, Stephen and Morra, Massimo and West, John and Chen, Richard and Church, Deanna M. 
:DATE: 2017-01-26 :DOI: 10.1186/s13073-016-0396-7 :ISSN: 1756-994X :ISSUE: 1 :JOURNAL: Genome Medicine :KEYWORDS: Human Genetics :LANGUAGE: En :PUBLISHER: BioMed Central :URL: https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-016-0396-7 :VOLUME: 9 :YEAR: 2017 :END: * Toward practical transparent verifiable and long-term reproducible research using Guix :PROPERTIES: :TITLE: Toward practical transparent verifiable and long-term reproducible research using Guix :BTYPE: article :CUSTOM_ID: Vallet2022 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>Reproducibility crisis urge scientists to promote transparency which allows peers to draw same conclusions after performing identical steps from hypothesis to results. Growing resources are developed to open the access to methods, data and source codes. Still, the computational environment, an interface between data and source code running analyses, is not addressed. Environments are usually described with software and library names associated with version labels or provided as an opaque container image. This is not enough to describe the complexity of the dependencies on which they rely to operate on. We describe this issue and illustrate how open tools like Guix can be used by any scientist to share their environment and allow peers to reproduce it. Some steps of research might not be fully reproducible, but at least, transparency for computation is technically addressable. 
These tools should be considered by scientists willing to promote transparency and open science.</jats:p> :AUTHOR: Vallet, Nicolas and Michonneau, David and Tournier, Simon :DOI: 10.1038/s41597-022-01720-9 :ISSUE: 1 :JOURNAL: Scientific Data :LANGUAGE: en :MONTH: 10 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1038/s41597-022-01720-9 :VOLUME: 9 :YEAR: 2022 :END: * Technology dictates algorithms: recent developments in read alignment :PROPERTIES: :TITLE: Technology dictates algorithms: recent developments in read alignment :BTYPE: article :CUSTOM_ID: Alser2021 :ABSTRACT: <jats:title>Abstract</jats:title><jats:p>Aligning sequencing reads onto a reference is an essential step of the majority of genomic analysis pipelines. Computational algorithms for read alignment have evolved in accordance with technological advances, leading to today's diverse array of alignment methods. We provide a systematic survey of algorithmic foundations and methodologies across 107 alignment methods, for both short and long reads. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on speed and efficiency of read alignment. We discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology.</jats:p> :AUTHOR: Alser, Mohammed and Rotman, Jeremy and Deshpande, Dhrithi and Taraszka, Kodi and Shi, Huwenbo and Baykal, Pelin Icer and Yang, Harry Taegyun and Xue, Victor and Knyazev, Sergey and Singer, Benjamin D. 
and Balliu, Brunilda and Koslicki, David and Skums, Pavel and Zelikovsky, Alex and Alkan, Can and Mutlu, Onur and Mangul, Serghei :DOI: 10.1186/s13059-021-02443-7 :ISSUE: 1 :JOURNAL: Genome Biology :LANGUAGE: en :MONTH: 12 :PUBLISHER: Springer Science and Business Media LLC :URL: http://dx.doi.org/10.1186/s13059-021-02443-7 :VOLUME: 22 :YEAR: 2021 :END: * Qualification des solutions bioinformatiques: note technique :PROPERTIES: :TITLE: Qualification des solutions bioinformatiques: note technique :BTYPE: article :CUSTOM_ID: ngsdiag2019 :AUTHOR: NGS-Diag :YEAR: 2019 :END: * The Sequence Alignment/Map format and SAMtools :PROPERTIES: :TITLE: The Sequence Alignment/Map format and SAMtools :BTYPE: article :CUSTOM_ID: Li2009 :ABSTRACT: <jats:title>Abstract</jats:title> <jats:p>Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released.
SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.</jats:p> <jats:p>Availability: http://samtools.sourceforge.net</jats:p> <jats:p>Contact: rd@sanger.ac.uk</jats:p> :AUTHOR: Li, Heng and Handsaker, Bob and Wysoker, Alec and Fennell, Tim and Ruan, Jue and Homer, Nils and Marth, Gabor and Abecasis, Goncalo and Durbin, Richard :DOI: 10.1093/bioinformatics/btp352 :ISSUE: 16 :JOURNAL: Bioinformatics :LANGUAGE: en :MONTH: 8 :PAGES: 2078--2079 :PUBLISHER: Oxford University Press (OUP) :URL: http://dx.doi.org/10.1093/bioinformatics/btp352 :VOLUME: 25 :YEAR: 2009 :END: * Surface model of the human red blood cell simulating changes in membrane curvature under strain :PROPERTIES: :TITLE: Surface model of the human red blood cell simulating changes in membrane curvature under strain :BTYPE: article :CUSTOM_ID: Kuchel :ABSTRACT: We present mathematical simulations of shapes of red blood cells (RBCs) and their cytoskeleton when they are subjected to linear strain. The cell surface is described by a previously reported quartic equation in three dimensional (3D) Cartesian space. Using recently available functions in Mathematica to triangularize the surfaces we computed four types of curvature of the membrane. We also mapped changes in mesh-triangle area and curvatures as the RBCs were distorted. The highly deformable red blood cell (erythrocyte; RBC) responds to mechanically imposed shape changes with enhanced glycolytic flux and cation transport. Such morphological changes are produced experimentally by suspending the cells in a gelatin gel, which is then elongated or compressed in a custom apparatus inside an NMR spectrometer.
A key observation is the extent to which the maximum and minimum Principal Curvatures are localized symmetrically in patches at the poles or equators and distributed in rings around the main axis of the strained RBC. Changes on the nanometre to micro-meter scale of curvature, suggest activation of only a subset of the intrinsic mechanosensitive cation channels, Piezo1, during experiments carried out with controlled distortions, which persist for many hours. This finding is relevant to a proposal for non-uniform distribution of Piezo1 molecules around the RBC membrane. However, if the curvature that gates Piezo1 is at a very fine length scale, then membrane tension will determine local curvature; so, curvatures as computed here (in contrast to much finer surface irregularities) may not influence Piezo1 activity. Nevertheless, our analytical methods can be extended to address these new mechanistic proposals. The geometrical reorganization of the simulated cytoskeleton informs ideas about the mechanism of concerted metabolic and cation-flux responses of the RBC to mechanically imposed shape changes. :AUTHOR: Kuchel, Philip W. and Cox, Charles D. and Daners, Daniel and Shishmarev, Dmitry and Galvosas, Petrik :DATE: 2021-07-01 :DOI: 10.1038/s41598-021-92699-7 :ISSN: 2045-2322 :ISSUE: 1 :JOURNAL: Scientific Reports :KEYWORDS: Biophysics :LANGUAGE: en :PUBLISHER: Nature Publishing Group :URL: https://www.nature.com/articles/s41598-021-92699-7 :VOLUME: 11 :YEAR: 2021 :END: