apraga/org - Change JQUKAFWIHMUVF73PZMBQLSGQYWEJWZ2VK55P7JXPNH45MKEFO5UAC

Bisonex + workout

Created by Alexis Praga on April 22, 2023

JQUKAFWIHMUVF73PZMBQLSGQYWEJWZ2VK55P7JXPNH45MKEFO5UAC

Dependencies

In channels

main

Change contents

Insertion in workout.org at line 6125 [3.1]

[2.41]

* RTO
- 28-15-13
* L-sit
- 0
* Muscle-up
- 2+1 - 5 neg
- 2+2 - 4 neg
- 2+1+1- 4 neg
* Extension:
-  3x22
* FL tucked row :
- 3+2 x3
* Pistols :
- 4x3
* Planche tucked push-up:
- 3+3+3+1
- 3+3+2+2
- 3+3+3+1
* Compression:
- 3x10
* Norwegian roll
- 3x4

Replacement in projects/bisonex.org at line 1 [4.35]

B:BD[4.35] → [5.1399:9591]

#+title: Bisonex
* Biblio :biblio:
** Workflow
Comparison of WDL, Cromwell, Nextflow
https://www.nature.com/articles/s41598-021-99288-8
Nextflow = good compromise?
Comparison of aligners and variant callers (2021)
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04144-1
** Pipeline steps
*** Variant calling: Haplotype caller
https://gatk.broadinstitute.org/hc/en-us/articles/360035531412
Defines the algorithm + figure
** VCF
*** GT genotype
encoded as allele values separated by either “/” or “|”. The allele values are 0 for the reference allele (what is in the reference sequence), 1 for the first allele listed in ALT, 2 for the second allele listed in ALT and so on. For diploid calls examples could be 0/1 or 1|0 etc. For haploid calls, e.g. on Y, male X, mitochondrion, only one allele value should be given. All samples must have GT call information; if a call cannot be made for a sample at a given locus, “.” must be specified for each missing allele in the GT field (for example ./. for a diploid). The meanings of the separators are:
    / : genotype unphased
    | : genotype phased
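The separator rules above can be sketched in a few lines of shell. This is only an illustration: describe_gt is a hypothetical helper name, and treating a value without any separator as haploid is an assumption drawn from the description above.

#+begin_src sh
# Classify a VCF GT value and list its allele indices.
# describe_gt is a hypothetical helper, not part of any VCF tool.
describe_gt() {
    gt=$1
    case $gt in
        *"|"*) kind=phased ;;    # "|" separates phased alleles
        */*)   kind=unphased ;;  # "/" separates unphased alleles
        *)     kind=haploid ;;   # no separator: a single allele value
    esac
    # Turn both separators into spaces to print the allele values.
    alleles=$(printf '%s' "$gt" | tr '/|' '  ')
    printf '%s: %s\n' "$kind" "$alleles"
}

describe_gt '0/1'   # unphased: 0 1
describe_gt '1|0'   # phased: 1 0
describe_gt './.'   # unphased: . .
describe_gt '1'     # haploid: 1
#+end_src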
** Validation
*** NA12878
**** KILL [[https://precision.fda.gov/challenges/truth/results][fdaPrecision challenge]]
Caution: genome data in hg19, so the comparison is not suitable ...
**** TODO Best practices for the analytical validation of clinical whole-genome sequencing intended for the diagnosis of germline disease
SCHEDULED:   <2023-04-06 Thu>
https://www.nature.com/articles/s41525-020-00154-9
General recommendations for genomes, without raw data
**** TODO [#A] Performance assessment of variant calling pipelines using human whole exome sequencing and simulated data
SCHEDULED: <2023-04-06 Thu>
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2928-9
1. variant calling only
2. NA12878 + simulated data
3. exome
4. evaluated via F-score
Code available! https://github.com/bharani-lab/WES-Benchmarking-Pipeline_Manoj/tree/master/Script
Result: BWA/Novoalign_DeepVariant
Aligners
- BWA-MEM 0.7.16
- Bowtie2 2.2.6
- Novoalign 3.08.02
- SOAP 2.21
- MOSAIK 2.2.3
Variant calling
- GATK HaplotypeCaller 4
- FreeBayes 1.1.0
- SAMtools mpileup 1.7
- DeepVariant r0.4
  SNV
| Exome | Pipeline |    TP |   FP |  FN | Sensitivity | Precision | F-Score |   FDR |
|     1 | BWA_GATK | 23689 | 1397 | 613 |       0.975 |     0.944 |   0.959 | 0.057 |
|     2 | BWA_GATK | 23946 |  865 | 356 |       0.985 |     0.965 |   0.975 | 0.036 |
indel
 |   TP | FP | FN | Sensitivity | Precision | F-Score |   FDR |   |
 | 1254 | 72 | 75 |       0.944 |     0.946 |   0.945 | 0.054 |   |
 | 1309 | 10 | 20 |       0.985 |     0.992 |   0.989 | 0.008 |   |
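The sensitivity, precision and F-score columns in the two tables above can be recomputed from the TP/FP/FN counts. A minimal sketch, assuming the usual definitions (sensitivity = TP/(TP+FN), precision = TP/(TP+FP), F-score = harmonic mean of the two); variant_metrics is a hypothetical helper name and is not taken from the paper's scripts.

#+begin_src sh
# Recompute sensitivity, precision and F-score from raw counts.
# Arguments: TP FP FN. variant_metrics is a hypothetical helper name.
variant_metrics() {
    awk -v tp="$1" -v fp="$2" -v fn="$3" 'BEGIN {
        sens = tp / (tp + fn)                  # fraction of true variants recovered
        prec = tp / (tp + fp)                  # fraction of calls that are correct
        f1 = 2 * prec * sens / (prec + sens)   # harmonic mean of the two
        printf "sensitivity=%.3f precision=%.3f f_score=%.3f\n", sens, prec, f1
    }'
}

variant_metrics 23689 1397 613   # SNV, exome 1
variant_metrics 1254 72 75       # indel, first row
#+end_src

With the counts of the first SNV row this prints 0.975, 0.944 and 0.959, which agrees with the table.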
Raw values:
https://static-content.springer.com/esm/art%3A10.1186%2Fs12859-019-2928-9/MediaObjects/12859_2019_2928_MOESM8_ESM.pdf
Other articles with the same exome comparison on NA12878
- Hwang et al., 2015 study
- Highnam et al., 2015
- Cornish and Guda, 2015
Variant Type
|                                           | SNVs & Indels | CNVs (>10Kb) | SVs | Mitochondrial variants | Pseudogenes | REs | Somatic/mosaic | Literature/Data | Source   |
| NA12878                                   |         100%a |          40% |   0 |                      0 |           0 |   0 |              0 | Zook et al18    | NIST     |
| Other NIST standard (e.g. AJ/Asian trios) |           71% |          40% | 50% |                      0 |           0 |   0 |              0 | Zook et al18    |          |
| Platinum Genomes                          |           29% |            0 |   0 |                      0 |           0 |   0 |              0 | Eberle et al8   | Platinum |
| Venter/HuRef                              |           14% |          40% |   0 |                      0 |           0 |   0 |              0 | Trost et al1    | HuRef    |
**** Systematic comparison of germline variant calling pipelines cross multiple next-generation sequencers
#+begin_src bibtex
@ARTICLE{Chen2019-fp,
  title     = "Systematic comparison of germline variant calling pipelines
               cross multiple next-generation sequencers",
  author    = "Chen, Jiayun and Li, Xingsong and Zhong, Hongbin and Meng,
               Yuhuan and Du, Hongli",
  abstract  = "The development and innovation of next generation sequencing
               (NGS) and the subsequent analysis tools have gain popularity in
               scientific researches and clinical diagnostic applications.
               Hence, a systematic comparison of the sequencing platforms and
               variant calling pipelines could provide significant guidance to
               NGS-based scientific and clinical genomics. In this study, we
               compared the performance, concordance and operating efficiency
               of 27 combinations of sequencing platforms and variant calling
               pipelines, testing three variant calling pipelines-Genome
               Analysis Tool Kit HaplotypeCaller, Strelka2 and
               Samtools-Varscan2 for nine data sets for the NA12878 genome
               sequenced by different platforms including BGISEQ500,
               MGISEQ2000, HiSeq4000, NovaSeq and HiSeq Xten. For the variants
               calling performance of 12 combinations in WES datasets, all
               combinations displayed good performance in calling SNPs, with
               their F-scores entirely higher than 0.96, and their performance
               in calling INDELs varies from 0.75 to 0.91. And all 15
               combinations in WGS datasets also manifested good performance,
               with F-scores in calling SNPs were entirely higher than 0.975
               and their performance in calling INDELs varies from 0.71 to
               0.93. All of these combinations manifested high concordance in
               variant identification, while the divergence of variants
               identification in WGS datasets were larger than that in WES
               datasets. We also down-sampled the original WES and WGS datasets
               at a series of gradient coverage across multiple platforms, then
               the variants calling period consumed by the three pipelines at
               each coverage were counted, respectively. For the GIAB datasets
               on both BGI and Illumina platforms, Strelka2 manifested its
               ultra-performance in detecting accuracy and processing
               efficiency compared with other two pipelines on each sequencing
               platform, which was recommended in the further promotion and
               application of next generation sequencing technology. The
               results of our researches will provide useful and comprehensive
               guidelines for personal or organizational researchers in
               reliable and consistent variants identification.",
  journal   = "Sci. Rep.",
  publisher = "Springer Science and Business Media LLC",
  volume    =  9,
  number    =  1,
  pages     = "9345",
  month     =  jun,
  year      =  2019,
  copyright = "https://creativecommons.org/licenses/by/4.0",
  language  = "en"
}
#+end_src
Comparison of different pipelines (2019)
https://www.nature.com/articles/s41598-019-45835-3
Combinations
- variant calling = GATK, Strelka2 and Samtools-Varscan2
- on NA12878
- sequenced on BGISEQ500, MGISEQ2000, HiSeq4000, NovaSeq and HiSeq Xten.
  Conclusion: Strelka2 is superior, but possible bias on NA12878?
Illumina > BGI for indels, probably because the reads are longer
#+begin_quote
 For WES datasets, the BGI platforms displayed the superior performance in SNPs
 calling while Illumina platforms manifested the better variants calling
 performance in INDELs calling, which could be explained by their divergence in
 sequencing s

[4.35]

[5.9591]

#+title: Bisonex
* Biblio :biblio:
** Workflow
Comparison of WDL, Cromwell, Nextflow
https://www.nature.com/articles/s41598-021-99288-8
Nextflow = good compromise?
Comparison of aligners and variant callers (2021)
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04144-1
** Pipeline steps
*** Variant calling: Haplotype caller
https://gatk.broadinstitute.org/hc/en-us/articles/360035531412
Defines the algorithm + figure
** VCF
*** GT genotype
encoded as allele values separated by either “/” or “|”. The allele values are 0 for the reference allele (what is in the reference sequence), 1 for the first allele listed in ALT, 2 for the second allele listed in ALT and so on. For diploid calls examples could be 0/1 or 1|0 etc. For haploid calls, e.g. on Y, male X, mitochondrion, only one allele value should be given. All samples must have GT call information; if a call cannot be made for a sample at a given locus, “.” must be specified for each missing allele in the GT field (for example ./. for a diploid). The meanings of the separators are:
    / : genotype unphased
    | : genotype phased
** Validation
*** NA12878
**** KILL [[https://precision.fda.gov/challenges/truth/results][fdaPrecision challenge]]
Caution: genome data in hg19, so the comparison is not suitable ...
**** TODO Best practices for the analytical validation of clinical whole-genome sequencing intended for the diagnosis of germline disease
https://www.nature.com/articles/s41525-020-00154-9
General recommendations for genomes, without raw data
**** TODO [#A] Performance assessment of variant calling pipelines using human whole exome sequencing and simulated data
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2928-9
1. variant calling only
2. NA12878 + simulated data
3. exome
4. evaluated via F-score
Code available! https://github.com/bharani-lab/WES-Benchmarking-Pipeline_Manoj/tree/master/Script
Result: BWA/Novoalign_DeepVariant
Aligners
- BWA-MEM 0.7.16
- Bowtie2 2.2.6
- Novoalign 3.08.02
- SOAP 2.21
- MOSAIK 2.2.3
Variant calling
- GATK HaplotypeCaller 4
- FreeBayes 1.1.0
- SAMtools mpileup 1.7
- DeepVariant r0.4
  SNV
| Exome | Pipeline |    TP |   FP |  FN | Sensitivity | Precision | F-Score |   FDR |
|     1 | BWA_GATK | 23689 | 1397 | 613 |       0.975 |     0.944 |   0.959 | 0.057 |
|     2 | BWA_GATK | 23946 |  865 | 356 |       0.985 |     0.965 |   0.975 | 0.036 |
indel
 |   TP | FP | FN | Sensitivity | Precision | F-Score |   FDR |   |
 | 1254 | 72 | 75 |       0.944 |     0.946 |   0.945 | 0.054 |   |
 | 1309 | 10 | 20 |       0.985 |     0.992 |   0.989 | 0.008 |   |
Raw values:
https://static-content.springer.com/esm/art%3A10.1186%2Fs12859-019-2928-9/MediaObjects/12859_2019_2928_MOESM8_ESM.pdf
Other articles with the same exome comparison on NA12878
- Hwang et al., 2015 study
- Highnam et al., 2015
- Cornish and Guda, 2015
Variant Type
|                                           | SNVs & Indels | CNVs (>10Kb) | SVs | Mitochondrial variants | Pseudogenes | REs | Somatic/mosaic | Literature/Data | Source   |
| NA12878                                   |         100%a |          40% |   0 |                      0 |           0 |   0 |              0 | Zook et al18    | NIST     |
| Other NIST standard (e.g. AJ/Asian trios) |           71% |          40% | 50% |                      0 |           0 |   0 |              0 | Zook et al18    |          |
| Platinum Genomes                          |           29% |            0 |   0 |                      0 |           0 |   0 |              0 | Eberle et al8   | Platinum |
| Venter/HuRef                              |           14% |          40% |   0 |                      0 |           0 |   0 |              0 | Trost et al1    | HuRef    |
**** Systematic comparison of germline variant calling pipelines cross multiple next-generation sequencers
#+begin_src bibtex
@ARTICLE{Chen2019-fp,
  title     = "Systematic comparison of germline variant calling pipelines
               cross multiple next-generation sequencers",
  author    = "Chen, Jiayun and Li, Xingsong and Zhong, Hongbin and Meng,
               Yuhuan and Du, Hongli",
  abstract  = "The development and innovation of next generation sequencing
               (NGS) and the subsequent analysis tools have gain popularity in
               scientific researches and clinical diagnostic applications.
               Hence, a systematic comparison of the sequencing platforms and
               variant calling pipelines could provide significant guidance to
               NGS-based scientific and clinical genomics. In this study, we
               compared the performance, concordance and operating efficiency
               of 27 combinations of sequencing platforms and variant calling
               pipelines, testing three variant calling pipelines-Genome
               Analysis Tool Kit HaplotypeCaller, Strelka2 and
               Samtools-Varscan2 for nine data sets for the NA12878 genome
               sequenced by different platforms including BGISEQ500,
               MGISEQ2000, HiSeq4000, NovaSeq and HiSeq Xten. For the variants
               calling performance of 12 combinations in WES datasets, all
               combinations displayed good performance in calling SNPs, with
               their F-scores entirely higher than 0.96, and their performance
               in calling INDELs varies from 0.75 to 0.91. And all 15
               combinations in WGS datasets also manifested good performance,
               with F-scores in calling SNPs were entirely higher than 0.975
               and their performance in calling INDELs varies from 0.71 to
               0.93. All of these combinations manifested high concordance in
               variant identification, while the divergence of variants
               identification in WGS datasets were larger than that in WES
               datasets. We also down-sampled the original WES and WGS datasets
               at a series of gradient coverage across multiple platforms, then
               the variants calling period consumed by the three pipelines at
               each coverage were counted, respectively. For the GIAB datasets
               on both BGI and Illumina platforms, Strelka2 manifested its
               ultra-performance in detecting accuracy and processing
               efficiency compared with other two pipelines on each sequencing
               platform, which was recommended in the further promotion and
               application of next generation sequencing technology. The
               results of our researches will provide useful and comprehensive
               guidelines for personal or organizational researchers in
               reliable and consistent variants identification.",
  journal   = "Sci. Rep.",
  publisher = "Springer Science and Business Media LLC",
  volume    =  9,
  number    =  1,
  pages     = "9345",
  month     =  jun,
  year      =  2019,
  copyright = "https://creativecommons.org/licenses/by/4.0",
  language  = "en"
}
#+end_src
Comparison of different pipelines (2019)
https://www.nature.com/articles/s41598-019-45835-3
Combinations
- variant calling = GATK, Strelka2 and Samtools-Varscan2
- on NA12878
- sequenced on BGISEQ500, MGISEQ2000, HiSeq4000, NovaSeq and HiSeq Xten.
  Conclusion: Strelka2 is superior, but possible bias on NA12878?
Illumina > BGI for indels, probably because the reads are longer
#+begin_quote
 For WES datasets, the BGI platforms displayed the superior performance in SNPs
 calling while Illumina platforms manifested the better variants calling
 performance in INDELs calling, which could be explained by their divergence in
 sequencing s

Replacement in projects/bisonex.org at line 8 [4.35]

B:BD[6.16393] → [2.325:7578]

∅:D[2.7578] → [7.8199:9138]

B:BD[7.8199] → [7.8199:9138]

gomp.so is provided by gcc, so the module must be loaded
 module load gcc@11.3.0/gcc-12.1.0
** KILL Use subworkflow
CLOSED: [2023-04-02 Sun 18:08]
Our version is more flexible
*** KILL Alignment
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
*** KILL Vep
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
vcf_annotate_ensemblvep
** TODO Annotation with nextflow :annotation:
*** KILL VEP : --gene-phenotype ?
CLOSED: [2023-04-18 mar. 18:32]
Discussed with Alexis: databases are not up to date
https://www.ensembl.org/info/genome/variation/phenotype/sources_phenotype_documentation.html
*** DONE VEP plugin
CLOSED: [2023-04-18 mar. 18:32]
Clone the git repository containing the plugins
Then use --dir_plugins
*** TODO Use Alexis's code
*** TODO New version with VEP
**** TODO Add spliceAI
VEP plugin
**** TODO Add pLI
VEP plugin
**** KILL Add LOEUF
CLOSED: [2023-04-19 mer. 16:32]
VEP plugin
**** TODO Spip with VCF export
BED does not seem to work well (a region must be defined)
VCF: too much information
Caution: several transcripts but identical results. We remove the duplicates
***** TODO interpretation + confidence interval
***** DONE Score
CLOSED: [2023-04-22 Sat 15:30]
**** TODO CADD: replace with VEP plugin
***** Test
#+begin_src
vep  -i test.vcf  -o lol.vcf --offline --dir  /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz  --dir_plugins ../VEP_plugins/ -v
#+end_src
Test
#+begin_src sh
vep --id "1  230710048 230710048 A/G 1"   --offline --dir  /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz  --hgvsg --plugin pLI --plugin LOEUF -o lol
#+end_src
CSQ=G|missense_variant|MODERATE|AGT|ENSG00000135744|Transcript|ENST00000366667|protein_coding|2/5||||843|776|259|M/T|aTg/aCg|||-1||HGNC|HGNC:333||Ensembl||A|A||1:g.230710048A>G|0.347|-0.277922|
Matches https://www.ensembl.org/Homo_sapiens/Tools/VEP/Results?tl=I7ZsIbrj14P6lD43-9115494
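The CSQ value is a pipe-delimited record, so individual fields can be picked by position. A minimal sketch, assuming the leading fields follow VEP's default order (Allele, Consequence, IMPACT, SYMBOL, Gene, Feature_type, Feature, ...); csq_field is a hypothetical helper name. The positions of the plugin fields at the end depend on which plugins are loaded, so the CSQ description in the VCF header should be checked before relying on an index.

#+begin_src sh
# Print the Nth pipe-delimited field of a VEP CSQ value.
# csq_field is a hypothetical helper, not part of VEP.
csq_field() {
    # $1: CSQ string (with or without the leading "CSQ="), $2: 1-based field index
    printf '%s\n' "${1#CSQ=}" | cut -d'|' -f"$2"
}

# First nine fields of the record shown above.
csq='CSQ=G|missense_variant|MODERATE|AGT|ENSG00000135744|Transcript|ENST00000366667|protein_coding|2/5'
csq_field "$csq" 2   # missense_variant
csq_field "$csq" 4   # AGT
csq_field "$csq" 7   # ENST00000366667
#+end_src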
***** TODO Use whole genome
***** TODO Rename the chromosomes beforehand ...
**** DONE clinvar
CLOSED: [2023-04-22 Sat 15:31]
**** STRT Replace R script with vep?
Example with --custom
https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html
**** TODO Filter after VEP
**** TODO Check HGVS results with mutalyzer
**** TODO filter_vep
**** TODO OMIM
**** TODO Grantham
**** TODO ACMG incidental
**** TODO Gnomad ?
*** HOLD Old version
**** TODO HGVS
**** TODO Filter after VEP
**** TODO OMIM
**** TODO clinvar
**** TODO ACMG incidental
**** TODO Grantham
**** KILL LRG
CLOSED: [2023-04-18 mar. 17:22] SCHEDULED: <2023-04-18 Tue>
Discussed with Alexis: no longer up to date
**** TODO Gnomad
** DONE Port Alexis's exact version to Helios
CLOSED: [2023-01-14 Sat 17:56]
"prod" branch
** STRT Test Alexis's version with Nix
*** DONE Add clinvar
CLOSED: [2022-11-13 Sun 19:37]
*** DONE Alignment
CLOSED: [2022-11-13 Sun 12:52]
*** DONE Haplotype caller
CLOSED: [2022-11-13 Sun 13:00]
*** TODO Filter
- [X] depth
- [ ] common snp not patho
Problem with the list of IDs
**** TODO variant annotation
Requires vep
*** TODO Variant calling
* Improvements :amelioration:
* Documentation :doc:
** DONE Installation procedure for nix + dependencies on the CHU VM
CLOSED: [2023-04-22 Sat 15:27] SCHEDULED: <2023-04-13 Thu>
* Manuscript :manuscript:
* Tests :tests:hg002:
** WAIT Non-regression: prod version
*** DONE ID common snp
CLOSED: [2022-11-19 Sat 21:36]
#+begin_src
$ wc -l ID_of_common_snp.txt
23194290 ID_of_common_snp.txt
$ wc -l /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
23194290 /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
#+end_src
*** DONE ID common snp not clinvar patho
CLOSED: [2022-12-11 Sun 20:11]
**** DONE Verifying the problem
CLOSED: [2022-12-11 Sun 16:30]
On J:
21155134 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref
"Non-regression" version
21155076 database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt
New version
23193391 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt
After removing duplicates
$ sort database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt | uniq > old.txt
$ wc -l old.txt
21107097 old.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt | uniq > new.txt
$ wc -l new.txt
21174578 new.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref | uniq > ref.txt
$ wc -l ref.txt
21107155 ref.txt
Looking at the difference
 comm -23 ref.txt old.txt
rs1052692
rs1057518973
rs1057518973
rs11074121
rs112848754
rs12573787
rs145033890
rs147889095
rs1553904159
rs1560294695
rs1560296615
rs1560310926
rs1560325547
rs1560342418
rs1560356225
rs1578287542
...
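The comparison above relies on comm -23, which prints the lines found only in the first of two sorted files. A minimal sketch on made-up IDs: the rs numbers and file names below are placeholders rather than project data, and only_in_first is a hypothetical helper name.

#+begin_src sh
# comm needs both inputs sorted the same way; LC_ALL=C keeps sort and comm consistent.
only_in_first() {
    # $1, $2: files with one ID per line (unsorted, duplicates allowed)
    tmp=$(mktemp -d)
    LC_ALL=C sort -u "$1" > "$tmp/a"    # sort -u is equivalent to sort | uniq
    LC_ALL=C sort -u "$2" > "$tmp/b"
    LC_ALL=C comm -23 "$tmp/a" "$tmp/b" # suppress columns 2 and 3: keep lines unique to the first file
    rm -r "$tmp"
}

dir=$(mktemp -d)
printf 'rs3\nrs1\nrs2\nrs1\n' > "$dir/ref_demo.txt"
printf 'rs2\nrs4\n' > "$dir/old_demo.txt"
only_in_first "$dir/ref_demo.txt" "$dir/old_demo.txt"   # prints rs1 then rs3
rm -r "$dir"
#+end_src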
Look up the first one
bcftools query -i 'ID="rs1052692"' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 1619351 C A,T
It is indeed pathogenic...
$ bcftools query -i 'POS=1619351' database/clinvar/clinvar.vcf.gz -f '%CHROM %POS %REF %ALT %INFO/CLNSIG\n'
19 1619351 C T Conflicting_interpretations_of_pathogenicity
Check all the others
$ comm -23 ref.txt old.txt > tocheck.txt
Generate the regions to check (chromosome number:position)
$ bcftools query -i 'ID=@tocheck.txt' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM\t%POS\n' > tocheck.pos
Generate the reverse mapping (chromosome number -> NC)
$ awk ' { t = $1; $1 = $2; $2 = t; print; } ' database/RefSeq/refseq_to_number_only_consensual.txt  > mapping.txt
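The awk one-liner above swaps the two columns of the mapping file. A small sketch on two illustrative rows; the real file's contents are not shown in these notes, so the input layout (RefSeq accession, then chromosome number) is an assumption, and swap_columns is a hypothetical helper name.

#+begin_src sh
# Swap the first two whitespace-separated columns of each line.
# Assigning to a field makes awk rebuild the line with single spaces.
swap_columns() {
    awk '{ t = $1; $1 = $2; $2 = t; print }'
}

printf 'NC_000001.11\t1\nNC_000019.10\t19\n' | swap_columns
# 1 NC_000001.11
# 19 NC_000019.10
#+end_src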
Remap clinvar
$ bcftools annotate --rename-chrs mapping.txt database/clinvar/clinvar.vcf.gz -o clinvar_remapped.vcf.gz
$ tabix clinvar_remapped.vcf.gz
Finally, look up the classification in clinvar
$ bcftools query -R tocheck.pos clinvar_remapped.vcf.gz -f '%CHROM %POS %INFO/CLNSIG\n'
$ bcftools query -R tocheck.pos database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %ID \n' | grep '^NC'
#+RESULTS:
**** DONE Understand why the new version gives a different result
CLOSED: [2022-12-11 Sun 20:11]
***** DONE Same dbSNP and clinvar versions?
CLOSED: [2022-12-10 Sat 23:02]
Clinvar differs!
  $ bcftools stats clinvar.gz
  clinvar (Alexis)
SN	0	number of samples:	0
SN	0	number of records:	1492828
SN	0	number of no-ALTs:	965
SN	0	number of SNPs:	1338007
SN	0	number of MNPs:	5562
SN	0	number of indels:	144580
SN	0	number of others:	3714
SN	0	number of multiallelic sites:	0
SN	0	number of multiallelic SNP sites:	0
clinvar (new)
SN	0	number of samples:	0
SN	0	number of records:	1493470
SN	0	number of no-ALTs:	965
SN	0	number of SNPs:	1338561
SN	0	number of MNPs:	5565
SN	0	number of indels:	144663
SN	0	number of others:	3716
SN	0	number of multiallelic sites:	0
SN	0	number of multiallelic SNP sites:	0
***** DONE Update clinvar and dbSNP to work from the same databases
CLOSED: [2022-12-11 Sun 12:10]
Problem persists
***** DONE Remove the conversion of the chromosome to int
CLOSED: [2022-12-10 Sat 19:29]
***** KILL Same NC?
CLOSED: [2022-12-10 Sat 19:29]
$  zgrep "contig=<ID=NC_\(.*\)" clinvar/GRCh38/clinvar.vcf.gz > contig.clinvar
$ diff contig.txt contig.clinvar
< ##contig=<ID=NC_012920.1>
***** DONE Test on chromosome 19: ok
CLOSED: [2022-12-11 Sun 13:53]
Prepare the data
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
bcftools filter -i 'CHROM="NC_000019.10"'

[6.16393]

[7.9138]

gomp.so is provided by gcc, so the module must be loaded
 module load gcc@11.3.0/gcc-12.1.0
** KILL Use subworkflow
CLOSED: [2023-04-02 Sun 18:08]
Our version is more flexible
*** KILL Alignment
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
*** KILL Vep
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
vcf_annotate_ensemblvep
** TODO Annotation with nextflow :annotation:
*** KILL VEP : --gene-phenotype ?
CLOSED: [2023-04-18 mar. 18:32]
Discussed with Alexis: databases are not up to date
https://www.ensembl.org/info/genome/variation/phenotype/sources_phenotype_documentation.html
*** DONE VEP plugin
CLOSED: [2023-04-18 mar. 18:32]
Clone the git repository containing the plugins
Then use --dir_plugins
*** TODO Use Alexis's code
*** TODO New version with VEP
**** TODO Add spliceAI
VEP plugin
**** TODO Add pLI
VEP plugin
**** KILL Add LOEUF
CLOSED: [2023-04-19 mer. 16:32]
VEP plugin
**** TODO Spip with VCF export
BED does not seem to work well (a region must be defined)
VCF: too much information
Caution: several transcripts but identical results. We remove the duplicates
***** TODO interpretation + confidence interval
***** DONE Score
CLOSED: [2023-04-22 Sat 15:30]
**** TODO CADD: replace with VEP plugin
***** Test
#+begin_src
vep  -i test.vcf  -o lol.vcf --offline --dir  /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz  --dir_plugins ../VEP_plugins/ -v
#+end_src
Test
#+begin_src sh
vep --id "1  230710048 230710048 A/G 1"   --offline --dir  /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz  --hgvsg --plugin pLI --plugin LOEUF -o lol
#+end_src
CSQ=G|missense_variant|MODERATE|AGT|ENSG00000135744|Transcript|ENST00000366667|protein_coding|2/5||||843|776|259|M/T|aTg/aCg|||-1||HGNC|HGNC:333||Ensembl||A|A||1:g.230710048A>G|0.347|-0.277922|
Matches https://www.ensembl.org/Homo_sapiens/Tools/VEP/Results?tl=I7ZsIbrj14P6lD43-9115494
***** TODO Use whole genome
***** TODO Rename the chromosomes beforehand ...
**** DONE clinvar
CLOSED: [2023-04-22 Sat 15:31]
**** STRT Replace R script with vep?
Example with --custom
https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html
**** TODO Filter after VEP
**** TODO Check HGVS results with mutalyzer
**** TODO filter_vep
**** TODO OMIM
**** TODO Grantham
**** TODO ACMG incidental
**** TODO Gnomad ?
*** HOLD Old version
**** TODO HGVS
**** TODO Filter after VEP
**** TODO OMIM
**** TODO clinvar
**** TODO ACMG incidental
**** TODO Grantham
**** KILL LRG
CLOSED: [2023-04-18 mar. 17:22] SCHEDULED: <2023-04-18 Tue>
Discussed with Alexis: no longer up to date
**** TODO Gnomad
** DONE Port Alexis's exact version to Helios
CLOSED: [2023-01-14 Sat 17:56]
"prod" branch
** STRT Test Alexis's version with Nix
*** DONE Add clinvar
CLOSED: [2022-11-13 Sun 19:37]
*** DONE Alignment
CLOSED: [2022-11-13 Sun 12:52]
*** DONE Haplotype caller
CLOSED: [2022-11-13 Sun 13:00]
*** TODO Filter
- [X] depth
- [ ] common snp not patho
Problem with the list of IDs
**** TODO variant annotation
Requires vep
*** TODO Variant calling
* Improvements :amelioration:
* Documentation :doc:
** DONE Installation procedure for nix + dependencies on the CHU VM
CLOSED: [2023-04-22 Sat 15:27] SCHEDULED: <2023-04-13 Thu>
* Manuscript :manuscript:
* Tests :tests:
** WAIT Non-regression: prod version
*** DONE ID common snp
CLOSED: [2022-11-19 Sat 21:36]
#+begin_src
$ wc -l ID_of_common_snp.txt
23194290 ID_of_common_snp.txt
$ wc -l /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
23194290 /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
#+end_src
*** DONE ID common snp not clinvar patho
CLOSED: [2022-12-11 Sun 20:11]
**** DONE Verifying the problem
CLOSED: [2022-12-11 Sun 16:30]
On J:
21155134 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref
"Non-regression" version
21155076 database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt
New version
23193391 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt
After removing duplicates
$ sort database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt | uniq > old.txt
$ wc -l old.txt
21107097 old.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt | uniq > new.txt
$ wc -l new.txt
21174578 new.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref | uniq > ref.txt
$ wc -l ref.txt
21107155 ref.txt
Looking at the difference
 comm -23 ref.txt old.txt
rs1052692
rs1057518973
rs1057518973
rs11074121
rs112848754
rs12573787
rs145033890
rs147889095
rs1553904159
rs1560294695
rs1560296615
rs1560310926
rs1560325547
rs1560342418
rs1560356225
rs1578287542
...
Look up the first one
bcftools query -i 'ID="rs1052692"' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 1619351 C A,T
It is indeed pathogenic...
$ bcftools query -i 'POS=1619351' database/clinvar/clinvar.vcf.gz -f '%CHROM %POS %REF %ALT %INFO/CLNSIG\n'
19 1619351 C T Conflicting_interpretations_of_pathogenicity
Check all the others
$ comm -23 ref.txt old.txt > tocheck.txt
Generate the regions to check (chromosome number:position)
$ bcftools query -i 'ID=@tocheck.txt' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM\t%POS\n' > tocheck.pos
Generate the reverse mapping (chromosome number -> NC)
$ awk ' { t = $1; $1 = $2; $2 = t; print; } ' database/RefSeq/refseq_to_number_only_consensual.txt  > mapping.txt
Remap clinvar
$ bcftools annotate --rename-chrs mapping.txt database/clinvar/clinvar.vcf.gz -o clinvar_remapped.vcf.gz
$ tabix clinvar_remapped.vcf.gz
Finally, look up the classification in clinvar
$ bcftools query -R tocheck.pos clinvar_remapped.vcf.gz -f '%CHROM %POS %INFO/CLNSIG\n'
$ bcftools query -R tocheck.pos database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %ID \n' | grep '^NC'
#+RESULTS:
**** DONE Understand why the new version gives a different result
CLOSED: [2022-12-11 Sun 20:11]
***** DONE Same dbSNP and clinvar versions?
CLOSED: [2022-12-10 Sat 23:02]
Clinvar differs!
  $ bcftools stats clinvar.gz
  clinvar (Alexis)
SN	0	number of samples:	0
SN	0	number of records:	1492828
SN	0	number of no-ALTs:	965
SN	0	number of SNPs:	1338007
SN	0	number of MNPs:	5562
SN	0	number of indels:	144580
SN	0	number of others:	3714
SN	0	number of multiallelic sites:	0
SN	0	number of multiallelic SNP sites:	0
clinvar (new)
SN	0	number of samples:	0
SN	0	number of records:	1493470
SN	0	number of no-ALTs:	965
SN	0	number of SNPs:	1338561
SN	0	number of MNPs:	5565
SN	0	number of indels:	144663
SN	0	number of others:	3716
SN	0	number of multiallelic sites:	0
SN	0	number of multiallelic SNP sites:	0
***** DONE Mettre à jour clinvar et dbnSNP pour travailler sur les mêm bases
CLOSED: [2022-12-11 Sun 12:10]
The problem persists
***** DONE Remove the conversion of the chromosome to int
CLOSED: [2022-12-10 Sat 19:29]
***** KILL Same NC?
CLOSED: [2022-12-10 Sat 19:29]
$  zgrep "contig=<ID=NC_\(.*\)" clinvar/GRCh38/clinvar.vcf.gz > contig.clinvar
$ diff contig.txt contig.clinvar
< ##contig=<ID=NC_012920.1>
***** DONE Test on chromosome 19: ok
CLOSED: [2022-12-11 Sun 13:53]
We prepare the data
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
bcftools filter -i 'CHROM="NC_000019.10"'

Replacement in projects/bisonex.org at line 38 [4.35]

B:BD[8.31275] → [8.31275:32308]

B:BD[8.32308] → [9.9931:15680]

36965        449       4069     0.9880       0.9015     0.9428
     None              37248          36972        461       4062     0.9877       0.9017     0.9427
| Type  | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score | TRUTH.TOTAL.TiTv_ratio | QUERY.TOTAL.TiTv_ratio | TRUTH.TOTAL.het_hom_ratio | QUERY.TOTAL.het_hom_ratio |
| INDEL | ALL    |        2909 |     2477 |      432 |        3229 |      207 |       519 |    52 |    50 |      0.851495 |         0.923616 |       0.160731 |        0.886091 |                        |                        |        1.4964850615114236 |        1.8339222614840989 |
| INDEL | PASS   |        2909 |     2477 |      432 |        3229 |      207 |       519 |    52 |    50 |      0.851495 |         0.923616 |       0.160731 |        0.886091 |                        |                        |        1.4964850615114236 |        1.8339222614840989 |
| SNP   | ALL    |       38406 |    34793 |     3613 |       36935 |      275 |      1868 |    37 |    15 |      0.905926 |         0.992158 |       0.050575 |        0.947083 |     2.6247759222568168 |     2.5752854654538417 |         1.588953331534934 |        1.6192536889897844 |
| SNP   | PASS   |       38406 |    34793 |     3613 |       36935 |      275 |      1868 |    37 |    15 |      0.905926 |         0.992158 |       0.050575 |        0.947083 |     2.6247759222568168 |     2.5752854654538417 |         1.588953331534934 |        1.6192536889897844 |
**** DONE HG003 :hg003:
CLOSED: [2023-04-16 Sun 00:20]
#+begin_src sh
NXF_OPTS=-D"user.name=${USER}" nextflow run main.nf -profile standard,helios  --input /Work/Groups/bisonex/data/giab/GRCh38/HG003_{1,2}.fq.gz -bg
#+end_src
#+begin_src  sh
NXF_OPTS=-D"user.name=${USER}" nextflow run workflows/compareVCF.nf -profile standard,helios -resume --outdir=compareHG003  --test.id=HG003 --test.query=out/HG003_1/variantCalling/haplotypecaller/HG003_1.vcf.gz  --test.compare=vcfeval,happy --test.capture=data/AgilentSureSelectv05_hg38.bed
#+end_src
vcfeval
Threshold  True-pos-baseline  True-pos-call  False-pos  False-neg  Precision  Sensitivity  F-measure
----------------------------------------------------------------------------------------------------
    5.000              36745          36473        486       3988     0.9869       0.9021     0.9426
     None              36748          36476        495       3985     0.9866       0.9022     0.9425
$ zcat NA12878.snp_roc.tsv.gz  | tail -n 1 | awk '{print $7, $6}'
happy
Type Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  FP.al  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
INDEL    ALL         2731      2290       441         3092       208        577     62     53       0.838521          0.917296        0.186611         0.876141                     NaN                     NaN                   1.505145                   1.888993
INDEL   PASS         2731      2290       441         3092       208        577     62     53       0.838521          0.917296        0.186611         0.876141                     NaN                     NaN                   1.505145                   1.888993
  SNP    ALL        37997     34481      3516        36861       306       2074     33     13       0.907466          0.991204        0.056265         0.947488                2.611269                2.565915                   1.555780                   1.621727
  SNP   PASS        37997     34481      3516        36861       306       2074     33     13       0.907466          0.991204        0.056265         0.947488                2.611269                2.5659
**** DONE HG004
CLOSED: [2023-04-16 Sun 00:20]
#+begin_src sh
NXF_OPTS=-D"user.name=${USER}" nextflow run main.nf -profile standard,helios  --input /Work/Groups/bisonex/data/giab/GRCh38/HG004_{1,2}.fq.gz -bg
#+end_src
vcfeval
Threshold  True-pos-baseline  True-pos-call  False-pos  False-neg  Precision  Sensitivity  F-measure
----------------------------------------------------------------------------------------------------
    6.000              36938          36678        421       4040     0.9887       0.9014     0.9430
     None              36942          36682        432       4036     0.9884       0.9015     0.9429
happy
 Type Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  FP.al  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
INDEL    ALL         2787      2388       399         3183       195        580     53     38       0.856835          0.925086        0.182218         0.889654                     NaN                     NaN                   1.507834                   1.848649
INDEL   PASS         2787      2388       399         3183       195        580     53     38       0.856835          0.925086        0.182218         0.889654                     NaN                     NaN                   1.507834                   1.848649
  SNP    ALL        38185     34560      3625        36921       254       2107     46      7       0.905067          0.992704        0.057068         0.946862                2.589175                2.553546                   1.632595                   1.653534
  SNP   PASS        38185     34560      3625        36921       254       2107     46      7       0.905067          0.992704        0.057068         0.946862                2.589175                2.553546                   1.632595                   1.653534
**** DONE Summarize results for Paul + article :resultats:
CLOSED: [2023-04-06 Thu 21:41] SCHEDULED: <2023-04-02 Sun>
**** DONE Plot : ashkenazim trio
CLOSED: [2023-04-18 Tue 21:27] SCHEDULED: <2023-04-16 Sun>
/Entered on/ [2023-04-16 Sun 17:29]
*** TODO Platinum genome
https://emea.illumina.com/platinumgenomes.html
*** TODO Sequence NA12878
Discussion with Paul: the subcontractor will not give us the data, so we need to order the DNA
** TODO Fastq with all Centogene variants
*** TODO Extract the list of variants
SCHEDULED: <2023-04-17 Mon>
*** TODO Generate fastq
SCHEDULED: <2023-04-16 Sun>
*** TODO Check that we recover all of them
SCHEDULED: <2023-04-17 Mon>
** Miscellaneous
*** DONE Check read counts between fastq and bam
CLOSED: [2022-10-09 Sun 22:31]
* DONE Plot : ashkenazim trio
CLOSED: [2023-04-18 Tue 21:28] SCHEDULED: <2023-04-16 Sun>
/Entered on/ [2023-04-16 Sun 17:29]

[8.31275]

36965        449       4069     0.9880       0.9015     0.9428
     None              37248          36972        461       4062     0.9877       0.9017     0.9427
| Type  | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score | TRUTH.TOTAL.TiTv_ratio | QUERY.TOTAL.TiTv_ratio | TRUTH.TOTAL.het_hom_ratio | QUERY.TOTAL.het_hom_ratio |
| INDEL | ALL    |        2909 |     2477 |      432 |        3229 |      207 |       519 |    52 |    50 |      0.851495 |         0.923616 |       0.160731 |        0.886091 |                        |                        |        1.4964850615114236 |        1.8339222614840989 |
| INDEL | PASS   |        2909 |     2477 |      432 |        3229 |      207 |       519 |    52 |    50 |      0.851495 |         0.923616 |       0.160731 |        0.886091 |                        |                        |        1.4964850615114236 |        1.8339222614840989 |
| SNP   | ALL    |       38406 |    34793 |     3613 |       36935 |      275 |      1868 |    37 |    15 |      0.905926 |         0.992158 |       0.050575 |        0.947083 |     2.6247759222568168 |     2.5752854654538417 |         1.588953331534934 |        1.6192536889897844 |
| SNP   | PASS   |       38406 |    34793 |     3613 |       36935 |      275 |      1868 |    37 |    15 |      0.905926 |         0.992158 |       0.050575 |        0.947083 |     2.6247759222568168 |     2.5752854654538417 |         1.588953331534934 |        1.6192536889897844 |
**** DONE HG003 :hg003:
CLOSED: [2023-04-16 Sun 00:20]
#+begin_src sh
NXF_OPTS=-D"user.name=${USER}" nextflow run main.nf -profile standard,helios  --input /Work/Groups/bisonex/data/giab/GRCh38/HG003_{1,2}.fq.gz -bg
#+end_src
#+begin_src  sh
NXF_OPTS=-D"user.name=${USER}" nextflow run workflows/compareVCF.nf -profile standard,helios -resume --outdir=compareHG003  --test.id=HG003 --test.query=out/HG003_1/variantCalling/haplotypecaller/HG003_1.vcf.gz  --test.compare=vcfeval,happy --test.capture=data/AgilentSureSelectv05_hg38.bed
#+end_src
vcfeval
Threshold  True-pos-baseline  True-pos-call  False-pos  False-neg  Precision  Sensitivity  F-measure
----------------------------------------------------------------------------------------------------
    5.000              36745          36473        486       3988     0.9869       0.9021     0.9426
     None              36748          36476        495       3985     0.9866       0.9022     0.9425
$ zcat NA12878.snp_roc.tsv.gz  | tail -n 1 | awk '{print $7, $6}'
happy
Type Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  FP.al  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
INDEL    ALL         2731      2290       441         3092       208        577     62     53       0.838521          0.917296        0.186611         0.876141                     NaN                     NaN                   1.505145                   1.888993
INDEL   PASS         2731      2290       441         3092       208        577     62     53       0.838521          0.917296        0.186611         0.876141                     NaN                     NaN                   1.505145                   1.888993
  SNP    ALL        37997     34481      3516        36861       306       2074     33     13       0.907466          0.991204        0.056265         0.947488                2.611269                2.565915                   1.555780                   1.621727
  SNP   PASS        37997     34481      3516        36861       306       2074     33     13       0.907466          0.991204        0.056265         0.947488                2.611269                2.5659
**** DONE HG004
CLOSED: [2023-04-16 Sun 00:20]
#+begin_src sh
NXF_OPTS=-D"user.name=${USER}" nextflow run main.nf -profile standard,helios  --input /Work/Groups/bisonex/data/giab/GRCh38/HG004_{1,2}.fq.gz -bg
#+end_src
vcfeval
Threshold  True-pos-baseline  True-pos-call  False-pos  False-neg  Precision  Sensitivity  F-measure
----------------------------------------------------------------------------------------------------
    6.000              36938          36678        421       4040     0.9887       0.9014     0.9430
     None              36942          36682        432       4036     0.9884       0.9015     0.9429
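The vcfeval columns can be cross-checked from the raw counts. This is a minimal sketch, assuming precision is computed on the call side (True-pos-call) and sensitivity on the baseline side (True-pos-baseline), which reproduces the rows above; the helper name =vcfeval_metrics= is hypothetical.

```shell
# Hypothetical helper: recompute vcfeval summary metrics from raw counts.
# Arguments: True-pos-baseline True-pos-call False-pos False-neg
vcfeval_metrics() {
    awk -v tpb="$1" -v tpc="$2" -v fp="$3" -v fn="$4" 'BEGIN {
        precision   = tpc / (tpc + fp)   # call-side true positives
        sensitivity = tpb / (tpb + fn)   # baseline-side true positives
        f = 2 * precision * sensitivity / (precision + sensitivity)
        printf "precision=%.4f sensitivity=%.4f f_measure=%.4f\n", precision, sensitivity, f
    }'
}
```

For the unfiltered HG004 row, =vcfeval_metrics 36942 36682 432 4036= prints =precision=0.9884 sensitivity=0.9015 f_measure=0.9429=.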
happy
 Type Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  FP.al  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
INDEL    ALL         2787      2388       399         3183       195        580     53     38       0.856835          0.925086        0.182218         0.889654                     NaN                     NaN                   1.507834                   1.848649
INDEL   PASS         2787      2388       399         3183       195        580     53     38       0.856835          0.925086        0.182218         0.889654                     NaN                     NaN                   1.507834                   1.848649
  SNP    ALL        38185     34560      3625        36921       254       2107     46      7       0.905067          0.992704        0.057068         0.946862                2.589175                2.553546                   1.632595                   1.653534
  SNP   PASS        38185     34560      3625        36921       254       2107     46      7       0.905067          0.992704        0.057068         0.946862                2.589175                2.553546                   1.632595                   1.653534
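The hap.py metrics can be recomputed the same way. This is a minimal sketch, assuming the query-side true positives can be recovered as QUERY.TOTAL - QUERY.FP - QUERY.UNK; that matches the rows above but is not guaranteed in general. The helper name =happy_metrics= is hypothetical.

```shell
# Hypothetical helper: recompute hap.py summary metrics from raw counts.
# Arguments: TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK
happy_metrics() {
    awk -v tp="$1" -v fn="$2" -v qtotal="$3" -v fp="$4" -v unk="$5" 'BEGIN {
        recall    = tp / (tp + fn)
        qtp       = qtotal - fp - unk        # assumed query-side true positives
        precision = qtp / (qtp + fp)
        f1        = 2 * precision * recall / (precision + recall)
        printf "recall=%.6f precision=%.6f frac_na=%.6f f1=%.6f\n", recall, precision, unk / qtotal, f1
    }'
}
```

For the HG004 SNP row, =happy_metrics 34560 3625 36921 254 2107= prints =recall=0.905067 precision=0.992704 frac_na=0.057068 f1=0.946862=.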
**** DONE Summarize results for Paul + article :resultats:
CLOSED: [2023-04-06 Thu 21:41] SCHEDULED: <2023-04-02 Sun>
**** DONE Plot : ashkenazim trio
CLOSED: [2023-04-18 Tue 21:27] SCHEDULED: <2023-04-16 Sun>
/Entered on/ [2023-04-16 Sun 17:29]
*** TODO Platinum genome
https://emea.illumina.com/platinumgenomes.html
*** TODO Sequence NA12878
Discussion with Paul: the subcontractor will not give us the data, so we need to order the DNA
** TODO Fastq with all Centogene variants :centogene:
*** DONE Extract the list of SNVs
CLOSED: [2023-04-22 Sat 17:32] SCHEDULED: <2023-04-17 Mon>
**** DONE Fix the missing ones by hand
CLOSED: [2023-04-22 Sat 17:31]
**** DONE Automatic
CLOSED: [2023-04-22 Sat 17:31]
*** TODO Convert SNVs: transcript -> genomic with variant_recoder
SCHEDULED: <2023-04-22 Sat>
*** TODO Extract the list of CNVs
SCHEDULED: <2023-04-17 Mon>
*** TODO Generate fastq with simuscop
SCHEDULED: <2023-04-22 Sat>
**** TODO Generate a profile
SCHEDULED: <2023-04-22 Sat>
NA12878, but to be redone with real sequencing
**** TODO Generate the data
SCHEDULED: <2023-04-22 Sat>
Which capture???
*** TODO Check that we recover all of them
SCHEDULED: <2023-04-22 Sat>
** Miscellaneous
*** DONE Check read counts between fastq and bam
CLOSED: [2022-10-09 Sun 22:31]
* DONE Plot : ashkenazim trio
CLOSED: [2023-04-18 Tue 21:28] SCHEDULED: <2023-04-16 Sun>
/Entered on/ [2023-04-16 Sun 17:29]