apraga/org - Change F4OH5H3ONZKUVBUI5NKYJ25B66FS22QQ7LRAO53OQX3ZBL2BL6JAC

Cleaning up tasks

Created by Alexis Praga on October 7, 2023

F4OH5H3ONZKUVBUI5NKYJ25B66FS22QQ7LRAO53OQX3ZBL2BL6JAC

Dependencies

In channels

main

Change contents

Replacement in projects.org at line 84 [7.123895]
B:BD[5.189] → [6.509:529]
```
** Examen bactério
```
[5.189]
[6.529]
```
** TODO Examen bactério :partiel:
```

Replacement in projects.org at line 88 [7.123895]

B:BD[6.568] → [8.15:99]

∅:D[8.99] → [9.41:113]

B:BD[6.624] → [9.41:113]

B:BD[9.113] → [8.100:180]

∅:D[8.180] → [10.110:130]

B:BD[10.110] → [10.110:130]

B:BD[10.130] → [8.181:3824]

B:BD[8.3824] → [4.15:70]

∅:D[4.70] → [8.3879:4051]

B:BD[8.3879] → [8.3879:4051]

B:BD[8.4051] → [4.71:3587]

*** DONE Passe 1 [37/37]
CLOSED: [2023-10-06 Fri 23:27] SCHEDULED: <2023-10-01 Sun>
- [X] https://scut.srht.site/notes/medecine/20230528235124-culture.html
- [X] <[Syphilis]> - ~/annex/public/lessons/microbiologie/bacterio/Syphilis.pdf
- [X] A. agalactiae
- [X] <[ATBg staphylocoques]> - "~/annex/public/lessons/microbiologie/bacterio/Antibiogramme - Staphylocoques.pdf"
- [X] <[ATBg streptococque, entérocoque, Listeria]> - "~/annex/public/lessons/microbiologie/bacterio/Antibiogramme - Streptococoque, enterocoque, Listeria.pdf"
- [X] <[Chlamydia, mycoplasme]> - "~/annex/public/lessons/microbiologie/bacterio/Chlamydia - mycoplasmes.pdf"
- [X] <[Moléculaire (bactério)]> - "~/annex/public/lessons/microbiologie/bacterio/Diagnostic moleculaire - Bacteriologie.pdf"
- [X] <[EBCU etiologies]> - "~/annex/public/lessons/microbiologie/bacterio/EBCU etiologies.pdf"
- [X] <[ECBU interpretation]> - "~/annex/public/lessons/microbiologie/bacterio/ECBU interpretation.pdf"
- [X] <[ECBU pre-analytique]> - "~/annex/public/lessons/microbiologie/bacterio/ECBU pre-analytique.pdf"
- [X] <[EEQ, CIQ]> - "~/annex/public/lessons/microbiologie/bacterio/EEQ, CIQ.pdf"
- [X] <[Examen microscopique]> - "~/annex/public/lessons/microbiologie/bacterio/Examen microscopique.pdf"
- [X] <[Gonocoque]> - "~/annex/public/lessons/microbiologie/bacterio/Gonocoque.pdf"
- [X] <[Hygiène]> - "~/annex/public/lessons/microbiologie/bacterio/Hygiène.pdf"
- [X] <[Infections cutanees]> - "~/annex/public/lessons/microbiologie/bacterio/Infections cutanees.pdf"
- [X] <[MALDI - TOF]> - "~/annex/public/lessons/microbiologie/bacterio/MALDI - TOF.pdf"
- [X] <[Pre-analytique bacteriologie]> - "~/annex/public/lessons/microbiologie/bacterio/Pre-analytique bacteriologie.pdf"
- [X] <[Qualite]> - "~/annex/public/lessons/microbiologie/bacterio/Qualite.pdf"
- [X] <[Securite Transfusionnelle]> - "~/annex/public/lessons/microbiologie/bacterio/Securite Transfusionnelle.pdf"
- [X] <[Serologie bacterienne]> - "~/annex/public/lessons/microbiologie/bacterio/Serologie bacterienne.pdf"
- [X] <[Tests rapides antigeniques et moleculaires]> - "~/annex/public/lessons/microbiologie/bacterio/Tests rapides antigeniques et moleculaires.pdf"
- [X] <[Tuberculose]> - "~/annex/public/lessons/microbiologie/bacterio/Tuberculose.pdf"
- [X] <[Typage moleculaire bacterien]> - "~/annex/public/lessons/microbiologie/bacterio/Typage moleculaire bacterien.pdf"
- [X] <[Vaccination personnel]> - "~/annex/public/lessons/microbiologie/bacterio/Vaccination personnel.pdf"
- [X] <[Angines bacteriennes]> - "~/annex/public/lessons/microbiologie/bacterio/Angines bacteriennes.pdf"
- [X] <[Antibiogramme - Enterobacteries]> - "~/annex/public/lessons/microbiologie/bacterio/Antibiogramme - Enterobacteries.pdf"
- [X] <[Antibiogramme]> - "~/annex/public/lessons/microbiologie/bacterio/Antibiogramme.pdf"
- [X] <[Cambylobacter]> - "~/annex/public/lessons/microbiologie/bacterio/Cambylobacter.pdf"
- [X] <[Clostridium difficile]> - "~/annex/public/lessons/microbiologie/bacterio/Clostridium difficile.pdf"
- [X] <[Concentrations critiques]> - "~/annex/public/lessons/microbiologie/bacterio/Concentrations critiques.pdf"
- [X] <[Conseil anti-infectieux]> - "~/annex/public/lessons/microbiologie/bacterio/Conseil anti-infectieux.pdf"
- [X] <[Declaration obligatoire]> - "~/annex/public/lessons/microbiologie/bacterio/Declaration obligatoire.pdf"
- [X] <[Hemocultures 1]> - "~/annex/public/lessons/microbiologie/bacterio/Hemocultures 1.pdf"
- [X] <[Hemocultures 2]> - "~/annex/public/lessons/microbiologie/bacterio/Hemocultures 2.pdf"
- [X] <[Legionelle]> - "~/annex/public/lessons/microbiologie/bacterio/Legionelle.pdf"
- [X] <[Meningites bacteriennes ]> - "~/annex/public/lessons/microbiologie/bacterio/Meningites bacteriennes .pdf"
- [X] <[Salmonelle - shigelle]> - "~/annex/public/lessons/microbiologie/bacterio/Salmonelle - shigelle.pdf"
*** TODO Passe 2 [37/37]
SCHEDULED: <2023-10-06 Fri> DEADLINE: <2023-10-10 Tue>
- [ ] https://scut.srht.site/notes/medecine/20230528235124-culture.html
- [ ] <[Syphilis]> - ~/annex/public/lessons/microbiologie/bacterio/Syphilis.pdf
- [ ] A. agalactiae
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Antibiogramme - Staphylocoques.pdf][ATBg staphylocoques]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Antibiogramme - Streptococoque, enterocoque, Listeria.pdf][ATBg streptococque, entérocoque, Listeria]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Chlamydia - mycoplasmes.pdf][Chlamydia, mycoplasme]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Diagnostic moleculaire - Bacteriologie.pdf][Moléculaire (bactério)]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/EBCU etiologies.pdf][EBCU etiologies]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/ECBU interpretation.pdf][ECBU interpretation]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/ECBU pre-analytique.pdf][ECBU pre-analytique]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/EEQ, CIQ.pdf][EEQ, CIQ]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Examen microscopique.pdf][Examen microscopique]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Gonocoque.pdf][Gonocoque]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Hygiène.pdf][Hygiène]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Infections cutanees.pdf][Infections cutanees]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/MALDI - TOF.pdf][MALDI - TOF]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Pre-analytique bacteriologie.pdf][Pre-analytique bacteriologie]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Qualite.pdf][Qualite]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Securite Transfusionnelle.pdf][Securite Transfusionnelle]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Serologie bacterienne.pdf][Serologie bacterienne]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Tests rapides antigeniques et moleculaires.pdf][Tests rapides antigeniques et moleculaires]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Tuberculose.pdf][Tuberculose]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Typage moleculaire bacterien.pdf][Typage moleculaire bacterien]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Vaccination personnel.pdf][Vaccination personnel]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Angines bacteriennes.pdf][Angines bacteriennes]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Antibiogramme - Enterobacteries.pdf][Antibiogramme - Enterobacteries]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Antibiogramme.pdf][Antibiogramme]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Cambylobacter.pdf][Cambylobacter]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Clostridium difficile.pdf][Clostridium difficile]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Concentrations critiques.pdf][Concentrations critiques]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Conseil anti-infectieux.pdf][Conseil anti-infectieux]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Declaration obligatoire.pdf][Declaration obligatoire]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Hemocultures 1.pdf][Hemocultures 1]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Hemocultures 2.pdf][Hemocultures 2]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Legionelle.pdf][Legionelle]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Meningites bacteriennes .pdf][Meningites bacteriennes ]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Salmonelle - shigelle.pdf][Salmonelle - shigelle]]

[6.568]

[11.14]

*** TODO Cours
SCHEDULED: <2023-10-07 Sat>
- [1/5] <[Culture]> - https://scut.srht.site/notes/medecine/20230528235124-culture.html
- [1/5] <[Syphilis]> - "~/annex/public/lessons/microbiologie/bacterio/Syphilis.pdf"
- [1/5] A. agalactiae
- [1/5] <[ATBg staphylocoques]> - "~/annex/public/lessons/microbiologie/bacterio/Antibiogramme - Staphylocoques.pdf"
- [1/5] <[ATBg streptococque, entérocoque, Listeria]> - "~/annex/public/lessons/microbiologie/bacterio/Antibiogramme - Streptococoque, enterocoque, Listeria.pdf"
- [1/5] <[Chlamydia, mycoplasme]> - "~/annex/public/lessons/microbiologie/bacterio/Chlamydia - mycoplasmes.pdf"
- [1/5] <[Moléculaire (bactério)]> - "~/annex/public/lessons/microbiologie/bacterio/Diagnostic moleculaire - Bacteriologie.pdf"
- [1/5] <[EBCU etiologies]> - "~/annex/public/lessons/microbiologie/bacterio/EBCU etiologies.pdf"
- [1/5] <[ECBU interpretation]> - "~/annex/public/lessons/microbiologie/bacterio/ECBU interpretation.pdf"
- [1/5] <[ECBU pre-analytique]> - "~/annex/public/lessons/microbiologie/bacterio/ECBU pre-analytique.pdf"
- [1/5] <[EEQ, CIQ]> - "~/annex/public/lessons/microbiologie/bacterio/EEQ, CIQ.pdf"
- [1/5] <[Examen microscopique]> - "~/annex/public/lessons/microbiologie/bacterio/Examen microscopique.pdf"
- [1/5] <[Gonocoque]> - "~/annex/public/lessons/microbiologie/bacterio/Gonocoque.pdf"
- [1/5] <[Hygiène]> - "~/annex/public/lessons/microbiologie/bacterio/Hygiène.pdf"
- [1/5] <[Infections cutanees]> - "~/annex/public/lessons/microbiologie/bacterio/Infections cutanees.pdf"
- [1/5] <[MALDI - TOF]> - "~/annex/public/lessons/microbiologie/bacterio/MALDI - TOF.pdf"
- [1/5] <[Pre-analytique bacteriologie]> - "~/annex/public/lessons/microbiologie/bacterio/Pre-analytique bacteriologie.pdf"
- [1/5] <[Qualite]> - "~/annex/public/lessons/microbiologie/bacterio/Qualite.pdf"
- [1/5] <[Securite Transfusionnelle]> - "~/annex/public/lessons/microbiologie/bacterio/Securite Transfusionnelle.pdf"
- [1/5] <[Serologie bacterienne]> - "~/annex/public/lessons/microbiologie/bacterio/Serologie bacterienne.pdf"
- [1/5] <[Tests rapides antigeniques et moleculaires]> - "~/annex/public/lessons/microbiologie/bacterio/Tests rapides antigeniques et moleculaires.pdf"
- [1/5] <[Tuberculose]> - "~/annex/public/lessons/microbiologie/bacterio/Tuberculose.pdf"
- [1/5] <[Moleculaire]> - "~/annex/public/lessons/microbiologie/bacterio/Typage moleculaire bacterien.pdf"
- [1/5] <[Vaccination personnel]> - "~/annex/public/lessons/microbiologie/bacterio/Vaccination personnel.pdf"
- [1/5] <[Angines]> - "~/annex/public/lessons/microbiologie/bacterio/Angines bacteriennes.pdf"
- [1/5] <[ATBg Enterobacteries]> - "~/annex/public/lessons/microbiologie/bacterio/Antibiogramme - Enterobacteries.pdf"
- [1/5] <[ATBg]> - "~/annex/public/lessons/microbiologie/bacterio/Antibiogramme.pdf"
- [1/5] <[Cambylobacter]> - "~/annex/public/lessons/microbiologie/bacterio/Cambylobacter.pdf"
- [1/5] <[Clostridium difficile]> - "~/annex/public/lessons/microbiologie/bacterio/Clostridium difficile.pdf"
- [1/5] <[Concentrations critiques]> - "~/annex/public/lessons/microbiologie/bacterio/Concentrations critiques.pdf"
- [1/5] <[Conseil anti-infectieux]> - "~/annex/public/lessons/microbiologie/bacterio/Conseil anti-infectieux.pdf"
- [1/5] <[Declaration obligatoire]> - "~/annex/public/lessons/microbiologie/bacterio/Declaration obligatoire.pdf"
- [1/5] <[Hemocultures 1]> - "~/annex/public/lessons/microbiologie/bacterio/Hemocultures 1.pdf"
- [1/5] <[Hemocultures 2]> - "~/annex/public/lessons/microbiologie/bacterio/Hemocultures 2.pdf"
- [1/5] <[Legionelle]> - "~/annex/public/lessons/microbiologie/bacterio/Legionelle.pdf"
- [1/5] <[Meningites bacteriennes ]> - "~/annex/public/lessons/microbiologie/bacterio/Meningites bacteriennes .pdf"
- [1/5] <[Salmonelle - shigelle]> - "~/annex/public/lessons/microbiologie/bacterio/Salmonelle - shigelle.pdf"
*** DONE Passe 1
CLOSED: [2023-10-05 Thu 16:58] DEADLINE: <2023-10-06 Fri> SCHEDULED: <2023-10-06 Fri>
*** TODO Passe 2: autres cours
DEADLINE: <2023-10-11 Wed> SCHEDULED: <2023-10-07 Sat>
** TODO Présentation dépistage hémato :presentation:
:PROPERTIES:
:CATEGORY: bacterio
:END:
*** DONE Refaire analyse bouche
CLOSED: [2023-10-07 Sat 17:49] SCHEDULED: <2023-10-07 Sat>
*** TODO Traitement patients bouches
SCHEDULED: <2023-10-07 Sat>
*** TODO Faire analyse selles
SCHEDULED: <2023-10-07 Sat>
*** TODO Traitement patients selles
SCHEDULED: <2023-10-07 Sat>
*** TODO Présenter Torres 2022
SCHEDULED: <2023-10-07 Sat>
*** TODO Présenter Santibiez 2023

Replacement in projects.org at line 314 [7.123895]
B:BD[12.502] → [13.402:420]
```
**** TODO Article
```
[12.502]
[13.420]
```
**** DONE Article
CLOSED: [2023-10-07 Sat 18:00]
```
Replacement in projects.org at line 328 [7.123895]
∅:D[14.77] → [12.503:524]
∅:D[15.148] → [12.503:524]
∅:D[16.174] → [12.503:524]
∅:D[17.412] → [12.503:524]
∅:D[13.495] → [12.503:524]
∅:D[18.1316] → [12.503:524]
B:BD[19.396] → [12.503:524]
```
**** TODO Soumission
```
[18.1316]
[18.1317]
```
**** DONE Soumission
CLOSED: [2023-10-07 Sat 18:00]
```

Replacement in projects.org at line 761 [7.123895]

B:BD[20.249] → [20.249:300]

*** TODO Prendre rendez vous courroie distribution

[20.249]

[21.986]

*** DONE Prendre rendez vous courroie distribution
CLOSED: [2023-10-07 Sat 17:48]
*** TODO Changer phare arrière droit
SCHEDULED: <2023-10-08 Sun>

Replacement in projects.org at line 795 [7.123895]

B:BD[22.461] → [22.461:499]

B:BD[22.499] → [9.424:452]

∅:D[9.452] → [22.527:561]

B:BD[22.527] → [22.527:561]

B:BD[22.561] → [9.453:481]

** TODO Résilier ordures ménagères
SCHEDULED: <2023-10-06 Fri>
** TODO Payer ordures ménagères
SCHEDULED: <2023-10-06 Fri>

[22.461]

[22.687]

** WAIT Résilier ordures ménagères
SCHEDULED: <2023-10-14 Sat>
Mail envoyé
** DONE Payer ordures ménagères
CLOSED: [2023-10-07 Sat 17:48] SCHEDULED: <2023-10-06 Fri>

Insertion in projects.org at line 801 [7.123895]

[22.723]

[23.72]

Envoyé TIP. RIB déjà envoyé ? Sinon à repaer <2023-10-07 Sat>

Insertion in projects.org at line 807 [7.123895]
[24.120]
[23.204]
```
SCHEDULED: <2023-10-07 Sat>
```
Insertion in projects.org at line 809 [7.123895]
[23.240]
[25.279]
```
À surveiller
```
Replacement in projects.org at line 929 [7.123895]
B:BD[6.1128] → [6.1128:1156]
```
SCHEDULED: <2023-10-07 Sat>
```
[6.1128]
[6.1156]
```
SCHEDULED: <2023-10-08 Sun>
```
Insertion in projects.org at line 933 [7.123895]
[6.1241]

Replacement in projects/bisonex.org at line 1 [26.35]

B:BD[26.35] → [25.5100:13315]

∅:D[25.13315] → [27.8221:16390]

B:BD[27.8221] → [27.8221:16390]

#+title: Bisonex
#+category: bisonex
* Biblio
:PROPERTIES:
:CATEGORY: biblio
:END:
** Workflow
Comparison of WDL, Cromwell, and Nextflow
https://www.nature.com/articles/s41598-021-99288-8
Nextflow = a good compromise?
Comparison of aligners and variant callers (2021)
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04144-1
** Pipeline steps
*** Variant calling: Haplotype caller
https://gatk.broadinstitute.org/hc/en-us/articles/360035531412
Defines the algorithm, with a figure
*** Phred score
https://gatk.broadinstitute.org/hc/en-us/articles/360035531872-Phred-scaled-quality-scores
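As a quick reminder of the scale, a Phred-scaled quality Q corresponds to an error probability P through Q = -10 log10(P). A minimal Python sketch of the conversion in both directions (the function names are made up for illustration):

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a call is wrong, given its Phred-scaled quality q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred-scaled quality for an error probability p, with 0 < p <= 1."""
    return -10 * math.log10(p)


# Each step of 10 on the Phred scale divides the error probability by 10.
for q in (10, 20, 30):
    print(q, phred_to_error_prob(q))
```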
** VCF
*** GT genotype
encoded as allele values separated by either "/" or "|". The allele values are 0 for the reference allele (what is in the reference sequence), 1 for the first allele listed in ALT, 2 for the second allele listed in ALT, and so on. For diploid calls, examples could be 0/1 or 1|0. For haploid calls, e.g. on Y, male X, or the mitochondrion, only one allele value should be given. All samples must have GT call information; if a call cannot be made for a sample at a given locus, "." must be specified for each missing allele in the GT field (for example ./. for a diploid). The meanings of the separators are:
    / : genotype unphased
    | : genotype phased
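The encoding described above can be illustrated with a short Python sketch that splits a GT value into a phasing flag and a list of allele indices. The function name `parse_gt` is hypothetical and the sketch ignores mixed-phase genotypes:

```python
from typing import List, Optional, Tuple


def parse_gt(gt: str) -> Tuple[bool, List[Optional[int]]]:
    """Split a VCF GT value into (phased, allele indices).

    A missing allele, written ".", is returned as None.
    """
    phased = "|" in gt
    fields = gt.replace("|", "/").split("/")
    alleles = [None if field == "." else int(field) for field in fields]
    return phased, alleles


print(parse_gt("0/1"))  # unphased heterozygous call
print(parse_gt("1|0"))  # phased heterozygous call
print(parse_gt("./."))  # no call on a diploid locus
print(parse_gt("1"))    # haploid call, e.g. on chrY
```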
** Validation
*** NA12878
**** KILL [[https://precision.fda.gov/challenges/truth/results][fdaPrecision challenge]]
Caution: the genome is in hg19, so the comparison is not suitable ...
**** TODO Best practices for the analytical validation of clinical whole-genome sequencing intended for the diagnosis of germline disease
https://www.nature.com/articles/s41525-020-00154-9
General recommendations for genomes, without raw data
**** TODO [#A] Performance assessment of variant calling pipelines using human whole exome sequencing and simulated data
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2928-9
1. variant calling only
2. NA12878 + simulated data
3. exome
4. evaluated via F-score
Code available: https://github.com/bharani-lab/WES-Benchmarking-Pipeline_Manoj/tree/master/Script
Result: BWA/Novoalign_DeepVariant
Aligners
- BWA-MEM 0.7.16
- Bowtie2 2.2.6
- Novoalign 3.08.02
- SOAP 2.21
- MOSAIK 2.2.3
Variant calling
- GATK HaplotypeCaller 4
- FreeBayes 1.1.0
- SAMtools mpileup 1.7
- DeepVariant r0.4
SNV
| Exome | Pipeline |    TP |   FP |  FN | Sensitivity | Precision | F-Score |   FDR |
|     1 | BWA_GATK | 23689 | 1397 | 613 |       0.975 |     0.944 |   0.959 | 0.057 |
|     2 | BWA_GATK | 23946 |  865 | 356 |       0.985 |     0.965 |   0.975 | 0.036 |
indel
|   TP | FP | FN | Sensitivity | Precision | F-Score |   FDR |
| 1254 | 72 | 75 |       0.944 |     0.946 |   0.945 | 0.054 |
| 1309 | 10 | 20 |       0.985 |     0.992 |   0.989 | 0.008 |
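The sensitivity, precision and F-score columns can be recomputed from the TP/FP/FN counts. A minimal Python sketch, assuming the usual definitions (sensitivity = TP/(TP+FN), precision = TP/(TP+FP), F-score = their harmonic mean); the function name is hypothetical and the counts are those of the exome 2 SNV row:

```python
def benchmark_metrics(tp: int, fp: int, fn: int) -> dict:
    """Sensitivity, precision and F-score from variant-calling counts."""
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": round(sensitivity, 3),
        "precision": round(precision, 3),
        "f_score": round(f_score, 3),
    }


# Exome 2, BWA_GATK, SNV counts from the table.
print(benchmark_metrics(tp=23946, fp=865, fn=356))
# → {'sensitivity': 0.985, 'precision': 0.965, 'f_score': 0.975}
```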
Raw values:
https://static-content.springer.com/esm/art%3A10.1186%2Fs12859-019-2928-9/MediaObjects/12859_2019_2928_MOESM8_ESM.pdf
Other articles with the same exome comparison on NA12878
- Hwang et al., 2015
- Highnam et al., 2015
- Cornish and Guda, 2015
Variant Type
|                                           | SNVs & Indels | CNVs (>10Kb) | SVs | Mitochondrial variants | Pseudogenes | REs | Somatic/mosaic | Literature/Data | Source   |
| NA12878                                   |         100%a |          40% |   0 |                      0 |           0 |   0 |              0 | Zook et al18    | NIST     |
| Other NIST standard (e.g. AJ/Asian trios) |           71% |          40% | 50% |                      0 |           0 |   0 |              0 | Zook et al18    |          |
| Platinum Genomes                          |           29% |            0 |   0 |                      0 |           0 |   0 |              0 | Eberle et al8   | Platinum |
| Venter/HuRef                              |           14% |          40% |   0 |                      0 |           0 |   0 |              0 | Trost et al1    | HuRef    |
**** Systematic comparison of germline variant calling pipelines cross multiple next-generation sequencers
#+begin_src bibtex
@ARTICLE{Chen2019-fp,
  title     = "Systematic comparison of germline variant calling pipelines
               cross multiple next-generation sequencers",
  author    = "Chen, Jiayun and Li, Xingsong and Zhong, Hongbin and Meng,
               Yuhuan and Du, Hongli",
  abstract  = "The development and innovation of next generation sequencing
               (NGS) and the subsequent analysis tools have gain popularity in
               scientific researches and clinical diagnostic applications.
               Hence, a systematic comparison of the sequencing platforms and
               variant calling pipelines could provide significant guidance to
               NGS-based scientific and clinical genomics. In this study, we
               compared the performance, concordance and operating efficiency
               of 27 combinations of sequencing platforms and variant calling
               pipelines, testing three variant calling pipelines-Genome
               Analysis Tool Kit HaplotypeCaller, Strelka2 and
               Samtools-Varscan2 for nine data sets for the NA12878 genome
               sequenced by different platforms including BGISEQ500,
               MGISEQ2000, HiSeq4000, NovaSeq and HiSeq Xten. For the variants
               calling performance of 12 combinations in WES datasets, all
               combinations displayed good performance in calling SNPs, with
               their F-scores entirely higher than 0.96, and their performance
               in calling INDELs varies from 0.75 to 0.91. And all 15
               combinations in WGS datasets also manifested good performance,
               with F-scores in calling SNPs were entirely higher than 0.975
               and their performance in calling INDELs varies from 0.71 to
               0.93. All of these combinations manifested high concordance in
               variant identification, while the divergence of variants
               identification in WGS datasets were larger than that in WES
               datasets. We also down-sampled the original WES and WGS datasets
               at a series of gradient coverage across multiple platforms, then
               the variants calling period consumed by the three pipelines at
               each coverage were counted, respectively. For the GIAB datasets
               on both BGI and Illumina platforms, Strelka2 manifested its
               ultra-performance in detecting accuracy and processing
               efficiency compared with other two pipelines on each sequencing
               platform, which was recommended in the further promotion and
               application of next generation sequencing technology. The
               results of our researches will provide useful and comprehensive
               guidelines for personal or organizational researchers in
               reliable and consistent variants identification.",
  journal   = "Sci. Rep.",
  publisher = "Springer Science and Business Media LLC",
  volume    =  9,
  number    =  1,
  pages     = "9345",
  month     =  jun,
  year      =  2019,
  copyright = "https://creativecommons.org/licenses/by/4.0",
  language  = "en"
}
#+end_src
Comparison of different pipelines (2019)
https://www.nature.com/articles/s41598-019-45835-3
Combinations
- variant calling = GATK, Strelka2 and Samtools-Varscan2
- on NA12878
- sequenced on BGISEQ500, MGISEQ2000, HiSeq4000, NovaSeq and HiSeq Xten.
Conclusion: Strelka2 is superior, but is there a bias on NA12878?
Illumina > BGI for indels, probably because the reads are longer
#+begin_quote
 For WES datasets, the BGI platforms displayed the superior performance in SNPs
 calling while Illumina platforms manifested the better variants calling
 performance in INDELs calling, which could be explained by their divergence in
 sequencing strategy that producing different length of reads (all BGI platforms
 were 100 base pair read length while all Illumina platforms were 150 base pair
 read length). The read length effects, as a key factor between two platforms,
 would bring alignment bias and error which are higher for short reads and
 ultimately affect the variants calling especially the INDELs identification
#+end_quote
*** Debugging variant calling (HaplotypeCaller)
https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant
https://gatk.broadinstitute.org/hc/en-us/articles/360035891111-Expected-variant-at-a-specific-site-was-not-called
*** Hap.py
Output format:
#+begin_src r
vcf_field_names(vcf, tag = "FORMAT")
#+end_src
#+RESULTS:
: FORMAT BD    1      String  Decision for call (TP/FP/FN/N)
: FORMAT BK    1      String  Sub-type for decision (match/mismatch type)
: FORMAT BVT   1      String  High-level variant type (SNP|INDEL).
: FORMAT BLT   1      String  High-level location type (het|homref|hetalt|homa
am = genotype mismatch
lm = allele/haplotype mismatch
. = not seen
**** Checking that am = genotype mismatch
reference = T/T
high-confidence = T/C
ours = C/C
#+begin_src sh
bcftools filter -i 'POS=19196584'  /Work/Groups/bisonex/data/giab/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz | grep -v '#'
bcftools filter -i 'POS=19196584'  ../out/NA12878_NIST7035-dbsnp/variantCalling/haplotypecaller/NA12878_NIST.vcf.gz | grep -v '#'
#+end_src
#+RESULTS:
: NC_000022.11    19196584        .       T       C       50      PASS    platforms=5;platformnames=Illumina,PacBio,10X,Ion,Solid;datasets=5;datasetnames=HiSeqPE300x,CCS15kb_20kb,10XChromiumLR,IonExome,SolidSE75bp;callsets=7;callsetnames=HiSeqPE300xGATK,CCS15kb_20kbDV,CCS15kb_20kbGATK4,HiSeqPE300xfreebayes,10XLRGATK,IonExomeTVC,SolidSE75GATKHC;datasetsmissingcall=CGnormal;callable=CS_HiSeqPE300xGATK_callable,CS_CCS15kb_20kbDV_callable,CS_10XLRGATK_callable,CS_CCS15kb_20kbGATK4_callable,CS_HiSeqPE300xfreebayes_callable GT:PS:DP:ADALL:AD:GQ    0/1:.:781:109,123:138,150:348
: NC_000022.11    19196584        rs1061325       T       C       59.32   PASS    AC=2;AF=1;AN=2;DB;DP=2;ExcessHet=0;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;QD=29.66;SOR=2.303      GT:AD:DP:GQ:PL  1/1:0,2:2:6:71,6,0
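The genotypes noted above can be read back from the two records by resolving the GT indices against REF and ALT. A small Python sketch, assuming single-sample records; the INFO column is shortened here for readability, fields are split on whitespace rather than strictly on tabs, and the helper name `called_alleles` is hypothetical:

```python
from typing import List


def called_alleles(record: str) -> List[str]:
    """Return the allele sequences designated by GT in a single-sample VCF line."""
    cols = record.split()
    ref, alt = cols[3], cols[4]
    fmt, sample = cols[8], cols[9]
    alleles = [ref] + alt.split(",")
    # FORMAT names the colon-separated sample fields; pick out GT.
    gt = dict(zip(fmt.split(":"), sample.split(":")))["GT"]
    indices = gt.replace("|", "/").split("/")
    return ["." if i == "." else alleles[int(i)] for i in indices]


# Benchmark (high-confidence) record, INFO shortened.
truth = "NC_000022.11 19196584 . T C 50 PASS platforms=5 GT:PS:DP:ADALL:AD:GQ 0/1:.:781:109,123:138,150:348"
# Record from our pipeline, INFO shortened.
query = "NC_000022.11 19196584 rs1061325 T C 59.32 PASS AC=2;AF=1;AN=2 GT:AD:DP:GQ:PL 1/1:0,2:2:6:71,6,0"

print(called_alleles(truth))  # → ['T', 'C']
print(called_alleles(query))  # → ['C', 'C']
```

Both calls involve the same alternate allele C and differ only in zygosity, which is consistent with a genotype mismatch.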
**** Checking that lm = allele/haplotype mismatch
reference = CAA/CAA
high-confidence = CA/CA
ours = C/CA
#+begin_src sh
 bcftools filter -i 'POS=31277416'  /Work/Groups/bisonex/data/giab/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz | grep -v '#'
 bcftools filter -i 'POS=31277416'  ../out/NA12878_NIST7035-dbsnp/variantCalling/haplotypecaller/NA12878_NIST.vcf.gz | grep -v '#'
#+end_src
#+RESULTS:
: NC_000022.11    31277416        .       CA      C       50      PASS    platforms=3;platformnames=Illumina,PacBio,10X;datasets=3;datasetnames=HiSeqPE300x,CCS15kb_20kb,10XChromiumLR;callsets=4;callsetnames=HiSeqPE300xGATK,CCS15kb_20kbDV,10XLRGATK,HiSeqPE300xfreebayes;datasetsmissingcall=CCS15kb_20kb,CGnormal,IonExome,SolidSE75bp;callable=CS_HiSeqPE300xGATK_callable;difficultregion=GRCh38_AllHomopolymers_gt6bp_imperfectgt10bp_slop5,GRCh38_SimpleRepeat_imperfecthomopolgt10_slop5  GT:PS:DP:ADALL:AD:GQ    1/1:.:465:16,229:0,190:129
: NC_000022.11    31277416        rs57244615      CAA     C,CA    389.02  PASS    AC=1,1;AF=0.5,0.5;AN=2;BaseQRankSum=0.37;DB;DP=37;ExcessHet=0;FS=0;MLEAC=1,1;MLEAF=0.5,0.5;MQ=60;MQRankSum=0;QD=13.41;ReadPosRankSum=-0.651;SOR=0.572    GT:AD:DP:GQ:PL  1/2:5,10,14:29:64:406,202,313,64,0,88
*** Read generation
Recent literature
https://www.biorxiv.org/content/10.1101/2022.03.29.486262v1.full.pdf
Among those that handle variants
- *simuscop*: reads not centered on the capture regions
- *NEAT*: exome, but too slow in practice
- *Reseq*: exome
- gensim: no exome
- pIRS: no exome either
- varsim: no exome either
  ...
  Compute time according to the ReSeq article: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02265-7
  #+begin_quote
  Due to ReSeq’s effective parallelization, its elapsed times are low for this benchmark with 48 virtual CPUs (Additional file 1: Figure S34b,e). In contrast, the single-threaded processes implemented in perl or python have strikingly high elapsed times. This is well visible in Hs-HiX-TruSeq and applies to the training of pIRS (over a week), NEAT (several days), and BEAR (half a week) as well as the simulation of NEAT (close to 2 weeks) and BEAR (several weeks).
  #+end_quote
Literature: https://www.nature.com/articles/s41437-022-00577-3
Miscellaneous
- Older list: https://www.biostars.org/p/128762/
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02265-7
* Ideas
** Analytical validation
Email from Yannis: patient data +/- simulated
*** Use GCAT data and upload ours?
https://www.nature.com/articles/ncomms7275
*** [#A] Variant calling: Genome in a Bottle: NA12878 + others
Summary: https://www.nist.gov/programs-projects/genome-bottle
Manuscript : https://www.nature.com/articles/s41587-019-0054-x.epdf?author_access_token=E_1bL0MtBBwZr91xEsy6B9RgN0jAjWel9jnR3ZoTv0OLNnFBR7rUIZNDXq0DIKdg3w6KhBF8Rz2RWQFFc0St45kC6CZs3cDYc87HNHovbWSOubJHDa9CeJV-pN0BW_mQ0n7cM13KF2JRr_wAAn524w%3D%3D
Article comparing variant callers: https://www.biorxiv.org/content/10.1101/2020.12.11.422022v1.full.pdf
**** KILL Also test the sequencing
CLOSED: [2023-01-30 lun. 18:30]
From a FASTQ corresponding to Illumina  https://github.com/genome-in-a-bottle/giab_data_indexes
   then compare the VCF with the "high confidence" calls
Sequencing NA12878 directly -> unnecessary for the pipeline alone
**** TODO Test only the bioinformatics part
   Everything is summarized here: https://www.nist.gov/programs-projects/genome-bottle
- method https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/analysis/Illumina_PlatinumGenomes_NA12877_NA12878_09162015/IlluminaPlatinumGenomes-user-guide.pdf
- vcf
     https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/latest/GRCh38/
NB: what does https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/analysis/Illumina_PlatinumGenomes_NA12877_NA12878_09162015/hg38/2.0.1/NA12878/ correspond to?
   Article comparing variant callers: https://www.biorxiv.org/content/10.1101/2020.12.11.422022v1.full.pdf
   Article for vcfeval: https://www.nature.com/articles/s41587-019-0054-x
   Version 4 adds 273 "clinically relevant" genes https://www.biorxiv.org/content/10.1101/2021.06.07.444885v3.full.pdf
   Addition of the "difficult" regions
   https://www.biorxiv.org/content/10.1101/2020.07.24.212712v5.full.pdf
*** [#B] Pipeline: generate a patient with all the variants found at Cento
Comparison of DNA read generation tools (2019)
https://academic.oup.com/bfg/article/19/1/49/5680294
**** SimuSCop (exome)
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03665-5
https://github.com/qasimyu/simuscop
1. Create a model from BAM + VCF: seqToProfile
2. Generate NGS data
** Annotation :
*** Comparison of VEP, SnpEff, and ANNOVAR
* Changes for the new version
- Latest genome version (the "ready-to-use" version is only GRCh38, without the patched versions)
* Notes
** Nextflow
*** Display the results of a process/workflow
#+begin_src
lol.out.view()
#+end_src
Caution: this does not work when there are several outputs; index the output instead:
#+begin_src
lol.out[0].view()
#+end_src
or, if /a/ is the name of the output:
#+begin_src
lol.out.a.view()
#+end_src
** Which genome version?
- T2T: chromosome notation = chr1,2 : ok genome, clinvar, dbSNP
- GRCh38: chromosome notation = NC_... : ok genome, clinvar, dbSNP
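When switching between the two notations, a RefSeq chromosome accession can be converted to a `chr`-style name from the number embedded in the accession. A minimal Python sketch, assuming GRCh38-style human accessions (NC_000001 to NC_000024, with 23 = X and 24 = Y) plus the mitochondrial accession NC_012920; NT_ and NW_ scaffolds are deliberately not handled, and the function name is hypothetical:

```python
import re


def refseq_to_chr(accession: str) -> str:
    """Map a human RefSeq chromosome accession (NC_...) to a chr-style name."""
    match = re.fullmatch(r"NC_(\d{6})\.\d+", accession)
    if match is None:
        raise ValueError(f"not a versioned NC_ accession: {accession}")
    number = int(match.group(1))
    if number == 12920:  # human mitochondrial genome
        return "chrM"
    if 1 <= number <= 22:
        return f"chr{number}"
    if number == 23:
        return "chrX"
    if number == 24:
        return "chrY"
    raise ValueError(f"unexpected chromosome accession: {accession}")


print(refseq_to_chr("NC_000022.11"))  # → chr22
```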
** Performances
Carine's computer (WSL2): 4 h in total, including 1 h 15 for alignment (parallelized) and 1 h 15 for HaplotypeCaller (sequential)
** Chromosomes NC, NT, NW
Mapping:
https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&chromInfoPage=
Meaning:
https://genome.ucsc.edu/FAQ/FAQdownloads.htm

[26.35]

[28.246]

#+title: Bisonex
#+category: bisonex
* Ideas
** Analytical validation
Email from Yannis: patient data +/- simulated
*** Use GCAT data and upload ours?
https://www.nature.com/articles/ncomms7275
*** [#A] Variant calling: Genome in a Bottle: NA12878 + others
Summary: https://www.nist.gov/programs-projects/genome-bottle
Manuscript : https://www.nature.com/articles/s41587-019-0054-x.epdf?author_access_token=E_1bL0MtBBwZr91xEsy6B9RgN0jAjWel9jnR3ZoTv0OLNnFBR7rUIZNDXq0DIKdg3w6KhBF8Rz2RWQFFc0St45kC6CZs3cDYc87HNHovbWSOubJHDa9CeJV-pN0BW_mQ0n7cM13KF2JRr_wAAn524w%3D%3D
Article comparing variant callers: https://www.biorxiv.org/content/10.1101/2020.12.11.422022v1.full.pdf
**** Also test the sequencing
From a FASTQ corresponding to Illumina  https://github.com/genome-in-a-bottle/giab_data_indexes
   then compare the VCF with the "high confidence" calls
Sequencing NA12878 directly -> unnecessary for the pipeline alone
**** Test only the bioinformatics part
   Everything is summarized here: https://www.nist.gov/programs-projects/genome-bottle
- method https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/analysis/Illumina_PlatinumGenomes_NA12877_NA12878_09162015/IlluminaPlatinumGenomes-user-guide.pdf
- vcf
     https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/latest/GRCh38/
NB: what does https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/analysis/Illumina_PlatinumGenomes_NA12877_NA12878_09162015/hg38/2.0.1/NA12878/ correspond to?
   Article comparing variant callers: https://www.biorxiv.org/content/10.1101/2020.12.11.422022v1.full.pdf
   Article for vcfeval: https://www.nature.com/articles/s41587-019-0054-x
   Version 4 adds 273 "clinically relevant" genes: https://www.biorxiv.org/content/10.1101/2021.06.07.444885v3.full.pdf
   Addition of the "difficult" regions
   https://www.biorxiv.org/content/10.1101/2020.07.24.212712v5.full.pdf
*** [#B] Pipeline: generate a patient with all the variants found at Cento
Comparison of DNA data generation tools (2019)
https://academic.oup.com/bfg/article/19/1/49/5680294
**** SimuSCop (exome)
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03665-5
https://github.com/qasimyu/simuscop
1. Create a model from bam + vcf: seqToProfile
2. Generate NGS data
** Annotation:
*** Comparison of VEP / SnpEff and ANNOVAR
* Changes for the new version
- Latest genome version (the "ready-to-use" version is only GRCh38, without the patch releases)
* Notes
** Nextflow
*** Display the results of a process/workflow
#+begin_src
lol.out.view()
#+end_src
Caution: this does not work if there are several outputs; select one by index:
#+begin_src
lol.out[0].view()
#+end_src
or, if /a/ is the name of the output:
#+begin_src
lol.out.a.view()
#+end_src
** Which genome version?
- T2T: chromosome notation = chr1,2: genome, clinvar, dbSNP OK
- GRCh38: chromosome notation = NC_...: genome, clinvar, dbSNP OK
** Performance
Carine's computer (WSL2): 4h in total, including 1h15 for alignment (parallelized) and 1h15 for HaplotypeCaller (sequential)
** Chromosomes NC, NT, NW
Mapping:
https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&chromInfoPage=
Meaning:
https://genome.ucsc.edu/FAQ/FAQdownloads.htm

Replacement in projects/bisonex.org at line 3 [26.35]

B:BD[28.8438] → [29.58:8250]

:00>
:LOGBOOK:
CLOCK: [2023-07-30 Sun 19:13]--[2023-07-30 Sun 20:50] =>  1:37
:END:
Changes needed for kent:
- no more patch
- removal of a loop in postPatch
NIX_BUILD_TOP is also removed
*** DONE [[https://github.com/NixOS/nixpkgs/pull/186459][BioDBHTS]]
CLOSED: [2023-05-06 Sat 08:49] SCHEDULED: <2023-04-15 Sat>
/Entered on/ [2022-08-10 Wed 14:28]
Review fixes done <2022-10-10 Mon>
*** DONE [[https://github.com/NixOS/nixpkgs/pull/186464][BioExtAlign]]
CLOSED: [2022-10-22 Sat 12:43] SCHEDULED: <2022-08-10 Wed>
/Entered on/ [2022-08-10 Wed 14:28]
Review <2022-10-10 Mon>, fixed the same day.
Second round of fixes done, waiting
The tests cannot be made to work because they do not find the Bio::Tools::Align module, which lives in another directory of the repository. Even when building the whole repository, it does not work... The tests are skipped.
*** TODO VEP
** WAIT [[https://github.com/NixOS/nixpkgs/pull/230394][rtg-tools]] :vcfeval:
Submitted
** WAIT Package Spip https://github.com/NixOS/nixpkgs/pull/247476
** TODO Happy :happy:
*** PROJ PR python 3 upstream
*** PROJ nixpkgs as is
** PROJ SpliceAI
** TODO Bamsurgeon
/Entered on/ [2023-05-13 Sat 19:11]
*** TODO Velvet
** TODO Picard PR with an option to manage memory
Similar to
https://github.com/bioconda/bioconda-recipes/blob/master/recipes/picard/picard.sh
* Julia :julia:
** KILL XAM.jl: PR to modify a record :julia:
CLOSED: [2023-05-29 Mon 15:40] SCHEDULED: <2023-05-28 Sun>
/Entered on/ [2023-05-27 Sat 22:39]
** TODO XAMscissors.jl :xamscissors:
Modifies the sequence in the BAM.
*The CIGAR is not updated*
We convert to fastq and run the pipeline to "fix" it
#+begin_src sh
cd /home/alex/code/bisonex/out/63003856/preprocessing/mapped
samtools view 63003856_S135.bam NC_000022.11 -o 63003856_S135_chr22.bam
cd /home/alex/recherche/bisonex/code/BamScissors.jl
cp ~/code/bisonex/out/63003856/preprocessing/mapped/63003856_S135_chr22.bam .
samtools index 63003856_S135_chr22.bam
#+end_src
The script modifies the bam, sorts it and generates the fastq.
!!! Warning: do not forget the -n option !!!
#+begin_src sh
time julia --project=.. insertVariant.jl
scp 63003856_S135_chr22_{1,2}.fq.gz meso:/Work/Users/apraga/bisonex/tests/bamscissors/
#+end_src
*** WAIT Implement SNVs with VAF :snv:
Strategy:
1. compute the depth at the positions
2. build a dictionary { read name : position in the dataframe }
3. iterate over all reads and change the marked ones
**** DONE VAF = 1
CLOSED: [2023-05-29 Mon 15:34]
**** DONE VAF drawn from a normal distribution
CLOSED: [2023-05-29 Mon 15:35]
Truncated if > 1
**** WAIT Unit tests
***** DONE NA12878: 1 gene on chromosome 22
CLOSED: [2023-05-30 Tue 23:55]
root = "https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/"
#+begin_src sh
samtools view project.NIST_NIST7035_H7AP8ADXX_NA12878.bwa.markDuplicates.bam  chr22 -o project.NIST_NIST7035_H7AP8ADXX_NA12878_chr22.bam
samtools view project.NIST_NIST7035_H7AP8ADXX_NA12878_chr22.bam chr22:19419700-19424000 -o NIST7035_H7AP8ADXX_NA12878_chr22_MRPL40_hg19.bam
#+end_src
***** WAIT FormatSpecimens pull request
https://github.com/BioJulia/FormatSpecimens.jl/pull/8
***** DONE Formatspecimens
CLOSED: [2023-05-29 Mon 23:03]
****** DONE 1 read
CLOSED: [2023-05-29 Mon 23:02]
****** DONE VAF on 1 exon
CLOSED: [2023-05-29 Mon 23:03]
**** DONE [#A] Bug: many reads lost with NA12878
CLOSED: [2023-08-19 Sat 20:45] SCHEDULED: <2023-08-18 Fri>
:PROPERTIES:
:ID:       5c1c36f3-f68e-4e6d-a7b6-61dca89abc37
:END:
E.g. chrX:g.124056226: we go from 65 reads to 1
xamscissors test: no problem...
We test on this position +/- 200bp
#+begin_src sh :dir /home/alex/roam/research/bisonex/code/sanger
samtools view   /home/alex/code/bisonex/out/2300346867_NA12878-63118093_S260-GRCh38/preprocessing/mapped/2300346867_NA12878-63118093_S260-GRCh38.bam chrX:124056026-124056426 -o chrXsmall.bam
#+end_src
#+RESULTS:
***** DONE Check the depth with the latest version:
CLOSED: [2023-08-19 Sat 20:34] SCHEDULED: <2023-08-19 Sat>
****** DONE chr20: depth ok
SCHEDULED: <2023-08-19 Sat>
****** DONE all the data
CLOSED: [2023-08-19 Sat 20:34] SCHEDULED: <2023-08-19 Sat>
Ok for 7 variants (IGV), in particular on chromosome X
*** TODO Implement indels with VAF :indel:
*** TODO Package submission
* Data
:PROPERTIES:
:CATEGORY: data
:END:
** DONE Replace bam with fastq on the mesocentre
CLOSED: [2023-04-16 Sun 16:33]
Command
*** DONE Delete the fastq that are not "paired"
CLOSED: [2023-04-16 Sun 16:33]
nushell
List of the fastq whose "paired-end" mate is missing
#+begin_src nu
ls **/*.fastq.gz | get name | path basename | split column "_" | get column1 | uniq -u | save single.txt
#+end_src
#+RESULTS:
: 62907927
: 62907970
: 62899606
: 62911287
: 62913201
: 62914084
: 62915905
: 62921595
: 62923065
: 62925220
: 62926503
: 62926502
: 62926500
: 62926499
: 62926498
: 62931719
: 62943423
: 62943400
: 62948290
: 62949205
: 62949206
: 62949118
: 62951284
: 62960792
: 62960785
: 62960787
: 62960617
: 62962561
: 62962692
: 62967473
: 62972194
: 62979102
We check
#+begin_src nu
open single.txt  | lines | each {|e| ls $"fastq/*_($in)/*" | get 0  }
open single.txt  | lines | each {|e| ls $"fastq/*_($in)/*" | get 0.name }  | path basename | split column "_" | get column1 | uniq -c
#+end_src
Move them all into one directory (no deletion)
#+begin_src
open single.txt  | lines | each {|e| ls $"fastq/*_($in)/*" | get 0  }  | each {|e| ^mv $e.name bad-fastq/}
#+end_src
Check that the directories are empty
 open single.txt  | lines | each {|e| ls $"fastq/*_($in)" | get 0.name } | ^ls -l $in
 Then delete them
 open single.txt  | lines | each {|e| ls $"fastq/*_($in)" | get 0.name } | ^rm -r $in
*** DONE Delete the bam that have fastq
CLOSED: [2023-04-16 Sun 16:33]
List the identifiers of the fastq and bam in a table with their type:
#+begin_src
let fastq = (ls fastq/*/*.fastq.gz | get name | parse "{dir}/{full_id}/{id}_{R}_001.fastq.gz"  | select dir id | uniq )
let bam = (ls bam/*/*.bam | get name | parse "{dir}/{full_id}/{id}_{S}.bqrt.bam"  | select dir id)
#+end_src
Group the results by identifier (results = list of records that must be converted into a table)
and keep those that have only a fastq or only a bam
#+begin_src
let single = ( $bam | append $fastq | group-by id | transpose id files | get files | where {|x| ($x | length) == 1})
#+end_src
Convert to a table and keep only the bam
#+begin_src
$single | reduce {|it, acc| $acc | append $it} | where dir == bam | get id | each {|e| ^ls $"bam/*_($e)/*.bam"}
#+end_src
#+RESULTS:
: bam/2100656174_62913201/62913201_S52.bqrt.bam
: bam/2100733271_62925220/62925220_S33.bqrt.bam
: bam/2100738763_62926502/62926502_S108.bqrt.bam
: bam/2100746726_62926498/62926498_S105.bqrt.bam
: bam/2100787936_62931955/62931955_S4.bqrt.bam
: bam/2200066374_62948290/62948290_S130.bqrt.bam
: bam/2200074722_62948298/62948298_S131.bqrt.bam
: bam/2200074990_62948306/62948306_S218.bqrt.bam
: bam/2200214581_62967331/62967331_S267.bqrt.bam
: bam/2200225399_62972187/62972187_S85.bqrt.bam
: bam/2200293962_62979117/62979117_S63.bqrt.bam
: bam/2200423985_62999352/62999352_S1.bqrt.bam
: bam/2200495073_63010427/63010427_S20.bqrt.bam
: bam/2200511274_63012586/63012586_S114.bqrt.bam
: bam/2200669188_63036688/63036688_S150.bqrt.bam
* New workflow :workflow:
** TODO Databases
*** KILL Nix to download the raw data
**** Conclusion
Not viable on the cluster because it is outside /nix/store
Symlinks could be used, but that is too complicated
**** KILL Axel instead of curl to handle timeouts?
CLOSED: [2022-08-19 Fri 15:18]
*** DONE Test @pennae's patch for large files
SCHEDULED: <2022-08-19 Fri>
*** KILL Download the data with nextflow: hg38
CLOSED: [2023-06-12 Mon 23:29]
**** DONE Reference genome
**** DONE dbSNP
**** DONE VEP 20G
CLO

[28.8438]

[29.8250]

:00>
:LOGBOOK:
CLOCK: [2023-07-30 Sun 19:13]--[2023-07-30 Sun 20:50] =>  1:37
:END:
Changes needed for kent:
- no more patch
- removal of a loop in postPatch
NIX_BUILD_TOP is also removed
*** DONE [[https://github.com/NixOS/nixpkgs/pull/186459][BioDBHTS]]
CLOSED: [2023-05-06 Sat 08:49] SCHEDULED: <2023-04-15 Sat>
/Entered on/ [2022-08-10 Wed 14:28]
Review fixes done <2022-10-10 Mon>
*** DONE [[https://github.com/NixOS/nixpkgs/pull/186464][BioExtAlign]]
CLOSED: [2022-10-22 Sat 12:43] SCHEDULED: <2022-08-10 Wed>
/Entered on/ [2022-08-10 Wed 14:28]
Review <2022-10-10 Mon>, fixed the same day.
Second round of fixes done, waiting
The tests cannot be made to work because they do not find the Bio::Tools::Align module, which lives in another directory of the repository. Even when building the whole repository, it does not work... The tests are skipped.
*** TODO VEP
** WAIT [[https://github.com/NixOS/nixpkgs/pull/230394][rtg-tools]] :vcfeval:
Submitted
** WAIT Package Spip https://github.com/NixOS/nixpkgs/pull/247476
** TODO Happy :happy:
*** TODO PR python 3 upstream
SCHEDULED: <2023-10-14 Sat>
*** TODO nixpkgs as is
SCHEDULED: <2023-10-14 Sat>
** PROJ SpliceAI
** TODO Bamsurgeon
/Entered on/ [2023-05-13 Sat 19:11]
*** TODO Velvet
** TODO Picard PR with an option to manage memory
Similar to
https://github.com/bioconda/bioconda-recipes/blob/master/recipes/picard/picard.sh
* Julia :julia:
** KILL XAM.jl: PR to modify a record :julia:
CLOSED: [2023-05-29 Mon 15:40] SCHEDULED: <2023-05-28 Sun>
/Entered on/ [2023-05-27 Sat 22:39]
** TODO XAMscissors.jl :xamscissors:
Modifies the sequence in the BAM.
*The CIGAR is not updated*
We convert to fastq and run the pipeline to "fix" it
#+begin_src sh
cd /home/alex/code/bisonex/out/63003856/preprocessing/mapped
samtools view 63003856_S135.bam NC_000022.11 -o 63003856_S135_chr22.bam
cd /home/alex/recherche/bisonex/code/BamScissors.jl
cp ~/code/bisonex/out/63003856/preprocessing/mapped/63003856_S135_chr22.bam .
samtools index 63003856_S135_chr22.bam
#+end_src
The script modifies the bam, sorts it and generates the fastq.
!!! Warning: do not forget the -n option !!!
#+begin_src sh
time julia --project=.. insertVariant.jl
scp 63003856_S135_chr22_{1,2}.fq.gz meso:/Work/Users/apraga/bisonex/tests/bamscissors/
#+end_src
*** WAIT Implement SNVs with VAF :snv:
Strategy:
1. compute the depth at the positions
2. build a dictionary { read name : position in the dataframe }
3. iterate over all reads and change the marked ones
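The three steps of this strategy can be sketched in Python. This is only an illustration under simplifying assumptions, not the XAMscissors.jl code: reads are plain dictionaries with a unique =name=, a 1-based =start= and an ungapped =seq=, and the names =reads_covering= and =insert_snvs= are hypothetical.

#+begin_src python
import random


def reads_covering(reads, pos):
    """Reads whose ungapped span covers the 1-based position pos."""
    return [r for r in reads if r["start"] <= pos < r["start"] + len(r["seq"])]


def insert_snvs(reads, variants, seed=0):
    """Return (modified reads, depth per position) for SNVs given as
    {"pos": int, "alt": str, "vaf": float} (hypothetical layout)."""
    rng = random.Random(seed)
    # 1. depth at each variant position
    depth = {v["pos"]: len(reads_covering(reads, v["pos"])) for v in variants}
    # 2. dictionary {read name: indices of the variants to apply}
    marked = {}
    for i, v in enumerate(variants):
        covering = reads_covering(reads, v["pos"])
        # a VAF above 1 is truncated to the full depth
        n_alt = min(len(covering), round(v["vaf"] * len(covering)))
        for r in rng.sample(covering, n_alt):
            marked.setdefault(r["name"], []).append(i)
    # 3. iterate over all reads and change the marked ones
    out = []
    for r in reads:
        seq = list(r["seq"])
        for i in marked.get(r["name"], []):
            v = variants[i]
            seq[v["pos"] - r["start"]] = v["alt"]
        out.append({**r, "seq": "".join(seq)})
    return out, depth
#+end_src

With real paired-end data both mates share one name, so the dictionary key would need the mate number as well; the sketch ignores this.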
**** DONE VAF = 1
CLOSED: [2023-05-29 Mon 15:34]
**** DONE VAF drawn from a normal distribution
CLOSED: [2023-05-29 Mon 15:35]
Truncated if > 1
**** WAIT Unit tests
***** DONE NA12878: 1 gene on chromosome 22
CLOSED: [2023-05-30 Tue 23:55]
root = "https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/"
#+begin_src sh
samtools view project.NIST_NIST7035_H7AP8ADXX_NA12878.bwa.markDuplicates.bam  chr22 -o project.NIST_NIST7035_H7AP8ADXX_NA12878_chr22.bam
samtools view project.NIST_NIST7035_H7AP8ADXX_NA12878_chr22.bam chr22:19419700-19424000 -o NIST7035_H7AP8ADXX_NA12878_chr22_MRPL40_hg19.bam
#+end_src
***** WAIT FormatSpecimens pull request
https://github.com/BioJulia/FormatSpecimens.jl/pull/8
***** DONE Formatspecimens
CLOSED: [2023-05-29 Mon 23:03]
****** DONE 1 read
CLOSED: [2023-05-29 Mon 23:02]
****** DONE VAF on 1 exon
CLOSED: [2023-05-29 Mon 23:03]
**** DONE [#A] Bug: many reads lost with NA12878
CLOSED: [2023-08-19 Sat 20:45] SCHEDULED: <2023-08-18 Fri>
:PROPERTIES:
:ID:       5c1c36f3-f68e-4e6d-a7b6-61dca89abc37
:END:
E.g. chrX:g.124056226: we go from 65 reads to 1
xamscissors test: no problem...
We test on this position +/- 200bp
#+begin_src sh :dir /home/alex/roam/research/bisonex/code/sanger
samtools view   /home/alex/code/bisonex/out/2300346867_NA12878-63118093_S260-GRCh38/preprocessing/mapped/2300346867_NA12878-63118093_S260-GRCh38.bam chrX:124056026-124056426 -o chrXsmall.bam
#+end_src
#+RESULTS:
***** DONE Check the depth with the latest version:
CLOSED: [2023-08-19 Sat 20:34] SCHEDULED: <2023-08-19 Sat>
****** DONE chr20: depth ok
SCHEDULED: <2023-08-19 Sat>
****** DONE all the data
CLOSED: [2023-08-19 Sat 20:34] SCHEDULED: <2023-08-19 Sat>
Ok for 7 variants (IGV), in particular on chromosome X
*** TODO Implement indels with VAF :indel:
*** TODO Package submission
* Data
:PROPERTIES:
:CATEGORY: data
:END:
** DONE Replace bam with fastq on the mesocentre
CLOSED: [2023-04-16 Sun 16:33]
Command
*** DONE Delete the fastq that are not "paired"
CLOSED: [2023-04-16 Sun 16:33]
nushell
List of the fastq whose "paired-end" mate is missing
#+begin_src nu
ls **/*.fastq.gz | get name | path basename | split column "_" | get column1 | uniq -u | save single.txt
#+end_src
#+RESULTS:
: 62907927
: 62907970
: 62899606
: 62911287
: 62913201
: 62914084
: 62915905
: 62921595
: 62923065
: 62925220
: 62926503
: 62926502
: 62926500
: 62926499
: 62926498
: 62931719
: 62943423
: 62943400
: 62948290
: 62949205
: 62949206
: 62949118
: 62951284
: 62960792
: 62960785
: 62960787
: 62960617
: 62962561
: 62962692
: 62967473
: 62972194
: 62979102
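The same selection (identifiers that occur only once, i.e. whose mate file is missing) can be sketched in Python. The helper name =unpaired_ids= and the example file names are hypothetical; the sketch only assumes, as the nushell command does, that the identifier is the part of the file name before the first underscore.

#+begin_src python
from collections import Counter
from pathlib import PurePosixPath


def unpaired_ids(paths):
    """Identifiers (text before the first "_" of the file name) seen exactly once."""
    ids = [PurePosixPath(p).name.split("_")[0] for p in paths]
    counts = Counter(ids)
    return sorted(i for i, n in counts.items() if n == 1)


# Hypothetical file names: the first sample has lost its R2 file.
example = [
    "fastq/a_62907927/62907927_S1_R1_001.fastq.gz",
    "fastq/b_62900000/62900000_S2_R1_001.fastq.gz",
    "fastq/b_62900000/62900000_S2_R2_001.fastq.gz",
]
print(unpaired_ids(example))
#+end_src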
We check
#+begin_src nu
open single.txt  | lines | each {|e| ls $"fastq/*_($in)/*" | get 0  }
open single.txt  | lines | each {|e| ls $"fastq/*_($in)/*" | get 0.name }  | path basename | split column "_" | get column1 | uniq -c
#+end_src
Move them all into one directory (no deletion)
#+begin_src
open single.txt  | lines | each {|e| ls $"fastq/*_($in)/*" | get 0  }  | each {|e| ^mv $e.name bad-fastq/}
#+end_src
Check that the directories are empty
 open single.txt  | lines | each {|e| ls $"fastq/*_($in)" | get 0.name } | ^ls -l $in
 Then delete them
 open single.txt  | lines | each {|e| ls $"fastq/*_($in)" | get 0.name } | ^rm -r $in
*** DONE Delete the bam that have fastq
CLOSED: [2023-04-16 Sun 16:33]
List the identifiers of the fastq and bam in a table with their type:
#+begin_src
let fastq = (ls fastq/*/*.fastq.gz | get name | parse "{dir}/{full_id}/{id}_{R}_001.fastq.gz"  | select dir id | uniq )
let bam = (ls bam/*/*.bam | get name | parse "{dir}/{full_id}/{id}_{S}.bqrt.bam"  | select dir id)
#+end_src
Group the results by identifier (results = list of records that must be converted into a table)
and keep those that have only a fastq or only a bam
#+begin_src
let single = ( $bam | append $fastq | group-by id | transpose id files | get files | where {|x| ($x | length) == 1})
#+end_src
Convert to a table and keep only the bam
#+begin_src
$single | reduce {|it, acc| $acc | append $it} | where dir == bam | get id | each {|e| ^ls $"bam/*_($e)/*.bam"}
#+end_src
#+RESULTS:
: bam/2100656174_62913201/62913201_S52.bqrt.bam
: bam/2100733271_62925220/62925220_S33.bqrt.bam
: bam/2100738763_62926502/62926502_S108.bqrt.bam
: bam/2100746726_62926498/62926498_S105.bqrt.bam
: bam/2100787936_62931955/62931955_S4.bqrt.bam
: bam/2200066374_62948290/62948290_S130.bqrt.bam
: bam/2200074722_62948298/62948298_S131.bqrt.bam
: bam/2200074990_62948306/62948306_S218.bqrt.bam
: bam/2200214581_62967331/62967331_S267.bqrt.bam
: bam/2200225399_62972187/62972187_S85.bqrt.bam
: bam/2200293962_62979117/62979117_S63.bqrt.bam
: bam/2200423985_62999352/62999352_S1.bqrt.bam
: bam/2200495073_63010427/63010427_S20.bqrt.bam
: bam/2200511274_63012586/63012586_S114.bqrt.bam
: bam/2200669188_63036688/63036688_S150.bqrt.bam
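The grouping used in this section (collect fastq and bam identifiers, group by identifier, keep the bam that have no fastq) can be sketched in Python as follows. =bam_only_ids= is a hypothetical helper and the fastq identifiers in the example are made up; only the logic mirrors the nushell pipeline.

#+begin_src python
from collections import defaultdict


def bam_only_ids(fastq_ids, bam_ids):
    """Identifiers that appear among the bam but never among the fastq."""
    kinds_by_id = defaultdict(set)
    for sample_id in fastq_ids:
        kinds_by_id[sample_id].add("fastq")
    for sample_id in bam_ids:
        kinds_by_id[sample_id].add("bam")
    return sorted(i for i, kinds in kinds_by_id.items() if kinds == {"bam"})


# Made-up fastq identifiers (R1 and R2 give the same identifier twice).
fastq_ids = ["62900001", "62900001", "62900002", "62900002"]
bam_ids = ["62900002", "62913201"]
print(bam_only_ids(fastq_ids, bam_ids))
#+end_src

Using a set per identifier makes the duplicate fastq entries harmless, which plays the role of the =uniq= in the nushell version.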
* New workflow :workflow:
** TODO Databases
*** KILL Nix to download the raw data
**** Conclusion
Not viable on the cluster because it is outside /nix/store
Symlinks could be used, but that is too complicated
**** KILL Axel instead of curl to handle timeouts?
CLOSED: [2022-08-19 Fri 15:18]
*** DONE Test @pennae's patch for large files
SCHEDULED: <2022-08-19 Fri>
*** KILL Download the data with nextflow: hg38
CLOSED: [2023-06-12 Mon 23:29]
**** DONE Reference genome
**** DONE dbSNP
**** DONE VEP 20G
CLO

Replacement in projects/bisonex.org at line 7 [26.35]

B:BD[30.10850] → [30.10850:12706]

B:BD[30.12706] → [25.35842:37612]

B:BD[25.37612] → [31.8257:12823]

| G         | A         | Conflicting_interpretations_of_pathogenicity |
| NC_000020.11 | 63414925 |      129337 | G         | C         | Benign                                       |
| NC_000020.11 | 63414925 |      851545 | GG        | CA        | Uncertain_significance                       |
| ------       |          |             |           |           |                                              |
So there are several problems:
1. isec should work at least on
| NC_000020.11 | 25390747 | rs373200654 | G         | C         |                                              |
| NC_000020.11 | 25390747 |      338000 | G         | C         | Conflicting_interpretations_of_pathogenicity |
We test on this line only
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools filter -i 'POS=25390747' clinvar_chr20.vcf.gz -o clinvar_test.vcf.gz
bcftools filter -i 'POS=25390747' dbSNP_common_chr20.vcf.gz -o dbSNP_test.vcf.gz
#+end_src
The line is indeed found in the intersection...
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools filter -i 'POS=25390747' clinvar_chr20.vcf.gz -o clinvar_test.vcf.gz
bcftools index dbSNP_test.vcf.gz
bcftools index clinvar_test.vcf.gz
bcftools isec dbSNP_test.vcf.gz clinvar_test.vcf.gz -p test
#+end_src
#+RESULTS:
2. isec does not seem to work when there are multiple ALT alleles
| NC_000020.11 | 32800145 | rs2424926 | C | G,T |                                              |
| NC_000020.11 | 32800145 |    338173 | C | G   | Benign                                       |
| NC_000020.11 | 32800145 |    338174 | C | T   | Conflicting_interpretations_of_pathogenicity |
|              |          |           |   |     |                                              |
3. if there are several variants at one position, we must check that all of them are non-pathogenic.
   Alexis's version handles this correctly
| NC_000020.11 | 3234173 | rs3827075 | T         | A,C,G     |                                              |
| NC_000020.11 | 3234173 |    262001 | T         | G         | Conflicting_interpretations_of_pathogenicity |
| NC_000020.11 | 3234173 |   1072511 | T         | TGGCGAAGC | Pathogenic                                   |
| NC_000020.11 | 3234173 |    208613 | TGGCGAAGC | G         | Pathogenic                                   |
| NC_000020.11 | 3234173 |      1312 | TGGCGAAGC | T         | Pathogenic                                   |
****** DONE Check whether isec handles multiallelic sites (chr20): no, could not get it to work
CLOSED: [2022-11-27 Sun 00:37]
******* DONE chr20, using a ClinVar pathogenic variant that is also in dbSNP
CLOSED: [2022-11-27 Sun 00:37]
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools filter dbSNP_common_chr20.vcf.gz -i 'POS=10652589' -o test_dbsnp.vcf.gz
bcftools filter clinvar_chr20.vcf.gz -i 'POS=10652589' -o test_clinvar.vcf.gz
bcftools index test_dbsnp.vcf.gz
bcftools index test_clinvar.vcf.gz
#+end_src
#+RESULTS:
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools isec test_dbsnp.vcf.gz test_clinvar.vcf.gz -p tmp
grep '^[^#]' tmp/0002.vcf
grep '^[^#]' tmp/0003.vcf
#+end_src
#+RESULTS:
Does not work even for biallelic sites.
Tested by modifying test_dbsnp!
Works with one variant per line
****** DONE isec after splitting multiallelic sites: no
CLOSED: [2022-11-27 Sun 00:37]
******* DONE Simple example ok
CLOSED: [2022-11-27 Sun 00:34]
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools filter -i 'POS=10652589' dbSNP_common_chr20.vcf.gz -o dbsnp_mwi.vcf.gz
bcftools filter -i 'POS=10652589' clinvar_chr20.vcf.gz -o clinvar_mwi.vcf.gz
bcftools index -f dbsnp_mwi.vcf.gz
bcftools index -f clinvar_mwi.vcf.gz
bcftools isec dbsnp_mwi.vcf.gz clinvar_mwi.vcf.gz -n=2
#+end_src
#+RESULTS:
Does not work even for biallelic sites.
Chr 20
Using the files from the previous test
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools norm -m -any dbsnp_mwi.vcf.gz -o dbsnp_mwi_norm.vcf.gz
bcftools index dbsnp_mwi_norm.vcf.gz
bcftools isec dbsnp_mwi_norm.vcf.gz clinvar_mwi.vcf.gz -n=2
#+end_src
#+RESULTS:
| NC_000020.11 | 10652589 | G | A | 11 |
| NC_000020.11 | 10652589 | G | C | 11 |
******* TODO On dbSNP chr20: no
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools norm -m -any dbSNP_common_chr20.vcf.gz -o dbSNP_common_chr20_norm.vcf.gz
#+end_src
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools isec -i 'INFO/CLNSIG="Pathogenic"' dbSNP_common_chr20_norm.vcf.gz clinvar_chr20.vcf.gz -p tmp
#+end_src
#+RESULTS:
***** DONE Trying bedtools intersect
#+begin_src sh
bedtools intersect -a  dbSNP_common.vcf.gz -b clinvar.vcf.gz
#+end_src
$ wc -l intersect.vcf
220206 intersect.vcf
** TODO Dependencies with Nix
*** DONE GATK
CLOSED: [2022-10-21 Fri 21:59]
*** WAIT BioDBHTS
Contribute a pull request
*** DONE BioExtAlign
CLOSED: [2022-10-22 Sat 00:38]
*** WAIT BioBigFile
Check again whether the latest kent version can be used
Contribute a pull request
*** HOLD rtg-tools
Convert clinvar NC
*** DONE simuscop
CLOSED: [2022-12-30 Fri 22:31]
*** DONE Spip
CLOSED: [2022-12-04 Sun 12:49]
No pull request
*** DONE R + packages
CLOSED: [2022-11-19 Sat 21:05]
*** TODO hap.py
https://github.com/Illumina/hap.py
**** DONE Version without rtgtools, with python 3
CLOSED: [2023-02-02 Thu 22:15]
Procedure for testing
#+begin_src
nix develop .#hap-py
genericBuild
#+end_src
1. Remove the call to make_dependencies in CMakeLists.txt: everything can be installed with nix
2. Patch Roc.cpp to get numeric_limits (error: 'numeric_limits' is not a member of 'std')
3. Add link flags (trial and error)
set(ZLIB_LIBRARIES -lz -lbz2 -lcurl -lcrypto -llzma)
4. Change the print statements to print() in the python code and remove a few imports
[nix-shell:~/source]$ sed -i.orig 's/print "\(.*\)"/print("\1")/' src/python/*.py
**** DONE Serialize json to write the output data
CLOSED: [2023-02-17 Fri 19:25]
**** DONE Test on the example
CLOSED: [2023-02-04 Sat 00:25]
#+begin_src sh
cd hap.py
../result/bin/hap.py example/happy/PG_NA12878_chr21.vcf.gz example/happy/NA12878_chr21.vcf.gz -f example/happy/PG_Conf_chr21.bed.gz -o test -r example/chr21.fa
#+end_src
#+RESULTS:
| Type  | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score |
| INDEL | ALL    |        8937 |     7839 |     1098 |       11812 |      343 |      3520 |    45 |   283 |      0.877140 |         0.958635 |       0.298002 |        0.916079 |
| INDEL | PASS   |        8937 |     7550 |     1387 |        9971 |      283 |      1964 |    30 |   242 |      0.844803 |         0.964656 |       0.196971 |        0.900760 |
| SNP   | ALL    |       52494 |    52125 |      369 |       90092 |      582 |     37348 |   107 |   354 |      0.992971 |         0.988966 |       0.414554 |        0.990964 |
| SNP   | PASS   |       52494 |    46920 |     5574 |       48078 |      143 |       992 |     8 |    97 |      0.893816 |         0.996963 |       0.020633 |        0.942576 |
**** KILL Version with rtg-tools
CLOSED: [2023-07-30 Sun 14:38]
**** HOLD Get the tests working
***** HOLD Attempt 2: from nix develop:
#+begin_src
nix develop .#hap-py
genericBuild
#+end_src
Initially run by hand, but run_tests can now be used
#+begin_src
HCDIR=bin/ ../src/sh/run_tests.sh
#+end_src
- [X] test boost
- [X] multimerge
- [X] hapenum
- [X] fp accuracy
- [X] faulty variant
- [ ] leftshift fails
- [X] other vcf
- [X] chr prefix
- [X] gvcf
- [X] decomp
- [X] contig length
- [X] integration test
- [ ] scmp fails on the type
- [X] giab
- [X] performance
- [ ] quantify fails on the type
- [ ] stratified fails on the results!
- [X] pg counting
- [ ] sompy: cannot find Strelka in somatic
phases="buildPhase checkPhase installPhase fixupPhase" genericBuild
#+end_src
**** KILL Reproduce the precisionchallenge performance: careful with HG002 and HG001!
CLOSED: [2023-04-01 Sa

[30.10850]

[31.12823]

| G         | A         | Conflicting_interpretations_of_pathogenicity |
| NC_000020.11 | 63414925 |      129337 | G         | C         | Benign                                       |
| NC_000020.11 | 63414925 |      851545 | GG        | CA        | Uncertain_significance                       |
| ------       |          |             |           |           |                                              |
So there are several problems:
1. isec should work at least on
| NC_000020.11 | 25390747 | rs373200654 | G         | C         |                                              |
| NC_000020.11 | 25390747 |      338000 | G         | C         | Conflicting_interpretations_of_pathogenicity |
We test on this line only
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools filter -i 'POS=25390747' clinvar_chr20.vcf.gz -o clinvar_test.vcf.gz
bcftools filter -i 'POS=25390747' dbSNP_common_chr20.vcf.gz -o dbSNP_test.vcf.gz
#+end_src
The line is indeed found in the intersection...
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools filter -i 'POS=25390747' clinvar_chr20.vcf.gz -o clinvar_test.vcf.gz
bcftools index dbSNP_test.vcf.gz
bcftools index clinvar_test.vcf.gz
bcftools isec dbSNP_test.vcf.gz clinvar_test.vcf.gz -p test
#+end_src
#+RESULTS:
2. isec does not seem to work when there are multiple ALT alleles
| NC_000020.11 | 32800145 | rs2424926 | C | G,T |                                              |
| NC_000020.11 | 32800145 |    338173 | C | G   | Benign                                       |
| NC_000020.11 | 32800145 |    338174 | C | T   | Conflicting_interpretations_of_pathogenicity |
|              |          |           |   |     |                                              |
3. if there are several variants at one position, we must check that all of them are non-pathogenic.
   Alexis's version handles this correctly
| NC_000020.11 | 3234173 | rs3827075 | T         | A,C,G     |                                              |
| NC_000020.11 | 3234173 |    262001 | T         | G         | Conflicting_interpretations_of_pathogenicity |
| NC_000020.11 | 3234173 |   1072511 | T         | TGGCGAAGC | Pathogenic                                   |
| NC_000020.11 | 3234173 |    208613 | TGGCGAAGC | G         | Pathogenic                                   |
| NC_000020.11 | 3234173 |      1312 | TGGCGAAGC | T         | Pathogenic                                   |
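Problems 2 and 3 come down to comparing alleles one by one rather than positions. A minimal Python sketch of that idea, using rows from the tables of this section, is given here. It is not what bcftools does internally: it only splits the ALT field on commas, performs no left-alignment, and the function names are hypothetical.

#+begin_src python
def split_multiallelic(records):
    """Yield one (chrom, pos, id, ref, alt) tuple per ALT allele."""
    for chrom, pos, var_id, ref, alts in records:
        for alt in alts.split(","):
            yield chrom, pos, var_id, ref, alt


def clinvar_in_dbsnp(dbsnp, clinvar):
    """ClinVar records whose (chrom, pos, ref, alt) is also present in dbSNP."""
    keys = {(c, p, r, a) for c, p, _id, r, a in split_multiallelic(dbsnp)}
    return [rec for rec in clinvar if (rec[0], rec[1], rec[3], rec[4]) in keys]


dbsnp = [
    ("NC_000020.11", 32800145, "rs2424926", "C", "G,T"),
    ("NC_000020.11", 3234173, "rs3827075", "T", "A,C,G"),
]
clinvar = [
    ("NC_000020.11", 32800145, "338173", "C", "G", "Benign"),
    ("NC_000020.11", 32800145, "338174", "C", "T",
     "Conflicting_interpretations_of_pathogenicity"),
    ("NC_000020.11", 3234173, "262001", "T", "G",
     "Conflicting_interpretations_of_pathogenicity"),
    ("NC_000020.11", 3234173, "1072511", "T", "TGGCGAAGC", "Pathogenic"),
]
shared = clinvar_in_dbsnp(dbsnp, clinvar)
for rec in shared:
    print(rec[2], rec[5])
#+end_src

The pathogenic insertion at position 3234173 is not reported as shared, which is the behaviour wanted for problem 3: matching on position alone would have treated it as a common dbSNP variant.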
****** DONE Check whether isec handles multiallelic sites (chr20): no, could not get it to work
CLOSED: [2022-11-27 Sun 00:37]
******* DONE chr20, using a ClinVar pathogenic variant that is also in dbSNP
CLOSED: [2022-11-27 Sun 00:37]
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools filter dbSNP_common_chr20.vcf.gz -i 'POS=10652589' -o test_dbsnp.vcf.gz
bcftools filter clinvar_chr20.vcf.gz -i 'POS=10652589' -o test_clinvar.vcf.gz
bcftools index test_dbsnp.vcf.gz
bcftools index test_clinvar.vcf.gz
#+end_src
#+RESULTS:
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools isec test_dbsnp.vcf.gz test_clinvar.vcf.gz -p tmp
grep '^[^#]' tmp/0002.vcf
grep '^[^#]' tmp/0003.vcf
#+end_src
#+RESULTS:
Does not work even for biallelic sites.
Tested by modifying test_dbsnp!
Works with one variant per line
****** DONE isec after splitting multiallelic sites: no
CLOSED: [2022-11-27 Sun 00:37]
******* DONE Simple example ok
CLOSED: [2022-11-27 Sun 00:34]
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools filter -i 'POS=10652589' dbSNP_common_chr20.vcf.gz -o dbsnp_mwi.vcf.gz
bcftools filter -i 'POS=10652589' clinvar_chr20.vcf.gz -o clinvar_mwi.vcf.gz
bcftools index -f dbsnp_mwi.vcf.gz
bcftools index -f clinvar_mwi.vcf.gz
bcftools isec dbsnp_mwi.vcf.gz clinvar_mwi.vcf.gz -n=2
#+end_src
#+RESULTS:
Does not work even for biallelic sites.
Chr 20
Using the files from the previous test
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools norm -m -any dbsnp_mwi.vcf.gz -o dbsnp_mwi_norm.vcf.gz
bcftools index dbsnp_mwi_norm.vcf.gz
bcftools isec dbsnp_mwi_norm.vcf.gz clinvar_mwi.vcf.gz -n=2
#+end_src
#+RESULTS:
| NC_000020.11 | 10652589 | G | A | 11 |
| NC_000020.11 | 10652589 | G | C | 11 |
******* DONE On dbSNP chr20: no
CLOSED: [2023-10-07 Sat 17:57]
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools norm -m -any dbSNP_common_chr20.vcf.gz -o dbSNP_common_chr20_norm.vcf.gz
#+end_src
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools isec -i 'INFO/CLNSIG="Pathogenic"' dbSNP_common_chr20_norm.vcf.gz clinvar_chr20.vcf.gz -p tmp
#+end_src
#+RESULTS:
***** DONE Trying bedtools intersect
#+begin_src sh
bedtools intersect -a  dbSNP_common.vcf.gz -b clinvar.vcf.gz
#+end_src
$ wc -l intersect.vcf
220206 intersect.vcf
** TODO Dependencies with Nix
*** DONE GATK
CLOSED: [2022-10-21 Fri 21:59]
*** WAIT BioDBHTS
Contribute a pull request
*** DONE BioExtAlign
CLOSED: [2022-10-22 Sat 00:38]
*** WAIT BioBigFile
Check again whether the latest kent version can be used
Contribute a pull request
*** HOLD rtg-tools
Convert clinvar NC
*** DONE simuscop
CLOSED: [2022-12-30 Fri 22:31]
*** DONE Spip
CLOSED: [2022-12-04 Sun 12:49]
No pull request
*** DONE R + packages
CLOSED: [2022-11-19 Sat 21:05]
*** TODO hap.py
https://github.com/Illumina/hap.py
**** DONE Version without rtgtools, with python 3
CLOSED: [2023-02-02 Thu 22:15]
Procedure for testing
#+begin_src
nix develop .#hap-py
genericBuild
#+end_src
1. Remove the call to make_dependencies in CMakeLists.txt: everything can be installed with nix
2. Patch Roc.cpp to get numeric_limits (error: 'numeric_limits' is not a member of 'std')
3. Add link flags (trial and error)
set(ZLIB_LIBRARIES -lz -lbz2 -lcurl -lcrypto -llzma)
4. Change the print statements to print() in the python code and remove a few imports
[nix-shell:~/source]$ sed -i.orig 's/print "\(.*\)"/print("\1")/' src/python/*.py
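The print conversion of step 4 can also be sketched in Python instead of sed. The regular expression is an illustrative assumption: it only handles a print statement followed by a single double-quoted string and keeps the quotes inside the call. =to_print_call= is a hypothetical helper, not part of hap.py.

#+begin_src python
import re

# Python 2 statement of the form: print "some text"
PRINT_STMT = re.compile(r'print "(.*)"')


def to_print_call(line):
    """Rewrite `print "text"` as `print("text")`; other lines are returned unchanged."""
    return PRINT_STMT.sub(r'print("\1")', line)


print(to_print_call('    print "Writing output"'))
#+end_src

Lines that already use the call syntax do not match the pattern, so running the conversion twice leaves the file unchanged.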
**** DONE Serialize JSON to write the output data
CLOSED: [2023-02-17 Fri 19:25]
**** DONE Test on the example
CLOSED: [2023-02-04 Sat 00:25]
#+begin_src sh
$ cd hap.py
$ ../result/bin/hap.py example/happy/PG_NA12878_chr21.vcf.gz       example/happy/NA12878_chr21.vcf.gz       -f example/happy/PG_Conf_chr21.bed.gz       -o test -r example/chr21.fa
#+end_src
#+RESULTS:
| Type  | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score |
| INDEL | ALL    |        8937 |     7839 |     1098 |       11812 |      343 |      3520 |    45 |   283 |      0.877140 |         0.958635 |       0.298002 |        0.916079 |
| INDEL | PASS   |        8937 |     7550 |     1387 |        9971 |      283 |      1964 |    30 |   242 |      0.844803 |         0.964656 |       0.196971 |        0.900760 |
| SNP   | ALL    |       52494 |    52125 |      369 |       90092 |      582 |     37348 |   107 |   354 |      0.992971 |         0.988966 |       0.414554 |        0.990964 |
| SNP   | PASS   |       52494 |    46920 |     5574 |       48078 |      143 |       992 |     8 |    97 |      0.893816 |         0.996963 |       0.020633 |        0.942576 |
**** KILL Version with rtg-tools
CLOSED: [2023-07-30 Sun 14:38]
**** HOLD Get the tests working
***** HOLD Attempt 2: from nix develop
#+begin_src
nix develop .#hap-py
genericBuild
#+end_src
Initially run by hand, but run_tests can now be used
#+begin_src
HCDIR=bin/ ../src/sh/run_tests.sh
#+end_src
- [X] test boost
- [X] multimerge
- [X] hapenum
- [X] fp accuracy
- [X] faulty variant
- leftshift fails
- [X] other vcf
- [X] chr prefix
- [X] gvcf
- [X] decomp
- [X] contig length
- [X] integration test
- [ ] scmp fails on the type
- [X] giab
- [X] performance
- [ ] quantify fails on the type
- [ ] stratified: the results do not match!
- [X] pg counting
- [ ] sompy: cannot find Strelka in somatic
#+begin_src
phases="buildPhase checkPhase installPhase fixupPhase" genericBuild
#+end_src
**** KILL Reproduce the precisionchallenge performance: watch out for HG002 and HG001!
CLOSED: [2023-04-01 Sa

Replacement in projects/bisonex.org at line 23 [26.35]

B:BD[10.8775] → [10.8775:8806]

∅:D[10.8806] → [32.8778:8849]

B:BD[32.8778] → [32.8778:8849]

∅:D[32.8849] → [33.16413:16486]

B:BD[33.16413] → [33.16413:16486]

∅:D[33.16486] → [29.32827:34001]

B:BD[29.32827] → [29.32827:34001]

∅:D[29.34001] → [34.29:6872]

B:BD[35.8393] → [34.29:6872]

 to load libgkl_utils.so from n
ative/libgkl_utils.so (/Work/Users/apraga/bisonex/out/NA12878_NIST7035/
preprocessing/applybqsr/libgkl_utils821485189051585397.so: libgomp.so.1: 
cannot open shared object file: No such file or directory)
17:28:00.733 WARN  IntelPairHmm - Intel GKL Utils not loaded
17:28:00.733 WARN  PairHMM - ***WARNING: Machine does not have the AVX instruction set support needed for the accelerated AVX PairHmm. Falling back to the MUCH slower LOGLESS_CACHING implementation!
17:28:00.763 INFO  ProgressMeter - Starting traversal
#+end_quote
libgomp.so is provided by gcc, so the module has to be loaded
 module load gcc@11.3.0/gcc-12.1.0
** KILL Use subworkflows
CLOSED: [2023-04-02 Sun 18:08]
Our version is more flexible
*** KILL Alignment
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
*** KILL Vep
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
vcf_annotate_ensemblvep
** TODO Annotation with nextflow :annotation:
*** KILL VEP: --gene-phenotype?
CLOSED: [2023-04-18 mar. 18:32]
Discussed with Alexis: the databases are out of date
https://www.ensembl.org/info/genome/variation/phenotype/sources_phenotype_documentation.html
*** DONE VEP plugin
CLOSED: [2023-04-18 mar. 18:32]
Clone the git repository containing the plugins
Then use --dir_plugins
*** HOLD Use the code from Ale
xis
*** TODO New version with VEP
Example with --custom
https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html
**** DONE Add spliceAI
CLOSED: [2023-05-18 Thu 11:02] SCHEDULED: <2023-04-30 Sun>
VEP plugin
***** DONE Download the data
CLOSED: [2023-05-11 Thu 19:01]
Hard to automate: the link is temporary...
***** DONE Plugin
CLOSED: [2023-05-11 Thu 20:16]
***** DONE Split the score into several columns
CLOSED: [2023-05-11 Thu 20:16]
Test with this file, which yields one line with an annotation and one without
#CHROM	POS	ID	REF	ALT
1	9091	.	A	C
1	69091	.	A	C
and
#+begin_src sh
rm -f postvep.tsv* && vep -i testspliceai.vcf.gz -o postvep.tsv --tab  --dir 109 --merged --pick --use_given_ref   --offline  --plugin SpliceAI,snv=spliceai_scores.raw.snv.hg38.vcf.gz,indel=spliceai_scores.raw.indel.hg38.vcf.gz
#+end_src
#+begin_src
$ bgzip postvep.tsv
$ python spliceai.py
$ cat postvep2.tsv
,variation,Location,Allele,Gene,Feature,Feature_type,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,Codons,Existing_variation,IMPACT,DISTANCE,STRAND,FLAGS,REFSEQ_MATCH,SOURCE,REFSEQ_OFFSET,SpliceAI_AG,SpliceAI_AL,SpliceAI_DG,SpliceAI_DL
0,1_9091_A/C,1:9091,C,ENSG00000290825,ENST00000456328,Transcript,upstream_gene_variant,-,-,-,-,-,-,MODIFIER,2778,1,-,-,Ensembl,-,,,,
1,1_69091_A/C,1:69091,C,ENSG00000186092,ENST00000641515,Transcript,missense_variant,124,64,22,M/L,Atg/Ctg,-,MODERATE,-,1,-,-,Ensembl,-,0.01,0.00,0.00,0.01
#+end_src
Test
cp work/bf/437ae511958509e43072f032f4d495/small.tab.gz tests/vep-spip.tab.gz
cp work/d5/3b1244b5ae83d54409ee0d456e8c55/small_cadd.tab.gz tests/vep-cadd-splice.tab.gz
**** STRT Nix package for spliceAI
We use the TensorFlow provided by Nix (spliceai branch)
LD_PRELOAD=/lib64/libcuda.so is required at run time
***** DONE Non-optimized CPU version
CLOSED: [2023-09-23 Sat 21:34]
****** DONE Check the hg19 and hg38 annotations
CLOSED: [2023-09-23 Sat 18:49]
****** DONE Home-made T2T annotation
CLOSED: [2023-09-23 Sat 21:33] SCHEDULED: <2023-09-23 Sat>
***** DONE Optimized CPU version
CLOSED: [2023-09-23 Sat 21:34]
Enable the flag in the Nix package
***** DONE CPU version: chr20 test
CLOSED: [2023-09-23 Sat 22:25] SCHEDULED: <2023-09-23 Sat>
10 min?
***** DONE CPU version: full NA12878 Sanger run, killed because it took too long
CLOSED: [2023-09-24 Sun 08:22] SCHEDULED: <2023-09-23 Sat>
10 min?
***** DONE GPU with the version installed on the mésocentre cluster: email sent
CLOSED: [2023-09-26 Tue 11:49]
#+begin_src sh
module unload nix/2.11.0
module load anaconda3@2021.05/gcc-12.1.0
module load deep/tensorflow-gpu
pip install spliceai
#+end_src
Then test on the gpu queue
#+begin_src sh
srun -p gpu -t 4:00:00 --gres=gpu:1 --pty bash
cd /Work/Users/apraga/bisonex/tests/spliceai/
module load deep/tensorflow-gpu
module unload nix/2.11.0
time spliceai -I NA12878-sanger-chr20-T2T.vep.vcf.gz -O output-20-2.vcf -R /Work/Groups/bisonex/data/fasta/chm13v2.0/chm13v2.0.fa -A ~/t2t.txt
#+end_src
Failure: DNN library not found...
#+begin_quote
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-24 09:37:45.892545: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-09-24 09:37:47.759421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 38188 MB memory:  -> device: 0, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:21:00.0, compute capability: 8.0
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
2023-09-24 09:37:54.143021: I tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:637] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2023-09-24 09:37:54.217160: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:429] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2023-09-24 09:37:54.217220: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:438] Possibly insufficient driver version: 510.108.3
2023-09-24 09:37:54.217245: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at conv_ops.cc:1068 : UNIMPLEMENTED: DNN library is not found.
2023-09-24 09:37:54.217262: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:GPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): UNIMPLEMENTED: DNN library is not found.
         [[{{node model_1/conv1d_3/Conv1D}}]]
Traceback (most recent call last):
  File "/Home/Users/apraga/.local/bin/spliceai", line 8, in <module>
    sys.exit(main())
  File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/__main__.py", line 72, in main
    scores = get_delta_scores(record, ann, args.D, args.M)
  File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/utils.py", line 159, in get_delta_scores
    y_ref = np.mean([ann.models[m].predict(x_ref) for m in range(5)], axis=0)
  File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/utils.py", line 159, in <listcomp>
    y_ref = np.mean([ann.models[m].predict(x_ref) for m in range(5)], axis=0)
  File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:
Detected at node 'model_1/conv1d_3/Conv1D' defined at (most recent call last):
    File "/Home/Users/apraga/.local/bin/spliceai", line 8, in <module>
      sys.exit(main())
    File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/__main__.py", line 72, in main
      scores = get_delta_scores(record, ann, args.D, args.M)
    File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/utils.py", line 159, in get_

[10.8775]

[34.6872]

 to load libgkl_utils.so from native/libgkl_utils.so (/Work/Users/apraga/bisonex/out/NA12878_NIST7035/preprocessing/applybqsr/libgkl_utils821485189051585397.so: libgomp.so.1: cannot open shared object file: No such file or directory)
17:28:00.733 WARN  IntelPairHmm - Intel GKL Utils not loaded
17:28:00.733 WARN  PairHMM - ***WARNING: Machine does not have the AVX instruction set support needed for the accelerated AVX PairHmm. Falling back to the MUCH slower LOGLESS_CACHING implementation!
17:28:00.763 INFO  ProgressMeter - Starting traversal
#+end_quote
libgomp.so is provided by gcc, so the module has to be loaded
 module load gcc@11.3.0/gcc-12.1.0
** KILL Use subworkflows
CLOSED: [2023-04-02 Sun 18:08]
Our version is more flexible
*** KILL Alignment
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
*** KILL Vep
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
vcf_annotate_ensemblvep
** TODO Annotation with nextflow :annotation:
*** KILL VEP: --gene-phenotype?
CLOSED: [2023-04-18 mar. 18:32]
Discussed with Alexis: the databases are out of date
https://www.ensembl.org/info/genome/variation/phenotype/sources_phenotype_documentation.html
*** DONE VEP plugin
CLOSED: [2023-04-18 mar. 18:32]
Clone the git repository containing the plugins
Then use --dir_plugins
*** HOLD Use the code from Alexis
*** TODO New version with VEP
Example with --custom
https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html
**** DONE Add spliceAI
CLOSED: [2023-05-18 Thu 11:02] SCHEDULED: <2023-04-30 Sun>
VEP plugin
***** DONE Download the data
CLOSED: [2023-05-11 Thu 19:01]
Hard to automate: the link is temporary...
***** DONE Plugin
CLOSED: [2023-05-11 Thu 20:16]
***** DONE Split the score into several columns
CLOSED: [2023-05-11 Thu 20:16]
Test with this file, which yields one line with an annotation and one without
#CHROM	POS	ID	REF	ALT
1	9091	.	A	C
1	69091	.	A	C
and
#+begin_src sh
rm -f postvep.tsv* && vep -i testspliceai.vcf.gz -o postvep.tsv --tab  --dir 109 --merged --pick --use_given_ref   --offline  --plugin SpliceAI,snv=spliceai_scores.raw.snv.hg38.vcf.gz,indel=spliceai_scores.raw.indel.hg38.vcf.gz
#+end_src
#+begin_src
$ bgzip postvep.tsv
$ python spliceai.py
$ cat postvep2.tsv
,variation,Location,Allele,Gene,Feature,Feature_type,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,Codons,Existing_variation,IMPACT,DISTANCE,STRAND,FLAGS,REFSEQ_MATCH,SOURCE,REFSEQ_OFFSET,SpliceAI_AG,SpliceAI_AL,SpliceAI_DG,SpliceAI_DL
0,1_9091_A/C,1:9091,C,ENSG00000290825,ENST00000456328,Transcript,upstream_gene_variant,-,-,-,-,-,-,MODIFIER,2778,1,-,-,Ensembl,-,,,,
1,1_69091_A/C,1:69091,C,ENSG00000186092,ENST00000641515,Transcript,missense_variant,124,64,22,M/L,Atg/Ctg,-,MODERATE,-,1,-,-,Ensembl,-,0.01,0.00,0.00,0.01
#+end_src
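The spliceai.py script itself is not shown in these notes. A minimal sketch of the column split could look like the following, under two assumptions: the VEP output carries a single pipe-separated field named `SpliceAI_pred` (gene symbol first, then the four delta scores), and `-` or an empty value marks a variant without annotation. The output column names are taken from the postvep2.tsv header above; the function names are hypothetical.

```python
# Output columns, named as in the postvep2.tsv header.
SCORE_COLUMNS = ["SpliceAI_AG", "SpliceAI_AL", "SpliceAI_DG", "SpliceAI_DL"]


def split_spliceai(pred: str) -> dict:
    """Split 'SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|...' into one value per score column.

    A missing annotation ('-' or empty string) gives empty strings.
    """
    if pred in ("", "-"):
        return {name: "" for name in SCORE_COLUMNS}
    fields = pred.split("|")
    # fields[0] is the gene symbol; fields[1:5] are the four delta scores.
    return dict(zip(SCORE_COLUMNS, fields[1:5]))


def split_rows(rows, column="SpliceAI_pred"):
    """Replace the combined column with the four score columns in each row dict."""
    result = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        new_row.update(split_spliceai(row.get(column, "-")))
        result.append(new_row)
    return result
```

Rows read with `csv.DictReader(handle, delimiter="\t")` could be passed directly to `split_rows`, which keeps the sketch free of third-party dependencies.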
Test
cp work/bf/437ae511958509e43072f032f4d495/small.tab.gz tests/vep-spip.tab.gz
cp work/d5/3b1244b5ae83d54409ee0d456e8c55/small_cadd.tab.gz tests/vep-cadd-splice.tab.gz
**** DONE Nix package for spliceAI
CLOSED: [2023-10-07 Sat 18:00]
We use the TensorFlow provided by Nix (spliceai branch)
LD_PRELOAD=/lib64/libcuda.so is required at run time
***** DONE Non-optimized CPU version
CLOSED: [2023-09-23 Sat 21:34]
****** DONE Check the hg19 and hg38 annotations
CLOSED: [2023-09-23 Sat 18:49]
****** DONE Home-made T2T annotation
CLOSED: [2023-09-23 Sat 21:33] SCHEDULED: <2023-09-23 Sat>
***** DONE Optimized CPU version
CLOSED: [2023-09-23 Sat 21:34]
Enable the flag in the Nix package
***** DONE CPU version: chr20 test
CLOSED: [2023-09-23 Sat 22:25] SCHEDULED: <2023-09-23 Sat>
10 min?
***** DONE CPU version: full NA12878 Sanger run, killed because it took too long
CLOSED: [2023-09-24 Sun 08:22] SCHEDULED: <2023-09-23 Sat>
10 min?
***** DONE GPU with the version installed on the mésocentre cluster: email sent
CLOSED: [2023-09-26 Tue 11:49]
#+begin_src sh
module unload nix/2.11.0
module load anaconda3@2021.05/gcc-12.1.0
module load deep/tensorflow-gpu
pip install spliceai
#+end_src
Then test on the gpu queue
#+begin_src sh
srun -p gpu -t 4:00:00 --gres=gpu:1 --pty bash
cd /Work/Users/apraga/bisonex/tests/spliceai/
module load deep/tensorflow-gpu
module unload nix/2.11.0
time spliceai -I NA12878-sanger-chr20-T2T.vep.vcf.gz -O output-20-2.vcf -R /Work/Groups/bisonex/data/fasta/chm13v2.0/chm13v2.0.fa -A ~/t2t.txt
#+end_src
Failure: DNN library not found...
#+begin_quote
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-24 09:37:45.892545: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-09-24 09:37:47.759421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 38188 MB memory:  -> device: 0, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:21:00.0, compute capability: 8.0
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
2023-09-24 09:37:54.143021: I tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:637] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2023-09-24 09:37:54.217160: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:429] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2023-09-24 09:37:54.217220: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:438] Possibly insufficient driver version: 510.108.3
2023-09-24 09:37:54.217245: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at conv_ops.cc:1068 : UNIMPLEMENTED: DNN library is not found.
2023-09-24 09:37:54.217262: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:GPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): UNIMPLEMENTED: DNN library is not found.
         [[{{node model_1/conv1d_3/Conv1D}}]]
Traceback (most recent call last):
  File "/Home/Users/apraga/.local/bin/spliceai", line 8, in <module>
    sys.exit(main())
  File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/__main__.py", line 72, in main
    scores = get_delta_scores(record, ann, args.D, args.M)
  File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/utils.py", line 159, in get_delta_scores
    y_ref = np.mean([ann.models[m].predict(x_ref) for m in range(5)], axis=0)
  File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/utils.py", line 159, in <listcomp>
    y_ref = np.mean([ann.models[m].predict(x_ref) for m in range(5)], axis=0)
  File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:
Detected at node 'model_1/conv1d_3/Conv1D' defined at (most recent call last):
    File "/Home/Users/apraga/.local/bin/spliceai", line 8, in <module>
      sys.exit(main())
    File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/__main__.py", line 72, in main
      scores = get_delta_scores(record, ann, args.D, args.M)
    File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/utils.py", line 159, in get_

Replacement in projects/bisonex.org at line 26 [26.35]

B:BD[2.8119] → [2.8119:8252]

∅:D[2.8252] → [33.32798:40857]

B:BD[33.32798] → [33.32798:40857]

e_tract_variant&intron_variant&non_coding_transcript_variant
     15 splice_region_variant&synonymous_variant
     14 start_lost
 
    44 stop_gained
      4 stop_gained&frameshift_variant
      2 stop_gained&frameshift_variant&splice_region_variant
      3 stop_gained&nmd_transcript_variant
      3 stop_gained&splice_region_variant
      2 stop_gained&splice_region_variant&nmd_transcript_variant
      2 stop_lost
      1 stop_lost&nmd_transcript_variant
      6 stop_retained_variant
      2 stop_retained_variant&nmd_transcript_variant
     89 synonymous_variant
      2 synonymous_variant&nmd_transcript_variant
      1 transcript_ablation
      1 upstream_gene_variant
Taking all transcripts, 66% missense (total = 22085)
           39 3_prime_UTR_variant
    120 3_prime_UTR_variant&NMD_transcript_variant
     22 5_prime_UTR_variant
      2 5_prime_UTR_variant&NMD_transcript_variant
     94 coding_sequence_variant
     13 coding_sequence_variant&NMD_transcript_variant
    527 downstream_gene_variant
    257 frameshift_variant
     21 frameshift_variant&NMD_transcript_variant
      2 frameshift_variant&splice_donor_region_variant
     20 frameshift_variant&splice_region_variant
      1 frameshift_variant&splice_region_variant&NMD_transcript_variant
      1 incomplete_terminal_codon_variant&coding_sequence_variant
    211 inframe_deletion
     18 inframe_deletion&NMD_transcript_variant
      6 inframe_deletion&splice_region_variant
    242 inframe_insertion
     22 inframe_insertion&NMD_transcript_variant
      4 inframe_insertion&splice_region_variant
    983 intron_variant
    244 intron_variant&NMD_transcript_variant
    358 intron_variant&non_coding_transcript_variant
  14690 missense_variant
   1416 missense_variant&NMD_transcript_variant
      6 missense_variant&splice_donor_5th_base_variant
    374 missense_variant&splice_region_variant
     34 missense_variant&splice_region_variant&NMD_transcript_variant
    383 non_coding_transcript_exon_variant
     53 splice_acceptor_variant
     11 splice_acceptor_variant&NMD_transcript_variant
     11 splice_acceptor_variant&non_coding_transcript_variant
     20 splice_donor_5th_base_variant&intron_variant
      4 splice_donor_5th_base_variant&intron_variant&NMD_transcript_variant
      9 splice_donor_5th_base_variant&intron_variant&non_coding_transcript_variant
     59 splice_donor_region_variant&intron_variant
     11 splice_donor_region_variant&intron_variant&NMD_transcript_variant
     24 splice_donor_region_variant&intron_variant&non_coding_transcript_variant
     79 splice_donor_variant
      6 splice_donor_variant&NMD_transcript_variant
     17 splice_donor_variant&non_coding_transcript_variant
      1 splice_donor_variant&splice_donor_5th_base_variant&3_prime_UTR_variant&intron_variant&NMD_transcript_variant
     21 splice_donor_variant&splice_donor_5th_base_variant&coding_sequence_variant&intron_variant
      3 splice_donor_variant&splice_donor_5th_base_variant&intron_variant
      1 splice_donor_variant&splice_donor_5th_base_variant&non_coding_transcript_exon_variant&intron_variant
    176 splice_polypyrimidine_tract_variant&intron_variant
     27 splice_polypyrimidine_tract_variant&intron_variant&NMD_transcript_variant
     48 splice_polypyrimidine_tract_variant&intron_variant&non_coding_transcript_variant
      1 splice_region_variant&3_prime_UTR_variant
     24 splice_region_variant&3_prime_UTR_variant&NMD_transcript_variant
      9 splice_region_variant&5_prime_UTR_variant
     61 splice_region_variant&intron_variant
     23 splice_region_variant&intron_variant&NMD_transcript_variant
     37 splice_region_variant&intron_variant&non_coding_transcript_variant
     26 splice_region_variant&non_coding_transcript_exon_variant
      5 splice_region_variant&non_coding_transcript_variant
    145 splice_region_variant&splice_polypyrimidine_tract_variant&intron_variant
     27 splice_region_variant&splice_polypyrimidine_tract_variant&intron_variant&NMD_transcript_variant
     41 splice_region_variant&splice_polypyrimidine_tract_variant&intron_variant&non_coding_transcript_variant
     37 splice_region_variant&synonymous_variant
      3 splice_region_variant&synonymous_variant&NMD_transcript_variant
     30 start_lost
      5 start_lost&NMD_transcript_variant
    135 stop_gained
     13 stop_gained&frameshift_variant
      3 stop_gained&frameshift_variant&NMD_transcript_variant
      2 stop_gained&frameshift_variant&splice_region_variant
     14 stop_gained&NMD_transcript_variant
      5 stop_gained&splice_region_variant
      2 stop_gained&splice_region_variant&NMD_transcript_variant
      4 stop_lost
      1 stop_lost&NMD_transcript_variant
      9 stop_retained_variant
      6 stop_retained_variant&NMD_transcript_variant
    311 synonymous_variant
     24 synonymous_variant&NMD_transcript_variant
      1 transcript_ablation
    390 upstream_gene_variant
**** TODO Add LOEUF and pLI
VEP plugin
**** TODO NMD
VEP plugin
**** KILL Add LOEUF
CLOSED: [2023-04-19 mer. 16:32]
VEP plugin
**** DONE Spip
CLOSED: [2023-05-01 Mon 23:07] SCHEDULED: <2023-04-30 Sun>
BED does not seem to work well (a region has to be defined)
VCF: too much information
Note: there are several transcripts but the results are identical, so we remove the duplicates
***** DONE Interpretation, score and confidence interval in separate fields
CLOSED: [2023-05-01 Mon 23:07] SCHEDULED: <2023-04-30 Sun>
Tests:
in tests/
vep -i 63004925-small.vcf -o postvep.vcf --vcf --fasta genomeRef.fna --dir 109 --merged --pick  --offline --custom ../script/spip_annotation.vcf.gz,SPIP,vcf,exact,0,spipInterp,spipScore,spipConfidence
***** DONE Score
CLOSED: [2023-04-22 Sat 15:30]
**** DONE CADD: replace with the VEP plugin
CLOSED: [2023-05-07 Sun 14:45] SCHEDULED: <2023-05-07 Sun>
***** Test
#+begin_src
vep  -i test.vcf  -o lol.vcf --offline --dir  /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz  --dir_plugins ../VEP_plugins/ -v
#+end_src
Test
#+begin_src sh
vep --id "1  230710048 230710048 A/G 1"   --offline --dir  /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz  --hgvsg --plugin pLI --plugin LOEUF -o lol
#+end_src
CSQ=G|missense_variant|MODERATE|AGT|ENSG00000135744|Transcript|ENST00000366667|protein_coding|2/5||||843|776|259|M/T|aTg/aCg|||-1||HGNC|HGNC:333||Ensembl||A|A||1:g.230710048A>G|0.347|-0.277922|
This matches https://www.ensembl.org/Homo_sapiens/Tools/VEP/Results?tl=I7ZsIbrj14P6lD43-9115494
***** DONE Use whole genome
CLOSED: [2023-04-29 Sat 15:46]
***** KILL Rename the chromosomes beforehand ...
CLOSED: [2023-05-01 Mon 09:14] SCHEDULED: <2023-04-30 Sun>
Takes too long!
- Downloading CADD: 4h20
- Renaming the chromosomes for the SNVs: 6h20
- tabix on the SNVs: job killed after 21h....
***** DONE Annotate separately and merge the tables
CLOSED: [2023-05-07 Sun 14:45] SCHEDULED: <2023-05-01 Mon>
NB: CADD could be filtered with tabix to restrict it to our variants
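One possible way to prepare such a filter, sketched here and not taken from the pipeline: build a tab-separated list of CHROM and POS from our VCF, which tabix should accept through its `-R` option. The function names and the optional `rename` mapping (for instance from `chr1` to `1`, if the CADD table uses bare chromosome names) are hypothetical.

```python
def vcf_to_regions(vcf_lines, rename=None):
    """Return sorted, de-duplicated (chrom, pos) pairs from the lines of a VCF.

    `rename` is an optional dict mapping VCF contig names to the names
    used in the table to be filtered; unknown names are kept as they are.
    """
    rename = rename or {}
    regions = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos = line.split("\t")[:2]
        regions.add((rename.get(chrom, chrom), int(pos)))
    return sorted(regions)


def format_regions(regions):
    """Format the pairs as tab-separated CHROM and POS, one per line."""
    return "".join(f"{chrom}\t{pos}\n" for chrom, pos in regions)
```

The text returned by `format_regions` could be written to a file (say `regions.tsv`, a made-up name) and then used as `tabix -R regions.tsv` on the indexed CADD table. Whether this is faster than annotating separately has not been measured here.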
**** DONE clinvar
CLOSED: [2023-04-22 Sat 15:31]
**** KILL Check the HGVS results with mutalyzer
CLOSED: [2023-05-01 Mon 09:26]
**** HOLD Parallelization
***** HOLD Per chromosome with the VEP workflow
https://github.com/Ensembl/ensembl-vep/blob/release/109/nextflow/workflows/run_vep.nf
***** HOLD With the --fork option
**** DONE Use the nf-core version of VEP
CLOSED: [2023-05-13 Sat 18:27] SCHEDULED: <2023-05-07 Sun>
**** DONE OMIM
CLOSED: [2023-08-31 Thu 10:38] SCHEDULED: <2023-08-29 Tue>
**** DONE pLI and LOEUF from gnomAD
CLOSED: [2023-08-3

[2.8119]

[9.497]

e_tract_variant&intron_variant&non_coding_transcript_variant
     15 splice_region_variant&synonymous_variant
     14 start_lost
     44 stop_gained
      4 stop_gained&frameshift_variant
      2 stop_gained&frameshift_variant&splice_region_variant
      3 stop_gained&nmd_transcript_variant
      3 stop_gained&splice_region_variant
      2 stop_gained&splice_region_variant&nmd_transcript_variant
      2 stop_lost
      1 stop_lost&nmd_transcript_variant
      6 stop_retained_variant
      2 stop_retained_variant&nmd_transcript_variant
     89 synonymous_variant
      2 synonymous_variant&nmd_transcript_variant
      1 transcript_ablation
      1 upstream_gene_variant
Taking all transcripts, 66% missense (total = 22085)
           39 3_prime_UTR_variant
    120 3_prime_UTR_variant&NMD_transcript_variant
     22 5_prime_UTR_variant
      2 5_prime_UTR_variant&NMD_transcript_variant
     94 coding_sequence_variant
     13 coding_sequence_variant&NMD_transcript_variant
    527 downstream_gene_variant
    257 frameshift_variant
     21 frameshift_variant&NMD_transcript_variant
      2 frameshift_variant&splice_donor_region_variant
     20 frameshift_variant&splice_region_variant
      1 frameshift_variant&splice_region_variant&NMD_transcript_variant
      1 incomplete_terminal_codon_variant&coding_sequence_variant
    211 inframe_deletion
     18 inframe_deletion&NMD_transcript_variant
      6 inframe_deletion&splice_region_variant
    242 inframe_insertion
     22 inframe_insertion&NMD_transcript_variant
      4 inframe_insertion&splice_region_variant
    983 intron_variant
    244 intron_variant&NMD_transcript_variant
    358 intron_variant&non_coding_transcript_variant
  14690 missense_variant
   1416 missense_variant&NMD_transcript_variant
      6 missense_variant&splice_donor_5th_base_variant
    374 missense_variant&splice_region_variant
     34 missense_variant&splice_region_variant&NMD_transcript_variant
    383 non_coding_transcript_exon_variant
     53 splice_acceptor_variant
     11 splice_acceptor_variant&NMD_transcript_variant
     11 splice_acceptor_variant&non_coding_transcript_variant
     20 splice_donor_5th_base_variant&intron_variant
      4 splice_donor_5th_base_variant&intron_variant&NMD_transcript_variant
      9 splice_donor_5th_base_variant&intron_variant&non_coding_transcript_variant
     59 splice_donor_region_variant&intron_variant
     11 splice_donor_region_variant&intron_variant&NMD_transcript_variant
     24 splice_donor_region_variant&intron_variant&non_coding_transcript_variant
     79 splice_donor_variant
      6 splice_donor_variant&NMD_transcript_variant
     17 splice_donor_variant&non_coding_transcript_variant
      1 splice_donor_variant&splice_donor_5th_base_variant&3_prime_UTR_variant&intron_variant&NMD_transcript_variant
     21 splice_donor_variant&splice_donor_5th_base_variant&coding_sequence_variant&intron_variant
      3 splice_donor_variant&splice_donor_5th_base_variant&intron_variant
      1 splice_donor_variant&splice_donor_5th_base_variant&non_coding_transcript_exon_variant&intron_variant
    176 splice_polypyrimidine_tract_variant&intron_variant
     27 splice_polypyrimidine_tract_variant&intron_variant&NMD_transcript_variant
     48 splice_polypyrimidine_tract_variant&intron_variant&non_coding_transcript_variant
      1 splice_region_variant&3_prime_UTR_variant
     24 splice_region_variant&3_prime_UTR_variant&NMD_transcript_variant
      9 splice_region_variant&5_prime_UTR_variant
     61 splice_region_variant&intron_variant
     23 splice_region_variant&intron_variant&NMD_transcript_variant
     37 splice_region_variant&intron_variant&non_coding_transcript_variant
     26 splice_region_variant&non_coding_transcript_exon_variant
      5 splice_region_variant&non_coding_transcript_variant
    145 splice_region_variant&splice_polypyrimidine_tract_variant&intron_variant
     27 splice_region_variant&splice_polypyrimidine_tract_variant&intron_variant&NMD_transcript_variant
     41 splice_region_variant&splice_polypyrimidine_tract_variant&intron_variant&non_coding_transcript_variant
     37 splice_region_variant&synonymous_variant
      3 splice_region_variant&synonymous_variant&NMD_transcript_variant
     30 start_lost
      5 start_lost&NMD_transcript_variant
    135 stop_gained
     13 stop_gained&frameshift_variant
      3 stop_gained&frameshift_variant&NMD_transcript_variant
      2 stop_gained&frameshift_variant&splice_region_variant
     14 stop_gained&NMD_transcript_variant
      5 stop_gained&splice_region_variant
      2 stop_gained&splice_region_variant&NMD_transcript_variant
      4 stop_lost
      1 stop_lost&NMD_transcript_variant
      9 stop_retained_variant
      6 stop_retained_variant&NMD_transcript_variant
    311 synonymous_variant
     24 synonymous_variant&NMD_transcript_variant
      1 transcript_ablation
    390 upstream_gene_variant
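The 66% figure corresponds to the exact `missense_variant` class in the listing above: 14690 / 22085 ≈ 0.665. A small sketch for recomputing such fractions from a listing in `<count> <consequence>` form (the layout produced by `sort | uniq -c`, which is assumed here) is given below; the function names are hypothetical.

```python
def parse_counts(text: str) -> dict:
    """Parse lines of the form '<count> <consequence>' into a dict of counts."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        counts[parts[1]] = counts.get(parts[1], 0) + int(parts[0])
    return counts


def fraction_with(counts: dict, term: str, exact: bool = True) -> float:
    """Fraction of all counted variants carrying `term`.

    exact=True counts only the consequence equal to `term`;
    exact=False also counts combined consequences such as 'a&b' that include it.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    if exact:
        hits = counts.get(term, 0)
    else:
        hits = sum(n for name, n in counts.items() if term in name.split("&"))
    return hits / total
```

With `exact=False`, combined consequences such as `missense_variant&NMD_transcript_variant` are included as well, which gives a larger fraction than the 66% quoted above.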
**** KILL Add LOEUF and pLI
CLOSED: [2023-10-07 Sat 17:57]
VEP plugin
**** DONE NMD
CLOSED: [2023-10-07 Sat 17:57]
VEP plugin
**** KILL Add LOEUF
CLOSED: [2023-04-19 mer. 16:32]
VEP plugin
**** DONE Spip
CLOSED: [2023-05-01 Mon 23:07] SCHEDULED: <2023-04-30 Sun>
BED does not seem to work well (a region has to be defined)
VCF: too much information
Note: there are several transcripts but the results are identical, so we remove the duplicates
***** DONE Interpretation, score and confidence interval in separate fields
CLOSED: [2023-05-01 Mon 23:07] SCHEDULED: <2023-04-30 Sun>
Tests:
in tests/
vep -i 63004925-small.vcf -o postvep.vcf --vcf --fasta genomeRef.fna --dir 109 --merged --pick  --offline --custom ../script/spip_annotation.vcf.gz,SPIP,vcf,exact,0,spipInterp,spipScore,spipConfidence
***** DONE Score
CLOSED: [2023-04-22 Sat 15:30]
**** DONE CADD: replace with the VEP plugin
CLOSED: [2023-05-07 Sun 14:45] SCHEDULED: <2023-05-07 Sun>
***** Test
#+begin_src
vep  -i test.vcf  -o lol.vcf --offline --dir  /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz  --dir_plugins ../VEP_plugins/ -v
#+end_src
Test
#+begin_src sh
vep --id "1  230710048 230710048 A/G 1"   --offline --dir  /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz  --hgvsg --plugin pLI --plugin LOEUF -o lol
#+end_src
CSQ=G|missense_variant|MODERATE|AGT|ENSG00000135744|Transcript|ENST00000366667|protein_coding|2/5||||843|776|259|M/T|aTg/aCg|||-1||HGNC|HGNC:333||Ensembl||A|A||1:g.230710048A>G|0.347|-0.277922|
Matches https://www.ensembl.org/Homo_sapiens/Tools/VEP/Results?tl=I7ZsIbrj14P6lD43-9115494
***** DONE Use whole genome
CLOSED: [2023-04-29 Sat 15:46]
***** KILL Rename the chromosomes beforehand ...
CLOSED: [2023-05-01 Mon 09:14] SCHEDULED: <2023-04-30 Sun>
Too slow!
- Downloading CADD: 4h20
- renaming the chromosomes for the SNVs: 6h20
- tabix on the SNVs: job killed after 21h....
***** DONE annotate separately and merge the tables
CLOSED: [2023-05-07 Sun 14:45] SCHEDULED: <2023-05-01 Mon>
NB: CADD could be filtered with tabix to restrict it to our variants
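A minimal sketch of that idea, not something that was run here: build a regions file from our VCF and pass it to =tabix -R=, which accepts tab-delimited CHROM/POS lines (1-based). The function name, the =strip_chr_prefix= switch and the example records are assumptions; whether the CADD file uses "1" or "chr1" has to be checked, and indels would need more than a single position.

#+begin_src python
def vcf_to_tabix_regions(vcf_lines, strip_chr_prefix=True):
    """Return sorted, de-duplicated 'CHROM<TAB>POS' lines for `tabix -R`."""
    regions = set()
    for line in vcf_lines:
        # Skip blank lines and VCF header lines
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t")[:2]
        # Assumption: the CADD table names chromosomes without a "chr" prefix
        if strip_chr_prefix and chrom.startswith("chr"):
            chrom = chrom[3:]
        regions.add((chrom, int(pos)))
    return [f"{chrom}\t{pos}" for chrom, pos in sorted(regions)]


# Hypothetical records, only to show the output format
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t230710048\t.\tA\tG",
    "chr1\t230710048\t.\tA\tG",
    "chr19\t1619351\t.\tC\tT",
]
print("\n".join(vcf_to_tabix_regions(example)))
#+end_src

The resulting file could then be used as =tabix -h -R regions.tsv <CADD file>= to extract only the rows of interest before annotation.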
**** DONE clinvar
CLOSED: [2023-04-22 Sat 15:31]
**** KILL Check HGVS results with mutalyzer
CLOSED: [2023-05-01 Mon 09:26]
**** HOLD Parallelization
***** HOLD per chromosome with the VEP workflow
https://github.com/Ensembl/ensembl-vep/blob/release/109/nextflow/workflows/run_vep.nf
***** HOLD With the --fork option
**** DONE Use the nf-core version of VEP
CLOSED: [2023-05-13 Sat 18:27] SCHEDULED: <2023-05-07 Sun>
**** DONE OMIM
CLOSED: [2023-08-31 Thu 10:38] SCHEDULED: <2023-08-29 Tue>
**** DONE pLI and LOEUF from gnomAD
CLOSED: [2023-08-3

Replacement in projects/bisonex.org at line 28 [26.35]

B:BD[9.8689] → [9.8689:8778]

∅:D[9.8778] → [36.9240:9271]

B:BD[36.9240] → [36.9240:9271]

B:BD[36.9271] → [10.8807:16879]

pendences pour VM CHU
CLOSED: [2023-04-22 Sat 15:27] SCHEDULED: <2023-04-13 Thu>
* Bibliography
** DONE Finish [cite:@alser2021]
CLOSED: [2023-09-26 Tue 11:26] SCHEDULED: <2023-09-22 Fri>
* Manuscript
:PROPERTIES:
:CATEGORY: manuscript
:END:
** DONE Flowchart pipeline (with T2T)
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
** DONE Figure: number of publications per aligner
CLOSED: [2023-09-19 Tue 16:54] SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
** TODO Literature review: aligner performance
SCHEDULED: <2023-10-01 Sun>
** TODO Literature review: variant calling
SCHEDULED: <2023-10-01 Sun>
** TODO Figure: number of publications per variant caller
SCHEDULED: <2023-10-04 Wed>
/Entered on/ [2023-09-19 Tue 08:43]
** TODO Figure: number of articles citing the main aligners per year
SCHEDULED: <2023-10-03 Tue>
** TODO Figure: number of exomes per year
SCHEDULED: <2023-10-10 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
* Tests :tests:
** KILL Non-regression: production version
CLOSED: [2023-05-23 Tue 08:46]
*** DONE ID common snp
CLOSED: [2022-11-19 Sat 21:36]
#+begin_src
$ wc -l ID_of_common_snp.txt
23194290 ID_of_common_snp.txt
$ wc -l /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
23194290 /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
#+end_src
*** DONE ID common snp not clinvar patho
CLOSED: [2022-12-11 Sun 20:11]
**** DONE Checking the problem
CLOSED: [2022-12-11 Sun 16:30]
On J:
21155134 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref
"Non-regression" version
21155076 database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt
New version
23193391 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt
After removing duplicates
$ sort database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt | uniq > old.txt
$ wc -l old.txt
21107097 old.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt | uniq > new.txt
$ wc -l new.txt
21174578 new.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref | uniq > ref.txt
$ wc -l ref.txt
21107155 ref.txt
Looking at the difference
 comm -23 ref.txt old.txt
rs1052692
rs1057518973
rs1057518973
rs11074121
rs112848754
rs12573787
rs145033890
rs147889095
rs1553904159
rs1560294695
rs1560296615
rs1560310926
rs1560325547
rs1560342418
rs1560356225
rs1578287542
...
We look up the first one
bcftools query -i 'ID="rs1052692"' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 1619351 C A,T
It is indeed patho...
$ bcftools query -i 'POS=1619351' database/clinvar/clinvar.vcf.gz -f '%CHROM %POS %REF %ALT %INFO/CLNSIG\n'
19 1619351 C T Conflicting_interpretations_of_pathogenicity
We check all the others
$ comm -23 ref.txt old.txt > tocheck.txt
We generate the regions to check (chromosome number:position)
$ bcftools query -i 'ID=@tocheck.txt' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM\t%POS\n' > tocheck.pos
We generate the reverse mapping (chromosome number -> NC)
$ awk ' { t = $1; $1 = $2; $2 = t; print; } ' database/RefSeq/refseq_to_number_only_consensual.txt  > mapping.txt
We remap clinvar
$ bcftools annotate --rename-chrs mapping.txt database/clinvar/clinvar.vcf.gz -o clinvar_remapped.vcf.gz
$ tabix clinvar_remapped.vcf.gz
Finally, we look up the classification in clinvar
$ bcftools query -R tocheck.pos clinvar_remapped.vcf.gz -f '%CHROM %POS %INFO/CLNSIG\n'
$ bcftools query -R tocheck.pos database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %ID \n' | grep '^NC'
#+RESULTS:
**** DONE Understand why the new version gives a different result
CLOSED: [2022-12-11 Sun 20:11]
***** DONE Same dbSNP and clinvar versions?
CLOSED: [2022-12-10 Sat 23:02]
Clinvar differs!
  $ bcftools stats clinvar.gz
  clinvar (Alexis)
SN	0	number of samples:	0
SN	0	number of records:	1492828
SN	0	number of no-ALTs:	965
SN	0	number of SNPs:	1338007
SN	0	number of MNPs:	5562
SN	0	number of indels:	144580
SN	0	number of others:	3714
SN	0	number of multiallelic sites:	0
SN	0	number of multiallelic SNP sites:	0
clinvar (new)
SN	0	number of samples:	0
SN	0	number of records:	1493470
SN	0	number of no-ALTs:	965
SN	0	number of SNPs:	1338561
SN	0	number of MNPs:	5565
SN	0	number of indels:	144663
SN	0	number of others:	3716
SN	0	number of multiallelic sites:	0
SN	0	number of multiallelic SNP sites:	0
***** DONE Update clinvar and dbSNP to work from the same databases
CLOSED: [2022-12-11 Sun 12:10]
The problem persists
***** DONE Remove the conversion of the chromosome to int
CLOSED: [2022-12-10 Sat 19:29]
***** KILL Same NC?
CLOSED: [2022-12-10 Sat 19:29]
$  zgrep "contig=<ID=NC_\(.*\)" clinvar/GRCh38/clinvar.vcf.gz > contig.clinvar
$ diff contig.txt contig.clinvar
< ##contig=<ID=NC_012920.1>
***** DONE Test on chromosome 19: ok
CLOSED: [2022-12-11 Sun 13:53]
We prepare the data
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/dbSNP_common.vcf.gz -o dbSNP_common_19.vcf.gz
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data/clinvar/GRCh38/clinvar.vcf.gz -o clinvar_19.vcf.gz
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data-alexis/dbSNP/dbSNP_common.vcf.gz -o dbSNP_common_19_old.vcf.gz
 bcftools filter -i 'CHROM="19"' /Work/Groups/bisonex/data-alexis/clinvar/clinvar.vcf.gz -o clinvar_19_old.vcf.gz
#+end_src
We fetch both versions of the script
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
git checkout regression ../../script/pythonScript/clinvar_sbSNP.py
cp ../../script/pythonScript/clinvar_sbSNP.py clinvar_sbSNP_old.py
git checkout HEAD ../../script/pythonScript/clinvar_sbSNP.py
#+end_src
#+RESULTS:
We compare
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
python ../../script/pythonScript/clinvar_sbSNP.py clinvar_sbSNP.py --clinvar clinvar_19.vcf.gz --dbSNP dbSNP_common_19.vcf.gz --output tmp.txt
sort tmp.txt | uniq > new.txt
table=/Work/Groups/bisonex/data-alexis/RefSeq/refseq_to_number_only_consensual.txt
python clinvar_sbSNP_old.py --clinvar clinvar_19_old.vcf.gz --dbSNP dbSNP_common_19_old.vcf.gz --output tmp_old.txt --chrm_name_table $table
sort tmp_old.txt | uniq > old.txt
wc -l old.txt new.txt
#+end_src
#+RESULTS:
|  535155 | old.txt |
|  535194 | new.txt |
| 1070349 | total   |
If we take the first ID that is only in new, it is conflicting patho, so it should not be there...
$ bcftools query -i 'ID="rs10418277"' dbSNP_common_19.vcf.gz  -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 54939682 C G,T
$ bcftools query -i 'ID="rs10418277"' dbSNP_common_19_old.vcf.gz  -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 54939682 C G,T
$ bcftools query -i 'POS=54939682' clinvar_19.vcf.gz  -f '%POS %REF %ALT %INFO/CLNSIG\n'
54939682 C G Conflicting_interpretations_of_pathogenicity
54939682 C T Benign
$ bcftools query -i 'POS=54939682' clinvar_19_old.vcf.gz  -f '%POS %REF %ALT %INFO/CLNSIG\n'
54939682 C G Conflicting_interpretations_of_pathogenicity
54939682 C T Benign
$ grep rs10418277 *.txt
new.txt:rs10418277
tmp.txt:rs10418277
The problem came from POS no longer being converted to int (line deleted by mistake??)
We check
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
python ../../script/pythonScript/clinvar_sbSNP.py --clinvar clinvar_19.vcf.gz --dbSNP dbSNP_common_19.vcf.gz --output tmp.txt
sort tmp.txt | uniq > new.txt
table=/Work/Groups/bisonex/data-alexis/RefSeq/refseq_to_number_only_consensual.txt
python clinvar_sbSNP_old.py --clinvar clinvar_19_old.vcf.gz --dbSNP dbSNP_common_19_old.vcf.gz --output tmp_old.txt --chrm_name_table $table
sort tmp_old.txt | uniq > old.txt
wc -l

[9.8689]

[10.16879]

pendences pour VM CHU
CLOSED: [2023-04-22 Sat 15:27] SCHEDULED: <2023-04-13 Thu>
* Bibliography
** DONE Finish [cite:@alser2021]
CLOSED: [2023-09-26 Tue 11:26] SCHEDULED: <2023-09-22 Fri>
* Manuscript
:PROPERTIES:
:CATEGORY: manuscript
:END:
** DONE Flowchart pipeline (with T2T)
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
** DONE Figure: number of publications per aligner
CLOSED: [2023-09-19 Tue 16:54] SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
** DONE Literature review: aligner performance
CLOSED: [2023-10-06 Fri 16:51] SCHEDULED: <2023-10-01 Sun>
** TODO Literature review: variant calling
SCHEDULED: <2023-10-01 Sun>
** TODO Figure: number of publications per variant caller
SCHEDULED: <2023-10-04 Wed>
/Entered on/ [2023-09-19 Tue 08:43]
** TODO Figure: number of articles citing the main aligners per year
SCHEDULED: <2023-10-03 Tue>
** TODO Figure: number of exomes per year
SCHEDULED: <2023-10-10 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
* Tests :tests:
** KILL Non-regression: production version
CLOSED: [2023-05-23 Tue 08:46]
*** DONE ID common snp
CLOSED: [2022-11-19 Sat 21:36]
#+begin_src
$ wc -l ID_of_common_snp.txt
23194290 ID_of_common_snp.txt
$ wc -l /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
23194290 /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
#+end_src
*** DONE ID common snp not clinvar patho
CLOSED: [2022-12-11 Sun 20:11]
**** DONE Checking the problem
CLOSED: [2022-12-11 Sun 16:30]
On J:
21155134 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref
"Non-regression" version
21155076 database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt
New version
23193391 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt
After removing duplicates
$ sort database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt | uniq > old.txt
$ wc -l old.txt
21107097 old.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt | uniq > new.txt
$ wc -l new.txt
21174578 new.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref | uniq > ref.txt
$ wc -l ref.txt
21107155 ref.txt
Looking at the difference
 comm -23 ref.txt old.txt
rs1052692
rs1057518973
rs1057518973
rs11074121
rs112848754
rs12573787
rs145033890
rs147889095
rs1553904159
rs1560294695
rs1560296615
rs1560310926
rs1560325547
rs1560342418
rs1560356225
rs1578287542
...
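For reference, a rough Python equivalent of the =sort | uniq= then =comm -23= steps above, as a sketch only. The function name and the sample IDs are illustrative. Stripping whitespace from each line is an assumption: the repeated rs1057518973 in the listing hints that two lines may differ in some invisible way, but this was not verified.

#+begin_src python
def only_in_first(first_ids, second_ids):
    """IDs present in the first list but not in the second, de-duplicated and sorted."""
    # Assumption: surrounding whitespace is not significant in an rsID
    first = {line.strip() for line in first_ids if line.strip()}
    second = {line.strip() for line in second_ids if line.strip()}
    return sorted(first - second)


# Illustrative data: one ID is duplicated with a trailing space
ref = ["rs1052692", "rs1057518973", "rs1057518973 ", "rs11074121"]
old = ["rs11074121"]
print(only_in_first(ref, old))
#+end_src

On the real files the lists would come from reading =ref.txt= and =old.txt= line by line.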
We look up the first one
bcftools query -i 'ID="rs1052692"' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 1619351 C A,T
It is indeed patho...
$ bcftools query -i 'POS=1619351' database/clinvar/clinvar.vcf.gz -f '%CHROM %POS %REF %ALT %INFO/CLNSIG\n'
19 1619351 C T Conflicting_interpretations_of_pathogenicity
We check all the others
$ comm -23 ref.txt old.txt > tocheck.txt
We generate the regions to check (chromosome number:position)
$ bcftools query -i 'ID=@tocheck.txt' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM\t%POS\n' > tocheck.pos
We generate the reverse mapping (chromosome number -> NC)
$ awk ' { t = $1; $1 = $2; $2 = t; print; } ' database/RefSeq/refseq_to_number_only_consensual.txt  > mapping.txt
We remap clinvar
$ bcftools annotate --rename-chrs mapping.txt database/clinvar/clinvar.vcf.gz -o clinvar_remapped.vcf.gz
$ tabix clinvar_remapped.vcf.gz
Finally, we look up the classification in clinvar
$ bcftools query -R tocheck.pos clinvar_remapped.vcf.gz -f '%CHROM %POS %INFO/CLNSIG\n'
$ bcftools query -R tocheck.pos database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %ID \n' | grep '^NC'
#+RESULTS:
**** DONE Understand why the new version gives a different result
CLOSED: [2022-12-11 Sun 20:11]
***** DONE Same dbSNP and clinvar versions?
CLOSED: [2022-12-10 Sat 23:02]
Clinvar differs!
  $ bcftools stats clinvar.gz
  clinvar (Alexis)
SN	0	number of samples:	0
SN	0	number of records:	1492828
SN	0	number of no-ALTs:	965
SN	0	number of SNPs:	1338007
SN	0	number of MNPs:	5562
SN	0	number of indels:	144580
SN	0	number of others:	3714
SN	0	number of multiallelic sites:	0
SN	0	number of multiallelic SNP sites:	0
clinvar (new)
SN	0	number of samples:	0
SN	0	number of records:	1493470
SN	0	number of no-ALTs:	965
SN	0	number of SNPs:	1338561
SN	0	number of MNPs:	5565
SN	0	number of indels:	144663
SN	0	number of others:	3716
SN	0	number of multiallelic sites:	0
SN	0	number of multiallelic SNP sites:	0
***** DONE Update clinvar and dbSNP to work from the same databases
CLOSED: [2022-12-11 Sun 12:10]
The problem persists
***** DONE Remove the conversion of the chromosome to int
CLOSED: [2022-12-10 Sat 19:29]
***** KILL Same NC?
CLOSED: [2022-12-10 Sat 19:29]
$  zgrep "contig=<ID=NC_\(.*\)" clinvar/GRCh38/clinvar.vcf.gz > contig.clinvar
$ diff contig.txt contig.clinvar
< ##contig=<ID=NC_012920.1>
***** DONE Test on chromosome 19: ok
CLOSED: [2022-12-11 Sun 13:53]
We prepare the data
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/dbSNP_common.vcf.gz -o dbSNP_common_19.vcf.gz
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data/clinvar/GRCh38/clinvar.vcf.gz -o clinvar_19.vcf.gz
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data-alexis/dbSNP/dbSNP_common.vcf.gz -o dbSNP_common_19_old.vcf.gz
 bcftools filter -i 'CHROM="19"' /Work/Groups/bisonex/data-alexis/clinvar/clinvar.vcf.gz -o clinvar_19_old.vcf.gz
#+end_src
We fetch both versions of the script
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
git checkout regression ../../script/pythonScript/clinvar_sbSNP.py
cp ../../script/pythonScript/clinvar_sbSNP.py clinvar_sbSNP_old.py
git checkout HEAD ../../script/pythonScript/clinvar_sbSNP.py
#+end_src
#+RESULTS:
We compare
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
python ../../script/pythonScript/clinvar_sbSNP.py clinvar_sbSNP.py --clinvar clinvar_19.vcf.gz --dbSNP dbSNP_common_19.vcf.gz --output tmp.txt
sort tmp.txt | uniq > new.txt
table=/Work/Groups/bisonex/data-alexis/RefSeq/refseq_to_number_only_consensual.txt
python clinvar_sbSNP_old.py --clinvar clinvar_19_old.vcf.gz --dbSNP dbSNP_common_19_old.vcf.gz --output tmp_old.txt --chrm_name_table $table
sort tmp_old.txt | uniq > old.txt
wc -l old.txt new.txt
#+end_src
#+RESULTS:
|  535155 | old.txt |
|  535194 | new.txt |
| 1070349 | total   |
If we take the first ID that is only in new, it is conflicting patho, so it should not be there...
$ bcftools query -i 'ID="rs10418277"' dbSNP_common_19.vcf.gz  -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 54939682 C G,T
$ bcftools query -i 'ID="rs10418277"' dbSNP_common_19_old.vcf.gz  -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 54939682 C G,T
$ bcftools query -i 'POS=54939682' clinvar_19.vcf.gz  -f '%POS %REF %ALT %INFO/CLNSIG\n'
54939682 C G Conflicting_interpretations_of_pathogenicity
54939682 C T Benign
$ bcftools query -i 'POS=54939682' clinvar_19_old.vcf.gz  -f '%POS %REF %ALT %INFO/CLNSIG\n'
54939682 C G Conflicting_interpretations_of_pathogenicity
54939682 C T Benign
$ grep rs10418277 *.txt
new.txt:rs10418277
tmp.txt:rs10418277
The problem came from POS no longer being converted to int (line deleted by mistake??)
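One plausible way such a bug shows up, as a sketch: the actual code of =clinvar_sbSNP.py= is not reproduced here, and the function and variable names below are hypothetical. If clinvar positions are stored as integers and the dbSNP position is left as the raw text of the VCF column, a membership test silently fails and the variant is never recognised as pathogenic.

#+begin_src python
# Hypothetical lookup table: (chromosome, position) of clinvar "patho" entries,
# with positions stored as integers
clinvar_patho_positions = {("NC_000019.10", 54939682)}


def is_clinvar_patho(chrom, pos, patho_positions):
    """True when (chrom, pos) is a known pathogenic position."""
    return (chrom, pos) in patho_positions


pos_field = "54939682"  # raw text as read from the POS column

# Without int(): "54939682" != 54939682, so the variant is not filtered out
print(is_clinvar_patho("NC_000019.10", pos_field, clinvar_patho_positions))
# With int(): the lookup succeeds
print(is_clinvar_patho("NC_000019.10", int(pos_field), clinvar_patho_positions))
#+end_src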
We check
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
python ../../script/pythonScript/clinvar_sbSNP.py --clinvar clinvar_19.vcf.gz --dbSNP dbSNP_common_19.vcf.gz --output tmp.txt
sort tmp.txt | uniq > new.txt
table=/Work/Groups/bisonex/data-alexis/RefSeq/refseq_to_number_only_consensual.txt
python clinvar_sbSNP_old.py --clinvar clinvar_19_old.vcf.gz --dbSNP dbSNP_common_19_old.vcf.gz --output tmp_old.txt --chrm_name_table $table
sort tmp_old.txt | uniq > old.txt
wc -l

Replacement in projects/bisonex.org at line 61 [26.35]

B:BD[37.17453] → [37.17453:17677]

B:BD[37.17677] → [10.17104:25161]

B:BD[10.25161] → [38.364:8467]

  2841 |    26 |    58 |      0.977661 |         0.529245 |
Hg38
| Type  | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision |
| INDEL |         549 |      489 |       60 |         899 |       64 |       340 |     8 |    17 |      0.890710 |         0.885510 |
| SNP   |       21973 |    21462 |      511 |       26285 |      563 |      4263 |    68 |    16 |      0.976744 |         0.974435 |
****** DONE Intersection of the BED files: similar
CLOSED: [2023-07-04 Tue 23:11]
HG38
 #+begin_src sh
 bedtools intersect -a capture/Agilent_SureSelect_All_Exons_v7_hg38_Regions.bed -b /Work/Groups/bisonex/data/giab/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.bed  | wc -l
 #+end_src
 204280
 T2T
 #+begin_src sh
 bedtools intersect -a /Work/Groups/bisonex/data/giab/T2T/Agilent_SureSelect_All_Exons_v7_hg38_Regions_hg38_T2T.bed -b /Work/Groups/bisonex/data/giab/T2T/HG001_GRCh38_1_22_v4.2.1_benchmark_hg38_T2T.bed  | wc -l
 #+end_src
 204021
****** DONE Check the command line
CLOSED: [2023-07-04 Tue 23:38]
#+begin_src sh
hap.py \
    HG001_GRCh38_1_22_v4_lifted_merged.vcf.gz \
    HG001-SRX11061486_SRR14724513-T2T.vcf.gz \
     \
    --reference chm13v2.0.fa \
    --threads 6 \
     \
    -T Agilent_SureSelect_All_Exons_v7_hg38_Regions_hg38_T2T.bed \
    --false-positives HG001_GRCh38_1_22_v4.2.1_benchmark_hg38_T2T.bed \
     \
    -o HG001
#+end_src
****** DONE Fix FILTER: better but still too many negatives. 3/4 of the SNPs recovered
CLOSED: [2023-07-08 Sat 15:19] SCHEDULED: <2023-07-08 Sat>
 Type Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  FP.al  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
INDEL    ALL          413       246       167          751       289        215      2     98       0.595642          0.460821        0.286285         0.519629                     NaN                     NaN                   2.428571                   2.465116
INDEL   PASS          413       246       167          751       289        215      2     98       0.595642          0.460821        0.286285         0.519629                     NaN                     NaN                   2.428571                   2.465116
  SNP    ALL        15883     15479       404        23597      5277       2841     46     44       0.974564          0.745760        0.120397         0.844947                3.017198                 2.85705                   5.560099                   2.114633
  SNP   PASS        15883     15479       404        23597      5277       2841     46     44       0.974564          0.745760        0.120397         0.844947                3.017198                 2.85705                   5.560099                   2.114633
******* DONE Check that no filter other than PASS remains
CLOSED: [2023-07-08 Sat 15:19]
#+begin_src
$ zgrep -c 'PASS' HG001_GRCh38_1_22_v4_lifted_merged.vcf.gz
3730505
$ zgrep -c '^chr' HG001_GRCh38_1_22_v4_lifted_merged.vcf.gz
3730506
#+end_src
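The two =zgrep= counts compare lines containing "PASS" anywhere with lines starting with "chr". A stricter check, sketched here as an alternative, tallies the FILTER column (7th field) of each record directly. The function name and the sample lines are illustrative; on the real file the lines would come from =gzip.open(path, "rt")=.

#+begin_src python
from collections import Counter


def filter_counts(vcf_lines):
    """Count the values of the FILTER column over all VCF records."""
    counts = Counter()
    for line in vcf_lines:
        # Header lines can mention PASS too (##FILTER=...), so skip them
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        counts[fields[6]] += 1  # FILTER is the 7th VCF column
    return counts


# Hypothetical records, only to show the behaviour
example = [
    '##FILTER=<ID=PASS,Description="All filters passed">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tC\tT\t50\tLowQual\t.",
]
print(filter_counts(example))
#+end_src

If only PASS remains, the counter has a single key whose count equals the number of records.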
****** TODO 1/4 of the SNPs missing?
******* DONE Check with Julia whether these really are FPs: 61/5277 are not
CLOSED: [2023-07-09 Sun 12:09]
******* DONE Examine the FPs
CLOSED: [2023-07-30 Sun 22:05]
******* DONE Test one FP
CLOSED: [2023-07-30 Sun 22:05]
  2 │ chr1        608765  A           G           ./.:.:.:.:NOCALL:nocall:.  1/1:FP:.:ti:SNP:homalt:188
  UCSC liftDown: nothing in GIAB: true FP
 3 │ chr1        762943  A           G           ./.:.:.:.:NOCALL:nocall:.  1/1:FP:.:ti:SNP:homalt:287
 4 │ chr1        762945  A           T           ./.:.:.:.:NOCALL:nocall:.  1/1:FP:.:tv:SNP:homalt:287
 Complex rearrangements? Not in the gene in HG38
******* DONE Most FPs (4705/5566) are homozygous: reference error?
CLOSED: [2023-07-12 Wed 21:10] SCHEDULED: <2023-07-09 Sun>
The first 2 variants actually reflect the difference between T2T and GRCh38
Alignment error?
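A sketch of how the share of homozygous FPs could be counted from the query column of the hap.py output VCF. It assumes the sample field layout GT:BD:BK:BI:BVT:BLT:QQ, which matches entries such as =1/1:FP:.:ti:SNP:homalt:188= shown above; the third and fourth example fields below are made up.

#+begin_src python
HAPPY_KEYS = ("GT", "BD", "BK", "BI", "BVT", "BLT", "QQ")  # assumed field order


def parse_happy_sample(field):
    """Split one hap.py sample field into a dict keyed by FORMAT tag."""
    return dict(zip(HAPPY_KEYS, field.split(":")))


def homozygous_fp_fraction(query_fields):
    """Fraction of FP calls whose locus type (BLT) is 'homalt'."""
    fps = [f for f in map(parse_happy_sample, query_fields) if f["BD"] == "FP"]
    if not fps:
        return 0.0
    return sum(f["BLT"] == "homalt" for f in fps) / len(fps)


query = [
    "1/1:FP:.:ti:SNP:homalt:188",
    "1/1:FP:.:tv:SNP:homalt:287",
    "0/1:FP:.:tv:SNP:het:50",    # hypothetical heterozygous FP
    "0/1:TP:gm:ti:SNP:het:120",  # hypothetical TP, ignored
]
print(homozygous_fp_fraction(query))
#+end_src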
******** KILL rerun the alignment
CLOSED: [2023-07-09 Sun 17:36]
******** DONE check that the reads are identical in hg38 and T2T: yes
CLOSED: [2023-07-09 Sun 16:36]
T2T CHR1608765
38   	chr1:1180168-1180168 (
SRR14724513.24448214
SRR14724513.24448214
******* DONE Check a few variants in IGV
CLOSED: [2023-07-09 Sun 17:36]
******* KILL Distribution of the FPs: clusters?
CLOSED: [2023-07-09 Sun 17:36]
****** DONE Examine the FPs remaining after correction against the reference sequence
CLOSED: [2023-08-12 Sat 15:57]
****** HOLD Examine the removed variants
****** TODO Remove the FPs that correspond to a change in the genome
******* Conditions:
- no variation at the position in GRCh38
- homozygous variant
- the variant in T2T corresponds to the base-pair change GRCh38 -> T2T
  for SNPs:
  alt_T2T[i] = DNA_GRC38[j]
  with i the position in T2T and j the position in GRCh38
  Note: defining an ID is not correct because the variants can be modified by hap.py!
******* Idea
 - Each FP is a "false" FP if
     - REF in hg38 == ALT in T2T
     - and REF in hg38 != REF in T2T
     - and the variant is homozygous
How to obtain the reference sequences?
1. liftover
2. blat on the sequence around the variant
3. identify a few reads containing the variant and look at their alignment in hg38
After discussion with Alexis: solution 3
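The three conditions of the idea above can be written as a small predicate. This is a sketch for SNPs only; the function name is made up, and the hg38 bases in the example are hypothetical since obtaining them is precisely the open question.

#+begin_src python
def is_reference_change_fp(ref_hg38, ref_t2t, alt_t2t, genotype):
    """True when an FP called on T2T only reflects a GRCh38 -> T2T base change."""
    alleles = genotype.replace("|", "/").split("/")
    # Homozygous for a non-reference allele, e.g. "1/1"
    homozygous_alt = len(set(alleles)) == 1 and alleles[0] not in ("0", ".")
    return homozygous_alt and ref_hg38 == alt_t2t and ref_hg38 != ref_t2t


# chr1:608765 A>G on T2T; assuming (hypothetically) that GRCh38 carries G there
print(is_reference_change_fp(ref_hg38="G", ref_t2t="A", alt_t2t="G", genotype="1/1"))
# Same site but heterozygous: kept as a real FP
print(is_reference_change_fp(ref_hg38="G", ref_t2t="A", alt_t2t="G", genotype="0/1"))
#+end_src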
******* Algorithm
1. Extract the T2T coordinates of the *homozygous* false positives
2. For each false positive
   1. list 10 reads containing the variant
   2. for each of these reads, fetch the sequence in T2T and GRCh38 via the read name in the BAM
   3. if the T2T sequence modified by the variant is "identical" to the GRCh38 one, this false positive is ignored
Note: reads that changed chromosome between the two versions are ignored
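A toy version of step 2 on in-memory data, to make the comparison explicit. It is not the Julia script: the =ReadContext= fields, exact string equality for "identical", and requiring every usable read to agree are all assumptions, and only SNVs are handled.

#+begin_src python
from dataclasses import dataclass


@dataclass
class ReadContext:
    """Reference sequence under one read in both assemblies (hypothetical layout)."""
    name: str
    chrom_t2t: str
    chrom_hg38: str
    ref_t2t: str   # T2T reference bases covered by the read
    ref_hg38: str  # GRCh38 reference bases covered by the same read
    offset: int    # 0-based index of the variant inside ref_t2t


def apply_snv(seq, offset, alt):
    """Return seq with the base at offset replaced by alt."""
    return seq[:offset] + alt + seq[offset + 1:]


def explained_by_reference(reads, alt, max_reads=10):
    """True when, for every usable read, T2T + variant equals the GRCh38 sequence."""
    # Reads that changed chromosome between the two assemblies are ignored
    usable = [r for r in reads[:max_reads] if r.chrom_t2t == r.chrom_hg38]
    if not usable:
        return False
    return all(apply_snv(r.ref_t2t, r.offset, alt) == r.ref_hg38 for r in usable)


reads = [
    ReadContext("read1", "chr1", "chr1", "ACAT", "ACGT", 2),
    ReadContext("read2", "chr1", "chr5", "ACAT", "TTTT", 2),  # ignored
]
print(explained_by_reference(reads, alt="G"))
#+end_src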
******* DONE Preliminary result
CLOSED: [2023-07-23 Sun 14:30]
see [[file:~/roam/research/bisonex/code/giab/giab-corrected.csv][Julia script]]
3498 fewer false positives, i.e. a precision of 0.89
julia> tp=15479
julia> fp=5277
julia> tp/(tp+fp)
0.7457602620928888
julia> tp/(tp+(fp-3498))
0.8969173716537258
Still below 97%
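The ratio tp/(tp+fp) computed above is the precision reported by hap.py (METRIC.Precision). A short sketch recomputing precision, recall and F1 from the SNP counts of the T2T table, assuming that removing the 3498 reference-induced FPs leaves TP and FN unchanged:

#+begin_src python
def precision(tp, fp):
    return tp / (tp + fp)


def recall(tp, fn):
    return tp / (tp + fn)


def f1_score(p, r):
    return 2 * p * r / (p + r)


tp, fp, fn = 15479, 5277, 404  # SNP row of the T2T hap.py table
removed = 3498                 # FPs explained by the GRCh38 -> T2T change

p_before = precision(tp, fp)
p_after = precision(tp, fp - removed)
r = recall(tp, fn)
print(f"precision before: {p_before:.6f}, after: {p_after:.6f}")
print(f"recall: {r:.6f}, F1 before: {f1_score(p_before, r):.6f}")
#+end_src

The recall is untouched by this correction; only the precision moves.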
******* HOLD Cleanly fix the VCF or the hap.py results
******* TODO Adapt to handle several variants per read
****** DONE Pangenome methodology
CLOSED: [2023-10-03 Tue 21:28]
See literature [cite:@liao2023], but they aligned to GRCh38
******* DONE Email Alexis
CLOSED: [2023-10-03 Tue 21:28]
****** TODO T2T methodology
Email Alexis
SCHEDULED: <2023-10-04 Wed>
***** KILL Email Yannis
CLOSED: [2023-07-08 Sat 10:44]
***** DONE Email GIAB about a T2T version
CLOSED: [2023-07-07 Fri 18:37]
**** TODO HG002 :hg002:T2T:
**** TODO HG003 :hg003:T2T:
**** TODO HG004 :hg004:T2T:
**** DONE Plot: Ashkenazim trio :hg38:
CLOSED: [2023-07-30 Sun 16:49] SCHEDULED: <2023-07-30 Sun 15:00>
:LOGBOOK:
CLOCK: [2023-07-30 Sun 16:06]--[2023-07-30 Sun 16:35] =>  0:29
CLOCK: [2023-07-30 Sun 15:39]--[2023-07-30 Sun 15:40] =>  0:01
:END:
/Entered on/ [2023-04-16 Sun 17:29]
Redo the results
**** DONE Email Paul about the Ashkenazim results +/- centogene
CLOSED: [2023-08-06 Sun 20:24] SCHEDULED: <2023-08-06 Sun>
**** DONE Rerun the GIAB comparison with GATK 4.4.0
CLOSED: [2023-08-12 Sat 15:55]
/Entered on/ [2023-08-03 Thu 12:42]
*** KILL Platinum genome
CLOSED: [2023-06-14 Wed 22:37]
https://emea.illumina.com/platinumgenomes.html
*** TODO Sequence NA12878 :cento:hg001:
Discussion with Paul: the subcontractor will not give us the data, so the DNA has to be ordered
**** DONE DNA ordered
CLOSED: [2023-06-30 Fri 22:29]
**** DONE Back up the raw data
CLOSED: [2023-07-30 Sun 14:22] SCHEDULED: <2023-07-19 Wed>
K, scality, S
**** KILL Get the capture file
CLOSED: [2023-07-30 Sun 14:25] SCHEDULED: <2023-07-23 Sun>
Candidates listed in the publication https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8354858/
#+begin_quote
In short, the Nextera Rapid Capture Exome Kit (Illumina, San Diego, CA), the SureSelect Human All Exon kit (Agilent, Santa Clara, CA) or the Twist Human Core Exome was used for enrichment, and a Nextseq500, HiSeq4000, or Novoseq 6000 (Illumina) instrument was used for the actual sequencing, with the average coverage targeted to at least 100× or at least 98% of the target DNA covered 20×.
#+end_quote
By default, we will use https://www.twistbioscience.com/products/ngs/alliance-panels#tab-3
Recent announcement of a new Twist panel: https://www.centogene.com/news-events/news/newsdetails/twist-bioscience-and-centogene-launch-three-panels-to-advance-rare-disease-and-hereditary-cancer-research-and-support-diagnostics
But no BED file
***** DONE Email centogene
CLOSED: [2023-07-30 Sun 14:22] DEADLINE: <2023-07-23 Sun>
**** DONE Test Nextera Rapid Capture Exome v1.2 (hg19) :giab:
CLOSED: [2023-08-06 Sun 19:05] SCHEDULED: <2023-08-03 Thu 19:00>
https://support.illumina.com/downloads/nextera-rapid-capture-exome-v1-2-product-files.html
***** DONE Liftover of the capture file
CLOSED: [2023-08-06 Sun 18:30] SCHEDULED: <2023-08-06 Sun>
#+begin_src sh
 nextflow run -profile standard,helios workflows/lift-nextera-capture.nf  -lib lib
#+end_src
Quick check: ok
***** DONE Run
CLOSED: [2023-08-06 Sun 19:05] SCHEDULED: <2023-08-06 Sun>
#+begin_src sh
 nextflow run workflows/compareVCF.nf -profile standard,helios --query=out/2300346867_NA12878-63118093_S260-GRCh38/callVariant/haplotypecaller/2300346867_NA12878-63118093_S260-GRCh38.vcf.gz --outdir=out/2300346867_NA12878-63118093_S260-GRCh38/happy-nextera-lifted/ --compare=happy -lib lib --capture=capture/nexterarapidcapture_exome_targetedregions_v1.2-nochrM_lifted.bed  --id=HG001 --genome=GRCh38
#+end_src
**** DONE Test Agilent SureSelect All Exon V8 (hg38) :giab:
CLOSED: [2023-07-31 Mon 23:09] SCHEDULED: <2023-07-31 Mon>
https://earray.chem.agilent.com/suredesign/index.htm
"Find design"
"Agilent catalog"
Files:
- Regions.bed: Targeted exon intervals, curated and targeted by Agilent Technologies
- MergedProbes.bed: Merged probes for targeted enrichment of exons described in Regions.bed
- Covered.bed: Merged probes and sequences with 95% homology or above
- Padded.bed: Merged probes and sequences with 95% homology or above extended 50 bp at each side
- AllTracks.bed: Targeted regions and covered tracks
 #+begin_src sh
nextflow run workflows/compareVCF.nf -profile standard,helios --query=out/2300346867_63118093_NA12878-GRCh38/callVariant/haplotypecaller/2300346867_63118093_NA12878-GRCh38.vcf.gz --outdir=out/2300346867_63118093_NA12878-GRCh38/happy/ --compare=happy -lib lib --capture=capture/Agilent_SureSelect_All_Exons_v8_hg38_Regions.bed  --id=HG001 --genome=GRCh38
 #+end_src
| Type  | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score | TRUTH.TOTAL.TiTv_ratio | QUERY.TOTAL.TiTv_ratio | TRUTH.TOTAL.het_hom_ratio | QUERY.TOTAL.het_hom_ratio |
| INDEL | ALL    |         423 |      395 |       28 |         915 |      108 |       405 |     4 |    13 |      0.933806 |         0.788235 |       0.442623 |        0.854868 |                        |                        |        1.7012987012987013 |        2.7916666666666665 |
| INDEL | PASS   |         423 |      395 |       28 |         915 |      108 |       405 |     4 |    13 |      0.933806 |         0.788235 |       0.442623 |        0.854868 |                        |                        |        1.7012987012987013 |        2.7916666666666665 |
| SNP   | ALL    |       20984 |    20600 |      384 |       26080 |      780 |      4703 |    62 |    10 |        0.9817 |         0.963512 |        0.18033 |        0.972521 |     3.0499710592321048 |     2.7596541786743516 |          1.58256372367935 |        1.8978207694018234 |
| SNP   | PASS   |       20984 |    20600 |      384 |       26080 |      780 |      4703 |    62 |    10 |        0.9817 |         0.963512 |        0.18033 |        0.972521 |     3.0499710592321048 |     2.7596541786743516 |          1.58256372367935 |        1.8978207694018234 |
**** DONE Test Twist Human core Exome (hg38) :giab:
CLOSED: [2023-08-01 Tue 23:16] SCHEDULED: <2023-08-02 Wed>
https://www.twistbioscience.com/resources/data-files/ngs-human-core-exome-panel-bed-file
#+begin_src
nextflow run workflows/compareVCF.nf -profile standard,helios --query=out/2300346867_63118093_NA12878-GRCh38/callVariant/haplotypecaller/2300346867_63118093_NA12878-GRCh38.vcf.gz --outdir=out/2300346867_63118093_NA12878-GRCh38/happy-twist-exome-core/ --compare=happy -lib lib --capture=capture/Twist_Exome_Core_Covered_Targets_hg38.bed  --id=HG001 --genome=GRCh38 -bg
#+end_src
| Type  | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score | TRUTH.TOTAL.TiTv_ratio | QUERY.TOTAL.TiTv_ratio | TRUTH.TOTAL.het_hom_ratio | QUERY.TOTAL.het_hom_ratio |
| INDEL | ALL    |         328 |      313 |       15 |         722 |       95 |       309 |     4 |    13 |      0.954268 |         0.769976 |       0.427978 |        0.852273 |                        |                        |        1.8584070796460177 |        2.8967391304347827 |
| INDEL | PASS   |         328 |      313 |       15 |         722 |       95 |       309 |     4 |    13 |      0.954268 |         0.769976 |       0.427978 |        0.852273 |                        |                        |        1.8584070796460177 |        2.8967391304347827 |
| SNP   | ALL    |       19198 |    18962 |      236 |       23381 |      684 |      3738 |    48 |    10 |      0.987707 |         0.965178 |       0.159873 |        0.976313 |     3.1034188034188035 |      2.859264147830391 |        1.5669565217391304 |        1.8578767123287672 |
| SNP   | PASS   |       19198 |    18962 |      236 |       23381 |      684 |      3738 |    48 |    10 |      0.987707 |         0.965178 |       0.159873 |        0.976313 |     3.1034188034188035 |      2.859264147830391 |        1.5669565217391304 |        1.8578767123287672 |
**** DONE Test Twist Human Core Exome (hg38) :giab:
CLOSED: [2023-08-05 Sat 09:25] SCHEDULED: <2023-08-03 Thu 20:00>
#+begin_src sh
ID="2300346867_NA12878-63118093_S260-GRCh38"; nextflow run workflows/compareVCF.nf -profile standard,helios --query=out/${ID}/callVariant/haplotypecaller/${ID}.vcf.gz --outdir=out/${ID}/happy-twist-exome-core/ --compare=happy -lib lib --capture=capture/Twist_Exome_Core_Covered_Targets_hg38.bed  --id=HG001 --genome=GRCh38 -bg
#+end_src
**** DONE Test Agilent SureSelect All Exon V8 (hg38) GATK-4.4 :giab:
CLOSED: [2023-08-05 Sat 09:25] SCHEDULED: <2023-08-03 Thu 20:00>
**** DONE Check the impact of GATK 4.3 vs 4.4: none
CLOSED: [2023-08-05 Sat 09:25]
**** DONE Figure comparing the 3 capture kits :hg001:
CLOSED: [2023-08-06 Sun 20:24] SCHEDULED: <2023-08-06 Sun>
**** DONE Email Paul about the 3 capture kits :hg001:
CLOSED: [2023-08-06 Sun 20:24] SCHEDULED: <2023-08-06 Sun>
**** KILL Test whether the Twist Alliance VCGS Exome panel is sufficient
CLOSED: [2023-07-31 Mon 22:31] SCHEDULED: <2023-07-30 Sun>
**** PROJ Compare happy and happy-vcfeval :giab:
**** WAIT Email cento to ask for the capture type
/Entered on/ [2023-08-07 Mon 20:40]
** TODO CHM13 data :chm:
https://github.com/lh3/CHM-eval
*** TODO Run ERR1341793
SCHEDULED: <2023-10-07 Sat>
(raw reads ERR1341793_1.fastq.gz and ERR1341793_2.fastq.gz downloaded from https://www.ebi.ac.uk/ena/browser/view/ERR1341793)
*** TODO Run ERR1341796
SCHEDULED: <2023-10-07 Sat>
** TODO Insilico :cento:
*** TODO All Centogene variants
**** DONE Extract the list of SNVs
CLOSED: [2023-04-22 Sat 17:32] SCHEDULED: <2023-04-17 Mon>
***** DONE Fix missing entries by hand
CLOSED: [2023-04-22 Sat 17:31]
The output is saved in git-annex: variants_success.csv
***** DONE Automatic
CLOSED: [2023-04-22 Sat 17:31]
**** DONE Convert SNVs: transcript -> genomic
CLOSED: [2023-06-03 Sat 17:16]
***** DONE Variant_recoder
CLOSED: [2023-

[37.17453]

[38.8467]

  2841 |    26 |    58 |      0.977661 |         0.529245 |
Hg38
| Type  | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision |
| INDEL |         549 |      489 |       60 |         899 |       64 |       340 |     8 |    17 |      0.890710 |         0.885510 |
| SNP   |       21973 |    21462 |      511 |       26285 |      563 |      4263 |    68 |    16 |      0.976744 |         0.974435 |
****** DONE Intersection of the BED files: similar
CLOSED: [2023-07-04 Tue 23:11]
HG38
 #+begin_src sh
 bedtools intersect -a capture/Agilent_SureSelect_All_Exons_v7_hg38_Regions.bed -b /Work/Groups/bisonex/data/giab/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.bed  | wc -l
 #+end_src
 204280
 T2T
 #+begin_src sh
 bedtools intersect -a /Work/Groups/bisonex/data/giab/T2T/Agilent_SureSelect_All_Exons_v7_hg38_Regions_hg38_T2T.bed -b /Work/Groups/bisonex/data/giab/T2T/HG001_GRCh38_1_22_v4.2.1_benchmark_hg38_T2T.bed  | wc -l
 #+end_src
 204021
****** DONE Check the command line
CLOSED: [2023-07-04 Tue 23:38]
#+begin_src sh
hap.py \
    HG001_GRCh38_1_22_v4_lifted_merged.vcf.gz \
    HG001-SRX11061486_SRR14724513-T2T.vcf.gz \
     \
    --reference chm13v2.0.fa \
    --threads 6 \
     \
    -T Agilent_SureSelect_All_Exons_v7_hg38_Regions_hg38_T2T.bed \
    --false-positives HG001_GRCh38_1_22_v4.2.1_benchmark_hg38_T2T.bed \
     \
    -o HG001
#+end_src
****** DONE Fix FILTER: better, but still too many negatives. 3/4 of SNPs recovered
CLOSED: [2023-07-08 Sat 15:19] SCHEDULED: <2023-07-08 Sat>
| Type  | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score | TRUTH.TOTAL.TiTv_ratio | QUERY.TOTAL.TiTv_ratio | TRUTH.TOTAL.het_hom_ratio | QUERY.TOTAL.het_hom_ratio |
| INDEL | ALL    |         413 |      246 |      167 |         751 |      289 |       215 |     2 |    98 |      0.595642 |         0.460821 |       0.286285 |        0.519629 |                    NaN |                    NaN |                  2.428571 |                  2.465116 |
| INDEL | PASS   |         413 |      246 |      167 |         751 |      289 |       215 |     2 |    98 |      0.595642 |         0.460821 |       0.286285 |        0.519629 |                    NaN |                    NaN |                  2.428571 |                  2.465116 |
| SNP   | ALL    |       15883 |    15479 |      404 |       23597 |     5277 |      2841 |    46 |    44 |      0.974564 |         0.745760 |       0.120397 |        0.844947 |               3.017198 |                2.85705 |                  5.560099 |                  2.114633 |
| SNP   | PASS   |       15883 |    15479 |      404 |       23597 |     5277 |      2841 |    46 |    44 |      0.974564 |         0.745760 |       0.120397 |        0.844947 |               3.017198 |                2.85705 |                  5.560099 |                  2.114633 |
******* DONE Check that no filter other than PASS remains
CLOSED: [2023-07-08 Sat 15:19]
#+begin_src
$ zgrep -c 'PASS' HG001_GRCh38_1_22_v4_lifted_merged.vcf.gz
3730505
$ zgrep -c '^chr' HG001_GRCh38_1_22_v4_lifted_merged.vcf.gz
3730506
#+end_src
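The two =zgrep= counts above compare lines containing =PASS= anywhere with lines starting with =chr=, which is only an approximation: a header line or an INFO field may also contain =PASS=. A minimal Python sketch of a stricter check, counting the values of the FILTER column (7th column of a VCF record) directly. The function name =count_filters= is hypothetical and not part of the pipeline.

#+begin_src python
import gzip
from collections import Counter


def count_filters(vcf_gz_path):
    """Count FILTER values (7th column) over all records of a gzipped VCF."""
    counts = Counter()
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):  # skip meta-information and header lines
                continue
            fields = line.rstrip("\n").split("\t")
            counts[fields[6]] += 1
    return counts
#+end_src

If only PASS remains, =count_filters("HG001_GRCh38_1_22_v4_lifted_merged.vcf.gz")= should return a counter with a single =PASS= key.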
****** TODO 1/4 of SNPs missing?
******* DONE Check with Julia whether these really are FPs: 61/5277 are not
CLOSED: [2023-07-09 Sun 12:09]
******* DONE Examine the FPs
CLOSED: [2023-07-30 Sun 22:05]
******* DONE Test one FP
CLOSED: [2023-07-30 Sun 22:05]
  2 │ chr1        608765  A           G           ./.:.:.:.:NOCALL:nocall:.  1/1:FP:.:ti:SNP:homalt:188
  UCSC liftover: nothing in GIAB, so a true FP
 3 │ chr1        762943  A           G           ./.:.:.:.:NOCALL:nocall:.  1/1:FP:.:ti:SNP:homalt:287
 4 │ chr1        762945  A           T           ./.:.:.:.:NOCALL:nocall:.  1/1:FP:.:tv:SNP:homalt:287
 Complex rearrangements? Not in the gene in hg38
******* DONE Most FPs (4705/5566) are homozygous: reference error?
CLOSED: [2023-07-12 Wed 21:10] SCHEDULED: <2023-07-09 Sun>
The first 2 variants actually reflect the difference between T2T and GRCh38
Alignment error?
******** KILL Rerun the alignment
CLOSED: [2023-07-09 Sun 17:36]
******** DONE Check that the reads are identical in hg38 and T2T: yes
CLOSED: [2023-07-09 Sun 16:36]
T2T: chr1:608765
hg38: chr1:1180168-1180168 (
SRR14724513.24448214
SRR14724513.24448214
******* DONE Check a few variants in IGV
CLOSED: [2023-07-09 Sun 17:36]
******* KILL Distribution of the FPs: clustered?
CLOSED: [2023-07-09 Sun 17:36]
****** DONE Examine the FPs remaining after correcting for the reference sequence
CLOSED: [2023-08-12 Sat 15:57]
****** HOLD Examine the removed variants
****** TODO Remove the FPs that correspond to a change in the genome
******* Condition:
- no variant at the position in GRCh38
- homozygous variant
- the variant in T2T corresponds to the base-pair change GRCh38 -> T2T
  for SNPs:
  alt_T2T[i] = DNA_GRCh38[j]
  with i the position in T2T and j the position in GRCh38
  Note: defining an ID is not correct because the variants can be modified by happy!
******* Idea
 - Each FP is a "false" FP if
     - REF in hg38 == ALT in T2T
     - and REF in hg38 != REF in T2T
     - and the variant is homozygous
How do we obtain the reference sequences?
1. liftover
2. blat on the sequence around the variant
3. identify a few reads containing the variant and look at their alignment in hg38
After discussion with Alexis: solution 3
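The three conditions listed above can be written as a small predicate. This is only a sketch: the function name =is_reference_artifact= and its arguments are hypothetical, it handles SNVs only, and it assumes the hg38 and T2T reference bases at the matched positions have already been obtained by one of the methods listed.

#+begin_src python
def is_reference_artifact(ref_hg38, ref_t2t, alt_t2t, genotype):
    """Return True if a T2T false positive only reflects a reference change.

    ref_hg38: reference base in GRCh38 at the matching position
    ref_t2t:  reference base in T2T at the variant position
    alt_t2t:  alternate base called in T2T
    genotype: VCF GT string of the call, e.g. "1/1" or "0/1"
    """
    alleles = genotype.replace("|", "/").split("/")
    # Homozygous for a non-reference, non-missing allele
    homozygous_alt = len(set(alleles)) == 1 and alleles[0] not in ("0", ".")
    return ref_hg38 == alt_t2t and ref_hg38 != ref_t2t and homozygous_alt
#+end_src

For instance, a homozygous A>G call in T2T at a position where GRCh38 carries G would satisfy all three conditions.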
******* Algorithm
1. Extract the T2T coordinates of the *homozygous* false positives
2. For each false positive
   1. list 10 reads containing the variant
   2. for each of these reads, retrieve the sequence in T2T and GRCh38 via the read name in the BAM
   3. if the T2T sequence modified by the variant is "identical" to the GRCh38 one, then ignore this false positive
Note: reads that changed chromosome between the two versions are ignored
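A minimal sketch of step 2.3, assuming the reference windows under each supporting read have already been extracted from both assemblies (for example from the BAM files by read name). The names =apply_snv=, =explained_by_reference= and =min_fraction= are hypothetical. "Identical" is taken here as strict string equality, which is an assumption: the actual Julia script may be more tolerant.

#+begin_src python
def apply_snv(ref_window, offset, alt):
    """Return ref_window with the base at 0-based offset replaced by alt."""
    return ref_window[:offset] + alt + ref_window[offset + 1:]


def explained_by_reference(read_windows, alt, min_fraction=1.0):
    """Decide whether a homozygous FP is explained by a reference difference.

    read_windows: one (t2t_window, offset, grch38_window) tuple per supporting
    read, where the windows are the reference sequences under the read in each
    assembly and offset is the 0-based variant position inside t2t_window.
    Reads mapped to a different chromosome in the two assemblies are expected
    to have been dropped beforehand.
    """
    if not read_windows:
        return False
    matches = sum(
        apply_snv(t2t, offset, alt) == grch38
        for t2t, offset, grch38 in read_windows
    )
    return matches / len(read_windows) >= min_fraction
#+end_src

Requiring every read to agree (=min_fraction=1.0=) is the conservative choice. Lowering it trades specificity for robustness to a few misaligned reads.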
******* DONE Preliminary result
CLOSED: [2023-07-23 Sun 14:30]
see [[file:~/roam/research/bisonex/code/giab/giab-corrected.csv][Julia script]]
3498 fewer false positives, i.e. a precision of 0.897
julia> tp=15479
julia> fp=5277
julia> tp/(tp+fp)
0.7457602620928888
julia> tp/(tp+(fp-3498))
0.8969173716537258
We are still below 97%
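The ratio computed above, TP/(TP+FP), is the precision. A short sketch recomputing precision, recall and F1 from the SNP counts of the hap.py table (TP = 15479, FP = 5277, FN = 404), before and after removing the 3498 false positives explained by a reference difference. The helper name =metrics= is hypothetical.

#+begin_src python
def metrics(tp, fp, fn):
    """Return (precision, recall, f1) from TP, FP and FN counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


tp, fp, fn = 15479, 5277, 404
before = metrics(tp, fp, fn)
after = metrics(tp, fp - 3498, fn)  # 3498 FPs explained by reference differences
print("before: precision=%.4f recall=%.4f f1=%.4f" % before)
print("after:  precision=%.4f recall=%.4f f1=%.4f" % after)
#+end_src

Recall is unchanged by the correction, since only false positives are removed.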
******* HOLD Properly fix the VCF or the hap.py results
******* TODO Adapt to handle several variants per read
****** DONE Pangenome methodology
CLOSED: [2023-10-03 Tue 21:28]
See the bibliography [cite:@liao2023], but they aligned to GRCh38
******* DONE Email Alexis
CLOSED: [2023-10-03 Tue 21:28]
****** TODO T2T methodology
SCHEDULED: <2023-10-04 Wed>
Email Alexis
***** KILL Email Yannis
CLOSED: [2023-07-08 Sat 10:44]
***** DONE Email GIAB about a T2T version
CLOSED: [2023-07-07 Fri 18:37]
**** TODO HG002 :hg002:T2T:
**** TODO HG003 :hg003:T2T:
**** TODO HG004 :hg004:T2T:
**** DONE Plot: Ashkenazim trio :hg38:
CLOSED: [2023-07-30 Sun 16:49] SCHEDULED: <2023-07-30 Sun 15:00>
:LOGBOOK:
CLOCK: [2023-07-30 Sun 16:06]--[2023-07-30 Sun 16:35] =>  0:29
CLOCK: [2023-07-30 Sun 15:39]--[2023-07-30 Sun 15:40] =>  0:01
:END:
/Entered on/ [2023-04-16 Sun 17:29]
Redo the results
**** DONE Email Paul about the Ashkenazim +/- Centogene results
CLOSED: [2023-08-06 Sun 20:24] SCHEDULED: <2023-08-06 Sun>
**** DONE Rerun the GIAB comparison with GATK 4.4.0
CLOSED: [2023-08-12 Sat 15:55]
/Entered on/ [2023-08-03 Thu 12:42]
*** KILL Platinum genome
CLOSED: [2023-06-14 Wed 22:37]
https://emea.illumina.com/platinumgenomes.html
*** DONE Sequence NA12878 :cento:hg001:
CLOSED: [2023-10-07 Sat 17:59]
Discussion with Paul: the subcontractor will not give us the data, so we have to order the DNA
**** DONE DNA ordered
CLOSED: [2023-06-30 Fri 22:29]
**** DONE Back up the raw data
CLOSED: [2023-07-30 Sun 14:22] SCHEDULED: <2023-07-19 Wed>
K, scality, S
**** KILL Retrieve the capture file
CLOSED: [2023-07-30 Sun 14:25] SCHEDULED: <2023-07-23 Sun>
Candidates given in the publication https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8354858/
#+begin_quote
In short, the Nextera Rapid Capture Exome Kit (Illumina, San Diego, CA), the SureSelect Human All Exon kit (Agilent, Santa Clara, CA) or the Twist Human Core Exome was used for enrichment, and a Nextseq500, HiSeq4000, or Novoseq 6000 (Illumina) instrument was used for the actual sequencing, with the average coverage targeted to at least 100× or at least 98% of the target DNA covered 20×.
#+end_quote
By default, we will use https://www.twistbioscience.com/products/ngs/alliance-panels#tab-3
Recent announcement of a new Twist panel: https://www.centogene.com/news-events/news/newsdetails/twist-bioscience-and-centogene-launch-three-panels-to-advance-rare-disease-and-hereditary-cancer-research-and-support-diagnostics
But no BED file
***** DONE Email Centogene
CLOSED: [2023-07-30 Sun 14:22] DEADLINE: <2023-07-23 Sun>
**** DONE Test Nextera Rapid Capture Exome v1.2 (hg19) :giab:
CLOSED: [2023-08-06 Sun 19:05] SCHEDULED: <2023-08-03 Thu 19:00>
https://support.illumina.com/downloads/nextera-rapid-capture-exome-v1-2-product-files.html
***** DONE Liftover of the capture file
CLOSED: [2023-08-06 Sun 18:30] SCHEDULED: <2023-08-06 Sun>
#+begin_src sh
 nextflow run -profile standard,helios workflows/lift-nextera-capture.nf  -lib lib
#+end_src
Quick check: ok
***** DONE Run
CLOSED: [2023-08-06 Sun 19:05] SCHEDULED: <2023-08-06 Sun>
#+begin_src sh
 nextflow run workflows/compareVCF.nf -profile standard,helios --query=out/2300346867_NA12878-63118093_S260-GRCh38/callVariant/haplotypecaller/2300346867_NA12878-63118093_S260-GRCh38.vcf.gz --outdir=out/2300346867_NA12878-63118093_S260-GRCh38/happy-nextera-lifted/ --compare=happy -lib lib --capture=capture/nexterarapidcapture_exome_targetedregions_v1.2-nochrM_lifted.bed  --id=HG001 --genome=GRCh38
#+end_src
**** DONE Test Agilent SureSelect All Exon V8 (hg38) :giab:
CLOSED: [2023-07-31 Mon 23:09] SCHEDULED: <2023-07-31 Mon>
https://earray.chem.agilent.com/suredesign/index.htm
"Find design"
"Agilent catalog"
Files:
- Regions.bed: Targeted exon intervals, curated and targeted by Agilent Technologies
- MergedProbes.bed: Merged probes for targeted enrichment of exons described in Regions.bed
- Covered.bed: Merged probes and sequences with 95% homology or above
- Padded.bed: Merged probes and sequences with 95% homology or above extended 50 bp at each side
- AllTracks.bed: Targeted regions and covered tracks
 #+begin_src sh
nextflow run workflows/compareVCF.nf -profile standard,helios --query=out/2300346867_63118093_NA12878-GRCh38/callVariant/haplotypecaller/2300346867_63118093_NA12878-GRCh38.vcf.gz --outdir=out/2300346867_63118093_NA12878-GRCh38/happy/ --compare=happy -lib lib --capture=capture/Agilent_SureSelect_All_Exons_v8_hg38_Regions.bed  --id=HG001 --genome=GRCh38
 #+end_src
| Type  | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score | TRUTH.TOTAL.TiTv_ratio | QUERY.TOTAL.TiTv_ratio | TRUTH.TOTAL.het_hom_ratio | QUERY.TOTAL.het_hom_ratio |
| INDEL | ALL    |         423 |      395 |       28 |         915 |      108 |       405 |     4 |    13 |      0.933806 |         0.788235 |       0.442623 |        0.854868 |                        |                        |        1.7012987012987013 |        2.7916666666666665 |
| INDEL | PASS   |         423 |      395 |       28 |         915 |      108 |       405 |     4 |    13 |      0.933806 |         0.788235 |       0.442623 |        0.854868 |                        |                        |        1.7012987012987013 |        2.7916666666666665 |
| SNP   | ALL    |       20984 |    20600 |      384 |       26080 |      780 |      4703 |    62 |    10 |        0.9817 |         0.963512 |        0.18033 |        0.972521 |     3.0499710592321048 |     2.7596541786743516 |          1.58256372367935 |        1.8978207694018234 |
| SNP   | PASS   |       20984 |    20600 |      384 |       26080 |      780 |      4703 |    62 |    10 |        0.9817 |         0.963512 |        0.18033 |        0.972521 |     3.0499710592321048 |     2.7596541786743516 |          1.58256372367935 |        1.8978207694018234 |
**** DONE Test Twist Human Core Exome (hg38) :giab:
CLOSED: [2023-08-01 Tue 23:16] SCHEDULED: <2023-08-02 Wed>
https://www.twistbioscience.com/resources/data-files/ngs-human-core-exome-panel-bed-file
#+begin_src
nextflow run workflows/compareVCF.nf -profile standard,helios --query=out/2300346867_63118093_NA12878-GRCh38/callVariant/haplotypecaller/2300346867_63118093_NA12878-GRCh38.vcf.gz --outdir=out/2300346867_63118093_NA12878-GRCh38/happy-twist-exome-core/ --compare=happy -lib lib --capture=capture/Twist_Exome_Core_Covered_Targets_hg38.bed  --id=HG001 --genome=GRCh38 -bg
#+end_src
| Type  | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score | TRUTH.TOTAL.TiTv_ratio | QUERY.TOTAL.TiTv_ratio | TRUTH.TOTAL.het_hom_ratio | QUERY.TOTAL.het_hom_ratio |
| INDEL | ALL    |         328 |      313 |       15 |         722 |       95 |       309 |     4 |    13 |      0.954268 |         0.769976 |       0.427978 |        0.852273 |                        |                        |        1.8584070796460177 |        2.8967391304347827 |
| INDEL | PASS   |         328 |      313 |       15 |         722 |       95 |       309 |     4 |    13 |      0.954268 |         0.769976 |       0.427978 |        0.852273 |                        |                        |        1.8584070796460177 |        2.8967391304347827 |
| SNP   | ALL    |       19198 |    18962 |      236 |       23381 |      684 |      3738 |    48 |    10 |      0.987707 |         0.965178 |       0.159873 |        0.976313 |     3.1034188034188035 |      2.859264147830391 |        1.5669565217391304 |        1.8578767123287672 |
| SNP   | PASS   |       19198 |    18962 |      236 |       23381 |      684 |      3738 |    48 |    10 |      0.987707 |         0.965178 |       0.159873 |        0.976313 |     3.1034188034188035 |      2.859264147830391 |        1.5669565217391304 |        1.8578767123287672 |
**** DONE Test Twist Human Core Exome (hg38) :giab:
CLOSED: [2023-08-05 Sat 09:25] SCHEDULED: <2023-08-03 Thu 20:00>
#+begin_src sh
ID="2300346867_NA12878-63118093_S260-GRCh38"; nextflow run workflows/compareVCF.nf -profile standard,helios --query=out/${ID}/callVariant/haplotypecaller/${ID}.vcf.gz --outdir=out/${ID}/happy-twist-exome-core/ --compare=happy -lib lib --capture=capture/Twist_Exome_Core_Covered_Targets_hg38.bed  --id=HG001 --genome=GRCh38 -bg
#+end_src
**** DONE Test Agilent SureSelect All Exon V8 (hg38) GATK-4.4 :giab:
CLOSED: [2023-08-05 Sat 09:25] SCHEDULED: <2023-08-03 Thu 20:00>
**** DONE Check the impact of GATK 4.3 vs 4.4: none
CLOSED: [2023-08-05 Sat 09:25]
**** DONE Figure comparing the 3 capture kits :hg001:
CLOSED: [2023-08-06 Sun 20:24] SCHEDULED: <2023-08-06 Sun>
**** DONE Email Paul about the 3 capture kits :hg001:
CLOSED: [2023-08-06 Sun 20:24] SCHEDULED: <2023-08-06 Sun>
**** KILL Test whether the Twist Alliance VCGS Exome panel is sufficient
CLOSED: [2023-07-31 Mon 22:31] SCHEDULED: <2023-07-30 Sun>
**** DONE Email cento to ask for the capture type
CLOSED: [2023-10-07 Sat 17:59]
/Entered on/ [2023-08-07 Mon 20:40]
Twist exome
*** PROJ Compare happy and happy-vcfeval :giab:
** TODO CHM13 data :chm:
https://github.com/lh3/CHM-eval
*** TODO Run ERR1341793
SCHEDULED: <2023-10-14 Sat>
(raw reads ERR1341793_1.fastq.gz and ERR1341793_2.fastq.gz downloaded from https://www.ebi.ac.uk/ena/browser/view/ERR1341793)
*** TODO Run ERR1341796
SCHEDULED: <2023-10-14 Sat>
** TODO Insilico :cento:
*** TODO All Centogene variants
**** DONE Extract the list of SNVs
CLOSED: [2023-04-22 Sat 17:32] SCHEDULED: <2023-04-17 Mon>
***** DONE Fix missing entries by hand
CLOSED: [2023-04-22 Sat 17:31]
The output is saved in git-annex: variants_success.csv
***** DONE Automatic
CLOSED: [2023-04-22 Sat 17:31]
**** DONE Convert SNVs: transcript -> genomic
CLOSED: [2023-06-03 Sat 17:16]
***** DONE Variant_recoder
CLOSED: [2023-

Replacement in projects/bisonex.org at line 76 [26.35]

B:BD[10.33265] → [3.30:8559]

LED: <2023-10-10 Tue>
**** DONE In T2T with liftover (filter = spip): ok but slow and too many variants :tests:
CLOSED: [2023-09-17 Sun 17:13] SCHEDULED: <2023-09-17 Sun>
1. Conversion to BED
#+begin_src sh :dir:~/code/sanger
open snvs-cento-sanger.csv | select chrom pos | insert pos2 {$in.pos } | to csv --separator="\t" | save snvs-cento-sanger.bed -f
#+end_src
2. Liftover with UCSC (online)
NB: checked on the first result by looking for the read containing the variant (samtools view -r, then samtools view | grep in T2T); with the help of IGV, a matching variant is found at
chr1:10757746
3. Assuming the order of the variants has not changed, we simply add REF and ALT with annotateLifted.jl
spip annotation is *very slow*: 1h13!
Result:
2×3 DataFrame
 Row │ variant              meanQual  depth
     │ String               Float64   Int64
─────┼──────────────────────────────────────
   1 │ chr12:g.13594572      60.0      1
   2 │ chr17:g.10204026      60.0      1
144 found over 146
filter depth : another 0 missed variants
filter poly : another 0 missed variants
filter vep   : another 0 missed variants
And there are too many variants in the output (7330!)
**** DONE Email Paul with the T2T filter results + new diagram
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
* Reinterpretation
** PROJ Run tests on raw data [146/250]
- [X] 100222_63015289
- [X] 1600304839_63051311
- [X] 1900007827_62913191
- [X] 1900398899_62999500
- [X] 1900486799_62913197
- [X] 2100422923_62952677
- [X] 2100458888_62933047
- [X] 2100601558_62903840
- [X] 2100609288_62905768
- [X] 2100609501_62905776
- [X] 2100614493_62951074
- [X] 2100622566_62908067
- [X] 2100622601_62908060
- [X] 2100622705_62908063
- [X] 2100640027_62911936
- [X] 2100645285_62913212
- [X] 2100661411_62914081
- [X] 2100661462_62914086
- [X] 2100708257_62921596
- [X] 2100738732_62926501
- [X] 2100738850_62926509
- [X] 2100746751_62926505
- [X] 2100746797_62926506
- [X] 2100782349_62931722
- [X] 2100782416_62931561
- [X] 2100782559_62931718
- [X] 2100799204_62934768
- [X] 2200010202_62940284
- [X] 2200023600_62940631
- [X] 2200024348_62999591
- [X] 2200027505_62942457
- [X] 2200038776_62943412
- [X] 2200041919_62943405
- [X] 2200088014_62951326
- [X] 2200146652_62959388
- [X] 2200151850_62960953
- [X] 2200160014_62959475
- [X] 2200160070_62959478
- [X] 2200201368_62967471
- [X] 2200201400_62967470
- [X] 2200265558_62976332
- [X] 2200265605_62976401
- [X] 2200267046_62975192
- [X] 2200273878_62999530
- [X] 2200279708_62977002
- [X] 2200284408_62979102
- [X] 2200293987_62979116
- [X] 2200294359_62979118
- [X] 2200306299_62982217
- [X] 2200306539_62982193
- [X] 220030671_62982211
- [X] 2200307058_62982231
- [X] 2200307108_62982196
- [X] 2200307136_62982221
- [X] 2200307199_62982239
- [X] 2200307230_62982234
- [X] 2200307262_62982219
- [X] 2200307297_62982227
- [X] 2200324510_62985453
- [X] 2200324549_62985478
- [X] 2200324573_62985445
- [X] 2200324594_62985467
- [X] 2200324606_62985463
- [X] 2200324614_62985459
- [X] 2200338306_62985430
- [X] 2200343880_62989407
- [X] 2200343910_62989460
- [X] 2200343938_62989451
- [X] 2200343966_62989456
- [X] 2200343993_62989440
- [X] 2200344013_62989464
- [X] 2200349749_62989465
- [X] 2200363462_62988848
- [X] 2200377880_62991993
- [X] 2200378032_62991991
- [X] 2200383996_62993828
- [X] 2200384015_62993796
- [X] 2200384046_62993822
- [X] 2200384117_62993808
- [X] 2200384187_62993825
- [X] 2200384231_62992898
- [X] 2200385658_63060260
- [X] 2200394260_62994732
- [X] 2200395817_62994742
- [X] 2200396731_62994737
- [X] 2200424073_62999579
- [X] 2200424207_62999632
- [X] 2200426178_62999630
- [X] 2200426243_62999635
- [X] 2200426466_62999605
- [X] 2200426642_62999627
- [X] 2200427406_62999649
- [X] 2200427512_62999639
- [X] 2200428953_62999572
- [X] 2200428981_62999600
- [X] 2200428999_62999592
- [X] 2200441970_63000868
- [X] 2200441989_63000882
- [X] 2200442135_63000864
- [X] 2200442216_63000886
- [X] 2200442257_63000951
- [X] 2200451801_63003573
- [X] 2200451862_63004218
- [X] 2200451894_63004210
- [X] 2200456165_63051294
- [X] 2200459865_63004933
- [X] 2200459968_63004937
- [X] 2200460073_63004943
- [X] 2200460121_63004684
- [X] 2200467051_63003856
- [X] 2200467225_63004940
- [X] 2200467261_63004930
- [X] 2200467338_63004925
- [X] 2200470099_63004485
- [X] 2200470142_63004480
- [X] 2200471780_63004362
- [X] 2200480910_63006466
- [X] 2200495073_63010427
- [X] 2200495510_63009152
- [X] 2200508677_63060252
- [X] 2200510531_63012582
- [X] 2200510628_63012549
- [X] 2200510657_63012554
- [X] 2200511249_63012533
- [X] 2200511274_63012586
- [X] 2200517952_63060399
- [X] 2200519525_63060439
- [X] 2200524009_63014044
- [ ] 2200524609_63014046
- [X] 2200524616_63014048
- [X] 2200533429_63060425
- [X] 2200539735_63060406
- [X] 2200549908_63019339
- [X] 2200549965_63019349
- [X] 2200550414_63019357
- [X] 2200550471_63020031
- [X] 2200550490_63019351
- [X] 2200550505_63019340
- [X] 2200555565_63018614
- [X] 2200559438_63020029
- [X] 2200559682_63020030
- [X] 2200559713_63019623
- [X] 2200559739_63019626
- [X] 2200569969_63019991
- [X] 2200570001_63021580
- [X] 2200570025_63021490
- [X] 2200570035_63021491
- [ ] 2200570042_63021493
- [ ] 2200570050_63021494
- [ ] 2200579897_63024910
- [ ] 2200583995_63024866
- [ ] 2200584035_63024905
- [ ] 2200584069_63024888
- [ ] 2200584126_63024810
- [ ] 2200589507_63026712
- [ ] 2200597365_63027994
- [ ] 2200597480_63027988
- [ ] 2200597752_63026853
- [ ] 2200597778_63027992
- [ ] 22005977_63026903
- [ ] 2200609031_63026527
- [ ] 2200614198_63113928
- [ ] 2200620372_63030821
- [ ] 2200620442_63030810
- [ ] 2200620498_63030816
- [ ] 2200620628_63031031
- [ ] 2200622310_63030984
- [ ] 2200622355_63030956
- [ ] 2200625369_63028699
- [ ] 2200625410_63028697
- [ ] 2200625536_63028694
- [ ] 2200630189_63030665
- [ ] 2200635149_63033182
- [ ] 2200644544_63037731
- [ ] 2200644594_63037725
- [ ] 2200650089_63038093
- [ ] 2200666292_63076568
- [ ] 2200669188_63036688
- [ ] 2200669320_63040259
- [ ] 2200669383_63040254
- [ ] 2200669414_63040257
- [ ] 2200669446_63040251
- [ ] 2200680342_63105271
- [ ] 2200694535_63042853
- [ ] 2200694789_63042862
- [ ] 2200694858_63042702
- [ ] 2200694917_63042696
- [ ] 2200699290_63043047
- [ ] 2200699345_63040238
- [ ] 2200699383_63043050
- [ ] 2200699412_63040731
- [ ] 220071551_63048935
- [ ] 2200731515_63048963
- [ ] 2200748145_63051198
- [ ] 2200748171_63051213
- [ ] 2200751046_63051249
- [ ] 2200751101_63051234
- [ ] 2200766471_63054590
- [ ] 2200767731_63054595
- [ ] 2200767822_63054464
- [ ] 2200775505_63060410
- [ ] 2200850441_63019345
- [ ] 220597589_63026879
- [ ] 2300003253_63060430
- [ ] 2300005679_63060370
- [ ] 2300009914_63060390
- [ ] 2300028784_63060001
- [ ] 2300036815_63063357
- [ ] 2300055382_63061874
- [ ] 2300055421_63061871
- [ ] 2300055440_63061880
- [ ] 230006894_63064950
- [ ] 2300071111_63070356
- [ ] 2300083434_63071675
- [ ] 2300103609_63076239
- [ ] 2300104572_63076232
- [ ] 2300109602_63076765
- [ ] 2300109665_63076770
- [ ] 2300119721_63078732
- [ ] 2300137773_63078133
- [ ] 2300137834_63078123
- [ ] 2300167821_63086183
- [ ] 2300172698_63113453
- [ ] 2300188216_63090609
- [ ] 2300188281_63090632
- [ ] 2300188800_63090616
- [ ] 2300193193645_63090623
- [ ] 2300193668_63090611
- [ ] 2300195426_63090608
- [ ] 2300201017_63089636
- [ ] 2300227479_63098330
- [ ] 2300232688_63130821
- [ ] 2300292749_63109239
- [ ] 230029277_63109247
- [ ] 2300294712_63109236
- [ ] 2300308032_63111581
- [ ] 2300323537_63114209
- [ ] 2300334609_63115535
- [ ] 2300346867_63118093
- [ ] 2300346867_63118093_NA12878
- [ ] 2300348940_63118099
- [ ] 2300359806_63119915
- [ ] 2300380476_63123963
- [ ] 2300382582_63123749
- [ ] 2300384269_63126867
- [ ] 2300407581_63130826
- [ ] 2300407626_63130842
- [ ] 2300409593_63130874
- [ ] 2300409612_63130980
- [ ] 2300417623_63131524
* Results
** TODO Speed-up BWA-mem
SCHEDULED: <2023-10-08 Sun>
** TODO Speed-up HaplotypeCaller
SCHEDULED: <2023-10-08 Sun>
* Communication
** DONE Mail NGS-diag
CLOSED: [2023-10-06 Fri 08:04] SCHEDULED: <2023-10-06 Fri>
/Entered on/ [2023-10-04 Wed 19:33]

[10.33265]

LED: <2023-10-10 Tue>
**** DONE In T2T with liftover (filter = spip): ok but slow and too many variants :tests:
CLOSED: [2023-09-17 Sun 17:13] SCHEDULED: <2023-09-17 Sun>
1. Conversion to BED
#+begin_src sh :dir:~/code/sanger
open snvs-cento-sanger.csv | select chrom pos | insert pos2 {$in.pos } | to csv --separator="\t" | save snvs-cento-sanger.bed -f
#+end_src
2. Liftover with UCSC (online)
NB: checked on the first result by looking for the read containing the variant (samtools view -r, then samtools view | grep in T2T); with the help of IGV, a matching variant is found at
chr1:10757746
3. Assuming the order of the variants has not changed, we simply add REF and ALT with annotateLifted.jl
spip annotation is *very slow*: 1h13!
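Step 1 above can also be sketched in Python. This is a variant, not the command actually used: the original pipeline writes =pos= in both coordinate columns, whereas this sketch follows the BED convention (0-based start, exclusive end) and assumes =pos= is 1-based as in a VCF. The function name =csv_to_bed= is hypothetical. The =chrom= and =pos= column names are those selected in the original command.

#+begin_src python
import csv


def csv_to_bed(csv_path, bed_path):
    """Write one BED line per row of a CSV having 'chrom' and 'pos' columns."""
    with open(csv_path, newline="") as src, open(bed_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        for row in csv.DictReader(src):
            pos = int(row["pos"])  # assumed 1-based, as in a VCF
            # BED intervals are 0-based and half-open: [pos - 1, pos)
            writer.writerow([row["chrom"], pos - 1, pos])
#+end_src

No header line is written, since BED files do not carry column names.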
Result:
2×3 DataFrame
 Row │ variant              meanQual  depth
     │ String               Float64   Int64
─────┼──────────────────────────────────────
   1 │ chr12:g.13594572      60.0      1
   2 │ chr17:g.10204026      60.0      1
144 found over 146
filter depth : another 0 missed variants
filter poly : another 0 missed variants
filter vep   : another 0 missed variants
And there are too many variants in the output (7330!)
**** DONE Email Paul with the T2T filter results + new diagram
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
* Reinterpretation
** PROJ Run tests on raw data [167/250] <(samples.csv)>  <(runs.waiting)>
- [X] 100222_63015289
- [X] 1600304839_63051311
- [X] 1900007827_62913191
- [X] 1900398899_62999500
- [X] 1900486799_62913197
- [X] 2100422923_62952677
- [X] 2100458888_62933047
- [X] 2100601558_62903840
- [X] 2100609288_62905768
- [X] 2100609501_62905776
- [X] 2100614493_62951074
- [X] 2100622566_62908067
- [X] 2100622601_62908060
- [X] 2100622705_62908063
- [X] 2100640027_62911936
- [X] 2100645285_62913212
- [X] 2100661411_62914081
- [X] 2100661462_62914086
- [X] 2100708257_62921596
- [X] 2100738732_62926501
- [X] 2100738850_62926509
- [X] 2100746751_62926505
- [X] 2100746797_62926506
- [X] 2100782349_62931722
- [X] 2100782416_62931561
- [X] 2100782559_62931718
- [X] 2100799204_62934768
- [X] 2200010202_62940284
- [X] 2200023600_62940631
- [X] 2200024348_62999591
- [X] 2200027505_62942457
- [X] 2200038776_62943412
- [X] 2200041919_62943405
- [X] 2200088014_62951326
- [X] 2200146652_62959388
- [X] 2200151850_62960953
- [X] 2200160014_62959475
- [X] 2200160070_62959478
- [X] 2200201368_62967471
- [X] 2200201400_62967470
- [X] 2200265558_62976332
- [X] 2200265605_62976401
- [X] 2200267046_62975192
- [X] 2200273878_62999530
- [X] 2200279708_62977002
- [X] 2200284408_62979102
- [X] 2200293987_62979116
- [X] 2200294359_62979118
- [X] 2200306299_62982217
- [X] 2200306539_62982193
- [X] 220030671_62982211
- [X] 2200307058_62982231
- [X] 2200307108_62982196
- [X] 2200307136_62982221
- [X] 2200307199_62982239
- [X] 2200307230_62982234
- [X] 2200307262_62982219
- [X] 2200307297_62982227
- [X] 2200324510_62985453
- [X] 2200324549_62985478
- [X] 2200324573_62985445
- [X] 2200324594_62985467
- [X] 2200324606_62985463
- [X] 2200324614_62985459
- [X] 2200338306_62985430
- [X] 2200343880_62989407
- [X] 2200343910_62989460
- [X] 2200343938_62989451
- [X] 2200343966_62989456
- [X] 2200343993_62989440
- [X] 2200344013_62989464
- [X] 2200349749_62989465
- [X] 2200363462_62988848
- [X] 2200377880_62991993
- [X] 2200378032_62991991
- [X] 2200383996_62993828
- [X] 2200384015_62993796
- [X] 2200384046_62993822
- [X] 2200384117_62993808
- [X] 2200384187_62993825
- [X] 2200384231_62992898
- [X] 2200385658_63060260
- [X] 2200394260_62994732
- [X] 2200395817_62994742
- [X] 2200396731_62994737
- [X] 2200424073_62999579
- [X] 2200424207_62999632
- [X] 2200426178_62999630
- [X] 2200426243_62999635
- [X] 2200426466_62999605
- [X] 2200426642_62999627
- [X] 2200427406_62999649
- [X] 2200427512_62999639
- [X] 2200428953_62999572
- [X] 2200428981_62999600
- [X] 2200428999_62999592
- [X] 2200441970_63000868
- [X] 2200441989_63000882
- [X] 2200442135_63000864
- [X] 2200442216_63000886
- [X] 2200442257_63000951
- [X] 2200451801_63003573
- [X] 2200451862_63004218
- [X] 2200451894_63004210
- [X] 2200456165_63051294
- [X] 2200459865_63004933
- [X] 2200459968_63004937
- [X] 2200460073_63004943
- [X] 2200460121_63004684
- [X] 2200467051_63003856
- [X] 2200467225_63004940
- [X] 2200467261_63004930
- [X] 2200467338_63004925
- [X] 2200470099_63004485
- [X] 2200470142_63004480
- [X] 2200471780_63004362
- [X] 2200480910_63006466
- [X] 2200495073_63010427
- [X] 2200495510_63009152
- [X] 2200508677_63060252
- [X] 2200510531_63012582
- [X] 2200510628_63012549
- [X] 2200510657_63012554
- [X] 2200511249_63012533
- [X] 2200511274_63012586
- [X] 2200517952_63060399
- [X] 2200519525_63060439
- [X] 2200524009_63014044
- [X] 2200524609_63014046
- [X] 2200524616_63014048
- [X] 2200533429_63060425
- [X] 2200539735_63060406
- [X] 2200549908_63019339
- [X] 2200549965_63019349
- [X] 2200550414_63019357
- [X] 2200550471_63020031
- [X] 2200550490_63019351
- [X] 2200550505_63019340
- [X] 2200555565_63018614
- [X] 2200559438_63020029
- [X] 2200559682_63020030
- [X] 2200559713_63019623
- [X] 2200559739_63019626
- [X] 2200569969_63019991
- [X] 2200570001_63021580
- [X] 2200570025_63021490
- [X] 2200570035_63021491
- [X] 2200570042_63021493
- [X] 2200570050_63021494
- [X] 2200579897_63024910
- [X] 2200583995_63024866
- [X] 2200584035_63024905
- [X] 2200584069_63024888
- [X] 2200584126_63024810
- [X] 2200589507_63026712
- [X] 2200597365_63027994
- [X] 2200597480_63027988
- [X] 2200597752_63026853
- [X] 2200597778_63027992
- [X] 22005977_63026903
- [X] 2200609031_63026527
- [X] 2200614198_63113928
- [X] 2200620372_63030821
- [X] 2200620442_63030810
- [X] 2200620498_63030816
- [X] 2200620628_63031031
- [X] 2200622310_63030984
- [ ] 2200622355_63030956
- [ ] 2200625369_63028699
- [ ] 2200625410_63028697
- [ ] 2200625536_63028694
- [ ] 2200630189_63030665
- [ ] 2200635149_63033182
- [ ] 2200644544_63037731
- [ ] 2200644594_63037725
- [ ] 2200650089_63038093
- [ ] 2200666292_63076568
- [ ] 2200669188_63036688
- [ ] 2200669320_63040259
- [ ] 2200669383_63040254
- [ ] 2200669414_63040257
- [ ] 2200669446_63040251
- [ ] 2200680342_63105271
- [ ] 2200694535_63042853
- [ ] 2200694789_63042862
- [ ] 2200694858_63042702
- [ ] 2200694917_63042696
- [ ] 2200699290_63043047
- [ ] 2200699345_63040238
- [ ] 2200699383_63043050
- [ ] 2200699412_63040731
- [ ] 220071551_63048935
- [ ] 2200731515_63048963
- [ ] 2200748145_63051198
- [ ] 2200748171_63051213
- [ ] 2200751046_63051249
- [ ] 2200751101_63051234
- [ ] 2200766471_63054590
- [ ] 2200767731_63054595
- [ ] 2200767822_63054464
- [ ] 2200775505_63060410
- [ ] 2200850441_63019345
- [ ] 220597589_63026879
- [ ] 2300003253_63060430
- [ ] 2300005679_63060370
- [ ] 2300009914_63060390
- [ ] 2300028784_63060001
- [ ] 2300036815_63063357
- [ ] 2300055382_63061874
- [ ] 2300055421_63061871
- [ ] 2300055440_63061880
- [ ] 230006894_63064950
- [ ] 2300071111_63070356
- [ ] 2300083434_63071675
- [ ] 2300103609_63076239
- [ ] 2300104572_63076232
- [ ] 2300109602_63076765
- [ ] 2300109665_63076770
- [ ] 2300119721_63078732
- [ ] 2300137773_63078133
- [ ] 2300137834_63078123
- [ ] 2300167821_63086183
- [ ] 2300172698_63113453
- [ ] 2300188216_63090609
- [ ] 2300188281_63090632
- [ ] 2300188800_63090616
- [ ] 2300193193645_63090623
- [ ] 2300193668_63090611
- [ ] 2300195426_63090608
- [ ] 2300201017_63089636
- [ ] 2300227479_63098330
- [ ] 2300232688_63130821
- [ ] 2300292749_63109239
- [ ] 230029277_63109247
- [ ] 2300294712_63109236
- [ ] 2300308032_63111581
- [ ] 2300323537_63114209
- [ ] 2300334609_63115535
- [ ] 2300346867_63118093
- [ ] 2300346867_63118093_NA12878
- [ ] 2300348940_63118099
- [ ] 2300359806_63119915
- [ ] 2300380476_63123963
- [ ] 2300382582_63123749
- [ ] 2300384269_63126867
- [ ] 2300407581_63130826
- [ ] 2300407626_63130842
- [ ] 2300409593_63130874
- [ ] 2300409612_63130980
- [ ] 2300417623_63131524
** TODO Missed variants
*** TODO 63012582: chr10:g.102230760
*** TODO 63060439: chr15:g.26869324 = Depth problem, DP=15
SCHEDULED: <2023-10-08 Sun>
GT:AD:DP:GQ:PL 0/1:9,6:15:99:103,0,213
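The genotype line above can be read by pairing each FORMAT key with the matching sample value. The sketch below is only an illustration of that reading: the function name `parse_sample` and the `MIN_DEPTH` cutoff are hypothetical and are not taken from the pipeline, which may use a different threshold.

```python
def parse_sample(format_field: str, sample_field: str) -> dict:
    """Pair colon-separated VCF FORMAT keys with the sample's values."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    return dict(zip(keys, values))


# FORMAT and sample columns copied from the note above.
call = parse_sample("GT:AD:DP:GQ:PL", "0/1:9,6:15:99:103,0,213")

# AD lists the read counts for the reference and alternate alleles.
ref_depth, alt_depth = (int(x) for x in call["AD"].split(","))
depth = int(call["DP"])
alt_fraction = alt_depth / (ref_depth + alt_depth)

# Hypothetical cutoff, for illustration only; the real filter is not given here.
MIN_DEPTH = 20

print(f"GT={call['GT']} DP={depth} alt fraction={alt_fraction:.2f}")
if depth < MIN_DEPTH:
    print(f"depth {depth} is below the assumed cutoff of {MIN_DEPTH}")
```

Read this way, the call is heterozygous with 9 reference and 6 alternate reads, so a depth-based filter is a plausible reason for the variant being dropped.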
* Results
** TODO Speed up BWA-MEM
SCHEDULED: <2023-10-08 Sun>
** TODO Speed up HaplotypeCaller
SCHEDULED: <2023-10-08 Sun>
* Communication
** DONE Mail NGS-diag
CLOSED: [2023-10-06 Fri 08:04] SCHEDULED: <2023-10-06 Fri>
/Entered on/ [2023-10-04 Wed 19:33]

File addition: .hypb (----------)

[26.13]


"bisonex.org"
("runs.waiting" nil nil find-file-other-window ("/ssh:meso:/Home/Users/apraga/runs.waiting" t) "alex@gentoo" "20231007:13:31:21" nil nil)
("samples.csv" nil nil find-file-other-window ("/ssh:meso:/Work/Users/apraga/bisonex/samples.csv" t) "alex@gentoo" "20231007:13:30:53" nil nil)