F4OH5H3ONZKUVBUI5NKYJ25B66FS22QQ7LRAO53OQX3ZBL2BL6JAC
FDRVCB2DMGO2MHW3GMYQ4VUAGQYDRPZLBZDDO3H7CEHETNC7GIXQC
GSNJJN2VNAE7ZUTKSXWAWY727CHWOJQCA7ZYQ4UW4K6CJ45ANLIQC
IJIOG6X6SMDA55XKPEX2RSTCKHZX5E7QASIZR5L4AZLXW4X3KWJQC
CDPYGRNUP6EEE77RHDGEALT4BUELT3RBSBUO35PLI3HLZ44KK26QC
4EYK4BECM65Q7PWBSARPRHL5MIPC4JGIWYIPFDCSAV6YWSZH67BQC
RHWQQAAHNHFO3FLCGVB3SIDKNOUFJGZTDNN57IQVBMXXCWX74MKAC
XVAOM5T7YWGAJHM65FK7QFTINHWIC2UC3X6CYOYM3SL2T2EFKCAAC
BJW7E6D6T7QQFPFGMHHQYHAEMRX5HGEE2H32IUU4HPUIS2AZW7KAC
HODZPQLZBLXXWP6NVKDWVSLFEN6HKIKOFG6QVX2PEJRYAM4I6ZQQC
FYUID2XZFKETTS6EJLOGMV5DJJMLY4NCQSATT3ZKWDLHEKJRO4TAC
RTP5CJJXW2DQQGKCS3R24GIB6MRCNVXWTZVUQYLXNNR4UXWWGD5AC
INI4X4KVQ4TVQVFK5SRVTXRAKGQOW75ZESBEG5IIKLI63NCEGKXAC
6CVBGNZOBDCISGPDH4Z3N37EEUYVOBKMMCVM55OK233FSDS2DDRAC
KBAP3IHF4E6JXSYKVKNBXLCNHX2KGYGKVVS2ATPZ4JGJT7Y32YSAC
Z4JC5QKADUJB7BIW2IR34WPGU4ELNQW2LTOLG3OYMHCXALWKFEQAC
57HKJ3RTGJCI3S7H7NBEOXOXOHEBFO5UU7TQ2QBQKGKSRBGET2KAC
WL5R6AMW2SA27H5YOFYKRSTM344ZTIDVGBA55TS7QOU5LAKLV7VQC
GVIG6NM3IZWXRRMVCZLCDQU4GFLU2ILVVNB7D7ARIPUWRRO6BLDAC
CHCMW6PPGP2XMJQT4D6EYQUAO5MG77RUE5JTVRET7B7ODKRVM4GQC
BSZSUYUWGGM7DKBXK3AZZJFWFWLODMNPLDTAEHY7PWK2PI2NSCIQC
IMCF75S3NQXK7TZUGS4POPLLOCFQPYQX3QTHIR2J3HCW564UMPGAC
3LLSOLDOJ5OSKQN2DKYHYGWTBUJAC5KPRX2SAVNARRUYCPRM44RQC
4UOH23PPEC4TH5GALRKZOOGOSLV4FJB6R4GA2N3TCEDCOIZTMJSAC
7NDZXAGUWT2JOXEDHCP7OGVNLC7XUNJ2MNEBKDJKSJ2ZFL5HF4YAC
FXA3ZBV64FML7W47IPHTAJFJHN3J3XHVHFVNYED47XFSBIGMBKRQC
RUR2HQCDDUOIP7GD2GZV2UPCUF5QS6COVIVDKVIGZF22TVRFBKPAC
XBXXQ7NGCA2AM7F6DODF75VNIA52J3MNYSLNAYRRC2KKYJUVCN2QC
Z6B2FRJWT6EF4MC4GTIDBMULUKEI62ZGPYZMCWRJDJNUM7YUJJMAC
CRN5TEHRZUS2JA26WEEHG65MT4ADK6DYAFFY6SCYRRPIF36NRU4AC
XZOQO7A5LE72EEGJRR6USUXL4S4L5DKQAHPJMUPXHF2MKLK4AK5AC
X6325TNJCTAY773WSNRKDTMW4QTTRPKFNTSBHCFCMBNLNII53A3QC
BSTHKI4NCR6JGVYEX3V4NGAIEHD52CYN7BO7BBIH5ZE2G6VWWVXQC
PL526DIB3OIAMC35BV5DRTUHRZDIJDEUCPPK2AUY5CHLWJNEWI4AC
JMVLHRX7RFJCQFXESQT75DUKOLBQEO7LMKOZCI4QWQBQKAWLC2PAC
FFRVLC5ETQ5C5VBU6RM26MGEHS76DMISRZMAOOTYDIIGAQVTWVYAC
SR7OKKIHACXTVR3C6GSL5NL47FR5KID33NZKQNCZCCMGZQXLSXIQC
I4OQ35BVH6AC4IEQKZP2YXLKGWBCRM4TUX7S66OFZ5D7RZP23R2QC
*** DONE Passe 1 [37/37]
CLOSED: [2023-10-06 Fri 23:27] SCHEDULED: <2023-10-01 Sun>
- [X] https://scut.srht.site/notes/medecine/20230528235124-culture.html
- [X] <[Syphilis]> - ~/annex/public/lessons/microbiologie/bacterio/Syphilis.pdf
- [X] A. agalactiae
- [X] <[ATBg staphylocoques]> - "~/annex/public/lessons/microbiologie/bacterio/Antibiogramme - Staphylocoques.pdf"
- [X] <[ATBg streptococque, entérocoque, Listeria]> - "~/annex/public/lessons/microbiologie/bacterio/Antibiogramme - Streptococoque, enterocoque, Listeria.pdf"
- [X] <[Chlamydia, mycoplasme]> - "~/annex/public/lessons/microbiologie/bacterio/Chlamydia - mycoplasmes.pdf"
- [X] <[Moléculaire (bactério)]> - "~/annex/public/lessons/microbiologie/bacterio/Diagnostic moleculaire - Bacteriologie.pdf"
- [X] <[EBCU etiologies]> - "~/annex/public/lessons/microbiologie/bacterio/EBCU etiologies.pdf"
- [X] <[ECBU interpretation]> - "~/annex/public/lessons/microbiologie/bacterio/ECBU interpretation.pdf"
- [X] <[ECBU pre-analytique]> - "~/annex/public/lessons/microbiologie/bacterio/ECBU pre-analytique.pdf"
- [X] <[EEQ, CIQ]> - "~/annex/public/lessons/microbiologie/bacterio/EEQ, CIQ.pdf"
- [X] <[Examen microscopique]> - "~/annex/public/lessons/microbiologie/bacterio/Examen microscopique.pdf"
- [X] <[Gonocoque]> - "~/annex/public/lessons/microbiologie/bacterio/Gonocoque.pdf"
- [X] <[Hygiène]> - "~/annex/public/lessons/microbiologie/bacterio/Hygiène.pdf"
- [X] <[Infections cutanees]> - "~/annex/public/lessons/microbiologie/bacterio/Infections cutanees.pdf"
- [X] <[MALDI - TOF]> - "~/annex/public/lessons/microbiologie/bacterio/MALDI - TOF.pdf"
- [X] <[Pre-analytique bacteriologie]> - "~/annex/public/lessons/microbiologie/bacterio/Pre-analytique bacteriologie.pdf"
- [X] <[Qualite]> - "~/annex/public/lessons/microbiologie/bacterio/Qualite.pdf"
- [X] <[Securite Transfusionnelle]> - "~/annex/public/lessons/microbiologie/bacterio/Securite Transfusionnelle.pdf"
- [X] <[Serologie bacterienne]> - "~/annex/public/lessons/microbiologie/bacterio/Serologie bacterienne.pdf"
- [X] <[Tests rapides antigeniques et moleculaires]> - "~/annex/public/lessons/microbiologie/bacterio/Tests rapides antigeniques et moleculaires.pdf"
- [X] <[Tuberculose]> - "~/annex/public/lessons/microbiologie/bacterio/Tuberculose.pdf"
- [X] <[Typage moleculaire bacterien]> - "~/annex/public/lessons/microbiologie/bacterio/Typage moleculaire bacterien.pdf"
- [X] <[Vaccination personnel]> - "~/annex/public/lessons/microbiologie/bacterio/Vaccination personnel.pdf"
- [X] <[Angines bacteriennes]> - "~/annex/public/lessons/microbiologie/bacterio/Angines bacteriennes.pdf"
- [X] <[Antibiogramme - Enterobacteries]> - "~/annex/public/lessons/microbiologie/bacterio/Antibiogramme - Enterobacteries.pdf"
- [X] <[Antibiogramme]> - "~/annex/public/lessons/microbiologie/bacterio/Antibiogramme.pdf"
- [X] <[Cambylobacter]> - "~/annex/public/lessons/microbiologie/bacterio/Cambylobacter.pdf"
- [X] <[Clostridium difficile]> - "~/annex/public/lessons/microbiologie/bacterio/Clostridium difficile.pdf"
- [X] <[Concentrations critiques]> - "~/annex/public/lessons/microbiologie/bacterio/Concentrations critiques.pdf"
- [X] <[Conseil anti-infectieux]> - "~/annex/public/lessons/microbiologie/bacterio/Conseil anti-infectieux.pdf"
- [X] <[Declaration obligatoire]> - "~/annex/public/lessons/microbiologie/bacterio/Declaration obligatoire.pdf"
- [X] <[Hemocultures 1]> - "~/annex/public/lessons/microbiologie/bacterio/Hemocultures 1.pdf"
- [X] <[Hemocultures 2]> - "~/annex/public/lessons/microbiologie/bacterio/Hemocultures 2.pdf"
- [X] <[Legionelle]> - "~/annex/public/lessons/microbiologie/bacterio/Legionelle.pdf"
- [X] <[Meningites bacteriennes ]> - "~/annex/public/lessons/microbiologie/bacterio/Meningites bacteriennes .pdf"
- [X] <[Salmonelle - shigelle]> - "~/annex/public/lessons/microbiologie/bacterio/Salmonelle - shigelle.pdf"
*** TODO Passe 2 [37/37]
SCHEDULED: <2023-10-06 Fri> DEADLINE: <2023-10-10 Tue>
- [ ] https://scut.srht.site/notes/medecine/20230528235124-culture.html
- [ ] <[Syphilis]> - ~/annex/public/lessons/microbiologie/bacterio/Syphilis.pdf
- [ ] A. agalactiae
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Antibiogramme - Staphylocoques.pdf][ATBg staphylocoques]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Antibiogramme - Streptococoque, enterocoque, Listeria.pdf][ATBg streptococque, entérocoque, Listeria]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Chlamydia - mycoplasmes.pdf][Chlamydia, mycoplasme]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Diagnostic moleculaire - Bacteriologie.pdf][Moléculaire (bactério)]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/EBCU etiologies.pdf][EBCU etiologies]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/ECBU interpretation.pdf][ECBU interpretation]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/ECBU pre-analytique.pdf][ECBU pre-analytique]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/EEQ, CIQ.pdf][EEQ, CIQ]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Examen microscopique.pdf][Examen microscopique]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Gonocoque.pdf][Gonocoque]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Hygiène.pdf][Hygiène]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Infections cutanees.pdf][Infections cutanees]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/MALDI - TOF.pdf][MALDI - TOF]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Pre-analytique bacteriologie.pdf][Pre-analytique bacteriologie]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Qualite.pdf][Qualite]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Securite Transfusionnelle.pdf][Securite Transfusionnelle]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Serologie bacterienne.pdf][Serologie bacterienne]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Tests rapides antigeniques et moleculaires.pdf][Tests rapides antigeniques et moleculaires]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Tuberculose.pdf][Tuberculose]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Typage moleculaire bacterien.pdf][Typage moleculaire bacterien]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Vaccination personnel.pdf][Vaccination personnel]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Angines bacteriennes.pdf][Angines bacteriennes]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Antibiogramme - Enterobacteries.pdf][Antibiogramme - Enterobacteries]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Antibiogramme.pdf][Antibiogramme]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Cambylobacter.pdf][Cambylobacter]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Clostridium difficile.pdf][Clostridium difficile]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Concentrations critiques.pdf][Concentrations critiques]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Conseil anti-infectieux.pdf][Conseil anti-infectieux]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Declaration obligatoire.pdf][Declaration obligatoire]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Hemocultures 1.pdf][Hemocultures 1]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Hemocultures 2.pdf][Hemocultures 2]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Legionelle.pdf][Legionelle]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Meningites bacteriennes .pdf][Meningites bacteriennes ]]
- [ ] [[~/annex/public/lessons/microbiologie/bacterio/Salmonelle - shigelle.pdf][Salmonelle - shigelle]]
*** TODO Cours
SCHEDULED: <2023-10-07 Sat>
- [1/5] <[Culture]> - https://scut.srht.site/notes/medecine/20230528235124-culture.html
- [1/5] <[Syphilis]> - "~/annex/public/lessons/microbiologie/bacterio/Syphilis.pdf"
- [1/5] A. agalactiae
- [1/5] <[ATBg staphylocoques]> - "~/annex/public/lessons/microbiologie/bacterio/Antibiogramme - Staphylocoques.pdf"
- [1/5] <[ATBg streptococque, entérocoque, Listeria]> - "~/annex/public/lessons/microbiologie/bacterio/Antibiogramme - Streptococoque, enterocoque, Listeria.pdf"
- [1/5] <[Chlamydia, mycoplasme]> - "~/annex/public/lessons/microbiologie/bacterio/Chlamydia - mycoplasmes.pdf"
- [1/5] <[Moléculaire (bactério)]> - "~/annex/public/lessons/microbiologie/bacterio/Diagnostic moleculaire - Bacteriologie.pdf"
- [1/5] <[EBCU etiologies]> - "~/annex/public/lessons/microbiologie/bacterio/EBCU etiologies.pdf"
- [1/5] <[ECBU interpretation]> - "~/annex/public/lessons/microbiologie/bacterio/ECBU interpretation.pdf"
- [1/5] <[ECBU pre-analytique]> - "~/annex/public/lessons/microbiologie/bacterio/ECBU pre-analytique.pdf"
- [1/5] <[EEQ, CIQ]> - "~/annex/public/lessons/microbiologie/bacterio/EEQ, CIQ.pdf"
- [1/5] <[Examen microscopique]> - "~/annex/public/lessons/microbiologie/bacterio/Examen microscopique.pdf"
- [1/5] <[Gonocoque]> - "~/annex/public/lessons/microbiologie/bacterio/Gonocoque.pdf"
- [1/5] <[Hygiène]> - "~/annex/public/lessons/microbiologie/bacterio/Hygiène.pdf"
- [1/5] <[Infections cutanees]> - "~/annex/public/lessons/microbiologie/bacterio/Infections cutanees.pdf"
- [1/5] <[MALDI - TOF]> - "~/annex/public/lessons/microbiologie/bacterio/MALDI - TOF.pdf"
- [1/5] <[Pre-analytique microbiologie/bacteriologie]> - "~/annex/public/lessons/bacterio/Pre-analytique bacteriologie.pdf"
- [1/5] <[Qualite]> - "~/annex/public/lessons/microbiologie/bacterio/Qualite.pdf"
- [1/5] <[Securite Transfusionnelle]> - "~/annex/public/lessons/microbiologie/bacterio/Securite Transfusionnelle.pdf"
- [1/5] <[Serologie bacterienne]> - "~/annex/public/lessons/microbiologie/bacterio/Serologie bacterienne.pdf"
- [1/5] <[Tests rapides antigeniques et moleculaires]> - "~/annex/public/lessons/microbiologie/bacterio/Tests rapides antigeniques et moleculaires.pdf"
- [1/5] <[Tuberculose]> - "~/annex/public/lessons/microbiologie/bacterio/Tuberculose.pdf"
- [1/5] <[Moleculaire]> - "~/annex/public/lessons/microbiologie/bacterio/Typage moleculaire bacterien.pdf"
- [1/5] <[Vaccination personnel]> - "~/annex/public/lessons/microbiologie/bacterio/Vaccination personnel.pdf"
- [1/5] <[Angines]> - "~/annex/public/lessons/microbiologie/bacterio/Angines bacteriennes.pdf"
- [1/5] <[ATBg Enterobacteries]> - "~/annex/public/lessons/microbiologie/bacterio/Antibiogramme - Enterobacteries.pdf"
- [1/5] <[ATBg]> - "~/annex/public/lessons/microbiologie/bacterio/Antibiogramme.pdf"
- [1/5] <[Cambylobacter]> - "~/annex/public/lessons/microbiologie/bacterio/Cambylobacter.pdf"
- [1/5] <[Clostridium difficile]> - "~/annex/public/lessons/microbiologie/bacterio/Clostridium difficile.pdf"
- [1/5] <[Concentrations critiques]> - "~/annex/public/lessons/microbiologie/bacterio/Concentrations critiques.pdf"
- [1/5] <[Conseil anti-infectieux]> - "~/annex/public/lessons/microbiologie/bacterio/Conseil anti-infectieux.pdf"
- [1/5] <[Declaration obligatoire]> - "~/annex/public/lessons/microbiologie/bacterio/Declaration obligatoire.pdf"
- [1/5] <[Hemocultures 1]> - "~/annex/public/lessons/microbiologie/bacterio/Hemocultures 1.pdf"
- [1/5] <[Hemocultures 2]> - "~/annex/public/lessons/microbiologie/bacterio/Hemocultures 2.pdf"
- [1/5] <[Legionelle]> - "~/annex/public/lessons/microbiologie/bacterio/Legionelle.pdf"
- [1/5] <hMeningites bacteriennes ]> - "~/annex/public/lessons/microbiologie/bacterio/Meningites bacteriennes .pdf"
- [1/5] <[Salmonelle - shigelle]> - "~/annex/public/lessons/microbiologie/bacterio/Salmonelle - shigelle.pdf"
*** DONE Passe 1
CLOSED: [2023-10-05 Thu 16:58] DEADLINE: <2023-10-06 Fri> SCHEDULED: <2023-10-06 Fri>
*** TODO Passe 2: autres cours
DEADLINE: <2023-10-11 Wed> SCHEDULED: <2023-10-07 Sat>
** TODO Présentation dépistage hémato :presentation:
:PROPERTIES:
:CATEGORY: bacterio
:END:
*** DONE Refaire analyse bouche
CLOSED: [2023-10-07 Sat 17:49] SCHEDULED: <2023-10-07 Sat>
*** TODO Traitement patients bouches
SCHEDULED: <2023-10-07 Sat>
*** TODO Faire analyse selles
SCHEDULED: <2023-10-07 Sat>
*** TODO Traitement patients selles
SCHEDULED: <2023-10-07 Sat>
*** TODO Présenter Torres 2022
SCHEDULED: <2023-10-07 Sat>
*** TODO Présenter Santibiez 2023
** TODO Résilier ordures ménagères
SCHEDULED: <2023-10-06 Fri>
** TODO Payer ordures ménagères
SCHEDULED: <2023-10-06 Fri>
** WAIT Résilier ordures ménagères
SCHEDULED: <2023-10-14 Sat>
Mail envoyé
** DONE Payer ordures ménagères
CLOSED: [2023-10-07 Sat 17:48] SCHEDULED: <2023-10-06 Fri>
#+title: Bisonex
#+category: bisonex
* Biblio
:PROPERTIES:
:CATEGORY: biblio
:END:
** Workflow
Comparaison WDL, Cromwell, nextflow
https://www.nature.com/articles/s41598-021-99288-8
Nextflow = bon compromis ?
Comparison alignement, variant caller (2021)
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04144-1
** Étapes du pipeline
*** Variant calling: Haplotype caller
https://gatk.broadinstitute.org/hc/en-us/articles/360035531412
Définis l'algorithme + image
*** Phred score
https://gatk.broadinstitute.org/hc/en-us/articles/360035531872-Phred-scaled-quality-scores
** VCF
*** GT genotype
encoded as alleles values separated by either of ”/” or “|”, e.g. The allele values are 0 for the reference allele (what is in the reference sequence), 1 for the first allele listed in ALT, 2 for the second allele list in ALT and so on. For diploid calls examples could be 0/1 or 1|0 etc. For haploid calls, e.g. on Y, male X, mitochondrion, only one allele value should be given. All samples must have GT call information; if a call cannot be made for a sample at a given locus, ”.” must be specified for each missing allele in the GT field (for example ./. for a diploid). The meanings of the separators are:
/ : genotype unphased
| : genotype phased
** Validation
*** NA12878
**** KILL [[https://precision.fda.gov/challegnges/truth/results][fdaPrecision challenge]]
Attention, génome et en hg19 donc comparaison non adaptée ...
**** TODO Best practices for the analytical validation of clinical whole-genome sequencing intended for the diagnosis of germline disease
https://www.nature.com/articles/s41525-020-00154-9
Recommandations générale pour genome, sans données brutes
**** TODO [#A] Performance assessment of variant calling pipelines using human whole exome sequencing and simulated data
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2928-9
1. variant calling seul
2. NA12878 + données simulées
3. exome
4. évalué via F-score
Code disponible ! https://github.com/bharani-lab/WES-Benchmarking-Pipeline_Manoj/tree/master/Script
Résultat: BWA/Novoalign_DeepVariant
Aligneurs
- BWA-MEM 0.7.16
- Bowtie2 2.2.6
- Novoalign 3.08.02
- SOAP 2.21
- MOSAIK 2.2.3
Variantcalling
- GATK HaplotypeCaller 4
- FreeBayes 1.1.0
- SAMtools mpileup 1.7
- DeepVariant r0.4
SNV
| Exome | Pipeline | TP | FP | FN | Sensitivity | Precision | F-Score | FDR |
| 1 | BWA_GATK | 2368g9 | 1397 | 613 | 0.975 | 0.944 | 0.959 | 0.057 |
| 2 | BWA_GATK | 23946 | 865 | 356 | 0.985 | 0.965 | 0.975 | 0.036 |
indel
| TP | FP | FN | Sensitivity | Precision | F-Score | FDR | |
| 1254 | 72 | 75 | 0.944 | 0.946 | 0.945 | 0.054 | |
| 1309 | 10 | 20 | 0.985 | 0.992 | 0.989 | 0.008 | |
Valeur brutes :
https://static-content.springer.com/esm/art%3A10.1186%2Fs12859-019-2928-9/MediaObjects/12859_2019_2928_MOESM8_ESM.pdf
Autres articles avec même comparaison en exome sur NA12878
- Hwang et al., 2015 studyi
- Highnam et al, 2015
- Cornish and Guda, 2015
Variant Type
| | SNVs & Indels | CNVs (>10Kb) | SVs | Mitochondrial variants | Pseudogenes | REs | Somatic/ mosaic | Literature/Data | Source |
| NA12878 | 100%a | 40% | 0 | 0 | 0 | 0 | 0 | Zook et al18 | NIST |
| Other NIST standard | 71% | 40% | 50% | 0 | 0 | 0 | 0 | Zook et al18 | |
| (e.g. AJ/Asian trios) | | | | | | | | | |
| Platinum | 29% | 0 | 0 | 0 | 0 | 0 | 0 | Eberle et al8 | Platinum |
| Genomes | | | | | | | | | |
| Venter/HuRef | 14% | 40% | 0 | 0 | 0 | 0 | 0 | Trost et al1 | HuRef |
**** Systematic comparison of germline variant calling pipelines cross multiple next-generation sequencers
#+begin_src bibtex
@ARTICLE{Chen2019-fp,
title = "Systematic comparison of germline variant calling pipelines
cross multiple next-generation sequencers",
author = "Chen, Jiayun and Li, Xingsong and Zhong, Hongbin and Meng,
Yuhuan and Du, Hongli",
abstract = "The development and innovation of next generation sequencing
(NGS) and the subsequent analysis tools have gain popularity in
scientific researches and clinical diagnostic applications.
Hence, a systematic comparison of the sequencing platforms and
variant calling pipelines could provide significant guidance to
NGS-based scientific and clinical genomics. In this study, we
compared the performance, concordance and operating efficiency
of 27 combinations of sequencing platforms and variant calling
pipelines, testing three variant calling pipelines-Genome
Analysis Tool Kit HaplotypeCaller, Strelka2 and
Samtools-Varscan2 for nine data sets for the NA12878 genome
sequenced by different platforms including BGISEQ500,
MGISEQ2000, HiSeq4000, NovaSeq and HiSeq Xten. For the variants
calling performance of 12 combinations in WES datasets, all
combinations displayed good performance in calling SNPs, with
their F-scores entirely higher than 0.96, and their performance
in calling INDELs varies from 0.75 to 0.91. And all 15
combinations in WGS datasets also manifested good performance,
with F-scores in calling SNPs were entirely higher than 0.975
and their performance in calling INDELs varies from 0.71 to
0.93. All of these combinations manifested high concordance in
variant identification, while the divergence of variants
identification in WGS datasets were larger than that in WES
datasets. We also down-sampled the original WES and WGS datasets
at a series of gradient coverage across multiple platforms, then
the variants calling period consumed by the three pipelines at
each coverage were counted, respectively. For the GIAB datasets
on both BGI and Illumina platforms, Strelka2 manifested its
ultra-performance in detecting accuracy and processing
efficiency compared with other two pipelines on each sequencing
platform, which was recommended in the further promotion and
application of next generation sequencing technology. The
results of our researches will provide useful and comprehensive
guidelines for personal or organizational researchers in
reliable and consistent variants identification.",
journal = "Sci. Rep.",
publisher = "Springer Science and Business Media LLC",
volume = 9,
number = 1,
pages = "9345",
month = jun,
year = 2019,
copyright = "https://creativecommons.org/licenses/by/4.0",
language = "en"
}
#+end_src
Comparaison de différents pipeline 2019
https://www.nature.com/articles/s41598-019-45835-3
Combinaison
- variant calling = GATK, Strelka2 and Samtools-Varscan2
- sur NA12878
- séquencé sur BGISEQ500, MGISEQ2000, HiSeq4000, NovaSeq and HiSeq Xten.
Conclusion: strelka2 supérieur mais biais sur NA12878 ?
Illumina > BGI pour indel, probablement car reads plus grand
#+begin_quote
For WES datasets, the BGI platforms displayed the superior performance in SNPs
calling while Illumina platforms manifested the better variants calling
performance
in INDELs calling, which could be explained by their divergence in
sequencing strategy that producing different length of reads (all BGI platforms
were 100 base pair read length while all Illumina platforms were 150 base pair
read length). The read length effects, as a key factor between two platforms,
would bring alignment bias and error which are higher for short reads and
ultimately affect the variants calling especially the INDELs identification
#+end_quote
*** Débugger variant calling (haplotypecaller)
https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant
https://gatk.broadinstitute.org/hc/en-us/articles/360035891111-Expected-variant-at-a-specific-site-was-not-called
*** Hap.py
Format de sortie :
#+begin_src r
vcf_field_names(vcf, tag = "FORMAT")
#+end_src
#+RESULTS:
: FORMAT BD 1 String Decision for call (TP/FP/FN/N)
: FORMAT BK 1 String Sub-type for decision (match/mismatch type)
: FORMAT BVT 1 String High-level variant type (SNP|INDEL).
: FORMAT BLT 1 String High-level location type (het|homref|hetalt|homa
am = genotype mismatch
lm = allele/haplotype mismatch
. = non vu
**** On vérifie que am = genotype mismatch
référence = T/T
high-confidence = T/C
notre = C/C
#+begin_src sh
bcftools filter -i 'POS=19196584' /Work/Groups/bisonex/data/giab/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz | grep -v '#'
bcftools filter -i 'POS=19196584' ../out/NA12878_NIST7035-dbsnp/variantCalling/haplotypecaller/NA12878_NIST.vcf.gz | grep -v '#'
#+end_src
#+RESULTS:
: NC_000022.11 19196584 . T C 50 PASS platforms=5;platformnames=Illumina,PacBio,10X,Ion,Solid;datasets=5;datasetnames=HiSeqPE300x,CCS15kb_20kb,10XChromiumLR,IonExome,SolidSE75bp;callsets=7;callsetnames=HiSeqPE300xGATK,CCS15kb_20kbDV,CCS15kb_20kbGATK4,HiSeqPE300xfreebayes,10XLRGATK,IonExomeTVC,SolidSE75GATKHC;datasetsmissingcall=CGnormal;callable=CS_HiSeqPE300xGATK_callable,CS_CCS15kb_20kbDV_callable,CS_10XLRGATK_callable,CS_CCS15kb_20kbGATK4_callable,CS_HiSeqPE300xfreebayes_callable GT:PS:DP:ADALL:AD:GQ 0/1:.:781:109,123:138,150:348
: NC_000022.11 19196584 rs1061325 T C 59.32 PASS AC=2;AF=1;AN=2;DB;DP=2;ExcessHet=0;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;QD=29.66;SOR=2.303 GT:AD:DP:GQ:PL 1/1:0,2:2:6:71,6,0
**** On vérifie que lm = allele/haplotype mismatch
référence = CAA/CAA
high-confidence = CA/CA
notre = C/CA
#+begin_src sh
bcftools filter -i 'POS=31277416' /Work/Groups/bisonex/data/giab/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz | grep -v '#'
bcftools filter -i 'POS=31277416' ../out/NA12878_NIST7035-dbsnp/variantCalling/haplotypecaller/NA12878_NIST.vcf.gz | grep -v '#'
#+end_src
#+RESULTS:
: NC_000022.11 31277416 . CA C 50 PASS platforms=3;platformnames=Illumina,PacBio,10X;datasets=3;datasetnames=HiSeqPE300x,CCS15kb_20kb,10XChromiumLR;callsets=4;callsetnames=HiSeqPE300xGATK,CCS15kb_20kbDV,10XLRGATK,HiSeqPE300xfreebayes;datasetsmissingcall=CCS15kb_20kb,CGnormal,IonExome,SolidSE75bp;callable=CS_HiSeqPE300xGATK_callable;difficultregion=GRCh38_AllHomopolymers_gt6bp_imperfectgt10bp_slop5,GRCh38_SimpleRepeat_imperfecthomopolgt10_slop5 GT:PS:DP:ADALL:AD:GQ 1/1:.:465:16,229:0,190:129
: NC_000022.11 31277416 rs57244615 CAA C,CA 389.02 PASS AC=1,1;AF=0.5,0.5;AN=2;BaseQRankSum=0.37;DB;DP=37;ExcessHet=0;FS=0;MLEAC=1,1;MLEAF=0.5,0.5;MQ=60;MQRankSum=0;QD=13.41;ReadPosRankSum=-0.651;SOR=0.572 GT:AD:DP:GQ:PL 1/2:5,10,14:29:64:406,202,313,64,0,88
*** Génération de reads
Biblio récente
https://www.biorxiv.org/content/10.1101/2022.03.29.486262v1.full.pdf
Parmi ceux qui gèrent les variations
- *simuscop* reads non centré sur les zones de capture
- *NEAT: exome* mais trop lent en pratique
- *Reseq* exome
- gensim : pas d'exome
- pIRS : non plus
- varsim : non plus
...
Temps de calcul selon l'article de reseq https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02265-7
#+begin_quote
Due to ReSeq’s effective parallelization, its elapsed times are low for this benchmark with 48 virtual CPUs (Additional file 1: Figure S34b,e). In contrast, the single-threaded processes implemented in perl or python have strikingly high elapsed times. This is well visible in Hs-HiX-TruSeq and applies to the training of pIRS (over a week), NEAT (several days), and BEAR (half a week) as well as the simulation of NEAT (close to 2 weeks) and BEAR (several weeks).
Biblio : https://www.nature.com/articles/s41437-022-00577-3
#+end_quote
Divers
- Liste ancienne : https://www.biostars.org/p/128762/
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02265-7
* Idées
** Validation analytique
mail Yannis : données patients +/- simulées
*** Utiliser données GCAT et uploader le notre ?
https://www.nature.com/articles/ncomms7275
*** [#A] Variant calling : Genome in a bottle : NA12878 + autres
Résumé : https://www.nist.gov/programs-projects/genome-bottle
Manuscript : https://www.nature.com/articles/s41587-019-0054-x.epdf?author_access_token=E_1bL0MtBBwZr91xEsy6B9RgN0jAjWel9jnR3ZoTv0OLNnFBR7rUIZNDXq0DIKdg3w6KhBF8Rz2RWQFFc0St45kC6CZs3cDYc87HNHovbWSOubJHDa9CeJV-pN0BW_mQ0n7cM13KF2JRr_wAAn524w%3D%3D
Article comparant les variant calling : https://www.biorxiv.org/content/10.1101/2020.12.11.422022v1.full.pdf
**** KILL Tester le séquencage aussi
CLOSED: [2023-01-30 lun. 18:30]
Depuis un fastq correspondant à Illumina https://github.com/genome-in-a-bottle/giab_data_indexes
puis on compare le VCF avec les "high confidence"
On séquence directement NA12878 -> inutile pour le pipeline seul
**** TODO Tester seul la partie bioinformatique
Tout résumé ici : https://www.nist.gov/programs-projects/genome-bottle
- methode https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/analysis/Illumina_PlatinumGenomes_NA12877_NA12878_09162015/IlluminaPlatinumGenomes-user-guide.pdf
- vcf
https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/latest/GRCh38/
NB: à quoi correspond https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/analysis/Illumina_PlatinumGenomes_NA12877_NA12878_09162015/hg38/2.0.1/NA12878/ ??
Article comparant les variant calling : https://www.biorxiv.org/content/10.1101/2020.12.11.422022v1.full.pdf
Article pour vcfeval : https://www.nature.com/articles/s41587-019-0054-x
La version 4 ajoute 273 gènes "clinically relevant" https://www.biorxiv.org/content/10.1101/2021.06.07.444885v3.full.pdf
Ajout des zones "difficiles"
https://www.biorxiv.org/content/10.1101/2020.07.24.212712v5.full.pdf
*** [#B] Pipeline : générer patient avec tous les variants retrouvés à Cento
Comparaison de génération ADN (2019)
https://academic.oup.com/bfg/article/19/1/49/5680294
**** SimuSCop (exome)
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03665-5
https://github.com/qasimyu/simuscop
1. Crééer un modèle depuis bam + vcf : Setoprofile
2. Génerer données NGS
** Annotation :
*** Comparaison vep / snpeff et annovar
* Changement nouvelle version
- Dernière version du génome (la version "prête à l'emploi" est seulement GRCh38 sans les version patchées)
* Notes
** Nextflow
*** afficher les résultats d'un process/workflow
#+begin_src
lol.out.view()
#+end_src
Attention, ne fonctionne pas si plusieurs sortie:
#+begin_src
lol.out[0].view()
#+end_src
ou si /a/ est le nom de la sortie
#+begin_src
lol.out.a.view()
#+end_src
** Quelle version du génome ?
- T2T: notation chromose = chR1,2 : ok genome, clinvar, dbSNP
- GRCh38: notation chromose = NC_... : ok genome, clinvar, dbSNP
** Performances
Ordinateur de Carine (WSL2) : 4h dont 1h15 alignement (parallélisé) et 1h15 haplotypecaller (séquentiel)
** Chromosomes NC, NT, NW
Correspondance :
https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&chromInfoPage=
Signification
https://genome.ucsc.edu/FAQ/FAQdownloads.htm
#+title: Bisonex
#+category: bisonex
* Idées
** Validation analytique
mail Yannis : données patients +/- simulées
*** Utiliser données GCAT et uploader le notre ?
https://www.nature.com/articles/ncomms7275
*** [#A] Variant calling : Genome in a bottle : NA12878 + autres
Résumé : https://www.nist.gov/programs-projects/genome-bottle
Manuscript : https://www.nature.com/articles/s41587-019-0054-x.epdf?author_access_token=E_1bL0MtBBwZr91xEsy6B9RgN0jAjWel9jnR3ZoTv0OLNnFBR7rUIZNDXq0DIKdg3w6KhBF8Rz2RWQFFc0St45kC6CZs3cDYc87HNHovbWSOubJHDa9CeJV-pN0BW_mQ0n7cM13KF2JRr_wAAn524w%3D%3D
Article comparant les variant calling : https://www.biorxiv.org/content/10.1101/2020.12.11.422022v1.full.pdf
**** Tester le séquencage aussi
Depuis un fastq correspondant à Illumina https://github.com/genome-in-a-bottle/giab_data_indexes
puis on compare le VCF avec les "high confidence"
On séquence directement NA12878 -> inutile pour le pipeline seul
**** Tester seul la partie bioinformatique
Tout résumé ici : https://www.nist.gov/programs-projects/genome-bottle
- methode https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/analysis/Illumina_PlatinumGenomes_NA12877_NA12878_09162015/IlluminaPlatinumGenomes-user-guide.pdf
- vcf
https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/latest/GRCh38/
NB: à quoi correspond https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/analysis/Illumina_PlatinumGenomes_NA12877_NA12878_09162015/hg38/2.0.1/NA12878/ ??
Article comparant les variant calling : https://www.biorxiv.org/content/10.1101/2020.12.11.422022v1.full.pdf
Article pour vcfeval : https://www.nature.com/articles/s41587-019-0054-x
La version 4 ajoute 273 gènes "clinically relevant" https://www.biorxiv.org/content/10.1101/2021.06.07.444885v3.full.pdf
Ajout des zones "difficiles"
https://www.biorxiv.org/content/10.1101/2020.07.24.212712v5.full.pdf
*** [#B] Pipeline : générer patient avec tous les variants retrouvés à Cento
Comparaison de génération ADN (2019)
https://academic.oup.com/bfg/article/19/1/49/5680294
**** SimuSCop (exome)
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03665-5
https://github.com/qasimyu/simuscop
1. Crééer un modèle depuis bam + vcf : Setoprofile
2. Génerer données NGS
** Annotation :
*** Comparaison vep / snpeff et annovar
* Changement nouvelle version
- Dernière version du génome (la version "prête à l'emploi" est seulement GRCh38 sans les version patchées)
* Notes
** Nextflow
*** afficher les résultats d'un process/workflow
#+begin_src
lol.out.view()
#+end_src
Attention, ne fonctionne pas si plusieurs sortie:
#+begin_src
lol.out[0].view()
#+end_src
ou si /a/ est le nom de la sortie
#+begin_src
lol.out.a.view()
#+end_src
** Quelle version du génome ?
- T2T: notation chromose = chR1,2 : ok genome, clinvar, dbSNP
- GRCh38: notation chromose = NC_... : ok genome, clinvar, dbSNP
** Performances
Ordinateur de Carine (WSL2) : 4h dont 1h15 alignement (parallélisé) et 1h15 haplotypecaller (séquentiel)
** Chromosomes NC, NT, NW
Correspondance :
https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&chromInfoPage=
Signification
https://genome.ucsc.edu/FAQ/FAQdownloads.htm
:00>
:LOGBOOK:
CLOCK: [2023-07-30 Sun 19:13]--[2023-07-30 Sun 20:50] => 1:37
:END:
Modification nécessaire pour kent :
- plus de patch
- suppression d'une boucle dans postPatch
On supprime aussi NIX_BUILD_TOP
*** DONE [[https://github.com/NixOS/nixpkgs/pull/186459][BioDBHTS]]
CLOSED: [2023-05-06 Sat 08:49] SCHEDULED: <2023-04-15 Sat>
/Entered on/ [2022-08-10 Wed 14:28]
Correction pour review faites <2022-10-10 Mon>
*** DONE [[https://github.com/NixOS/nixpkgs/pull/186464][BioExtAlign]]
CLOSED: [2022-10-22 Sat 12:43] SCHEDULED: <2022-08-10 Wed>
/Entered on/ [2022-08-10 Wed 14:28]
Review <2022-10-10 Mon>, correction dans la journée.
Correction 2e passe, attente
Impossible de faire marcher les tests Car il ne trouve pas le module Bio::Tools::Align, qui est dans un dossier ailleurs dans le dépôt. Même en compilant tout le dépôt, cela ne fonctionne pas... On skip les tests.
*** TODO VEP
** WAIT [[https://github.com/NixOS/nixpkgs/pull/230394][rtg-tools]] :vcfeval:
Soumis
** WAIT Package Spip https://github.com/NixOS/nixpkgs/pull/247476
** TODO Happy :happy:
*** PROJ PR python 3 upstream
*** PROJ nixpkgs en l'état
** PROJ SpliceAI
** TODO Bamsurgeon
/Entered on/ [2023-05-13 Sat 19:11]
*** TODO Velvet
** TODO PR Picard avec option pour gérer la mémoire
Similaire à
https://github.com/bioconda/bioconda-recipes/blob/master/recipes/picard/picard.sh
* Julia :julia:
** KILL XAM.jl: PR pour modification record :julia:
CLOSED: [2023-05-29 Mon 15:40] SCHEDULED: <2023-05-28 Sun>
/Entered on/ [2023-05-27 Sat 22:39]
** TODO XAMscissors.jl :xamscissors:
Modification de la séquence dans BAM.
*Pas de mise à jour de CIGAR*
On convertit en fastq et on lance le pipeline pour "corriger"
#+begin_src sh
cd /home/alex/code/bisonex/out/63003856/preprocessing/mapped
samtools view 63003856_S135.bam NC_000022.11 -o 63003856_S135_chr22.bam
cd /home/alex/recherche/bisonex/code/BamScissors.jl
cp ~/code/bisonex/out/63003856/preprocessing/mapped/63003856_S135_chr22.bam .
samtools index 63003856_chr22.bam
#+end_src
Le script va modifier le bam, le trier et générer le fastq. !!!
Attention: ne pas oublier l'option -n !!!
#+begin_src sh
time julia --project=.. insertVariant.jl
scp 63003856_S135_chr22_{1,2}.fq.gz meso:/Work/Users/apraga/bisonex/tests/bamscissors/
#+end_src
*** WAIT Implémenter les SNV avec VAF :snv:
Stratégie :
1. calculer la profondeur sur les positions
2. créer un dictionnaire { nom du reads : position dataframe }
3. itérer sur tous les reads et changer ceux marqués
**** DONE VAF = 1
CLOSED: [2023-05-29 Mon 15:34]
**** DONE VAF selon loi normale
CLOSED: [2023-05-29 Mon 15:35]
Tronquée si > 1
**** WAIT Tests unitaires
***** DONE NA12878: 1 gène sur chromosome 22
CLOSED: [2023-05-30 Tue 23:55]
root = "https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/"
#+begin_src sh
samtools view project.NIST_NIST7035_H7AP8ADXX_NA12878.bwa.markDuplicates.bam chr22 -o project.NIST_NIST7035_H7AP8ADXX_NA12878_chr22.bam
samtools view project.NIST_NIST7035_H7AP8ADXX_NA12878_chr22.bam chr22:19419700-19424000 -o NIST7035_H7AP8ADXX_NA12878_chr22_MRPL40_hg19.bam
#+end_src
***** WAIT Pull request formatspeciment
https://github.com/BioJulia/FormatSpecimens.jl/pull/8
***** DONE Formatspecimens
CLOSED: [2023-05-29 Mon 23:03]
****** DONE 1 read
CLOSED: [2023-05-29 Mon 23:02]
****** DONE VAF sur 1 exon
CLOSED: [2023-05-29 Mon 23:03]
**** DONE [#A] Bug: perte de nombreux reads avec NA12878
CLOSED: [2023-08-19 Sat 20:45] SCHEDULED: <2023-08-18 Fri>
:PROPERTIES:
:ID: 5c1c36f3-f68e-4e6d-a7b6-61dca89abc37
:END:
Ex: chrX:g.124056226 : on passe de 65 reads à 1
Test xamscissors: pas de soucis...
On teste sur cette position +/- 200bp
#+begin_src sh :dir /home/alex/roam/research/bisonex/code/sanger
samtools view /home/alex/code/bisonex/out/2300346867_NA12878-63118093_S260-GRCh38/preprocessing/mapped/2300346867_NA12878-63118093_S260-GRCh38.bam chrX:124056026-124056426 -o chrXsmall.bam
#+end_src
#+RESULTS:
***** DONE Vérifier profondeur avec dernière version :
CLOSED: [2023-08-19 Sat 20:34] SCHEDULED: <2023-08-19 Sat>
****** DONE chr20: profondeur ok
SCHEDULED: <2023-08-19 Sat>
****** DONE toutes les données
CLOSED: [2023-08-19 Sat 20:34] SCHEDULED: <2023-08-19 Sat>
Ok pour 7 variants (IGV) notament chromosome X
*** TODO Implémenter les indel avec VAF :indel:
*** TODO Soumission paquet
* Données
:PROPERTIES:
:CATEGORY: data
:END:
** DONE Remplacer bam par fastq sur mesocentre
CLOSED: [2023-04-16 Sun 16:33]
Commande
*** DONE Supprimer les fastq non "paired"
CLOSED: [2023-04-16 Sun 16:33]
nushell
Liste des fastq avec "paired-end" manquant
#+begin_src nu
ls **/*.fastq.gz | get name | path basename | split column "_" | get column1 | uniq -u | save single.txt
#+end_src
#+RESULTS:
: 62907927
: 62907970
: 62899606
: 62911287
: 62913201
: 62914084
: 62915905
: 62921595
: 62923065
: 62925220
: 62926503
: 62926502
: 62926500
: 62926499
: 62926498
: 62931719
: 62943423
: 62943400
: 62948290
: 62949205
: 62949206
: 62949118
: 62951284
: 62960792
: 62960785
: 62960787
: 62960617
: 62962561
: 62962692
: 62967473
: 62972194
: 62979102
On vérifie
#+begin_src nu
open single.txt | lines | each {|e| ls $"fastq/*_($in)/*" | get 0 }
open single.txt | lines | each {|e| ls $"fastq/*_($in)/*" | get 0.name } | path basename | split column "_" | get column1 | uniq -c
#+end_src
On met tous dans un dossier (pas de suppression )
#+begin_src
open single.txt | lines | each {|e| ls $"fastq/*_($in)/*" | get 0 } | each {|e| ^mv $e.name bad-fastq/}
#+end_src
On vérifie que les dossiier sont videsj
open single.txt | lines | each {|e| ls $"fastq/*_($in)" | get 0.name } | ^ls -l $in
Puis on supprime
open single.txt | lines | each {|e| ls $"fastq/*_($in)" | get 0.name } | ^rm -r $in
*** DONE Supprimer bam qui ont des fastq
CLOSED: [2023-04-16 Sun 16:33]
On liste les identifiants des fastq et bam dans un tableau avec leur type :
#+begin_src
let fastq = (ls fastq/*/*.fastq.gz | get name | parse "{dir}/{full_id}/{id}_{R}_001.fastq.gz" | select dir id | uniq )
let bam = (ls bam/*/*.bam | get name | parse "{dir}/{full_id}/{id}_{S}.bqrt.bam" | select dir id)
#+end_src
On groupe les résultat par identifiant (résultats = liste de records qui doit être convertie en table)
et on trie ceux qui n'ont qu'un fastq ou un bam
#+begin_src
let single = ( $bam | append $fastq | group-by id | transpose id files | get files | where {|x| ($x | length) == 1})
#+end_src
On convertit en table et on récupère seulement les bam
#+begin_src
$single | reduce {|it, acc| $acc | append $it} | where dir == bam | get id | each {|e| ^ls $"bam/*_($e)/*.bam"}
#+end_src
#+RESULTS:
: bam/2100656174_62913201/62913201_S52.bqrt.bam
: bam/2100733271_62925220/62925220_S33.bqrt.bam
: bam/2100738763_62926502/62926502_S108.bqrt.bam
: bam/2100746726_62926498/62926498_S105.bqrt.bam
: bam/2100787936_62931955/62931955_S4.bqrt.bam
: bam/2200066374_62948290/62948290_S130.bqrt.bam
: bam/2200074722_62948298/62948298_S131.bqrt.bam
: bam/2200074990_62948306/62948306_S218.bqrt.bam
: bam/2200214581_62967331/62967331_S267.bqrt.bam
: bam/2200225399_62972187/62972187_S85.bqrt.bam
: bam/2200293962_62979117/62979117_S63.bqrt.bam
: bam/2200423985_62999352/62999352_S1.bqrt.bam
: bam/2200495073_63010427/63010427_S20.bqrt.bam
: bam/2200511274_63012586/63012586_S114.bqrt.bam
: bam/2200669188_63036688/63036688_S150.bqrt.bam
* Nouveau workflow :workflow:
** TODO Bases de données
*** KILL Nix pour télécharger les données brutes
**** Conclusion
Non viable sur cluster car en dehors de /nix/store
On peut utiliser des symlink mais trop compliqué
**** KILL Axel au lieu de curl pour gérer les timeout?
CLOSED: [2022-08-19 Fri 15:18]
*** DONE Tester patch de @pennae pour gros fichiers
SCHEDULED: <2022-08-19 Fri>
*** KILL Télécharger les données avec nextflow: hg38
CLOSED: [2023-06-12 Mon 23:29]
**** DONE Genome de référence
**** DONE dbSNP
**** DONE VEP 20G
CLO
:00>
:LOGBOOK:
CLOCK: [2023-07-30 Sun 19:13]--[2023-07-30 Sun 20:50] => 1:37
:END:
Modification nécessaire pour kent :
- plus de patch
- suppression d'une boucle dans postPatch
On supprime aussi NIX_BUILD_TOP
*** DONE [[https://github.com/NixOS/nixpkgs/pull/186459][BioDBHTS]]
CLOSED: [2023-05-06 Sat 08:49] SCHEDULED: <2023-04-15 Sat>
/Entered on/ [2022-08-10 Wed 14:28]
Correction pour review faites <2022-10-10 Mon>
*** DONE [[https://github.com/NixOS/nixpkgs/pull/186464][BioExtAlign]]
CLOSED: [2022-10-22 Sat 12:43] SCHEDULED: <2022-08-10 Wed>
/Entered on/ [2022-08-10 Wed 14:28]
Review <2022-10-10 Mon>, correction dans la journée.
Correction 2e passe, attente
Impossible de faire marcher les tests Car il ne trouve pas le module Bio::Tools::Align, qui est dans un dossier ailleurs dans le dépôt. Même en compilant tout le dépôt, cela ne fonctionne pas... On skip les tests.
*** TODO VEP
** WAIT [[https://github.com/NixOS/nixpkgs/pull/230394][rtg-tools]] :vcfeval:
Soumis
** WAIT Package Spip https://github.com/NixOS/nixpkgs/pull/247476
** TODO Happy :happy:
*** TODO PR python 3 upstream
SCHEDULED: <2023-10-14 Sat>
*** TODO nixpkgs en l'état
SCHEDULED: <2023-10-14 Sat>
** PROJ SpliceAI
** TODO Bamsurgeon
/Entered on/ [2023-05-13 Sat 19:11]
*** TODO Velvet
** TODO PR Picard avec option pour gérer la mémoire
Similaire à
https://github.com/bioconda/bioconda-recipes/blob/master/recipes/picard/picard.sh
* Julia :julia:
** KILL XAM.jl: PR pour modification record :julia:
CLOSED: [2023-05-29 Mon 15:40] SCHEDULED: <2023-05-28 Sun>
/Entered on/ [2023-05-27 Sat 22:39]
** TODO XAMscissors.jl :xamscissors:
Modification de la séquence dans BAM.
*Pas de mise à jour de CIGAR*
On convertit en fastq et on lance le pipeline pour "corriger"
#+begin_src sh
cd /home/alex/code/bisonex/out/63003856/preprocessing/mapped
samtools view 63003856_S135.bam NC_000022.11 -o 63003856_S135_chr22.bam
cd /home/alex/recherche/bisonex/code/BamScissors.jl
cp ~/code/bisonex/out/63003856/preprocessing/mapped/63003856_S135_chr22.bam .
samtools index 63003856_chr22.bam
#+end_src
Le script va modifier le bam, le trier et générer le fastq. !!!
Attention: ne pas oublier l'option -n !!!
#+begin_src sh
time julia --project=.. insertVariant.jl
scp 63003856_S135_chr22_{1,2}.fq.gz meso:/Work/Users/apraga/bisonex/tests/bamscissors/
#+end_src
*** WAIT Implémenter les SNV avec VAF :snv:
Stratégie :
1. calculer la profondeur sur les positions
2. créer un dictionnaire { nom du reads : position dataframe }
3. itérer sur tous les reads et changer ceux marqués
**** DONE VAF = 1
CLOSED: [2023-05-29 Mon 15:34]
**** DONE VAF selon loi normale
CLOSED: [2023-05-29 Mon 15:35]
Tronquée si > 1
**** WAIT Tests unitaires
***** DONE NA12878: 1 gène sur chromosome 22
CLOSED: [2023-05-30 Tue 23:55]
root = "https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/"
#+begin_src sh
samtools view project.NIST_NIST7035_H7AP8ADXX_NA12878.bwa.markDuplicates.bam chr22 -o project.NIST_NIST7035_H7AP8ADXX_NA12878_chr22.bam
samtools view project.NIST_NIST7035_H7AP8ADXX_NA12878_chr22.bam chr22:19419700-19424000 -o NIST7035_H7AP8ADXX_NA12878_chr22_MRPL40_hg19.bam
#+end_src
***** WAIT Pull request formatspeciment
https://github.com/BioJulia/FormatSpecimens.jl/pull/8
***** DONE Formatspecimens
CLOSED: [2023-05-29 Mon 23:03]
****** DONE 1 read
CLOSED: [2023-05-29 Mon 23:02]
****** DONE VAF sur 1 exon
CLOSED: [2023-05-29 Mon 23:03]
**** DONE [#A] Bug: perte de nombreux reads avec NA12878
CLOSED: [2023-08-19 Sat 20:45] SCHEDULED: <2023-08-18 Fri>
:PROPERTIES:
:ID: 5c1c36f3-f68e-4e6d-a7b6-61dca89abc37
:END:
Ex: chrX:g.124056226 : on passe de 65 reads à 1
Test xamscissors: pas de soucis...
On teste sur cette position +/- 200bp
#+begin_src sh :dir /home/alex/roam/research/bisonex/code/sanger
samtools view /home/alex/code/bisonex/out/2300346867_NA12878-63118093_S260-GRCh38/preprocessing/mapped/2300346867_NA12878-63118093_S260-GRCh38.bam chrX:124056026-124056426 -o chrXsmall.bam
#+end_src
#+RESULTS:
***** DONE Vérifier profondeur avec dernière version :
CLOSED: [2023-08-19 Sat 20:34] SCHEDULED: <2023-08-19 Sat>
****** DONE chr20: profondeur ok
SCHEDULED: <2023-08-19 Sat>
****** DONE toutes les données
CLOSED: [2023-08-19 Sat 20:34] SCHEDULED: <2023-08-19 Sat>
Ok pour 7 variants (IGV) notament chromosome X
*** TODO Implémenter les indel avec VAF :indel:
*** TODO Soumission paquet
* Données
:PROPERTIES:
:CATEGORY: data
:END:
** DONE Remplacer bam par fastq sur mesocentre
CLOSED: [2023-04-16 Sun 16:33]
Commande
*** DONE Supprimer les fastq non "paired"
CLOSED: [2023-04-16 Sun 16:33]
nushell
Liste des fastq avec "paired-end" manquant
#+begin_src nu
ls **/*.fastq.gz | get name | path basename | split column "_" | get column1 | uniq -u | save single.txt
#+end_src
#+RESULTS:
: 62907927
: 62907970
: 62899606
: 62911287
: 62913201
: 62914084
: 62915905
: 62921595
: 62923065
: 62925220
: 62926503
: 62926502
: 62926500
: 62926499
: 62926498
: 62931719
: 62943423
: 62943400
: 62948290
: 62949205
: 62949206
: 62949118
: 62951284
: 62960792
: 62960785
: 62960787
: 62960617
: 62962561
: 62962692
: 62967473
: 62972194
: 62979102
On vérifie
#+begin_src nu
open single.txt | lines | each {|e| ls $"fastq/*_($in)/*" | get 0 }
open single.txt | lines | each {|e| ls $"fastq/*_($in)/*" | get 0.name } | path basename | split column "_" | get column1 | uniq -c
#+end_src
On met tous dans un dossier (pas de suppression )
#+begin_src
open single.txt | lines | each {|e| ls $"fastq/*_($in)/*" | get 0 } | each {|e| ^mv $e.name bad-fastq/}
#+end_src
On vérifie que les dossiier sont videsj
open single.txt | lines | each {|e| ls $"fastq/*_($in)" | get 0.name } | ^ls -l $in
Puis on supprime
open single.txt | lines | each {|e| ls $"fastq/*_($in)" | get 0.name } | ^rm -r $in
*** DONE Supprimer bam qui ont des fastq
CLOSED: [2023-04-16 Sun 16:33]
On liste les identifiants des fastq et bam dans un tableau avec leur type :
#+begin_src
let fastq = (ls fastq/*/*.fastq.gz | get name | parse "{dir}/{full_id}/{id}_{R}_001.fastq.gz" | select dir id | uniq )
let bam = (ls bam/*/*.bam | get name | parse "{dir}/{full_id}/{id}_{S}.bqrt.bam" | select dir id)
#+end_src
On groupe les résultat par identifiant (résultats = liste de records qui doit être convertie en table)
et on trie ceux qui n'ont qu'un fastq ou un bam
#+begin_src
let single = ( $bam | append $fastq | group-by id | transpose id files | get files | where {|x| ($x | length) == 1})
#+end_src
On convertit en table et on récupère seulement les bam
#+begin_src
$single | reduce {|it, acc| $acc | append $it} | where dir == bam | get id | each {|e| ^ls $"bam/*_($e)/*.bam"}
#+end_src
#+RESULTS:
: bam/2100656174_62913201/62913201_S52.bqrt.bam
: bam/2100733271_62925220/62925220_S33.bqrt.bam
: bam/2100738763_62926502/62926502_S108.bqrt.bam
: bam/2100746726_62926498/62926498_S105.bqrt.bam
: bam/2100787936_62931955/62931955_S4.bqrt.bam
: bam/2200066374_62948290/62948290_S130.bqrt.bam
: bam/2200074722_62948298/62948298_S131.bqrt.bam
: bam/2200074990_62948306/62948306_S218.bqrt.bam
: bam/2200214581_62967331/62967331_S267.bqrt.bam
: bam/2200225399_62972187/62972187_S85.bqrt.bam
: bam/2200293962_62979117/62979117_S63.bqrt.bam
: bam/2200423985_62999352/62999352_S1.bqrt.bam
: bam/2200495073_63010427/63010427_S20.bqrt.bam
: bam/2200511274_63012586/63012586_S114.bqrt.bam
: bam/2200669188_63036688/63036688_S150.bqrt.bam
* Nouveau workflow :workflow:
** TODO Bases de données
*** KILL Nix pour télécharger les données brutes
**** Conclusion
Non viable sur cluster car en dehors de /nix/store
On peut utiliser des symlink mais trop compliqué
**** KILL Axel au lieu de curl pour gérer les timeout?
CLOSED: [2022-08-19 Fri 15:18]
*** DONE Tester patch de @pennae pour gros fichiers
SCHEDULED: <2022-08-19 Fri>
*** KILL Télécharger les données avec nextflow: hg38
CLOSED: [2023-06-12 Mon 23:29]
**** DONE Genome de référence
**** DONE dbSNP
**** DONE VEP 20G
CLO
| G | A | Conflicting_interpretations_of_pathogenicity |
| NC_000020.11 | 63414925 | 129337 | G | C | Benign |
| NC_000020.11 | 63414925 | 851545 | GG | CA | Uncertain_significance |
| ------ | | | | | |
On a donc plusieurs problèmes :
1. isec devrait fonctionner au moins sur
| NC_000020.11 | 25390747 | rs373200654 | G | C | |
| NC_000020.11 | 25390747 | 338000 | G | C | Conflicting_interpretations_of_pathogenicity |
On teste juste sur cette ligne
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools filter -i 'POS=25390747' clinvar_chr20.vcf.gz -o clinvar_test.vcf.gz
bcftools filter -i 'POS=25390747' dbSNP_common_chr20.vcf.gz -o dbSNP_test.vcf.gz
#+end_src
On retrouve bien la ligne dans l'intersection...
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools filter -i 'POS=25390747' clinvar_chr20.vcf.gz -o clinvar_test.vcf.gz
bcftools index dbSNP_test.vcf.gz dbSNP_test.vcf.gz
bcftools index dbSNP_test.vcf.gz clinvar_test.vcf.gz
bcftools isec dbSNP_test.vcf.gz clinvar_test.vcf.gz -p test
#+end_src
#+RESULTS:
2. isec ne semble pas fonctionner sur en cas d'ALT multiples
| NC_000020.11 | 32800145 | rs2424926 | C | G,T | |
| NC_000020.11 | 32800145 | 338173 | C | G | Benign |
| NC_000020.11 | 32800145 | 338174 | C | T | Conflicting_interpretations_of_pathogenicity |
| | | | | | |
3. s'il y a plusieurs variantions à une position, il faut
bien vérifier que tous ne sont pas patho.
La version d'Alexis le fait bien
| NC_000020.11 | 3234173 | rs3827075 | T | A,C,G | |
| NC_000020.11 | 3234173 | 262001 | T | G | Conflicting_interpretations_of_pathogenicity |
| NC_000020.11 | 3234173 | 1072511 | T | TGGCGAAGC | Pathogenic |
| NC_000020.11 | 3234173 | 208613 | TGGCGAAGC | G | Pathogenic |
| NC_000020.11 | 3234173 | 1312 | TGGCGAAGC | T | Pathogenic |
****** DONE Voir si isec gère les multiallélique (chr20) : non, impossible de faire marcher
CLOSED: [2022-11-27 Sun 00:37]
******* DONE chr20 en prenant un patho clinvar aussi dans dbSNP
CLOSED: [2022-11-27 Sun 00:37]
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools filter dbSNP_common_chr20.vcf.gz -i 'POS=10652589' -o test_dbsnp.vcf.gz
bcftools filter clinvar_chr20.vcf.gz -i 'POS=10652589' -o test_clinvar.vcf.gz
bcftools index test_dbsnp.vcf.gz
bcftools index test_clinvar.vcf.gz
#+end_src
#+RESULTS:
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools isec test_dbsnp.vcf.gz test_clinvar.vcf.gz -p tmp
grep '^[^#]' tmp/0002.vcf
grep '^[^#]' tmp/0003.vcf
#+end_src
#+RESULTS:
Même en biallélique, ne fonctionne pas.
Testé en modifiant test_dbsnp !
Fonctionne avec un variant par ligne
****** DONE isec en coupant les sites multialléliques: non
CLOSED: [2022-11-27 Sun 00:37]
******* DONE Exemple simple ok
CLOSED: [2022-11-27 Sun 00:34]
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools filter -i 'POS=10652589' dbSNP_common_chr20.vcf.gz -o dbsnp_mwi.vcf.gz
bcftools filter
-i 'POS=10652589' clinvar_chr20.vcf.gz -o clinvar_mwi.vcf.gz
bcftools index -f dbsnp_mwi.vcf.gz
bcftools index -f clinvar_mwi.vcf.gz
bcftools isec dbsnp_mwi.vcf.gz clinvar_mwi.vcf.gz -n=2
#+end_src
#+RESULTS:
Même en biallélique, ne fonctionne pas.
Chr 20
Avec les fichiers du teste précédent
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools norm -m -any dbsnp_mwi.vcf.gz -o dbsnp_mwi_norm.vcf.gz
bcftools index dbsnp_mwi_norm.vcf.gz
bcftools isec dbsnp_mwi_norm.vcf.gz clinvar_mwi.vcf.gz -n=2
#+end_src
#+RESULTS:
| NC_000020.11 | 10652589 | G | A | 11 |
| NC_000020.11 | 10652589 | G | C | 11 |
******* TODO Sur dbSNP chr20 non
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools norm -m -any dbSNP_common_chr20 -o dbSNP_common_chr20_norm.vcf.gz
#+end_src
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools isec -i 'INFO/CLNSIG="Pathogenic"' dbSNP_common_chr20_norm.vcf.gz clinvar_chr20.vcf.gz -p tmp
#+end_src
#+RESULTS:
***** DONE Essai bedtools intersect
#+begin_src sh
bedtools intersect -a dbSNP_common.vcf.gz -b clinvar.vcf.gz
#+end_src
$ wc -l intersect.vcf
220206 intersect.vcf
** TODO Dépendences avec Nix
*** DONE GATK
CLOSED: [2022-10-21 Fri 21:59]
*** WAIT BioDBHTS
Contribuer pull request
*** DONE BioExtAlign
CLOSED: [2022-10-22 Sat 00:38]
*** WAIT BioBigFile
Revoir si on peut utliser kent dernière version
Contribuer pull request
*** HOLD rtg-tools
Convertir clinvar NC
*** DONE simuscop
CLOSED: [2022-12-30 Fri 22:31]
*** DONE Spip
CLOSED: [2022-12-04 Sun 12:49]
Pas de pull request
*** DONE R + packages
CLOSED: [2022-11-19 Sat 21:05]
*** TODO hap.py
https://github.com/Illumina/hap.py
**** DONE Version sans rtgtools avec python 3
CLOSED: [2023-02-02 Thu 22:15]
Procédure pour tester
#+begin_src
nix develop .#hap-py
$ genericBuild
#+end_src
1. Supprimer l’appel à make_dependencies dans cmakelist.txt : on peut tout installer avec nix
2. Patch Roc.cpp pour avoir numeric_limits ( error: 'numeric_limits' is not a member of 'std')
3. ajout de flags de link (essai, error)
set(ZLIB_LIBRARIES -lz -lbz2 -lcurl -lcrypto -llzma)
4. Changer les appels à print en print() dans le code python et suppression de quelques import
[nix-shell:~/source]$ sed -i.orig 's/print \"\(.*\)"/print(\1)/' src/python/*.py
**** DONE Sérialiser json pour écrire données de sorties
CLOSED: [2023-02-17 Fri 19:25]
**** DONE Tester sur example
CLOSED: [2023-02-04 Sat 00:25]
#+begin_src sh
$ cd hap.py
$ ../result/bin/hap.py example/happy/PG_NA12878_chr21.vcf.gz example/happy/NA12878_chr21.vcf.gz -f example/happy/PG_Conf_chr21.bed.gz -o test -r example/chr21.fa
#+end_src
#+RESULTS:
| Type | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score |
| INDEL | ALL | 8937 | 7839 | 1098 | 11812 | 343 | 3520 | 45 | 283 | 0.877140 | 0.958635 | 0.298002 | 0.916079 |
| INDEL | PASS | 8937 | 7550 | 1387 | 9971 | 283 | 1964 | 30 | 242 | 0.844803 | 0.964656 | 0.196971 | 0.900760 |
| SNP | ALL | 52494 | 52125 | 369 | 90092 | 582 | 37348 | 107 | 354 | 0.992971 | 0.988966 | 0.414554 | 0.990964 |
| SNP | PASS | 52494 | 46920 | 5574 | 48078 | 143 | 992 | 8 | 97 | 0.893816 | 0.996963 | 0.020633 | 0.942576 |
**** KILL Version avec rtg-tools
CLOSED: [2023-07-30 Sun 14:38]
**** HOLD Faire fonctionner Tests
***** HOLD Essai 2 : depuis nix develop:
#+begin_src
nix develop .#hap-py
genericBuild
#+end_src
Lancé initialement à la main, mais on peut maintenant utiliser run_tests
#+begin_src
HCDIR=bin/ ../src/sh/run_tests.sha
#+end_src
- [X] test boost
- [X] multimerge
- [X] hapenum
- [X] fp accuracy
- [X] faulty variant
- leftshift fails
- [X] other vcf
- [X] chr prefix
- [X] gvcf
- [X] decomp
- [X] contig lengt
- [X] integration test
- [ ] scmp fails sur le type
- [X] giab
- [X] performance
- [ ] quantify fails sur le type
- [ ] stratified échec sur les résultats !
- [X] pg counting
- [ ] sompy: ne trouve pas Strelka dans somatic
phases="buildPhase checkPhase installPhase fixupPhase" genericBuild
#+end_src
**** KILL Reproduire les performances precisionchallenge : attention à HG002 et HG001!
CLOSED: [2023-04-01 Sa
| G | A | Conflicting_interpretations_of_pathogenicity |
| NC_000020.11 | 63414925 | 129337 | G | C | Benign |
| NC_000020.11 | 63414925 | 851545 | GG | CA | Uncertain_significance |
| ------ | | | | | |
On a donc plusieurs problèmes :
1. isec devrait fonctionner au moins sur
| NC_000020.11 | 25390747 | rs373200654 | G | C | |
| NC_000020.11 | 25390747 | 338000 | G | C | Conflicting_interpretations_of_pathogenicity |
On teste juste sur cette ligne
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools filter -i 'POS=25390747' clinvar_chr20.vcf.gz -o clinvar_test.vcf.gz
bcftools filter -i 'POS=25390747' dbSNP_common_chr20.vcf.gz -o dbSNP_test.vcf.gz
#+end_src
On retrouve bien la ligne dans l'intersection...
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools filter -i 'POS=25390747' clinvar_chr20.vcf.gz -o clinvar_test.vcf.gz
bcftools index dbSNP_test.vcf.gz dbSNP_test.vcf.gz
bcftools index dbSNP_test.vcf.gz clinvar_test.vcf.gz
bcftools isec dbSNP_test.vcf.gz clinvar_test.vcf.gz -p test
#+end_src
#+RESULTS:
2. isec ne semble pas fonctionner sur en cas d'ALT multiples
| NC_000020.11 | 32800145 | rs2424926 | C | G,T | |
| NC_000020.11 | 32800145 | 338173 | C | G | Benign |
| NC_000020.11 | 32800145 | 338174 | C | T | Conflicting_interpretations_of_pathogenicity |
| | | | | | |
3. s'il y a plusieurs variantions à une position, il faut bien vérifier que tous ne sont pas patho.
La version d'Alexis le fait bien
| NC_000020.11 | 3234173 | rs3827075 | T | A,C,G | |
| NC_000020.11 | 3234173 | 262001 | T | G | Conflicting_interpretations_of_pathogenicity |
| NC_000020.11 | 3234173 | 1072511 | T | TGGCGAAGC | Pathogenic |
| NC_000020.11 | 3234173 | 208613 | TGGCGAAGC | G | Pathogenic |
| NC_000020.11 | 3234173 | 1312 | TGGCGAAGC | T | Pathogenic |
****** DONE Voir si isec gère les multiallélique (chr20) : non, impossible de faire marcher
CLOSED: [2022-11-27 Sun 00:37]
******* DONE chr20 en prenant un patho clinvar aussi dans dbSNP
CLOSED: [2022-11-27 Sun 00:37]
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools filter dbSNP_common_chr20.vcf.gz -i 'POS=10652589' -o test_dbsnp.vcf.gz
bcftools filter clinvar_chr20.vcf.gz -i 'POS=10652589' -o test_clinvar.vcf.gz
bcftools index test_dbsnp.vcf.gz
bcftools index test_clinvar.vcf.gz
#+end_src
#+RESULTS:
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools isec test_dbsnp.vcf.gz test_clinvar.vcf.gz -p tmp
grep '^[^#]' tmp/0002.vcf
grep '^[^#]' tmp/0003.vcf
#+end_src
#+RESULTS:
Même en biallélique, ne fonctionne pas.
Testé en modifiant test_dbsnp !
Fonctionne avec un variant par ligne
****** DONE isec en coupant les sites multialléliques: non
CLOSED: [2022-11-27 Sun 00:37]
******* DONE Exemple simple ok
CLOSED: [2022-11-27 Sun 00:34]
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools filter -i 'POS=10652589' dbSNP_common_chr20.vcf.gz -o dbsnp_mwi.vcf.gz
bcftools filter -i 'POS=10652589' clinvar_chr20.vcf.gz -o clinvar_mwi.vcf.gz
bcftools index -f dbsnp_mwi.vcf.gz
bcftools index -f clinvar_mwi.vcf.gz
bcftools isec dbsnp_mwi.vcf.gz clinvar_mwi.vcf.gz -n=2
#+end_src
#+RESULTS:
Même en biallélique, ne fonctionne pas.
Chr 20
Avec les fichiers du teste précédent
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools norm -m -any dbsnp_mwi.vcf.gz -o dbsnp_mwi_norm.vcf.gz
bcftools index dbsnp_mwi_norm.vcf.gz
bcftools isec dbsnp_mwi_norm.vcf.gz clinvar_mwi.vcf.gz -n=2
#+end_src
#+RESULTS:
| NC_000020.11 | 10652589 | G | A | 11 |
| NC_000020.11 | 10652589 | G | C | 11 |
******* DONE Sur dbSNP chr20 non
CLOSED: [2023-10-07 Sat 17:57]
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools norm -m -any dbSNP_common_chr20 -o dbSNP_common_chr20_norm.vcf.gz
#+end_src
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools isec -i 'INFO/CLNSIG="Pathogenic"' dbSNP_common_chr20_norm.vcf.gz clinvar_chr20.vcf.gz -p tmp
#+end_src
#+RESULTS:
***** DONE Essai bedtools intersect
#+begin_src sh
bedtools intersect -a dbSNP_common.vcf.gz -b clinvar.vcf.gz
#+end_src
$ wc -l intersect.vcf
220206 intersect.vcf
** TODO Dépendences avec Nix
*** DONE GATK
CLOSED: [2022-10-21 Fri 21:59]
*** WAIT BioDBHTS
Contribuer pull request
*** DONE BioExtAlign
CLOSED: [2022-10-22 Sat 00:38]
*** WAIT BioBigFile
Revoir si on peut utliser kent dernière version
Contribuer pull request
*** HOLD rtg-tools
Convertir clinvar NC
*** DONE simuscop
CLOSED: [2022-12-30 Fri 22:31]
*** DONE Spip
CLOSED: [2022-12-04 Sun 12:49]
Pas de pull request
*** DONE R + packages
CLOSED: [2022-11-19 Sat 21:05]
*** TODO hap.py
https://github.com/Illumina/hap.py
**** DONE Version sans rtgtools avec python 3
CLOSED: [2023-02-02 Thu 22:15]
Procédure pour tester
#+begin_src
nix develop .#hap-py
$ genericBuild
#+end_src
1. Supprimer l’appel à make_dependencies dans cmakelist.txt : on peut tout installer avec nix
2. Patch Roc.cpp pour avoir numeric_limits ( error: 'numeric_limits' is not a member of 'std')
3. ajout de flags de link (essai, error)
set(ZLIB_LIBRARIES -lz -lbz2 -lcurl -lcrypto -llzma)
4. Changer les appels à print en print() dans le code python et suppression de quelques import
[nix-shell:~/source]$ sed -i.orig 's/print \"\(.*\)"/print(\1)/' src/python/*.py
**** DONE Sérialiser json pour écrire données de sorties
CLOSED: [2023-02-17 Fri 19:25]
**** DONE Tester sur example
CLOSED: [2023-02-04 Sat 00:25]
#+begin_src sh
$ cd hap.py
$ ../result/bin/hap.py example/happy/PG_NA12878_chr21.vcf.gz example/happy/NA12878_chr21.vcf.gz -f example/happy/PG_Conf_chr21.bed.gz -o test -r example/chr21.fa
#+end_src
#+RESULTS:
| Type | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score |
| INDEL | ALL | 8937 | 7839 | 1098 | 11812 | 343 | 3520 | 45 | 283 | 0.877140 | 0.958635 | 0.298002 | 0.916079 |
| INDEL | PASS | 8937 | 7550 | 1387 | 9971 | 283 | 1964 | 30 | 242 | 0.844803 | 0.964656 | 0.196971 | 0.900760 |
| SNP | ALL | 52494 | 52125 | 369 | 90092 | 582 | 37348 | 107 | 354 | 0.992971 | 0.988966 | 0.414554 | 0.990964 |
| SNP | PASS | 52494 | 46920 | 5574 | 48078 | 143 | 992 | 8 | 97 | 0.893816 | 0.996963 | 0.020633 | 0.942576 |
**** KILL Version avec rtg-tools
CLOSED: [2023-07-30 Sun 14:38]
**** HOLD Faire fonctionner Tests
***** HOLD Essai 2 : depuis nix develop:
#+begin_src
nix develop .#hap-py
genericBuild
#+end_src
Lancé initialement à la main, mais on peut maintenant utiliser run_tests
#+begin_src
HCDIR=bin/ ../src/sh/run_tests.sha
#+end_src
- [X] test boost
- [X] multimerge
- [X] hapenum
- [X] fp accuracy
- [X] faulty variant
- leftshift fails
- [X] other vcf
- [X] chr prefix
- [X] gvcf
- [X] decomp
- [X] contig lengt
- [X] integration test
- [ ] scmp fails sur le type
- [X] giab
- [X] performance
- [ ] quantify fails sur le type
- [ ] stratified échec sur les résultats !
- [X] pg counting
- [ ] sompy: ne trouve pas Strelka dans somatic
phases="buildPhase checkPhase installPhase fixupPhase" genericBuild
#+end_src
**** KILL Reproduire les performances precisionchallenge : attention à HG002 et HG001!
CLOSED: [2023-04-01 Sa
to load libgkl_utils.so from n
ative/libgkl_utils.so (/Work/Users/apraga/bisonex/out/NA12878_NIST7035/
preprocessing/applybqsr/libgkl_utils821485189051585397.so: libgomp.so.1:
cannot open shared object file: No such file or directory)
17:28:00.733 WARN IntelPairHmm - Intel GKL Utils not loaded
17:28:00.733 WARN PairHMM - ***WARNING: Machine does not have the AVX instruction set support needed for the accelerated AVX PairHmm. Falling back to the MUCH slower LOGLESS_CACHING implementation!
17:28:00.763 INFO ProgressMeter - Starting traversal
#+end_quote
libgomp.so est fourni par gcc donc il faut charger le module
module load gcc@11.3.0/gcc-12.1.0
** KILL Utiliser subworkflow
CLOSED: [2023-04-02 Sun 18:08]
Notre version permet d'être plus souple
*** KILL Alignement
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
*** KILL Vep
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
vcf_annotate_ensemblvep
** TODO Annotation avec nextflow :annotation:
*** KILL VEP : --gene-phenotype ?
CLOSED: [2023-04-18 mar. 18:32]
Vu avec alexis : bases de données non à jour
https://www.ensembl.org/info/genome/variation/phenotype/sources_phenotype_documentation.html
*** DONE plugin VEP
CLOSED: [2023-04-18 mar. 18:32]
Cloner dépôt git avec plugin
Puis utiliser --dir_plugins
*** HOLD Utiliser code d’Ale
xis
*** TODO Nouvelle version avec VEP
Example avec --custom
https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html
**** DONE Ajout spliceAI
CLOSED: [2023-05-18 Thu 11:02] SCHEDULED: <2023-04-30 Sun>
plugin VEP
***** DONE Télécharger les données
CLOSED: [2023-05-11 Thu 19:01]
Difficile d'automatiser, le lien est temporaire...
***** DONE PLugin
CLOSED: [2023-05-11 Thu 20:16]
***** DONE Séparer score en plusieurs colonnes
CLOSED: [2023-05-11 Thu 20:16]
Test avec ce fichier pour avoir une ligne avec annotation et une ligne sans
#CHROM POS ID REF ALT
1 9091 . A C
1 69091 . A C
et
#+begin_src sh
rm -f postvep.tsv* && vep -i testspliceai.vcf.gz -o postvep.tsv --tab --dir 109 --merged --pick --use_given_ref --offline --plugin SpliceAI,snv=spliceai_scores.raw.snv.hg38.vcf.gz,indel=spliceai_scores.raw.indel.hg38.vcf.gz
#+end_src
#+begin_src
$ bgzip postvep.tsv
$ python spliceai.py
$ cat postvep2.tsv
,variation,Location,Allele,Gene,Feature,Feature_type,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,Codons,Existing_variation,IMPACT,DISTANCE,STRAND,FLAGS,REFSEQ_MATCH,SOURCE,REFSEQ_OFFSET,SpliceAI_AG,SpliceAI_AL,SpliceAI_DG,SpliceAI_DL
0,1_9091_A/C,1:9091,C,ENSG00000290825,ENST00000456328,Transcript,upstream_gene_variant,-,-,-,-,-,-,MODIFIER,2778,1,-,-,Ensembl,-,,,,
1,1_69091_A/C,1:69091,C,ENSG00000186092,ENST00000641515,Transcript,missense_variant,124,64,22,M/L,Atg/Ctg,-,MODERATE,-,1,-,-,Ensembl,-,0.01,0.00,0.00,0.01
#+end_src
Test
cp work/bf/437ae511958509e43072f032f4d495/small.tab.gz tests/vep-spip.tab.gz
cp work/d5/3b1244b5ae83d54409ee0d456e8c55/small_cadd.tab.gz tests/vep-cadd-splice.tab.gz
**** STRT Package Nix spliceAI
On utilise le tensorflow fourni avec nix (branche spliceai)
Il faut LD_PRELOAD=/lib64/libcuda.so pour l'exécution
***** DONE Version CPU non optimisé
CLOSED: [2023-09-23 Sat 21:34]
****** DONE Vérifier annotation hg19 et 38
CLOSED: [2023-09-23 Sat 18:49]
****** DONE Annotation maison T2T
CLOSED: [2023-09-23 Sat 21:33] SCHEDULED: <2023-09-23 Sat>
***** DONE Version CPU optismisée
CLOSED: [2023-09-23 Sat 21:34]
Activer flag dans le package nix
***** DONE Version CPU: Test chr20
CLOSED: [2023-09-23 Sat 22:25] SCHEDULED: <2023-09-23 Sat>
10min ?
***** DONE Version CPU: NA12878 sanger complet: kill car trop long
CLOSED: [2023-09-24 Sun 08:22] SCHEDULED: <2023-09-23 Sat>
10min ?
***** DONE GPU avec la version sur mésocentre: mail envoyé
CLOSED: [2023-09-26 Tue 11:49]
#+begin_src sh
module unload nix/2.11.0
module load anaconda3@2021.05/gcc-12.1.0
module load deep/tensorflow-gpu
pip install spliceai
#+end_src
Puis on teste sur la queue gpu
#+begin_src sh
srun -p gpu -t 4:00:00 --gres=gpu:1 --pty bash
cd /Work/Users/apraga/bisonex/tests/spliceai/
module load deep/tensorflow-gpu
module unload nix/2.11.0
time spliceai -I NA12878-sanger-chr20-T2T.vep.vcf.gz -O output-20-2.vcf -R /Work/Groups/bisonex/data/fasta/chm13v2.0/chm13v2.0.fa -A ~/t2t.txt
#+end_src
Échec: librarie DNN not found...
#+begin_quote
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-24 09:37:45.892545: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-09-24 09:37:47.759421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 38188 MB memory: -> device: 0, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:21:00.0, compute capability: 8.0
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
2023-09-24 09:37:54.143021: I tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:637] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2023-09-24 09:37:54.217160: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:429] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2023-09-24 09:37:54.217220: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:438] Possibly insufficient driver version: 510.108.3
2023-09-24 09:37:54.217245: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at conv_ops.cc:1068 : UNIMPLEMENTED: DNN library is not found.
2023-09-24 09:37:54.217262: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:GPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): UNIMPLEMENTED: DNN library is not found.
[[{{node model_1/conv1d_3/Conv1D}}]]
Traceback (most recent call last):
File "/Home/Users/apraga/.local/bin/spliceai", line 8, in <module>
sys.exit(main())
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/__main__.py", line 72, in main
scores = get_delta_scores(record, ann, args.D, args.M)
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/utils.py", line 159, in get_delta_scores
y_ref = np.mean([ann.models[m].predict(x_ref) for m in range(5)], axis=0)
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/utils.py", line 159, in <listcomp>
y_ref = np.mean([ann.models[m].predict(x_ref) for m in range(5)], axis=0)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:
Detected at node 'model_1/conv1d_3/Conv1D' defined at (most recent call last):
File "/Home/Users/apraga/.local/bin/spliceai", line 8, in <module>
sys.exit(main())
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/__main__.py", line 72, in main
scores = get_delta_scores(record, ann, args.D, args.M)
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/utils.py", line 159, in get_
to load libgkl_utils.so from native/libgkl_utils.so (/Work/Users/apraga/bisonex/out/NA12878_NIST7035/preprocessing/applybqsr/libgkl_utils821485189051585397.so: libgomp.so.1: cannot open shared object file: No such file or directory)
17:28:00.733 WARN IntelPairHmm - Intel GKL Utils not loaded
17:28:00.733 WARN PairHMM - ***WARNING: Machine does not have the AVX instruction set support needed for the accelerated AVX PairHmm. Falling back to the MUCH slower LOGLESS_CACHING implementation!
17:28:00.763 INFO ProgressMeter - Starting traversal
#+end_quote
libgomp.so est fourni par gcc donc il faut charger le module
module load gcc@11.3.0/gcc-12.1.0
** KILL Utiliser subworkflow
CLOSED: [2023-04-02 Sun 18:08]
Notre version permet d'être plus souple
*** KILL Alignement
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
*** KILL Vep
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
vcf_annotate_ensemblvep
** TODO Annotation avec nextflow :annotation:
*** KILL VEP : --gene-phenotype ?
CLOSED: [2023-04-18 mar. 18:32]
Vu avec alexis : bases de données non à jour
https://www.ensembl.org/info/genome/variation/phenotype/sources_phenotype_documentation.html
*** DONE plugin VEP
CLOSED: [2023-04-18 mar. 18:32]
Cloner dépôt git avec plugin
Puis utiliser --dir_plugins
*** HOLD Utiliser code d’Alexis
*** TODO Nouvelle version avec VEP
Example avec --custom
https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html
**** DONE Ajout spliceAI
CLOSED: [2023-05-18 Thu 11:02] SCHEDULED: <2023-04-30 Sun>
plugin VEP
***** DONE Télécharger les données
CLOSED: [2023-05-11 Thu 19:01]
Difficile d'automatiser, le lien est temporaire...
***** DONE PLugin
CLOSED: [2023-05-11 Thu 20:16]
***** DONE Séparer score en plusieurs colonnes
CLOSED: [2023-05-11 Thu 20:16]
Test avec ce fichier pour avoir une ligne avec annotation et une ligne sans
#CHROM POS ID REF ALT
1 9091 . A C
1 69091 . A C
et
#+begin_src sh
rm -f postvep.tsv* && vep -i testspliceai.vcf.gz -o postvep.tsv --tab --dir 109 --merged --pick --use_given_ref --offline --plugin SpliceAI,snv=spliceai_scores.raw.snv.hg38.vcf.gz,indel=spliceai_scores.raw.indel.hg38.vcf.gz
#+end_src
#+begin_src
$ bgzip postvep.tsv
$ python spliceai.py
$ cat postvep2.tsv
,variation,Location,Allele,Gene,Feature,Feature_type,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,Codons,Existing_variation,IMPACT,DISTANCE,STRAND,FLAGS,REFSEQ_MATCH,SOURCE,REFSEQ_OFFSET,SpliceAI_AG,SpliceAI_AL,SpliceAI_DG,SpliceAI_DL
0,1_9091_A/C,1:9091,C,ENSG00000290825,ENST00000456328,Transcript,upstream_gene_variant,-,-,-,-,-,-,MODIFIER,2778,1,-,-,Ensembl,-,,,,
1,1_69091_A/C,1:69091,C,ENSG00000186092,ENST00000641515,Transcript,missense_variant,124,64,22,M/L,Atg/Ctg,-,MODERATE,-,1,-,-,Ensembl,-,0.01,0.00,0.00,0.01
#+end_src
Test
cp work/bf/437ae511958509e43072f032f4d495/small.tab.gz tests/vep-spip.tab.gz
cp work/d5/3b1244b5ae83d54409ee0d456e8c55/small_cadd.tab.gz tests/vep-cadd-splice.tab.gz
**** DONE Package Nix spliceAI
CLOSED: [2023-10-07 Sat 18:00]
On utilise le tensorflow fourni avec nix (branche spliceai)
Il faut LD_PRELOAD=/lib64/libcuda.so pour l'exécution
***** DONE Version CPU non optimisé
CLOSED: [2023-09-23 Sat 21:34]
****** DONE Vérifier annotation hg19 et 38
CLOSED: [2023-09-23 Sat 18:49]
****** DONE Annotation maison T2T
CLOSED: [2023-09-23 Sat 21:33] SCHEDULED: <2023-09-23 Sat>
***** DONE Version CPU optismisée
CLOSED: [2023-09-23 Sat 21:34]
Activer flag dans le package nix
***** DONE Version CPU: Test chr20
CLOSED: [2023-09-23 Sat 22:25] SCHEDULED: <2023-09-23 Sat>
10min ?
***** DONE Version CPU: NA12878 sanger complet: kill car trop long
CLOSED: [2023-09-24 Sun 08:22] SCHEDULED: <2023-09-23 Sat>
10min ?
***** DONE GPU avec la version sur mésocentre: mail envoyé
CLOSED: [2023-09-26 Tue 11:49]
#+begin_src sh
module unload nix/2.11.0
module load anaconda3@2021.05/gcc-12.1.0
module load deep/tensorflow-gpu
pip install spliceai
#+end_src
Puis on teste sur la queue gpu
#+begin_src sh
srun -p gpu -t 4:00:00 --gres=gpu:1 --pty bash
cd /Work/Users/apraga/bisonex/tests/spliceai/
module load deep/tensorflow-gpu
module unload nix/2.11.0
time spliceai -I NA12878-sanger-chr20-T2T.vep.vcf.gz -O output-20-2.vcf -R /Work/Groups/bisonex/data/fasta/chm13v2.0/chm13v2.0.fa -A ~/t2t.txt
#+end_src
Échec: librarie DNN not found...
#+begin_quote
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-24 09:37:45.892545: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-09-24 09:37:47.759421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 38188 MB memory: -> device: 0, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:21:00.0, compute capability: 8.0
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
2023-09-24 09:37:54.143021: I tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:637] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2023-09-24 09:37:54.217160: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:429] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2023-09-24 09:37:54.217220: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:438] Possibly insufficient driver version: 510.108.3
2023-09-24 09:37:54.217245: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at conv_ops.cc:1068 : UNIMPLEMENTED: DNN library is not found.
2023-09-24 09:37:54.217262: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:GPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): UNIMPLEMENTED: DNN library is not found.
[[{{node model_1/conv1d_3/Conv1D}}]]
Traceback (most recent call last):
File "/Home/Users/apraga/.local/bin/spliceai", line 8, in <module>
sys.exit(main())
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/__main__.py", line 72, in main
scores = get_delta_scores(record, ann, args.D, args.M)
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/utils.py", line 159, in get_delta_scores
y_ref = np.mean([ann.models[m].predict(x_ref) for m in range(5)], axis=0)
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/utils.py", line 159, in <listcomp>
y_ref = np.mean([ann.models[m].predict(x_ref) for m in range(5)], axis=0)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:
Detected at node 'model_1/conv1d_3/Conv1D' defined at (most recent call last):
File "/Home/Users/apraga/.local/bin/spliceai", line 8, in <module>
sys.exit(main())
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/__main__.py", line 72, in main
scores = get_delta_scores(record, ann, args.D, args.M)
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/utils.py", line 159, in get_
e_tract_variant&intron_variant&non_coding_transcript_variant
15 splice_region_variant&synonymous_variant
14 start_lost
44 stop_gained
4 stop_gained&frameshift_variant
2 stop_gained&frameshift_variant&splice_region_variant
3 stop_gained&nmd_transcript_variant
3 stop_gained&splice_region_variant
2 stop_gained&splice_region_variant&nmd_transcript_variant
2 stop_lost
1 stop_lost&nmd_transcript_variant
6 stop_retained_variant
2 stop_retained_variant&nmd_transcript_variant
89 synonymous_variant
2 synonymous_variant&nmd_transcript_variant
1 transcript_ablation
1 upstream_gene_variant
En prenant tous les transcrits, 66% missense (total = 22085)
39 3_prime_UTR_variant
120 3_prime_UTR_variant&NMD_transcript_variant
22 5_prime_UTR_variant
2 5_prime_UTR_variant&NMD_transcript_variant
94 coding_sequence_variant
13 coding_sequence_variant&NMD_transcript_variant
527 downstream_gene_variant
257 frameshift_variant
21 frameshift_variant&NMD_transcript_variant
2 frameshift_variant&splice_donor_region_variant
20 frameshift_variant&splice_region_variant
1 frameshift_variant&splice_region_variant&NMD_transcript_variant
1 incomplete_terminal_codon_variant&coding_sequence_variant
211 inframe_deletion
18 inframe_deletion&NMD_transcript_variant
6 inframe_deletion&splice_region_variant
242 inframe_insertion
22 inframe_insertion&NMD_transcript_variant
4 inframe_insertion&splice_region_variant
983 intron_variant
244 intron_variant&NMD_transcript_variant
358 intron_variant&non_coding_transcript_variant
14690 missense_variant
1416 missense_variant&NMD_transcript_variant
6 missense_variant&splice_donor_5th_base_variant
374 missense_variant&splice_region_variant
34 missense_variant&splice_region_variant&NMD_transcript_variant
383 non_coding_transcript_exon_variant
53 splice_acceptor_variant
11 splice_acceptor_variant&NMD_transcript_variant
11 splice_acceptor_variant&non_coding_transcript_variant
20 splice_donor_5th_base_variant&intron_variant
4 splice_donor_5th_base_variant&intron_variant&NMD_transcript_variant
9 splice_donor_5th_base_variant&intron_variant&non_coding_transcript_variant
59 splice_donor_region_variant&intron_variant
11 splice_donor_region_variant&intron_variant&NMD_transcript_variant
24 splice_donor_region_variant&intron_variant&non_coding_transcript_variant
79 splice_donor_variant
6 splice_donor_variant&NMD_transcript_variant
17 splice_donor_variant&non_coding_transcript_variant
1 splice_donor_variant&splice_donor_5th_base_variant&3_prime_UTR_variant&intron_variant&NMD_transcript_variant
21 splice_donor_variant&splice_donor_5th_base_variant&coding_sequence_variant&intron_variant
3 splice_donor_variant&splice_donor_5th_base_variant&intron_variant
1 splice_donor_variant&splice_donor_5th_base_variant&non_coding_transcript_exon_variant&intron_variant
176 splice_polypyrimidine_tract_variant&intron_variant
27 splice_polypyrimidine_tract_variant&intron_variant&NMD_transcript_variant
48 splice_polypyrimidine_tract_variant&intron_variant&non_coding_transcript_variant
1 splice_region_variant&3_prime_UTR_variant
24 splice_region_variant&3_prime_UTR_variant&NMD_transcript_variant
9 splice_region_variant&5_prime_UTR_variant
61 splice_region_variant&intron_variant
23 splice_region_variant&intron_variant&NMD_transcript_variant
37 splice_region_variant&intron_variant&non_coding_transcript_variant
26 splice_region_variant&non_coding_transcript_exon_variant
5 splice_region_variant&non_coding_transcript_variant
145 splice_region_variant&splice_polypyrimidine_tract_variant&intron_variant
27 splice_region_variant&splice_polypyrimidine_tract_variant&intron_variant&NMD_transcript_variant
41 splice_region_variant&splice_polypyrimidine_tract_variant&intron_variant&non_coding_transcript_variant
37 splice_region_variant&synonymous_variant
3 splice_region_variant&synonymous_variant&NMD_transcript_variant
30 start_lost
5 start_lost&NMD_transcript_variant
135 stop_gained
13 stop_gained&frameshift_variant
3 stop_gained&frameshift_variant&NMD_transcript_variant
2 stop_gained&frameshift_variant&splice_region_variant
14 stop_gained&NMD_transcript_variant
5 stop_gained&splice_region_variant
2 stop_gained&splice_region_variant&NMD_transcript_variant
4 stop_lost
1 stop_lost&NMD_transcript_variant
9 stop_retained_variant
6 stop_retained_variant&NMD_transcript_variant
311 synonymous_variant
24 synonymous_variant&NMD_transcript_variant
1 transcript_ablation
390 upstream_gene_variant
**** TODO Ajout LOEUF et pli
plugin VEP
**** TODO NMD
plugin VEP
**** KILL Ajout LOEUF
CLOSED: [2023-04-19 mer. 16:32]
plugin VEP
**** DONE Spip
CLOSED: [2023-05-01 Mon 23:07] SCHEDULED: <2023-04-30 Sun>
BED ne semble pas bien marcher (il faut définir une zone)
VCF : trop d’information
Attention, plusieurs transcripts mais résultats identiques. On supprimer les doublons
***** DONE interpretation + score + intervalle de confiance séparé
CLOSED: [2023-05-01 Mon 23:07] SCHEDULED: <2023-04-30 Sun>
Tests :
dans tests/
vep -i 63004925-small.vcf -o postvep.vcf --vcf --fasta genomeRef.fna --dir 109 --merged --pick --offline --custom ../script/spip_annotation.vcf.gz,SPIP,vcf,exact,0,spipInterp,spipScore,spipConfidence
***** DONE Score
CLOSED: [2023-04-22 Sat 15:30]
**** DONE CADD: remplacer par plugin VEP
CLOSED: [2023-05-07 Sun 14:45] SCHEDULED: <2023-05-07 Sun>
***** Test
#+begin_src
vep -i test.vcf -o lol.vcf --offline --dir /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz --dir_plugins ../VEP_plugins/ -v
#+end_src
Test
#+begin_src sh
vep --id "1 230710048 230710048 A/G 1" --offline --dir /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz --hgvsg --plugin pLI --plugin LOEUF -o lol
#+end_src
CSQ=G|missense_variant|MODERATE|AGT|ENSG00000135744|Transcript|ENST00000366667|protein_coding|2/5||||843|776|259|M/T|aTg/aCg|||-1||HGNC|HGNC:333||Ensembl||A|A||1:g.230710048A>G|0.347|-0.277922|
Correspond bien à https://www.ensembl.org/Homo_sapiens/Tools/VEP/Results?tl=I7ZsIbrj14P6lD43-9115494
***** DONE Utiliser whole genome
CLOSED: [2023-04-29 Sat 15:46]
***** KILL Renommer les chromosome avant ...
CLOSED: [2023-05-01 Mon 09:14] SCHEDULED: <2023-04-30 Sun>
Trop long !
- Téléchargement de CADD: 4h20
- renommer les chromosome pour SNV : 6h20
- tabix sur les SNV : job tué au bout de 21h....
***** DONE annoter séparément et fusionner les tableaux
CLOSED: [2023-05-07 Sun 14:45] SCHEDULED: <2023-05-01 Mon>
NB: on pourrait filtrer CADD avec tabix pour se restreindre à nos variants
**** DONE clinvar
CLOSED: [2023-04-22 Sat 15:31]
**** KILL Vérifier résultats HGVS avec mutalyzer
CLOSED: [2023-05-01 Mon 09:26]
**** HOLD Parallélisation
***** HOLD par chromosome avec workflow VEP
https://github.com/Ensembl/ensembl-vep/blob/release/109/nextflow/workflows/run_vep.nf
***** HOLD Avec option --fork
**** DONE Utiliser la version de nf-core de VEP
CLOSED: [2023-05-13 Sat 18:27] SCHEDULED: <2023-05-07 Sun>
**** DONE OMIM
CLOSED: [2023-08-31 Thu 10:38] SCHEDULED: <2023-08-29 Tue>
**** DONE plI et LOEUF depuis gnomad
CLOSED: [2023-08-3
e_tract_variant&intron_variant&non_coding_transcript_variant
15 splice_region_variant&synonymous_variant
14 start_lost
44 stop_gained
4 stop_gained&frameshift_variant
2 stop_gained&frameshift_variant&splice_region_variant
3 stop_gained&nmd_transcript_variant
3 stop_gained&splice_region_variant
2 stop_gained&splice_region_variant&nmd_transcript_variant
2 stop_lost
1 stop_lost&nmd_transcript_variant
6 stop_retained_variant
2 stop_retained_variant&nmd_transcript_variant
89 synonymous_variant
2 synonymous_variant&nmd_transcript_variant
1 transcript_ablation
1 upstream_gene_variant
En prenant tous les transcrits, 66% missense (total = 22085)
39 3_prime_UTR_variant
120 3_prime_UTR_variant&NMD_transcript_variant
22 5_prime_UTR_variant
2 5_prime_UTR_variant&NMD_transcript_variant
94 coding_sequence_variant
13 coding_sequence_variant&NMD_transcript_variant
527 downstream_gene_variant
257 frameshift_variant
21 frameshift_variant&NMD_transcript_variant
2 frameshift_variant&splice_donor_region_variant
20 frameshift_variant&splice_region_variant
1 frameshift_variant&splice_region_variant&NMD_transcript_variant
1 incomplete_terminal_codon_variant&coding_sequence_variant
211 inframe_deletion
18 inframe_deletion&NMD_transcript_variant
6 inframe_deletion&splice_region_variant
242 inframe_insertion
22 inframe_insertion&NMD_transcript_variant
4 inframe_insertion&splice_region_variant
983 intron_variant
244 intron_variant&NMD_transcript_variant
358 intron_variant&non_coding_transcript_variant
14690 missense_variant
1416 missense_variant&NMD_transcript_variant
6 missense_variant&splice_donor_5th_base_variant
374 missense_variant&splice_region_variant
34 missense_variant&splice_region_variant&NMD_transcript_variant
383 non_coding_transcript_exon_variant
53 splice_acceptor_variant
11 splice_acceptor_variant&NMD_transcript_variant
11 splice_acceptor_variant&non_coding_transcript_variant
20 splice_donor_5th_base_variant&intron_variant
4 splice_donor_5th_base_variant&intron_variant&NMD_transcript_variant
9 splice_donor_5th_base_variant&intron_variant&non_coding_transcript_variant
59 splice_donor_region_variant&intron_variant
11 splice_donor_region_variant&intron_variant&NMD_transcript_variant
24 splice_donor_region_variant&intron_variant&non_coding_transcript_variant
79 splice_donor_variant
6 splice_donor_variant&NMD_transcript_variant
17 splice_donor_variant&non_coding_transcript_variant
1 splice_donor_variant&splice_donor_5th_base_variant&3_prime_UTR_variant&intron_variant&NMD_transcript_variant
21 splice_donor_variant&splice_donor_5th_base_variant&coding_sequence_variant&intron_variant
3 splice_donor_variant&splice_donor_5th_base_variant&intron_variant
1 splice_donor_variant&splice_donor_5th_base_variant&non_coding_transcript_exon_variant&intron_variant
176 splice_polypyrimidine_tract_variant&intron_variant
27 splice_polypyrimidine_tract_variant&intron_variant&NMD_transcript_variant
48 splice_polypyrimidine_tract_variant&intron_variant&non_coding_transcript_variant
1 splice_region_variant&3_prime_UTR_variant
24 splice_region_variant&3_prime_UTR_variant&NMD_transcript_variant
9 splice_region_variant&5_prime_UTR_variant
61 splice_region_variant&intron_variant
23 splice_region_variant&intron_variant&NMD_transcript_variant
37 splice_region_variant&intron_variant&non_coding_transcript_variant
26 splice_region_variant&non_coding_transcript_exon_variant
5 splice_region_variant&non_coding_transcript_variant
145 splice_region_variant&splice_polypyrimidine_tract_variant&intron_variant
27 splice_region_variant&splice_polypyrimidine_tract_variant&intron_variant&NMD_transcript_variant
41 splice_region_variant&splice_polypyrimidine_tract_variant&intron_variant&non_coding_transcript_variant
37 splice_region_variant&synonymous_variant
3 splice_region_variant&synonymous_variant&NMD_transcript_variant
30 start_lost
5 start_lost&NMD_transcript_variant
135 stop_gained
13 stop_gained&frameshift_variant
3 stop_gained&frameshift_variant&NMD_transcript_variant
2 stop_gained&frameshift_variant&splice_region_variant
14 stop_gained&NMD_transcript_variant
5 stop_gained&splice_region_variant
2 stop_gained&splice_region_variant&NMD_transcript_variant
4 stop_lost
1 stop_lost&NMD_transcript_variant
9 stop_retained_variant
6 stop_retained_variant&NMD_transcript_variant
311 synonymous_variant
24 synonymous_variant&NMD_transcript_variant
1 transcript_ablation
390 upstream_gene_variant
**** KILL Ajout LOEUF et pli
CLOSED: [2023-10-07 Sat 17:57]
plugin VEP
**** DONE NMD
CLOSED: [2023-10-07 Sat 17:57]
plugin VEP
**** KILL Ajout LOEUF
CLOSED: [2023-04-19 mer. 16:32]
plugin VEP
**** DONE Spip
CLOSED: [2023-05-01 Mon 23:07] SCHEDULED: <2023-04-30 Sun>
BED ne semble pas bien marcher (il faut définir une zone)
VCF : trop d’information
Attention, plusieurs transcripts mais résultats identiques. On supprimer les doublons
***** DONE interpretation + score + intervalle de confiance séparé
CLOSED: [2023-05-01 Mon 23:07] SCHEDULED: <2023-04-30 Sun>
Tests :
dans tests/
vep -i 63004925-small.vcf -o postvep.vcf --vcf --fasta genomeRef.fna --dir 109 --merged --pick --offline --custom ../script/spip_annotation.vcf.gz,SPIP,vcf,exact,0,spipInterp,spipScore,spipConfidence
***** DONE Score
CLOSED: [2023-04-22 Sat 15:30]
**** DONE CADD: remplacer par plugin VEP
CLOSED: [2023-05-07 Sun 14:45] SCHEDULED: <2023-05-07 Sun>
***** Test
#+begin_src
vep -i test.vcf -o lol.vcf --offline --dir /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz --dir_plugins ../VEP_plugins/ -v
#+end_src
Test
#+begin_src sh
vep --id "1 230710048 230710048 A/G 1" --offline --dir /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz --hgvsg --plugin pLI --plugin LOEUF -o lol
#+end_src
CSQ=G|missense_variant|MODERATE|AGT|ENSG00000135744|Transcript|ENST00000366667|protein_coding|2/5||||843|776|259|M/T|aTg/aCg|||-1||HGNC|HGNC:333||Ensembl||A|A||1:g.230710048A>G|0.347|-0.277922|
Correspond bien à https://www.ensembl.org/Homo_sapiens/Tools/VEP/Results?tl=I7ZsIbrj14P6lD43-9115494
***** DONE Utiliser whole genome
CLOSED: [2023-04-29 Sat 15:46]
***** KILL Renommer les chromosome avant ...
CLOSED: [2023-05-01 Mon 09:14] SCHEDULED: <2023-04-30 Sun>
Trop long !
- Téléchargement de CADD: 4h20
- renommer les chromosome pour SNV : 6h20
- tabix sur les SNV : job tué au bout de 21h....
***** DONE annoter séparément et fusionner les tableaux
CLOSED: [2023-05-07 Sun 14:45] SCHEDULED: <2023-05-01 Mon>
NB: on pourrait filtrer CADD avec tabix pour se restreindre à nos variants
**** DONE clinvar
CLOSED: [2023-04-22 Sat 15:31]
**** KILL Vérifier résultats HGVS avec mutalyzer
CLOSED: [2023-05-01 Mon 09:26]
**** HOLD Parallélisation
***** HOLD par chromosome avec workflow VEP
https://github.com/Ensembl/ensembl-vep/blob/release/109/nextflow/workflows/run_vep.nf
***** HOLD Avec option --fork
**** DONE Utiliser la version de nf-core de VEP
CLOSED: [2023-05-13 Sat 18:27] SCHEDULED: <2023-05-07 Sun>
**** DONE OMIM
CLOSED: [2023-08-31 Thu 10:38] SCHEDULED: <2023-08-29 Tue>
**** DONE plI et LOEUF depuis gnomad
CLOSED: [2023-08-3
pendences pour VM CHU
CLOSED: [2023-04-22 Sat 15:27] SCHEDULED: <2023-04-13 Thu>
* Bibl
iographie
** DONE Finir[cite:@
alser2021]
CLOSED: [2023-09-26 Tue 11:26] SCHEDULED: <2023-09-22 Fri>
* Manuscript
:PROPERTIES:
:CATEGORY: manuscript
:END:
** DONE Flowchart pipeline (avec T2T)
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
** DONE Figure: nombre de publication par aligneur
CLOSED: [2023-09-19 Tue 16:54] SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
** TODO Biblio performance aligneur
SCHEDULED: <2023-10-01 Sun>
** TODO Biblio appel de variant
SCHEDULED: <2023-10-01 Sun>
** TODO Figure: nombre de publication par appel de variant
SCHEDULED: <2023-10-04 Wed>
/Entered on/ [2023-09-19 Tue 08:43]
** TODO Figure: nombre d'articles citant les principaux aligneur par année
SCHEDULED: <2023-10-03 Tue>
** TODO Figure: nombre d'exomes par années
SCHEDULED: <2023-10-10 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
* Tests :tests:
** KILL Non régression : version prod
CLOSED: [2023-05-23 Tue 08:46]
*** DONE ID common snp
CLOSED: [2022-11-19 Sat 21:36]
#+begin_src
$ wc -l ID_of_common_snp.txt
23194290 ID_of_common_snp.txt
$ wc -l /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
23194290 /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
#+end_src
*** DONE ID common snp not clinvar patho
CLOSED: [2022-12-11 Sun 20:11]
**** DONE Vérification du problème
CLOSED: [2022-12-11 Sun 16:30]
Sur le J:
21155134 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref
Version de "non-régression"
21155076 database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt
Nouvelle version
23193391 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt
Si on enlève les doublons
$ sort database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt | uniq > old.txt
$ wc -l old.txt
21107097 old.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt | uniq > new.txt
$ wc -l new.txt
21174578 new.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref | uniq > ref.txt
$ wc -l ref.txt
21107155 ref.txt
Si on regarde la différence
comm -23 ref.txt old.txt
rs1052692
rs1057518973
rs1057518973
rs11074121
rs112848754
rs12573787
rs145033890
rs147889095
rs1553904159
rs1560294695
rs1560296615
rs1560310926
rs1560325547
rs1560342418
rs1560356225
rs1578287542
...
On cherche le premier
bcftools query -i 'ID="rs1052692"' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 1619351 C A,T
Il est bien patho...
$ bcftools query -i 'POS=1619351' database/clinvar/clinvar.vcf.gz -f '%CHROM %POS %REF %ALT %INFO/CLNSIG\n'
19 1619351 C T Conflicting_interpretations_of_pathogenicity
On vérifie pour tous les autres
$ comm -23 ref.txt old.txt > tocheck.txt
On génère les régions à vérifier (chromosome number:position)
$ bcftools query -i 'ID=@tocheck.txt' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM\t%POS\n' > tocheck.pos
On génère le mapping inverse (chromosome number -> NC)
$ awk ' { t = $1; $1 = $2; $2 = t; print; } ' database/RefSeq/refseq_to_number_only_consensual.txt > mapping.txt
On remap clinvar
$ bcftools annotate --rename-chrs mapping.txt database/clinvar/clinvar.vcf.gz -o clinvar_remapped.vcf.gz
$ tabix clinvar_remapped.vcf.gz
Enfin, on cherche dans clinvar la classification
$ bcftools query -R tocheck.pos clinvar_remapped.vcf.gz -f '%CHROM %POS %INFO/CLNSIG\n'
$ bcftools query -R tocheck.pos database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %ID \n' | grep '^NC'
#+RESULTS:
**** DONE Comprendre pourquoi la nouvelle version donne un résultat différent
CLOSED: [2022-12-11 Sun 20:11]
***** DONE Même version dbsnp et clinvar ?
CLOSED: [2022-12-10 Sat 23:02]
Clinvar différent !
$ bcftools stats clinvar.gz
clinvar (Alexis)
SN 0 number of samples: 0
SN 0 number of records: 1492828
SN 0 number of no-ALTs: 965
SN 0 number of SNPs: 1338007
SN 0 number of MNPs: 5562
SN 0 number of indels: 144580
SN 0 number of others: 3714
SN 0 number of multiallelic sites: 0
SN 0 number of multiallelic SNP sites: 0
clinvar (new)
SN 0 number of samples: 0
SN 0 number of records: 1493470
SN 0 number of no-ALTs: 965
SN 0 number of SNPs: 1338561
SN 0 number of MNPs: 5565
SN 0 number of indels: 144663
SN 0 number of others: 3716
SN 0 number of multiallelic sites: 0
SN 0 number of multiallelic SNP sites: 0
***** DONE Mettre à jour clinvar et dbnSNP pour travailler sur les mêm bases
CLOSED: [2022-12-11 Sun 12:10]
Problème persiste
***** DONE Supprimer la conversion en int du chromosome
CLOSED: [2022-12-10 Sat 19:29]
***** KILL Même NC ?
CLOSED: [2022-12-10 Sat 19:29]
$ zgrep "contig=<ID=NC_\(.*\)" clinvar/GRCh38/clinvar.vcf.gz > contig.clinvar
$ diff contig.txt contig.clinvar
< ##contig=<ID=NC_012920.1>
***** DONE Tester sur chromosome 19: ok
CLOSED: [2022-12-11 Sun 13:53]
On prépare les données
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/dbSNP_common.vcf.gz -o dbSNP_common_19.vcf.gz
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data/clinvar/GRCh38/clinvar.vcf.gz -o clinvar_19.vcf.gz
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data-alexis/dbSNP/dbSNP_common.vcf.gz -o dbSNP_common_19_old.vcf.gz
bcftools filter -i 'CHROM="19"' /Work/Groups/bisonex/data-alexis/clinvar/clinvar.vcf.gz -o clinvar_19_old.vcf.gz
#+end_src
On récupère les 2 versions du script
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
git checkout regression ../../script/pythonScript/clinvar_sbSNP.py
cp ../../script/pythonScript/clinvar_sbSNP.py clinvar_sbSNP_old.py
git checkout HEAD ../../script/pythonScript/clinvar_sbSNP.py
#+end_src
#+RESULTS:
On compare
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
python ../../script/pythonScript/clinvar_sbSNP.py clinvar_sbSNP.py --clinvar clinvar_19.vcf.gz --dbSNP dbSNP_common_19.vcf.gz --output tmp.txt
sort tmp.txt | uniq > new.txt
table=/Work/Groups/bisonex/data-alexis/RefSeq/refseq_to_number_only_consensual.txt
python clinvar_sbSNP_old.py --clinvar clinvar_19_old.vcf.gz --dbSNP dbSNP_common_19_old.vcf.gz --output tmp_old.txt --chrm_name_table $table
sort tmp_old.txt | uniq > old.txt
wc -l old.txt new.txt
#+end_src
#+RESULTS:
| 535155 | old.txt |
| 535194 | new.txt |
| 1070349 | total |
Si on prend le premier manquant dans new, il est conflicting patho donc il ne devrait pas y être...
$ bcftools query -i 'ID="rs10418277"' dbSNP
_common_19.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 54939682 C G,T
$ bcftools query -i 'ID="rs10418277"' dbSNP_common_19_old.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 54939682 C G,T
$ bcftools query -i 'POS=54939682' clinvar_19.vcf.gz -f '%POS %REF %ALT %INFO/CLNSIG\n'
54939682 C G Conflicting_interpretations_of_pathogenicity
54939682 C T Benign
$ bcftools query -i 'POS=54939682' clinvar_19_old.vcf.gz -f '%POS %REF %ALT %INFO/CLNSIG\n'
54939682 C G Conflicting_interpretations_of_pathogenicity
54939682 C T Benign
$ grep rs10418277 *.txt
new.txt:rs10418277
tmp.txt:rs10418277
Le problème venait de la POS qui n'était plus convertie en int (suppression de la ligne par erreur ??)
On vérifie
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
python ../../script/pythonScript/clinvar_sbSNP.py --clinvar clinvar_19.vcf.gz --dbSNP dbSNP_common_19.vcf.gz --output tmp.txt
sort tmp.txt | uniq > new.txt
table=/Work/Groups/bisonex/data-alexis/RefSeq/refseq_to_number_only_consensual.txt
python clinvar_sbSNP_old.py --clinvar clinvar_19_old.vcf.gz --dbSNP dbSNP_common_19_old.vcf.gz --output tmp_old.txt --chrm_name_table $table
sort tmp_old.txt | uniq > old.txt
wc -l
pendences pour VM CHU
CLOSED: [2023-04-22 Sat 15:27] SCHEDULED: <2023-04-13 Thu>
* Bibliographie
** DONE Finir[cite:@alser2021]
CLOSED: [2023-09-26 Tue 11:26] SCHEDULED: <2023-09-22 Fri>
* Manuscript
:PROPERTIES:
:CATEGORY: manuscript
:END:
** DONE Flowchart pipeline (avec T2T)
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
** DONE Figure: nombre de publication par aligneur
CLOSED: [2023-09-19 Tue 16:54] SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
** DONE Biblio performance aligneur
CLOSED: [2023-10-06 Fri 16:51] SCHEDULED: <2023-10-01 Sun>
** TODO Biblio appel de variant
SCHEDULED: <2023-10-01 Sun>
** TODO Figure: nombre de publication par appel de variant
SCHEDULED: <2023-10-04 Wed>
/Entered on/ [2023-09-19 Tue 08:43]
** TODO Figure: nombre d'articles citant les principaux aligneur par année
SCHEDULED: <2023-10-03 Tue>
** TODO Figure: nombre d'exomes par années
SCHEDULED: <2023-10-10 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
* Tests :tests:
** KILL Non régression : version prod
CLOSED: [2023-05-23 Tue 08:46]
*** DONE ID common snp
CLOSED: [2022-11-19 Sat 21:36]
#+begin_src
$ wc -l ID_of_common_snp.txt
23194290 ID_of_common_snp.txt
$ wc -l /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
23194290 /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
#+end_src
*** DONE ID common snp not clinvar patho
CLOSED: [2022-12-11 Sun 20:11]
**** DONE Vérification du problème
CLOSED: [2022-12-11 Sun 16:30]
Sur le J:
21155134 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref
Version de "non-régression"
21155076 database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt
Nouvelle version
23193391 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt
Si on enlève les doublons
$ sort database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt | uniq > old.txt
$ wc -l old.txt
21107097 old.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt | uniq > new.txt
$ wc -l new.txt
21174578 new.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref | uniq > ref.txt
$ wc -l ref.txt
21107155 ref.txt
Si on regarde la différence
comm -23 ref.txt old.txt
rs1052692
rs1057518973
rs1057518973
rs11074121
rs112848754
rs12573787
rs145033890
rs147889095
rs1553904159
rs1560294695
rs1560296615
rs1560310926
rs1560325547
rs1560342418
rs1560356225
rs1578287542
...
On cherche le premier
bcftools query -i 'ID="rs1052692"' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 1619351 C A,T
Il est bien patho...
$ bcftools query -i 'POS=1619351' database/clinvar/clinvar.vcf.gz -f '%CHROM %POS %REF %ALT %INFO/CLNSIG\n'
19 1619351 C T Conflicting_interpretations_of_pathogenicity
On vérifie pour tous les autres
$ comm -23 ref.txt old.txt > tocheck.txt
On génère les régions à vérifier (chromosome number:position)
$ bcftools query -i 'ID=@tocheck.txt' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM\t%POS\n' > tocheck.pos
On génère le mapping inverse (chromosome number -> NC)
$ awk ' { t = $1; $1 = $2; $2 = t; print; } ' database/RefSeq/refseq_to_number_only_consensual.txt > mapping.txt
On remap clinvar
$ bcftools annotate --rename-chrs mapping.txt database/clinvar/clinvar.vcf.gz -o clinvar_remapped.vcf.gz
$ tabix clinvar_remapped.vcf.gz
Enfin, on cherche dans clinvar la classification
$ bcftools query -R tocheck.pos clinvar_remapped.vcf.gz -f '%CHROM %POS %INFO/CLNSIG\n'
$ bcftools query -R tocheck.pos database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %ID \n' | grep '^NC'
#+RESULTS:
**** DONE Comprendre pourquoi la nouvelle version donne un résultat différent
CLOSED: [2022-12-11 Sun 20:11]
***** DONE Même version dbsnp et clinvar ?
CLOSED: [2022-12-10 Sat 23:02]
Clinvar différent !
$ bcftools stats clinvar.gz
clinvar (Alexis)
SN 0 number of samples: 0
SN 0 number of records: 1492828
SN 0 number of no-ALTs: 965
SN 0 number of SNPs: 1338007
SN 0 number of MNPs: 5562
SN 0 number of indels: 144580
SN 0 number of others: 3714
SN 0 number of multiallelic sites: 0
SN 0 number of multiallelic SNP sites: 0
clinvar (new)
SN 0 number of samples: 0
SN 0 number of records: 1493470
SN 0 number of no-ALTs: 965
SN 0 number of SNPs: 1338561
SN 0 number of MNPs: 5565
SN 0 number of indels: 144663
SN 0 number of others: 3716
SN 0 number of multiallelic sites: 0
SN 0 number of multiallelic SNP sites: 0
***** DONE Mettre à jour clinvar et dbnSNP pour travailler sur les mêm bases
CLOSED: [2022-12-11 Sun 12:10]
Problème persiste
***** DONE Supprimer la conversion en int du chromosome
CLOSED: [2022-12-10 Sat 19:29]
***** KILL Même NC ?
CLOSED: [2022-12-10 Sat 19:29]
$ zgrep "contig=<ID=NC_\(.*\)" clinvar/GRCh38/clinvar.vcf.gz > contig.clinvar
$ diff contig.txt contig.clinvar
< ##contig=<ID=NC_012920.1>
***** DONE Tester sur chromosome 19: ok
CLOSED: [2022-12-11 Sun 13:53]
On prépare les données
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/dbSNP_common.vcf.gz -o dbSNP_common_19.vcf.gz
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data/clinvar/GRCh38/clinvar.vcf.gz -o clinvar_19.vcf.gz
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data-alexis/dbSNP/dbSNP_common.vcf.gz -o dbSNP_common_19_old.vcf.gz
bcftools filter -i 'CHROM="19"' /Work/Groups/bisonex/data-alexis/clinvar/clinvar.vcf.gz -o clinvar_19_old.vcf.gz
#+end_src
On récupère les 2 versions du script
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
git checkout regression ../../script/pythonScript/clinvar_sbSNP.py
cp ../../script/pythonScript/clinvar_sbSNP.py clinvar_sbSNP_old.py
git checkout HEAD ../../script/pythonScript/clinvar_sbSNP.py
#+end_src
#+RESULTS:
On compare
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
python ../../script/pythonScript/clinvar_sbSNP.py clinvar_sbSNP.py --clinvar clinvar_19.vcf.gz --dbSNP dbSNP_common_19.vcf.gz --output tmp.txt
sort tmp.txt | uniq > new.txt
table=/Work/Groups/bisonex/data-alexis/RefSeq/refseq_to_number_only_consensual.txt
python clinvar_sbSNP_old.py --clinvar clinvar_19_old.vcf.gz --dbSNP dbSNP_common_19_old.vcf.gz --output tmp_old.txt --chrm_name_table $table
sort tmp_old.txt | uniq > old.txt
wc -l old.txt new.txt
#+end_src
#+RESULTS:
| 535155 | old.txt |
| 535194 | new.txt |
| 1070349 | total |
Si on prend le premier manquant dans new, il est conflicting patho donc il ne devrait pas y être...
$ bcftools query -i 'ID="rs10418277"' dbSNP
_common_19.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 54939682 C G,T
$ bcftools query -i 'ID="rs10418277"' dbSNP_common_19_old.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 54939682 C G,T
$ bcftools query -i 'POS=54939682' clinvar_19.vcf.gz -f '%POS %REF %ALT %INFO/CLNSIG\n'
54939682 C G Conflicting_interpretations_of_pathogenicity
54939682 C T Benign
$ bcftools query -i 'POS=54939682' clinvar_19_old.vcf.gz -f '%POS %REF %ALT %INFO/CLNSIG\n'
54939682 C G Conflicting_interpretations_of_pathogenicity
54939682 C T Benign
$ grep rs10418277 *.txt
new.txt:rs10418277
tmp.txt:rs10418277
Le problème venait de la POS qui n'était plus convertie en int (suppression de la ligne par erreur ??)
On vérifie
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
python ../../script/pythonScript/clinvar_sbSNP.py --clinvar clinvar_19.vcf.gz --dbSNP dbSNP_common_19.vcf.gz --output tmp.txt
sort tmp.txt | uniq > new.txt
table=/Work/Groups/bisonex/data-alexis/RefSeq/refseq_to_number_only_consensual.txt
python clinvar_sbSNP_old.py --clinvar clinvar_19_old.vcf.gz --dbSNP dbSNP_common_19_old.vcf.gz --output tmp_old.txt --chrm_name_table $table
sort tmp_old.txt | uniq > old.txt
wc -l
2841 | 26 | 58 | 0.977661 | 0.529245 |
Hg38
| Type | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision |
| INDEL | 54
9 | 489 | 60 | 899 | 64 | 340 | 8 | 17 | 0.890710 | 0.885510 |
| SNP | 21973 | 21462 | 511 | 26285 | 563 | 4263 | 68 | 16 | 0.976744 | 0.974435 |
****** DONE Interesection des bed: similaire
CLOSED: [2023-07-04 Tue 23:11]
HG38
#+begin_src sh
bedtools intersect -a capture/Agilent_SureSelect_All_Exons_v7_hg38_Regions.bed -b /Work/Groups/bisonex/data/giab/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.bed | wc -l
#+end_src
204280
T2T
#+begin_src sh
bedtools intersect -a /Work/Groups/bisonex/data/giab/T2T/Agilent_SureSelect_All_Exons_v7_hg38_Regions_hg38_T2T.bed -b /Work/Groups/bisonex/data/giab/T2T/HG001_GRCh38_1_22_v4.2.1_benchmark_hg38_T2T.bed | wc -l
#+end_src
204021
****** DONE Vérifier la ligne de commande
CLOSED: [2023-07-04 Tue 23:38]
#+begin_src sh
hap.py \
HG001_GRCh38_1_22_v4_lifted_merged.vcf.gz \
HG001-SRX11061486_SRR14724513-T2T.vcf.gz \
\
--reference chm13v2.0.fa \
--threads 6 \
\
-T Agilent_SureSelect_All_Exons_v7_hg38_Regions_hg38_T2T.bed \
--false-positives HG001_GRCh38_1_22_v4.2.1_benchmark_hg38_T2T.bed \
\
-o HG001
#+end_src
****** DONE Corriger FILTER : mieux mais toujours trop de négatifs. 3/4 SNP retrouvés
CLOSED: [2023-07-08 Sat 15:19] SCHEDULED: <2023-07-08 Sat>
Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio
INDEL ALL 413 246 167 751 289 215 2 98 0.595642 0.460821 0.286285 0.519629 NaN NaN 2.428571 2.465116
INDEL PASS 413 246 167 751 289 215 2 98 0.595642 0.460821 0.286285 0.519629 NaN NaN 2.428571 2.465116
SNP ALL 15883 15479 404 23597 5277 2841 46 44 0.974564 0.745760 0.120397 0.844947 3.017198 2.85705 5.560099 2.114633
SNP PASS 15883 15479 404 23597 5277 2841 46 44 0.974564 0.745760 0.120397 0.844947 3.017198 2.85705 5.560099 2.114633
******* DONE Vérifier qu'il ne reste plus de filtre autre que PASS
CLOSED: [2023-07-08 Sat 15:19]
#+begin_src
$ zgrep -c 'PASS' HG001_GRCh38_1_22_v4_lifted_merged.vcf.gz
3730505
$ zgrep -c '^chr' HG001_GRCh38_1_22_v4_lifted_merged.vcf.gz
3730506
#+end_src
****** TODO 1/4 SNP manquant ?
******* DONE Regarder avec Julia si ce sont vraiment des FP: 61/5277 qui ne le sont pas
CLOSED: [2023-07-09 Sun 12:09]
******* DONE Examiner les FP
CLOSED: [2023-07-30 Sun 22:05]
******* DONE Tester un FP
CLOSED: [2023-07-30 Sun 22:05]
2 │ chr1 608765 A G ./.:.:.:.:NOCALL:nocall:. 1/1:FP:.:ti:SNP:homalt:188
liftDown UCSC: rien en GIAB : vrai FP
3 │ chr1 762943 A G ./.:.:.:.:NOCALL:nocall:. 1/1:FP:.:ti:SNP:homalt:287
4 │ chr1 762945 A T ./.:.:.:.:NOCALL:nocall:. 1/1:FP:.:tv:SNP:homalt:287
Remaniements complexes ? Pas dans le gène en HG38
******* DONE La plupart des FP (4705/5566) sont homozygotes: erreur de référence ?
CLOSED: [2023-07-12 Wed 21:10] SCHEDULED: <2023-07-09 Sun>
Sur les 2 premiers variants, ils montrent en fait la différence entre T2T et GRCh38
Erreur à l'alignement ?
******** KILL relancer l'alignement
CLOSED: [2023-07-09 Sun 17:36]
******** DONE vérifier reads identiques hg38 et T2T: oui
CLOSED: [2023-07-09 Sun 16:36]
T2T CHR1608765
38 chr1:1180168-1180168 (
SRR14724513.24448214
SRR14724513.24448214
******* DONE Vérifier quelques variants sur IGV
CLOSED: [2023-07-09 Sun 17:36]
******* KILL Répartition des FP : cluster ?
CLOSED: [2023-07-09 Sun 17:36]
****** DONE Examiner les FP restant après correction selon séquence de référence
CLOSED: [2023-08-12 Sat 15:57]
****** HOLD Examiner les variants supprimé
****** TODO Enlever les FP qui correspondent à un changement dans le génome
******* Condition:
- pas de variation à la position en GRCh38
- variantion homozygote
- la varation en T2T correspond au changement de pair de base GRC38 -> T2T
pour les SNP:
alt_T2T[i] = DNA_GRC38[j]
avec i la position en T2T et j la position en GRCh38
Note: définir un ID n'est pas correct car les variants peuvent être modifié par happy !
******* Idée
- Pour chaque FP, c'est un "faux" FP si
- REF en hg38 == ALT en T2T
- et REF en hg38 != REF en T2T
- et variant homozygote
Comment obtenir les séquences de réferences ?
1. liftover
2. blat sur la séquence autour du variant
3. identifier quelques reads contenant le variant et regarder leur aligneement en hg38
Après discussion avec Alexis: solution 3
******* Algorithme
1. Extraire les coordonnées en T2T des faux positifs *homozygote*
2. Pour chaque faux positif
1. lister 10 reads contenant le variant
2. pour chacun de ces reads, récupérer la séquence en T2T et GRCh38 via le nom du read dans le bam
3. si la séquence en T2T modifiée par le variant est "identique" à celle en GRCh38, alors on ignore ce faux positif
Note: on ignore les reads qui ont changé de chromosome entre les version
******* DONE Résultat préliminaire
CLOSED: [2023-07-23 Sun 14:30]
cf [[file:~/roam/research/bisonex/code/giab/giab-corrected.csv][script julia]]
3498 faux positifs en moins, soit 0.89 sensibilité
julia> tp=15479
julia> fp=5277
julia> tp/(tp+fp)
0.7457602620928888
julia> tp/(tp+(fp-3498))
0.8969173716537258
On est toujours en dessous des 97%
******* HOLD Corriger proprement VCF ou résultats Happy
******* TODO Adapter pour gérer plusieurs variants par read
****** DONE Méthodologie du pangenome
CLOSED: [2023-10-03 Tue 21:28]
Voir biblio[cite:@liao2023] mais ont aligné sur GRCH38
******* DONE Mail alexis
CLOSED: [2023-10-03 Tue 21:28]
****** TODO Méthodologie T2T
Mail alexis
SCHEDULED: <2023-10-04 Wed>
***** KILL Mail Yannis
CLOSED: [2023-07-08 Sat 10:44]
***** DONE Mail GIAB pour version T2T
CLOSED: [2023-07-07 Fri 18:37]
**** TODO HG002 :hg002:T2T:
**** TODO HG003 :hg003:T2T:
**** TODO HG004 :hg004:T2T:
**** DONE Plot : ashkenazim trio :hg38:
CLOSED: [2023-07-30 Sun 16:49] SCHEDULED: <2023-07-30 Sun 15:00>
:LOGBOOK:
CLOCK: [2023-07-30 Sun 16:06]--[2023-07-30 Sun 16:35] => 0:29
CLOCK: [2023-07-30 Sun 15:39]--[2023-07-30 Sun 15:40] => 0:01
:END:
/Entered on/ [2023-04-16 Sun 17:29]
Refaire résultats
**** DONE Mail Paul sur les résultat ashkenazim +/- centogene
CLOSED: [2023-08-06 Sun 20:24] SCHEDULED: <2023-08-06 Sun>
**** DONE Relancer comparaison GIAB avec GATK 4.4.0
CLOSED: [2023-08-12 Sat 15:55]
/Entered on/ [2023-08-03 Thu 12:42]
*** KILL Platinum genome
CLOSED: [2023-06-14 Wed 22:37]
https://emea.illumina.com/platinumgenomes.html
*** TODO Séquencer NA12878 :cento:hg001:
Discussion avec Paul : sous-traitant ne nous donnera pas les données, il faut commander l'ADN
**** DONE ADN commandé
CLOSED: [2023-06-30 Fri 22:29]
**** DONE Sauvegarder les données brutes
CLOSED: [2023-07-30 Sun 14:22] SCHEDULED: <2023-07-19 Wed>
K, scality, S
**** KILL Récupérer le fichier de capture
CLOSED: [2023-07-30 Sun 14:25] SCHEDULED: <2023-07-23 Sun>
Candidats donnés dans publication https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8354858/
#+begin_quote
In short, the Nextera Rapid Capture Exome Kit (Illumina, San Diego, CA), the SureSelect Human All Exon kit (Agile
nt, Santa Clara, CA) or the Twist Human Core Exome was used for enrichment, and a Nextseq500, HiSeq4000, or Novoseq 6000 (Illumina) instrument was used for the actual sequencing, with the average coverage targeted to at least 100× or at least 98% of the target DNA covered 20×.
#+end_quote
Par défaut, on utilisera https://www.twistbioscience.com/products/ngs/alliance-panels#tab-3
ANnonce récente pour nouveau panel Twist : https://www.centogene.com/news-events/news/newsdetails/twist-bioscience-and-centogene-launch-three-panels-to-advance-rare-disease-and-hereditary-cancer-research-and-support-diagnostics
Masi pas de fichier BED
***** DONE Mail centogène
CLOSED: [2023-07-30 Sun 14:22] DEADLINE: <2023-07-23 Sun>
**** DONE Tester Nextera Rapid Capture Exome v1.2 (hg19) :giab:
CLOSED: [2023-08-06 Sun 19:05] SCHEDULED: <2023-08-03 Thu 19:00>
https://support.illumina.com/downloads/nextera-rapid-capture-exome-v1-2-product-files.html
***** DONE Liftover capture
CLOSED: [2023-08-06 Sun 18:30] SCHEDULED: <2023-08-06 Sun>
#+begin_src sh
nextflow run -profile standard,helios workflows/lift-nextera-capture.nf -lib lib
#+end_src
Vérification rapide : ok
***** DONE Run
CLOSED: [2023-08-06 Sun 19:05] SCHEDULED: <2023-08-06 Sun>
#+begin_src sh
nextflow run workflows/compareVCF.nf -profile standard,helios --query=out/2300346867_NA12878-63118093_S260-GRCh38/callVariant/haplotypecaller/2300346867_NA12878-63118093_S260-GRCh38.vcf.gz --outdir=out/2300346867_NA12878-63118093_S260-GRCh38/happy-nextera-lifted/ --compare=happy -lib lib --capture=capture/nexterarapidcapture_exome_targetedregions_v1.2-nochrM_lifted.bed --id=HG001 --genome=GRCh38
#+end_src
**** DONE Tester Agilent SureSelect All Exon V8 (hg38) :giab:
CLOSED: [2023-07-31 Mon 23:09] SCHEDULED: <2023-07-31 Mon>
https://earray.chem.agilent.com/suredesign/index.htm
"Find design"
"Agilent catalog"
Fichiers:
- Regions.bed: Targeted exon intervals, curated and targeted by Agilent Technologies
- MergedProbes.bed: Merged probes for targeted enrichment of exons described in Regions.bed
- Covered.bed: Merged probes and sequences with 95% homology or above
- Padded.bed: Merged probes and sequences with 95% homology or above extended 50 bp at each side
- AllTracks.bed: Targeted regions and covered tracks
#+begin_src sh
nextflow run workflows/compareVCF.nf -profile standard,helios --query=out/2300346867_63118093_NA12878-GRCh38/callVariant/haplotypecaller/2300346867_63118093_NA12878-GRCh38.vcf.gz --outdir=out/2300346867_63118093_NA12878-GRCh38/happy/ --compare=happy -lib lib --capture=capture/Agilent_SureSelect_All_Exons_v8_hg38_Regions.bed --id=HG001 --genome=GRCh38
#+end_src
| Type | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score | TRUTH.TOTAL.TiTv_ratio | QUERY.TOTAL.TiTv_ratio | TRUTH.TOTAL.het_hom_ratio | QUERY.TOTAL.het_hom_ratio |
| INDEL | ALL | 423 | 395 | 28 | 915 | 108 | 405 | 4 | 13 | 0.933806 | 0.788235 | 0.442623 | 0.854868 | | | 1.7012987012987013 | 2.7916666666666665 |
| INDEL | PASS | 423 | 395 | 28 | 915 | 108 | 405 | 4 | 13 | 0.933806 | 0.788235 | 0.442623 | 0.854868 | | | 1.7012987012987013 | 2.7916666666666665 |
| SNP | ALL | 20984 | 20600 | 384 | 26080 | 780 | 4703 | 62 | 10 | 0.9817 | 0.963512 | 0.18033 | 0.972521 | 3.0499710592321048 | 2.7596541786743516 | 1.58256372367935 | 1.8978207694018234 |
| SNP | PASS | 20984 | 20600 | 384 | 26080 | 780 | 4703 | 62 | 10 | 0.9817 | 0.963512 | 0.18033 | 0.972521 | 3.0499710592321048 | 2.7596541786743516 | 1.58256372367935 | 1.8978207694018234 |
**** DONE Test Twist Human core Exome (hg38):giab:
CLOSED: [2023-08-01 Tue 23:16] SCHEDULED: <202 3-08-02 Wed>
https://www.twistbioscience.com/resources/data-files/ngs-human-core-exome-panel-bed-file
#+begin_src
nextflow run workflows/compareVCF.nf -profile standard,helios --query=out/2300346867_63118093_NA12878-GRCh38/callVariant/haplotypecaller/2300346867_63118093_NA12878-GRCh38.vcf.gz --outdir=out/2300346867_63118093_NA12878-GRCh38/happy-twist-exome-core/ --compare=happy -lib lib --capture=capture/Twist_Exome_Core_Covered_Targets_hg38.bed --id=HG001 --genome=GRCh38 -bg
#+end_src
| Type | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score | TRUTH.TOTAL.TiTv_ratio | QUERY.TOTAL.TiTv_ratio | TRUTH.TOTAL.het_hom_ratio | QUERY.TOTAL.het_hom_ratio |
| INDEL | ALL | 328 | 313 | 15 | 722 | 95 | 309 | 4 | 13 | 0.954268 | 0.769976 | 0.427978 | 0.852273 | | | 1.8584070796460177 | 2.8967391304347827 |
| INDEL | PASS | 328 | 313 | 15 | 722 | 95 | 309 | 4 | 13 | 0.954268 | 0.769976 | 0.427978 | 0.852273 | | | 1.8584070796460177 | 2.8967391304347827 |
| SNP | ALL | 19198 | 18962 | 236 | 23381 | 684 | 3738 | 48 | 10 | 0.987707 | 0.965178 | 0.159873 | 0.976313 | 3.1034188034188035 | 2.859264147830391 | 1.5669565217391304 | 1.8578767123287672 |
| SNP | PASS | 19198 | 18962 | 236 | 23381 | 684 | 3738 | 48 | 10 | 0.987707 | 0.965178 | 0.159873 | 0.976313 | 3.1034188034188035 | 2.859264147830391 | 1.5669565217391304 | 1.8578767123287672 |
**** DONE Test Twist Human core Exome (hg38):giab:
CLOSED: [2023-08-05 Sat 09:25] SCHEDULED: <2023-08-03 Thu 20:00>
#+begin_src sh
ID="2300346867_NA12878-63118093_S260-GRCh38"; nextflow run workflows/compareVCF.nf -profile standard,helios --query=out/${ID}/callVariant/haplotypecaller/${ID}.vcf.gz --outdir=out/${ID}/happy-twist-exome-core/ --compare=happy -lib lib --capture=capture/Twist_Exome_Core_Covered_Targets_hg38.bed --id=HG001 --genome=GRCh38 -bg
#+end_src
**** DONE Tester Agilen SureSelect All Exon V8 (hg38) GATK-4.4:giab:
CLOSED: [2023-08-05 Sat 09:25] SCHEDULED: <2023-08-03 Thu 20:00>
**** DONE Vérifier l'impact gatk 4.3 - 4.4 : aucun
CLOSED: [2023-08-05 Sat 09:25]
**** DONE Figure comparant les 3 capture :hg001:
CLOSED: [2023-08-06 Sun 20:24] SCHEDULED: <2023-08-06 Sun>
**** DONE Mail Paul sur les 3 capture :hg001:
CLOSED: [2023-08-06 Sun 20:24] SCHEDULED: <2023-08-06 Sun>
**** KILL Tester si le panel Twist Alliance VCGS Exome suffit
CLOSED: [2023-07-31 Mon 22:31] SCHEDULED: <2023-07-30 Sun>
**** PROJ Comparer happy et happy-vcfeval :giab:
**** WAIT Mail cento pour demande le type de capture
/Entered on/ [2023-08-07 Mon 20:40]
** TODO Données CHM13 :chm:
https://github.com/lh3/CHM-eval
*** TODO Run ERR1341793
SCHEDULED: <2023-10-07 Sat>
(raw reads ERR1341793_1.fastq.gz and ERR1341793_2.fastq.gz downloaded from https://www.ebi.ac.uk/ena/browser/view/ERR1341793)
*** TODO Run ERR1341796
SCHEDULED: <2023-10-07 Sat>
** TODO Insilico :cento:
*** TODO tous les variants centogène
**** DONE Extraire liste des SNVs
CLOSED: [2023-04-22 Sat 17:32] SCHEDULED: <2023-04-17 Mon>
***** DONE Corriger manquant à la main
CLOSED: [2023-04-22 Sat 17:31]
La sortie est sauvegardé dans git-annex : variants_success.csv
***** DONE Automatique
CLOSED: [2023-04-22 Sat 17:31]
**** DONE Convert SNVs : transcript -> génomique
CLOSED: [2023-06-03 Sat 17:16]
***** DONE Variant_recoder
CLOSED: [2023-
2841 | 26 | 58 | 0.977661 | 0.529245 |
Hg38
| Type | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision |
| INDEL | 549 | 489 | 60 | 899 | 64 | 340 | 8 | 17 | 0.890710 | 0.885510 |
| SNP | 21973 | 21462 | 511 | 26285 | 563 | 4263 | 68 | 16 | 0.976744 | 0.974435 |
****** DONE Interesection des bed: similaire
CLOSED: [2023-07-04 Tue 23:11]
HG38
#+begin_src sh
bedtools intersect -a capture/Agilent_SureSelect_All_Exons_v7_hg38_Regions.bed -b /Work/Groups/bisonex/data/giab/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.bed | wc -l
#+end_src
204280
T2T
#+begin_src sh
bedtools intersect -a /Work/Groups/bisonex/data/giab/T2T/Agilent_SureSelect_All_Exons_v7_hg38_Regions_hg38_T2T.bed -b /Work/Groups/bisonex/data/giab/T2T/HG001_GRCh38_1_22_v4.2.1_benchmark_hg38_T2T.bed | wc -l
#+end_src
204021
****** DONE Vérifier la ligne de commande
CLOSED: [2023-07-04 Tue 23:38]
#+begin_src sh
hap.py \
HG001_GRCh38_1_22_v4_lifted_merged.vcf.gz \
HG001-SRX11061486_SRR14724513-T2T.vcf.gz \
\
--reference chm13v2.0.fa \
--threads 6 \
\
-T Agilent_SureSelect_All_Exons_v7_hg38_Regions_hg38_T2T.bed \
--false-positives HG001_GRCh38_1_22_v4.2.1_benchmark_hg38_T2T.bed \
\
-o HG001
#+end_src
****** DONE Corriger FILTER : mieux mais toujours trop de négatifs. 3/4 SNP retrouvés
CLOSED: [2023-07-08 Sat 15:19] SCHEDULED: <2023-07-08 Sat>
Type Filter TRUTH.TOTAL TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK FP.gt FP.al METRIC.Recall METRIC.Precision METRIC.Frac_NA METRIC.F1_Score TRUTH.TOTAL.TiTv_ratio QUERY.TOTAL.TiTv_ratio TRUTH.TOTAL.het_hom_ratio QUERY.TOTAL.het_hom_ratio
INDEL ALL 413 246 167 751 289 215 2 98 0.595642 0.460821 0.286285 0.519629 NaN NaN 2.428571 2.465116
INDEL PASS 413 246 167 751 289 215 2 98 0.595642 0.460821 0.286285 0.519629 NaN NaN 2.428571 2.465116
SNP ALL 15883 15479 404 23597 5277 2841 46 44 0.974564 0.745760 0.120397 0.844947 3.017198 2.85705 5.560099 2.114633
SNP PASS 15883 15479 404 23597 5277 2841 46 44 0.974564 0.745760 0.120397 0.844947 3.017198 2.85705 5.560099 2.114633
******* DONE Vérifier qu'il ne reste plus de filtre autre que PASS
CLOSED: [2023-07-08 Sat 15:19]
#+begin_src
$ zgrep -c 'PASS' HG001_GRCh38_1_22_v4_lifted_merged.vcf.gz
3730505
$ zgrep -c '^chr' HG001_GRCh38_1_22_v4_lifted_merged.vcf.gz
3730506
#+end_src
****** TODO 1/4 SNP manquant ?
******* DONE Regarder avec Julia si ce sont vraiment des FP: 61/5277 qui ne le sont pas
CLOSED: [2023-07-09 Sun 12:09]
******* DONE Examiner les FP
CLOSED: [2023-07-30 Sun 22:05]
******* DONE Tester un FP
CLOSED: [2023-07-30 Sun 22:05]
2 │ chr1 608765 A G ./.:.:.:.:NOCALL:nocall:. 1/1:FP:.:ti:SNP:homalt:188
liftDown UCSC: rien en GIAB : vrai FP
3 │ chr1 762943 A G ./.:.:.:.:NOCALL:nocall:. 1/1:FP:.:ti:SNP:homalt:287
4 │ chr1 762945 A T ./.:.:.:.:NOCALL:nocall:. 1/1:FP:.:tv:SNP:homalt:287
Remaniements complexes ? Pas dans le gène en HG38
******* DONE La plupart des FP (4705/5566) sont homozygotes: erreur de référence ?
CLOSED: [2023-07-12 Wed 21:10] SCHEDULED: <2023-07-09 Sun>
Sur les 2 premiers variants, ils montrent en fait la différence entre T2T et GRCh38
Erreur à l'alignement ?
******** KILL relancer l'alignement
CLOSED: [2023-07-09 Sun 17:36]
******** DONE vérifier reads identiques hg38 et T2T: oui
CLOSED: [2023-07-09 Sun 16:36]
T2T CHR1608765
38 chr1:1180168-1180168 (
SRR14724513.24448214
SRR14724513.24448214
******* DONE Vérifier quelques variants sur IGV
CLOSED: [2023-07-09 Sun 17:36]
******* KILL Répartition des FP : cluster ?
CLOSED: [2023-07-09 Sun 17:36]
****** DONE Examiner les FP restant après correction selon séquence de référence
CLOSED: [2023-08-12 Sat 15:57]
****** HOLD Examiner les variants supprimé
****** TODO Enlever les FP qui correspondent à un changement dans le génome
******* Condition:
- pas de variation à la position en GRCh38
- variantion homozygote
- la varation en T2T correspond au changement de pair de base GRC38 -> T2T
pour les SNP:
alt_T2T[i] = DNA_GRC38[j]
avec i la position en T2T et j la position en GRCh38
Note: définir un ID n'est pas correct car les variants peuvent être modifié par happy !
******* Idée
- Pour chaque FP, c'est un "faux" FP si
- REF en hg38 == ALT en T2T
- et REF en hg38 != REF en T2T
- et variant homozygote
Comment obtenir les séquences de réferences ?
1. liftover
2. blat sur la séquence autour du variant
3. identifier quelques reads contenant le variant et regarder leur aligneement en hg38
Après discussion avec Alexis: solution 3
******* Algorithme
1. Extraire les coordonnées en T2T des faux positifs *homozygote*
2. Pour chaque faux positif
1. lister 10 reads contenant le variant
2. pour chacun de ces reads, récupérer la séquence en T2T et GRCh38 via le nom du read dans le bam
3. si la séquence en T2T modifiée par le variant est "identique" à celle en GRCh38, alors on ignore ce faux positif
Note: on ignore les reads qui ont changé de chromosome entre les version
******* DONE Résultat préliminaire
CLOSED: [2023-07-23 Sun 14:30]
cf [[file:~/roam/research/bisonex/code/giab/giab-corrected.csv][script julia]]
3498 faux positifs en moins, soit 0.89 sensibilité
julia> tp=15479
julia> fp=5277
julia> tp/(tp+fp)
0.7457602620928888
julia> tp/(tp+(fp-3498))
0.8969173716537258
On est toujours en dessous des 97%
******* HOLD Corriger proprement VCF ou résultats Happy
******* TODO Adapter pour gérer plusieurs variants par read
****** DONE Méthodologie du pangenome
CLOSED: [2023-10-03 Tue 21:28]
Voir biblio[cite:@liao2023] mais ont aligné sur GRCH38
******* DONE Mail alexis
CLOSED: [2023-10-03 Tue 21:28]
****** TODO Méthodologie T2T
Mail alexis
SCHEDULED: <2023-10-04 Wed>
***** KILL Mail Yannis
CLOSED: [2023-07-08 Sat 10:44]
***** DONE Mail GIAB pour version T2T
CLOSED: [2023-07-07 Fri 18:37]
**** TODO HG002 :hg002:T2T:
**** TODO HG003 :hg003:T2T:
**** TODO HG004 :hg004:T2T:
**** DONE Plot : ashkenazim trio :hg38:
CLOSED: [2023-07-30 Sun 16:49] SCHEDULED: <2023-07-30 Sun 15:00>
:LOGBOOK:
CLOCK: [2023-07-30 Sun 16:06]--[2023-07-30 Sun 16:35] => 0:29
CLOCK: [2023-07-30 Sun 15:39]--[2023-07-30 Sun 15:40] => 0:01
:END:
/Entered on/ [2023-04-16 Sun 17:29]
Refaire résultats
**** DONE Mail Paul sur les résultat ashkenazim +/- centogene
CLOSED: [2023-08-06 Sun 20:24] SCHEDULED: <2023-08-06 Sun>
**** DONE Relancer comparaison GIAB avec GATK 4.4.0
CLOSED: [2023-08-12 Sat 15:55]
/Entered on/ [2023-08-03 Thu 12:42]
*** KILL Platinum genome
CLOSED: [2023-06-14 Wed 22:37]
https://emea.illumina.com/platinumgenomes.html
*** DONE Séquencer NA12878 :cento:hg001:
CLOSED: [2023-10-07 Sat 17:59]
Discussion avec Paul : sous-traitant ne nous donnera pas les données, il faut commander l'ADN
**** DONE ADN commandé
CLOSED: [2023-06-30 Fri 22:29]
**** DONE Sauvegarder les données brutes
CLOSED: [2023-07-30 Sun 14:22] SCHEDULED: <2023-07-19 Wed>
K, scality, S
**** KILL Récupérer le fichier de capture
CLOSED: [2023-07-30 Sun 14:25] SCHEDULED: <2023-07-23 Sun>
Candidats donnés dans publication https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8354858/
#+begin_quote
In short, the Nextera Rapid Capture Exome Kit (Illumina, San Diego, CA), the SureSelect Human All Exon kit (Agilent, Santa Clara, CA) or the Twist Human Core Exome was used for enrichment, and a Nextseq500, HiSeq4000, or Novoseq 6000 (Illumina) instrument was used for the actual sequencing, with the average coverage targeted to at least 100× or at least 98% of the target DNA covered 20×.
#+end_quote
Par défaut, on utilisera https://www.twistbioscience.com/products/ngs/alliance-panels#tab-3
ANnonce récente pour nouveau panel Twist : https://www.centogene.com/news-events/news/newsdetails/twist-bioscience-and-centogene-launch-three-panels-to-advance-rare-disease-and-hereditary-cancer-research-and-support-diagnostics
Masi pas de fichier BED
***** DONE Mail centogène
CLOSED: [2023-07-30 Sun 14:22] DEADLINE: <2023-07-23 Sun>
**** DONE Tester Nextera Rapid Capture Exome v1.2 (hg19) :giab:
CLOSED: [2023-08-06 Sun 19:05] SCHEDULED: <2023-08-03 Thu 19:00>
https://support.illumina.com/downloads/nextera-rapid-capture-exome-v1-2-product-files.html
***** DONE Liftover capture
CLOSED: [2023-08-06 Sun 18:30] SCHEDULED: <2023-08-06 Sun>
#+begin_src sh
nextflow run -profile standard,helios workflows/lift-nextera-capture.nf -lib lib
#+end_src
Vérification rapide : ok
***** DONE Run
CLOSED: [2023-08-06 Sun 19:05] SCHEDULED: <2023-08-06 Sun>
#+begin_src sh
nextflow run workflows/compareVCF.nf -profile standard,helios --query=out/2300346867_NA12878-63118093_S260-GRCh38/callVariant/haplotypecaller/2300346867_NA12878-63118093_S260-GRCh38.vcf.gz --outdir=out/2300346867_NA12878-63118093_S260-GRCh38/happy-nextera-lifted/ --compare=happy -lib lib --capture=capture/nexterarapidcapture_exome_targetedregions_v1.2-nochrM_lifted.bed --id=HG001 --genome=GRCh38
#+end_src
**** DONE Tester Agilent SureSelect All Exon V8 (hg38) :giab:
CLOSED: [2023-07-31 Mon 23:09] SCHEDULED: <2023-07-31 Mon>
https://earray.chem.agilent.com/suredesign/index.htm
"Find design"
"Agilent catalog"
Fichiers:
- Regions.bed: Targeted exon intervals, curated and targeted by Agilent Technologies
- MergedProbes.bed: Merged probes for targeted enrichment of exons described in Regions.bed
- Covered.bed: Merged probes and sequences with 95% homology or above
- Padded.bed: Merged probes and sequences with 95% homology or above extended 50 bp at each side
- AllTracks.bed: Targeted regions and covered tracks
#+begin_src sh
nextflow run workflows/compareVCF.nf -profile standard,helios --query=out/2300346867_63118093_NA12878-GRCh38/callVariant/haplotypecaller/2300346867_63118093_NA12878-GRCh38.vcf.gz --outdir=out/2300346867_63118093_NA12878-GRCh38/happy/ --compare=happy -lib lib --capture=capture/Agilent_SureSelect_All_Exons_v8_hg38_Regions.bed --id=HG001 --genome=GRCh38
#+end_src
| Type | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score | TRUTH.TOTAL.TiTv_ratio | QUERY.TOTAL.TiTv_ratio | TRUTH.TOTAL.het_hom_ratio | QUERY.TOTAL.het_hom_ratio |
| INDEL | ALL | 423 | 395 | 28 | 915 | 108 | 405 | 4 | 13 | 0.933806 | 0.788235 | 0.442623 | 0.854868 | | | 1.7012987012987013 | 2.7916666666666665 |
| INDEL | PASS | 423 | 395 | 28 | 915 | 108 | 405 | 4 | 13 | 0.933806 | 0.788235 | 0.442623 | 0.854868 | | | 1.7012987012987013 | 2.7916666666666665 |
| SNP | ALL | 20984 | 20600 | 384 | 26080 | 780 | 4703 | 62 | 10 | 0.9817 | 0.963512 | 0.18033 | 0.972521 | 3.0499710592321048 | 2.7596541786743516 | 1.58256372367935 | 1.8978207694018234 |
| SNP | PASS | 20984 | 20600 | 384 | 26080 | 780 | 4703 | 62 | 10 | 0.9817 | 0.963512 | 0.18033 | 0.972521 | 3.0499710592321048 | 2.7596541786743516 | 1.58256372367935 | 1.8978207694018234 |
**** DONE Test Twist Human core Exome (hg38):giab:
CLOSED: [2023-08-01 Tue 23:16] SCHEDULED: <202 3-08-02 Wed>
https://www.twistbioscience.com/resources/data-files/ngs-human-core-exome-panel-bed-file
#+begin_src
nextflow run workflows/compareVCF.nf -profile standard,helios --query=out/2300346867_63118093_NA12878-GRCh38/callVariant/haplotypecaller/2300346867_63118093_NA12878-GRCh38.vcf.gz --outdir=out/2300346867_63118093_NA12878-GRCh38/happy-twist-exome-core/ --compare=happy -lib lib --capture=capture/Twist_Exome_Core_Covered_Targets_hg38.bed --id=HG001 --genome=GRCh38 -bg
#+end_src
| Type | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score | TRUTH.TOTAL.TiTv_ratio | QUERY.TOTAL.TiTv_ratio | TRUTH.TOTAL.het_hom_ratio | QUERY.TOTAL.het_hom_ratio |
| INDEL | ALL | 328 | 313 | 15 | 722 | 95 | 309 | 4 | 13 | 0.954268 | 0.769976 | 0.427978 | 0.852273 | | | 1.8584070796460177 | 2.8967391304347827 |
| INDEL | PASS | 328 | 313 | 15 | 722 | 95 | 309 | 4 | 13 | 0.954268 | 0.769976 | 0.427978 | 0.852273 | | | 1.8584070796460177 | 2.8967391304347827 |
| SNP | ALL | 19198 | 18962 | 236 | 23381 | 684 | 3738 | 48 | 10 | 0.987707 | 0.965178 | 0.159873 | 0.976313 | 3.1034188034188035 | 2.859264147830391 | 1.5669565217391304 | 1.8578767123287672 |
| SNP | PASS | 19198 | 18962 | 236 | 23381 | 684 | 3738 | 48 | 10 | 0.987707 | 0.965178 | 0.159873 | 0.976313 | 3.1034188034188035 | 2.859264147830391 | 1.5669565217391304 | 1.8578767123287672 |
**** DONE Test Twist Human core Exome (hg38):giab:
CLOSED: [2023-08-05 Sat 09:25] SCHEDULED: <2023-08-03 Thu 20:00>
#+begin_src sh
ID="2300346867_NA12878-63118093_S260-GRCh38"; nextflow run workflows/compareVCF.nf -profile standard,helios --query=out/${ID}/callVariant/haplotypecaller/${ID}.vcf.gz --outdir=out/${ID}/happy-twist-exome-core/ --compare=happy -lib lib --capture=capture/Twist_Exome_Core_Covered_Targets_hg38.bed --id=HG001 --genome=GRCh38 -bg
#+end_src
**** DONE Tester Agilen SureSelect All Exon V8 (hg38) GATK-4.4:giab:
CLOSED: [2023-08-05 Sat 09:25] SCHEDULED: <2023-08-03 Thu 20:00>
**** DONE Vérifier l'impact gatk 4.3 - 4.4 : aucun
CLOSED: [2023-08-05 Sat 09:25]
**** DONE Figure comparant les 3 capture :hg001:
CLOSED: [2023-08-06 Sun 20:24] SCHEDULED: <2023-08-06 Sun>
**** DONE Mail Paul sur les 3 capture :hg001:
CLOSED: [2023-08-06 Sun 20:24] SCHEDULED: <2023-08-06 Sun>
**** KILL Tester si le panel Twist Alliance VCGS Exome suffit
CLOSED: [2023-07-31 Mon 22:31] SCHEDULED: <2023-07-30 Sun>
**** DONE Mail cento pour demande le type de capture
CLOSED: [2023-10-07 Sat 17:59]
/Entered on/ [2023-08-07 Mon 20:40]
Twist exome
*** PROJ Comparer happy et happy-vcfeval :giab:
** TODO Données CHM13 :chm:
https://github.com/lh3/CHM-eval
*** TODO Run ERR1341793
SCHEDULED: <2023-10-14 Sat>
(raw reads ERR1341793_1.fastq.gz and ERR1341793_2.fastq.gz downloaded from https://www.ebi.ac.uk/ena/browser/view/ERR1341793)
*** TODO Run ERR1341796
SCHEDULED: <2023-10-14 Sat>
** TODO Insilico :cento:
*** TODO tous les variants centogène
**** DONE Extraire liste des SNVs
CLOSED: [2023-04-22 Sat 17:32] SCHEDULED: <2023-04-17 Mon>
***** DONE Corriger manquant à la main
CLOSED: [2023-04-22 Sat 17:31]
La sortie est sauvegardé dans git-annex : variants_success.csv
***** DONE Automatique
CLOSED: [2023-04-22 Sat 17:31]
**** DONE Convert SNVs : transcript -> génomique
CLOSED: [2023-06-03 Sat 17:16]
***** DONE Variant_recoder
CLOSED: [2023-
LED: <2023-10-10 Tue>
**** DONE En T2T avec liftover (filtre = spip) : ok mais lent et trop de variants :tests:
CLOSED: [2023-09-17 Sun 17:13] SCHEDULED: <2023-09-17 Sun>
1. Conversion en bed
#+begin_src sh :dir:~/code/sanger
open snvs-cento-sanger.csv | select chrom pos | insert pos2 {$in.pos } | to csv --separator="\t" | save snvs-cento-sanger.bed -f
#+end_src
2. Liftover avec UCSC (en ligne)
NB: vérifié sur le premier résultat en cherche le read contenant le variant (samtools view -r puis samtools view | grep en T2T) et avec l'aide d'IGV, on a un variant qui correspond en
chr1:10757746
3. En supposant que l'ordre des variants n'a pas changé, on ajoute simplement REF et ALT avec annotateLifted.jl
Annotation spip *très lente* : 1h13 !
Résultat:
2×3 DataFrame
Row │ variant meanQual depth
│ String Float64 Int64
─────┼──────────────────────────────────────
1 │ chr12:g.13594572 60.0 1
2 │ chr17:g.10204026 60.0 1
144 found over 146
filter depth : another 0 missed variants
filter poly : another 0 missed variants
filter vep : another 0 missed variants
Et on a trop de variants en sortie (7330 !)
**** DONE Mail Paul avec résultats filtre en T2T + nouveau schéma
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
* Ré-interprétation
** PROJ Lancer tests sur données brutes [146/250]
- [X] 100222_63015289
- [X] 1600304839_63051311
- [X] 1900007827_62913191
- [X] 1900398899_62999500
- [X] 1900486799_62913197
- [X] 2100422923_62952677
- [X] 2100458888_62933047
- [X] 2100601558_62903840
- [X] 2100609288_62905768
- [X] 2100609501_62905776
- [X] 2100614493_62951074
- [X] 2100622566_62908067
- [X] 2100622601_62908060
- [X] 2100622705_62908063
- [X] 2100640027_62911936
- [X] 2100645285_62913212
- [X] 2100661411_62914081
- [X] 2100661462_62914086
- [X] 2100708257_62921596
- [X] 2100738732_62926501
- [X] 2100738850_62926509
- [X] 2100746751_62926505
- [X] 2100746797_62926506
- [X] 2100782349_62931722
- [X] 2100782416_62931561
- [X] 2100782559_62931718
- [X] 2100799204_62934768
- [X] 2200010202_62940284
- [X] 2200023600_62940631
- [X] 2200024348_62999591
- [X] 2200027505_62942457
- [X] 2200038776_62943412
- [X] 2200041919_62943405
- [X] 2200088014_62951326
- [X] 2200146652_62959388
- [X] 2200151850_62960953
- [X] 2200160014_62959475
- [X] 2200160070_62959478
- [X] 2200201368_62967471
- [X] 2200201400_62967470
- [X] 2200265558_62976332
- [X] 2200265605_62976401
- [X] 2200267046_62975192
- [X] 2200273878_62999530
- [X] 2200279708_62977002
- [X] 2200284408_62979102
- [X] 2200293987_62979116
- [X] 2200294359_62979118
- [X] 2200306299_62982217
- [X] 2200306539_62982193
- [X] 220030671_62982211
- [X] 2200307058_62982231
- [X] 2200307108_62982196
- [X] 2200307136_62982221
- [X] 2200307199_62982239
- [X] 2200307230_62982234
- [X] 2200307262_62982219
- [X] 2200307297_62982227
- [X] 2200324510_62985453
- [X] 2200324549_62985478
- [X] 2200324573_62985445
- [X] 2200324594_62985467
- [X] 2200324606_62985463
- [X] 2200324614_62985459
- [X] 2200338306_62985430
- [X] 2200343880_62989407
- [X] 2200343910_62989460
- [X] 2200343938_62989451
- [X] 2200343966_62989456
- [X] 2200343993_62989440
- [X] 2200344013_62989464
- [X] 2200349749_62989465
- [X] 2200363462_62988848
- [X] 2200377880_62991993
- [X] 2200378032_62991991
- [X] 2200383996_62993828
- [X] 2200384015_62993796
- [X] 2200384046_62993822
- [X] 2200384117_62993808
- [X] 2200384187_62993825
- [X] 2200384231_62992898
- [X] 2200385658_63060260
- [X] 2200394260_62994732
- [X] 2200395817_62994742
- [X] 2200396731_62994737
- [X] 2200424073_62999579
- [X] 2200424207_62999632
- [X] 2200426178_62999630
- [X] 2200426243_62999635
- [X] 2200426466_62999605
- [X] 2200426642_62999627
- [X] 2200427406_62999649
- [X] 2200427512_62999639
- [X] 2200428953_62999572
- [X] 2200428981_62999600
- [X] 2200428999_62999592
- [X] 2200441970_63000868
- [X] 2200441989_63000882
- [X] 2200442135_63000864
- [X] 2200442216_63000886
- [X] 2200442257_63000951
- [X] 2200451801_63003573
- [X] 2200451862_63004218
- [X] 2200451894_63004210
- [X] 2200456165_63051294
- [X] 2200459865_63004933
- [X] 2200459968_63004937
- [X] 2200460073_63004943
- [X] 2200460121_63004684
- [X] 2200467051_63003856
- [X] 2200467225_63004940
- [X] 2200467261_63004930
- [X] 2200467338_63004925
- [X] 2200470099_63004485
- [X] 2200470142_63004480
- [X] 2200471780_63004362
- [X] 2200480910_63006466
- [X] 2200495073_63010427
- [X] 2200495510_63009152
- [X] 2200508677_63060252
- [X] 2200510531_63012582
- [X] 2200510628_63012549
- [X] 2200510657_63012554
- [X] 2200511249_63012533
- [X] 2200511274_63012586
- [X] 2200517952_63060399
- [X] 2200519525_63060439
- [X] 2200524009_63014044
- [ ] 2200524609_63014046
- [X] 2200524616_63014048
- [X] 2200533429_63060425
- [X] 2200539735_63060406
- [X] 2200549908_63019339
- [X] 2200549965_63019349
- [X] 2200550414_63019357
- [X] 2200550471_63020031
- [X] 2200550490_63019351
- [X] 2200550505_63019340
- [X] 2200555565_63018614
- [X] 2200559438_63020029
- [X] 2200559682_63020030
- [X] 2200559713_63019623
- [X] 2200559739_63019626
- [X] 2200569969_63019991
- [X] 2200570001_63021580
- [X] 2200570025_63021490
- [X] 2200570035_63021491
- [ ] 2200570042_63021493
- [ ] 2200570050_63021494
- [ ] 2200579897_63024910
- [ ] 2200583995_63024866
- [ ] 2200584035_63024905
- [ ] 2200584069_63024888
- [ ] 2200584126_63024810
- [ ] 2200589507_63026712
- [ ] 2200597365_63027994
- [ ] 2200597480_63027988
- [ ] 2200597752_63026853
- [ ] 2200597778_63027992
- [ ] 22005977_63026903
- [ ] 2200609031_63026527
- [ ] 2200614198_63113928
- [ ] 2200620372_63030821
- [ ] 2200620442_63030810
- [ ] 2200620498_63030816
- [ ] 2200620628_63031031
- [ ] 2200622310_63030984
- [ ] 2200622355_63030956
- [ ] 2200625369_63028699
- [ ] 2200625410_63028697
- [ ] 2200625536_63028694
- [ ] 2200630189_63030665
- [ ] 2200635149_63033182
- [ ] 2200644544_63037731
- [ ] 2200644594_63037725
- [ ] 2200650089_63038093
- [ ] 2200666292_63076568
- [ ] 2200669188_63036688
- [ ] 2200669320_63040259
- [ ] 2200669383_63040254
- [ ] 2200669414_63040257
- [ ] 2200669446_63040251
- [ ] 2200680342_63105271
- [ ] 2200694535_63042853
- [ ] 2200694789_63042862
- [ ] 2200694858_63042702
- [ ] 2200694917_63042696
- [ ] 2200699290_63043047
- [ ] 2200699345_63040238
- [ ] 2200699383_63043050
- [ ] 2200699412_63040731
- [ ] 220071551_63048935
- [ ] 2200731515_63048963
- [ ] 2200748145_63051198
- [ ] 2200748171_63051213
- [ ] 2200751046_63051249
- [ ] 2200751101_63051234
- [ ] 2200766471_63054590
- [ ] 2200767731_63054595
- [ ] 2200767822_63054464
- [ ] 2200775505_63060410
- [ ] 2200850441_63019345
- [ ] 220597589_63026879
- [ ] 2300003253_63060430
- [ ] 2300005679_63060370
- [ ] 2300009914_63060390
- [ ] 2300028784_63060001
- [ ] 2300036815_63063357
- [ ] 2300055382_63061874
- [ ] 2300055421_63061871
- [ ] 2300055440_63061880
- [ ] 230006894_63064950
- [ ] 2300071111_63070356
- [ ] 2300083434_63071675
- [ ] 2300103609_63076239
- [ ] 2300104572_63076232
- [ ] 2300109602_63076765
- [ ] 2300109665_63076770
- [ ] 2300119721_63078732
- [ ] 2300137773_63078133
- [ ] 2300137834_63078123
- [ ] 2300167821_63086183
- [ ] 2300172698_63113453
- [ ] 2300188216_63090609
- [ ] 2300188281_63090632
- [ ] 2300188800_63090616
- [ ] 2300193193645_63090623
- [ ] 2300193668_63090611
- [ ] 2300195426_63090608
- [ ] 2300201017_63089636
- [ ] 2300227479_63098330
- [ ] 2300232688_63130821
- [ ] 2300292749_63109239
- [ ] 230029277_63109247
- [ ] 2300294712_63109236
- [ ] 2300308032_63111581
- [ ] 2300323537_63114209
- [ ] 2300334609_63115535
- [ ] 2300346867_63118093
- [ ] 2300346867_63118093_NA12878
- [ ] 2300348940_63118099
- [ ] 2300359806_63119915
- [ ] 2300380476_63123963
- [ ] 2300382582_63123749
- [ ] 2300384269_63126867
- [ ] 2300407581_63130826
- [ ] 2300407626_63130842
- [ ] 2300409593_63130874
- [ ] 2300409612_63130980
- [ ] 2300417623_63131524
* Résultats
** TODO Speed-up BWA-mem
SCHEDULED: <2023-10-08 Sun>
** TODO Speed-up Hapotypecaller
SCHEDULED: <2023-10-08 Sun>
* Communication
** DONE Mail NGS-diag
CLOSED: [2023-10-06 Fri 08:04] SCHEDULED: <2023-10-06 Fri>
/Entered on/ [2023-10-04 Wed 19:33]
LED: <2023-10-10 Tue>
**** DONE En T2T avec liftover (filtre = spip) : ok mais lent et trop de variants :tests:
CLOSED: [2023-09-17 Sun 17:13] SCHEDULED: <2023-09-17 Sun>
1. Conversion en bed
#+begin_src sh :dir:~/code/sanger
open snvs-cento-sanger.csv | select chrom pos | insert pos2 {$in.pos } | to csv --separator="\t" | save snvs-cento-sanger.bed -f
#+end_src
2. Liftover avec UCSC (en ligne)
NB: vérifié sur le premier résultat en cherche le read contenant le variant (samtools view -r puis samtools view | grep en T2T) et avec l'aide d'IGV, on a un variant qui correspond en
chr1:10757746
3. En supposant que l'ordre des variants n'a pas changé, on ajoute simplement REF et ALT avec annotateLifted.jl
Annotation spip *très lente* : 1h13 !
Résultat:
2×3 DataFrame
Row │ variant meanQual depth
│ String Float64 Int64
─────┼──────────────────────────────────────
1 │ chr12:g.13594572 60.0 1
2 │ chr17:g.10204026 60.0 1
144 found over 146
filter depth : another 0 missed variants
filter poly : another 0 missed variants
filter vep : another 0 missed variants
Et on a trop de variants en sortie (7330 !)
**** DONE Mail Paul avec résultats filtre en T2T + nouveau schéma
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
* Ré-interprétation
** PROJ Lancer tests sur données brutes [167/250] <(samples.csv)> <(runs.waiting)>
- [X] 100222_63015289
- [X] 1600304839_63051311
- [X] 1900007827_62913191
- [X] 1900398899_62999500
- [X] 1900486799_62913197
- [X] 2100422923_62952677
- [X] 2100458888_62933047
- [X] 2100601558_62903840
- [X] 2100609288_62905768
- [X] 2100609501_62905776
- [X] 2100614493_62951074
- [X] 2100622566_62908067
- [X] 2100622601_62908060
- [X] 2100622705_62908063
- [X] 2100640027_62911936
- [X] 2100645285_62913212
- [X] 2100661411_62914081
- [X] 2100661462_62914086
- [X] 2100708257_62921596
- [X] 2100738732_62926501
- [X] 2100738850_62926509
- [X] 2100746751_62926505
- [X] 2100746797_62926506
- [X] 2100782349_62931722
- [X] 2100782416_62931561
- [X] 2100782559_62931718
- [X] 2100799204_62934768
- [X] 2200010202_62940284
- [X] 2200023600_62940631
- [X] 2200024348_62999591
- [X] 2200027505_62942457
- [X] 2200038776_62943412
- [X] 2200041919_62943405
- [X] 2200088014_62951326
- [X] 2200146652_62959388
- [X] 2200151850_62960953
- [X] 2200160014_62959475
- [X] 2200160070_62959478
- [X] 2200201368_62967471
- [X] 2200201400_62967470
- [X] 2200265558_62976332
- [X] 2200265605_62976401
- [X] 2200267046_62975192
- [X] 2200273878_62999530
- [X] 2200279708_62977002
- [X] 2200284408_62979102
- [X] 2200293987_62979116
- [X] 2200294359_62979118
- [X] 2200306299_62982217
- [X] 2200306539_62982193
- [X] 220030671_62982211
- [X] 2200307058_62982231
- [X] 2200307108_62982196
- [X] 2200307136_62982221
- [X] 2200307199_62982239
- [X] 2200307230_62982234
- [X] 2200307262_62982219
- [X] 2200307297_62982227
- [X] 2200324510_62985453
- [X] 2200324549_62985478
- [X] 2200324573_62985445
- [X] 2200324594_62985467
- [X] 2200324606_62985463
- [X] 2200324614_62985459
- [X] 2200338306_62985430
- [X] 2200343880_62989407
- [X] 2200343910_62989460
- [X] 2200343938_62989451
- [X] 2200343966_62989456
- [X] 2200343993_62989440
- [X] 2200344013_62989464
- [X] 2200349749_62989465
- [X] 2200363462_62988848
- [X] 2200377880_62991993
- [X] 2200378032_62991991
- [X] 2200383996_62993828
- [X] 2200384015_62993796
- [X] 2200384046_62993822
- [X] 2200384117_62993808
- [X] 2200384187_62993825
- [X] 2200384231_62992898
- [X] 2200385658_63060260
- [X] 2200394260_62994732
- [X] 2200395817_62994742
- [X] 2200396731_62994737
- [X] 2200424073_62999579
- [X] 2200424207_62999632
- [X] 2200426178_62999630
- [X] 2200426243_62999635
- [X] 2200426466_62999605
- [X] 2200426642_62999627
- [X] 2200427406_62999649
- [X] 2200427512_62999639
- [X] 2200428953_62999572
- [X] 2200428981_62999600
- [X] 2200428999_62999592
- [X] 2200441970_63000868
- [X] 2200441989_63000882
- [X] 2200442135_63000864
- [X] 2200442216_63000886
- [X] 2200442257_63000951
- [X] 2200451801_63003573
- [X] 2200451862_63004218
- [X] 2200451894_63004210
- [X] 2200456165_63051294
- [X] 2200459865_63004933
- [X] 2200459968_63004937
- [X] 2200460073_63004943
- [X] 2200460121_63004684
- [X] 2200467051_63003856
- [X] 2200467225_63004940
- [X] 2200467261_63004930
- [X] 2200467338_63004925
- [X] 2200470099_63004485
- [X] 2200470142_63004480
- [X] 2200471780_63004362
- [X] 2200480910_63006466
- [X] 2200495073_63010427
- [X] 2200495510_63009152
- [X] 2200508677_63060252
- [X] 2200510531_63012582
- [X] 2200510628_63012549
- [X] 2200510657_63012554
- [X] 2200511249_63012533
- [X] 2200511274_63012586
- [X] 2200517952_63060399
- [X] 2200519525_63060439
- [X] 2200524009_63014044
- [X] 2200524609_63014046
- [X] 2200524616_63014048
- [X] 2200533429_63060425
- [X] 2200539735_63060406
- [X] 2200549908_63019339
- [X] 2200549965_63019349
- [X] 2200550414_63019357
- [X] 2200550471_63020031
- [X] 2200550490_63019351
- [X] 2200550505_63019340
- [X] 2200555565_63018614
- [X] 2200559438_63020029
- [X] 2200559682_63020030
- [X] 2200559713_63019623
- [X] 2200559739_63019626
- [X] 2200569969_63019991
- [X] 2200570001_63021580
- [X] 2200570025_63021490
- [X] 2200570035_63021491
- [X] 2200570042_63021493
- [X] 2200570050_63021494
- [X] 2200579897_63024910
- [X] 2200583995_63024866
- [X] 2200584035_63024905
- [X] 2200584069_63024888
- [X] 2200584126_63024810
- [X] 2200589507_63026712
- [X] 2200597365_63027994
- [X] 2200597480_63027988
- [X] 2200597752_63026853
- [X] 2200597778_63027992
- [X] 22005977_63026903
- [X] 2200609031_63026527
- [X] 2200614198_63113928
- [X] 2200620372_63030821
- [X] 2200620442_63030810
- [X] 2200620498_63030816
- [X] 2200620628_63031031
- [X] 2200622310_63030984
- [ ] 2200622355_63030956
- [ ] 2200625369_63028699
- [ ] 2200625410_63028697
- [ ] 2200625536_63028694
- [ ] 2200630189_63030665
- [ ] 2200635149_63033182
- [ ] 2200644544_63037731
- [ ] 2200644594_63037725
- [ ] 2200650089_63038093
- [ ] 2200666292_63076568
- [ ] 2200669188_63036688
- [ ] 2200669320_63040259
- [ ] 2200669383_63040254
- [ ] 2200669414_63040257
- [ ] 2200669446_63040251
- [ ] 2200680342_63105271
- [ ] 2200694535_63042853
- [ ] 2200694789_63042862
- [ ] 2200694858_63042702
- [ ] 2200694917_63042696
- [ ] 2200699290_63043047
- [ ] 2200699345_63040238
- [ ] 2200699383_63043050
- [ ] 2200699412_63040731
- [ ] 220071551_63048935
- [ ] 2200731515_63048963
- [ ] 2200748145_63051198
- [ ] 2200748171_63051213
- [ ] 2200751046_63051249
- [ ] 2200751101_63051234
- [ ] 2200766471_63054590
- [ ] 2200767731_63054595
- [ ] 2200767822_63054464
- [ ] 2200775505_63060410
- [ ] 2200850441_63019345
- [ ] 220597589_63026879
- [ ] 2300003253_63060430
- [ ] 2300005679_63060370
- [ ] 2300009914_63060390
- [ ] 2300028784_63060001
- [ ] 2300036815_63063357
- [ ] 2300055382_63061874
- [ ] 2300055421_63061871
- [ ] 2300055440_63061880
- [ ] 230006894_63064950
- [ ] 2300071111_63070356
- [ ] 2300083434_63071675
- [ ] 2300103609_63076239
- [ ] 2300104572_63076232
- [ ] 2300109602_63076765
- [ ] 2300109665_63076770
- [ ] 2300119721_63078732
- [ ] 2300137773_63078133
- [ ] 2300137834_63078123
- [ ] 2300167821_63086183
- [ ] 2300172698_63113453
- [ ] 2300188216_63090609
- [ ] 2300188281_63090632
- [ ] 2300188800_63090616
- [ ] 2300193193645_63090623
- [ ] 2300193668_63090611
- [ ] 2300195426_63090608
- [ ] 2300201017_63089636
- [ ] 2300227479_63098330
- [ ] 2300232688_63130821
- [ ] 2300292749_63109239
- [ ] 230029277_63109247
- [ ] 2300294712_63109236
- [ ] 2300308032_63111581
- [ ] 2300323537_63114209
- [ ] 2300334609_63115535
- [ ] 2300346867_63118093
- [ ] 2300346867_63118093_NA12878
- [ ] 2300348940_63118099
- [ ] 2300359806_63119915
- [ ] 2300380476_63123963
- [ ] 2300382582_63123749
- [ ] 2300384269_63126867
- [ ] 2300407581_63130826
- [ ] 2300407626_63130842
- [ ] 2300409593_63130874
- [ ] 2300409612_63130980
- [ ] 2300417623_63131524
** TODO Variants manqués
*** TODO 63012582: chr10:g.102230760
*** TODO 63060439: chr15:g.26869324 = Problème de profondeur DP=15
SCHEDULED: <2023-10-08 Sun>
GT:AD:DP:GQ:PL 0/1:9,6:15:99:103,0,213
* Résultats
** TODO Speed-up BWA-mem
SCHEDULED: <2023-10-08 Sun>
** TODO Speed-up Hapotypecaller
SCHEDULED: <2023-10-08 Sun>
* Communication
** DONE Mail NGS-diag
CLOSED: [2023-10-06 Fri 08:04] SCHEDULED: <2023-10-06 Fri>
/Entered on/ [2023-10-04 Wed 19:33]
"bisonex.org"
("runs.waiting" nil nil find-file-other-window ("/ssh:meso:/Home/Users/apraga/runs.waiting" t) "alex@gentoo" "20231007:13:31:21" nil nil)
("samples.csv" nil nil find-file-other-window ("/ssh:meso:/Work/Users/apraga/bisonex/samples.csv" t) "alex@gentoo" "20231007:13:30:53" nil nil)