FYUID2XZFKETTS6EJLOGMV5DJJMLY4NCQSATT3ZKWDLHEKJRO4TAC
7F3ARXB7ZM4WPC3FOH5RHXFHWB3MGZ6XJYJX5IH6RFQNUKOWHAOQC
CDPYGRNUP6EEE77RHDGEALT4BUELT3RBSBUO35PLI3HLZ44KK26QC
CXW37WKZDOFBTPGZQGQVWDWGA7YWGGJ47SSD4KYEXD6MPERELGGAC
RHWQQAAHNHFO3FLCGVB3SIDKNOUFJGZTDNN57IQVBMXXCWX74MKAC
LCKDIFG2ZAOKZXWFEQBTV7GSTYU4KB6DPHF3EDRCWCCHX6QYSUTQC
CNHFD23P6634DYH6WYSWLNORGZH2UEVRDVGR356PNSNF5ONBAKVAC
FXA3ZBV64FML7W47IPHTAJFJHN3J3XHVHFVNYED47XFSBIGMBKRQC
ME74GL3O7CDN3SEXU4QLQVMFB4TKR2UFVRNIEVBBYMJM2MQSEIRAC
5XXDOQUZ5ORUD6TSVRSOHABWOB4OFPDPQCN4DFU6IF5YDZE3LNRQC
#+RESULTS:
: 3154
Gènes uniquements dans GRCh38
#+begin_src sh :dir ~/Downloads
comm -13 t2t-genes.txt hg38-genes.txt | wc -l
#+end_src
#+RESULTS:
: 12621
*** HOLD OMIM sur nom du gène :annotation:
*** DONE Mobidetails API
CLOSED: [2023-09-10 Sun 16:44]
Trop long ... 1h à 1h30 d'exécution
Disponible dans module
*** DONE Filtre vep avec spip for T2T et spliceAI pour GRCh38
CLOSED: [2023-09-16 Sat 22:47]
*** DONE Repasser tests en GRCh38 avec nouveau filtre (spip ou splice ai) :sanger:
CLOSED: [2023-09-17 Sun 09:07] SCHEDULED: <2023-09-16 Sat>
*** HOLD Franklin API
https://www.postman.com/genoox-ps/workspace/franklin-api-documentation-s-public-workspace/documentation/6621518-4335389d-12e3-445f-8182-339df95b2a09
*** KILL Regarder si clinique disponible avec vep :annotation:
CLOSED: [2023-09-10 Sun 16:44]
** DONE [#B] Indicateurs qualité :qualité:
CLOSED: [2023-09-10 Sun 16:46]
*** Idée
Raredisease:
- FastQC : nombreuses statistiques. Non disponible Nix
- Mosdepth : calcule la profondeur (2x plus rapide que samtools depth). Nix
- MultiQC : fusionne juste les résultats des analyses. Non disponible nix
- Picard's CollectMutipleMetrics, CollectHsMetrics, and CollectWgsMetrics
- Qualimap : alternative fastqc ? Non disponible nix
- Sentieon's WgsMetricsAlgo : propriétaire
- TIDDIT's cov : TIDIT = remaninement chromosomique
Sarek:
- alignment statistics : samtools stats, mosdepth
- QC : MultiQC
MultiQC : non disponible Nix
*** DONE FastqQC
CLOSED: [2023-08-15 Tue 21:43] SCHEDULED: <2023-08-13 Sun>
*** DONE Mosdepth
CLOSED: [2023-08-15 Tue 21:43] SCHEDULED: <2023-08-13 Sun>
Pour exomple, il faut le fichier de capture
subworkflows/local/bam_markduplicates/
*** DONE Samtools stats
CLOSED: [2023-08-15 Tue 21:43] SCHEDULED: <2023-08-13 Sun>
*** DONE [#B] Compte-redu exécution avec MultiQC
CLOSED: [2023-08-15 Tue 21:43] SCHEDULED: <2023-08-13 Sun>
*** DONE Résultats sur NA12878 : 98% à 20x
CLOSED: [2023-08-19 Sat 20:45] SCHEDULED: <2023-08-17 Thu>
**** DONE Comprendre 91% à 20x seulement: SNVs inséré
CLOSED: [2023-08-18 Fri 22:25]
***** DONE Tester autre kit : Twist exome comprehensive
CLOSED: [2023-08-18 Fri 22:24]
Moins bon
***** DONE Tester génome sans alt
CLOSED: [2023-08-18 Fri 22:25]
Idem
***** DONE Tester NA12878 sans SNVs inséré: cause !!
CLOSED: [2023-08-18 Fri 22:25]
***** DONE Tester hg19 sur NA12878 non inséré
CLOSED: [2023-08-18 Fri 22:25]
**** DONE Comprendre pourquoi SNVs diminuent le score: reads manquants
CLOSED: [2023-08-19 Sat 20:34] SCHEDULED: <2023-08-18 Fri>
Voir [[id:5c1c36f3-f68e-4e6d-a7b6-61dca89abc37][Bug: perte de nombreux reads avec NA12878]]
*** DONE Relancer résultats avec NA1287 et NA12878 + sanger
CLOSED: [2023-08-29 Tue 10:30] SCHEDULED: <2023-08-29 Tue>
*** DONE Comparer avec hg19
CLOSED: [2023-08-28 Mon 17:22] SCHEDULED: <2023-08-20 Sun>
*** DONE Comparer avec autres kit de capture
CLOSED: [2023-08-28 Mon 17:22] SCHEDULED: <2023-08-20 Sun>
*** DONE Comparer avec no-alt
CLOSED: [2023-08-28 Mon 17:22] SCHEDULED: <2023-08-20 Sun>
** HOLD vérifier si normalisation
** KILL [#B] Vérification nomenclature hgvs :hgvs:
CLOSED: [2023-08-16 Wed 19:07] SCHEDULED: <2023-08-15 Tue>
*** KILL mutalyzer
CLOSED: [2023-08-16 Wed 19:07] SCHEDULED: <2023-08-13 Sun>
*** KILL API variantvalidator
CLOSED: [2023-08-16 Wed 19:07] SCHEDULED: <2023-08-13 Sun>
** DONE Exécution
CLOSED: [2022-09-13 Tue 21:37]
*** KILL test Bionix
*** KILL Implémenter execution avec Nix ?
Voir https://academic.oup.com/gigascience/article/9/11/giaa121/5987272?login=false
pour un exemple.
Probablement plus simple d’utiliser Nix pour gestion de l’environnement et snakemake pour l’exécution
Pas d’accès internet depuis le cluster
*** DONE nextflow
CLOSED: [2022-09-13 Tue 21:37]
**** TODO Bug scheduler SGE
Le job se fait tuer car l'utilisateur n'est pas passé correctement à nextflow
***** DONE Forcer l'utilisateur à l'exécution
CLOSED: [2023-04-01 Sat 17:57]
NXF_OPTS=-D"user.name=alex"
***** DONE Vérifier si le problème persiste avec 22.10.6
CLOSED: [2023-04-01 Sat 18:38] SCHEDULED: <2023-04-01 Sat>
oui
***** KILL Packager l'utilisateur dans le programme ?
Mauvaise idée..
** TODO Preprocessing avec nextflow
*** TODO Map to reference
**** TODO Sample ID dans header
/Work/Users/apraga/bisonex/out/63003856_S135/preprocessing/baserecalibrator
*** DONE Mark duplicate
CLOSED: [2022-10-09 Sun 22:30]
*** DONE Recalibrate base quality score
CLOSED: [2022-10-09 Sun 22:30]
** DONE Variant calling avec Nextflow
CLOSED: [2022-11-19 Sat 21:34]
*** DONE Haplotype caller
CLOSED: [2022-10-09 Sun 22:40]
*** DONE Filter variants
CLOSED: [2022-10-09 Sun 22:40]
*** DONE Filter common snp not clinvar path
CLOSED: [2022-11-07 Mon 23:00]
Voir [[*common dbSNP not clinvar patho][common dbSNP not clinvar patho]]
*** DONE Filter variant only in consensual sequence
CLOSED: [2022-11-08 Tue 22:23]
*** DONE Filter technical variants
CLOSED: [2022-11-19 Sat 21:34]
*** DONE Utilise AVX pour accélerer l'exécution
CLOSED: [2023-04-29 Sat 15:46]
Sans cela, on a l'avertissement
#+begin_quote
17:28:00.720 INFO PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported
17:28:00.721 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/nix/store/cy9ckxqwrkifx7wf02hm4ww1p6lnbxg9-gatk-4.2.4.1/bin/gatk-package-4.2.4.1-local.jar!/com/intel/gkl/native/libgkl_utils.so
17:28:00.733 WARN NativeLibraryLoader - Unable to load libgkl_utils.so from native/libgkl_utils.so (/Work/Users/apraga/bisonex/out/NA12878_NIST7035/preprocessing/applybqsr/libgkl_utils821485189051585397.so: libgomp.so.1: cannot open shared object file: No such file or directory)
17:28:00.733 WARN IntelPairHmm - Intel GKL Utils not loaded
17:28:00.733 WARN PairHMM - ***WARNING: Machine does not have the AVX instruction set support needed for the accelerated AVX PairHmm. Falling back to the MUCH slower LOGLESS_CACHING implementation!
17:28:00.763 INFO ProgressMeter - Starting traversal
#+end_quote
libgomp.so est fourni par gcc donc il faut charger le module
module load gcc@11.3.0/gcc-12.1.0
** KILL Utiliser subworkflow
CLOSED: [2023-04-02 Sun 18:08]
Notre version permet d'être plus souple
*** KILL Alignement
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
*** KILL Vep
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
vcf_annotate_ensemblvep
** TODO Annotation avec nextflow :annotation:
*** KILL VEP : --gene-phenotype ?
CLOSED: [2023-04-18 mar. 18:32]
Vu avec alexis : bases de données non à jour
https://www.ensembl.org/info/genome/variation/phenotype/sources_phenotype_documentation.html
*** DONE plugin VEP
CLOSED: [2023-04-18 mar. 18:32]
Cloner dépôt git avec plugin
Puis utiliser --dir_plugins
*** HOLD Utiliser code d’Alexis
*** TODO Nouvelle version avec VEP
Example avec --custom
https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html
**** DONE Ajout spliceAI
CLOSED: [2023-05-18 Thu 11:02] SCHEDULED: <2023-04-30 Sun>
plugin VEP
***** DONE Télécharger les données
CLOSED: [2023-05-11 Thu 19:01]
Difficile d'automatiser, le lien est temporaire...
***** DONE PLugin
CLOSED: [2023-05-11 Thu 20:16]
***** DONE Séparer score en plusieurs colonnes
CLOSED: [2023-05-11 Thu 20:16]
Test avec ce fichier pour avoir une ligne avec annotation et une ligne sans
#CHROM POS ID REF ALT
1 9091 . A C
1 69091 . A C
et
#+begin_src sh
rm -f postvep.tsv* && vep -i testspliceai.vcf.gz -o postvep.tsv --tab --dir 109 --merged --pick --use_given_ref --offline --plugin SpliceAI,snv=spliceai_scores.raw.snv.hg38.vcf.gz,indel=spliceai_scores.raw.indel.hg38.vcf.gz
#+end_src
#+begin_src
$ bgzip postvep.tsv
$ python spliceai.py
$ cat postvep2.tsv
,variation,Location,Allele,Gene,Feature,Feature_type,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,Codons,Existing_variation,IMPACT,DISTANCE,STRAND,FLAGS,REFSEQ_MATCH,SOURCE,REFSEQ_OFFSET,SpliceAI_AG,SpliceAI_AL,SpliceAI_DG,SpliceAI_DL
0,1_9091_A/C,1:9091,C,ENSG00000290825,ENST00000456328,Transcrip
#+RESULTS:
: 3154
Gènes uniquements dans GRCh38
#+begin_src sh :dir ~/Downloads
comm -13 t2t-genes.txt hg38-genes.txt | wc -l
#+end_src
#+RESULTS:
: 12621
*** HOLD OMIM sur nom du gène :annotation:
*** DONE Mobidetails API
CLOSED: [2023-09-10 Sun 16:44]
Trop long ... 1h à 1h30 d'exécution
Disponible dans module
*** DONE Filtre vep avec spip for T2T et spliceAI pour GRCh38
CLOSED: [2023-09-16 Sat 22:47]
*** DONE Repasser tests en GRCh38 avec nouveau filtre (spip ou splice ai) :sanger:
CLOSED: [2023-09-17 Sun 09:07] SCHEDULED: <2023-09-16 Sat>
*** HOLD Franklin API
https://www.postman.com/genoox-ps/workspace/franklin-api-documentation-s-public-workspace/documentation/6621518-4335389d-12e3-445f-8182-339df95b2a09
*** KILL Regarder si clinique disponible avec vep :annotation:
CLOSED: [2023-09-10 Sun 16:44]
** DONE [#B] Indicateurs qualité :qualité:
CLOSED: [2023-09-10 Sun 16:46]
*** Idée
Raredisease:
- FastQC : nombreuses statistiques. Non disponible Nix
- Mosdepth : calcule la profondeur (2x plus rapide que samtools depth). Nix
- MultiQC : fusionne juste les résultats des analyses. Non disponible nix
- Picard's CollectMutipleMetrics, CollectHsMetrics, and CollectWgsMetrics
- Qualimap : alternative fastqc ? Non disponible nix
- Sentieon's WgsMetricsAlgo : propriétaire
- TIDDIT's cov : TIDIT = remaninement chromosomique
Sarek:
- alignment statistics : samtools stats, mosdepth
- QC : MultiQC
MultiQC : non disponible Nix
*** DONE FastqQC
CLOSED: [2023-08-15 Tue 21:43] SCHEDULED: <2023-08-13 Sun>
*** DONE Mosdepth
CLOSED: [2023-08-15 Tue 21:43] SCHEDULED: <2023-08-13 Sun>
Pour exomple, il faut le fichier de capture
subworkflows/local/bam_markduplicates/
*** DONE Samtools stats
CLOSED: [2023-08-15 Tue 21:43] SCHEDULED: <2023-08-13 Sun>
*** DONE [#B] Compte-redu exécution avec MultiQC
CLOSED: [2023-08-15 Tue 21:43] SCHEDULED: <2023-08-13 Sun>
*** DONE Résultats sur NA12878 : 98% à 20x
CLOSED: [2023-08-19 Sat 20:45] SCHEDULED: <2023-08-17 Thu>
**** DONE Comprendre 91% à 20x seulement: SNVs inséré
CLOSED: [2023-08-18 Fri 22:25]
***** DONE Tester autre kit : Twist exome comprehensive
CLOSED: [2023-08-18 Fri 22:24]
Moins bon
***** DONE Tester génome sans alt
CLOSED: [2023-08-18 Fri 22:25]
Idem
***** DONE Tester NA12878 sans SNVs inséré: cause !!
CLOSED: [2023-08-18 Fri 22:25]
***** DONE Tester hg19 sur NA12878 non inséré
CLOSED: [2023-08-18 Fri 22:25]
**** DONE Comprendre pourquoi SNVs diminuent le score: reads manquants
CLOSED: [2023-08-19 Sat 20:34] SCHEDULED: <2023-08-18 Fri>
Voir [[id:5c1c36f3-f68e-4e6d-a7b6-61dca89abc37][Bug: perte de nombreux reads avec NA12878]]
*** DONE Relancer résultats avec NA1287 et NA12878 + sanger
CLOSED: [2023-08-29 Tue 10:30] SCHEDULED: <2023-08-29 Tue>
*** DONE Comparer avec hg19
CLOSED: [2023-08-28 Mon 17:22] SCHEDULED: <2023-08-20 Sun>
*** DONE Comparer avec autres kit de capture
CLOSED: [2023-08-28 Mon 17:22] SCHEDULED: <2023-08-20 Sun>
*** DONE Comparer avec no-alt
CLOSED: [2023-08-28 Mon 17:22] SCHEDULED: <2023-08-20 Sun>
** HOLD vérifier si normalisation
** KILL [#B] Vérification nomenclature hgvs :hgvs:
CLOSED: [2023-08-16 Wed 19:07] SCHEDULED: <2023-08-15 Tue>
*** KILL mutalyzer
CLOSED: [2023-08-16 Wed 19:07] SCHEDULED: <2023-08-13 Sun>
*** KILL API variantvalidator
CLOSED: [2023-08-16 Wed 19:07] SCHEDULED: <2023-08-13 Sun>
** DONE Exécution
CLOSED: [2022-09-13 Tue 21:37]
*** KILL test Bionix
*** KILL Implémenter execution avec Nix ?
Voir https://academic.oup.com/gigascience/article/9/11/giaa121/5987272?login=false
pour un exemple.
Probablement plus simple d’utiliser Nix pour gestion de l’environnement et snakemake pour l’exécution
Pas d’accès internet depuis le cluster
*** DONE nextflow
CLOSED: [2022-09-13 Tue 21:37]
**** TODO Bug scheduler SGE
Le job se fait tuer car l'utilisateur n'est pas passé correctement à nextflow
***** DONE Forcer l'utilisateur à l'exécution
CLOSED: [2023-04-01 Sat 17:57]
NXF_OPTS=-D"user.name=alex"
***** DONE Vérifier si le problème persiste avec 22.10.6
CLOSED: [2023-04-01 Sat 18:38] SCHEDULED: <2023-04-01 Sat>
oui
***** KILL Packager l'utilisateur dans le programme ?
Mauvaise idée..
*** DONE Diminuer mémoire pour haplotypecaller
CLOSED: [2023-09-20 Wed 21:44] SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 15:30]
Medium = 32Go pour 6 coeurs => 4 jobs (donc tout le noeud) prend plus que les 96GB...
On essaie 16Gb
Puis commit
*** DONE Report multiqc avec 10 runs
CLOSED: [2023-09-19 Tue 15:31] SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 15:31]
Cf mail 2023-09-19
** TODO Preprocessing avec nextflow
*** TODO Map to reference
**** TODO Sample ID dans header
/Work/Users/apraga/bisonex/out/63003856_S135/preprocessing/baserecalibrator
*** DONE Mark duplicate
CLOSED: [2022-10-09 Sun 22:30]
*** DONE Recalibrate base quality score
CLOSED: [2022-10-09 Sun 22:30]
** DONE Variant calling avec Nextflow
CLOSED: [2022-11-19 Sat 21:34]
*** DONE Haplotype caller
CLOSED: [2022-10-09 Sun 22:40]
*** DONE Filter variants
CLOSED: [2022-10-09 Sun 22:40]
*** DONE Filter common snp not clinvar path
CLOSED: [2022-11-07 Mon 23:00]
Voir [[*common dbSNP not clinvar patho][common dbSNP not clinvar patho]]
*** DONE Filter variant only in consensual sequence
CLOSED: [2022-11-08 Tue 22:23]
*** DONE Filter technical variants
CLOSED: [2022-11-19 Sat 21:34]
*** DONE Utilise AVX pour accélerer l'exécution
CLOSED: [2023-04-29 Sat 15:46]
Sans cela, on a l'avertissement
#+begin_quote
17:28:00.720 INFO PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported
17:28:00.721 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/nix/store/cy9ckxqwrkifx7wf02hm4ww1p6lnbxg9-gatk-4.2.4.1/bin/gatk-package-4.2.4.1-local.jar!/com/intel/gkl/native/libgkl_utils.so
17:28:00.733 WARN NativeLibraryLoader - Unable to load libgkl_utils.so from native/libgkl_utils.so (/Work/Users/apraga/bisonex/out/NA12878_NIST7035/preprocessing/applybqsr/libgkl_utils821485189051585397.so: libgomp.so.1: cannot open shared object file: No such file or directory)
17:28:00.733 WARN IntelPairHmm - Intel GKL Utils not loaded
17:28:00.733 WARN PairHMM - ***WARNING: Machine does not have the AVX instruction set support needed for the accelerated AVX PairHmm. Falling back to the MUCH slower LOGLESS_CACHING implementation!
17:28:00.763 INFO ProgressMeter - Starting traversal
#+end_quote
libgomp.so est fourni par gcc donc il faut charger le module
module load gcc@11.3.0/gcc-12.1.0
** KILL Utiliser subworkflow
CLOSED: [2023-04-02 Sun 18:08]
Notre version permet d'être plus souple
*** KILL Alignement
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
*** KILL Vep
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
vcf_annotate_ensemblvep
** TODO Annotation avec nextflow :annotation:
*** KILL VEP : --gene-phenotype ?
CLOSED: [2023-04-18 mar. 18:32]
Vu avec alexis : bases de données non à jour
https://www.ensembl.org/info/genome/variation/phenotype/sources_phenotype_documentation.html
*** DONE plugin VEP
CLOSED: [2023-04-18 mar. 18:32]
Cloner dépôt git avec plugin
Puis utiliser --dir_plugins
*** HOLD Utiliser code d’Alexis
*** TODO Nouvelle version avec VEP
Example avec --custom
https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html
**** DONE Ajout spliceAI
CLOSED: [2023-05-18 Thu 11:02] SCHEDULED: <2023-04-30 Sun>
plugin VEP
***** DONE Télécharger les données
CLOSED: [2023-05-11 Thu 19:01]
Difficile d'automatiser, le lien est temporaire...
***** DONE PLugin
CLOSED: [2023-05-11 Thu 20:16]
***** DONE Séparer score en plusieurs colonnes
CLOSED: [2023-05-11 Thu 20:16]
Test avec ce fichier pour avoir une ligne avec annotation et une ligne sans
#CHROM POS ID REF ALT
1 9091 . A C
1 69091 . A C
et
#+begin_src sh
rm -f postvep.tsv* && vep -i testspliceai.vcf.gz -o postvep.tsv --tab --dir 109 --merged --pick --use_given_ref --offline --plugin SpliceAI,snv=spliceai_scores.raw.snv.hg38.vcf.gz,indel=spliceai_scores.raw.indel.hg38.vcf.gz
#+end_src
#+begin_src
$ bgzip postvep.tsv
$ python spliceai.py
$ cat postvep2.tsv
,variation,Location,Allele,Gene,Feature,Feature_type,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,Codons,Existing_variation,IMPACT,DISTANCE,STRAND,FLAGS,REFSEQ_MATCH,SOURCE,REFSEQ_OFFSET,SpliceAI_AG,SpliceAI_AL,SpliceAI_DG,SpliceAI_DL
0,1_9091_A/C,1:9091,C,ENSG00000290825,ENST00000456328,Transcrip
CLOSED: [2023-09-10 Sun 15:44] SCHEDULED: <2023-09-09 Sat>
Merge depuis debug
***** DONE Mail alexis
CLOSED: [2023-09-01 Fri 10:32] SCHEDULED: <2023-08-31 Thu>
***** DONE Relancer test sanger
CLOSED: [2023-09-11 Mon 19:11] SCHEDULED: <2023-09-10 Sun>
***** DONE Mail Paul pour validation
CLOSED: [2023-09-11 Mon 19:11] SCHEDULED: <2023-09-10 Sun>
**** DONE Utiliser spliceAI >= 0.2 pour filtre au lieu de spip
CLOSED: [2023-09-11 Mon 21:48] SCHEDULED: <2023-09-11 Mon>
**** DONE Repasser tests sanger avec spliceAI
CLOSED: [2023-09-14 Thu 22:45] SCHEDULED: <2023-09-11 Mon>
**** DONE Corriger colonne récessive
CLOSED: [2023-09-14 Thu 22:57] SCHEDULED: <2023-09-14 Thu>
soit 1/1, soit 1/2
soit 0/1 avec 2 variants par gene
*** KILL Comparer les annotations sur 63003856
CLOSED: [2023-08-28 Mon 17:28]
**** Relancer le nouveau pipeline
*** KILL Ancienne version
CLOSED: [2023-08-28 Mon 17:24]
**** KILL HGVS
CLOSED: [2023-08-28 Mon 17:24]
**** KILL Filtrer après VEP
CLOSED: [2023-08-28 Mon 17:24]
**** KILL OMIM
CLOSED: [2023-08-28 Mon 17:24]
**** KILL clinvar
CLOSED: [2023-08-28 Mon 17:24]
**** KILL ACMG incidental
CLOSED: [2023-08-28 Mon 17:24]
**** KILL Grantham
CLOSED: [2023-08-28 Mon 17:24]
**** KILL LRG
CLOSED: [2023-04-18 mar. 17:22] SCHEDULED: <2023-04-18 Tue>
Vu avec alexis, n’est plus à jour
**** KILL Gnomad
CLOSED: [2023-08-28 Mon 17:24]
*** DONE Réordonner les colonnes :annotation:
CLOSED: [2023-08-31 Thu 10:38] SCHEDULED: <2023-08-28 Mon>
Pas d'OMIM, pas de CADD, pas de spliceAI
** DONE Porter exactement la version d'Alexis sur Helios
CLOSED: [2023-01-14 Sat 17:56]
Branche "prod"
** KILL Tester version d'alexis avec Nix
CLOSED: [2023-06-14 Wed 22:37]
*** DONE Ajouter clinvar
CLOSED: [2022-11-13 Sun 19:37]
*** DONE Alignement
CLOSED: [2022-11-13 Sun 12:52]
*** DONE Haplotype caller
CLOSED: [2022-11-13 Sun 13:00]
*** KILL Filter
CLOSED: [2023-06-14 Wed 22:37]
- [X] depth
- [ ] comon snp not path
Problème avec liste des ID
**** KILL variant annotation
CLOSED: [2023-06-14 Wed 22:37]
Besoin de vep
*** KILL Variant calling
CLOSED: [2023-06-14 Wed 22:37]
** KILL Tester sarek
CLOSED: [2023-08-12 Sat 15:53]
#+begin_src sh
module load apptainer/1.1.8
nextflow run nf-core/sarek -profile test,singularity --outdir test-sarek
#+end_src
Les dépendences ne se téléchargent pas correctement, on les extrait à la main
#+begin_src sh
rg -IN galaxyproject modules | sed 's/ //g;s/:$//' | sort | uniq > deps.txt
#+end_src
Nettoyage à la main
Puis
#+begin_src sh
cat deps.txt | xargs -L1 singularity pull
#+end_src
** DONE Support pour samplesheet
CLOSED: [2023-08-03 Thu 14:24] SCHEDULED: <2023-08-03 Thu 13:00>
/Entered on/ [2023-08-03 Thu 13:12]
** DONE Petit jeu de données : chr22 sur HG001
CLOSED: [2023-08-05 Sat 14:21] SCHEDULED: <2023-08-05 Sat>
** DONE Corriger OMIM annotation: manquant pour NMNAT1
CLOSED: [2023-09-16 Sat 22:47] SCHEDULED: <2023-09-16 Sat>
/Entered on/ [2023-09-16 Sat 19:32]
* Documentation
:PROPERTIES:
:CATEGORY: doc
:END:
** DONE Procédure d'installation nix + dependences pour VM CHU
CLOSED: [2023-04-22 Sat 15:27] SCHEDULED: <2023-04-13 Thu>
* Manuscript
:PROPERTIES:
:CATEGORY: manuscript
:END:
** DONE Flowchart pipeline (avec T2T)
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
** TODO Figure: nombre de publication par aligneur
SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
** TODO Figure: nombre de publication par appel de variant
SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
** TODO Figure: nombre d'exomes par années
SCHEDULED: <2023-09-23 Sat>
/Entered on/ [2023-09-19 Tue 08:43]
* Tests :tests:
** KILL Non régression : version prod
CLOSED: [2023-05-23 Tue 08:46]
*** DONE ID common snp
CLOSED: [2022-11-19 Sat 21:36]
#+begin_src
$ wc -l ID_of_common_snp.txt
23194290 ID_of_common_snp.txt
$ wc -l /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
23194290 /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
#+end_src
*** DONE ID common snp not clinvar patho
CLOSED: [2022-12-11 Sun 20:11]
**** DONE Vérification du problème
CLOSED: [2022-12-11 Sun 16:30]
Sur le J:
21155134 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref
Version de "non-régression"
21155076 database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt
Nouvelle version
23193391 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt
Si on enlève les doublons
$ sort database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt | uniq > old.txt
$ wc -l old.txt
21107097 old.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt | uniq > new.txt
$ wc -l new.txt
21174578 new.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref | uniq > ref.txt
$ wc -l ref.txt
21107155 ref.txt
Si on regarde la différence
comm -23 ref.txt old.txt
rs1052692
rs1057518973
rs1057518973
rs11074121
rs112848754
rs12573787
rs145033890
rs147889095
rs1553904159
rs1560294695
rs1560296615
rs1560310926
rs1560325547
rs1560342418
rs1560356225
rs1578287542
...
On cherche le premier
bcftools query -i 'ID="rs1052692"' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 1619351 C A,T
Il est bien patho...
$ bcftools query -i 'POS=1619351' database/clinvar/clinvar.vcf.gz -f '%CHROM %POS %REF %ALT %INFO/CLNSIG\n'
19 1619351 C T Conflicting_interpretations_of_pathogenicity
On vérifie pour tous les autres
$ comm -23 ref.txt old.txt > tocheck.txt
On génère les régions à vérifier (chromosome number:position)
$ bcftools query -i 'ID=@tocheck.txt' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM\t%POS\n' > tocheck.pos
On génère le mapping inverse (chromosome number -> NC)
$ awk ' { t = $1; $1 = $2; $2 = t; print; } ' database/RefSeq/refseq_to_number_only_consensual.txt > mapping.txt
On remap clinvar
$ bcftools annotate --rename-chrs mapping.txt database/clinvar/clinvar.vcf.gz -o clinvar_remapped.vcf.gz
$ tabix clinvar_remapped.vcf.gz
Enfin, on cherche dans clinvar la classification
$ bcftools query -R tocheck.pos clinvar_remapped.vcf.gz -f '%CHROM %POS %INFO/CLNSIG\n'
$ bcftools query -R tocheck.pos database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %ID \n' | grep '^NC'
#+RESULTS:
**** DONE Comprendre pourquoi la nouvelle version donne un résultat différent
CLOSED: [2022-12-11 Sun 20:11]
***** DONE Même version dbsnp et clinvar ?
CLOSED: [2022-12-10 Sat 23:02]
Clinvar différent !
$ bcftools stats clinvar.gz
clinvar (Alexis)
SN 0 number of samples: 0
SN 0 number of records: 1492828
SN 0 number of no-ALTs: 965
SN 0 number of SNPs: 1338007
SN 0 number of MNPs: 5562
SN 0 number of indels: 144580
SN 0 number of others: 3714
SN 0 number of multiallelic sites: 0
SN 0 number of multiallelic SNP sites: 0
clinvar (new)
SN 0 number of samples: 0
SN 0 number of records: 1493470
SN 0 number of no-ALTs: 965
SN 0 number of SNPs: 1338561
SN 0 number of MNPs: 5565
SN 0 number of indels: 144663
SN 0 number of others: 3716
SN 0 number of multiallelic sites: 0
SN 0 number of multiallelic SNP sites: 0
***** DONE Mettre à jour clinvar et dbnSNP pour travailler sur les mêm bases
CLOSED: [2022-12-11 Sun 12:10]
Problème persiste
***** DONE Supprimer la conversion en int du chromosome
CLOSED: [2022-12-10 Sat 19:29]
***** KILL Même NC ?
CLOSED: [2022-12-10 Sat 19:29]
$ zgrep "contig=<ID=NC_\(.*\)" clinvar/GRCh38/clinvar.vcf.gz > contig.clinvar
$ diff contig.txt contig.clinvar
< ##contig=<ID=NC_012920.1>
***** DONE Tester sur chromosome 19: ok
CLOSED: [2022-12-11 Sun 13:53]
On prépare les données
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/dbSNP_common.vcf.gz -o dbSNP_common_19.vcf.gz
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data/clinvar/GRCh38/clinvar.vcf.gz -o clinvar_
CLOSED: [2023-09-10 Sun 15:44] SCHEDULED: <2023-09-09 Sat>
Merge depuis debug
***** DONE Mail alexis
CLOSED: [2023-09-01 Fri 10:32] SCHEDULED: <2023-08-31 Thu>
***** DONE Relancer test sanger
CLOSED: [2023-09-11 Mon 19:11] SCHEDULED: <2023-09-10 Sun>
***** DONE Mail Paul pour validation
CLOSED: [2023-09-11 Mon 19:11] SCHEDULED: <2023-09-10 Sun>
**** DONE Utiliser spliceAI >= 0.2 pour filtre au lieu de spip
CLOSED: [2023-09-11 Mon 21:48] SCHEDULED: <2023-09-11 Mon>
**** DONE Repasser tests sanger avec spliceAI
CLOSED: [2023-09-14 Thu 22:45] SCHEDULED: <2023-09-11 Mon>
**** DONE Corriger colonne récessive
CLOSED: [2023-09-14 Thu 22:57] SCHEDULED: <2023-09-14 Thu>
soit 1/1, soit 1/2
soit 0/1 avec 2 variants par gene
*** KILL Comparer les annotations sur 63003856
CLOSED: [2023-08-28 Mon 17:28]
**** Relancer le nouveau pipeline
*** KILL Ancienne version
CLOSED: [2023-08-28 Mon 17:24]
**** KILL HGVS
CLOSED: [2023-08-28 Mon 17:24]
**** KILL Filtrer après VEP
CLOSED: [2023-08-28 Mon 17:24]
**** KILL OMIM
CLOSED: [2023-08-28 Mon 17:24]
**** KILL clinvar
CLOSED: [2023-08-28 Mon 17:24]
**** KILL ACMG incidental
CLOSED: [2023-08-28 Mon 17:24]
**** KILL Grantham
CLOSED: [2023-08-28 Mon 17:24]
**** KILL LRG
CLOSED: [2023-04-18 mar. 17:22] SCHEDULED: <2023-04-18 Tue>
Vu avec alexis, n’est plus à jour
**** KILL Gnomad
CLOSED: [2023-08-28 Mon 17:24]
*** DONE Réordonner les colonnes :annotation:
CLOSED: [2023-08-31 Thu 10:38] SCHEDULED: <2023-08-28 Mon>
Pas d'OMIM, pas de CADD, pas de spliceAI
** DONE Porter exactement la version d'Alexis sur Helios
CLOSED: [2023-01-14 Sat 17:56]
Branche "prod"
** KILL Tester version d'alexis avec Nix
CLOSED: [2023-06-14 Wed 22:37]
*** DONE Ajouter clinvar
CLOSED: [2022-11-13 Sun 19:37]
*** DONE Alignement
CLOSED: [2022-11-13 Sun 12:52]
*** DONE Haplotype caller
CLOSED: [2022-11-13 Sun 13:00]
*** KILL Filter
CLOSED: [2023-06-14 Wed 22:37]
- [X] depth
- [ ] comon snp not path
Problème avec liste des ID
**** KILL variant annotation
CLOSED: [2023-06-14 Wed 22:37]
Besoin de vep
*** KILL Variant calling
CLOSED: [2023-06-14 Wed 22:37]
** KILL Tester sarek
CLOSED: [2023-08-12 Sat 15:53]
#+begin_src sh
module load apptainer/1.1.8
nextflow run nf-core/sarek -profile test,singularity --outdir test-sarek
#+end_src
Les dépendences ne se téléchargent pas correctement, on les extrait à la main
#+begin_src sh
rg -IN galaxyproject modules | sed 's/ //g;s/:$//' | sort | uniq > deps.txt
#+end_src
Nettoyage à la main
Puis
#+begin_src sh
cat deps.txt | xargs -L1 singularity pull
#+end_src
** DONE Support pour samplesheet
CLOSED: [2023-08-03 Thu 14:24] SCHEDULED: <2023-08-03 Thu 13:00>
/Entered on/ [2023-08-03 Thu 13:12]
** DONE Petit jeu de données : chr22 sur HG001
CLOSED: [2023-08-05 Sat 14:21] SCHEDULED: <2023-08-05 Sat>
** DONE Corriger OMIM annotation: manquant pour NMNAT1
CLOSED: [2023-09-16 Sat 22:47] SCHEDULED: <2023-09-16 Sat>
/Entered on/ [2023-09-16 Sat 19:32]
* Documentation
:PROPERTIES:
:CATEGORY: doc
:END:
** DONE Procédure d'installation nix + dependences pour VM CHU
CLOSED: [2023-04-22 Sat 15:27] SCHEDULED: <2023-04-13 Thu>
* Manuscript
:PROPERTIES:
:CATEGORY: manuscript
:END:
** DONE Flowchart pipeline (avec T2T)
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
** DONE Figure: nombre de publication par aligneur
CLOSED: [2023-09-19 Tue 16:54] SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
** TODO Figure: nombre de publication par appel de variant
SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
** TODO Figure: nombre d'exomes par années
SCHEDULED: <2023-09-23 Sat>
/Entered on/ [2023-09-19 Tue 08:43]
* Tests :tests:
** KILL Non régression : version prod
CLOSED: [2023-05-23 Tue 08:46]
*** DONE ID common snp
CLOSED: [2022-11-19 Sat 21:36]
#+begin_src
$ wc -l ID_of_common_snp.txt
23194290 ID_of_common_snp.txt
$ wc -l /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
23194290 /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
#+end_src
*** DONE ID common snp not clinvar patho
CLOSED: [2022-12-11 Sun 20:11]
**** DONE Vérification du problème
CLOSED: [2022-12-11 Sun 16:30]
Sur le J:
21155134 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref
Version de "non-régression"
21155076 database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt
Nouvelle version
23193391 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt
Si on enlève les doublons
$ sort database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt | uniq > old.txt
$ wc -l old.txt
21107097 old.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt | uniq > new.txt
$ wc -l new.txt
21174578 new.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref | uniq > ref.txt
$ wc -l ref.txt
21107155 ref.txt
Si on regarde la différence
comm -23 ref.txt old.txt
rs1052692
rs1057518973
rs1057518973
rs11074121
rs112848754
rs12573787
rs145033890
rs147889095
rs1553904159
rs1560294695
rs1560296615
rs1560310926
rs1560325547
rs1560342418
rs1560356225
rs1578287542
...
On cherche le premier
bcftools query -i 'ID="rs1052692"' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 1619351 C A,T
Il est bien patho...
$ bcftools query -i 'POS=1619351' database/clinvar/clinvar.vcf.gz -f '%CHROM %POS %REF %ALT %INFO/CLNSIG\n'
19 1619351 C T Conflicting_interpretations_of_pathogenicity
On vérifie pour tous les autres
$ comm -23 ref.txt old.txt > tocheck.txt
On génère les régions à vérifier (chromosome number:position)
$ bcftools query -i 'ID=@tocheck.txt' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM\t%POS\n' > tocheck.pos
On génère le mapping inverse (chromosome number -> NC)
$ awk ' { t = $1; $1 = $2; $2 = t; print; } ' database/RefSeq/refseq_to_number_only_consensual.txt > mapping.txt
On remap clinvar
$ bcftools annotate --rename-chrs mapping.txt database/clinvar/clinvar.vcf.gz -o clinvar_remapped.vcf.gz
$ tabix clinvar_remapped.vcf.gz
Enfin, on cherche dans clinvar la classification
$ bcftools query -R tocheck.pos clinvar_remapped.vcf.gz -f '%CHROM %POS %INFO/CLNSIG\n'
$ bcftools query -R tocheck.pos database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %ID \n' | grep '^NC'
#+RESULTS:
**** DONE Comprendre pourquoi la nouvelle version donne un résultat différent
CLOSED: [2022-12-11 Sun 20:11]
***** DONE Même version dbsnp et clinvar ?
CLOSED: [2022-12-10 Sat 23:02]
Clinvar différent !
$ bcftools stats clinvar.gz
clinvar (Alexis)
SN 0 number of samples: 0
SN 0 number of records: 1492828
SN 0 number of no-ALTs: 965
SN 0 number of SNPs: 1338007
SN 0 number of MNPs: 5562
SN 0 number of indels: 144580
SN 0 number of others: 3714
SN 0 number of multiallelic sites: 0
SN 0 number of multiallelic SNP sites: 0
clinvar (new)
SN 0 number of samples: 0
SN 0 number of records: 1493470
SN 0 number of no-ALTs: 965
SN 0 number of SNPs: 1338561
SN 0 number of MNPs: 5565
SN 0 number of indels: 144663
SN 0 number of others: 3716
SN 0 number of multiallelic sites: 0
SN 0 number of multiallelic SNP sites: 0
***** DONE Mettre à jour clinvar et dbnSNP pour travailler sur les mêm bases
CLOSED: [2022-12-11 Sun 12:10]
Problème persiste
***** DONE Supprimer la conversion en int du chromosome
CLOSED: [2022-12-10 Sat 19:29]
***** KILL Même NC ?
CLOSED: [2022-12-10 Sat 19:29]
$ zgrep "contig=<ID=NC_\(.*\)" clinvar/GRCh38/clinvar.vcf.gz > contig.clinvar
$ diff contig.txt contig.clinvar
< ##contig=<ID=NC_012920.1>
***** DONE Tester sur chromosome 19: ok
CLOSED: [2022-12-11 Sun 13:53]
On prépare les données
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/dbSNP_common.vcf.gz -o dbSNP_common_19.vcf.gz
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data/clinvar/GRCh38/clinvar.vcf.gz -o clinvar_
nQual depth
│ String Float64 Int64
─────┼──────────────────────────────────────
1 │ chr12:g.13720138C>T 60.0 1
2 │ chr17:g.10296150T>A 60.0 1
3 │ chr21:g.43426167C>T 0.0 59
Manque de profondeur su
r 2 et mauvaise qualité sur 3
***** DONE Vérifier après filterdepth: 0 perdus en plus
CLOSED: [2023-08-20 Sun 09:18] SCHEDULED: <2023-08-20 Sun>
***** DONE Vérifier après filterpolymorphis : 0 perdus en plus
CLOSED: [2023-08-20 Sun 09:18] SCHEDULED: <2023-08-20 Sun>
***** DONE Vérifier après filter vep: 2 perdus en plus
CLOSED: [2023-08-20 Sun 12:37] SCHEDULED: <2023-08-20 Sun>
2×3 DataFrame
Row │ variant meanQual depth
│ String Float64 Int64
─────┼─────────────────────────────────────
1 │ chr17:g.7852503T>C 60.0 96
2 │ chrX:g.47575255G>A 60.0 145
***** DONE 1ere correction spip: meilleur nombre de variants en sortie mais manque toujours ces 2
CLOSED: [2023-08-20 Sun 11:38]
***** DONE --pick : résout le problème
CLOSED: [2023-08-20 Sun 12:37]
chrX:g.47575255G>A est rendu downstream_gene_variant avec l'option --pick
Or il n'est pas en5' dans les transcrits refseq...
https://genome-euro.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chrX%3A47575242%2D47575268&hgsid=301211823_xpelPqPJije7wSIhg070JeGH5ZwV
https://mobidetails.iurc.montp.inserm.fr/MD/api/variant/238296/browser/
Idem pour l'autre
chr17:g.7852503T>C
https://mobidetails.iurc.montp.inserm.fr/MD/api/variant/182993/browser/
Note:
VEP chooses one block of annotation per variant, using an ordered set of criteria. This order may be customised using --pick_order.
MANE Select transcript status
MANE Plus Clinical transcript status
canonical status of transcript
APPRIS isoform annotation
transcript support level
biotype of transcript ("protein_coding" preferred)
CCDS status of transcript
consequence rank according to this table
translated, transcript or feature length (longer preferred)
"Wherever possible we would discourage you from summarising data in this way. "
**** DONE Mail alexis
CLOSED: [2023-08-20 Sun 13:45] SCHEDULED: <2023-08-20 Sun>
**** TODO Données simuscop 200x
SCHEDULED: <2023-09-24 Sun>
**** DONE En T2T avec liftover (filtre = spip) : ok mais lent et trop de variants :tests:
CLOSED: [2023-09-17 Sun 17:13] SCHEDULED: <2023-09-17 Sun>
1. Conversion en bed
#+begin_src sh :dir:~/code/sanger
open snvs-cento-sanger.csv | select chrom pos | insert pos2 {$in.pos } | to csv --separator="\t" | save snvs-cento-sanger.bed -f
#+end_src
2. Liftover avec UCSC (en ligne)
NB: vérifié sur le premier résultat en cherche le read contenant le variant (samtools view -r puis samtools view | grep en T2T) et avec l'aide d'IGV, on a un variant qui correspond en
chr1:10757746
3. En supposant que l'ordre des variants n'a pas changé, on ajoute simplement REF et ALT avec annotateLifted.jl
Annotation spip *très lente* : 1h13 !
Résultat:
2×3 DataFrame
Row │ variant meanQual depth
│ String Float64 Int64
─────┼──────────────────────────────────────
1 │ chr12:g.13594572 60.0 1
2 │ chr17:g.10204026 60.0 1
144 found over 146
filter depth : another 0 missed variants
filter poly : another 0 missed variants
filter vep : another 0 missed variants
Et on a trop de variants en sortie (7330 !)
**** DONE Mail Paul avec résultats filtre en T2T + nouveau schéma
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
* Ré-interprétation
** TODO Lancer tests sur données brutes [26/250]
- [X] 100222_63015289
- [X] 1600304839_63051311
- [X] 1900007827_62913191
- [X] 1900398899_62999500
- [X] 1900486799_62913197
- [X] 2100422923_62952677
- [X] 2100458888_62933047
- [X] 2100601558_62903840
- [X] 2100609288_62905768
- [X] 2100609501_62905776
- [X] 2100614493_62951074
- [X] 2100622566_62908067
- [X] 2100622601_62908060
- [X] 2100622705_62908063
- [X] 2100640027_62911936
- [X] 2100645285_62913212
- [X] 2100661411_62914081
- [X] 2100661462_62914086
- [X] 2100708257_62921596
- [X] 2100738732_62926501
- [X] 2100738850_62926509
- [X] 2100746751_62926505
- [X] 2100746797_62926506
- [X] 2100782349_62931722
- [X] 2100782416_62931561
- [X] 2100782559_62931718
- [ ] 2100799204_62934768
- [ ] 2200010202_62940284
- [ ] 2200023600_62940631
- [ ] 2200024348_62999591
- [ ] 2200027505_62942457
- [ ] 2200038776_62943412
- [ ] 2200041919_62943405
- [ ] 2200088014_62951326
- [ ] 2200146652_62959388
- [ ] 2200151850_62960953
- [ ] 2200160014_62959475
- [ ] 2200160070_62959478
- [ ] 2200201368_62967471
- [ ] 2200201400_62967470
- [ ] 2200265558_62976332
- [ ] 2200265605_62976401
- [ ] 2200267046_62975192
- [ ] 2200273878_62999530
- [ ] 2200279708_62977002
- [ ] 2200284408_62979102
- [ ] 2200293987_62979116
- [ ] 2200294359_62979118
- [ ] 2200306299_62982217
- [ ] 2200306539_62982193
- [ ] 220030671_62982211
- [ ] 2200307058_62982231
- [ ] 2200307108_62982196
- [ ] 2200307136_62982221
- [ ] 2200307199_62982239
- [ ] 2200307230_62982234
- [ ] 2200307262_62982219
- [ ] 2200307297_62982227
- [ ] 2200324510_62985453
- [ ] 2200324549_62985478
- [ ] 2200324573_62985445
- [ ] 2200324594_62985467
- [ ] 2200324606_62985463
- [ ] 2200324614_62985459
- [ ] 2200338306_62985430
- [ ] 2200343880_62989407
- [ ] 2200343910_62989460
- [ ] 2200343938_62989451
- [ ] 2200343966_62989456
- [ ] 2200343993_62989440
- [ ] 2200344013_62989464
- [ ] 2200349749_62989465
- [ ] 2200363462_62988848
- [ ] 2200377880_62991993
- [ ] 2200378032_62991991
- [ ] 2200383996_62993828
- [ ] 2200384015_62993796
- [ ] 2200384046_62993822
- [ ] 2200384117_62993808
- [ ] 2200384187_62993825
- [ ] 2200384231_62992898
- [ ] 2200385658_63060260
- [ ] 2200394260_62994732
- [ ] 2200395817_62994742
- [ ] 2200396731_62994737
- [ ] 2200424073_62999579
- [ ] 2200424207_62999632
- [ ] 2200426178_62999630
- [ ] 2200426243_62999635
- [ ] 2200426466_62999605
- [ ] 2200426642_62999627
- [ ] 2200427406_62999649
- [ ] 2200427512_62999639
- [ ] 2200428953_62999572
- [ ] 2200428981_62999600
- [ ] 2200428999_62999592
- [ ] 2200441970_63000868
- [ ] 2200441989_63000882
- [ ] 2200442135_63000864
- [ ] 2200442216_63000886
- [ ] 2200442257_63000951
- [ ] 2200451801_63003573
- [ ] 2200451862_63004218
- [ ] 2200451894_63004210
- [ ] 2200456165_63051294
- [ ] 2200459865_63004933
- [ ] 2200459968_63004937
- [ ] 2200460073_63004943
- [ ] 2200460121_63004684
- [ ] 2200467051_63003856
- [ ] 2200467225_63004940
- [ ] 2200467261_63004930
- [ ] 2200467338_63004925
- [ ] 2200470099_63004485
- [ ] 2200470142_63004480
- [ ] 2200471780_63004362
- [ ] 2200480910_63006466
- [ ] 2200495073_63010427
- [ ] 2200495510_63009152
- [ ] 2200508677_63060252
- [ ] 2200510531_63012582
- [ ] 2200510628_63012549
- [ ] 2200510657_63012554
- [ ] 2200511249_63012533
- [ ] 2200511274_63012586
- [ ] 2200517952_63060399
- [ ] 2200519525_63060439
- [ ] 2200524009_63014044
- [ ] 2200524609_63014046
- [ ] 2200524616_63014048
- [ ] 2200533429_63060425
- [ ] 2200539735_63060406
- [ ] 2200549908_63019339
- [ ] 2200549965_63019349
- [ ] 2200550414_63019357
- [ ] 2200550471_63020031
- [ ] 2200550490_63019351
- [ ] 2200550505_63019340
- [ ] 2200555565_63018614
- [ ] 2200559438_63020029
- [ ] 2200559682_63020030
- [ ] 2200559713_63019623
- [ ] 2200559739_63019626
- [ ] 2200569969_63019991
- [ ] 2200570001_63021580
- [ ] 2200570025_63021490
- [ ] 2200570035_63021491
- [ ] 2200570042_63021493
- [ ] 2200570050_63021494
- [ ] 2200579897_63024910
- [ ] 2200583995_63024866
- [ ] 2200584035_63024905
- [ ] 2200584069
nQual depth
│ String Float64 Int64
─────┼──────────────────────────────────────
1 │ chr12:g.13720138C>T 60.0 1
2 │ chr17:g.10296150T>A 60.0 1
3 │ chr21:g.43426167C>T 0.0 59
Manque de profondeur sur 2 et mauvaise qualité sur 3
***** DONE Vérifier après filterdepth: 0 perdus en plus
CLOSED: [2023-08-20 Sun 09:18] SCHEDULED: <2023-08-20 Sun>
***** DONE Vérifier après filterpolymorphis : 0 perdus en plus
CLOSED: [2023-08-20 Sun 09:18] SCHEDULED: <2023-08-20 Sun>
***** DONE Vérifier après filter vep: 2 perdus en plus
CLOSED: [2023-08-20 Sun 12:37] SCHEDULED: <2023-08-20 Sun>
2×3 DataFrame
Row │ variant meanQual depth
│ String Float64 Int64
─────┼─────────────────────────────────────
1 │ chr17:g.7852503T>C 60.0 96
2 │ chrX:g.47575255G>A 60.0 145
***** DONE 1ere correction spip: meilleur nombre de variants en sortie mais manque toujours ces 2
CLOSED: [2023-08-20 Sun 11:38]
***** DONE --pick : résout le problème
CLOSED: [2023-08-20 Sun 12:37]
chrX:g.47575255G>A est rendu downstream_gene_variant avec l'option --pick
Or il n'est pas en5' dans les transcrits refseq...
https://genome-euro.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chrX%3A47575242%2D47575268&hgsid=301211823_xpelPqPJije7wSIhg070JeGH5ZwV
https://mobidetails.iurc.montp.inserm.fr/MD/api/variant/238296/browser/
Idem pour l'autre
chr17:g.7852503T>C
https://mobidetails.iurc.montp.inserm.fr/MD/api/variant/182993/browser/
Note:
VEP chooses one block of annotation per variant, using an ordered set of criteria. This order may be customised using --pick_order.
MANE Select transcript status
MANE Plus Clinical transcript status
canonical status of transcript
APPRIS isoform annotation
transcript support level
biotype of transcript ("protein_coding" preferred)
CCDS status of transcript
consequence rank according to this table
translated, transcript or feature length (longer preferred)
"Wherever possible we would discourage you from summarising data in this way. "
**** DONE Mail alexis
CLOSED: [2023-08-20 Sun 13:45] SCHEDULED: <2023-08-20 Sun>
**** TODO Données simuscop 200x
SCHEDULED: <2023-09-24 Sun>
**** DONE En T2T avec liftover (filtre = spip) : ok mais lent et trop de variants :tests:
CLOSED: [2023-09-17 Sun 17:13] SCHEDULED: <2023-09-17 Sun>
1. Conversion en bed
#+begin_src sh :dir:~/code/sanger
open snvs-cento-sanger.csv | select chrom pos | insert pos2 {$in.pos } | to csv --separator="\t" | save snvs-cento-sanger.bed -f
#+end_src
2. Liftover avec UCSC (en ligne)
NB: vérifié sur le premier résultat en cherche le read contenant le variant (samtools view -r puis samtools view | grep en T2T) et avec l'aide d'IGV, on a un variant qui correspond en
chr1:10757746
3. En supposant que l'ordre des variants n'a pas changé, on ajoute simplement REF et ALT avec annotateLifted.jl
Annotation spip *très lente* : 1h13 !
Résultat:
2×3 DataFrame
Row │ variant meanQual depth
│ String Float64 Int64
─────┼──────────────────────────────────────
1 │ chr12:g.13594572 60.0 1
2 │ chr17:g.10204026 60.0 1
144 found over 146
filter depth : another 0 missed variants
filter poly : another 0 missed variants
filter vep : another 0 missed variants
Et on a trop de variants en sortie (7330 !)
**** DONE Mail Paul avec résultats filtre en T2T + nouveau schéma
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
* Ré-interprétation
** TODO Lancer tests sur données brutes [47/250]
- [X] 100222_63015289
- [X] 1600304839_63051311
- [X] 1900007827_62913191
- [X] 1900398899_62999500
- [X] 1900486799_62913197
- [X] 2100422923_62952677
- [X] 2100458888_62933047
- [X] 2100601558_62903840
- [X] 2100609288_62905768
- [X] 2100609501_62905776
- [X] 2100614493_62951074
- [X] 2100622566_62908067
- [X] 2100622601_62908060
- [X] 2100622705_62908063
- [X] 2100640027_62911936
- [X] 2100645285_62913212
- [X] 2100661411_62914081
- [X] 2100661462_62914086
- [X] 2100708257_62921596
- [X] 2100738732_62926501
- [X] 2100738850_62926509
- [X] 2100746751_62926505
- [X] 2100746797_62926506
- [X] 2100782349_62931722
- [X] 2100782416_62931561
- [X] 2100782559_62931718
- [X] 2100799204_62934768
- [X] 2200010202_62940284
- [X] 2200023600_62940631
- [X] 2200024348_62999591
- [X] 2200027505_62942457
- [X] 2200038776_62943412
- [X] 2200041919_62943405
- [X] 2200088014_62951326
- [X] 2200146652_62959388
- [X] 2200151850_62960953
- [X] 2200160014_62959475
- [X] 2200160070_62959478
- [X] 2200201368_62967471
- [X] 2200201400_62967470
- [X] 2200265558_62976332
- [X] 2200265605_62976401
- [X] 2200267046_62975192
- [X] 2200273878_62999530
- [X] 2200279708_62977002
- [X] 2200284408_62979102
- [X] 2200293987_62979116
- [ ] 2200294359_62979118
- [ ] 2200306299_62982217
- [ ] 2200306539_62982193
- [ ] 220030671_62982211
- [ ] 2200307058_62982231
- [ ] 2200307108_62982196
- [ ] 2200307136_62982221
- [ ] 2200307199_62982239
- [ ] 2200307230_62982234
- [ ] 2200307262_62982219
- [ ] 2200307297_62982227
- [ ] 2200324510_62985453
- [ ] 2200324549_62985478
- [ ] 2200324573_62985445
- [ ] 2200324594_62985467
- [ ] 2200324606_62985463
- [ ] 2200324614_62985459
- [ ] 2200338306_62985430
- [ ] 2200343880_62989407
- [ ] 2200343910_62989460
- [ ] 2200343938_62989451
- [ ] 2200343966_62989456
- [ ] 2200343993_62989440
- [ ] 2200344013_62989464
- [ ] 2200349749_62989465
- [ ] 2200363462_62988848
- [ ] 2200377880_62991993
- [ ] 2200378032_62991991
- [ ] 2200383996_62993828
- [ ] 2200384015_62993796
- [ ] 2200384046_62993822
- [ ] 2200384117_62993808
- [ ] 2200384187_62993825
- [ ] 2200384231_62992898
- [ ] 2200385658_63060260
- [ ] 2200394260_62994732
- [ ] 2200395817_62994742
- [ ] 2200396731_62994737
- [ ] 2200424073_62999579
- [ ] 2200424207_62999632
- [ ] 2200426178_62999630
- [ ] 2200426243_62999635
- [ ] 2200426466_62999605
- [ ] 2200426642_62999627
- [ ] 2200427406_62999649
- [ ] 2200427512_62999639
- [ ] 2200428953_62999572
- [ ] 2200428981_62999600
- [ ] 2200428999_62999592
- [ ] 2200441970_63000868
- [ ] 2200441989_63000882
- [ ] 2200442135_63000864
- [ ] 2200442216_63000886
- [ ] 2200442257_63000951
- [ ] 2200451801_63003573
- [ ] 2200451862_63004218
- [ ] 2200451894_63004210
- [ ] 2200456165_63051294
- [ ] 2200459865_63004933
- [ ] 2200459968_63004937
- [ ] 2200460073_63004943
- [ ] 2200460121_63004684
- [ ] 2200467051_63003856
- [ ] 2200467225_63004940
- [ ] 2200467261_63004930
- [ ] 2200467338_63004925
- [ ] 2200470099_63004485
- [ ] 2200470142_63004480
- [ ] 2200471780_63004362
- [ ] 2200480910_63006466
- [ ] 2200495073_63010427
- [ ] 2200495510_63009152
- [ ] 2200508677_63060252
- [ ] 2200510531_63012582
- [ ] 2200510628_63012549
- [ ] 2200510657_63012554
- [ ] 2200511249_63012533
- [ ] 2200511274_63012586
- [ ] 2200517952_63060399
- [ ] 2200519525_63060439
- [ ] 2200524009_63014044
- [ ] 2200524609_63014046
- [ ] 2200524616_63014048
- [ ] 2200533429_63060425
- [ ] 2200539735_63060406
- [ ] 2200549908_63019339
- [ ] 2200549965_63019349
- [ ] 2200550414_63019357
- [ ] 2200550471_63020031
- [ ] 2200550490_63019351
- [ ] 2200550505_63019340
- [ ] 2200555565_63018614
- [ ] 2200559438_63020029
- [ ] 2200559682_63020030
- [ ] 2200559713_63019623
- [ ] 2200559739_63019626
- [ ] 2200569969_63019991
- [ ] 2200570001_63021580
- [ ] 2200570025_63021490
- [ ] 2200570035_63021491
- [ ] 2200570042_63021493
- [ ] 2200570050_63021494
- [ ] 2200579897_63024910
- [ ] 2200583995_63024866
- [ ] 2200584035_63024905
- [ ] 2200584069