7F3ARXB7ZM4WPC3FOH5RHXFHWB3MGZ6XJYJX5IH6RFQNUKOWHAOQC
6BKBLJ2QBO3O7C3J47SFFOYYGONV4A2FE3RTFKWBMDM4G5TLAH5QC
KOJCCQM3S44KSXDRRILDEZZ6J2CCBGZ7KNYNVIC6PLRXE6KTGBHQC
CDPYGRNUP6EEE77RHDGEALT4BUELT3RBSBUO35PLI3HLZ44KK26QC
RHWQQAAHNHFO3FLCGVB3SIDKNOUFJGZTDNN57IQVBMXXCWX74MKAC
ZT7ANUTIQZ4P6FB7ND64I5RDV742TUDQSYCKIKCHVJQILXI4GJJQC
6JOIJQ3N6CHKCNV3AA763TUD3XROFEQEWUDKPPHSPBPQ3YAOY6BQC
EUGCTHRO3OP2ETEQXLWAYPXVENU5XSL7LQYRNGJTGUB2SLDR3ELAC
CXW37WKZDOFBTPGZQGQVWDWGA7YWGGJ47SSD4KYEXD6MPERELGGAC
36DOXPCS6H5JNDEGCINCKM34KVMO3GO3PQ7ZLQTJUG2KVIEGY63QC
CNHFD23P6634DYH6WYSWLNORGZH2UEVRDVGR356PNSNF5ONBAKVAC
ME74GL3O7CDN3SEXU4QLQVMFB4TKR2UFVRNIEVBBYMJM2MQSEIRAC
FXA3ZBV64FML7W47IPHTAJFJHN3J3XHVHFVNYED47XFSBIGMBKRQC
5XXDOQUZ5ORUD6TSVRSOHABWOB4OFPDPQCN4DFU6IF5YDZE3LNRQC
*** TODO Add images for the gain + IGV
SCHEDULED: <2023-09-19 Tue>
*** TODO Understand the LRR score for the gain
SCHEDULED: <2023-09-19 Tue>
*** TODO Reply to reviewer
**** TODO Add images for the gain + IGV
SCHEDULED: <2023-09-21 Thu>
**** TODO Understand the LRR score for the gain
SCHEDULED: <2023-09-21 Thu>
**** TODO Fix the discussion: complex rearrangement
SCHEDULED: <2023-09-24 Sun>
**** TODO Add Dr Vidau's image
SCHEDULED: <2023-09-24 Sun>
**** TODO "Mild" phenotype in the father + detail the molecular findings
SCHEDULED: <2023-09-24 Sun>
**** TODO FBXW7 treatment?
SCHEDULED: <2023-09-26 Tue>
*** TODO Resubmit
SCHEDULED: <2023-10-03 Tue>
** Thesis article
*** Idea
Framework for testing exome pipelines
1. Tools to download the comparison data (nextflow pipeline(s)): GIAB +/- chm
2. Tools to compare VCFs: nix package for hap.py
3. Raw data to run the pipeline and then compare (GIAB)
NB: a pipeline already exists if we download from SRA...
4. Tools to generate synthetic data: xamscissors (SNVs only), bamsurgeon
5. Reference data
CLOSED: [2023-09-10 Sun 15:44] SCHEDULED: <2023-09-09 Sat>
Merge from debug
***** DONE Email Alexis
CLOSED: [2023-09-01 Fri 10:32] SCHEDULED: <2023-08-31 Thu>
***** DONE Rerun the Sanger tests
CLOSED: [2023-09-11 Mon 19:11] SCHEDULED: <2023-09-10 Sun>
***** DONE Email Paul for validation
CLOSED: [2023-09-11 Mon 19:11] SCHEDULED: <2023-09-10 Sun>
**** DONE Use spliceAI >= 0.2 for the filter instead of spip
CLOSED: [2023-09-11 Mon 21:48] SCHEDULED: <2023-09-11 Mon>
**** DONE Rerun the Sanger tests with spliceAI
CLOSED: [2023-09-14 Thu 22:45] SCHEDULED: <2023-09-11 Mon>
**** DONE Fix the recessive column
CLOSED: [2023-09-14 Thu 22:57] SCHEDULED: <2023-09-14 Thu>
either 1/1 or 1/2,
or 0/1 with 2 variants per gene
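A minimal sketch of this rule, assuming a hypothetical tab-separated file with the gene in column 1 and the genotype in column 2 (the real column layout is not shown in these notes). "2 variants per gene" is read here as "at least 2 variants listed for that gene", which is also an assumption.

#+begin_src sh
# flag_recessive FILE: append yes/no to each "gene<TAB>genotype" line.
# The file is read twice: once to count variants per gene, once to flag.
flag_recessive() {
    awk -F '\t' -v OFS='\t' '
        NR == FNR { count[$1]++; next }   # first pass: variants per gene
        {
            rec = "no"
            if ($2 == "1/1" || $2 == "1/2") rec = "yes"            # both alleles are ALT
            else if ($2 == "0/1" && count[$1] >= 2) rec = "yes"    # possible compound heterozygote
            print $1, $2, rec
        }
    ' "$1" "$1"
}
#+end_src

Phased genotypes (0|1) and missing calls are not handled in this sketch.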
*** KILL Compare the annotations on 63003856
CLOSED: [2023-08-28 Mon 17:28]
**** Rerun the new pipeline
*** KILL Old version
CLOSED: [2023-08-28 Mon 17:24]
**** KILL HGVS
CLOSED: [2023-08-28 Mon 17:24]
**** KILL Filter after VEP
CLOSED: [2023-08-28 Mon 17:24]
**** KILL OMIM
CLOSED: [2023-08-28 Mon 17:24]
**** KILL clinvar
CLOSED: [2023-08-28 Mon 17:24]
**** KILL ACMG incidental
CLOSED: [2023-08-28 Mon 17:24]
**** KILL Grantham
CLOSED: [2023-08-28 Mon 17:24]
**** KILL LRG
CLOSED: [2023-04-18 Tue 17:22] SCHEDULED: <2023-04-18 Tue>
Checked with Alexis: no longer up to date
**** KILL Gnomad
CLOSED: [2023-08-28 Mon 17:24]
*** DONE Reorder the columns :annotation:
CLOSED: [2023-08-31 Thu 10:38] SCHEDULED: <2023-08-28 Mon>
No OMIM, no CADD, no spliceAI
** DONE Port Alexis's exact version to Helios
CLOSED: [2023-01-14 Sat 17:56]
"prod" branch
** KILL Test Alexis's version with Nix
CLOSED: [2023-06-14 Wed 22:37]
*** DONE Add clinvar
CLOSED: [2022-11-13 Sun 19:37]
*** DONE Alignment
CLOSED: [2022-11-13 Sun 12:52]
*** DONE Haplotype caller
CLOSED: [2022-11-13 Sun 13:00]
*** KILL Filter
CLOSED: [2023-06-14 Wed 22:37]
- [X] depth
- [ ] common snp not patho
Problem with the list of IDs
**** KILL variant annotation
CLOSED: [2023-06-14 Wed 22:37]
Requires vep
*** KILL Variant calling
CLOSED: [2023-06-14 Wed 22:37]
** KILL Test sarek
CLOSED: [2023-08-12 Sat 15:53]
#+begin_src sh
module load apptainer/1.1.8
nextflow run nf-core/sarek -profile test,singularity --outdir test-sarek
#+end_src
The dependencies do not download correctly, so we extract them by hand
#+begin_src sh
rg -IN galaxyproject modules | sed 's/ //g;s/:$//' | sort | uniq > deps.txt
#+end_src
Manual cleanup
Then
#+begin_src sh
cat deps.txt | xargs -L1 singularity pull
#+end_src
** DONE Samplesheet support
CLOSED: [2023-08-03 Thu 14:24] SCHEDULED: <2023-08-03 Thu 13:00>
/Entered on/ [2023-08-03 Thu 13:12]
** DONE Small dataset: chr22 on HG001
CLOSED: [2023-08-05 Sat 14:21] SCHEDULED: <2023-08-05 Sat>
** DONE Fix OMIM annotation: missing for NMNAT1
CLOSED: [2023-09-16 Sat 22:47] SCHEDULED: <2023-09-16 Sat>
/Entered on/ [2023-09-16 Sat 19:32]
* Documentation
:PROPERTIES:
:CATEGORY: doc
:END:
** DONE Installation procedure for nix + dependencies on the CHU VM
CLOSED: [2023-04-22 Sat 15:27] SCHEDULED: <2023-04-13 Thu>
* Manuscript
:PROPERTIES:
:CATEGORY: manuscript
:END:
** DONE Pipeline flowchart (with T2T)
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
** TODO Figure: number of publications per aligner
SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
** TODO Figure: number of publications per variant caller
SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
** TODO Figure: number of exomes per year
SCHEDULED: <2023-09-23 Sat>
/Entered on/ [2023-09-19 Tue 08:43]
* Tests :tests:
** KILL Non-regression: prod version
CLOSED: [2023-05-23 Tue 08:46]
*** DONE ID common snp
CLOSED: [2022-11-19 Sat 21:36]
#+begin_src
$ wc -l ID_of_common_snp.txt
23194290 ID_of_common_snp.txt
$ wc -l /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
23194290 /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
#+end_src
*** DONE ID common snp not clinvar patho
CLOSED: [2022-12-11 Sun 20:11]
**** DONE Check the problem
CLOSED: [2022-12-11 Sun 16:30]
On J:
21155134 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref
"Non-regression" version
21155076 database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt
New version
23193391 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt
After removing duplicates
$ sort database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt | uniq > old.txt
$ wc -l old.txt
21107097 old.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt | uniq > new.txt
$ wc -l new.txt
21174578 new.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref | uniq > ref.txt
$ wc -l ref.txt
21107155 ref.txt
Looking at the difference
comm -23 ref.txt old.txt
rs1052692
rs1057518973
rs1057518973
rs11074121
rs112848754
rs12573787
rs145033890
rs147889095
rs1553904159
rs1560294695
rs1560296615
rs1560310926
rs1560325547
rs1560342418
rs1560356225
rs1578287542
...
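comm only gives a meaningful result when both inputs are sorted with the same collation, so the sort and the comparison should share one locale. A small sketch wrapping the two steps above on arbitrary ID lists (the function name is hypothetical, and forcing LC_ALL=C is a choice made here, not something these notes specify):

#+begin_src sh
# only_in_first A B: print the IDs present in file A but absent from file B.
# Both lists are sorted and deduplicated first, since comm expects sorted input.
only_in_first() {
    a=$(mktemp)
    b=$(mktemp)
    LC_ALL=C sort -u "$1" > "$a"
    LC_ALL=C sort -u "$2" > "$b"
    LC_ALL=C comm -23 "$a" "$b"   # -2 and -3 hide "only in B" and "in both"
    rm -f "$a" "$b"
}
#+end_src

For instance, only_in_first ref.txt old.txt would list the IDs that the reference has and the old version lacks.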
We look up the first one
$ bcftools query -i 'ID="rs1052692"' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 1619351 C A,T
It is indeed listed in clinvar with a pathogenicity classification (conflicting interpretations)...
$ bcftools query -i 'POS=1619351' database/clinvar/clinvar.vcf.gz -f '%CHROM %POS %REF %ALT %INFO/CLNSIG\n'
19 1619351 C T Conflicting_interpretations_of_pathogenicity
We check all the others
$ comm -23 ref.txt old.txt > tocheck.txt
We generate the regions to check (chromosome number:position)
$ bcftools query -i 'ID=@tocheck.txt' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM\t%POS\n' > tocheck.pos
We generate the reverse mapping (chromosome number -> NC)
$ awk ' { t = $1; $1 = $2; $2 = t; print; } ' database/RefSeq/refseq_to_number_only_consensual.txt > mapping.txt
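For reference, bcftools annotate --rename-chrs reads one "old_name new_name" pair per line, which is why the RefSeq table has to be flipped when the file to rename uses plain chromosome numbers. The same awk swap as a small function, shown on a toy line (the input layout "NC accession, then chromosome number" is inferred from the command and not checked against the real file):

#+begin_src sh
# swap_columns [FILE...]: exchange the first two whitespace-separated fields.
# Reads standard input when no file is given; output fields are space-separated.
swap_columns() {
    awk '{ t = $1; $1 = $2; $2 = t; print }' "$@"
}

# Toy input: one "accession number" pair becomes "number accession".
printf 'NC_000019.10 19\n' | swap_columns
#+end_src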
We remap clinvar
$ bcftools annotate --rename-chrs mapping.txt database/clinvar/clinvar.vcf.gz -o clinvar_remapped.vcf.gz
$ tabix clinvar_remapped.vcf.gz
Finally, we look up the classification in clinvar
$ bcftools query -R tocheck.pos clinvar_remapped.vcf.gz -f '%CHROM %POS %INFO/CLNSIG\n'
$ bcftools query -R tocheck.pos database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %ID \n' | grep '^NC'
#+RESULTS:
**** DONE Understand why the new version gives a different result
CLOSED: [2022-12-11 Sun 20:11]
***** DONE Same dbsnp and clinvar versions?
CLOSED: [2022-12-10 Sat 23:02]
Clinvar differs!
$ bcftools stats clinvar.gz
clinvar (Alexis)
SN 0 number of samples: 0
SN 0 number of records: 1492828
SN 0 number of no-ALTs: 965
SN 0 number of SNPs: 1338007
SN 0 number of MNPs: 5562
SN 0 number of indels: 144580
SN 0 number of others: 3714
SN 0 number of multiallelic sites: 0
SN 0 number of multiallelic SNP sites: 0
clinvar (new)
SN 0 number of samples: 0
SN 0 number of records: 1493470
SN 0 number of no-ALTs: 965
SN 0 number of SNPs: 1338561
SN 0 number of MNPs: 5565
SN 0 number of indels: 144663
SN 0 number of others: 3716
SN 0 number of multiallelic sites: 0
SN 0 number of multiallelic SNP sites: 0
***** DONE Update clinvar and dbSNP to work from the same databases
CLOSED: [2022-12-11 Sun 12:10]
The problem persists
***** DONE Remove the conversion of the chromosome to int
CLOSED: [2022-12-10 Sat 19:29]
***** KILL Same NC?
CLOSED: [2022-12-10 Sat 19:29]
$ zgrep "contig=<ID=NC_\(.*\)" clinvar/GRCh38/clinvar.vcf.gz > contig.clinvar
$ diff contig.txt contig.clinvar
< ##contig=<ID=NC_012920.1>
***** DONE Test on chromosome 19: ok
CLOSED: [2022-12-11 Sun 13:53]
We prepare the data
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/dbSNP_common.vcf.gz -o dbSNP_common_19.vcf.gz
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data/clinvar/GRCh38/clinvar.vcf.gz -o clinvar_19.vcf.gz
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data-alexis/dbSNP/dbSNP_common.vcf.gz -o dbSNP_common_19_old.vcf.gz
bcftools filter -i 'CHROM="19"' /Work/Groups/bisonex/data-alexis/clinvar/clinvar.vcf.gz -o clinvar_19_old.vcf.gz
#+end_src
We retrieve both versions of the script
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga
r 2 et mauvaise qualité sur 3
***** DONE Check after filterdepth: 0 additional variants lost
CLOSED: [2023-08-20 Sun 09:18] SCHEDULED: <2023-08-20 Sun>
***** DONE Check after filterpolymorphis: 0 additional variants lost
CLOSED: [2023-08-20 Sun 09:18] SCHEDULED: <2023-08-20 Sun>
***** DONE Check after filter vep: 2 additional variants lost
CLOSED: [2023-08-20 Sun 12:37] SCHEDULED: <2023-08-20 Sun>
2×3 DataFrame
Row │ variant meanQual depth
│ String Float64 Int64
─────┼─────────────────────────────────────
1 │ chr17:g.7852503T>C 60.0 96
2 │ chrX:g.47575255G>A 60.0 145
***** DONE First spip fix: better number of output variants, but these 2 are still missing
CLOSED: [2023-08-20 Sun 11:38]
***** DONE --pick: solves the problem
CLOSED: [2023-08-20 Sun 12:37]
chrX:g.47575255G>A is reported as downstream_gene_variant with the --pick option
Yet it is not in 5' in the refseq transcripts...
https://genome-euro.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chrX%3A47575242%2D47575268&hgsid=301211823_xpelPqPJije7wSIhg070JeGH5ZwV
https://mobidetails.iurc.montp.inserm.fr/MD/api/variant/238296/browser/
Same for the other one
chr17:g.7852503T>C
https://mobidetails.iurc.montp.inserm.fr/MD/api/variant/182993/browser/
Note:
VEP chooses one block of annotation per variant, using an ordered set of criteria. This order may be customised using --pick_order.
1. MANE Select transcript status
2. MANE Plus Clinical transcript status
3. canonical status of transcript
4. APPRIS isoform annotation
5. transcript support level
6. biotype of transcript ("protein_coding" preferred)
7. CCDS status of transcript
8. consequence rank according to this table
9. translated, transcript or feature length (longer preferred)
"Wherever possible we would discourage you from summarising data in this way. "
**** DONE Email Alexis
CLOSED: [2023-08-20 Sun 13:45] SCHEDULED: <2023-08-20 Sun>
**** TODO simuscop data at 200x
SCHEDULED: <2023-09-24 Sun>
**** DONE On T2T with liftover (filter = spip): ok but slow and too many variants :tests:
CLOSED: [2023-09-17 Sun 17:13] SCHEDULED: <2023-09-17 Sun>
1. Conversion to bed
#+begin_src sh :dir ~/code/sanger
open snvs-cento-sanger.csv | select chrom pos | insert pos2 {$in.pos } | to csv --separator="\t" | save snvs-cento-sanger.bed -f
#+end_src
2. Liftover with UCSC (online)
NB: checked on the first result by looking for the read containing the variant (samtools view -r, then samtools view | grep on T2T); with the help of IGV, we find a matching variant at
chr1:10757746
3. Assuming the order of the variants has not changed, we simply add REF and ALT with annotateLifted.jl
spip annotation is *very slow*: 1h13!
Result:
2×3 DataFrame
Row │ variant meanQual depth
│ String Float64 Int64
─────┼──────────────────────────────────────
1 │ chr12:g.13594572 60.0 1
2 │ chr17:g.10204026 60.0 1
144 found out of 146
filter depth : another 0 missed variants
filter poly : another 0 missed variants
filter vep : another 0 missed variants
And there are too many output variants (7330!)
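A note on step 1: BED intervals are 0-based and half-open, so a single-base variant at 1-based position pos is normally written with start = pos - 1 and end = pos, whereas the conversion above writes pos in both columns. Whether this matters depends on how the liftover tool treats such intervals, which was not checked here. A sketch of the conventional conversion with awk, assuming a comma-separated file with a header line and chrom and pos as its first two columns (a hypothetical layout, the real CSV may differ):

#+begin_src sh
# csv_to_bed [FILE...]: turn "chrom,pos" rows (1-based) into 3-column BED lines.
# The header line is skipped; start = pos - 1, end = pos.
csv_to_bed() {
    awk -F ',' -v OFS='\t' 'NR > 1 { print $1, $2 - 1, $2 }' "$@"
}

# Toy input with made-up positions.
printf 'chrom,pos\nchr1,1000\nchrX,2500\n' | csv_to_bed
#+end_src

After liftover, the end column of each lifted interval gives back the 1-based position.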
**** DONE Email Paul with the T2T filter results + new diagram
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
* Reinterpretation
** TODO Run tests on raw data [26/250]
- [X] 100222_63015289
- [X] 1600304839_63051311
- [X] 1900007827_62913191
- [X] 1900398899_62999500
- [X] 1900486799_62913197
- [X] 2100422923_62952677
- [X] 2100458888_62933047
- [X] 2100601558_62903840
- [X] 2100609288_62905768
- [X] 2100609501_62905776
- [X] 2100614493_62951074
- [X] 2100622566_62908067
- [X] 2100622601_62908060
- [X] 2100622705_62908063
- [X] 2100640027_62911936
- [X] 2100645285_62913212
- [X] 2100661411_62914081
- [X] 2100661462_62914086
- [X] 2100708257_62921596
- [X] 2100738732_62926501
- [X] 2100738850_62926509
- [X] 2100746751_62926505
- [X] 2100746797_62926506
- [X] 2100782349_62931722
- [X] 2100782416_62931561
- [X] 2100782559_62931718
- [ ] 2100799204_62934768
- [ ] 2200010202_62940284
- [ ] 2200023600_62940631
- [ ] 2200024348_62999591
- [ ] 2200027505_62942457
- [ ] 2200038776_62943412
- [ ] 2200041919_62943405
- [ ] 2200088014_62951326
- [ ] 2200146652_62959388
- [ ] 2200151850_62960953
- [ ] 2200160014_62959475
- [ ] 2200160070_62959478
- [ ] 2200201368_62967471
- [ ] 2200201400_62967470
- [ ] 2200265558_62976332
- [ ] 2200265605_62976401
- [ ] 2200267046_62975192
- [ ] 2200273878_62999530
- [ ] 2200279708_62977002
- [ ] 2200284408_62979102
- [ ] 2200293987_62979116
- [ ] 2200294359_62979118
- [ ] 2200306299_62982217
- [ ] 2200306539_62982193
- [ ] 220030671_62982211
- [ ] 2200307058_62982231
- [ ] 2200307108_62982196
- [ ] 2200307136_62982221
- [ ] 2200307199_62982239
- [ ] 2200307230_62982234
- [ ] 2200307262_62982219
- [ ] 2200307297_62982227
- [ ] 2200324510_62985453
- [ ] 2200324549_62985478
- [ ] 2200324573_62985445
- [ ] 2200324594_62985467
- [ ] 2200324606_62985463
- [ ] 2200324614_62985459
- [ ] 2200338306_62985430
- [ ] 2200343880_62989407
- [ ] 2200343910_62989460
- [ ] 2200343938_62989451
- [ ] 2200343966_62989456
- [ ] 2200343993_62989440
- [ ] 2200344013_62989464
- [ ] 2200349749_62989465
- [ ] 2200363462_62988848
- [ ] 2200377880_62991993
- [ ] 2200378032_62991991
- [ ] 2200383996_62993828
- [ ] 2200384015_62993796
- [ ] 2200384046_62993822
- [ ] 2200384117_62993808
- [ ] 2200384187_62993825
- [ ] 2200384231_62992898
- [ ] 2200385658_63060260
- [ ] 2200394260_62994732
- [ ] 2200395817_62994742
- [ ] 2200396731_62994737
- [ ] 2200424073_62999579
- [ ] 2200424207_62999632
- [ ] 2200426178_62999630
- [ ] 2200426243_62999635
- [ ] 2200426466_62999605
- [ ] 2200426642_62999627
- [ ] 2200427406_62999649
- [ ] 2200427512_62999639
- [ ] 2200428953_62999572
- [ ] 2200428981_62999600
- [ ] 2200428999_62999592
- [ ] 2200441970_63000868
- [ ] 2200441989_63000882
- [ ] 2200442135_63000864
- [ ] 2200442216_63000886
- [ ] 2200442257_63000951
- [ ] 2200451801_63003573
- [ ] 2200451862_63004218
- [ ] 2200451894_63004210
- [ ] 2200456165_63051294
- [ ] 2200459865_63004933
- [ ] 2200459968_63004937
- [ ] 2200460073_63004943
- [ ] 2200460121_63004684
- [ ] 2200467051_63003856
- [ ] 2200467225_63004940
- [ ] 2200467261_63004930
- [ ] 2200467338_63004925
- [ ] 2200470099_63004485
- [ ] 2200470142_63004480
- [ ] 2200471780_63004362
- [ ] 2200480910_63006466
- [ ] 2200495073_63010427
- [ ] 2200495510_63009152
- [ ] 2200508677_63060252
- [ ] 2200510531_63012582
- [ ] 2200510628_63012549
- [ ] 2200510657_63012554
- [ ] 2200511249_63012533
- [ ] 2200511274_63012586
- [ ] 2200517952_63060399
- [ ] 2200519525_63060439
- [ ] 2200524009_63014044
- [ ] 2200524609_63014046
- [ ] 2200524616_63014048
- [ ] 2200533429_63060425
- [ ] 2200539735_63060406
- [ ] 2200549908_63019339
- [ ] 2200549965_63019349
- [ ] 2200550414_63019357
- [ ] 2200550471_63020031
- [ ] 2200550490_63019351
- [ ] 2200550505_63019340
- [ ] 2200555565_63018614
- [ ] 2200559438_63020029
- [ ] 2200559682_63020030
- [ ] 2200559713_63019623
- [ ] 2200559739_63019626
- [ ] 2200569969_63019991
- [ ] 2200570001_63021580
- [ ] 2200570025_63021490
- [ ] 2200570035_63021491
- [ ] 2200570042_63021493
- [ ] 2200570050_63021494
- [ ] 2200579897_63024910
- [ ] 2200583995_63024866
- [ ] 2200584035_63024905
- [ ] 2200584069_63024888
- [ ] 2200584126_63024810
- [ ] 2200589507_63026712
- [ ] 2200597365_63027994
- [ ] 2200597480_63027988
- [ ] 2200597752_63026853
- [ ] 2200597778_63027992
- [ ] 22005977_63026903
- [ ] 2200609031_63026527
- [ ] 2200614198_63113928
- [ ] 2200620372_63030821
- [ ] 2200620442_63030810
- [ ] 2200620498_63030816
- [ ] 2200620628_63031031
-