Test
cp work/bf/437ae511958509e43072f032f4d495/small.tab.gz tests/vep-spip.tab.gz
cp work/d5/3b1244b5ae83d54409ee0d456e8c55/small_cadd.tab.gz tests/vep-cadd-splice.tab.gz
**** HOLD Nix package for spliceAI?
nix profile install nixpkgs#python3Packages.tensorflow
+ add the dependencies (found with "grep import" or conda)
**** TODO Add LOEUF and pLI
VEP plugin
**** TODO NMD
VEP plugin
**** KILL Add LOEUF
CLOSED: [2023-04-19 Wed 16:32]
VEP plugin
**** DONE Spip
CLOSED: [2023-05-01 Mon 23:07] SCHEDULED: <2023-04-30 Sun>
BED does not seem to work well (a region has to be defined)
VCF: too much information
Note: there are several transcripts but the results are identical, so we remove the duplicates
***** DONE Separate interpretation, score and confidence interval
CLOSED: [2023-05-01 Mon 23:07] SCHEDULED: <2023-04-30 Sun>
Tests:
in tests/
vep -i 63004925-small.vcf -o postvep.vcf --vcf --fasta genomeRef.fna --dir 109 --merged --pick --offline --custom ../script/spip_annotation.vcf.gz,SPIP,vcf,exact,0,spipInterp,spipScore,spipConfidence
***** DONE Score
CLOSED: [2023-04-22 Sat 15:30]
**** DONE CADD: replace with the VEP plugin
CLOSED: [2023-05-07 Sun 14:45] SCHEDULED: <2023-05-07 Sun>
***** Test
#+begin_src sh
vep -i test.vcf -o lol.vcf --offline --dir /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz --dir_plugins ../VEP_plugins/ -v
#+end_src
Test
#+begin_src sh
vep --id "1 230710048 230710048 A/G 1" --offline --dir /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz --hgvsg --plugin pLI --plugin LOEUF -o lol
#+end_src
CSQ=G|missense_variant|MODERATE|AGT|ENSG00000135744|Transcript|ENST00000366667|protein_coding|2/5||||843|776|259|M/T|aTg/aCg|||-1||HGNC|HGNC:333||Ensembl||A|A||1:g.230710048A>G|0.347|-0.277922|
This matches the online VEP result: https://www.ensembl.org/Homo_sapiens/Tools/VEP/Results?tl=I7ZsIbrj14P6lD43-9115494
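To compare such a CSQ entry with the online result field by field, a small helper can print each non-empty field with its position. This is only a sketch: the function names are made up here, and the meaning of each position has to be read from the "Format:" description in the CSQ header of the annotated VCF, since it depends on the options and plugins used.

#+begin_src sh
# List the non-empty fields of a VEP CSQ entry together with their position.
# The position -> name mapping is deliberately not hard-coded: read it from
# the ##INFO=<ID=CSQ,...> header line of the annotated VCF.
csq_fields() {
    printf '%s\n' "$1" | awk -F'|' '{ for (i = 1; i <= NF; i++) if ($i != "") print i "\t" $i }'
}

# Return a single field by position (1-based).
csq_field() {
    printf '%s\n' "$1" | cut -d'|' -f"$2"
}

# CSQ value from the test above, without the "CSQ=" prefix.
csq='G|missense_variant|MODERATE|AGT|ENSG00000135744|Transcript|ENST00000366667|protein_coding|2/5||||843|776|259|M/T|aTg/aCg|||-1||HGNC|HGNC:333||Ensembl||A|A||1:g.230710048A>G|0.347|-0.277922|'
csq_fields "$csq"
csq_field "$csq" 2
#+end_src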
***** DONE Use whole genome
CLOSED: [2023-04-29 Sat 15:46]
***** KILL Rename the chromosomes beforehand ...
CLOSED: [2023-05-01 Mon 09:14] SCHEDULED: <2023-04-30 Sun>
Too slow!
- CADD download: 4h20
- renaming the chromosomes for the SNVs: 6h20
- tabix on the SNVs: job killed after 21h
***** DONE Annotate separately and merge the tables
CLOSED: [2023-05-07 Sun 14:45] SCHEDULED: <2023-05-01 Mon>
NB: CADD could be filtered with tabix to restrict it to our variants
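A possible way to do this, sketched below, is to build a region list from the VCF and hand it to tabix. The helper name and the file names in the comments are placeholders, and the sketch assumes that the CADD tables name chromosomes without the "chr" prefix.

#+begin_src sh
# Turn the variants of a VCF into a CHROM<TAB>POS region list (1-based),
# skipping header lines and dropping an optional "chr" prefix.
# Assumption: the CADD tables use chromosome names without "chr".
vcf_to_regions() {
    awk -F'\t' '!/^#/ { sub(/^chr/, "", $1); print $1 "\t" $2 }' "$1" | sort -k1,1 -k2,2n -u
}

# Hypothetical usage (all file names are placeholders):
#   vcf_to_regions sample.vcf > regions.tsv
#   tabix -h -R regions.tsv cadd_snv.tsv.gz | bgzip > cadd_subset.tsv.gz
#   tabix -s 1 -b 2 -e 2 cadd_subset.tsv.gz
#+end_src

The subset is much smaller than the whole-genome table, so indexing it should avoid the long tabix run noted above. Whether indels are fully covered by a position-only lookup would need to be checked.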
**** DONE clinvar
CLOSED: [2023-04-22 Sat 15:31]
**** KILL Check HGVS results with mutalyzer
CLOSED: [2023-05-01 Mon 09:26]
**** HOLD Parallelisation
***** HOLD Per chromosome with the VEP workflow
https://github.com/Ensembl/ensembl-vep/blob/release/109/nextflow/workflows/run_vep.nf
***** HOLD With the --fork option
**** DONE Use the nf-core version of VEP
CLOSED: [2023-05-13 Sat 18:27] SCHEDULED: <2023-05-07 Sun>
**** DONE OMIM
CLOSED: [2023-08-31 Thu 10:38] SCHEDULED: <2023-08-29 Tue>
**** DONE pLI and LOEUF from gnomAD
CLOSED: [2023-08-31 Thu 10:38] SCHEDULED: <2023-08-29 Tue>
**** DONE Grantham
CLOSED: [2023-08-31 Thu 22:08] SCHEDULED: <2023-08-30 Wed>
**** DONE Fix spliceAI
CLOSED: [2023-08-31 Thu 13:51] SCHEDULED: <2023-08-31 Thu>
No annotation
- chromosome? Tried 1 instead of chr1: same result. And it works for CADD
- index?
- download again
- index it ourselves
**** DONE Remove the duplicated spip score
CLOSED: [2023-08-31 Thu 14:17] SCHEDULED: <2023-08-31 Thu>
**** DONE Check variant 63126867
CLOSED: [2023-08-31 Thu 10:52] SCHEDULED: <2023-08-31 Thu>
**** DONE Add truncating or not
CLOSED: [2023-08-31 Thu 22:08] SCHEDULED: <2023-08-31 Thu>
**** DONE Add recessive
CLOSED: [2023-08-31 Thu 22:08] SCHEDULED: <2023-08-31 Thu>
**** KILL Fix allelic depth
CLOSED: [2023-08-31 Thu 11:18] SCHEDULED: <2023-08-31 Thu>
Problem caused by LibreOffice
**** DONE Regenerate the annotation for NA12878, inserted and the PEX1 patient
CLOSED: [2023-08-31 Thu 22:08] SCHEDULED: <2023-08-31 Thu>
**** TODO ACMG incidental
**** DONE VCF output (to get the allele fraction AF)
CLOSED: [2023-08-28 Mon 17:22]
**** DONE VCF -> tsv with bcftools
CLOSED: [2023-08-29 Tue 11:03] SCHEDULED: <2023-08-28 Mon>
**** DONE A single transcript after VEP with filter_vep :filter:
CLOSED: [2023-08-29 Tue 11:03] SCHEDULED: <2023-08-28 Mon>
After the update to VEP 110, pick_flag seems to work.
***** DONE Test chr20: no "lost" variants
CLOSED: [2023-08-28 Mon 17:31] SCHEDULED: <2023-08-28 Mon>
unlike the result sent to Alexis by email
#+begin_src sh :dir out/annotate
bcftools +counts vep/NA12878-sanger-chr20-GRCh38/NA12878-sanger-chr20-GRCh38.vep.vcf.gz
#+end_src
Number of samples: 1
Number of SNPs: 123
Number of INDELs: 32
Number of MNPs: 53
Number of others: 0
Number of sites: 208
#+begin_src sh
filter_vep -i vep/NA12878-sanger-chr20-GRCh38/NA12878-sanger-chr20-GRCh38.vep.vcf.gz --filter 'PICK' | bcftools +counts
#+end_src
Number of samples: 1
Number of SNPs: 123
Number of INDELs: 32
Number of MNPs: 53
Number of others: 0
Number of sites: 208
Second check
#+begin_src sh :dir out/annotate
filter_vep -i vep/NA12878-sanger-chr20-GRCh38/NA12878-sanger-chr20-GRCh38.vep.vcf.gz --filter 'PICK' --soft_filter | grep fail
#+end_src
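The before/after comparison can be automated instead of reading the two reports by eye. The sketch below only parses the text printed by bcftools +counts. The helper name is made up, and pasted output stands in for the real commands so that it runs without the VCF.

#+begin_src sh
# Extract the "Number of sites" value from the text printed by `bcftools +counts`.
sites_count() {
    awk -F': *' '/^Number of sites:/ { print $2 }'
}

# With the real files this would be (paths as in the commands above):
#   before=$(bcftools +counts in.vep.vcf.gz | sites_count)
#   after=$(filter_vep -i in.vep.vcf.gz --filter 'PICK' | bcftools +counts | sites_count)
# Here the reported output stands in for both commands.
before=$(printf 'Number of samples: 1\nNumber of SNPs: 123\nNumber of sites: 208\n' | sites_count)
after=$(printf 'Number of samples: 1\nNumber of SNPs: 123\nNumber of sites: 208\n' | sites_count)

if [ "$before" -eq "$after" ]; then
    echo "no variant lost ($after sites)"
else
    echo "lost $((before - after)) sites" >&2
fi
#+end_src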
***** DONE Test NA12878 + Sanger variants: are variants lost with --pick?
CLOSED: [2023-08-29 Tue 10:36] SCHEDULED: <2023-08-28 Mon>
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/out/annotate
~/.nix-profile/bin/bcftools +counts vep/NA12878-sanger-all-GRCh38/NA12878-sanger-all-GRCh38.vep.vcf.gz
#+end_src
#+RESULTS:
| Number of samples: | 1 |
| Number of SNPs: | 6293 |
| Number of INDELs: | 1515 |
| Number of MNPs: | 1588 |
| Number of others: | 0 |
| Number of sites: | 9322 |
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/out/annotate
~/.nix-profile/bin/filter_vep -i vep/NA12878-sanger-all-GRCh38/NA12878-sanger-all-GRCh38.vep.vcf.gz --filter 'PICK' | bcftools +counts
#+end_src
| Number of samples: | 1 |
| Number of SNPs: | 6293 |
| Number of INDELs: | 1515 |
| Number of MNPs: | 1588 |
| Number of others: | 0 |
| Number of sites: | 9322 |
***** DONE Test NA12878 + Sanger variants: check the output with Julia: ok
CLOSED: [2023-08-29 Tue 10:21] SCHEDULED: <2023-08-28 Mon>
143 of 146 variants, as before
***** DONE Rerun on T2T to check compatibility :T2T:
CLOSED: [2023-08-29 Tue 11:03] SCHEDULED: <2023-08-29 Tue>
**** DONE Rerun the Sanger tests on NA12878
CLOSED: [2023-09-01 Fri 10:32] SCHEDULED: <2023-08-31 Thu>
2 variants missing after filter_vep
**** DONE Choose the best transcript ourselves
CLOSED: [2023-09-01 Fri 10:32] SCHEDULED: <2023-09-01 Fri>
**** DONE Check that T2T runs
CLOSED: [2023-08-31 Thu 22:10] SCHEDULED: <2023-08-31 Thu>
**** DONE Review the transcript choice + filter with Paul
CLOSED: [2023-09-08 Fri 22:46] SCHEDULED: <2023-09-06 Wed>
**** DONE Filter the variants with Alexis's filters and keep all the results
CLOSED: [2023-09-10 Sun 15:39] SCHEDULED: <2023-09-09 Sat>
**** DONE Add a MANE SELECT column and keep the others
CLOSED: [2023-09-10 Sun 15:39] SCHEDULED: <2023-09-09 Sat>
**** DONE v1.0
CLOSED: [2023-09-11 Mon 19:11] SCHEDULED: <2023-09-09 Sat>
***** DONE prod branch
CLOSED: [2023-09-10 Sun 15:44] SCHEDULED: <2023-09-09 Sat>
Merge from debug
***** DONE Email Alexis
CLOSED: [2023-09-01 Fri 10:32] SCHEDULED: <2023-08-31 Thu>
***** DONE Rerun the Sanger test
CLOSED: [2023-09-11 Mon 19:11] SCHEDULED: <2023-09-10 Sun>
***** DONE Email Paul for validation
CLOSED: [2023-09-11 Mon 19:11] SCHEDULED: <2023-09-10 Sun>
**** DONE Use spliceAI >= 0.2 for the filter instead of spip
CLOSED: [2023-09-11 Mon 21:48] SCHEDULED: <2023-09-11 Mon>
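A minimal sketch of what a 0.2 threshold could look like. It makes two assumptions that these notes do not confirm: the annotation follows the layout of SpliceAI's own VCF output, and the threshold is applied to the largest of the four delta scores. The function names and the sample annotation are made up.

#+begin_src sh
# Maximum of the four SpliceAI delta scores (DS_AG, DS_AL, DS_DG, DS_DL).
# Assumed layout (SpliceAI's own VCF output):
#   ALLELE|SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL
spliceai_max_ds() {
    printf '%s\n' "$1" | awk -F'|' '{ m = $3; for (i = 4; i <= 6; i++) if ($i + 0 > m + 0) m = $i; print m }'
}

# Succeeds (exit status 0) when the maximum delta score reaches the threshold
# given as second argument (default 0.2).
spliceai_pass() {
    max=$(spliceai_max_ds "$1")
    awk -v m="$max" -v t="${2:-0.2}" 'BEGIN { exit !(m + 0 >= t + 0) }'
}

# Made-up annotation, for illustration only.
ann='T|GENE1|0.00|0.25|0.01|0.00|-5|12|3|40'
if spliceai_pass "$ann"; then
    echo "kept (max DS = $(spliceai_max_ds "$ann"))"
fi
#+end_src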
**** TODO Rerun the Sanger tests with spliceAI
SCHEDULED: <2023-09-11 Mon>
*** KILL Compare the annotations on 63003856
CLOSED: [2023-08-28 Mon 17:28]
**** Rerun the new pipeline
*** KILL Old version
CLOSED: [2023-08-28 Mon 17:24]
**** KILL HGVS
CLOSED: [2023-08-28 Mon 17:24]
**** KILL Filter after VEP
CLOSED: [2023-08-28 Mon 17:24]
**** KILL OMIM
CLOSED: [2023-08-28 Mon 17:24]
**** KILL clinvar
CLOSED: [2023-08-28 Mon 17:24]
**** KILL ACMG incidental
CLOSED: [2023-08-28 Mon 17:24]
**** KILL Grantham
CLOSED: [2023-08-28 Mon 17:24]
**** KILL LRG
CLOSED: [2023-04-18 Tue 17:22] SCHEDULED: <2023-04-18 Tue>
Discussed with Alexis: no longer up to date
**** KILL Gnomad
CLOSED: [2023-08-28 Mon 17:24]
*** DONE Reorder the columns :annotation:
CLOSED: [2023-08-31 Thu 10:38] SCHEDULED: <2023-08-28 Mon>
No OMIM, no CADD, no spliceAI
** DONE Port Alexis's version exactly to Helios
CLOSED: [2023-01-14 Sat 17:56]
"prod" branch
** KILL Test Alexis's version with Nix
CLOSED: [2023-06-14 Wed 22:37]
*** DONE Add clinvar
CLOSED: [2022-11-13 Sun 19:37]
*** DONE Alignment
CLOSED: [2022-11-13 Sun 12:52]
*** DONE Haplotype caller
CLOSED: [2022-11-13 Sun 13:00]
*** KILL Filter
CLOSED: [2023-06-14 Wed 22:37]
- [X] depth
- [ ] common snp not patho
Problem with the list of IDs
**** KILL variant annotation
CLOSED: [2023-06-14 Wed 22:37]
Requires vep
*** KILL Variant calling
CLOSED: [2023-06-14 Wed 22:37]
** KILL Test sarek
CLOSED: [2023-08-12 Sat 15:53]
#+begin_src sh
module load apptainer/1.1.8
nextflow run nf-core/sarek -profile test,singularity --outdir test-sarek
#+end_src
The dependencies are not downloaded correctly, so we extract them by hand
#+begin_src sh
rg -IN galaxyproject modules | sed 's/ //g;s/:$//' | sort | uniq > deps.txt
#+end_src
Manual cleanup
Then
#+begin_src sh
cat deps.txt | xargs -L1 singularity pull
#+end_src
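Part of the manual cleanup can probably be scripted. The sketch below assumes that the matching lines are quoted URLs followed by a colon, and strips the quotes in addition to the spaces and the trailing colon. The function name and the toy module tree are made up, and grep -rh replaces rg -IN only so that the example runs without ripgrep.

#+begin_src sh
# Normalise container URLs found in module files: strip spaces, the trailing
# ':' and the surrounding quotes, then deduplicate.
# Assumption: matching lines look like  'https://...galaxyproject.../tool:tag' :
list_container_urls() {
    grep -rh galaxyproject "$1" | sed "s/ //g; s/:\$//; s/^[\"']//; s/[\"']\$//" | sort -u
}

# Toy module tree; the container lines are invented for the example.
modules=$(mktemp -d)
mkdir -p "$modules/a" "$modules/b"
printf "        'https://depot.galaxyproject.org/singularity/toolA:1.0--0' :\n" > "$modules/a/main.nf"
printf "        'https://depot.galaxyproject.org/singularity/toolA:1.0--0' :\n" > "$modules/b/main.nf"
list_container_urls "$modules"
rm -rf "$modules"
#+end_src

The resulting list can then be passed to singularity pull as above.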
** DONE Samplesheet support
CLOSED: [2023-08-03 Thu 14:24] SCHEDULED: <2023-08-03 Thu 13:00>
/Entered on/ [2023-08-03 Thu 13:12]
** DONE Small dataset: chr22 on HG001
CLOSED: [2023-08-05 Sat 14:21] SCHEDULED: <2023-08-05 Sat>
* Documentation
:PROPERTIES:
:CATEGORY: doc
:END:
** DONE Installation procedure for nix + dependencies on the CHU VM
CLOSED: [2023-04-22 Sat 15:27] SCHEDULED: <2023-04-13 Thu>
* Manuscript
:PROPERTIES:
:CATEGORY: manuscript
:END:
* Tests :tests:
** KILL Non-regression: prod version
CLOSED: [2023-05-23 Tue 08:46]
*** DONE ID common snp
CLOSED: [2022-11-19 Sat 21:36]
#+begin_src
$ wc -l ID_of_common_snp.txt
23194290 ID_of_common_snp.txt
$ wc -l /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
23194290 /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
#+end_src
*** DONE ID common snp not clinvar patho
CLOSED: [2022-12-11 Sun 20:11]
**** DONE Vérification du problème
CLOSED: [2022-12-11 Sun 16:30]
On J:
21155134 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref
"Non-regression" version
21155076 database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt
New version
23193391 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt
After removing the duplicates
$ sort database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt | uniq > old.txt
$ wc -l old.txt
21107097 old.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt | uniq > new.txt
$ wc -l new.txt
21174578 new.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref | uniq > ref.txt
$ wc -l ref.txt
21107155 ref.txt
Looking at the difference
comm -23 ref.txt old.txt
rs1052692
rs1057518973
rs1057518973
rs11074121
rs112848754
rs12573787
rs145033890
rs147889095
rs1553904159
rs1560294695
rs1560296615
rs1560310926
rs1560325547
rs1560342418
rs1560356225
rs1578287542
...
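The deduplicate-then-compare steps can be wrapped in one helper. This is a sketch on toy lists with made-up IDs, and the function name is hypothetical. comm needs both inputs sorted in the same collation order, hence the explicit LC_ALL=C for sort and comm alike.

#+begin_src sh
# IDs present in the first list but missing from the second one.
# sort -u both sorts and removes duplicated IDs; LC_ALL=C keeps the ordering
# used by sort and the one expected by comm identical.
only_in_first() {
    a=$(mktemp) b=$(mktemp)
    LC_ALL=C sort -u "$1" > "$a"
    LC_ALL=C sort -u "$2" > "$b"
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Toy lists with made-up IDs (rs3 is duplicated in the first one).
ref=$(mktemp) old=$(mktemp)
printf 'rs3\nrs1\nrs3\nrs2\n' > "$ref"
printf 'rs2\nrs1\n' > "$old"
only_in_first "$ref" "$old"
rm -f "$ref" "$old"
#+end_src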
Look up the first one
$ bcftools query -i 'ID="rs1052692"' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 1619351 C A,T
It is indeed patho...
$ bcftools query -i 'POS=1619351' database/clinvar/clinvar.vcf.gz -f '%CHROM %POS %REF %ALT %INFO/CLNSIG\n'
19 1619351 C T Conflicting_interpretations_of_pathogenicity
Check all the others
$ comm -23 ref.txt old.txt > tocheck.txt
Generate the regions to check (chromosome number:position)
$ bcftools query -i 'ID=@tocheck.txt' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM\t%POS\n' > tocheck.pos
Generate the reverse mapping (chromosome number -> NC)
$ awk ' { t = $1; $1 = $2; $2 = t; print; } ' database/RefSeq/refseq_to_number_only_consensual.txt > mapping.txt
Remap clinvar
$ bcftools annotate --rename-chrs mapping.txt database/clinvar/clinvar.vcf.gz -o clinvar_remapped.vcf.gz
$ tabix clinvar_remapped.vcf.gz
Finally, look up the classification in clinvar
$ bcftools query -R tocheck.pos clinvar_remapped.vcf.gz -f '%CHROM %POS %INFO/CLNSIG\n'
$ bcftools query -R tocheck.pos database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %ID \n' | grep '^NC'
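The column swap behind mapping.txt can be checked on a toy table before it is applied to the real one. The two rows below assume that the RefSeq table is laid out as "accession, chromosome number", and the function name is made up.

#+begin_src sh
# Swap the first two columns of a whitespace-separated table.
# Assigning to $1/$2 makes awk rebuild the line with single spaces, which
# suits `bcftools annotate --rename-chrs` (whitespace-separated old/new names).
swap_columns() {
    awk '{ t = $1; $1 = $2; $2 = t; print }' "$@"
}

# Two rows in the assumed layout "RefSeq accession <TAB> chromosome number".
printf 'NC_000001.11\t1\nNC_000019.10\t19\n' | swap_columns
#+end_src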
#+RESULTS:
**** DONE Understand why the new version gives a different result
CLOSED: [2022-12-11 Sun 20:11]
***** DONE Same dbSNP and clinvar versions?
CLOSED: [2022-12-10 Sat 23:02]
Clinvar differs!
$ bcftools stats clinvar.gz
clinvar (Alexis)
SN 0 number of samples: 0
SN 0 number of records: 1492828
SN 0 number of no-ALTs: 965
SN 0 number of SNPs: 1338007
SN 0 number of MNPs: 5562
SN 0 number of indels: 144580
SN 0 number of others: 3714
SN 0 number of multiallelic sites: 0
SN 0 number of multiallelic SNP sites: 0
clinvar (new)
SN 0 number of samples: 0
SN 0 number of records: 1493470
SN 0 number of no-ALTs: 965
SN 0 number of SNPs: 1338561
SN 0 number of MNPs: 5565
SN 0 number of indels: 144663
SN 0 number of others: 3716
SN 0 number of multiallelic sites: 0
SN 0 number of multiallelic SNP sites: 0
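The two summaries are easier to compare as per-category differences. The sketch below assumes the tab-separated "SN" lines that bcftools stats writes and reuses two of the counts reported above. The function name is hypothetical.

#+begin_src sh
# For each "SN" summary line found in both `bcftools stats` reports, print
# the label and the difference (second report minus first report).
sn_delta() {
    awk -F'\t' '
        $1 == "SN" && NR == FNR { old[$3] = $4; next }
        $1 == "SN" && ($3 in old) { printf "%s\t%+d\n", $3, $4 - old[$3] }
    ' "$1" "$2"
}

# Two of the counts reported above, written as tab-separated SN lines.
s_old=$(mktemp) s_new=$(mktemp)
printf 'SN\t0\tnumber of records:\t1492828\nSN\t0\tnumber of SNPs:\t1338007\n' > "$s_old"
printf 'SN\t0\tnumber of records:\t1493470\nSN\t0\tnumber of SNPs:\t1338561\n' > "$s_new"
sn_delta "$s_old" "$s_new"
rm -f "$s_old" "$s_new"
#+end_src

With the counts above, the new clinvar file has 642 more records than Alexis's, 554 of them SNPs.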
***** DONE Update clinvar and dbSNP to work from the same databases
CLOSED: [2022-12-11 Sun 12:10]
The problem persists
***** DONE Remove the conversion of the chromosome to int
CLOSED: [2022-12-10 Sat 19:29]
***** KILL Same NC?
CLOSED: [2022-12-10 Sat 19:29]
$ zgrep "contig=<ID=NC_\(.*\)" clinvar/GRCh38/clinvar.vcf.gz > contig.clinvar
$ diff contig.txt contig.clinvar
< ##contig=<ID=NC_012920.1>
***** DONE Test on chromosome 19: ok
CLOSED: [2022-12-11 Sun 13:53]
Prepare the data
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/dbSNP_common.vcf.gz -o dbSNP_common_19.vcf.gz
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data/clinvar/GRCh38/clinvar.vcf.gz -o clinvar_19.vcf.gz
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data-alexis/dbSNP/dbSNP_common.vcf.gz -o dbSNP_common_19_old.vcf.gz
bcftools filter -i 'CHROM="19"' /Work/Groups/bisonex/data-alexis/clinvar/clinvar.vcf.gz -o clinvar_19_old.vcf.gz
#+end_src
Fetch both versions of the script
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
git checkout regression ../../script/pythonScript/clinvar_sbSNP.py
cp ../../script/pythonScript/clinvar_sbSNP.py clinvar_sbSNP_old.py
git checkout HEAD ../../script/pythonScript/clinvar_sbSNP.py
#+end_src
#+RESULTS:
Compare
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
python ../../script/pythonScript/clinvar_sbSNP.py --clinvar clinvar_19.vcf.gz --dbSNP dbSNP_common_19.vcf.gz --output tmp.txt
sort tmp.txt | uniq > new.txt
table=/Work/Groups/bisonex/data-alexis/RefSeq/refseq_to_number_only_consensual.txt
python clinvar_sbSNP_old.py --clinvar clinvar_19_old.vcf.gz --dbSNP dbSNP_common_19_old.vcf.gz --output tmp_old.txt --chrm_name_table $table
sort tmp_old.txt | uniq > old.txt
wc -l old.txt new.txt
#+end_src
#+RESULTS:
| 535155 | old.txt |
| 535194 | new.txt |
| 1070349 | total |