apraga/org - Change G7KNVIJW7ORXWK4IX3VHPW5DIILVOT6W7U6DEDLNUZWHNDGYWKEQC

Saving tasks

Created by Alexis Praga on October 12, 2023

G7KNVIJW7ORXWK4IX3VHPW5DIILVOT6W7U6DEDLNUZWHNDGYWKEQC

Dependencies

In channels

main

Change contents

Insertion in workout.org at line 6739 [4.1]

[2.169]

* <2023-10-11 Wed> Tricking
Bkick (improvement with exercises) + cramps

Replacement in projects.org at line 127 [4.123895]

B:BD[5.1606] → [3.15:92]

*** TODO Pass 2: courses not yet covered, with quiz + review + quiz [5/34]

[5.1606]

[6.4058]

*** TODO Pass 2: courses not yet covered, with quiz + review + quiz [9/35]

Replacement in projects.org at line 135 [4.123895]

B:BD[5.1922] → [5.1922:1992]

  - [ ] <(Chlamydia - mycoplasmes)>
  - [ ] <(Clostridium difficile)>

[5.1922]

[5.1992]

  - [ ] Antibiotic resistance mechanisms
  - [X] <(Chlamydia - mycoplasmes)>
  - [X] <(Clostridium difficile)>

Replacement in projects.org at line 146 [4.123895]
B:BD[5.2277] → [5.2277:2299]
```
  - [ ] <(Gonocoque)>
```
[5.2277]
[5.2299]
```
  - [X] <(Gonocoque)>
```
Replacement in projects.org at line 157 [4.123895]
∅:D[3.127] → [5.2608:2642]
B:BD[5.2608] → [5.2608:2642]
```
  - [ ] <(Serologie bacterienne)>
```
[3.127]
[5.2642]
```
  - [X] <(Serologie bacterienne)>
```
Replacement in projects.org at line 179 [4.123895]
B:BD[7.1440] → [7.1440:1468]
```
SCHEDULED: <2023-10-12 Thu>
```
[7.1440]
[7.1468]
```
SCHEDULED: <2023-10-16 Mon>
```
Replacement in projects.org at line 181 [4.123895]
B:BD[7.1493] → [7.1493:1521]
```
SCHEDULED: <2023-10-12 Thu>
```
[7.1493]
[3.128]
```
SCHEDULED: <2023-10-16 Mon>
```

Replacement in projects.org at line 188 [4.123895]

B:BD[5.3163] → [5.3163:3258]

*** TODO First version with context, Torres and our results
SCHEDULED: <2023-10-08 Sun>

[5.3163]

[5.3258]

*** DONE First version with context, Torres and our results
CLOSED: [2023-10-11 Wed 21:47] SCHEDULED: <2023-10-08 Sun>

Replacement in projects.org at line 499 [4.123895]
∅:D[8.570] → [9.503:532]
∅:D[10.675] → [9.503:532]
∅:D[11.1051] → [9.503:532]
∅:D[12.2617] → [9.503:532]
∅:D[5.3458] → [9.503:532]
∅:D[13.7756] → [9.503:532]
B:BD[9.503] → [9.503:532]
B:BD[9.532] → [5.3459:3487]
```
**** TODO Treatment for FBXW7?
SCHEDULED: <2023-10-15 Sun>
```
[5.3458]
[14.453]
```
**** DONE Treatment for FBXW7?
CLOSED: [2023-10-11 Wed 16:08] SCHEDULED: <2023-10-15 Sun>
Related to NF1, not FBXW7 (see V. Laithier's email)
```

Replacement in projects.org at line 804 [4.123895]

B:BD[7.1953] → [15.320:362]

B:BD[15.362] → [3.360:388]

** TODO Sew the zipper on the trousers
SCHEDULED: <2023-10-11 Wed>

[7.1953]

[15.390]

** DONE Sew the zipper on the trousers
CLOSED: [2023-10-11 Wed 12:48] SCHEDULED: <2023-10-11 Wed>

Insertion in projects.org at line 807 [4.123895]

[15.426]

[16.1513]

** WAIT Baptism certificate
SCHEDULED: <2023-10-18 Wed>
/Entered on/ [2023-10-11 Wed 23:13]
Message sent via the website on 2023-10-11 Wed

Replacement in projects.org at line 865 [4.123895]
B:BD[17.397] → [7.1954:1982]
```
SCHEDULED: <2023-10-11 Wed>
```
[17.397]
[17.425]
```
SCHEDULED: <2023-10-13 Fri>
```

Insertion in projects.org at line 956 [4.123895]

[18.102]

[19.124]

** Julia
:PROPERTIES:
:CATEGORY: julia
:END:
*** TODO Juliacon 2023
:PROPERTIES:
:ID:       42f6a7bf-ac90-4737-884e-c35187776a4c
:END:
- [ ] <[Graphs, matrices]> - <yt-play "X2JEWdCFf70">
- [ ] <[Alan, julia and climate]> - <yt-play "SclkiqCn4Cs">
- [ ] <[Sherlock Holmes, mathematics and julia]> - <yt-play "zX-U6-6Prso">
- [ ] <[Sound synthesis]> - <yt-play "SvnDr9nnOZs">
- [ ] <[JuMP by example]> - <yt-play "rIan_XbYyaM">
- [ ] <[neurophysiological symbolic modeling]> - <yt-play "qC6tzsn8Uxc">
- [ ] <yt-play "ipDCx174Qkw">
- [ ] <yt-play "hKa2eTeb_lo">
- [ ] <yt-play "4omFGfcvvOY">
- [ ] <yt-play "d7SA36kVaq0">
- [ ] <yt-play "5uF3VqgjiVE">
- [ ] <yt-play "jIuRXzo4m38">
- [ ] <yt-play "iUarLpmZmco">
- [ ] <yt-play "WVT9wJegC6Q">
- [ ] <yt-play "ZVvP7rAIvkE">
- [ ] <yt-play "RXjjTQffen0">
- [ ] <yt-play "TpyHGaCB8P4">
- [ ] <yt-play "ksh-CNM2YJU">
- [ ] <yt-play "_sZdWVZeKqI">
- [ ] <yt-play "_Y6mNrN7eWA">
- [ ] <yt-play "tnw_BI2tRaA">
- [ ] <yt-play "qgmgg_Bzgyg">
- [ ] <yt-play "Nlq3J7PCB_Q">
- [ ] <yt-play "ruxYAY5_bfE">
- [ ] <yt-play "-C7Zbh6UTgk">
- [ ] <[machine learning for biological data]> - <yt-play "Q9eYgwvJfWE">
- [X] <[Genomic analysis]> - <yt-play "egWrDz6RDRs">
- [ ] <[MRI denoising]> - <yt-play "dOsuIBUUDc4">
- [ ] <[modeling neural control circuit]> - <yt-play "f2XVrDoF35A">
- [ ] <[earth system software]> - <yt-play "O2rANteGTTY">
- [ ] <[fracture]> - <yt-play "6zt-TEUuMu8">
- [ ] <[fluid dynamic]> - <yt-play "R9b1xiqQtC8">
- [ ] <[Parquet]> - <yt-play "-QRacAGsxOI">
- [ ] <[pipeline]> - <yt-play "ECERq8BHvn4">

Replacement in projects/bisonex.org at line 22 [20.35]

B:BD[14.5359] → [14.5359:8775]

B:BD[14.8775] → [6.24968:29744]

 with Nix?
See https://academic.oup.com/gigascience/article/9/11/giaa121/5987272?login=false
for an example.
Probably simpler to use Nix for environment management and snakemake for execution
No internet access from the cluster
*** DONE nextflow
CLOSED: [2022-09-13 Tue 21:37]
**** TODO SGE scheduler bug
The job gets killed because the user is not passed correctly to nextflow
***** DONE Force the user at execution time
CLOSED: [2023-04-01 Sat 17:57]
NXF_OPTS=-D"user.name=alex"
***** DONE Check whether the problem persists with 22.10.6
CLOSED: [2023-04-01 Sat 18:38] SCHEDULED: <2023-04-01 Sat>
yes
***** KILL Package the user into the program?
Bad idea.
*** DONE Reduce memory for haplotypecaller
CLOSED: [2023-09-20 Wed 21:44] SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 15:30]
Medium = 32 GB for 6 cores => 4 jobs (i.e. the whole node) take more than the 96 GB...
Trying 16 GB
Then commit
*** DONE MultiQC report with 10 runs
CLOSED: [2023-09-19 Tue 15:31] SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 15:31]
See email of 2023-09-19
*** DONE Bug: variant at 7788314 for patient 62982193 filtered out: DP < 30
CLOSED: [2023-10-02 Mon 21:58] SCHEDULED: <2023-09-25 Mon>
/Entered on/ [2023-09-22 Fri 22:59]
35 according to IGV but 27 in practice in the VCF.
cento VCF: 26 reads as well...
VOUS, not confirmed by Sanger
Email sent to Alexis
Discussed with Paul: we keep DP >= 30 if it is the only occurrence
** DONE Update spip to fix bug 62982239: variant too long (?)
CLOSED: [2023-09-22 Fri 22:22] SCHEDULED: <2023-09-21 Thu>
/Entered on/ [2023-09-21 Thu 23:11]
Reported in https://github.com/raphaelleman/SPiP/issues/9
*** DONE Rerun
CLOSED: [2023-09-22 Fri 22:22] SCHEDULED: <2023-09-21 Thu>
*** DONE Update spip
CLOSED: [2023-09-21 Thu 23:41] SCHEDULED: <2023-09-21 Thu>
** DONE nixpkgs unstable -> 23.05
CLOSED: [2023-09-22 Fri 22:22] SCHEDULED: <2023-09-21 Thu>
*** DONE Rerun Sanger tests
CLOSED: [2023-09-22 Fri 22:22] SCHEDULED: <2023-09-21 Thu>
** TODO Preprocessing with nextflow
*** TODO Map to reference
**** TODO Sample ID in header
/Work/Users/apraga/bisonex/out/63003856_S135/preprocessing/baserecalibrator
*** DONE Mark duplicate
CLOSED: [2022-10-09 Sun 22:30]
*** DONE Recalibrate base quality score
CLOSED: [2022-10-09 Sun 22:30]
** DONE Variant calling with Nextflow
CLOSED: [2022-11-19 Sat 21:34]
*** DONE Haplotype caller
CLOSED: [2022-10-09 Sun 22:40]
*** DONE Filter variants
CLOSED: [2022-10-09 Sun 22:40]
*** DONE Filter common snp not clinvar path
CLOSED: [2022-11-07 Mon 23:00]
See [[*common dbSNP not clinvar patho][common dbSNP not clinvar patho]]
*** DONE Filter variant only in consensual sequence
CLOSED: [2022-11-08 Tue 22:23]
*** DONE Filter technical variants
CLOSED: [2022-11-19 Sat 21:34]
*** DONE Use AVX to speed up execution
CLOSED: [2023-04-29 Sat 15:46]
Without it, we get the warning
#+begin_quote
17:28:00.720 INFO  PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported
17:28:00.721 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/nix/store/cy9ckxqwrkifx7wf02hm4ww1p6lnbxg9-gatk-4.2.4.1/bin/gatk-package-4.2.4.1-local.jar!/com/intel/gkl/native/libgkl_utils.so
17:28:00.733 WARN  NativeLibraryLoader - Unable to load libgkl_utils.so from native/libgkl_utils.so (/Work/Users/apraga/bisonex/out/NA12878_NIST7035/preprocessing/applybqsr/libgkl_utils821485189051585397.so: libgomp.so.1: cannot open shared object file: No such file or directory)
17:28:00.733 WARN  IntelPairHmm - Intel GKL Utils not loaded
17:28:00.733 WARN  PairHMM - ***WARNING: Machine does not have the AVX instruction set support needed for the accelerated AVX PairHmm. Falling back to the MUCH slower LOGLESS_CACHING implementation!
17:28:00.763 INFO  ProgressMeter - Starting traversal
#+end_quote
libgomp.so is provided by gcc, so the gcc module must be loaded
 module load gcc@11.3.0/gcc-12.1.0
** KILL Use subworkflow
CLOSED: [2023-04-02 Sun 18:08]
Our version is more flexible
*** KILL Alignment
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
*** KILL Vep
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
vcf_annotate_ensemblvep
** TODO Annotation with nextflow :annotation:
*** KILL VEP: --gene-phenotype?
CLOSED: [2023-04-18 Tue 18:32]
Discussed with Alexis: databases not up to date
https://www.ensembl.org/info/genome/variation/phenotype/sources_phenotype_documentation.html
*** DONE VEP plugin
CLOSED: [2023-04-18 Tue 18:32]
Clone the git repository containing the plugin
Then use --dir_plugins
*** HOLD Use Alexis's code
*** TODO New version with VEP
Example with --custom
https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html
**** DONE Add spliceAI
CLOSED: [2023-05-18 Thu 11:02] SCHEDULED: <2023-04-30 Sun>
VEP plugin
***** DONE Download the data
CLOSED: [2023-05-11 Thu 19:01]
Hard to automate: the link is temporary...
***** DONE Plugin
CLOSED: [2023-05-11 Thu 20:16]
***** DONE Split the score into several columns
CLOSED: [2023-05-11 Thu 20:16]
Test with this file, to get one line with annotation and one without
#CHROM	POS	ID	REF	ALT
1	9091	.	A	C
1	69091	.	A	C
and
#+begin_src sh
rm -f postvep.tsv* && vep -i testspliceai.vcf.gz -o postvep.tsv --tab  --dir 109 --merged --pick --use_given_ref   --offline  --plugin SpliceAI,snv=spliceai_scores.raw.snv.hg38.vcf.gz,indel=spliceai_scores.raw.indel.hg38.vcf.gz
#+end_src
#+begin_src
$ bgzip postvep.tsv
$ python spliceai.py
$ cat postvep2.tsv
,variation,Location,Allele,Gene,Feature,Feature_type,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,Codons,Existing_variation,IMPACT,DISTANCE,STRAND,FLAGS,REFSEQ_MATCH,SOURCE,REFSEQ_OFFSET,SpliceAI_AG,SpliceAI_AL,SpliceAI_DG,SpliceAI_DL
0,1_9091_A/C,1:9091,C,ENSG00000290825,ENST00000456328,Transcript,upstream_gene_variant,-,-,-,-,-,-,MODIFIER,2778,1,-,-,Ensembl,-,,,,
1,1_69091_A/C,1:69091,C,ENSG00000186092,ENST00000641515,Transcript,missense_variant,124,64,22,M/L,Atg/Ctg,-,MODERATE,-,1,-,-,Ensembl,-,0.01,0.00,0.00,0.01
#+end_src
Test
cp work/bf/437ae511958509e43072f032f4d495/small.tab.gz tests/vep-spip.tab.gz
cp work/d5/3b1244b5ae83d54409ee0d456e8c55/small_cadd.tab.gz tests/vep-cadd-splice.tab.gz
**** DONE Nix package for spliceAI
CLOSED: [2023-10-07 Sat 18:00]
We use the tensorflow provided by nix (spliceai branch)
LD_PRELOAD=/lib64/libcuda.so is required at execution time
***** DONE Unoptimized CPU version
CLOSED: [2023-09-23 Sat 21:34]
****** DONE Check hg19 and hg38 annotation
CLOSED: [2023-09-23 Sat 18:49]
****** DONE In-house T2T annotation
CLOSED: [2023-09-23 Sat 21:33] SCHEDULED: <2023-09-23 Sat>
***** DONE Optimized CPU version
CLOSED: [2023-09-23 Sat 21:34]
Enable the flag in the nix package
***** DONE CPU version: chr20 test
CLOSED: [2023-09-23 Sat 22:25] SCHEDULED: <2023-09-23 Sat>
10 min?
***** DONE CPU version: full NA12878 Sanger: killed because too long
CLOSED: [2023-09-24 Sun 08:22] SCHEDULED: <2023-09-23 Sat>
10 min?
***** DONE GPU with the version on the mesocentre: email sent
CLOSED: [2023-09-26 Tue 11:49]
#+begin_src sh
module unload nix/2.11.0
module load anaconda3@2021.05/gcc-12.1.0
module load deep/tensorflow-gpu
pip install spliceai
#+end_src
Then test on the gpu queue
#+begin_src sh
srun -p gpu -t 4:00:00 --gres=gpu:1 --pty bash
cd /Work/Users/apraga/bisonex/tests/spliceai/
module load deep/tensorflow-gpu
module unload nix/2.11.0
time spliceai -I NA12878-sanger-chr20-T2T.vep.vcf.gz -O output-20-2.vcf -R /Work/Groups/bisonex/data/fasta/chm13v2.0/chm13v2.0.fa -A ~/t2t.txt
#+end_src
Failure: DNN library not found...
#+begin_quote
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-24 09:37:45.892545: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-09-24 09:37:47.759421: I tensorflow/core/common_

[14.5359]

[6.29744]

 with Nix?
See https://academic.oup.com/gigascience/article/9/11/giaa121/5987272?login=false
for an example.
Probably simpler to use Nix for environment management and snakemake for execution
No internet access from the cluster
*** DONE nextflow
CLOSED: [2022-09-13 Tue 21:37]
**** TODO SGE scheduler bug
The job gets killed because the user is not passed correctly to nextflow
***** DONE Force the user at execution time
CLOSED: [2023-04-01 Sat 17:57]
NXF_OPTS=-D"user.name=alex"
***** DONE Check whether the problem persists with 22.10.6
CLOSED: [2023-04-01 Sat 18:38] SCHEDULED: <2023-04-01 Sat>
yes
***** KILL Package the user into the program?
Bad idea.
*** DONE Reduce memory for haplotypecaller
CLOSED: [2023-09-20 Wed 21:44] SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 15:30]
Medium = 32 GB for 6 cores => 4 jobs (i.e. the whole node) take more than the 96 GB...
Trying 16 GB
Then commit
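The memory estimate above is simple arithmetic: the concurrent jobs share one node, so their combined request must stay within the node's memory. Below is a minimal Python sketch. It assumes, as the note implies without spelling it out, a 96 GB node that runs 4 haplotypecaller jobs at once. The helper names are hypothetical and not part of the pipeline.

```python
# Hypothetical check of the haplotypecaller memory budget described above.
# Assumptions: a 96 GB node runs 4 concurrent jobs (6 cores each).
NODE_MEMORY_GB = 96
JOBS_PER_NODE = 4


def total_memory_gb(mem_per_job_gb: int, jobs: int = JOBS_PER_NODE) -> int:
    """Memory requested by `jobs` concurrent jobs of `mem_per_job_gb` each."""
    return mem_per_job_gb * jobs


def fits_on_node(mem_per_job_gb: int, jobs: int = JOBS_PER_NODE,
                 node_memory_gb: int = NODE_MEMORY_GB) -> bool:
    """True if the combined request does not exceed the node's memory."""
    return total_memory_gb(mem_per_job_gb, jobs) <= node_memory_gb


for mem in (32, 16):
    print(f"{mem} GB/job x {JOBS_PER_NODE} jobs = {total_memory_gb(mem)} GB; "
          f"fits in {NODE_MEMORY_GB} GB: {fits_on_node(mem)}")
```

At 32 GB per job, the four jobs request 128 GB, which does not fit. At 16 GB per job, they request 64 GB, which does.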
*** DONE MultiQC report with 10 runs
CLOSED: [2023-09-19 Tue 15:31] SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 15:31]
See email of 2023-09-19
*** DONE Bug: variant at 7788314 for patient 62982193 filtered out: DP < 30
CLOSED: [2023-10-02 Mon 21:58] SCHEDULED: <2023-09-25 Mon>
/Entered on/ [2023-09-22 Fri 22:59]
35 according to IGV but 27 in practice in the VCF.
cento VCF: 26 reads as well...
VOUS, not confirmed by Sanger
Email sent to Alexis
Discussed with Paul: we keep DP >= 30 if it is the only occurrence
*** TODO Bug on mésohelios: jobs get killed :bug:
/Entered on/ [2023-10-11 Wed 12:06]
**** DONE Understand why
CLOSED: [2023-10-11 Wed 16:06] SCHEDULED: <2023-10-11 Wed>
Users are disconnected at 4 a.m. every day
**** STRT Start nextflow with sbatch
SCHEDULED: <2023-10-11 Wed>
**** TODO Update the documentation
SCHEDULED: <2023-10-13 Fri>
** DONE Update spip to fix bug 62982239: variant too long (?)
CLOSED: [2023-09-22 Fri 22:22] SCHEDULED: <2023-09-21 Thu>
/Entered on/ [2023-09-21 Thu 23:11]
Reported in https://github.com/raphaelleman/SPiP/issues/9
*** DONE Rerun
CLOSED: [2023-09-22 Fri 22:22] SCHEDULED: <2023-09-21 Thu>
*** DONE Update spip
CLOSED: [2023-09-21 Thu 23:41] SCHEDULED: <2023-09-21 Thu>
** DONE nixpkgs unstable -> 23.05
CLOSED: [2023-09-22 Fri 22:22] SCHEDULED: <2023-09-21 Thu>
*** DONE Rerun Sanger tests
CLOSED: [2023-09-22 Fri 22:22] SCHEDULED: <2023-09-21 Thu>
** TODO Preprocessing with nextflow
*** TODO Map to reference
**** TODO Sample ID in header
/Work/Users/apraga/bisonex/out/63003856_S135/preprocessing/baserecalibrator
*** DONE Mark duplicate
CLOSED: [2022-10-09 Sun 22:30]
*** DONE Recalibrate base quality score
CLOSED: [2022-10-09 Sun 22:30]
** DONE Variant calling with Nextflow
CLOSED: [2022-11-19 Sat 21:34]
*** DONE Haplotype caller
CLOSED: [2022-10-09 Sun 22:40]
*** DONE Filter variants
CLOSED: [2022-10-09 Sun 22:40]
*** DONE Filter common snp not clinvar path
CLOSED: [2022-11-07 Mon 23:00]
See [[*common dbSNP not clinvar patho][common dbSNP not clinvar patho]]
*** DONE Filter variant only in consensual sequence
CLOSED: [2022-11-08 Tue 22:23]
*** DONE Filter technical variants
CLOSED: [2022-11-19 Sat 21:34]
*** DONE Use AVX to speed up execution
CLOSED: [2023-04-29 Sat 15:46]
Without it, we get the warning
#+begin_quote
17:28:00.720 INFO  PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported
17:28:00.721 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/nix/store/cy9ckxqwrkifx7wf02hm4ww1p6lnbxg9-gatk-4.2.4.1/bin/gatk-package-4.2.4.1-local.jar!/com/intel/gkl/native/libgkl_utils.so
17:28:00.733 WARN  NativeLibraryLoader - Unable to load libgkl_utils.so from native/libgkl_utils.so (/Work/Users/apraga/bisonex/out/NA12878_NIST7035/preprocessing/applybqsr/libgkl_utils821485189051585397.so: libgomp.so.1: cannot open shared object file: No such file or directory)
17:28:00.733 WARN  IntelPairHmm - Intel GKL Utils not loaded
17:28:00.733 WARN  PairHMM - ***WARNING: Machine does not have the AVX instruction set support needed for the accelerated AVX PairHmm. Falling back to the MUCH slower LOGLESS_CACHING implementation!
17:28:00.763 INFO  ProgressMeter - Starting traversal
#+end_quote
libgomp.so is provided by gcc, so the gcc module must be loaded
 module load gcc@11.3.0/gcc-12.1.0
** KILL Use subworkflow
CLOSED: [2023-04-02 Sun 18:08]
Our version is more flexible
*** KILL Alignment
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
*** KILL Vep
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
vcf_annotate_ensemblvep
** TODO Annotation with nextflow :annotation:
*** KILL VEP: --gene-phenotype?
CLOSED: [2023-04-18 Tue 18:32]
Discussed with Alexis: databases not up to date
https://www.ensembl.org/info/genome/variation/phenotype/sources_phenotype_documentation.html
*** DONE VEP plugin
CLOSED: [2023-04-18 Tue 18:32]
Clone the git repository containing the plugin
Then use --dir_plugins
*** HOLD Use Alexis's code
*** TODO New version with VEP
Example with --custom
https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html
**** DONE Add spliceAI
CLOSED: [2023-05-18 Thu 11:02] SCHEDULED: <2023-04-30 Sun>
VEP plugin
***** DONE Download the data
CLOSED: [2023-05-11 Thu 19:01]
Hard to automate: the link is temporary...
***** DONE Plugin
CLOSED: [2023-05-11 Thu 20:16]
***** DONE Split the score into several columns
CLOSED: [2023-05-11 Thu 20:16]
Test with this file, to get one line with annotation and one without
#CHROM	POS	ID	REF	ALT
1	9091	.	A	C
1	69091	.	A	C
and
#+begin_src sh
rm -f postvep.tsv* && vep -i testspliceai.vcf.gz -o postvep.tsv --tab  --dir 109 --merged --pick --use_given_ref   --offline  --plugin SpliceAI,snv=spliceai_scores.raw.snv.hg38.vcf.gz,indel=spliceai_scores.raw.indel.hg38.vcf.gz
#+end_src
#+begin_src
$ bgzip postvep.tsv
$ python spliceai.py
$ cat postvep2.tsv
,variation,Location,Allele,Gene,Feature,Feature_type,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,Codons,Existing_variation,IMPACT,DISTANCE,STRAND,FLAGS,REFSEQ_MATCH,SOURCE,REFSEQ_OFFSET,SpliceAI_AG,SpliceAI_AL,SpliceAI_DG,SpliceAI_DL
0,1_9091_A/C,1:9091,C,ENSG00000290825,ENST00000456328,Transcript,upstream_gene_variant,-,-,-,-,-,-,MODIFIER,2778,1,-,-,Ensembl,-,,,,
1,1_69091_A/C,1:69091,C,ENSG00000186092,ENST00000641515,Transcript,missense_variant,124,64,22,M/L,Atg/Ctg,-,MODERATE,-,1,-,-,Ensembl,-,0.01,0.00,0.00,0.01
#+end_src
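The output above shows what `spliceai.py` produces: one column per SpliceAI delta score, left empty when a variant has no prediction. The script itself is not shown here, so what follows is only a sketch of the same idea in Python. It assumes, hypothetically, that VEP wrote the prediction as a single pipe-separated column named `SpliceAI_pred`, with fields in the order `SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|...`. Check the actual header of the VEP plugin output before relying on that layout.

```python
import csv
import io

# Output column names, as they appear in postvep2.tsv above.
SCORE_COLUMNS = ["SpliceAI_AG", "SpliceAI_AL", "SpliceAI_DG", "SpliceAI_DL"]


def split_spliceai(pred: str) -> dict:
    """Split one pipe-separated SpliceAI prediction into four score columns.

    Assumed (hypothetical) layout: SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_...
    A missing prediction ("-" or empty) yields empty cells.
    """
    if not pred or pred == "-":
        return {col: "" for col in SCORE_COLUMNS}
    # Field 0 is the gene symbol; the four delta scores follow it.
    scores = pred.split("|")[1:5]
    scores += [""] * (len(SCORE_COLUMNS) - len(scores))  # pad short records
    return dict(zip(SCORE_COLUMNS, scores))


def split_table(tsv_text: str, source_column: str = "SpliceAI_pred") -> list:
    """Replace `source_column` by the four score columns in every TSV row."""
    rows = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        row.update(split_spliceai(row.pop(source_column, "")))
        rows.append(row)
    return rows
```

Keeping the split in a small pure function makes the "one line with annotation and one without" test case easy to check in isolation.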
Test
cp work/bf/437ae511958509e43072f032f4d495/small.tab.gz tests/vep-spip.tab.gz
cp work/d5/3b1244b5ae83d54409ee0d456e8c55/small_cadd.tab.gz tests/vep-cadd-splice.tab.gz
**** DONE Nix package for spliceAI
CLOSED: [2023-10-07 Sat 18:00]
We use the tensorflow provided by nix (spliceai branch)
LD_PRELOAD=/lib64/libcuda.so is required at execution time
***** DONE Unoptimized CPU version
CLOSED: [2023-09-23 Sat 21:34]
****** DONE Check hg19 and hg38 annotation
CLOSED: [2023-09-23 Sat 18:49]
****** DONE In-house T2T annotation
CLOSED: [2023-09-23 Sat 21:33] SCHEDULED: <2023-09-23 Sat>
***** DONE Optimized CPU version
CLOSED: [2023-09-23 Sat 21:34]
Enable the flag in the nix package
***** DONE CPU version: chr20 test
CLOSED: [2023-09-23 Sat 22:25] SCHEDULED: <2023-09-23 Sat>
10 min?
***** DONE CPU version: full NA12878 Sanger: killed because too long
CLOSED: [2023-09-24 Sun 08:22] SCHEDULED: <2023-09-23 Sat>
10 min?
***** DONE GPU with the version on the mesocentre: email sent
CLOSED: [2023-09-26 Tue 11:49]
#+begin_src sh
module unload nix/2.11.0
module load anaconda3@2021.05/gcc-12.1.0
module load deep/tensorflow-gpu
pip install spliceai
#+end_src
Then test on the gpu queue
#+begin_src sh
srun -p gpu -t 4:00:00 --gres=gpu:1 --pty bash
cd /Work/Users/apraga/bisonex/tests/spliceai/
module load deep/tensorflow-gpu
module unload nix/2.11.0
time spliceai -I NA12878-sanger-chr20-T2T.vep.vcf.gz -O output-20-2.vcf -R /Work/Groups/bisonex/data/fasta/chm13v2.0/chm13v2.0.fa -A ~/t2t.txt
#+end_src
Failure: DNN library not found...
#+begin_quote
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-24 09:37:45.892545: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-09-24 09:37:47.759421: I tensorflow/core/common_

Replacement in projects/bisonex.org at line 27 [20.35]

B:BD[21.5177] → [21.5177:8689]

B:BD[21.8689] → [6.41450:46130]

ULED: <2023-09-10 Sun>
***** DONE Email Paul for validation
CLOSED: [2023-09-11 Mon 19:11] SCHEDULED: <2023-09-10 Sun>
**** DONE Use spliceAI >= 0.2 for filtering instead of spip
CLOSED: [2023-09-11 Mon 21:48] SCHEDULED: <2023-09-11 Mon>
**** DONE Rerun Sanger tests with spliceAI
CLOSED: [2023-09-14 Thu 22:45] SCHEDULED: <2023-09-11 Mon>
**** DONE Fix the recessive column
CLOSED: [2023-09-14 Thu 22:57] SCHEDULED: <2023-09-14 Thu>
either 1/1 or 1/2,
or 0/1 with 2 variants per gene
*** KILL Compare annotations on 63003856
CLOSED: [2023-08-28 Mon 17:28]
**** Rerun the new pipeline
*** KILL Old version
CLOSED: [2023-08-28 Mon 17:24]
**** KILL HGVS
CLOSED: [2023-08-28 Mon 17:24]
**** KILL Filter after VEP
CLOSED: [2023-08-28 Mon 17:24]
**** KILL OMIM
CLOSED: [2023-08-28 Mon 17:24]
**** KILL clinvar
CLOSED: [2023-08-28 Mon 17:24]
**** KILL ACMG incidental
CLOSED: [2023-08-28 Mon 17:24]
**** KILL Grantham
CLOSED: [2023-08-28 Mon 17:24]
**** KILL LRG
CLOSED: [2023-04-18 Tue 17:22] SCHEDULED: <2023-04-18 Tue>
Discussed with Alexis: no longer up to date
**** KILL Gnomad
CLOSED: [2023-08-28 Mon 17:24]
*** DONE Reorder the columns :annotation:
CLOSED: [2023-08-31 Thu 10:38] SCHEDULED: <2023-08-28 Mon>
No OMIM, no CADD, no spliceAI
*** DONE Add gnomAD v3 :gnomadv3:
CLOSED: [2023-10-01 Sun 15:34] SCHEDULED: <2023-09-29 Fri>
/Entered on/ [2023-09-29 Fri 22:38]
After discussion with Mathieu about some regions that were corrected in v3!
VEP uses v2 for exomes and v3 for genomes
It will be missing for patients 1-107
Until now we did not have it in the VEP output
**** DONE Test on 1 patient
CLOSED: [2023-09-30 Sat 23:54] SCHEDULED: <2023-09-29 Fri>
**** DONE Resume the batch run
CLOSED: [2023-10-01 Sun 15:34]
** DONE Port Alexis's version exactly to Helios
CLOSED: [2023-01-14 Sat 17:56]
Branch "prod"
** KILL Test Alexis's version with Nix
CLOSED: [2023-06-14 Wed 22:37]
*** DONE Add clinvar
CLOSED: [2022-11-13 Sun 19:37]
*** DONE Alignment
CLOSED: [2022-11-13 Sun 12:52]
*** DONE Haplotype caller
CLOSED: [2022-11-13 Sun 13:00]
*** KILL Filter
CLOSED: [2023-06-14 Wed 22:37]
- [X] depth
- [ ] common snp not path
Problem with the ID list
**** KILL variant annotation
CLOSED: [2023-06-14 Wed 22:37]
Needs vep
*** KILL Variant calling
CLOSED: [2023-06-14 Wed 22:37]
** KILL Test sarek
CLOSED: [2023-08-12 Sat 15:53]
#+begin_src sh
 module load apptainer/1.1.8
 nextflow run nf-core/sarek -profile test,singularity --outdir test-sarek
#+end_src
The dependencies do not download correctly, so we extract them by hand
#+begin_src sh
 rg -IN galaxyproject modules  | sed 's/ //g;s/:$//' | sort | uniq > deps.txt
#+end_src
 Manual cleanup
 Then
 #+begin_src sh
 cat deps.txt | xargs -L1 singularity pull
 #+end_src
** DONE Samplesheet support
CLOSED: [2023-08-03 Thu 14:24] SCHEDULED: <2023-08-03 Thu 13:00>
/Entered on/ [2023-08-03 Thu 13:12]
** DONE Small dataset: chr22 on HG001
CLOSED: [2023-08-05 Sat 14:21] SCHEDULED: <2023-08-05 Sat>
** DONE Fix OMIM annotation: missing for NMNAT1
CLOSED: [2023-09-16 Sat 22:47] SCHEDULED: <2023-09-16 Sat>
/Entered on/ [2023-09-16 Sat 19:32]
** PROJ Look at the depth of the reported variants
/Entered on/ [2023-10-05 Thu 21:44]
* Documentation
:PROPERTIES:
:CATEGORY: doc
:END:
** DONE Installation procedure for nix + dependencies for the CHU VM
CLOSED: [2023-04-22 Sat 15:27] SCHEDULED: <2023-04-13 Thu>
* Bibliography
** DONE Finish [cite:@alser2021]
CLOSED: [2023-09-26 Tue 11:26] SCHEDULED: <2023-09-22 Fri>
* Manuscript
:PROPERTIES:
:CATEGORY: manuscript
:END:
** DONE Flowchart pipeline (avec T2T)
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
** DONE Figure: number of publications per aligner
CLOSED: [2023-09-19 Tue 16:54] SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
** DONE Bibliography: aligner performance
CLOSED: [2023-10-06 Fri 16:51] SCHEDULED: <2023-10-01 Sun>
** TODO Bibliography: variant calling
SCHEDULED: <2023-10-01 Sun>
** TODO Figure: number of publications per variant caller
SCHEDULED: <2023-10-04 Wed>
/Entered on/ [2023-09-19 Tue 08:43]
** TODO Figure: number of articles citing the main aligners per year
SCHEDULED: <2023-10-03 Tue>
** TODO Figure: number of exomes per year
SCHEDULED: <2023-10-10 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
* Tests :tests:
** KILL Non-regression: prod version
CLOSED: [2023-05-23 Tue 08:46]
*** DONE ID common snp
CLOSED: [2022-11-19 Sat 21:36]
#+begin_src
$ wc -l ID_of_common_snp.txt
23194290 ID_of_common_snp.txt
$ wc -l /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
23194290 /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
#+end_src
*** DONE ID common snp not clinvar patho
CLOSED: [2022-12-11 Sun 20:11]
**** DONE Checking the problem
CLOSED: [2022-12-11 Sun 16:30]
On J:
21155134 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref
"Non-regression" version
21155076 database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt
New version
23193391 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt
After removing duplicates
$ sort database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt | uniq > old.txt
$ wc -l old.txt
21107097 old.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt | uniq > new.txt
$ wc -l new.txt
21174578 new.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref | uniq > ref.txt
$ wc -l ref.txt
21107155 ref.txt
Looking at the difference
 comm -23 ref.txt old.txt
rs1052692
rs1057518973
rs1057518973
rs11074121
rs112848754
rs12573787
rs145033890
rs147889095
rs1553904159
rs1560294695
rs1560296615
rs1560310926
rs1560325547
rs1560342418
rs1560356225
rs1578287542
...
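The `sort | uniq` and `comm -23` steps amount to a set difference: the IDs present in the deduplicated reference list but absent from the regenerated one. Below is a minimal Python sketch of the same comparison, with hypothetical function names. It differs slightly from the shell version in two ways. First, whitespace around each ID is stripped, whereas `uniq` compares lines byte for byte. Second, `sorted` orders by code point rather than by whatever locale collation `sort` uses.

```python
from pathlib import Path


def unique_ids(path: Path) -> set:
    """Like `sort file | uniq`: the distinct non-empty lines, whitespace-stripped."""
    with path.open() as handle:
        return {line.strip() for line in handle if line.strip()}


def only_in_first(first: set, second: set) -> list:
    """Like `comm -23 first second` on deduplicated input: IDs only in `first`."""
    return sorted(first - second)
```

For example, `only_in_first(unique_ids(Path("ref.txt")), unique_ids(Path("old.txt")))` should list the same IDs as `comm -23 ref.txt old.txt`, up to the whitespace and ordering caveats above.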
Look up the first one
bcftools query -i 'ID="rs1052692"' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 1619351 C A,T
It is indeed pathogenic...
$ bcftools query -i 'POS=1619351' database/clinvar/clinvar.vcf.gz -f '%CHROM %POS %REF %ALT %INFO/CLNSIG\n'
19 1619351 C T Conflicting_interpretations_of_pathogenicity
Check all the others
$ comm -23 ref.txt old.txt > tocheck.txt
Generate the regions to check (chromosome number:position)
$ bcftools query -i 'ID=@tocheck.txt' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM\t%POS\n' > tocheck.pos
Generate the inverse mapping (chromosome number -> NC)
$ awk ' { t = $1; $1 = $2; $2 = t; print; } ' database/RefSeq/refseq_to_number_only_consensual.txt  > mapping.txt
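The `awk` one-liner swaps the first two columns so that the file can be passed to `bcftools annotate --rename-chrs`, which reads one `old new` pair per line. Below is a Python sketch of the same swap. It assumes whitespace-separated columns with the RefSeq accession first, which the "chromosome number -> NC" direction suggests. The function name is hypothetical.

```python
def invert_mapping(lines: list) -> list:
    """Swap the first two whitespace-separated columns of each line,
    like `awk '{ t = $1; $1 = $2; $2 = t; print; }'`; extra columns are kept."""
    inverted = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # awk would still print these, with an empty first field
        fields[0], fields[1] = fields[1], fields[0]
        inverted.append(" ".join(fields))  # awk rebuilds the record with single spaces
    return inverted
```

One difference from `awk`: this sketch drops lines with fewer than two columns instead of printing them.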
Remap clinvar
$ bcftools annotate --rename-chrs mapping.txt database/clinvar/clinvar.vcf.gz -o clinvar_remapped.vcf.gz
$ tabix clinvar_remapped.vcf.gz
Finally, look up the classification in clinvar
$ bcftools query -R tocheck.pos clinvar_remapped.vcf.gz -f '%CHROM %POS %INFO/CLNSIG\n'
$ bcftools query -R tocheck.pos database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %ID \n' | grep '^NC'
#+RESULTS:
**** DONE Understand why the new version gives a different result
CLOSED: [2022-12-11 Sun 20:11]
***** DONE Same dbsnp and clinvar version?
CLOSED: [2022-12-10 Sat 23:02]
Clinvar differs!
  $ bcftools stats clinvar.gz
  clinvar (Alexis)
SN	0	number of samples:	0
SN	0	number of records:	1492828
SN	0	number of no-ALTs:	965
SN	0	number of SNPs:	1338007
SN	0	number of MNPs:	5562
SN	0	number of indels:	144580
SN	0	number of others:	3714
SN	0	number of multiallelic sites:	0
SN	0	number of multiallelic SNP sites:	0
clinvar (new)
SN	0	number of samples:	0
SN	0	number of records:	1493470
SN	0	number of no-ALTs:	965
SN	0	number of SNPs:	1338561
SN	0	number of MNPs:	5565
SN	0	number of indels:	144663
SN	0	number of others:	3716
SN	0	number of multiallelic sites:	0
SN	0	number of multiallelic SNP sites:	0
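Comparing the two `SN` summaries by eye is error-prone, but the differences are easy to compute. Below is a minimal Python sketch. It assumes the tab-separated `SN <id> <label>: <value>` layout shown above, and its function names are hypothetical.

```python
def parse_sn(stats_text: str) -> dict:
    """Map each `SN` label of a bcftools stats report (colon removed) to its value."""
    counts = {}
    for line in stats_text.splitlines():
        fields = line.strip().split("\t")
        if len(fields) == 4 and fields[0] == "SN":
            counts[fields[2].rstrip(":")] = int(fields[3])
    return counts


def diff_sn(old: dict, new: dict) -> dict:
    """new - old for every label present in both summaries."""
    return {label: new[label] - old[label] for label in old if label in new}
```

The two summaries above give the following differences. The new ClinVar has 642 more records: 554 SNPs, 3 MNPs, 83 indels and 2 others. The 965 no-ALT records are unchanged.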
***** DONE Update clinvar and dbSNP to work on the same databases
CLOSED: [20

[21.5177]

[6.46130]

ULED: <2023-09-10 Sun>
***** DONE Email Paul for validation
CLOSED: [2023-09-11 Mon 19:11] SCHEDULED: <2023-09-10 Sun>
**** DONE Use spliceAI >= 0.2 for filtering instead of spip
CLOSED: [2023-09-11 Mon 21:48] SCHEDULED: <2023-09-11 Mon>
**** DONE Rerun Sanger tests with spliceAI
CLOSED: [2023-09-14 Thu 22:45] SCHEDULED: <2023-09-11 Mon>
**** DONE Fix the recessive column
CLOSED: [2023-09-14 Thu 22:57] SCHEDULED: <2023-09-14 Thu>
either 1/1 or 1/2,
or 0/1 with 2 variants per gene
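The rule above can be written as a small predicate. Below is a minimal Python sketch that rests on two assumptions the note does not spell out. First, "2 variants per gene" is read as at least two heterozygous 0/1 variants in the same gene, i.e. a possible compound heterozygote. Second, phased or reversed genotypes (`0|1`, `1/0`) are normalised before the comparison. All names are hypothetical and are not the pipeline's.

```python
from collections import Counter


def normalize_gt(gt: str) -> str:
    """Unphase and order alleles: '0|1' and '1/0' -> '0/1'; '.' sorts last."""
    alleles = gt.replace("|", "/").split("/")
    return "/".join(sorted(alleles, key=lambda a: int(a) if a.isdigit() else float("inf")))


def recessive_flags(variants: list) -> list:
    """Flag (gene, genotype) pairs compatible with recessive inheritance.

    Flagged: 1/1, 1/2, or 0/1 when the same gene carries at least two
    0/1 variants (assumed reading of "2 variants per gene").
    """
    het_per_gene = Counter(gene for gene, gt in variants if normalize_gt(gt) == "0/1")
    flags = []
    for gene, gt in variants:
        g = normalize_gt(gt)
        flags.append(g in ("1/1", "1/2") or (g == "0/1" and het_per_gene[gene] >= 2))
    return flags
```

For example, `recessive_flags([("GENE1", "1/1"), ("GENE2", "0/1"), ("GENE3", "0|1"), ("GENE3", "0/1")])` returns `[True, False, True, True]`. The single heterozygous variant in GENE2 is not flagged.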
*** KILL Compare annotations on 63003856
CLOSED: [2023-08-28 Mon 17:28]
**** Rerun the new pipeline
*** KILL Old version
CLOSED: [2023-08-28 Mon 17:24]
**** KILL HGVS
CLOSED: [2023-08-28 Mon 17:24]
**** KILL Filter after VEP
CLOSED: [2023-08-28 Mon 17:24]
**** KILL OMIM
CLOSED: [2023-08-28 Mon 17:24]
**** KILL clinvar
CLOSED: [2023-08-28 Mon 17:24]
**** KILL ACMG incidental
CLOSED: [2023-08-28 Mon 17:24]
**** KILL Grantham
CLOSED: [2023-08-28 Mon 17:24]
**** KILL LRG
CLOSED: [2023-04-18 Tue 17:22] SCHEDULED: <2023-04-18 Tue>
Discussed with Alexis: no longer up to date
**** KILL Gnomad
CLOSED: [2023-08-28 Mon 17:24]
*** DONE Reorder the columns :annotation:
CLOSED: [2023-08-31 Thu 10:38] SCHEDULED: <2023-08-28 Mon>
No OMIM, no CADD, no spliceAI
*** DONE Add gnomAD v3 :gnomadv3:
CLOSED: [2023-10-01 Sun 15:34] SCHEDULED: <2023-09-29 Fri>
/Entered on/ [2023-09-29 Fri 22:38]
After discussion with Mathieu about some regions that were corrected in v3!
VEP uses v2 for exomes and v3 for genomes
It will be missing for patients 1-107
Until now we did not have it in the VEP output
**** DONE Test on 1 patient
CLOSED: [2023-09-30 Sat 23:54] SCHEDULED: <2023-09-29 Fri>
**** DONE Resume the batch run
CLOSED: [2023-10-01 Sun 15:34]
** DONE Port Alexis's version exactly to Helios
CLOSED: [2023-01-14 Sat 17:56]
Branch "prod"
** KILL Test Alexis's version with Nix
CLOSED: [2023-06-14 Wed 22:37]
*** DONE Add clinvar
CLOSED: [2022-11-13 Sun 19:37]
*** DONE Alignment
CLOSED: [2022-11-13 Sun 12:52]
*** DONE Haplotype caller
CLOSED: [2022-11-13 Sun 13:00]
*** KILL Filter
CLOSED: [2023-06-14 Wed 22:37]
- [X] depth
- [ ] common snp not path
Problem with the ID list
**** KILL variant annotation
CLOSED: [2023-06-14 Wed 22:37]
Needs vep
*** KILL Variant calling
CLOSED: [2023-06-14 Wed 22:37]
** KILL Test sarek
CLOSED: [2023-08-12 Sat 15:53]
#+begin_src sh
 module load apptainer/1.1.8
 nextflow run nf-core/sarek -profile test,singularity --outdir test-sarek
#+end_src
The dependencies do not download correctly, so we extract them by hand
#+begin_src sh
 rg -IN galaxyproject modules  | sed 's/ //g;s/:$//' | sort | uniq > deps.txt
#+end_src
 Manual cleanup
 Then
 #+begin_src sh
 cat deps.txt | xargs -L1 singularity pull
 #+end_src
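A minimal sketch of the same extraction that avoids most of the manual cleanup, assuming the nf-core module files quote each Singularity image as a full `https://depot.galaxyproject.org/singularity/<tool>:<version>` URL. The helper name `list_galaxy_images` is hypothetical, and `grep -o` stands in for `rg` so that only coreutils and grep are needed.

#+begin_src sh
# Hypothetical helper: print each unique depot.galaxyproject.org Singularity
# image URL referenced under a modules directory (default: ./modules).
# Assumption: the URLs are delimited by quotes or spaces in the .nf files,
# so -o keeps only the URL and no quotes or trailing ':' need stripping.
list_galaxy_images() {
    dir="${1:-modules}"
    grep -rhoE "https://depot\.galaxyproject\.org/singularity/[^'\" ]+" "$dir" \
        | sort -u
}

# Usage: dry run first, then the actual pull (one image per line)
#   list_galaxy_images modules > deps.txt
#   xargs -L1 echo singularity pull < deps.txt
#   xargs -L1 singularity pull < deps.txt
#+end_src

Matching the URL itself, rather than stripping characters from the whole line, is what removes the leftover quotes; `sort -u` replaces `sort | uniq`.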
** DONE Samplesheet support
CLOSED: [2023-08-03 Thu 14:24] SCHEDULED: <2023-08-03 Thu 13:00>
/Entered on/ [2023-08-03 Thu 13:12]
** DONE Small dataset: chr22 on HG001
CLOSED: [2023-08-05 Sat 14:21] SCHEDULED: <2023-08-05 Sat>
** DONE Fix OMIM annotation: missing for NMNAT1
CLOSED: [2023-09-16 Sat 22:47] SCHEDULED: <2023-09-16 Sat>
/Entered on/ [2023-09-16 Sat 19:32]
** PROJ Look at the depth of the reported variants
/Entered on/ [2023-10-05 Thu 21:44]
* Documentation
:PROPERTIES:
:CATEGORY: doc
:END:
** DONE Installation procedure for nix + dependencies on the CHU VM
CLOSED: [2023-04-22 Sat 15:27] SCHEDULED: <2023-04-13 Thu>
* Bibliography
** DONE Finish [cite:@alser2021]
CLOSED: [2023-09-26 Tue 11:26] SCHEDULED: <2023-09-22 Fri>
* Manuscript
:PROPERTIES:
:CATEGORY: manuscript
:END:
** DONE Pipeline flowchart (with T2T)
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
** DONE Figure: number of publications per aligner
CLOSED: [2023-09-19 Tue 16:54] SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
** DONE Bibliography: aligner performance
CLOSED: [2023-10-06 Fri 16:51] SCHEDULED: <2023-10-01 Sun>
** TODO Bibliography: variant calling
SCHEDULED: <2023-10-01 Sun>
** TODO Figure: number of publications per variant caller
SCHEDULED: <2023-10-13 Fri>
/Entered on/ [2023-09-19 Tue 08:43]
** KILL Figure: number of articles citing the main aligners per year
CLOSED: [2023-10-11 Wed 23:54] SCHEDULED: <2023-10-03 Tue>
We would need a local copy of PubMed, otherwise it takes 10,000 requests per aligner!
** DONE Figure: number of articles citing the main aligners
CLOSED: [2023-10-12 Thu 23:58] SCHEDULED: <2023-10-12 Thu>
We would need a local copy of PubMed, otherwise it takes 10,000 requests per aligner!
We rely on
** TODO Figure: number of exomes per year
SCHEDULED: <2023-10-18 Wed>
/Entered on/ [2023-09-19 Tue 08:43]
* Tests :tests:
** KILL Non-regression: prod version
CLOSED: [2023-05-23 Tue 08:46]
*** DONE ID common snp
CLOSED: [2022-11-19 Sat 21:36]
#+begin_src
$ wc -l ID_of_common_snp.txt
23194290 ID_of_common_snp.txt
$ wc -l /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
23194290 /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
#+end_src
*** DONE ID common snp not clinvar patho
CLOSED: [2022-12-11 Sun 20:11]
**** DONE Checking the problem
CLOSED: [2022-12-11 Sun 16:30]
On J:
21155134 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref
"Non-regression" version
21155076 database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt
New version
23193391 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt
After removing duplicates
$ sort database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt | uniq > old.txt
$ wc -l old.txt
21107097 old.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt | uniq > new.txt
$ wc -l new.txt
21174578 new.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref | uniq > ref.txt
$ wc -l ref.txt
21107155 ref.txt
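The gap between the raw and deduplicated counts above comes from rsIDs listed more than once. A minimal sketch to quantify it in one call, assuming one ID per line; `count_dups` is a hypothetical helper name.

#+begin_src sh
# Hypothetical helper: print "<total lines> <distinct lines> <IDs seen more than once>"
# for a file with one ID per line.
count_dups() {
    total=$(wc -l < "$1")
    distinct=$(sort -u "$1" | wc -l)
    repeated=$(sort "$1" | uniq -d | wc -l)
    # Arithmetic expansion drops the padding some wc implementations add
    echo "$((total)) $((distinct)) $((repeated))"
}

# Usage:
#   count_dups database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt
#+end_src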
Looking at the difference
$ comm -23 ref.txt old.txt
rs1052692
rs1057518973
rs1057518973
rs11074121
rs112848754
rs12573787
rs145033890
rs147889095
rs1553904159
rs1560294695
rs1560296615
rs1560310926
rs1560325547
rs1560342418
rs1560356225
rs1578287542
...
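`comm -23 A B` keeps only the lines present in A and absent from B, and it expects both inputs sorted with the same collation, which is why both lists were passed through `sort` first. A small self-contained check of that behaviour on toy files (the IDs are placeholders, not data from this analysis):

#+begin_src sh
# Toy inputs, sorted with the same locale as comm will use
tmp=$(mktemp -d)
printf 'rs1\nrs2\nrs3\nrs4\n' | LC_ALL=C sort > "$tmp/ref.txt"
printf 'rs2\nrs4\nrs5\n'      | LC_ALL=C sort > "$tmp/old.txt"

# -2 hides lines only in old.txt, -3 hides lines common to both:
# what remains is in ref.txt but missing from old.txt
only_in_ref=$(LC_ALL=C comm -23 "$tmp/ref.txt" "$tmp/old.txt")
echo "$only_in_ref"   # prints rs1 and rs3, one per line
#+end_src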
Look up the first one
$ bcftools query -i 'ID="rs1052692"' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 1619351 C A,T
It is indeed pathogenic...
$ bcftools query -i 'POS=1619351' database/clinvar/clinvar.vcf.gz -f '%CHROM %POS %REF %ALT %INFO/CLNSIG\n'
19 1619351 C T Conflicting_interpretations_of_pathogenicity
Check all the others
$ comm -23 ref.txt old.txt > tocheck.txt
Generate the regions to check (dbSNP's NC accession, tab, position)
$ bcftools query -i 'ID=@tocheck.txt' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM\t%POS\n' > tocheck.pos
Generate the inverse mapping (chromosome number -> NC accession)
$ awk ' { t = $1; $1 = $2; $2 = t; print; } ' database/RefSeq/refseq_to_number_only_consensual.txt > mapping.txt
Remap ClinVar to NC accessions (-Oz explicitly requests bgzip-compressed output, which tabix needs)
$ bcftools annotate --rename-chrs mapping.txt database/clinvar/clinvar.vcf.gz -Oz -o clinvar_remapped.vcf.gz
$ tabix clinvar_remapped.vcf.gz
Finally, look up the classification in ClinVar
$ bcftools query -R tocheck.pos clinvar_remapped.vcf.gz -f '%CHROM %POS %INFO/CLNSIG\n'
$ bcftools query -R tocheck.pos database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %ID \n' | grep '^NC'
#+RESULTS:
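The inverse mapping only swaps the two columns, so that `bcftools annotate --rename-chrs` receives `old_name new_name` pairs (ClinVar's `19` becomes dbSNP's `NC_000019.10`). A minimal check on a toy mapping, assuming the RefSeq file has exactly two whitespace-separated columns (NC accession, then chromosome number); `print $2, $1` is equivalent to the three-assignment swap in the awk command above.

#+begin_src sh
tmp=$(mktemp -d)
# Toy stand-in for refseq_to_number_only_consensual.txt (assumed layout: accession number)
printf 'NC_000019.10 19\nNC_000022.11 22\n' > "$tmp/refseq_to_number.txt"

# Number first, accession second: old name -> new name for --rename-chrs
awk '{ print $2, $1 }' "$tmp/refseq_to_number.txt" > "$tmp/mapping.txt"
cat "$tmp/mapping.txt"
# 19 NC_000019.10
# 22 NC_000022.11
#+end_src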
**** DONE Understand why the new version gives a different result
CLOSED: [2022-12-11 Sun 20:11]
***** DONE Same dbSNP and ClinVar versions?
CLOSED: [2022-12-10 Sat 23:02]
ClinVar differs!
$ bcftools stats clinvar.gz
| Metric                           | ClinVar (Alexis) | ClinVar (new) |
|----------------------------------+------------------+---------------|
| number of samples                |                0 |             0 |
| number of records                |          1492828 |       1493470 |
| number of no-ALTs                |              965 |           965 |
| number of SNPs                   |          1338007 |       1338561 |
| number of MNPs                   |             5562 |          5565 |
| number of indels                 |           144580 |        144663 |
| number of others                 |             3714 |          3716 |
| number of multiallelic sites     |                0 |             0 |
| number of multiallelic SNP sites |                0 |             0 |
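To spot which counters differ between two `bcftools stats` reports without comparing them by eye, the `SN` lines can be matched on their label. A minimal sketch, assuming each report was saved to a file and uses the tab-separated `SN<TAB>0<TAB>label:<TAB>value` layout that `bcftools stats` writes; the helper `sn_diff` and the file names in the usage lines are hypothetical.

#+begin_src sh
# Hypothetical helper: print "label<TAB>value_a<TAB>value_b" for every SN
# counter whose value differs between two bcftools stats reports.
sn_diff() {
    awk -F '\t' '
        $1 != "SN" { next }                 # keep only summary-number lines
        FNR == NR  { a[$3] = $4; next }     # first report: store value by label
        ($3 in a) && a[$3] != $4 { print $3 "\t" a[$3] "\t" $4 }
    ' "$1" "$2"
}

# Usage:
#   bcftools stats clinvar_alexis.vcf.gz > alexis.stats
#   bcftools stats clinvar_new.vcf.gz > new.stats
#   sn_diff alexis.stats new.stats
#+end_src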
***** DONE Update ClinVar and dbSNP so that both work on the same databases
CLOSED: [20

Replacement in projects/bisonex.org at line 76 [20.35]
B:BD[14.29646] → [5.3844:17122]
[14.29646]