*** TODO New version with VEP
Example with --custom
https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html
**** DONE Add spliceAI
CLOSED: [2023-05-18 Thu 11:02] SCHEDULED: <2023-04-30 Sun>
VEP plugin
***** DONE Download the data
CLOSED: [2023-05-11 Thu 19:01]
Hard to automate: the download link is temporary...
***** DONE Plugin
CLOSED: [2023-05-11 Thu 20:16]
***** DONE Split the score into several columns
CLOSED: [2023-05-11 Thu 20:16]
Test with this file, which gives one line with an annotation and one without:
#CHROM POS ID REF ALT
1 9091 . A C
1 69091 . A C
and
#+begin_src sh
rm -f postvep.tsv* && vep -i testspliceai.vcf.gz -o postvep.tsv --tab --dir 109 --merged --pick --use_given_ref --offline --plugin SpliceAI,snv=spliceai_scores.raw.snv.hg38.vcf.gz,indel=spliceai_scores.raw.indel.hg38.vcf.gz
#+end_src
#+begin_src
$ bgzip postvep.tsv
$ python spliceai.py
$ cat postvep2.tsv
,variation,Location,Allele,Gene,Feature,Feature_type,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,Codons,Existing_variation,IMPACT,DISTANCE,STRAND,FLAGS,REFSEQ_MATCH,SOURCE,REFSEQ_OFFSET,SpliceAI_AG,SpliceAI_AL,SpliceAI_DG,SpliceAI_DL
0,1_9091_A/C,1:9091,C,ENSG00000290825,ENST00000456328,Transcript,upstream_gene_variant,-,-,-,-,-,-,MODIFIER,2778,1,-,-,Ensembl,-,,,,
1,1_69091_A/C,1:69091,C,ENSG00000186092,ENST00000641515,Transcript,missense_variant,124,64,22,M/L,Atg/Ctg,-,MODERATE,-,1,-,-,Ensembl,-,0.01,0.00,0.00,0.01
#+end_src
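The content of spliceai.py is not recorded in these notes. A minimal sketch of the column split, assuming the plugin writes one packed field of the form SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL in a column named SpliceAI_pred (both the layout and the column name are assumptions here), and using tab-separated output rather than the comma-separated output shown above:

#+begin_src python
import csv
import io

# Output column names taken from the postvep2.tsv header above.
SCORE_COLUMNS = ["SpliceAI_AG", "SpliceAI_AL", "SpliceAI_DG", "SpliceAI_DL"]


def split_spliceai(value):
    """Return the four delta scores as strings, or empty strings if unannotated."""
    if value in ("", "-"):
        return [""] * 4
    # Assumed layout: SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL
    fields = value.split("|")
    return fields[1:5]


def add_score_columns(tsv_text, column="SpliceAI_pred"):
    """Replace the packed column (hypothetical name) with one column per score."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = [name for name in reader.fieldnames if name != column]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(kept + SCORE_COLUMNS)
    for row in reader:
        writer.writerow([row[name] for name in kept] + split_spliceai(row[column]))
    return out.getvalue()
#+end_src

Lines without an annotation keep four empty cells, which matches the first data row of the output above.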
Test
cp work/bf/437ae511958509e43072f032f4d495/small.tab.gz tests/vep-spip.tab.gz
cp work/d5/3b1244b5ae83d54409ee0d456e8c55/small_cadd.tab.gz tests/vep-cadd-splice.tab.gz
**** STRT Nix package for spliceAI
We use the tensorflow provided by nix (spliceai branch)
***** DONE Unoptimized CPU version
CLOSED: [2023-09-23 Sat 21:34]
****** DONE Check hg19 and hg38 annotation
CLOSED: [2023-09-23 Sat 18:49]
****** DONE In-house T2T annotation
CLOSED: [2023-09-23 Sat 21:33] SCHEDULED: <2023-09-23 Sat>
***** DONE Optimized CPU version
CLOSED: [2023-09-23 Sat 21:34]
Enable the flag in the nix package
***** DONE CPU version: chr20 test
CLOSED: [2023-09-23 Sat 22:25] SCHEDULED: <2023-09-23 Sat>
10 min?
***** DONE CPU version: full NA12878 sanger: killed because it took too long
CLOSED: [2023-09-24 Sun 08:22] SCHEDULED: <2023-09-23 Sat>
10 min?
***** WAIT GPU with the version on the mesocentre: email sent
#+begin_src sh
module unload nix/2.11.0
module load anaconda3@2021.05/gcc-12.1.0
module load deep/tensorflow-gpu
pip install spliceai
#+end_src
Then we test on the gpu queue
#+begin_src sh
srun -p gpu -t 4:00:00 --gres=gpu:1 --pty bash
cd /Work/Users/apraga/bisonex/tests/spliceai/
module load deep/tensorflow-gpu
module unload nix/2.11.0
time spliceai -I NA12878-sanger-chr20-T2T.vep.vcf.gz -O output-20-2.vcf -R /Work/Groups/bisonex/data/fasta/chm13v2.0/chm13v2.0.fa -A ~/t2t.txt
#+end_src
Failure: DNN library not found...
#+begin_quote
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-24 09:37:45.892545: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-09-24 09:37:47.759421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 38188 MB memory: -> device: 0, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:21:00.0, compute capability: 8.0
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
2023-09-24 09:37:54.143021: I tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:637] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2023-09-24 09:37:54.217160: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:429] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2023-09-24 09:37:54.217220: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:438] Possibly insufficient driver version: 510.108.3
2023-09-24 09:37:54.217245: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at conv_ops.cc:1068 : UNIMPLEMENTED: DNN library is not found.
2023-09-24 09:37:54.217262: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:GPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): UNIMPLEMENTED: DNN library is not found.
[[{{node model_1/conv1d_3/Conv1D}}]]
Traceback (most recent call last):
File "/Home/Users/apraga/.local/bin/spliceai", line 8, in <module>
sys.exit(main())
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/__main__.py", line 72, in main
scores = get_delta_scores(record, ann, args.D, args.M)
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/utils.py", line 159, in get_delta_scores
y_ref = np.mean([ann.models[m].predict(x_ref) for m in range(5)], axis=0)
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/utils.py", line 159, in <listcomp>
y_ref = np.mean([ann.models[m].predict(x_ref) for m in range(5)], axis=0)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:
Detected at node 'model_1/conv1d_3/Conv1D' defined at (most recent call last):
File "/Home/Users/apraga/.local/bin/spliceai", line 8, in <module>
sys.exit(main())
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/__main__.py", line 72, in main
scores = get_delta_scores(record, ann, args.D, args.M)
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/utils.py", line 159, in get_delta_scores
y_ref = np.mean([ann.models[m].predict(x_ref) for m in range(5)], axis=0)
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/utils.py", line 159, in <listcomp>
y_ref = np.mean([ann.models[m].predict(x_ref) for m in range(5)], axis=0)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/engine/training.py", line 2382, in predict
tmp_batch_outputs = self.predict_function(iterator)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/engine/training.py", line 2169, in predict_function
return step_function(self, iterator)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/engine/training.py", line 2155, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/engine/training.py", line 2143, in run_step
outputs = model.predict_step(data)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/engine/training.py", line 2111, in predict_step
return self(x, training=False)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/engine/training.py", line 558, in __call__
return super().__call__(*args, **kwargs)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/engine/base_layer.py", line 1145, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 96, in error_handler
return fn(*args, **kwargs)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/engine/functional.py", line 512, in call
return self._run_internal_graph(inputs, training=training, mask=mask)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/engine/functional.py", line 669, in _run_internal_graph
outputs = node.layer(*args, **kwargs)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/engine/base_layer.py", line 1145, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 96, in error_handler
return fn(*args, **kwargs)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/layers/convolutional/base_conv.py", line 290, in call
outputs = self.convolution_op(inputs, self.kernel)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/layers/convolutional/base_conv.py", line 262, in convolution_op
return tf.nn.convolution(
Node: 'model_1/conv1d_3/Conv1D'
DNN library is not found.
[[{{node model_1/conv1d_3/Conv1D}}]] [Op:__inference_predict_function_22195]
#+end_quote
***** TODO With pip: failure
2023-09-24 08:28:46.361434: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU.
***** KILL Test conda: failure
CLOSED: [2023-09-23 Sat 21:43] SCHEDULED: <2023-09-23 Sat>
The installation fails
#+begin_quote
- feature:/linux-64::__glibc==2.28=0
- python=3.11 -> libgcc-ng[version='>=11.2.0'] -> __glibc[version='>=2.17']
- spliceai -> tensorflow[version='>=1.13.0'] -> __cuda
- spliceai -> tensorflow[version='>=1.13.0'] -> __glibc[version='>=2.17']
Your installed version is: 2.28
#+end_quote
**** TODO Add LOEUF and pLI
VEP plugin
**** TODO NMD
VEP plugin
**** KILL Add LOEUF
CLOSED: [2023-04-19 Wed 16:32]
VEP plugin
**** DONE Spip
CLOSED: [2023-05-01 Mon 23:07] SCHEDULED: <2023-04-30 Sun>
BED does not seem to work well (a region must be defined)
VCF: too much information
Caution: several transcripts but identical results. We remove the duplicates.
***** DONE Separate interpretation + score + confidence interval
CLOSED: [2023-05-01 Mon 23:07] SCHEDULED: <2023-04-30 Sun>
Tests:
in tests/
vep -i 63004925-small.vcf -o postvep.vcf --vcf --fasta genomeRef.fna --dir 109 --merged --pick --offline --custom ../script/spip_annotation.vcf.gz,SPIP,vcf,exact,0,spipInterp,spipScore,spipConfidence
***** DONE Score
CLOSED: [2023-04-22 Sat 15:30]
**** DONE CADD: replace with the VEP plugin
CLOSED: [2023-05-07 Sun 14:45] SCHEDULED: <2023-05-07 Sun>
***** Test
#+begin_src
vep -i test.vcf -o lol.vcf --offline --dir /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz --dir_plugins ../VEP_plugins/ -v
#+end_src
Test
#+begin_src sh
vep --id "1 230710048 230710048 A/G 1" --offline --dir /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz --hgvsg --plugin pLI --plugin LOEUF -o lol
#+end_src
CSQ=G|missense_variant|MODERATE|AGT|ENSG00000135744|Transcript|ENST00000366667|protein_coding|2/5||||843|776|259|M/T|aTg/aCg|||-1||HGNC|HGNC:333||Ensembl||A|A||1:g.230710048A>G|0.347|-0.277922|
Matches https://www.ensembl.org/Homo_sapiens/Tools/VEP/Results?tl=I7ZsIbrj14P6lD43-9115494
***** DONE Use whole genome
CLOSED: [2023-04-29 Sat 15:46]
***** KILL Rename the chromosomes beforehand...
CLOSED: [2023-05-01 Mon 09:14] SCHEDULED: <2023-04-30 Sun>
Too slow!
- Downloading CADD: 4h20
- renaming the chromosomes for the SNVs: 6h20
- tabix on the SNVs: job killed after 21h...
***** DONE Annotate separately and merge the tables
CLOSED: [2023-05-07 Sun 14:45] SCHEDULED: <2023-05-01 Mon>
NB: we could filter CADD with tabix to restrict it to our variants
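A possible sketch of that idea, not something that was run here: build a region list from our VCF and hand it to tabix. It relies on the tabix -R option, which accepts a tab-delimited file with CHROM and POS columns (1-based). The file names in the comment are hypothetical, and the chromosome names must already match those used in the CADD file.

#+begin_src python
def vcf_to_regions(vcf_lines):
    """Yield one 'CHROM<TAB>POS' line per distinct variant position."""
    seen = set()
    for line in vcf_lines:
        # Skip VCF header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t")[:2]
        key = (chrom, int(pos))
        if key not in seen:
            seen.add(key)
            yield f"{chrom}\t{pos}"


# Hypothetical usage, once the regions are written to regions.tsv:
#   tabix -h -R regions.tsv cadd_snv.tsv.gz | bgzip > cadd_subset.tsv.gz
#+end_src

Only exact positions are listed, so indels whose CADD record starts at a different coordinate would need a wider window.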
**** DONE clinvar
CLOSED: [2023-04-22 Sat 15:31]
**** KILL Check HGVS results with mutalyzer
CLOSED: [2023-05-01 Mon 09:26]
**** HOLD Parallelization
***** HOLD Per chromosome with the VEP workflow
https://github.com/Ensembl/ensembl-vep/blob/release/109/nextflow/workflows/run_vep.nf
***** HOLD With the --fork option
**** DONE Use the nf-core version of VEP
CLOSED: [2023-05-13 Sat 18:27] SCHEDULED: <2023-05-07 Sun>
**** DONE OMIM
CLOSED: [2023-08-31 Thu 10:38] SCHEDULED: <2023-08-29 Tue>
**** DONE pLI and LOEUF from gnomad
CLOSED: [2023-08-31 Thu 10:38] SCHEDULED: <2023-08-29 Tue>
**** DONE Grantham
CLOSED: [2023-08-31 Thu 22:08] SCHEDULED: <2023-08-30 Wed>
**** DONE Fix spliceAI
CLOSED: [2023-08-31 Thu 13:51] SCHEDULED: <2023-08-31 Thu>
No annotation
- chromosome naming? Tried 1 instead of chr1: same result. And it works for CADD
- index?
- re-download
- index it ourselves
**** DONE Remove the duplicated spip score
CLOSED: [2023-08-31 Thu 14:17] SCHEDULED: <2023-08-31 Thu>
**** DONE Check variant 63126867
CLOSED: [2023-08-31 Thu 10:52] SCHEDULED: <2023-08-31 Thu>
**** DONE Add truncating or not
CLOSED: [2023-08-31 Thu 22:08] SCHEDULED: <2023-08-31 Thu>
**** DONE Add recessive
CLOSED: [2023-08-31 Thu 22:08] SCHEDULED: <2023-08-31 Thu>
**** KILL Fix allelic depth
CLOSED: [2023-08-31 Thu 11:18] SCHEDULED: <2023-08-31 Thu>
Problem caused by LibreOffice
**** DONE Regenerate annotation for na12878, inserted and PEX1 patient
CLOSED: [2023-08-31 Thu 22:08] SCHEDULED: <2023-08-31 Thu>
**** TODO ACMG incidental
**** DONE VCF output (to get the allele fraction AF)
CLOSED: [2023-08-28 Mon 17:22]
**** DONE VCF -> tsv with bcftools
CLOSED: [2023-08-29 Tue 11:03] SCHEDULED: <2023-08-28 Mon>
**** DONE A single transcript after VEP with filter_vep :filter:
CLOSED: [2023-08-29 Tue 11:03] SCHEDULED: <2023-08-28 Mon>
With the update to VEP 110, pick_flag seems to work.
***** DONE chr20 test: no "lost" variants
CLOSED: [2023-08-28 Mon 17:31] SCHEDULED: <2023-08-28 Mon>
unlike the result sent to Alexis by email
#+begin_src sh :dir out/annotate
bcftools +counts vep/NA12878-sanger-chr20-GRCh38/NA12878-sanger-chr20-GRCh38.vep.vcf.gz
#+end_src
Number of samples: 1
Number of SNPs: 123
Number of INDELs: 32
Number of MNPs: 53
Number of others: 0
Number of sites: 208
#+begin_src sh
filter_vep -i vep/NA12878-sanger-chr20-GRCh38/NA12878-sanger-chr20-GRCh38.vep.vcf.gz --filter 'PICK' | bcftools +counts
#+end_src
Number of samples: 1
Number of SNPs: 123
Number of INDELs: 32
Number of MNPs: 53
Number of others: 0
Number of sites: 208
Second check
#+begin_src sh :dir out/annotate
filter_vep -i vep/NA12878-sanger-chr20-GRCh38/NA12878-sanger-chr20-GRCh38.vep.vcf.gz --filter 'PICK' --soft_filter | grep fail
#+end_src
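The before/after comparison of the bcftools +counts outputs above was done by eye. A small sketch that automates it, assuming the plain "Number of X: n" text format shown in this section (the function names are illustrative):

#+begin_src python
import re


def parse_counts(text):
    """Parse `bcftools +counts` text output into a {label: count} dict."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"Number of (.+?):\s*(\d+)\s*$", line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def changed_counts(before, after):
    """Return {label: (before, after)} for every category whose count differs."""
    return {
        label: (value, after.get(label))
        for label, value in before.items()
        if after.get(label) != value
    }
#+end_src

An empty dictionary from changed_counts means that filtering on PICK dropped no record.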
***** DONE NA12878 + sanger variants test: variants lost with --pick?
CLOSED: [2023-08-29 Tue 10:36] SCHEDULED: <2023-08-28 Mon>
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/out/annotate
~/.nix-profile/bin/bcftools +counts vep/NA12878-sanger-all-GRCh38/NA12878-sanger-all-GRCh38.vep.vcf.gz
#+end_src
#+RESULTS:
| Number | of | samples: | 1 |
| Number | of | SNPs: | 6293 |
| Number | of | INDELs: | 1515 |
| Number | of | MNPs: | 1588 |
| Number | of | others: | 0 |
| Number | of | sites: | 9322 |
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/out/annotate
~/.nix-profile/bin/filter_vep -i vep/NA12878-sanger-all-GRCh38/NA12878-sanger-all-GRCh38.vep.vcf.gz --filter 'PICK' | bcftools +counts
#+end_src
| Number | of | samples: | 1 |
| Number | of | SNPs: | 6293 |
| Number | of | INDELs: | 1515 |
| Number | of | MNPs: | 1588 |
| Number | of | others: | 0 |
| Number | of | sites: | 9322 |
***** DONE NA12878 + sanger variants test: check output with julia: ok
CLOSED: [2023-08-29 Tue 10:21] SCHEDULED: <2023-08-28 Mon>
143/146 variants, as before
***** DONE Rerun in T2T to check compatibility :T2T:
CLOSED: [2023-08-29 Tue 11:03] SCHEDULED: <2023-08-29 Tue>
**** DONE Rerun the sanger tests on NA12878
CLOSED: [2023-09-01 Fri 10:32] SCHEDULED: <2023-08-31 Thu>
2 variants missing after filter_vep
**** DONE Choose the best transcript ourselves
CLOSED: [2023-09-01 Fri 10:32] SCHEDULED: <2023-09-01 Fri>
**** DONE Check that T2T passes
CLOSED: [2023-08-31 Thu 22:10] SCHEDULED: <2023-08-31 Thu>
**** DONE Review transcript choice + filter with Paul
CLOSED: [2023-09-08 Fri 22:46] SCHEDULED: <2023-09-06 Wed>
**** DONE Filter the variants with Alexis's filters and keep all results
CLOSED: [2023-09-10 Sun 15:39] SCHEDULED: <2023-09-09 Sat>
**** DONE Add a MANE SELECT column and keep the others
CLOSED: [2023-09-10 Sun 15:39] SCHEDULED: <2023-09-09 Sat>
**** DONE v1.0
CLOSED: [2023-09-11 Mon 19:11] SCHEDULED: <2023-09-09 Sat>
***** DONE prod branch
CLOSED: [2023-09-10 Sun 15:44] SCHEDULED: <2023-09-09 Sat>
Merge from debug
***** DONE Email Alexis
CLOSED: [2023-09-01 Fri 10:32] SCHEDULED: <2023-08-31 Thu>
***** DONE Rerun the sanger test
CLOSED: [2023-09-11 Mon 19:11] SCHEDULED: <2023-09-10 Sun>
***** DONE Email Paul for validation
CLOSED: [2023-09-11 Mon 19:11] SCHEDULED: <2023-09-10 Sun>
**** DONE Use spliceAI >= 0.2 for the filter instead of spip
CLOSED: [2023-09-11 Mon 21:48] SCHEDULED: <2023-09-11 Mon>
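The notes do not say how the four SpliceAI delta scores are combined for this threshold. A minimal sketch, assuming a variant passes when the largest available score (AG, AL, DG or DL) reaches 0.2, and that unannotated variants do not pass:

#+begin_src python
def passes_spliceai(scores, threshold=0.2):
    """True if any available delta score reaches the threshold.

    `scores` holds the AG, AL, DG and DL values as strings or numbers;
    empty strings, "-" and None mean "no annotation".
    Taking the maximum is an assumption, not confirmed by these notes.
    """
    values = [float(s) for s in scores if s not in ("", "-", None)]
    return bool(values) and max(values) >= threshold
#+end_src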
**** DONE Rerun the sanger tests with spliceAI
CLOSED: [2023-09-14 Thu 22:45] SCHEDULED: <2023-09-11 Mon>
**** DONE Fix the recessive column
CLOSED: [2023-09-14 Thu 22:57] SCHEDULED: <2023-09-14 Thu>
either 1/1 or 1/2,
or 0/1 with 2 variants per gene
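The rule above can be sketched as follows. This is an illustration rather than the pipeline's code: it assumes that "2 variants per gene" counts every variant in the gene (not only heterozygous ones) and that phased genotypes such as 1|0 should be treated like 0/1.

#+begin_src python
from collections import Counter


def normalize_genotype(gt):
    """Drop phasing and sort alleles so that '1|0' and '0/1' compare equal."""
    return "/".join(sorted(gt.replace("|", "/").split("/")))


def recessive_flags(variants):
    """variants: list of (gene, genotype) pairs; returns one bool per variant."""
    per_gene = Counter(gene for gene, _ in variants)
    flags = []
    for gene, genotype in variants:
        gt = normalize_genotype(genotype)
        if gt in ("1/1", "1/2"):
            # Both alleles are non-reference.
            flags.append(True)
        elif gt == "0/1":
            # Heterozygous: flagged only if the gene carries at least 2 variants.
            flags.append(per_gene[gene] >= 2)
        else:
            flags.append(False)
    return flags
#+end_src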
*** KILL Compare the annotations on 63003856
CLOSED: [2023-08-28 Mon 17:28]
**** Rerun the new pipeline
*** KILL Old version
CLOSED: [2023-08-28 Mon 17:24]
**** KILL HGVS
CLOSED: [2023-08-28 Mon 17:24]
**** KILL Filter after VEP
CLOSED: [2023-08-28 Mon 17:24]
**** KILL OMIM
CLOSED: [2023-08-28 Mon 17:24]
**** KILL clinvar
CLOSED: [2023-08-28 Mon 17:24]
**** KILL ACMG incidental
CLOSED: [2023-08-28 Mon 17:24]
**** KILL Grantham
CLOSED: [2023-08-28 Mon 17:24]
**** KILL LRG
CLOSED: [2023-04-18 Tue 17:22] SCHEDULED: <2023-04-18 Tue>
Discussed with Alexis: it is no longer up to date
**** KILL Gnomad
CLOSED: [2023-08-28 Mon 17:24]
*** DONE Reorder the columns :annotation:
CLOSED: [2023-08-31 Thu 10:38] SCHEDULED: <2023-08-28 Mon>
No OMIM, no CADD, no spliceAI
** DONE Port Alexis's version exactly to Helios
CLOSED: [2023-01-14 Sat 17:56]
"prod" branch
** KILL Test Alexis's version with Nix
CLOSED: [2023-06-14 Wed 22:37]
*** DONE Add clinvar
CLOSED: [2022-11-13 Sun 19:37]
*** DONE Alignment
CLOSED: [2022-11-13 Sun 12:52]
*** DONE Haplotype caller
CLOSED: [2022-11-13 Sun 13:00]
*** KILL Filter
CLOSED: [2023-06-14 Wed 22:37]
- [X] depth
- [ ] common snp not path
Problem with the ID list
**** KILL variant annotation
CLOSED: [2023-06-14 Wed 22:37]
Needs vep
*** KILL Variant calling
CLOSED: [2023-06-14 Wed 22:37]
** KILL Test sarek
CLOSED: [2023-08-12 Sat 15:53]
#+begin_src sh
module load apptainer/1.1.8
nextflow run nf-core/sarek -profile test,singularity --outdir test-sarek
#+end_src
The dependencies do not download correctly, so we extract them by hand
#+begin_src sh
rg -IN galaxyproject modules | sed 's/ //g;s/:$//' | sort | uniq > deps.txt
#+end_src
Manual cleanup
Then
#+begin_src sh
cat deps.txt | xargs -L1 singularity pull
#+end_src
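The manual cleanup step could probably be avoided by extracting the image URLs with a regular expression instead of rg and sed. A sketch, assuming the module files reference images under https://depot.galaxyproject.org/singularity/ (an assumption about the nf-core module layout, not verified in these notes):

#+begin_src python
import re

# Matches an image URL up to the next quote or whitespace.
IMAGE_URL = re.compile(r"https://depot\.galaxyproject\.org/singularity/[^\s'\"]+")


def container_urls(text):
    """Return the sorted, de-duplicated image URLs found in `text`."""
    return sorted(set(IMAGE_URL.findall(text)))
#+end_src

The resulting list, written one URL per line to deps.txt, can then be fed to the same xargs call as above.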
** DONE Samplesheet support
CLOSED: [2023-08-03 Thu 14:24] SCHEDULED: <2023-08-03 Thu 13:00>
/Entered on/ [2023-08-03 Thu 13:12]
** DONE Small dataset: chr22 on HG001
CLOSED: [2023-08-05 Sat 14:21] SCHEDULED: <2023-08-05 Sat>
** DONE Fix OMIM annotation: missing for NMNAT1
CLOSED: [2023-09-16 Sat 22:47] SCHEDULED: <2023-09-16 Sat>
/Entered on/ [2023-09-16 Sat 19:32]
* Documentation
:PROPERTIES:
:CATEGORY: doc
:END:
** DONE Installation procedure for nix + dependencies for the CHU VM
CLOSED: [2023-04-22 Sat 15:27] SCHEDULED: <2023-04-13 Thu>
* Bibliography
** TODO Finish [cite:@alser2021]
SCHEDULED: <2023-09-22 Fri>
* Manuscript
:PROPERTIES:
:CATEGORY: manuscript
:END:
** DONE Pipeline flowchart (with T2T)
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
** DONE Figure: number of publications per aligner
CLOSED: [2023-09-19 Tue 16:54] SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
** TODO Figure: number of publications per variant caller
SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
** TODO Figure: number of exomes per year
SCHEDULED: <2023-09-30 Sat>
/Entered on/ [2023-09-19 Tue 08:43]
* Tests :tests:
** KILL Non-regression: prod version
CLOSED: [2023-05-23 Tue 08:46]
*** DONE ID common snp
CLOSED: [2022-11-19 Sat 21:36]
#+begin_src
$ wc -l ID_of_common_snp.txt
23194290 ID_of_common_snp.txt
$ wc -l /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
23194290 /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
#+end_src
*** DONE ID common snp not clinvar patho
CLOSED: [2022-12-11 Sun 20:11]
**** DONE Verify the problem
CLOSED: [2022-12-11 Sun 16:30]
On J:
21155134 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref
"Non-regression" version
21155076 database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt
New version
23193391 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt
If we remove the duplicates
$ sort database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt | uniq > old.txt
$ wc -l old.txt
21107097 old.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt | uniq > new.txt
$ wc -l new.txt
21174578 new.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref | uniq > ref.txt
$ wc -l ref.txt
21107155 ref.txt
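The same de-duplication, together with a set difference in the spirit of comm -23, can be sketched in Python. This is an illustration only. Unlike sort | uniq, it strips surrounding whitespace, so IDs that differ only by trailing spaces or carriage returns collapse into one entry.

#+begin_src python
def unique_ids(lines):
    """Return the set of non-empty, whitespace-stripped IDs."""
    return {line.strip() for line in lines if line.strip()}


def only_in_first(first, second):
    """IDs present in `first` but not in `second`, like `comm -23` on sorted input."""
    return sorted(unique_ids(first) - unique_ids(second))


# Hypothetical usage with the files produced above:
#   with open("ref.txt") as ref, open("old.txt") as old:
#       print(only_in_first(ref, old))
#+end_src

With about 21 million IDs per file, both sets have to fit in memory at the same time.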
If we look at the difference
comm -23 ref.txt old.txt
rs1052692
rs1057518973
rs1057518973
rs11074121
rs11284
*** TODO New version with VEP
Example with --custom
https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html
**** DONE Add spliceAI
CLOSED: [2023-05-18 Thu 11:02] SCHEDULED: <2023-04-30 Sun>
VEP plugin
***** DONE Download the data
CLOSED: [2023-05-11 Thu 19:01]
Hard to automate: the download link is temporary...
***** DONE Plugin
CLOSED: [2023-05-11 Thu 20:16]
***** DONE Split the score into several columns
CLOSED: [2023-05-11 Thu 20:16]
Test with this file, which gives one line with an annotation and one without:
#CHROM POS ID REF ALT
1 9091 . A C
1 69091 . A C
and
#+begin_src sh
rm -f postvep.tsv* && vep -i testspliceai.vcf.gz -o postvep.tsv --tab --dir 109 --merged --pick --use_given_ref --offline --plugin SpliceAI,snv=spliceai_scores.raw.snv.hg38.vcf.gz,indel=spliceai_scores.raw.indel.hg38.vcf.gz
#+end_src
#+begin_src
$ bgzip postvep.tsv
$ python spliceai.py
$ cat postvep2.tsv
,variation,Location,Allele,Gene,Feature,Feature_type,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,Codons,Existing_variation,IMPACT,DISTANCE,STRAND,FLAGS,REFSEQ_MATCH,SOURCE,REFSEQ_OFFSET,SpliceAI_AG,SpliceAI_AL,SpliceAI_DG,SpliceAI_DL
0,1_9091_A/C,1:9091,C,ENSG00000290825,ENST00000456328,Transcript,upstream_gene_variant,-,-,-,-,-,-,MODIFIER,2778,1,-,-,Ensembl,-,,,,
1,1_69091_A/C,1:69091,C,ENSG00000186092,ENST00000641515,Transcript,missense_variant,124,64,22,M/L,Atg/Ctg,-,MODERATE,-,1,-,-,Ensembl,-,0.01,0.00,0.00,0.01
#+end_src
Test
cp work/bf/437ae511958509e43072f032f4d495/small.tab.gz tests/vep-spip.tab.gz
cp work/d5/3b1244b5ae83d54409ee0d456e8c55/small_cadd.tab.gz tests/vep-cadd-splice.tab.gz
**** STRT Nix package for spliceAI
We use the tensorflow provided by nix (spliceai branch)
LD_PRELOAD=/lib64/libcuda.so is required at run time
***** DONE Unoptimized CPU version
CLOSED: [2023-09-23 Sat 21:34]
****** DONE Check hg19 and hg38 annotation
CLOSED: [2023-09-23 Sat 18:49]
****** DONE In-house T2T annotation
CLOSED: [2023-09-23 Sat 21:33] SCHEDULED: <2023-09-23 Sat>
***** DONE Optimized CPU version
CLOSED: [2023-09-23 Sat 21:34]
Enable the flag in the nix package
***** DONE CPU version: chr20 test
CLOSED: [2023-09-23 Sat 22:25] SCHEDULED: <2023-09-23 Sat>
10 min?
***** DONE CPU version: full NA12878 sanger: killed because it took too long
CLOSED: [2023-09-24 Sun 08:22] SCHEDULED: <2023-09-23 Sat>
10 min?
***** DONE GPU with the version on the mesocentre: email sent
CLOSED: [2023-09-26 Tue 11:49]
#+begin_src sh
module unload nix/2.11.0
module load anaconda3@2021.05/gcc-12.1.0
module load deep/tensorflow-gpu
pip install spliceai
#+end_src
Then we test on the gpu queue
#+begin_src sh
srun -p gpu -t 4:00:00 --gres=gpu:1 --pty bash
cd /Work/Users/apraga/bisonex/tests/spliceai/
module load deep/tensorflow-gpu
module unload nix/2.11.0
time spliceai -I NA12878-sanger-chr20-T2T.vep.vcf.gz -O output-20-2.vcf -R /Work/Groups/bisonex/data/fasta/chm13v2.0/chm13v2.0.fa -A ~/t2t.txt
#+end_src
Failure: DNN library not found...
#+begin_quote
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-24 09:37:45.892545: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-09-24 09:37:47.759421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 38188 MB memory: -> device: 0, name: NVIDIA A100-PCIE-40GB, pci bus id: 0000:21:00.0, compute capability: 8.0
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
WARNING:tensorflow:No training configuration found in the save file, so the model was *not* compiled. Compile it manually.
2023-09-24 09:37:54.143021: I tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:637] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2023-09-24 09:37:54.217160: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:429] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2023-09-24 09:37:54.217220: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:438] Possibly insufficient driver version: 510.108.3
2023-09-24 09:37:54.217245: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at conv_ops.cc:1068 : UNIMPLEMENTED: DNN library is not found.
2023-09-24 09:37:54.217262: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:GPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): UNIMPLEMENTED: DNN library is not found.
[[{{node model_1/conv1d_3/Conv1D}}]]
Traceback (most recent call last):
File "/Home/Users/apraga/.local/bin/spliceai", line 8, in <module>
sys.exit(main())
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/__main__.py", line 72, in main
scores = get_delta_scores(record, ann, args.D, args.M)
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/utils.py", line 159, in get_delta_scores
y_ref = np.mean([ann.models[m].predict(x_ref) for m in range(5)], axis=0)
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/utils.py", line 159, in <listcomp>
y_ref = np.mean([ann.models[m].predict(x_ref) for m in range(5)], axis=0)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:
Detected at node 'model_1/conv1d_3/Conv1D' defined at (most recent call last):
File "/Home/Users/apraga/.local/bin/spliceai", line 8, in <module>
sys.exit(main())
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/__main__.py", line 72, in main
scores = get_delta_scores(record, ann, args.D, args.M)
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/utils.py", line 159, in get_delta_scores
y_ref = np.mean([ann.models[m].predict(x_ref) for m in range(5)], axis=0)
File "/Home/Users/apraga/.local/lib/python3.9/site-packages/spliceai/utils.py", line 159, in <listcomp>
y_ref = np.mean([ann.models[m].predict(x_ref) for m in range(5)], axis=0)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/engine/training.py", line 2382, in predict
tmp_batch_outputs = self.predict_function(iterator)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/engine/training.py", line 2169, in predict_function
return step_function(self, iterator)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/engine/training.py", line 2155, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/engine/training.py", line 2143, in run_step
outputs = model.predict_step(data)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/engine/training.py", line 2111, in predict_step
return self(x, training=False)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/engine/training.py", line 558, in __call__
return super().__call__(*args, **kwargs)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/engine/base_layer.py", line 1145, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 96, in error_handler
return fn(*args, **kwargs)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/engine/functional.py", line 512, in call
return self._run_internal_graph(inputs, training=training, mask=mask)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/engine/functional.py", line 669, in _run_internal_graph
outputs = node.layer(*args, **kwargs)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/engine/base_layer.py", line 1145, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 96, in error_handler
return fn(*args, **kwargs)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/layers/convolutional/base_conv.py", line 290, in call
outputs = self.convolution_op(inputs, self.kernel)
File "/Softs/helios/gpu/anaconda3/2023.03-1/envs/tensorflow-gpu-2.12.0+py3.9/lib/python3.9/site-packages/keras/layers/convolutional/base_conv.py", line 262, in convolution_op
return tf.nn.convolution(
Node: 'model_1/conv1d_3/Conv1D'
DNN library is not found.
[[{{node model_1/conv1d_3/Conv1D}}]] [Op:__inference_predict_function_22195]
#+end_quote
***** DONE GPU: chr20 ok
CLOSED: [2023-09-26 Tue 11:50]
LD_PRELOAD=/lib64/libcuda.so spliceai -I NA12878-sanger-20-2-T2T.vep.vcf.gz -O output-20-2-gpu.vcf -R /Work/Groups/bisonex/data/fasta/chm13v2.0/chm13v2.0.fa -A ~/t2t.txt
Run time: 5 min
***** STRT GPU: all data
SCHEDULED: <2023-09-26 Tue>
#+begin_src slurm
#!/bin/bash -l
# File submission.SBATCH
#SBATCH --job-name="spliceai-gpu"
#SBATCH --output=%x.%J.out ## %x=job name, %J=job id
#SBATCH --error=%x.%J.out
# walltime (hh:mm:ss) max is 8 days
#SBATCH -t 24:00:00
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
## To request more memory, use --mem option.
## Please don't use more than 128g.
#SBATCH --mem=32G
## your e-mail address for notifications
#SBATCH --mail-user=apraga@chu-besancon.fr
#SBATCH --mail-type=END,FAIL
nvidia-smi
module purge
module load nix/2.11.0
LD_PRELOAD=/lib64/libcuda.so spliceai -I NA12878-sanger-all-T2T.vep.vcf.gz -O output-all-gpu.vcf -R /Work/Groups/bisonex/data/fasta/chm13v2.0/chm13v2.0.fa -A ~/t2t.txt
#+end_src
***** TODO With pip: failure
2023-09-24 08:28:46.361434: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU.
***** KILL Test conda: failure
CLOSED: [2023-09-23 Sat 21:43] SCHEDULED: <2023-09-23 Sat>
Installation fails:
#+begin_quote
- feature:/linux-64::__glibc==2.28=0
- python=3.11 -> libgcc-ng[version='>=11.2.0'] -> __glibc[version='>=2.17']
- spliceai -> tensorflow[version='>=1.13.0'] -> __cuda
- spliceai -> tensorflow[version='>=1.13.0'] -> __glibc[version='>=2.17']
Your installed version is: 2.28
#+end_quote
**** TODO Add LOEUF and pLI
VEP plugin
**** TODO NMD
VEP plugin
**** KILL Add LOEUF
CLOSED: [2023-04-19 Wed 16:32]
VEP plugin
**** DONE Spip
CLOSED: [2023-05-01 Mon 23:07] SCHEDULED: <2023-04-30 Sun>
BED does not seem to work well (a region has to be defined)
VCF: too much information
Note: several transcripts but identical results, so we remove the duplicates
***** DONE Interpretation + score + confidence interval in separate columns
CLOSED: [2023-05-01 Mon 23:07] SCHEDULED: <2023-04-30 Sun>
Tests:
in tests/
vep -i 63004925-small.vcf -o postvep.vcf --vcf --fasta genomeRef.fna --dir 109 --merged --pick --offline --custom ../script/spip_annotation.vcf.gz,SPIP,vcf,exact,0,spipInterp,spipScore,spipConfidence
***** DONE Score
CLOSED: [2023-04-22 Sat 15:30]
**** DONE CADD: replace with VEP plugin
CLOSED: [2023-05-07 Sun 14:45] SCHEDULED: <2023-05-07 Sun>
***** Test
#+begin_src
vep -i test.vcf -o lol.vcf --offline --dir /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz --dir_plugins ../VEP_plugins/ -v
#+end_src
Test
#+begin_src sh
vep --id "1 230710048 230710048 A/G 1" --offline --dir /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz --hgvsg --plugin pLI --plugin LOEUF -o lol
#+end_src
CSQ=G|missense_variant|MODERATE|AGT|ENSG00000135744|Transcript|ENST00000366667|protein_coding|2/5||||843|776|259|M/T|aTg/aCg|||-1||HGNC|HGNC:333||Ensembl||A|A||1:g.230710048A>G|0.347|-0.277922|
Matches https://www.ensembl.org/Homo_sapiens/Tools/VEP/Results?tl=I7ZsIbrj14P6lD43-9115494
***** DONE Use whole genome
CLOSED: [2023-04-29 Sat 15:46]
***** KILL Rename the chromosomes beforehand ...
CLOSED: [2023-05-01 Mon 09:14] SCHEDULED: <2023-04-30 Sun>
Too slow!
- Downloading CADD: 4h20
- renaming the chromosomes for the SNVs: 6h20
- tabix on the SNVs: job killed after 21h
***** DONE Annotate separately and merge the tables
CLOSED: [2023-05-07 Sun 14:45] SCHEDULED: <2023-05-01 Mon>
NB: CADD could be filtered with tabix to restrict it to our variants
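The tabix filtering idea noted above could start from a list of region strings built from our own VCF. The sketch below is only an illustration and not part of the pipeline: the function name `vcf_to_regions` and the `strip_chr` option are hypothetical, and it assumes that each region should cover the span of the REF allele. The resulting `chrom:start-end` strings are in the form tabix accepts as region arguments.

```python
def vcf_to_regions(lines, strip_chr=False):
    """Turn VCF data lines into 'chrom:start-end' region strings.

    Header lines (starting with '#') and empty lines are skipped.
    strip_chr is a hypothetical option that removes a leading 'chr',
    for annotation files that name chromosomes '1' rather than 'chr1'.
    """
    regions = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref = line.split()[:4]
        if strip_chr and chrom.startswith("chr"):
            chrom = chrom[3:]
        start = int(pos)
        # Assumption: the region spans the reference allele.
        end = start + len(ref) - 1
        regions.append(f"{chrom}:{start}-{end}")
    return regions


if __name__ == "__main__":
    example = ["#CHROM POS ID REF ALT", "1 9091 . A C", "1 69091 . A C"]
    for region in vcf_to_regions(example):
        print(region)
```

Each string can then be passed to tabix after the name of the indexed CADD file. For thousands of variants a regions file is probably more practical than command-line arguments.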
**** DONE clinvar
CLOSED: [2023-04-22 Sat 15:31]
**** KILL Check HGVS results with mutalyzer
CLOSED: [2023-05-01 Mon 09:26]
**** HOLD Parallelization
***** HOLD Per chromosome with the VEP workflow
https://github.com/Ensembl/ensembl-vep/blob/release/109/nextflow/workflows/run_vep.nf
***** HOLD With the --fork option
**** DONE Use the nf-core version of VEP
CLOSED: [2023-05-13 Sat 18:27] SCHEDULED: <2023-05-07 Sun>
**** DONE OMIM
CLOSED: [2023-08-31 Thu 10:38] SCHEDULED: <2023-08-29 Tue>
**** DONE pLI and LOEUF from gnomAD
CLOSED: [2023-08-31 Thu 10:38] SCHEDULED: <2023-08-29 Tue>
**** DONE Grantham
CLOSED: [2023-08-31 Thu 22:08] SCHEDULED: <2023-08-30 Wed>
**** DONE Fix spliceAI
CLOSED: [2023-08-31 Thu 13:51] SCHEDULED: <2023-08-31 Thu>
No annotation
- chromosome? Tried 1 instead of chr1: same result. And it works for CADD
- index?
- re-download
- index it ourselves
**** DONE Remove duplicated spip score
CLOSED: [2023-08-31 Thu 14:17] SCHEDULED: <2023-08-31 Thu>
**** DONE Check variant 63126867
CLOSED: [2023-08-31 Thu 10:52] SCHEDULED: <2023-08-31 Thu>
**** DONE Add truncating or not
CLOSED: [2023-08-31 Thu 22:08] SCHEDULED: <2023-08-31 Thu>
**** DONE Add recessive
CLOSED: [2023-08-31 Thu 22:08] SCHEDULED: <2023-08-31 Thu>
**** KILL Fix allelic depth
CLOSED: [2023-08-31 Thu 11:18] SCHEDULED: <2023-08-31 Thu>
Problem caused by LibreOffice
**** DONE Regenerate annotation for NA12878, inserted and PEX1 patient
CLOSED: [2023-08-31 Thu 22:08] SCHEDULED: <2023-08-31 Thu>
**** TODO ACMG incidental
**** DONE VCF output (to get the allele fraction AF)
CLOSED: [2023-08-28 Mon 17:22]
**** DONE VCF -> tsv with bcftools
CLOSED: [2023-08-29 Tue 11:03] SCHEDULED: <2023-08-28 Mon>
**** DONE A single transcript after VEP with filter_vep :filter:
CLOSED: [2023-08-29 Tue 11:03] SCHEDULED: <2023-08-28 Mon>
With the update to VEP 110, pick_flag seems to work.
***** DONE Test chr20: no "lost" variants
CLOSED: [2023-08-28 Mon 17:31] SCHEDULED: <2023-08-28 Mon>
unlike the result sent to Alexis by e-mail
#+begin_src sh :dir out/annotate
bcftools +counts vep/NA12878-sanger-chr20-GRCh38/NA12878-sanger-chr20-GRCh38.vep.vcf.gz
#+end_src
Number of samples: 1
Number of SNPs: 123
Number of INDELs: 32
Number of MNPs: 53
Number of others: 0
Number of sites: 208
#+begin_src sh
filter_vep -i vep/NA12878-sanger-chr20-GRCh38/NA12878-sanger-chr20-GRCh38.vep.vcf.gz --filter 'PICK' | bcftools +counts
#+end_src
Number of samples: 1
Number of SNPs: 123
Number of INDELs: 32
Number of MNPs: 53
Number of others: 0
Number of sites: 208
Second check
#+begin_src sh :dir out/annotate
filter_vep -i vep/NA12878-sanger-chr20-GRCh38/NA12878-sanger-chr20-GRCh38.vep.vcf.gz --filter 'PICK' --soft_filter | grep fail
#+end_src
***** DONE Test NA12878 + Sanger variants: variants lost with --pick?
CLOSED: [2023-08-29 Tue 10:36] SCHEDULED: <2023-08-28 Mon>
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/out/annotate
~/.nix-profile/bin/bcftools +counts vep/NA12878-sanger-all-GRCh38/NA12878-sanger-all-GRCh38.vep.vcf.gz
#+end_src
#+RESULTS:
| Number | of | samples: | 1 |
| Number | of | SNPs: | 6293 |
| Number | of | INDELs: | 1515 |
| Number | of | MNPs: | 1588 |
| Number | of | others: | 0 |
| Number | of | sites: | 9322 |
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/out/annotate
~/.nix-profile/bin/filter_vep -i vep/NA12878-sanger-all-GRCh38/NA12878-sanger-all-GRCh38.vep.vcf.gz --filter 'PICK' | bcftools +counts
#+end_src
| Number | of | samples: | 1 |
| Number | of | SNPs: | 6293 |
| Number | of | INDELs: | 1515 |
| Number | of | MNPs: | 1588 |
| Number | of | others: | 0 |
| Number | of | sites: | 9322 |
***** DONE Test NA12878 + Sanger variants: check output with Julia: ok
CLOSED: [2023-08-29 Tue 10:21] SCHEDULED: <2023-08-28 Mon>
143/146 variants, as before
***** DONE Rerun on T2T to check compatibility :T2T:
CLOSED: [2023-08-29 Tue 11:03] SCHEDULED: <2023-08-29 Tue>
**** DONE Rerun the Sanger tests on NA12878
CLOSED: [2023-09-01 Fri 10:32] SCHEDULED: <2023-08-31 Thu>
2 variants missing after filter_vep
**** DONE Choose the best transcript ourselves
CLOSED: [2023-09-01 Fri 10:32] SCHEDULED: <2023-09-01 Fri>
**** DONE Check that T2T runs
CLOSED: [2023-08-31 Thu 22:10] SCHEDULED: <2023-08-31 Thu>
**** DONE Review transcript choice + filter with Paul
CLOSED: [2023-09-08 Fri 22:46] SCHEDULED: <2023-09-06 Wed>
**** DONE Filter the variants with Alexis's filters and keep all results
CLOSED: [2023-09-10 Sun 15:39] SCHEDULED: <2023-09-09 Sat>
**** DONE Add MANE SELECT column and keep the others
CLOSED: [2023-09-10 Sun 15:39] SCHEDULED: <2023-09-09 Sat>
**** DONE v1.0
CLOSED: [2023-09-11 Mon 19:11] SCHEDULED: <2023-09-09 Sat>
***** DONE prod branch
CLOSED: [2023-09-10 Sun 15:44] SCHEDULED: <2023-09-09 Sat>
Merge from debug
***** DONE E-mail Alexis
CLOSED: [2023-09-01 Fri 10:32] SCHEDULED: <2023-08-31 Thu>
***** DONE Rerun Sanger test
CLOSED: [2023-09-11 Mon 19:11] SCHEDULED: <2023-09-10 Sun>
***** DONE E-mail Paul for validation
CLOSED: [2023-09-11 Mon 19:11] SCHEDULED: <2023-09-10 Sun>
**** DONE Use spliceAI >= 0.2 for the filter instead of spip
CLOSED: [2023-09-11 Mon 21:48] SCHEDULED: <2023-09-11 Mon>
**** DONE Rerun Sanger tests with spliceAI
CLOSED: [2023-09-14 Thu 22:45] SCHEDULED: <2023-09-11 Mon>
**** DONE Fix recessive column
CLOSED: [2023-09-14 Thu 22:57] SCHEDULED: <2023-09-14 Thu>
either 1/1 or 1/2,
or 0/1 with 2 variants per gene
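A minimal sketch of such a threshold filter, for illustration only. It assumes a variant is kept when the largest of the four SpliceAI delta scores reaches 0.2. Taking the maximum is an assumption, and the function names are hypothetical. The column names are those of the split SpliceAI columns shown earlier in these notes. An empty value or "-" is treated as "no annotation".

```python
SPLICEAI_COLUMNS = ("SpliceAI_AG", "SpliceAI_AL", "SpliceAI_DG", "SpliceAI_DL")


def spliceai_max(row):
    """Return the largest delta score in a row (a dict), or None if none is set."""
    scores = [
        float(row[col])
        for col in SPLICEAI_COLUMNS
        if row.get(col) not in (None, "", "-")
    ]
    return max(scores) if scores else None


def passes_spliceai(row, threshold=0.2):
    """True when at least one delta score is >= threshold (assumed rule)."""
    best = spliceai_max(row)
    return best is not None and best >= threshold


if __name__ == "__main__":
    row = {"SpliceAI_AG": "0.01", "SpliceAI_AL": "0.00",
           "SpliceAI_DG": "0.00", "SpliceAI_DL": "0.01"}
    print(passes_spliceai(row))
```

Rows without any SpliceAI annotation are rejected by this sketch. Whether unannotated variants should instead fall through to another filter is a separate decision that it does not cover.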
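The rule above can be sketched in Python as follows. This is an illustration, not the pipeline's actual code: `normalize_gt` and `flag_recessive` are hypothetical names, and the sketch assumes that "2 variants per gene" means at least two heterozygous (0/1) variants in the same gene.

```python
from collections import Counter


def normalize_gt(gt):
    """Drop phasing and order the alleles, e.g. '1|0' becomes '0/1'."""
    alleles = sorted(gt.replace("|", "/").split("/"))
    return "/".join(alleles)


def flag_recessive(variants):
    """Flag variants compatible with recessive inheritance.

    variants: list of (gene, genotype) tuples.
    Returns one boolean per variant, in the same order.
    """
    genotypes = [(gene, normalize_gt(gt)) for gene, gt in variants]
    # Count heterozygous variants per gene (assumed meaning of the rule).
    het_per_gene = Counter(gene for gene, gt in genotypes if gt == "0/1")
    flags = []
    for gene, gt in genotypes:
        if gt in ("1/1", "1/2"):
            flags.append(True)
        elif gt == "0/1" and het_per_gene[gene] >= 2:
            flags.append(True)
        else:
            flags.append(False)
    return flags


if __name__ == "__main__":
    calls = [("PEX1", "0/1"), ("PEX1", "1|0"), ("NMNAT1", "0/1"), ("AGT", "1/1")]
    print(flag_recessive(calls))
```

The sketch does not check whether two heterozygous variants in a gene are in trans, which unphased calls cannot establish, so the 0/1 case only marks candidates.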
*** KILL Compare the annotations on 63003856
CLOSED: [2023-08-28 Mon 17:28]
**** Rerun the new pipeline
*** KILL Old version
CLOSED: [2023-08-28 Mon 17:24]
**** KILL HGVS
CLOSED: [2023-08-28 Mon 17:24]
**** KILL Filter after VEP
CLOSED: [2023-08-28 Mon 17:24]
**** KILL OMIM
CLOSED: [2023-08-28 Mon 17:24]
**** KILL clinvar
CLOSED: [2023-08-28 Mon 17:24]
**** KILL ACMG incidental
CLOSED: [2023-08-28 Mon 17:24]
**** KILL Grantham
CLOSED: [2023-08-28 Mon 17:24]
**** KILL LRG
CLOSED: [2023-04-18 Tue 17:22] SCHEDULED: <2023-04-18 Tue>
Discussed with Alexis: no longer up to date
**** KILL Gnomad
CLOSED: [2023-08-28 Mon 17:24]
*** DONE Reorder the columns :annotation:
CLOSED: [2023-08-31 Thu 10:38] SCHEDULED: <2023-08-28 Mon>
No OMIM, no CADD, no spliceAI
** DONE Port Alexis's version exactly to Helios
CLOSED: [2023-01-14 Sat 17:56]
"prod" branch
** KILL Test Alexis's version with Nix
CLOSED: [2023-06-14 Wed 22:37]
*** DONE Add clinvar
CLOSED: [2022-11-13 Sun 19:37]
*** DONE Alignment
CLOSED: [2022-11-13 Sun 12:52]
*** DONE Haplotype caller
CLOSED: [2022-11-13 Sun 13:00]
*** KILL Filter
CLOSED: [2023-06-14 Wed 22:37]
- [X] depth
- [ ] common SNP, not pathogenic
Problem with the list of IDs
**** KILL variant annotation
CLOSED: [2023-06-14 Wed 22:37]
Requires VEP
*** KILL Variant calling
CLOSED: [2023-06-14 Wed 22:37]
** KILL Test sarek
CLOSED: [2023-08-12 Sat 15:53]
#+begin_src sh
module load apptainer/1.1.8
nextflow run nf-core/sarek -profile test,singularity --outdir test-sarek
#+end_src
The dependencies do not download correctly, so we extract them by hand
#+begin_src sh
rg -IN galaxyproject modules | sed 's/ //g;s/:$//' | sort | uniq > deps.txt
#+end_src
Manual cleanup
Then
#+begin_src sh
cat deps.txt | xargs -L1 singularity pull
#+end_src
** DONE Samplesheet support
CLOSED: [2023-08-03 Thu 14:24] SCHEDULED: <2023-08-03 Thu 13:00>
/Entered on/ [2023-08-03 Thu 13:12]
** DONE Small dataset: chr22 on HG001
CLOSED: [2023-08-05 Sat 14:21] SCHEDULED: <2023-08-05 Sat>
** DONE Fix OMIM annotation: missing for NMNAT1
CLOSED: [2023-09-16 Sat 22:47] SCHEDULED: <2023-09-16 Sat>
/Entered on/ [2023-09-16 Sat 19:32]
* Documentation
:PROPERTIES:
:CATEGORY: doc
:END:
** DONE Installation procedure for nix + dependencies on the CHU VM
CLOSED: [2023-04-22 Sat 15:27] SCHEDULED: <2023-04-13 Thu>
* Bibliography
** DONE Finish [cite:@alser2021]
CLOSED: [2023-09-26 Tue 11:26] SCHEDULED: <2023-09-22 Fri>
* Manuscript
:PROPERTIES:
:CATEGORY: manuscript
:END:
** DONE Pipeline flowchart (with T2T)
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
** DONE Figure: number of publications per aligner
CLOSED: [2023-09-19 Tue 16:54] SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
** TODO Figure: number of publications per variant caller
SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
** TODO Figure: number of exomes per year
SCHEDULED: <2023-09-30 Sat>
/Entered on/ [2023-09-19 Tue 08:43]
* Tests :tests:
** KILL Non-regression: prod version
CLOSED: [2023-05-23 Tue 08:46]
*** DONE ID common snp
CLOSED: [2022-11-19 Sat 21:36]
#+begin_src
$ wc -l ID_of_common_snp.txt
23194290 ID_of_common_snp.txt
$ wc -l /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
23194290 /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
#+end_src
*** DONE ID common snp not clinvar patho
CLOSED: [2022-12-11 Sun 20:11]
**** DONE Checking the problem
CLOSED: [2022-12-11 Sun 16:30]
On J:
21155134 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref
"Non-regression" version
21155076 database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt
New version
23193391 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt
After removing duplicates
$ sort database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt | uniq > old.txt
$ wc -l old.txt
21107097 old.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt | uniq > new.txt
$ wc -l new.txt
21174578 new.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref | uniq > ref.txt
$ wc -l ref.txt
21107155 ref.txt
Looking at the difference
comm -23 ref.txt old.txt
rs1052692
rs1057518973
rs1057518973
rs11074121
rs11284