[2023-09-27 Wed 21:40]
# ****** DONE VEP filter with SpliceAI: 37365 -> 6130. SpliceAI adds nothing
# CLOSED: [2023-09-27 Wed 19:37] SCHEDULED: <2023-09-27 Wed>
# :PROPERTIES:
# :ID: c9b2009a-503b-4561-94c6-29ae21a3188d
# :END:
# #+begin_src sh
# filter_vep -i output-all-gpu.vcf --format vcf --filter " not(Consequence matches non_coding_transcript or Consequence matches stream or Consequence matches intergenic_variant or Consequence matches UTR or Consequence matches intron_variant or Consequence matches synonymous or BIOTYPE matches pseudogene or BIOTYPE matches misc_RNA) or (SpliceAI_pred_DS_AG and SpliceAI_pred_DS_AG >= 0.2) or (SpliceAI_pred_DS_AL and SpliceAI_pred_DS_AL >= 0.2) or (SpliceAI_pred_DS_DG and SpliceAI_pred_DS_DG >= 0.2) or (SpliceAI_pred_DS_DL and SpliceAI_pred_DS_DL >= 0.2) " --only_matched -o output-all-gpu-filtered.vcf
# #+end_src
# filter_vep -i output-all-gpu.vcf --format vcf --filter " not(Consequence matches non_coding_transcript or Consequence matches stream or Consequence matches intergenic_variant or Consequence matches UTR or Consequence matches intron_variant or Consequence matches synonymous or BIOTYPE matches pseudogene or BIOTYPE matches misc_RNA)" --only_matched | grep -c -v '^#'
# 6130
# $ grep -c -v '^#' output-all-gpu-filtered.vcf
# 6130
# ****** DONE Re-check the filter with SPIP: 7730 -> probably a problem with SPIP
# CLOSED: [2023-09-27 Wed 20:54] SCHEDULED: <2023-09-27 Wed>
# filter_vep -i NA12878-sanger-all-T2T.vep.vcf.gz --format vcf --filter " not(Consequence matches non_coding_transcript or Consequence matches stream or Consequence matches intergenic_variant or Consequence matches UTR or Consequence matches intron_variant or Consequence matches synonymous or BIOTYPE matches pseudogene or BIOTYPE matches misc_RNA) or (SPIP_spipScore and SPIP_spipScore >= 20)" --only_matched | grep -c -v '^#'
# perl: warning: Setting locale failed.
# perl: warning: Please check that your locale settings:
# LANGUAGE = (unset),
# LC_ALL = (unset),
# LANG = "en_US.utf8"
# are supported and installed on your system.
# perl: warning: Falling back to the standard locale ("C").
# 7730
****** DONE Check whether the Sanger tests pass: ok
CLOSED: [2023-09-28 Thu 01:32] SCHEDULED: <2023-09-27 Wed>
OK!
Haplotypecaller : /Work/Users/apraga/bisonex/out//call_variant/haplotypecaller/NA12878-sanger-all-T2T/NA12878-sanger-all-T2T.haplotypecaller.vcf.gz/Work/Users/apraga/bisonex/out//call_variant/haplotypecaller/NA12878-sanger-all-T2T/NA12878-sanger-all-T2T.haplotypecaller.vcf.gz
144 found over 146
2×3 DataFrame
 Row │ variant              meanQual  depth
     │ String               Float64   Int64
─────┼──────────────────────────────────────
   1 │ chr12:g.13594572C>T      60.0      1
   2 │ chr17:g.10204026T>A      60.0      1
/Work/Users/apraga/bisonex/tests/spliceai/output-all-gpu-filtered.vcf.gz
144 found over 146
spliceai : another 0 missed variants
0×3 DataFrame
 Row │ variant  meanQual  depth
     │ String   Float64   Int64
─────┴──────────────────────────
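For reference, "144 found over 146" is a recall of about 98.6% on the Sanger set, identical for both VCFs. A minimal sketch of that arithmetic follows; the helper name =recall_pct= is hypothetical and not part of the bisonex tests.

```sh
# Hypothetical helper (an assumption, not from the bisonex test suite):
# percentage of expected variants that were found, to one decimal place.
recall_pct() {
    awk -v found="$1" -v expected="$2" \
        'BEGIN { printf "%.1f\n", 100 * found / expected }'
}

recall_pct 144 146   # prints 98.6
```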
***** KILL With pip: failed
CLOSED: [2023-09-28 Thu 01:34]
2023-09-24 08:28:46.361434: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU.
***** DONE Test conda: failed
CLOSED: [2023-09-23 Sat 21:43] SCHEDULED: <2023-09-23 Sat>
Anaconda: installation fails
#+begin_quote
- feature:/linux-64::__glibc==2.28=0
- python=3.11 -> libgcc-ng[version='>=11.2.0'] -> __glibc[version='>=2.17']
- spliceai -> tensorflow[version='>=1.13.0'] -> __cuda
- spliceai -> tensorflow[version='>=1.13.0'] -> __glibc[version='>=2.17']
Your installed version is: 2.28
#+end_quote
mamba has to be used instead
***** DONE Email Paul
CLOSED: [2023-09-28 Thu 09:30] SCHEDULED: <2023-09-28 Thu>
In summary:
- no splicing filter to "rescue" intronic variants: 6130 variants, but 4 are lost (give an example)
- with SPIP: no variant lost (apart from the 2 tied to the BAM), but 7332 variants in total and slow execution (1 h), though feasible on the server
- with SpliceAI on GPU ("on-the-fly" annotation): no variant lost, but 6609 variants in total. 3 h 30 of compute on the mesocentre; impossible to run in-house because it needs a GPU
This matches what Yannis was saying.
My impression is that this is linked to a large number of missense variants [1]
Note: I trust the SpliceAI annotation more for T2T because it is less complicated to port
[1] taking the transcript with the "worst" consequence, about 80% are missense (total = 6609)
11 3_prime_utr_variant
9 3_prime_utr_variant&nmd_transcript_variant
6 5_prime_utr_variant
48 coding_sequence_variant
5 coding_sequence_variant&nmd_transcript_variant
121 frameshift_variant
9 frameshift_variant&nmd_transcript_variant
1 frameshift_variant&splice_donor_region_variant
9 frameshift_variant&splice_region_variant
78 inframe_deletion
3 inframe_deletion&nmd_transcript_variant
2 inframe_deletion&splice_region_variant
84 inframe_insertion
2 inframe_insertion&nmd_transcript_variant
1 inframe_insertion&splice_region_variant
156 intron_variant
8 intron_variant&nmd_transcript_variant
24 intron_variant&non_coding_transcript_variant
5305 missense_variant
205 missense_variant&nmd_transcript_variant
3 missense_variant&splice_donor_5th_base_variant
110 missense_variant&splice_region_variant
9 missense_variant&splice_region_variant&nmd_transcript_variant
11 non_coding_transcript_exon_variant
12 splice_acceptor_variant
1 splice_acceptor_variant&nmd_transcript_variant
2 splice_acceptor_variant&non_coding_transcript_variant
9 splice_donor_5th_base_variant&intron_variant
1 splice_donor_5th_base_variant&intron_variant&nmd_transcript_variant
16 splice_donor_region_variant&intron_variant
4 splice_donor_region_variant&intron_variant&non_coding_transcript_variant
19 splice_donor_variant
1 splice_donor_variant&nmd_transcript_variant
3 splice_donor_variant&non_coding_transcript_variant
1 splice_donor_variant&splice_donor_5th_base_variant&3_prime_utr_variant&intron_variant&nmd_transcript_variant
3 splice_donor_variant&splice_donor_5th_base_variant&coding_sequence_variant&intron_variant
1 splice_donor_variant&splice_donor_5th_base_variant&intron_variant
39 splice_polypyrimidine_tract_variant&intron_variant
5 splice_polypyrimidine_tract_variant&intron_variant&nmd_transcript_variant
10 splice_polypyrimidine_tract_variant&intron_variant&non_coding_transcript_variant
1 splice_region_variant&3_prime_utr_variant
1 splice_region_variant&5_prime_utr_variant
9 splice_region_variant&intron_variant
1 splice_region_variant&intron_variant&nmd_transcript_variant
2 splice_region_variant&intron_variant&non_coding_transcript_variant
5 splice_region_variant&non_coding_transcript_exon_variant
1 splice_region_variant&non_coding_transcript_variant
43 splice_region_variant&splice_polypyrimidine_tract_variant&intron_variant
2 splice_region_variant&splice_polypyrimidine_tract_variant&intron_variant&nmd_transcript_variant
6 splice_region_variant&splice_polypyrimidine_tract_variant&intron_variant&non_coding_transcript_variant
15 splice_region_variant&synonymous_variant
14 start_lost
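
The 80% figure in [1] can be checked from the tally itself. The sketch below assumes the tally is available as =uniq -c=-style lines ("<count> <consequence>"). The helper name =share_pct= is hypothetical, and the total of 6609 is the one stated in the note rather than recomputed from the list.

```sh
# Hypothetical helper: share (in %) of tally rows whose consequence matches
# a regex, relative to an explicitly given total.
# Input on stdin: "<count> <consequence>" lines.
share_pct() {
    pattern="$1"
    total="$2"
    awk -v pat="$pattern" -v total="$total" \
        '$2 ~ pat { n += $1 } END { printf "%.1f\n", 100 * n / total }'
}

# Missense rows copied from the tally above.
tally='5305 missense_variant
205 missense_variant&nmd_transcript_variant
3 missense_variant&splice_donor_5th_base_variant
110 missense_variant&splice_region_variant
9 missense_variant&splice_region_variant&nmd_transcript_variant'

# Pure missense only
printf '%s\n' "$tally" | share_pct '^missense_variant$' 6609   # prints 80.3
# Any consequence string containing missense_variant
printf '%s\n' "$tally" | share_pct 'missense_variant' 6609     # prints 85.2
```

Pure =missense_variant= rows alone give the quoted 80%. Counting every combined consequence that includes a missense change raises it to about 85%.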