r preferred)
"Wherever possible we would discourage you from summarising data in this way."
**** DONE Email Alexis
CLOSED: [2023-08-20 Sun 13:45] SCHEDULED: <2023-08-20 Sun>
**** TODO simuscop data at 200x
SCHEDULED: <2023-11-11 Sat>
**** DONE On T2T with liftover (filter = spip): works, but slow and too many variants :tests:
CLOSED: [2023-09-17 Sun 17:13] SCHEDULED: <2023-09-17 Sun>
1. Conversion to BED
#+begin_src nu :dir ~/code/sanger
open snvs-cento-sanger.csv | select chrom pos | insert pos2 {$in.pos } | to csv --separator="\t" | save snvs-cento-sanger.bed -f
#+end_src
2. Liftover with UCSC (online)
NB: checked on the first result by looking for the read containing the variant (samtools view -r, then samtools view | grep on T2T); with the help of IGV, we find a matching variant at
chr1:10757746
3. Assuming the order of the variants has not changed, we simply add REF and ALT with annotateLifted.jl
spip annotation is *very slow*: 1h13!
Result:
2×3 DataFrame
Row │ variant meanQual depth
│ String Float64 Int64
─────┼──────────────────────────────────────
1 │ chr12:g.13594572 60.0 1
2 │ chr17:g.10204026 60.0 1
144 found over 146
filter depth : another 0 missed variants
filter poly : another 0 missed variants
filter vep : another 0 missed variants
And there are too many variants in the output (7330!)
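The three steps above (BED conversion, liftover, re-attaching REF and ALT by row order) can be sketched in Python as follows. This is only an illustration under assumptions: the =ref= and =alt= column names and the toy rows are hypothetical, and the sketch writes 0-based half-open BED intervals (start = pos - 1), whereas the command above writes =pos= in both columns. It is not the content of annotateLifted.jl.

#+begin_src python
import csv
import io


def to_bed_lines(rows):
    """Turn records with a 1-based `pos` into BED lines.

    BED intervals are 0-based and half-open, so the single base at
    1-based position `pos` becomes start = pos - 1, end = pos.
    """
    return [f"{r['chrom']}\t{int(r['pos']) - 1}\t{int(r['pos'])}" for r in rows]


def reattach_alleles(lifted_bed_lines, original_rows):
    """Pair lifted intervals with the original REF/ALT, relying on row order.

    Raises if the row counts differ: a dropped row would silently shift
    the alleles of every following variant.
    """
    if len(lifted_bed_lines) != len(original_rows):
        raise ValueError("liftover changed the number of rows; order cannot be trusted")
    out = []
    for line, row in zip(lifted_bed_lines, original_rows):
        chrom, _start, end = line.split("\t")[:3]
        out.append({"chrom": chrom, "pos": int(end), "ref": row["ref"], "alt": row["alt"]})
    return out


# Toy input (hypothetical); the real file is snvs-cento-sanger.csv.
text = "chrom,pos,ref,alt\nchr1,100,A,G\nchr2,200,C,T\n"
rows = list(csv.DictReader(io.StringIO(text)))
bed = to_bed_lines(rows)
#+end_src

The length check is the cheap guard for the "order has not changed" assumption: an unmapped position shortens the liftover output, and that would be caught here rather than producing shifted alleles.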
**** DONE Email Paul with the T2T filter results + new diagram
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
** TODO Medically relevant genes
SCHEDULED: <2023-11-09 Thu>
/Entered on/ [2023-10-18 Wed 22:37]
* Reinterpretation :reanalysis:
** DONE Run tests on raw data [225/250] <(samples.csv)> <(runs.waiting)>
CLOSED: [2023-10-14 Sat 11:58] SCHEDULED: <2023-10-08 Sun>
- [X] 100222_63015289
- [X] 1600304839_63051311
- [X] 1900007827_62913191
- [X] 1900398899_62999500
- [X] 1900486799_62913197
- [X] 2100422923_62952677
- [X] 2100458888_62933047
- [X] 2100601558_62903840
- [X] 2100609288_62905768
- [X] 2100609501_62905776
- [X] 2100614493_62951074
- [X] 2100622566_62908067
- [X] 2100622601_62908060
- [X] 2100622705_62908063
- [X] 2100640027_62911936
- [X] 2100645285_62913212
- [X] 2100661411_62914081
- [X] 2100661462_62914086
- [X] 2100708257_62921596
- [X] 2100738732_62926501
- [X] 2100738850_62926509
- [X] 2100746751_62926505
- [X] 2100746797_62926506
- [X] 2100782349_62931722
- [X] 2100782416_62931561
- [X] 2100782559_62931718
- [X] 2100799204_62934768
- [X] 2200010202_62940284
- [X] 2200023600_62940631
- [X] 2200024348_62999591
- [X] 2200027505_62942457
- [X] 2200038776_62943412
- [X] 2200041919_62943405
- [X] 2200088014_62951326
- [X] 2200146652_62959388
- [X] 2200151850_62960953
- [X] 2200160014_62959475
- [X] 2200160070_62959478
- [X] 2200201368_62967471
- [X] 2200201400_62967470
- [X] 2200265558_62976332
- [X] 2200265605_62976401
- [X] 2200267046_62975192
- [X] 2200273878_62999530
- [X] 2200279708_62977002
- [X] 2200284408_62979102
- [X] 2200293987_62979116
- [X] 2200294359_62979118
- [X] 2200306299_62982217
- [X] 2200306539_62982193
- [X] 220030671_62982211
- [X] 2200307058_62982231
- [X] 2200307108_62982196
- [X] 2200307136_62982221
- [X] 2200307199_62982239
- [X] 2200307230_62982234
- [X] 2200307262_62982219
- [X] 2200307297_62982227
- [X] 2200324510_62985453
- [X] 2200324549_62985478
- [X] 2200324573_62985445
- [X] 2200324594_62985467
- [X] 2200324606_62985463
- [X] 2200324614_62985459
- [X] 2200338306_62985430
- [X] 2200343880_62989407
- [X] 2200343910_62989460
- [X] 2200343938_62989451
- [X] 2200343966_62989456
- [X] 2200343993_62989440
- [X] 2200344013_62989464
- [X] 2200349749_62989465
- [X] 2200363462_62988848
- [X] 2200377880_62991993
- [X] 2200378032_62991991
- [X] 2200383996_62993828
- [X] 2200384015_62993796
- [X] 2200384046_62993822
- [X] 2200384117_62993808
- [X] 2200384187_62993825
- [X] 2200384231_62992898
- [X] 2200385658_63060260
- [X] 2200394260_62994732
- [X] 2200395817_62994742
- [X] 2200396731_62994737
- [X] 2200424073_62999579
- [X] 2200424207_62999632
- [X] 2200426178_62999630
- [X] 2200426243_62999635
- [X] 2200426466_62999605
- [X] 2200426642_62999627
- [X] 2200427406_62999649
- [X] 2200427512_62999639
- [X] 2200428953_62999572
- [X] 2200428981_62999600
- [X] 2200428999_62999592
- [X] 2200441970_63000868
- [X] 2200441989_63000882
- [X] 2200442135_63000864
- [X] 2200442216_63000886
- [X] 2200442257_63000951
- [X] 2200451801_63003573
- [X] 2200451862_63004218
- [X] 2200451894_63004210
- [X] 2200456165_63051294
- [X] 2200459865_63004933
- [X] 2200459968_63004937
- [X] 2200460073_63004943
- [X] 2200460121_63004684
- [X] 2200467051_63003856
- [X] 2200467225_63004940
- [X] 2200467261_63004930
- [X] 2200467338_63004925
- [X] 2200470099_63004485
- [X] 2200470142_63004480
- [X] 2200471780_63004362
- [X] 2200480910_63006466
- [X] 2200495073_63010427
- [X] 2200495510_63009152
- [X] 2200508677_63060252
- [X] 2200510531_63012582
- [X] 2200510628_63012549
- [X] 2200510657_63012554
- [X] 2200511249_63012533
- [X] 2200511274_63012586
- [X] 2200517952_63060399
- [X] 2200519525_63060439
- [X] 2200524009_63014044
- [X] 2200524609_63014046
- [X] 2200524616_63014048
- [X] 2200533429_63060425
- [X] 2200539735_63060406
- [X] 2200549908_63019339
- [X] 2200549965_63019349
- [X] 2200550414_63019357
- [X] 2200550471_63020031
- [X] 2200550490_63019351
- [X] 2200550505_63019340
- [X] 2200555565_63018614
- [X] 2200559438_63020029
- [X] 2200559682_63020030
- [X] 2200559713_63019623
- [X] 2200559739_63019626
- [X] 2200569969_63019991
- [X] 2200570001_63021580
- [X] 2200570025_63021490
- [X] 2200570035_63021491
- [X] 2200570042_63021493
- [X] 2200570050_63021494
- [X] 2200579897_63024910
- [X] 2200583995_63024866
- [X] 2200584035_63024905
- [X] 2200584069_63024888
- [X] 2200584126_63024810
- [X] 2200589507_63026712
- [X] 2200597365_63027994
- [X] 2200597480_63027988
- [X] 2200597752_63026853
- [X] 2200597778_63027992
- [X] 22005977_63026903
- [X] 2200609031_63026527
- [X] 2200614198_63113928
- [X] 2200620372_63030821
- [X] 2200620442_63030810
- [X] 2200620498_63030816
- [X] 2200620628_63031031
- [X] 2200622310_63030984
- [X] 2200622355_63030956
- [X] 2200625369_63028699
- [X] 2200625410_63028697
- [X] 2200625536_63028694
- [X] 2200630189_63030665
- [X] 2200635149_63033182
- [X] 2200644544_63037731
- [X] 2200644594_63037725
- [X] 2200650089_63038093
- [X] 2200666292_63076568
- [X] 2200669188_63036688
- [X] 2200669320_63040259
- [X] 2200669383_63040254
- [X] 2200669414_63040257
- [X] 2200669446_63040251
- [X] 2200680342_63105271
- [X] 2200694535_63042853
- [X] 2200694789_63042862
- [X] 2200694858_63042702
- [X] 2200694917_63042696
- [X] 2200699290_63043047
- [X] 2200699345_63040238
- [X] 2200699383_63043050
- [X] 2200699412_63040731
- [X] 220071551_63048935
- [X] 2200731515_63048963
- [X] 2200748145_63051198
- [X] 2200748171_63051213
- [X] 2200751046_63051249
- [X] 2200751101_63051234
- [X] 2200766471_63054590
- [X] 2200767731_63054595
- [X] 2200767822_63054464
- [X] 2200775505_63060410
- [X] 2200850441_63019345
- [X] 220597589_63026879
- [X] 2300003253_63060430
- [X] 2300005679_63060370
- [X] 2300009914_63060390
- [X] 2300028784_63060001
- [X] 2300036815_63063357
- [X] 2300055382_63061874
- [X] 2300055421_63061871
- [X] 2300055440_63061880
- [X] 230006894_63064950
- [X] 2300071111_63070356
- [X] 2300083434_63071675
- [X] 2300103609_63076239
- [X] 2300104572_63076232
- [X] 2300109602_63076765
- [X] 2300109665_63076770
- [X] 2300119721_63078732
- [X] 2300137773_63078133
- [X] 2300137834_63078123
- [X] 2300167821_63086183
- [X] 2300172698_63113453
- [X] 2300188216_63090609
- [X] 2300188281_63090632
- [ ] 2300188800_63090616
- [ ] 2300193193645_63090623
- [ ] 2300193668_63090611
- [ ] 2300195426_63090608
- [ ] 2300201017_63089636
- [ ] 2300227479_63098330
- [ ] 2300232688_63130821
- [ ] 2300292749_63109239
- [ ] 230029277_63109247
- [ ] 2300294712_63109236
- [ ] 2300308032_63111581
- [ ] 2300323537_63114209
- [ ] 2300334609_63115535
- [ ] 2300346867_63118093
- [ ] 2300346867_63118093_NA12878
- [ ] 2300348940_63118099
- [ ] 2300359806_63119915
- [ ] 2300380476_63123963
- [ ] 2300382582_63123749
- [ ] 2300384269_63126867
- [ ] 2300407581_63130826
- [ ] 2300407626_63130842
- [ ] 2300409593_63130874
- [ ] 2300409612_63130980
- [ ] 2300417623_63131524
** DONE Missed variants :checkpipeline:
CLOSED: [2023-11-10 Fri 00:25] SCHEDULED: <2023-10-21 Sat>
*** DONE 63012582: chr10:g.102230760 filtered out by AD :63012582:
CLOSED: [2023-10-08 Sun 23:24] SCHEDULED: <2023-10-08 Sun>
It is present in the haplotypecaller output!
Mind the position: POS=102230753, recorded as CG->C
GT:AD:DP:GQ:PL 0/1:26,8:34:99:146,0,671
Filtered out by the condition AD <= 10 (carried by only 8 reads)
Not confirmed by Sanger, reported as VOUS
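To check the AD filter by hand on a FORMAT/sample pair such as the one above, here is a minimal Python sketch. The function names are hypothetical and this is not the pipeline's own filter code. The AD <= 10 condition comes from this entry; the DP <= 30 condition is taken from the missed-variant notes elsewhere in this file.

#+begin_src python
def parse_sample(format_field, sample_field):
    """Zip a VCF FORMAT string with one sample column into a dict."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def failed_filters(sample, min_alt_reads=10, min_depth=30):
    """Return the names of the hard filters a genotype call fails.

    A call is rejected when the ALT allele depth is <= min_alt_reads
    or when the total depth DP is <= min_depth.
    """
    _ref_reads, alt_reads = (int(x) for x in sample["AD"].split(",")[:2])
    depth = int(sample["DP"])
    failed = []
    if alt_reads <= min_alt_reads:
        failed.append("AD")
    if depth <= min_depth:
        failed.append("DP")
    return failed


call = parse_sample("GT:AD:DP:GQ:PL", "0/1:26,8:34:99:146,0,671")
print(failed_filters(call))
#+end_src

For this call the ALT allele is carried by 8 reads and DP is 34, so only the AD condition rejects it.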
**** KILL cento BAM image
CLOSED: [2023-10-08 Sun 23:13]
**** DONE bisonex BAM image
CLOSED: [2023-10-08 Sun 23:23] SCHEDULED: <2023-10-08 Sun>
**** DONE Email Paul
CLOSED: [2023-10-08 Sun 23:24] SCHEDULED: <2023-10-08 Sun>
*** DONE 63060439: chr15:g.26869324 = depth problem, DP=15 :63060439:
CLOSED: [2023-10-08 Sun 23:24] SCHEDULED: <2023-10-08 Sun>
GABRA5
Reported as VOUS, alongside a pathogenic MDB5 variant for the same patient (or even VOUS-)
Not confirmed by Sanger
GT:AD:DP:GQ:PL 0/1:9,6:15:99:103,0,213
**** DONE bisonex BAM image
CLOSED: [2023-10-08 Sun 22:56]
**** DONE Email Paul
CLOSED: [2023-10-08 Sun 23:24] SCHEDULED: <2023-10-08 Sun>
*** DONE A single executable for all steps
CLOSED: [2023-11-04 Sat 19:00] SCHEDULED: <2023-10-21 Sat>
A command-line utility that calls the different steps.
A single data structure is used for all steps and is filled in progressively, with a CSV written out at each step.
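The idea of one record type that is filled in step by step, with a CSV snapshot after each step, can be sketched as below. This is a Python illustration under assumptions: the field subset, the =annotate= step and the toy values are hypothetical, and the actual tool is not written this way.

#+begin_src python
import csv
import io
from dataclasses import asdict, dataclass, fields, replace
from typing import Optional


@dataclass
class Variant:
    patient: str
    coding: str
    gene: Optional[str] = None    # filled by the annotation step
    chrom: Optional[str] = None   # filled by the annotation step
    pos: Optional[int] = None     # filled by the annotation step
    found: Optional[str] = None   # filled by the search step


def dump_csv(variants):
    """Serialise the records to CSV text; unfilled fields become empty cells."""
    buf = io.StringIO()
    names = [f.name for f in fields(Variant)]
    writer = csv.DictWriter(buf, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for v in variants:
        writer.writerow({k: "" if val is None else val for k, val in asdict(v).items()})
    return buf.getvalue()


def annotate(v, gene, chrom, pos):
    """Hypothetical annotation step: return a copy with coordinates set."""
    return replace(v, gene=gene, chrom=chrom, pos=pos)


extracted = [Variant(patient="P1", coding="c.1A>G")]
after_extract = dump_csv(extracted)
annotated = [annotate(v, "GENE1", "chr1", 100) for v in extracted]
after_annotate = dump_csv(annotated)
#+end_src

Because every step writes the same header, two snapshots can be compared column by column, and a later step can restart from any saved CSV.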
**** DONE parse variants
CLOSED: [2023-10-21 Sat 23:29] SCHEDULED: <2023-10-21 Sat>
**** DONE Add negatives to the variant list
CLOSED: [2023-10-22 Sun 23:01] SCHEDULED: <2023-10-21 Sat>
**** DONE Update the variant list
CLOSED: [2023-10-22 Sun 23:01] SCHEDULED: <2023-10-21 Sat>
- [X] Regenerate the variant list
- [ ] Find the hand-edited variants with diff
We keep only the additions
#+begin_src nu
awk -F ',' '{print $1","$2":"$3$4$5}' extracted.csv | ^sort | save -f extracted_concat.csv
xsv select 1-2 ~/annex/data/centogene/variants/variant_genomic.csv | ^sort | save variant_genomic_corr.csv -f
diff extracted_concat.csv variant_genomic_corr.csv | grep '^>' | save -f update.diff
#+end_src
- [X] Add negatives
345 variants not found before the change
141 after the change
- [X] Add the difference
- [X] Fix parsing errors
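The "keep only the additions" step can also be expressed without diff. The sketch below is a Python illustration with hypothetical toy lines: it returns the lines that occur more often in the new file than in the old one. On two sorted files this should select the same lines as =diff old new | grep '^>'=, without the "> " prefix.

#+begin_src python
from collections import Counter


def additions(old_lines, new_lines):
    """Lines that occur in new_lines more often than in old_lines, sorted."""
    extra = Counter(new_lines) - Counter(old_lines)
    return sorted(extra.elements())


# Toy data (hypothetical); the real inputs are the two sorted CSV files.
old = ["P1,NM_1:c.1A>G", "P2,NM_2:c.5del"]
new = ["P1,NM_1:c.1A>G", "P2,NM_2:c.5del", "P3,negatif"]
print(additions(old, new))
#+end_src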
**** KILL Lift over cento genomic variant coordinates to GRCh38
CLOSED: [2023-10-21 Sat 22:47]
**** DONE Parse patient coordinates
CLOSED: [2023-11-04 Sat 18:59] SCHEDULED: <2023-10-31 Tue>
**** DONE A single data type
CLOSED: [2023-11-01 Wed 00:55] SCHEDULED: <2023-10-31 Tue>
***** DONE Check against the last saved version
CLOSED: [2023-11-01 Wed 00:55] SCHEDULED: <2023-10-31 Tue>
file,transcript,coding,codingPos,codingChange,proteinChange,classification,zygosity
->
patient,transcript (cento),transcript (canonical),coding,genomic (hg38),classification (cento),zygosity,gene,Confirmed in sanger,Found by bisonex,chrom,pos,ref,alt
- [ ] Merge coding,codingPos,codingChange
- [ ] Do not write proteinChange
- [ ] Reference file: insert empty fields with awk: transcript (canonical), genomic, then gene, etc.
- [ ] Reference file: rename the header
- [ ] Check that the file is still identical
#+begin_src julia
using DataFramesMeta, CSV
# - [ ] Reference file: insert empty fields with awk: transcript (canonical), genomic, then gene, etc.
# - [ ] Reference file: rename the header
# - [ ] Check that the file is still identical
# file,transcript,coding,codingPos,codingChange,proteinChange,classification,zygosity
# ->
# patient,transcript (cento),transcript (canonical),coding,genomic (hg38),classification (cento),zygosity,gene,Confirmed in sanger,Found by bisonex,chrom,pos,ref,alt

# Concatenate the three coding fields, or return "negatif" if any of them is "negatif".
function mergeCoding(c, p, ch)
    "negatif" in [c, p, ch] ? "negatif" : c * p * ch
end

# Helper kept from an earlier version; it is not called below.
function negative(t, pos)
    t == "negatif" ? -1 : pos
end

# Target column order, for reference only (the @select below spells it out).
cols = [:patient,:"transcript (cento)",:"transcript (canonical)",:coding,:"genomic (hg38)",
        :"classification (cento)",:zygosity,:gene,:"Confirmed in sanger",:"Found by bisonex",
        :chrom,:pos,:ref,:alt]

d = @chain CSV.read("variant_extracted.csv", DataFrame) begin
    # Merge coding, codingPos, codingChange
    @transform :coding = mergeCoding.(:coding, :codingPos, :codingChange)
    # Do not write proteinChange
    @select $(Not([:codingPos, :codingChange, :proteinChange]))
    # Rename to the new header, then add the missing columns
    @rename :"transcript (cento)" = :transcript :patient = :file :"classification (cento)" = :classification
    @transform :"transcript (canonical)" = missing :"genomic (hg38)" = missing :gene = missing
    @transform :"Confirmed in sanger" = missing :"Found by bisonex" = missing
    @transform :chrom = missing :pos = missing :ref = missing :alt = missing
    # Reorder
    @select :patient :"transcript (cento)" :"transcript (canonical)" :coding :"genomic (hg38)" :"classification (cento)" :zygosity :gene :"Confirmed in sanger" :"Found by bisonex" :chrom :pos :ref :alt
    # Negative reports: pos = -1 and "negatif" in the descriptive columns
    @rtransform :pos = :"transcript (cento)" == "negatif" ? -1 : :pos
    @rtransform :"transcript (canonical)" = :"transcript (cento)" == "negatif" ? "negatif" : :"transcript (canonical)"
    @rtransform :"genomic (hg38)" = :"transcript (cento)" == "negatif" ? "negatif" : :"genomic (hg38)"
    @rtransform :coding = :"transcript (cento)" == "negatif" ? "negatif" : :coding
    @rtransform :gene = :"transcript (cento)" == "negatif" ? "negatif" : :gene
    # Any position still missing also becomes -1
    @rtransform :pos = ismissing(:pos) ? -1 : :pos
end
CSV.write("variant_extracted_remap.csv", d)

# Sort the reference file by patient so both files can be compared line by line.
d2 = @chain CSV.read("extracted.csv", DataFrame) begin
    @orderby :patient
end
CSV.write("extracted_sorted.csv", d2)
#+end_src
We sort the files so that the rows are in the same order (otherwise diff does not work??)
diff extracted_sorted.csv variant_extracted_remap.csv -u | save extracted.diff
patch -p1 extracted_sorted.csv extracted.diff
patching file extracted_sorted.csv
diff extracted_sorted.csv variant_extracted_remap.csv -u
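On why sorting matters here: diff compares lines in sequence, so two files holding the same rows in a different order still produce a non-empty diff. A small Python sketch with =difflib= and hypothetical toy lines shows the effect. Note that sorting whole files this way also moves the header line, which the sketch ignores.

#+begin_src python
import difflib


def sorted_diff(a_lines, b_lines):
    """Unified diff of two line lists after sorting each of them."""
    return list(difflib.unified_diff(sorted(a_lines), sorted(b_lines),
                                     fromfile="a", tofile="b", lineterm=""))


a = ["P2,c.5del", "P1,c.1A>G"]
b = ["P1,c.1A>G", "P2,c.5del"]
unsorted = list(difflib.unified_diff(a, b, fromfile="a", tofile="b", lineterm=""))
print(len(unsorted) > 0, sorted_diff(a, b))
#+end_src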
**** KILL variant_recoder to get VCF coordinates
CLOSED: [2023-10-25 Wed 09:13] SCHEDULED: <2023-10-21 Sat>
mobidetails does not find the old transcripts
**** DONE mobidetails annotation (gene + genomic data)
CLOSED: [2023-10-25 Wed 09:14]
**** DONE Send the list to Paul
SCHEDULED: <2023-10-26 Thu>
**** DONE Compare each variant with the pipeline output
CLOSED: [2023-10-31 Tue 00:18] SCHEDULED: <2023-10-21 Sat>
With the "test" function in Search.hs
1126 extracted
654 annotated
253 raw data
102 raw and annotated
236 raw and extracted
17 raw NOT extracted
890 extract WITHOUT raw
#+begin_src nu
open diff.txt | from csv | get id | into string | each {|e| "~/annex/data/centogene/reports/" ++ $e ++ "*.pdf"} | each {|e| firefox $e }
#+end_src
The 17 missing ones are:
- 62913191 : CNV
- 62959388 : MT-ATP6
- 62999572 : MT-ATP6
- 62999627 : CNV
- 62999630 : CNV
- 63004218: CNV
- 63006466 : CNV
- 63009152 : missed during extraction -> actually present
- 63015289: CNV
- 63024910 : MT-ATP6
- 63040251 : CNV
- 63043050 : CNV
- 63118093 : NA12878
- NA12878 x4
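The overlap counts reported above ("raw and extracted", "raw NOT extracted", "extract WITHOUT raw") are plain set operations on patient identifiers. Here is a Python sketch with hypothetical toy identifiers; the real computation is the "test" function in Search.hs, which is not reproduced here.

#+begin_src python
def overlap_report(raw, extracted, annotated):
    """Count the overlaps between three collections of patient identifiers."""
    raw, extracted, annotated = set(raw), set(extracted), set(annotated)
    return {
        "raw and annotated": len(raw & annotated),
        "raw and extracted": len(raw & extracted),
        "raw NOT extracted": len(raw - extracted),
        "extract WITHOUT raw": len(extracted - raw),
    }


# Toy identifiers (hypothetical).
report = overlap_report(raw={"A", "B", "C"}, extracted={"B", "C", "D", "E"}, annotated={"C", "E"})
print(report)
#+end_src

The reported figures are consistent with this reading: 236 + 17 = 253 raw samples, and 1126 - 236 = 890 extracted reports without raw data.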
*** DONE Compare cento variants with bisonex output: 50/121 confirmed by Sanger, 71/121 not tested, 0 confirmed variants missed by the pipeline, 5 missed but not confirmed
CLOSED: [2023-11-08 Wed 00:19] SCHEDULED: <2023-11-04 Sat>
*** Compare with Sanger: variant alone
Reconstructing the family tree is complicated. The information is there but it takes work.
We assume the variant is only found within the family....
Results
❯ open sangerized.csv | where "Found by bisonex" == "found" | where "Confirmed in sanger" == "true" | length
50
❯ open sangerized.csv | where "Found by bisonex" == "found" | where "Confirmed in sanger" == "" | length
71
❯ open sangerized.csv | where "Found by bisonex" == "missed" | where "Confirmed in sanger" == "" | length
5
❯ open sangerized.csv | where "Found by bisonex" == "missed" | where "Confirmed in sanger" == "true" | length
0
[[id:cd79a77c-a0b6-4bb1-9e08-fe08dc89e3aa][Final results]]
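The four queries above form a 2x2 cross-tabulation of "Found by bisonex" against "Confirmed in sanger", which can be computed in a single pass. A Python sketch follows; the two column names are those used in the queries, while the =patient= column and the toy rows are hypothetical.

#+begin_src python
import csv
import io
from collections import Counter


def cross_tab(rows):
    """Count rows per (Found by bisonex, Confirmed in sanger) pair."""
    return Counter((r["Found by bisonex"], r["Confirmed in sanger"]) for r in rows)


# Toy rows (hypothetical); an empty "Confirmed in sanger" cell means untested.
text = (
    "patient,Found by bisonex,Confirmed in sanger\n"
    "P1,found,true\n"
    "P2,found,\n"
    "P3,missed,\n"
)
table = cross_tab(csv.DictReader(io.StringIO(text)))
for (found, confirmed), n in sorted(table.items()):
    print(found, confirmed or "untested", n)
#+end_src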
*** DONE Look at the 5 missed variants: 3 explainable, 2 not
CLOSED: [2023-11-09 Thu 00:22] SCHEDULED: <2023-11-05 Sun>
open searched.csv | where "Found by bisonex" == "missed"
62982193 7884996 : haplotypecaller ok... -> filtered out because AD=5 <= 10
63012582 102230760 : not present in haplotypecaller, but there is a deletion at 755 (at 754, CG -> C). Checked in mobidetails
63019340 50721335 : not present in haplotypecaller (checked in IGV). Checked in mobidetails
63060439 26869324 : filtered out because only 15 reads
63109239 14358800 : present in haplotypecaller: filtered out because DP=29 <= 30
Not found in the haplotypecaller output with bcftools, but zgrep works
zgrep 7884996 call_variant/haplotypecaller/*62982193*/*
zgrep 102230760 call_variant/haplotypecaller/*63012582*/*
zgrep 50721335 call_variant/haplotypecaller/*63019340*/*
zgrep 26869324 call_variant/haplotypecaller/*63060439*/*
zgrep 14358800 call_variant/haplotypecaller/*63109239*/*
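A zgrep on the digits matches them anywhere on a line, not only in the POS column. The sketch below looks up a position in a gzipped VCF by comparing the POS field itself, with an optional window to catch nearby indels such as a deletion recorded a few bases upstream. This is a Python illustration; the function name and the window parameter are hypothetical and not part of the pipeline.

#+begin_src python
import gzip


def find_position(vcf_gz_path, pos, window=0):
    """Return the VCF records whose POS lies within `window` bases of `pos`."""
    hits = []
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) > 1 and abs(int(fields[1]) - pos) <= window:
                hits.append(line.rstrip("\n"))
    return hits


# Usage (the path is a placeholder):
# find_position("sample.vcf.gz", 102230760, window=10)
#+end_src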
*** DONE Flowchart
CLOSED: [2023-11-09 Thu 00:22]
*** DONE Redo extraction
CLOSED: [2023-11-04 Sat 19:02] SCHEDULED: <2023-11-04 Sat>
*** DONE Redo annotation with mobidetails
CLOSED: [2023-11-04 Sat 19:02] SCHEDULED: <2023-11-04 Sat>
*** DONE Redo annotation for unrecognised transcripts
CLOSED: [2023-11-04 Sat 20:42] SCHEDULED: <2023-11-04 Sat>
5 transcripts, also given by
#+begin_src nu
open annotated.csv | where coding != "negatif" | where chrom == ""
#+end_src
| 62676048 | NM_001080420.1 | SHANK3 | invalid reference |
| 62690893 | NM_001080420.1 | KDM6B | same |
| 62690893 | NM_001080420.1 | KDM6B | same variant |
| 62795429 | NM_016381.3 | TREX1 | NM_033629.5 |
| 63019340 | NM_001080420.1 | SHANK3 | NM_001372044.2 |
SCHEDULED: <2023-11-01 Wed>
*** DONE Add the variant for 63009152
CLOSED: [2023-11-04 Sat 20:47] SCHEDULED: <2023-11-01 Wed>
*** DONE Regenerate annotation with NC_
CLOSED: [2023-11-04 Sat 18:59] SCHEDULED: <2023-10-31 Tue>
*** DONE Compare missed variants with Sanger: 0 confirmed
CLOSED: [2023-11-06 Mon 23:48] SCHEDULED: <2023-11-04 Sat>
*** DONE Annotate variants with Sanger
CLOSED: [2023-11-08 Wed 23:17] SCHEDULED: <2023-11-07 Tue>
*** DONE Email Paul with the results
CLOSED: [2023-11-09 Thu 00:22] SCHEDULED: <2023-11-05 Sun>
* Results
** TODO Speed-up BWA-mem
SCHEDULED: <2023-11-11 Sat>
** TODO Speed-up Haplotypecaller
SCHEDULED: <2023-11-11 Sat>
* Communication
** DONE Email NGS-diag
CLOSED: [2023-10-06 Fri 08:04] SCHEDULED: <2023-10-06 Fri>
/Entered on/ [2023-10-04 Wed 19:33]
r preferred)
"Wherever possible we would discourage you from summarising data in this way. "
**** DONE Mail alexis
CLOSED: [2023-08-20 Sun 13:45] SCHEDULED: <2023-08-20 Sun>
**** TODO Données simuscop 200x
SCHEDULED: <2023-11-15 Wed>
**** DONE En T2T avec liftover (filtre = spip) : ok mais lent et trop de variants :tests:
CLOSED: [2023-09-17 Sun 17:13] SCHEDULED: <2023-09-17 Sun>
1. Conversion en bed
#+begin_src sh :dir:~/code/sanger
open snvs-cento-sanger.csv | select chrom pos | insert pos2 {$in.pos } | to csv --separator="\t" | save snvs-cento-sanger.bed -f
#+end_src
2. Liftover avec UCSC (en ligne)
NB: vérifié sur le premier résultat en cherche le read contenant le variant (samtools view -r puis samtools view | grep en T2T) et avec l'aide d'IGV, on a un variant qui correspond en
chr1:10757746
3. En supposant que l'ordre des variants n'a pas changé, on ajoute simplement REF et ALT avec annotateLifted.jl
Annotation spip *très lente* : 1h13 !
Résultat:
2×3 DataFrame
Row │ variant meanQual depth
│ String Float64 Int64
─────┼──────────────────────────────────────
1 │ chr12:g.13594572 60.0 1
2 │ chr17:g.10204026 60.0 1
144 found over 146
filter depth : another 0 missed variants
filter poly : another 0 missed variants
filter vep : another 0 missed variants
Et on a trop de variants en sortie (7330 !)
**** DONE Mail Paul avec résultats filtre en T2T + nouveau schéma
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
** TODO Medically relevant genes
SCHEDULED: <2023-11-17 Fri>
/Entered on/ [2023-10-18 Wed 22:37]
* Ré-interprétation :reanalysis:
** DONE Lancer tests sur données brutes [225/250] <(samples.csv)> <(runs.waiting)>
CLOSED: [2023-10-14 Sat 11:58] SCHEDULED: <2023-10-08 Sun>
- [X] 100222_63015289
- [X] 1600304839_63051311
- [X] 1900007827_62913191
- [X] 1900398899_62999500
- [X] 1900486799_62913197
- [X] 2100422923_62952677
- [X] 2100458888_62933047
- [X] 2100601558_62903840
- [X] 2100609288_62905768
- [X] 2100609501_62905776
- [X] 2100614493_62951074
- [X] 2100622566_62908067
- [X] 2100622601_62908060
- [X] 2100622705_62908063
- [X] 2100640027_62911936
- [X] 2100645285_62913212
- [X] 2100661411_62914081
- [X] 2100661462_62914086
- [X] 2100708257_62921596
- [X] 2100738732_62926501
- [X] 2100738850_62926509
- [X] 2100746751_62926505
- [X] 2100746797_62926506
- [X] 2100782349_62931722
- [X] 2100782416_62931561
- [X] 2100782559_62931718
- [X] 2100799204_62934768
- [X] 2200010202_62940284
- [X] 2200023600_62940631
- [X] 2200024348_62999591
- [X] 2200027505_62942457
- [X] 2200038776_62943412
- [X] 2200041919_62943405
- [X] 2200088014_62951326
- [X] 2200146652_62959388
- [X] 2200151850_62960953
- [X] 2200160014_62959475
- [X] 2200160070_62959478
- [X] 2200201368_62967471
- [X] 2200201400_62967470
- [X] 2200265558_62976332
- [X] 2200265605_62976401
- [X] 2200267046_62975192
- [X] 2200273878_62999530
- [X] 2200279708_62977002
- [X] 2200284408_62979102
- [X] 2200293987_62979116
- [X] 2200294359_62979118
- [X] 2200306299_62982217
- [X] 2200306539_62982193
- [X] 220030671_62982211
- [X] 2200307058_62982231
- [X] 2200307108_62982196
- [X] 2200307136_62982221
- [X] 2200307199_62982239
- [X] 2200307230_62982234
- [X] 2200307262_62982219
- [X] 2200307297_62982227
- [X] 2200324510_62985453
- [X] 2200324549_62985478
- [X] 2200324573_62985445
- [X] 2200324594_62985467
- [X] 2200324606_62985463
- [X] 2200324614_62985459
- [X] 2200338306_62985430
- [X] 2200343880_62989407
- [X] 2200343910_62989460
- [X] 2200343938_62989451
- [X] 2200343966_62989456
- [X] 2200343993_62989440
- [X] 2200344013_62989464
- [X] 2200349749_62989465
- [X] 2200363462_62988848
- [X] 2200377880_62991993
- [X] 2200378032_62991991
- [X] 2200383996_62993828
- [X] 2200384015_62993796
- [X] 2200384046_62993822
- [X] 2200384117_62993808
- [X] 2200384187_62993825
- [X] 2200384231_62992898
- [X] 2200385658_63060260
- [X] 2200394260_62994732
- [X] 2200395817_62994742
- [X] 2200396731_62994737
- [X] 2200424073_62999579
- [X] 2200424207_62999632
- [X] 2200426178_62999630
- [X] 2200426243_62999635
- [X] 2200426466_62999605
- [X] 2200426642_62999627
- [X] 2200427406_62999649
- [X] 2200427512_62999639
- [X] 2200428953_62999572
- [X] 2200428981_62999600
- [X] 2200428999_62999592
- [X] 2200441970_63000868
- [X] 2200441989_63000882
- [X] 2200442135_63000864
- [X] 2200442216_63000886
- [X] 2200442257_63000951
- [X] 2200451801_63003573
- [X] 2200451862_63004218
- [X] 2200451894_63004210
- [X] 2200456165_63051294
- [X] 2200459865_63004933
- [X] 2200459968_63004937
- [X] 2200460073_63004943
- [X] 2200460121_63004684
- [X] 2200467051_63003856
- [X] 2200467225_63004940
- [X] 2200467261_63004930
- [X] 2200467338_63004925
- [X] 2200470099_63004485
- [X] 2200470142_63004480
- [X] 2200471780_63004362
- [X] 2200480910_63006466
- [X] 2200495073_63010427
- [X] 2200495510_63009152
- [X] 2200508677_63060252
- [X] 2200510531_63012582
- [X] 2200510628_63012549
- [X] 2200510657_63012554
- [X] 2200511249_63012533
- [X] 2200511274_63012586
- [X] 2200517952_63060399
- [X] 2200519525_63060439
- [X] 2200524009_63014044
- [X] 2200524609_63014046
- [X] 2200524616_63014048
- [X] 2200533429_63060425
- [X] 2200539735_63060406
- [X] 2200549908_63019339
- [X] 2200549965_63019349
- [X] 2200550414_63019357
- [X] 2200550471_63020031
- [X] 2200550490_63019351
- [X] 2200550505_63019340
- [X] 2200555565_63018614
- [X] 2200559438_63020029
- [X] 2200559682_63020030
- [X] 2200559713_63019623
- [X] 2200559739_63019626
- [X] 2200569969_63019991
- [X] 2200570001_63021580
- [X] 2200570025_63021490
- [X] 2200570035_63021491
- [X] 2200570042_63021493
- [X] 2200570050_63021494
- [X] 2200579897_63024910
- [X] 2200583995_63024866
- [X] 2200584035_63024905
- [X] 2200584069_63024888
- [X] 2200584126_63024810
- [X] 2200589507_63026712
- [X] 2200597365_63027994
- [X] 2200597480_63027988
- [X] 2200597752_63026853
- [X] 2200597778_63027992
- [X] 22005977_63026903
- [X] 2200609031_63026527
- [X] 2200614198_63113928
- [X] 2200620372_63030821
- [X] 2200620442_63030810
- [X] 2200620498_63030816
- [X] 2200620628_63031031
- [X] 2200622310_63030984
- [X] 2200622355_63030956
- [X] 2200625369_63028699
- [X] 2200625410_63028697
- [X] 2200625536_63028694
- [X] 2200630189_63030665
- [X] 2200635149_63033182
- [X] 2200644544_63037731
- [X] 2200644594_63037725
- [X] 2200650089_63038093
- [X] 2200666292_63076568
- [X] 2200669188_63036688
- [X] 2200669320_63040259
- [X] 2200669383_63040254
- [X] 2200669414_63040257
- [X] 2200669446_63040251
- [X] 2200680342_63105271
- [X] 2200694535_63042853
- [X] 2200694789_63042862
- [X] 2200694858_63042702
- [X] 2200694917_63042696
- [X] 2200699290_63043047
- [X] 2200699345_63040238
- [X] 2200699383_63043050
- [X] 2200699412_63040731
- [X] 220071551_63048935
- [X] 2200731515_63048963
- [X] 2200748145_63051198
- [X] 2200748171_63051213
- [X] 2200751046_63051249
- [X] 2200751101_63051234
- [X] 2200766471_63054590
- [X] 2200767731_63054595
- [X] 2200767822_63054464
- [X] 2200775505_63060410
- [X] 2200850441_63019345
- [X] 220597589_63026879
- [X] 2300003253_63060430
- [X] 2300005679_63060370
- [X] 2300009914_63060390
- [X] 2300028784_63060001
- [X] 2300036815_63063357
- [X] 2300055382_63061874
- [X] 2300055421_63061871
- [X] 2300055440_63061880
- [X] 230006894_63064950
- [X] 2300071111_63070356
- [X] 2300083434_63071675
- [X] 2300103609_63076239
- [X] 2300104572_63076232
- [X] 2300109602_63076765
- [X] 2300109665_63076770
- [X] 2300119721_63078732
- [X] 2300137773_63078133
- [X] 2300137834_63078123
- [X] 2300167821_63086183
- [X] 2300172698_63113453
- [X] 2300188216_63090609
- [X] 2300188281_63090632
- [ ] 2300188800_63090616
- [ ] 2300193193645_63090623
- [ ] 2300193668_63090611
- [ ] 2300195426_63090608
- [ ] 2300201017_63089636
- [ ] 2300227479_63098330
- [ ] 2300232688_63130821
- [ ] 2300292749_63109239
- [ ] 230029277_63109247
- [ ] 2300294712_63109236
- [ ] 2300308032_63111581
- [ ] 2300323537_63114209
- [ ] 2300334609_63115535
- [ ] 2300346867_63118093
- [ ] 2300346867_63118093_NA12878
- [ ] 2300348940_63118099
- [ ] 2300359806_63119915
- [ ] 2300380476_63123963
- [ ] 2300382582_63123749
- [ ] 2300384269_63126867
- [ ] 2300407581_63130826
- [ ] 2300407626_63130842
- [ ] 2300409593_63130874
- [ ] 2300409612_63130980
- [ ] 2300417623_63131524
** TODO Variants manqués :checkpipeline:
SCHEDULED: <2023-10-21 Sat>
*** DONE 63012582: chr10:g.102230760 filtré par AD :63012582:
CLOSED: [2023-10-08 Sun 23:24] SCHEDULED: <2023-10-08 Sun>
Il est en sortie d'haplotypecaller !
Attention à la position : POS=102230753 noté CG->C
GT:AD:DP:GQ:PL 0/1:26,8:34:99:146,0,671
Filtré par la condition AD <= 10 (porté par 8 reads seulement)
Non confirméen sanger, rendu vous
**** KILL image BAM cento
CLOSED: [2023-10-08 Sun 23:13]
**** DONE image BAM bisonex
CLOSED: [2023-10-08 Sun 23:23] SCHEDULED: <2023-10-08 Sun>
**** DONE Mail Paul
CLOSED: [2023-10-08 Sun 23:24] SCHEDULED: <2023-10-08 Sun>
*** DONE 63060439: chr15:g.26869324 = Problème de profondeur DP=15 :63060439:
CLOSED: [2023-10-08 Sun 23:24] SCHEDULED: <2023-10-08 Sun>
GABRA5
Rendu VOUS avec un variant patho MDB5 pour même patient (VOUS- même)
Non confirmé en Sanger
GT:AD:DP:GQ:PL 0/1:9,6:15:99:103,0,213
**** DONE image BAM bisonex
CLOSED: [2023-10-08 Sun 22:56]
**** DONE Mail Paul
CLOSED: [2023-10-08 Sun 23:24] SCHEDULED: <2023-10-08 Sun>
*** DONE Un seul exécutable pour toutes les étapes
CLOSED: [2023-11-04 Sat 19:00] SCHEDULED: <2023-10-21 Sat>
Un utilitaire en ligne de commande qui appel les différentes étapes.
On utilise une structure unique pour toutes les étapes mais qui sera remplie au fur et à mesure. En stockant dans un csv à chaque étape
**** DONE parse variants
CLOSED: [2023-10-21 Sat 23:29] SCHEDULED: <2023-10-21 Sat>
**** DONE Ajouter négatifs dans la liste des variants
CLOSED: [2023-10-22 Sun 23:01] SCHEDULED: <2023-10-21 Sat>
**** DONE Mettre à jour liste des variants
CLOSED: [2023-10-22 Sun 23:01] SCHEDULED: <2023-10-21 Sat>
- [X] Régéner la liste des variants
- [ ] Retrouver les variants modifié à la main avec diff
On ne garde que les ajouts
#+begin_src sh
awk -F ',' '{print $1","$2":"$3$4$5}' extracted.csv | ^sort | save -f extracted_concat.csv
xsv select 1-2 ~/annex/data/centogene/variants/variant_genomic.csv | ^sort | save variant_genomic_corr.csv -f
diff extracted_concat.csv variant_genomic_corr.csv | grep '^>' | save -f update.diff
#+end_src
- [X] Ajouter négatifs
345 variants non trouvés avant modification
141 après modification
- [X] Ajouter différence
- [X] Corriger erreurs de parsing
**** KILL Lifter coordonées variants cento génomique en GRCh38
CLOSED: [2023-10-21 Sat 22:47]
**** DONE Parser coordonnée patient
CLOSED: [2023-11-04 Sat 18:59] SCHEDULED: <2023-10-31 Tue>
**** DONE Un seul type de données
CLOSED: [2023-11-01 Wed 00:55] SCHEDULED: <2023-10-31 Tue>
***** DONE Vérifier avec derniere version sauvegardé
CLOSED: [2023-11-01 Wed 00:55] SCHEDULED: <2023-10-31 Tue>
file,transcript,coding,codingPos,codingChange,proteinChange,classification,zygosity
->
patient,transcript (cento),transcript (canonical),coding,genomic (hg38),classification (cento),zygosity,gene,Confirmed in sanger,Found by bisonex,chrom,pos,ref,alt
- [ ] Fusionner coding,codingPos,codingChange
- [ ] Ne pas écrire proteinchange
- [ ] Fichier de référence : insérer champs vides avec awk : transcript (canonical), genomic puis gene, etc
- [ ] Fichier de référence : renommer header
- [ ] vérifier que fichier toujours identique
#+begin_src julia
using DataFramesMeta, CSV
# - [ ] Fichier de référence : insérer champs vides avec awk : transcript (canonical), genomic puis gene, etc
# - [ ] Fichier de référence : renommer header
# - [ ] vérifier que fichier toujours identique
#file,transcript,coding,codingPos,codingChange,proteinChange,classification,zygosity
#->
#patient,transcript (cento),transcript (canonical),coding,genomic (hg38),classification (cento),zygosity,gene,Confirmed in sanger,Found by bisonex,chrom,pos,ref,alt
function mergeCoding(c, p, ch)
"negatif" in [c, p, ch] ? "negatif" : c * p * ch
end
function negative(t, pos)
t == "negatif" ? -1 : pos
end
cols = [:patient,:"transcript (cento)",:"transcript (canonical)",:coding,:"genomic (hg38)",
:"classification (cento)",:zygosity,:gene,:"Confirmed in sanger",:"Found by bisonex",
:chrom,:pos,:ref,:alt]
d = @chain CSV.read("variant_extracted.csv", DataFrame) begin
# Fusionner coding,codingPos,codingChange
@transform :coding = mergeCoding.(:coding, :codingPos, :codingChange)
# Ne pas écrire proteinchange
@select $(Not([:codingPos, :codingChange, :proteinChange]))
# Add missing mcolumns
@rename :"transcript (cento)" = :transcript :patient = :file :"classification (cento)" = :classification
@transform :"transcript (canonical)" = missing :"genomic (hg38)" = missing :gene = missing
@transform :"Confirmed in sanger" = missing :"Found by bisonex" = missing
@transform :chrom = missing :pos = missing :ref = missing :alt = missing
# Rorder
@select :patient :"transcript (cento)" :"transcript (canonical)" :coding :"genomic (hg38)" :"classification (cento)" :zygosity :gene :"Confirmed in sanger" :"Found by bisonex" :chrom :pos :ref :alt
# Set -1 for negative variant
@rtransform :pos = :"transcript (cento)" == "negatif" ? -1 : :pos
@rtransform :"transcript (canonical)"= :"transcript (cento)" == "negatif" ? "negatif" : :"transcript (canonical)"
@rtransform :"genomic (hg38)" = :"transcript (cento)" == "negatif" ? "negatif" : :"genomic (hg38)"
@rtransform :coding = :"transcript (cento)" == "negatif" ? "negatif" : :coding
@rtransform :gene = :"transcript (cento)" == "negatif" ? "negatif" : :gene
@rtransform :pos = ismissing(:pos) ? -1 : :pos
end
CSV.write("variant_extracted_remap.csv", d)
d2 = @chain CSV.read("extracted.csv", DataFrame) begin
@orderby :patient
end
CSV.write("extracted_sorted.csv", d2)
#+end_src
Sort the files so that the rows are in the same order (otherwise diff does not work??)
diff extracted_sorted.csv variant_extracted_remap.csv -u | save extracted.diff
patch -p1 extracted_sorted.csv extracted.diff
patching file extracted_sorted.csv
diff extracted_sorted.csv variant_extracted_remap.csv -u
**** KILL variant_recoder to get the VCF coordinates
CLOSED: [2023-10-25 Wed 09:13] SCHEDULED: <2023-10-21 Sat>
mobidetails does not find the old transcripts
**** DONE mobidetails annotation (gene + genomic data)
CLOSED: [2023-10-25 Wed 09:14]
**** DONE Send list to Paul
SCHEDULED: <2023-10-26 Thu>
**** DONE Compare each variant with the pipeline output
CLOSED: [2023-10-31 Tue 00:18] SCHEDULED: <2023-10-21 Sat>
Using the "test" function in Search.hs
1126 extracted
654 annotated
253 raw data
102 raw and annotated
236 raw and extracted
17 raw NOT extracted
890 extracted WITHOUT raw
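The counts above are set sizes. A minimal sketch of how such overlaps could be recomputed with sort and comm, assuming one ID per line in two list files. The helper name overlap_counts and the file names in the usage note are hypothetical, and this is not the "test" function from Search.hs.

#+begin_src sh
# Hypothetical helper: print "<in both> <only in first> <only in second>"
# for two files containing one ID per line.
overlap_counts() {
    a=$(mktemp) && b=$(mktemp) || return 1
    # comm needs sorted, de-duplicated input
    sort -u "$1" > "$a"
    sort -u "$2" > "$b"
    both=$(comm -12 "$a" "$b" | wc -l | tr -d ' ')
    only_a=$(comm -23 "$a" "$b" | wc -l | tr -d ' ')
    only_b=$(comm -13 "$a" "$b" | wc -l | tr -d ' ')
    rm -f "$a" "$b"
    echo "$both $only_a $only_b"
}
#+end_src

Usage would look like: overlap_counts raw_ids.txt extracted_ids.txt. The numbers in the notes are consistent with each other: 253 - 236 = 17 and 1126 - 236 = 890.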
#+begin_src nu
open diff.txt | from csv | get id | into string | each {|e| "~/annex/data/centogene/reports/" ++ $e ++ "*.pdf"} | each {|e| firefox $e }
#+end_src
The 17 missing ones are
- 62913191: CNV
- 62959388: MT-ATP6
- 62999572: MT-ATP6
- 62999627: CNV
- 62999630: CNV
- 63004218: CNV
- 63006466: CNV
- 63009152: missed during extraction -> actually present
- 63015289: CNV
- 63024910: MT-ATP6
- 63040251: CNV
- 63043050: CNV
- 63118093: NA12878
- NA12878 x4
*** DONE Compare cento variants with bisonex output: 50/121 confirmed by Sanger, 71/121 not tested, 0 confirmed variants missed by the pipeline, 5 missed but not confirmed
CLOSED: [2023-11-08 Wed 00:19] SCHEDULED: <2023-11-04 Sat>
*** Compare with Sanger: variant only
Reconstructing the family tree is complicated. The information is there but takes work.
We assume the variant is only present within the family...
Results
❯ open sangerized.csv | where "Found by bisonex" == "found" | where "Confirmed in sanger" == "true" | length
50
❯ open sangerized.csv | where "Found by bisonex" == "found" | where "Confirmed in sanger" == "" | length
71
❯ open sangerized.csv | where "Found by bisonex" == "missed" | where "Confirmed in sanger" == "" | length
5
❯ open sangerized.csv | where "Found by bisonex" == "missed" | where "Confirmed in sanger" == "true" | length
0
[[id:cd79a77c-a0b6-4bb1-9e08-fe08dc89e3aa][Final results]]
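For reference, the same four counts could be obtained without nushell. A rough sketch with awk, assuming a simple CSV with a header row and no quoted fields containing commas (count_where is a hypothetical helper, not part of the pipeline):

#+begin_src sh
# usage: count_where FILE COL1 VAL1 COL2 VAL2
# Counts data rows where column COL1 equals VAL1 and column COL2 equals VAL2.
# Assumption: plain comma-separated file, no quoted commas.
count_where() {
    awk -F',' -v c1="$2" -v v1="$3" -v c2="$4" -v v2="$5" '
        NR == 1 {
            # locate both columns by header name
            for (i = 1; i <= NF; i++) {
                if ($i == c1) i1 = i
                if ($i == c2) i2 = i
            }
            if (!i1 || !i2) { bad = 1; exit 2 }
            next
        }
        $i1 == v1 && $i2 == v2 { n++ }
        END { if (!bad) print n + 0 }
    ' "$1"
}
#+end_src

For example, count_where sangerized.csv "Found by bisonex" missed "Confirmed in sanger" true would correspond to the last query above.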
*** DONE Look at the 5 missed variants: 3 explainable, 2 not
CLOSED: [2023-11-09 Thu 00:22] SCHEDULED: <2023-11-05 Sun>
open searched.csv | where "Found by bisonex" == "missed"
62982193 7884996 : haplotypecaller ok... -> filtered because AD=5 <= 10
63012582 102230760 : not present in haplotypecaller, but there is a deletion at 755 (at 754, CG -> C). Checked with mobidetails
63019340 50721335 : not present in haplotypecaller (checked in IGV). Checked with mobidetails
63060439 26869324 : filtered because only 15 reads
63109239 14358800 : present in haplotypecaller: filtered because DP=29 <= 30
Not found in the haplotypecaller output with bcftools, but zgrep works
zgrep 7884996 call_variant/haplotypecaller/*62982193*/*
zgrep 102230760 call_variant/haplotypecaller/*63012582*/*
zgrep 50721335 call_variant/haplotypecaller/*63019340*/*
zgrep 26869324 call_variant/haplotypecaller/*63060439*/*
zgrep 14358800 call_variant/haplotypecaller/*63109239*/*
*** DONE Flowchart
CLOSED: [2023-11-09 Thu 00:22]
*** DONE Redo extraction
CLOSED: [2023-11-04 Sat 19:02] SCHEDULED: <2023-11-04 Sat>
*** DONE Redo annotation with mobidetails
CLOSED: [2023-11-04 Sat 19:02] SCHEDULED: <2023-11-04 Sat>
*** DONE Redo annotation for unrecognized transcripts
CLOSED: [2023-11-04 Sat 20:42] SCHEDULED: <2023-11-04 Sat>
5 transcripts, also given by
#+begin_src nu
open annotated.csv | where coding != "negatif" | where chrom == ""
#+end_src
| 62676048 | NM_001080420.1 | SHANK3 | invalid reference |
| 62690893 | NM_001080420.1 | KDM6B  | same              |
| 62690893 | NM_001080420.1 | KDM6B  | same variant      |
| 62795429 | NM_016381.3    | TREX1  | NM_033629.5       |
| 63019340 | NM_001080420.1 | SHANK3 | NM_001372044.2    |
SCHEDULED: <2023-11-01 Wed>
*** DONE Add variant for 63009152
CLOSED: [2023-11-04 Sat 20:47] SCHEDULED: <2023-11-01 Wed>
*** DONE Regenerate annotation with NC_
CLOSED: [2023-11-04 Sat 18:59] SCHEDULED: <2023-10-31 Tue>
*** DONE Compare missed variants with Sanger: 0 confirmed
CLOSED: [2023-11-06 Mon 23:48] SCHEDULED: <2023-11-04 Sat>
*** DONE Annotate variants with Sanger
CLOSED: [2023-11-08 Wed 23:17] SCHEDULED: <2023-11-07 Tue>
*** DONE Mail Paul with results
CLOSED: [2023-11-09 Thu 00:22] SCHEDULED: <2023-11-05 Sun>
*** DONE Check coordinates of the 2 missing variants
CLOSED: [2023-11-12 Sun 16:53] SCHEDULED: <2023-11-11 Sat>
Both are in homopolymers
- 1st = same variant but represented differently
- SHANK3?
**** PITX3: filtered because AD=8
NB: synonymous representation
Same sequence
>hg38_dna range=chr10:102230742-102230777 5'pad=2 3'pad=2 strand=+ repeatMasking=none
GGAGCCAGCCCGGGGGGGCCCCCGCCCAGGCCCTG
>hg19_dna range=chr10:103990500-103990534 5'pad=0 3'pad=0 strand=+ repeatMasking=none
GGAGCCAGCCCGGGGGGGCCCCCGCCCAGGCCCTG
According to IGV:
GGAGCCAGCCC(G)GGGGGGCCCCCGCCCAGGCCCTG
According to cento:
GGAGCCAGCCCGGGGGG(G)CCCCCGCCCAGGCCCTG
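A quick way to convince oneself that the two notations describe the same allele: deleting either marked G from the run gives the same resulting sequence. A small sketch, where apply_deletion is a hypothetical helper that simply drops the parenthesised bases:

#+begin_src sh
# Remove the parenthesised (deleted) bases from a marked-up sequence.
apply_deletion() {
    printf '%s\n' "$1" | sed 's/([ACGT]*)//'
}

igv=$(apply_deletion 'GGAGCCAGCCC(G)GGGGGGCCCCCGCCCAGGCCCTG')     # first G of the run
cento=$(apply_deletion 'GGAGCCAGCCCGGGGGG(G)CCCCCGCCCAGGCCCTG')   # last G of the run

if [ "$igv" = "$cento" ]; then
    echo "same allele"
else
    echo "different alleles"
fi
#+end_src

This only compares the resulting sequences; it does not normalize coordinates the way a dedicated tool would.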
#+begin_src sh :dir ~/annex/data/bisonex/
bcftools filter -i 'POS=102230760' call_variant/haplotypecaller/*63012582*/*.vcf.gz
#+end_src
DP is fine but AD is too low
GT:AD:DP:GQ:PL 0/1:26,8:34:99:146,0,671
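The check can be sketched as follows. The thresholds are an assumption inferred from these notes (AD=5 <= 10 and DP=29 <= 30 were rejected), and check_depth is a hypothetical helper; the actual pipeline filter may test things in a different order or use other fields.

#+begin_src sh
# Assumed thresholds, inferred from the notes (not read from the pipeline config).
MIN_AD=10
MIN_DP=30

# usage: check_depth FORMAT SAMPLE
# Prints PASS, or the reason the call would be filtered.
check_depth() {
    awk -v fmt="$1" -v smp="$2" -v min_ad="$MIN_AD" -v min_dp="$MIN_DP" 'BEGIN {
        # map FORMAT keys (GT, AD, DP, ...) to the sample values
        n = split(fmt, keys, ":")
        split(smp, vals, ":")
        for (i = 1; i <= n; i++) field[keys[i]] = vals[i]
        # AD is "ref,alt": ad[1] = reference depth, ad[2] = first alt depth
        split(field["AD"], ad, ",")
        if (ad[2] + 0 <= min_ad)
            print "AD=" ad[2] " <= " min_ad
        else if (field["DP"] + 0 <= min_dp)
            print "DP=" field["DP"] " <= " min_dp
        else
            print "PASS"
    }'
}

check_depth 'GT:AD:DP:GQ:PL' '0/1:26,8:34:99:146,0,671'   # prints "AD=8 <= 10"
#+end_src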
**** SHANK3: transcript removed since then: ok
Found by Eric: 50721504dup
Checking:
#+begin_src sh :dir ~/annex/data/bisonex/
bcftools filter -i 'POS=50721504' call_variant/haplotypecaller/*63019340*/*.vcf.gz
#+end_src
#+begin_src sh :dir ~/annex/data/bisonex/
zgrep '50721504' annotate/full/*63019340*.tsv
#+end_src
* Results
** TODO Speed-up BWA-mem
SCHEDULED: <2023-11-19 Sun>
** TODO Speed-up HaplotypeCaller
SCHEDULED: <2023-11-19 Sun>
* Communication
** DONE Mail NGS-diag
CLOSED: [2023-10-06 Fri 08:04] SCHEDULED: <2023-10-06 Fri>
/Entered on/ [2023-10-04 Wed 19:33]