XNMQLJSI7Y44JPOUN3F2ZARC62A5OL5YEQNAP3ZV2BU4XNCUSCRQC
NNXJAZKQ6TCPG6NTQB5OK647FCZMTTKOED2JOTTP2ZHGI67AANIQC
QGZPAO3X5YZ7HFTQUJULRFCKRZXFOMJ23W3DNLMZZH4IVHUN7CSAC
QTPZ6AL5CZES5XTQHDFXSHWHU5SXXGI7EBLZMXJK7I7TGSISI4YAC
YYYI54A7EXSROJ64AILVAR23PYF3VWS2C27QW4WIMWGOVDZHKJVQC
ME74GL3O7CDN3SEXU4QLQVMFB4TKR2UFVRNIEVBBYMJM2MQSEIRAC
RHWQQAAHNHFO3FLCGVB3SIDKNOUFJGZTDNN57IQVBMXXCWX74MKAC
UPVCS5WSF5W5CYRF5YSG23C5CSFW4HFKZSJXXKBWGCMGO7V5GWAAC
FXA3ZBV64FML7W47IPHTAJFJHN3J3XHVHFVNYED47XFSBIGMBKRQC
PL526DIB3OIAMC35BV5DRTUHRZDIJDEUCPPK2AUY5CHLWJNEWI4AC
JMVLHRX7RFJCQFXESQT75DUKOLBQEO7LMKOZCI4QWQBQKAWLC2PAC
WU76EHIWT7E3CEKGV6LQJUSOC63JYDPPDRV3KKOEGLP2FB5UEUEQC
FYUID2XZFKETTS6EJLOGMV5DJJMLY4NCQSATT3ZKWDLHEKJRO4TAC
7F3ARXB7ZM4WPC3FOH5RHXFHWB3MGZ6XJYJX5IH6RFQNUKOWHAOQC
6BKBLJ2QBO3O7C3J47SFFOYYGONV4A2FE3RTFKWBMDM4G5TLAH5QC
T26RE7E7ZHSHZYNQS7XCDJJUYGOXNOZ7IOOFHMKLNY3KVVKGQ2BQC
CNHFD23P6634DYH6WYSWLNORGZH2UEVRDVGR356PNSNF5ONBAKVAC
CHCMW6PPGP2XMJQT4D6EYQUAO5MG77RUE5JTVRET7B7ODKRVM4GQC
T7GS3FVFOFBEFQXCLESYLY7JJ6D2ZVA2CAH5U4FRTKHTMPLINNXAC
NJXOLJZQUCFUKZA657S4SKJWP3U7VV4PZJYANT5IWJIB2OZYDFKAC
JXA2LMD35VKJZLEJAJYNZWU6RGGQ7HZDXFQHNCDHODQ4AHCQU5ZQC
NBJFXQNG6YLIEL6HK7VZDEQZTXK2QJZ43AKNLXIIBX5K3MOX6WCQC
DZ6GQN2ERJAZG3PWZC34EN32YZOAPVFNGVCMJQNMFZVK4TR57UTAC
*** STRT google drive
SCHEDULED: <2023-10-01 Sun>
)
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
** DONE Figure: number of publications per aligner
CLOSED: [2023-09-19 Tue 16:54] SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
** TODO Figure: number of publications per variant caller
SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
** TODO Figure: number of exomes per year
SCHEDULED: <2023-09-30 Sat>
/Entered on/ [2023-09-19 Tue 08:43]
* Tests :tests:
** KILL Non-regression: prod version
CLOSED: [2023-05-23 Tue 08:46]
*** DONE ID common snp
CLOSED: [2022-11-19 Sat 21:36]
#+begin_src
$ wc -l ID_of_common_snp.txt
23194290 ID_of_common_snp.txt
$ wc -l /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
23194290 /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
#+end_src
*** DONE ID common snp not clinvar patho
CLOSED: [2022-12-11 Sun 20:11]
**** DONE Checking the problem
CLOSED: [2022-12-11 Sun 16:30]
On J:
21155134 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref
"Non-regression" version
21155076 database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt
New version
23193391 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt
After removing duplicates
$ sort database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt | uniq > old.txt
$ wc -l old.txt
21107097 old.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt | uniq > new.txt
$ wc -l new.txt
21174578 new.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref | uniq > ref.txt
$ wc -l ref.txt
21107155 ref.txt
Looking at the difference
comm -23 ref.txt old.txt
rs1052692
rs1057518973
rs1057518973
rs11074121
rs112848754
rs12573787
rs145033890
rs147889095
rs1553904159
rs1560294695
rs1560296615
rs1560310926
rs1560325547
rs1560342418
rs1560356225
rs1578287542
...
Look up the first one
bcftools query -i 'ID="rs1052692"' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 1619351 C A,T
It is indeed pathogenic...
$ bcftools query -i 'POS=1619351' database/clinvar/clinvar.vcf.gz -f '%CHROM %POS %REF %ALT %INFO/CLNSIG\n'
19 1619351 C T Conflicting_interpretations_of_pathogenicity
Check all the others
$ comm -23 ref.txt old.txt > tocheck.txt
Generate the regions to check (chromosome number:position)
$ bcftools query -i 'ID=@tocheck.txt' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM\t%POS\n' > tocheck.pos
Generate the reverse mapping (chromosome number -> NC)
$ awk ' { t = $1; $1 = $2; $2 = t; print; } ' database/RefSeq/refseq_to_number_only_consensual.txt > mapping.txt
Remap clinvar
$ bcftools annotate --rename-chrs mapping.txt database/clinvar/clinvar.vcf.gz -o clinvar_remapped.vcf.gz
$ tabix clinvar_remapped.vcf.gz
Finally, look up the classification in clinvar
$ bcftools query -R tocheck.pos clinvar_remapped.vcf.gz -f '%CHROM %POS %INFO/CLNSIG\n'
$ bcftools query -R tocheck.pos database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %ID \n' | grep '^NC'
#+RESULTS:
**** DONE Understand why the new version gives a different result
CLOSED: [2022-12-11 Sun 20:11]
***** DONE Same dbSNP and clinvar versions?
CLOSED: [2022-12-10 Sat 23:02]
Clinvar differs!
$ bcftools stats clinvar.gz
clinvar (Alexis)
SN 0 number of samples: 0
SN 0 number of records: 1492828
SN 0 number of no-ALTs: 965
SN 0 number of SNPs: 1338007
SN 0 number of MNPs: 5562
SN 0 number of indels: 144580
SN 0 number of others: 3714
SN 0 number of multiallelic sites: 0
SN 0 number of multiallelic SNP sites: 0
clinvar (new)
SN 0 number of samples: 0
SN 0 number of records: 1493470
SN 0 number of no-ALTs: 965
SN 0 number of SNPs: 1338561
SN 0 number of MNPs: 5565
SN 0 number of indels: 144663
SN 0 number of others: 3716
SN 0 number of multiallelic sites: 0
SN 0 number of multiallelic SNP sites: 0
***** DONE Update clinvar and dbSNP to work from the same databases
CLOSED: [2022-12-11 Sun 12:10]
The problem persists
***** DONE Remove the chromosome int conversion
CLOSED: [2022-12-10 Sat 19:29]
***** KILL Same NC?
CLOSED: [2022-12-10 Sat 19:29]
$ zgrep "contig=<ID=NC_\(.*\)" clinvar/GRCh38/clinvar.vcf.gz > contig.clinvar
$ diff contig.txt contig.clinvar
< ##contig=<ID=NC_012920.1>
***** DONE Test on chromosome 19: ok
CLOSED: [2022-12-11 Sun 13:53]
Prepare the data
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/dbSNP_common.vcf.gz -o dbSNP_common_19.vcf.gz
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data/clinvar/GRCh38/clinvar.vcf.gz -o clinvar_19.vcf.gz
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data-alexis/dbSNP/dbSNP_common.vcf.gz -o dbSNP_common_19_old.vcf.gz
bcftools filter -i 'CHROM="19"' /Work/Groups/bisonex/data-alexis/clinvar/clinvar.vcf.gz -o clinvar_19_old.vcf.gz
#+end_src
Fetch both versions of the script
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
git checkout regression ../../script/pythonScript/clinvar_sbSNP.py
cp ../../script/pythonScript/clinvar_sbSNP.py clinvar_sbSNP_old.py
git checkout HEAD ../../script/pythonScript/clinvar_sbSNP.py
#+end_src
#+RESULTS:
Compare
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
python ../../script/pythonScript/clinvar_sbSNP.py clinvar_sbSNP.py --clinvar clinvar_19.vcf.gz --dbSNP dbSNP_common_19.vcf.gz --output tmp.txt
sort tmp.txt | uniq > new.txt
table=/Work/Groups/bisonex/data-alexis/RefSeq/refseq_to_number_only_consensual.txt
python clinvar_sbSNP_old.py --clinvar clinvar_19_old.vcf.gz --dbSNP dbSNP_common_19_old.vcf.gz --output tmp_old.txt --chrm_name_table $table
sort tmp_old.txt | uniq > old.txt
wc -l old.txt new.txt
#+end_src
#+RESULTS:
| 535155 | old.txt |
| 535194 | new.txt |
| 1070349 | total |
Taking the first mismatch in new: it is conflicting patho, so it should not be there...
$ bcftools query -i 'ID="rs10418277"' dbSNP_common_19.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 54939682 C G,T
$ bcftools query -i 'ID="rs10418277"' dbSNP_common_19_old.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 54939682 C G,T
$ bcftools query -i 'POS=54939682' clinvar_19.vcf.gz -f '%POS %REF %ALT %INFO/CLNSIG\n'
54939682 C G Conflicting_interpretations_of_pathogenicity
54939682 C T Benign
$ bcftools query -i 'POS=54939682' clinvar_19_old.vcf.gz -f '%POS %REF %ALT %INFO/CLNSIG\n'
54939682 C G Conflicting_interpretations_of_pathogenicity
54939682 C T Benign
$ grep rs10418277 *.txt
new.txt:rs10418277
tmp.txt:rs10418277
The problem came from POS, which was no longer converted to int (line deleted by mistake??)
Verify
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
python ../../script/pythonScript/clinvar_sbSNP.py --clinvar clinvar_19.vcf.gz --dbSNP dbSNP_common_19.vcf.gz --output tmp.txt
sort tmp.txt | uniq > new.txt
table=/Work/Groups/bisonex/data-alexis/RefSeq/refseq_to_number_only_consensual.txt
python clinvar_sbSNP_old.py --clinvar clinvar_19_old.vcf.gz --dbSNP dbSNP_common_19_old.vcf.gz --output tmp_old.txt --chrm_name_table $table
sort tmp_old.txt | uniq > old.txt
wc -l old.txt new.txt
diff old.txt new.txt
#+end_src
#+RESULTS:
| 535155 | old.txt |
| 535155 | new.txt |
| 1070310 | total |
***** DONE Test on chromosomes 19 and 20: ok
CLOSED: [2022-12-11 Sun 15:56]
Prepare the data
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
bcftools filter -i 'CHROM="NC_000019.10" | CHROM="NC_000020.11"' /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/dbSNP_common.vcf.gz -o dbSNP_common_19_20.vcf.gz
bc
)
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
** DONE Figure: number of publications per aligner
CLOSED: [2023-09-19 Tue 16:54] SCHEDULED: <2023-09-19 Tue>
/Entered on/ [2023-09-19 Tue 08:43]
** TODO Figure: number of publications per variant caller
SCHEDULED: <2023-10-01 Sun>
/Entered on/ [2023-09-19 Tue 08:43]
** TODO Figure: number of exomes per year
SCHEDULED: <2023-10-01 Sun>
/Entered on/ [2023-09-19 Tue 08:43]
* Tests :tests:
** KILL Non-regression: prod version
CLOSED: [2023-05-23 Tue 08:46]
*** DONE ID common snp
CLOSED: [2022-11-19 Sat 21:36]
#+begin_src
$ wc -l ID_of_common_snp.txt
23194290 ID_of_common_snp.txt
$ wc -l /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
23194290 /Work/Users/apraga/bisonex/database/dbSNP/ID_of_common_snp.txt
#+end_src
*** DONE ID common snp not clinvar patho
CLOSED: [2022-12-11 Sun 20:11]
**** DONE Checking the problem
CLOSED: [2022-12-11 Sun 16:30]
On J:
21155134 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref
"Non-regression" version
21155076 database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt
New version
23193391 /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt
After removing duplicates
$ sort database/dbSNP/ID_of_common_snp_not_clinvar_patho.txt | uniq > old.txt
$ wc -l old.txt
21107097 old.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt | uniq > new.txt
$ wc -l new.txt
21174578 new.txt
$ sort /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/ID_of_common_snp_not_clinvar_patho.txt.ref | uniq > ref.txt
$ wc -l ref.txt
21107155 ref.txt
Looking at the difference
comm -23 ref.txt old.txt
rs1052692
rs1057518973
rs1057518973
rs11074121
rs112848754
rs12573787
rs145033890
rs147889095
rs1553904159
rs1560294695
rs1560296615
rs1560310926
rs1560325547
rs1560342418
rs1560356225
rs1578287542
...
Look up the first one
bcftools query -i 'ID="rs1052692"' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 1619351 C A,T
It is indeed pathogenic...
$ bcftools query -i 'POS=1619351' database/clinvar/clinvar.vcf.gz -f '%CHROM %POS %REF %ALT %INFO/CLNSIG\n'
19 1619351 C T Conflicting_interpretations_of_pathogenicity
Check all the others
$ comm -23 ref.txt old.txt > tocheck.txt
Generate the regions to check (chromosome number:position)
$ bcftools query -i 'ID=@tocheck.txt' database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM\t%POS\n' > tocheck.pos
Generate the reverse mapping (chromosome number -> NC)
$ awk ' { t = $1; $1 = $2; $2 = t; print; } ' database/RefSeq/refseq_to_number_only_consensual.txt > mapping.txt
Remap clinvar
$ bcftools annotate --rename-chrs mapping.txt database/clinvar/clinvar.vcf.gz -o clinvar_remapped.vcf.gz
$ tabix clinvar_remapped.vcf.gz
Finally, look up the classification in clinvar
$ bcftools query -R tocheck.pos clinvar_remapped.vcf.gz -f '%CHROM %POS %INFO/CLNSIG\n'
$ bcftools query -R tocheck.pos database/dbSNP/dbSNP_common.vcf.gz -f '%CHROM %POS %ID \n' | grep '^NC'
#+RESULTS:
**** DONE Understand why the new version gives a different result
CLOSED: [2022-12-11 Sun 20:11]
***** DONE Same dbSNP and clinvar versions?
CLOSED: [2022-12-10 Sat 23:02]
Clinvar differs!
$ bcftools stats clinvar.gz
clinvar (Alexis)
SN 0 number of samples: 0
SN 0 number of records: 1492828
SN 0 number of no-ALTs: 965
SN 0 number of SNPs: 1338007
SN 0 number of MNPs: 5562
SN 0 number of indels: 144580
SN 0 number of others: 3714
SN 0 number of multiallelic sites: 0
SN 0 number of multiallelic SNP sites: 0
clinvar (new)
SN 0 number of samples: 0
SN 0 number of records: 1493470
SN 0 number of no-ALTs: 965
SN 0 number of SNPs: 1338561
SN 0 number of MNPs: 5565
SN 0 number of indels: 144663
SN 0 number of others: 3716
SN 0 number of multiallelic sites: 0
SN 0 number of multiallelic SNP sites: 0
***** DONE Update clinvar and dbSNP to work from the same databases
CLOSED: [2022-12-11 Sun 12:10]
The problem persists
***** DONE Remove the chromosome int conversion
CLOSED: [2022-12-10 Sat 19:29]
***** KILL Same NC?
CLOSED: [2022-12-10 Sat 19:29]
$ zgrep "contig=<ID=NC_\(.*\)" clinvar/GRCh38/clinvar.vcf.gz > contig.clinvar
$ diff contig.txt contig.clinvar
< ##contig=<ID=NC_012920.1>
***** DONE Test on chromosome 19: ok
CLOSED: [2022-12-11 Sun 13:53]
Prepare the data
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/dbSNP_common.vcf.gz -o dbSNP_common_19.vcf.gz
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data/clinvar/GRCh38/clinvar.vcf.gz -o clinvar_19.vcf.gz
bcftools filter -i 'CHROM="NC_000019.10"' /Work/Groups/bisonex/data-alexis/dbSNP/dbSNP_common.vcf.gz -o dbSNP_common_19_old.vcf.gz
bcftools filter -i 'CHROM="19"' /Work/Groups/bisonex/data-alexis/clinvar/clinvar.vcf.gz -o clinvar_19_old.vcf.gz
#+end_src
Fetch both versions of the script
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
git checkout regression ../../script/pythonScript/clinvar_sbSNP.py
cp ../../script/pythonScript/clinvar_sbSNP.py clinvar_sbSNP_old.py
git checkout HEAD ../../script/pythonScript/clinvar_sbSNP.py
#+end_src
#+RESULTS:
Compare
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
python ../../script/pythonScript/clinvar_sbSNP.py clinvar_sbSNP.py --clinvar clinvar_19.vcf.gz --dbSNP dbSNP_common_19.vcf.gz --output tmp.txt
sort tmp.txt | uniq > new.txt
table=/Work/Groups/bisonex/data-alexis/RefSeq/refseq_to_number_only_consensual.txt
python clinvar_sbSNP_old.py --clinvar clinvar_19_old.vcf.gz --dbSNP dbSNP_common_19_old.vcf.gz --output tmp_old.txt --chrm_name_table $table
sort tmp_old.txt | uniq > old.txt
wc -l old.txt new.txt
#+end_src
#+RESULTS:
| 535155 | old.txt |
| 535194 | new.txt |
| 1070349 | total |
Taking the first mismatch in new: it is conflicting patho, so it should not be there...
$ bcftools query -i 'ID="rs10418277"' dbSNP_common_19.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 54939682 C G,T
$ bcftools query -i 'ID="rs10418277"' dbSNP_common_19_old.vcf.gz -f '%CHROM %POS %REF %ALT\n'
NC_000019.10 54939682 C G,T
$ bcftools query -i 'POS=54939682' clinvar_19.vcf.gz -f '%POS %REF %ALT %INFO/CLNSIG\n'
54939682 C G Conflicting_interpretations_of_pathogenicity
54939682 C T Benign
$ bcftools query -i 'POS=54939682' clinvar_19_old.vcf.gz -f '%POS %REF %ALT %INFO/CLNSIG\n'
54939682 C G Conflicting_interpretations_of_pathogenicity
54939682 C T Benign
$ grep rs10418277 *.txt
new.txt:rs10418277
tmp.txt:rs10418277
The problem came from POS, which was no longer converted to int (line deleted by mistake??)
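The failure mode noted here can be reproduced in a few lines of Python. This is only a sketch under assumptions: the internals of clinvar_sbSNP.py are not shown in these notes, so the record layout, the "pathogenic" substring rule and the helper names (`pathogenic_positions`, `common_not_patho`) are hypothetical. It illustrates why a position kept as a string never matches the same position stored as an int, so a pathogenic variant slips through the filter.

```python
def pathogenic_positions(clinvar_records, cast=int):
    """Collect (chrom, pos) keys of records whose CLNSIG mentions pathogenicity.

    Assumption: records are (chrom, pos, clnsig) tuples with pos read as text.
    """
    return {
        (chrom, cast(pos))
        for chrom, pos, clnsig in clinvar_records
        if "pathogenic" in clnsig.lower()
    }


def common_not_patho(dbsnp_records, patho, cast=int):
    """Return rsIDs of common SNPs whose (chrom, pos) is not in the patho set."""
    return [
        rsid
        for chrom, pos, rsid in dbsnp_records
        if (chrom, cast(pos)) not in patho
    ]


# Values taken from the rs10418277 example in these notes.
clinvar = [("NC_000019.10", "54939682", "Conflicting_interpretations_of_pathogenicity")]
dbsnp = [("NC_000019.10", "54939682", "rs10418277")]

patho = pathogenic_positions(clinvar, cast=int)

# Both sides converted to int: the variant is filtered out as expected.
fixed = common_not_patho(dbsnp, patho, cast=int)

# One side left as text: "54939682" != 54939682, so the variant is wrongly kept.
buggy = common_not_patho(dbsnp, patho, cast=str)
```

Whatever type is chosen, both lookups must agree on it; converting on one side only is enough to make every membership test fail silently.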
Verify
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
python ../../script/pythonScript/clinvar_sbSNP.py --clinvar clinvar_19.vcf.gz --dbSNP dbSNP_common_19.vcf.gz --output tmp.txt
sort tmp.txt | uniq > new.txt
table=/Work/Groups/bisonex/data-alexis/RefSeq/refseq_to_number_only_consensual.txt
python clinvar_sbSNP_old.py --clinvar clinvar_19_old.vcf.gz --dbSNP dbSNP_common_19_old.vcf.gz --output tmp_old.txt --chrm_name_table $table
sort tmp_old.txt | uniq > old.txt
wc -l old.txt new.txt
diff old.txt new.txt
#+end_src
#+RESULTS:
| 535155 | old.txt |
| 535155 | new.txt |
| 1070310 | total |
***** DONE Test on chromosomes 19 and 20: ok
CLOSED: [2022-12-11 Sun 15:56]
Prepare the data
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/debug-commonsnp
PATH=$PATH:$HOME/.nix-profile/bin
bcftools filter -i 'CHROM="NC_000019.10" | CHROM="NC_000020.11"' /Work/Groups/bisonex/data/dbSNP/GRCh38.p13/dbSNP_common.vcf.gz -o dbSNP_common_19_20.vcf.gz
bc
nnotateLifted.jl
spip annotation is *very slow*: 1h13!
Result:
2×3 DataFrame
Row │ variant meanQual depth
│ String Float64 Int64
─────┼──────────────────────────────────────
1 │ chr12:g.13594572 60.0 1
2 │ chr17:g.10204026 60.0 1
144 found over 146
filter depth : another 0 missed variants
filter poly : another 0 missed variants
filter vep : another 0 missed variants
And there are too many variants in the output (7330!)
**** DONE Email Paul with T2T filter results + new diagram
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
* Re-interpretation
** PROJ Run tests on raw data [107/250]
- [X] 100222_63015289
- [X] 1600304839_63051311
- [X] 1900007827_62913191
- [X] 1900398899_62999500
- [X] 1900486799_62913197
- [X] 2100422923_62952677
- [X] 2100458888_62933047
- [X] 2100601558_62903840
- [X] 2100609288_62905768
- [X] 2100609501_62905776
- [X] 2100614493_62951074
- [X] 2100622566_62908067
- [X] 2100622601_62908060
- [X] 2100622705_62908063
- [X] 2100640027_62911936
- [X] 2100645285_62913212
- [X] 2100661411_62914081
- [X] 2100661462_62914086
- [X] 2100708257_62921596
- [X] 2100738732_62926501
- [X] 2100738850_62926509
- [X] 2100746751_62926505
- [X] 2100746797_62926506
- [X] 2100782349_62931722
- [X] 2100782416_62931561
- [X] 2100782559_62931718
- [X] 2100799204_62934768
- [X] 2200010202_62940284
- [X] 2200023600_62940631
- [X] 2200024348_62999591
- [X] 2200027505_62942457
- [X] 2200038776_62943412
- [X] 2200041919_62943405
- [X] 2200088014_62951326
- [X] 2200146652_62959388
- [X] 2200151850_62960953
- [X] 2200160014_62959475
- [X] 2200160070_62959478
- [X] 2200201368_62967471
- [X] 2200201400_62967470
- [X] 2200265558_62976332
- [X] 2200265605_62976401
- [X] 2200267046_62975192
- [X] 2200273878_62999530
- [X] 2200279708_62977002
- [X] 2200284408_62979102
- [X] 2200293987_62979116
- [X] 2200294359_62979118
- [X] 2200306299_62982217
- [X] 2200306539_62982193
- [X] 220030671_62982211
- [X] 2200307058_62982231
- [X] 2200307108_62982196
- [X] 2200307136_62982221
- [X] 2200307199_62982239
- [X] 2200307230_62982234
- [X] 2200307262_62982219
- [X] 2200307297_62982227
- [X] 2200324510_62985453
- [X] 2200324549_62985478
- [X] 2200324573_62985445
- [X] 2200324594_62985467
- [X] 2200324606_62985463
- [X] 2200324614_62985459
- [X] 2200338306_62985430
- [X] 2200343880_62989407
- [X] 2200343910_62989460
- [X] 2200343938_62989451
- [X] 2200343966_62989456
- [X] 2200343993_62989440
- [X] 2200344013_62989464
- [X] 2200349749_62989465
- [X] 2200363462_62988848
- [X] 2200377880_62991993
- [X] 2200378032_62991991
- [X] 2200383996_62993828
- [X] 2200384015_62993796
- [X] 2200384046_62993822
- [X] 2200384117_62993808
- [X] 2200384187_62993825
- [X] 2200384231_62992898
- [X] 2200385658_63060260
- [X] 2200394260_62994732
- [X] 2200395817_62994742
- [X] 2200396731_62994737
- [X] 2200424073_62999579
- [X] 2200424207_62999632
- [X] 2200426178_62999630
- [X] 2200426243_62999635
- [X] 2200426466_62999605
- [X] 2200426642_62999627
- [X] 2200427406_62999649
- [X] 2200427512_62999639
- [X] 2200428953_62999572
- [X] 2200428981_62999600
- [X] 2200428999_62999592
- [X] 2200441970_63000868
- [X] 2200441989_63000882
- [X] 2200442135_63000864
- [X] 2200442216_63000886
- [X] 2200442257_63000951
- [X] 2200451801_63003573
- [X] 2200451862_63004218
- [X] 2200451894_63004210
- [X] 2200456165_63051294
- [X] 2200459865_63004933
- [X] 2200459968_63004937
- [ ] 2200460073_63004943
- [ ] 2200460121_63004684
- [ ] 2200467051_63003856
- [ ] 2200467225_63004940
- [ ] 2200467261_63004930
- [ ] 2200467338_63004925
- [ ] 2200470099_63004485
- [ ] 2200470142_63004480
- [ ] 2200471780_63004362
- [ ] 2200480910_63006466
- [ ] 2200495073_63010427
- [ ] 2200495510_63009152
- [ ] 2200508677_63060252
- [ ] 2200510531_63012582
- [ ] 2200510628_63012549
- [ ] 2200510657_63012554
- [ ] 2200511249_63012533
- [ ] 2200511274_63012586
- [ ] 2200517952_63060399
- [ ] 2200519525_63060439
- [ ] 2200524009_63014044
- [ ] 2200524609_63014046
- [ ] 2200524616_63014048
- [ ] 2200533429_63060425
- [ ] 2200539735_63060406
- [ ] 2200549908_63019339
- [ ] 2200549965_63019349
- [ ] 2200550414_63019357
- [ ] 2200550471_63020031
- [ ] 2200550490_63019351
- [ ] 2200550505_63019340
- [ ] 2200555565_63018614
- [ ] 2200559438_63020029
- [ ] 2200559682_63020030
- [ ] 2200559713_63019623
- [ ] 2200559739_63019626
- [ ] 2200569969_63019991
- [ ] 2200570001_63021580
- [ ] 2200570025_63021490
- [ ] 2200570035_63021491
- [ ] 2200570042_63021493
- [ ] 2200570050_63021494
- [ ] 2200579897_63024910
- [ ] 2200583995_63024866
- [ ] 2200584035_63024905
- [ ] 2200584069_63024888
- [ ] 2200584126_63024810
- [ ] 2200589507_63026712
- [ ] 2200597365_63027994
- [ ] 2200597480_63027988
- [ ] 2200597752_63026853
- [ ] 2200597778_63027992
- [ ] 22005977_63026903
- [ ] 2200609031_63026527
- [ ] 2200614198_63113928
- [ ] 2200620372_63030821
- [ ] 2200620442_63030810
- [ ] 2200620498_63030816
- [ ] 2200620628_63031031
- [ ] 2200622310_63030984
- [ ] 2200622355_63030956
- [ ] 2200625369_63028699
- [ ] 2200625410_63028697
- [ ] 2200625536_63028694
- [ ] 2200630189_63030665
- [ ] 2200635149_63033182
- [ ] 2200644544_63037731
- [ ] 2200644594_63037725
- [ ] 2200650089_63038093
- [ ] 2200666292_63076568
- [ ] 2200669188_63036688
- [ ] 2200669320_63040259
- [ ] 2200669383_63040254
- [ ] 2200669414_63040257
- [ ] 2200669446_63040251
- [ ] 2200680342_63105271
- [ ] 2200694535_63042853
- [ ] 2200694789_63042862
- [ ] 2200694858_63042702
- [ ] 2200694917_63042696
- [ ] 2200699290_63043047
- [ ] 2200699345_63040238
- [ ] 2200699383_63043050
- [ ] 2200699412_63040731
- [ ] 220071551_63048935
- [ ] 2200731515_63048963
- [ ] 2200748145_63051198
- [ ] 2200748171_63051213
- [ ] 2200751046_63051249
- [ ] 2200751101_63051234
- [ ] 2200766471_63054590
- [ ] 2200767731_63054595
- [ ] 2200767822_63054464
- [ ] 2200775505_63060410
- [ ] 2200850441_63019345
- [ ] 220597589_63026879
- [ ] 2300003253_63060430
- [ ] 2300005679_63060370
- [ ] 2300009914_63060390
- [ ] 2300028784_63060001
- [ ] 2300036815_63063357
- [ ] 2300055382_63061874
- [ ] 2300055421_63061871
- [ ] 2300055440_63061880
- [ ] 230006894_63064950
- [ ] 2300071111_63070356
- [ ] 2300083434_63071675
- [ ] 2300103609_63076239
- [ ] 2300104572_63076232
- [ ] 2300109602_63076765
- [ ] 2300109665_63076770
- [ ] 2300119721_63078732
- [ ] 2300137773_63078133
- [ ] 2300137834_63078123
- [ ] 2300167821_63086183
- [ ] 2300172698_63113453
- [ ] 2300188216_63090609
- [ ] 2300188281_63090632
- [ ] 2300188800_63090616
- [ ] 2300193193645_63090623
- [ ] 2300193668_63090611
- [ ] 2300195426_63090608
- [ ] 2300201017_63089636
- [ ] 2300227479_63098330
- [ ] 2300232688_63130821
- [ ] 2300292749_63109239
- [ ] 230029277_63109247
- [ ] 2300294712_63109236
- [ ] 2300308032_63111581
- [ ] 2300323537_63114209
- [ ] 2300334609_63115535
- [ ] 2300346867_63118093
- [ ] 2300346867_63118093_NA12878
- [ ] 2300348940_63118099
- [ ] 2300359806_63119915
- [ ] 2300380476_63123963
- [ ] 2300382582_63123749
- [ ] 2300384269_63126867
- [ ] 2300407581_63130826
- [ ] 2300407626_63130842
- [ ] 2300409593_63130874
- [ ] 2300409612_63130980
- [ ] 2300417623_63131524
* Results
** TODO Speed-up BWA-mem
SCHEDULED: <2023-09-30 Sat>
** TODO Speed-up HaplotypeCaller
SCHEDULED: <2023-09-30 Sat>
nnotateLifted.jl
spip annotation is *very slow*: 1h13!
Result:
2×3 DataFrame
Row │ variant meanQual depth
│ String Float64 Int64
─────┼──────────────────────────────────────
1 │ chr12:g.13594572 60.0 1
2 │ chr17:g.10204026 60.0 1
144 found over 146
filter depth : another 0 missed variants
filter poly : another 0 missed variants
filter vep : another 0 missed variants
And there are too many variants in the output (7330!)
**** DONE Email Paul with T2T filter results + new diagram
CLOSED: [2023-09-17 Sun 23:15] SCHEDULED: <2023-09-17 Sun>
* Re-interpretation
** PROJ Run tests on raw data [108/250]
- [X] 100222_63015289
- [X] 1600304839_63051311
- [X] 1900007827_62913191
- [X] 1900398899_62999500
- [X] 1900486799_62913197
- [X] 2100422923_62952677
- [X] 2100458888_62933047
- [X] 2100601558_62903840
- [X] 2100609288_62905768
- [X] 2100609501_62905776
- [X] 2100614493_62951074
- [X] 2100622566_62908067
- [X] 2100622601_62908060
- [X] 2100622705_62908063
- [X] 2100640027_62911936
- [X] 2100645285_62913212
- [X] 2100661411_62914081
- [X] 2100661462_62914086
- [X] 2100708257_62921596
- [X] 2100738732_62926501
- [X] 2100738850_62926509
- [X] 2100746751_62926505
- [X] 2100746797_62926506
- [X] 2100782349_62931722
- [X] 2100782416_62931561
- [X] 2100782559_62931718
- [X] 2100799204_62934768
- [X] 2200010202_62940284
- [X] 2200023600_62940631
- [X] 2200024348_62999591
- [X] 2200027505_62942457
- [X] 2200038776_62943412
- [X] 2200041919_62943405
- [X] 2200088014_62951326
- [X] 2200146652_62959388
- [X] 2200151850_62960953
- [X] 2200160014_62959475
- [X] 2200160070_62959478
- [X] 2200201368_62967471
- [X] 2200201400_62967470
- [X] 2200265558_62976332
- [X] 2200265605_62976401
- [X] 2200267046_62975192
- [X] 2200273878_62999530
- [X] 2200279708_62977002
- [X] 2200284408_62979102
- [X] 2200293987_62979116
- [X] 2200294359_62979118
- [X] 2200306299_62982217
- [X] 2200306539_62982193
- [X] 220030671_62982211
- [X] 2200307058_62982231
- [X] 2200307108_62982196
- [X] 2200307136_62982221
- [X] 2200307199_62982239
- [X] 2200307230_62982234
- [X] 2200307262_62982219
- [X] 2200307297_62982227
- [X] 2200324510_62985453
- [X] 2200324549_62985478
- [X] 2200324573_62985445
- [X] 2200324594_62985467
- [X] 2200324606_62985463
- [X] 2200324614_62985459
- [X] 2200338306_62985430
- [X] 2200343880_62989407
- [X] 2200343910_62989460
- [X] 2200343938_62989451
- [X] 2200343966_62989456
- [X] 2200343993_62989440
- [X] 2200344013_62989464
- [X] 2200349749_62989465
- [X] 2200363462_62988848
- [X] 2200377880_62991993
- [X] 2200378032_62991991
- [X] 2200383996_62993828
- [X] 2200384015_62993796
- [X] 2200384046_62993822
- [X] 2200384117_62993808
- [X] 2200384187_62993825
- [X] 2200384231_62992898
- [X] 2200385658_63060260
- [X] 2200394260_62994732
- [X] 2200395817_62994742
- [X] 2200396731_62994737
- [X] 2200424073_62999579
- [X] 2200424207_62999632
- [X] 2200426178_62999630
- [X] 2200426243_62999635
- [X] 2200426466_62999605
- [X] 2200426642_62999627
- [X] 2200427406_62999649
- [X] 2200427512_62999639
- [X] 2200428953_62999572
- [X] 2200428981_62999600
- [X] 2200428999_62999592
- [X] 2200441970_63000868
- [X] 2200441989_63000882
- [X] 2200442135_63000864
- [X] 2200442216_63000886
- [X] 2200442257_63000951
- [X] 2200451801_63003573
- [X] 2200451862_63004218
- [X] 2200451894_63004210
- [X] 2200456165_63051294
- [X] 2200459865_63004933
- [X] 2200459968_63004937
- [X] 2200460073_63004943
- [ ] 2200460121_63004684
- [ ] 2200467051_63003856
- [ ] 2200467225_63004940
- [ ] 2200467261_63004930
- [ ] 2200467338_63004925
- [ ] 2200470099_63004485
- [ ] 2200470142_63004480
- [ ] 2200471780_63004362
- [ ] 2200480910_63006466
- [ ] 2200495073_63010427
- [ ] 2200495510_63009152
- [ ] 2200508677_63060252
- [ ] 2200510531_63012582
- [ ] 2200510628_63012549
- [ ] 2200510657_63012554
- [ ] 2200511249_63012533
- [ ] 2200511274_63012586
- [ ] 2200517952_63060399
- [ ] 2200519525_63060439
- [ ] 2200524009_63014044
- [ ] 2200524609_63014046
- [ ] 2200524616_63014048
- [ ] 2200533429_63060425
- [ ] 2200539735_63060406
- [ ] 2200549908_63019339
- [ ] 2200549965_63019349
- [ ] 2200550414_63019357
- [ ] 2200550471_63020031
- [ ] 2200550490_63019351
- [ ] 2200550505_63019340
- [ ] 2200555565_63018614
- [ ] 2200559438_63020029
- [ ] 2200559682_63020030
- [ ] 2200559713_63019623
- [ ] 2200559739_63019626
- [ ] 2200569969_63019991
- [ ] 2200570001_63021580
- [ ] 2200570025_63021490
- [ ] 2200570035_63021491
- [ ] 2200570042_63021493
- [ ] 2200570050_63021494
- [ ] 2200579897_63024910
- [ ] 2200583995_63024866
- [ ] 2200584035_63024905
- [ ] 2200584069_63024888
- [ ] 2200584126_63024810
- [ ] 2200589507_63026712
- [ ] 2200597365_63027994
- [ ] 2200597480_63027988
- [ ] 2200597752_63026853
- [ ] 2200597778_63027992
- [ ] 22005977_63026903
- [ ] 2200609031_63026527
- [ ] 2200614198_63113928
- [ ] 2200620372_63030821
- [ ] 2200620442_63030810
- [ ] 2200620498_63030816
- [ ] 2200620628_63031031
- [ ] 2200622310_63030984
- [ ] 2200622355_63030956
- [ ] 2200625369_63028699
- [ ] 2200625410_63028697
- [ ] 2200625536_63028694
- [ ] 2200630189_63030665
- [ ] 2200635149_63033182
- [ ] 2200644544_63037731
- [ ] 2200644594_63037725
- [ ] 2200650089_63038093
- [ ] 2200666292_63076568
- [ ] 2200669188_63036688
- [ ] 2200669320_63040259
- [ ] 2200669383_63040254
- [ ] 2200669414_63040257
- [ ] 2200669446_63040251
- [ ] 2200680342_63105271
- [ ] 2200694535_63042853
- [ ] 2200694789_63042862
- [ ] 2200694858_63042702
- [ ] 2200694917_63042696
- [ ] 2200699290_63043047
- [ ] 2200699345_63040238
- [ ] 2200699383_63043050
- [ ] 2200699412_63040731
- [ ] 220071551_63048935
- [ ] 2200731515_63048963
- [ ] 2200748145_63051198
- [ ] 2200748171_63051213
- [ ] 2200751046_63051249
- [ ] 2200751101_63051234
- [ ] 2200766471_63054590
- [ ] 2200767731_63054595
- [ ] 2200767822_63054464
- [ ] 2200775505_63060410
- [ ] 2200850441_63019345
- [ ] 220597589_63026879
- [ ] 2300003253_63060430
- [ ] 2300005679_63060370
- [ ] 2300009914_63060390
- [ ] 2300028784_63060001
- [ ] 2300036815_63063357
- [ ] 2300055382_63061874
- [ ] 2300055421_63061871
- [ ] 2300055440_63061880
- [ ] 230006894_63064950
- [ ] 2300071111_63070356
- [ ] 2300083434_63071675
- [ ] 2300103609_63076239
- [ ] 2300104572_63076232
- [ ] 2300109602_63076765
- [ ] 2300109665_63076770
- [ ] 2300119721_63078732
- [ ] 2300137773_63078133
- [ ] 2300137834_63078123
- [ ] 2300167821_63086183
- [ ] 2300172698_63113453
- [ ] 2300188216_63090609
- [ ] 2300188281_63090632
- [ ] 2300188800_63090616
- [ ] 2300193193645_63090623
- [ ] 2300193668_63090611
- [ ] 2300195426_63090608
- [ ] 2300201017_63089636
- [ ] 2300227479_63098330
- [ ] 2300232688_63130821
- [ ] 2300292749_63109239
- [ ] 230029277_63109247
- [ ] 2300294712_63109236
- [ ] 2300308032_63111581
- [ ] 2300323537_63114209
- [ ] 2300334609_63115535
- [ ] 2300346867_63118093
- [ ] 2300346867_63118093_NA12878
- [ ] 2300348940_63118099
- [ ] 2300359806_63119915
- [ ] 2300380476_63123963
- [ ] 2300382582_63123749
- [ ] 2300384269_63126867
- [ ] 2300407581_63130826
- [ ] 2300407626_63130842
- [ ] 2300409593_63130874
- [ ] 2300409612_63130980
- [ ] 2300417623_63131524
* Results
** TODO Speed-up BWA-mem
SCHEDULED: <2023-10-01 Sun>
** TODO Speed-up HaplotypeCaller
SCHEDULED: <2023-10-01 Sun>
:PROPERTIES:
:ID: 00866878-5a50-4ade-bf68-6669c3b3bcf1
:END:
#+title: Project: evolution
#+filetags: personal
To do:
SDL:
- Read a height map and generate the terrain
- Represent herbivores with sprites
- isometric view
- procedural terrain generation
Main:
- Build a list of herbivores
1. Very simple genetic algorithm (OK)
Converge to a given string starting from a random string. A single mutation per step: the worst character is replaced by a new random one. If the new string is better, keep it. Converges in ~2000 iterations.
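The procedure above can be sketched in Python as follows. It is a minimal illustration, not the original implementation: the alphabet, the distance-based fitness and the names `fitness` and `evolve` are assumptions made for the example, and the iteration count it reaches depends on the seed and the target.

```python
import random
import string

# Assumed alphabet: lowercase letters plus space.
ALPHABET = string.ascii_lowercase + " "


def distance(a, b):
    """Distance between two characters, measured as index gap in ALPHABET."""
    return abs(ALPHABET.index(a) - ALPHABET.index(b))


def fitness(candidate, target):
    """Total distance to the target; 0 means the strings are identical."""
    return sum(distance(c, t) for c, t in zip(candidate, target))


def evolve(target, seed=0, max_iters=100_000):
    """Mutate the worst character each step and keep the mutant only if it is better."""
    rng = random.Random(seed)
    current = [rng.choice(ALPHABET) for _ in target]
    for step in range(max_iters):
        score = fitness(current, target)
        if score == 0:
            return "".join(current), step
        # Worst character = position farthest from its target character.
        worst = max(range(len(target)), key=lambda i: distance(current[i], target[i]))
        mutant = current.copy()
        mutant[worst] = rng.choice(ALPHABET)
        if fitness(mutant, target) < score:
            current = mutant
    return "".join(current), max_iters


result, steps = evolve("hello world")
```

Because a mutant is accepted only when the total distance strictly decreases, the score never gets worse, which is what makes this closer to hill climbing than to a population-based genetic algorithm.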
2. Notes
Representing the animals.
Ideally, a skeleton captures the important information. But how to choose the number of limbs and joints?
Fitness function
Must it be defined a priori? Can we use what the animal learns during its life? In any case, we need to be able to remove animals that fail to feed themselves.
For herbivores, this clearly depends on the skeleton. How to quantify the relationship between the skeleton and the sphere the animal can reach?
Pivot joints could define these spheres. But this assumes the animal can grab (how to quantify that?) or eat directly with the limb.
At first, we can assume it is a kind of big bacterium that absorbs grass through its skin.
Environment
At first, there will be a grass height map. We can start with 3 heights: desert, prairie and forest, assuming the forest is edible.
Selection
We can either remove all animals whose fitness is below a threshold (risk of extinction), or remove the n worst. Ideally, we would have both. We do not want extinction (to be explored?).
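One way to combine both rules while avoiding extinction is sketched below. The function name `select_survivors`, the `min_survivors` guard and the convention that a higher fitness is better are all assumptions for illustration; the notes leave these choices open.

```python
def select_survivors(fitnesses, threshold, n_worst, min_survivors=2):
    """Return the sorted indices of animals that survive one selection round.

    Applies both rules from the notes (drop the n worst, drop anything below
    the threshold), then falls back to the best `min_survivors` animals if
    too few remain, so the population cannot go extinct.
    """
    # Rank indices from best to worst (assumption: higher fitness is better).
    ranked = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i], reverse=True)
    # Rule 1: remove the n worst animals.
    keep = ranked[: max(len(ranked) - n_worst, 0)]
    # Rule 2: remove animals below the fitness threshold.
    keep = [i for i in keep if fitnesses[i] >= threshold]
    # Guard against extinction: always keep the best few.
    if len(keep) < min_survivors:
        keep = ranked[:min_survivors]
    return sorted(keep)


normal_round = select_survivors([0.9, 0.1, 0.5, 0.7], threshold=0.4, n_worst=1)
harsh_round = select_survivors([0.9, 0.1, 0.5, 0.7], threshold=2.0, n_worst=1)
```

In the second call no animal reaches the threshold, so only the guard keeps the two best alive.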
Reproduction
How to choose the pair that will reproduce? A realistic approach would be to pick the nearest single animal. We assume they are asexual and can travel across the whole map.
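The nearest-single rule could look like the sketch below. The 2D coordinates, the greedy pairing order and the names `nearest_single` and `pair_up` are assumptions for the example; the notes do not fix how ties or leftover animals are handled.

```python
import math


def nearest_single(animal, positions, paired):
    """Return the index of the closest unpaired animal, or None if none is left."""
    candidates = [j for j in range(len(positions)) if j != animal and j not in paired]
    if not candidates:
        return None
    return min(candidates, key=lambda j: math.dist(positions[animal], positions[j]))


def pair_up(positions):
    """Greedily pair each single animal with its nearest single neighbour."""
    paired = {}
    for i in range(len(positions)):
        if i in paired:
            continue
        j = nearest_single(i, positions, paired)
        if j is None:
            break  # an odd animal out stays single
        paired[i] = j
        paired[j] = i
    # Report each couple once, as (smaller index, larger index).
    return sorted({tuple(sorted(couple)) for couple in paired.items()})


couples = pair_up([(0, 0), (1, 0), (10, 10), (11, 10), (50, 50)])
```

The greedy order matters: animals visited first get their true nearest neighbour, later ones only the nearest animal still single.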
- public (13 GB) = unencrypted git-annex, archived to:
- github (LFS, direct access)
- mega
- hard drive
- private (5.6 GB) = git-annex with encryption
- mega
- hard drive
Structure: several repositories in annex
| Repo    | Size | Encrypted | Mega | Google Drive | /annex  | ~/annex  | Raspberry |
|         | (GB) |           |      |              | (tower) | (laptop) |           |
|---------+------+-----------+------+--------------+---------+----------+-----------|
| papers  |      | no        |      |              |         |          |           |
| public  |      | no        | yes  | yes          | yes     |          | yes       |
| private | 0.8  | no        | yes  | yes          | yes     |          | yes       |
| data    |      |           |      |              | yes     |          | yes       |
Contents
- public
** Configuration
*** Papers
#+begin_src sh
cd papers
git init
git annex init
git annex add
git annex initremote github-lfs type=git-lfs url=git@github.com:apraga/papers.git encryption=none
git lfs track "*.pdf"
git annex sync --content
#+end_src
*** Configuring remotes with rclone (public + private)
Install https://github.com/git-annex-remote-rclone/git-annex-remote-rclone
(copy the executable into $PATH)
#+begin_src sh
cd public
git init
git annex init
git annex add
#+end_src
Follow the prompts to add a remote named mega of type Mega (29)
#+begin_src sh
rclone config
rclone lsd mega:
#+end_src
Create a remote megaremote that points to the mega remote in rclone. No encryption:
#+begin_src sh
git annex initremote megaremote type=external externaltype=rclone target=mega prefix=annex-public chunk=50MiB encryption=none rclone_layout=lower
#+end_src
For Google Drive, a client id and a client secret are needed, following the instructions here: <https://rclone.org/drive/#making-your-own-client-id>. The data is encrypted, but the key is stored in the git repository, so do not put the repositories just anywhere!
#+begin_src sh
git annex initremote gdriveremote type=external externaltype=rclone target=gdrive prefix=annex-private chunk=50MiB encryption=shared rclone_layout=lower
#+end_src
** Archiving
In each directory
#+begin_src sh
git annex sync --content
#+end_src
- [X] timing belt 145,000 km