apraga/org - Change RWZ457JG7GN4JJKPRNLB22OHOGYSKR756OQUWLJDAEXRCHFBK42AC

Updating notes

Created by Alexis Praga on May 14, 2023

RWZ457JG7GN4JJKPRNLB22OHOGYSKR756OQUWLJDAEXRCHFBK42AC

Dependencies

In channels

main

Change contents

Insertion in projects.org at line 2242 [8.123895]

[6.727]

[7.1513]

** TODO Bamsurgeon
/Entered on/ [2023-05-13 Sat 19:11]
*** TODO Velvet

Replacement in projects/bisonex.org at line 7 [10.35]

B:BD[9.14923] → [9.14923:15852]

B:BD[9.15852] → [11.8455:15718]

B:BD[11.15718] → [12.284:8476]

_norm.vcf.gz
bcftools isec dbsnp_mwi_norm.vcf.gz clinvar_mwi.vcf.gz -n=2
#+end_src
#+RESULTS:
| NC_000020.11 | 10652589 | G | A | 11 |
| NC_000020.11 | 10652589 | G | C | 11 |
******* TODO Sur dbSNP chr20 non
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools norm -m -any dbSNP_common_chr20 -o dbSNP_common_chr20_norm.vcf.gz
#+end_src
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools isec -i 'INFO/CLNSIG="Pathogenic"' dbSNP_common_chr20_norm.vcf.gz clinvar_chr20.vcf.gz -p tmp
#+end_src
#+RESULTS:
***** DONE Essai bedtools intersect
#+begin_src sh
bedtools intersect -a  dbSNP_common.vcf.gz -b clinvar.vcf.gz
#+end_src
$ wc -l intersect.vcf
220206 intersect.vcf
** TODO Dépendences avec Nix
*** DONE GATK
CLOSED: [2022-10-21 Fri 21:59]
*** WAIT BioDBHTS
Contribuer pull request
*** DONE BioExtAlign
CLOSED: [2022-10-22 Sat 00:38]
*** WAIT BioBigFile
Revoir si on peut utliser kent de
rnière version
Contribuer pull request
*** HOLD rtg-tools
Convertir clinvar NC
*** DONE simuscop
CLOSED: [2022-12-30 Fri 22:31]
*** DONE Spip
CLOSED: [2022-12-04 Sun 12:49]
Pas de pull request
*** DONE R + packages
CLOSED: [2022-11-19 Sat 21:05]
*** TODO hap.py
https://github.com/Illumina/hap.py
**** DONE Version sans rtgtools avec python 3
CLOSED: [2023-02-02 Thu 22:15]
Procédure pour tester
#+begin_src
nix develop .#hap-py
$ genericBuild
#+end_src
1. Supprimer l’appel à make_dependencies dans cmakelist.txt : on peut tout installer avec nix
2. Patch Roc.cpp pour avoir numeric_limits ( error: 'numeric_limits' is not a member of 'std')
3. ajout de flags de link (essai, error)
set(ZLIB_LIBRARIES -lz -lbz2 -lcurl -lcrypto -llzma)
4. Changer les appels à print en print() dans le code python et suppression de quelques import
[nix-shell:~/source]$ sed -i.orig 's/print \"\(.*\)"/print(\1)/' src/python/*.py
**** DONE Sérialiser json pour écrire données de sorties
CLOSED: [2023-02-17 Fri 19:25]
**** DONE Tester sur example
CLOSED: [2023-02-04 Sat 00:25]
#+begin_src sh
$ cd hap.py
$ ../result/bin/hap.py example/happy/PG_NA12878_chr21.vcf.gz       example/happy/NA12878_chr21.vcf.gz       -f example/happy/PG_Conf_chr21.bed.gz       -o test -r example/chr21.fa
#+end_src
#+RESULTS:
| Type  | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score |
| INDEL | ALL    |        8937 |     7839 |     1098 |       11812 |      343 |      3520 |    45 |   283 |      0.877140 |         0.958635 |       0.298002 |        0.916079 |
| INDEL | PASS   |        8937 |     7550 |     1387 |        9971 |      283 |      1964 |    30 |   242 |      0.844803 |         0.964656 |       0.196971 |        0.900760 |
| SNP   | ALL    |       52494 |    52125 |      369 |       90092 |      582 |     37348 |   107 |   354 |      0.992971 |         0.988966 |       0.414554 |        0.990964 |
| SNP   | PASS   |       52494 |    46920 |     5574 |       48078 |      143 |       992 |     8 |    97 |      0.893816 |         0.996963 |       0.020633 |        0.942576 |
**** TODO Version avec rtg-tools
**** TODO Faire fonctionner Tests
***** TODO Essai 2 : depuis nix develop:
#+begin_src
nix develop .#hap-py
genericBuild
#+end_src
Lancé initialement à la main, mais on peut maintenant utiliser run_tests
#+begin_src
HCDIR=bin/ ../src/sh/run_tests.sha
#+end_src
- [X] test boost
- [X] multimerge
- [X] hapenum
- [X] fp accuracy
- [X] faulty variant
- leftshift fails
- [X] other vcf
- [X] chr prefix
- [X] gvcf
- [X] decomp
- [X] contig lengt
- [X]  integration test
- [ ] scmp fails sur le type
- [X] giab
- [X] performance
- [ ] quantify fails sur le type
- [ ] stratified échec sur les résultats !
- [X] pg counting
- [ ] sompy: ne trouve pas Strelka dans somatic
phases="buildPhase checkPhase installPhase fixupPhase" genericBuild
#+end_src
**** KILL Reproduire les performances precisionchallenge : attention à HG002 et HG001!
CLOSED: [2023-04-01 Sat 19:43]
https://www.nist.gov/programs-projects/genome-bottle
***** KILL 0GOOR
CLOSED: [2023-04-01 Sat 19:40]
Le problème venait 1. de l'ADN et 2. du renommage des chromosomes qui était faux
****** DONE HG002
CLOSED: [2023-02-17 Fri 19:31]
 Type Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  FP.al  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score
INDEL    ALL       525466    491355     34111      1156702     57724     605307   9384  25027       0.935084          0.895313        0.523304         0.914766
INDEL   PASS       525466    491355     34111      1156702     57724     605307   9384  25027       0.935084          0.895313        0.523304         0.914766
  SNP    ALL      3365115   3358399      6716      5666020     21995    2284364   4194   1125       0.998004          0.993496        0.403169         0.995745
  SNP   PASS      3365115   3358399      6716      5666020     21995    2284364   4194   1125       0.998004          0.993496        0.403169         0.995745
 TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
                    NaN                     NaN                   1.528276                   2.752637
                    NaN                     NaN                   1.528276                   2.752637
               2.100129                1.473519                   1.581196                   1.795603
               2.100129                1.473519                   1.581196                   1.795603
***** KILL Avec python2
CLOSED: [2023-02-17 Fri 19:25]
****** KILL avec nix
CLOSED: [2023-02-17 Fri 19:25]
conda create -n python2 python=2.7 anaconda
****** KILL avec conda
CLOSED: [2023-02-17 Fri 19:25]
******* Gentoo: regex_error sur test...
Ok avec bash !
#+begin_src
anaconda3/bin/conda create --name py2 python=2.7
conda activate py2
conda install -c bioconda hap.py
#+end_src
******** Faire tourner les tests.
Il faut remplace bin/test_haplotypes par test_haplotypes dans src/sh/run_tests.sh
#+begin_src sh
 HGREF=../genome/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta HCDIR=~/anaconda3/envs/py2/bin bash src/sh/run_tests.sh
#+end_src
Echec:
test_haplotypes: /opt/conda/conda-bld/work/hap.py-0.3.7/src/c++/lib/tools/Fasta.cpp:81: MMappedFastaFile::MMappedFastaFile(const string&): Assertion `fd != -1' failed.
unknown location(0): fatal error in "testVariantPrimitiveSplitter": signal: SIGABRT (application abort requested)
/opt/conda/conda-bld/work/hap.py-0.3.7/src/c++/test/test_align.cpp(298): last checkpoint
******** Chr21
HGREF=../genome/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta hap.py        example/happy/PG_NA12878_chr21.vcf.gz       example/happy/NA12878_chr21.vcf.gz       -f example/happy/PG_Conf_chr21.bed.gz       -o test
******* Helios
échec
** DONE Exécution
CLOSED: [2022-09-13 Tue 21:37]
*** KILL test Bionix
*** KILL Implémenter execution avec Nix ?
Voir https://academic.oup.com/gigascience/article/9/11/giaa121/5987272?login=false
pour un exemple.
Probablement plus simple d’utiliser Nix pour gestion de l’environnement et snakemake pour l’exécution
Pas d’accès internet depuis le cluster
*** DONE nextflow
CLOSED: [2022-09-13 Tue 21:37]
**** TODO Bug scheduler SGE
Le job se fait tuer car l'utilisateur n'est pas passé correctement à nextflow
***** DONE Forcer l'utilisateur à l'exécution
CLOSED: [2023-04-01 Sat 17:57]
NXF_OPTS=-D"user.name=alex"
***** DONE Vérifier si le problème persiste avec 22.10.6
CLOSED: [2023-04-01 Sat 18:38] SCHEDULED: <2023-04-01 Sat>
oui
***** KILL Packager l'utilisateur dans le programme ?
Mauvaise idée..
** TODO Preprocessing avec nextflow
*** TODO Map to reference
**** TODO Sample ID dans header
/Work/Users/apraga/bisonex/out/63003856_S135/preprocessing/baserecalibrator
*** DONE Mark duplicate
CLOSED: [2022-10-09 Sun 22:30]
*** DONE Recalibrate base quality score
CLOSED: [2022-10-09 Sun 22:30]
** DONE Variant calling avec Nextflow
CLOSED: [2022-11-19 Sat
 21:34]
*** DONE Haplotype caller
CLOSED: [2022-10-09 Sun 22:40]
*** DONE Filter variants
CLOSED: [2022-10-09 Sun 22:40]
*** DONE Filter common snp not clinvar path
CLOSED: [2022-11-07 Mon 23:00]
Voir [[*common dbSNP not clinvar patho][common dbSNP not clinvar patho]]
*** DONE Filter variant only in consensual sequence
CLOSED: [2022-11-08 Tue 22:23]
*** DONE Filter technical variants
CLOSED: [2022-11-19 Sat 21:34]
*** DONE Utilise AVX pour accélerer l'exécution
CLOSED: [2023-04-29 Sat 15:46]
Sans cela, on a l'avertissement
#+begin_quote
17:28:00.720 INFO  PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported
17:28:00.721 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/nix/store/cy9ckxqwrkifx7wf02hm4ww1p6lnbxg9-gatk-4.2.4.1/bin/gatk-package-4.2.4.1-local.jar!/com/intel/gkl/native/libgkl_utils.so
17:28:00.733 WARN  NativeLibraryLoader - Unable to load libgkl_utils.so from native/libgkl_utils.so (/Work/Users/apraga/bisonex/out/NA12878_NIST7035/preprocessing/applybqsr/libgkl_utils821485189051585397.so: libgomp.so.1: cannot open shared object file: No such file or directory)
17:28:00.733 WARN  IntelPairHmm - Intel GKL Utils not loaded
17:28:00.733 WARN  PairHMM - ***WARNING: Machine does not have the AVX instruction set support needed for the accelerated AVX PairHmm. Falling back to the MUCH slower LOGLESS_CACHING implementation!
17:28:00.763 INFO  ProgressMeter - Starting traversal
#+end_quote
libgomp.so est fourni par gcc donc il faut charger le module
 module load gcc@11.3.0/gcc-12.1.0
** KILL Utiliser subworkflow
CLOSED: [2023-04-02 Sun 18:08]
Notre version permet d'être plus souple
*** KILL Alignement
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
*** KILL Vep
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
vcf_annotate_ensemblvep
** TODO Annotation avec nextflow :annotation:
*** KILL VEP : --gene-phenotype ?
CLOSED: [2023-04-18 mar. 18:32]
Vu avec alexis : bases de données non à jour
https://www.ensembl.org/info/genome/variation/phenotype/sources_phenotype_documentation.html
*** DONE plugin VEP
CLOSED: [2023-04-18 mar. 18:32]
Cloner dépôt git avec plugin
Puis utiliser --dir_plugins
*** HOLD Utiliser code d’Alexis
*** TODO Nouvelle version avec VEP
Example avec --custom
https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html
**** TODO Ajout spliceAI
SCHEDULED: <2023-04-30 Sun>
plugin VEP
***** DONE Télécharger les données
CLOSED: [2023-05-11 Thu 19:01]
Difficile d'automatiser, le lien est temporaire...
***** DONE PLugin
CLOSED: [2023-05-11 Thu 20:16]
***** DONE Séparer score en plusieurs colonnes
CLOSED: [2023-05-11 Thu 20:16]
Test avec ce fichier pour avoir une ligne avec annotation et une ligne sans
#CHROM	POS	ID	REF	ALT
1	9091	.	A	C
1	69091	.	A	C
et
#+begin_src sh
rm -f postvep.tsv* && vep -i testspliceai.vcf.gz -o postvep.tsv --tab  --dir 109 --merged --pick --use_given_ref   --offline  --plugin SpliceAI,snv=spliceai_scores.raw.snv.hg38.vcf.gz,indel=spliceai_scores.raw.indel.hg38.vcf.gz
#+end_src
#+begin_src
$ bgzip postvep.tsv
$ python spliceai.py
$ cat postvep2.tsv
,variation,Location,Allele,Gene,Feature,Feature_type,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,Codons,Existing_variation,IMPACT,DISTANCE,STRAND,FLAGS,REFSEQ_MATCH,SOURCE,REFSEQ_OFFSET,SpliceAI_AG,SpliceAI_AL,SpliceAI_DG,SpliceAI_DL
0,1_9091_A/C,1:9091,C,ENSG00000290825,ENST00000456328,Transcript,upstream_gene_variant,-,-,-,-,-,-,MODIFIER,2778,1,-,-,Ensembl,-,,,,
1,1_69091_A/C,1:69091,C,ENSG00000186092,ENST00000641515,Transcript,missense_variant,124,64,22,M/L,Atg/Ctg,-,MODERATE,-,1,-,-,Ensembl,-,0.01,0.00,0.00,0.01
#+end_src
Test
cp work/bf/437ae511958509e43072f032f4d495/small.tab.gz tests/vep-spip.tab.gz
cp work/d5/3b1244b5ae83d54409ee0d456e8c55/small_cadd.tab.gz tests/vep-cadd-splice.tab.gz
**** TODO Ajout LOEUF et pli
plugin VEP
**** TODO NMD
**** KILL Ajout LOEUF
CLOSED: [2023-04-19 mer. 16:32]
plugin VEP
**** DONE Spip
CLOSED: [2023-05-01 Mon 23:07] SCHEDULED: <2023-04-30 Sun>
BED ne semble pas bien marcher (il faut définir une zone)
VCF : trop d’information
Attention, plusieurs transcripts mais résultats identiques. On supprimer les doublons
***** DONE interpretation + score + intervalle de confiance séparé
CLOSED: [2023-05-01 Mon 23:07] SCHEDULED: <2023-04-30 Sun>
Tests :
dans tests/
vep -i 63004925-small.vcf -o postvep.vcf --vcf --fasta genomeRef.fna --dir 109 --merged --pick  --offline --custom ../script/spip_annotation.vcf.gz,SPIP,vcf,exact,0,spipInterp,spipScore,spipConfidence
***** DONE Score
CLOSED: [2023-04-22 Sat 15:30]
**** DONE CADD: remplacer par plugin VEP
CLOSED: [2023-05-07 Sun 14:45] SCHEDULED: <2023-05-07 Sun>
***** Test
#+begin_src
vep  -i test.vcf  -o lol.vcf --offline --dir  /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz  --dir_plugins ../VEP_plugins/ -v
#+end_src
Test
#+begin_src sh
vep --id "1  230710048 230710048 A/G 1"   --offline --dir  /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz  --hgvsg --plugin pLI --plugin LOEUF -o lol
#+end_src
CSQ=G|missense_variant|MODERATE|AGT|ENSG00000135744|Transcript|ENST00000366667|protein_coding|2/5||||843|776|259|M/T|aTg/aCg|||-1||HGNC|HGNC:333||Ensembl||A|A||1:g.230710048A>G|0.347|-0.277922|
Correspond bien à https://www.ensembl.org/Homo_sapiens/Tools/VEP/Results?tl=I7ZsIbrj14P6lD43-9115494
***** DONE Utiliser whole genome
CLOSED: [2023-04-29 Sat 15:46]
***** KILL Renommer les chromosome avant ...
CLOSED: [2023-05-01 Mon 09:14] SCHEDULED: <2023-04-30 Sun>
Trop long !
- Téléchargement de CADD: 4h20
- renommer les chromosome pour SNV : 6h20
- tabix sur les SNV : job tué au bout de 21h....
***** DONE annoter séparément et fusionner les tableaux
CLOSED: [2023-05-07 Sun 14:45] SCHEDULED: <2023-05-01 Mon>
NB: on pourrait filtrer CADD avec tabix pour se restreindre à nos variants
**** DONE clinvar
CLOSED: [2023-04-22 Sat 15:31]
**** KILL Vérifier résultats HGVS avec mutalyzer
CLOSED: [2023-05-01 Mon 09:26]
**** TODO Parallélisation
***** HOLD par chromosome avec workflow VEP
https://github.com/Ensembl/ensembl-vep/blob/release/109/nextflow/workflows/run_vep.nf
***** HOLD Avec option --fork
**** TODO Utiliser la version de nf-core de VEP
SCHEDULED: <2023-05-07 Sun>
**** DONE OMIM
CLOSED: [2023-05-08 Mon 15:02] SCHEDULED: <2023-05-01 Mon>
**** TODO Grantham
SCHEDULED: <2023-05-01 Mon>
**** TODO ACMG incidental
SCHEDULED: <2023-05-01 Mon>
**** TODO Gnomad ?
SCHEDULED: <2023-05-01 Mon>
**** DONE Filtrer après VEP avec filter_vep
CLOSED: [2023-04-29 Sat 15:47]
nNon testé
*** HOLD Ancienne version
**** TODO HGVS
**** TODO Filtrer après VEP
**** TODO OMIM
**** TODO clinvar
**** TODO ACMG incidental
**** TODO Grantham
**** KILL LRG
CLOSED: [2023-04-18 mar. 17:22] SCHEDULED: <2023-04-18 Tue>
Vu avec alexis, n’est plus à jour
**** TODO Gnomad
** DONE Porter exactement la version d'Alexis sur Helios
CLOSED: [2023-01-14 Sat 17:56]
Branche "prod"
** STRT Tester version d'alexis avec Nix
*** DONE Ajouter clinvar
CLOSED: [2022-11-13 Sun 19:37]
*** DONE Alignement
CLOSED: [2022-11-13 Sun 12:52]
*** DONE Haplotype caller
CLOSED: [2022-11-13 Sun 13:00]
*** TODO Filter
- [X] depth
- [ ] comon snp not path
Problème avec liste des ID
**** TODO variant annotation
Besoin de vep
*** TODO Variant calling
* Amélioration :amelioration:
* Documentation :doc:
** DONE Procédure d'installation nix + dependences pour VM CHU
CLOSED: [2023-04-22 Sat 1

[9.14923]

[12.8476]

_norm.vcf.gz
bcftools isec dbsnp_mwi_norm.vcf.gz clinvar_mwi.vcf.gz -n=2
#+end_src
#+RESULTS:
| NC_000020.11 | 10652589 | G | A | 11 |
| NC_000020.11 | 10652589 | G | C | 11 |
******* TODO Sur dbSNP chr20 non
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools norm -m -any dbSNP_common_chr20 -o dbSNP_common_chr20_norm.vcf.gz
#+end_src
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools isec -i 'INFO/CLNSIG="Pathogenic"' dbSNP_common_chr20_norm.vcf.gz clinvar_chr20.vcf.gz -p tmp
#+end_src
#+RESULTS:
***** DONE Essai bedtools intersect
#+begin_src sh
bedtools intersect -a  dbSNP_common.vcf.gz -b clinvar.vcf.gz
#+end_src
$ wc -l intersect.vcf
220206 intersect.vcf
** TODO Dépendences avec Nix
*** DONE GATK
CLOSED: [2022-10-21 Fri 21:59]
*** WAIT BioDBHTS
Contribuer pull request
*** DONE BioExtAlign
CLOSED: [2022-10-22 Sat 00:38]
*** WAIT BioBigFile
Revoir si on peut utliser kent dernière version
Contribuer pull request
*** HOLD rtg-tools
Convertir clinvar NC
*** DONE simuscop
CLOSED: [2022-12-30 Fri 22:31]
*** DONE Spip
CLOSED: [2022-12-04 Sun 12:49]
Pas de pull request
*** DONE R + packages
CLOSED: [2022-11-19 Sat 21:05]
*** TODO hap.py
https://github.com/Illumina/hap.py
**** DONE Version sans rtgtools avec python 3
CLOSED: [2023-02-02 Thu 22:15]
Procédure pour tester
#+begin_src
nix develop .#hap-py
$ genericBuild
#+end_src
1. Supprimer l’appel à make_dependencies dans cmakelist.txt : on peut tout installer avec nix
2. Patch Roc.cpp pour avoir numeric_limits ( error: 'numeric_limits' is not a member of 'std')
3. ajout de flags de link (essai, error)
set(ZLIB_LIBRARIES -lz -lbz2 -lcurl -lcrypto -llzma)
4. Changer les appels à print en print() dans le code python et suppression de quelques import
[nix-shell:~/source]$ sed -i.orig 's/print \"\(.*\)"/print(\1)/' src/python/*.py
**** DONE Sérialiser json pour écrire données de sorties
CLOSED: [2023-02-17 Fri 19:25]
**** DONE Tester sur example
CLOSED: [2023-02-04 Sat 00:25]
#+begin_src sh
$ cd hap.py
$ ../result/bin/hap.py example/happy/PG_NA12878_chr21.vcf.gz       example/happy/NA12878_chr21.vcf.gz       -f example/happy/PG_Conf_chr21.bed.gz       -o test -r example/chr21.fa
#+end_src
#+RESULTS:
| Type  | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score |
| INDEL | ALL    |        8937 |     7839 |     1098 |       11812 |      343 |      3520 |    45 |   283 |      0.877140 |         0.958635 |       0.298002 |        0.916079 |
| INDEL | PASS   |        8937 |     7550 |     1387 |        9971 |      283 |      1964 |    30 |   242 |      0.844803 |         0.964656 |       0.196971 |        0.900760 |
| SNP   | ALL    |       52494 |    52125 |      369 |       90092 |      582 |     37348 |   107 |   354 |      0.992971 |         0.988966 |       0.414554 |        0.990964 |
| SNP   | PASS   |       52494 |    46920 |     5574 |       48078 |      143 |       992 |     8 |    97 |      0.893816 |         0.996963 |       0.020633 |        0.942576 |
**** TODO Version avec rtg-tools
**** TODO Faire fonctionner Tests
***** TODO Essai 2 : depuis nix develop:
SCHEDULED: <2023-05-20 Sat>
#+begin_src
nix develop .#hap-py
genericBuild
#+end_src
Lancé initialement à la main, mais on peut maintenant utiliser run_tests
#+begin_src
HCDIR=bin/ ../src/sh/run_tests.sha
#+end_src
- [X] test boost
- [X] multimerge
- [X] hapenum
- [X] fp accuracy
- [X] faulty variant
- leftshift fails
- [X] other vcf
- [X] chr prefix
- [X] gvcf
- [X] decomp
- [X] contig lengt
- [X]  integration test
- [ ] scmp fails sur le type
- [X] giab
- [X] performance
- [ ] quantify fails sur le type
- [ ] stratified échec sur les résultats !
- [X] pg counting
- [ ] sompy: ne trouve pas Strelka dans somatic
phases="buildPhase checkPhase installPhase fixupPhase" genericBuild
#+end_src
**** KILL Reproduire les performances precisionchallenge : attention à HG002 et HG001!
CLOSED: [2023-04-01 Sat 19:43]
https://www.nist.gov/programs-projects/genome-bottle
***** KILL 0GOOR
CLOSED: [2023-04-01 Sat 19:40]
Le problème venait 1. de l'ADN et 2. du renommage des chromosomes qui était faux
****** DONE HG002
CLOSED: [2023-02-17 Fri 19:31]
 Type Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  FP.al  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score
INDEL    ALL       525466    491355     34111      1156702     57724     605307   9384  25027       0.935084          0.895313        0.523304         0.914766
INDEL   PASS       525466    491355     34111      1156702     57724     605307   9384  25027       0.935084          0.895313        0.523304         0.914766
  SNP    ALL      3365115   3358399      6716      5666020     21995    2284364   4194   1125       0.998004          0.993496        0.403169         0.995745
  SNP   PASS      3365115   3358399      6716      5666020     21995    2284364   4194   1125       0.998004          0.993496        0.403169         0.995745
 TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
                    NaN                     NaN                   1.528276                   2.752637
                    NaN                     NaN                   1.528276                   2.752637
               2.100129                1.473519                   1.581196                   1.795603
               2.100129                1.473519                   1.581196                   1.795603
***** KILL Avec python2
CLOSED: [2023-02-17 Fri 19:25]
****** KILL avec nix
CLOSED: [2023-02-17 Fri 19:25]
conda create -n python2 python=2.7 anaconda
****** KILL avec conda
CLOSED: [2023-02-17 Fri 19:25]
******* Gentoo: regex_error sur test...
Ok avec bash !
#+begin_src
anaconda3/bin/conda create --name py2 python=2.7
conda activate py2
conda install -c bioconda hap.py
#+end_src
******** Faire tourner les tests.
Il faut remplace bin/test_haplotypes par test_haplotypes dans src/sh/run_tests.sh
#+begin_src sh
 HGREF=../genome/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta HCDIR=~/anaconda3/envs/py2/bin bash src/sh/run_tests.sh
#+end_src
Echec:
test_haplotypes: /opt/conda/conda-bld/work/hap.py-0.3.7/src/c++/lib/tools/Fasta.cpp:81: MMappedFastaFile::MMappedFastaFile(const string&): Assertion `fd != -1' failed.
unknown location(0): fatal error in "testVariantPrimitiveSplitter": signal: SIGABRT (application abort requested)
/opt/conda/conda-bld/work/hap.py-0.3.7/src/c++/test/test_align.cpp(298): last checkpoint
******** Chr21
HGREF=../genome/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta hap.py        example/happy/PG_NA12878_chr21.vcf.gz       example/happy/NA12878_chr21.vcf.gz       -f example/happy/PG_Conf_chr21.bed.gz       -o test
******* Helios
échec
** DONE Exécution
CLOSED: [2022-09-13 Tue 21:37]
*** KILL test Bionix
*** KILL Implémenter execution avec Nix ?
Voir https://academic.oup.com/gigascience/article/9/11/giaa121/5987272?login=false
pour un exemple.
Probablement plus simple d’utiliser Nix pour gestion de l’environnement et snakemake pour l’exécution
Pas d’accès internet depuis le cluster
*** DONE nextflow
CLOSED: [2022-09-13 Tue 21:37]
**** TODO Bug scheduler SGE
Le job se fait tuer car l'utilisateur n'est pas passé correctement à nextflow
***** DONE Forcer l'utilisateur à l'exécution
CLOSED: [2023-04-01 Sat 17:57]
NXF_OPTS=-D"user.name=alex"
***** DONE Vérifier si le problème persiste avec 22.10.6
CLOSED: [2023-04-01 Sat 18:38] SCHEDULED: <2023-04-01 Sat>
oui
***** KILL Packager l'utilisateur dans le programme ?
Mauvaise idée..
** TODO Preprocessing avec nextflow
*** TODO Map to reference
**** TODO Sample ID dans header
/Work/Users/apraga/bisonex/out/63003856_S135/preprocessing/baserecalibrator
*** DONE Mark duplicate
CLOSED: [2022-10-09 Sun 22:30]
*** DONE Recalibrate base quality score
CLOSED: [2022-10-09 Sun 22:30]
** DONE Variant calling avec Nextflow
CLOSED: [2022-11-19 Sat 21:34]
*** DONE Haplotype caller
CLOSED: [2022-10-09 Sun 22:40]
*** DONE Filter variants
CLOSED: [2022-10-09 Sun 22:40]
*** DONE Filter common snp not clinvar path
CLOSED: [2022-11-07 Mon 23:00]
Voir [[*common dbSNP not clinvar patho][common dbSNP not clinvar patho]]
*** DONE Filter variant only in consensual sequence
CLOSED: [2022-11-08 Tue 22:23]
*** DONE Filter technical variants
CLOSED: [2022-11-19 Sat 21:34]
*** DONE Utilise AVX pour accélerer l'exécution
CLOSED: [2023-04-29 Sat 15:46]
Sans cela, on a l'avertissement
#+begin_quote
17:28:00.720 INFO  PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported
17:28:00.721 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/nix/store/cy9ckxqwrkifx7wf02hm4ww1p6lnbxg9-gatk-4.2.4.1/bin/gatk-package-4.2.4.1-local.jar!/com/intel/gkl/native/libgkl_utils.so
17:28:00.733 WARN  NativeLibraryLoader - Unable to load libgkl_utils.so from native/libgkl_utils.so (/Work/Users/apraga/bisonex/out/NA12878_NIST7035/preprocessing/applybqsr/libgkl_utils821485189051585397.so: libgomp.so.1: cannot open shared object file: No such file or directory)
17:28:00.733 WARN  IntelPairHmm - Intel GKL Utils not loaded
17:28:00.733 WARN  PairHMM - ***WARNING: Machine does not have the AVX instruction set support needed for the accelerated AVX PairHmm. Falling back to the MUCH slower LOGLESS_CACHING implementation!
17:28:00.763 INFO  ProgressMeter - Starting traversal
#+end_quote
libgomp.so est fourni par gcc donc il faut charger le module
 module load gcc@11.3.0/gcc-12.1.0
** KILL Utiliser subworkflow
CLOSED: [2023-04-02 Sun 18:08]
Notre version permet d'être plus souple
*** KILL Alignement
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
*** KILL Vep
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
vcf_annotate_ensemblvep
** TODO Annotation avec nextflow :annotation:
*** KILL VEP : --gene-phenotype ?
CLOSED: [2023-04-18 mar. 18:32]
Vu avec alexis : bases de données non à jour
https://www.ensembl.org/info/genome/variation/phenotype/sources_phenotype_documentation.html
*** DONE plugin VEP
CLOSED: [2023-04-18 mar. 18:32]
Cloner dépôt git avec plugin
Puis utiliser --dir_plugins
*** HOLD Utiliser code d’Alexis
*** TODO Nouvelle version avec VEP
Example avec --custom
https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html
**** TODO Ajout spliceAI
SCHEDULED: <2023-04-30 Sun>
plugin VEP
***** DONE Télécharger les données
CLOSED: [2023-05-11 Thu 19:01]
Difficile d'automatiser, le lien est temporaire...
***** DONE PLugin
CLOSED: [2023-05-11 Thu 20:16]
***** DONE Séparer score en plusieurs colonnes
CLOSED: [2023-05-11 Thu 20:16]
Test avec ce fichier pour avoir une ligne avec annotation et une ligne sans
#CHROM	POS	ID	REF	ALT
1	9091	.	A	C
1	69091	.	A	C
et
#+begin_src sh
rm -f postvep.tsv* && vep -i testspliceai.vcf.gz -o postvep.tsv --tab  --dir 109 --merged --pick --use_given_ref   --offline  --plugin SpliceAI,snv=spliceai_scores.raw.snv.hg38.vcf.gz,indel=spliceai_scores.raw.indel.hg38.vcf.gz
#+end_src
#+begin_src
$ bgzip postvep.tsv
$ python spliceai.py
$ cat postvep2.tsv
,variation,Location,Allele,Gene,Feature,Feature_type,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,Codons,Existing_variation,IMPACT,DISTANCE,STRAND,FLAGS,REFSEQ_MATCH,SOURCE,REFSEQ_OFFSET,SpliceAI_AG,SpliceAI_AL,SpliceAI_DG,SpliceAI_DL
0,1_9091_A/C,1:9091,C,ENSG00000290825,ENST00000456328,Transcript,upstream_gene_variant,-,-,-,-,-,-,MODIFIER,2778,1,-,-,Ensembl,-,,,,
1,1_69091_A/C,1:69091,C,ENSG00000186092,ENST00000641515,Transcript,missense_variant,124,64,22,M/L,Atg/Ctg,-,MODERATE,-,1,-,-,Ensembl,-,0.01,0.00,0.00,0.01
#+end_src
Test
cp work/bf/437ae511958509e43072f032f4d495/small.tab.gz tests/vep-spip.tab.gz
cp work/d5/3b1244b5ae83d54409ee0d456e8c55/small_cadd.tab.gz tests/vep-cadd-splice.tab.gz
**** TODO Ajout LOEUF et pli
plugin VEP
**** TODO NMD
**** KILL Ajout LOEUF
CLOSED: [2023-04-19 mer. 16:32]
plugin VEP
**** DONE Spip
CLOSED: [2023-05-01 Mon 23:07] SCHEDULED: <2023-04-30 Sun>
BED ne semble pas bien marcher (il faut définir une zone)
VCF : trop d’information
Attention, plusieurs transcripts mais résultats identiques. On supprimer les doublons
***** DONE interpretation + score + intervalle de confiance séparé
CLOSED: [2023-05-01 Mon 23:07] SCHEDULED: <2023-04-30 Sun>
Tests :
dans tests/
vep -i 63004925-small.vcf -o postvep.vcf --vcf --fasta genomeRef.fna --dir 109 --merged --pick  --offline --custom ../script/spip_annotation.vcf.gz,SPIP,vcf,exact,0,spipInterp,spipScore,spipConfidence
***** DONE Score
CLOSED: [2023-04-22 Sat 15:30]
**** DONE CADD: remplacer par plugin VEP
CLOSED: [2023-05-07 Sun 14:45] SCHEDULED: <2023-05-07 Sun>
***** Test
#+begin_src
vep  -i test.vcf  -o lol.vcf --offline --dir  /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz  --dir_plugins ../VEP_plugins/ -v
#+end_src
Test
#+begin_src sh
vep --id "1  230710048 230710048 A/G 1"   --offline --dir  /Work/Projects/bisonex/data/vep/GRCh38/ --merged --vcf --fasta /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna --plugin CADD,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.snv.tsv.gz,/Work/Users/apraga/bisonex/work/13/9287a7fef17ab9365f5696f20710cd/gnomad.genomes.r3.0.indel.tsv.gz  --hgvsg --plugin pLI --plugin LOEUF -o lol
#+end_src
CSQ=G|missense_variant|MODERATE|AGT|ENSG00000135744|Transcript|ENST00000366667|protein_coding|2/5||||843|776|259|M/T|aTg/aCg|||-1||HGNC|HGNC:333||Ensembl||A|A||1:g.230710048A>G|0.347|-0.277922|
Correspond bien à https://www.ensembl.org/Homo_sapiens/Tools/VEP/Results?tl=I7ZsIbrj14P6lD43-9115494
***** DONE Utiliser whole genome
CLOSED: [2023-04-29 Sat 15:46]
***** KILL Renommer les chromosome avant ...
CLOSED: [2023-05-01 Mon 09:14] SCHEDULED: <2023-04-30 Sun>
Trop long !
- Téléchargement de CADD: 4h20
- renommer les chromosome pour SNV : 6h20
- tabix sur les SNV : job tué au bout de 21h....
***** DONE annoter séparément et fusionner les tableaux
CLOSED: [2023-05-07 Sun 14:45] SCHEDULED: <2023-05-01 Mon>
NB: on pourrait filtrer CADD avec tabix pour se restreindre à nos variants
**** DONE clinvar
CLOSED: [2023-04-22 Sat 15:31]
**** KILL Vérifier résultats HGVS avec mutalyzer
CLOSED: [2023-05-01 Mon 09:26]
**** TODO Parallélisation
***** HOLD par chromosome avec workflow VEP
https://github.com/Ensembl/ensembl-vep/blob/release/109/nextflow/workflows/run_vep.nf
***** HOLD Avec option --fork
**** DONE Utiliser la version de nf-core de VEP
CLOSED: [2023-05-13 Sat 18:27] SCHEDULED: <2023-05-07 Sun>
**** DONE OMIM
CLOSED: [2023-05-08 Mon 15:02] SCHEDULED: <2023-05-01 Mon>
**** TODO Grantham
SCHEDULED: <2023-05-01 Mon>
**** TODO ACMG incidental
SCHEDULED: <2023-05-01 Mon>
**** TODO Gnomad ?
SCHEDULED: <2023-05-01 Mon>
**** DONE Filtrer après VEP avec filter_vep
CLOSED: [2023-04-29 Sat 15:47]
nNon testé
*** HOLD Ancienne version
**** TODO HGVS
**** TODO Filtrer après VEP
**** TODO OMIM
**** TODO clinvar
**** TODO ACMG incidental
**** TODO Grantham
**** KILL LRG
CLOSED: [2023-04-18 mar. 17:22] SCHEDULED: <2023-04-18 Tue>
Vu avec alexis, n’est plus à jour
**** TODO Gnomad
** DONE Porter exactement la version d'Alexis sur Helios
CLOSED: [2023-01-14 Sat 17:56]
Branche "prod"
** STRT Tester version d'alexis avec Nix
*** DONE Ajouter clinvar
CLOSED: [2022-11-13 Sun 19:37]
*** DONE Alignement
CLOSED: [2022-11-13 Sun 12:52]
*** DONE Haplotype caller
CLOSED: [2022-11-13 Sun 13:00]
*** TODO Filter
- [X] depth
- [ ] comon snp not path
Problème avec liste des ID
**** TODO variant annotation
Besoin de vep
*** TODO Variant calling
* Amélioration :amelioration:
* Documentation :doc:
** DONE Procédure d'installation nix + dependences pour VM CHU
CLOSED: [2023-04-22 Sat 1

Replacement in projects/bisonex.org at line 42 [10.35]

B:BD[13.373129] → [13.373129:377394]

B:BD[13.377394] → [4.10785:18999]

onnes séparées !
Essai avec +200bp de chaque côt
NC_000001.11 153817371  153817542
NC_000001.11 153817171  153817742
#+begin_src sh :dir ~/code/bisonex/test-art :results silent
echo -e "NC_000001.11\t153817171\t153817742"> gatad2b-exon6-offset.bed
bedtools getfasta -fi ../test-simuscop/genomeRef.fna -bed gatad2b-exon6-offset.bed -fo gatad2b-exon6-offset.fa
../art_bin_MountRainier/art_illumina -ss HS25 -i gatad2b-exon6-offset.fa -o ./gatad2b -l 150 -f 100 -p -m 500 -s 50
bwa mem ../test-simuscop/bwa/genomeRef gatad2b1.fq gatad2b2.fq | samtools sort -o gatad2b.bam
samtools index gatad2b.bam
#+end_src
En changeant la longueur des reads
#+begin_src sh :dir ~/code/bisonex/test-art :results silent
echo -e "NC_000001.11\t153817171\t153817742"> gatad2b-exon6-offset.bed
bedtools getfasta -fi ../test-simuscop/genomeRef.fna -bed gatad2b-exon6-offset.bed -fo gatad2b-exon6-offset.fa
../art_bin_MountRainier/art_illumina -ss HS25 -i gatad2b-exon6-offset.fa -o ./gatad2b -l 126 -f 100 -p -m 500 -s 50
bwa mem ../test-simuscop/bwa/genomeRef gatad2b1.fq gatad2b2.fq | samtools sort -o gatad2b.bam
samtools index gatad2b.bam
#+end_src
Idem, fait 2 "tas" séparés de chaque côté de l'exon
+/- 100
#+begin_src sh :dir ~/code/bisonex/test-art :results silent
echo -e "NC_000001.11\t153817271\t153817642" > gatad2b-exon6-offset.bed
bedtools getfasta -fi ../test-simuscop/genomeRef.fna -bed gatad2b-exon6-offset.bed -fo gatad2b-exon6-offset.fa
../art_bin_MountRainier/art_illumina -ss HS25 -i gatad2b-exon6-offset.fa -o ./gatad2b -l 125 -f 100 -p -m 500 -s 30
bwa mem ../test-simuscop/bwa/genomeRef gatad2b1.fq gatad2b2.fq | samtools sort -o gatad2b.bam
samtools index gatad2b.bam
#+end_src
*** KILL NGSNG : plante
CLOSED: [2023-05-01 Mon 09:27]
https://github.com/RAHenriksen/NGSNGS
**** KILL Essai avec fasta
CLOSED: [2023-05-01 Mon 09:27]
#+begin_src sh :dir ~/code/bisonex/test-ngsngs :results silent
echo -e "NC_000001.11\t153817371\t153817542" > gatad2b-exon6.bed
bedtools getfasta -fi ../test-simuscop/genomeRef.fna -bed gatad2b-exon6.bed -fo gatad2b-exon6.fa
#+end_src
-ss HS25 : nom du profile illumina
-l 150 : reads de 150
-f 10 : coverage de 10
-p : paired end
-m 500 : longueur moyenne des fragment d'ADN
-s 10 : déviation standard
#+begin_src sh :dir ~/code/bisonex/test-art :results silent
../art_bin_MountRainier/art_illumina -ss HS25 -i gatad2b-exon6.fa -o ./gatad2b -l 150 -f 100 -p -m 500 -s 10
#+end_src
#+begin_src sh :dir ~/code/bisonex/test-art :results silent
bwa mem ../test-simuscop/bwa/genomeRef gatad2b1.fq gatad2b2.fq | samtools sort -o gatad2b.b
am
samtools index gatad2b.bam
#+end_src
*** TODO Insérer variants dans patients centogène :generate:
SCHEDULED: <2023-05-01 Mon>
**** TODO SNV
SCHEDULED: <2023-05-01 Mon>
***** DONE Script python: ok seulement pour reads corrigé, trop long sinon
CLOSED: [2023-05-01 Mon 13:32]
Même sur un seul chromosome (15)...
***** DONE Couper les données avec bedtools intersect ?
CLOSED: [2023-05-01 Mon 20:33]
-wa pour avoir les reads qui corresponds
-v pour ceux qui n'intersecte pass
ok sur un exemple
****** DONE Test simple : ok
CLOSED: [2023-05-01 Mon 13:43]
#+attr_html: :width 500px
[[./test-intersect-chr15.png]]
#+attr_html: :width 500px
[[./test-nointersect-chr15.png]]
****** DONE Chromosome 15 : vérifier BAM
CLOSED: [2023-05-01 Mon 20:33] SCHEDULED: <2023-05-01 Mon>
#+begin_src
samtools view -b `63003856_S135.bam`  NC_000015.10 > `63003856_S135_chr15.bam`
#+end_src
On génère un python avec les dépendances
#+begin_src
nix-build
#+end_src
Puis on lance un script julia qui va couper le bam en 2, lancer le script python sur l'intersection et fusionner le résultat
#+begin_src
julia -Jbisonex.so --project=. insertVariants.jl `63003856_S135_chr15.bam` test_new.bam
#+end_src
NB: pour accélerer l'exécution, générer une sysimage :
#+begin_src julia
(1.7) pkg> activate .
julia> create_sysimage(; sysimage_path="bisonex.so")
#+end_src
3 variants: Ok sur le nombre de reads et les variants
Ok pour homozygote
#+attr_html: :width 500px
[[./check-snv-chr15.png]]
****** DONE Chromosome 15 Vérifier VAF avec checkBam.jl: ok
julia> @subset snv :chrom .== "NC_000015.10"
26×10 DataFrame
 Row │ chrom         pos       variant        variantType  zygosity      ref        alt        refCount  altCount  readsCount
     │ SubStrin…?    Int64     SubStrin…?     String?      String15      SubStrin…  SubStrin…  Int64     Int64     Int64
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ NC_000015.10  74343027  g.74343027C>T  snv          heterozygous  C          T                61        58         120
   2 │ NC_000015.10  75400778  g.75400778C>G  snv          heterozygous  C          G               108        79         187
   3 │ NC_000015.10  89327201  g.89327201C>T  snv          heterozygous  C          T               243       241         486
   4 │ NC_000015.10  48767448  g.48767448A>C  snv          heterozygous  A          C                72        70         142
   5 │ NC_000015.10  75411685  g.75411685T>C  snv          heterozygous  T          C                79        81         160
   6 │ NC_000015.10  66703292  g.66703292C>T  snv          heterozygous  C          T                70        60         130
   7 │ NC_000015.10  89325639  g.89325639G>A  snv          heterozygous  G          A               257       267         524
   8 │ NC_000015.10  89330184  g.89330184G>A  snv          heterozygous  G          A               258       287         548
   9 │ NC_000015.10  89330184  g.89330184G>A  snv          heterozygous  G          A               258       287         548
  10 │ NC_000015.10  89325639  g.89325639G>A  snv          heterozygous  G          A               257       267         524
  11 │ NC_000015.10  42401752  g.42401752G>A  snv          homozygous    G          A                61       212         273
  12 │ NC_000015.10  89327201  g.89327201C>T  snv          heterozygous  C          T               243       241         486
  13 │ NC_000015.10  38339896  g.38339896G>A  snv          heterozygous  G          A                56        86         144
  14 │ NC_000015.10  26869324  g.26869324A>T  snv          heterozygous  A          T                62        49         113
  15 │ NC_000015.10  66435145  g.66435145G>A  snv          heterozygous  G          A                98        95         193
  16 │ NC_000015.10  60514655  g.60514655G>A  snv          heterozygous  G          A                94        99         194
  17 │ NC_000015.10  42410947  g.42410947A>G  snv          heterozygous  A          G               153       123         276
  18 │ NC_000015.10  75430368  g.75430368C>T  snv          heterozygous  C          T                80        62         142
  19 │ NC_000015.10  25375494  g.25375494T>C  snv          heterozygous  T          C               103       104         207
  20 │ NC_000015.10  60497497  g.60497497C>A  snv          heterozygous  C          A                61        65         126
  21 │ NC_000015.10  74891539  g.74891539C>T  snv          heterozygous  C          T               118       124         242
  22 │ NC_000015.10  48488433  g.48488433A>G  snv          heterozygous  A          G               367       122         492
  23 │ NC_000015.10  89318565  g.89318565A>G  snv          heterozygous  A          G               303        98         404
  24 │ NC_000015.10  89323426  g.89323426C>G  snv          heterozygous  C          G                93       109         202
  25 │ NC_000015.10  89318595  g.89318595T>C  snv          heterozygous  T          C               321       128         453
  26 │ NC_000015.10  48488437  g.48488437T>C  snv          heterozygous  T          C               356       132         488
CLOSED: [2023-05-01 Mon 17:18]
***** TODO Chromosome1 15 :Test haplotype caller : échec car CIGARE non mis à jour
SCHEDULED: <2023-05-01 Mon>
#+begin_src
julia -Jbisonex.so --project=. insertVariants.jl `63003856_S135_chr15.bam` 63003856_S135_chr15_inserted.bam
scp 63003856_S135_chr15_inserted.bam* meso:/Work/Users/apraga/bisonex/tests/synthetic/
#+end_src
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/synthetic :results silent
ln -s /Work/Projects/bisonex/data/dbSNP/GRCh38.p13/dbSNP.gz .
ln -s /Work/Projects/bisonex/data/dbSNP/GRCh38.p13/dbSNP.gz.tbi
ln -s /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.dict .
ln -s /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna .
ln -s /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna.fai .
#+end_src
puis
#+begin_src
gatk --java-options "-Xmx3072M" HaplotypeCaller --input 63003856_S135_chr15_inserted.bam --output testchr15.vcf.gz --reference genomeRef.fna  --tmp-dir . -L NC_000015.10
#+end_src
scp meso:/Work/Users/apraga/bisonex/tests/synthetic/testchr15.vcf.gz haplotypecaller-chr15.vcf.gz
Aucun variant inséré
- base quality ok
  -
****** DONE bam out : non appelé
CLOSED: [2023-05-01 Mon 21:57]
gatk --java-options "-Xmx3072M" HaplotypeCaller --input 63003856_S135_chr15_inserted.bam     --output haplotypecaller-chr15.vcf.gz --reference genomeRef.f
na  --tmp-dir . -L NC_000015.10  --bam-output debug.bam
****** DONE --linked-de-bruijn-graph : idem
CLOSED: [2023-05-01 Mon 21:57]
readlink testchr15.vcf.gz -f^C
[apraga@mesointeractive synthetic]$ gatk --java-options "-Xmx3072M" HaplotypeCaller --input 63003856_S135_chr15_inserted.bam     --output haplotypecaller-chr15.vcf.gz --reference genomeRef.fna  --tmp-dir . -L NC_000015.10  --linked-de-bruijn-graph
****** HOLD regénérer fastq
***** TODO Générer bam données pour tous les chromosomes
 timeit julia -Jbisonex.so --project=. insertVariants.jl ~/code/bisonex/out/63003856/preprocessing/63003856_S135.bam 63003856_S135_inserted.bam
40min 516ms 835µs 405ns
Avertissement:
 [W::bam_hdr_read] EOF marker is absent. The input is probably truncated
Inserted.bam et excluded.bam (fichier avant le merge)  ont l'air ok...
On réessaie à la main : ça passe
#+begin_src
samtools merge test-all.bam inserted.bam excluded.bam
❯ mv test-all.bam `63003856_S135_inserted.bam` -f
❯ mv test-all.bam.bai `63003856_S135_chr15_inserted.bam.bai` -f
#+end_src
***** DONE BAm2fastq pour avoir CIGAR à jour : échec (variants "cachés")
CLOSED: [2023-05-04 Thu 20:30] SCHEDULED: <2023-05-01 Mon>
On lance la génération de bam depuis le mesocentro (la copie plante via le VPN)
#+begin_src sh
cd /Work/Users/apraga/recherche/bisonex/generate
julia --project=. insertVariants.jl  ../../../bisonex/out/63003856_S135/preprocessing/applybqsr/63003856_S135.bam 63003856_S135_inserted.bam
#+end_src
Workflow après avec désactivé storeDir pour SAMTOOLS_BAM2FQ dans nextflow.config (pourquoi ??)
#+begin_src nextflow
include { SAMTOOLS_BAM2FQ }                            from "${params.modulesDir}/samtools/bam2fq/main"
include { SAMTOOLS_SORT as sortBamByName }             from "${params.modulesDir}/samtools/sort/main"
workflow {
    f = Channel.fromPath("${params.dataDir}/synthetic/63003856_S135_inserted.bam",
                         checkIfExists: true).map{it -> [["id": "synthetic_63003856"], it]}
    // Important: use "-n" option !!
    sortBamByName(f)
    SAMTOOLS_BAM2FQ(sortBamByName.out.bam, true)
}
#+end_src
Puis
#+begin_src
cp work/34/fb2fc136f6f6d7f42d0960512f06de/*.fq.gz /Work/Groups/bisonex/data/synthetic/
#+end_src
***** KILL Lancer pipeline
CLOSED: [2023-05-04 Thu 20:30] SCHEDULED: <2023-05-01 Mon>
NXF_OPTS=-D"user.name=apraga" nextflow run   main.nf -c nextflow.config  -profile standard,helios -bg --input="/Work/Groups/bisonex/data/synthetic/synthetic_63003856_{1,2}.fq.gz" --outdir out/synthetic_63003856
*** Divers
**** DONE Vérifier nombre de reads fastq - bam
CLOSED: [2022-10-09 Sun 22:31]
* DONE Plot : ashkenazim trio
CLOSED: [2023-04-18 Tue 21:28] SCHEDULED: <2023-04-16 Sun>
/Entered on/ [2023-04-16 Sun 17:29]

[13.373129]

onnes séparées !
Essai avec +200bp de chaque côt
NC_000001.11 153817371  153817542
NC_000001.11 153817171  153817742
#+begin_src sh :dir ~/code/bisonex/test-art :results silent
echo -e "NC_000001.11\t153817171\t153817742"> gatad2b-exon6-offset.bed
bedtools getfasta -fi ../test-simuscop/genomeRef.fna -bed gatad2b-exon6-offset.bed -fo gatad2b-exon6-offset.fa
../art_bin_MountRainier/art_illumina -ss HS25 -i gatad2b-exon6-offset.fa -o ./gatad2b -l 150 -f 100 -p -m 500 -s 50
bwa mem ../test-simuscop/bwa/genomeRef gatad2b1.fq gatad2b2.fq | samtools sort -o gatad2b.bam
samtools index gatad2b.bam
#+end_src
En changeant la longueur des reads
#+begin_src sh :dir ~/code/bisonex/test-art :results silent
echo -e "NC_000001.11\t153817171\t153817742"> gatad2b-exon6-offset.bed
bedtools getfasta -fi ../test-simuscop/genomeRef.fna -bed gatad2b-exon6-offset.bed -fo gatad2b-exon6-offset.fa
../art_bin_MountRainier/art_illumina -ss HS25 -i gatad2b-exon6-offset.fa -o ./gatad2b -l 126 -f 100 -p -m 500 -s 50
bwa mem ../test-simuscop/bwa/genomeRef gatad2b1.fq gatad2b2.fq | samtools sort -o gatad2b.bam
samtools index gatad2b.bam
#+end_src
Idem, fait 2 "tas" séparés de chaque côté de l'exon
+/- 100
#+begin_src sh :dir ~/code/bisonex/test-art :results silent
echo -e "NC_000001.11\t153817271\t153817642" > gatad2b-exon6-offset.bed
bedtools getfasta -fi ../test-simuscop/genomeRef.fna -bed gatad2b-exon6-offset.bed -fo gatad2b-exon6-offset.fa
../art_bin_MountRainier/art_illumina -ss HS25 -i gatad2b-exon6-offset.fa -o ./gatad2b -l 125 -f 100 -p -m 500 -s 30
bwa mem ../test-simuscop/bwa/genomeRef gatad2b1.fq gatad2b2.fq | samtools sort -o gatad2b.bam
samtools index gatad2b.bam
#+end_src
*** KILL NGSNG : plante
CLOSED: [2023-05-01 Mon 09:27]
https://github.com/RAHenriksen/NGSNGS
**** KILL Essai avec fasta
CLOSED: [2023-05-01 Mon 09:27]
#+begin_src sh :dir ~/code/bisonex/test-ngsngs :results silent
echo -e "NC_000001.11\t153817371\t153817542" > gatad2b-exon6.bed
bedtools getfasta -fi ../test-simuscop/genomeRef.fna -bed gatad2b-exon6.bed -fo gatad2b-exon6.fa
#+end_src
-ss HS25 : nom du profile illumina
-l 150 : reads de 150
-f 10 : coverage de 10
-p : paired end
-m 500 : longueur moyenne des fragment d'ADN
-s 10 : déviation standard
#+begin_src sh :dir ~/code/bisonex/test-art :results silent
../art_bin_MountRainier/art_illumina -ss HS25 -i gatad2b-exon6.fa -o ./gatad2b -l 150 -f 100 -p -m 500 -s 10
#+end_src
#+begin_src sh :dir ~/code/bisonex/test-art :results silent
bwa mem ../test-simuscop/bwa/genomeRef gatad2b1.fq gatad2b2.fq | samtools sort -o gatad2b.b
am
samtools index gatad2b.bam
#+end_src
*** KILL Script maison :generate:
CLOSED: [2023-05-13 Sat 18:29] SCHEDULED: <2023-05-01 Mon>
**** KILL SNV
CLOSED: [2023-05-13 Sat 18:29] SCHEDULED: <2023-05-01 Mon>
***** DONE Script python: ok seulement pour reads corrigé, trop long sinon
CLOSED: [2023-05-01 Mon 13:32]
Même sur un seul chromosome (15)...
***** DONE Couper les données avec bedtools intersect ?
CLOSED: [2023-05-01 Mon 20:33]
-wa pour avoir les reads qui corresponds
-v pour ceux qui n'intersecte pass
ok sur un exemple
****** DONE Test simple : ok
CLOSED: [2023-05-01 Mon 13:43]
#+attr_html: :width 500px
[[./test-intersect-chr15.png]]
#+attr_html: :width 500px
[[./test-nointersect-chr15.png]]
****** DONE Chromosome 15 : vérifier BAM
CLOSED: [2023-05-01 Mon 20:33] SCHEDULED: <2023-05-01 Mon>
#+begin_src
samtools view -b `63003856_S135.bam`  NC_000015.10 > `63003856_S135_chr15.bam`
#+end_src
On génère un python avec les dépendances
#+begin_src
nix-build
#+end_src
Puis on lance un script julia qui va couper le bam en 2, lancer le script python sur l'intersection et fusionner le résultat
#+begin_src
julia -Jbisonex.so --project=. insertVariants.jl `63003856_S135_chr15.bam` test_new.bam
#+end_src
NB: pour accélerer l'exécution, générer une sysimage :
#+begin_src julia
(1.7) pkg> activate .
julia> create_sysimage(; sysimage_path="bisonex.so")
#+end_src
3 variants: Ok sur le nombre de reads et les variants
Ok pour homozygote
#+attr_html: :width 500px
[[./check-snv-chr15.png]]
****** DONE Chromosome 15 Vérifier VAF avec checkBam.jl: ok
julia> @subset snv :chrom .== "NC_000015.10"
26×10 DataFrame
 Row │ chrom         pos       variant        variantType  zygosity      ref        alt        refCount  altCount  readsCount
     │ SubStrin…?    Int64     SubStrin…?     String?      String15      SubStrin…  SubStrin…  Int64     Int64     Int64
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ NC_000015.10  74343027  g.74343027C>T  snv          heterozygous  C          T                61        58         120
   2 │ NC_000015.10  75400778  g.75400778C>G  snv          heterozygous  C          G               108        79         187
   3 │ NC_000015.10  89327201  g.89327201C>T  snv          heterozygous  C          T               243       241         486
   4 │ NC_000015.10  48767448  g.48767448A>C  snv          heterozygous  A          C                72        70         142
   5 │ NC_000015.10  75411685  g.75411685T>C  snv          heterozygous  T          C                79        81         160
   6 │ NC_000015.10  66703292  g.66703292C>T  snv          heterozygous  C          T                70        60         130
   7 │ NC_000015.10  89325639  g.89325639G>A  snv          heterozygous  G          A               257       267         524
   8 │ NC_000015.10  89330184  g.89330184G>A  snv          heterozygous  G          A               258       287         548
   9 │ NC_000015.10  89330184  g.89330184G>A  snv          heterozygous  G          A               258       287         548
  10 │ NC_000015.10  89325639  g.89325639G>A  snv          heterozygous  G          A               257       267         524
  11 │ NC_000015.10  42401752  g.42401752G>A  snv          homozygous    G          A                61       212         273
  12 │ NC_000015.10  89327201  g.89327201C>T  snv          heterozygous  C          T               243       241         486
  13 │ NC_000015.10  38339896  g.38339896G>A  snv          heterozygous  G          A                56        86         144
  14 │ NC_000015.10  26869324  g.26869324A>T  snv          heterozygous  A          T                62        49         113
  15 │ NC_000015.10  66435145  g.66435145G>A  snv          heterozygous  G          A                98        95         193
  16 │ NC_000015.10  60514655  g.60514655G>A  snv          heterozygous  G          A                94        99         194
  17 │ NC_000015.10  42410947  g.42410947A>G  snv          heterozygous  A          G               153       123         276
  18 │ NC_000015.10  75430368  g.75430368C>T  snv          heterozygous  C          T                80        62         142
  19 │ NC_000015.10  25375494  g.25375494T>C  snv          heterozygous  T          C               103       104         207
  20 │ NC_000015.10  60497497  g.60497497C>A  snv          heterozygous  C          A                61        65         126
  21 │ NC_000015.10  74891539  g.74891539C>T  snv          heterozygous  C          T               118       124         242
  22 │ NC_000015.10  48488433  g.48488433A>G  snv          heterozygous  A          G               367       122         492
  23 │ NC_000015.10  89318565  g.89318565A>G  snv          heterozygous  A          G               303        98         404
  24 │ NC_000015.10  89323426  g.89323426C>G  snv          heterozygous  C          G                93       109         202
  25 │ NC_000015.10  89318595  g.89318595T>C  snv          heterozygous  T          C               321       128         453
  26 │ NC_000015.10  48488437  g.48488437T>C  snv          heterozygous  T          C               356       132         488
CLOSED: [2023-05-01 Mon 17:18]
***** KILL Chromosome1 15 :Test haplotype caller : échec car CIGARE non mis à jour
CLOSED: [2023-05-13 Sat 18:29] SCHEDULED: <2023-05-01 Mon>
#+begin_src
julia -Jbisonex.so --project=. insertVariants.jl `63003856_S135_chr15.bam` 63003856_S135_chr15_inserted.bam
scp 63003856_S135_chr15_inserted.bam* meso:/Work/Users/apraga/bisonex/tests/synthetic/
#+end_src
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/synthetic :results silent
ln -s /Work/Projects/bisonex/data/dbSNP/GRCh38.p13/dbSNP.gz .
ln -s /Work/Projects/bisonex/data/dbSNP/GRCh38.p13/dbSNP.gz.tbi
ln -s /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.dict .
ln -s /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna .
ln -s /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna.fai .
#+end_src
puis
#+begin_src
gatk --java-options "-Xmx3072M" HaplotypeCaller --input 63003856_S135_chr15_inserted.bam --output testchr15.vcf.gz --reference genomeRef.fna  --tmp-dir . -L NC_000015.10
#+end_src
scp meso:/Work/Users/apraga/bisonex/tests/synthetic/testchr15.vcf.gz haplotypecaller-chr15.vcf.gz
Aucun variant inséré
- base quality ok
  -
****** DONE bam out : non appelé
CLOSED: [2023-05-01 Mon 21:57]
gatk --java-options "-Xmx3072M" HaplotypeCaller --input 63003856_S135_chr15_inserted.bam     --output haplotypecaller-chr15.vcf.gz --reference genomeRef.f
na  --tmp-dir . -L NC_000015.10  --bam-output debug.bam
****** DONE --linked-de-bruijn-graph : idem
CLOSED: [2023-05-01 Mon 21:57]
readlink testchr15.vcf.gz -f^C
[apraga@mesointeractive synthetic]$ gatk --java-options "-Xmx3072M" HaplotypeCaller --input 63003856_S135_chr15_inserted.bam     --output haplotypecaller-chr15.vcf.gz --reference genomeRef.fna  --tmp-dir . -L NC_000015.10  --linked-de-bruijn-graph
****** KILL regénérer fastq
CLOSED: [2023-05-13 Sat 18:29]
Non
***** KILL Générer bam données pour tous les chromosomes
CLOSED: [2023-05-13 Sat 18:29]
 timeit julia -Jbisonex.so --project=. insertVariants.jl ~/code/bisonex/out/63003856/preprocessing/63003856_S135.bam 63003856_S135_inserted.bam
40min 516ms 835µs 405ns
Avertissement:
 [W::bam_hdr_read] EOF marker is absent. The input is probably truncated
Inserted.bam et excluded.bam (fichier avant le merge)  ont l'air ok...
On réessaie à la main : ça passe
#+begin_src
samtools merge test-all.bam inserted.bam excluded.bam
❯ mv test-all.bam `63003856_S135_inserted.bam` -f
❯ mv test-all.bam.bai `63003856_S135_chr15_inserted.bam.bai` -f
#+end_src
***** DONE BAm2fastq pour avoir CIGAR à jour : échec (variants "cachés")
CLOSED: [2023-05-04 Thu 20:30] SCHEDULED: <2023-05-01 Mon>
On lance la génération de bam depuis le mesocentro (la copie plante via le VPN)
#+begin_src sh
cd /Work/Users/apraga/recherche/bisonex/generate
julia --project=. insertVariants.jl  ../../../bisonex/out/63003856_S135/preprocessing/applybqsr/63003856_S135.bam 63003856_S135_inserted.bam
#+end_src
Workflow après avec désactivé storeDir pour SAMTOOLS_BAM2FQ dans nextflow.config (pourquoi ??)
#+begin_src nextflow
include { SAMTOOLS_BAM2FQ }                            from "${params.modulesDir}/samtools/bam2fq/main"
include { SAMTOOLS_SORT as sortBamByName }             from "${params.modulesDir}/samtools/sort/main"
workflow {
    f = Channel.fromPath("${params.dataDir}/synthetic/63003856_S135_inserted.bam",
                         checkIfExists: true).map{it -> [["id": "synthetic_63003856"], it]}
    // Important: use "-n" option !!
    sortBamByName(f)
    SAMTOOLS_BAM2FQ(sortBamByName.out.bam, true)
}
#+end_src
Puis
#+begin_src
cp work/34/fb2fc136f6f6d7f42d0960512f06de/*.fq.gz /Work/Groups/bisonex/data/synthetic/
#+end_src
***** KILL Lancer pipeline
CLOSED: [2023-05-04 Thu 20:30] SCHEDULED: <2023-05-01 Mon>
NXF_OPTS=-D"user.name=apraga" nextflow run   main.nf -c nextflow.config  -profile standard,helios -bg --input="/Work/Groups/bisonex/data/synthetic/synthetic_63003856_{1,2}.fq.gz" --outdir out/synthetic_63003856
*** TODO Bamsurgeon
**** DONE Package nix
CLOSED: [2023-05-13 Sat 22:03]
**** TODO Test sur mini-bam
❯ samtools view -h ~/code/bisonex/simuscop-centogene-200x/cento/preprocessing/mapped/cento.bam | head -n700 | samtools view -Sb - > mini.bam
❯ samtools index mini.bam
Dans snv.txt:
#+begin_quote
NC_000001.11	17651	17651
#+end_quote
./result/bin/addsnv -v snv.txt -f mini.bam -r ../data/genomeRef.fna -o test.bam
Il faut le jar de picard
❯ ../result/bin/addsnv -v snv.txt -f mini.bam -r ../data/genomeRef.fna -o test.bam --aligner mem --picardjar /nix/store/sjwrhvwwfr9gf7zzwk3kzihhckn5jpjw-picard-tools-2.27.4/libexec/picard/picard.jar
ne fonctionne pas mais le test de bamsurgeon oui
#+begin_src sh
head ~/code/bamsurgeon/test_data/random_snvs.txt  -n8 > random_snvs.txt
 ../result/bin/addsnv -v random_snv.txt -f ~/code/bamsurgeon/test_data/testregion_realign.bam -r ~/code/bamsurgeon/test_data/Homo_sapiens_chr22_assembly19.fasta -o test.bam --aligner mem --picardjar /nix/store/sjwrhvwwfr9gf7zzwk3kzihhckn5jpjw-picard-tools-2.27.4/libexec/picard/picard.jar
#+end_src
Il manque les dict ? On teste sur le mésocentre
#+begin_src
samtools view -h ../simuscop-centogene-200x/cento/preprocessing/mapped/cento.bam  | head -n1000 | samtools view -Sb - > mini.bam
[apraga@mesointeractive tests]$ samtools index mini.bam
 ln -s /Work/Groups/bisonex/data/genome/GRCh38.p13/bwa/* .
[apraga@mesointeractive tests]$ ls
#+end_src
Idem...
#+begin_src
addsnv -v snv.txt -f mini.bam -r genomeRef.fna -o test.bam --aligner mem --picardjar=/nix/store/sjwrhvwwfr9gf7zzwk3kzihhckn5jpjw-picard-tools-2.27.4/libexec/picard/picard.jar
#+end_src
On essaye sur un chromosome entier
#+begin_src
samtools view ../simuscop-centogene-50x/cento/preprocessing/mapped/cento.bam NC_000022.11 -b > chr22.bam
#+end_src
Idem
addsnv -v snv.txt -f chr22.bam -r genomeRef.fna -o test.bam --aligner mem --picardjar=/nix/store/sjwrhvwwfr9gf7zzwk3kzihhckn5jpjw-picard-tools-2.27.4/libexec/picard/picard.jar
Problème avec la référence : ne trouve pas les index
*** Divers
**** DONE Vérifier nombre de reads fastq - bam
CLOSED: [2022-10-09 Sun 22:31]
* DONE Plot : ashkenazim trio
CLOSED: [2023-04-18 Tue 21:28] SCHEDULED: <2023-04-16 Sun>
/Entered on/ [2023-04-16 Sun 17:29]

Replacement in notes/index.org at line 1 [8.368291]

B:BD[8.368291] → [5.142:207]

[[file:notes/20230511180745-bacteriologie.html][Bactériologie]]

[8.368291]

[[file:20230511180745-bacteriologie.html][Bactériologie]]

File addition: 20230514100352-haskell_nix.org (----------)

[14.29]

:PROPERTIES:
:ID:       39c6a7a1-39ea-45f9-a647-6119b3f56837
:END:
#+title: Haskell Nix
* Cabal2nix
Après avoir créé un fichier .cabal, mettre dans `default.nix`
#+begin_src sh
let
  pkgs = import <nixpkgs> { }; # pin the channel to ensure reproducibility!
in
pkgs.haskellPackages.developPackage {
  root = ./.;
}
#+end_src
L'exécutable est généré par `nix-build`.
Pour débugger avec ghci,
#+begin_src sh
nix-shell
$ ghci
ghci:> :l app/Main.hs
#+end_src

File addition: 20230514100330-r_nix.org (----------)

[14.29]

:PROPERTIES:
:ID:       ca9c0bc4-c72a-41b9-934c-5858ed11e1eb
:END:
#+title: R Nix
* R librairies
De même que R, il faut packager R avec les librairies
#+begin_src nix
with (import <nixpkgs> {});
let
  my-r-packages = rWrapper.override{packages = with rPackages; [
plotly
  ];};
in
my-r-packages
#+end_src

File addition: 20230514100219-python_nix.org (----------)

[14.29]

:PROPERTIES:
:ID:       f72cd4cf-c1ce-48ab-86df-50c0f2a850bd
:END:
#+title: Python Nix
#+filetags: nix
* Projet en Python
Instructions simples ici : https://nixos.wiki/wiki/Python#Package_and_development_shell_for_a_python_project
Il faut donc setup.py:
#+begin_src python
#!/usr/bin/env python
from setuptools import setup, find_packages
setup(name='demo-flask-vuejs-rest',
      version='1.0',
      # Modules to import from other scripts:
      packages=find_packages(),
      # Executables
      scripts=["web_interface.py"],
     )
#+end_src
Et 2 fichiers .nix, le premier pour les dépendances
#+begin_src sh
{ lib, python3Packages }:
with python3Packages;
buildPythonApplication {
  pname = "demo-flask-vuejs-rest";
  version = "1.0";
  propagatedBuildInputs = [ flask ];
  src = ./.;
}
#+end_src
et le défaut
#+begin_src
{ pkgs ? import <nixpkgs> {} }:
pkgs.callPackage ./derivation.nix {}
#+end_src
Il ne reste plus qu’à le construire
#+begin_src
nix-build
result/bin/lol.py
#+end_src
* Python libraries
On package l’exécutable python avec les libraries. Mettre dans =default.nix=
#+begin_src nix
with (import <nixpkgs> {});
let
  my-python-packages = python-packages: with python-packages; [
    pandas
    requests
    # other python packages you want
  ];
in
python3.withPackages my-python-packages
#+end_src
Puis
#+begin_src sh
nix-build default.nix
result/bin/python
>>> import pandas
#+end_src
Si on veut juste un shell, mettre dans =shell.nix=
#+begin_src sh
{ pkgs ? import <nixpkgs> {} }:
let
  my-python-packages = ps: with ps; [
    pandas
    pysam
    # other python packages
  ];
  my-python = pkgs.python3.withPackages my-python-packages;
in my-python.env
#+end_src

Replacement in notes/20230511181600-nix.org at line 210 [16.47]

B:BD[15.1288] → [15.1288:1308]

∅:D[15.1308] → [17.47:918]

B:BD[16.1769] → [17.47:918]

B:BD[18.1001] → [19.49:59]

B:BD[19.59] → [20.54:74]

∅:D[20.74] → [15.1349:1650]

B:BD[15.1349] → [15.1349:1650]

∅:D[15.1650] → [21.124:134]

B:BD[21.124] → [21.124:134]

B:BD[21.134] → [15.1651:1656]

∅:D[15.1656] → [21.160:175]

B:BD[21.160] → [21.160:175]

B:BD[21.175] → [15.1657:1715]

B:BD[15.1715] → [22.116245:116528]

∅:D[22.116528] → [20.75:301]

B:BD[15.1715] → [20.75:301]

∅:D[20.301] → [21.235:245]

∅:D[15.1715] → [21.235:245]

B:BD[21.235] → [21.235:245]

** Projet en Python
Instructions simples ici : https://nixos.wiki/wiki/Python#Package_and_development_shell_for_a_python_project
Il faut donc setup.py:
#+begin_src python
#!/usr/bin/env python
from setuptools import setup, find_packages
setup(name='demo-flask-vuejs-rest',
      version='1.0',
      # Modules to import from other scripts:
      packages=find_packages(),
      # Executables
      scripts=["web_interface.py"],
     )
#+end_src
Et 2 fichiers .nix, le premier pour les dépendances
#+begin_src sh
{ lib, python3Packages }:
with python3Packages;
buildPythonApplication {
  pname = "demo-flask-vuejs-rest";
  version = "1.0";
  propagatedBuildInputs = [ flask ];
  src = ./.;
}
#+end_src
et le défaut
#+begin_src
{ pkgs ? import <nixpkgs> {} }:
pkgs.callPackage ./derivation.nix {}
#+end_src
Il ne reste plus qu’à le construire
#+begin_src
nix-build
result/bin/lol.py
#+end_src
** Python libraries
On package l’exécutable python avec les libraries. Mettre dans =default.nix=
#+begin_src nix
with (import <nixpkgs> {});
let
  my-python-packages = python-packages: with python-packages; [
    pandas
    requests
    # other python packages you want
  ];
in
python3.withPackages my-python-packages
#+end_src
Puis
#+begin_src sh
nix-build default.nix
result/bin/python
>>> import pandas
#+end_src
Si on veut juste un shell, mettre dans =shell.nix=
#+begin_src sh
{ pkgs ? import <nixpkgs> {} }:
let
  my-python-packages = ps: with ps; [
    pandas
    pysam
    # other python packages
  ];
  my-python = pkgs.python3.withPackages my-python-packages;
in my-python.env
#+end_src
** R librairies
De même que R, il faut packager R avec les librairies
#+begin_src nix
with (import <nixpkgs> {});
let
  my-r-packages = rWrapper.override{packages = with rPackages; [
plotly
  ];};
in
my-r-packages
#+end_src

[15.1288]

[23.3885]

[[id:f72cd4cf-c1ce-48ab-86df-50c0f2a850bd][Python Nix]]
[[id:ca9c0bc4-c72a-41b9-934c-5858ed11e1eb][R Nix]]
[[id:39c6a7a1-39ea-45f9-a647-6119b3f56837][Haskell Nix]]

Deletion in notes/20230511181600-nix.org at line 234 [16.47]
B:BD[23.4703] → [23.4703:4705]
B:BD[23.4705] → [24.65:86]
B:BD[24.86] → [25.52:132]
∅:D[25.132] → [24.205:209]
B:BD[24.205] → [24.205:209]
B:BD[24.209] → [25.133:209]
∅:D[25.209] → [24.241:244]
B:BD[24.241] → [24.241:244]
B:BD[24.244] → [25.210:262]
```
}
#+end_src
** Haskell
Après avoir créé un fichier .cabal, mettre dans `default.nix`
#+begin_src sh
let
  pkgs = import <nixpkgs> { }; # pin the channel to ensure reproducibility!
in
pkgs.haskellPackages.developPackage {
  root = ./.;
```
Deletion in notes/20230511181600-nix.org at line 236 [16.47]
B:BD[23.4715] → [25.265:337]
∅:D[25.337] → [24.324:339]
B:BD[24.324] → [24.324:339]
B:BD[24.339] → [25.338:377]
∅:D[25.377] → [24.379:389]
B:BD[24.379] → [24.379:389]
```
L'exécutable est généré par `nix-build`.
Pour débugger avec ghci,
#+begin_src sh
nix-shell
$ ghci
ghci:> :l app/Main.hs
#+end_src
```
Deletion in notes/20230511180745-bacteriologie.org at line 4 [26.3565]
∅:D[5.325] → [26.3687:3718]
B:BD[26.3687] → [26.3687:3718]
```
---
title: Bactériologie
...
```
Insertion in notes/20230511173024-nushell.org at line 14 [2.60]
[2.394]
```
* Limite
Problème avec nix sur les chemin -> zsh plutôt
```

Insertion in notes/20230511171011-gentoo.org at line 60 [28.75]

[27.880]

[3.76]


* TODO Désinstaller un package qui n'est pas dans @world
emerge -vcp $PACKAGE
et on descend dans les dépendences jusqu'à trouver celle qui pèche

Insertion in notes/20230511170707-gerer_sa_bibliographie.org at line 29 [29.142]
[29.810]
[29.810]
```
*** Emacs
Set citar-library-paths '("~/papers/bisonex")
```
Insertion in notes/20230511170707-gerer_sa_bibliographie.org at line 53 [29.142]
[29.1254]
[29.1254]
```
*** Open a pdf
Open files directly with citar-open-files if there are named as $key.pdf
```
Insertion in notes/20230511170707-gerer_sa_bibliographie.org at line 56 [29.142]
[29.1279]
[29.1279]
```
** KILL V1
CLOSED: [2023-05-13 Sat 15:42]
```

Insertion in notes/20230511170707-gerer_sa_bibliographie.org at line 60 [29.142]

[29.1512]

** V2: org-roam + citar
With SPC m @, you can
- either create a org-roam note for a paper
- open the pdf if it is store in citar-library-paths and is named according to the bibtex key

Updating notes

Dependencies

In channels

Change contents

Insertion in projects.org at line 2242 [8.123895]

Replacement in projects/bisonex.org at line 7 [10.35]

Replacement in projects/bisonex.org at line 42 [10.35]

Replacement in notes/index.org at line 1 [8.368291]

File addition: 20230514100352-haskell_nix.org (----------)

File addition: 20230514100330-r_nix.org (----------)

File addition: 20230514100219-python_nix.org (----------)

Replacement in notes/20230511181600-nix.org at line 210 [16.47]

Deletion in notes/20230511181600-nix.org at line 234 [16.47]

Deletion in notes/20230511181600-nix.org at line 236 [16.47]

Deletion in notes/20230511180745-bacteriologie.org at line 4 [26.3565]

Insertion in notes/20230511173024-nushell.org at line 14 [2.60]

Insertion in notes/20230511171011-gentoo.org at line 60 [28.75]

Insertion in notes/20230511170707-gerer_sa_bibliographie.org at line 29 [29.142]

Insertion in notes/20230511170707-gerer_sa_bibliographie.org at line 53 [29.142]

Insertion in notes/20230511170707-gerer_sa_bibliographie.org at line 56 [29.142]

Insertion in notes/20230511170707-gerer_sa_bibliographie.org at line 60 [29.142]