* <2023-02-18 Sat> Workout
** Muscle-up
Progress!
- 4x1 + 4 neg
- 2-1-1-1 + 4 neg
- 2-1-1-1 + 4 neg
** Norwegian curl
- 4
- 4
- 4
** Compression
- 1x9
- 1x9
- 1x9
** Push-up
- 5-3-2
- 5-3-2
- 5-3-2
** Pull-up
- 3-2
- 3-2
- 3-2
** Extension
- 22
- 22
- 22
# -*- mode: org -*-
Archived entries from file /home/alex/org/projects/bisonex.org
* DONE Check data quality on the mesocentre
CLOSED: [2023-01-12 Thu 15:48]
:PROPERTIES:
:ARCHIVE_TIME: 2023-01-12 Thu 15:48
:ARCHIVE_FILE: ~/org/projects/bisonex.org
:ARCHIVE_OLPATH: Données
:ARCHIVE_CATEGORY: bisonex
:ARCHIVE_TODO: DONE
:END:
** DONE BAM
CLOSED: [2023-01-12 Thu 15:47]
picard ValidateSamFile
We only look at the exit code (0 = no error).
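A minimal sketch of this check for several files at once, assuming a picard wrapper script on the PATH (as provided by the Nix or Bioconda packages) that accepts the legacy I=/MODE= argument syntax. The helper names validate_one and check_bams are hypothetical.

#+begin_src sh
#!/usr/bin/env bash
# Validate one BAM: picard ValidateSamFile exits with 0 when no error is found.
# Assumption: a "picard" wrapper script is on the PATH.
validate_one() {
  picard ValidateSamFile I="$1" MODE=SUMMARY
}

# Validate each BAM given as argument, print OK/FAIL per file and
# return 1 if at least one file failed, 0 otherwise.
check_bams() {
  local bam failed=0
  for bam in "$@"; do
    if validate_one "$bam" > /dev/null 2>&1; then
      echo "OK   $bam"
    else
      echo "FAIL $bam"
      failed=1
    fi
  done
  return "$failed"
}
#+end_src

For instance, check_bams *.bam prints one OK or FAIL line per file and returns a non-zero status if any file failed.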
** DONE Fastq
CLOSED: [2023-01-12 Thu 15:48]
fastqc
The zip archives must then be extracted and searched for errors.
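The archives do not need to be fully extracted: each FastQC archive contains a summary.txt with one tab-separated line per module (PASS, WARN or FAIL, module name, file name), which unzip -p can stream directly. A minimal sketch, assuming the usual <name>_fastqc/summary.txt layout inside <name>_fastqc.zip; the function names are hypothetical and only FAIL lines are reported (WARN could be handled the same way).

#+begin_src sh
#!/usr/bin/env bash
# Print the names of the FastQC modules marked FAIL in a summary.txt
# read on stdin (tab-separated lines: status, module, file name).
fastqc_failures() {
  awk -F'\t' '$1 == "FAIL" { print $2 }'
}

# For each FastQC archive, stream <name>_fastqc/summary.txt out of the zip
# (no extraction on disk) and prefix each failing module with the sample name.
report_fastqc() {
  local zip name
  for zip in "$@"; do
    name=$(basename "$zip" .zip)
    unzip -p "$zip" "$name/summary.txt" | fastqc_failures | sed "s|^|$name: |"
  done
}
#+end_src

Usage: report_fastqc *_fastqc.zip prints nothing when no module failed.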
* KILL Implement other pipelines
CLOSED: [2023-01-15 Sun 12:03]
:PROPERTIES:
:ARCHIVE_TIME: 2023-01-15 Sun 12:03
:ARCHIVE_FILE: ~/org/projects/bisonex.org
:ARCHIVE_CATEGORY: bisonex
:ARCHIVE_TODO: KILL
:END:
See https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04407-x
** KILL GATK
CLOSED: [2022-11-11 Fri 20:01]
https://broadinstitute.github.io/warp/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README
In principle, it follows best practices.
** KILL Try Snakemake with best practices
https://github.com/snakemake-workflows/dna-seq-gatk-variant-calling/blob/main/.github/workflows/main.yml
Install Mamba (micromamba does not work under Nix).
Does not work under WSL2... MultiQC is not recent enough.
Version problems...
** KILL Sarek
CLOSED: [2022-12-11 Sun 11:09]
*** Dependencies
**** Nix
#+begin_src sh
nix profile install nixpkgs#mosdepth nixpkgs#python3
nix-shell -p python310Packages.pyyaml --run "nextflow run nf-core/sarek -profile test --executor slurm --queue smp --outdir test -resume"
#+end_src
***** KILL Nix derivation for the full profile
CLOSED: [2022-12-11 Sun 11:09]
**** KILL Without Nix
CLOSED: [2022-09-24 Sat 10:20]
We use conda.
#+begin_src sh
module unload nix
module load anaconda3@2021.05/gcc-12.1.0
module load nextflow@22.04.0/gcc-12.1.0
module load openjdk@11.0.14.1_1/gcc-12.1.0
nextflow run nf-core/sarek -profile conda,test --executor slurm --queue smp --outdir test -resume
#+end_src
Attempt 1: permission errors, fixed by rerunning the program.
#+begin_quote
Failed to create Conda environment
command: conda create --mkdir --yes --quiet --prefix /Work/Users/apraga/test-sarek/work/conda/env-2d53b1db50de676670cf1a91ef0cf6db bioconda::tabix=1.11
status : 1
message:
NotWritableError: The current user does not have write permissions to a required path.
path: /Home/Users/apraga/.conda/pkgs/urls.txt
uid: 1696
gid: 513
If you feel that permissions on this path are set incorrectly, you can manually
change them by executing
$ sudo chown 1696:513 /Home/Users/apraga/.conda/pkgs/urls.txt
#+end_quote
Fixed with
#+begin_src sh
chown 1696:513 /Home/Users/apraga/.conda/pkgs/urls.txt
#+end_src
But then a proxy problem.
*** KILL Nix derivation for Python modules
CLOSED: [2022-12-11 Sun 11:09]
*** KILL Run Sarek in test mode
CLOSED: [2022-12-11 Sun 11:09]
#+begin_src sh
nix-shell -p python310Packages.pyyaml --run "nextflow run nf-core/sarek -profile test --executor slurm --queue smp --outdir test -resume"
#+end_src
*** KILL Run Sarek on reduced data
CLOSED: [2022-12-11 Sun 11:09]
* DONE Attempt 1
CLOSED: [2023-02-13 Mon 11:50]
:PROPERTIES:
:ARCHIVE_TIME: 2023-02-14 Tue 10:42
:ARCHIVE_FILE: ~/org/projects/bisonex.org
:ARCHIVE_OLPATH: Nouveau workflow/Dépendences avec Nix/hap.py/Faire fonctionner Tests
:ARCHIVE_CATEGORY: bisonex
:ARCHIVE_TODO: DONE
:END:
Problem with the Python path for pysam: Tools/__init__.py fails, but build/bin/hap.py can be used instead.
#+begin_src sh
nix develop .#hap-py
# then, inside the development shell:
genericBuild
#+end_src
So the tests are run by hand (too many path errors).
#+begin_src sh
# OK
HCDIR=build/bin build/bin/test_haplotypes
# OK
bash src/sh/make_hg19.sh
HCDIR=build/bin HG19=hg19.fa bash src/sh/run_multimerge_test.sh
#+end_src
Fails on:
$ HCDIR=build/bin bash src/sh/run_hapenum_test.sh
Traceback (most recent call last):
File "build/bin/hap.py", line 26, in <module>
import pysam
File "/nix/store/3w2v5cl4x6ddq4281awcab9412r5gkaw-python3-3.10.9-env/lib/python3.10/site-packages/pysam/__init__.py", line 4, in <module>
from pysam.libchtslib import *
ImportError: No module named libchtslib
detect var has to be commented out.
****** vcfeval: failure
#+begin_src sh
cd /Work/Groups/bisonex/data/NA12878/precisionChallenge
curl -O https://data.nist.gov/od/ds/ark:/88434/mds2-2336/submission_vcfs/0GOOR/0GOOR_HG002.vcf.gz
bcftools annotate --rename-chrs /Work/Groups/bisonex/data/genome/GRCh38.p13/chromosome_mapping.txt 0GOOR_HG002.vcf.gz -Oz -o 0GOOR_HG002_renamed.vcf.gz
tabix 0GOOR_HG002_renamed.vcf.gz
#+end_src
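Since a wrong mapping turned out to be one of the causes of failure, a quick check before renaming can list the chromosome names of the VCF that the mapping file does not cover. A sketch, assuming the mapping file holds one "old_name new_name" pair per line, as expected by bcftools --rename-chrs; unmapped_chroms is a hypothetical helper.

#+begin_src sh
#!/usr/bin/env bash
# Print, once each, the chromosome names found in the body of a VCF read on
# stdin that have no entry in the first column of a rename mapping file.
# Assumption: the mapping file has one "old_name new_name" pair per line.
unmapped_chroms() {
  awk 'NR == FNR { known[$1] = 1; next }   # first file: the mapping
       /^#/ { next }                         # skip VCF header lines
       !($1 in known) && !($1 in seen) { print $1; seen[$1] = 1 }' \
    "$1" -
}
#+end_src

Usage: zcat 0GOOR_HG002.vcf.gz | unmapped_chroms /Work/Groups/bisonex/data/genome/GRCh38.p13/chromosome_mapping.txt prints nothing when every chromosome has an entry.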
Submission
#+begin_src slurm
#!/bin/bash
#SBATCH -c 4
#SBATCH -p smp
#SBATCH --time=01:00:00
#SBATCH --mem=32G
module load nix/2.11.0
export HGREF=/Work/Groups/bisonex/data-alexis-reference/genome/GRCh38_latest_genomic.fna
dir=/Work/Groups/bisonex/data/NA12878/
rtg vcfeval -b $dir/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz \
--bed-regions ${dir}/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.bed \
-c ${dir}/precisionChallenge/0GOOR_HG002_renamed.vcf.gz \
-o test-0GOOR -t /Work/Groups/bisonex/data/genome/GRCh38.p13/genomeRef.sdf
#rtg vcfeval -b $dir/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz --bed-regions ${dir}/benchmark-exons.bed -c files/vcf/NA12878_NIST7035_vep_annot.vcf.gz -o test-rtg -t /Work/Groups/bisonex/data/genome/GRCh38.p13/genomeRef.sdf
#+end_src
Result:
VCF header does not contain a FORMAT field named GQ
There were 112 problematic called variants skipped during loading (see vcfeval.log for details).
There were 2303523 variants not thresholded in ROC data files due to missing or invalid GQ (FORMAT) values.
Could not select maximized F-measure threshold from ROC data, only un-thresholded statistics will be shown. Consider selecting a different scoring attribute with --vcf-score-field
| Threshold | True-pos-baseline | True-pos-call | False-pos | False-neg | Precision | Sensitivity | F-measure |
|-----------+-------------------+---------------+-----------+-----------+-----------+-------------+-----------|
| None      |            938101 |        934445 |   1369078 |   1276425 |    0.4057 |      0.4236 |    0.4144 |
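The summary row can be checked by hand: vcfeval computes precision from the call-side true positives and sensitivity from the baseline-side true positives. The awk sketch below (vcfeval_metrics is a hypothetical helper) reproduces the row above from its four counts.

#+begin_src sh
#!/usr/bin/env bash
# Recompute vcfeval's un-thresholded summary from its raw counts:
#   precision   = TP_call / (TP_call + FP)
#   sensitivity = TP_baseline / (TP_baseline + FN)
#   F-measure   = 2 * precision * sensitivity / (precision + sensitivity)
# Arguments: TP_baseline TP_call FP FN (same order as the vcfeval columns).
vcfeval_metrics() {
  awk -v tpb="$1" -v tpc="$2" -v fp="$3" -v fn="$4" 'BEGIN {
    p = tpc / (tpc + fp)
    s = tpb / (tpb + fn)
    f = 2 * p * s / (p + s)
    printf "%.4f %.4f %.4f\n", p, s, f
  }'
}

vcfeval_metrics 938101 934445 1369078 1276425   # prints 0.4057 0.4236 0.4144
#+end_src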
****** DONE Without vcfeval (as used for the challenge): failure
CLOSED: [2023-02-09 Thu 21:55]
#+begin_src slurm
#!/bin/bash
#SBATCH -c 4
#SBATCH -p smp
#SBATCH --time=01:00:00
#SBATCH --mem=32G
module load nix/2.11.0
export HGREF=/Work/Groups/bisonex/data-alexis-reference/genome/GRCh38_latest_genomic.fna
dir=/Work/Groups/bisonex/data/NA12878/
hap.py $dir/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz \
${dir}/precisionChallenge/0GOOR_HG002_renamed.vcf.gz \
-f ${dir}/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.bed \
-o test
#+end_src
The problem came from 1. the DNA sample and 2. the chromosome renaming, which was wrong.
****** DONE HG002
CLOSED: [2023-02-17 Fri 19:31]
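| Type  | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score |
|-------+--------+-------------+----------+----------+-------------+----------+-----------+-------+-------+---------------+------------------+----------------+-----------------|
| INDEL | ALL    |      276768 |   102437 |   174331 |     1156702 |   211537 |    842303 | 53768 | 58006 |      0.370119 |         0.327170 |       0.728194 |        0.347322 |
| INDEL | PASS   |      276768 |   102437 |   174331 |     1156702 |   211537 |    842303 | 53768 | 58006 |      0.370119 |         0.327170 |       0.728194 |        0.347322 |
| SNP   | ALL    |     1937706 |   835470 |  1102236 |     5666020 |  1160590 |   3669981 | 437793 | 21058 |     0.431164 |         0.418553 |       0.647718 |        0.424765 |
| SNP   | PASS   |     1937706 |   835470 |  1102236 |     5666020 |  1160590 |   3669981 | 437793 | 21058 |     0.431164 |         0.418553 |       0.647718 |        0.424765 |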
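These metrics can be recomputed from the count columns. hap.py does not print QUERY.TP in this table, so the sketch below assumes QUERY.TP = QUERY.TOTAL - QUERY.FP - QUERY.UNK; with that assumption the precision of the rows above is reproduced (e.g. 102862 / (102862 + 211537) = 0.327170 for INDEL). hap_metrics is a hypothetical helper.

#+begin_src sh
#!/usr/bin/env bash
# Recompute hap.py summary metrics from the count columns:
#   QUERY.TP  = QUERY.TOTAL - QUERY.FP - QUERY.UNK   (assumption)
#   Recall    = TRUTH.TP / (TRUTH.TP + TRUTH.FN)
#   Precision = QUERY.TP / (QUERY.TP + QUERY.FP)
#   Frac_NA   = QUERY.UNK / QUERY.TOTAL
#   F1        = 2 * Recall * Precision / (Recall + Precision)
# Arguments: TRUTH.TP TRUTH.FN QUERY.TOTAL QUERY.FP QUERY.UNK
hap_metrics() {
  awk -v ttp="$1" -v tfn="$2" -v qtot="$3" -v qfp="$4" -v qunk="$5" 'BEGIN {
    qtp = qtot - qfp - qunk
    r = ttp / (ttp + tfn)
    p = qtp / (qtp + qfp)
    printf "%.6f %.6f %.6f %.6f\n", r, p, qunk / qtot, 2 * r * p / (r + p)
  }'
}

# INDEL counts of the table above
hap_metrics 102437 174331 1156702 211537 842303
#+end_src

With the INDEL counts it prints recall, precision, Frac_NA and F1 (0.370119 0.327170 0.728194 0.347322), matching the first row.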
Expected results: https://data.nist.gov/od/ds/ark:/88434/mds2-2336/benchmarking_results/0GOOR/0GOOR_HG002.extended.csv
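| Type  | Filter | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score |
|-------+--------+---------------+------------------+----------------+-----------------|
| INDEL | ALL    |      0.935776 |         0.903997 |       0.523369 |        0.919612 |
| INDEL | PASS   |      0.935776 |         0.903997 |       0.523369 |        0.919612 |
| SNP   | ALL    |      0.998554 |          0.99395 |       0.403126 |        0.996247 |
| SNP   | PASS   |      0.998554 |          0.99395 |       0.403126 |        0.996247 |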
****** DONE With original chr names + reference genome: same
CLOSED: [2023-02-09 Thu 21:55]
#+begin_src sh
# match() with a third (array) argument is a GNU awk extension, hence gawk
gawk '{
if($0 !~ /^#/)
print "chr"$0;
else if(match($0,/(##contig=<ID=)(.*)/,m))
print m[1]"chr"m[2];
else print $0
}' 0GOOR_HG002.vcf | gzip -c > 0GOOR_HG002_chr.vcf.gz
#+end_src
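The 3-argument form of match() only exists in GNU awk. A sketch of the same transformation for any POSIX awk, using sub() on the ##contig lines; add_chr_prefix is a hypothetical name and, like the command above, it leaves the chromosome names themselves unchanged apart from the prefix.

#+begin_src sh
#!/usr/bin/env bash
# Prefix "chr" to the CHROM column and to the ID of the ##contig header
# lines of a VCF read on stdin; other header lines are copied unchanged.
add_chr_prefix() {
  awk '/^##contig=<ID=/ { sub(/^##contig=<ID=/, "##contig=<ID=chr"); print; next }
       /^#/             { print; next }
       { print "chr" $0 }'
}
#+end_src

Usage: gzip -dc 0GOOR_HG002.vcf.gz | add_chr_prefix | gzip -c > 0GOOR_HG002_chr.vcf.gz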
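#+begin_src sh
# curl -O saves into the current directory (an extra path argument would be read as another URL)
cd /Work/Groups/bisonex/data/genome/GRCh38/
curl -O https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/references/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta.gz
#+end_src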
Segmentation fault... The FASTA has to be decompressed first.
#+begin_src sh
cd /Work/Groups/bisonex/data/genome/GRCh38/
gunzip GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta.gz
samtools faidx GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta
#+end_src
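samtools faidx can index a FASTA that is uncompressed or bgzip-compressed (BGZF), but not one compressed with plain gzip. Before decompressing, the header of a .gz file can be inspected to tell the two apart: BGZF is a gzip stream whose header carries the FEXTRA flag and a "BC" extra subfield. A minimal sketch (is_bgzf is a hypothetical helper); whether hap.py itself accepts a BGZF reference was not tested in these notes.

#+begin_src sh
#!/usr/bin/env bash
# Return 0 if the file starts with a BGZF header: gzip magic (1f 8b),
# deflate (08), FEXTRA flag (04) and the "BC" subfield at bytes 12-13.
# A plain gzip file (or any other file) returns 1.
is_bgzf() {
  local hex
  hex=$(od -An -tx1 -N14 "$1" | tr -d ' \n')
  [[ $hex == 1f8b0804* && ${hex:24:4} == 4243 ]]
}
#+end_src

Usage: is_bgzf GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta.gz || echo "plain gzip: decompress it (or recompress it with bgzip)"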
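#+begin_src slurm
#!/bin/bash
#SBATCH -c 4
#SBATCH -p smp
#SBATCH --time=01:00:00
#SBATCH --mem=32G
module load nix/2.11.0
dir=/Work/Groups/bisonex/data/NA12878/
truth=${dir}/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
# VCF to evaluate (the query, despite the variable name)
#ref=script/files/vcf/NA12878_NIST7035.vcf
ref=${dir}/precisionChallenge/0GOOR_HG002_chr.vcf.gz
bed=${dir}/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.bed
fasta=/Work/Groups/bisonex/data/genome/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta
hap.py "$truth" "$ref" -f "$bed" -r "$fasta" -o test
#+end_src
| Type  | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score |
|-------+--------+-------------+----------+----------+-------------+----------+-----------+-------+-------+---------------+------------------+----------------+-----------------|
| INDEL | ALL    |      525466 |   491355 |    34111 |     1156702 |    57724 |    605307 |  9384 | 25027 |      0.935084 |         0.895313 |       0.523304 |        0.914766 |
| INDEL | PASS   |      525466 |   491355 |    34111 |     1156702 |    57724 |    605307 |  9384 | 25027 |      0.935084 |         0.895313 |       0.523304 |        0.914766 |
| SNP   | ALL    |     3365115 |  3358399 |     6716 |     5666020 |    21995 |   2284364 |  4194 |  1125 |      0.998004 |         0.993496 |       0.403169 |        0.995745 |
| SNP   | PASS   |     3365115 |  3358399 |     6716 |     5666020 |    21995 |   2284364 |  4194 |  1125 |      0.998004 |         0.993496 |       0.403169 |        0.995745 |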
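Same results:
| Type  | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision |
|-------+--------+-------------+----------+----------+-------------+----------+-----------+-------+-------+---------------+------------------|
| INDEL | ALL    |      467684 |   174199 |   293485 |     1156702 |   358998 |    622785 | 90229 | 99609 |      0.372472 |         0.327615 |
| INDEL | PASS   |      467684 |   174199 |   293485 |     1156702 |   358998 |    622785 | 90229 | 99609 |      0.372472 |         0.327615 |
| SNP   | ALL    |     3254352 |  1410388 |  1843964 |     5666020 |  1953278 |   2302401 | 735823 | 36239 |     0.433385 |         0.419293 |
| SNP   | PASS   |     3254352 |  1410388 |  1843964 |     5666020 |  1953278 |   2302401 | 735823 | 36239 |     0.433385 |         0.419293 |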
****** Check FNs
***** TODO With python2
****** TODO With Nix
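| TRUTH.TOTAL.TiTv_ratio | QUERY.TOTAL.TiTv_ratio | TRUTH.TOTAL.het_hom_ratio | QUERY.TOTAL.het_hom_ratio |
|------------------------+------------------------+---------------------------+---------------------------|
| NaN                    | NaN                    |                  1.528276 |                  2.752637 |
| NaN                    | NaN                    |                  1.528276 |                  2.752637 |
| 2.100129               | 1.473519               |                  1.581196 |                  1.795603 |
| 2.100129               | 1.473519               |                  1.581196 |                  1.795603 |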
***** KILL With python2
CLOSED: [2023-02-17 Fri 19:25]
****** KILL With Nix
CLOSED: [2023-02-17 Fri 19:25]
**** Nextflow
***** DONE Download NA12878 (HG001)
CLOSED: [2023-02-17 Fri 19:29]
***** DONE Download the Ashkenazi trio HG002
CLOSED: [2023-02-17 Fri 19:29]
***** DONE Rename the chromosomes
CLOSED: [2023-02-17 Fri 19:30]
***** TODO NCBI reference genome
Downloaded, but the index is not stored in /Work. TODO
**** Notes
**** TODO Far too many false negatives
***** DONE Test 1: vep annot: far too many false negatives
***** KILL Far too many false negatives
CLOSED: [2023-02-17 Fri 19:37]
****** DONE Test 1: vep annot: far too many false negatives
******* TODO Tests after fixing the chromosome-name bug: precision ~OK, recall very poor -> too many FNs?
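| Type  | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score | TRUTH.TOTAL.TiTv_ratio | QUERY.TOTAL.TiTv_ratio | TRUTH.TOTAL.het_hom_ratio | QUERY.TOTAL.het_hom_ratio |
|-------+--------+-------------+----------+----------+-------------+----------+-----------+-------+-------+---------------+------------------+----------------+-----------------+------------------------+------------------------+---------------------------+---------------------------|
| INDEL | ALL    |        7230 |      321 |     6909 |        1500 |      290 |       888 |    27 |    18 |      0.044398 |         0.526144 |       0.592000 |        0.081887 | NaN                    | NaN                    |                   1.54733 |                  6.129187 |
| INDEL | PASS   |        7230 |      321 |     6909 |        1500 |      290 |       888 |    27 |    18 |      0.044398 |         0.526144 |       0.592000 |        0.081887 | NaN                    | NaN                    |                   1.54733 |                  6.129187 |
| SNP   | ALL    |       59052 |     1653 |    57399 |        3338 |      101 |      1583 |    12 |     2 |      0.027992 |         0.942450 |       0.474236 |        0.054370 | 2.433271               | 1.861183               |                   1.57523 |                  2.703663 |
| SNP   | PASS   |       59052 |     1653 |    57399 |        3338 |      101 |      1583 |    12 |     2 |      0.027992 |         0.942450 |       0.474236 |        0.054370 | 2.433271               | 1.861183               |                   1.57523 |                  2.703663 |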
******* Check exons
We have the union of the exons of all transcripts...
An Illumina .bed would be needed.
We test the Twist for Illumina Exome 2.0 Plus BED File (hg19) from https://support.illumina.com/downloads/nextera-flex-for-enrichment-BED-files.html
Conversion to hg38 with UCSC
Chromosome renaming
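#+begin_src sh
# Turn each "name accession" line of the mapping into a sed command
# s/chrNAME\t/ACCESSION\t/ (chrMT becomes chrM, the UCSC name)
sed 's:^:s/chr:;s:chrMT:chrM:;s:\s:\\t/:;s:$:\\t/:' ../../genome/GRCh38.p13/chromosome_mapping.txt > pattern.sed
# Rename the chromosomes of the BED in place (backup kept as illumina_exons.bed.bak)
sed -i.bak -f pattern.sed illumina_exons.bed
# Restrict the benchmark regions to the Illumina exons
bedtools intersect -a HG001_GRCh38_1_22_v4.2.1_benchmark.bed -b illumina_exons.bed > HG001_GRCh38_1_22_v4.2.1_benchmark_illumina_exons.bed
#+end_src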
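The generated sed script can also be replaced by a single awk pass over the mapping file and the BED. A sketch, assuming the same mapping layout as the sed command above (chromosome name in the first column, new name in the second) and a tab-separated BED; rename_bed_chroms is hypothetical.

#+begin_src sh
#!/usr/bin/env bash
# Rename the first column of a BED: "chr" + first column of the mapping
# (MT mapped to chrM) is replaced by the second column of the mapping.
# Unmapped chromosomes and other lines are printed unchanged.
rename_bed_chroms() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == FNR {                               # first file: the mapping
         if (split($0, f, /[ \t]+/) < 2) next
         key = (f[1] == "MT") ? "chrM" : ("chr" f[1])
         map[key] = f[2]
         next
       }
       ($1 in map) { $1 = map[$1] }
       { print }' "$1" "$2"
}
#+end_src

Usage: rename_bed_chroms ../../genome/GRCh38.p13/chromosome_mapping.txt illumina_exons.bed > illumina_exons.renamed.bed (the output name is only an example).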
Intersection
******* Wrong parameter: -f instead of -R!!
Overlaps must be removed (sorting first)
bedtools sort -i illumina_exons.bed > illumina_exons.bed.sorted
bedtools merge -i illumina_exons.bed.sorted > HG001_GRCh38_1_22_v4.2.1_benchmark_illumina_exons.be
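To see whether a sorted BED still contains overlaps (before or after bedtools merge), a short awk check can print every interval that starts before the furthest end seen so far on the same chromosome. A sketch (bed_overlaps is a hypothetical helper); note that bedtools merge also joins book-ended intervals by default, which this check does not report.

#+begin_src sh
#!/usr/bin/env bash
# Print the lines of a BED sorted by chromosome then start (e.g. with
# bedtools sort) whose interval overlaps an earlier one on the same
# chromosome. BED intervals are half-open, so start == end is no overlap.
bed_overlaps() {
  awk 'BEGIN { FS = "\t" }
       /^(track|browser|#)/ { next }
       $1 == chrom && $2 + 0 < max_end { print }
       $1 != chrom { chrom = $1; max_end = $3 + 0; next }
       $3 + 0 > max_end { max_end = $3 + 0 }' "$1"
}
#+end_src

Usage: bed_overlaps illumina_exons.bed.sorted | head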
**** Version with Python 2, without Nix
nix profile install nixpkgs#autoconf
nix profile install nixpkgs#cmake
Find Boost with ls /nix/store/*boost*
python2 install.py install-hap.py --boost-root /nix/store/4djr55cmnfahbg6i6wr8p3ga66gs09gi-boost-1.79.0
To use the Boost from Nix, CMakeLists.txt has to be patched to use the non-static libraries:
#+begin_src diff
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 307a3b9..806d064 100755
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -32,16 +32,16 @@ if(NOT "${EXTERNAL_SUCCESS}" STREQUAL "0")
message(FATAL_ERROR "Building external dependencies has failed")
endif()
-set(Boost_USE_STATIC_LIBS ON) # only find static libs
+#set(Boost_USE_STATIC_LIBS ON) # only find static libs
set(Boost_USE_MULTITHREADED ON)
-set(Boost_USE_STATIC_RUNTIME ON)
+#set(Boost_USE_STATIC_RUNTIME ON)
# un-break library finding
-set(Boost_NO_BOOST_CMAKE ON)
-set(Boost_NO_SYSTEM_PATHS ON)
+#set(Boost_NO_BOOST_CMAKE ON)
+#set(Boost_NO_SYSTEM_PATHS ON)
-set(BOOST_ROOT ${CMAKE_BINARY_DIR})
-message("Using our own Boost, which was built at ${HAPLOTYPES_SOURCE_DIR}/external/boost_install")
+#set(BOOST_ROOT ${CMAKE_BINARY_DIR})
+#message("Using our own Boost, which was built at ${HAPLOTYPES_SOURCE_DIR}/external/boost_install")
find_package(Boost 1.55.0 COMPONENTS thread iostreams regex unit_test_framework filesystem system program_options REQUIRED)
include_directories(${Boost_INCLUDE_DIRS})
#+end_src
Compilation fails.
We try the latest Boost version:
#+begin_src sh
cd external
wget https://boostorg.jfrog.io/artifactory/main/release/1.81.0/source/boost_1_81_0.tar.bz2
# back to the repository root, where the path below is relative to
cd ..
sed -i.bak 's/boost_subset_1_58_0/boost_1_81_0/' external/make_dependencies.sh
#+end_src
We try with a static Boost in Nix:
nix-shell -E "with import <nixpkgs> {}; pkgs.boost.override{ enableStatic = true;}"