versus
$ samtools view -c /Work/Groups/bisonex/ref_63003856_S135/63003856_S135_cleaned.bam
128077207
Look at the detailed stats (of the cleaned version)
#+end_src
***** KILL Nix derivation for the full profile
CLOSED: [2022-12-11 Sun 11:09]
**** KILL Without nix
CLOSED: [2022-09-24 Sat 10:20]
We use conda
#+begin_src sh
module unload nix
module load anaconda3@2021.05/gcc-12.1.0
module load nextflow@22.04.0/gcc-12.1.0
module load openjdk@11.0.14.1_1/gcc-12.1.0
nextflow run nf-core/sarek -profile conda,test --executor slurm --queue smp --outdir test -resume
#+end_src
** Miscellaneous
*** DONE Check the number of reads, fastq vs. bam
CLOSED: [2022-10-09 Sun 22:31]
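A minimal sketch of this check, assuming 4-line FASTQ records (plain or gzipped). The function name and the file names in the usage comment are hypothetical. Note that samtools view -c counts every alignment record, so secondary and supplementary alignments have to be excluded (-F 0x900) before comparing with the FASTQ count.
#+begin_src sh
# Count reads in one or more FASTQ files (plain or gzipped).
# Assumes strict 4-line records (no wrapped sequences).
count_fastq_reads() {
    lines=$(
        for f in "$@"; do
            case "$f" in
                *.gz) gzip -dc "$f" ;;
                *)    cat "$f" ;;
            esac
        done | wc -l | tr -d ' '
    )
    echo $((lines / 4))
}

# Hypothetical usage with paired-end files:
#   count_fastq_reads sample_R1.fastq.gz sample_R2.fastq.gz
#   samtools view -c -F 0x900 sample.bam   # primary alignments only
#+end_src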
* Improvements
** TODO Quality score recalibration with a set of files
See GATK best practices
** KILL Use T-to-T as the reference
CLOSED: [2023-01-01 Sun 21:35]
Seems complicated with the new databases
** TODO Excel macro
** TODO Use the ClinVar XML
Extraction to VCF is possible with
https://github.com/SeqOne/clinvcf
** Annotation
Full list
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9252745/
*** TODO Use a lightweight version of gnomAD (a single column)
*** TODO Digenic inheritance (cf. OMIM nomenclature)
It is in the disease name
* HOLD Implement other pipelines
See https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04407-x
** KILL GATK
CLOSED: [2022-11-11 Fri 20:01]
https://broadinstitute.github.io/warp/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README
In principle, follows best practices
** KILL Try snakemake with best practices
https://github.com/snakemake-workflows/dna-seq-gatk-variant-calling/blob/main/.github/workflows/main.yml
Install Mamba (micromamba does not work under nix)
Does not work under WSL2... MultiQC is not recent enough
Version problems...
** KILL Sarek
CLOSED: [2022-12-11 Sun 11:09]
*** Dependencies
**** Nix
#+begin_src sh
nix profile install nixpkgs#mosdepth nixpkgs#python3
nix-shell -p python310Packages.pyyaml --run "nextflow run nf-core/sarek -profile test --executor slurm --queue smp --outdir test -resume"
#+end_src
$ zgrep '^NC' /Work/Groups/bisonex/ref_63003856_S135/63003856_S135_DP_over_30.vcf.gz | wc -l
82033
Not related to depth: we test with
bcftools filter -i 'FORMAT/DP<=30' filter-depth.vcf
bcftools filter -i 'FORMAT/AD[0:1]<=10' filter-depth.vcf
****** Check that using 2 different filters gives the same result: yes
$ bcftools filter -e 'FORMAT/DP<=30' 63003856_S135.vcf.gz | bcftools filter -e 'FORMAT/AD[0:1]<=10' -o two-filters.vcf
$ grep '^NC' two-filters.vcf | wc -l
82054
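The two chained exclusions above keep records with DP > 30 and more than 10 reads supporting the first ALT allele. As a cross-check that does not depend on bcftools, the same condition can be sketched in awk. This assumes a single-sample VCF with the sample in column 10. The function name is hypothetical, and records without DP or AD are dropped here, which may differ from how bcftools handles missing values.
#+begin_src sh
# Keep header lines and records with DP > 30 and AD (first ALT allele) > 10.
# Assumption: uncompressed, single-sample VCF; sample values in column 10.
filter_depth_awk() {
    awk -F '\t' '
        /^#/ { print; next }
        {
            dp = ""; ad = ""
            n = split($9, keys, ":")      # FORMAT keys, e.g. GT:AD:DP
            split($10, vals, ":")         # matching sample values
            for (i = 1; i <= n; i++) {
                if (keys[i] == "DP") dp = vals[i]
                if (keys[i] == "AD") { split(vals[i], a, ","); ad = a[2] }
            }
            if (dp != "" && ad != "" && dp + 0 > 30 && ad + 0 > 10) print
        }
    ' "$1"
}

# Hypothetical usage:
#   filter_depth_awk 63003856_S135.vcf | grep -c '^NC'
#+end_src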
***** Test bwa sequentially
****** KILL Filter technical variants
CLOSED: [2023-01-04 Wed 19:16]
**** GATK 4.2.4 (same version as Alexis)
***** TODO Variant calling
****** TODO haplotypecaller: better but not identical!
******* DONE Line counts with gatk 4.2.2: small difference
CLOSED: [2023-01-04 Wed 19:18]
$ zgrep '^NC' 63003856_S135.vcf.gz | wc -l
1506931
$ grep '^NC' /Work/Groups/bisonex/ref-vcf/63003856_S135.vcf | wc -l
1506894
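The two counts differ by 37 records (1506931 - 1506894). A small helper for this kind of comparison, sketched under the assumption that every record of interest starts with NC, as in the commands above. The function names are hypothetical.
#+begin_src sh
# Count VCF records whose contig name starts with "NC" (plain or .gz input).
count_nc_records() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | grep -c '^NC'
}

# Print both counts and the signed difference (first minus second).
compare_nc_counts() {
    a=$(count_nc_records "$1")
    b=$(count_nc_records "$2")
    echo "$a $b $((a - b))"
}

# Hypothetical usage:
#   compare_nc_counts ours.vcf.gz reference.vcf
#+end_src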
******* DONE Flags with the same version of gatk 4.2.2: OK, identical
CLOSED: [2023-01-04 Wed 19:09]
##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller
| ",Version="4.2.4.1",Date="January 4, 2023 at 1:46:41 AM CET"> | Version="4.2.4.1",Date="December 3, 2022 at 1:20:38 AM CET"> |
| --dbsnp /Work/Users/apraga/bisonex/work/5d/feb81028d262d7701bed0a759ff6f6/dbSNP.gz | --dbsnp /mnt/j/bases_de_donnees/dbSNP/GCF_000001405.39.gz |
| --max-mnp-distance 2 | --max-mnp-distance 2 |
| --output 63003856_S135.vcf.gz | --output /mnt/j/working_directory_pipeline_analyse_exome/vcf/63003856_S135.vcf |
| --input 63003856_S135.bam | --input /mnt/j/working_directory_pipeline_analyse_exome/bam/63003856_S135_recalibrated_hg38.bam |
| --reference genomeRef.fna | --reference /mnt/j/bases_de_donnees/genome/GRCh38_latest_genomic.fna |
| --tmp-dir . | --verbosity WARNING |
| --use-posteriors-to-calculate-qual false | --use-posteriors-to-calculate-qual false |
| --dont-use-dragstr-priors false | --dont-use-dragstr-priors false |
| --use-new-qual-calculator true | --use-new-qual-calculator true |
| --annotate-with-num-discovered-alleles false | --annotate-with-num-discovered-alleles false |
| --heterozygosity 0.001 | --heterozygosity 0.001 |
| --indel-heterozygosity 1.25E-4 | --indel-heterozygosity 1.25E-4 |
| --heterozygosity-stdev 0.01 | --heterozygosity-stdev 0.01 |
| --standard-min-confidence-threshold-for-calling 30.0 | --standard-min-confidence-threshold-for-calling 30.0 |
| --max-alternate-alleles 6 | --max-alternate-alleles 6 |
| --max-genotype-count 1024 | --max-genotype-count 1024 |
| --sample-ploidy 2 | --sample-ploidy 2 |
| --num-reference-samples-if-no-call 0 | --num-reference-samples-if-no-call 0 |
| --genotype-assignment-method USE_PLS_TO_ASSIGN | --genotype-assignment-method USE_PLS_TO_ASSIGN |
| --contamination-fraction-to-filter 0.0 | --contamination-fraction-to-filter 0.0 |
| --output-mode EMIT_VARIANTS_ONLY | --output-mode EMIT_VARIANTS_ONLY |
| --all-site-pls false | --all-site-pls false |
| --gvcf-gq-bands 1 | --gvcf-gq-bands 1 |
| --gvcf-gq-bands 2 | --gvcf-gq-bands 2 |
| --gvcf-gq-bands 3 | --gvcf-gq-bands 3 |
| --gvcf-gq-bands 4 | --gvcf-gq-bands 4 |
| --gvcf-gq-bands 5 | --gvcf-gq-bands 5 |
| --gvcf-gq-bands 6 | --gvcf-gq-bands 6 |
| --gvcf-gq-bands 7 | --gvcf-gq-bands 7 |
| --gvcf-gq-bands 8 | --gvcf-gq-bands 8 |
| --gvcf-gq-bands 9 | --gvcf-gq-bands 9 |
| --gvcf-gq-bands 10 | --gvcf-gq-bands 10 |
| --gvcf-gq-bands 11 | --gvcf-gq-bands 11 |
| --gvcf-gq-bands 12 | --gvcf-gq-bands 12 |
| --gvcf-gq-bands 13 | --gvcf-gq-bands 13 |
| --gvcf-gq-bands 14 | --gvcf-gq-bands 14 |
| --gvcf-gq-bands 15 | --gvcf-gq-bands 15 |
| --gvcf-gq-bands 16 | --gvcf-gq-bands 16 |
| --gvcf-gq-bands 17 | --gvcf-gq-bands 17 |
| --gvcf-gq-bands 18 | --gvcf-gq-bands 18 |
| --gvcf-gq-bands 19 | --gvcf-gq-bands 19 |
| --gvcf-gq-bands 20 | --gvcf-gq-bands 20 |
| --gvcf-gq-bands 21 | --gvcf-gq-bands 21 |
| --gvcf-gq-bands 22 | --gvcf-gq-bands 22 |
| --gvcf-gq-bands 23 | --gvcf-gq-bands 23 |
| --gvcf-gq-bands 24 | --gvcf-gq-bands 24 |
| --gvcf-gq-bands 25 | --gvcf-gq-bands 25 |
| --gvcf-gq-bands 26 | --gvcf-gq-bands 26 |
| --gvcf-gq-bands 27 | --gvcf-gq-bands 27 |
| --gvcf-gq-bands 28 | --gvcf-gq-bands 28 |
| --gvcf-gq-bands 29 | --gvcf-gq-bands 29 |
| --gvcf-gq-bands 30 | --gvcf-gq-bands 30 |
| --gvcf-gq-bands 31 | --gvcf-gq-bands 31 |
| --gvcf-gq-bands 32 | --gvcf-gq-bands 32 |
| --gvcf-gq-bands 33 | --gvcf-gq-bands 33 |
| --gvcf-gq-bands 34 | --gvcf-gq-bands 34 |
| --gvcf-gq-bands 35 | --gvcf-gq-bands 35 |
| --gvcf-gq-bands 36 | --gvcf-gq-bands 36 |
| --gvcf-gq-bands 37 | --gvcf-gq-bands 37 |
| --gvcf-gq-bands 38 | --gvcf-gq-bands 38 |
| --gvcf-gq-bands 39 | --gvcf-gq-bands 39 |
| --gvcf-gq-bands 40 | --gvcf-gq-bands 40 |
| --gvcf-gq-bands 41 | --gvcf-gq-bands 41 |
| --gvcf-gq-bands 42 | --gvcf-gq-bands 42 |
| --gvcf-gq-bands 43 | --gvcf-gq-bands 43 |
| --gvcf-gq-bands 44 | --gvcf-gq-bands 44 |
| --gvcf-gq-bands 45 | --gvcf-gq-bands 45 |
| --gvcf-gq-bands 46 | --gvcf-gq-bands 46 |
| --gvcf-gq-bands 47 | --gvcf-gq-bands 47 |
| --gvcf-gq-bands 48 | --gvcf-gq-bands 48 |
| --gvcf-gq-bands 49 | --gvcf-gq-bands 49 |
| --gvcf-gq-bands 50 | --gvcf-gq-bands 50 |
| --gvcf-gq-bands 51 | --gvcf-gq-bands 51 |
| --gvcf-gq-bands 52 | --gvcf-gq-bands 52 |
| --gvcf-gq-bands 53 | --gvcf-gq-bands 53 |
| --gvcf-gq-bands 54 | --gvcf-gq-bands 54 |
| --gvcf-gq-bands 55 | --gvcf-gq-bands 55 |
| --gvcf-gq-bands 56 | --gvcf-gq-bands 56 |
| --gvcf-gq-bands 57 | --gvcf-gq-bands 57 |
| --gvcf-gq-bands 58 | --gvcf-gq-bands 58 |
| --gvcf-gq-bands 59 | --gvcf-gq-bands 59 |
| --gvcf-gq-bands 60 | --gvcf-gq-bands 60 |
| --gvcf-gq-bands 70 | --gvcf-gq-bands 70 |
| --gvcf-gq-bands 80 | --gvcf-gq-bands 80 |
| --gvcf-gq-bands 90 | --gvcf-gq-bands 90 |
| --gvcf-gq-bands 99 | --gvcf-gq-bands 99 |
| --floor-blocks false | --floor-blocks false |
| --indel-size-to-eliminate-in-ref-model 10 | --indel-size-to-eliminate-in-ref-model 10 |
| --disable-optimizations false | --disable-optimizations false |
| --dragen-mode false | --dragen-mode false |
| --apply-bqd false | --apply-bqd false |
| --apply-frd false | --apply-frd false |
| --disable-spanning-event-genotyping false | --disable-spanning-event-genotyping false |
| --transform-dragen-mapping-quality false | --transform-dragen-mapping-quality false |
| --mapping-quality-threshold-for-genotyping 20 | --mapping-quality-threshold-for-genotyping 20 |
| --max-effective-depth-adjustment-for-frd 0 | --max-effective-depth-adjustment-for-frd 0 |
| --just-determine-active-regions false | --just-determine-active-regions false |
| --dont-genotype false | --dont-genotype false |
| --do-not-run-physical-phasing false | --do-not-run-physical-phasing false |
| --do-not-correct-overlapping-quality false | --do-not-correct-overlapping-quality false |
| --use-filtered-reads-for-annotations false | --use-filtered-reads-for-annotations false |
| --adaptive-pruning false | --adaptive-pruning false |
| --do-not-recover-dangling-branches false | --do-not-recover-dangling-branches false |
| --recover-dangling-heads false | --recover-dangling-heads false |
| --kmer-size 10 | --kmer-size 10 |
| --kmer-size 25 | --kmer-size 25 |
| --dont-increase-kmer-sizes-for-cycles false | --dont-increase-kmer-sizes-for-cycles false |
| --allow-non-unique-kmers-in-ref false | --allow-non-unique-kmers-in-ref false |
| --num-pruning-samples 1 | --num-pruning-samples 1 |
| --min-dangling-branch-length 4 | --min-dangling-branch-length 4 |
| --recover-all-dangling-branches false | --recover-all-dangling-branches false |
| --max-num-haplotypes-in-population 128 | --max-num-haplotypes-in-population 128 |
| --min-pruning 2 | --min-pruning 2 |
| --adaptive-pruning-initial-error-rate 0.001 | --adaptive-pruning-initial-error-rate 0.001 |
| --pruning-lod-threshold 2.302585092994046 | --pruning-lod-threshold 2.302585092994046 |
| --pruning-seeding-lod-threshold 9.210340371976184 | --pruning-seeding-lod-threshold 9.210340371976184 |
| --max-unpruned-variants 100 | --max-unpruned-variants 100 |
| --linked-de-bruijn-graph false | --linked-de-bruijn-graph false |
| --disable-artificial-haplotype-recovery false | --disable-artificial-haplotype-recovery false |
| --enable-legacy-graph-cycle-detection false | --enable-legacy-graph-cycle-detection false |
| --debug-assembly false | --debug-assembly false |
| --debug-graph-transformations false | --debug-graph-transformations false |
| --capture-assembly-failure-bam false | --capture-assembly-failure-bam false |
| --num-matching-bases-in-dangling-end-to-recover -1 | --num-matching-bases-in-dangling-end-to-recover -1 |
| --error-correction-log-odds -Infinity | --error-correction-log-odds -Infinity |
| --error-correct-reads false | --error-correct-reads false |
| --kmer-length-for-read-error-correction 25 | --kmer-length-for-read-error-correction 25 |
| --min-observations-for-kmer-to-be-solid 20 | --min-observations-for-kmer-to-be-solid 20 |
| --base-quality-score-threshold 18 | --base-quality-score-threshold 18 |
| --dragstr-het-hom-ratio 2 | --dragstr-het-hom-ratio 2 |
| --dont-use-dragstr-pair-hmm-scores false | --dont-use-dragstr-pair-hmm-scores false |
| --pair-hmm-gap-continuation-penalty 10 | --pair-hmm-gap-continuation-penalty 10 |
| --expected-mismatch-rate-for-read-disqualification 0.02 | --expected-mismatch-rate-for-read-disqualification 0.02 |
| --pair-hmm-implementation FASTEST_AVAILABLE | --pair-hmm-implementation FASTEST_AVAILABLE |
| --pcr-indel-model CONSERVATIVE | --pcr-indel-model CONSERVATIVE |
| --phred-scaled-global-read-mismapping-rate 45 | --phred-scaled-global-read-mismapping-rate 45 |
| --disable-symmetric-hmm-normalizing false | --disable-symmetric-hmm-normalizing false |
| --disable-cap-base-qualities-to-map-quality false | --disable-cap-base-qualities-to-map-quality false |
| --enable-dynamic-read-disqualification-for-genotyping false | --enable-dynamic-read-disqualification-for-genotyping false |
| --dynamic-read-disqualification-threshold 1.0 | --dynamic-read-disqualification-threshold 1.0 |
| --native-pair-hmm-threads 4 | --native-pair-hmm-threads 4 |
| --native-pair-hmm-use-double-precision false | --native-pair-hmm-use-double-precision false |
| --bam-writer-type CALLED_HAPLOTYPES | --bam-writer-type CALLED_HAPLOTYPES |
| --dont-use-soft-clipped-bases false | --dont-use-soft-clipped-bases false |
| --min-base-quality-score 10 | --min-base-quality-score 10 |
| --smith-waterman JAVA | --smith-waterman JAVA |
| --emit-ref-confidence NONE | --emit-ref-confidence NONE |
| --force-call-filtered-alleles false | --force-call-filtered-alleles false |
| --soft-clip-low-quality-ends false | --soft-clip-low-quality-ends false |
| --allele-informative-reads-overlap-margin 2 | --allele-informative-reads-overlap-margin 2 |
| --smith-waterman-dangling-end-match-value 25 | --smith-waterman-dangling-end-match-value 25 |
| --smith-waterman-dangling-end-mismatch-penalty -50 | --smith-waterman-dangling-end-mismatch-penalty -50 |
| --smith-waterman-dangling-end-gap-open-penalty -110 | --smith-waterman-dangling-end-gap-open-penalty -110 |
| --smith-waterman-dangling-end-gap-extend-penalty -6 | --smith-waterman-dangling-end-gap-extend-penalty -6 |
| --smith-waterman-haplotype-to-reference-match-value 200 | --smith-waterman-haplotype-to-reference-match-value 200 |
| --smith-waterman-haplotype-to-reference-mismatch-penalty -150 | --smith-waterman-haplotype-to-reference-mismatch-penalty -150 |
| --smith-waterman-haplotype-to-reference-gap-open-penalty -260 | --smith-waterman-haplotype-to-reference-gap-open-penalty -260 |
| --smith-waterman-haplotype-to-reference-gap-extend-penalty -11 | --smith-waterman-haplotype-to-reference-gap-extend-penalty -11 |
| --smith-waterman-read-to-haplotype-match-value 10 | --smith-waterman-read-to-haplotype-match-value 10 |
| --smith-waterman-read-to-haplotype-mismatch-penalty -15 | --smith-waterman-read-to-haplotype-mismatch-penalty -15 |
| --smith-waterman-read-to-haplotype-gap-open-penalty -30 | --smith-waterman-read-to-haplotype-gap-open-penalty -30 |
| --smith-waterman-read-to-haplotype-gap-extend-penalty -5 | --smith-waterman-read-to-haplotype-gap-extend-penalty -5 |
| --min-assembly-region-size 50 | --min-assembly-region-size 50 |
| --max-assembly-region-size 300 | --max-assembly-region-size 300 |
| --active-probability-threshold 0.002 | --active-probability-threshold 0.002 |
| --max-prob-propagation-distance 50 | --max-prob-propagation-distance 50 |
| --force-active false | --force-active false |
| --assembly-region-padding 100 | --assembly-region-padding 100 |
| --padding-around-indels 75 | --padding-around-indels 75 |
| --padding-around-snps 20 | --padding-around-snps 20 |
| --padding-around-strs 75 | --padding-around-strs 75 |
| --max-extension-into-assembly-region-padding-legacy 25 | --max-extension-into-assembly-region-padding-legacy 25 |
| --max-reads-per-alignment-start 50 | --max-reads-per-alignment-start 50 |
| --enable-legacy-assembly-region-trimming false | --enable-legacy-assembly-region-trimming false |
| --interval-set-rule UNION | --interval-set-rule UNION |
| --interval-padding 0 | --interval-padding 0 |
| --interval-exclusion-padding 0 | --interval-exclusion-padding 0 |
| --interval-merging-rule ALL | --interval-merging-rule ALL |
| --read-validation-stringency SILENT | --read-validation-stringency SILENT |
| --seconds-between-progress-updates 10.0 | --seconds-between-progress-updates 10.0 |
| --disable-sequence-dictionary-validation false | --disable-sequence-dictionary-validation false |
| --create-output-bam-index true | --create-output-bam-index true |
| --create-output-bam-md5 false | --create-output-bam-md5 false |
| --create-output-variant-index true | --create-output-variant-index true |
| --create-output-variant-md5 false | --create-output-variant-md5 false |
| --max-variants-per-shard 0 | --max-variants-per-shard 0 |
| --lenient false | --lenient false |
| --add-output-sam-program-record true | --add-output-sam-program-record true |
| --add-output-vcf-command-line true | --add-output-vcf-command-line true |
| --cloud-prefetch-buffer 40 | --cloud-prefetch-buffer 40 |
| --cloud-index-prefetch-buffer -1 | --cloud-index-prefetch-buffer -1 |
| --disable-bam-index-caching false | --disable-bam-index-caching false |
| --sites-only-vcf-output false | --sites-only-vcf-output false |
| --help false | --help false |
| --version false | --version false |
| --showHidden false | --showHidden false |
| --verbosity INFO | |
| --QUIET false | --QUIET false |
| --use-jdk-deflater false | --use-jdk-deflater false |
| --use-jdk-inflater false | --use-jdk-inflater false |
| --gcs-max-retries 20 | --gcs-max-retries 20 |
| --gcs-project-for-requester-pays | --gcs-project-for-requester-pays |
| --disable-tool-default-read-filters false | --disable-tool-default-read-filters false |
| --minimum-mapping-quality 20 | --minimum-mapping-quality 20 |
| --disable-tool-default-annotations false | --disable-tool-default-annotations false |
| --enable-all-annotations false | --enable-all-annotations false |
| --allow-old-rms-mapping-quality-annotation-data false | --allow-old-rms-mapping-quality-annotation-data false |
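The same comparison can be scripted. The sketch below assumes the header layout shown above: a single ##GATKCommandLine line for HaplotypeCaller, flags separated by " --", and the command line closed by ",Version=. The function names are hypothetical. Path-like flags such as --input, --output, --reference, --dbsnp and --tmp-dir are expected to differ between machines.
#+begin_src sh
# Print one "--flag value" per line from the HaplotypeCaller header of a VCF.
gatk_flags() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac |
        awk '/^##GATKCommandLine=<ID=HaplotypeCaller/ {
                 sub(/.*CommandLine="HaplotypeCaller */, "")  # drop the prefix
                 sub(/ *",Version=.*/, "")                    # drop the suffix
                 gsub(/ --/, "\n--")                          # one flag per line
                 print
                 exit
             }' |
        LC_ALL=C sort
}

# Show flags that differ ("<" only in the first VCF, ">" only in the second).
diff_gatk_flags() {
    a=$(mktemp) && b=$(mktemp) || return 1
    gatk_flags "$1" > "$a"
    gatk_flags "$2" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return $status
}

# Hypothetical usage:
#   diff_gatk_flags ours.vcf.gz reference.vcf
#+end_src
Sorting the flags first makes the result independent of the order in which each GATK version prints them, at the cost of losing the original order for repeated flags such as --gvcf-gq-bands.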
****** filter depth: still the same difference...
$ grep '^NC' filter-depth.vcf | wc -l
82054
$ grep '^NC' out/63003856_S135/variantCalling/filter-polymorphisms.vcf | wc -l
8898
vs
$ grep '^NC' /Work/Groups/bisonex/ref_63003856_S135/63003856_S135_DP_over_30_not_SNP_consensual_sequence.vcf | wc -l
8864
****** KILL exclude SNP + consensual: 34 too many!!
CLOSED: [2023-01-04 Wed 19:16]
New version (MarkDuplicates bug fix)
#+begin_src sh
grep '^NC' out/63003856_S135/variantCalling/filter-depth.vcf |wc -l
82054
#+end_src
Alexis
#+begin_src sh
zgrep '^NC' /Work/Groups/bisonex/ref_63003856_S135/63003856_S135_DP_over_30.vcf | wc -l
82033
#+end_src
Does not come from the depth filter:
bcftools filter -i 'FORMAT/AD[0:1]<=10' 63003856_S135_DP_over_30.vcf
bcftools filter -i 'FORMAT/DP<=30' 63003856_S135_DP_over_30.vcf
Same for our version. Nothing comes out.
We compare the line counts
#+begin_src sh
bgzip out/63003856_S135/variantCalling/filter-depth.vcf
tabix /Work/Groups/bisonex/ref_63003856_S135/63003856_S135_DP_over_30.vcf.gz
tabix out/63003856_S135/variantCalling/filter-depth.vcf.gz
bcftools isec out/63003856_S135/variantCalling/filter-depth.vcf.gz /Work/Groups/bisonex/ref_63003856_S135/63003856_S135_DP_over_30.vcf.gz -p compare-filter-depth
find compare-filter-depth/ -type f -exec wc -l {} \;
84763 compare-filter-depth/sites.txt
710 compare-filter-depth/0001.vcf
8 compare-filter-depth/README.txt
85339 compare-filter-depth/0002.vcf
85340 compare-filter-depth/0003.vcf
725 compare-filter-depth/0000.vcf
#+end_src
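Note that wc -l on the isec output counts VCF header lines as well as records, so the figures above are not site counts. As a rough cross-check that needs no bcftools, the sketch below compares site keys (CHROM, POS, REF, ALT) from two uncompressed VCFs with comm. It assumes exact string matching on those four columns, so it may not agree with bcftools isec in every case (multiallelic records, for instance). The function names are hypothetical.
#+begin_src sh
# Print sorted, unique "CHROM POS REF ALT" keys for all records of a plain VCF.
vcf_site_keys() {
    awk -F '\t' '!/^#/ { print $1 "\t" $2 "\t" $4 "\t" $5 }' "$1" | LC_ALL=C sort -u
}

# Print three counts: sites only in the first file, only in the second, shared.
compare_vcf_sites() {
    a=$(mktemp) && b=$(mktemp) || return 1
    vcf_site_keys "$1" > "$a"
    vcf_site_keys "$2" > "$b"
    only_a=$(LC_ALL=C comm -23 "$a" "$b" | wc -l | tr -d ' ')
    only_b=$(LC_ALL=C comm -13 "$a" "$b" | wc -l | tr -d ' ')
    shared=$(LC_ALL=C comm -12 "$a" "$b" | wc -l | tr -d ' ')
    rm -f "$a" "$b"
    echo "$only_a $only_b $shared"
}

# Hypothetical usage (both inputs uncompressed):
#   compare_vcf_sites filter-depth.vcf reference_DP_over_30.vcf
#+end_src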
******** DONE Look at the haplotypecaller flags: many differences...
CLOSED: [2023-01-04 Wed 19:02]
| --dbsnp /Work/Users/apraga/bisonex/work/08/fca52ac598f21a2812f866bd590792/dbSNP.gz | --dbsnp /mnt/j/bases_de_donnees/dbSNP/GCF_000001405.39.gz |
| --max-mnp-distance 2 | --max-mnp-distance 2 |
| --output 63003856_S135.vcf.gz | --output /mnt/j/working_directory_pipeline_analyse_exome/vcf/63003856_S135.vcf |
| --input 63003856_S135.bam | --input /mnt/j/working_directory_pipeline_analyse_exome/bam/63003856_S135_recalibrated_hg38.bam |
| --reference genomeRef.fna | --reference /mnt/j/bases_de_donnees/genome/GRCh38_latest_genomic.fna |
| --tmp-dir . | --verbosity WARNING |
| --use-posteriors-to-calculate-qual false | --use-posteriors-to-calculate-qual false |
| --dont-use-dragstr-priors false | --dont-use-dragstr-priors false |
| --use-new-qual-calculator true | --use-new-qual-calculator true |
| --annotate-with-num-discovered-alleles false | --annotate-with-num-discovered-alleles false |
| --heterozygosity 0.001 | --heterozygosity 0.001 |
| --indel-heterozygosity 1.25E-4 | --indel-heterozygosity 1.25E-4 |
| --heterozygosity-stdev 0.01 | --heterozygosity-stdev 0.01 |
| --standard-min-confidence-threshold-for-calling 30.0 | --standard-min-confidence-threshold-for-calling 30.0 |
| --max-alternate-alleles 6 | --max-alternate-alleles 6 |
| --max-genotype-count 1024 | --max-genotype-count 1024 |
| --sample-ploidy 2 | --sample-ploidy 2 |
| --num-reference-samples-if-no-call 0 | --num-reference-samples-if-no-call 0 |
| --genotype-assignment-method USE_PLS_TO_ASSIGN | --genotype-assignment-method USE_PLS_TO_ASSIGN |
| --contamination-fraction-to-filter 0.0 | --contamination-fraction-to-filter 0.0 |
| --output-mode EMIT_VARIANTS_ONLY | --output-mode EMIT_VARIANTS_ONLY |
| --all-site-pls false | --all-site-pls false |
| --flow-likelihood-parallel-threads 0 | --gvcf-gq-bands 1 |
| --flow-likelihood-optimized-comp false | --gvcf-gq-bands 2 |
| --flow-use-t0-tag false | --gvcf-gq-bands 3 |
| --flow-probability-threshold 0.003 | --gvcf-gq-bands 4 |
| --flow-remove-non-single-base-pair-indels false | --gvcf-gq-bands 5 |
| --flow-remove-one-zero-probs false | --gvcf-gq-bands 6 |
| --flow-quantization-bins 121 | --gvcf-gq-bands 7 |
| --flow-fill-empty-bins-value 0.001 | --gvcf-gq-bands 8 |
| --flow-symmetric-indel-probs false | --gvcf-gq-bands 9 |
| --flow-report-insertion-or-deletion false | --gvcf-gq-bands 10 |
| --flow-disallow-probs-larger-than-call false | --gvcf-gq-bands 11 |
| --flow-lump-probs false | --gvcf-gq-bands 12 |
| --flow-retain-max-n-probs-base-format false | --gvcf-gq-bands 13 |
| --flow-probability-scaling-factor 10 | --gvcf-gq-bands 14 |
| --flow-order-cycle-length 4 | --gvcf-gq-bands 15 |
| --flow-number-of-uncertain-flows-to-clip 0 | --gvcf-gq-bands 16 |
| --flow-nucleotide-of-first-uncertain-flow T | --gvcf-gq-bands 17 |
| --keep-boundary-flows false | --gvcf-gq-bands 18 |
| --gvcf-gq-bands 1 | --gvcf-gq-bands 19 |
| --gvcf-gq-bands 2 | --gvcf-gq-bands 20 |
| --gvcf-gq-bands 3 | --gvcf-gq-bands 21 |
| --gvcf-gq-bands 4 | --gvcf-gq-bands 22 |
| --gvcf-gq-bands 5 | --gvcf-gq-bands 23 |
| --gvcf-gq-bands 6 | --gvcf-gq-bands 24 |
| --gvcf-gq-bands 7 | --gvcf-gq-bands 25 |
| --gvcf-gq-bands 8 | --gvcf-gq-bands 26 |
| --gvcf-gq-bands 9 | --gvcf-gq-bands 27 |
| --gvcf-gq-bands 10 | --gvcf-gq-bands 28 |
| --gvcf-gq-bands 11 | --gvcf-gq-bands 29 |
| --gvcf-gq-bands 12 | --gvcf-gq-bands 30 |
| --gvcf-gq-bands 13 | --gvcf-gq-bands 31 |
| --gvcf-gq-bands 14 | --gvcf-gq-bands 32 |
| --gvcf-gq-bands 15 | --gvcf-gq-bands 33 |
| --gvcf-gq-bands 16 | --gvcf-gq-bands 34 |
| --gvcf-gq-bands 17 | --gvcf-gq-bands 35 |
| --gvcf-gq-bands 18 | --gvcf-gq-bands 36 |
| --gvcf-gq-bands 19 | --gvcf-gq-bands 37 |
| --gvcf-gq-bands 20 | --gvcf-gq-bands 38 |
| --gvcf-gq-bands 21 | --gvcf-gq-bands 39 |
| --gvcf-gq-bands 22 | --gvcf-gq-bands 40 |
| --gvcf-gq-bands 23 | --gvcf-gq-bands 41 |
| --gvcf-gq-bands 24 | --gvcf-gq-bands 42 |
| --gvcf-gq-bands 25 | --gvcf-gq-bands 43 |
| --gvcf-gq-bands 26 | --gvcf-gq-bands 44 |
| --gvcf-gq-bands 27 | --gvcf-gq-bands 45 |
| --gvcf-gq-bands 28 | --gvcf-gq-bands 46 |
| --gvcf-gq-bands 29 | --gvcf-gq-bands 47 |
| --gvcf-gq-bands 30 | --gvcf-gq-bands 48 |
| --gvcf-gq-bands 31 | --gvcf-gq-bands 49 |
| --gvcf-gq-bands 32 | --gvcf-gq-bands 50 |
| --gvcf-gq-bands 33 | --gvcf-gq-bands 51 |
| --gvcf-gq-bands 34 | --gvcf-gq-bands 52 |
| --gvcf-gq-bands 35 | --gvcf-gq-bands 53 |
| --gvcf-gq-bands 36 | --gvcf-gq-bands 54 |
| --gvcf-gq-bands 37 | --gvcf-gq-bands 55 |
| --gvcf-gq-bands 38 | --gvcf-gq-bands 56 |
| --gvcf-gq-bands 39 | --gvcf-gq-bands 57 |
| --gvcf-gq-bands 40 | --gvcf-gq-bands 58 |
| --gvcf-gq-bands 41 | --gvcf-gq-bands 59 |
| --gvcf-gq-bands 42 | --gvcf-gq-bands 60 |
| --gvcf-gq-bands 43 | --gvcf-gq-bands 70 |
| --gvcf-gq-bands 44 | --gvcf-gq-bands 80 |
| --gvcf-gq-bands 45 | --gvcf-gq-bands 90 |
| --gvcf-gq-bands 46 | --gvcf-gq-bands 99 |
| --gvcf-gq-bands 47 | --floor-blocks false |
| --gvcf-gq-bands 48 | --indel-size-to-eliminate-in-ref-model 10 |
| --gvcf-gq-bands 49 | --disable-optimizations false |
| --gvcf-gq-bands 50 | --dragen-mode false |
| --gvcf-gq-bands 51 | --apply-bqd false |
| --gvcf-gq-bands 52 | --apply-frd false |
| --gvcf-gq-bands 53 | --disable-spanning-event-genotyping false |
| --gvcf-gq-bands 54 | --transform-dragen-mapping-quality false |
| --gvcf-gq-bands 55 | --mapping-quality-threshold-for-genotyping 20 |
| --gvcf-gq-bands 56 | --max-effective-depth-adjustment-for-frd 0 |
| --gvcf-gq-bands 57 | --just-determine-active-regions false |
| --gvcf-gq-bands 58 | --dont-genotype false |
| --gvcf-gq-bands 59 | --do-not-run-physical-phasing false |
| --gvcf-gq-bands 60 | --do-not-correct-overlapping-quality false |
| --gvcf-gq-bands 70 | --use-filtered-reads-for-annotations false |
| --gvcf-gq-bands 80 | --adaptive-pruning false |
| --gvcf-gq-bands 90 | --do-not-recover-dangling-branches false |
| --gvcf-gq-bands 99 | --recover-dangling-heads false |
| --floor-blocks false | --kmer-size 10 |
| --indel-size-to-eliminate-in-ref-model 10 | --kmer-size 25 |
| --disable-optimizations false | --dont-increase-kmer-sizes-for-cycles false |
| --dragen-mode false | --allow-non-unique-kmers-in-ref false |
| --flow-mode NONE | --num-pruning-samples 1 |
| --apply-bqd false | --min-dangling-branch-length 4 |
| --apply-frd false | --recover-all-dangling-branches false |
| --disable-spanning-event-genotyping false | --max-num-haplotypes-in-population 128 |
| --transform-dragen-mapping-quality false | --min-pruning 2 |
| --mapping-quality-threshold-for-genotyping 20 | --adaptive-pruning-initial-error-rate 0.001 |
| --max-effective-depth-adjustment-for-frd 0 | --pruning-lod-threshold 2.302585092994046 |
| --just-determine-active-regions false | --pruning-seeding-lod-threshold 9.210340371976184 |
| --dont-genotype false | --max-unpruned-variants 100 |
| --do-not-run-physical-phasing false | --linked-de-bruijn-graph false |
| --do-not-correct-overlapping-quality false | --disable-artificial-haplotype-recovery false |
| --use-filtered-reads-for-annotations false | --enable-legacy-graph-cycle-detection false |
| --use-flow-aligner-for-stepwise-hc-filtering false | --debug-assembly false |
| --adaptive-pruning false | --debug-graph-transformations false |
| --do-not-recover-dangling-branches false | --capture-assembly-failure-bam false |
| --recover-dangling-heads false | --num-matching-bases-in-dangling-end-to-recover -1 |
| --kmer-size 10 | --error-correction-log-odds -Infinity |
| --kmer-size 25 | --error-correct-reads false |
| --dont-increase-kmer-sizes-for-cycles false | --kmer-length-for-read-error-correction 25 |
| --allow-non-unique-kmers-in-ref false | --min-observations-for-kmer-to-be-solid 20 |
| --num-pruning-samples 1 | --base-quality-score-threshold 18 |
| --min-dangling-branch-length 4 | --dragstr-het-hom-ratio 2 |
| --recover-all-dangling-branches false | --dont-use-dragstr-pair-hmm-scores false |
| --max-num-haplotypes-in-population 128 | --pair-hmm-gap-continuation-penalty 10 |
| --min-pruning 2 | --expected-mismatch-rate-for-read-disqualification 0.02 |
| --adaptive-pruning-initial-error-rate 0.001 | --pair-hmm-implementation FASTEST_AVAILABLE |
| --pruning-lod-threshold 2.302585092994046 | --pcr-indel-model CONSERVATIVE |
| --pruning-seeding-lod-threshold 9.210340371976184 | --phred-scaled-global-read-mismapping-rate 45 |
| --max-unpruned-variants 100 | --disable-symmetric-hmm-normalizing false |
| --linked-de-bruijn-graph false | --disable-cap-base-qualities-to-map-quality false |
| --disable-artificial-haplotype-recovery false | --enable-dynamic-read-disqualification-for-genotyping false |
| --enable-legacy-graph-cycle-detection false | --dynamic-read-disqualification-threshold 1.0 |
| --debug-assembly false | --native-pair-hmm-threads 4 |
| --debug-graph-transformations false | --native-pair-hmm-use-double-precision false |
| --capture-assembly-failure-bam false | --bam-writer-type CALLED_HAPLOTYPES |
| --num-matching-bases-in-dangling-end-to-recover -1 | --dont-use-soft-clipped-bases false |
| --error-correction-log-odds -Infinity | --min-base-quality-score 10 |
| --error-correct-reads false | --smith-waterman JAVA |
| --kmer-length-for-read-error-correction 25 | --emit-ref-confidence NONE |
| --min-observations-for-kmer-to-be-solid 20 | --force-call-filtered-alleles false |
| --likelihood-calculation-engine PairHMM | --soft-clip-low-quality-ends false |
| --base-quality-score-threshold 18 | --allele-informative-reads-overlap-margin 2 |
| --dragstr-het-hom-ratio 2 | --smith-waterman-dangling-end-match-value 25 |
| --dont-use-dragstr-pair-hmm-scores false | --smith-waterman-dangling-end-mismatch-penalty -50 |
| --pair-hmm-gap-continuation-penalty 10 | --smith-waterman-dangling-end-gap-open-penalty -110 |
| --expected-mismatch-rate-for-read-disqualification 0.02 | --smith-waterman-dangling-end-gap-extend-penalty -6 |
| --pair-hmm-implementation FASTEST_AVAILABLE | --smith-waterman-haplotype-to-reference-match-value 200 |
| --pcr-indel-model CONSERVATIVE | --smith-waterman-haplotype-to-reference-mismatch-penalty -150 |
| --phred-scaled-global-read-mismapping-rate 45 | --smith-waterman-haplotype-to-reference-gap-open-penalty -260 |
| --disable-symmetric-hmm-normalizing false | --smith-waterman-haplotype-to-reference-gap-extend-penalty -11 |
| --disable-cap-base-qualities-to-map-quality false | --smith-waterman-read-to-haplotype-match-value 10 |
| --enable-dynamic-read-disqualification-for-genotyping false | --smith-waterman-read-to-haplotype-mismatch-penalty -15 |
| --dynamic-read-disqualification-threshold 1.0 | --smith-waterman-read-to-haplotype-gap-open-penalty -30 |
| --native-pair-hmm-threads 4 | --smith-waterman-read-to-haplotype-gap-extend-penalty -5 |
| --native-pair-hmm-use-double-precision false | --min-assembly-region-size 50 |
| --flow-hmm-engine-min-indel-adjust 6 | --max-assembly-region-size 300 |
| --flow-hmm-engine-flat-insertion-penatly 45 | --active-probability-threshold 0.002 |
| --flow-hmm-engine-flat-deletion-penatly 45 | --max-prob-propagation-distance 50 |
| --pileup-detection false | --force-active false |
| --pileup-detection-enable-indel-pileup-calling false | --assembly-region-padding 100 |
| --num-artificial-haplotypes-to-add-per-allele 5 | --padding-around-indels 75 |
| --artifical-haplotype-filtering-kmer-size 10 | --padding-around-snps 20 |
| --pileup-detection-snp-alt-threshold 0.1 | --padding-around-strs 75 |
| --pileup-detection-indel-alt-threshold 0.5 | --max-extension-into-assembly-region-padding-legacy 25 |
| --pileup-detection-absolute-alt-depth 0.0 | --max-reads-per-alignment-start 50 |
| --pileup-detection-snp-adjacent-to-assembled-indel-range 5 | --enable-legacy-assembly-region-trimming false |
| --pileup-detection-bad-read-tolerance 0.0 | --interval-set-rule UNION |
| --pileup-detection-proper-pair-read-badness true | --interval-padding 0 |
| --pileup-detection-edit-distance-read-badness-threshold 0.08 | --interval-exclusion-padding 0 |
| --pileup-detection-chimeric-read-badness true | --interval-merging-rule ALL |
| --pileup-detection-template-mean-badness-threshold 0.0 | --read-validation-stringency SILENT |
| --pileup-detection-template-std-badness-threshold 0.0 | --seconds-between-progress-updates 10.0 |
| --bam-writer-type CALLED_HAPLOTYPES | --disable-sequence-dictionary-validation false |
| --dont-use-soft-clipped-bases false | --create-output-bam-index true |
| --override-fragment-softclip-check false | --create-output-bam-md5 false |
| --min-base-quality-score 10 | --create-output-variant-index true |
| --smith-waterman JAVA | --create-output-variant-md5 false |
| --emit-ref-confidence NONE | --max-variants-per-shard 0 |
| --force-call-filtered-alleles false | --lenient false |
| --reference-model-deletion-quality 30 | --add-output-sam-program-record true |
| --soft-clip-low-quality-ends false | --add-output-vcf-command-line true |
| --allele-informative-reads-overlap-margin 2 | --cloud-prefetch-buffer 40 |
| --smith-waterman-dangling-end-match-value 25 | --cloud-index-prefetch-buffer -1 |
| --smith-waterman-dangling-end-mismatch-penalty -50 | --disable-bam-index-caching false |
| --smith-waterman-dangling-end-gap-open-penalty -110 | --sites-only-vcf-output false |
| --smith-waterman-dangling-end-gap-extend-penalty -6 | --help false |
| --smith-waterman-haplotype-to-reference-match-value 200 | --version false |
| --smith-waterman-haplotype-to-reference-mismatch-penalty -150 | --showHidden false |
| --smith-waterman-haplotype-to-reference-gap-open-penalty -260 | --QUIET false |
| --smith-waterman-haplotype-to-reference-gap-extend-penalty -11 | --use-jdk-deflater false |
| --smith-waterman-read-to-haplotype-match-value 10 | --use-jdk-inflater false |
| --smith-waterman-read-to-haplotype-mismatch-penalty -15 | --gcs-max-retries 20 |
| --smith-waterman-read-to-haplotype-gap-open-penalty -30 | --gcs-project-for-requester-pays |
| --smith-waterman-read-to-haplotype-gap-extend-penalty -5 | --disable-tool-default-read-filters false |
| --flow-assembly-collapse-hmer-size 0 | --minimum-mapping-quality 20 |
| --flow-assembly-collapse-partial-mode false | --disable-tool-default-annotations false |
| --flow-filter-alleles false | --enable-all-annotations false |
| --flow-filter-alleles-qual-threshold 30.0 | --allow-old-rms-mapping-quality-annotation-data false |
| --flow-filter-alleles-sor-threshold 3.0 | Version="4.2.4.1",Date="December 3, 2022 at 1:20:38 AM CET"> |
| --flow-filter-lone-alleles false |
| --flow-filter-alleles-debug-graphs false |
| --min-assembly-region-size 50 |
| --max-assembly-region-size 300 |
| --active-probability-threshold 0.002 |
| --max-prob-propagation-distance 50 |
| --force-active false |
| --assembly-region-padding 100 |
| --padding-around-indels 75 |
| --padding-around-snps 20 |
| --padding-around-strs 75 |
| --max-extension-into-assembly-region-padding-legacy 25 |
| --max-reads-per-alignment-start 50 |
| --enable-legacy-assembly-region-trimming false |
| --interval-set-rule UNION |
| --interval-padding 0 |
| --interval-exclusion-padding 0 |
| --interval-merging-rule ALL |
| --read-validation-stringency SILENT |
| --seconds-between-progress-updates 10.0 |
| --disable-sequence-dictionary-validation false |
| --create-output-bam-index true |
| --create-output-bam-md5 false |
| --create-output-variant-index true |
| --create-output-variant-md5 false |
| --max-variants-per-shard 0 |
| --lenient false |
| --add-output-sam-program-record true |
| --add-output-vcf-command-line true |
| --cloud-prefetch-buffer 40 |
| --cloud-index-prefetch-buffer -1 |
| --disable-bam-index-caching false |
| --sites-only-vcf-output false |
| --help false |
| --version false |
| --showHidden false |
| --verbosity INFO |
| --QUIET false |
| --use-jdk-deflater false |
| --use-jdk-inflater false |
| --gcs-max-retries 20 |
| --gcs-project-for-requester-pays |
| --disable-tool-default-read-filters false |
| --minimum-mapping-quality 20 |
| --disable-tool-default-annotations false |
| --enable-all-annotations false |
| --allow-old-rms-mapping-quality-annotation-data false" |
| Version="4.3.0.0",Date="December 16, 2022 at 12:51:03 AM CET"> |
****** KILL [#B] filterDepth: 21 too many
CLOSED: [2023-01-04 Wed 19:16]
$ grep '^NC' out/63003856_S135/variantCalling/haplotypecaller/63003856_S135.vcf | wc -l
1631935
$ grep '^NC' /Work/Groups/bisonex/ref-vcf/63003856_S135.vcf | wc -l
1506894
******** DONE Line counts with gatk 4.3.0: too different
CLOSED: [2023-01-04 Wed 19:11]
#+begin_src sh
gatk --java-options "-Xmx3g" HaplotypeCaller \
--input 63003856_S135.bam \
--output 63003856_S135.vcf.gz \
--reference genomeRef.fna \
--dbsnp dbSNP.gz \
--tmp-dir . \
--max-mnp-distance 2
#+end_src
#+begin_src
$gatk --java-options "-Xmx32g" HaplotypeCaller \
-R $genomeRef \
-I $bamDir/$post_ApplyBQSR \
-O $vcfDir/$post_haplotypecaller \
-D "$dbsnpDir"/GCF_000001405.39.gz \
--max-mnp-distance 2 \
--verbosity WARNING
#+end_src
***** KILL Variant calling
CLOSED: [2023-01-04 Wed 19:16]
****** KILL haplotypecaller
CLOSED: [2023-01-04 Wed 19:15]
******** options ok
options ok
#+begin_src sh
gatk --java-options "-Xmx3g" ApplyBQSR \
--input marked_dups.bam \
--output 63003856_S135.bam \
--reference genomeRef.fna \
--bqsr-recal-file 63003856_S135.table \
--tmp-dir .
#+end_src
#+begin_src
$gatk ApplyBQSR -R $genomeRef \
-I $tmpDir/$post_markDuplicate \
--bqsr-recal-file $tmpDir/$post_BaseRecalibrator \
-O $bamDir/$post_ApplyBQSR \
--verbosity WARNING
#+end_src
missing file
******* options ok
#+begin_src sh
gatk --java-options "-Xmx3g" BaseRecalibrator \
--input marked_dups.bam \
--output 63003856_S135.table \
--reference genomeRef.fna \
--known-sites dbSNP_common.vcf.gz \
--tmp-dir .
#+end_src
****** KILL applybqsr
CLOSED: [2023-01-04 Wed 19:15]
#+begin_src
$gatk BaseRecalibrator \
-I $tmpDir/$post_markDuplicate \
-R $genomeRef \
--known-sites $dbsnpDir/dbSNP_common.vcf.gz \
-O $tmpDir/$post_BaseRecalibrator
#+end_src
$ cd /Work/Users/apraga/bisonex/out/63003856_S135/preprocessing/baserecalibrator
$ sed 's/sample/63003856_S135/' 63003856_S135.table > 63003856_S135.table2
The two files do not have the same error counts, but they are fairly close. In the first table, 3 scores:
#:GATKTable:3:94:%d:%d:%d:;
#:GATKTable:Quantized:Quality quantization map
QualityScore Count QuantizedScore
11 298878631 11
25 542282996 25
34 12846268833 34
vs (reference)
11 298877785 11
25 542282089 25
34 12846264839 34
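The counts in the two tables can be subtracted score by score to put a number on "fairly close". A minimal sketch, assuming the three rows of each table have been copied into two hypothetical files (ours.txt and ref.txt); the helper name count_delta is illustrative and not part of the pipeline:

#+begin_src sh
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical inputs: the quantized-score rows copied from the notes above.
cat > "$tmp/ours.txt" <<'EOF'
11 298878631 11
25 542282996 25
34 12846268833 34
EOF
cat > "$tmp/ref.txt" <<'EOF'
11 298877785 11
25 542282089 25
34 12846264839 34
EOF

# count_delta OURS REF: print "score (ours - ref)" for every score found in both files.
count_delta() {
    awk 'NR == FNR { ref[$1] = $2; next }
         $1 in ref { print $1, $2 - ref[$1] }' "$2" "$1"
}

count_delta "$tmp/ours.txt" "$tmp/ref.txt"
# prints:
# 11 846
# 25 907
# 34 3994
#+end_src

The largest gap (3994 out of roughly 1.28e10 observations at score 34) is a relative difference of about 3e-7.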
****** KILL baserecalibrator
CLOSED: [2023-01-04 Wed 19:15]
#+begin_src sh
gatk --java-options "-Xmx3g" MarkDuplicates \
--INPUT sorted.bam \
--OUTPUT marked_dups.bam \
--METRICS_FILE marked_dups.bam.metrics \
--TMP_DIR . \
--REFERENCE_SEQUENCE genomeRef.fna
#+end_src
#+begin_src
$gatk MarkDuplicates \
-I $tmpDir/$post_cleanSam \
-O $tmpDir/$post_markDuplicate \
-M $tmpDir/"$sample"_marked_dup.metrix \
--CREATE_INDEX true \
--VERBOSITY WARNING
#+end_src
******* Arguments ok
Expected, since no data is removed...
128077211
$ samtools view -c work/46/bd75b4547452af36ee2c6b45362922/63003856_S135
CLOSED: [2022-12-26 Mon 22:27]
Alexis
$ samtools view -c /Work/Groups/bisonex/ref_63003856_S135/63003856_S135_marked_dup.bam
128077207
Ours (no output in out/)
****** DONE mark duplicate
$ samtools flagstat work/57/4b5b4c647b98bb7099c4d1ba24bd75/63003856_S135.bam
128077211 + 0 in total (QC-passed reads + QC-failed reads)
126905130 + 0 primary
0 + 0 secondary
1172081 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
127941054 + 0 mapped (99.89% : N/A)
126768973 + 0 primary mapped (99.89% : N/A)
126905130 + 0 paired in sequencing
63452565 + 0 read1
63452565 + 0 read2
125263664 + 0 properly paired (98.71% : N/A)
126676024 + 0 with itself and mate mapped
92949 + 0 singletons (0.07% : N/A)
979608 + 0 with mate mapped to a different chr
675398 + 0 with mate mapped to a different chr (mapQ>=5)
$ samtools flagstat /Work/Groups/bisonex/ref_63003856_S135/63003856_S135_cleaned.bam
128077207 + 0 in total (QC-passed reads + QC-failed reads)
126905130 + 0 primary
0 + 0 secondary
1172077 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
127941051 + 0 mapped (99.89% : N/A)
126768974 + 0 primary mapped (99.89% : N/A)
126905130 + 0 paired in sequencing
63452565 + 0 read1
63452565 + 0 read2
125263790 + 0 properly paired (98.71% : N/A)
126676026 + 0 with itself and mate mapped
92948 + 0 singletons (0.07% : N/A)
979618 + 0 with mate mapped to a different chr
675412 + 0 with mate mapped to a different chr (mapQ>=5)
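Rather than comparing the two flagstat reports by eye, the metrics that differ can be listed automatically. A minimal sketch, assuming both reports were saved to files (the names ours.flagstat and ref.flagstat and the helper flagstat_delta are hypothetical) and that both list the same metrics in the same order; only a few lines of each report are reproduced here:

#+begin_src sh
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Excerpts of the two reports above (ours, then reference).
cat > "$tmp/ours.flagstat" <<'EOF'
128077211 + 0 in total (QC-passed reads + QC-failed reads)
126905130 + 0 primary
1172081 + 0 supplementary
127941054 + 0 mapped (99.89% : N/A)
EOF
cat > "$tmp/ref.flagstat" <<'EOF'
128077207 + 0 in total (QC-passed reads + QC-failed reads)
126905130 + 0 primary
1172077 + 0 supplementary
127941051 + 0 mapped (99.89% : N/A)
EOF

# flagstat_delta OURS REF: for each metric whose QC-passed count differs,
# print "label: (ours - ref)".
flagstat_delta() {
    paste "$1" "$2" | awk -F '\t' '{
        split($1, a, " "); split($2, b, " ")
        if (a[1] != b[1]) {
            label = $1
            sub(/^[0-9]+ \+ [0-9]+ /, "", label)   # drop the two counts
            sub(/ \(.*$/, "", label)               # drop the trailing "(...)"
            print label ": " (a[1] - b[1])
        }
    }'
}

flagstat_delta "$tmp/ours.flagstat" "$tmp/ref.flagstat"
# prints:
# in total: 4
# supplementary: 4
# mapped: 3
#+end_src

On these numbers the primary count is identical in both files, so the 4 extra records are all supplementary alignments.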
[57/4b5b4c] Cached process > preprocess:GATK4_CLEANSAM (63003856_S135)
$ samtools view -c work/57/4b5b4c647b98bb7099c4d1ba24bd75/63003856_S135.bam
128077211
And the output
$ samtools view -c out/63003856_S135/preprocessing/clean-sam/sorted.bam
128077211
CLOSED: [2022-12-26 Mon 22:08]
****** DONE Cleaned
****** DONE With gatk 4.2 (Alexis's version): same result
$ samtools view -c mapped/63003856_S135.bam
128077211
#+begin_src
bwa mem \
-R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add' \
-t 24 \
$INDEX \
63003856_S135_R1_001.fastq.gz 63003856_S135_R2_001.fastq.gz \
| samtools sort --threads 24 -o 63003856_S135.bam -
#+end_src
#+begin_src
bwa mem -R "@RG\tID:$sample\tSM:$sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add" -v 2 -t `nproc` $genomeRef ${fastq[0]} ${fastq[1]} | $samtools sort -@ `nproc` -O BAM -o $tmpDir/$post_bwa
#+end_src
****** Checking the arguments: ok
CLOSED: [2022-12-26 Mon 22:03]
Alexis's BAM
$ samtools view -c /Work/Groups/bisonex/ref_63003856_S135/63003856_S135.bam
128077207
Ours
[9f/26cf3d] Cached process > preprocess:BWA_MEM (63003856_S135)
$ samtools view -c work/9f/26cf3deb07b425a3e851be2a7bd782/63003856_S135.bam
Checking the output
$ samtools view -c out/63003856_S135/preprocessing/mapped/63003856_S135.bam
128077211
Small difference (4 reads, i.e. a relative difference of about 3e-8), but according to Alexis, bwa mem is not reproducible, all the more so since we use a parallelized version.
128077211
#+RESULTS:
****** DONE dbSNP and dbSNP common: ok
CLOSED: [2023-01-03 Tue 23:17]
sha256sum GCF_000001405.39.gz
452e1112b6339a9b19821c2a226a8a3ba946e92a47e03e6ae464ef8820ee130d GCF_000001405.39.gz
sha256sum data-alexis-reference/dbSNP/GCF_000001405.39.gz
452e1112b6339a9b19821c2a226a8a3ba946e92a47e03e6ae464ef8820ee130d data-alexis-reference/dbSNP/GCF_000001405.39.gz
sha256sum dbSNP_common.vcf.gz
70dfd9be859c39916598d23b5744cc1fbda04add5840cd90a6d0cd005bd3075b dbSNP_common.vcf.gz
sha256sum data-alexis-reference/dbSNP/dbSNP_common.vcf.gz
70dfd9be859c39916598d23b5744cc1fbda04add5840cd90a6d0cd005bd3075b data-alexis-reference/dbSNP/dbSNP_common.vcf.gz
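The pairwise checksum comparison above can be wrapped in a small helper so that the result is an exit status instead of two hashes to compare by eye. A minimal sketch; the function name same_sha256 and the demo files are hypothetical:

#+begin_src sh
#!/bin/sh
# same_sha256 FILE_A FILE_B: succeed only when both files are readable
# and have the same SHA-256 digest.
same_sha256() {
    a=$(sha256sum < "$1" | cut -d ' ' -f 1)
    b=$(sha256sum < "$2" | cut -d ' ' -f 1)
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Demo on throwaway files.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'same content\n'  > "$tmp/a"
printf 'same content\n'  > "$tmp/b"
printf 'other content\n' > "$tmp/c"

same_sha256 "$tmp/a" "$tmp/b" && echo "a and b: identical"
same_sha256 "$tmp/a" "$tmp/c" || echo "a and c: different"
#+end_src

For the databases above, the call would be of the form same_sha256 dbSNP_common.vcf.gz data-alexis-reference/dbSNP/dbSNP_common.vcf.gz.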
***** TODO Tools
| | Prod | Test |
| VCFtools | 0.1.17 | 0.1.16 |
| bcftools | 1.14 | 1.16 |
| samtools | 1.14 | 1.13 |
| gatk | 4.2.4.1 | 4.3.0.0 |
Test has older versions of VCFtools and samtools, but newer versions of bcftools and (most importantly) GATK.
**** KILL Gatk 4.3.0
CLOSED: [2023-01-04 Wed 19:16]
***** KILL Alignment
CLOSED: [2023-01-04 Wed 19:16]
****** DONE Raw
Dict is ok once the original file name in the UR: field is replaced
#+begin_src sh :dir /ssh:meso:/Work/Groups/bisonex/
sed 's/UR:.*/UR:genomeRef.fna/' data-alexis-reference/genome/GRCh38_latest_genomic.dict > lol.dict
diff lol.dict data/genome/GRCh38.p13/genomeRef.dict
#+end_src
#+RESULTS:
| e0761a7ba5d10de9e7e97fa331667963925531c0199575bcceafbb13c3147e3f | data-alexis-reference/genome/GRCh38_latest_genomic.fna |
| e0761a7ba5d10de9e7e97fa331667963925531c0199575bcceafbb13c3147e3f | data/genome/GRCh38.p13/genomeRef.fna |
******* DONE Flags with the same gatk version (4.2.4.1): ok, identical
CLOSED: [2023-01-04 Wed 19:09]
##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller
| ",Version="4.2.4.1",Date="January 4, 2023 at 1:46:41 AM CET"> | Version="4.2.4.1",Date="December 3, 2022 at 1:20:38 AM CET"> |
| --dbsnp /Work/Users/apraga/bisonex/work/5d/feb81028d262d7701bed0a759ff6f6/dbSNP.gz | --dbsnp /mnt/j/bases_de_donnees/dbSNP/GCF_000001405.39.gz |
| --max-mnp-distance 2 | --max-mnp-distance 2 |
| --output 63003856_S135.vcf.gz | --output /mnt/j/working_directory_pipeline_analyse_exome/vcf/63003856_S135.vcf |
| --input 63003856_S135.bam | --input /mnt/j/working_directory_pipeline_analyse_exome/bam/63003856_S135_recalibrated_hg38.bam |
| --reference genomeRef.fna | --reference /mnt/j/bases_de_donnees/genome/GRCh38_latest_genomic.fna |
| --tmp-dir . | --verbosity WARNING |
| --use-posteriors-to-calculate-qual false | --use-posteriors-to-calculate-qual false |
| --dont-use-dragstr-priors false | --dont-use-dragstr-priors false |
| --use-new-qual-calculator true | --use-new-qual-calculator true |
| --annotate-with-num-discovered-alleles false | --annotate-with-num-discovered-alleles false |
| --heterozygosity 0.001 | --heterozygosity 0.001 |
| --indel-heterozygosity 1.25E-4 | --indel-heterozygosity 1.25E-4 |
| --heterozygosity-stdev 0.01 | --heterozygosity-stdev 0.01 |
| --standard-min-confidence-threshold-for-calling 30.0 | --standard-min-confidence-threshold-for-calling 30.0 |
| --max-alternate-alleles 6 | --max-alternate-alleles 6 |
| --max-genotype-count 1024 | --max-genotype-count 1024 |
| --sample-ploidy 2 | --sample-ploidy 2 |
| --num-reference-samples-if-no-call 0 | --num-reference-samples-if-no-call 0 |
| --genotype-assignment-method USE_PLS_TO_ASSIGN | --genotype-assignment-method USE_PLS_TO_ASSIGN |
| --contamination-fraction-to-filter 0.0 | --contamination-fraction-to-filter 0.0 |
| --output-mode EMIT_VARIANTS_ONLY | --output-mode EMIT_VARIANTS_ONLY |
| --all-site-pls false | --all-site-pls false |
| --gvcf-gq-bands 1 | --gvcf-gq-bands 1 |
| --gvcf-gq-bands 2 | --gvcf-gq-bands 2 |
| --gvcf-gq-bands 3 | --gvcf-gq-bands 3 |
| --gvcf-gq-bands 4 | --gvcf-gq-bands 4 |
| --gvcf-gq-bands 5 | --gvcf-gq-bands 5 |
| --gvcf-gq-bands 6 | --gvcf-gq-bands 6 |
| --gvcf-gq-bands 7 | --gvcf-gq-bands 7 |
| --gvcf-gq-bands 8 | --gvcf-gq-bands 8 |
| --gvcf-gq-bands 9 | --gvcf-gq-bands 9 |
| --gvcf-gq-bands 10 | --gvcf-gq-bands 10 |
| --gvcf-gq-bands 11 | --gvcf-gq-bands 11 |
| --gvcf-gq-bands 12 | --gvcf-gq-bands 12 |
| --gvcf-gq-bands 13 | --gvcf-gq-bands 13 |
| --gvcf-gq-bands 14 | --gvcf-gq-bands 14 |
| --gvcf-gq-bands 15 | --gvcf-gq-bands 15 |
| --gvcf-gq-bands 16 | --gvcf-gq-bands 16 |
| --gvcf-gq-bands 17 | --gvcf-gq-bands 17 |
| --gvcf-gq-bands 18 | --gvcf-gq-bands 18 |
| --gvcf-gq-bands 19 | --gvcf-gq-bands 19 |
| --gvcf-gq-bands 20 | --gvcf-gq-bands 20 |
| --gvcf-gq-bands 21 | --gvcf-gq-bands 21 |
| --gvcf-gq-bands 22 | --gvcf-gq-bands 22 |
| --gvcf-gq-bands 23 | --gvcf-gq-bands 23 |
| --gvcf-gq-bands 24 | --gvcf-gq-bands 24 |
| --gvcf-gq-bands 25 | --gvcf-gq-bands 25 |
| --gvcf-gq-bands 26 | --gvcf-gq-bands 26 |
| --gvcf-gq-bands 27 | --gvcf-gq-bands 27 |
| --gvcf-gq-bands 28 | --gvcf-gq-bands 28 |
| --gvcf-gq-bands 29 | --gvcf-gq-bands 29 |
| --gvcf-gq-bands 30 | --gvcf-gq-bands 30 |
| --gvcf-gq-bands 31 | --gvcf-gq-bands 31 |
| --gvcf-gq-bands 32 | --gvcf-gq-bands 32 |
| --gvcf-gq-bands 33 | --gvcf-gq-bands 33 |
| --gvcf-gq-bands 34 | --gvcf-gq-bands 34 |
| --gvcf-gq-bands 35 | --gvcf-gq-bands 35 |
| --gvcf-gq-bands 36 | --gvcf-gq-bands 36 |
| --gvcf-gq-bands 37 | --gvcf-gq-bands 37 |
| --gvcf-gq-bands 38 | --gvcf-gq-bands 38 |
| --gvcf-gq-bands 39 | --gvcf-gq-bands 39 |
| --gvcf-gq-bands 40 | --gvcf-gq-bands 40 |
| --gvcf-gq-bands 41 | --gvcf-gq-bands 41 |
| --gvcf-gq-bands 42 | --gvcf-gq-bands 42 |
| --gvcf-gq-bands 43 | --gvcf-gq-bands 43 |
| --gvcf-gq-bands 44 | --gvcf-gq-bands 44 |
| --gvcf-gq-bands 45 | --gvcf-gq-bands 45 |
| --gvcf-gq-bands 46 | --gvcf-gq-bands 46 |
| --gvcf-gq-bands 47 | --gvcf-gq-bands 47 |
| --gvcf-gq-bands 48 | --gvcf-gq-bands 48 |
| --gvcf-gq-bands 49 | --gvcf-gq-bands 49 |
| --gvcf-gq-bands 50 | --gvcf-gq-bands 50 |
| --gvcf-gq-bands 51 | --gvcf-gq-bands 51 |
| --gvcf-gq-bands 52 | --gvcf-gq-bands 52 |
| --gvcf-gq-bands 53 | --gvcf-gq-bands 53 |
| --gvcf-gq-bands 54 | --gvcf-gq-bands 54 |
| --gvcf-gq-bands 55 | --gvcf-gq-bands 55 |
| --gvcf-gq-bands 56 | --gvcf-gq-bands 56 |
| --gvcf-gq-bands 57 | --gvcf-gq-bands 57 |
| --gvcf-gq-bands 58 | --gvcf-gq-bands 58 |
| --gvcf-gq-bands 59 | --gvcf-gq-bands 59 |
| --gvcf-gq-bands 60 | --gvcf-gq-bands 60 |
| --gvcf-gq-bands 70 | --gvcf-gq-bands 70 |
| --gvcf-gq-bands 80 | --gvcf-gq-bands 80 |
| --gvcf-gq-bands 90 | --gvcf-gq-bands 90 |
| --gvcf-gq-bands 99 | --gvcf-gq-bands 99 |
| --floor-blocks false | --floor-blocks false |
| --indel-size-to-eliminate-in-ref-model 10 | --indel-size-to-eliminate-in-ref-model 10 |
| --disable-optimizations false | --disable-optimizations false |
| --dragen-mode false | --dragen-mode false |
| --apply-bqd false | --apply-bqd false |
| --apply-frd false | --apply-frd false |
| --disable-spanning-event-genotyping false | --disable-spanning-event-genotyping false |
| --transform-dragen-mapping-quality false | --transform-dragen-mapping-quality false |
| --mapping-quality-threshold-for-genotyping 20 | --mapping-quality-threshold-for-genotyping 20 |
| --max-effective-depth-adjustment-for-frd 0 | --max-effective-depth-adjustment-for-frd 0 |
| --just-determine-active-regions false | --just-determine-active-regions false |
| --dont-genotype false | --dont-genotype false |
| --do-not-run-physical-phasing false | --do-not-run-physical-phasing false |
| --do-not-correct-overlapping-quality false | --do-not-correct-overlapping-quality false |
| --use-filtered-reads-for-annotations false | --use-filtered-reads-for-annotations false |
| --adaptive-pruning false | --adaptive-pruning false |
| --do-not-recover-dangling-branches false | --do-not-recover-dangling-branches false |
| --recover-dangling-heads false | --recover-dangling-heads false |
| --kmer-size 10 | --kmer-size 10 |
| --kmer-size 25 | --kmer-size 25 |
| --dont-increase-kmer-sizes-for-cycles false | --dont-increase-kmer-sizes-for-cycles false |
| --allow-non-unique-kmers-in-ref false | --allow-non-unique-kmers-in-ref false |
| --num-pruning-samples 1 | --num-pruning-samples 1 |
| --min-dangling-branch-length 4 | --min-dangling-branch-length 4 |
| --recover-all-dangling-branches false | --recover-all-dangling-branches false |
| --max-num-haplotypes-in-population 128 | --max-num-haplotypes-in-population 128 |
| --min-pruning 2 | --min-pruning 2 |
| --adaptive-pruning-initial-error-rate 0.001 | --adaptive-pruning-initial-error-rate 0.001 |
| --pruning-lod-threshold 2.302585092994046 | --pruning-lod-threshold 2.302585092994046 |
| --pruning-seeding-lod-threshold 9.210340371976184 | --pruning-seeding-lod-threshold 9.210340371976184 |
| --max-unpruned-variants 100 | --max-unpruned-variants 100 |
| --linked-de-bruijn-graph false | --linked-de-bruijn-graph false |
| --disable-artificial-haplotype-recovery false | --disable-artificial-haplotype-recovery false |
| --enable-legacy-graph-cycle-detection false | --enable-legacy-graph-cycle-detection false |
| --debug-assembly false | --debug-assembly false |
| --debug-graph-transformations false | --debug-graph-transformations false |
| --capture-assembly-failure-bam false | --capture-assembly-failure-bam false |
| --num-matching-bases-in-dangling-end-to-recover -1 | --num-matching-bases-in-dangling-end-to-recover -1 |
| --error-correction-log-odds -Infinity | --error-correction-log-odds -Infinity |
| --error-correct-reads false | --error-correct-reads false |
| --kmer-length-for-read-error-correction 25 | --kmer-length-for-read-error-correction 25 |
| --min-observations-for-kmer-to-be-solid 20 | --min-observations-for-kmer-to-be-solid 20 |
| --base-quality-score-threshold 18 | --base-quality-score-threshold 18 |
| --dragstr-het-hom-ratio 2 | --dragstr-het-hom-ratio 2 |
| --dont-use-dragstr-pair-hmm-scores false | --dont-use-dragstr-pair-hmm-scores false |
| --pair-hmm-gap-continuation-penalty 10 | --pair-hmm-gap-continuation-penalty 10 |
| --expected-mismatch-rate-for-read-disqualification 0.02 | --expected-mismatch-rate-for-read-disqualification 0.02 |
| --pair-hmm-implementation FASTEST_AVAILABLE | --pair-hmm-implementation FASTEST_AVAILABLE |
| --pcr-indel-model CONSERVATIVE | --pcr-indel-model CONSERVATIVE |
| --phred-scaled-global-read-mismapping-rate 45 | --phred-scaled-global-read-mismapping-rate 45 |
| --disable-symmetric-hmm-normalizing false | --disable-symmetric-hmm-normalizing false |
| --disable-cap-base-qualities-to-map-quality false | --disable-cap-base-qualities-to-map-quality false |
| --enable-dynamic-read-disqualification-for-genotyping false | --enable-dynamic-read-disqualification-for-genotyping false |
| --dynamic-read-disqualification-threshold 1.0 | --dynamic-read-disqualification-threshold 1.0 |
| --native-pair-hmm-threads 4 | --native-pair-hmm-threads 4 |
| --native-pair-hmm-use-double-precision false | --native-pair-hmm-use-double-precision false |
| --bam-writer-type CALLED_HAPLOTYPES | --bam-writer-type CALLED_HAPLOTYPES |
| --dont-use-soft-clipped-bases false | --dont-use-soft-clipped-bases false |
| --min-base-quality-score 10 | --min-base-quality-score 10 |
| --smith-waterman JAVA | --smith-waterman JAVA |
| --emit-ref-confidence NONE | --emit-ref-confidence NONE |
| --force-call-filtered-alleles false | --force-call-filtered-alleles false |
| --soft-clip-low-quality-ends false | --soft-clip-low-quality-ends false |
| --allele-informative-reads-overlap-margin 2 | --allele-informative-reads-overlap-margin 2 |
| --smith-waterman-dangling-end-match-value 25 | --smith-waterman-dangling-end-match-value 25 |
| --smith-waterman-dangling-end-mismatch-penalty -50 | --smith-waterman-dangling-end-mismatch-penalty -50 |
| --smith-waterman-dangling-end-gap-open-penalty -110 | --smith-waterman-dangling-end-gap-open-penalty -110 |
| --smith-waterman-dangling-end-gap-extend-penalty -6 | --smith-waterman-dangling-end-gap-extend-penalty -6 |
| --smith-waterman-haplotype-to-reference-match-value 200 | --smith-waterman-haplotype-to-reference-match-value 200 |
| --smith-waterman-haplotype-to-reference-mismatch-penalty -150 | --smith-waterman-haplotype-to-reference-mismatch-penalty -150 |
| --smith-waterman-haplotype-to-reference-gap-open-penalty -260 | --smith-waterman-haplotype-to-reference-gap-open-penalty -260 |
| --smith-waterman-haplotype-to-reference-gap-extend-penalty -11 | --smith-waterman-haplotype-to-reference-gap-extend-penalty -11 |
| --smith-waterman-read-to-haplotype-match-value 10 | --smith-waterman-read-to-haplotype-match-value 10 |
| --smith-waterman-read-to-haplotype-mismatch-penalty -15 | --smith-waterman-read-to-haplotype-mismatch-penalty -15 |
| --smith-waterman-read-to-haplotype-gap-open-penalty -30 | --smith-waterman-read-to-haplotype-gap-open-penalty -30 |
| --smith-waterman-read-to-haplotype-gap-extend-penalty -5 | --smith-waterman-read-to-haplotype-gap-extend-penalty -5 |
| --min-assembly-region-size 50 | --min-assembly-region-size 50 |
| --max-assembly-region-size 300 | --max-assembly-region-size 300 |
| --active-probability-threshold 0.002 | --active-probability-threshold 0.002 |
| --max-prob-propagation-distance 50 | --max-prob-propagation-distance 50 |
| --force-active false | --force-active false |
| --assembly-region-padding 100 | --assembly-region-padding 100 |
| --padding-around-indels 75 | --padding-around-indels 75 |
| --padding-around-snps 20 | --padding-around-snps 20 |
| --padding-around-strs 75 | --padding-around-strs 75 |
| --max-extension-into-assembly-region-padding-legacy 25 | --max-extension-into-assembly-region-padding-legacy 25 |
| --max-reads-per-alignment-start 50 | --max-reads-per-alignment-start 50 |
| --enable-legacy-assembly-region-trimming false | --enable-legacy-assembly-region-trimming false |
| --interval-set-rule UNION | --interval-set-rule UNION |
| --interval-padding 0 | --interval-padding 0 |
| --interval-exclusion-padding 0 | --interval-exclusion-padding 0 |
| --interval-merging-rule ALL | --interval-merging-rule ALL |
| --read-validation-stringency SILENT | --read-validation-stringency SILENT |
| --seconds-between-progress-updates 10.0 | --seconds-between-progress-updates 10.0 |
| --disable-sequence-dictionary-validation false | --disable-sequence-dictionary-validation false |
| --create-output-bam-index true | --create-output-bam-index true |
| --create-output-bam-md5 false | --create-output-bam-md5 false |
| --create-output-variant-index true | --create-output-variant-index true |
| --create-output-variant-md5 false | --create-output-variant-md5 false |
| --max-variants-per-shard 0 | --max-variants-per-shard 0 |
| --lenient false | --lenient false |
| --add-output-sam-program-record true | --add-output-sam-program-record true |
| --add-output-vcf-command-line true | --add-output-vcf-command-line true |
| --cloud-prefetch-buffer 40 | --cloud-prefetch-buffer 40 |
| --cloud-index-prefetch-buffer -1 | --cloud-index-prefetch-buffer -1 |
| --disable-bam-index-caching false | --disable-bam-index-caching false |
| --sites-only-vcf-output false | --sites-only-vcf-output false |
| --help false | --help false |
| --version false | --version false |
| --showHidden false | --showHidden false |
| --verbosity INFO | |
| --QUIET false | --QUIET false |
| --use-jdk-deflater false | --use-jdk-deflater false |
| --use-jdk-inflater false | --use-jdk-inflater false |
| --gcs-max-retries 20 | --gcs-max-retries 20 |
| --gcs-project-for-requester-pays | --gcs-project-for-requester-pays |
| --disable-tool-default-read-filters false | --disable-tool-default-read-filters false |
| --minimum-mapping-quality 20 | --minimum-mapping-quality 20 |
| --disable-tool-default-annotations false | --disable-tool-default-annotations false |
| --enable-all-annotations false | --enable-all-annotations false |
| --allow-old-rms-mapping-quality-annotation-data false | --allow-old-rms-mapping-quality-annotation-data false |
****** filter depth: still the same difference...
$ grep '^NC' filter-depth.vcf | wc -l
82054
$ zgrep '^NC' /Work/Groups/bisonex/ref_63003856_S135/63003856_S135_DP_over_30.vcf.gz | wc -l
82033
Not related to depth: tested with
bcftools filter -i 'FORMAT/DP<=30' filter-depth.vcf
bcftools filter -i 'FORMAT/AD[0:1]<=10' filter-depth.vcf
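Line counts only show that 21 records differ, not which ones. One way to list them is to reduce each VCF to a set of CHROM:POS:REF:ALT keys and compare the sets. A minimal sketch on toy data (the records below are made up for illustration, and the helpers vcf_keys and only_in_first are hypothetical names):

#+begin_src sh
#!/bin/sh
export LC_ALL=C   # same collation order for sort and comm
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Toy inputs: "ours" has one record more than "ref".
printf '#CHROM\tPOS\tID\tREF\tALT\nNC_000001.11\t100\t.\tA\tG\nNC_000001.11\t200\t.\tC\tT\n' > "$tmp/ours.vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\nNC_000001.11\t100\t.\tA\tG\n' > "$tmp/ref.vcf"

# vcf_keys FILE: one sorted, unique CHROM:POS:REF:ALT key per record (header lines skipped).
vcf_keys() {
    awk -F '\t' '!/^#/ { print $1 ":" $2 ":" $4 ":" $5 }' "$1" | sort -u
}

# only_in_first A B: keys present in A but absent from B.
only_in_first() {
    vcf_keys "$1" > "$tmp/first.keys"
    vcf_keys "$2" > "$tmp/second.keys"
    comm -23 "$tmp/first.keys" "$tmp/second.keys"
}

only_in_first "$tmp/ours.vcf" "$tmp/ref.vcf"   # prints NC_000001.11:200:C:T
#+end_src

For the real comparison, the reference file is gzip-compressed and would first need to be decompressed (for example with zcat) into a plain VCF.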
****** Check that applying the 2 filters separately gives the same result: yes
$ bcftools filter -e 'FORMAT/DP<=30' 63003856_S135.vcf.gz | bcftools filter -e 'FORMAT/AD[0:1]<=10' -o two-filters.vcf
$ grep '^NC' two-filters.vcf | wc -l
82054
***** Test bwa sequentially
**** STRT (save) Alexis's version, 24 threads, without downloading the databases, gatk 4.2.4.1
***** Bwa mem 24 threads: same as the test version...
$ cd /Work/Users/apraga/bisonex/script/files/tmp_63003856_S135
$ samtools view -c 63003856_S135.bam
128077211
$ samtools view -c /Work/Groups/bisonex/ref_63003856_S135/63003856_S135.bam
128077207
Keeping only the mapped reads, the difference is minimal
$ samtools view -c -F 260 /Work/Groups/bisonex/ref_63003856_S135/63003856_S135.bam
127941051
$ samtools view -c -F 260 files/tmp_63003856_S135/63003856_S135.bam
127941054
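The value passed to -F is a bit mask of SAM flags, and 260 combines two of them. A minimal sketch that decodes a mask into flag names (bit values as defined in the SAM specification; the helper decode_flags and its lowercase labels are illustrative):

#+begin_src sh
#!/bin/sh
# decode_flags MASK: print the comma-separated names of the SAM flag bits set in MASK.
decode_flags() {
    mask=$1
    out=""
    for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
                16:reverse 32:mate_reverse 64:read1 128:read2 \
                256:secondary 512:qc_fail 1024:duplicate 2048:supplementary; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $(( mask & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
    done
    printf '%s\n' "$out"
}

decode_flags 260   # prints unmapped,secondary
#+end_src

So -F 260 drops unmapped and secondary records but keeps supplementary ones, which is consistent with the counts above matching the "mapped" line of the flagstat reports (127941054 and 127941051). The samtools flags subcommand should give the same decoding.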
**** Stats (save)
post_cleanSam = _cleaned.bam
post_markDuplicate = _marked_dup.bam
post_BaseRecalibrator = _recal.table
post_ApplyBQSR = _recalibrated_hg38.bam
post_haplotypecaller = .vcf
post_depth_filter = _DP_over_30.vcf
post_exclude_SNP = _DP_over_30_not_SNP
post_consensual = _DP_over_30_not_SNP_consensual_sequence.vcf
post_technical = _DP_over_30_not_SNP_consensual_sequence_not_technical.vcf.gz
2 complementary methods
$ grep '^NC' /Work/Groups/bisonex/ref-vcf/63003856_S135.vcf | wc -l
1506894
$ grep '^NC' /Work/Groups/bisonex/ref-vcf/63003856_S135.vcf -c
1506894
Caution: the test version produces a vcf.gz! Do not use the vcf
$ find . -name 63003856_S135.vcf.gz -exec sh -c "echo {}; zgrep '^NC' {} | wc -l " \;
./63003856_S135-gatk4.2.4.1/variantCalling/haplotypecaller/63003856_S135.vcf.gz
1506931
./63003856_S135-gatk-4.3.0.0/variantCalling/haplotypecaller/63003856_S135.vcf.gz
1506931
./63003856_S135-sequential-gatk-4.2.4.1/variantCalling/haplotypecaller/63003856_S135.vcf.gz
1506919
$ find . -name filter-depth.vcf -exec grep -H -c '^NC' {} \;
./63003856_S135-gatk4.2.4.1/variantCalling/filter-depth.vcf:82054
./63003856_S135-gatk-4.3.0.0/variantCalling/filter-depth.vcf:82054
./63003856_S135-sequential-gatk-4.2.4.1/variantCalling/filter-depth.vcf:82050
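Since some outputs are plain VCF and others are gzip-compressed, a single counting helper avoids picking the wrong tool for a given file. A minimal sketch (the name count_nc and the toy files are hypothetical):

#+begin_src sh
#!/bin/sh
# count_nc FILE: count the records whose line starts with "NC",
# reading through gzip when the file name ends in .gz.
count_nc() {
    case "$1" in
        *.gz) gzip -cd "$1" | grep -c '^NC' ;;
        *)    grep -c '^NC' "$1" ;;
    esac
}

# Demo on a toy file, counted once plain and once compressed.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '#CHROM\tPOS\nNC_000001.11\t100\nNC_000002.12\t200\n' > "$tmp/toy.vcf"
gzip -c "$tmp/toy.vcf" > "$tmp/toy.vcf.gz"

count_nc "$tmp/toy.vcf"      # prints 2
count_nc "$tmp/toy.vcf.gz"   # prints 2
#+end_src

It can be combined with find in the same way as the commands above, for example by looping over the paths that find prints.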
| | prod | pseudo-prod | pseudo-prod | test | test | test |
|------------------------------------------------------+-----------+-------------+-------------+----------+----------+------------|
| | | parallel | sequential | parallel | parallel | sequential |
| gatk | | 4.2.4.1 | | 4.2.4.1 | 4.3.0 | 4.2.4.1 |
|------------------------------------------------------+-----------+-------------+-------------+----------+----------+------------|
| bwa mem | 128077207 | 128077211 | | | | |
| cleanSam | 128077207 | | | | | |
| applybqsr | | 128077211 | | | | |
| haplotypecaller | 1506894 | 1506931 | | 1506931 | 1506931 | 1506919 |
| DP_over_30 | 82033 | 82054 | | 82054 | 82054 | 82050 |
| DP_over_30_not_SNP | 8864 | 8884 | | | | |
| DP_over_30_not_SNP_consensual_sequence | 8864 | 8884 | | 8898 | 8898 | 8900 |
| DP_over_30_not_SNP_consensual_sequence_not_technical | 6478 | | | | | |
Test sequential, then a specific bwa mem version
There is a difference in the non-pathogenic ClinVar IDs!
In the end, this explains the difference between pseudo-prod and test (but not prod)
$ grep -c '^NC' ours.recode.vcf
8898
$ grep -c '^NC' lol.recode.recode.vcf
8884
** Miscellaneous
*** DONE Check the number of reads, fastq vs bam
CLOSED: [2022-10-09 Sun 22:31]
* Improvements
** TODO Quality score recalibration with a set of files
See GATK best practices
** KILL Use T2T (telomere-to-telomere) as the reference
CLOSED: [2023-01-01 Sun 21:35]
Seems complicated with the new databases
** TODO Excel macro
** TODO Use the ClinVar XML
Extraction to VCF is possible with
https://github.com/SeqOne/clinvcf
** Annotation
Full list
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9252745/
*** TODO Use a lightweight version of gnomAD (a single column)
*** TODO Digenic inheritance (cf. OMIM nomenclature)
It is in the disease name
* HOLD Implement other pipelines
See https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04407-x
** KILL GATK
CLOSED: [2022-11-11 Fri 20:01]
https://broadinstitute.github.io/warp/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README
In principle, follows best practices
** KILL Try Snakemake with best practices
https://github.com/snakemake-workflows/dna-seq-gatk-variant-calling/blob/main/.github/workflows/main.yml
Install Mamba (micromamba does not work under nix)
Does not work under WSL2... MultiQC is not recent enough
Version problems...
** KILL Sarek
CLOSED: [2022-12-11 Sun 11:09]
*** Dependencies
**** Nix
#+begin_src sh
nix profile install nixpkgs#mosdepth nixpkgs#python3
nix-shell -p python310Packages.pyyaml --run "nextflow run nf-core/sarek -profile test --executor slurm --queue smp --outdir test -resume"
#+end_src
***** KILL Nix derivation for the full profile
CLOSED: [2022-12-11 Sun 11:09]
**** KILL Without nix
CLOSED: [2022-09-24 Sat 10:20]
Using conda
#+begin_src sh
module unload nix
module load anaconda3@2021.05/gcc-12.1.0
module load nextflow@22.04.0/gcc-12.1.0
module load openjdk@11.0.14.1_1/gcc-12.1.0
nextflow run nf-core/sarek -profile conda,test --executor slurm --queue smp --outdir test -resume
#+end_src