apraga/org: notes/bisonex/performances.org

* Performance
:PROPERTIES:
:CUSTOM_ID: performances
:END:
Statistics for the whole pipeline:
- show the timeline report (built from the trace report)
- look at the execution report:
  https://www.nextflow.io/docs/latest/tracing.html#execution-report

** BWA mem
:PROPERTIES:
:CUSTOM_ID: bwa-mem
:END:
The simplest approach is to use hyperfine to launch successive runs
while varying the number of threads.

Testing every configuration several times takes too long. Test on 3
runs in sequential mode (50G of memory):

#+begin_example
  Time (mean ± σ):     13725.110 s ± 42.090 s    [User: 13870.522 s, System: 12.543 s]
  Range (min … max):   13677.219 s … 13756.226 s    3 runs
#+end_example
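
To put the spread of those three runs in perspective, the standard
deviation can be related to the mean. A minimal sketch, with the values
copied from the hyperfine output above (the variable names are only
illustrative):

#+begin_src python
# Values copied from the hyperfine output above (3 sequential runs).
mean_s = 13725.110
stddev_s = 42.090
min_s, max_s = 13677.219, 13756.226

# Coefficient of variation: relative run-to-run noise.
cv = stddev_s / mean_s
# Relative width of the observed range.
rel_range = (max_s - min_s) / mean_s

print(f"coefficient of variation: {cv:.2%}")
print(f"relative range: {rel_range:.2%}")
#+end_src

The noise is a fraction of a percent of the run time, so timings look
stable from one run to the next.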

Hence a single run per configuration, requesting 32 cores and 50G
(production uses 24 cores and 50G), with thread counts tested in
decreasing order.

#+begin_example
#!/bin/bash -l
# Fichier submission.SBATCH

#SBATCH --job-name="speedup-bwa"
#SBATCH --output=%x.%J.out   ## %x=job name, %J=job id
#SBATCH --error=%x.%J.out
# walltime (hh:mm:ss) max is 8 days
#SBATCH -t 24:00:00
#SBATCH --partition=smp
#SBATCH -c 32 ## request 32 cores (MAX is 32)
#SBATCH --mem=50G ## (MAX is 96G)
#SBATCH --mail-user=apraga@chu-besancon.fr
#SBATCH --mail-type=END,FAIL   # notify when job end/fail 

module load nix/2.11.0

# Requires a Nextflow working directory
cd /Work/Users/apraga/bisonex/work/8c/cf49fd4508404faa6986ca4b211e49
INDEX=$(find -L ./ -name "*.amb" | sed 's/\.amb$//')
# One run for each number of threads
# JSON output is needed to get the stats of each run
# Do not run every configuration; start from the most expensive one

hyperfine  --export-json hyperfine.json -L threads 32,24,16,8,4,2,1 --runs 1 "bwa mem -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add' -t {threads} $INDEX k12\:15b90d6813adf03787af0b2b90d708f1.fastq.gz k12\:9c41c3effaeead78e1becee8a49cb0ea.fastq.gz"
#+end_example

Note: the JSON output is required to get the time of each run and not
only the statistics.

Note: make sure the number of threads is actually substituted in the
command (here through ~-t {threads}~)!

Results are in research/bisonex/code/plot/speedup-bwa.json

#+begin_example
Benchmark 1: bwa mem -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add' -t 32 ./bwa/GCA_000001405.15_GRCh38_full_analysis_set.fna k12\:15b90d6813adf03787af0b2b90d708f1.fastq.gz k12\:9c41c3effaeead78e1becee8a49cb0ea.fastq.gz
  Time (abs ≡):        684.408 s               [User: 18007.678 s, System: 115.024 s]

Benchmark 2: bwa mem -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add' -t 24 ./bwa/GCA_000001405.15_GRCh38_full_analysis_set.fna k12\:15b90d6813adf03787af0b2b90d708f1.fastq.gz k12\:9c41c3effaeead78e1becee8a49cb0ea.fastq.gz
  Time (abs ≡):        760.921 s               [User: 17606.845 s, System: 102.416 s]

Benchmark 3: bwa mem -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add' -t 16 ./bwa/GCA_000001405.15_GRCh38_full_analysis_set.fna k12\:15b90d6813adf03787af0b2b90d708f1.fastq.gz k12\:9c41c3effaeead78e1becee8a49cb0ea.fastq.gz
  Time (abs ≡):        1115.368 s               [User: 17417.684 s, System: 115.360 s]

Benchmark 4: bwa mem -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add' -t 8 ./bwa/GCA_000001405.15_GRCh38_full_analysis_set.fna k12\:15b90d6813adf03787af0b2b90d708f1.fastq.gz k12\:9c41c3effaeead78e1becee8a49cb0ea.fastq.gz
  Time (abs ≡):        2149.281 s               [User: 17083.367 s, System: 91.516 s]

Benchmark 5: bwa mem -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add' -t 4 ./bwa/GCA_000001405.15_GRCh38_full_analysis_set.fna k12\:15b90d6813adf03787af0b2b90d708f1.fastq.gz k12\:9c41c3effaeead78e1becee8a49cb0ea.fastq.gz
  Time (abs ≡):        4262.605 s               [User: 16936.132 s, System: 177.830 s]

Benchmark 6: bwa mem -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add' -t 2 ./bwa/GCA_000001405.15_GRCh38_full_analysis_set.fna k12\:15b90d6813adf03787af0b2b90d708f1.fastq.gz k12\:9c41c3effaeead78e1becee8a49cb0ea.fastq.gz
  Time (abs ≡):        8313.601 s               [User: 16411.816 s, System: 332.537 s]

Benchmark 7: bwa mem -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add' -t 1 ./bwa/GCA_000001405.15_GRCh38_full_analysis_set.fna k12\:15b90d6813adf03787af0b2b90d708f1.fastq.gz k12\:9c41c3effaeead78e1becee8a49cb0ea.fastq.gz
  Time (abs ≡):        15049.006 s               [User: 15075.411 s, System: 134.121 s]

Summary
  bwa mem -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add' -t 32 ./bwa/GCA_000001405.15_GRCh38_full_analysis_set.fna k12\:15b90d6813adf03787af0b2b90d708f1.fastq.gz k12\:9c41c3effaeead78e1becee8a49cb0ea.fastq.gz ran
    1.11 times faster than bwa mem -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add' -t 24 ./bwa/GCA_000001405.15_GRCh38_full_analysis_set.fna k12\:15b90d6813adf03787af0b2b90d708f1.fastq.gz k12\:9c41c3effaeead78e1becee8a49cb0ea.fastq.gz
    1.63 times faster than bwa mem -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add' -t 16 ./bwa/GCA_000001405.15_GRCh38_full_analysis_set.fna k12\:15b90d6813adf03787af0b2b90d708f1.fastq.gz k12\:9c41c3effaeead78e1becee8a49cb0ea.fastq.gz
    3.14 times faster than bwa mem -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add' -t 8 ./bwa/GCA_000001405.15_GRCh38_full_analysis_set.fna k12\:15b90d6813adf03787af0b2b90d708f1.fastq.gz k12\:9c41c3effaeead78e1becee8a49cb0ea.fastq.gz
    6.23 times faster than bwa mem -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add' -t 4 ./bwa/GCA_000001405.15_GRCh38_full_analysis_set.fna k12\:15b90d6813adf03787af0b2b90d708f1.fastq.gz k12\:9c41c3effaeead78e1becee8a49cb0ea.fastq.gz
   12.15 times faster than bwa mem -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add' -t 2 ./bwa/GCA_000001405.15_GRCh38_full_analysis_set.fna k12\:15b90d6813adf03787af0b2b90d708f1.fastq.gz k12\:9c41c3effaeead78e1becee8a49cb0ea.fastq.gz
   21.99 times faster than bwa mem -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add' -t 1 ./bwa/GCA_000001405.15_GRCh38_full_analysis_set.fna k12\:15b90d6813adf03787af0b2b90d708f1.fastq.gz k12\:9c41c3effaeead78e1becee8a49cb0ea.fastq.gz
#+end_example
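
The summary above gives the speedup relative to the fastest run. Here
is a sketch of how speedup and parallel efficiency relative to the
single-thread run could be derived. The entries mimic items of the
~results~ array written by ~hyperfine --export-json~, reduced to the two
fields used here; the wall-clock times are copied from the output above
and ~speedup_table~ is a hypothetical helper name.

#+begin_src python
# Wall-clock times (seconds) copied from the hyperfine output above.
# With the real file, this list would come from json.load(f)["results"].
results = [
    {"mean": 684.408, "parameters": {"threads": "32"}},
    {"mean": 760.921, "parameters": {"threads": "24"}},
    {"mean": 1115.368, "parameters": {"threads": "16"}},
    {"mean": 2149.281, "parameters": {"threads": "8"}},
    {"mean": 4262.605, "parameters": {"threads": "4"}},
    {"mean": 8313.601, "parameters": {"threads": "2"}},
    {"mean": 15049.006, "parameters": {"threads": "1"}},
]


def speedup_table(results):
    """Return (threads, wall_s, speedup, efficiency) rows sorted by thread count."""
    wall = {int(r["parameters"]["threads"]): r["mean"] for r in results}
    base = wall[1]  # the single-thread run is the reference
    return [(n, t, base / t, base / (t * n)) for n, t in sorted(wall.items())]


rows = speedup_table(results)
for n, t, s, e in rows:
    print(f"{n:>2} threads: {t:>10.3f} s  speedup {s:5.2f}  efficiency {e:4.2f}")
#+end_src

With these numbers, 32 threads reach a speedup of about 22 (efficiency
near 0.69), while 24 threads reach about 19.8 (efficiency near 0.82).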

** Haplotypecaller
:PROPERTIES:
:CUSTOM_ID: haplotypecaller
:END:
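For HaplotypeCaller, version 4 no longer offers a general way to specify
the number of threads... Is it really parallelized? At least PairHMM is
parallelized with OpenMP (GATK 4 still exposes
~--native-pair-hmm-threads~ for that step, but the command below does
not set it). We test with: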

#+begin_example
#!/bin/bash -l
# Fichier submission.SBATCH

#SBATCH --job-name="speedup-haplo"
#SBATCH --output=%x.%J.out   ## %x=job name, %J=job id
#SBATCH --error=%x.%J.out
# walltime (hh:mm:ss) max is 8 days
#SBATCH -t 24:00:00
#SBATCH --partition=smp
#SBATCH -c 16 ## request 16 cores (MAX is 32)
#SBATCH --mem=16G ## (MAX is 96G)
#SBATCH --mail-user=apraga@chu-besancon.fr
#SBATCH --mail-type=END,FAIL   # notify when job end/fail 

module load nix/2.11.0

cd /Work/Users/apraga/bisonex/work/d4/5b6f4963533a6319c3c67d4731e511
hyperfine --export-json hyperfine-haplo -L threads 12,6,4,2,1 --runs 1 "gatk --java-options \"-Xmx13107M\" HaplotypeCaller  --input HG001-HiSeq4000-Agilentv7-GRCh38.recal.bam  --output HG001-HiSeq4000-Agilentv7-GRCh38.haplotypecaller.vcf.gz  --reference GCA_000001405.15_GRCh38_full_analysis_set.fna  --dbsnp GCF_000001405.40.gz  --tmp-dir .  --max-mnp-distance 2"
#+end_example

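Results are in research/bisonex/code/plot/speedup-haplo.json

Caveat: the ~threads~ parameter is never referenced in the command
(hyperfine only appends it to the benchmark name, as in
~(threads = 12)~), so all five runs execute the same command. The
timings below therefore measure run-to-run variability (about 2% between
the fastest and the slowest run), not scaling.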

#+begin_example
Benchmark 1: gatk --java-options "-Xmx13107M" HaplotypeCaller  --input HG001-HiSeq4000-Agilentv7-GRCh38.recal.bam  --output HG001-HiSeq4000-Agilentv7-GRCh38.haplotypecaller.vcf.gz  --reference GCA_000001405.15_GRCh38_full_analysis_set.fna  --dbsnp GCF_000001405.40.gz  --tmp-dir .  --max-mnp-distance 2 (threads = 12)
  Time (abs ≡):        14428.832 s               [User: 14399.964 s, System: 213.747 s]

Benchmark 2: gatk --java-options "-Xmx13107M" HaplotypeCaller  --input HG001-HiSeq4000-Agilentv7-GRCh38.recal.bam  --output HG001-HiSeq4000-Agilentv7-GRCh38.haplotypecaller.vcf.gz  --reference GCA_000001405.15_GRCh38_full_analysis_set.fna  --dbsnp GCF_000001405.40.gz  --tmp-dir .  --max-mnp-distance 2 (threads = 6)
  Time (abs ≡):        14164.462 s               [User: 14128.192 s, System: 198.870 s]

Benchmark 3: gatk --java-options "-Xmx13107M" HaplotypeCaller  --input HG001-HiSeq4000-Agilentv7-GRCh38.recal.bam  --output HG001-HiSeq4000-Agilentv7-GRCh38.haplotypecaller.vcf.gz  --reference GCA_000001405.15_GRCh38_full_analysis_set.fna  --dbsnp GCF_000001405.40.gz  --tmp-dir .  --max-mnp-distance 2 (threads = 4)
  Time (abs ≡):        14215.604 s               [User: 14214.320 s, System: 216.795 s]

Benchmark 4: gatk --java-options "-Xmx13107M" HaplotypeCaller  --input HG001-HiSeq4000-Agilentv7-GRCh38.recal.bam  --output HG001-HiSeq4000-Agilentv7-GRCh38.haplotypecaller.vcf.gz  --reference GCA_000001405.15_GRCh38_full_analysis_set.fna  --dbsnp GCF_000001405.40.gz  --tmp-dir .  --max-mnp-distance 2 (threads = 2)
  Time (abs ≡):        14308.381 s               [User: 14294.416 s, System: 213.043 s]

Benchmark 5: gatk --java-options "-Xmx13107M" HaplotypeCaller  --input HG001-HiSeq4000-Agilentv7-GRCh38.recal.bam  --output HG001-HiSeq4000-Agilentv7-GRCh38.haplotypecaller.vcf.gz  --reference GCA_000001405.15_GRCh38_full_analysis_set.fna  --dbsnp GCF_000001405.40.gz  --tmp-dir .  --max-mnp-distance 2 (threads = 1)
  Time (abs ≡):        14136.094 s               [User: 14064.118 s, System: 175.672 s]

Summary
  gatk --java-options "-Xmx13107M" HaplotypeCaller  --input HG001-HiSeq4000-Agilentv7-GRCh38.recal.bam  --output HG001-HiSeq4000-Agilentv7-GRCh38.haplotypecaller.vcf.gz  --reference GCA_000001405.15_GRCh38_full_analysis_set.fna  --dbsnp GCF_000001405.40.gz  --tmp-dir .  --max-mnp-distance 2 (threads = 1) ran
    1.00 times faster than gatk --java-options "-Xmx13107M" HaplotypeCaller  --input HG001-HiSeq4000-Agilentv7-GRCh38.recal.bam  --output HG001-HiSeq4000-Agilentv7-GRCh38.haplotypecaller.vcf.gz  --reference GCA_000001405.15_GRCh38_full_analysis_set.fna  --dbsnp GCF_000001405.40.gz  --tmp-dir .  --max-mnp-distance 2 (threads = 6)
    1.01 times faster than gatk --java-options "-Xmx13107M" HaplotypeCaller  --input HG001-HiSeq4000-Agilentv7-GRCh38.recal.bam  --output HG001-HiSeq4000-Agilentv7-GRCh38.haplotypecaller.vcf.gz  --reference GCA_000001405.15_GRCh38_full_analysis_set.fna  --dbsnp GCF_000001405.40.gz  --tmp-dir .  --max-mnp-distance 2 (threads = 4)
    1.01 times faster than gatk --java-options "-Xmx13107M" HaplotypeCaller  --input HG001-HiSeq4000-Agilentv7-GRCh38.recal.bam  --output HG001-HiSeq4000-Agilentv7-GRCh38.haplotypecaller.vcf.gz  --reference GCA_000001405.15_GRCh38_full_analysis_set.fna  --dbsnp GCF_000001405.40.gz  --tmp-dir .  --max-mnp-distance 2 (threads = 2)
    1.02 times faster than gatk --java-options "-Xmx13107M" HaplotypeCaller  --input HG001-HiSeq4000-Agilentv7-GRCh38.recal.bam  --output HG001-HiSeq4000-Agilentv7-GRCh38.haplotypecaller.vcf.gz  --reference GCA_000001405.15_GRCh38_full_analysis_set.fna  --dbsnp GCF_000001405.40.gz  --tmp-dir .  --max-mnp-distance 2 (threads = 12)
#+end_example
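
One way to check whether HaplotypeCaller actually ran in parallel is to
compare CPU time (user + system) with wall-clock time: the ratio
approximates the average number of busy cores. A minimal sketch with the
values copied from the output above; ~busy_cores~ is a hypothetical
helper name, and the bwa mem run with 32 threads is included only for
contrast.

#+begin_src python
# (wall, user, system) in seconds, keyed by hyperfine's "threads" label.
haplo = {
    12: (14428.832, 14399.964, 213.747),
    6: (14164.462, 14128.192, 198.870),
    4: (14215.604, 14214.320, 216.795),
    2: (14308.381, 14294.416, 213.043),
    1: (14136.094, 14064.118, 175.672),
}


def busy_cores(wall, user, system):
    """Average number of cores kept busy: CPU time divided by wall time."""
    return (user + system) / wall


usage = {label: busy_cores(*times) for label, times in haplo.items()}
for label, cores in usage.items():
    print(f"threads label {label:>2}: {cores:.2f} busy cores")

# bwa mem with -t 32, from the BWA section, for comparison.
bwa32 = busy_cores(684.408, 18007.678, 115.024)
print(f"bwa mem -t 32: {bwa32:.2f} busy cores")
#+end_src

Every HaplotypeCaller run keeps roughly one core busy, against about 26
for bwa mem with 32 threads, which suggests that this HaplotypeCaller
invocation is effectively single-threaded on this dataset.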

* Reproducibility
:PROPERTIES:
:CUSTOM_ID: reproducibility
:END:
We use nix (23.05 release) in a flake, so the versions do not change.
The list of versions is given by:

#+begin_example
nix profile list | awk '{print $4}' | awk -F '-' '{print "|" $2" | "$3}'
#+end_example
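
Splitting on every dash mangles hyphenated package names (such as
rtg-tools), which may explain some gaps in the table below. As a hedged
alternative, here is a sketch that follows the Nix naming convention
(the name ends at the first dash not followed by a letter, as in
~builtins.parseDrvName~). The helper names, the store hash and the
rtg-tools version in the example are hypothetical.

#+begin_src python
import re


def parse_drv_name(drv):
    """Split 'name-version' at the first dash that is not followed by a letter."""
    m = re.search(r"-(?![A-Za-z])", drv)
    if m is None:
        return drv, ""  # no version part found
    return drv[: m.start()], drv[m.end():]


def parse_store_path(path):
    """Extract (name, version) from a /nix/store/<hash>-<name>-<version> path."""
    base = path.rstrip("/").rsplit("/", 1)[-1]
    _hash, _, drv = base.partition("-")  # drop the leading store hash
    return parse_drv_name(drv)


# Hypothetical store path: the hash is a placeholder.
example = "/nix/store/" + "a" * 32 + "-python3-3.10.12"
print(parse_store_path(example))
# Hypothetical version number, used only to show the hyphenated name.
print(parse_drv_name("rtg-tools-1.2.3"))
#+end_src

Names such as ~bwa-unstable-<date>~ would keep ~unstable~ in the name
with this rule, since the dash before it is followed by a letter.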

| Software   | Validated version |
|------------+-------------------|
| R          | 4.2.3             |
| bwa        | unstable          |
| rtg-tools  | ?                 |
| gatk       | 4.4.0.0           |
| spip       |                   |
| awscli2    | 2.11.20           |
| fastqc     |                   |
| hap.py     |                   |
| htslib     | 1.17              |
| multiqc    | 1.15              |
| picard     | ?                 |
| python3    | 3.10.12           |
| zoxide     | 0.9.2             |
| bcftools   | 1.17              |
| bedtools   | 2.31.0            |
| dos2unix   | 7.4.4             |
| ensembl    | perl5.36.0 + ?    |
| mosdepth   | 0.3.3             |
| nextflow   | 22.10.6           |
| samtools   | 1.17              |
| spliceai   | 1.3.1             |
| vcftools   | 0.1.16            |
| ensembl    | perl5.36.0        |
| sratoolkit | 2.11.3            |