apraga/org - Change FM3FVI2I6JNQY5I6FACRNDWHRXVAGQQRZXR2WGYOASM6H56JSK5QC

Bisonex save

Created by Alexis Praga on April 28, 2023

FM3FVI2I6JNQY5I6FACRNDWHRXVAGQQRZXR2WGYOASM6H56JSK5QC

Dependencies

In channels

main

Change contents

Replacement in workout.org at line 6125 [7.1]
B:BD[5.41] → [6.1:7]
```
* RTO
```
[5.41]
[6.7]
```
** RTO
```
Replacement in workout.org at line 6127 [7.1]
B:BD[6.18] → [6.18:26]
```
* L-sit
```
[6.18]
[6.26]
```
** L-sit
```
Replacement in workout.org at line 6129 [7.1]
B:BD[6.30] → [6.30:42]
```
* Muscle-up
```
[6.30]
[6.42]
```
** Muscle-up
```
Replacement in workout.org at line 6133 [7.1]
B:BD[6.85] → [6.85:98]
```
* Extension:
```
[6.85]
[6.98]
```
** Extension:
```
Replacement in workout.org at line 6135 [7.1]
B:BD[6.106] → [6.106:124]
```
* FL tucked row :
```
[6.106]
[6.124]
```
** FL tucked row :
```
Replacement in workout.org at line 6137 [7.1]
B:BD[6.133] → [6.133:145]
```
* Pistols :
```
[6.133]
[6.145]
```
** Pistols :
```
Replacement in workout.org at line 6139 [7.1]
B:BD[6.151] → [6.151:177]
```
* Planche tucked push-up:
```
[6.151]
[6.177]
```
** Planche tucked push-up:
```
Replacement in workout.org at line 6143 [7.1]
B:BD[6.207] → [6.207:222]
```
* Compression:
```
[6.207]
[6.222]
```
** Compression:
```
Replacement in workout.org at line 6145 [7.1]
B:BD[6.229] → [6.229:246]
```
* Norwegian roll
```
[6.229]
[6.246]
```
** Norwegian roll
```

Insertion in workout.org at line 6153 [7.1]

[4.126]

* <2023-04-26 Wed> Workout
** RTO
- 28-15-13
** L-sit
- 3x1
** Muscle-up
- 2+1+1 - 4neg
- 2+2 - 4neg
- 2+2 - 4neg
** Extension:
-  3x22
** FL tucked row :
- 3+2
- 3+2
- 3+2
** Pistols :
- 3x4
** Planche tucked push-up:
- 3+3+3+1
- 3+3+2+2
- 3+3+3+1
** Compression:
- 3x10
** Norwegian roll
- 3x5

Replacement in projects.org at line 2146 [7.123895]

B:BD[8.1242] → [8.1242:1300]

*** TODO Carte personnalisée
SCHEDULED: <2023-04-17 Mon>

[8.1242]

[5.239]

*** DONE Carte personnalisée
CLOSED: [2023-04-26 Wed 21:10] SCHEDULED: <2023-04-17 Mon>

Insertion in projects.org at line 2151 [7.123895]

[9.89]

[10.316]

** WAIT Remboursement TER Grenoble du 9 avril
SCHEDULED: <2023-05-03 Wed>
/Entered on/ [2023-04-26 Wed 21:03]
Demande envoyé <2023-04-26 Wed>

Replacement in projects/bisonex.org at line 2 [12.35]

B:BD[11.74] → [11.74:8206]

B:BD[11.8206] → [13.7:8259]

atforms
 were 100 base pair read length while all Illumina platforms were 150 base pair
 read length). The read length effects, as a key factor between two platforms,
 would bring alignment bias and error which are higher for short reads and
 ultimately affect the variants calling especially the INDELs identification
#+end_quote
*** Débugger variant calling (haplotypecaller)
https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant
https://gatk.broadinstitute.org/hc/en-us/articles/360035891111-Expected-variant-at-a-specific-site-was-not-called
*** Hap.py
Format de sortie :
#+begin_src r
vcf_field_names(vcf, tag = "FORMAT")
#+end_src
#+RESULTS:
: FORMAT BD    1      String  Decision for call (TP/FP/FN/N)
: FORMAT BK    1      String  Sub-type for decision (match/mismatch type)
: FORMAT BVT   1      String  High-level variant type (SNP|INDEL).
: FORMAT BLT   1      String  High-level location type (het|homref|hetalt|homa
am = genotype mismatch
lm = allele/haplotype mismatch
. = non vu
**** On vérifie que am = genotype mismatch
référence  = T/T
high-confidence = T/C
notre = C/C
#+begin_src sh
bcftools filter -i 'POS=19196584'  /Work/Groups/bisonex/data/giab/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz | grep -v '#'
bcftools filter -i 'POS=19196584'  ../out/NA12878_NIST7035-dbsnp/variantCalling/haplotypecaller/NA12878_NIST.vcf.gz | grep -v '#'
#+end_src
#+RESULTS:
: NC_000022.11    19196584        .       T       C       50      PASS    platforms=5;platformnames=Illumina,PacBio,10X,Ion,Solid;datasets=5;datasetnames=HiSeqPE300x,CCS15kb_20kb,10XChromiumLR,IonExome,SolidSE75bp;callsets=7;callsetnames=HiSeqPE300xGATK,CCS15kb_20kbDV,CCS15kb_20kbGATK4,HiSeqPE300xfreebayes,10XLRGATK,IonExomeTVC,SolidSE75GATKHC;datasetsmissingcall=CGnormal;callable=CS_HiSeqPE300xGATK_callable,CS_CCS15kb_20kbDV_callable,CS_10XLRGATK_callable,CS_CCS15kb_20kbGATK4_callable,CS_HiSeqPE300xfreebayes_callable GT:PS:DP:ADALL:AD:GQ    0/1:.:781:109,123:138,150:348
: NC_000022.11    19196584        rs1061325       T       C       59.32   PASS    AC=2;AF=1;AN=2;DB;DP=2;ExcessHet=0;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;QD=29.66;SOR=2.303      GT:AD:DP:GQ:PL  1/1:0,2:2:6:71,6,0
**** On vérifie que lm = allele/haplotype mismatch
référence  = CAA/CAA
high-confidence = CA/CA
notre = C/CA
#+begin_src sh
 bcftools filter -i 'POS=31277416'  /Work/Groups/bisonex/data/giab/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz | grep -v '#'
 bcftools filter -i 'POS=31277416'  ../out/NA12878_NIST7035-dbsnp/variantCalling/haplotypecaller/NA12878_NIST.vcf.gz | grep -v '#'
#+end_src
#+RESULTS:
: NC_000022.11    31277416        .       CA      C       50      PASS    platforms=3;platformnames=Illumina,PacBio,10X;datasets=3;datasetnames=HiSeqPE300x,CCS15kb_20kb,10XChromiumLR;callsets=4;callsetnames=HiSeqPE300xGATK,CCS15kb_20kbDV,10XLRGATK,HiSeqPE300xfreebayes;datasetsmissingcall=CCS15kb_20kb,CGnormal,IonExome,SolidSE75bp;callable=CS_HiSeqPE300xGATK_callable;difficultregion=GRCh38_AllHomopolymers_gt6bp_imperfectgt10bp_slop5,GRCh38_SimpleRepeat_imperfecthomopolgt10_slop5  GT:PS:DP:ADALL:AD:GQ    1/1:.:465:16,229:0,190:129
: NC_000022.11    31277416        rs57244615      CAA     C,CA    389.02  PASS    AC=1,1;AF=0.5,0.5;AN=2;BaseQRankSum=0.37;DB;DP=37;ExcessHet=0;FS=0;MLEAC=1,1;MLEAF=0.5,0.5;MQ=60;MQRankSum=0;QD=13.41;ReadPosRankSum=-0.651;SOR=0.572    GT:AD:DP:GQ:PL  1/2:5,10,14:29:64:406,202,313,64,0,88
* Idées
** Validation analytique
mail Yannis : données patients +/- simulées
*** Utiliser données GCAT et uploader le notre ?
https://www.nature.com/articles/ncomms7275
*** [#A] Variant calling : Genome in a bottle : NA12878 + autres
Résumé : https://www.nist.gov/programs-projects/genome-bottle
Manuscript : https://www.nature.com/articles/s41587-019-0054-x.epdf?author_access_token=E_1bL0MtBBwZr91xEsy6B9RgN0jAjWel9jnR3ZoTv0OLNnFBR7rUIZNDXq0DIKdg3w6KhBF8Rz2RWQFFc0St45kC6CZs3cDYc87HNHovbWSOubJHDa9CeJV-pN0BW_mQ0n7cM13KF2JRr_wAAn524w%3D%3D
Article comparant les variant calling : https://www.biorxiv.org/content/10.1101/2020.12.11.422022v1.full.pdf
**** KILL Tester le séquencage aussi
CLOSED: [2023-01-30 lun. 18:30]
Depuis un fastq correspondant à Illumina  https://github.com/genome-in-a-bottle/giab_data_indexes
   puis on compare le VCF avec les "high confidence"
On séquence directement NA12878 -> inutile pour le pipeline seul
**** TODO Tester seul la partie bioinformatique
   Tout résumé ici : https://www.nist.gov/programs-projects/genome-bottle
- methode https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/analysis/Illumina_PlatinumGenomes_NA12877_NA12878_09162015/IlluminaPlatinumGenomes-user-guide.pdf
- vcf
     https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/latest/GRCh38/
NB: à quoi correspond https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/analysis/Illumina_PlatinumGenomes_NA12877_NA12878_09162015/hg38/2.0.1/NA12878/ ??
   Article comparant les variant calling : https://www.biorxiv.org/content/10.1101/2020.12.11.422022v1.full.pdf
   Article pour vcfeval : https://www.nature.com/articles/s41587-019-0054-x
   La version 4 ajoute 273 gènes "clinically relevant" https://www.biorxiv.org/content/10.1101/2021.06.07.444885v3.full.pdf
   Ajout des zones "difficiles"
   https://www.biorxiv.org/content/10.1101/2020.07.24.212712v5.full.pdf
*** [#B] Pipeline : générer patient avec tous les variants retrouvés à Centogene
Comparaison de génération ADN (2019)
https://academic.oup.com/bfg/article/19/1/49/5680294
**** SimuSCop
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03665-5
https://github.com/qasimyu/simuscop
1. Crééer un modèle depuis bam + vcf : Setoprofile
2. Génerer données NGS
** Annotation :
*** Comparaison vep / snpeff et annovar
* Changement nouvelle version
- Dernière version du génome (la version "prête à l'emploi" est seulement GRCh38 sans les version patchées)
* Notes
** Nextflow
*** afficher les résultats d'un process/workflow
#+begin_src
lol.out.view()
#+end_src
Attention, ne fonctionne pas si plusieurs sortie:
#+begin_src
lol.out[0].view()
#+end_src
ou si /a/ est le nom de la sortie
#+begin_src
lol.out.a.view()
#+end_src
** Quelle version du génome ?
Il y a 2 notations pour les chrosome: Refseq (NC_0001) ou chr1, chr2...
dbSNP utilise Refseq
pour le fasta, 2 solutions
- refseq : "https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/${genome}_latest/refseq_identifiers/${fna}.gz"
  -> nécessite d'indexer le fichier (long !)
- chromosome https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/
  -> nécessite d'annoter les chromosomes pour corriger (avec le fichier gff)
  On utilise la version chromosome donc on annote dbSNP (à faire)
** Performances
Ordinateur de Carine (WSL2) : 4h dont 1h15 alignement (parallélisé) et 1h15 haplotypecaller (séquentiel)
** Chromosomes NC, NT, NW
Correspondance :
https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&chromInfoPage=
Signification
https://genome.ucsc.edu/FAQ/FAQdownloads.html#downloadAlt
- alt = séquences alternatives (utilisables)
- fix = patch (correction ou amélioration)
- random = séquence connue sur un chromosome mais non encore utilisée
** Pipelines prêt-à-l’emploi nextflow
Problème : nécessite singularity ou docker (ou conda)
Potentiellement utilisable avec nix...
** Validation : Quelles données de référence ?
Discussion avec Alexis
- Platinum genomes = génome seul
*** [[https://github.com/genome-in-a-bottle/giab_data_indexes][Genome in a bottle]]
  - NA12878 :
    - Illumina HiSeq Exome : fastq + capture
    - Illumina TruSeq Exome : bam, pas de capture
      ici ww
  - HG002,3,4
    - Illumina Whole Exome  : bam. le kit de capture est "Agilent SureSelect Human All Exon V5 kit" selon [[https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/Os
loUniversityHospital_Exome_GATK_jointVC_11242015/README.txt][README]]. On il faut les régions [[https://kb.10xgenomics.com/hc/en-us/articles/115004150923-Where-can-I-find-the-Agilent-Target-BED-files-][selon ce site]]
      Un autre fichier est disponible (capture ???)
    https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/OsloUniversityHospital_Exome_GATK_jointVC_11242015/wex_Agilent_SureSelect_v05_b37.baits.slop50.merged.list
  "target region" +/- 50bp
    testé sur chr311780-312086 : ok
Autres technologies non adaptées au pipeline (vu avec Alexis)
*** [[https://www.illumina.com/platinumgenomes.html][Platinum genome
]] Que du génome « sequenced to 50x depth on a HiSeq 2000 system”
Genome possible
** Zone de capture
GIAB fourni le .bed pour l'exome . INfo : https://support.illumina.com/sequencing/sequencing_kits/nextera-rapid-capture-exome-kit/downloads.html
* Données :data:
** DONE Remplacer bam par fastq sur mesocentre
CLOSED: [2023-04-16 Sun 16:33]
Commande
*** DONE Supprimer les fastq non "paired"
CLOSED: [2023-04-16 Sun 16:33]
nushell
Liste des fastq avec "paired-end" manquant
#+begin_src nu
ls **/*.fastq.gz | get name | path basename | split column "_" | get column1 | uniq -u | save single.txt
#+end_src
#+RESULTS:
: 62907927
: 62907970
: 62899606
: 62911287
: 62913201
: 62914084
: 62915905
: 62921595
: 62923065
: 62925220
: 62926503
: 62926502
: 62926500
: 62926499
: 62926498
: 62931719
: 62943423
: 62943400
: 62948290
: 62949205
: 62949206
: 62949118
: 62951284
: 62960792
: 62960785
: 62960787
: 62960617
: 62962561
: 62962692
: 62967473
: 62972194
: 62979102
On vérifie
#+begin_src nu
open single.txt  | lines | each {|e| ls $"fastq/*_($in)/*" | get 0  }
open single.txt  | lines | each {|e| ls $"fastq/*_($in)/*" | get 0.name }  | path basename | split column "_" | get column1 | uniq -c
#+end_src
On met tous dans un dossier (pas de suppression )
#+begin_src
open single.txt  | lines | each {|e| ls $"fastq/*_($in)/*" | get 0  }  | each {|e| ^mv $e.name bad-fastq/}
#+end_src
On vérifie que les dossiier sont videsj
 open single.txt  | lines | each {|e| ls $"fastq/*_($in)" | get 0.name } | ^ls -l $in
 Puis on supprime
 open single.txt  | lines | each {|e| ls $"fastq/*_($in)" | get 0.name } | ^rm -r $in
*** DONE Supprimer bam qui ont des fastq
CLOSED: [2023-04-16 Sun 16:33]
On liste les identifiants des fastq et bam dans un tableau avec leur type :
#+begin_src
let fastq = (ls fastq/*/*.fastq.gz | get name | parse "{dir}/{full_id}/{id}_{R}_001.fastq.gz"  | select dir id | uniq )
let bam = (ls bam/*/*.bam | get name | parse "{dir}/{full_id}/{id}_{S}.bqrt.bam"  | select dir id)
#+end_src
On groupe les résultat par identifiant (résultats = liste de records qui doit être convertie en table)
et on trie ceux qui n'ont qu'un fastq ou un bam
#+begin_src
let single = ( $bam | append $fastq | group-by id | transpose id files | get files | where {|x| ($x | length) == 1})
#+end_src
On convertit en table et on récupère seulement les bam
#+begin_src
$single | reduce {|it, acc| $acc | append $it} | where dir == bam | get id | each {|e| ^ls $"bam/*_($e)/*.bam"}
#+end_src
#+RESULTS:
: bam/2100656174_62913201/62913201_S52.bqrt.bam
: bam/2100733271_62925220/62925220_S33.bqrt.bam
: bam/2100738763_62926502/62926502_S108.bqrt.bam
: bam/2100746726_62926498/62926498_S105.bqrt.bam
: bam/2100787936_62931955/62931955_S4.bqrt.bam
: bam/2200066374_62948290/62948290_S130.bqrt.bam
: bam/2200074722_62948298/62948298_S131.bqrt.bam
: bam/2200074990_62948306/62948306_S218.bqrt.bam
: bam/2200214581_62967331/62967331_S267.bqrt.bam
: bam/2200225399_62972187/62972187_S85.bqrt.bam
: bam/2200293962_62979117/62979117_S63.bqrt.bam
: bam/2200423985_62999352/62999352_S1.bqrt.bam
: bam/2200495073_63010427/63010427_S20.bqrt.bam
: bam/2200511274_63012586/63012586_S114.bqrt.bam
: bam/2200669188_63036688/63036688_S150.bqrt.bam
* Nouveau workflow :workflow:
** TODO Bases de données
*** KILL Nix pour télécharger les données brutes
**** Conclusion
Non viable sur cluster car en dehors de /nix/store
On peut utiliser des symlink mais trop compliqué
**** KILL Axel au lieu de curl pour gérer les timeout?
CLOSED: [2022-08-19 Fri 15:18]
*** DONE Tester patch de @pennae pour gros fichiers
SCHEDULED: <2022-08-19 Fri>
*** STRT Télécharger les données avec nextflow
**** DONE Genome de référence
**** DONE dbSNP
**** TODO VEP 20G
Ajout vérification checksum -> à vérifier
**** TODO transcriptome (spip)
Rajouter checksum manuel
**** KILL Refseq
**** STRT OMIM
codé, à vérifier
**** TODO ACMG incidental
*** HOLD Processing bases de données
**** DONE dbSNP common
**** DONE Seulement les ID dans dbSNP common !
CLOSED: [2022-11-19 Sat 21:42]
172G au lieu de 253M...
**** HOLD common dbSNP not clinvar patho
***** DONE Conclusion partielle
CLOSED: [2022-12-12 Mon 22:25]
- vcfeval : prometteur mais n'arrive pas à traiter toutes les régions
- isec : trop de problèmes avec
- classif clinvar directement dans dbSNP: le plus simple
  Et ça permet de rattraper quelques erreurs dans le script d'Alexis
***** KILL Utiliser directement le numéro dbSNP dans clinvar ? Non
CLOSED: [2022-11-20 Sun 19:51]
Ex: chr20
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools query -f 'rs%INFO/RS \n' -i 'INFO/RS != "." & INFO/CLNSIG="Pathogenic"' clinvar_chr20.vcf.gz | sort > ID_clinvar_patho.txt
bcftools query -f '%ID\n' dbSNP_common_chr20.vcf.gz | sort > ID_of_common_snp.txt
comm -23 ID_of_common_snp.txt ID_clinvar_patho.txt > ID_of_common_snp_not_clinvar_patho.txt
wc -l ID_of_common_snp_not_clinvar_patho.txt
# sort ID
#+end_src
#+RESULTS:
: 518846 ID_of_common_snp_not_clinvar_patho.txt
Version d'alexis
#+begin_src sh :dir ~/code/bisonex/test_isec
snp=dbSNP_common_chr20.vcf.gz
clinvar=clinvar_chr20_notremapped.vcf.gz
python ../script/pythonScript/clinvar_sbSNP.py \
    --clinvar $clinvar \
    --chrm_name_table ../database/RefSeq/refseq_to_number_only_consensual.txt \
    --dbSNP $snp --output prod.txt
wc -l prod.txt
zgrep '^NC' dbSNP_common_chr20.vcf.gz | wc -l
#+end_src
#+RESULTS:
| 518832 | prod.txt |
| 518846 |          |
***** KILL classification clinvar codée dbSNP ?
CLOSED: [2022-12-04 Sun 14:38]
Sur le chromosome 20
*Attention* CLNSIG a plusieurs champs (séparé par une virgule)
On y accède avec INFO/CLNSIG[*]
Ensuite, chaque item peut avoir plusieurs haploïdie (séparé par un |). IL faut donc utiliser une regexp
NB: *ne pas mettre la condition* dans une variable !!
Pour avoir les clinvar patho, on veut 5 mais pas 255 (= autre) pour la classification !`
Il faut également les likely patho et conflicting
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools query -f '%INFO/CLNSIG\n' dbSNP_common_chr20.vcf.gz -i \
'INFO/CLNSIG[*]~"^5|" | INFO/CLNSIG[*]=="5" | INFO/CLNSIG[*]~"|5" | INFO/CLNSIG[*]~"^4|" | INFO/CLNSIG[*]=="4" | INFO/CLNSIG[*]~"|4" | INFO/CLNSIG[*]~"^12|" | INFO/CLNSIG[*]=="12" | INFO/CLNSIG[*]~"|12"' | sort
#+end_src
#+RESULTS:
| . |  . | 12 |    |   |   |   |   |   |   |   |
| . | 12 |  0 |  2 |   |   |   |   |   |   |   |
| 2 |  3 |  2 |  2 | 2 | 5 | . |   |   |   |   |
| . |  2 |  3 |  2 | 2 | 4 |   |   |   |   |   |
| . |  . |  3 | 12 | 3 |   |   |   |   |   |   |
| . |  5 |  2 |  . |   |   |   |   |   |   |   |
| . |  . |  . |  5 | 2 | 2 |   |   |   |   |   |
| . |  9 |  9 |  9 | 5 | 5 | 2 | 3 | 2 | 3 | 2 |
Si on les exclut :
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools query -f '%ID\n' dbSNP_common_chr20.vcf.gz -e \
'INFO/CLNSIG[*]~"^5|" | INFO/CLNSIG[*]=="5" | INFO/CLNSIG[*]~"|5" | INFO/CLNSIG[*]~"4" | INFO/CLNSIG[*]~"12"' | sort | uniq > common-notpatho.txt
#+end_src
#+RESULTS:
 #+begin_src sh :dir ~/code/bisonex/test_isec
snp=dbSNP_common_chr20.vcf.gz
clinvar=clinvar_chr20_notremapped.vcf.gz
python ../script/pythonScript/clinvar_sbSNP.py \
    --clinvar $clinvar \
    --chrm_name_table ../database/RefSeq/refseq_to_number_only_consensual.txt \
    --dbSNP $snp --output tmp.txt
sort tmp.txt | uniq > common-notpatho-alexis.txt
wc -l common-notpatho-alexis.txt
 #+end_s

[11.74]

[13.8259]

atforms
 were 100 base pair read length while all Illumina platforms were 150 base pair
 read length). The read length effects, as a key factor between two platforms,
 would bring alignment bias and error which are higher for short reads and
 ultimately affect the variants calling especially the INDELs identification
#+end_quote
*** Débugger variant calling (haplotypecaller)
https://gatk.broadinstitute.org/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant
https://gatk.broadinstitute.org/hc/en-us/articles/360035891111-Expected-variant-at-a-specific-site-was-not-called
*** Hap.py
Format de sortie :
#+begin_src r
vcf_field_names(vcf, tag = "FORMAT")
#+end_src
#+RESULTS:
: FORMAT BD    1      String  Decision for call (TP/FP/FN/N)
: FORMAT BK    1      String  Sub-type for decision (match/mismatch type)
: FORMAT BVT   1      String  High-level variant type (SNP|INDEL).
: FORMAT BLT   1      String  High-level location type (het|homref|hetalt|homa
am = genotype mismatch
lm = allele/haplotype mismatch
. = non vu
**** On vérifie que am = genotype mismatch
référence  = T/T
high-confidence = T/C
notre = C/C
#+begin_src sh
bcftools filter -i 'POS=19196584'  /Work/Groups/bisonex/data/giab/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz | grep -v '#'
bcftools filter -i 'POS=19196584'  ../out/NA12878_NIST7035-dbsnp/variantCalling/haplotypecaller/NA12878_NIST.vcf.gz | grep -v '#'
#+end_src
#+RESULTS:
: NC_000022.11    19196584        .       T       C       50      PASS    platforms=5;platformnames=Illumina,PacBio,10X,Ion,Solid;datasets=5;datasetnames=HiSeqPE300x,CCS15kb_20kb,10XChromiumLR,IonExome,SolidSE75bp;callsets=7;callsetnames=HiSeqPE300xGATK,CCS15kb_20kbDV,CCS15kb_20kbGATK4,HiSeqPE300xfreebayes,10XLRGATK,IonExomeTVC,SolidSE75GATKHC;datasetsmissingcall=CGnormal;callable=CS_HiSeqPE300xGATK_callable,CS_CCS15kb_20kbDV_callable,CS_10XLRGATK_callable,CS_CCS15kb_20kbGATK4_callable,CS_HiSeqPE300xfreebayes_callable GT:PS:DP:ADALL:AD:GQ    0/1:.:781:109,123:138,150:348
: NC_000022.11    19196584        rs1061325       T       C       59.32   PASS    AC=2;AF=1;AN=2;DB;DP=2;ExcessHet=0;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;QD=29.66;SOR=2.303      GT:AD:DP:GQ:PL  1/1:0,2:2:6:71,6,0
**** On vérifie que lm = allele/haplotype mismatch
référence  = CAA/CAA
high-confidence = CA/CA
notre = C/CA
#+begin_src sh
 bcftools filter -i 'POS=31277416'  /Work/Groups/bisonex/data/giab/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz | grep -v '#'
 bcftools filter -i 'POS=31277416'  ../out/NA12878_NIST7035-dbsnp/variantCalling/haplotypecaller/NA12878_NIST.vcf.gz | grep -v '#'
#+end_src
#+RESULTS:
: NC_000022.11    31277416        .       CA      C       50      PASS    platforms=3;platformnames=Illumina,PacBio,10X;datasets=3;datasetnames=HiSeqPE300x,CCS15kb_20kb,10XChromiumLR;callsets=4;callsetnames=HiSeqPE300xGATK,CCS15kb_20kbDV,10XLRGATK,HiSeqPE300xfreebayes;datasetsmissingcall=CCS15kb_20kb,CGnormal,IonExome,SolidSE75bp;callable=CS_HiSeqPE300xGATK_callable;difficultregion=GRCh38_AllHomopolymers_gt6bp_imperfectgt10bp_slop5,GRCh38_SimpleRepeat_imperfecthomopolgt10_slop5  GT:PS:DP:ADALL:AD:GQ    1/1:.:465:16,229:0,190:129
: NC_000022.11    31277416        rs57244615      CAA     C,CA    389.02  PASS    AC=1,1;AF=0.5,0.5;AN=2;BaseQRankSum=0.37;DB;DP=37;ExcessHet=0;FS=0;MLEAC=1,1;MLEAF=0.5,0.5;MQ=60;MQRankSum=0;QD=13.41;ReadPosRankSum=-0.651;SOR=0.572    GT:AD:DP:GQ:PL  1/2:5,10,14:29:64:406,202,313,64,0,88
*** Génération de reads
Biblio : https://www.nature.com/articles/s41437-022-00577-3
Non mentionné : on teste simuscop qui permet de faire des données d'exome
Ne fait pas les delins ni inv
Liste ancienne : https://www.biostars.org/p/128762/
Reseq fait de l'exome aussi
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02265-7
* Idées
** Validation analytique
mail Yannis : données patients +/- simulées
*** Utiliser données GCAT et uploader le notre ?
https://www.nature.com/articles/ncomms7275
*** [#A] Variant calling : Genome in a bottle : NA12878 + autres
Résumé : https://www.nist.gov/programs-projects/genome-bottle
Manuscript : https://www.nature.com/articles/s41587-019-0054-x.epdf?author_access_token=E_1bL0MtBBwZr91xEsy6B9RgN0jAjWel9jnR3ZoTv0OLNnFBR7rUIZNDXq0DIKdg3w6KhBF8Rz2RWQFFc0St45kC6CZs3cDYc87HNHovbWSOubJHDa9CeJV-pN0BW_mQ0n7cM13KF2JRr_wAAn524w%3D%3D
Article comparant les variant calling : https://www.biorxiv.org/content/10.1101/2020.12.11.422022v1.full.pdf
**** KILL Tester le séquencage aussi
CLOSED: [2023-01-30 lun. 18:30]
Depuis un fastq correspondant à Illumina  https://github.com/genome-in-a-bottle/giab_data_indexes
   puis on compare le VCF avec les "high confidence"
On séquence directement NA12878 -> inutile pour le pipeline seul
**** TODO Tester seul la partie bioinformatique
   Tout résumé ici : https://www.nist.gov/programs-projects/genome-bottle
- methode https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/analysis/Illumina_PlatinumGenomes_NA12877_NA12878_09162015/IlluminaPlatinumGenomes-user-guide.pdf
- vcf
     https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/latest/GRCh38/
NB: à quoi correspond https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/NA12878/analysis/Illumina_PlatinumGenomes_NA12877_NA12878_09162015/hg38/2.0.1/NA12878/ ??
   Article comparant les variant calling : https://www.biorxiv.org/content/10.1101/2020.12.11.422022v1.full.pdf
   Article pour vcfeval : https://www.nature.com/articles/s41587-019-0054-x
   La version 4 ajoute 273 gènes "clinically relevant" https://www.biorxiv.org/content/10.1101/2021.06.07.444885v3.full.pdf
   Ajout des zones "difficiles"
   https://www.biorxiv.org/content/10.1101/2020.07.24.212712v5.full.pdf
*** [#B] Pipeline : générer patient avec tous les variants retrouvés à Centogene
Comparaison de génération ADN (2019)
https://academic.oup.com/bfg/article/19/1/49/5680294
**** SimuSCop
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03665-5
https://github.com/qasimyu/simuscop
1. Crééer un modèle depuis bam + vcf : Setoprofile
2. Génerer données NGS
** Annotation :
*** Comparaison vep / snpeff et annovar
* Changement nouvelle version
- Dernière version du génome (la version "prête à l'emploi" est seulement GRCh38 sans les version patchées)
* Notes
** Nextflow
*** afficher les résultats d'un process/workflow
#+begin_src
lol.out.view()
#+end_src
Attention, ne fonctionne pas si plusieurs sortie:
#+begin_src
lol.out[0].view()
#+end_src
ou si /a/ est le nom de la sortie
#+begin_src
lol.out.a.view()
#+end_src
** Quelle version du génome ?
Il y a 2 notations pour les chrosome: Refseq (NC_0001) ou chr1, chr2...
dbSNP utilise Refseq
pour le fasta, 2 solutions
- refseq : "https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/${genome}_latest/refseq_identifiers/${fna}.gz"
  -> nécessite d'indexer le fichier (long !)
- chromosome https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/
  -> nécessite d'annoter les chromosomes pour corriger (avec le fichier gff)
  On utilise la version chromosome donc on annote dbSNP (à faire)
** Performances
Ordinateur de Carine (WSL2) : 4h dont 1h15 alignement (parallélisé) et 1h15 haplotypecaller (séquentiel)
** Chromosomes NC, NT, NW
Correspondance :
https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&chromInfoPage=
Signification
https://genome.ucsc.edu/FAQ/FAQdownloads.html#downloadAlt
- alt = séquences alternatives (utilisables)
- fix = patch (correction ou amélioration)
- random = séquence connue sur un chromosome mais non encore utilisée
** Pipelines prêt-à-l’emploi nextflow
Problème : nécessite singularity ou docker (ou conda)
Potentiellement utilisable avec nix...
** Validation : Quelles données de référence ?
Discussion avec Alexis
- Platinum genomes = génome seul
*** [[https://github.com/genome-in-a-bottle/giab_data_indexes][Genome in a bottle]]
  - NA12878 :
    - Illumina HiSeq Exome : fastq + capture
    - Illumina TruSeq Exome : bam, pas de capture
      ici ww
  - HG002,3,4
    - Illumina Whole Exome  : bam. le kit de capture est "Agilent SureSelect Human All Exon V5 kit" selon [[https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/OsloUniversityHospital_Exome_GATK_jointVC_11242015/README.txt][README]]. On il faut les régions [[https://kb.10xgenomics.com/hc/en-us/articles/115004150923-Where-can-I-find-the-Agilent-Target-BED-files-][selon ce site]]
      Un autre fichier est disponible (capture ???)
    https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/OsloUniversityHospital_Exome_GATK_jointVC_11242015/wex_Agilent_SureSelect_v05_b37.baits.slop50.merged.list
  "target region" +/- 50bp
    testé sur chr311780-312086 : ok
Autres technologies non adaptées au pipeline (vu avec Alexis)
*** [[https://www.illumina.com/platinumgenomes.html][Platinum genome
]] Que du génome « sequenced to 50x depth on a HiSeq 2000 system”
Genome possible
** Zone de capture
GIAB fourni le .bed pour l'exome . INfo : https://support.illumina.com/sequencing/sequencing_kits/nextera-rapid-capture-exome-kit/downloads.html
** Centogène
https://www.twistbioscience.com/node/23906
Bed non fourni pour exactement cette capture
On prend https://www.twistbioscience.com/resources/data-files/twist-alliance-vcgs-exome-401mb-bed-files
qui content la majeure partie
* Données :data:
** DONE Remplacer bam par fastq sur mesocentre
CLOSED: [2023-04-16 Sun 16:33]
Commande
*** DONE Supprimer les fastq non "paired"
CLOSED: [2023-04-16 Sun 16:33]
nushell
Liste des fastq avec "paired-end" manquant
#+begin_src nu
ls **/*.fastq.gz | get name | path basename | split column "_" | get column1 | uniq -u | save single.txt
#+end_src
#+RESULTS:
: 62907927
: 62907970
: 62899606
: 62911287
: 62913201
: 62914084
: 62915905
: 62921595
: 62923065
: 62925220
: 62926503
: 62926502
: 62926500
: 62926499
: 62926498
: 62931719
: 62943423
: 62943400
: 62948290
: 62949205
: 62949206
: 62949118
: 62951284
: 62960792
: 62960785
: 62960787
: 62960617
: 62962561
: 62962692
: 62967473
: 62972194
: 62979102
On vérifie
#+begin_src nu
open single.txt  | lines | each {|e| ls $"fastq/*_($in)/*" | get 0  }
open single.txt  | lines | each {|e| ls $"fastq/*_($in)/*" | get 0.name }  | path basename | split column "_" | get column1 | uniq -c
#+end_src
On met tous dans un dossier (pas de suppression )
#+begin_src
open single.txt  | lines | each {|e| ls $"fastq/*_($in)/*" | get 0  }  | each {|e| ^mv $e.name bad-fastq/}
#+end_src
On vérifie que les dossiier sont videsj
 open single.txt  | lines | each {|e| ls $"fastq/*_($in)" | get 0.name } | ^ls -l $in
 Puis on supprime
 open single.txt  | lines | each {|e| ls $"fastq/*_($in)" | get 0.name } | ^rm -r $in
*** DONE Supprimer bam qui ont des fastq
CLOSED: [2023-04-16 Sun 16:33]
On liste les identifiants des fastq et bam dans un tableau avec leur type :
#+begin_src
let fastq = (ls fastq/*/*.fastq.gz | get name | parse "{dir}/{full_id}/{id}_{R}_001.fastq.gz"  | select dir id | uniq )
let bam = (ls bam/*/*.bam | get name | parse "{dir}/{full_id}/{id}_{S}.bqrt.bam"  | select dir id)
#+end_src
On groupe les résultat par identifiant (résultats = liste de records qui doit être convertie en table)
et on trie ceux qui n'ont qu'un fastq ou un bam
#+begin_src
let single = ( $bam | append $fastq | group-by id | transpose id files | get files | where {|x| ($x | length) == 1})
#+end_src
On convertit en table et on récupère seulement les bam
#+begin_src
$single | reduce {|it, acc| $acc | append $it} | where dir == bam | get id | each {|e| ^ls $"bam/*_($e)/*.bam"}
#+end_src
#+RESULTS:
: bam/2100656174_62913201/62913201_S52.bqrt.bam
: bam/2100733271_62925220/62925220_S33.bqrt.bam
: bam/2100738763_62926502/62926502_S108.bqrt.bam
: bam/2100746726_62926498/62926498_S105.bqrt.bam
: bam/2100787936_62931955/62931955_S4.bqrt.bam
: bam/2200066374_62948290/62948290_S130.bqrt.bam
: bam/2200074722_62948298/62948298_S131.bqrt.bam
: bam/2200074990_62948306/62948306_S218.bqrt.bam
: bam/2200214581_62967331/62967331_S267.bqrt.bam
: bam/2200225399_62972187/62972187_S85.bqrt.bam
: bam/2200293962_62979117/62979117_S63.bqrt.bam
: bam/2200423985_62999352/62999352_S1.bqrt.bam
: bam/2200495073_63010427/63010427_S20.bqrt.bam
: bam/2200511274_63012586/63012586_S114.bqrt.bam
: bam/2200669188_63036688/63036688_S150.bqrt.bam
* Nouveau workflow :workflow:
** TODO Bases de données
*** KILL Nix pour télécharger les données brutes
**** Conclusion
Non viable sur cluster car en dehors de /nix/store
On peut utiliser des symlink mais trop compliqué
**** KILL Axel au lieu de curl pour gérer les timeout?
CLOSED: [2022-08-19 Fri 15:18]
*** DONE Tester patch de @pennae pour gros fichiers
SCHEDULED: <2022-08-19 Fri>
*** STRT Télécharger les données avec nextflow
**** DONE Genome de référence
**** DONE dbSNP
**** TODO VEP 20G
Ajout vérification checksum -> à vérifier
**** TODO transcriptome (spip)
Rajouter checksum manuel
**** KILL Refseq
**** STRT OMIM
codé, à vérifier
**** TODO ACMG incidental
*** HOLD Processing bases de données
**** DONE dbSNP common
**** DONE Seulement les ID dans dbSNP common !
CLOSED: [2022-11-19 Sat 21:42]
172G au lieu de 253M...
**** HOLD common dbSNP not clinvar patho
***** DONE Conclusion partielle
CLOSED: [2022-12-12 Mon 22:25]
- vcfeval : prometteur mais n'arrive pas à traiter toutes les régions
- isec : trop de problèmes avec
- classif clinvar directement dans dbSNP: le plus simple
  Et ça permet de rattraper quelques erreurs dans le script d'Alexis
***** KILL Utiliser directement le numéro dbSNP dans clinvar ? Non
CLOSED: [2022-11-20 Sun 19:51]
Ex: chr20
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools query -f 'rs%INFO/RS \n' -i 'INFO/RS != "." & INFO/CLNSIG="Pathogenic"' clinvar_chr20.vcf.gz | sort > ID_clinvar_patho.txt
bcftools query -f '%ID\n' dbSNP_common_chr20.vcf.gz | sort > ID_of_common_snp.txt
comm -23 ID_of_common_snp.txt ID_clinvar_patho.txt > ID_of_common_snp_not_clinvar_patho.txt
wc -l ID_of_common_snp_not_clinvar_patho.txt
# sort ID
#+end_src
#+RESULTS:
: 518846 ID_of_common_snp_not_clinvar_patho.txt
Version d'alexis
#+begin_src sh :dir ~/code/bisonex/test_isec
snp=dbSNP_common_chr20.vcf.gz
clinvar=clinvar_chr20_notremapped.vcf.gz
python ../script/pythonScript/clinvar_sbSNP.py \
    --clinvar $clinvar \
    --chrm_name_table ../database/RefSeq/refseq_to_number_only_consensual.txt \
    --dbSNP $snp --output prod.txt
wc -l prod.txt
zgrep '^NC' dbSNP_common_chr20.vcf.gz | wc -l
#+end_src
#+RESULTS:
| 518832 | prod.txt |
| 518846 |          |
***** KILL classification clinvar codée dbSNP ?
CLOSED: [2022-12-04 Sun 14:38]
Sur le chromosome 20
*Attention* CLNSIG a plusieurs champs (séparé par une virgule)
On y accède avec INFO/CLNSIG[*]
Ensuite, chaque item peut avoir plusieurs haploïdie (séparé par un |). IL faut donc utiliser une regexp
NB: *ne pas mettre la condition* dans une variable !!
Pour avoir les clinvar patho, on veut 5 mais pas 255 (= autre) pour la classification !`
Il faut également les likely patho et conflicting
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools query -f '%INFO/CLNSIG\n' dbSNP_common_chr20.vcf.gz -i \
'INFO/CLNSIG[*]~"^5|" | INFO/CLNSIG[*]=="5" | INFO/CLNSIG[*]~"|5" | INFO/CLNSIG[*]~"^4|" | INFO/CLNSIG[*]=="4" | INFO/CLNSIG[*]~"|4" | INFO/CLNSIG[*]~"^12|" | INFO/CLNSIG[*]=="12" | INFO/CLNSIG[*]~"|12"' | sort
#+end_src
#+RESULTS:
| . |  . | 12 |    |   |   |   |   |   |   |   |
| . | 12 |  0 |  2 |   |   |   |   |   |   |   |
| 2 |  3 |  2 |  2 | 2 | 5 | . |   |   |   |   |
| . |  2 |  3 |  2 | 2 | 4 |   |   |   |   |   |
| . |  . |  3 | 12 | 3 |   |   |   |   |   |   |
| . |  5 |  2 |  . |   |   |   |   |   |   |   |
| . |  . |  . |  5 | 2 | 2 |   |   |   |   |   |
| . |  9 |  9 |  9 | 5 | 5 | 2 | 3 | 2 | 3 | 2 |
Si on les exclut :
#+begin_src sh :dir ~/code/bisonex/test_isec
bcftools query -f '%ID\n' dbSNP_common_chr20.vcf.gz -e \
'INFO/CLNSIG[*]~"^5|" | INFO/CLNSIG[*]=="5" | INFO/CLNSIG[*]~"|5" | INFO/CLNSIG[*]~"4" | INFO/CLNSIG[*]~"12"' | sort | uniq > common-notpatho.txt
#+end_src
#+RESULTS:
 #+begin_src sh :dir ~/code/bisonex/test_isec
snp=dbSNP_common_chr20.vcf.gz
clinvar=clinvar_chr20_notremapped.vcf.gz
python ../script/pythonScript/clinvar_sbSNP.py \
    --clinvar $clinvar \
    --chrm_name_table ../database/RefSeq/refseq_to_number_only_consensual.txt \
    --dbSNP $snp --output tmp.txt
sort tmp.txt | uniq > common-notpatho-alexis.txt
wc -l common-notpatho-alexis.txt
 #+end_s

Replacement in projects/bisonex.org at line 37 [12.35]

B:BD[6.16645] → [3.7:9484]

   None              37248          36972        461       4062     0.9877       0.9017     0.9427
| Type  | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score | TRUTH.TOTAL.TiTv_ratio | QUERY.TOTAL.TiTv_ratio | TRUTH.TOTAL.het_hom_ratio | QUERY.TOTAL.het_hom_ratio |
| INDEL | ALL    |        2909 |     2477 |      432 |        3229 |      207 |       519 |    52 |    50 |      0.851495 |         0.923616 |       0.160731 |        0.886091 |                        |                        |        1.4964850615114236 |        1.8339222614840989 |
| INDEL | PASS   |        2909 |     2477 |      432 |        3229 |      207 |       519 |    52 |    50 |      0.851495 |         0.923616 |       0.160731 |        0.886091 |                        |                        |        1.4964850615114236 |        1.8339222614840989 |
| SNP   | ALL    |       38406 |    34793 |     3613 |       36935 |      275 |      1868 |    37 |    15 |      0.905926 |         0.992158 |       0.050575 |        0.947083 |     2.6247759222568168 |     2.5752854654538417 |         1.588953331534934 |        1.6192536889897844 |
| SNP   | PASS   |       38406 |    34793 |     3613 |       36935 |      275 |      1868 |    37 |    15 |      0.905926 |         0.992158 |       0.050575 |        0.947083 |     2.6247759222568168 |     2.5752854654538417 |         1.588953331534934 |        1.6192536889897844 |
**** DONE HG003 :hg003:
CLOSED: [2023-04-16 Sun 00:20]
#+begin_src sh
NXF_OPTS=-D"user.name=${USER}" nextflow run main.nf -profile standard,helios  --input /Work/Groups/bisonex/data/giab/GRCh38/HG003_{1,2}.fq.gz -bg
#+end_src
#+begin_src  sh
NXF_OPTS=-D"user.name=${USER}" nextflow run workflows/compareVCF.nf -profile standard,helios -resume --outdir=compareHG003  --test.id=HG003 --test.query=out/HG003_1/variantCalling/haplotypecaller/HG003_1.vcf.gz  --test.compare=vcfeval,happy --test.capture=data/AgilentSureSelectv05_hg38.bed
#+end_src
vcfeval
Threshold  True-pos-baseline  True-pos-call  False-pos  False-neg  Precision  Sensitivity  F-measure
----------------------------------------------------------------------------------------------------
    5.000              36745          36473        486       3988     0.9869       0.9021     0.9426
     None              36748          36476        495       3985     0.9866       0.9022     0.9425
$ zcat NA12878.snp_roc.tsv.gz  | tail -n 1 | awk '{print $7 $6}'
happy
Type Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  FP.al  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
INDEL    ALL         2731      2290       441         3092       208        577     62     53       0.838521          0.917296        0.186611         0.876141                     NaN                     NaN                   1.505145                   1.888993
INDEL   PASS         2731      2290       441         3092       208        577     62     53       0.838521          0.917296        0.186611         0.876141                     NaN                     NaN                   1.505145                   1.888993
  SNP    ALL        37997     34481      3516        36861       306       2074     33     13       0.907466          0.991204        0.056265         0.947488                2.611269                2.565915                   1.555780                   1.621727
  SNP   PASS        37997     34481      3516        36861       306       2074     33     13       0.907466          0.991204        0.056265         0.947488                2.611269                2.5659
**** DONE HG004
CLOSED: [2023-04-16 Sun 00:20]
#+begin_src sh
NXF_OPTS=-D"user.name=${USER}" nextflow run main.nf -profile standard,helios  --input /Work/Groups/bisonex/data/giab/GRCh38/HG004_{1,2}.fq.gz -bg
#+end_src
vcfeval
Threshold  True-pos-baseline  True-pos-call  False-pos  False-neg  Precision  Sensitivity  F-measure
----------------------------------------------------------------------------------------------------
    6.000              36938          36678        421       4040     0.9887       0.9014     0.9430
     None              36942          36682        432       4036     0.9884       0.9015     0.9429
happy
 Type Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  FP.al  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
INDEL    ALL         2787      2388       399         3183       195        580     53     38       0.856835          0.925086        0.182218         0.889654                     NaN                     NaN                   1.507834                   1.848649
INDEL   PASS         2787      2388       399         3183       195        580     53     38       0.856835          0.925086        0.182218         0.889654                     NaN                     NaN                   1.507834                   1.848649
  SNP    ALL        38185     34560      3625        36921       254       2107     46      7       0.905067          0.992704        0.057068         0.946862                2.589175                2.553546                   1.632595                   1.653534
  SNP   PASS        38185     34560      3625        36921       254       2107     46      7       0.905067          0.992704        0.057068         0.946862                2.589175                2.553546                   1.632595                   1.653534
**** DONE Résumer résultats pour Paul + article :resultats:
CLOSED: [2023-04-06 Thu 21:41] SCHEDULED: <2023-04-02 Sun>
**** DONE Plot : ashkenazim trio
CLOSED: [2023-04-18 Tue 21:27] SCHEDULED: <2023-04-16 Sun>
/Entered on/ [2023-04-16 Sun 17:29]
*** TODO Platinum genome
https://emea.illumina.com/platinumgenomes.html
*** TODO Séquencer NA12878
Discussion avec Paul : sous-traitant ne nous donnera pas les données, il faut commander l'ADN
** TODO Fastq avec tous les variants centogène :centogene:
*** DONE Extraire liste des SNVs
CLOSED: [2023-04-22 Sat 17:32] SCHEDULED: <2023-04-17 Mon>
**** DONE Corriger manquant à la main
CLOSED: [2023-04-22 Sat 17:31]
La sortie est sauvegardé dans git-annex : variants_success.csv
**** DONE Automatique
CLOSED: [2023-04-22 Sat 17:31]
*** TODO Convert SNVs : transcript -> génomique
**** TODO Variant_recoder
SCHEDULED: <2023-04-22 Sat>
***** KILL Haskell: 160 manquant : recoded-success.csv
CLOSED: [2023-04-25 Tue 18:32]
La liste des variants a été générée en Haskell et nettoyée à la main.
On générer une liste de variant pour variant_recoder et on soumet tout d'un coup.
[[file:~/recherche/bisonex/parsevariants/app/Main.hs][parsevariant]]
#+begin_src haskell
recodeVariant = do
  prepareVariantRecod   er "variant_success.csv" "renamed.csv"
  runVariantRecoder "renamed.csv" "recoded.json"
#+end_src
#+RESULTS:
: <interactive>:4:3-19: error:
:     Variable not in scope: runVariantRecoder :: String -> String -> t
: gh
Problème : 160 n'ont pas pu être lu sur 820, probablement à cause du numéro mineur de transcrit
La sortie est sauvegardé dans git-annex : variants-recoded-raw.json.
***** KILL Julia
CLOSED: [2023-04-25 Tue 18:32]
On regénère la liste de variant et on passe à Julia pour préparer l'appel en parallèle à variant recoder
[[file:~/recherche/bisonex/parsevariants/variantRecoder.jl][variantRecoder.jl]]
#+begin_src julia
setupVariantRecoder(unique(init), n)
#+end_src
Puis
#+begin_src sh
parallel -a parallel-recoder.sh --jobs 10
#+end_src
On récupère les résultats
#+begin_src julia
(fails, success) = mergeVariantRecoder(n)
CSV.write(fSuccess, success)
CSV.write(fFailures, fails)
#+end_src
Certains variants ne sont pas trouvé, donc on prépare un nouveau job en enlevant les versionrs mineures des transcrits
#+begin_src julia
# Cleanup json and txt
if isfile(fSuccess) && isfile(fFailures)
    foreach(rm, variantRecoderInput())
    foreach(rm, variantRecoderOutput())
end
redoFails(fFailures)
#+end_src
Puis
#+begin_src sh
parallel -a parallel-recoder.sh --jobs 3
#+end_src
Il manque encore 70 transcrits
***** DONE Julia avec mobidetails: recode-failures-mobidetails.csv
CLOSED: [2023-04-25 Tue 18:58]
Nouvelle stratégie : on essaie une fois variant recoder.
Pour tous les échecs, on utilise mobidetails (~170).
Si l'ID n'est pas trouvé, on incrémente le numéro de version 2 fois
***** Reste une dizaine à corriger à la main
Erreur de parsing, manque souvent un -
*** TODO Extraire liste des CNVs
SCHEDULED: <2023-04-17 Mon>
*** TODO Générer fastq avec simuscop
SCHEDULED: <2023-04-22 Sat>
**** TODO Génerer un profile
SCHEDULED: <2023-04-22 Sat>
NA12878 mais à refaire avec un vrai séquencage
**** TODO Générer les données
SCHEDULED: <2023-04-22 Sat>
Quelle capture ???
*** TODO Vérifier qu'on les retrouve tous
SCHEDULED: <2023-04-22 Sat>
** Divers
*** DONE Vérifier nombre de reads fastq - bam
CLOSED: [2022-10-09 Sun 22:31]
* DONE Plot : ashkenazim trio
CLOSED: [2023-04-18 Tue 21:28] SCHEDULED: <2023-04-16 Sun>
/Entered on/ [2023-04-16 Sun 17:29]

[6.16645]

   None              37248          36972        461       4062     0.9877       0.9017     0.9427
| Type  | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score | TRUTH.TOTAL.TiTv_ratio | QUERY.TOTAL.TiTv_ratio | TRUTH.TOTAL.het_hom_ratio | QUERY.TOTAL.het_hom_ratio |
| INDEL | ALL    |        2909 |     2477 |      432 |        3229 |      207 |       519 |    52 |    50 |      0.851495 |         0.923616 |       0.160731 |        0.886091 |                        |                        |        1.4964850615114236 |        1.8339222614840989 |
| INDEL | PASS   |        2909 |     2477 |      432 |        3229 |      207 |       519 |    52 |    50 |      0.851495 |         0.923616 |       0.160731 |        0.886091 |                        |                        |        1.4964850615114236 |        1.8339222614840989 |
| SNP   | ALL    |       38406 |    34793 |     3613 |       36935 |      275 |      1868 |    37 |    15 |      0.905926 |         0.992158 |       0.050575 |        0.947083 |     2.6247759222568168 |     2.5752854654538417 |         1.588953331534934 |        1.6192536889897844 |
| SNP   | PASS   |       38406 |    34793 |     3613 |       36935 |      275 |      1868 |    37 |    15 |      0.905926 |         0.992158 |       0.050575 |        0.947083 |     2.6247759222568168 |     2.5752854654538417 |         1.588953331534934 |        1.6192536889897844 |
**** DONE HG003 :hg003:
CLOSED: [2023-04-16 Sun 00:20]
#+begin_src sh
NXF_OPTS=-D"user.name=${USER}" nextflow run main.nf -profile standard,helios  --input /Work/Groups/bisonex/data/giab/GRCh38/HG003_{1,2}.fq.gz -bg
#+end_src
#+begin_src  sh
NXF_OPTS=-D"user.name=${USER}" nextflow run workflows/compareVCF.nf -profile standard,helios -resume --outdir=compareHG003  --test.id=HG003 --test.query=out/HG003_1/variantCalling/haplotypecaller/HG003_1.vcf.gz  --test.compare=vcfeval,happy --test.capture=data/AgilentSureSelectv05_hg38.bed
#+end_src
vcfeval
Threshold  True-pos-baseline  True-pos-call  False-pos  False-neg  Precision  Sensitivity  F-measure
----------------------------------------------------------------------------------------------------
    5.000              36745          36473        486       3988     0.9869       0.9021     0.9426
     None              36748          36476        495       3985     0.9866       0.9022     0.9425
$ zcat NA12878.snp_roc.tsv.gz  | tail -n 1 | awk '{print $7 $6}'
happy
Type Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  FP.al  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
INDEL    ALL         2731      2290       441         3092       208        577     62     53       0.838521          0.917296        0.186611         0.876141                     NaN                     NaN                   1.505145                   1.888993
INDEL   PASS         2731      2290       441         3092       208        577     62     53       0.838521          0.917296        0.186611         0.876141                     NaN                     NaN                   1.505145                   1.888993
  SNP    ALL        37997     34481      3516        36861       306       2074     33     13       0.907466          0.991204        0.056265         0.947488                2.611269                2.565915                   1.555780                   1.621727
  SNP   PASS        37997     34481      3516        36861       306       2074     33     13       0.907466          0.991204        0.056265         0.947488                2.611269                2.5659
**** DONE HG004
CLOSED: [2023-04-16 Sun 00:20]
#+begin_src sh
NXF_OPTS=-D"user.name=${USER}" nextflow run main.nf -profile standard,helios  --input /Work/Groups/bisonex/data/giab/GRCh38/HG004_{1,2}.fq.gz -bg
#+end_src
vcfeval
Threshold  True-pos-baseline  True-pos-call  False-pos  False-neg  Precision  Sensitivity  F-measure
----------------------------------------------------------------------------------------------------
    6.000              36938          36678        421       4040     0.9887       0.9014     0.9430
     None              36942          36682        432       4036     0.9884       0.9015     0.9429
happy
 Type Filter  TRUTH.TOTAL  TRUTH.TP  TRUTH.FN  QUERY.TOTAL  QUERY.FP  QUERY.UNK  FP.gt  FP.al  METRIC.Recall  METRIC.Precision  METRIC.Frac_NA  METRIC.F1_Score  TRUTH.TOTAL.TiTv_ratio  QUERY.TOTAL.TiTv_ratio  TRUTH.TOTAL.het_hom_ratio  QUERY.TOTAL.het_hom_ratio
INDEL    ALL         2787      2388       399         3183       195        580     53     38       0.856835          0.925086        0.182218         0.889654                     NaN                     NaN                   1.507834                   1.848649
INDEL   PASS         2787      2388       399         3183       195        580     53     38       0.856835          0.925086        0.182218         0.889654                     NaN                     NaN                   1.507834                   1.848649
  SNP    ALL        38185     34560      3625        36921       254       2107     46      7       0.905067          0.992704        0.057068         0.946862                2.589175                2.553546                   1.632595                   1.653534
  SNP   PASS        38185     34560      3625        36921       254       2107     46      7       0.905067          0.992704        0.057068         0.946862                2.589175                2.553546                   1.632595                   1.653534
**** DONE Résumer résultats pour Paul + article :resultats:
CLOSED: [2023-04-06 Thu 21:41] SCHEDULED: <2023-04-02 Sun>
**** DONE Plot : ashkenazim trio
CLOSED: [2023-04-18 Tue 21:27] SCHEDULED: <2023-04-16 Sun>
/Entered on/ [2023-04-16 Sun 17:29]
*** TODO Platinum genome
https://emea.illumina.com/platinumgenomes.html
*** TODO Séquencer NA12878
Discussion avec Paul : sous-traitant ne nous donnera pas les données, il faut commander l'ADN
** TODO Fastq avec tous les variants centogène :centogene:
*** DONE Extraire liste des SNVs
CLOSED: [2023-04-22 Sat 17:32] SCHEDULED: <2023-04-17 Mon>
**** DONE Corriger manquant à la main
CLOSED: [2023-04-22 Sat 17:31]
La sortie est sauvegardé dans git-annex : variants_success.csv
**** DONE Automatique
CLOSED: [2023-04-22 Sat 17:31]
*** TODO Convert SNVs : transcript -> génomique
**** DONE Variant_recoder
CLOSED: [2023-04-26 Wed 21:21] SCHEDULED: <2023-04-22 Sat>
***** KILL Haskell: 160 manquant : recoded-success.csv
CLOSED: [2023-04-25 Tue 18:32]
La liste des variants a été générée en Haskel   l et nettoyée à la main.
On générer une liste de variant pour variant_rec            oder et on soumet tout d'un coup.
[[file:~/recherche/bisonex/parsevariants/app/Main.hs][parsevariant]]
#+begin_src haskell
recodeVariant = do
  prepareVariantRecod   er "variant_success.csv" "renamed.csv"
  runVariantRecoder "renamed.csv" "recoded.json"
#+end_src
#+RESULTS:
: <interactive>:4:3-19: error:
:     Variable not in scope: runVariantRecoder :: String -> String -> t
: gh
Problème : 160 n'ont pas pu être lu sur 820, probablement à cause du numéro mineur de transcrit
La sortie est sauvegardé dans git-annex : variants-recoded-raw.json.
***** KILL Julia
CLOSED: [2023-04-25 Tue 18:32]
On regénère la liste de variant et on passe à Julia pour préparer l'appel en parallèle à variant recoder
[[file:~/recherche/bisonex/parsevariants/variantRecoder.jl][variantRecoder.jl]]
#+begin_src julia
setupVariantRecoder(unique(init), n)
#+end_src
Puis
#+begin_src sh
parallel -a parallel-recoder.sh --jobs 10
#+end_src
On récupère les résultats
#+begin_src julia
(fails, success) = mergeVariantRecoder(n)
CSV.write(fSuccess, success)
CSV.write(fFailures, fails)
#+end_src
Certains variants ne sont pas trouvé, donc on prépare un nouveau job en enlevant les versionrs mineures des transcrits
#+begin_src julia
# Cleanup json and txt
if isfile(fSuccess) && isfile(fFailures)
    foreach(rm, variantRecoderInput())
    foreach(rm, variantRecoderOutput())
end
redoFails(fFailures)
#+end_src
Puis
#+begin_src sh
parallel -a parallel-recoder.sh --jobs 3
#+end_src
Il manque encore 70 transcrits
**** DONE Julia avec mobidetails: recode-failures-mobidetails.csv
CLOSED: [2023-04-25 Tue 18:58]
Nouvelle stratégie : on essaie une fois variant recoder.
Pour tous les échecs, on utilise mobidetails (~170).
Si l'ID n'est pas trouvé, on incrémente le numéro de version 2 fois
**** DONE Reste une dizaine à corriger à la main
CLOSED: [2023-04-26 Wed 21:21]
- [X] certains transcrits ont juste été supprimé
- [X] Erreur de parsing, manque souvent un -
#+begin_src julia
lastTryMobidetails("recoded-failures-mobidetails.csv")
#+end_src
**** DONE Fusionner données
CLOSED: [2023-04-26 Wed 22:35]
#+begin_src julia
function mergeAllGenomic()
    dNew = mergeAll("recoded-success.csv",
                    "recoded-failures-mobidetails.csv",
                    "recoded-failures-mobidetails-redo.csv")
    dInit = @chain DataFrame(CSV.File("variant_success.csv")) begin
        @transform :transcript = :transcript .* ":" .* :coding .* :codingPos .* :codingChange
        @select :file :transcript :classification :zygosity
        @rename :classificationCentogene = :classification
    end
    dTmp = outerjoin(dInit, dNew, on = :transcript)
    CSV.write("variant_genomic.csv", dTmp)
end
fSuccess = "recoded-success.csv"
fFailures = "recoded-failures.csv"
# variantRecoder(fSuccess, fFailures)
# mobidetailsOnFailures(fFailures)
# lastTryMobidetails("recoded-failures-mobidetails.csv")
mergeAllGenomic()
#+end_src
**** DONE Formatter donner pour simuscop
CLOSED: [2023-04-28 Fri 11:55] SCHEDULED: <2023-04-26 Wed>
*** TODO Extraire liste des CNVs
SCHEDULED: <2023-04-17 Mon>
*** TODO Entrainer le modèle sur 63003856/
Relancer le modèle pour être sûr
*** DONE Générer fastq avec simuscop (del et ins seulement) 20x
CLOSED: [2023-04-28 Fri 23:35] SCHEDULED: <2023-04-22 Sat>
**** DONE Génerer un profile avec bed de centogène
CLOSED: [2023-04-28 Fri 11:54] SCHEDULED: <2023-04-22 Sat>
NA12878 mais à refaire avec un vrai séquencage
Voir [[*Centogène][Bed Centogène]] pour choix
**** DONE Générer les données en 20x
CLOSED: [2023-04-28 Fri 11:54] SCHEDULED: <2023-04-22 Sat>
capture de centogene
**** DONE Regénérer en supprimant les doublons
CLOSED: [2023-04-28 Fri 17:28]
*** TODO Générer fastq avec 50x
ex sur chr11:16,014,966 où on a 11 reads dans la simulation contre 200 !
*** TODO Vérifier tous les variants sont retrouvés
SCHEDULED: <2023-04-22 Sat>
**** TODO Après alignement
SCHEDULED: <2023-04-28 Fri>
***** DONE SNV: avec doublons
CLOSED: [2023-04-28 Fri 18:12]
On utilise [[file:~/recherche/bisonex/simuscop/checkBam.jl][checkBam.jl]]
#+begin_src julia
d = prepareVariant("../parsevariants/variant_genomic.csv")
root = "/home/alex/code/bisonex/simuscop-centogene/cento"
bam = root * "/preprocessing/applybqsr/cento.bam"
bai = root * "/preprocessing/recalibrated/cento.bam.bai"
snv = getSNV(d, bam, bai)
#+end_src
Nombreux faux homozygouteS
Vérification avec checkFalseHemizygous(snv) : nombreux doublons dans le fichier pour simuscop...
***** TODO SNV sans doublons
****** WAIT 18 faux homozygote mais avec peu de reads
julia> @subset snv :refCount .== 0 :altCount .> 0 :zygosity .== "heterozygous"
18×10 DataFrame
 Row │ chrom         pos        variant         variantType  zygosity      ref        alt        refCount  altCount  readsCount
     │ SubStrin…?    Int64      SubStrin…?      String?      String15      SubStrin…  SubStrin…  Int64     Int64     Int64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ NC_000022.11   42213078  g.42213078T>G   snv          heterozygous  T          G                 0         1           1
   2 │ NC_000012.12  101680427  g.101680427C>A  snv          heterozygous  C          A                 0         3           3
   3 │ NC_000014.9   105385684  g.105385684G>C  snv          heterozygous  G          C                 0         4           4
   4 │ NC_000011.10  125978299  g.125978299C>T  snv          heterozygous  C          T                 0         3           3
   5 │ NC_000023.11   77998618  g.77998618C>T   snv          heterozygous  C          T                 0         2           2
   6 │ NC_000015.10   66703292  g.66703292C>T   snv          heterozygous  C          T                 0         3           3
   7 │ NC_000010.11   87961118  g.87961118G>A   snv          heterozygous  G          A                 0         3           3
   8 │ NC_000012.12  112477719  g.112477719A>G  snv          heterozygous  A          G                 0         2           2
   9 │ NC_000020.11    6778406  g.6778406C>T    snv          heterozygous  C          T                 0         3           3
  10 │ NC_000023.11   68192943  g.68192943G>A   snv          heterozygous  G          A                 0         2           2
  11 │ NC_000004.12     987858  g.987858C>T     snv          heterozygous  C          T                 0         3           4
  12 │ NC_000015.10   66435145  g.66435145G>A   snv          heterozygous  G          A                 0         1           2
  13 │ NC_000002.12   47809595  g.47809595C>T   snv          heterozygous  C          T                 0         2           2
  14 │ NC_000003.12  136477305  g.136477305C>G  snv          heterozygous  C          G                 0         4           4
  15 │ NC_000005.10  157285458  g.157285458C>T  snv          heterozygous  C          T                 0         3           3
  16 │ NC_000012.12   23604413  g.23604413T>G   snv          heterozygous  T          G                 0         5           5
  17 │ NC_000019.10   52219703  g.52219703C>T   snv          heterozygous  C          T                 0         1           1
  18 │ NC_000016.10   88856757  g.88856757C>T   snv          heterozygous  C          T                 0         8           8
****** DONE 8 non retrouvé => probablement hors de la zjone de capture
CLOSED: [2023-04-28 Fri 19:49]
julia> @subset snv :refCount .== 0 :altCount .== 0
8×10 DataFrame
 Row │ chrom         pos        variant         variantType  zygosity      ref        alt        refCount  altCount  readsCount
     │ SubStrin…?    Int64      SubStrin…?      String?      String15      SubStrin…  SubStrin…  Int64     Int64     Int64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ NC_000015.10   74343027  g.74343027C>T   snv          heterozygous  C          T                 0         0           0
   2 │ NC_000011.10   20638345  g.20638345A>G   snv          heterozygous  A          G                 0         0           0
   3 │ NC_000004.12  139370252  g.139370252C>T  snv          heterozygous  C          T                 0         0           2
   4 │ NC_000017.11   61966475  g.61966475G>T   snv          heterozygous  G          T                 0         0           0
   5 │ NC_000019.10   54144058  g.54144058G>A   snv          heterozygous  G          A                 0         0           0
   6 │ NC_000023.11   77635947  g.77635947A>G   snv          hemizygous    A          G                 0         0           0
   7 │ NC_000005.10    1258495  g.1258495G>A    snv          heterozygous  G          A                 0         0           0
   8 │ NC_000012.12    2449086  g.2449086C>G    snv          heterozygous  C          G                 0         0           0
**** TODO Après haplotypecaller
SCHEDULED: <2023-04-28 Fri>
Manque 183 sur 766
[[file:~/recherche/bisonex/simuscop/checkVCF.jl][checkVCF.jl]]
#+begin_src julia
@subset leftjoin(d2, dHaplo2, on=:genomic) ismissing.(:Column1)
#+end_src
Problème de profondeur ?
Ex: chr13 nombre de 101081606
NC_000011.10   16014966  g.16014966G>A
1 read sur 11 pour allèle alternative
Sur le patient de référence, 202 reads!
Celui-ci n'est pas le fichier de capture (ni dans le bam !)
ex: NC_000015.10   74343027  g.74343027C>T
Pour les autres, on devrait les retrouver...
Vérifier le nombre de reads sur 63003856
Vérifier la paramétrisation du modèle également
**** TODO Après annotation
SCHEDULED: <2023-04-28 Fri>
Il en manque !
**** TODO Après filtre annotation
** Divers
*** DONE Vérifier nombre de reads fastq - bam
CLOSED: [2022-10-09 Sun 22:31]
* DONE Plot : ashkenazim trio
CLOSED: [2023-04-18 Tue 21:28] SCHEDULED: <2023-04-16 Sun>
/Entered on/ [2023-04-16 Sun 17:29]

Deletion in blog/Main.hs at line 42 [15.9907]

∅:D[14.1400] → [15.11694:11695]

B:BD[15.11694] → [15.11694:11695]

B:BD[15.11695] → [2.853:1382]


    -- -- Hack : generate temporary files for correcting metadata
    -- match ("posts/*.org" .||. "genetique/*.org") $ do
    --     route tempRoute
    --     compile $ getResourceString >>= orgCompiler
    -- match ("_temp/posts/*.org" .||. "_temp/genetique/*.org") $ do
    --     route $ cleanRouteFromTemp
    --     compile $ pandocCompiler
    --         >>= loadAndApplyTemplate "templates/post.html"    postCtx
    --         >>= loadAndApplyTemplate "templates/default.html" postCtx
    --         >>= relativizeUrls

Replacement in blog/Main.hs at line 47 [15.9907]

B:BD[16.509] → [17.1592:1653]

            posts <- recentFirst =<< loadAll "_temp/posts/*"

[16.509]

[16.564]

            posts <- recentFirst =<< loadAll "posts/*"

Replacement in blog/Main.hs at line 62 [15.9907]

B:BD[14.2060] → [17.1710:1771]

            posts <- recentFirst =<< loadAll "_temp/posts/*"

[14.2060]

[14.2115]

            posts <- recentFirst =<< loadAll "posts/*"

Replacement in blog/Main.hs at line 76 [15.9907]

B:BD[18.2482] → [17.1828:1881]

            posts <- loadAll "_temp/genetique/*.org"

[18.2482]

[18.2529]

            posts <- loadAll "genetique/*.org"

Deletion in blog/Main.hs at line 87 [15.9907]
B:BD[17.1883] → [17.1883:1896]
∅:D[17.1896] → [15.12489:12490]
∅:D[14.2488] → [15.12489:12490]
B:BD[15.12489] → [15.12489:12490]
B:BD[15.12490] → [17.1897:2082]
```
------------
-- | Temp Route
tempRoute :: Routes
tempRoute = customRoute tempRoute'
  where
    tempRoute' i = ".." </> "_temp" </> takeDirectory p </> takeFileName p
        where p = toFilePath i
```

Deletion in blog/Main.hs at line 88 [15.9907]

B:BD[15.12953] → [17.2083:3041]

cleanRouteFromTemp :: Routes
cleanRouteFromTemp = customRoute createIndexRoute
  where
    createIndexRoute ident =
        (joinPath . tail . splitDirectories . takeDirectory) p
        </> takeFileName p -<.> ".html"
        where p = toFilePath ident
-- | From org get metadatas.
orgCompiler :: Item String -> Compiler (Item String)
orgCompiler = pure . fmap (\s -> (metadatasToStr . orgMetadatas) s ++ s)
orgMetadatas :: String -> [String]
orgMetadatas = map (format . lower . clean) . takeWhile (/= "") . lines
  where
    clean = concat . splitOn "#+"
    lower s = (map toLower . takeWhile (/= ':')) s ++ dropWhile (/= ':') s
    format xs
        | -- drop weekday str. 2018-05-03 Thu -> 2018-05-03
          "date" `isPrefixOf` xs
        = take 16 . concat . splitOn ">" . concat . splitOn "<" $ xs
        | otherwise
        = xs
metadatasToStr :: [String] -> String
metadatasToStr = ("----------\n" ++) . (++ "----------\n") . unlines
----

Bisonex save

Dependencies

In channels

Change contents

Replacement in workout.org at line 6125 [7.1]

Replacement in workout.org at line 6127 [7.1]

Replacement in workout.org at line 6129 [7.1]

Replacement in workout.org at line 6133 [7.1]

Replacement in workout.org at line 6135 [7.1]

Replacement in workout.org at line 6137 [7.1]

Replacement in workout.org at line 6139 [7.1]

Replacement in workout.org at line 6143 [7.1]

Replacement in workout.org at line 6145 [7.1]