apraga/org - Change VVTLYGPUEPMKPXBSSQYFFSSPOCKVPQ42H47ILYYBWRFEBACZWJ3AC

Thesis meeting notes

Created by Alexis Praga on August 10, 2023

VVTLYGPUEPMKPXBSSQYFFSSPOCKVPQ42H47ILYYBWRFEBACZWJ3AC

Dependencies

In channels

main

Change contents

Replacement in projects/bisonex.org at line 3 [5.35]
B:BD[3.16390] → [4.29:8221]
[3.16390]
[4.8221]

Replacement in projects/bisonex.org at line 9 [5.35]

B:BD[6.16311] → [2.239:8431]


*** DONE Haplotypecaller
CLOSED: [2023-06-26 Mon 19:42] SCHEDULED: <2023-06-15 Thu>
*** DONE Get the technical variant filter working
CLOSED: [2023-08-03 Thu 14:24] SCHEDULED: <2023-08-03 Wed 10:30>
*** DONE VEP annotation only
CLOSED: [2023-08-05 Sat 08:59] SCHEDULED: <2023-08-05 Sat>
T2T has no
- merged version
- PolyPhen
- gnomAD
SPiP annotation is disabled for now
*** PROJ [#A] Port SPiP
*** DONE Generate the SPiP database
CLOSED: [2023-08-09 Wed 21:41] SCHEDULED: <2023-08-03 Thu 11:30>
**** KILL Check transcriptome generation in hg38: checksum differs
CLOSED: [2023-08-09 Wed 21:41]
- [X] Clean up and compare the RData files on hg38 with ediff: different
- [X] Otherwise, generate without cleaning up: same result
**** DONE Fetch NCBI RefSeq curated
CLOSED: [2023-08-07 Mon 22:59] SCHEDULED: <2023-08-06 Sun>
.txt available on UCSC but not for T2T: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/
Format: https://genome.ucsc.edu/cgi-bin/hgTables?hgsid=1173061381_UepaHnvaOKFZKMOV4o7DtcNUHGVa&hgta_doSchemaDb=chlSab2&hgta_doSchemaTable=ncbiRefSeqCurated
Old format vs new
|  1 | bin          |  1 | chrom        |
|  2 | name         |  2 | chromStart   |
|  3 | chrom        |  3 | chromEnd     |
|  4 | strand       |  4 | name         |
|  5 | txStart      |  5 | score        |
|  6 | txEnd        |  6 | strand       |
|  7 | cdsStart     |  7 | thickStart   |
|  8 | cdsEnd       |  8 | thickEnd     |
|  9 | exonCount    |  9 | reserved     |
| 10 | exonStarts   | 10 | blockCount   |
| 11 | exonEnds     | 11 | blockSizes   |
| 12 | score        | 12 | chromStarts  |
| 13 | name2        | 13 | name2        |
| 14 | cdsStartStat | 14 | cdsStartStat |
| 15 | cdsEndStat   | 15 | cdsEndStat   |
| 16 | exonFrames   | 16 | exonFrames   |
|    |              | 17 | type         |
|    |              | 18 | geneName     |
|    |              | 19 | geneName2    |
|    |              | 20 | geneType     |
For T2T, only available in bigBed format: https://hgdownload.soe.ucsc.edu/gbdb/hs1/ncbiRefSeq/
There is an executable to convert it to BED: http://hgdownload.soe.ucsc.edu/admin/exe/
On Gentoo, mit-krb5 must be installed (for libkrb5)
#+begin_src
./bigBedToBed ncbiRefSeqCurated.bb ncbiRefSeqCurated.bed
#+end_src
Example:
chr1    7505    13582   NR_182076.1     0       -       13582   13582   0       2       5477,138,       0,5939, LOC127239154    none    none    -1,-1,          NR_182076.1     LOC127239154
In R:
   V1        V2   V3 V4    V5    V6    V7    V8 V9                V10
1 585 NR_046018 chr1  + 11873 14409 14409 14409  3 11873,12612,13220,
                 V11 V12     V13  V14  V15       V16           V17         V18
1 12227,12721,14409,   0 DDX11L1 none none -1,-1,-1, 354,109,1189, 0,739,1347,
Do not forget the headers, since the columns are in a different order:
Columns in GRCh38 =
3, 5, 6, 2, 12, 4, 7,  8, 12, 9, 17, 18, 13
Corresponding columns in T2T
1, 7, 8, 4, 5,  6, 14, 15, 5, ?,  ?,  ?, 13
In fact, it is enough to have
- the gene
- the transcript start
- the transcript end
- the strand
  to generate
***** KILL Test a partial correspondence?
CLOSED: [2023-08-07 Mon 22:58]
no CDS and no columns 17 and 18
only columns 10, 11, 12 (in the new data frame) cause problems (9, 17, 18 in the old one)
NB: the exon count (column 9) can be recovered from the length
***** DONE Full correspondence
CLOSED: [2023-08-07 Mon 22:59]
> dataRefSeq[1,]
    V1    V2    V3        V4 V5 V6    V7    V8 V9 V10           V11         V12
1 chr1 11873 14409 NR_046018  0  + 14409 14409  0   3 354,109,1189, 0,739,1347,
      V13
1 DDX11L1
> source("pkgs/getRefSeqDatabaseT2T.r")
Use the URL: http://hgdownload.cse.ucsc.edu/goldenPath/
read files...
> dataRefSeq[1,]
    V1   V2    V3        V4 V5 V6    V7    V8 V5.1 V10       V11     V12
1 chr1 7505 13582 NR_182076  0  - 13582 13582    0   2 5477,138, 0,5939,
           V13
1 LOC127239154
*** TODO Check SPiP annotation on confirmed variants
SCHEDULED: <2023-08-09 Wed>
**** 5 pathogenic variants taken from the main article
Sorted by decreasing SQUIRLS score
#+begin_src sh
varID
NM_000051:c.2251-10T>G
NM_000267:c.889-12T>A
NM_000059:c.8488-9T>G
NM_000249:c.589-10T>A
NM_000249:c.791-7T>A
#+end_src
***** DONE In hg38
CLOSED: [2023-08-09 Wed 22:01]
#+begin_src
spip --input test-spip.txt --output test-spip.out --GenomeAssenbly hg38 --threads 1 --maxLines 1000
#+end_src
#+RESULTS:
| varID                  | Interpretation | InterConfident              | SPiPscore | strand |    gNomen | varType      | ntChange | ExonInfo  | exonSize | transcript | gene  | NearestSS | DistSS | RegType    | SPiCEproba | SPiCEinter_2thr | deltaMES | BP  | mutInPBarea | deltaESRscore | posCryptMut | sstypeCryptMut | probaCryptMut | classProbaCryptMut | nearestSStoCrypt | nearestPosSStoCrypt | nearestDistSStoCrypt | posCryptWT | probaCryptWT | classProbaCryptWT | posSSPhysio | probaSSPhysio | classProbaSSPhysio | probaSSPhysioMut | classProbaSSPhysioMut |
| NM_000051:c.2251-10T>G | Alter by SPiCE | 98.41 % [91.47 % - 99.96 %] |     0.986 | +      | 108257471 | substitution | T>G      | Intron 14 |     1140 | NM_000051  | ATM   | acceptor  |    -10 | IntronCons |          1 | high            |        0 | 0No |          10 |     108257471 | Acc         |    0.024836003 | No            | Acc                |        108257480 |                 -10 |                    0 |          0 | No           |         108257480 | 0.006489079 | No            |     0.000004368542 | No               |                       |
| NM_000267:c.889-12T>A  | Alter by SPiCE | 98.41 % [91.47 % - 99.96 %] |     1.000 | +      |  31200410 | substitution | T>A      | Intron 8  |    17756 | NM_000267  | NF1   | acceptor  |    -12 | IntronCons |          1 | high            |        0 | 0No |          10 |      31200411 | Acc         |    0.009082899 | No            | Acc                |         31200421 |                 -11 |                    0 |          0 | No           |          31200421 | 0.005160854 | No            |     0.000003718518 | No               |                       |
| NM_000059:c.8488-9T>G  | Alter by SPiCE | 98.41 % [91.47 % - 99.96 %] |     0.994 | +      |  32370947 | substitution | T>G      | Intron 19 |      398 | NM_000059  | BRCA2 | acceptor  |     -9 | IntronCons |          1 | high            |        0 | 0No |          10 |      32370947 | Acc         |    0.004449623 | No            | Acc                |         32370955 |                  -9 |                    0 |          0 | No           |          32370955 | 0.005060308 | No            |     0.000005609419 | No               |                       |
| NM_000249:c.589-10T>A  | Alter by SPiCE | 98.41 % [91.47 % - 99.96 %] |     0.978 | +      |  37012001 | substitution | T>A      | Intron 7  |      148 | NM_000249  | MLH1  | acceptor  |    -10 | IntronCons |          1 | high            |        0 | 0No |          10 |      37012002 | Acc         |    0.009529819 | No            | Acc                |         37012010 |                  -9 |                    0 |          0 | No           |          37012010 | 0.028437574 | No            |     0.000009275960 | No               |                       |
| NM_000249:c.791-7T>A   | Alter by SPiCE | 98.41 % [91.47 % - 99.96 %] |     0.988 | +      |  37017499 | substitution | T>A      | Intron 9  |     2961 | NM_000249  | MLH1  | acceptor  |     -7 | IntronCons |          1 | high            |        0 | 0No |          10 |      37017500 | Acc         |    0.015564917 | No            | Acc                |         37017505 |                  -6 |                    0 |          0 | No           |          37017505 | 0.023995855 | No            |     0.000022606476 | No               |                       |
Test on MobiDetails: 98% for SPiP (! differs from the Excel file...)
Second variant: also OK in VCF
***** DONE Lift the variants to T2T: OK, but multiple transcripts when using genomic coordinates...
CLOSED: [2023-08-09 Wed 23:23] SCHEDULED

[6.16311]

[2.8431]


*** DONE Haplotypecaller
CLOSED: [2023-06-26 Mon 19:42] SCHEDULED: <2023-06-15 Thu>
*** DONE Get the technical variant filter working
CLOSED: [2023-08-03 Thu 14:24] SCHEDULED: <2023-08-03 Wed 10:30>
*** DONE VEP annotation only
CLOSED: [2023-08-05 Sat 08:59] SCHEDULED: <2023-08-05 Sat>
T2T has no
- merged version
- PolyPhen
- gnomAD
SPiP annotation is disabled for now
*** TODO [#A] Port SPiP properly
SCHEDULED: <2023-08-11 Fri>
*** DONE Generate the SPiP database
CLOSED: [2023-08-09 Wed 21:41] SCHEDULED: <2023-08-03 Thu 11:30>
**** KILL Check transcriptome generation in hg38: checksum differs
CLOSED: [2023-08-09 Wed 21:41]
- [X] Clean up and compare the RData files on hg38 with ediff: different
- [X] Otherwise, generate without cleaning up: same result
**** DONE Fetch NCBI RefSeq curated
CLOSED: [2023-08-07 Mon 22:59] SCHEDULED: <2023-08-06 Sun>
.txt available on UCSC but not for T2T: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/
Format: https://genome.ucsc.edu/cgi-bin/hgTables?hgsid=1173061381_UepaHnvaOKFZKMOV4o7DtcNUHGVa&hgta_doSchemaDb=chlSab2&hgta_doSchemaTable=ncbiRefSeqCurated
Old format vs new
|  1 | bin          |  1 | chrom        |
|  2 | name         |  2 | chromStart   |
|  3 | chrom        |  3 | chromEnd     |
|  4 | strand       |  4 | name         |
|  5 | txStart      |  5 | score        |
|  6 | txEnd        |  6 | strand       |
|  7 | cdsStart     |  7 | thickStart   |
|  8 | cdsEnd       |  8 | thickEnd     |
|  9 | exonCount    |  9 | reserved     |
| 10 | exonStarts   | 10 | blockCount   |
| 11 | exonEnds     | 11 | blockSizes   |
| 12 | score        | 12 | chromStarts  |
| 13 | name2        | 13 | name2        |
| 14 | cdsStartStat | 14 | cdsStartStat |
| 15 | cdsEndStat   | 15 | cdsEndStat   |
| 16 | exonFrames   | 16 | exonFrames   |
|    |              | 17 | type         |
|    |              | 18 | geneName     |
|    |              | 19 | geneName2    |
|    |              | 20 | geneType     |
For T2T, only available in bigBed format: https://hgdownload.soe.ucsc.edu/gbdb/hs1/ncbiRefSeq/
There is an executable to convert it to BED: http://hgdownload.soe.ucsc.edu/admin/exe/
On Gentoo, mit-krb5 must be installed (for libkrb5)
#+begin_src
./bigBedToBed ncbiRefSeqCurated.bb ncbiRefSeqCurated.bed
#+end_src
Example:
chr1    7505    13582   NR_182076.1     0       -       13582   13582   0       2       5477,138,       0,5939, LOC127239154    none    none    -1,-1,          NR_182076.1     LOC127239154
In R:
   V1        V2   V3 V4    V5    V6    V7    V8 V9                V10
1 585 NR_046018 chr1  + 11873 14409 14409 14409  3 11873,12612,13220,
                 V11 V12     V13  V14  V15       V16           V17         V18
1 12227,12721,14409,   0 DDX11L1 none none -1,-1,-1, 354,109,1189, 0,739,1347,
Do not forget the headers, since the columns are in a different order:
Columns in GRCh38 =
3, 5, 6, 2, 12, 4, 7,  8, 12, 9, 17, 18, 13
Corresponding columns in T2T
1, 7, 8, 4, 5,  6, 14, 15, 5, ?,  ?,  ?, 13
In fact, it is enough to have
- the gene
- the transcript start
- the transcript end
- the strand
  to generate
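In the hg38 R output above, V17 and V18 look like the block sizes and relative starts derived from exonStarts/exonEnds. A minimal awk sketch of that derivation, assuming tab-separated input in the old 16-column layout (txStart in column 5, exonCount in 9, exonStarts in 10, exonEnds in 11). The function name add_blocks is hypothetical and this is not taken from the SPiP scripts.

```shell
# Sketch: append BED-style blockSizes and chromStarts to an old-format row.
# Assumes tab-separated input with the 16 columns listed in the table above.
add_blocks() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    split($10, starts, ",")
    split($11, ends, ",")
    sizes = ""; offsets = ""
    for (i = 1; i <= $9; i++) {
      sizes = sizes (ends[i] - starts[i]) ","      # exon length
      offsets = offsets (starts[i] - $5) ","       # start relative to txStart
    }
    print $0, sizes, offsets
  }' "$@"
}

# First hg38 row from the R output above; print only the two new columns
printf '585\tNR_046018\tchr1\t+\t11873\t14409\t14409\t14409\t3\t11873,12612,13220,\t12227,12721,14409,\t0\tDDX11L1\tnone\tnone\t-1,-1,-1,\n' \
  | add_blocks | cut -f 17,18
```

On that row the two new columns come out as 354,109,1189, and 0,739,1347, which matches V17 and V18 in the R output.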
***** KILL Test a partial correspondence?
CLOSED: [2023-08-07 Mon 22:58]
no CDS and no columns 17 and 18
only columns 10, 11, 12 (in the new data frame) cause problems (9, 17, 18 in the old one)
NB: the exon count (column 9) can be recovered from the length
***** DONE Full correspondence
CLOSED: [2023-08-07 Mon 22:59]
> dataRefSeq[1,]
    V1    V2    V3        V4 V5 V6    V7    V8 V9 V10           V11         V12
1 chr1 11873 14409 NR_046018  0  + 14409 14409  0   3 354,109,1189, 0,739,1347,
      V13
1 DDX11L1
> source("pkgs/getRefSeqDatabaseT2T.r")
Use the URL: http://hgdownload.cse.ucsc.edu/goldenPath/
read files...
> dataRefSeq[1,]
    V1   V2    V3        V4 V5 V6    V7    V8 V5.1 V10       V11     V12
1 chr1 7505 13582 NR_182076  0  - 13582 13582    0   2 5477,138, 0,5939,
           V13
1 LOC127239154
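The T2T row above has column 5 repeated in ninth position (V5.1) and the accession without its version suffix. The script pkgs/getRefSeqDatabaseT2T.r is not shown here, so the following is only an awk sketch that reproduces the displayed row from the bigBedToBed output. The function name bed_to_refseq13 is hypothetical.

```shell
# Sketch: reshape bigBedToBed output into the 13 columns shown in R.
# Assumes tab-separated BED12+ input as produced by bigBedToBed.
bed_to_refseq13() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    sub(/\.[0-9]+$/, "", $4)    # NR_182076.1 -> NR_182076
    # columns 1-8, then score (5) again, then 10-13
    print $1, $2, $3, $4, $5, $6, $7, $8, $5, $10, $11, $12, $13
  }' "$@"
}

# First 13 fields of the example row above
printf 'chr1\t7505\t13582\tNR_182076.1\t0\t-\t13582\t13582\t0\t2\t5477,138,\t0,5939,\tLOC127239154\n' \
  | bed_to_refseq13
```

On real data it would be called as bed_to_refseq13 ncbiRefSeqCurated.bed.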
*** DONE Check SPiP annotation on confirmed variants
CLOSED: [2023-08-10 Thu 23:16] SCHEDULED: <2023-08-09 Wed>
**** DONE 5 pathogenic variants taken from the main article
CLOSED: [2023-08-10 Thu 23:00]
Sorted by decreasing SQUIRLS score
#+begin_src sh
varID
NM_000051:c.2251-10T>G
NM_000267:c.889-12T>A
NM_000059:c.8488-9T>G
NM_000249:c.589-10T>A
NM_000249:c.791-7T>A
#+end_src
***** DONE In hg38
CLOSED: [2023-08-09 Wed 22:01]
#+begin_src
spip --input test-spip.txt --output test-spip.out --GenomeAssenbly hg38 --threads 1 --maxLines 1000
#+end_src
#+RESULTS:
| varID                  | Interpretation | InterConfident              | SPiPscore | strand |    gNomen | varType      | ntChange | ExonInfo  | exonSize | transcript | gene  | NearestSS | DistSS | RegType    | SPiCEproba | SPiCEinter_2thr | deltaMES | BP  | mutInPBarea | deltaESRscore | posCryptMut | sstypeCryptMut | probaCryptMut | classProbaCryptMut | nearestSStoCrypt | nearestPosSStoCrypt | nearestDistSStoCrypt | posCryptWT | probaCryptWT | classProbaCryptWT | posSSPhysio | probaSSPhysio | classProbaSSPhysio | probaSSPhysioMut | classProbaSSPhysioMut |
| NM_000051:c.2251-10T>G | Alter by SPiCE | 98.41 % [91.47 % - 99.96 %] |     0.986 | +      | 108257471 | substitution | T>G      | Intron 14 |     1140 | NM_000051  | ATM   | acceptor  |    -10 | IntronCons |          1 | high            |        0 | 0No |          10 |     108257471 | Acc         |    0.024836003 | No            | Acc                |        108257480 |                 -10 |                    0 |          0 | No           |         108257480 | 0.006489079 | No            |     0.000004368542 | No               |                       |
| NM_000267:c.889-12T>A  | Alter by SPiCE | 98.41 % [91.47 % - 99.96 %] |     1.000 | +      |  31200410 | substitution | T>A      | Intron 8  |    17756 | NM_000267  | NF1   | acceptor  |    -12 | IntronCons |          1 | high            |        0 | 0No |          10 |      31200411 | Acc         |    0.009082899 | No            | Acc                |         31200421 |                 -11 |                    0 |          0 | No           |          31200421 | 0.005160854 | No            |     0.000003718518 | No               |                       |
| NM_000059:c.8488-9T>G  | Alter by SPiCE | 98.41 % [91.47 % - 99.96 %] |     0.994 | +      |  32370947 | substitution | T>G      | Intron 19 |      398 | NM_000059  | BRCA2 | acceptor  |     -9 | IntronCons |          1 | high            |        0 | 0No |          10 |      32370947 | Acc         |    0.004449623 | No            | Acc                |         32370955 |                  -9 |                    0 |          0 | No           |          32370955 | 0.005060308 | No            |     0.000005609419 | No               |                       |
| NM_000249:c.589-10T>A  | Alter by SPiCE | 98.41 % [91.47 % - 99.96 %] |     0.978 | +      |  37012001 | substitution | T>A      | Intron 7  |      148 | NM_000249  | MLH1  | acceptor  |    -10 | IntronCons |          1 | high            |        0 | 0No |          10 |      37012002 | Acc         |    0.009529819 | No            | Acc                |         37012010 |                  -9 |                    0 |          0 | No           |          37012010 | 0.028437574 | No            |     0.000009275960 | No               |                       |
| NM_000249:c.791-7T>A   | Alter by SPiCE | 98.41 % [91.47 % - 99.96 %] |     0.988 | +      |  37017499 | substitution | T>A      | Intron 9  |     2961 | NM_000249  | MLH1  | acceptor  |     -7 | IntronCons |          1 | high            |        0 | 0No |          10 |      37017500 | Acc         |    0.015564917 | No            | Acc                |         37017505 |                  -6 |                    0 |          0 | No           |          37017505 | 0.023995855 | No            |     0.000022606476 | No               |                       |
Test on MobiDetails: 98% for SPiP (! differs from the Excel file...)
Second variant: also OK in VCF
***** DONE Lift the variants to T2T: OK, but multiple transcripts when using genomic coordinates...
CLOSED: [2023-08-09 Wed 23:23] SCHEDULED

Replacement in projects/bisonex.org at line 14 [5.35]

B:BD[2.41199] → [2.41199:47914]

∅:D[2.47914] → [7.8526:9755]

B:BD[7.8526] → [7.8526:9755]

∅:D[7.9755] → [4.24606:24648]

B:BD[4.24606] → [4.24606:24648]

∅:D[4.24648] → [8.24635:24841]

B:BD[8.24635] → [8.24635:24841]

0 |  0 | No          |            10 |    37018845 | Acc            | 0.01556491731833 | No                 | Don              |            37015889 |                 2956 |   37018850 | 0.02399585516738 | No                |    37026980 |   0.028353019 | No                 |   0.028353019047 | No                    |
| chr3  |  37018844 | lol | T   | A   | .    | .      | .    | NM_001354628:g.37018844:T>A  | Alter by SPiCE | 98.41 % [91.47 % - 99.96 %] |     0.988 | +      |  37018844 | substitution | T>A      | Intron 9  |     2961 | NM_001354628 | MLH1  | acceptor  |     -7 | IntronCons |          1 | high                         |        0 |  0 | No          |            10 |    37018845 | Acc            | 0.01556491731833 | No                 | Acc              |            37018850 |                   -6 |          0 | 0.00000000000000 | No                |    37018850 |   0.023995855 | No                 |   0.000022606476 | No                    |
| chr3  |  37018844 | lol | T   | A   | .    | .      | .    | NM_001354629:g.37018844:T>A  | Alter by SPiCE | 98.41 % [91.47 % - 99.96 %] |     0.988 | +      |  37018844 | substitution | T>A      | Intron 8  |     2961 | NM_001354629 | MLH1  | acceptor  |     -7 | IntronCons |          1 | high                         |        0 |  0 | No          |            10 |    37018845 | Acc            | 0.01556491731833 | No                 | Acc              |            37018850 |                   -6 |          0 | 0.00000000000000 | No                |    37018850 |   0.023995855 | No                 |   0.000022606476 | No                    |
| chr3  |  37018844 | lol | T   | A   | .    | .      | .    | NM_001354630:g.37018844:T>A  | Alter by SPiCE | 98.41 % [91.47 % - 99.96 %] |     0.988 | +      |  37018844 | substitution | T>A      | Intron 9  |     2961 | NM_001354630 | MLH1  | acceptor  |     -7 | IntronCons |          1 | high                         |        0 |  0 | No          |            10 |    37018845 | Acc            | 0.01556491731833 | No                 | Acc              |            37018850 |                   -6 |          0 | 0.00000000000000 | No                |    37018850 |   0.023995855 | No                 |   0.000022606476 | No                    |
|       |           |     |     |     |      |        |      |                              |                |                             |           |        |           |              |          |           |          |              |       |           |        |            |            |                              |          |    |             |               |             |                |                  |                    |                  |                     |                      |            |                  |                   |             |               |                    |                  |                       |
***** TODO How to handle multiple transcripts?
SCHEDULED: <2023-08-09 Wed>
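One possible approach, not a decision: keep, for each genomic variant, the transcript row with the highest SPiPscore. The sketch below assumes the SPiP output is tab-separated with a header line containing CHROM, POS, REF, ALT and SPiPscore. The function name pick_best_transcript is hypothetical, and the two demo rows reuse varID and SPiPscore values from the FAS variant in these notes.

```shell
# Sketch: one row per variant, keeping the highest SPiPscore.
# Columns are looked up by header name; ties keep the first row seen.
pick_best_transcript() {
  awk 'BEGIN { FS = OFS = "\t" }
  NR == 1 {
    for (i = 1; i <= NF; i++) col[$i] = i
    print
    next
  }
  {
    key = $(col["CHROM"]) ":" $(col["POS"]) ":" $(col["REF"]) ">" $(col["ALT"])
    score = $(col["SPiPscore"]) + 0
    if (!(key in best)) { order[++n] = key; best[key] = score; row[key] = $0 }
    else if (score > best[key]) { best[key] = score; row[key] = $0 }
  }
  END { for (i = 1; i <= n; i++) print row[order[i]] }' "$@"
}

# Reduced demo table: two transcripts for the same position
printf 'CHROM\tPOS\tREF\tALT\tvarID\tSPiPscore\nchr10\t89894645\tA\tG\tNM_152871:g.89894645:A>G\t0.000\nchr10\t89894645\tA\tG\tNM_000043:g.89894645:A>G\t0.288\n' \
  | pick_best_transcript
```

Taking the maximum is a conservative choice: a deep-intronic call on one transcript does not hide an exonic call on another.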
*** PROJ [#A] VEP filter (with SPiP?)
** PROJ [#B] Quality indicators
*** Idea
Raredisease:
- FastQC: many statistics. Not available in Nix
- Mosdepth: computes depth (2x faster than samtools depth). Available in Nix
- MultiQC: only merges the results of the analyses. Not available in Nix
- Picard's CollectMultipleMetrics, CollectHsMetrics, and CollectWgsMetrics
- Qualimap: alternative to FastQC? Not available in Nix
- Sentieon's WgsMetricsAlgo: proprietary
- TIDDIT's cov: TIDDIT = chromosomal rearrangement
Sarek:
- alignment statistics: samtools stats, mosdepth
- QC: MultiQC
MultiQC: not available in Nix
** PROJ [#B] Execution report with MultiQC
** PROJ Check for normalization
** PROJ [#B] Check HGVS nomenclature with mutalyzer
** DONE Execution
CLOSED: [2022-09-13 Tue 21:37]
*** KILL test Bionix
*** KILL Implement execution with Nix?
See https://academic.oup.com/gigascience/article/9/11/giaa121/5987272?login=false
for an example.
Probably simpler to use Nix for environment management and snakemake for execution
No internet access from the cluster
*** DONE nextflow
CLOSED: [2022-09-13 Tue 21:37]
**** TODO SGE scheduler bug
The job gets killed because the user name is not passed correctly to nextflow
***** DONE Force the user name at run time
CLOSED: [2023-04-01 Sat 17:57]
NXF_OPTS=-D"user.name=alex"
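A minimal sketch of the same setting without a hard-coded login, assuming `id -un` returns the account name SGE expects. NXF_OPTS holds JVM options for nextflow and user.name is a standard Java system property. The entry point main.nf is hypothetical and the run is left commented out.

```shell
# Sketch: pass the current login name to the JVM that runs nextflow.
# Same value as the line above once the shell has removed the quotes.
NXF_OPTS="-Duser.name=$(id -un)"
export NXF_OPTS
echo "$NXF_OPTS"
# nextflow run main.nf    # hypothetical entry point, not run here
```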
***** DONE Check whether the problem persists with 22.10.6
CLOSED: [2023-04-01 Sat 18:38] SCHEDULED: <2023-04-01 Sat>
yes
***** KILL Package the user name into the program?
Bad idea..
** TODO Preprocessing with nextflow
*** TODO Map to reference
**** TODO Sample ID in header
/Work/Users/apraga/bisonex/out/63003856_S135/preprocessing/baserecalibrator
*** DONE Mark duplicate
CLOSED: [2022-10-09 Sun 22:30]
*** DONE Recalibrate base quality score
CLOSED: [2022-10-09 Sun 22:30]
** DONE Variant calling with Nextflow
CLOSED: [2022-11-19 Sat 21:34]
*** DONE Haplotype caller
CLOSED: [2022-10-09 Sun 22:40]
*** DONE Filter variants
CLOSED: [2022-10-09 Sun 22:40]
*** DONE Filter common snp not clinvar path
CLOSED: [2022-11-07 Mon 23:00]
See [[*common dbSNP not clinvar patho][common dbSNP not clinvar patho]]
*** DONE Filter variant only in consensual sequence
CLOSED: [2022-11-08 Tue 22:23]
*** DONE Filter technical variants
CLOSED: [2022-11-19 Sat 21:34]
*** DONE Use AVX to speed up execution
CLOSED: [2023-04-29 Sat 15:46]
Without it, we get the warning
#+begin_quote
17:28:00.720 INFO  PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported
17:28:00.721 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/nix/store/cy9ckxqwrkifx7wf02hm4ww1p6lnbxg9-gatk-4.2.4.1/bin/gatk-package-4.2.4.1-local.jar!/com/intel/gkl/native/libgkl_utils.so
17:28:00.733 WARN  NativeLibraryLoader - Unable to load libgkl_utils.so from native/libgkl_utils.so (/Work/Users/apraga/bisonex/out/NA12878_NIST7035/preprocessing/applybqsr/libgkl_utils821485189051585397.so: libgomp.so.1: cannot open shared object file: No such file or directory)
17:28:00.733 WARN  IntelPairHmm - Intel GKL Utils not loaded
17:28:00.733 WARN  PairHMM - ***WARNING: Machine does not have the AVX instruction set support needed for the accelerated AVX PairHmm. Falling back to the MUCH slower LOGLESS_CACHING implementation!
17:28:00.763 INFO  ProgressMeter - Starting traversal
#+end_quote
libgomp.so is provided by gcc, so the module must be loaded
 module load gcc@11.3.0/gcc-12.1.0
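In the log above the AVX warning follows the failed load of libgomp.so.1, so both conditions are worth checking before a run. A small sketch, assuming a Linux node with /proc/cpuinfo and assuming the module exposes libgomp through LD_LIBRARY_PATH. Both function names are hypothetical.

```shell
# Sketch: does the CPU advertise AVX? The file is a parameter so a saved
# copy of cpuinfo can be checked too.
cpu_has_avx() {
  grep -qw 'avx' "${1:-/proc/cpuinfo}" 2>/dev/null
}

# Sketch: is libgomp.so.1 present in a colon-separated directory list?
# Defaults to LD_LIBRARY_PATH; system default directories are not searched.
libgomp_in_path() {
  old_ifs=$IFS; IFS=:
  found=1
  for dir in ${1-${LD_LIBRARY_PATH:-}}; do
    if [ -e "$dir/libgomp.so.1" ]; then found=0; fi
  done
  IFS=$old_ifs
  return $found
}

if cpu_has_avx; then echo "CPU advertises AVX"; else echo "no AVX flag found"; fi
if libgomp_in_path; then echo "libgomp.so.1 found"; else echo "libgomp.so.1 not in LD_LIBRARY_PATH"; fi
```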
** KILL Use subworkflows
CLOSED: [2023-04-02 Sun 18:08]
Our version is more flexible
*** KILL Alignment
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
*** KILL Vep
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
vcf_annotate_ensemblvep
** TODO Annotation with nextflow :annotation:
*** KILL VEP : --gene-phenotype ?
CLOSED: [2023-04-18 mar. 18:32]
Discussed with Alexis: databases not up to date
https://www.ensembl.org/info/genome/variation/phenotype/sources_phenotype_documentation.html
*** DONE VEP plugin
CLOSED: [2023-04-18 mar. 18:32]
Clone the git repository containing the plugins
Then use --dir_plugins
*** HOLD Use Alexis's code
*** TODO New version with VEP
Example with --custom
https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html
**** DONE Add SpliceAI
CLOSED: [2023-05-18 Thu 11:02] SCHEDULED: <2023-04-30 Sun>
VEP plugin
***** DONE Download the data
CLOSED: [2023-05-11 Thu 19:01]
Hard to automate, the link is temporary...
***** DONE Plugin
CLOSED: [2023-05-11 Thu 20:16]
***** DONE Split the score into several columns
CLOSED: [2023-05-11 Thu 20:16]
Test with this file to get one line with an annotation and one line without
#CHROM	POS	ID	REF	ALT
1	9091	.	A	C
1	69091	.	A	C
and
#+begin_src sh
rm -f postvep.tsv* && vep -i testspliceai.vcf.gz -o postvep.tsv --tab --dir 109 --merged --pick --use_given_ref --offline \
  --plugin SpliceAI,snv=spliceai_scores.raw.snv.hg38.vcf.gz,indel=spliceai_scores.raw.indel.hg38.vcf.gz
#+end_src
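For the column split itself, a minimal sketch, assuming the plugin reports a single pipe-separated field laid out as SYMBOL|DS_AG|DS_AL|DS_DG|DS_DL|DP_AG|DP_AL|DP_DG|DP_DL. That layout is an assumption to check against the plugin version in use. The function name split_spliceai is hypothetical and the sample values are made up for illustration.

```shell
# Sketch: turn one pipe-separated SpliceAI prediction into tab-separated columns.
split_spliceai() {
  printf '%s\n' "$1" | awk -F'|' 'BEGIN { OFS = "\t" }
  { print $1, $2, $3, $4, $5, $6, $7, $8, $9 }'
}

# Header, then one made-up prediction string
printf 'SYMBOL\tDS_AG\tDS_AL\tDS_DG\tDS_DL\tDP_AG\tDP_AL\tDP_DG\tDP_DL\n'
split_spliceai 'OR4F5|0.01|0.08|0.00|0.00|-10|26|-28|-25'
```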
#+begin_src

[2.41199]

[8.24841]

0 |  0 | No          |            10 |    37018845 | Acc            | 0.01556491731833 | No                 | Don              |            37015889 |                 2956 |   37018850 | 0.02399585516738 | No                |    37026980 |   0.028353019 | No                 |   0.028353019047 | No                    |
| chr3  |  37018844 | lol | T   | A   | .    | .      | .    | NM_001354628:g.37018844:T>A  | Alter by SPiCE | 98.41 % [91.47 % - 99.96 %] |     0.988 | +      |  37018844 | substitution | T>A      | Intron 9  |     2961 | NM_001354628 | MLH1  | acceptor  |     -7 | IntronCons |          1 | high                         |        0 |  0 | No          |            10 |    37018845 | Acc            | 0.01556491731833 | No                 | Acc              |            37018850 |                   -6 |          0 | 0.00000000000000 | No                |    37018850 |   0.023995855 | No                 |   0.000022606476 | No                    |
| chr3  |  37018844 | lol | T   | A   | .    | .      | .    | NM_001354629:g.37018844:T>A  | Alter by SPiCE | 98.41 % [91.47 % - 99.96 %] |     0.988 | +      |  37018844 | substitution | T>A      | Intron 8  |     2961 | NM_001354629 | MLH1  | acceptor  |     -7 | IntronCons |          1 | high                         |        0 |  0 | No          |            10 |    37018845 | Acc            | 0.01556491731833 | No                 | Acc              |            37018850 |                   -6 |          0 | 0.00000000000000 | No                |    37018850 |   0.023995855 | No                 |   0.000022606476 | No                    |
| chr3  |  37018844 | lol | T   | A   | .    | .      | .    | NM_001354630:g.37018844:T>A  | Alter by SPiCE | 98.41 % [91.47 % - 99.96 %] |     0.988 | +      |  37018844 | substitution | T>A      | Intron 9  |     2961 | NM_001354630 | MLH1  | acceptor  |     -7 | IntronCons |          1 | high                         |        0 |  0 | No          |            10 |    37018845 | Acc            | 0.01556491731833 | No                 | Acc              |            37018850 |                   -6 |          0 | 0.00000000000000 | No                |    37018850 |   0.023995855 | No                 |   0.000022606476 | No                    |
|       |           |     |     |     |      |        |      |                              |                |                             |           |        |           |              |          |           |          |              |       |           |        |            |            |                              |          |    |             |               |             |                |                  |                    |                  |                     |                      |            |                  |                   |             |               |                    |                  |                       |
**** DONE 1 variant with score = 38% (MobiDetails)
CLOSED: [2023-08-10 Thu 23:16]
chr10:g.89010760A>G
With
#+begin_src sh
varID
NM_000043.4:c.513A>G
#+end_src
We do get 35.81%
| varID                | Interpretation | InterConfident             | SPiPscore | strand |   gNomen | varType      | ntChange | ExonInfo | exonSize | transcript | gene | NearestSS | DistSS | RegType | SPiCEproba | SPiCEinter_2thr              | deltaMES | BP | mutInPBarea | deltaESRscore | posCryptMut |  sstypeCryptMut | probaCryptMut | classProbaCryptMut | nearestSStoCrypt | nearestPosSStoCrypt | nearestDistSStoCrypt |      posCryptWT | probaCryptWT | classProbaCryptWT | posSSPhysio | probaSSPhysio | classProbaSSPhysio | probaSSPhysioMut | classProbaSSPhysioMut |
| NM_000043.4:c.513A>G | Alter ESR      | 35.81 % [28.11 % - 44.1 %] |     0.288 | +      | 89010760 | substitution | A>G      | Exon 6   |       63 | NM_000043  | FAS  | acceptor  |      8 | ExonESR |          0 | Outside SPiCE Interpretation |       00 | No |    -1.67753 |      89010759 | Acc         | 0.0000003317384 | No            | Acc                |         89010752 |                   7 |             89010759 | 0.0000002205815 | No           |          89010752 |  0.02545572 | No            |         0.02545572 | No               |                       |
With genomic coordinates in hg38, the transcript is not found among the many transcripts (it is not visible on UCSC either...).
But in T2T it is!
##fileformat=VCFv4.0
##assembly=CHM18v2.0/hs1
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
chr10	89894645	lol	A	G	.	.	.
❯ ./result/bin/spip -I test-spip-T2T.vcf -O test-spip-T2T.out -g hs1 --refseq dataRefSeqhs1.RData --transcriptome transcriptome_hs1.RData
| CHROM |      POS | ID  | REF | ALT | QUAL | FILTER | INFO | varID                       | Interpretation | InterConfident             | SPiPscore | strand |   gNomen | varType      | ntChange | ExonInfo | exonSize | transcript   | gene | NearestSS | DistSSRegType | SPiCEproba | SPiCEinter_2thr | deltaMES                     | BP | mutInPBarea | deltaESRscore | posCryptMut | sstypeCryptMut | probaCryptMut | classProbaCryptMut | nearestSStoCrypt | nearestPosSStoCrypt | nearestDistSStoCrypt | posCryptWT | probaCryptWT | classProbaCryptWT | posSSPhysio | probaSSPhysio | classProbaSSPhysio | probaSSPhysioMut | classProbaSSPhysioMut |     |
| chr10 | 89894645 | lol | A   | G   | .    | .      | .    | NM_000043:g.89894645:A>G    | Alter ESR      | 35.81 % [28.11 % - 44.1 %] |     0.288 | +      | 89894645 | substitution | A>G      | Exon 6   |       63 | NM_000043    | FAS  | acceptor  |             8 | ExonESR    |               0 | Outside SPiCE Interpretation |  0 |           0 | No            |    -1.67753 |       89894644 | Acc           |    0.0000003317384 | No               | Acc                 |             89894637 |          7 |     89894644 |   0.0000002205815 | No          |      89894637 |         0.02545572 | No               |            0.02545572 | No  |
| chr10 | 89894645 | lol | A   | G   | .    | .      | .    | NM_001320619:g.89894645:A>G | Alter ESR      | 35.81 % [28.11 % - 44.1 %] |     0.288 | +      | 89894645 | substitution | A>G      | Exon 6   |       63 | NM_001320619 | FAS  | acceptor  |             8 | ExonESR    |               0 | Outside SPiCE Interpretation |  0 |           0 | No            |    -1.67753 |       89894644 | Acc           |    0.0000003317384 | No               | Acc                 |             89894637 |          7 |     89894644 |   0.0000002205815 | No          |      89894637 |         0.02545572 | No               |            0.02545572 | No  |
| chr10 | 89894645 | lol | A   | G   | .    | .      | .    | NM_152871:g.89894645:A>G    | NTR            | 00 % [00 % - 00.92 %]      |     0.000 | +      | 89894645 | substitution | A>G      | Intron 5 |     1398 | NM_152871    | FAS  | donor     |           160 | DeepIntron |               0 | Outside SPiCE Interpretation |  0 |           0 | No            |    10.00000 |       89894644 | Don           |    0.0001360257829 | No               | Don                 |             89894485 |        159 |            0 |   0.0000000000000 | No          |      89894485 |         0.07177992 | Yes              |            0.07177992 | Yes |
| chr10 | 89894645 | lol | A   | G   | .    | .      | .    | NM_152872:g.89894645:A>G    | Alter ESR      | 35.81 % [28.11 % - 44.1 %] |     0.288 | +      | 89894645 | substitution | A>G      | Exon 6   |       63 | NM_152872    | FAS  | acceptor  |             8 | ExonESR    |               0 | Outside SPiCE Interpretation |  0 |           0 | No            |    -1.67753 |       89894644 | Acc           |    0.0000003317384 | No               | Acc                 |             89894637 |          7 |     89894644 |   0.0000002205815 | No          |      89894637 |         0.02545572 | No               |            0.02545572 | No  |
| chr10 | 89894645 | lol | A   | G   | .    | .      | .    | NR_028033:g.89894645:A>G    | NTR            | 00 % [00 % - 00.92 %]      |     0.000 | +      | 89894645 | substitution | A>G      | Intron 4 |     1398 | NR_028033    | FAS  | donor     |           160 | DeepIntron |               0 | Outside SPiCE Interpretation |  0 |           0 | No            |    10.00000 |       89894644 | Don           |    0.0001360257829 | No               | Don                 |             89894485 |        159 |            0 |   0.0000000000000 | No          |      89894485 |         0.07177992 | Yes              |            0.07177992 | Yes |
| chr10 | 89894645 | lol | A   | G   | .    | .      | .    | NR_028034:g.89894645:A>G    | NTR            | 00 % [00 % - 00.92 %]      |     0.000 | +      | 89894645 | substitution | A>G      | Intron 3 |     1398 | NR_028034    | FAS  | donor     |           160 | DeepIntron |               0 | Outside SPiCE Interpretation |  0 |           0 | No            |    10.00000 |       89894644 | Don           |    0.0001360257829 | No               | Don                 |             89894485 |        159 |            0 |   0.0000000000000 | No          |      89894485 |         0.07177992 | Yes              |            0.07177992 | Yes |
| chr10 | 89894645 | lol | A   | G   | .    | .      | .    | NR_028035:g.89894645:A>G    | Alter ESR      | 35.81 % [28.11 % - 44.1 %] |     0.288 | +      | 89894645 | substitution | A>G      | Exon 4   |       63 | NR_028035    | FAS  | acceptor  |             8 | ExonESR    |               0 | Outside SPiCE Interpretation |  0 |           0 | No            |    -1.67753 |       89894644 | Acc           |    0.0000003317384 | No               | Acc                 |             89894637 |          7 |     89894644 |   0.0000002205815 | No          |      89894637 |         0.02545572 | No               |            0.02545572 | No  |
| chr10 | 89894645 | lol | A   | G   | .    | .      | .    | NR_028036:g.89894645:A>G    | Alter ESR      | 35.81 % [28.11 % - 44.1 %] |     0.288 | +      | 89894645 | substitution | A>G      | Exon 5   |       63 | NR_028036    | FAS  | acceptor  |             8 | ExonESR    |               0 | Outside SPiCE Interpretation |  0 |           0 | No            |    -1.67753 |       89894644 | Acc           |    0.0000003317384 | No               | Acc                 |             89894637 |          7 |     89894644 |   0.0000002205815 | No          |      89894637 |         0.02545572 | No               |            0.02545572 | No  |
| chr10 | 89894645 | lol | A   | G   | .    | .      | .    | NR_135313:g.89894645:A>G    | Alter ESR      | 35.81 % [28.11 % - 44.1 %] |     0.288 | +      | 89894645 | substitution | A>G      | Exon 5   |       63 | NR_135313    | FAS  | acceptor  |             8 | ExonESR    |               0 | Outside SPiCE Interpretation |  0 |           0 | No            |    -1.67753 |       89894644 | Acc           |    0.0000003317384 | No               | Acc                 |             89894637 |          7 |     89894644 |   0.0000002205815 | No          |      89894637 |         0.02545572 | No               |            0.02545572 | No  |
| chr10 | 89894645 | lol | A   | G   | .    | .      | .    | NM_001410956:g.89894645:A>G | Alter ESR      | 35.81 % [28.11 % - 44.1 %] |     0.288 | +      | 89894645 | substitution | A>G      | Exon 6   |       63 | NM_001410956 | FAS  | acceptor  |             8 | ExonESR    |               0 | Outside SPiCE Interpretation |  0 |           0 | No            |    -1.67753 |       89894644 | Acc           |    0.0000003317384 | No               | Acc                 |             89894637 |          7 |     89894644 |   0.0000002205815 | No          |      89894637 |         0.02545572 | No               |            0.02545572 | No  |
| chr10 | 89894645 | lol | A   | G   | .    | .      | .    | NR_135314:g.89894645:A>G    | Alter ESR      | 35.81 % [28.11 % - 44.1 %] |     0.288 | +      | 89894645 | substitution | A>G      | Exon 6   |       63 | NR_135314    | FAS  | acceptor  |             8 | ExonESR    |               0 | Outside SPiCE Interpretation |  0 |           0 | No            |    -1.67753 |       89894644 | Acc           |    0.0000003317384 | No               | Acc                 |             89894637 |          7 |     89894644 |   0.0000002205815 | No          |      89894637 |         0.02545572 | No               |            0.02545572 | No  |
| chr10 | 89894645 | lol | A   | G   | .    | .      | .    | NR_135315:g.89894645:A>G    | Alter ESR      | 35.81 % [28.11 % - 44.1 %] |     0.288 | +      | 89894645 | substitution | A>G      | Exon 4   |       63 | NR_135315    | FAS  | acceptor  |             8 | ExonESR    |               0 | Outside SPiCE Interpretation |  0 |           0 | No            |    -1.67753 |       89894644 | Acc           |    0.0000003317384 | No               | Acc                 |             89894637 |          7 |     89894644 |   0.0000002205815 | No          |      89894637 |         0.02545572 | No               |            0.02545572 | No  |
|       |          |     |     |     |      |        |      |                             |                |                            |           |        |          |              |          |          |          |              |      |           |               |            |                 |                              |    |             |               |             |                |               |                    |                  |                     |                      |            |              |                   |             |               |                    |                  |                       |     |
**** DONE Check multiple transcripts in hg38 with genomic coordinates: ok
CLOSED: [2023-08-10 Thu 23:00]
Many more transcripts in T2T
E.g. 1 RefSeq curated transcript in hg38
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A108257446%2D108257496&hgsid=1672963428_J5aWAqack2FpJ7mvhFTNVw7bKzxo
vs 2 transcripts in T2T
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_3671779_hs1&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A108264969%2D108265019&hgsid=1672963612_Eso9frdQ7z6RkKkcKsIf2Waq3pec
This matches what we get with spip
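A minimal sketch of how such a count could be reproduced from a downloaded table. It assumes the older UCSC ncbiRefSeqCurated .txt layout (tab-separated: bin, name, chrom, strand, txStart, txEnd, ...) with 0-based txStart and exclusive txEnd. The function name =transcripts_at= and the file name in the usage comment are hypothetical, and this is not how spip itself looks up transcripts.

#+begin_src sh
# List the distinct transcripts overlapping a 1-based genomic position.
# Assumed columns: 1=bin 2=name 3=chrom 4=strand 5=txStart 6=txEnd
transcripts_at() {
    # $1 = table file, $2 = chromosome, $3 = 1-based position
    awk -F'\t' -v chrom="$2" -v pos="$3" \
        '$3 == chrom && $5 < pos && pos <= $6 { print $2 }' "$1" | sort -u
}

# Usage (hypothetical file name):
#   transcripts_at ncbiRefSeqCurated.txt chr11 108257446 | wc -l
#+end_src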
*** PROJ [#A] VEP filter (with spip?)
** PROJ [#B] Quality indicators
*** Idea
Raredisease:
- FastQC: many statistics. Not available in Nix
- Mosdepth: computes depth (2x faster than samtools depth). Available in Nix
- MultiQC: only merges the results of the analyses. Not available in Nix
- Picard's CollectMultipleMetrics, CollectHsMetrics, and CollectWgsMetrics
- Qualimap: alternative to FastQC? Not available in Nix
- Sentieon's WgsMetricsAlgo: proprietary
- TIDDIT's cov: TIDDIT = chromosomal rearrangements
Sarek:
- alignment statistics : samtools stats, mosdepth
- QC : MultiQC
MultiQC: not available in Nix
** PROJ [#B] Execution report with MultiQC
** PROJ Check whether normalization is applied
** PROJ [#B] Check HGVS nomenclature with mutalyzer
** DONE Execution
CLOSED: [2022-09-13 Tue 21:37]
*** KILL test Bionix
*** KILL Implement execution with Nix?
See https://academic.oup.com/gigascience/article/9/11/giaa121/5987272?login=false
for an example.
Probably simpler to use Nix for environment management and snakemake for execution
No internet access from the cluster
*** DONE nextflow
CLOSED: [2022-09-13 Tue 21:37]
**** TODO Bug scheduler SGE
The job gets killed because the user is not passed correctly to nextflow
***** DONE Force the user at runtime
CLOSED: [2023-04-01 Sat 17:57]
NXF_OPTS=-D"user.name=alex"
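A hedged variant of the workaround above that avoids hardcoding the login. It assumes the account expected by the scheduler is the one reported by =id -un=. =nxf_user_opts= and =main.nf= are hypothetical names.

#+begin_src sh
# Build the JVM option that sets the Java user.name property for Nextflow.
nxf_user_opts() {
    # $1 = user name (defaults to the current login, via id -un)
    printf '%s%s\n' '-Duser.name=' "${1:-$(id -un)}"
}

export NXF_OPTS="$(nxf_user_opts)"
# nextflow run main.nf   # hypothetical pipeline entry point
#+end_src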
***** DONE Check whether the problem persists with 22.10.6
CLOSED: [2023-04-01 Sat 18:38] SCHEDULED: <2023-04-01 Sat>
Yes
***** KILL Package the user into the program?
Bad idea.
** TODO Preprocessing with nextflow
*** TODO Map to reference
**** TODO Sample ID in header
/Work/Users/apraga/bisonex/out/63003856_S135/preprocessing/baserecalibrator
*** DONE Mark duplicate
CLOSED: [2022-10-09 Sun 22:30]
*** DONE Recalibrate base quality score
CLOSED: [2022-10-09 Sun 22:30]
** DONE Variant calling with Nextflow
CLOSED: [2022-11-19 Sat 21:34]
*** DONE Haplotype caller
CLOSED: [2022-10-09 Sun 22:40]
*** DONE Filter variants
CLOSED: [2022-10-09 Sun 22:40]
*** DONE Filter common snp not clinvar path
CLOSED: [2022-11-07 Mon 23:00]
See [[*common dbSNP not clinvar patho][common dbSNP not clinvar patho]]
*** DONE Filter variant only in consensual sequence
CLOSED: [2022-11-08 Tue 22:23]
*** DONE Filter technical variants
CLOSED: [2022-11-19 Sat 21:34]
*** DONE Use AVX to speed up execution
CLOSED: [2023-04-29 Sat 15:46]
Without it, we get the following warning
#+begin_quote
17:28:00.720 INFO  PairHMM - OpenMP multi-threaded AVX-accelerated native PairHMM implementation is not supported
17:28:00.721 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/nix/store/cy9ckxqwrkifx7wf02hm4ww1p6lnbxg9-gatk-4.2.4.1/bin/gatk-package-4.2.4.1-local.jar!/com/intel/gkl/native/libgkl_utils.so
17:28:00.733 WARN  NativeLibraryLoader - Unable to load libgkl_utils.so from native/libgkl_utils.so (/Work/Users/apraga/bisonex/out/NA12878_NIST7035/preprocessing/applybqsr/libgkl_utils821485189051585397.so: libgomp.so.1: cannot open shared object file: No such file or directory)
17:28:00.733 WARN  IntelPairHmm - Intel GKL Utils not loaded
17:28:00.733 WARN  PairHMM - ***WARNING: Machine does not have the AVX instruction set support needed for the accelerated AVX PairHmm. Falling back to the MUCH slower LOGLESS_CACHING implementation!
17:28:00.763 INFO  ProgressMeter - Starting traversal
#+end_quote
libgomp.so is provided by gcc, so the module must be loaded
 module load gcc@11.3.0/gcc-12.1.0
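To tell a missing libgomp apart from a CPU that really lacks AVX, a small check along these lines could help. It is only a sketch: =diagnose_pairhmm= is a hypothetical helper, the matched strings are taken from the log quoted above, and reading the CPU flags from /proc/cpuinfo assumes Linux.

#+begin_src sh
# Classify why GATK fell back to the slow PairHMM implementation.
diagnose_pairhmm() {
    # $1 = GATK log file, $2 = cpuinfo file (defaults to /proc/cpuinfo)
    log="$1"
    cpuinfo="${2:-/proc/cpuinfo}"
    if ! grep -q 'LOGLESS_CACHING' "$log"; then
        echo "ok"                 # no fallback reported in the log
    elif grep -q 'libgomp.so.1: cannot open shared object file' "$log"; then
        echo "missing libgomp"    # load the gcc module providing libgomp
    elif ! grep -qw 'avx' "$cpuinfo"; then
        echo "no avx"             # the CPU does not advertise AVX
    else
        echo "unknown"
    fi
}
#+end_src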
** KILL Use subworkflow
CLOSED: [2023-04-02 Sun 18:08]
Our version allows more flexibility
*** KILL Alignment
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
*** KILL Vep
CLOSED: [2023-04-02 Sun 18:08] SCHEDULED: <2023-04-05 Wed>
vcf_annotate_ensemblvep
** TODO Annotation with nextflow :annotation:
*** KILL VEP: --gene-phenotype?
CLOSED: [2023-04-18 Tue 18:32]
Discussed with Alexis: databases not up to date
https://www.ensembl.org/info/genome/variation/phenotype/sources_phenotype_documentation.html
*** DONE VEP plugin
CLOSED: [2023-04-18 Tue 18:32]
Clone the git repository containing the plugins
Then use --dir_plugins
*** HOLD Use Alexis's code
*** TODO New version with VEP
Example with --custom
https://www.ensembl.org/info/docs/tools/vep/script/vep_custom.html
**** DONE Add spliceAI
CLOSED: [2023-05-18 Thu 11:02] SCHEDULED: <2023-04-30 Sun>
VEP plugin
***** DONE Download the data
CLOSED: [2023-05-11 Thu 19:01]
Hard to automate, the link is temporary...
***** DONE Plugin
CLOSED: [2023-05-11 Thu 20:16]
***** DONE Split the score into several columns
CLOSED: [2023-05-11 Thu 20:16]
Test with this file to get one line with an annotation and one line without
#CHROM	POS	ID	REF	ALT
1	9091	.	A	C
1	69091	.	A	C
and
#+begin_src sh
rm -f postvep.tsv* && vep -i testspliceai.vcf.gz -o postvep.tsv --tab  --dir 109 --merged --pick --use_given_ref   --offline  --plugin SpliceAI,snv=spliceai_scores.raw.snv.hg38.vcf.gz,indel=spliceai_scores.raw.indel.hg38.vcf.gz
#+end_src
#+begin_src