3ZNNLNT7FVFHXCAVNFT4WYBVIBZHU4DPLRPDRJRRL65UE4ASJL7QC
XKYR57AGQRTGE7MV7J37AEWS4H7CQ3TZS4LDZRUCC37TICAOUPIAC
3UBLDI4IJ6XDRUS4QSQNLZOJJYOCBJF2CEUJ5EJVTDPPOJWK4JFQC
JGMCSDW663DQSK7XSWDBPVYQE57ZBP7ZVZLSEXUJOQVE7KY6BB4QC
FXA3ZBV64FML7W47IPHTAJFJHN3J3XHVHFVNYED47XFSBIGMBKRQC
YKHVGRADUMN5WJV3ONGNRCQVEQ6YSD7ZH5BX3U6FTYFTQ5XL2FGAC
GRF2ZGQU3R4G5ZI4BSFSHZW6D7ZSALA3F3OWP4RRRPV4YZOODFGQC
7SXNKSNEXIANSPPAFEK6SS3JXEJIWXLBYLTOHNSF3GO7WJNEQTTAC
F4OH5H3ONZKUVBUI5NKYJ25B66FS22QQ7LRAO53OQX3ZBL2BL6JAC
NEU45XWLRZO7ZWW7QW2PMILLG5KHA3ZBBIEK7TPNSOOBINCUDVKQC
2TJQSW4QG6FWQ24PB2UNQX2ONNO476XYLN6JVMBYRY6VJDXEF5OQC
UPNBONLATA6EOXDE4CPO2ZHJI2XC62YATCJVEMAJ3BSFNLNHY5CQC
CLIRSRZ23CJJBXUCZTELBGNQBYUP6OIHZSZZMVB5KD5FEWOXWEHQC
NBJFXQNG6YLIEL6HK7VZDEQZTXK2QJZ43AKNLXIIBX5K3MOX6WCQC
PDH2BEBXR6WCCO2GRS3L6HMLQNCC2JU5BOKZLV4DLFPQD2UZFJKQC
5XYXHK2ZR6FDG3AFG234HEZPRUAMCQBJSQCL5Z5M7UZWHY2XMZMAC
ORKQ5SEYZULGUEW77EL3XBLKU7VAJP7TDPULU7VKTOQFWD2YI4LQC
F7ZOM4ZVXTE2TIAAYM5UVYKDVCGTAFP6ZRTUNDZYNA2UJIVKE6DQC
KO3RU6L7MHCO7HXKBV2HVVA6JGOO35S5WCNKW345EN5LWCPNWTVQC
CRN5TEHRZUS2JA26WEEHG65MT4ADK6DYAFFY6SCYRRPIF36NRU4AC
6SBCU636KE3RXJXGL4SLDL2KEM5C6SRGUGHQNGIKRJTQRX5GVJIAC
T7GS3FVFOFBEFQXCLESYLY7JJ6D2ZVA2CAH5U4FRTKHTMPLINNXAC
"bisonex.org"
("runs.waiting" nil nil find-file-other-window ("/ssh:meso:/Home/Users/apraga/runs.waiting" t) "alex@gentoo" "20231007:13:31:21" nil nil)
("samples.csv" nil nil find-file-other-window ("/ssh:meso:/Work/Users/apraga/bisonex/samples.csv" t) "alex@gentoo" "20231007:13:30:53" nil nil)
("biblio_appel_variant" nil nil link-to-file ("../../research/bisonex/thesis/biblio.org" 55515) "alex@laptop" "20231013:15:40:18" nil nil)
("appel_variant" nil nil link-to-file ("../../research/bisonex/thesis/pipeline.org" 21153) "alex@laptop" "20231013:15:38:35" nil nil)
("aligneur" nil nil link-to-file ("../../research/bisonex/thesis/pipeline.org" 1777) "alex@laptop" "20231013:15:14:20" nil nil)
("biblio_aligneur" nil nil link-to-file ("../../research/bisonex/thesis/biblio.org" 3417) "alex@laptop" "20231013:15:00:34" nil nil)
# -*- mode: org -*-
Archived entries from file /home/alex/org/projects/bisonex.org
* DONE Vérifier qualité données sur mesocentre
CLOSED: [2023-01-12 Thu 15:48]
:PROPERTIES:
:ARCHIVE_TIME: 2023-01-12 Thu 15:48
:ARCHIVE_FILE: ~/org/projects/bisonex.org
:ARCHIVE_OLPATH: Données
:ARCHIVE_CATEGORY: bisonex
:ARCHIVE_TODO: DONE
:END:
** DONE BAM
CLOSED: [2023-01-12 Thu 15:47]
picard ValidateSamFile
On regarde juste le code d'erreur (0 = pas d'erreur)
** DONE Fastq
CLOSED: [2023-01-12 Thu 15:48]
fastqc
Il faut ensuite extraire les zip and chercher les erreur dedans
* KILL Implémenter d’autres pipeline
CLOSED: [2023-01-15 Sun 12:03]
:PROPERTIES:
:ARCHIVE_TIME: 2023-01-15 Sun 12:03
:ARCHIVE_FILE: ~/org/projects/bisonex.org
:ARCHIVE_CATEGORY: bisonex
:ARCHIVE_TODO: KILL
:END:
Voir https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04407-x
** KILL GATK
CLOSED: [2022 -11-11 Fri 20:01]
https://broadinstitute.github.io/warp/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README
A priori, respecte les bonnes pratiques
** KILL Essayer snmake avec bonne pratiques
https://github.com/snakemake-workflows/dna-seq-gatk-variant-calling/blob/main/.github/workflows/main.yml
Installer Mamba (micromamba ne fonctionne pas sous nix)
Ne fonctionne pas sous WSL2... MultiQC n’est pas assez à jour
Problèmes de versions...
** KILL Sarek
CLOSED: [2022-12-11 Sun 11:09]
*** Dépendences
**** Nix
#+begin_src sh
nix profile install nixpkgs#mosdepth nixpkgs#python3
nix-shell -p python310Packages.pyyaml --run "nextflow run nf-core/sarek -profile test --executor slurm --queue smp --outdir test -resume"
#+end_src
***** KILL derivation nix pour profile complet
CLOSED: [2022-12-11 Sun 11:09]
**** KILL Sans nix
CLOSED: [2022-09-24 Sat 10:20]
On utilise conda
#+begin_src sh
module unload nix
module load anaconda3@2021.05/gcc-12.1.0
module load nextflow@22.04.0/gcc-12.1.0
module load openjdk@11.0.14.1_1/gcc-12.1.0
nextflow run nf-core/sarek -profile conda,test --executor slurm --queue smp --outdir test -resume
#+end_src
Essai 1: erreurs de permissions, corrigé en relancant le programme
#+begin_quote
Failed to create Conda environment
command: conda create --mkdir --yes --quiet --prefix /Work/Users/apraga/test-sarek/work/conda/env-2d53b1db50de676670cf1a91ef0cf6db bioconda::tabix=1.11
status : 1
message:
NotWritableError: The current user does not have write permissions to a required path.
path: /Home/Users/apraga/.conda/pkgs/urls.txt
uid: 1696
gid: 513
If you feel that permissions on this path are set incorrectly, you can manually
change them by executing
$ sudo chown 1696:513 /Home/Users/apraga/.conda/pkgs/urls.txt
#+end_quote
Corrigé avec
#+begin_src sh
chown 1696:513 /Home/Users/apraga/.conda/pkgs/urls.txt
#+end_src
Mais problème de proxy
*** KILL Dérivation nix pour modules python
CLOSED: [2022-12-11 Sun 11:09]
*** KILL Lancer sarek en mode test
CLOSED: [2022-12-11 Sun 11:09]
#+begin_src sh
nix-shell -p python310Packages.pyyaml --run "nextflow run nf-core/sarek -profile test --executor slurm --queue smp --outdir test -resume"
#+end_src
*** KILL Lancer sarek sur données allégées
CLOSED: [2022-12-11 Sun 11:09]
* DONE Essai 1
CLOSED: [2023-02-13 Mon 11:50]
:PROPERTIES:
:ARCHIVE_TIME: 2023-02-14 Tue 10:42
:ARCHIVE_FILE: ~/org/projects/bisonex.org
:ARCHIVE_OLPATH: Nouveau workflow/Dépendences avec Nix/hap.py/Faire fonctionner Tests
:ARCHIVE_CATEGORY: bisonex
:ARCHIVE_TODO: DONE
:END:
Problème avec chemin python pour pysam : Tools/__init__.py échoue mais on peut utilise build/bin/hap.py
#+begin_src
nix develop .#hap-py
$ genericBuild
#+end_src
On lance donc les tests à la main (trop d'erreurs sur les chemins)
#+begin_src
# OK !
HCDIR=build/bin build/bin/test_haplotypes
# OK !
bash src/sh/make_hg19.sh
HCDIR=build/bin HG19=hg19.fa bash src/sh/run_multimer
ge_test.sh
#+end_src
Écheck sur
$ HCDIR=build/bin bash src/sh/run_hapenum_test.sh
Traceback (most recent call last):
File "build/bin/hap.py", line 26, in <module>
import pysam
File "/nix/store/3w2v5cl4x6ddq4281awcab9412r5gkaw-python3-3.10.9-env/lib/python3.10/site-packages/pysam/__init__.py", line 4, in <module>
from pysam.libchtslib import *
ImportError: No module named libchtslib
IL faut commenter detect var
Une minorité concerne des problème d'haploides
❯ awk '!/^#/ && $10~/:FN:/ && $11!~/NOCALL/ && $10~/SNP/' test-allchr.vcf
304
avec 1/3 où l'exome manque une allèles
❯ awk '!/^#/ && $10~/:FN:/ && $11!~/NOCALL/ && $10~/SNP/ && $10~/homalt/' test-allchr.vcf | wc -l
101
et 2/3 où il y a une allèle "en trop"
La majorité ne sont pas vu
❯ awk '!/^#/ && $10~/:FN:/ && $11~/NOCALL/ && $10~/SNP/' test-allchr.vcf | wc -l
6361
Nombre de reads pour chaque position en bash (!)
#+begin_src bash
awk '!/^#/ && $10~/:FN:/ && $11~/NOCALL/ && $10~/SNP/ {print $1":"$2"-"$2}' test-allchr.vcf | xargs -I {} sh -c 'echo -n {}";"; samtools view ../NA12878_NIST.b
am {} | wc -l' > count.csv
#+end_src
* KILL Julia
CLOSED: [2023-03-13 lun. 11:11]
:PROPERTIES:
:ARCHIVE_TIME: 2023-03-13 lun. 15:52
:ARCHIVE_FILE: c:/Users/apraga/org/projects/bisonex.org
:ARCHIVE_OLPATH: Tests/Validation : NA12878/GIAB/Exons seuls/D'où vient la différence ?/Statistiques
:ARCHIVE_CATEGORY: GIAB
:ARCHIVE_TODO: KILL
:END:
Conflict BAM et vcf
#+begin_src julia
using GeneticVariation
reader = VCF.Reader(open("../out/test-bed/test-allchr.vcf", "r"))
fn = 0
notSeen = 0
missingAllel = 0
surnumAllel = 0
for record in reader
global fn, notSeen
global missingAllel, surnumAllel
if VCF.genotype(record)[1][2] == "FN" && VCF.genotype(record)[1][5] == "SNP"
# println(record)
# println("$(VCF.chrom(record)) $(VCF.pos(record))")
fn = fn +1
if VCF.genotype(record)[2][5] == "NOCALL"
notSeen = notSeen + 1
else
if VCF.genotype(record)[1][6] == "homalt"
missingAllel = missingAllel + 1
elseif VCF.genotype(record)[2][6] == "homalt"
surnumAllel = surnumAllel + 1
end
end
end
end
close(reader)
println("SNP: false negative $fn")
println("SNP: not seen $notSeen")
println("SNP: missing allel $missingAllel")
println("SNP: surnumeraries allel $surnumAllel")
#+end_src
SNP: false negative 6665
SNP: not seen 6361
SNP: missing allel 101
SNP: surnumeraries allel 203
1. On confirme le nombre de SNP.
2. Une minorité concerne des problème d'haploides avec
- 1/3 où l'exome manque une allèles
- 2/3 où il y a une allèle "en trop"
3. La majorité ne sont pas vu (6361)
Nombre de reads pour chaque position en bash (!)
#+begin_src bash
awk '!/^#/ && $10~/:FN:/ && $11~/NOCALL/ && $10~/SNP/ {print $1":"$2"-"$2}' test-allchr.vcf | xargs -I {} sh -c 'echo -n {}";"; samtools view ../NA12878_NIST.b
am {} | wc -l' > count.csv
* KILL Reads en dehors des exons
CLOSED: [2023-03-09 Thu 22:43]
:PROPERTIES:
:ARCHIVE_TIME: 2023-03-13 lun. 15:55
:ARCHIVE_FILE: c:/Users/apraga/org/projects/bisonex.org
:ARCHIVE_OLPATH: Tests/Validation : NA12878/GIAB/Exons seuls/D'où vient la différence ?/Statistiques
:ARCHIVE_CATEGORY: GIAB
:ARCHIVE_TODO: KILL
:END:
Ça a l'air sur des exemples simples ...
#+begin_src sh
wget https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.gff.gz
awk '/BestRefSeq\texon/ && /transcript_id=NM/ {print $1"\t"$4"\t"$5;}' GRCh38_latest_genomic.gff | grep ^NC | save exons.csv
#+end_src
#+begin_src julia
using CSV, DataFrames, Intervals
# Check the number of reads for each false negative (stored in count.csv)
# from hap.py output:
# awk '!/^#/ && $10~/:FN:/ && $11~/NOCALL/ && $10~/SNP/ {print $1":"$2"-"$2}' test-allchr.vcf | xargs -I {} sh -c 'echo -n {}";"; samtools view ../NA12878_NIST.b am {} | wc -l' > count.csv
function groupedCount(fname)
f = DataFrame(CSV.File(fname, header=false))
transform!(f, :Column1 => ByRow(x -> string.(split(x, ":"))[1]) => :chrom)
transform!(f, :Column1 => ByRow(x -> parse(Int64, string.(split(x, "-"))[2])) => :pos)
select!(f, Not([:Column1]))
rename!(f, :Column2 => :nb)
groupby(f, :chrom)
end
function groupedExons(fname)
f = DataFrame(CSV.File(fname, header=["chrom", "start", "end"], delim="\t"))
transform!(f, [:start, :end] => ByRow((x,y) -> Interval(x, y)) => :interval)
select!(f, Not([:start, :end]))
groupby(f, :chrom)
end
# Check is position x is in exons for chromosome chr
function isExon(x, exons, chr)
# Remove duplicates
exonsU = unique(exons[chr])
res = transform(exonsU, :interval => ByRow(z -> x in z) => :check)
# Result only found interval
found = res[res.check,:].interval
isempty(found) ? missing : first(found)
end
reads = groupedCount("../NA12878/happy/count.csv")
exons = groupedExons("exons.csv")
for g in range(1, size(reads)[1])
local found
found = transform(reads[g], :pos => ByRow(x -> isExon(x, exons, g)) => :exon)
readsMissing = found[ismissing.(found.exon), :]
# print("$(k.chrom):")
print(g)
print(": $(size(reads[g])[1]) reads, ")
println("$(size(readsMissing)[1]) not in exons")
end
#+end_src
1: 448 reads, 64 not in exons
2: 307 reads, 47 not in exons
3: 228 reads, 37 not in exons
4: 175 reads, 28 not in exons
5: 237 reads, 38 not in exons
6: 1223 reads, 304 not in exons
7: 274 reads, 52 not in exons
8: 414 reads, 41 not in exons
9: 161 reads, 47 not in exons
10: 194 reads, 37 not in exons
11: 304 reads, 39 not in exons
12: 254 reads, 32 not in exons
13: 168 reads, 120 not in exons
14: 184 reads, 26 not in exons
15: 250 reads, 90 not in exons
16: 239 reads, 29 not in exons
17: 373 reads, 35 not in exons
18: 85 reads, 17 not in exons
19: 576 reads, 46 not in exons
20: 83 reads, 22 not in exons
21: 67 reads, 14 not in exons
22: 117 reads, 24 not in exons
* KILL Répartition par chromosome: variable mais seul le 6 = hotspot
CLOSED: [2023-03-07 Tue 22:44]
:PROPERTIES:
:ARCHIVE_TIME: 2023-03-13 lun. 15:55
:ARCHIVE_FILE: c:/Users/apraga/org/projects/bisonex.org
:ARCHIVE_OLPATH: Tests/Validation : NA12878/GIAB/Exons seuls/D'où vient la différence ?/Statistiques
:ARCHIVE_CATEGORY: GIAB
:ARCHIVE_TODO: KILL
:END:
#+begin_src julia
using CSV, DataFrames
f = DataFrame(CSV.File("count.csv", header=false))
size(f[f.Column2 .< 10,:])
#+end_src
(4430, 2)
#+begin_src julia
using CSV, DataFrames
function cleanCount(fname)
f = DataFrame(CSV.File(fname, header=false))
transform!(f, :Column1 => ByRow(x -> string.(split(x, ":"))[1]) => :chrom)
transform!(f, :Column1 => ByRow(x -> string.(split(x, "-"))[2]) => :pos)
select!(f, Not([:Column1]))
rename!(f, :Column2 => :nb)
end
f = cleanCount("count.csv")
grouped = groupby(f, :chrom)
for (k, g) in pairs(grouped)
println("$(k.chrom): $(size(g, 1))")
end
#+end_src
#+RESULTS:
: NC_000001.11: 448
: NC_000002.12: 307
: NC_000003.12: 228
: NC_000004.12: 175
: NC_000005.10: 237
: NC_000006.12: 1223
: NC_000007.14: 274
: NC_000008.11: 414
: NC_000009.12: 161
: NC_000010.11: 194
: NC_000011.10: 304
: NC_000012.12: 254
: NC_000013.11: 168
: NC_000014.9: 184
: NC_000015.10: 250
: NC_000016.10: 239
: NC_000017.11: 373
: NC_000018.10: 85
: NC_000019.10: 576
: NC_000020.11: 83
: NC_000021.9: 67
: NC_000022.11: 117
* KILL Awk
CLOSED: [2023-03-12 Sun 22:12]
:PROPERTIES:
:ARCHIVE_TIME: 2023-03-13 lun. 15:55
:ARCHIVE_FILE: c:/Users/apraga/org/projects/bisonex.org
:ARCHIVE_OLPATH: Tests/Validation : NA12878/GIAB/Exons seuls/D'où vient la différence ?/Statistiques
:ARCHIVE_CATEGORY: GIAB
:ARCHIVE_TODO: KILL
:END:
NB: awk car bcftools query ne semble pas prendre en compte les nouvelles colonnes
On confirnme le nombre de SNP:
❯ awk '!/^#/ && $10~/:FN:/ && $10~/SNP/' test-allchr.vcf | wc -l
6665
Une minorité concerne des problème d'haploides
❯ awk '!/^#/ && $10~/:FN:/ && $11!~/NOCALL/ && $10~/SNP/' test-allchr.vcf
304
avec 1/3 où l'exome manque une allèles
❯ awk '!/^#/ && $10~/:FN:/ && $11!~/NOCALL/ && $10~/SNP/ && $10~/homalt/' test-allchr.vcf | wc -l
101
et 2/3 où il y a une allèle "en trop"
La majorité ne sont pas vu
❯ awk '!/^#/ && $10~/:FN:/ && $11~/NOCALL/ && $10~/SNP/' test-allchr.vcf | wc -l
6361
Nombre de reads pour chaque position en bash (!)
#+begin_src bash
awk '!/^#/ && $10~/:FN:/ && $11~/NOCALL/ && $10~/SNP/ {print $1":"$2"-"$2}' test-allchr.vcf | xargs -I {} sh -c 'echo -n {}";"; samtools view ../NA12878_NIST.b
am {} | wc -l' > count.csv
#+end_src
* KILL Comprendre/améliorer Recall SNP 0.855
CLOSED: [2023-03-13 lun. 15:55] SCHEDULED: <2023-03-04 Sat>
:PROPERTIES:
:ARCHIVE_TIME: 2023-03-13 lun. 15:55
:ARCHIVE_FILE: c:/Users/apraga/org/projects/bisonex.org
:ARCHIVE_OLPATH: Tests/Validation : NA12878/GIAB/Exons seuls/D'où vient la différence ?/Statistiques
:ARCHIVE_CATEGORY: GIAB
:ARCHIVE_TODO: KILL
:END:
on est inférieur à Kumaran et al 2019
** KILL Regarder les FN (SNP)
CLOSED: [2023-03-13 lun. 15:55] SCHEDULED: <2023-03-04 Sat >
*** Manuel:
NC_000001.11:1385919 pas de read 1/1:FN:.:i1_5:INDEL:homalt:.
NC_000001.11:1623412 1 read 1/1:FN:.:ti:SNP:homalt:.
NC_000001.11:1668449 33 read sur 160 voient l'allèle alternative 1/1:FN:am:ti:SNP:homalt:.
NC_000001.11:1676135 67 reads, non vu 0/1:FN:.:ti:SNP:het:.
NC_000001.11:1734812 1/1:FN:.:ti:SNP:homalt:.
NC_000001.11:1745808 1/1:FN:.:ti:SNP:homalt:.
NC_000001.11:1745814 1/1:FN:.:ti:SNP:homalt:.
NC_000001.11:1953616 1/1:FN:.:ti:SNP:homalt:.
NC_000001.11:2512975 0/1:FN:.:ti:SNP:het:
* DONE Awk
CLOSED: [2023-03-12 Sun 22:12]
:PROPERTIES:
:ARCHIVE_TIME: 2023-03-12 Sun 22:19
:ARCHIVE_FILE: ~/org/projects/bisonex.org
:ARCHIVE_OLPATH: Tests/Validation : NA12878/GIAB/Exons seuls/D'où vient la différence ?/Statistiques
:ARCHIVE_CATEGORY: GIAB
:ARCHIVE_TODO: DONE
:END:
NB: awk car bcftools query ne semble pas prendre en compte les nouvelles colonnes
On confirnme le nombre de SNP:
❯ awk '!/^#/ && $10~/:FN:/ && $10~/SNP/' test-allchr.vcf | wc -l
6665
``` bibtex
@article{cvetkovska2021,
title = {Ophn1 regulation of prefrontal inhibition: A mechanism for stress susceptibility in intellectual disability},
journal = {Neuron},
volume = {109},
number = {10},
pages = {1583-1584},
year = {2021},
issn = {0896-6273},
doi = {https://doi.org/10.1016/j.neuron.2021.04.030},
url = {https://www.sciencedirect.com/science/article/pii/S0896627321003263},
author = {Vedrana Cvetkovska and Rosemary C. Bagot},
abstract = {Wang et al. (2021) characterize the molecular, cellular, and circuit-level role of Oligophrenin-1 in prefrontal parvalbumin interneurons, demonstrating that loss of Ophn1 function in these neurons is a mechanism for increased susceptibility to stress in intellectual disability caused by OPHN1 mutations.}
}
@article{nuovo2021,
author = {Nuovo, Sara and Brankovic, Vesna and Caputi, Caterina and Casella, Antonella and Nigro, Vincenzo and Leuzzi, Vincenzo and Valente, Enza Maria},
title = {Novel unconventional variants expand the allelic spectrum of OPHN1 gene},
journal = {American Journal of Medical Genetics Part A},
volume = {185},
number = {5},
pages = {1575-1581},
keywords = {allelic spectrum, BAR domain, oligophrenin, OPHN1, proline-rich domain},
doi = {https://doi.org/10.1002/ajmg.a.62144},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/ajmg.a.62144},
eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1002/ajmg.a.62144},
abstract = {Abstract Mutations in the OPHN1 gene cause a rare X-linked recessive neurodevelopmental disorder characterized by intellectual disability, variably associated with cerebellar hypoplasia and distinctive facial appearance. In most of cases so far reported, the identified genomic variants involve the region encoding the central RhoGAP domain of the oligophrenin-1 protein, and are predicted to result in a complete loss of function. By using a NGS-based diagnostic approach, we identified three male and a female patients from two unrelated families carrying novel non-disruptive OPHN1 variants (the in-frame c.116\_127 deletion and the missense c.2129C>T change, respectively), affecting either the BAR domain or the C-terminus proline-rich domain of the protein. Clinical and neuroimaging findings in the patients recapitulated the main features of OPHN1-related syndrome, including developmental delay, intellectual disability, behavioral disorder, dysmorphic features, seizures, cerebellar hypoplasia, and ventriculomegaly. Yet, we observed a wide variability even among affected siblings, confirming the lack of clear genotype–phenotype correlation. Our results expand the allelic spectrum of OPHN1 and illustrate the challenges for clinical interpretation of non-disruptive variants affecting X-linked genes.},
year = {2021}
}
@article{boglics2020,
title={Exon 21 deletion in the OPHN1 gene in a family with syndromic X-linked intellectual disability: Case report},
author={Bogli{\c{s}}, Alina and Cosma, Adriana S and Tripon, Florin and B{\~a}nescu, Claudia},
journal={Medicine},
volume={99},
number={33},
year={2020},
publisher={Wolters Kluwer Health}
}
```
- [ ] Big Bang Theory: Saison 1
- [ ] Breaking Bad: Saison 1,2
- [x] Deadwood
- [ ] Dexter: saison 1
- [ ] Elementary: saison 1-4
- [ ] Good Eats: saison 1,2
- [ ] Lost: saison 1-3
- [x] Mythbusters
- [ ] Planet Earth: Season 1
- [ ] Savage builds: Season 1
- [x] Rome
- [x] Sherlock (2010)
- [x] Weeds
- [ ] Big Bang Theory: Saison 1
- [ ] Breaking Bad: Saison 1,2
- [x] Deadwood
- [ ] Dexter: saison 1
- [ ] Elementary: saison 1-4
- [ ] Good Eats: saison 1,2
- [ ] Lost: saison 1-3
- [x] Mythbusters
- [ ] Planet Earth: Season 1
- [ ] Savage builds: Season 1
- [x] Rome
- [x] Sherlock (2010)
- [x] Weeds
### [TODO]{.todo .TODO} Batman : City of owls (8-12) Batman & Robin : first arc (1-17) {#batman-city-of-owls-8-12-batman-robin-first-arc-1-17}
### [TODO]{.todo .TODO} Batman : The court of owls (1-7) Batman - Year Two Detective comic {#batman-the-court-of-owls-1-7-batman---year-two-detective-comic}
### [TODO]{.todo .TODO} Hush Incorporated : The Demon Star (1-13) {#hush-incorporated-the-demon-star-1-13}
### [DONE]{.done .DONE} Arkham Reborn {#arkham-reborn rating="3.5"}
### [DONE]{.done .DONE} Bane of the Demon {#bane-of-the-demon rating="3.5"}
### [DONE]{.done .DONE} Batman and the Mad Monk {#batman-and-the-mad-monk rating="3"}
### [DONE]{.done .DONE} Batman and the Monster Men {#batman-and-the-monster-men}
### [DONE]{.done .DONE} Batman & Danger Girl {#batman-danger-girl rating="3.5"}
### [DONE]{.done .DONE} Batman & Deadman : Death and Glory {#batman-deadman-death-and-glory rating="3.5"}
### [DONE]{.done .DONE} Batman Family (2002) {#batman-family-2002 rating="4"}
### [DONE]{.done .DONE} Batman & Green Arrow : The Poison Tomorrow {#batman-green-arrow-the-poison-tomorrow rating="3.5"}
### [DONE]{.done .DONE} Batman/Hellboy/Starman :PROPERTIES::rating: 3.5 [[END]{.smallcaps}]{.tag tag-name="END"} {#batmanhellboystarman-propertiesrating-3.5}
### [DONE]{.done .DONE} Batman Last Rites - The Butler {#batman-last-rites---the-butler rating="3"}
### [DONE]{.done .DONE} Batman Huntress & Spoiler - Blunt Trauma {#batman-huntress-spoiler---blunt-trauma rating="3.5"}
### [DONE]{.done .DONE} Batman & Tarzan : Claws of the Cat-Woman {#batman-tarzan-claws-of-the-cat-woman}
### [DONE]{.done .DONE} Batman - Year One {#batman---year-one rating="4.5"}
### [DONE]{.done .DONE} Battle for the Cowl {#battle-for-the-cowl rating="3.5"}
### [DONE]{.done .DONE} Bruce - The road home {#bruce---the-road-home rating="3.5"}
### [DONE]{.done .DONE} Bruce Wayne : Fugitive {#bruce-wayne-fugitive rating="3.5"}
### [DONE]{.done .DONE} Bruce Wayne : Murderer {#bruce-wayne-murderer rating="4"}
### [DONE]{.done .DONE} Catwoman - Follow the Money {#catwoman---follow-the-money rating="3.5"}
### [DONE]{.done .DONE} Catwoman - Tail of the Gun {#catwoman---tail-of-the-gun rating="4"}
### [DONE]{.done .DONE} Catwoman - When in Rome {#catwoman---when-in-rome rating="3.5"}
### [DONE]{.done .DONE} Deathblow {#deathblow rating="4"}
### [DONE]{.done .DONE} Huntress - Cry for blood {#huntress---cry-for-blood rating="3.5"}
### [DONE]{.done .DONE} Batman Last Rites - Last days of Gotham {#batman-last-rites---last-days-of-gotham rating="3"}
### [DONE]{.done .DONE} The Dark Knight returns {#the-dark-knight-returns rating="4.5"}
### [DONE]{.done .DONE} The Killing Joke {#the-killing-joke rating="4.5"}
### [DONE]{.done .DONE} The Long Halloween {#the-long-halloween rating="4.5"}
- [ ] Batman : City of owls (8-12) Batman & Robin : first arc (1-17) {#batman-city-of-owls-8-12-batman-robin-first-arc-1-17}
- [ ] Batman : The court of owls (1-7) Batman - Year Two Detective comic {#batman-the-court-of-owls-1-7-batman---year-two-detective-comic}
- [ ] Hush Incorporated : The Demon Star (1-13) {#hush-incorporated-the-demon-star-1-13}
- [X] Arkham Reborn {#arkham-reborn rating="3.5"}
- [X] Bane of the Demon {#bane-of-the-demon rating="3.5"}
- [X] Batman and the Mad Monk {#batman-and-the-mad-monk rating="3"}
- [X] Batman and the Monster Men {#batman-and-the-monster-men}
- [X] Batman & Danger Girl {#batman-danger-girl rating="3.5"}
- [X] Batman & Deadman : Death and Glory {#batman-deadman-death-and-glory rating="3.5"}
- [X] Batman Family (2002) {#batman-family-2002 rating="4"}
- [X] Batman & Green Arrow : The Poison Tomorrow {#batman-green-arrow-the-poison-tomorrow rating="3.5"}
- [X] Batman/Hellboy/Starman :PROPERTIES::rating: 3.5 [[END]{.smallcaps}]{.tag tag-name="END"} {#batmanhellboystarman-propertiesrating-3.5}
- [X] Batman Last Rites - The Butler {#batman-last-rites---the-butler rating="3"}
- [X] Batman Huntress & Spoiler - Blunt Trauma {#batman-huntress-spoiler---blunt-trauma rating="3.5"}
- [X] Batman & Tarzan : Claws of the Cat-Woman {#batman-tarzan-claws-of-the-cat-woman}
- [X] Batman - Year One {#batman---year-one rating="4.5"}
- [X] Battle for the Cowl {#battle-for-the-cowl rating="3.5"}
- [X] Bruce - The road home {#bruce---the-road-home rating="3.5"}
- [X] Bruce Wayne : Fugitive {#bruce-wayne-fugitive rating="3.5"}
- [X] Bruce Wayne : Murderer {#bruce-wayne-murderer rating="4"}
- [X] Catwoman - Follow the Money {#catwoman---follow-the-money rating="3.5"}
- [X] Catwoman - Tail of the Gun {#catwoman---tail-of-the-gun rating="4"}
- [X] Catwoman - When in Rome {#catwoman---when-in-rome rating="3.5"}
- [X] Deathblow {#deathblow rating="4"}
- [X] Huntress - Cry for blood {#huntress---cry-for-blood rating="3.5"}
- [X] Batman Last Rites - Last days of Gotham {#batman-last-rites---last-days-of-gotham rating="3"}
- [X] The Dark Knight returns {#the-dark-knight-returns rating="4.5"}
- [X] The Killing Joke {#the-killing-joke rating="4.5"}
- [X] The Long Halloween {#the-long-halloween rating="4.5"}
### [DONE]{.done .DONE} Brainiac {#brainiac rating="4.5"}
### [DONE]{.done .DONE} Earth One Vol. 1 and 2 {#earth-one-vol.-1-and-2 rating="3.5"}
### [DONE]{.done .DONE} Escape from Bizarro World {#escape-from-bizarro-world rating="3.5"}
### [DONE]{.done .DONE} For All Seasons {#for-all-seasons rating="4.5"}
### [DONE]{.done .DONE} For The Man Who Has Everything {#for-the-man-who-has-everything rating="4.5"}
### [DONE]{.done .DONE} Kingdom Come {#kingdom-come rating="4.5"}
### [DONE]{.done .DONE} Last Son {#last-son rating="3.5"}
### [DONE]{.done .DONE} Peace on Earth {#peace-on-earth rating="3.5"}
### [DONE]{.done .DONE} Red Son {#red-son rating="3.5"}
### [DONE]{.done .DONE} Secret Identity {#secret-identity rating="4.5"}
### [DONE]{.done .DONE} Superman and the Legion of Super Heroes {#superman-and-the-legion-of-super-heroes rating="3.5"}
### [DONE]{.done .DONE} What\'s So Funny About Truth, Justice and the American Way? {#whats-so-funny-about-truth-justice-and-the-american-way rating="3.5"}
### [DONE]{.done .DONE} Whatever Happened to the Man of Tomorrow? {#whatever-happened-to-the-man-of-tomorrow rating="4.5"}
### [DONE]{.done .DONE} Superman in Exile {#superman-in-exile rating="3.5"}
- [X] Brainiac {#brainiac rating="4.5"}
- [X] Earth One Vol. 1 and 2 {#earth-one-vol.-1-and-2 rating="3.5"}
- [X] Escape from Bizarro World {#escape-from-bizarro-world rating="3.5"}
- [X] For All Seasons {#for-all-seasons rating="4.5"}
- [X] For The Man Who Has Everything {#for-the-man-who-has-everything rating="4.5"}
- [X] Kingdom Come {#kingdom-come rating="4.5"}
- [X] Last Son {#last-son rating="3.5"}
- [X] Peace on Earth {#peace-on-earth rating="3.5"}
- [X] Red Son {#red-son rating="3.5"}
- [X] Secret Identity {#secret-identity rating="4.5"}
- [X] Superman and the Legion of Super Heroes {#superman-and-the-legion-of-super-heroes rating="3.5"}
- [X] What\'s So Funny About Truth, Justice and the American Way? {#whats-so-funny-about-truth-justice-and-the-american-way rating="3.5"}
- [X] Whatever Happened to the Man of Tomorrow? {#whatever-happened-to-the-man-of-tomorrow rating="4.5"}
- [X] Superman in Exile {#superman-in-exile rating="3.5"}
### [DONE]{.done .DONE} Ultimate Adventures (1-6) {#ultimate-adventures-1-6}
### [DONE]{.done .DONE} Ultimates 3 {#ultimates-3 rating="3.5"}
### [DONE]{.done .DONE} Ultimate Galactus {#ultimate-galactus}
### [DONE]{.done .DONE} READ Ultimate Nightmare {#read-ultimate-nightmare}
### [DONE]{.done .DONE} READ Ultimate Secret {#read-ultimate-secret}
### [DONE]{.done .DONE} READ Ultimate Vision {#read-ultimate-vision}
### [DONE]{.done .DONE} READ Ultimate Extinction {#read-ultimate-extinction}
### [DONE]{.done .DONE} Ultimate X4 {#ultimate-x4}
### [DONE]{.done .DONE} Ultimate War (After 025) {#ultimate-war-after-025}
### [DONE]{.done .DONE} Ultimate X-Men 1-97 {#ultimate-x-men-1-97 rating="3.5"}
### [DONE]{.done .DONE} Ultimate X-Men 98-100 {#ultimate-x-men-98-100 rating="3.5"}
### [DONE]{.done .DONE} Fantastic Four - X Men Annual {#fantastic-four---x-men-annual rating="2.5"}
### [DONE]{.done .DONE} Ultimate human {#ultimate-human rating="3"}
### [DONE]{.done .DONE} Ultimate origins {#ultimate-origins rating="3"}
### [DONE]{.done .DONE} Ultimate human {#ultimate-human-1 rating="3"}
### [DONE]{.done .DONE} Ultimate power {#ultimate-power rating="3"}
### [DONE]{.done .DONE} Ultimate Fantastic four 58-60 {#ultimate-fantastic-four-58-60 rating="3"}
### [DONE]{.done .DONE} Ultimate Spiderman 129-133 {#ultimate-spiderman-129-133 rating="3.5"}
### [DONE]{.done .DONE} Ultimate Spiderman : Requiem {#ultimate-spiderman-requiem rating="3"}
### [DONE]{.done .DONE} Ultimate X-men : Requiem {#ultimate-x-men-requiem rating="3"}
### [DONE]{.done .DONE} Ultimate Fantastic Four : Requiem {#ultimate-fantastic-four-requiem}
- [X] Ultimate Adventures (1-6) {#ultimate-adventures-1-6}
- [X] Ultimates 3 {#ultimates-3 rating="3.5"}
- [X] Ultimate Galactus {#ultimate-galactus}
- [X] READ Ultimate Nightmare {#read-ultimate-nightmare}
- [X] READ Ultimate Secret {#read-ultimate-secret}
- [X] READ Ultimate Vision {#read-ultimate-vision}
- [X] READ Ultimate Extinction {#read-ultimate-extinction}
- [X] Ultimate X4 {#ultimate-x4}
- [X] Ultimate War (After 025) {#ultimate-war-after-025}
- [X] Ultimate X-Men 1-97 {#ultimate-x-men-1-97 rating="3.5"}
- [X] Ultimate X-Men 98-100 {#ultimate-x-men-98-100 rating="3.5"}
- [X] Fantastic Four - X Men Annual {#fantastic-four---x-men-annual rating="2.5"}
- [X] Ultimate human {#ultimate-human rating="3"}
- [X] Ultimate origins {#ultimate-origins rating="3"}
- [X] Ultimate human {#ultimate-human-1 rating="3"}
- [X] Ultimate power {#ultimate-power rating="3"}
- [X] Ultimate Fantastic four 58-60 {#ultimate-fantastic-four-58-60 rating="3"}
- [X] Ultimate Spiderman 129-133 {#ultimate-spiderman-129-133 rating="3.5"}
- [X] Ultimate Spiderman : Requiem {#ultimate-spiderman-requiem rating="3"}
- [X] Ultimate X-men : Requiem {#ultimate-x-men-requiem rating="3"}
- [X] Ultimate Fantastic Four : Requiem {#ultimate-fantastic-four-requiem}
Doc: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/latest/README_v4.2.1.txt
Bed : https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/latest/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.bed
https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/NA12878_HG001/latest/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
** DONE BAM
CLOSED: [2023-01-12 Thu 15:47]
picard ValidateSamFile
On regarde juste le code d'erreur (0 = pas d'erreur)
** DONE Fastq
CLOSED: [2023-01-12 Thu 15:48]
fastqc
Il faut ensuite extraire les zip and chercher les erreur dedans
** KILL SNV
CLOSED: [2023-05-13 Sat 18:29] SCHEDULED: <2023-05-01 Mon>
*** DONE Script python: ok seulement pour reads corrigé, trop long sinon
CLOSED: [2023-05-01 Mon 13:32]
Même sur un seul chromosome (15)...
*** DONE Couper les données avec bedtools intersect ?
CLOSED: [2023-05-01 Mon 20:33]
-wa pour avoir les reads qui corresponds
-v pour ceux qui n'intersecte pass
ok sur un exemple
**** DONE Test simple : ok
CLOSED: [2023-05-01 Mon 13:43]
Voir https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-021-04407-x
** KILL GATK
CLOSED: [2022 -11-11 Fri 20:01]
https://broadinstitute.github.io/warp/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README
A priori, respecte les bonnes pratiques
** KILL Essayer snmake avec bonne pratiques
https://github.com/snakemake-workflows/dna-seq-gatk-variant-calling/blob/main/.github/workflows/main.yml
Installer Mamba (micromamba ne fonctionne pas sous nix)
Ne fonctionne pas sous WSL2... MultiQC n’est pas assez à jour
Problèmes de versions...
** KILL Sarek
CLOSED: [2022-12-11 Sun 11:09]
*** Dépendences
**** Nix
#+begin_src sh
nix profile install nixpkgs#mosdepth nixpkgs#python3
nix-shell -p python310Packages.pyyaml --run "nextflow run nf-core/sarek -profile test --executor slurm --queue smp --outdir test -resume"
#+end_src
***** KILL derivation nix pour profile complet
CLOSED: [2022-12-11 Sun 11:09]
**** KILL Sans nix
CLOSED: [2022-09-24 Sat 10:20]
On utilise conda
#+begin_src sh
module unload nix
module load anaconda3@2021.05/gcc-12.1.0
module load nextflow@22.04.0/gcc-12.1.0
module load openjdk@11.0.14.1_1/gcc-12.1.0
nextflow run nf-core/sarek -profile conda,test --executor slurm --queue smp --outdir test -resume
#+end_src
Essai 1: erreurs de permissions, corrigé en relancant le programme
#+begin_quote
Failed to create Conda environment
command: conda create --mkdir --yes --quiet --prefix /Work/Users/apraga/test-sarek/work/conda/env-2d53b1db50de676670cf1a91ef0cf6db bioconda::tabix=1.11
status : 1
message:
NotWritableError: The current user does not have write permissions to a required path.
path: /Home/Users/apraga/.conda/pkgs/urls.txt
uid: 1696
gid: 513
#+attr_html: :width 500px
[[./test-intersect-chr15.png]]
#+attr_html: :width 500px
[[./test-nointersect-chr15.png]]
**** DONE Chromosome 15 : vérifier BAM
CLOSED: [2023-05-01 Mon 20:33] SCHEDULED: <2023-05-01 Mon>
#+begin_src
samtools view -b `63003856_S135.bam` NC_000015.10 > `63003856_S135_chr15.bam`
If you feel that permissions on this path are set incorrectly, you can manually
change them by executing
$ sudo chown 1696:513 /Home/Users/apraga/.conda/pkgs/urls.txt
#+end_quote
Corrigé avec
#+begin_src sh
chown 1696:513 /Home/Users/apraga/.conda/pkgs/urls.txt
On génère un python avec les dépendances
Mais problème de proxy
*** KILL Dérivation nix pour modules python
CLOSED: [2022-12-11 Sun 11:09]
*** KILL Lancer sarek en mode test
CLOSED: [2022-12-11 Sun 11:09]
#+begin_src sh
nix-shell -p python310Packages.pyyaml --run "nextflow run nf-core/sarek -profile test --executor slurm --queue smp --outdir test -resume"
#+end_src
*** KILL Lancer sarek sur données allégées
CLOSED: [2022-12-11 Sun 11:09]
* DONE Essai 1
CLOSED: [2023-02-13 Mon 11:50]
:PROPERTIES:
:ARCHIVE_TIME: 2023-02-14 Tue 10:42
:ARCHIVE_FILE: ~/org/projects/bisonex.org
:ARCHIVE_OLPATH: Nouveau workflow/Dépendences avec Nix/hap.py/Faire fonctionner Tests
:ARCHIVE_CATEGORY: bisonex
:ARCHIVE_TODO: DONE
:END:
Problème avec chemin python pour pysam : Tools/__init__.py échoue mais on peut utilise build/bin/hap.py
#+begin_src julia
(1.7) pkg> activate .
julia> create_sysimage(; sysimage_path="bisonex.so")
#+end_src
Écheck sur
$ HCDIR=build/bin bash src/sh/run_hapenum_test.sh
Traceback (most recent call last):
File "build/bin/hap.py", line 26, in <module>
import pysam
File "/nix/store/3w2v5cl4x6ddq4281awcab9412r5gkaw-python3-3.10.9-env/lib/python3.10/site-packages/pysam/__init__.py", line 4, in <module>
from pysam.libchtslib import *
ImportError: No module named libchtslib
IL faut commenter detect var
#+attr_html: :width 500px
[[./check-snv-chr15.png]]
* DONE Awk
CLOSED: [2023-03-12 Sun 22:12]
:PROPERTIES:
:ARCHIVE_TIME: 2023-03-12 Sun 22:19
:ARCHIVE_FILE: ~/org/projects/bisonex.org
:ARCHIVE_OLPATH: Tests/Validation : NA12878/GIAB/Exons seuls/D'où vient la différence ?/Statistiques
:ARCHIVE_CATEGORY: GIAB
:ARCHIVE_TODO: DONE
:END:
NB: awk car bcftools query ne semble pas prendre en compte les nouvelles colonnes
**** DONE Chromosome 15 Vérifier VAF avec checkBam.jl: ok
julia> @subset snv :chrom .== "NC_000015.10"
26×10 DataFrame
Row │ chrom pos variant variantType zygosity ref alt refCount altCount readsCount
│ SubStrin…? Int64 SubStrin…? String? String15 SubStrin… SubStrin… Int64 Int64 Int64
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ NC_000015.10 74343027 g.74343027C>T snv heterozygous C T 61 58 120
2 │ NC_000015.10 75400778 g.75400778C>G snv heterozygous C G 108 79 187
3 │ NC_000015.10 89327201 g.89327201C>T snv heterozygous C T 243 241 486
4 │ NC_000015.10 48767448 g.48767448A>C snv heterozygous A C 72 70 142
5 │ NC_000015.10 75411685 g.75411685T>C snv heterozygous T C 79 81 160
6 │ NC_000015.10 66703292 g.66703292C>T snv heterozygous C T 70 60 130
7 │ NC_000015.10 89325639 g.89325639G>A snv heterozygous G A 257 267 524
8 │ NC_000015.10 89330184 g.89330184G>A snv heterozygous G A 258 287 548
9 │ NC_000015.10 89330184 g.89330184G>A snv heterozygous G A 258 287 548
10 │ NC_000015.10 89325639 g.89325639G>A snv heterozygous G A 257 267 524
11 │ NC_000015.10 42401752 g.42401752G>A snv homozygous G A 61 212 273
12 │ NC_000015.10 89327201 g.89327201C>T snv heterozygous C T 243 241 486
13 │ NC_000015.10 38339896 g.38339896G>A snv heterozygous G A 56 86 144
14 │ NC_000015.10 26869324 g.26869324A>T snv heterozygous A T 62 49 113
15 │ NC_000015.10 66435145 g.66435145G>A snv heterozygous G A 98 95 193
16 │ NC_000015.10 60514655 g.60514655G>A snv heterozygous G A 94 99 194
17 │ NC_000015.10 42410947 g.42410947A>G snv heterozygous A G 153 123 276
18 │ NC_000015.10 75430368 g.75430368C>T snv heterozygous C T 80 62 142
19 │ NC_000015.10 25375494 g.25375494T>C snv heterozygous T C 103 104 207
20 │ NC_000015.10 60497497 g.60497497C>A snv heterozygous C A 61 65 126
21 │ NC_000015.10 74891539 g.74891539C>T snv heterozygous C T 118 124 242
22 │ NC_000015.10 48488433 g.48488433A>G snv heterozygous A G 367 122 492
23 │ NC_000015.10 89318565 g.89318565A>G snv heterozygous A G 303 98 404
24 │ NC_000015.10 89323426 g.89323426C>G snv heterozygous C G 93 109 202
25 │ NC_000015.10 89318595 g.89318595T>C snv heterozygous T C 321 128 453
26 │ NC_000015.10 48488437 g.48488437T>C snv heterozygous T C 356 132 488
CLOSED: [2023-05-01 Mon 17:18]
*** KILL Chromosome1 15 :Test haplotype caller : échec car CIGARE non mis à jour
CLOSED: [2023-05-13 Sat 18:29] SCHEDULED: <2023-05-01 Mon>
On confirnme le nombre de SNP:
❯ awk '!/^#/ && $10~/:FN:/ && $10~/SNP/' test-allchr.vcf | wc -l
6665
Une minorité concerne des problème d'haploides
❯ awk '!/^#/ && $10~/:FN:/ && $11!~/NOCALL/ && $10~/SNP/' test-allchr.vcf
304
avec 1/3 où l'exome manque une allèles
❯ awk '!/^#/ && $10~/:FN:/ && $11!~/NOCALL/ && $10~/SNP/ && $10~/homalt/' test-allchr.vcf | wc -l
101
et 2/3 où il y a une allèle "en trop"
#+begin_src
julia -Jbisonex.so --project=. insertVariants.jl `63003856_S135_chr15.bam` 63003856_S135_chr15_inserted.bam
scp 63003856_S135_chr15_inserted.bam* meso:/Work/Users/apraga/bisonex/tests/synthetic/
La majorité ne sont pas vu
❯ awk '!/^#/ && $10~/:FN:/ && $11~/NOCALL/ && $10~/SNP/' test-allchr.vcf | wc -l
6361
#+begin_src sh :dir /ssh:meso:/Work/Users/apraga/bisonex/tests/synthetic :results silent
ln -s /Work/Projects/bisonex/data/dbSNP/GRCh38.p13/dbSNP.gz .
ln -s /Work/Projects/bisonex/data/dbSNP/GRCh38.p13/dbSNP.gz.tbi
ln -s /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.dict .
ln -s /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna .
ln -s /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef.fna.fai .
* KILL Julia
CLOSED: [2023-03-13 lun. 11:11]
:PROPERTIES:
:ARCHIVE_TIME: 2023-03-13 lun. 15:52
:ARCHIVE_FILE: c:/Users/apraga/org/projects/bisonex.org
:ARCHIVE_OLPATH: Tests/Validation : NA12878/GIAB/Exons seuls/D'où vient la différence ?/Statistiques
:ARCHIVE_CATEGORY: GIAB
:ARCHIVE_TODO: KILL
:END:
Conflict BAM et vcf
#+begin_src julia
using GeneticVariation
reader = VCF.Reader(open("../out/test-bed/test-allchr.vcf", "r"))
fn = 0
notSeen = 0
missingAllel = 0
surnumAllel = 0
for record in reader
global fn, notSeen
global missingAllel, surnumAllel
if VCF.genotype(record)[1][2] == "FN" && VCF.genotype(record)[1][5] == "SNP"
# println(record)
# println("$(VCF.chrom(record)) $(VCF.pos(record))")
fn = fn +1
if VCF.genotype(record)[2][5] == "NOCALL"
notSeen = notSeen + 1
else
if VCF.genotype(record)[1][6] == "homalt"
missingAllel = missingAllel + 1
elseif VCF.genotype(record)[2][6] == "homalt"
surnumAllel = surnumAllel + 1
end
end
end
end
close(reader)
println("SNP: false negative $fn")
println("SNP: not seen $notSeen")
println("SNP: missing allel $missingAllel")
println("SNP: surnumeraries allel $surnumAllel")
puis
#+begin_src
gatk --java-options "-Xmx3072M" HaplotypeCaller --input 63003856_S135_chr15_inserted.bam --output testchr15.vcf.gz --reference genomeRef.fna --tmp-dir . -L NC_000015.10
SNP: false negative 6665
SNP: not seen 6361
SNP: missing allel 101
SNP: surnumeraries allel 203
1. On confirme le nombre de SNP.
2. Une minorité concerne des problème d'haploides avec
- 1/3 où l'exome manque une allèles
- 2/3 où il y a une allèle "en trop"
3. La majorité ne sont pas vu (6361)
Nombre de reads pour chaque position en bash (!)
#+begin_src bash
awk '!/^#/ && $10~/:FN:/ && $11~/NOCALL/ && $10~/SNP/ {print $1":"$2"-"$2}' test-allchr.vcf | xargs -I {} sh -c 'echo -n {}";"; samtools view ../NA12878_NIST.b
am {} | wc -l' > count.csv
* KILL Reads en dehors des exons
CLOSED: [2023-03-09 Thu 22:43]
:PROPERTIES:
:ARCHIVE_TIME: 2023-03-13 lun. 15:55
:ARCHIVE_FILE: c:/Users/apraga/org/projects/bisonex.org
:ARCHIVE_OLPATH: Tests/Validation : NA12878/GIAB/Exons seuls/D'où vient la différence ?/Statistiques
:ARCHIVE_CATEGORY: GIAB
:ARCHIVE_TODO: KILL
:END:
Ça a l'air sur des exemples simples ...
#+begin_src sh
wget https://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.gff.gz
awk '/BestRefSeq\texon/ && /transcript_id=NM/ {print $1"\t"$4"\t"$5;}' GRCh38_latest_genomic.gff | grep ^NC | save exons.csv
Aucun variant inséré
- base quality ok
-
**** DONE bam out : non appelé
CLOSED: [2023-05-01 Mon 21:57]
gatk --java-options "-Xmx3072M" HaplotypeCaller --input 63003856_S135_chr15_inserted.bam --output haplotypecaller-chr15.vcf.gz --reference genomeRef.f
na --tmp-dir . -L NC_000015.10 --bam-output debug.bam
**** DONE --linked-de-bruijn-graph : idem
CLOSED: [2023-05-01 Mon 21:57]
readlink testchr15.vcf.gz -f^C
[apraga@mesointeractive synthetic]$ gatk --java-options "-Xmx3072M" HaplotypeCaller --input 63003856_S135_chr15_inserted.bam --output haplotypecaller-chr15.vcf.gz --reference genomeRef.fna --tmp-dir . -L NC_000015.10 --linked-de-bruijn-graph
**** KILL regénérer fastq
CLOSED: [2023-05-13 Sat 18:29]
Non
*** KILL Générer bam données pour tous les chromosomes
CLOSED: [2023-05-13 Sat 18:29]
timeit julia -Jbisonex.so --project=. insertVariants.jl ~/code/bisonex/out/63003856/preprocessing/63003856_S135.bam 63003856_S135_inserted.bam
# Check the number of reads for each false negative (stored in count.csv)
# from hap.py output:
# awk '!/^#/ && $10~/:FN:/ && $11~/NOCALL/ && $10~/SNP/ {print $1":"$2"-"$2}' test-allchr.vcf | xargs -I {} sh -c 'echo -n {}";"; samtools view ../NA12878_NIST.b am {} | wc -l' > count.csv
40min 516ms 835µs 405ns
Avertissement:
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
function groupedCount(fname)
f = DataFrame(CSV.File(fname, header=false))
transform!(f, :Column1 => ByRow(x -> string.(split(x, ":"))[1]) => :chrom)
transform!(f, :Column1 => ByRow(x -> parse(Int64, string.(split(x, "-"))[2])) => :pos)
select!(f, Not([:Column1]))
rename!(f, :Column2 => :nb)
groupby(f, :chrom)
end
Inserted.bam et excluded.bam (fichier avant le merge) ont l'air ok...
function groupedExons(fname)
f = DataFrame(CSV.File(fname, header=["chrom", "start", "end"], delim="\t"))
transform!(f, [:start, :end] => ByRow((x,y) -> Interval(x, y)) => :interval)
select!(f, Not([:start, :end]))
groupby(f, :chrom)
end
# Check is position x is in exons for chromosome chr
function isExon(x, exons, chr)
# Remove duplicates
exonsU = unique(exons[chr])
res = transform(exonsU, :interval => ByRow(z -> x in z) => :check)
# Result only found interval
found = res[res.check,:].interval
isempty(found) ? missing : first(found)
end
On réessaie à la main : ça passe
#+begin_src
samtools merge test-all.bam inserted.bam excluded.bam
❯ mv test-all.bam `63003856_S135_inserted.bam` -f
❯ mv test-all.bam.bai `63003856_S135_chr15_inserted.bam.bai` -f
reads = groupedCount("../NA12878/happy/count.csv")
exons = groupedExons("exons.csv")
for g in range(1, size(reads)[1])
local found
found = transform(reads[g], :pos => ByRow(x -> isExon(x, exons, g)) => :exon)
readsMissing = found[ismissing.(found.exon), :]
# print("$(k.chrom):")
print(g)
print(": $(size(reads[g])[1]) reads, ")
println("$(size(readsMissing)[1]) not in exons")
end
*** DONE BAm2fastq pour avoir CIGAR à jour : échec (variants "cachés")
CLOSED: [2023-05-04 Thu 20:30] SCHEDULED: <2023-05-01 Mon>
1: 448 reads, 64 not in exons
2: 307 reads, 47 not in exons
3: 228 reads, 37 not in exons
4: 175 reads, 28 not in exons
5: 237 reads, 38 not in exons
6: 1223 reads, 304 not in exons
7: 274 reads, 52 not in exons
8: 414 reads, 41 not in exons
9: 161 reads, 47 not in exons
10: 194 reads, 37 not in exons
11: 304 reads, 39 not in exons
12: 254 reads, 32 not in exons
13: 168 reads, 120 not in exons
14: 184 reads, 26 not in exons
15: 250 reads, 90 not in exons
16: 239 reads, 29 not in exons
17: 373 reads, 35 not in exons
18: 85 reads, 17 not in exons
19: 576 reads, 46 not in exons
20: 83 reads, 22 not in exons
21: 67 reads, 14 not in exons
22: 117 reads, 24 not in exons
On lance la génération de bam depuis le mesocentro (la copie plante via le VPN)
#+begin_src sh
cd /Work/Users/apraga/recherche/bisonex/generate
julia --project=. insertVariants.jl ../../../bisonex/out/63003856_S135/preprocessing/applybqsr/63003856_S135.bam 63003856_S135_inserted.bam
* KILL Répartition par chromosome: variable mais seul le 6 = hotspot
CLOSED: [2023-03-07 Tue 22:44]
:PROPERTIES:
:ARCHIVE_TIME: 2023-03-13 lun. 15:55
:ARCHIVE_FILE: c:/Users/apraga/org/projects/bisonex.org
:ARCHIVE_OLPATH: Tests/Validation : NA12878/GIAB/Exons seuls/D'où vient la différence ?/Statistiques
:ARCHIVE_CATEGORY: GIAB
:ARCHIVE_TODO: KILL
:END:
#+begin_src julia
using CSV, DataFrames
f = DataFrame(CSV.File("count.csv", header=false))
size(f[f.Column2 .< 10,:])
Workflow après avec désactivé storeDir pour SAMTOOLS_BAM2FQ dans nextflow.config (pourquoi ??)
#+begin_src nextflow
include { SAMTOOLS_BAM2FQ } from "${params.modulesDir}/samtools/bam2fq/main"
include { SAMTOOLS_SORT as sortBamByName } from "${params.modulesDir}/samtools/sort/main"
(4430, 2)
#+begin_src julia
using CSV, DataFrames
workflow {
f = Channel.fromPath("${params.dataDir}/synthetic/63003856_S135_inserted.bam",
checkIfExists: true).map{it -> [["id": "synthetic_63003856"], it]}
// Important: use "-n" option !!
sortBamByName(f)
SAMTOOLS_BAM2FQ(sortBamByName.out.bam, true)
}
function cleanCount(fname)
f = DataFrame(CSV.File(fname, header=false))
transform!(f, :Column1 => ByRow(x -> string.(split(x, ":"))[1]) => :chrom)
transform!(f, :Column1 => ByRow(x -> string.(split(x, "-"))[2]) => :pos)
select!(f, Not([:Column1]))
rename!(f, :Column2 => :nb)
end
f = cleanCount("count.csv")
grouped = groupby(f, :chrom)
for (k, g) in pairs(grouped)
println("$(k.chrom): $(size(g, 1))")
end
Puis
#+begin_src
cp work/34/fb2fc136f6f6d7f42d0960512f06de/*.fq.gz /Work/Groups/bisonex/data/synthetic/
#+RESULTS:
: NC_000001.11: 448
: NC_000002.12: 307
: NC_000003.12: 228
: NC_000004.12: 175
: NC_000005.10: 237
: NC_000006.12: 1223
: NC_000007.14: 274
: NC_000008.11: 414
: NC_000009.12: 161
: NC_000010.11: 194
: NC_000011.10: 304
: NC_000012.12: 254
: NC_000013.11: 168
: NC_000014.9: 184
: NC_000015.10: 250
: NC_000016.10: 239
: NC_000017.11: 373
: NC_000018.10: 85
: NC_000019.10: 576
: NC_000020.11: 83
: NC_000021.9: 67
: NC_000022.11: 117
* KILL Awk
CLOSED: [2023-03-12 Sun 22:12]
:PROPERTIES:
:ARCHIVE_TIME: 2023-03-13 lun. 15:55
:ARCHIVE_FILE: c:/Users/apraga/org/projects/bisonex.org
:ARCHIVE_OLPATH: Tests/Validation : NA12878/GIAB/Exons seuls/D'où vient la différence ?/Statistiques
:ARCHIVE_CATEGORY: GIAB
:ARCHIVE_TODO: KILL
:END:
NB: awk car bcftools query ne semble pas prendre en compte les nouvelles colonnes
On confirnme le nombre de SNP:
❯ awk '!/^#/ && $10~/:FN:/ && $10~/SNP/' test-allchr.vcf | wc -l
6665
Une minorité concerne des problème d'haploides
❯ awk '!/^#/ && $10~/:FN:/ && $11!~/NOCALL/ && $10~/SNP/' test-allchr.vcf
304
avec 1/3 où l'exome manque une allèles
❯ awk '!/^#/ && $10~/:FN:/ && $11!~/NOCALL/ && $10~/SNP/ && $10~/homalt/' test-allchr.vcf | wc -l
101
et 2/3 où il y a une allèle "en trop"
La majorité ne sont pas vu
❯ awk '!/^#/ && $10~/:FN:/ && $11~/NOCALL/ && $10~/SNP/' test-allchr.vcf | wc -l
6361
Nombre de reads pour chaque position en bash (!)
#+begin_src bash
awk '!/^#/ && $10~/:FN:/ && $11~/NOCALL/ && $10~/SNP/ {print $1":"$2"-"$2}' test-allchr.vcf | xargs -I {} sh -c 'echo -n {}";"; samtools view ../NA12878_NIST.b
am {} | wc -l' > count.csv
*** KILL Lancer pipeline
CLOSED: [2023-05-04 Thu 20:30] SCHEDULED: <2023-05-01 Mon>
NXF_OPTS=-D"user.name=apraga" nextflow run main.nf -c nextflow.config -profile standard,helios -bg --input="/Work/Groups/bisonex/data/synthetic/synthetic_63003856_{1,2}.fq.gz" --outdir out/synthetic_63003856
:ARCHIVE_TIME: 2023-09-10 Sun 16:45
:ARCHIVE_FILE: ~/roam/personal/projects/bisonex.org
:ARCHIVE_CATEGORY: bisonex
:ARCHIVE_TIME: 2023-03-13 lun. 15:55
:ARCHIVE_FILE: c:/Users/apraga/org/projects/bisonex.org
:ARCHIVE_OLPATH: Tests/Validation : NA12878/GIAB/Exons seuls/D'où vient la différence ?/Statistiques
:ARCHIVE_CATEGORY: GIAB
:ARCHIVE_TODO: KILL
on est inférieur à Kumaran et al 2019
** KILL Regarder les FN (SNP)
CLOSED: [2023-03-13 lun. 15:55] SCHEDULED: <2023-03-04 Sat >
*** Manuel:
NC_000001.11:1385919 pas de read 1/1:FN:.:i1_5:INDEL:homalt:.
NC_000001.11:1623412 1 read 1/1:FN:.:ti:SNP:homalt:.
NC_000001.11:1668449 33 read sur 160 voient l'allèle alternative 1/1:FN:am:ti:SNP:homalt:.
NC_000001.11:1676135 67 reads, non vu 0/1:FN:.:ti:SNP:het:.
NC_000001.11:1734812 1/1:FN:.:ti:SNP:homalt:.
NC_000001.11:1745808 1/1:FN:.:ti:SNP:homalt:.
NC_000001.11:1745814 1/1:FN:.:ti:SNP:homalt:.
NC_000001.11:1953616 1/1:FN:.:ti:SNP:homalt:.
NC_000001.11:2512975 0/1:FN:.:ti:SNP:het: