B:BD[
7.8430] → [
2.70:8262]
checkIfExists: true)
bamIndex = bam.map { it -> it + ".bai" }
downloadGenome | indexGenome
indexGenome.out.index | view
addSNV(f, bam, bamIndex, downloadGenome.out, indexGenome.out.index, indexGenome.out.dict, indexGenome.out.fai)
}
#+end_src
****** DONE v1.3: Lancer le pipeline pour vérifier qu'on retrouve les variants
CLOSED: [2023-05-19 Fri 18:41] SCHEDULED: <2023-05-18 Thu>
****** TODO Corrigier position pour avoir une bonne VAF
SCHEDULED: <2023-05-19 Fri>
POS POS+1 VAF ALT
Attention, la base corrigée est à POS+1...
****** DONE Comparaison manuelle avec julia (VAF = 1...)
CLOSED: [2023-05-19 Fri 21:58]
552 found over 585
***** TODO del
***** TODO ins
*** TODO Julia : Bamscissors :bamscissors:
Modification de la séquence dans BAM.
*Pas de mise à jour de CIGAR*
On convertit en fastq et on lance le pipeline pour "corriger"
#+begin_src sh
cd /home/alex/code/bisonex/out/63003856/preprocessing/mapped
samtools view 63003856_S135.bam NC_000022.11 -o 63003856_S135_chr22.bam
cd /home/alex/recherche/bisonex/code/BamScissors.jl
cp ~/code/bisonex/out/63003856/preprocessing/mapped/63003856_S135_chr22.bam .
samtools index 63003856_chr22.bam
#+end_src
Le script va modifier le bam, le trier et générer le fastq. !!!
Attention: ne pas oublier l'option -n !!!
#+begin_src sh
time julia --project=.. insertVariant.jl
scp 63003856_S135_chr22_{1,2}.fq.gz meso:/Work/Users/apraga/bisonex/tests/bamscissors/
#+end_src
**** TODO Phase 1 : chr22, VAF=1
***** DONE 1 SNV : ok !
CLOSED: [2023-05-20 Sat 19:35] SCHEDULED: <2023-05-20 Sat>
#+begin_src
make run READS="tests/bamscissors/corrected_{1,2}.fq.gz"
Puis on lance le pipeline sur correct_1
- [X] Variant visible dans IGV
- [X] Variant visible après alignement
- [X] Variant visible après appel de variant
***** TODO Tous les SNV du chromosome
SCHEDULED: <2023-05-20 Sat>
**** DONE PHase 2 : chr22, VAF variable: de nombreux reads sont perdus -> ok sur un SNV en alignant sur chromosome 22
CLOSED: [2023-05-24 Wed 23:19]
Problème dansr le BAM car sans les insertion
***** DONE Filtrer les reads sans pair : idem
CLOSED: [2023-05-23 Tue 00:01]
FOUND:Found 16662 unpaired mates
at htsjdk.samtools.SAMUtils.processValidationError(SAMUtils.java:470)
at picard.sam.SamToFastq.doWork(SamToFastq.java:224)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
ex: chr22:42213078
On essaie
samtools view ~/code/bisonex/out/63003856/preprocessing/mapped/63003856_S135.bam NC_000022.11 -hf 0x2 -o 63003856_chr22.bam
Ne rale pas...
SCHEDULED: <2023-05-22 Mon>
***** DONE Problème de la conversion BAM -> fastq ?
CLOSED: [2023-05-24 Wed 23:19]
****** DONE Test bamtofastq sur le BAM originel: idem
CLOSED: [2023-05-23 Tue 01:16]
Dans ~/recherche/code/bisonex/Bamscissors.jl
samtools view ~/code/bisonex/out/63003856/preprocessing/mapped/63003856_S135.bam NC_000022.11 -hf 0x2 -o 63003856_chr22.bam
On trie bien par nom de read
samtools sort -n 63003856_chr22.bam -o 63003856_chr22_sorted.bam
Conversion en fastq. Attention à bien prendre le *fichier trié* !
samtools bam2fq -1 63003856_chr22_init1.fq.gz -2 63003856_chr22_init2.fq.gz -n 63003856_chr22_sorted.bam
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 2592110 reads
On envoie sur le mésocentre
scp 63003856_chr22_init{1,2}.fq.gz meso:/Work/Users/apraga/bisonex/tests/bamscissors
On démarre un job avec 24 coeurs pour aller vite
srun -c 24 -p smp -t 1:00:00 --pty bash
bwa mem -t 24 /Work/Projects/bisonex/data/genome/GRCh38.p13/bwa/genomeRef 63003856_chr22_init1.fq.gz 63003856_chr22_init2.fq.gz | samtools sort -@24 -o output.bam -
Dans /home/alex/recherche/bisonex/code/BamScissors.jl/aligned
❯ scp meso:/Work/Users/apraga/bisonex/tests/bamscissors/output.bam*
****** DONE Avec samtools fastq
CLOSED: [2023-05-23 Tue 01:16]
samtools sort -n 63003856_chr22.bam -o 63003856_chr22_sorted.bam
samtools fastq 63003856_chr22_1.fq.gz -2 63003856_chr22_2.fq.gz -0 /dev/null -s /dev/null -o 63003856_chr22_sorted.bam
scp 63003856_chr22_{1,2}.fq.gz meso:/Work/Users/apraga/bisonex/tests/bamscissors/
mesocentre
bwa mem -t 24 /Work/Projects/bisonex/data/genome/GRCh38.p13/bwa/genomeRef 63003856_chr22_1.fq.gz 63003856_chr22_2.fq.gz | samtools sort -@24 -o output.bam
scp meso:/Work/Users/apraga/bisonex/tests/bamscissors/output.bam\* aligned/
****** DONE En rajoutant .fna dans le doserrie (+samtools fastq): idem
CLOSED: [2023-05-23 Tue 01:16]
ln -s /Work/Projects/bisonex/data/genome/GRCh38.p13/bwa/genomeRef* .
ln -s /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef* .
bwa mem -t 24 genomeRef 63003856_chr22_1.fq.gz 63003856_chr22_2.fq.gz | samtools sort -@24 -o output.bam
****** DONE Test picard : idem
CLOSED: [2023-05-23 Tue 01:16]
Dans bisonex/code/BamScissors.jl
❯ picard SamToFastq -I 63003856_chr22_sorted.bam -F testpicard1.fastq -F2 testpicard2.fastq
bwa mem -t 24 /Work/Projects/bisonex/data/genome/GRCh38.p13/bwa/genomeRef testpicard1.fastq.gz testpicard2.fastq.gz | samtools sort -@24 -o testpicard.bam -
****** DONE Il ne manque pas des reads dans les fastq :
CLOSED: [2023-05-23 Tue 01:26]
bisonex/code/BamScissors.jl on bamscissors [!?] via ஃ v1.9.0
❯ samtools flagstat aligned/output.bam
1306273 + 0 in total (QC-passed reads + QC-failed reads)
1296055 + 0 primary
0 + 0 secondary
10218 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
1304871 + 0 mapped (99.89% : N/A)
1294653 + 0 primary mapped (99.89% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
bisonex/code/BamScissors.jl on bamscissors [!?] via ஃ v1.9.0
❯ samtools flagstat 63003856_chr22.bam
2609214 + 0 in total (QC-passed reads + QC-failed reads)
2592110 + 0 primary
0 + 0 secondary
17104 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
2609214 + 0 mapped (100.00% : N/A)
2592110 + 0 primary mapped (100.00% : N/A)
2592110 + 0 paired in sequencing
1296055 + 0 read1
1296055 + 0 read2
2592110 + 0 properly paired (100.00% : N/A)
2592110 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
❯ echo $(zcat 63003856_chr22_init2.fq.gz | wc -l)/4 | bc
1296055
bisonex/code/BamScissors.jl on bamscissors [!?] via ஃ v1.9.0 took 2s
❯ echo $(zcat 63003856_chr22_init1.fq.gz | wc -l)/4 | bc
1296055
❯ samtools flagstat 63003856_chr22_sorted.bam
2609214 + 0 in total (QC-passed reads + QC-failed reads)
2592110 + 0 primary
0 + 0 secondary
17104 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
2609214 + 0 mapped (100.00% : N/A)
2592110 + 0 primary mapped (100.00% : N/A)
2592110 + 0 paired in sequencing
1296055 + 0 read1
1296055 + 0 read2
2592110 + 0 properly paired (100.00% : N/A)
2592110 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
****** DONE Avec sort -O BAM idem
CLOSED: [2023-05-23 Tue 01:21]
****** DONE Singleton ou non mapped ? nonVirtPosition
CLOSED: [2023-05-23 Tue 01:22]
❯ samtools bam2fq -1 63003856_chr22_init1.fq.gz -2 63003856_chr22_init2.fq.gz -0 both -s single.fq.gz -n 63003856_chr22_sorted.bam
bisonex/code/BamScissors.jl on bamscissors [!?] via ஃ v1.9.0
❯ wc -l 63003856_chr22_init* both single.fq.gz
403540 63003856_chr22_init1.fq.gz
404788 63003856_chr22_init2.fq.gz
0 both
0 single.fq.gz
808328 total
****** Problème d'aligner car les reads sont bien dans le .fastq ?
Ex:
zgrep "A00853:477:HMLWYDSX3:2:2624:2826:18630" 63003856_chr22_{1,2}.fq.gz
63003856_chr22_1.fq.gz:@A00853:477:HMLWYDSX3:2:2624:2826:1863
checkIfExists: true)
bamIndex = bam.map { it -> it + ".bai" }
downloadGenome | indexGenome
indexGenome.out.index | view
addSNV(f, bam, bamIndex, downloadGenome.out, indexGenome.out.index, indexGenome.out.dict, indexGenome.out.fai)
}
#+end_src
****** DONE v1.3: Lancer le pipeline pour vérifier qu'on retrouve les variants
CLOSED: [2023-05-19 Fri 18:41] SCHEDULED: <2023-05-18 Thu>
****** TODO Corrigier position pour avoir une bonne VAF
SCHEDULED: <2023-05-19 Fri>
POS POS+1 VAF ALT
Attention, la base corrigée est à POS+1...
****** DONE Comparaison manuelle avec julia (VAF = 1...)
CLOSED: [2023-05-19 Fri 21:58]
552 found over 585
***** TODO del
***** TODO ins
*** TODO Julia : Bamscissors :bamscissors:
Modification de la séquence dans BAM.
*Pas de mise à jour de CIGAR*
On convertit en fastq et on lance le pipeline pour "corriger"
#+begin_src sh
cd /home/alex/code/bisonex/out/63003856/preprocessing/mapped
samtools view 63003856_S135.bam NC_000022.11 -o 63003856_S135_chr22.bam
cd /home/alex/recherche/bisonex/code/BamScissors.jl
cp ~/code/bisonex/out/63003856/preprocessing/mapped/63003856_S135_chr22.bam .
samtools index 63003856_chr22.bam
#+end_src
Le script va modifier le bam, le trier et générer le fastq. !!!
Attention: ne pas oublier l'option -n !!!
#+begin_src sh
time julia --project=.. insertVariant.jl
scp 63003856_S135_chr22_{1,2}.fq.gz meso:/Work/Users/apraga/bisonex/tests/bamscissors/
#+end_src
**** TODO Implémenter les SNV avec VAF
SCHEDULED: <2023-05-27 Sat>
Stratégie :
1. calculer la profondeur sur les positions
2. créer un dictionnaire { nom du reads : position dataframe }
3. itérer sur tous les reads et changer ceux marqués
**** TODO Implémenter les indel avec VAF
SCHEDULED: <2023-05-29 Mon>
**** TODO Phase 1 : chr22, VAF=1
***** DONE 1 SNV : ok !
CLOSED: [2023-05-20 Sat 19:35] SCHEDULED: <2023-05-20 Sat>
#+begin_src
make run READS="tests/bamscissors/corrected_{1,2}.fq.gz"
Puis on lance le pipeline sur correct_1
- [X] Variant visible dans IGV
- [X] Variant visible après alignement
- [X] Variant visible après appel de variant
***** TODO Tester SNV chromosome 22
SCHEDULED: <2023-05-20 Sat>
**** DONE PHase 2 : chr22, VAF variable: de nombreux reads sont perdus -> ok sur un SNV en alignant sur chromosome 22
CLOSED: [2023-05-24 Wed 23:19]
Problème dansr le BAM car sans les insertion
***** DONE Filtrer les reads sans pair : idem
CLOSED: [2023-05-23 Tue 00:01]
FOUND:Found 16662 unpaired mates
at htsjdk.samtools.SAMUtils.processValidationError(SAMUtils.java:470)
at picard.sam.SamToFastq.doWork(SamToFastq.java:224)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
ex: chr22:42213078
On essaie
samtools view ~/code/bisonex/out/63003856/preprocessing/mapped/63003856_S135.bam NC_000022.11 -hf 0x2 -o 63003856_chr22.bam
Ne rale pas...
SCHEDULED: <2023-05-22 Mon>
***** DONE Problème de la conversion BAM -> fastq ?
CLOSED: [2023-05-24 Wed 23:19]
****** DONE Test bamtofastq sur le BAM originel: idem
CLOSED: [2023-05-23 Tue 01:16]
Dans ~/recherche/code/bisonex/Bamscissors.jl
samtools view ~/code/bisonex/out/63003856/preprocessing/mapped/63003856_S135.bam NC_000022.11 -hf 0x2 -o 63003856_chr22.bam
On trie bien par nom de read
samtools sort -n 63003856_chr22.bam -o 63003856_chr22_sorted.bam
Conversion en fastq. Attention à bien prendre le *fichier trié* !
samtools bam2fq -1 63003856_chr22_init1.fq.gz -2 63003856_chr22_init2.fq.gz -n 63003856_chr22_sorted.bam
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 2592110 reads
On envoie sur le mésocentre
scp 63003856_chr22_init{1,2}.fq.gz meso:/Work/Users/apraga/bisonex/tests/bamscissors
On démarre un job avec 24 coeurs pour aller vite
srun -c 24 -p smp -t 1:00:00 --pty bash
bwa mem -t 24 /Work/Projects/bisonex/data/genome/GRCh38.p13/bwa/genomeRef 63003856_chr22_init1.fq.gz 63003856_chr22_init2.fq.gz | samtools sort -@24 -o output.bam -
Dans /home/alex/recherche/bisonex/code/BamScissors.jl/aligned
❯ scp meso:/Work/Users/apraga/bisonex/tests/bamscissors/output.bam*
****** DONE Avec samtools fastq
CLOSED: [2023-05-23 Tue 01:16]
samtools sort -n 63003856_chr22.bam -o 63003856_chr22_sorted.bam
samtools fastq 63003856_chr22_1.fq.gz -2 63003856_chr22_2.fq.gz -0 /dev/null -s /dev/null -o 63003856_chr22_sorted.bam
scp 63003856_chr22_{1,2}.fq.gz meso:/Work/Users/apraga/bisonex/tests/bamscissors/
mesocentre
bwa mem -t 24 /Work/Projects/bisonex/data/genome/GRCh38.p13/bwa/genomeRef 63003856_chr22_1.fq.gz 63003856_chr22_2.fq.gz | samtools sort -@24 -o output.bam
scp meso:/Work/Users/apraga/bisonex/tests/bamscissors/output.bam\* aligned/
****** DONE En rajoutant .fna dans le doserrie (+samtools fastq): idem
CLOSED: [2023-05-23 Tue 01:16]
ln -s /Work/Projects/bisonex/data/genome/GRCh38.p13/bwa/genomeRef* .
ln -s /Work/Projects/bisonex/data/genome/GRCh38.p13/genomeRef* .
bwa mem -t 24 genomeRef 63003856_chr22_1.fq.gz 63003856_chr22_2.fq.gz | samtools sort -@24 -o output.bam
****** DONE Test picard : idem
CLOSED: [2023-05-23 Tue 01:16]
Dans bisonex/code/BamScissors.jl
❯ picard SamToFastq -I 63003856_chr22_sorted.bam -F testpicard1.fastq -F2 testpicard2.fastq
bwa mem -t 24 /Work/Projects/bisonex/data/genome/GRCh38.p13/bwa/genomeRef testpicard1.fastq.gz testpicard2.fastq.gz | samtools sort -@24 -o testpicard.bam -
****** DONE Il ne manque pas des reads dans les fastq :
CLOSED: [2023-05-23 Tue 01:26]
bisonex/code/BamScissors.jl on bamscissors [!?] via ஃ v1.9.0
❯ samtools flagstat aligned/output.bam
1306273 + 0 in total (QC-passed reads + QC-failed reads)
1296055 + 0 primary
0 + 0 secondary
10218 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
1304871 + 0 mapped (99.89% : N/A)
1294653 + 0 primary mapped (99.89% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
bisonex/code/BamScissors.jl on bamscissors [!?] via ஃ v1.9.0
❯ samtools flagstat 63003856_chr22.bam
2609214 + 0 in total (QC-passed reads + QC-failed reads)
2592110 + 0 primary
0 + 0 secondary
17104 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
2609214 + 0 mapped (100.00% : N/A)
2592110 + 0 primary mapped (100.00% : N/A)
2592110 + 0 paired in sequencing
1296055 + 0 read1
1296055 + 0 read2
2592110 + 0 properly paired (100.00% : N/A)
2592110 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
❯ echo $(zcat 63003856_chr22_init2.fq.gz | wc -l)/4 | bc
1296055
bisonex/code/BamScissors.jl on bamscissors [!?] via ஃ v1.9.0 took 2s
❯ echo $(zcat 63003856_chr22_init1.fq.gz | wc -l)/4 | bc
1296055
❯ samtools flagstat 63003856_chr22_sorted.bam
2609214 + 0 in total (QC-passed reads + QC-failed reads)
2592110 + 0 primary
0 + 0 secondary
17104 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
2609214 + 0 mapped (100.00% : N/A)
2592110 + 0 primary mapped (100.00% : N/A)
2592110 + 0 paired in sequencing
1296055 + 0 read1
1296055 + 0 read2
2592110 + 0 properly paired (100.00% : N/A)
2592110 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
****** DONE Avec sort -O BAM idem
CLOSED: [2023-05-23 Tue 01:21]
****** DONE Singleton ou non mapped ? nonVirtPosition
CLOSED: [2023-05-23 Tue 01:22]
❯ samtools bam2fq -1 63003856_chr22_init1.fq.gz -2 63003856_chr22_init2.fq.gz -0 both -s single.fq.gz -n 63003856_chr22_sorted.bam
bisonex/code/BamScissors.jl on bamscissors [!?] via ஃ v1.9.0
❯ wc -l 63003856_chr22_init* both single.fq.gz
403540 63003856_chr22_init1.fq.gz
404788 63003856_chr22_init2.fq.gz
0 both
0 single.fq.gz
808328 total
****** Problème d'aligner car les reads sont bien dans le .fastq ?
Ex:
zgrep "A00853:477:HMLWYDSX3:2:2624:2826:18630" 63003856_chr22_{1,2}.fq.gz
63003856_chr22_1.fq.gz:@A00853:477:HMLWYDSX3:2:2624:2826:1863