- Check after alignment: depth of 19 (note: `-Q 0` is required for the depth to match, because `samtools mpileup` ignores bases with quality below 13 by default and several bases here have quality `!`, i.e. Phred 0)
- ```bash
cd /Work/Users/apraga/bisonex/out/preprocessing/mapped/2200519525-63060439-GRCh38
samtools mpileup 2200519525-63060439-GRCh38.aligned.bam -r chr15:26869324-26869324 -Q 0
```
- Output (the reference base is `N` because no reference FASTA was passed with `-f`):
- ```
[mpileup] 1 samples in 1 input files
chr15 26869324 N 19 AaAaaTTATtTTttAattA k:!k!!kFk!!kk!k!k!F
```
- Check that this does not come from the alignment: GRCh38-noalt and GRCh37
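- To compare the pileups obtained with each reference, a small helper can turn the 5th column of a `samtools mpileup` line into per-allele, per-strand counts (upper case = forward strand, lower case = reverse strand). A minimal sketch in bash + awk, assuming the standard mpileup base encoding (`^` plus a mapping-quality character at read starts, `$` at read ends, `+N`/`-N` followed by N bases for indels); `count_bases` is a hypothetical name:
```bash
#!/usr/bin/env bash
# count_bases: read `samtools mpileup` lines on stdin and print, for each line,
# the A/C/G/T counts on the forward (upper case) and reverse (lower case) strand.
count_bases() {
  awk '{
    s = $5; n = length(s); i = 1
    split("", count)                       # reset the counts for each pileup line
    while (i <= n) {
      c = substr(s, i, 1)
      if (c == "^") { i += 2; continue }   # read start: skip "^" and its mapping-quality char
      if (c == "$") { i += 1; continue }   # read end marker
      if (c == "+" || c == "-") {          # indel: sign, length digits, then that many bases
        j = i + 1; len = ""
        while (substr(s, j, 1) ~ /[0-9]/) { len = len substr(s, j, 1); j++ }
        i = j + len; continue
      }
      count[c]++; i++                      # count every other symbol; only A/C/G/T are printed
    }
    printf "%s:%s A=%d a=%d C=%d c=%d G=%d g=%d T=%d t=%d\n", $1, $2,
      count["A"], count["a"], count["C"], count["c"],
      count["G"], count["g"], count["T"], count["t"]
  }'
}
```
- For example, piping the `samtools mpileup ... -Q 0` command above into `count_bases` turns the line shown above into `chr15:26869324 A=5 a=4 C=0 c=0 G=0 g=0 T=5 t=5`, i.e. 9 A and 10 T spread over both strands.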
- Tried with the noalt version: same result!
- ```bash
# Work directory of the original alignment task
cd /Work/Users/apraga/bisonex/work/2c/63779dffb657b13fcfd2f4f09da8e2
# Swap the BWA index for the GRCh38 no-alt one, keeping the original aside as .bwa
mv bwa .bwa
ln -s /Work/Projects/bisonex/data/fasta/GRCh38-noalt/bwa .
# Keep the original BAM under another name before re-aligning
mv 2200519525-63060439-GRCh38.aligned.bam 2200519525-63060439-GRCh38.aligned-grch38.bam
bwa mem -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add' -t 24 bwa/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna 63060439_S92_R1_001.fastq.gz 63060439_S92_R2_001.fastq.gz | samtools sort --threads 24 -o 2200519525-63060439-GRCh38.aligned.bam -
# samtools mpileup -r needs a BAM index (.bai)
samtools index 2200519525-63060439-GRCh38.aligned.bam
samtools mpileup 2200519525-63060439-GRCh38.aligned.bam -r chr15:26869324-26869324 -Q 0
# Same alignment against GRCh37 (no-alt) and GRCh38 (no-alt), using the references under /Work/Groups
bwa mem -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add' -t 24 /Work/Groups/bisonex/data/fasta/GRCh37/hg19.p13.plusMT.no_alt_analysis_set/hg19.p13.plusMT.no_alt_analysis_set.fa 63060439_S92_R1_001.fastq.gz 63060439_S92_R2_001.fastq.gz | samtools sort --threads 24 -o 2200519525-63060439-GRCh37.aligned.bam -
bwa mem -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add' -t 24 /Work/Groups/bisonex/data/fasta/GRCh38-noalt/bwa/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna 63060439_S92_R1_001.fastq.gz 63060439_S92_R2_001.fastq.gz | samtools sort --threads 24 -o 2200519525-63060439-GRCh38-noalt.aligned.bam -
```
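- Swapping the index through a symlink only works if the link target holds a complete BWA index for the FASTA given to `bwa mem`. A minimal check, assuming the five files that `bwa index` writes next to the reference (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`); `check_bwa_index` is a hypothetical helper:
```bash
#!/usr/bin/env bash
# check_bwa_index PREFIX: return 0 if the five files written by `bwa index`
# (PREFIX.amb .ann .bwt .pac .sa) all exist, otherwise list the missing ones on stderr.
check_bwa_index() {
  local prefix=$1 ext status=0
  for ext in amb ann bwt pac sa; do
    # -e follows symlinks, so a dangling link is reported as missing too
    if [[ ! -e "$prefix.$ext" ]]; then
      echo "missing: $prefix.$ext" >&2
      status=1
    fi
  done
  return "$status"
}
```
- For example, `check_bwa_index bwa/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna` run from the work directory after the `ln -s`.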
- Last test: GRCh37 no-alt. The NCBI FTP does not provide this version with BWA indexes, so we use the following:
- ```bash
cd /Work/Groups/bisonex/data/fasta/GRCh37/
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/analysisSet/hg19.p13.plusMT.no_alt_analysis_set.fa.gz_pipelines/GCA_000001405.14_GRCh37.p13_no_alt_analysis_set.fna.g
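wget https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/analysisSet/hg19.p13.plusMT.no_alt_analysis_set.bwa_index.tar.gz
# Unpack FASTA + BWA index (assumed to extract into hg19.p13.plusMT.no_alt_analysis_set/, the directory linked below)
tar -xzf hg19.p13.plusMT.no_alt_analysis_set.bwa_index.tar.gz
# Jump back to the bisonex directory (z: directory-jumping shell tool)
z bisonex
cd test/gabra5
ln -s /Work/Groups/bisonex/data/fasta/GRCh37/hg19.p13.plusMT.no_alt_analysis_set .
bwa mem -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add' -t 24 hg19.p13.plusMT.no_alt_analysis_set/hg19.p13.plusMT.no_alt_analysis_set.fa 63060439_S92_R1_001.fastq.gz 63060439_S92_R2_001.fastq.gz | samtools sort --threads 24 -o 2200519525-63060439-GRCh37.aligned.bam -
# Index for samtools mpileup -r
samtools index 2200519525-63060439-GRCh37.aligned.bam
```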
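- The same `bwa mem | samtools sort` pipeline is repeated above with only the reference and the output name changing. A sketch of a wrapper that keeps the read group, FASTQ names and thread count from those commands and also indexes the result, so that `samtools mpileup -r` can be run right away. `align_to` and the `DRY_RUN` switch are hypothetical, and it must be run from the directory that contains the FASTQs:
```bash
#!/usr/bin/env bash
set -o pipefail   # make the pipeline fail if bwa fails, not only if samtools sort does

# Values copied from the commands above
RG='@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tPM:Miseq\tCN:CHU_Minjoz\tLB:definition_to_add'
R1=63060439_S92_R1_001.fastq.gz
R2=63060439_S92_R2_001.fastq.gz
THREADS=24

# align_to REF_FASTA OUT_BAM: align R1/R2 on REF_FASTA, coordinate-sort, then index.
# With DRY_RUN=1 (hypothetical switch), only print the equivalent command line.
align_to() {
  local ref=$1 out=$2
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "bwa mem -R '$RG' -t $THREADS $ref $R1 $R2 | samtools sort --threads $THREADS -o $out - && samtools index $out"
    return 0
  fi
  # bwa mem itself turns the literal \t in the -R string into tabs
  bwa mem -R "$RG" -t "$THREADS" "$ref" "$R1" "$R2" \
    | samtools sort --threads "$THREADS" -o "$out" - \
    && samtools index "$out"
}
```
- For example, `align_to hg19.p13.plusMT.no_alt_analysis_set/hg19.p13.plusMT.no_alt_analysis_set.fa 2200519525-63060439-GRCh37.aligned.bam`; prefixing the call with `DRY_RUN=1` only prints the command that would run.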
- After liftover to GRCh37 (chr15:27114471), same result...
```bash
samtools mpileup 2200519525-63060439-GRCh37.aligned.bam -r chr15:27114471-27114471 -Q 0
```
- TODO: mark duplicates
- `gatk MarkDuplicates --INPUT 2200519525-63060439-GRCh37.aligned.bam --OUTPUT 2200519525-63060439-GRCh37.markedup.bam --REFERENCE_SEQUENCE /Work/Groups/bisonex/data/fasta/GRCh37/hg19.p13.plusMT.no_alt_analysis_set.fa.gz --METRICS_FILE metrics`
- TODO: test freebayes (cf. the Centogene VCF)
- Probably a problem with the FASTQ files? Arguments in favour:
- the problem did not occur for the other variants
- bwa mem seems to report an error:
```bash
[M::mem_process_seqs] Processed 734826 reads in 150.735 CPU sec, 6.449 real sec
[W::bseq_read] the 1st file has fewer sequences.
[W::bseq_read] the 1st file has fewer sequences.
[gzclose] buffer error
[bam_sort_core] merg
```
- Indeed, R1 is much smaller than R2:
```bash
du -hs *63060439*/*
558M 2200519525_63060439/63060439_S92_R1_001.fastq.gz
2.6G 2200519525_63060439/63060439_S92_R2_001.fastq.gz
```
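- The bwa warning (`the 1st file has fewer sequences`), the `[gzclose] buffer error` and the size gap between R1 and R2 suggest that R1 may be truncated. A sketch to check this directly, assuming standard `gzip` and `wc`: test each gzip stream and compare the number of FASTQ records (4 lines each). `count_reads` and `check_pair` are hypothetical names:
```bash
#!/usr/bin/env bash
# count_reads FASTQ_GZ: number of FASTQ records (4 lines each) that can be decompressed.
count_reads() {
  local lines
  lines=$(gzip -cd "$1" 2>/dev/null | wc -l)
  echo $(( lines / 4 ))
}

# check_pair R1 R2: report gzip errors, print both read counts,
# and return non-zero if a file is corrupt or the counts differ.
check_pair() {
  local r1=$1 r2=$2 f n1 n2 status=0
  for f in "$r1" "$r2"; do
    gzip -t "$f" 2>/dev/null || { echo "corrupt or truncated gzip: $f" >&2; status=1; }
  done
  n1=$(count_reads "$r1")
  n2=$(count_reads "$r2")
  echo "R1=$n1 R2=$n2"
  [[ $n1 -eq $n2 ]] || status=1
  return "$status"
}
```
- For example, `check_pair 2200519525_63060439/63060439_S92_R1_001.fastq.gz 2200519525_63060439/63060439_S92_R2_001.fastq.gz`. Counting records matters because file size alone is not proof: the two mates can compress differently.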
- According to a VCF:
- Variant caller: freebayes
- Genome: GRCh37.p13
- GABRA5: FASTQ problem [[Comparaison à centogène]].
- According to the VCFs, Centogene uses freebayes [[Centogene config]]
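- The caller and genome build above are read from the VCF header. A sketch to print the relevant meta-lines, assuming the VCF declares them as `##source=` and `##reference=` (both optional in the VCF specification) and that compressed files end in `.gz`; `vcf_provenance` is a hypothetical name:
```bash
#!/usr/bin/env bash
# vcf_provenance VCF: print the ##source= and ##reference= meta-lines of a VCF
# (plain, or gzip/bgzip-compressed with a .gz suffix), stopping at the #CHROM header line.
vcf_provenance() {
  if [[ $1 == *.gz ]]; then gzip -cd "$1"; else cat "$1"; fi \
    | awk '/^#CHROM/ { exit } /^##(source|reference)=/ { print }'
}
```
- For example, `vcf_provenance sample.vcf.gz` (hypothetical file name) on one of the Centogene VCFs.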