ZJH3FM7YCDLERF3DF4AOPQS7C7SUZIOWTI5ZEPJBKKOFRHM7W4NAC
X6HMTGMJ7M3335P2EFBCMTOZ3FQKDY2ZPUHNXIVUSODFBNZCPMMQC
RHWQQAAHNHFO3FLCGVB3SIDKNOUFJGZTDNN57IQVBMXXCWX74MKAC
Q46IVB6DYWJI7EAXQ7ROWCUTL7XRFYWFUGTWSFOITWEMMSQMQ5DAC
LJ2RW6IZC67XTZEJO4ZCFN625XC3ZOVJSBUG4VQ2F3ILS6ELBCIAC
FXA3ZBV64FML7W47IPHTAJFJHN3J3XHVHFVNYED47XFSBIGMBKRQC
GGUC6BRBVBMIWZAYAYHNLL37QXWKTUUANB4PBMHIJIBJWBWP7JPAC
Y4X2CGFKO6ZYMC4MU43CKFQGLHOO45KPCZXXIYD7RL7LYQCVXIQQC
* <2023-03-08 Wed> Workout
** Circuit
Upper body: Muscle-up - Straight - L-sit - Tucked pushed - Muscle negative - tucked pull up - Skin-the-cat - Straddle negative
Lower body: pistols x4 - Norwegian curls x4 - extension x22s - compression x9
3x1
***** TODO Understand why we score below Kumaran et al. 2019
SCHEDULED: <2023-03-05 Sun>
****** TODO Understand/improve SNP recall 0.855
SCHEDULED: <2023-03-04 Sat>
******* TODO Look at the FNs (SNP)
SCHEDULED: <2023-03-04 Sat>
******** Manual:
NC_000001.11:1385919 no reads 1/1:FN:.:i1_5:INDEL:homalt:.
NC_000001.11:1623412 1 read 1/1:FN:.:ti:SNP:homalt:.
NC_000001.11:1668449 33 of 160 reads show the alternate allele 1/1:FN:am:ti:SNP:homalt:.
NC_000001.11:1676135 67 reads, variant not seen 0/1:FN:.:ti:SNP:het:.
NC_000001.11:1734812 1/1:FN:.:ti:SNP:homalt:.
NC_000001.11:1745808 1/1:FN:.:ti:SNP:homalt:.
NC_000001.11:1745814 1/1:FN:.:ti:SNP:homalt:.
NC_000001.11:1953616 1/1:FN:.:ti:SNP:homalt:.
NC_000001.11:2512975 0/1:FN:.:ti:SNP:het:
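The annotation after each position is the hap.py sample column. A minimal sketch for splitting it into named fields, assuming the usual hap.py FORMAT order GT:BD:BK:BI:BVT:BLT:QQ (the helper name =parseHappy= is hypothetical, not part of the analysis code):

#+begin_src julia
# Split a hap.py sample column into named fields.
# Assumption: the FORMAT order is GT:BD:BK:BI:BVT:BLT:QQ
# (genotype, decision, match kind, benchmarking info, variant type,
# locus type, quality score).
const HAPPY_FIELDS = (:GT, :BD, :BK, :BI, :BVT, :BLT, :QQ)

function parseHappy(s::AbstractString)
    parts = split(s, ':')
    length(parts) == length(HAPPY_FIELDS) ||
        error("expected $(length(HAPPY_FIELDS)) fields, got $(length(parts))")
    NamedTuple{HAPPY_FIELDS}(Tuple(String.(parts)))
end

r = parseHappy("1/1:FN:am:ti:SNP:homalt:.")
println(r.BD, " ", r.BK, " ", r.BVT)
#+end_src

With the fields named, the list can be filtered on =BD= (decision) and =BLT= (locus type) instead of reading the colon-separated string by eye.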
***** DONE Re-test with RefSeq exons and option -T: slightly worse results
CLOSED: [2023-03-09 Thu 22:42] SCHEDULED: <2023-03-08 Wed>
We use the coordinates given by RefSeq directly.
******* TODO Statistics
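| Type  | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision |
|-------+--------+-------------+----------+----------+-------------+----------+-----------+-------+-------+---------------+------------------|
| INDEL | ALL    |        7226 |     3417 |     3809 |        6978 |     1599 |      1918 |   228 |   353 |      0.472876 |         0.683992 |
| SNP   | ALL    |       59052 |    37825 |    21227 |       43480 |     1913 |      3740 |   675 |    35 |      0.640537 |         0.951862 |

| Type  | METRIC.Frac_NA | METRIC.F1_Score | TRUTH.TOTAL.TiTv_ratio | QUERY.TOTAL.TiTv_ratio | TRUTH.TOTAL.het_hom_ratio | QUERY.TOTAL.het_hom_ratio |
|-------+----------------+-----------------+------------------------+------------------------+---------------------------+---------------------------|
| INDEL |       0.274864 |        0.559171 |                    NaN |                    NaN |                  1.547733 |                  2.756151 |
| SNP   |       0.086017 |        0.765767 |               2.433271 |               2.350281 |                  1.575230 |                  1.492346 |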
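
As a sanity check, the ratios can be recomputed from the raw counts. A minimal sketch, assuming the usual hap.py definitions (recall = TP / (TP + FN), precision computed over the assessed calls QUERY.TOTAL - QUERY.UNK); the function name =happyMetrics= is hypothetical:

#+begin_src julia
# Recompute hap.py summary ratios from raw counts.
# Assumption: unknown calls (QUERY.UNK) are excluded from precision.
function happyMetrics(; tp, fn, queryTotal, fp, unk)
    recall = tp / (tp + fn)
    assessed = queryTotal - unk          # calls that could be evaluated
    precision = (assessed - fp) / assessed
    f1 = 2 * precision * recall / (precision + recall)
    fracNA = unk / queryTotal
    (; recall, precision, f1, fracNA)
end

# Counts taken from the SNP row of the table.
snp = happyMetrics(tp = 37825, fn = 21227, queryTotal = 43480, fp = 1913, unk = 3740)
println(round.(values(snp), digits = 6))
#+end_src

With these definitions the SNP counts reproduce the reported recall, precision, F1 and Frac_NA to six decimals, so the low recall comes directly from the 21227 false negatives rather than from how the metric is computed.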
***** TODO Where does the difference come from?
SCHEDULED: <2023-03-09 Thu>
****** TODO Statistics
using CSV, DataFrames, Intervals
# Count the reads covering each false negative in the hap.py output
# (results are stored in count.csv):
# awk '!/^#/ && $10~/:FN:/ && $11~/NOCALL/ && $10~/SNP/ {print $1":"$2"-"$2}' test-allchr.vcf | xargs -I {} sh -c 'echo -n {}";"; samtools view ../NA12878_NIST.bam {} | wc -l' > count.csv
f = groupedCount("count.csv")
# Check whether position x falls within an exon of chromosome chr.
# Returns the first matching interval, or missing if there is none.
function isExon(x, exons, chr)
    # Remove duplicate exons
    exonsU = unique(exons[chr])
    res = transform(exonsU, :interval => ByRow(z -> x in z) => :check)
    # Keep only the intervals that contain x
    found = res[res.check, :].interval
    isempty(found) ? missing : first(found)
end
reads = groupedCount("../NA12878/happy/count.csv")
g = 1
x = parse(Int64, f[g][1, :pos])
x in exons[g][1, :interval]
for g in 1:length(reads)
    local found
    found = transform(reads[g], :pos => ByRow(x -> isExon(x, exons, g)) => :exon)
    readsMissing = found[ismissing.(found.exon), :]
    # print("$(k.chrom):")
    print(g)
    print(": $(nrow(reads[g])) reads, ")
    println("$(nrow(readsMissing)) not in exons")
end
****** TODO Check that no exons are missing (with the BAM?)
1: 448 reads, 64 not in exons
2: 307 reads, 47 not in exons
3: 228 reads, 37 not in exons
4: 175 reads, 28 not in exons
5: 237 reads, 38 not in exons
6: 1223 reads, 304 not in exons
7: 274 reads, 52 not in exons
8: 414 reads, 41 not in exons
9: 161 reads, 47 not in exons
10: 194 reads, 37 not in exons
11: 304 reads, 39 not in exons
12: 254 reads, 32 not in exons
13: 168 reads, 120 not in exons
14: 184 reads, 26 not in exons
15: 250 reads, 90 not in exons
16: 239 reads, 29 not in exons
17: 373 reads, 35 not in exons
18: 85 reads, 17 not in exons
19: 576 reads, 46 not in exons
20: 83 reads, 22 not in exons
21: 67 reads, 14 not in exons
22: 117 reads, 24 not in exons
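To see which chromosomes stand out, the share of entries outside exons can be computed from these lines. A minimal sketch that parses the printed output (the helper name =outsideFraction= is hypothetical, and only three lines are used here as sample input):

#+begin_src julia
# Parse a line such as "13: 168 reads, 120 not in exons" and compute
# the fraction of entries that fall outside exons.
function outsideFraction(line::AbstractString)
    m = match(r"^(\d+): (\d+) reads, (\d+) not in exons$", line)
    m === nothing && error("unexpected line: $line")
    chr, total, missingN = parse.(Int, m.captures)
    (; chr, frac = missingN / total)
end

lines = ["6: 1223 reads, 304 not in exons",
         "13: 168 reads, 120 not in exons",
         "15: 250 reads, 90 not in exons"]
for r in outsideFraction.(lines)
    println(r.chr, ": ", round(100 * r.frac, digits = 1), "% not in exons")
end
#+end_src

Chromosome 13 is the clear outlier, with 120 of 168 entries (about 71%) outside the exon definitions, which makes it a reasonable first place to look for missing exons.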
******* TODO Understand/improve SNP recall 0.855
SCHEDULED: <2023-03-04 Sat>
We score below Kumaran et al. 2019.
******** TODO Look at the FNs (SNP)
SCHEDULED: <2023-03-04 Sat>
********* Manual:
NC_000001.11:1385919 no reads 1/1:FN:.:i1_5:INDEL:homalt:.
NC_000001.11:1623412 1 read 1/1:FN:.:ti:SNP:homalt:.
NC_000001.11:1668449 33 of 160 reads show the alternate allele 1/1:FN:am:ti:SNP:homalt:.
NC_000001.11:1676135 67 reads, variant not seen 0/1:FN:.:ti:SNP:het:.
NC_000001.11:1734812 1/1:FN:.:ti:SNP:homalt:.
NC_000001.11:1745808 1/1:FN:.:ti:SNP:homalt:.
NC_000001.11:1745814 1/1:FN:.:ti:SNP:homalt:.
NC_000001.11:1953616 1/1:FN:.:ti:SNP:homalt:.
NC_000001.11:2512975 0/1:FN:.:ti:SNP:het:
****** TODO Alignment
******* DONE Same genome: same results
CLOSED: [2023-03-09 Thu 23:13]
******** GRCh38 + alexis version + RefSeq exons
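| Type  | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision |
|-------+--------+-------------+----------+----------+-------------+----------+-----------+-------+-------+---------------+------------------|
| INDEL | ALL    |        7226 |     3417 |     3809 |        6979 |     1599 |      1919 |   228 |   353 |      0.472876 |         0.683992 |
| SNP   | ALL    |       59052 |    37827 |    21225 |       43483 |     1913 |      3741 |   676 |    35 |      0.640571 |         0.951865 |

| Type  | METRIC.Frac_NA | METRIC.F1_Score | TRUTH.TOTAL.TiTv_ratio | QUERY.TOTAL.TiTv_ratio | TRUTH.TOTAL.het_hom_ratio | QUERY.TOTAL.het_hom_ratio |
|-------+----------------+-----------------+------------------------+------------------------+---------------------------+---------------------------|
| INDEL |       0.274968 |        0.559171 |                    NaN |                    NaN |                  1.547733 |                  2.756698 |
| SNP   |       0.086034 |        0.765792 |               2.433271 |               2.350254 |                  1.575230 |                  1.492375 |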
******** GRCh38.p13 + our alexis version + RefSeq exons
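| Type  | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision |
|-------+--------+-------------+----------+----------+-------------+----------+-----------+-------+-------+---------------+------------------|
| INDEL | ALL    |        7226 |     3417 |     3809 |        6978 |     1599 |      1918 |   228 |   353 |      0.472876 |         0.683992 |
| SNP   | ALL    |       59052 |    37825 |    21227 |       43480 |     1913 |      3740 |   675 |    35 |      0.640537 |         0.951862 |

| Type  | METRIC.Frac_NA | METRIC.F1_Score | TRUTH.TOTAL.TiTv_ratio | QUERY.TOTAL.TiTv_ratio | TRUTH.TOTAL.het_hom_ratio | QUERY.TOTAL.het_hom_ratio |
|-------+----------------+-----------------+------------------------+------------------------+---------------------------+---------------------------|
| INDEL |       0.274864 |        0.559171 |                    NaN |                    NaN |                  1.547733 |                  2.756151 |
| SNP   |       0.086017 |        0.765767 |               2.433271 |               2.350281 |                  1.575230 |                  1.492346 |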
****** DONE Check whether the coordinates are genomic (VCF)
CLOSED: [2023-03-09 Thu 23:05]
****** TODO Variant calling
******* TODO Disable dbSNP ???
****** TODO Exon definition
******* TODO Check that no exons are missing (with the BAM?)