:PROPERTIES:
:ID: f25927b1-797c-4c9d-89d5-f0bb10be9bd2
:ROAM_REFS: @Alser2021
:END:
#+title: Alser2021

Review of algorithms
**** Notes
Introduction
- Brute force is not usable on modern machines (hundreds of millions of reads)
- Current method
- Alignment is a crucial step
- Two options for a genome: /de novo/ assembly or alignment to a reference genome. The first is not yet used (complex genome and short reads, a problem that long reads have not solved) because it is too slow and more expensive

Problems
- Incomplete reference genome (reads left unaligned or aligned incorrectly)
- Repeated regions (one read can align to several regions). Example: BWA-MEM, which aligns "at random" ([cite:@firtina2016]: if the reads are shuffled, an ambiguous read is aligned to a different location)
- Must tolerate SNVs and structural variants
- Must handle the forward and reverse strands to avoid getting different genotypes (strand bias)
***** Complete list of aligners up to 2020: Table 1
Aligners that handle DNA, after 2012

(W) = wrapper (uses other tools); GP = global positioning (step 2 in the list of steps below)
****** Initial
| Aligner | Paper | publication | Application | Indexing | GP fix | GP spaced seed | GP seed chaining | Pairwise alignment | Max. read length tested (bp) |
|-
| [[https://github.com/lh3/bwa][BWA]] | [54] | 2009 | DNA | BWT-FM | N | N | N | Semi-Global | 125 |
| [[http://bowtie-bio.sourceforge.net/manual.shtml][Bowtie]] | [55] | 2009 | DNA | BWT-FM | Y | N | N | HD | 76 |
| [[https://sourceforge.net/projects/cloudburst-bio/][CloudBurst]] | [56] | 2009 | DNA | Hashing | Y | N | N | Landau-Vishkin | 36 |
| [[https://github.com/byucsl/gnumap][GNUMAP]] | [57] | 2009 | DNA | Hashing | Y | N | Y | NW | 36 |
| [[http://1001genomes.org/software/genomemapper_singleref.html][GenomeMapper]] | [58] | 2009 | DNA | Hashing | Y | N | Y | NW | 200 |
| [[https://github.com/hugheaves/MOM][MOM]] | [59] | 2009 | DNA | Hashing | Y | N | N | HD | 40 |
| [[http://pass.cribi.unipd.it/cgi-bin/pass.pl][PASS]] | [60] | 2009 | DNA | Hashing | Y | N | Y | NW | 32 |
| [[https://code.google.com/archive/p/perm/downloads][PerM]] | [61] | 2009 | DNA | Hashing | Y | Y | N | HD | 47 |
| [[https://github.com/seqan/seqan/tree/master/apps/razers][RazerS]] | [62] | 2009 | DNA | Hashing | Y | Y | Y | Myers Bit Vector | 76 |
| [[http://compbio.cs.toronto.edu/shrimp/][SHRiMP]] | [63] | 2009 | DNA | Hashing | N | N | N | SW | 35 |
| [[https://github.com/ShujiaHuang/SOAPaligner][SOAP2]] | [64] | 2009 | DNA | BWT-FM | Y | N | N | SW | 44 |
| [[http://www.bcgsc.ca/platform/bioinfo/software/slider][Slider]] | [65] | 2009 | DNA | Hashing | Y | N | N | HD | 36 |
| [[https://www.bioinf.uni-leipzig.de/Software/segemehl/][segemehl]] | [66] | 2009 | DNA | Suffix array | N | N | Y | SW | 35 |
| [[https://github.com/lh3/bwa][BWA-SW]] | [54] | 2010 | DNA | BWT-FM | N | N | N | SW | 10000 |
| [[http://www.irisa.fr/symbiose/projects/gassst/][GASSST]] | [35] | 2010 | DNA | Hashing | Y | Y | Y | Semi-Global | 500 |
| [[https://github.com/juliangehring/GMAP-GSNAP][GSNAP]] | [37] | 2010 | DNA | Hashing | Y | N | Y | Non-DP Heuristic | 100 |
| [[https://github.com/rcallahan/smalt][SMALT]] | [69] | 2010 | DNA | Hashing | Y | N | Y | SW | 150 |
| [[http://www.bcgsc.ca/platform/bioinfo/software/SliderII][SliderII]] (W) | [70] | 2010 | DNA | Hashing | Y | N | N | HD | 42 |
| [[http://www.vmatch.de/][VMATCH]] (W) | [71] | 2010 | DNA | Suffix array | Y | N | Y | SW | N/A |
| [[https://github.com/sfu-compbio/mrsfast][mrsFAST]] | [72] | 2010 | DNA | Hashing | Y | N | N | HD | 100 |
| [[http://last.cbrc.jp/][LAST]] | [78] | 2011 | DNA/BS-Seq/RNA | Suffix array | N | Y | N | SW & NW | 105 |
| [[https://dl.acm.org/citation.cfm?id=2147845&dl=ACM&coll=DL][DynMap]] | [79] | 2011 | DNA | Hashing | Y | N | N | NW | 52 |
| [[http://compbio.cs.toronto.edu/shrimp/][SHRiMP2]] | [80] | 2011 | DNA | Hashing | Y | Y | Y | SW | 75 |
| [[http://snap.cs.berkeley.edu/][SNAP]] | [81] | 2011 | DNA | Hashing | Y | N | N | NW | 10000 |
| [[https://www.well.ox.ac.uk/project-stampy][Stampy]] | [82] | 2011 | DNA | Hashing | Y | N | N | NW | 4500 |
| [[https://github.com/iontorrent/TS/tree/master/Analysis/TMAP][TMAP]] | ? | 2011 | DNA | BWT-FM | N | N | Y | SW | N/A |
| [[http://grimmond.imb.uq.edu.au/X-MATE/][X-Mate]] | [83] | 2011 | DNA | Hashing | N | N | N | Non-DP Heuristic | 50 |
| [[https://github.com/mchaisso/blasr/][BLASR]] | [85] | 2012 | DNA | Suffix array | Y | N | Y | NW | 8000 |
| [[https://code.google.com/archive/p/batmis/][Batmis]] | [86] | 2012 | DNA | BWT-ST | Y | N | N | HD | 100 |
| [[http://bowtie-bio.sourceforge.net/bowtie2][Bowtie2]] | [87] | 2012 | DNA | BWT-FM | Y | N | Y | SW & NW | 400 |
| [[https://github.com/smarco/gem3-mapper][GEM]] | [88] | 2012 | DNA | BWT-FM | N | N | Y | SW & NW | 150 |
| [[https://github.com/seqan/seqan/tree/master/apps/razers3][RazerS3]] | [89] | 2012 | DNA | Hashing | Y | Y | Y | Banded Myers Bit Vector | 800 |
| [[https://web.stanford.edu/group/wonglab/seqalto/][SeqAlto]] | [90] | 2012 | DNA | Hashing | Y | N | N | NW | 200 |
| [[https://github.com/seqan/seqan/blob/master/apps/splazers/README][SplazerS]] | [91] | 2012 | DNA | Hashing | Y | N | Y | Banded Myers Bit Vector | 150 |
| [[http://pages.cs.wisc.edu/~jignesh/wham/][WHAM]] | [92] | 2012 | DNA | Hashing | Y | N | N | NW | 74 |
| [[https://github.com/GregoryFaust/yaha][YAHA]] | [93] | 2012 | DNA | Hashing | Y | N | Y | SW | 10000 |
| [[http://subread.sourceforge.net/][Subread]] | [97] | 2013 | DNA/RNA-Seq | Hashing | Y | Y | Y | SW | 202 |
| [[https://github.com/lh3/bwa][BWA-MEM]] | [98] | 2013 | DNA | BWT-FM | N | N | Y | SW & NW | 650 |
| [[http://www.seqan.de/projects/masai][Masai]] | [99] | 2013 | DNA | Suffix tree | N | N | Y | Banded Myers Bit Vector | 150 |
| [[http://cibiv.github.io/NextGenMap/][NextGenMap]] | [100] | 2013 | DNA | Hashing | Y | N | N | SW & NW | 250 |
| [[http://www.umsl.edu/~wongch/software.html][SRmapper]] | [101] | 2013 | DNA | Hashing | Y | N | N | HD | 100 |
| [[https://github.com/BilkentCompGen/mrfast][mrFAST]] | [102] | 2013 | DNA | Hashing | Y | N | N | Semi-Global | 180 |
| [[http://bwa-pssm.binf.ku.dk/][BWA-PSSM]] (W) | [107] | 2014 | DNA | BWT-FM | Y | N | N | SW | 100 |
| [[http://cushaw3.sourceforge.net/homepage.htm#latest][CUSHAW3]] | [108] | 2014 | DNA | BWT-FM | Y | N | Y | SW & Semi-Global | 100 |
| [[https://hobbes.ics.uci.edu/download.shtml][Hobbes2]] | [109] | 2014 | DNA | Hashing | Y | N | Y | Banded Myers Bit Vector | |
| [[https://github.com/wanpinglee/MOSAIK][MOSAIK]] | [110] | 2014 | DNA | Hashing | Y | N | N | SW | 100 |
| [[https://github.com/opencb/hpg-aligner][hpg-Aligner]] | [111] | 2014 | DNA | Suffix array | N | N | Y | SW | 5000 |
| [[https://github.com/sfu-compbio/mrsfast][mrsFAST-Ultra]] | [112] | 2014 | DNA | Hashing | Y | N | N | HD | 100 |
| [[http://erne.sourceforge.net/][ERNE2]] | [116] | 2016 | DNA/BS-Seq | BWT-FM + hashing | Y | N | N | HD | 100 |
| [[https://github.com/isovic/graphmap][GraphMap]] | [117] | 2016 | DNA | Hashing | Y | Y | Y | Semi-Global | 9000 |
| [[https://github.com/ruhulsbu/NanoBLASTer][NanoBLASTer]] | [118] | 2016 | DNA | Hashing | Y | N | Y | NW | 7040 |
| [[https://github.com/lh3/minimap][minimap]] | [119] | 2016 | DNA | Hashing | Y | N | N | N/A | 13000 |
| [[https://github.com/dfguan/rHAT][rHAT]] | [120] | 2016 | DNA | Hashing | Y | N | Y | SW | 8000 |
| [[https://github.com/hsinnan75/KART][KART]] | [121] | 2017 | DNA | BWT-FM | N | N | Y | NW | 7118 |
| [[https://github.com/hitbc/LAMSA][LAMSA]] (W) | [122] | 2017 | DNA | BWT-FM + hashing | Y | N | Y | Sparse DP | 1000 |
| [[https://github.com/lh3/minimap2][minimap2]] | [124] | 2018 | DNA/RNA-Seq | Hashing | Y | N | Y | NW | 11628 |
| [[https://gitlab.com/pirovc/dream_yara/][DREAM-Yara]] (W) | [125] | 2018 | DNA | BWT-FM | Y | N | N | Banded Myers Bit Vector | |
| [[https://github.com/mummer4/mummer][MUMmer4]] (W) | [126] | 2018 | DNA | Suffix array | Y | N | Y | SW | 7821 |
| [[https://github.com/philres/ngmlr][NGMLR]] | [127] | 2018 | DNA | Hashing | Y | N | Y | SW | 50000 |
| [[https://github.com/vpc-ccg/lordfast][lordFAST]] | [128] | 2018 | DNA | BWT-FM + hashing | N | N | Y | SW & NW | 35489 |
| [[https://github.com/lbcb-sci/graphmap2][GraphMap2]] | [130] | 2019 | DNA/RNA-Seq | Hashing | Y | Y | Y | Semi-Global | 9000 |
| [[https://github.com/ncbi/magicblast][Magic-BLAST]] | [131] | 2019 | DNA/RNA-Seq | Hashing | Y | N | N | Non-DP Heuristic | 90000 |
| [[https://github.com/bwa-mem2/bwa-mem2][BWA-MEM2]] | [132] | 2019 | DNA | BWT-FM | N | N | Y | SW | 650 |
| [[https://ccb.jhu.edu/software/hisat2/index.shtml][HISAT2]] | [133] | 2019 | DNA | BWT-FM | Y | N | N | Non-DP Heuristic | 100 |
| [[https://www.dropbox.com/s/3jcu4i240kyu2tc/source%20code%20conLSH_bio.tar.gz?dl=0][conLSH]] | [135] | 2020 | DNA | Hashing | Y | N | Y | Sparse DP | 8000 |

Steps
1. Index the genome
2. List the possible locations using this index
3. For each possible location, compute the similarity with the read
****** With citations
#+begin_src julia
# DataFramesMeta re-exports DataFrames (DataFrame, innerjoin)
using DataFramesMeta, CSV
# Aligner table and citation table, both tab-separated and keyed by :ID
info = CSV.read("biblio-aligner-init.tsv", DataFrame, delim="\t")
cite = CSV.read("citation-aligner-init.tsv", DataFrame, delim="\t")
# Inner join: only IDs present in both files are kept
d = innerjoin(info, cite, on = :ID)
CSV.write("biblio-aligner.tsv", d, delim="\t")
#+end_src

| Link | Aligner | ID | publication | Application | Indexing | GP fix | GP spaced seed | GP seed chaining | Pairwise alignment | Max. read length | Doi | Pubmed |
|-
| http://www.irisa.fr/symbiose/projects/gassst/ | GASSST | 35 | 2010 | DNA | Hashing | Y | Y | Y | Semi-Global | 500 | 10.1093/bioinformatics/btq485 | 20739310 |
| https://github.com/juliangehring/GMAP-GSNAP | GSNAP | 37 | 2010 | DNA | Hashing | Y | N | Y | Non-DP Heuristic | 100 | 10.1093/bioinformatics/btq057 | 20147302 |
| https://github.com/lh3/bwa | BWA | 54 | 2009 | DNA | BWT-FM | N | N | N | Semi-Global | 125 | 10.1093/bioinformatics/btp698 | 20080505 |
| https://github.com/lh3/bwa | BWA-SW | 54 | 2010 | DNA | BWT-FM | N | N | N | SW | 10000 | 10.1093/bioinformatics/btp698 | 20080505 |
| http://bowtie-bio.sourceforge.net/manual.shtml | Bowtie | 55 | 2009 | DNA | BWT-FM | Y | N | N | HD | 76 | 10.1186/gb-2009-10-3-r25 | 19261174 |
| https://sourceforge.net/projects/cloudburst-bio/ | CloudBurst | 56 | 2009 | DNA | Hashing | Y | N | N | Landau-Vishkin | 36 | 10.1093/bioinformatics/btp236 | 19357099 |
| https://github.com/byucsl/gnumap | GNUMAP | 57 | 2009 | DNA | Hashing | Y | N | Y | NW | 36 | 10.1093/bioinformatics/btp614 | 19861355 |
| http://1001genomes.org/software/genomemapper_singleref.html | GenomeMapper | 58 | 2009 | DNA | Hashing | Y | N | Y | NW | 200 | 10.1186/gb-2009-10-9-r98 | 19761611 |
| https://github.com/hugheaves/MOM | MOM | 59 | 2009 | DNA | Hashing | Y | N | N | HD | 40 | 10.1093/bioinformatics/btp092 | 19228804 |
| http://pass.cribi.unipd.it/cgi-bin/pass.pl | PASS | 60 | 2009 | DNA | Hashing | Y | N | Y | NW | 32 | 10.1093/bioinformatics/btp087 | 19218350 |
| https://code.google.com/archive/p/perm/downloads | PerM | 61 | 2009 | DNA | Hashing | Y | Y | N | HD | 47 | 10.1093/bioinformatics/btp486 | 19675096 |
| https://github.com/seqan/seqan/tree/master/apps/razers | RazerS | 62 | 2009 | DNA | Hashing | Y | Y | Y | Myers Bit Vector | 76 | 10.1101/gr.088823.108 | 19592482 |
| http://compbio.cs.toronto.edu/shrimp/ | SHRiMP | 63 | 2009 | DNA | Hashing | N | N | N | SW | 35 | 10.1371/journal.pcbi.1000386 | 19461883 |
| https://github.com/ShujiaHuang/SOAPaligner | SOAP2 | 64 | 2009 | DNA | BWT-FM | Y | N | N | SW | 44 | 10.1093/bioinformatics/btp336 | 19497933 |
| http://www.bcgsc.ca/platform/bioinfo/software/slider | Slider | 65 | 2009 | DNA | Hashing | Y | N | N | HD | 36 | 10.1093/bioinformatics/btn565 | 18974170 |
| https://www.bioinf.uni-leipzig.de/Software/segemehl/ | segemehl | 66 | 2009 | DNA | Suffix array | N | N | Y | SW | 35 | 10.1371/journal.pcbi.1000502 | 19750212 |
| https://github.com/rcallahan/smalt | SMALT | 69 | 2010 | DNA | Hashing | Y | N | Y | SW | 150 | | |
| http://www.bcgsc.ca/platform/bioinfo/software/SliderII | SliderII (W) | 70 | 2010 | DNA | Hashing | Y | N | N | HD | 42 | 10.1093/bioinformatics/btq092 | 20190250 |
| https://github.com/sfu-compbio/mrsfast | mrsFAST | 72 | 2010 | DNA | Hashing | Y | N | N | HD | 100 | 10.1038/nmeth0810-576 | 20676076 |
| http://last.cbrc.jp/ | LAST | 78 | 2011 | DNA/BS-Seq/RNA | Suffix array | N | Y | N | SW & NW | 105 | 10.1101/gr.113985.110 | 21209072 |
| https://dl.acm.org/citation.cfm?id=2147845&dl=ACM&coll=DL | DynMap | 79 | 2011 | DNA | Hashing | Y | N | N | NW | 52 | 10.1145/2147805.2147845 | |
| http://compbio.cs.toronto.edu/shrimp/ | SHRiMP2 | 80 | 2011 | DNA | Hashing | Y | Y | Y | SW | 75 | 10.1093/bioinformatics/btr046 | 21278192 |
| http://snap.cs.berkeley.edu/ | SNAP | 81 | 2011 | DNA | Hashing | Y | N | N | NW | 10000 | | |
| https://www.well.ox.ac.uk/project-stampy | Stampy | 82 | 2011 | DNA | Hashing | Y | N | N | NW | 4500 | 10.1101/gr.111120.110 | 20980556 |
| http://grimmond.imb.uq.edu.au/X-MATE/ | X-Mate | 83 | 2011 | DNA | Hashing | N | N | N | Non-DP Heuristic | 50 | 10.1093/bioinformatics/btq698 | 21216778 |
| https://github.com/mchaisso/blasr/ | BLASR | 85 | 2012 | DNA | Suffix array | Y | N | Y | NW | 8000 | 10.1186/1471-2105-13-238 | |
| https://code.google.com/archive/p/batmis/ | Batmis | 86 | 2012 | DNA | BWT-ST | Y | N | N | HD | 100 | 10.1093/bioinformatics/bts339 | 22689389 |
| http://bowtie-bio.sourceforge.net/bowtie2 | Bowtie2 | 87 | 2012 | DNA | BWT-FM | Y | N | Y | SW & NW | 400 | 10.1038/nmeth.1923 | 22388286 |
| https://github.com/smarco/gem3-mapper | GEM | 88 | 2012 | DNA | BWT-FM | N | N | Y | SW & NW | 150 | 10.1038/nmeth.2221 | 23103880 |
| https://github.com/seqan/seqan/tree/master/apps/razers3 | RazerS3 | 89 | 2012 | DNA | Hashing | Y | Y | Y | Banded Myers Bit Vector | 800 | 10.1093/bioinformatics/bts505 | 22923295 |
| https://web.stanford.edu/group/wonglab/seqalto/ | SeqAlto | 90 | 2012 | DNA | Hashing | Y | N | N | NW | 200 | 10.1093/bioinformatics/bts450 | 22811546 |
| https://github.com/seqan/seqan/blob/master/apps/splazers/README | SplazerS | 91 | 2012 | DNA | Hashing | Y | N | Y | Banded Myers Bit Vector | 150 | 10.1093/bioinformatics/bts019 | 22238266 |
| http://pages.cs.wisc.edu/~jignesh/wham/ | WHAM | 92 | 2012 | DNA | Hashing | Y | N | N | NW | 74 | 10.1145/1989323.1989370 | |
| https://github.com/GregoryFaust/yaha | YAHA | 93 | 2012 | DNA | Hashing | Y | N | Y | SW | 10000 | 10.1093/bioinformatics/bts456 | 22829624 |
| http://subread.sourceforge.net/ | Subread | 97 | 2013 | DNA/RNA-Seq | Hashing | Y | Y | Y | SW | 202 | 10.1093/nar/gkt214 | 23558742 |
| https://github.com/lh3/bwa | BWA-MEM | 98 | 2013 | DNA | BWT-FM | N | N | Y | SW & NW | 650 | | |
| http://www.seqan.de/projects/masai | Masai | 99 | 2013 | DNA | Suffix tree | N | N | Y | Banded Myers Bit Vector | 150 | 10.1093/nar/gkt005 | 23358824 |
| http://cibiv.github.io/NextGenMap/ | NextGenMap | 100 | 2013 | DNA | Hashing | Y | N | N | SW & NW | 250 | 10.1093/bioinformatics/btt468 | 23975764 |
| http://www.umsl.edu/~wongch/software.html | SRmapper | 101 | 2013 | DNA | Hashing | Y | N | N | HD | 100 | 10.1093/bioinformatics/bts712 | 23267171 |
| https://github.com/BilkentCompGen/mrfast | mrFAST | 102 | 2013 | DNA | Hashing | Y | N | N | Semi-Global | 180 | 10.1038/ng.437 | 19718026 |
| http://bwa-pssm.binf.ku.dk/ | BWA-PSSM (W) | 107 | 2014 | DNA | BWT-FM | Y | N | N | SW | 100 | 10.1186/1471-2105-15-100 | |
| http://cushaw3.sourceforge.net/homepage.htm#latest | CUSHAW3 | 108 | 2014 | DNA | BWT-FM | Y | N | Y | SW & Semi-Global | 100 | 10.1371/journal.pone.0086869 | 24466273 |
| https://hobbes.ics.uci.edu/download.shtml | Hobbes2 | 109 | 2014 | DNA | Hashing | Y | N | Y | Banded Myers Bit Vector | | 10.1186/1471-2105-15-42 | |
| https://github.com/wanpinglee/MOSAIK | MOSAIK | 110 | 2014 | DNA | Hashing | Y | N | N | SW | 100 | 10.1371/journal.pone.0090581 | 24599324 |
| https://github.com/opencb/hpg-aligner | hpg-Aligner | 111 | 2014 | DNA | Suffix array | N | N | Y | SW | 5000 | 10.1093/bioinformatics/btu553 | 25143289 |
| https://github.com/sfu-compbio/mrsfast | mrsFAST-Ultra | 112 | 2014 | DNA | Hashing | Y | N | N | HD | 100 | 10.1093/nar/gku370 | 24810850 |
| http://erne.sourceforge.net/ | ERNE2 | 116 | 2016 | DNA/BS-Seq | BWT-FM + hashing | Y | N | N | HD | 100 | 10.1186/s12859-016-0910-3 | |
| https://github.com/isovic/graphmap | GraphMap | 117 | 2016 | DNA | Hashing | Y | Y | Y | Semi-Global | 9000 | 10.1038/ncomms11307 | 27079541 |
| https://github.com/ruhulsbu/NanoBLASTer | NanoBLASTer | 118 | 2016 | DNA | Hashing | Y | N | Y | NW | 7040 | 10.1109/iccabs.2016.7802776 | |
| https://github.com/lh3/minimap | minimap | 119 | 2016 | DNA | Hashing | Y | N | N | N/A | 13000 | 10.1093/bioinformatics/btw152 | 27153593 |
| https://github.com/dfguan/rHAT | rHAT | 120 | 2016 | DNA | Hashing | Y | N | Y | SW | 8000 | 10.1093/bioinformatics/btv662 | 26568628 |
| https://github.com/hsinnan75/KART | KART | 121 | 2017 | DNA | BWT-FM | N | N | Y | NW | 7118 | 10.1093/bioinformatics/btx189 | 28379292 |
| https://github.com/hitbc/LAMSA | LAMSA (W) | 122 | 2017 | DNA | BWT-FM + hashing | Y | N | Y | Sparse DP | 1000 | 10.1093/bioinformatics/btw594 | 27667793 |
| https://github.com/lh3/minimap2 | minimap2 | 124 | 2018 | DNA/RNA-Seq | Hashing | Y | N | Y | NW | 11628 | 10.1093/bioinformatics/bty191 | 29750242 |
| https://gitlab.com/pirovc/dream_yara/ | DREAM-Yara (W) | 125 | 2018 | DNA | BWT-FM | Y | N | N | Banded Myers Bit Vector | | 10.1093/bioinformatics/bty567 | 30423080 |
| https://github.com/mummer4/mummer | MUMmer4 (W) | 126 | 2018 | DNA | Suffix array | Y | N | Y | SW | 7821 | 10.1371/journal.pcbi.1005944 | 29373581 |
| https://github.com/philres/ngmlr | NGMLR | 127 | 2018 | DNA | Hashing | Y | N | Y | SW | 50000 | 10.1038/s41592-018-0001-7 | 29713083 |
| https://github.com/vpc-ccg/lordfast | lordFAST | 128 | 2018 | DNA | BWT-FM + hashing | N | N | Y | SW & NW | 35489 | 10.1093/bioinformatics/bty544 | 30561550 |
| https://github.com/bwa-mem2/bwa-mem2 | BWA-MEM2 | 132 | 2019 | DNA | BWT-FM | N | N | Y | SW | 650 | 10.1109/ipdps.2019.00041 | |
| https://ccb.jhu.edu/software/hisat2/index.shtml | HISAT2 | 133 | 2019 | DNA | BWT-FM | Y | N | N | Non-DP Heuristic | 100 | 10.1038/s41587-019-0201-4 | 31375807 |
| https://www.dropbox.com/s/3jcu4i240kyu2tc/source%20code%20conLSH_bio.tar.gz?dl=0 | conLSH | 135 | 2020 | DNA | Hashing | Y | N | Y | Sparse DP | 8000 | 10.1016/j.compbiolchem.2020.107206 | 32000034 |
***** Index
****** Hash-based
For short sequences (seeds), the index stores the list of positions (k-mers). For a read, a few seeds are extracted, which gives the list of positions.
****** Suffix tree
Allows partial matches by grouping common subsequences. Cf. figure 1.b.
BWT uses a similar approach but with reduced storage.
Performance degrades with sequencing errors or with dissimilarity to the reference.
****** Comparison
Hash = large index, but efficient lookup (O(1)!) and low indexing cost.
Suffix = partial matching is possible, but lookup is slow and indexing cost is high (cf. Table 2)
****** Performance
- Run sequential genome
- No statistically significant difference between suffix-based and hash-based indexes for CPU, but a significant one for memory (as expected)
- BWT-FM = 3.8× fewer resources
- BWA, Bowtie and Bowtie2 are close in runtime and memory cost (cf. figure)

Popularity, measured by the number of citations of the original article: BLAST > BWA, Bowtie, Bowtie2
****** Candidate locations
A few seeds are taken and a list of candidates is generated (the neighbourhood of each seed).
With short seeds the list is too large, hence heuristics to reduce the number of candidates. Increasing the seed length decreases sensitivity.
Most aligners use fixed-length seeds (the others use suffix-based indexes, because a hash index would have to be recomputed).
****** Verification
Either dynamic programming (local or global) or non-DP (Hamming distance… probably a comparison that ignores context).
For substitutions and indels: dynamic programming. Local alignment when only a partial match is wanted (structural variant) -> 38% of tools.
The comparison is still quadratic in time and space despite years of research…
****** Note: long reads
- Indexing: both methods exist
  - Long reads can be cut into short reads
  - Difficulty: more errors and more seeds. Solution: compute fewer seeds that are more representative of the read
- Comparison with short reads
  - More errors
  - Fewer reads (throughput)
  - Less ambiguity because the reads are longer
- Comparison: easier with short reads because they have fewer errors
- SNPs are easier to detect with short reads (fewer errors), structural variants with long reads
***** Other
RNA-seq: the problem is aligning reads to non-contiguous regions (because of splicing)
***** Performance
Better than [cite:@donato2021] 10 genomes
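***** Sketch: hash index, candidate locations, verification
The three steps listed in these notes (index the genome, list the candidate locations, verify each one) can be illustrated on a toy reference. This is only a minimal sketch under stated assumptions, not how any aligner of Table 1 is implemented: the names =build_index=, =candidates= and =map_read=, the seed length =k = 4=, the choice of non-overlapping seeds and the =max_mismatches= threshold are all hypothetical, and verification uses the Hamming distance, so only substitutions are tolerated.

#+begin_src julia
# Toy seed-and-verify mapper (illustrative names and parameters, ASCII DNA only).

# Step 1: hash index mapping every k-mer of the reference to its start positions.
function build_index(ref::String, k::Int)
    index = Dict{String,Vector{Int}}()
    for i in 1:(length(ref) - k + 1)
        push!(get!(index, ref[i:i+k-1], Int[]), i)
    end
    return index
end

# Step 2: candidate start positions of the read, derived from non-overlapping seeds.
function candidates(index, read::String, k::Int)
    starts = Set{Int}()
    for offset in 1:k:(length(read) - k + 1)
        for pos in get(index, read[offset:offset+k-1], Int[])
            push!(starts, pos - offset + 1)  # where the read would start on the reference
        end
    end
    return sort!(collect(starts))
end

# Step 3: verification with the Hamming distance (substitutions only, no indels).
hamming(a, b) = count(((x, y),) -> x != y, zip(a, b))

function map_read(ref::String, index, read::String, k::Int; max_mismatches = 2)
    hits = Tuple{Int,Int}[]  # (start position, number of mismatches)
    for s in candidates(index, read, k)
        # skip candidates that would run past either end of the reference
        (s < 1 || s + length(read) - 1 > length(ref)) && continue
        d = hamming(read, ref[s:s+length(read)-1])
        d <= max_mismatches && push!(hits, (s, d))
    end
    return hits
end

ref = "ACGTTGCAACGTAGCTAGGA"
index = build_index(ref, 4)
hits = map_read(ref, index, "ACGTAGGT", 4)
#+end_src

In this toy reference the seed =ACGT= occurs at positions 1 and 9, so the read gets two candidate locations and only the verification step separates them: a small-scale version of the repeated-region problem mentioned in the notes.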
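***** Sketch: lookup in a suffix-based index
The index comparison in these notes (hash lookup is fast, suffix-based lookup is slower but allows partial matching) can be made concrete with a suffix array, which is used here as a simpler stand-in for the suffix tree and BWT-FM indexes of the notes. The construction below is deliberately naive (it sorts all suffixes directly), and the names =build_suffix_array= and =find_occurrences= are hypothetical. The point is only that a pattern of any length can be searched, at the cost of a binary search instead of a single hash lookup.

#+begin_src julia
# Naive suffix array: start positions of all suffixes, in lexicographic order.
# Illustrative only; real tools use far more compact constructions.
build_suffix_array(ref::String) =
    sort!(collect(1:length(ref)); by = i -> SubString(ref, i))

# All start positions of `pattern` in `ref`, found by two binary searches.
function find_occurrences(ref::String, sa::Vector{Int}, pattern::String)
    m = length(pattern)
    # suffix starting at i, truncated to the pattern length
    prefix(i) = SubString(ref, i, min(i + m - 1, lastindex(ref)))
    # first suffix whose truncated prefix is >= pattern
    lo, hi = 1, length(sa) + 1
    while lo < hi
        mid = (lo + hi) ÷ 2
        prefix(sa[mid]) < pattern ? (lo = mid + 1) : (hi = mid)
    end
    start = lo
    # first suffix whose truncated prefix is > pattern
    hi = length(sa) + 1
    while lo < hi
        mid = (lo + hi) ÷ 2
        prefix(sa[mid]) <= pattern ? (lo = mid + 1) : (hi = mid)
    end
    return sort(sa[start:lo-1])
end

ref = "ACGTTGCAACGTAGCTAGGA"
sa = build_suffix_array(ref)
occ4 = find_occurrences(ref, sa, "ACGT")  # a 4-mer
occ3 = find_occurrences(ref, sa, "TAG")   # a shorter pattern, same index
#+end_src

Unlike a k-mer hash table, the same index answers queries of any length, which is one way to read the remark that a hash index would have to be recomputed when the seed length changes.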
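***** Sketch: Hamming distance versus dynamic programming
The verification notes distinguish non-DP comparison (Hamming distance) from dynamic programming, which is needed for indels and is quadratic in time and space. The sketch below shows the difference on one pair of sequences. It computes the unit-cost edit distance (Levenshtein), used here as a simplified stand-in for the NW scoring that real aligners apply with their own match, mismatch and gap parameters; the function names are hypothetical.

#+begin_src julia
# Position-by-position comparison: substitutions only.
hamming(a, b) = count(((x, y),) -> x != y, zip(a, b))

# Global edit distance with unit costs, by dynamic programming (ASCII input).
# The (m+1) x (n+1) table is what makes this step quadratic in time and space.
function edit_distance(a::String, b::String)
    m, n = length(a), length(b)
    D = zeros(Int, m + 1, n + 1)
    D[:, 1] .= 0:m   # delete the first i characters of a
    D[1, :] .= 0:n   # insert the first j characters of b
    for i in 1:m, j in 1:n
        cost = a[i] == b[j] ? 0 : 1
        D[i+1, j+1] = min(D[i, j] + cost,  # match or substitution
                          D[i, j+1] + 1,   # deletion
                          D[i+1, j] + 1)   # insertion
    end
    return D[m+1, n+1]
end

read_seq = "ACGTAGCT"
ref_win  = "ACGAGCTT"  # same sequence with one base deleted, then padded with T
h = hamming(read_seq, ref_win)
e = edit_distance(read_seq, ref_win)
#+end_src

Here a single deleted base shifts every following position, so the Hamming distance is 4 while the edit distance is 2 (one deletion plus one insertion), which is why indel-tolerant verification relies on dynamic programming.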