This page reproduces the content of http://www.slideshare.net/agbiotec/overview-of-genome-assembly-algorithms.

Uploaded a little under 5 years ago (2011/11/01), in Technology.

Overview of genome assembly algorithms, with some graph theory background, given as an invited lecture for a George Washington University course.

- Genome Assembly Algorithms and Software (or... what to do with all that sequence data?)
  - Konstantinos Krampis, Asst. Professor, Informatics, J. Craig Venter Institute
  - George Washington University, Nov. 2nd 2011
- Outline
  - Introduction
    - Why do we need genome assembly
    - Definitions of genome assembly
  - OLC
    - Overlap
    - Layout
    - Consensus
    - OLC assembly software and publications
  - Graph theory and assembly
    - Definition of a graph
    - Graphs and genome assembly
  - deBruijn - Euler
    - An alternative assembly graph
    - Constructing a de Bruijn graph from reads
    - Genome assembly from de Bruijn graphs
    - deBruijn assembly software and publications
- We cannot read the complete genome with the sequencer from one end to the other!
  - DNA isolated from a cell is amplified
  - It is broken into fragments (shearing)
  - Fragments are "read" with the sequencer
  - Use the fragments (the sequencing reads) to reconstruct the genome
  - Credit: Masahiro Kasahara, Large-Scale Genome Sequence Processing, Imperial College Press
- Assembly: a hierarchical process to reconstruct the genome from reads
  - Assemble the puzzle of the genome from the reads: overlaps connect the pieces
  - Oversample the genome so that reads overlap
  - Key approach: a data structure representing overlaps, and algorithms operating on that data structure
  - Credit: Masahiro Kasahara, Large-Scale Genome Sequence Processing, Imperial College Press
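As a rough illustration of what "oversampling" means numerically, the sketch below computes the average coverage c = N * L / G for N reads of length L from a genome of size G. The function names are hypothetical, and the covered-fraction estimate assumes reads start at uniformly random positions (a Poisson approximation), which real sequencing data only approximates.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of reads covering each base: c = N * L / G."""
    return num_reads * read_length / genome_size


def expected_fraction_covered(c: float) -> float:
    """Expected fraction of bases covered by at least one read.

    Assumption: read start positions are uniformly random, so the number
    of reads covering a base is approximately Poisson with mean c.
    """
    return 1.0 - math.exp(-c)


if __name__ == "__main__":
    c = coverage(num_reads=50_000, read_length=100, genome_size=1_000_000)
    print(c, expected_fraction_covered(c))
```

Higher coverage leaves fewer uncovered bases and makes it more likely that neighbouring reads share an overlap long enough to be detected.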
- Two major algorithmic paradigms for genome assembly
  - Overlap - Layout - Consensus (OLC): well established, more powerful method, but more difficult to implement
  - OLC: first to be used successfully for complex eukaryotic genomes (Drosophila, H. sapiens)
  - deBruijn - Euler: newer, easier to implement, problematic in complex genomes (for current implementations)
- OLC: Overlap, Layout, Consensus
  - Find overlaps by aligning the sequences of the reads
  - Lay out the reads based on which aligns to which
  - Get the consensus by joining all read sequences, merging overlaps
  - The sequencer reads in a random direction, left-to-right or right-to-left
  - Credit: Masahiro Kasahara, Large-Scale Genome Sequence Processing, Imperial College Press
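Because a read can come from either direction, an assembler usually has to compare each read in both orientations. The sketch below assumes that "right-to-left" means the read was taken from the opposite DNA strand, so the other orientation is the reverse complement; the helper names are hypothetical.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the sequence as read from the opposite strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """Pick one fixed representative of a read and its reverse complement."""
    return min(seq, reverse_complement(seq))


if __name__ == "__main__":
    print(reverse_complement("ATGC"))  # GCAT
    print(canonical("CGTT"))           # AACG
```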
- OLC: Overlap
  - Sequence alignment, all-against-all reads (Smith-Waterman, BLAST, other?)
  - Computationally intensive but easily parallelizable
  - Represent each read overlap by connecting the two reads with a directed link
  - First step in creating the genome assembly graph (more later)
  - Credit: Kececioglu and Myers 1995, Algorithmica 13:7-51
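The overlap step can be sketched as follows. This is a simplification: instead of Smith-Waterman or BLAST it uses exact suffix-prefix matching, which ignores sequencing errors, and the function names and `min_length` parameter are assumptions made for illustration.

```python
from itertools import permutations


def overlap_length(a: str, b: str, min_length: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the longest exact match is shorter than `min_length`.
    """
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def build_overlap_graph(reads, min_length: int = 3):
    """All-against-all comparison; one directed, weighted link per overlap."""
    graph = {}
    for a, b in permutations(reads, 2):
        length = overlap_length(a, b, min_length)
        if length:
            graph[(a, b)] = length
    return graph


if __name__ == "__main__":
    reads = ["ATGCGT", "GCGTAC", "TACGGA"]
    for (a, b), weight in build_overlap_graph(reads).items():
        print(f"{a} -> {b} (overlap {weight})")
```

Each pair of reads is compared independently of every other pair, which is why this step parallelizes easily.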
- OLC: Layout
  - Create a consistent linear (ideally) ordering of the reads
  - Remove redundancy, so no two dovetails leave the same edge
  - No containment edge is followed by a dovetail edge
  - Remove cycles, one link in, one out
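One simple way to obtain a linear ordering with "one link in, one out" and no cycles is a greedy pass over the overlaps, heaviest first. This is a minimal sketch of that heuristic, not the layout procedure of any particular assembler; containment edges are not handled, and `greedy_layout` and its input format are assumptions.

```python
def greedy_layout(reads, overlaps):
    """Order reads into chains using the heaviest compatible overlaps.

    `overlaps` maps (read_a, read_b) to an overlap weight. An overlap is
    accepted only if read_a has no successor yet, read_b has no
    predecessor yet, and the link would not close a cycle.
    """
    successor, predecessor = {}, {}
    parent = {r: r for r in reads}  # union-find over chains

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for (a, b), _ in sorted(overlaps.items(), key=lambda kv: -kv[1]):
        if a in successor or b in predecessor:
            continue  # would give a read two links out or two links in
        if find(a) == find(b):
            continue  # both reads already in one chain: would form a cycle
        successor[a] = b
        predecessor[b] = a
        parent[find(a)] = find(b)

    layouts = []
    for r in reads:
        if r not in predecessor:  # r starts a chain
            chain = [r]
            while chain[-1] in successor:
                chain.append(successor[chain[-1]])
            layouts.append(chain)
    return layouts


if __name__ == "__main__":
    overlaps = {("A", "B"): 4, ("B", "C"): 3, ("A", "C"): 2, ("C", "A"): 1}
    print(greedy_layout(["A", "B", "C"], overlaps))  # [['A', 'B', 'C']]
```

Each resulting chain corresponds to one contig; reads with no accepted overlap end up as chains of length one.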
- OLC: Consensus
  - Multiple Sequence Alignment (ClustalW) algorithms? No phylogeny here...
  - Vote for the most abundant nucleotide for each position
  - Incorporate read quality data
  - Create a pre-consensus from high-quality reads, and align the remaining reads to it
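The voting step can be sketched as a per-column majority count. The sketch assumes the layout has already assigned each read an offset in the contig and that reads align without gaps; quality values are ignored here, although they could be used as vote weights. The `consensus` function and its input format are hypothetical.

```python
from collections import Counter


def consensus(placed_reads):
    """Majority vote per column.

    `placed_reads` is a list of (offset, sequence) pairs, where offset is
    the position of the read's first base in the contig. Assumes the
    reads together cover every position without gaps.
    """
    columns = {}
    for offset, seq in placed_reads:
        for i, base in enumerate(seq):
            columns.setdefault(offset + i, Counter())[base] += 1
    return "".join(columns[pos].most_common(1)[0][0] for pos in sorted(columns))


if __name__ == "__main__":
    placed = [
        (0, "ATGCGT"),
        (2, "GCGTAC"),
        (2, "GCTTAC"),  # one sequencing error at contig position 4
        (5, "TACGGA"),
    ]
    print(consensus(placed))  # ATGCGTACGGA
```

The erroneous base is outvoted by the two reads that agree at that position.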
- OLC assembly software: Celera Assembler
  - Developed at Celera Genomics for the first Drosophila and human genome assemblies
  - Continued development at the J. Craig Venter Institute as an open source project
  - http://wgs-assembler.SourceForge.net (Licence: GPL)
  - Plenty of wiki (developer + user) documentation, examples, user forums
  - Other OLC implementations: Arachne, PCAP, Newbler, Phrap, TIGR Assembler
- Celera Assembler publications
  - Myers et al (2000) A whole-genome assembly of Drosophila
  - Levy et al (2007) The diploid genome sequence of an individual human
  - Zimin et al (2009) The domestic cow, Bos taurus
  - Dalloul et al (2010) The domestic turkey, Meleagris gallopavo
  - Lorenzi et al (2010) New assembly of Entamoeba histolytica
  - Lawniczak et al (2010) Divergence in Anopheles gambiae
  - Jones et al (2011) The marine filamentous cyanobacterium Lyngbya majuscula
  - Miller et al, The Tasmanian devil, Sarcophilus harrisii
  - Prüfer et al, The great ape bonobo, Pan paniscus
  - Gordon et al, The cotton bollworm moth, Helicoverpa
- And now a bit of graph theory...
- Graph G with
  - set of vertices (nodes) V: {P,T,Q,S,R}
  - set of edges (links between nodes) E: {(P,T),(P,Q),(P,S),(Q,T),(S,T),(Q,S),(S,Q),(Q,R),(R,S)}
  - walk from P to R: (P,Q),(Q,R)
  - walk from R to T: (R,S),(S,Q),(Q,T) or (R,S),(S,T)
  - walk from R to P: not possible
  - Credit: Introduction to Graph Theory, Robert J. Wilson
- Trails and paths
  - Trail: a walk of the graph where each edge is visited only once
  - Example trail: (P,Q), (Q,R), (R,S), (S,Q), (Q,S), (S,T)
  - Path: a walk where each vertex is visited once
  - Example path: (P,Q), (Q,R), (R,S), (S,T)
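The definitions of walk, trail and path can be checked mechanically on the example graph G from the slides. The sketch below stores the directed edges as a set of pairs; the helper names (`is_walk`, `is_trail`, `is_path`, `reachable`) are chosen here for illustration.

```python
# Directed edges of the example graph G.
EDGES = {
    ("P", "T"), ("P", "Q"), ("P", "S"), ("Q", "T"), ("S", "T"),
    ("Q", "S"), ("S", "Q"), ("Q", "R"), ("R", "S"),
}


def is_walk(edges, steps):
    """Every step is an edge, and each step starts where the previous ended."""
    return all(s in edges for s in steps) and all(
        steps[i][1] == steps[i + 1][0] for i in range(len(steps) - 1)
    )


def is_trail(edges, steps):
    """A walk that uses each edge at most once."""
    return is_walk(edges, steps) and len(set(steps)) == len(steps)


def is_path(edges, steps):
    """A non-empty walk that visits each vertex at most once."""
    if not steps or not is_walk(edges, steps):
        return False
    vertices = [steps[0][0]] + [target for _, target in steps]
    return len(set(vertices)) == len(vertices)


def reachable(edges, start, goal):
    """Depth-first search: is there any walk from start to goal?"""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for a, b in edges:
            if a == node and b not in seen:
                seen.add(b)
                stack.append(b)
    return False


if __name__ == "__main__":
    trail = [("P", "Q"), ("Q", "R"), ("R", "S"), ("S", "Q"), ("Q", "S"), ("S", "T")]
    path = [("P", "Q"), ("Q", "R"), ("R", "S"), ("S", "T")]
    print(is_trail(EDGES, trail), is_path(EDGES, trail))  # True False
    print(is_path(EDGES, path))                           # True
    print(reachable(EDGES, "R", "P"))                     # False
```

The example trail is not a path because it passes through Q and S twice, and no walk reaches P because no edge points into P.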
- Credit: Saad Mneimneh, CUNY
- Graphs and genome assembly
  - Represent sequence overlaps as a graph with weighted edges
  - Shortest common superstring (SCS) solution: find a path that visits every vertex exactly once and maximizes the sum of the edge weights
  - This is the Hamiltonian Cycle or Traveling Salesman Problem
- Which edge to start from?
  - NO: misses a vertex
  - NO: misses the edge with a large weight
- YES!: all vertices and the edge with a large weight
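For a handful of reads, the best ordering can be found by brute force: try every ordering of the vertices and keep the one whose consecutive overlaps sum to the largest weight. The sketch below uses a made-up three-vertex graph, since the figure from the slides is not available, and `best_hamiltonian_path` is a hypothetical name.

```python
from itertools import permutations


def best_hamiltonian_path(vertices, weights):
    """Try every vertex ordering; keep the one with the largest weight sum.

    `weights` maps a directed edge (a, b) to its overlap weight.
    Orderings that need a missing edge are skipped.
    """
    best_order, best_weight = None, -1
    for order in permutations(vertices):
        pairs = list(zip(order, order[1:]))
        if all(p in weights for p in pairs):
            total = sum(weights[p] for p in pairs)
            if total > best_weight:
                best_order, best_weight = order, total
    return best_order, best_weight


if __name__ == "__main__":
    weights = {("A", "B"): 4, ("B", "C"): 3, ("A", "C"): 5, ("C", "B"): 1}
    print(best_hamiltonian_path(["A", "B", "C"], weights))
```

In this toy graph, starting with the single heaviest edge (A, C) leads to the ordering A, C, B with total weight 6, while A, B, C reaches 7. The number of orderings grows factorially with the number of reads, so this approach only works for tiny examples.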
- A more realistic version of a read / string overlap graph (C. jejuni). Credit: Eugene W. Myers, Bioinformatics 21:79-85
- Computational complexity
  - Solving SCS by searching for a Hamiltonian Cycle on a graph is a difficult algorithmic problem (NP-hard)
  - Approximation or greedy algorithms can yield 2- to 4-approximation solutions (at most two to four times the length of the optimal, shortest string)
  - Transforming the Overlap Graph into a String Graph leads to a polynomial-time solution (polynomial time, P: O(n), O(n^2), O(n^3), etc.), but there is no assembler implementation yet
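A greedy approximation to the shortest common superstring can be sketched as: repeatedly merge the two strings with the longest suffix-prefix overlap until one string remains. This is a minimal illustration using exact matching on error-free reads; the function names are hypothetical and the result is a valid superstring but not necessarily the shortest one.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for length in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_superstring(reads):
    """Merge the best-overlapping pair until a single string is left."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best = (-1, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = reads[i] + reads[j][k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)] + [merged]
    return "".join(reads)


if __name__ == "__main__":
    print(greedy_superstring(["ATGCGT", "GCGTAC", "TACGGA"]))  # ATGCGTACGGA
```

Each round compares all pairs of remaining strings, so the running time is polynomial in the number of reads, in contrast to the exhaustive Hamiltonian search.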
- An alternative assembly graph: Pevzner, Tang and Waterman, An Eulerian path approach to DNA fragment assembly, PNAS 98 (2001): 9748-9753
- deBruijn graph: a directed graph representing overlaps between sequences of symbols. Credit: Wikipedia
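A common way to build such a graph from reads is to cut every read into k-mers, use the (k-1)-mer prefixes and suffixes as nodes, and add one directed edge per distinct k-mer. The sketch below follows that convention; the choice of k, the toy reads, and the function name are assumptions, and sequencing errors and reverse complements are ignored.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k: int):
    """Map each (k-1)-mer to the list of (k-1)-mers it points to.

    Every distinct k-mer found in the reads contributes one edge from its
    first k-1 characters to its last k-1 characters.
    """
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer not in seen:  # count each distinct k-mer once
                seen.add(kmer)
                graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    for node, targets in de_bruijn_graph(["ATGGC", "GGCGT"], k=3).items():
        print(node, "->", targets)
```

Unlike the overlap graph, no read-against-read alignment is needed: overlaps between reads show up implicitly as shared k-mers.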
- In a real genome scenario... Credit: Flicek and Birney 2009, Nature Methods 6, S6-S12
- Euler's algorithm
  - Using Euler's algorithm we can find a path that visits each edge of the de Bruijn genome assembly graph once, in order to concatenate the edge labels and "spell out" the assembly
  - Polynomial time!
  - Credit: Wikipedia
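One standard way to find such a path is Hierholzer's algorithm, sketched below; it is used here as an illustration and is not necessarily the exact procedure the slide refers to. The sketch assumes the graph is given as a dict from node to list of successor nodes, that an Eulerian path exists, and that node labels are (k-1)-mers so the sequence can be spelled by appending the last character of each node.

```python
from collections import defaultdict


def eulerian_path(graph):
    """Hierholzer's algorithm: visit every edge exactly once.

    `graph` maps a node to a list of successor nodes (one entry per edge).
    Assumes an Eulerian path exists; no validity check is performed.
    """
    out_edges = {node: list(targets) for node, targets in graph.items()}
    balance = defaultdict(int)  # out-degree minus in-degree
    for node, targets in graph.items():
        balance[node] += len(targets)
        for target in targets:
            balance[target] -= 1
    # Start at the node with one extra outgoing edge, if there is one.
    start = next((n for n, b in balance.items() if b == 1), next(iter(graph)))

    stack, path = [start], []
    while stack:
        node = stack[-1]
        if out_edges.get(node):
            stack.append(out_edges[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the path
    return path[::-1]


def spell(path):
    """Concatenate the path of (k-1)-mers into one sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    graph = {"AT": ["TG"], "TG": ["GG"], "GG": ["GC"], "GC": ["CG"], "CG": ["GT"]}
    print(spell(eulerian_path(graph)))  # ATGGCGT
```

Every edge is pushed and popped once, so the running time is linear in the number of edges.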
- deBruijn assembly software and publications
  - Euler assembler (the very first), Pevzner et al 2001, PNAS 98:9748-9753
  - Velvet assembler (more user friendly)
  - Both of these assemblers store the complete graph in computer memory: 512GB-1024GB for human genomes
  - At JCVI we have two 1024GB (1TB) RAM servers for assembly
  - Others: ABySS, YAGA, Contrail-Bio, PASHA: parallel (distributed memory) assemblers on computer clusters
- Thank you!
  - Contact: kkrampis@jcvi.org
  - We hire interns at the J. Craig Venter Institute: http://www.jcvi.org/cms/education/internship-program/
  - Some of my other projects (Cloud Computing): http://tinyurl.com/cloudbiolinux-jcvi http://www.cloudbiolinux.org