String Algorithms
Course Number: 02-614
Course Relevance: Undergraduate and graduate students interested in algorithmic techniques for large-scale sequence processing, and undergraduate and graduate students in computational biology.
Key Topics:
- string search
- inexact matching
- string compression
- string data structures such as suffix trees, suffix arrays, and searchable compressed indices
- the Burrows-Wheeler transform
- locality sensitive hashing
- de Bruijn graphs
Background Knowledge:
- Equivalent of 15-210 (“Parallel & Sequential Data Structures and Algorithms”) or 15-351/15-650/02-613 (“Algorithms & Advanced Data Structures”)
- Equivalent of 15-151 or 21-127
- Programming proficiency
Units: 12
Prerequisite(s): (15351 or 15650 or 02613 or 15210) and (15151 or 21127)
Textbook(s):
Learning Resources:
Autolab, Piazza, Gradescope
Learning Objectives:
- Learn various algorithmic techniques and data structures for efficient processing of string data, including suffix trees, suffix arrays, and Burrows-Wheeler transforms.
- Understand why these algorithms and data structures work.
- Learn to apply and extend these algorithms and data structures.
- Learn about the practical application of these techniques, especially in genomics.
- At the end of this class, you should be familiar with much of the state of the art in algorithms for strings, be familiar with their use in practice, and have experience applying them to new problems.
Assessment Structure:
- Homework assignments (35%)
- 2 Midterm exams (15% each)
- Final exam (25%)
- Class participation and quizzes (10%)
Notes: No knowledge of biology is assumed.