Carnegie Mellon University

String Algorithms

Course Number: 02-614

Course Relevance: Undergraduate and graduate students who have interest in algorithm techniques for large-scale and sequence processing. Graduate and undergraduate students in computational biology.

Key Topics: 

  • string search
  • inexact matching
  • string compression
  • string data structures such as suffix trees, suffix arrays, and searchable compressed indices
  • the Burrows-Wheeler transform
  • locality sensitive hashing
  • de Brujin Graphs

Background Knowledge: 

  • Equivalent of 15-210 (“Parallel & Sequential Data Structures and Algorithms”) or 150351/15-650/02-613 (“Algorithms & Advanced Data Structures”)
  • Equivalent of 15-151 or 21-127
  • Programming proficiency

Units: 12
Prerequisite(s): (15351 or 15650 or 02613 or 15210) and (15151 or 21127)

Textbook(s):

Learning Resources: 

Autolab, Piazza, Gradescope

Learning Objectives

  • Learn various algorithmic techniques and data structures for efficient processing of string data, including suffix trees,
  • suffix arrays, Borrows-Wheeler transforms.• Understand the why these algorithms and data structures work.
  • Learn to apply and extend these algorithms and data
    structures.
  • Learn about the practical application of these techniques,
    especially in genomics
  • At the end of this class, you should be familiar with much of the state-of-the-art in algorithms for strings, have familiarity with their use in practice, and have experience applying them to new problems.

Assessment Structure: 

  • Homework assignments (35%)
  • 2 Midterm exams (15% each)
  • Final exam (25%)
  • Class participation and quizzes (10%)

Equivalent of 15-210 (“Parallel & Sequential Data Structures and Algorithms”) or 15-351/15-650/02-613 (“Algorithms & Advanced Data Structures”).
Equivalent of 15-151 or 21-127.

Notes: No knowledge of biology assumed