Carnegie Mellon University

2019 Program Overview

The inaugural Pre-College Program was held in July 2019. We are continually looking for new ideas to keep the program fresh, so the curriculum is always subject to change.

The 2019 Pre-College Program in Computational Biology began with computational and laboratory bootcamps on the first day, getting students up to speed in programming and basic “wet lab” techniques. On the second day, students undertook an exciting day-long adventure onto Pittsburgh’s Three Rivers with our partner Rivers of Steel, not only to sample water but also to learn about ecology (and of course take in the city’s beautiful bridges and architecture); see photo above.

Why are Pittsburgh’s Three Rivers an interesting biological environment? The Allegheny and Monongahela Rivers flow from somewhat rural landscapes into an urban environment with a history of industrial run-off, before merging to form the Ohio River, which continues westward to its eventual confluence with the Mississippi. Even a small sample of river water holds an invisible ecosystem of microorganisms (bacteria and viruses). Only recently have researchers developed methods that let us begin to understand, for each river, what these microbes are, what they do, and how they have evolved.

What is so interesting about bacteria? A landmark paper by Hug et al., published in 2016 in Nature Microbiology, provided the evolutionary tree below. In it, we see that of the three domains of life, the eukaryotes (i.e., everything you have ever seen that is alive, and some things that you haven’t) make up the smallest component of the tree, meaning that they have the least genetic diversity. By far the most genetic diversity, and the largest part of the tree, is found in bacteria. This makes sense! Bacteria have been around a lot longer than we have, and they replicate and mutate quickly, so they have been able to move into environments that we could never dream of living in (such as oil wells, deep-sea vents, and polluted rivers 🙂) and to produce a host of interesting compounds. For example, many of the antibiotics used to stop infections were borrowed from bacteria that had evolved to use these compounds to kill their competitors.


Evolutionary Tree of Life
The picture of an evolutionary tree for all living things is truly worth a thousand words. Source: Discovery Magazine


But how is an evolutionary tree like this produced?  We must sequence DNA from the same gene in many species.   What is the lab method we can use to sequence this DNA from a biological sample (like river water)?  And once we obtain the DNA, how do we train a computer to build this evolutionary tree?

These questions are just the beginning of the inquiries we can make into this area of computational biology. The 2019 week-by-week syllabus is detailed below.


2019 Week-by-Week Curriculum

Pre-College Computational Biology 2019


  • Coding bootcamp: How will programming help us solve biological problems that cannot be solved in the lab alone?
  • River sampling: How can we collect biological samples from the rivers while minimizing contamination and maximizing biological material yield? What other features of the rivers (e.g., ambient temperature/recent precipitation) are important to help us understand the microbiological communities?
  • DNA Extraction: In a sample of various biological specimens (river water), how can we extract all of the DNA present (and eliminate everything else)?
  • 16S sequencing: How can we experimentally use a conserved gene to help determine the relative abundances of different species of bacteria in our river water sample?
  • 16S sequencing analysis: Given the sequence of a strand of DNA, how can we determine the species from which it came?
  • Bacterial Isolation for whole genome sequencing: If we want to sequence the genome of a single bacterial cell, how can we isolate one cell from a river water sample containing millions of cells?
  • Predicting Replication Origins: Using sequencing data, how can we predict where a bacterial genome's origin of replication lies?
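
One common way to approach the replication-origin question above is a GC-skew analysis: walking along the genome, keep a running count of G minus C, and look for where that count bottoms out. In many bacterial genomes the minimum tends to fall near the origin of replication. The Python sketch below is only an illustration under that assumption, not necessarily the method taught in the program; the function name `skew_minima` and the toy sequences are hypothetical.

```python
def skew_minima(genome):
    """Return every position where the cumulative (G minus C) skew is lowest.

    Position i refers to the skew after reading the first i bases,
    so position 0 is the skew of the empty prefix (always 0).
    """
    skew = 0
    skews = [0]
    for base in genome.upper():
        if base == "G":
            skew += 1
        elif base == "C":
            skew -= 1
        # A, T, and any ambiguous bases leave the skew unchanged.
        skews.append(skew)
    lowest = min(skews)
    return [i for i, s in enumerate(skews) if s == lowest]


if __name__ == "__main__":
    # Toy sequence, far shorter than any real genome.
    toy = "TAAAGACTGCCGAGAGGCCAACACGAGTGCTAGAACGAGGGGCGTAAACGCGGGTCCGAT"
    print(skew_minima(toy))
```

On a real genome the skew curve is noisy, so the minimum gives a candidate region to inspect further rather than an exact answer.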

Pre-College Presentations 2019


  • Whole Genome Sequencing: How can we read a relatively short fragment of DNA excised from a bacterial genome? Why can sequencing machines only read short fragments of DNA and not entire genomes?
  • Whole Genome Reconstruction: After producing many DNA fragments that we can read, can we reconstruct the full genome from thousands of relatively short sequencing reads?
  • Mass Spectrometry: How can we determine what else is in the water samples that may be affecting microbial diversity?  Likewise, how can we determine what effect the microbial diversity has on chemicals in the water?
  • Bacteria Identification: How can we use computational techniques to understand and characterize images of bacterial colonies?
  • Bacterial Modification: How can we insert a new gene into bacteria to give them new functionality?
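
To make the "Whole Genome Reconstruction" question above concrete, here is a toy Python sketch of one simple idea: repeatedly merge the two reads with the longest suffix-to-prefix overlap. It assumes error-free reads from a single strand, which real data never satisfies; production assemblers rely on overlap or de Bruijn graphs and must handle sequencing errors, repeats, and reverse complements. The names `overlap` and `greedy_assemble` and the example reads are hypothetical.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads):
    """Greedily merge the best-overlapping pair of reads until none overlap.

    Returns a list of contigs; a single element means every read was joined.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps; leave the rest as separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


if __name__ == "__main__":
    print(greedy_assemble(["ATGCG", "GCGTA", "GTACC"]))
```

The greedy choice is easy to follow but can be misled by repeated regions, which is one reason reconstruction from short reads is a hard problem.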


  • Building Phylogenies: How can we determine evolutionary relationships between organisms?  Specifically, given genes from a host of different species, how can we construct an evolutionary tree for these species to determine how they have evolved?
  • Presenting Scientific Results: What are good strategies for conveying the results of scientific experiments?  What are the fundamentals of giving a good scientific talk?
  • Fluorescence Microscopy: How can we use fluorescence to image eukaryotes?
  • Fluorescence Microscopy Image Analysis: How can we analyze fluorescence images to help classify eukaryotes and prokaryotes?
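
For the "Building Phylogenies" question above, one of the simplest tree-building ideas is distance-based clustering: count the differences between the same gene in each pair of species, then repeatedly join the two closest groups. The Python sketch below uses UPGMA-style average linkage with Hamming distance and assumes the sequences are already aligned to equal length. It is an illustration only, not necessarily the approach used in the program, and the names `hamming`, `upgma`, and the species labels are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def upgma(seqs):
    """Build a tree as nested tuples by average-linkage clustering.

    seqs maps a species name to its aligned gene sequence.
    """
    # Each key is a subtree; each value lists the species inside that subtree.
    clusters = {name: [name] for name in seqs}
    while len(clusters) > 1:

        def avg_dist(pair):
            left, right = pair
            total = sum(hamming(seqs[a], seqs[b])
                        for a in clusters[left] for b in clusters[right])
            return total / (len(clusters[left]) * len(clusters[right]))

        # Join the two clusters whose members are closest on average.
        left, right = min(combinations(clusters, 2), key=avg_dist)
        members = clusters.pop(left) + clusters.pop(right)
        clusters[(left, right)] = members
    return next(iter(clusters))


if __name__ == "__main__":
    toy = {"A": "AAAA", "B": "AAAT", "C": "TTTT", "D": "TTTA"}
    print(upgma(toy))
```

Average linkage implicitly assumes that all lineages change at a similar rate, so its output is best treated as a first approximation of the evolutionary relationships.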


Students presented their scientific work to their parents/guardians and families.