Carnegie Mellon University
May 08, 2019

Computational Biology undergrad research featured at Meeting of the Minds 2019

Meeting of the Minds, an annual research symposium held on May 8, celebrates undergraduate research at Carnegie Mellon University. The Computational Biology Department is proud to have eight undergraduate students who have conducted research with our faculty and are presenting their work at Meeting of the Minds. They have summarized their work below.

Siddharth Annaldasula, senior
Advisor: Andreas Pfenning

Regulatory Factors Enriched in the Convergent Evolution of Vocal Learning in Mammals

My research is in the Neurogenomics Laboratory with Dr. Andreas Pfenning. I am using computational techniques to study the genetic basis of vocal learning behavior, which includes drawing parallels between birdsong and human speech production. Results from this work could help in understanding the overall evolution of complex behavior. Currently, I am investigating the convergent evolution of vocal learning in mammals, identifying regulatory and non-coding factors that are enriched for this behavior. I am in the process of developing a pipeline to identify these factors using data and tools from our lab and collaborators. The data we currently have are not suitable for studying vocal learning as a trait, so we are instead using domestication to validate our pipeline. An imminent publication that includes sequencing of over 200 species of mammals will be extremely instrumental to our research efforts.

Emma Jin, sophomore
Advisor: Andreas Pfenning

Investigating Language Evolution in Humans

Language acquisition is one of the most complex traits, requiring sound production, vocal learning, grammar, and more. In addition, language is unique to humans and appeared in a short span of time. These characteristics make it a good candidate for a trait that evolved through changes in regulatory regions, a hypothesis supported by similarities in regulatory structures among different vocal learners. To identify potential mutations, non-coding regions around language-related genes were compared in Neanderthal, Denisovan, human, and chimp DNA, using 1000 Genomes data to exclude sites that vary among humans, since language is a trait common to all humans. Sites where the human sequence differed from the other three were intersected with ATAC-seq open chromatin peaks and H3K27ac regions from brain, liver, GABAergic neurons, and glutamatergic neurons. A support vector machine (SVM) model predicted the effect of the identified mutations on enhancer expression in GABAergic and glutamatergic neurons. In total, sixteen human-specific mutations were identified, seven of which are actively transcribed, exhibiting varied shifts in enhancer activity. Future work will expand the scope of the search and further examine differences in expression caused by these mutations.
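
The filtering steps described above can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: the `Site` record, the function names, and the toy coordinates are all hypothetical, and "differs from the other three" is assumed here to mean that the human allele matches none of the Neanderthal, Denisovan, or chimp alleles.

```python
from typing import NamedTuple

class Site(NamedTuple):
    """One aligned position (hypothetical record layout)."""
    chrom: str
    pos: int
    human: str
    neanderthal: str
    denisovan: str
    chimp: str

def human_specific(sites, polymorphic):
    """Keep sites where the human allele matches none of the other three
    genomes and the position does not vary among humans."""
    kept = []
    for s in sites:
        if s.human in {s.neanderthal, s.denisovan, s.chimp}:
            continue  # human agrees with at least one other genome
        if (s.chrom, s.pos) in polymorphic:
            continue  # variable among humans, so excluded
        kept.append(s)
    return kept

def in_any_peak(site, peaks):
    """True if the site falls inside a half-open interval (chrom, start, end)."""
    return any(c == site.chrom and start <= site.pos < end
               for c, start, end in peaks)

def candidate_sites(sites, polymorphic, peaks):
    """Human-specific sites that also overlap a regulatory peak."""
    return [s for s in human_specific(sites, polymorphic)
            if in_any_peak(s, peaks)]

# Toy data, invented purely for illustration.
sites = [
    Site("chr7", 100, "A", "G", "G", "G"),  # human-specific, inside a peak
    Site("chr7", 150, "C", "C", "T", "T"),  # human matches Neanderthal
    Site("chr7", 200, "T", "C", "C", "C"),  # varies among humans
    Site("chr7", 900, "G", "A", "A", "A"),  # human-specific, outside peaks
]
polymorphic = {("chr7", 200)}
peaks = [("chr7", 50, 120)]

candidates = candidate_sites(sites, polymorphic, peaks)
print([s.pos for s in candidates])
```

With real data, the linear scan in `in_any_peak` would normally be replaced by a dedicated interval-intersection tool. The sketch only shows the order of the filters.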

Benjamin Soudry, junior
Advisor: Hosein Mohimani

TensorDSH: Machine Learning Techniques for Identifying new Antibiotics

Identification of proteins and other molecules from tandem mass spectrometry (MS/MS) data is critical for discovering new antibiotics and for other medical research. The current state-of-the-art statistical technique for matching protein sequences with MS/MS data is very accurate, but slow for large datasets. For my undergraduate research, I worked with Professor Hosein Mohimani and graduate student Mihir Mongia to help develop and test novel machine learning and natural language processing algorithms that build on this state-of-the-art technique but provide orders-of-magnitude speedup. Our algorithm, called TensorDSH, uses a new type of Distribution Sensitive Hashing to map together pairs of data points that have a high probability of being generated from the same distribution. The ultimate goal is to use this algorithm to create a software tool that medical researchers can use to assist in their research.

Chaitanya Srinivasan, junior
Advisor: Andreas Pfenning

Identifying Epigenetic Factors associated with Nicotine Addiction

Regulatory elements in the genome such as enhancers and promoters control gene expression in the context of nicotine addiction, with varying effects across different cell types in the brain. In particular, different neuronal subtypes have different active enhancer regions. Their gene expression networks are rewired as the interplay of environmental, genetic, and social factors in nicotine addiction is positively reinforced. Large-scale genomics studies have shown that regions of the genome associated with nicotine addiction lie in enhancer regions rather than protein-coding regions. We employ computational techniques to investigate the role of these regions across the genome. First, we identify neuronal subtypes enriched for genetic variants associated with nicotine addiction against a broad background of putative regulatory elements defined by open chromatin measurements. We further interrogate these cell type-specific regions by quantitatively extracting the differentially expressed enhancer regions and use prediction tools to identify their functional roles. Identifying the functional roles of enhancer regions associated with nicotine addiction will provide potential therapeutic targets for reducing the synaptic and circuit changes in the brain that nicotine induces and that drive maladaptive behavior.
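
One simple way to picture the enrichment step is a one-sided hypergeometric test, sketched below in Python. The abstract does not name the statistical method actually used, so this test is an assumption swapped in for illustration. The function names, element identifiers, and counts are likewise hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n elements are drawn without replacement from N
    background elements, K of which overlap an associated variant."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(max(k, 0), upper + 1))
    return hits / comb(N, n)

def enrichment_by_cell_type(background, variant_hits, cell_type_elements):
    """Return an enrichment p-value for each cell type.

    background         -- set of all putative regulatory element IDs
    variant_hits       -- set of element IDs overlapping associated variants
    cell_type_elements -- dict: cell type name -> set of active element IDs
    """
    N = len(background)
    K = len(variant_hits & background)
    results = {}
    for name, elements in cell_type_elements.items():
        active = elements & background
        k = len(active & variant_hits)
        results[name] = hypergeom_sf(k, N, K, len(active))
    return results

# Toy data, invented purely for illustration.
background = set(range(20))
variant_hits = {0, 1, 2, 3}
cell_type_elements = {
    "subtype_A": {0, 1, 2, 10, 11},     # 3 of 5 elements overlap variants
    "subtype_B": {12, 13, 14, 15, 16},  # no overlap
}

p_values = enrichment_by_cell_type(background, variant_hits, cell_type_elements)
for name, p in sorted(p_values.items(), key=lambda item: item[1]):
    print(f"{name}: p = {p:.4f}")
```

A smaller p-value means the cell type's active elements overlap associated variants more often than expected from the background. An analysis of real data would also need to account for element length and for correlation between neighboring variants, both of which this sketch ignores.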

William Yang, junior
Advisor: Bob Murphy

Learning the Hidden Structure of Cells

Fluorescence microscopy and brightfield microscopy are two popular imaging techniques used by biologists to study the underlying structure of cells. Fluorescence microscopy captures information about the locations of specific organelles by attaching fluorescent molecules called fluorophores to them. Brightfield microscopy captures the optical density of cells by measuring the attenuation of visible light, which provides information about cellular structure. We present a generative model that translates multi-channel fluorescence images into brightfield images in order to measure how much of the optical density can be explained by the locations of major organelles. We also present a second model that identifies the regions of the brightfield image left unexplained by the first model.

Other Computational Biology presenters:

  • Yeonju Kim and Naomi Shin, juniors (advisor Andreas Pfenning)
  • Noelle Toong, senior (advisor Andreas Pfenning)