RESOURCES
Essential Background Reading
- Molecular biology resources for computer science students
- Introduction to DNA transcription, RNA transcription, and translation
- Introduction to Proteins (read introduction and “Levels of Structure”)
- More extensive molecular biology primer
- Probability and statistics primer
- Short Python Primer
- More extensive Python primer
Chapter 1 Readings
- Basic Local Alignment Search Tool
- PCR Story - Scientific American
- A Model of Evolutionary Change in Proteins - Dayhoff et al. 1978
- Amino acid substitution matrices from protein blocks - Henikoff and Henikoff 1992 (BLOSUM matrices)
- Karlin 1989 - Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes
- Myers - An Overview of Sequence Comparison Algorithms in Molecular Biology
- Smith-Waterman - Identification of Common Molecular Subsequences
- Waterman - A local algorithm for DNA Sequence Alignment With Inversions
Chapter 2 Readings
- Knuth, Morris, Pratt - Fast Pattern Matching in Strings (1977)
- Crochemore and Lecroq - Pattern Matching and Text Compression Algorithms
- Pettorossi - Automata Theory and Formal Languages
- DFA Slides
- NFA Slides
- Regular Expression Slides
- Sipser - Introduction to the Theory of Computation
Eric Davidson Memorial Lecture Readings
- “Eric Davidson: Master of the Universe,” Developmental Biology 2016
- “Criticizing Professor Dijkstra Considered Harmless,” Conduit 2008
- When Professor Dijkstra Slapped Me in the Quest for Beautiful Code
Chapter 3 Readings
- Simon - The Architecture of Complexity, including Hora and Tempus
- Barton - Evolution, Chapter 27: Phylogenetic Reconstruction
- Fisher Tasting Tea Puzzle
Chapter 4 Readings
Chapter 5 Readings
Books
- Introduction to Computational Biology: Maps, Sequences and Genomes by Michael Waterman. An important book, by one of the fathers of computational biology; basic and advanced statistical methods, algorithms.
- Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids by Richard Durbin, Sean Eddy, Anders Krogh, and Graeme Mitchison. An important book and a strong introduction to statistical methods, focused in large part on hidden Markov model algorithms.
- Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology by Dan Gusfield. A book dedicated to suffix trees and combinatorial algorithms; computer science methods, no statistical methods.
- Understanding Bioinformatics by Marketa Zvelebil and Jeremy Baum. A book for biologists with great illustrations and tutorials about the biology and computational methods involved.
Additional Background Reading
- Regex cheatsheet from MIT
- Molecular biology review from the NCBI
- A Structure for Deoxyribose Nucleic Acid (Watson & Crick, 1953)
- The Unusual Origin of PCR (Mullis, 1990)
- Introduction to alignment scoring statistics
- Song Bird I, Song Bird II, Song Bird III
Online Demos
Mathematics Demos
- Play with alignment! Evaluate the notebook and then move the match, mismatch, and gap penalty sliders to see how the parameters affect the output.
- Play with Lander-Waterman statistics and genome assembly! You’ll have to enter the file location of the FASTA file that stores the genome to be assembled.
Additional Resources
- 40 Years of Suffix Trees
- Language and Animals
- PCR story in Scientific American
- Pattern Matching Book
- A Model of Evolutionary Change in Proteins
- Regulatory Motif Finding
- Logic functions of the genomic cis-regulatory code
- Final Exam Answers
- The regulatory genome and computer
- How does the regulatory genome work?
- Eric Davidson’s Regulatory Genome for Computer Science: Causality, Logic, and Proof Principles of the Genomic cis-Regulatory Code
After CS1810
Professor Istrail will be teaching CS2840, Advanced Algorithms in Computational Biology and Medical Bioinformatics, in Spring 2025. He also teaches CS1820, Algorithmic Foundations of Computational Biology, every other spring. See here for a list of course topics. CS1820 will next be offered in Spring 2026.