Selected Topics in Computational Biology (SS 2007)
- Lectures:
Dr. Jens Ernst
- Module:
IN2127
-
3 hrs of lecture per week (in English)
- Location and Times:
Fri. 10:45 - 13:00, MI Room 03.11.18
- Audience:
Students of Bioinformatics
- Prerequisites:
Discrete Structures
Fundamental Algorithms or Efficient Algorithms
Randomized Algorithms
Algorithms for Bioinformatics I,II
- Information and Course Material:
Topics:
- 1. Pairwise Sequence Alignment
- 1.1. Preliminaries
- 1.1.1. Basic Definitions
- 1.1.2. Edit Distances
- 1.1.3. Alignments and their Relations to Edit Distances
- 1.2. Needleman Wunsch Algorithm
- 1.2.1. Overview and Definitions
- 1.2.2. Inductive Derivation
- 1.2.3. Hirschberg Optiomization
- 1.3. Gapped Alignments
- 1.3.1. End-gap Free Alignments
- 1.3.2. Smith Waterman Algorithm
- 1.3.3. Special Gap Penalties - Introduction
- 1.3.4. Byers Algorithm
- 1.3.5. Affine Gap Penalties (Gotoh)
- 1.3.6. Convex Gap Penalties (Myers, Miller)
- 1.3.7. Four-Russians Speedup
- 2. Suffix Trees - Theory, Implementation and Applications
- 2.1. Ukkonen's Online Linear-Time Algorithm
- 2.1.1. Constructing Suffix Tries
- 2.1.2. Structure and Representation of Suffix Trees
- 2.1.3. Efficient Construction of Suffix Trees
- 2.2. McCreight's Algorithm (Outline Only)
- 2.2.1. Overview and Definitions
- 2.2.2. Linear-time Algorithm
- 2.2.3. Hash Code Implementation
- 3. Applications of Suffix Trees in Computational Biology
- 3.0. Classification of Exact String Matching Problems
- 3.0.1. Indexing Problems
- 3.0.2. Dictionary Matching Problems
- 3.1. Indexing - Obvious Application of Suffix Trees
- 3.2. Indexing with One Error
- 3.2.1. Central Idea
- 3.2.2. Efficient Computation of Set Intersections
- 3.2.3. Indexing With One Error in the Presence of Exact Matches
- 3.3. Using Suffix Trees for Dictionary Matching
- 3.4. Dictionary Matching with One Error
- 3.4.1. Central Idea
- 3.4.2. Set Intersection on Tree Paths
- 3.5. Longest Common Substrings of Two or More Strings
- 4. Suffix Arrays - Outline
- 4.1. Basic Definitions
- 4.2. Indexing using Accelerants and Super-Accelerants
- 5. Selected Algorithms for Gene Expression Analysis
- 5.1. Introduction
- 5.1.1. Basics of Gene Expression
- 5.1.2. DNA Microarray Technology
- 5.1.3. Sources of Systematic and Random Error in Microarray Experiments
- 5.1.4. Gene Expression Profiling (Type-II Experiments)
- 5.2. Similarity and Dissimilarity Measures for Gene Expression Profiles
- 5.2.1. Definition for Similarity and Distance Measures
- 5.2.2. Minkowski Distance
- 5.2.3. Pearson Correlation Coefficient
- 5.2.4. Spearman Rank-Order Correlation
- 5.2.5. Kendall's Tau
- 5.2.6. Jackknife Correlation
- 5.2.7. Mutual Information
- 5.3. Elementary Clustering Algorithms
- 5.3.1. Hierarchical Clustering
- 5.3.1.1. Single Linkage
- 5.3.1.2. Complete Linkage
- 5.3.1.3. Average Linkage
- 5.3.2. K-Means
- 5.3.3. HCS
- 5.3.4. Self-Organizing Maps
- 5.4. Probabilistic Algorithms
- 5.4.1. Short Reminder On Probability Theory
- 5.4.2. Introduction to Tail Inequalities
- 5.4.2.1. Markov and Chebyshev
- 5.4.2.2. Chernoff
- 5.4.3. CAST
- 5.4.4. CLICK
(Note: The information given here is incomplete, preliminary and will be subject to change.)
Suggested Reading:
Selected Literature:
- "The smallest atuomaton recognizing the subwords of a text", A. Blumer et. al., Theoretical Computer Science 40 (1985), 31-55
- "A comparison of imperative and purely functional suffix tree constructions", R. Giegerich, S. Kurz, Science of Computer Programming 25 (1995), 187-218
- "From Ukkonen to McCreight and Weiner: A Unifying View to Linear-Time Suffix Tree Construction", R. Giegerich, S. Kurtz, Algorithmica 19 (1997), 331-353
- "Algorithms on Strings, Trees and Sequences", Dan Gusfield Cambridge University Press
- "Suffix Arrays: A new method for on-line string searches", U. Manber, G. Myers, SIAM J. Comput., Vol 22, No 5 (1993), 935-948
- "A Space-Economical Sufix Tree Construction Algoritm", E. McCreight, Journal of the ACM, Vol 23, No 2 (1976), 262-272
- "Practical Suffix Tree Construction", S. Tata et. al., Proceedings of the 30th VLDB Conference (2004)
- "On-Line Construction of Suffix Trees", E. Ukkonen Algorithmica (1995), 249-260
Office hours:
by appointment