18.417 Intro to Computational Biology, Lecture 4: BLAST & Database Search

BLAST and Database Search

More sequence - New problems

Speeding up your searches

What is hashing?

BLAST Algorithm Overview

Breaking up the query

Generating the neighborhood

Looking into the database

Extending the seeds

Statistical Significance

Extensions: Filtering

Extensions: Two-hit BLAST

Substitution Matrices

Amino Acids

PAM matrices

BLOSUM matrices

Computing Substitution Matrices

Overview: Why k-mers work

Pigeonhole principle

Extending pigeonhole principle

Pigeonhole and sequences: Worst case

Random model: Average case

Random model: Simulation

True alignments: Looking for k-mers

Summary: Why k-mers work

Identifying exons

Part I - Parameter tuning

Part II - Choosing a threshold

Part III - Gene identification

Estimating human gene number

Gene content by chromosome