IT-SC
Beginning Perl for Bioinformatics
James Tisdall
Publisher: O'Reilly
First Edition October 2001
ISBN: 0-596-00080-4, 384 pages
This book shows biologists with little or no programming
experience how to use Perl, the ideal language for biological
data analysis. Each chapter focuses on solving a particular
problem or class of problems, so you'll finish the book with a
solid understanding of Perl basics, a collection of programs for
such tasks as parsing BLAST and GenBank, and the skills to
tackle more advanced bioinformatics programming.
Preface
What Is Bioinformatics?
About This Book
Who This Book Is For
Why Should I Learn to Program?
Structure of This Book
Conventions Used in This Book
Comments and Questions
Acknowledgments
1. Biology and Computer Science
1.1 The Organization of DNA
1.2 The Organization of Proteins
1.3 In Silico
1.4 Limits to Computation
2. Getting Started with Perl
2.1 A Low and Long Learning Curve
2.2 Perl's Benefits
2.3 Installing Perl on Your Computer
2.4 How to Run Perl Programs
2.5 Text Editors
2.6 Finding Help
3. The Art of Programming
3.1 Individual Approaches to Programming
3.2 Edit—Run—Revise (and Save)
3.3 An Environment of Programs
3.4 Programming Strategies
3.5 The Programming Process
4. Sequences and Strings
4.1 Representing Sequence Data
4.2 A Program to Store a DNA Sequence
4.3 Concatenating DNA Fragments
4.4 Transcription: DNA to RNA
4.5 Using the Perl Documentation
4.6 Calculating the Reverse Complement in Perl
4.7 Proteins, Files, and Arrays
4.8 Reading Proteins in Files
4.9 Arrays
4.10 Scalar and List Context
4.11 Exercises
5. Motifs and Loops
5.1 Flow Control
5.2 Code Layout
5.3 Finding Motifs
5.4 Counting Nucleotides
5.5 Exploding Strings into Arrays
5.6 Operating on Strings
5.7 Writing to Files
5.8 Exercises
6. Subroutines and Bugs
6.1 Subroutines
6.2 Scoping and Subroutines
6.3 Command-Line Arguments and Arrays
6.4 Passing Data to Subroutines
6.5 Modules and Libraries of Subroutines
6.6 Fixing Bugs in Your Code
6.7 Exercises
7. Mutations and Randomization
7.1 Random Number Generators
7.2 A Program Using Randomization
7.3 A Program to Simulate DNA Mutation
7.4 Generating Random DNA
7.5 Analyzing DNA
7.6 Exercises
8. The Genetic Code
8.1 Hashes
8.2 Data Structures and Algorithms for Biology
8.3 The Genetic Code
8.4 Translating DNA into Proteins
8.5 Reading DNA from Files in FASTA Format
8.6 Reading Frames
8.7 Exercises
9. Restriction Maps and Regular Expressions
9.1 Regular Expressions
9.2 Restriction Maps and Restriction Enzymes
9.3 Perl Operations
9.4 Exercises
10. GenBank
10.1 GenBank Files
10.2 GenBank Libraries
10.3 Separating Sequence and Annotation
10.4 Parsing Annotations
10.5 Indexing GenBank with DBM
10.6 Exercises
11. Protein Data Bank
11.1 Overview of PDB
11.2 Files and Folders
11.3 PDB Files
11.4 Parsing PDB Files
11.5 Controlling Other Programs
11.6 Exercises
12. BLAST
12.1 Obtaining BLAST
12.2 String Matching and Homology