Bioinformatics Unit 2: Activity



PERL and BIOPERL


Objectives:

1. Be able to answer the question, "What is PERL?"
2. Learn how to obtain, configure and run a PERL script.
3. Know why PERL is a mainstay of bioinformatics.
4. Have a brief introduction to BIOPERL.

Introduction:

PERL stands for Practical Extraction and Report Language: a boring name for a versatile, efficient programming language with a highly developed capacity to detect patterns in data, especially text strings such as DNA and protein sequences. Other programming languages can be used for bioinformatics, but PERL remains a favorite among bioinformaticians worldwide.
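To make pattern detection concrete, here is a minimal sketch that scans a DNA string for a motif with a regular expression. It is written in Python purely for illustration; in PERL the same search is usually written as a `while ($seq =~ /GAATTC/g)` loop. The function name `find_motif` and the sample sequences are hypothetical, and GAATTC (the EcoRI recognition site) is just an example pattern.

```python
import re


def find_motif(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for every match of `pattern` in `sequence`."""
    # Wrapping the pattern in a lookahead makes each match zero-width,
    # so overlapping occurrences are all reported. Matching ignores case.
    regex = re.compile(f"(?=({pattern}))", re.IGNORECASE)
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # Hypothetical sequences for illustration only.
    print(find_motif("ATGGAATTCCGTAGAATTCAA", "GAATTC"))  # [(4, 'GAATTC'), (14, 'GAATTC')]
    print(find_motif("GCTATAAAAGGC", "TATA[AT]A[AT]"))    # [(3, 'TATAAAA')]
```

A character class such as `[AT]` lets a single pattern cover several sequence variants, which is the kind of flexible matching that makes regular expressions so useful for sequence data.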

Archives full of PERL scripts and vast resources for PERL programmers make it easy to find or write a script for almost any task. PERL runs on virtually every platform, scripts move easily from one platform to another, useful tasks take little code, and the syntax is flexible. As the PERL motto says, there is always more than one way to do it!

Activity:

Part A:

1. Abundant resources are available for programmers at all levels. See, for example, the PERL homepage at O'Reilly at http://www.perl.com/.

2. For a vast collection of PERL modules and scripts, see the Comprehensive Perl Archive Network (CPAN) at http://www.cpan.org/.

Part B:

1. An example of a useful PERL script is BlastReport, which sorts through BLAST output and formats it for rapid evaluation. It was described by Jeanette McClintick and Howard J. Edenberg in "BlastReport: A Perl Script to Facilitate the Use of Sequence Databases for Mapping and Clustering," BioTechniques 29: 1272-1276, 2000. BlastReport works together with Blastcl3, the NCBI network BLAST client (netblast).

2. BlastReport can be downloaded from http://www.genomics.iu.edu/Blastrep/BlastReport2.txt. Blastcl3 is available from ftp://ftp.ncbi.nih.gov/blast/executables/release/2.2.6/netblast-2.2.6-powerpc-macosx.tar.gz. (Hint: use Fetch or another FTP client, or go through http://www.ncbi.nih.gov/Ftp/index.html.) Two files explain how to set up and run the applications: a Readme, http://www.genomics.iu.edu/Blastrep/Readme.html, and an Update page, http://www.genomics.iu.edu/Blastrep/Update.html.

3. BlastReport produces three files:
* BRept_Full provides a summary for each query sequence followed by the alignments that are most likely to be biologically significant.
* BRept_Sum is a list of the regions of alignment only.
* BRept_Tab is a tab-delimited file containing query name, the name of the aligning sequence and the first region of alignment.
BRept_Tab can be opened in a spreadsheet such as Excel® to produce a table of alignments that can be sorted in whatever way is most useful. Sorting by the aligning sequence makes it easy to see whether that sequence matches more than one query sequence, such as two markers, or a marker and a portion of a cDNA.
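The spreadsheet sort described for BRept_Tab can also be scripted. The sketch below, written in Python for illustration, reads BRept_Tab-style lines, groups them by aligning sequence, and keeps only the sequences hit by two or more different queries. It assumes each line is tab-separated with the query name first and the aligning sequence name second, as described above; the real file may carry a header or extra columns, and the function name `multi_hit_subjects` and all sample names are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Iterable


def multi_hit_subjects(lines: Iterable[str]) -> dict[str, list[str]]:
    """Map each aligning sequence hit by 2+ distinct queries to those query names.

    Assumes tab-separated rows: query name, aligning sequence name, region, ...
    Extra columns are ignored; rows with fewer than two fields are skipped.
    """
    hits: dict[str, list[str]] = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # blank or malformed line
        query, subject = row[0].strip(), row[1].strip()
        if query not in hits[subject]:
            hits[subject].append(query)  # record each query once per subject
    # Sort by aligning sequence name, like sorting on that spreadsheet column.
    return {subject: queries for subject, queries in sorted(hits.items()) if len(queries) > 1}


if __name__ == "__main__":
    # Hypothetical rows in the assumed BRept_Tab layout.
    sample = [
        "marker_A\tcDNA_123\t1-250\n",
        "marker_B\tcDNA_123\t310-540\n",
        "marker_C\tcontig_9\t12-98\n",
    ]
    print(multi_hit_subjects(sample))  # {'cDNA_123': ['marker_A', 'marker_B']}
```

To run it on a real report, pass an open file instead of the sample list, for example `open("BRept_Tab", newline="")`, assuming the file carries that name. A header line, if present, simply becomes a one-query entry and is filtered out.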

Part C:

1. For an overview of BIOPERL, read "The Bioperl toolkit: Perl modules for the life sciences" (Stajich et al., Genome Research 12: 1611-1618, 2002).

2. Then check out the BIOPERL website at http://bioperl.org/.


Updated 10/7/2003 by bchapman@classroomtools.com