Bioinformatics Unit 6: Exercise




Unit 6: Metabolism & Networks
Exercise

Exercise: Metabolism & Networks

Making Connections


Objectives:

1. Develop concepts of integrative thinking.
2. Learn about algorithms and software for handling large, complex data sets.
3. Visit databases for metabolic pathways, protein-protein interactions, and signaling.
4. Learn fundamentals of proteomics.
5. Be introduced to concepts of pharmaceutical drug design and data mining.

Introduction: Functions of the genome

Genes encoded in genomic DNA contribute to four major areas of function in a living cell. These functions are metabolism (metabolic pathways), regulation (regulatory control systems), signaling (signal transduction cascades), and assembly (architectural construction).

Metabolic pathways allow a cell to extract chemical energy from the environment for making cell components and for doing other useful work. Regulatory control circuits are biochemical mechanisms that turn processes such as gene expression on and off, or regulate the timing of events. Signal transduction cascades provide the internal and external sensory functions of the cell. Signaling can be simple: for example, signal peptides in protein sequences determine whether a protein is translocated into the lumen of the endoplasmic reticulum or transported through the mitochondrial membrane. More often, signaling functions are carried out by complex networks of interacting proteins.

The complexity of these processes requires specialized databases for storage of experimental results, and specialized software to assemble and correlate information. Visualization tools are particularly important in analysis and presentation of these complex data sets. This exercise is designed to introduce key concepts and methods for analysis of pathways, networks and systems.

There are summary questions at the end of this section. Read them through before you start the activities in this exercise. You can answer them as you go, or answer them afterward. Points = 10. Due by the beginning of class on December 4, 2003.

Pre-Exercise:

1. Read through the text sections suggested for this unit.

2. Also examine the three review articles linked on the unit 6 main page.

Exercise:

Part A: Pathway databases.

1. For an impressive view of biochemical pathways, go to the biochemical pathways wall chart provided by ExPASy. You can search by enzyme keyword, or pull up a thumbnail view of a portion of the pathways map by clicking on the region of interest. In thumbnail view you can also look at cellular processes from a cellular and molecular perspective.

2. The What Is There (WIT) curation project has developed a database of function assignments for genes and has derived metabolic models from those assignments. Visit the WIT homepage, sponsored by Argonne National Laboratory. Go through the tutorial, noting the features of the database and the services provided.

3. The Human Protein Reference Database (HPRD), at www.hprd.org, is an integrated pathway database of human reference proteins. The HPRD interface displays domain architecture, post-translational modifications, interaction networks, and disease associations for each protein in the human proteome. The ultimate goal of the database designers is to provide a platform for human systems biology. The HPRD site is self-explanatory. Enjoy!

 

Part B: Algorithms for analyzing complex data sets.

1. Software for analysis of complex data sets is based on four main types of algorithms: clustering (including self-organizing maps), classification (decision trees, neural networks), regression (extrapolation), and optimization (genetic algorithms, statistical modeling). See some interesting software applications called CLUSTER and TREEVIEW written by Michael Eisen. The Cluster/TreeView manual describes the interface. You can also play with fuzzy clustering software.

2. Data mining is a well-established concept in computer science, where many of the methods involve artificial intelligence software (machine learning). An important application to biological literature databases is MedMiner. Learn about this software tool by doing the short tutorial.
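To make the clustering idea from item 1 concrete, here is a minimal sketch of agglomerative (hierarchical) clustering in Python. It is only an illustration and is not the code used by CLUSTER or any other package named above. The gene names and expression values are made up, and the choice of a Pearson correlation distance with average linkage is an assumption made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation of two equal-length profiles.

    Assumes neither profile is constant (otherwise the variance is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / sqrt(var_x * var_y)


def average_linkage(profiles, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""

    def cluster_distance(pair):
        # Average of all pairwise distances between members of two clusters.
        a, b = pair
        dists = [pearson_distance(profiles[g], profiles[h]) for g in a for h in b]
        return sum(dists) / len(dists)

    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=cluster_distance)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return sorted(sorted(c) for c in clusters)


# Hypothetical expression profiles (log ratios at four time points).
profiles = {
    "geneA": [0.1, 0.9, 2.0, 3.1],
    "geneB": [0.0, 1.1, 1.9, 2.8],
    "geneC": [3.0, 2.1, 0.9, 0.2],
    "geneD": [2.9, 1.8, 1.1, 0.0],
}

clusters = average_linkage(profiles, 2)
print(clusters)
```

In this toy data set the two genes whose expression rises over time end up in one cluster and the two whose expression falls end up in the other. Real expression data sets contain thousands of genes, which is why dedicated software and tree visualizations are used instead of a brute-force loop like this one.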

 

Part C: Proteomics.

1. One of the best protein bioinformatics sites in the world is maintained by the Swiss Institute of Bioinformatics. Check out the homepage for an impressive list of linked databases, software packages, education and services, and documentation.

2. The proteome of a cell can be analyzed experimentally by a combination of 2D-PAGE (two-dimensional polyacrylamide gel electrophoresis) and mass spectrometry. An important database for proteomics is the SWISS-2DPAGE database available at ExPASy. Take a look at how it is organized. To get an idea of what the mass spectrum of a protein looks like, try entering a protein into the PeptideMass calculator, which computes a theoretical mass fingerprint for a SWISS-PROT entry or a user-entered sequence.
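The idea behind a theoretical mass fingerprint (item 2) can be sketched in a few lines of Python: cut the sequence where an enzyme would cut it, then add up the residue masses of each peptide. This is a simplified illustration and not the PeptideMass implementation. It assumes trypsin with the common rule "cleave after K or R unless the next residue is P", no missed cleavages, no modifications, and neutral monoisotopic masses; the example sequence is made up.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # one water is added per free peptide


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER_MASS


# Hypothetical sequence chosen to show the "not before P" rule.
sequence = "GAKRPLKVR"
peptides = tryptic_digest(sequence)
fingerprint = [(p, round(peptide_mass(p), 4)) for p in peptides]

for peptide, mass in fingerprint:
    print(peptide, mass)
```

The list of peptide masses is the "fingerprint". In an experiment, the masses measured by the mass spectrometer are compared with theoretical lists like this one, computed for every protein in a database, to identify the protein in a gel spot.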

 

Summary Questions:

Please limit your answers to the set of questions to a maximum of three double-spaced pages [12 pt font]. This length should be sufficient for your comments and any appropriate copy/pasted examples. [You need not retype or copy/paste the questions as part of your responses.]

1. Based on your review of documentation on the SWISS-PROT database, how is biochemical information assigned to sequence entries?

2. From information in the WIT tutorial, define "ortholog" and explain why this term might be abused or confused.

3. Explain what a clustering algorithm does. Provide an example of an application of clustering to a complex biological data set.

4. Having chosen a gene, enzyme, or genetic disease of interest to you, summarize information you retrieved using MedMiner.


Related Links:
ExPASy | 2D-PAGE maps | Enzyme Names | Proteomics Tools | MedMiner

Updated 12/05/2003 by thatcher@sonoma.edu; bchapman@classroomtools.com