Bioinformatics Unit 1: Exercise 2




Unit 1: Databases & Queries
Exercise 2, Section B


Exercise 2: Molecular Databases and Search Tools
Section B


Note: Bring a disk to class to save your log and any other useful files.


Objectives for Part B:

1. Gain an understanding of the basics of bioinformatics.

  • Explore some key sites containing databases and application software.
  • Become familiar with some of the uses of bioinformatics.
  • Appreciate the dynamic nature of this rapidly growing field.

2. Become familiar with some molecular databases and interfaces.

  • Be able to query databases and search for molecular sequences.
  • Develop strategies for refining searches and improving the quality of the data retrieved.

Introduction:

In this part of the exercise, you will explore a variety of databases and interfaces that are useful in bioinformatics and computational biology analyses. You will expand your skills in searching for specific nucleic acid and protein sequences by using different database resources. You will also have the opportunity to become familiar with Biology Workbench, an interface to many types of tools, many of which you have already seen at NCBI. An additional advantage of Biology Workbench is that you can store work sessions and files on its server. You can also download and upload files locally.

You will also explore several databases that we will use later in the course. They are introduced here as part of a general survey of useful databases. Included are a database of ecological interest, genome databases, and sites linking sequence information to metabolic pathways.

There are summary questions at the end of this section. Read them through before you start browsing. You can answer them as you go or after browsing the sites below. Points = 2. Due 9/9 at midnight or 9/11.


Exercise:

Part 1: Other Interfaces to Databases and Tools

Many find that it is easier to become familiar with sites and tools by casual exploration before having to formally use them. Others are eager to see what else is out there. Both Parts 1 and 2 offer the opportunity to do that.

Explore the following sites to see what they have to offer. You will find them useful as you work through problems, both for locating data and for finding applications. In addition, you will find a wealth of information relating to bioinformatics and biotechnology.

As you visit each of these sites, make some notes in your log, so you can easily jog your memory later on. Make some queries of databases to get a feel for how they work and to see what you get back. Also note how the sites are organized and the means of navigation available within the site. In other words: How quickly do you become comfortable using the sites? Is help or resource information available?

A. Biology Workbench is a major interface site that provides many unique features. A powerful, integrated tool resource, it allows you to search multiple databases simultaneously and to use a very wide variety of tools to examine proteins, nucleotide sequences, and alignments.

[http://workbench.sdsc.edu]

All you need to do the first time is choose a user name and password. The nice thing is that it saves your work sessions, so you can come back to them, even months later. You can upload and download files as well, so you can easily transfer material to a log or a report. The downside is that navigation takes a little practice: follow the simple rule of never using your browser's "back" button, because it can cause problems in your session.

Browse the site to get somewhat familiar with what is available and how it works. Introductory tutorials are available from the homepage; scroll down until you see the link.

Learning to use Biology Workbench is an excellent investment of effort, in that you can save lots of time both in database searches and in applications, since so many are readily available without having to go out and look for them.

B. The San Diego Supercomputer Center (SDSC) has many useful links and resources applicable to bioinformatics:

http://restools.sdsc.edu/

Here you can link to a variety of resources, which are conveniently organized by general topics.

For a more general look at what else is available at SDSC, and for links to other sections, try: http://www.sdsc.edu/. SDSC is one of the partners developing the TeraGrid, billed as the largest computing environment ever built. See http://www.teragrid.org/ to read about it.

C. The European Bioinformatics Institute has many useful tools and provides access to several databases. We'll be using some of these as we proceed in the course.

[www.ebi.ac.uk]

D. ExPASy [Expert Protein Analysis System] is a site dedicated to databases and tools for proteomics. This site, among others, will be useful in Unit 5.

[www.expasy.ch/]

E. TIGR [The Institute for Genomic Research] provides access to databases, tools, and links of use for genomics, useful in Unit 2 and beyond.

http://www.tigr.org/

Part 2: Other Useful Databases

Ready for more exploration and browsing? If not, take a break. The databases in this section are quite varied, and each is specialized for a different kind of use.

As you did in Part 1, you should keep track of characteristics of these sites, unique features of interest, and so forth.

A. The next unit is on genomics, so it is a good idea to preview some of the resources available in this area. Human genome data can be accessed through a variety of entry sites. Try the following:

http://gdbwww.gdb.org/

http://www.ncbi.nlm.nih.gov/Entrez/

At Entrez, you can browse a map of the human genome by clicking on the link in the right-hand column. You can access all 800 genomes currently deposited in the database by clicking on "Genomes" in the main list. Spend a little time exploring different ways of accessing the information.

B. GenoList at the Institut Pasteur is definitely worth visiting. Although still small, this site has grown significantly in the last year. Much care and planning have gone into making it easy to use.

http://genolist.pasteur.fr/

SubtiList is the oldest of these databases, created as a relational database with an intuitive map interface. As you can see from this unique map, sequencing the genome of Bacillus subtilis was an international effort. Move your mouse over the map and watch the link display in the status bar at the bottom of your browser window as you move around. Click somewhere to see what happens. If you are interested in bacteria, you can browse the other databases at your leisure. [I did note that they let a yeast into the group, so maybe the taxonomic range will increase in the future.]

C. For other means of access to genomes on-line, check out the following:

http://wit.integratedgenomics.com/GOLD/

http://www.tigr.org/tdb/

You may or may not have discovered the genomic database resources on your first visit to TIGR. If you did, you can move on. If not, take a few minutes to see what is available.

D. As an immunologist, I just have to put in one example of a narrow focus site. The Kabat Database of proteins of immunological interest is a worthy example of special interest databases.

http://immuno.bme.nwu.edu/

If you have had a course in immunology or are taking it now, you should find this of interest and possibly useful in the future. If not, well, maybe you learned something anyway.

E. Ecology and biodiversity are challenging and interesting areas undergoing rapid growth and development in bioinformatics and computational biology. Check out the Environmental & Biodiversity interface at SDSC:

http://biodi.sdsc.edu/

Browse some of the introductions and articles here to get a feel for what is available and what is being developed.

Also visit the Biological Records Centre, which provides ecology database resources specific to the British Isles. This gives you a taste of what could be available elsewhere, and an appreciation of what a tremendous undertaking building such resources will be.

http://www.brc.ac.uk/

F. Finally, take a look at some resources for metabolic pathways. We'll spend some time exploring these further in Unit 6.

At the Kyoto Encyclopedia of Genes and Genomes [KEGG], check out KEGG's search and computation tools:
http://www.genome.ad.jp/kegg/
For an introduction to KEGG, read the following paper:
http://igs-server.cnrs-mrs.fr/~ogata/Paper/ogata98BioSys.html

At the University of Oxford, there is access to Pathway, a database of inherited metabolic diseases: http://oxmedinfo.jr2.ox.ac.uk/Pathway/Miscell/welcome.htm

Although not recently updated, this site does have some interesting features.

Another good site is the What Is There? [WIT] database, part of the Argonne Computational Biology Group:
http://wit.mcs.anl.gov/WIT2/

ERGO is another database of metabolic interest, sponsored by Integrated Genomics, Inc.:

http://wit.integratedgenomics.com/IGwit/

The metabolic part of SoyBase, part of USDA's Plant Genome Project, is also worth some exploration:

http://cgsc.biology.yale.edu/metab.html

Summary Questions:

Try to limit your answers to 1-2 typed pages [12 pt font].

1. As you explored the different sites in Parts 1 and 2, you likely found that you preferred some sites over others. Keeping a critical eye out for what works for you and what doesn't can help in several ways. By recognizing specific problems, you can often find constructive ways to make your navigation easier, which you can share with others. When the opportunity exists, you can give constructive feedback to the webmasters of the sites.

a. In Part 1 you visited five interface sites other than NCBI. Rank your top two choices of all six sites. State the criteria you used in your ranking.

b. Now go back and rank your bottom choice of the sites visited. State what you found to be problems with this site. What would be needed to fix them?

c. In Part 2 you visited at least 13 database sites. Rank your top three choices of all database sites visited. State the criteria you used in your ranking.

d. Now go back and rank your bottom three choices of the sites visited. State what you found to be problems with these sites. What would be needed to fix them?

2. Most people require a reasonable amount of time to become comfortable with a given piece of software or a complex web interface. There is generally some initial discomfort when a web site is changed or software is updated. In bioinformatics, one is often required to use multiple tools to accomplish needed tasks, some local and some web-based. For the sake of comfort and efficiency, many people use only one interface or keep the number of interfaces to a minimum. However, it is useful to break out once in a while to see what is out there and to discover whether anything could contribute to the task at hand.

a. As you moved from site to site, what strategies did you use to quickly orient yourself?

b. What are your preferences for navigation within a web site? Are these preferences supported at most of the sites you visited? If so, how did they contribute to your ability to familiarize yourself with any given site?

 

Further exploration:

As you read and digest what you have just gone through, go back and browse some of the sites in more detail. This exercise is like one of those fast-paced package tours, where you probably want to leave the trail and explore more independently. Browsing in shorter segments also helps solidify where you have been and what you have been doing. The more time you can put into it, the more you'll get back from your efforts.

Other sites to visit:

1. At the Southwest Biotechnology and Informatics Center, there are thousands of linked resources, including the game Origin: Unknown, which provides a fun way to learn bioinformatics:

http://www.nbif.org/

Now that you've had some experience with BLAST, give Origin: Unknown a try. To play, just click on "Games" under Education. I suggest doing the individual challenges in order the first time, since they build in complexity. Beware that you'll be fired if you screw up, but don't let that stop you. Just recycle back in and you'll find that all is forgiven and you are back at work.

Check out access to different databases and available tools.

2. A concise resource list at NOAA is nicely organized. This isn't an interface like the others, but it includes many of the sites you have visited so far, and many that you haven't yet. That is why I've included it.

http://www.nwfsc.noaa.gov/bioinformatics.html



Updated 09/11/2003 by bsc@classroomtools.com, thatcher@sonoma.edu