Exercise 2: Molecular Databases and Search Tools
Section B
Note: Bring a disk to class to save your log and any
other useful files.
Objectives for Part B:
1. Gain an understanding of the basics of
bioinformatics.
- Explore some key sites containing databases and
application software.
- Become familiar with some of the uses of
bioinformatics.
- Appreciate the dynamic nature of this rapidly growing
field.
2. Become familiar with some molecular databases and
interfaces.
- Be able to query databases and search for molecular
sequences.
- Develop strategies for refining searches and
improving results in data retrieved.
Introduction:
In this part of the exercise, you will explore a variety
of types of databases and interfaces useful in various types
of bioinformatics and computational biology analyses. You
will expand your skills in searching for specific nucleic
acid and protein sequences by using different database
resources. You will have the opportunity to become familiar
with Biology Workbench, an interface for using many types of
tools, many of which you have seen at NCBI. The additional
advantage to Biology Workbench is that you can store work
sessions and files on a server. You may also download and
upload files locally.
You will also be exploring several databases which we'll
be using later on in the course. They are introduced here as
part of a general survey of useful databases. Included are a
database of ecological interest, genome databases, and sites
linking sequence information and metabolic pathways.
There are summary questions at the end of this
section. Read them through before you start browsing. You
can answer them as you go, or answer them after browsing the
following sites. Points = 2. Due 9/9 midnight or
9/11.
Exercise:
Part 1: Other Interfaces to Databases and Tools
Many find that it is easier to become familiar with sites
and tools by casual exploration before having to formally
use them. Others are eager to see what else is out there.
Both Parts 1 and 2 offer the opportunity to do that.
Explore the following to see what they have to offer. You
will find these sites quite useful as you work your way
through problems, whether for locating useful data
or applications. In addition, you will find a wealth of
information relating to bioinformatics and
biotechnology.
As you visit each of these sites, make some notes in your
log, so you can easily jog your memory later on. Make some
queries of databases to get a feel for how they work and to
see what you get back. Also note how the sites are organized
and the means of navigation available within the site. In
other words: How quickly do you become comfortable using the
sites? Is help or resource information available?
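If you prefer to keep your log in a structured form, the questions
above can be sketched as a small Python record. This is only one
possible layout: the field names, the 1-5 scales, the scoring rule,
and the two placeholder sites are assumptions made for illustration,
not part of the exercise.

```python
from dataclasses import dataclass


@dataclass
class SiteNote:
    """One log entry for a site visited (all fields are illustrative)."""
    name: str
    url: str
    has_help: bool      # Is help or resource information available?
    navigation: int     # 1 (confusing) to 5 (easy), your own judgment
    comfort: int        # 1 (slow to settle in) to 5 (comfortable quickly)
    remarks: str = ""

    def score(self) -> int:
        """Simple unweighted total; adjust it to match your own criteria."""
        return self.navigation + self.comfort + (1 if self.has_help else 0)


# Placeholder entries only; replace them with your own observations.
log = [
    SiteNote("Site A", "http://example.org/a", True, 3, 3, "took a while to find the tutorials"),
    SiteNote("Site B", "http://example.org/b", True, 4, 4),
]

# Highest total first, which gives a starting point for ranking the sites.
ranked = sorted(log, key=SiteNote.score, reverse=True)
for note in ranked:
    print(note.score(), note.name)
```

Notes kept this way can be sorted later when you answer the summary
questions, but a plain written log works just as well.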
A. Biology Workbench is a major interface site
which provides many unique features. A powerfully integrated
tool resource, it allows you to search multiple databases
simultaneously and to use a very wide variety of tools to
examine proteins, nucleotide sequences, and alignments.
[http://workbench.sdsc.edu]
All you need to do the first time is to choose a user
name & password. The cool thing is that it can save
your work sessions, so you can come back to them, even
months later. You can upload and download from it as
well, so you can easily transfer material to a log, and
to a report. The downside to using Biology Workbench is
that it takes a little practice to navigate. One simple
rule helps: do not use your browser's "back" button,
because it can cause problems.
Browse the site to get somewhat familiar with what is
available and how it works. Introductory tutorials are
available from the homepage; scroll down until you see
the link.
Learning to use Biology Workbench is an excellent
investment of effort, in that you can save lots of time
both in database searches and in applications, since so
many are readily available without having to go out and
look for them.
B. San Diego Supercomputer Center has lots of
useful links and resources applicable to bioinformatics:
http://restools.sdsc.edu/
Here you can link to a variety of resources, which are
conveniently organized by general topics.
For a more general look at what else is available
at SDSC, and for links to other sections, try: http://www.sdsc.edu/.
SDSC is one of the partners in the development of the
Teragrid, the largest computing environment ever. See
http://www.teragrid.org/
to read about it.
C. The European Bioinformatics Institute
has many useful tools and provides access to several
databases. We'll be using some of these as we proceed in the
course.
[www.ebi.ac.uk]
D. ExPASy [Expert Protein Analysis System]
is a site dedicated to databases and tools for proteomics.
This site, among others, will be useful in Unit 5.
[www.expasy.ch/]
E. TIGR [The Institute for Genomic
Research] provides access to databases, tools, and links
of use for genomics, useful in Unit 2 and beyond.
http://www.tigr.org/
Part 2: Other Useful Databases
Ready for more exploration and browsing? If not,
take a break. The databases in this section are quite varied
and specific for different kinds of use.
As you did in Part 1, you should keep track of
characteristics of these sites, unique features of interest,
and so forth.
A. The next Unit is on genomics, so it is a good
idea to preview some of the resources available in this
area. The Human genome database can be accessed
through a variety of entry sites. Try the following:
http://gdbwww.gdb.org/
http://www.ncbi.nlm.nih.gov/Entrez/
At Entrez, you can browse a map of the human genome by
clicking on the link in the right-hand column. You can
access all 800 genomes currently deposited in the
database by clicking on "Genomes" in the main list. Spend
a little time to explore ways of accessing
information.
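The kind of query you type into the Entrez search box can also be
written as a URL. Here is a minimal sketch, assuming NCBI's
E-utilities esearch service and its db, term, and retmax parameters.
The helper names and the example keyword (subtilisin) are
illustrative choices, not something this exercise requires. The code
only builds the URLs and does not contact NCBI. Adding an [Organism]
field tag is one way to refine a broad search.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(keywords, organism=None):
    """Combine free-text keywords with an optional [Organism] field tag."""
    if organism:
        return f'({keywords}) AND "{organism}"[Organism]'
    return keywords


def build_esearch_url(db, term, retmax=20):
    """Return the URL for an Entrez search; no network request is made."""
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


# A broad search, then the same search restricted to one organism.
broad = build_esearch_url("nucleotide", build_term("subtilisin"))
narrow = build_esearch_url(
    "nucleotide", build_term("subtilisin", organism="Bacillus subtilis")
)
print(broad)
print(narrow)
```

Pasting either URL into a browser should return a list of matching
record identifiers. Comparing the counts for the broad and narrow
versions is a quick way to see how much a field tag refines a search.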
B. The GenoList at L'Institut Pasteur is
definitely worth visiting. Although still small, this site
has grown significantly in the last year. Care and planning
have gone into the creation of easy access.
http://genolist.pasteur.fr/
SubtiList is the oldest, created as a
relational database with an intuitive map interface. As
you can see by this unique map, there was an
international effort to sequence the genome of
Bacillus subtilis. Mouse over the map and watch
the link bar on the bottom of your screen as you move
around. Click somewhere to see what happens. If you are
interested in bacteria, you can browse the other
databases at your leisure. [I did note that they let
a yeast into the group, so maybe the taxonomic range will
increase in the future.]
C. For other means of access to genomes
on-line, check out the following:
http://wit.integratedgenomics.com/GOLD/
http://www.tigr.org/tdb/
You may or may not have discovered the genomic
database resources on your first visit to TIGR. If you
did, you can move on. If not, take a few minutes to see
what is available.
D. As an immunologist, I just have to put
in one example of a narrow focus site. The Kabat
Database of proteins of immunological interest is a
worthy example of special interest databases.
http://immuno.bme.nwu.edu/
If you have had a course in immunology or are taking
it now, you should find this of interest and possibly
useful in the future. If not, well, maybe you learned
something anyway.
E. Challenging and interesting areas undergoing
incredible growth and development in terms of bioinformatics
and computational biology are ecology and biodiversity.
Check out the Environmental & Biodiversity
interface at SDSC:
http://biodi.sdsc.edu/
Browse some of the introductions and articles here to
get a feel for what is available and what is being
developed.
Visit also the Biological Records Centre, which
provides ecology database resources specific for the
British Isles. This gives you a taste of what could be
available elsewhere, and you should get an appreciation
of what a tremendous undertaking it will be.
http://www.brc.ac.uk/
F. Finally, take a look at some resources for
metabolic pathways. We'll spend some time exploring
these further in Unit 6.
At Kyoto Encyclopedia of Genes and Genomes
[KEGG], check out KEGG's search and
computation tools:
http://www.genome.ad.jp/kegg/
For an introduction to KEGG, read the
following paper:
http://igs-server.cnrs-mrs.fr/~ogata/Paper/ogata98BioSys.html
At the University of Oxford, there is access to
Pathway, a database of inherited metabolic
diseases: http://oxmedinfo.jr2.ox.ac.uk/Pathway/Miscell/welcome.htm
Although not recently updated, this site does have
some interesting features.
Another good site is the What is There?
[WIT] database, part of the Argonne
Computational Biology Group:
http://wit.mcs.anl.gov/WIT2/
ERGO is another database of metabolic interest,
sponsored by Integrated Genomics, Inc. http://wit.integratedgenomics.com/IGwit/
The metabolic part of SoyBase, part of
USDA's Plant Genome Project, is worth some exploration
also:
http://cgsc.biology.yale.edu/metab.html
Summary Questions:
Try to limit your answers to 1-2 typed pages [12 pt
font].
1. As you explored the different sites in Parts 1
and 2, you likely found that you preferred some sites over
others. Keeping a critical eye out for what works for you
and what doesn't work as well can help in several ways. By
recognizing specific problems, you can often find
constructive solutions to making your navigation easier,
which you can share with others. When the opportunity
exists, you can give constructive feedback to the webmasters
of the sites.
a. In Part 1 you visited five interface
sites other than NCBI. Rank your top two choices of all
six sites. State the criteria you used in your ranking.
b. Now go back and rank your bottom choice of
the sites visited. State what you found to be problems
with this site. What would be needed to fix them?
c. In Part 2 you visited at least 13 database
sites. Rank your top three choices of all database sites
visited. State the criteria you used in your ranking.
d. Now go back and rank your bottom three
choices of the sites visited. State what you found to be
problems with these sites. What would be needed to fix
them?
2. Most people require a reasonable amount of time
to become comfortable with a given piece of software or a
complex web interface. There is generally some initial
discomfort when a web site is changed or software is
updated. In bioinformatics, one is often required to use
multiple tools to accomplish needed tasks, some local and
some web-based. For the sake of comfort and efficiency, many
will use only one interface or will keep the number of
interfaces to a minimum. However, it is useful to break out
once in a while to see what is out there and to discover if
there is anything that could contribute to the task at
hand.
a. As you moved from site to site, what
strategies did you use to quickly orient yourself?
b. What are your preferences for navigation
within a web site? Are these preferences supported at
most of the sites you visited? If so, how did they
contribute to your ability to familiarize yourself with
any given site?
Further exploration:
As you read and digest what you have just gone through,
go back and browse some of the sites in more detail. This
exercise is like one of those fast-paced package tours,
where you probably want to get out and leave the trail to
explore more independently. It also helps to do segments of
browsing to help solidify where you have been and what you
have been doing. The more time you can put into it, the more
you'll get back from your efforts.
Other sites to visit:
1. At the Southwest Biotechnology and
Informatics Center, there are thousands of link
resources, including the game Origin: Unknown,
which provides a fun way to learn bioinformatics:
http://www.nbif.org/
Now that you've had some experience with BLAST, give
Origin: Unknown a try. To play, just click on
"Games" under Education. I suggest doing the individual
challenges in order the first time, since they build in
complexity. Beware that you'll be fired if you screw up,
but don't let that stop you. Just recycle back in and
you'll find that all is forgiven and you are back at
work.
Check out access to different databases and available
tools.
2. A concise resource list at NOAA
is nicely organized. This isn't an interface like the
others, but it includes many of the sites which you have
visited so far, and many that you haven't yet, which is
why I've included it.
http://www.nwfsc.noaa.gov/bioinformatics.html