Accession | Organism | Group
A38596 | Maize | plant
CAC05496 | Arabidopsis | plant
AAC08261 | Porphyra | algae
CAA76602 | Plasmodium | protozoa
AAF49409 | Drosophila | animal
CAB92626 | Neurospora | fungi
CAA61505 | Saccharomyces | fungi
BAB05447 | Bacillus | bacteria
AAA58014 | E. coli | bacteria
AAK94787 | Klebsiella | bacteria
CAB64595 | Nostoc | bacteria
AAG44102 | Staphylococcus | bacteria
2. In your log, remove any spaces that precede ">gi|xxx..." on the header line of each entry. [Don't disturb the sequence lines.] This is necessary when running Clustal, because any extra spaces will terminate the alignment for all entries beyond those spaces.
3. At this point, you have two options. Choose the
one you like. [You can always return to try another
option.]
a. You can use ClustalW at EBI. When you paste in your sequences, be careful to remove any spaces at the beginning of lines. You may leave a blank line between sequence entries. http://www2.ebi.ac.uk/clustalw/ or http://www.ebi.ac.uk/clustalw/
Explore the site. You can read about the windows by clicking on them. [For the next steps, see 4a.] Once you have an alignment, you can save it, or transfer it to another application or to Biology Workbench.
b. You can upload your sequences directly into Biology Workbench before aligning them. Be sure to check each sequence entry for inadvertent spaces at the beginning of lines. If you find any, remove them. Within Workbench, you can use ClustalW in either Protein Tools or Nucleotide Tools [see 4b], and then use other applications in Alignment Tools for analysis.
4. Follow the directions according to the option you selected in 3 above.
a. Paste your grouped FASTA sequences into the text box. For your first run, use the defaults. The alignments will take a few minutes. You may want to enter your e-mail address to retrieve the report. If your run fails, first check that the FASTA format is correct and that no lines begin with spaces. If a run seems to take too long, try "off-hours", keeping in mind that this is a European site, or try making your alignment request smaller. You can do this by selecting only 4-6 sequences. Alternatively, you may want to focus on just one region or domain of your sequences. In that case, you can select portions of the FASTA reports. After using the defaults, read about the settings in the support pages and try changing some of them. For report viewing, see 5 below.
Note: When running Clustal on a set of sequences, you may need to edit your sequences before you get reasonable alignments. It doesn't hurt to try a test run first. As you work through the following, consider what might be some of the causes of misalignments. This will be discussed in class after you have some results in hand.
b. Select the sequences you want to compare by checking the boxes. Choose ClustalW. Initially accept the defaults. On repeated runs, try changing some of the settings. [Go to the EBI site for documentation support.] Try running subgroups of sequences and try changing the order of sequences. You can change order by selecting a sequence and choosing a menu item, then return. The selected sequence is at the top of the list. You can easily scramble your list by randomly selecting and copying different sequences. You can also create edited sequences to select a region or to remove nonstandard characters. To save alignments, select "Import alignments". Then you can use the Alignment Tools. [See 6 below. You should skip 5.]
5. For option a only. [Option b, skip to 6 below.] Once you have the report, browse to see what you have. Click on Jalview for a graphical display. Wait for the calculations and color assignment to be complete before trying to navigate. For your convenience, consensus notations and colors used in Jalview are assigned as follows:
Consensus line notations:
* = identical or conserved residues in all sequences in the alignment
: = indicates conserved substitutions
. = indicates semi-conserved substitutions
Characteristics: | Amino acids:
red: small & hydrophobic R groups | AVFPMILW
blue: acidic | DE
magenta: basic | RHK
green: hydroxyl + X | STYHCNGQ
gray: other |
Symbols for amino acids
Compare the results of your different runs. Which
parameters did you change? What was the effect? Record for
future reference. Upload some alignments into Biology
Workbench.
6. At this point, everyone should have some aligned sequences in Biology Workbench. To check, select Alignment Tools after selecting the appropriate session. You should see blocks of sequences listed. If that is not true, go back and continue working on alignments and/or uploads until you do.
a. Use Boxshade and Textshade to easily view conserved and non-conserved regions. Note that these are similar, but not identical, to Jalview. These can be saved and used as graphic inserts in reports and manuscripts. Use one of these to browse your alignments and to make comparisons between your different alignments.
b. Make note of conserved regions.
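The leading-space cleanup described in steps 2 and 3 of Part A can also be scripted rather than done by hand. The following is a minimal Python sketch and not part of the original exercise. The function name clean_fasta, its all_lines switch, and the two example entries are hypothetical, and the sketch assumes your log is plain text in which every entry starts with a ">" header line.

```python
def clean_fasta(fasta_text, all_lines=False):
    """Strip leading spaces/tabs from FASTA header lines.

    By default only header lines (those whose first visible character
    is '>') are changed, so sequence lines are left undisturbed.
    Pass all_lines=True to strip leading whitespace from every line.
    Blank lines between entries are kept as they are.
    """
    cleaned = []
    for line in fasta_text.splitlines():
        stripped = line.lstrip(" \t")
        if all_lines or stripped.startswith(">"):
            cleaned.append(stripped)
        else:
            cleaned.append(line)
    return "\n".join(cleaned) + "\n"


# Hypothetical two-entry log with stray leading spaces.
raw_log = (
    "  >gi|0000001|example_A made-up entry\n"
    "MKTAYIAKQR\n"
    "\n"
    " >gi|0000002|example_B made-up entry\n"
    "  MKTAYLAKQR\n"
)

if __name__ == "__main__":
    print(clean_fasta(raw_log))                  # headers only
    print(clean_fasta(raw_log, all_lines=True))  # every line
```

Run the default mode first if you want to follow the "don't disturb the sequence lines" advice, and use all_lines=True only when you have checked that the sequence lines themselves carry stray spaces.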
Part B: Tree building
1. Neighbor joining [NJ] is a clustering method that builds a tree from pairwise distances. It is the favored distance method because, unlike the arithmetic-mean approach [UPGMA], it does not assume equal rates of evolution. In Biology Workbench's Alignment Tools, try the following:
a. Use Clustaldist to obtain a set of distance calculations.
b. Choose either DNAdist or Protdist, depending on whether you have nucleotide or protein alignments. Run the same alignments again to obtain a second set of distance results. How do these results differ from the first set? Which application appears to be more sensitive to differences?
c. Sketch a tree based on distance calculations obtained from Clustaldist. Sketch another tree based on the calculation results obtained from DNAdist or Protdist.
d. Use Drawtree to produce a PHYLIP unrooted tree. Compare this tree to your sketches.
2. Parsimony [also known as max pars, for maximum parsimony, and as MP] is a method that looks for the tree requiring the minimum number of changes to explain the data. It works directly from the aligned sequences rather than from a numerical distance, as NJ does.
Use DNAPars or ProtPars to generate a tree which maximizes parsimony. How many calculation steps were required to obtain the tree? Do different alignment runs affect the outcome of the final tree? If so, how?
3. Next try using Drawgram, a PHYLIP rooted tree tool. This allows you to build a variety of tree types from the same alignment. You can generate a phenogram, based on neighbor joining, which can then be compared to your Drawtree result. You can generate a cladogram, based on parsimony, which can then be compared to both the phenogram and to the tree obtained using DNAPars or ProtPars. If you are feeling adventurous, try out some of the other tree types.
4. OK, now you are at the point where computational intensity increases considerably. To try running maximum likelihood [ML] or Bayesian analysis on your alignments, it is recommended that you download suitable software, along with any server-stored alignments of interest and run them on your PC. This is required if you want to examine protein alignments.
For nucleotide alignments, you can use WebPHYLIP's DNAML to do maximum likelihood. This is a good site to explore for other programs within PHYLIP. Try it now, or come back while you are working on the project:
For recommended downloads and other sites to explore, go to Further Exploration below.
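To get a feel for what a distance calculation in step 1 of Part B produces before sketching a tree, it can help to compute the simplest possible distance yourself. The sketch below computes the uncorrected p-distance, that is, the fraction of aligned, non-gap positions at which two sequences differ. This is an illustrative assumption only: Clustaldist, DNAdist, and Protdist apply their own models and corrections, so their values will generally differ from these. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of compared positions that differ (gapped columns skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared


def distance_matrix(alignment):
    """Return a nested dict of pairwise p-distances for an alignment."""
    names = list(alignment)
    matrix = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(alignment[a], alignment[b])
        matrix[a][b] = d
        matrix[b][a] = d
    return matrix


# Made-up aligned sequences, all the same length.
toy_alignment = {
    "seq1": "MKTAYIAK-R",
    "seq2": "MKTAYLAKQR",
    "seq3": "MRTGYLAKQR",
}

if __name__ == "__main__":
    matrix = distance_matrix(toy_alignment)
    for name, row in matrix.items():
        print(name, {other: round(d, 3) for other, d in row.items()})
```

In a hand-sketched tree, the pair with the smallest distance is a natural first pair to group, which gives you something concrete to compare with the Drawtree output.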
Try to limit your answers to 2-3 typed pages [12 pt font]. This length should be sufficient for your comments and any appropriate copy/pasted examples. [You need not retype or copy/paste the questions as part of your responses.]
1. Summarize one of your MSA results. Give the following information:
a. Which option did you use? ClustalW at EBI or ClustalW in Workbench?
b. Were mutations evenly distributed or were there regions relatively free of them? What might be the reason for conserved regions? How could you test this?
2. Summarize the results obtained using distance
calculations. How did your sketched trees compare to the
computer-generated tree(s) in Drawtree?
3. Summarize the results obtained using parsimony.
How did the maximum parsimony tree(s) compare with the
NJ-based trees?
4. Summarize your explorations of Drawgram,
especially between creating phenograms and cladograms. After
exploring the other types of trees, which one did you like
best overall? Why? In considering the available choices for
a tree displayed in a publication aimed at a general
scientific readership, which type would you choose? Why?
[If you enjoyed playing with trees, try out Tree View,
which you can download. See below.]
5. Optional: If you ran maximum likelihood [ML] on your alignments, summarize how the results compared to NJ and MP.
1. More information on PHYLIP can be found at these related sites:
http://evolution.genetics.washington.edu/phylip.html http://evolution.genetics.washington.edu/phylip/phylipweb.html
2. A wide variety of phylogenetic software can be downloaded from collections. The following is very easy to use:
http://evolution.genetics.washington.edu/phylip/software.html
The cross-reference list is useful, although you should check elsewhere for possible updates:
http://evolution.genetics.washington.edu/phylip/software.xref.html
3. PUZZLE is a good program to download for maximum likelihood analysis of nucleotide and protein sequence alignments. This program has several cool features worth exploring:
http://www.tree-puzzle.de/
Warning: While running ML, your computer will be dedicated to running this program. Forget playing games or checking your mail. Go for a run or take a nap if you are running a large set of sequences.
4. MrBayes is freeware to download for Bayesian analysis:
http://morphbank.ebc.uu.se/mrbayes/
Warning: While running a Bayesian analysis, your computer will be dedicated to running this program. Although not as bad as ML for large sets, it is slower than ML for smaller sets [roughly, fewer than 40 sequences]. For exploration purposes, use a small set, and then go enjoy a leisurely cup of hot chocolate.
5. Tree View, software download for drawing quality trees using a variety of file formats:
http://taxonomy.zoology.gla.ac.uk/rod/treeview.html
6. Another tree drawing program is Phylodendron. It can be downloaded from the U. Washington site in 2 above, or it can be accessed from a server:
http://iubio.bio.indiana.edu/treeapp/treeprint-form.html
7. Additional access to lots of cool applications:
http://bioweb.pasteur.fr/intro-uk.html
Access to good documentation on many applications: [left-hand frame- index] http://www.molbiol.ox.ac.uk/