Exercise 2: Multiple Sequence Alignment

Accession   Organism         Group
A38596      Maize            plant
CAC05496    Arabidopsis      plant
AAC08261    Porphyra         algae
CAA76602    Plasmodium       protozoa
AAF49409    Drosophila       animal
CAB92626    Neurospora       fungi
CAA61505    Saccharomyces    fungi
BAB05447    Bacillus         bacteria
AAA58014    E. coli          bacteria
AAK94787    Klebsiella       bacteria
CAB64595    Nostoc           bacteria
AAG44102    Staphylococcus   bacteria

2. In your log, remove any spaces on the line preceding ">gi|xxx..." for each entry. [Don't disturb the sequence lines.] This is necessary when running Clustal, because extra spaces will terminate the alignment for all entries beyond those spaces.
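If your log is long, this cleanup can also be scripted. The sketch below is a hypothetical helper (it is not part of Clustal or Biology Workbench) and assumes your log has already been reduced to FASTA entries. It removes leading spaces and tabs from every line and empties lines that contain only whitespace, without changing any residue characters. Compare the result with your original log before using it.

```python
def clean_fasta_text(text: str) -> str:
    """Strip leading spaces/tabs from every line of a FASTA log.

    Lines that contain only whitespace become truly empty, so blank
    separator lines between entries are kept but hold no stray spaces.
    Residue characters themselves are never changed.
    """
    cleaned = [line.lstrip(" \t") for line in text.splitlines()]
    return "\n".join(cleaned) + "\n"
```

For example, read your saved log into a string, pass it to `clean_fasta_text`, and paste the returned text into the ClustalW text box.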
3. At this point, you have two options. Choose the one you want to try. [You can always return to try another option.]
a. You can use a server-based version of ClustalW at EBI: http://www2.ebi.ac.uk/clustalw/ or http://www.ebi.ac.uk/clustalw/. When you paste in your sequences, be careful to remove any spaces at the beginning of lines. You may leave a blank line between sequence entries.
Explore the site. Note the default settings and what you are able to change. You can read about the windows by clicking on them. [For the next steps, see 4a.] Once you have an alignment, you can save it, or transfer it to another application or to Biology Workbench.
b. You can upload your sequences directly into Biology Workbench before aligning them. Be sure to check each sequence entry for inadvertent spaces at the beginning of lines. If you find any, remove them. Within Workbench, you can use ClustalW in either Protein Tools or Nucleotide Tools [see 4b], and then use other applications in Alignment Tools for analysis.
4. Follow the directions according to the option you selected in 3 above.
Note: When running Clustal on a set of sequences, you may need to edit your sequences before you get reasonable alignments. It doesn't hurt to try a test run first. As you work through the following, consider what might cause misalignments. This will be discussed in class after you have some results in hand.
a. Paste your grouped FASTA sequences into the text box. For your first run, use the defaults. The alignments will take a few minutes. You may want to enter your e-mail address to retrieve the report. If your run fails, first check that the FASTA format is correct and that no lines begin with spaces. If a run seems to take too long, try running it during off-hours, keeping in mind that this is a European site, or make your alignment request smaller. You can do this by selecting only 4-6 sequences. Alternatively, you may want to focus on just one region or domain of your sequences; in that case, you can select portions of the FASTA reports. After using the defaults, try changing some of the settings after reading about them in the support pages. For report viewing, see 5 below.
b. Select the sequences you want to compare by checking the boxes. Choose ClustalW. Initially accept the defaults. On repeated runs, try changing some of the settings. [Go to the EBI site for documentation support.] Try running subgroups of sequences. You can also create edited sequences to select a region or to remove nonstandard characters. To save alignments, select "Import alignments". Then you can use the Alignment Tools. [See 6 below. You should skip 5.]
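Selecting a subgroup of 4-6 sequences, cutting out one region, or removing nonstandard characters can all be done by hand, but a small script makes repeated runs easier to reproduce. The functions below are a hypothetical sketch, not part of either ClustalW or Biology Workbench. They assume protein sequences (the 20 standard one-letter codes; nucleotide sequences would need a different alphabet), headers that start with ">", and 1-based, inclusive region coordinates. Removing characters shifts residue numbering, so trim regions before removing characters, or record the change.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: protein sequences only


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank separator lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def select_records(records, keywords):
    """Keep records whose header contains any of the given keywords."""
    return [(h, s) for h, s in records if any(k in h for k in keywords)]


def trim_region(records, start, end):
    """Keep residues start..end (1-based, inclusive) of every sequence."""
    return [(h, s[start - 1:end]) for h, s in records]


def remove_nonstandard(records):
    """Uppercase each sequence and drop characters outside STANDARD_AA."""
    return [(h, "".join(c for c in s.upper() if c in STANDARD_AA))
            for h, s in records]


def to_fasta(records, width=60):
    """Write records back out as FASTA text, `width` residues per line."""
    lines = []
    for h, s in records:
        lines.append(h)
        lines.extend(s[i:i + width] for i in range(0, len(s), width))
    return "\n".join(lines) + "\n"
```

A typical run would parse your grouped FASTA text, call `select_records` with a few organism names, optionally call `trim_region`, and paste the output of `to_fasta` into ClustalW.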
5. For option a above only. [Option b, skip to 6 below.] Once you have the report, browse to see what you have. Click on Jalview for a graphical display. Wait for the calculations and color assignment to be complete before trying to navigate. For your convenience, consensus notations and colors used in Jalview are assigned as follows:
Consensus line notations:
* = identical or conserved residues in all sequences in the alignment
: = conserved substitutions
. = semi-conserved substitutions

Amino acid colors:
Color     Characteristics                              Amino acids
red       small & hydrophobic R groups                 AVFPMILW
blue      acidic                                       DE
magenta   basic (except H)                             RK
green     hydroxyl + sulfhydryl + amine + G            STYHCNGQ
gray      other                                        all others

Symbols for amino acids
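To see how the consensus line can be derived from an alignment, here is a rough re-implementation. It is a sketch, not Clustal's own code: the "strong" and "weak" residue groups are assumed to match the lists given in the ClustalW/Clustal Omega documentation, so verify them there before relying on the output. A column gets * if every residue is identical, : if all residues fall into one strong group, and . if they fall into one weak group; in this sketch, any column containing a gap ("-") gets no symbol.

```python
# Assumed residue groups (check against the Clustal documentation).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def consensus_symbol(column: str) -> str:
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "        # gapped columns get no symbol in this sketch
    if len(residues) == 1:
        return "*"        # identical in every sequence
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"        # conserved substitution
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."        # semi-conserved substitution
    return " "


def consensus_line(aligned):
    """Build the consensus line for a list of equal-length aligned rows."""
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        raise ValueError("all aligned rows must have the same length")
    return "".join(
        consensus_symbol("".join(row[i] for row in aligned))
        for i in range(length)
    )
```

Comparing this output with the consensus line in your Clustal report is a quick way to check that you are reading the symbols correctly.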
Compare the results of your different runs. Which parameters did you change? What was the effect? Record your observations for future reference. Upload some alignments into Biology Workbench.
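One way to make the comparison between runs quantitative is to count how many residue pairs two alignments place in the same column, a sum-of-pairs style agreement score. This sketch assumes both alignments contain the same sequences in the same order, use "-" for gaps, and that you have copied the aligned rows out of the Clustal report as plain strings; the function names are hypothetical. A score of 1.0 means the second run reproduces every residue pairing of the first.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    Each residue is identified as (sequence index, residue index), so the
    result depends only on which residues line up, not on gap placement.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    counters = [0] * len(alignment)
    pairs = set()
    for col in range(length):
        in_column = []
        for i, row in enumerate(alignment):
            if row[col] != "-":
                in_column.append((i, counters[i]))
                counters[i] += 1
        pairs.update(combinations(in_column, 2))
    return pairs


def ungapped(alignment):
    """Return the original sequences with all gap characters removed."""
    return [row.replace("-", "") for row in alignment]


def pair_agreement(reference, other):
    """Fraction of the reference's aligned residue pairs also found in `other`."""
    if ungapped(reference) != ungapped(other):
        raise ValueError("alignments must contain the same sequences in the same order")
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(other)) / len(ref_pairs)
```

For example, `pair_agreement(default_run, changed_run)` (hypothetical variable names) gives one number you can record next to the parameters you changed.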
6. At this point, everyone should have some aligned sequences in Biology Workbench. To check, select Alignment Tools after selecting the appropriate session. You should see blocks of sequences listed. If that is not true, go back and continue working on alignments and/or uploads until you do.
a. Use Boxshade and Textshade to view conserved and non-conserved regions easily. Note that these are similar, but not identical, to Jalview. Their output can be saved and used as graphic inserts in reports and manuscripts. Use one of these to browse your alignments and to compare your different alignments.
b. Make note of conserved regions. Hypothesize why they are conserved. How can you test your hypothesis?
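To list conserved regions systematically, you can scan the consensus line for runs of conserved columns. This is a hypothetical helper. It assumes you have joined the consensus-line segments from each block of the Clustal report into one string that starts at alignment column 1 (that is, without the sequence-name margin). By default only fully identical columns (*) count; pass `symbols="*:"` to also accept conserved substitutions.

```python
import re


def conserved_regions(consensus, min_length=5, symbols="*"):
    """Return 1-based (start, end) column ranges of conserved runs.

    A run is a stretch of at least `min_length` consecutive columns whose
    consensus symbol is one of the characters in `symbols`.
    """
    pattern = "[" + re.escape(symbols) + "]{%d,}" % min_length
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, consensus)]
```

The column ranges can then be mapped back to residue positions in a sequence of interest when you plan how to test your hypothesis.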
7. You have tried using different combinations of subgroups of sequences as a way to cross-check your alignments. What else can you do? You can compare selected pairwise alignments with your MSAs. Use either BLAST2 [BL2SEQ in Biology Workbench] or ALIGN [in Workbench under Nucleotide Tools and Protein Tools] to compare pairs of sequences. Try both pairs that appear to be closely related and pairs that are distantly related. Do your alignments agree?
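To put numbers on "closely related" versus "distantly related", you can compute percent identity for a pair of rows taken from your MSA and set it beside the identity that BLAST2 or ALIGN reports for the same pair. Percent identity has several definitions; this sketch divides identical positions by the number of columns where neither sequence has a gap, which may not be the denominator those programs use. The function names are hypothetical.

```python
def pairwise_from_msa(row_a, row_b):
    """Project two MSA rows onto a pairwise alignment.

    Columns where both rows have a gap are dropped, since they only
    exist to accommodate other sequences in the MSA.
    """
    if len(row_a) != len(row_b):
        raise ValueError("MSA rows must have the same length")
    kept = [(a, b) for a, b in zip(row_a, row_b) if not (a == "-" and b == "-")]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)


def percent_identity(row_a, row_b):
    """Percent identical residues over columns where neither row has a gap."""
    a, b = pairwise_from_msa(row_a, row_b)
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    same = sum(x.upper() == y.upper() for x, y in aligned)
    return 100.0 * same / len(aligned)
```

If the projected pair from the MSA differs noticeably from the pairwise alignment of the same two sequences, that region is worth a closer look.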
Try to limit your answers to 1-2 pages [12 pt font]. This length should be sufficient for your comments and any appropriate copied/pasted examples. [You need not retype or copy/paste the questions as part of your responses.]
Summarize one of your MSA results. Give the following information:
a. What group of protein or nucleotide sequences did you select?
b. Were mutations evenly distributed or were there regions relatively free of them? What might be the reason for conserved regions? How could you test this?
c. What differences did you notice when you changed the matrices selected for alignment?
d. Did running only selected subgroups of sequences affect the alignment results? If so, how?
e. How did pairwise alignments compare to the multiple alignment?