Scoring Rubric for Unit 2 (Genomics) Exercises

Rubric:  One point for each complete, correct answer.  One half point of 
partial credit for an incomplete or partially correct answer.

Complete, correct answers are as follows:

Q1-1.	What is DNA sequencing and how is it done?
A.	The process of determining the exact order of the 3 billion chemical 
building blocks (called bases and abbreviated A, T, C, and G) that make up 
the DNA of the 24 different human chromosomes.  In the Sanger sequencing 
method, the single-stranded DNA to be sequenced is "primed" for replication 
with a short complementary strand at one end. This preparation is then divided 
into four batches, and each is treated with a different replication-halting 
nucleotide, together with the four "usual" 
nucleotides. Each replication reaction then proceeds until a reaction-terminating 
nucleotide is incorporated into the growing strand, whereupon replication stops. 
Thus, the "C" reaction produces new strands that terminate at positions 
corresponding to the G's in the strand being sequenced.
	Gel electrophoresis -- one lane per reaction mixture -- is then used to 
separate the replication products, from which the sequence of the original 
single strand can be inferred. Alternatively, gel-based sequencers can use 
multiple tiny (capillary) tubes to run standard electrophoretic separations 
(Capillary Array Electrophoresis, CAE). DNA samples are introduced into the 
96-capillary array; as the separated fragments pass through the capillaries, 
they are irradiated all at once with laser light. Fluorescence is measured by 
a charge-coupled device that acts as a simultaneous multichannel detector.  
Because every fragment length exists in the sample, bases are identified in 
order according to the time required for them to reach the laser-detector region.
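
The chain-termination logic in this answer can be illustrated with a small 
sketch. It is a simplified model, not a description of any real instrument: 
the template string is assumed to be written in the order the polymerase reads 
it, the primer length is ignored, and the names sanger_fragment_lengths and 
read_gel are hypothetical helpers introduced only for this example.

```python
# Watson-Crick pairing used to build the new strand from the template.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def sanger_fragment_lengths(template):
    """Return, for each of the four reactions, the fragment lengths produced.

    The new strand is complementary to the template, so the "C" reaction
    terminates wherever the template carries a G.
    """
    lanes = {base: [] for base in "ACGT"}
    for position, template_base in enumerate(template, start=1):
        terminating_base = COMPLEMENT[template_base]
        lanes[terminating_base].append(position)
    return lanes

def read_gel(lanes):
    """Read the four lanes from shortest to longest fragment."""
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            base_at_length[length] = base
    return "".join(base_at_length[n] for n in sorted(base_at_length))

template = "GATTACA"  # hypothetical example template
lanes = sanger_fragment_lengths(template)
new_strand = read_gel(lanes)
inferred_template = "".join(COMPLEMENT[b] for b in new_strand)
```

Reading the lanes by increasing fragment length gives the newly synthesized 
strand; complementing it recovers the strand that was sequenced.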

Q1-2.	Describe in general terms the phases of a genome sequencing project.
A.	Selection of BAC clones for full sequencing (phase 0):  Chromosomes, which range 
in size from 50 million to 250 million bases, must first be fragmented into bacterial 
artificial chromosome (BAC) clones, which are then broken into subclones in a 
DNA-sequencing vector (subcloning step). Each short piece from a BAC subclone 
library is used as a template to generate a set of fragments, each differing in 
length from the next by a single base, which automated equipment then identifies 
(sequencing step).  Useful BAC clones are selected if they meet required standards 
based on this very-low-coverage, single-pass sequence analysis.
	Draft-quality sequencing of selected BAC clones (htgs_draft phase 1-2):  
Subclones are prepared and sequenced so that each base is represented 4 to 5 
times on average; DNA sequence contigs are then assembled using computer software.  
At this stage there are usually multiple contigs, some of which may not be 
oriented.
	Additional sequencing of draft BAC clones:  More shotgun sequencing is done until 
each base has been determined 8 to 10 times on average.  Assembly usually yields one or a small 
number of contigs (htgs_fulltop).  The fully topped-up clone is typically finished 
by a sequence finisher who eliminates any remaining gaps and brings all parts to the 
required standard of quality (htgs_phase3).
	Finally, the finished BAC clones are located on the chromosome using a 
combination of map-based and sequence-based genetic information (physical mapping phase).
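
The coverage figures in this answer (4 to 5 times for draft, 8 to 10 times 
for top-up) can be turned into rough read counts. The sketch below is only an 
illustration: the 150,000-base clone length and 500-base read length are 
assumed values, not taken from the exercise, and the uncovered-fraction 
estimate assumes idealized, randomly placed reads (a Poisson approximation).

```python
import math

def reads_needed(clone_length, read_length, coverage):
    """Number of reads required to represent each base `coverage` times on average."""
    return math.ceil(coverage * clone_length / read_length)

def expected_uncovered_fraction(coverage):
    """Fraction of bases expected to be missed if reads fall at random positions."""
    return math.exp(-coverage)

CLONE_LENGTH = 150_000  # assumed BAC insert size, in bases
READ_LENGTH = 500       # assumed usable read length, in bases

draft_reads = reads_needed(CLONE_LENGTH, READ_LENGTH, 5)    # draft coverage
topup_reads = reads_needed(CLONE_LENGTH, READ_LENGTH, 10)   # top-up coverage
```

Under these assumptions, doubling the coverage doubles the number of reads 
while sharply reducing the fraction of bases expected to remain unsequenced, 
which is consistent with assembly yielding fewer contigs after top-up.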

Q1-3.	What is the difference between draft sequence and finished sequence?
A.	Finished sequence has been determined to an accuracy of at least 99.99% 
and has no gaps.  Draft sequence has only been positioned along the physical 
map of the chromosome.
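
To make the 99.99% figure concrete, the arithmetic can be sketched as follows. 
The 150,000-base clone length is an assumed example value, and the quality 
score shown is the standard Phred-style conversion of an error rate, which the 
exercise itself does not mention.

```python
import math

def expected_errors(length, error_rate):
    """Expected number of wrong bases in a sequence of the given length."""
    return length * error_rate

def phred_quality(error_rate):
    """Phred-style quality score: Q = -10 * log10(error rate)."""
    return -10 * math.log10(error_rate)

FINISHED_ERROR_RATE = 1e-4  # 99.99% accuracy means at most 1 error in 10,000 bases

errors_per_bac = expected_errors(150_000, FINISHED_ERROR_RATE)  # assumed clone size
finished_quality = phred_quality(FINISHED_ERROR_RATE)
```

An accuracy of 99.99% therefore still allows on the order of 15 errors in a 
clone of this assumed size.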

Q1-4.	Explain how a physical DNA map is different from a genetic map.
A.	A physical DNA map is determined by ordering sequenced regions of DNA.  A genetic 
map is developed by determining the order of genetic "markers" by analysis of 
recombination frequency in genetic crosses or pedigrees.
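
How marker order follows from recombination frequency can be shown with a 
short sketch. The marker names and counts below are invented for illustration, 
and order_three_markers is a hypothetical helper that uses only the simplest 
rule: the pair of markers that recombines most often lies at the two ends.

```python
def recombination_frequency(recombinants, total_offspring):
    """Fraction of offspring showing a recombinant combination of two markers."""
    return recombinants / total_offspring

def order_three_markers(pair_frequencies):
    """Order three markers given the recombination frequency of each pair.

    pair_frequencies maps a frozenset of two marker names to a frequency.
    """
    outer_pair = max(pair_frequencies, key=pair_frequencies.get)
    all_markers = set().union(*pair_frequencies)
    middle = (all_markers - outer_pair).pop()
    left, right = sorted(outer_pair)
    return [left, middle, right]

# Hypothetical cross scored in 200 offspring.
frequencies = {
    frozenset({"A", "B"}): recombination_frequency(10, 200),
    frozenset({"B", "C"}): recombination_frequency(20, 200),
    frozenset({"A", "C"}): recombination_frequency(28, 200),
}
marker_order = order_three_markers(frequencies)
```

Note that the result is an order and a relative spacing only; unlike a 
physical map, it says nothing directly about distance in bases.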
	
Q1-5.	List five terms you looked up in the glossary, or wished you had.
A.	Any list of five terms is acceptable.

Q2-1.	What are NCBI reference sequences and why are these designations necessary?
A.	NCBI designates a single reference sequence to represent each specific segment 
of genomic DNA, each specific messenger RNA, and each specific protein.  The RefSeq 
project also designates one reference for each completed genome, each complete 
chromosome, and each genomic sequence contig.  These designations are needed to reduce 
redundancy in the databases, where dozens of individual sequences for the same 
molecule may be submitted by different laboratories.

Q2-2. What output did you get from your submission to the CAP3 sequence assembler? 
Briefly explain how the consensus sequence was derived.
A.	The output consists of several files, one of which shows the final alignment.  
The software uses forward-reverse constraints to correct assembly errors and link 
contigs, calculates base quality values in alignment of sequence reads, and 
automatically clips 5' and 3' poor regions of reads. 
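
The role of base quality values in a consensus can be illustrated with a toy 
example. This is not CAP3's actual algorithm; it is a minimal sketch that 
assumes the reads are already aligned (with "-" marking positions a read does 
not cover) and simply picks, in each column, the base with the highest summed 
quality. The reads and quality values are invented.

```python
from collections import defaultdict

def weighted_consensus(aligned_reads):
    """Pick the base with the largest total quality in each alignment column.

    aligned_reads is a list of (bases, qualities) pairs of equal length.
    """
    length = len(aligned_reads[0][0])
    consensus = []
    for column in range(length):
        votes = defaultdict(int)
        for bases, qualities in aligned_reads:
            base = bases[column]
            if base != "-":  # "-" means this read does not cover the column
                votes[base] += qualities[column]
        consensus.append(max(votes, key=votes.get) if votes else "N")
    return "".join(consensus)

# Three hypothetical overlapping reads; the first has a low-quality A in column 5.
reads = [
    ("ACGTAC--", [30, 30, 30, 30, 10, 30, 0, 0]),
    ("-CGTTCGA", [0, 30, 30, 30, 40, 30, 30, 30]),
    ("--GTTCGA", [0, 0, 20, 20, 20, 20, 20, 20]),
]
consensus = weighted_consensus(reads)
```

Here the low-quality A in the first read is outvoted by two higher-quality T 
calls, so the consensus carries a T at that position.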

Q2-3. What gene-finding software is available in the EMBOSS suite?
A.	
Program    Description
getorf     Finds and extracts open reading frames (ORFs)
marscan    Finds MAR/SAR sites in nucleic sequences
plotorf    Plot potential open reading frames
showorf    Pretty output of DNA translations
sixpack    Display a DNA sequence with 6-frame translation and ORFs
syco       Synonymous codon usage Gribskov statistic plot
tcode      Fickett TESTCODE statistic to identify protein-coding DNA
wobble     Wobble base plot
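
As background for the ORF-based programs in this list, the idea of an open 
reading frame can be sketched in a few lines. This is not how getorf or any 
other EMBOSS program is implemented; it is a simplified illustration that 
scans only the three forward frames and treats an ORF as a run from ATG to 
the next in-frame stop codon. The function name find_orfs, the min_codons 
parameter, and the test sequence are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(sequence, min_codons=3):
    """Return (frame, start, end, orf_sequence) for each forward-strand ORF.

    start and end are 0-based, end-exclusive positions; the stop codon is included.
    min_codons counts the codons before the stop codon.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3, sequence[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs

orfs = find_orfs("CCATGAAATTTGGGTAACC")  # invented test sequence
```

A fuller treatment would also scan the reverse complement, giving the six 
frames that sixpack displays.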

Q2-4. What is the accession number of the contig containing the musddA5 sequence? 
Briefly explain how you tracked this down, or where you got stuck.
A.	Contig: AL662875.14.1.255199  Individual answers.

Q2-5. Did Grail predict any genes in the contig or BAC sequence you submitted? 
If so, briefly describe the result. If not, why not?
A.	Individual answers.