Scoring Rubric for Unit 2 (Genomics) Exercises

Rubric: One point for each complete, correct answer. Half a point for an incomplete or partially correct answer. Complete, correct answers are as follows:

Q1-1. What is DNA sequencing and how is it done?
A. DNA sequencing is the process of determining the exact order of the 3 billion chemical building blocks (called bases and abbreviated A, T, C, and G) that make up the DNA of the 24 different human chromosomes.

In the Sanger (chain-termination) method, the single-stranded DNA to be sequenced is "primed" for replication with a short complementary strand annealed at one end. The preparation is then divided into four batches. Each batch receives the four "usual" nucleotides plus a different replication-halting (chain-terminating) nucleotide. Replication proceeds until a terminating nucleotide is incorporated into the growing strand, at which point synthesis stops. Thus the "C" reaction produces new strands that end at positions corresponding to the G's in the strand being sequenced. Gel electrophoresis, with one lane per reaction mixture, then separates the replication products by length, and the sequence of the original single strand can be inferred from the resulting ladder of bands.

Alternatively, sequencers can replace the slab gel with many tiny capillary tubes that run the same electrophoretic separation (capillary array electrophoresis, CAE). DNA samples are loaded into a 96-capillary array; as the separated fragments pass through the capillaries, they are all irradiated at once with laser light. The resulting fluorescence is measured by a charge-coupled device (CCD) that acts as a simultaneous multichannel detector. Because the sample contains fragments of every length, one ending at each base, the bases are identified in order according to the time each fragment takes to reach the laser-detector region.

Q1-2. Describe in general terms the phases of a genome sequencing project.
A.
Selection of BAC clones for full sequencing (phase 0): Chromosomes, which range in size from 50 million to 250 million bases, must first be broken into fragments that are cloned as bacterial artificial chromosome (BAC) clones. Each BAC is then broken into smaller pieces that are subcloned into a DNA-sequencing vector (subcloning step). Each short piece from a BAC subclone library is used as a template to generate a set of fragments that differ in length from one another by a single base; these fragments are identified by automated equipment (sequencing step). BAC clones are selected for further work if they meet the required standards in this very-low-coverage, single-pass sequence analysis.

Draft-quality sequencing of selected BAC clones (htgs_draft, phases 1-2): Subclones are prepared and sequenced until each base is represented 4 to 5 times on average (4- to 5-fold coverage), and the reads are then assembled into DNA sequence contigs with computer software. At this stage there are usually multiple contigs, some of which may not be oriented.

Additional sequencing of draft BAC clones: More shotgun sequencing raises the coverage to 8 to 10 reads per base on average. Assembly then usually yields one or a small number of contigs (htgs_fulltop).

Finishing (htgs_phase3): A sequence finisher typically completes the fully topped-up clone, closing any remaining gaps and bringing all parts to the required standard of quality.

Physical mapping phase: Finally, the finished BAC clones are located on the chromosome using a combination of map-based and sequence-based genetic information.

Q1-3. What is the difference between draft sequence and finished sequence?
A. Finished sequence has been determined to an accuracy of at least 99.99% and has no gaps. Draft sequence has only been positioned along the physical map of the chromosome; it may still contain gaps, and some of its contigs may not be oriented.

Q1-4. Explain how a physical DNA map is different from a genetic map.
A. A physical DNA map is determined by ordering sequenced regions of DNA, so its distances are measured in base pairs. A genetic map is developed by determining the order of genetic "markers" through analysis of recombination frequency in genetic crosses or pedigrees, so its distances reflect recombination (centimorgans) rather than physical length.
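The chain-termination logic in Q1-1 can be illustrated with a short simulation. This is a minimal Python sketch, not a model of the real chemistry: it assumes a perfect reaction in which every possible terminated fragment is produced, counts fragment length from the end of the primer, and writes the template in the order the polymerase reads it (so the new strand is its base-by-base complement). The function names are hypothetical.

```python
# Base pairing used to go between the template and the newly synthesized strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sanger_reactions(template: str) -> dict[str, list[int]]:
    """Simulate the four chain-termination reactions on a single-stranded template.

    For each terminator base X, return the lengths of the new strands that stop
    when X is incorporated, i.e. wherever the template carries the complement of X.
    Assumes the primer is already annealed and synthesis starts at template[0].
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for position, template_base in enumerate(template):
        new_base = COMPLEMENT[template_base]
        # A strand terminating here has length position + 1 (primer not counted).
        lanes[new_base].append(position + 1)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the new strand by taking fragments from shortest to longest across all lanes."""
    fragments = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in fragments)


def infer_template(lanes: dict[str, list[int]]) -> str:
    """Recover the template as the base-by-base complement of the strand read off the gel."""
    return "".join(COMPLEMENT[base] for base in read_gel(lanes))
```

For example, the template GATTC gives a single 1-base fragment in the "C" lane, matching the G at the first template position, and reading all lanes from shortest to longest fragment gives CTAAG, whose complement is the original template.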
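The fold-coverage figures in Q1-2 follow from simple arithmetic: coverage is the total number of bases sequenced divided by the length of the target, so the number of reads needed is coverage times target length divided by read length. The sketch below also adds the idealized Lander-Waterman (Poisson) estimate that a fraction of about e^-C of the bases remains unsequenced at coverage C; that estimate is an assumption beyond the rubric and ignores cloning and sequencing bias. The 150,000-base BAC insert and 500-base read length are hypothetical values chosen only for illustration.

```python
import math


def reads_needed(coverage: float, target_length: int, read_length: int) -> int:
    """Reads required so that each base is sequenced `coverage` times on average.

    Coverage = total sequenced bases / target length, so
    reads = coverage * target_length / read_length, rounded up.
    """
    return math.ceil(coverage * target_length / read_length)


def expected_uncovered_fraction(coverage: float) -> float:
    """Idealized Poisson estimate of the fraction of bases never sequenced."""
    return math.exp(-coverage)


# Hypothetical values for illustration only.
BAC_LENGTH = 150_000
READ_LENGTH = 500

for fold in (4, 5, 8, 10):
    n_reads = reads_needed(fold, BAC_LENGTH, READ_LENGTH)
    unread = expected_uncovered_fraction(fold)
    print(f"{fold:>2}x: {n_reads} reads, ~{unread:.4%} of bases expected unsequenced")
```

Under these assumptions, 4- to 5-fold coverage still leaves roughly 0.7 to 1.8 percent of bases unread, which fits draft assemblies having several contigs, while 8- to 10-fold coverage cuts the expected unread fraction by more than an order of magnitude.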
Q1-5. List five terms you looked up in the glossary, or wished you had.
A. Any list of five terms is acceptable.

Q2-1. What are NCBI reference sequences and why are these designations necessary?
A. NCBI designates a single reference sequence to represent each specific segment of genomic DNA, each specific messenger RNA, and each specific protein. The RefSeq project also designates one reference for each completed genome, each complete chromosome, and each genomic sequence contig. These designations are needed to reduce redundancy in the databases, to which dozens of individual sequences of the same molecule may be submitted by different laboratories.

Q2-2. What output did you get from your submission to the CAP3 sequence assembler? Briefly explain how the consensus sequence was derived.
A. The output consists of several files, one of which shows the final alignment. The software uses forward-reverse constraints to correct assembly errors and link contigs, uses base quality values when computing read overlaps, building the multiple alignment of reads, and generating the consensus sequence, and automatically clips poor-quality regions at the 5' and 3' ends of reads.

Q2-3. What gene-finding software is available in the EMBOSS suite?
A.
Program   Description
getorf    Finds and extracts open reading frames (ORFs)
marscan   Finds MAR/SAR sites in nucleic sequences
plotorf   Plot potential open reading frames
showorf   Pretty output of DNA translations
sixpack   Display a DNA sequence with 6-frame translation and ORFs
syco      Synonymous codon usage Gribskov statistic plot
tcode     Fickett TESTCODE statistic to identify protein-coding DNA
wobble    Wobble base plot

Q2-4. What is the accession number of the contig containing the musddA5 sequence? Briefly explain how you tracked this down, or where you got stuck.
A. Contig: AL662875.14.1.255199 (explanation of how it was found: individual answers).

Q2-5. Did Grail predict any genes in the contig or BAC sequence you submitted? If so, briefly describe the result. If not, why not?
A. Individual answers.
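The role of base quality values in deriving a consensus (Q2-2) can be sketched as a quality-weighted vote over an existing alignment. This is a simplified illustration under stated assumptions, not CAP3's actual algorithm: it assumes the reads are already aligned to equal length, with '-' marking columns a read does not cover, and it simply picks the base with the highest summed quality in each column. The function name is hypothetical.

```python
from collections import defaultdict


def quality_weighted_consensus(reads: list[str], quals: list[list[int]]) -> str:
    """Call a consensus from reads that are already aligned column by column.

    reads: equal-length aligned strings; '-' marks a column the read does not cover.
    quals: per-position quality scores for each read (ignored where the read has '-').
    In each column the base with the highest summed quality wins; 'N' if no read covers it.
    """
    if not reads:
        return ""
    if len(reads) != len(quals):
        raise ValueError("need one quality list per read")
    width = len(reads[0])
    if any(len(r) != width or len(q) != width for r, q in zip(reads, quals)):
        raise ValueError("reads and quality lists must all have the same length")

    consensus = []
    for col in range(width):
        scores: defaultdict[str, int] = defaultdict(int)
        for read, q in zip(reads, quals):
            if read[col] != "-":
                scores[read[col]] += q[col]
        if scores:
            # Iterate in alphabetical order so ties resolve deterministically.
            consensus.append(max(sorted(scores), key=scores.get))
        else:
            consensus.append("N")
    return "".join(consensus)
```

For instance, if one read has a low-quality C where two other reads have high-quality G's in the same column, the summed scores favor G, which is the intuition behind weighting by base quality rather than counting reads.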
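The kind of search that getorf and sixpack perform (Q2-3) can be sketched as a six-frame scan for open reading frames: three frames on the given strand and three on its reverse complement. This is a simplified Python illustration, not the EMBOSS implementation: it assumes the standard genetic code, reports only ATG-to-stop ORFs, ignores ORFs that run off the end of the sequence, and uses a hypothetical default minimum length of 30 codons (stop codon included).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, to get the opposite strand 5' to 3'."""
    return seq.translate(COMPLEMENT_TABLE)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, str]]:
    """Return (strand, frame, orf) for ATG-to-stop ORFs in all six reading frames.

    frame is 0, 1 or 2, counted from the start of the forward sequence or of its
    reverse complement. Each ORF string includes its stop codon; ATGs nested
    inside an open ORF are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, s[start:i + 3]))
                    start = None
    return orfs
```

For example, find_orfs("ATGAAATAG", min_codons=3) reports one forward-strand ORF in frame 0, and passing the reverse complement of that sequence reports the same ORF on the "-" strand, which is why the scan must cover all six frames.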