
The Oxytricha trifallax mitochondrial genome

2012, Genome biology and evolution

The Oxytricha trifallax mitochondrial genome contains the largest sequenced ciliate mitochondrial chromosome (~70 kb) plus a ~5 kb linear plasmid bearing mitochondrial telomeres. We identify two new ciliate split genes (rps3 and nad2) as well as four new mitochondrial genes (ribosomal small subunit protein genes: rps2, rps7, rps8 and rps10), previously undetected in ciliates due to their extreme divergence. The increased size of the Oxytricha mitochondrial genome relative to other ciliates is primarily a consequence of terminal expansions, rather than the retention of ancestral mitochondrial genes. Successive segmental duplications, visible in one of the two Oxytricha mitochondrial subterminal regions, appear to have contributed to the genome expansion. Consistent with pseudogene formation and decay, the subtermini possess shorter, more loosely packed ORFs than the remainder of the genome. The mitochondrial plasmid shares a 251 bp region with 82% identity to the mitochondrial chromosome, suggesting that it most likely integrated into the chromosome at least once. This region on the chromosome is also close to the end of the most terminal member of a series of duplications, hinting at a possible association between the plasmid and the duplications. The presence of mitochondrial telomeres on the mitochondrial plasmid suggests that such plasmids may be a vehicle for lateral transfer of telomeric sequences between mitochondrial genomes. We conjecture that the extreme divergence observed in ciliate mitochondrial genomes may be due, in part, to repeated invasions by relatively error-prone DNA polymerase-bearing mobile elements.

Genome Biology and Evolution Advance Access published December 16, 2011. doi:10.1093/gbe/evr136

Authors and Affiliations: Estienne C. Swart1, Mariusz Nowacki1§, Justine Shum1, Heather Stiles1, Brian P. Higgins1, Thomas G. Doak2, Klaas Schotanus1, Vincent J. Magrini3, Patrick Minx3, Elaine R. Mardis3, Laura F. Landweber1

1. Department of Ecology and Evolutionary Biology, Princeton University, Princeton NJ 08544
2. Department of Biology, University of Indiana, Bloomington IN 47405
3. Genome Sequencing Center, Washington University School of Medicine, St Louis MO 63108
§ Current address: Institute of Cell Biology, University of Bern, Switzerland
* Author for correspondence: Prof. Laura Landweber, Department of Ecology and Evolutionary Biology, Princeton University, NJ 08544, USA, tel. 609-258-1947, lfl@princeton.edu

© The Author(s) 2011. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Keywords: split genes, segmental duplication, genome expansion, linear mitochondrial plasmid, mobile elements, extreme mitochondrial divergences

Introduction

While ciliates are well known for their dimorphic nuclear genomes (macronuclear and micronuclear), they also possess distinctive genomes in their mitochondria. The Paramecium and Tetrahymena mitochondrial genomes were among the first confirmed to be linear and to have their telomeric sequences identified (Suyama and Miura 1968; Goddard and Cummings 1975; Morin and Cech 1986; Morin and Cech 1988). Ciliate mitochondrial genomes are both gene-rich and relatively large (20-60 kb) (Gray et al. 1998), though many mitochondrial genes remain unclassified (Pritchard et al. 1990; Burger et al. 2000; Brunk et al. 2003; Moradian et al. 2007), in part due to their extreme divergences from other eukaryotic mitochondrial genomes (Pritchard et al. 1990; Burger et al. 2000; Moradian et al. 2007). Split ribosomal RNA and nad1 genes (Seilhamer et al. 1984; Schnare et al. 1986; Heinonen et al. 1987; Pritchard et al. 1990; Schnare et al. 1995; Burger et al. 2000) were also discovered in ciliate mitochondria.

Some anaerobic ciliates contain hydrogen-producing organelles, or hydrogenosomes, that may derive from mitochondria, and the ciliate Nyctotherus (Armophorea) has a partially sequenced hydrogenosome genome (Akhmanova et al. 1998; Boxma et al. 2005). Nyctotherus may be more closely related to Euplotes and Oxytricha (Spirotrichea) than to Paramecium and Tetrahymena (Oligohymenophorea), though this relationship still lacks convincing phylogenetic support (Ricard et al. 2008; de Graaf et al. 2009). Comparison of the mitochondrial and hydrogenosome genomes will permit examination of these relationships.

Sequencing and assembly of the macronuclear genome of Oxytricha trifallax also yielded most of its mitochondrial genome, which we completed by PCR and sequencing. With the addition of this genome and the availability of complete mitochondrial genomes from two different ciliate classes (the spirotrichs Oxytricha and Euplotes and the oligohymenophorans Paramecium and Tetrahymena), detailed comparative genomic studies of ciliate mitochondria are now possible.

Materials and Methods

DNA isolation as described in Dawson and Herrick (Dawson et al.
1982) resulted in partially purified macronuclei, which were then used to produce libraries for Sanger and 454 sequencing from various populations of size-selected DNA. Mitochondrial DNA present in the libraries permitted recovery of the majority of the mitochondrial genome sequence; assembly with Newbler (proprietary; www.454.com) produced two large mitochondrial contigs from pooled 454 and Sanger sequence data (currently represented by Contig4281.1 and Contig4553.1 from the 2.1.8 assembly). Additional sequences from PCR products amplified across the missing regions filled the remaining gaps. We completed the mitochondrial genome assembly from these additional Sanger sequences, plus smaller contigs not originally merged into the two large contigs, using the Geneious software's assembler (Drummond et al. 2009).

To investigate the size of the mitochondrial plasmid, DNA was separated on an ethidium-bromide-stained agarose gel, depurinated in-gel (0.25% HCl for 15 min; washed in 0.4 M NaOH for 15 min) and transferred to Hybond XL membrane (Amersham) in 0.4 M NaOH using a Nytran TurboBlotter (Schleicher & Schuell). Labeled probe was generated by means of random priming (RadPrime, Invitrogen) of a wild-type Oxytricha strain JRB310 cloned PCR product. After overnight hybridization at 60°C (0.5 M NaPO4, pH 7.2, 1% BSA, 1 mM EDTA, 7% SDS) the membrane was washed in 0.2x SSC with 0.1% SDS (30 min, 60°C), and visualized on a GE Healthcare Storm 840 Phosphorimager.

To investigate the rRNA split gene structure in Oxytricha trifallax, RNA was isolated from O. trifallax strain JRB310 using TRIzol according to the manufacturer’s specifications (Invitrogen, Carlsbad, CA) and treated with Ambion DNA-free (Austin, TX) to remove contaminating DNA.
Clean RNA was tailed with GTP in a standard reaction (1X NEB Buffer 2, 1 mM GTP, 5 µg RNA, and 2 U Poly(U) Polymerase [New England Biolabs; Ipswich, MA]) at 37°C for 10 minutes. A reverse transcriptase reaction was performed using 1.6 µg tailed RNA and the Invitrogen SuperScript III First-Strand Synthesis System (Carlsbad, CA) with a UXR C12D primer (Horton and Landweber 2000). PCR was performed using 1X buffer, 10 mM each dNTP, 1.5 mM MgCl2, 200 nM primer rnsaF (5’-TCGGAATGAACGCGAGCGGA-3’), 200 nM UXR anchor primer (5’-CATCATCATCATCTCGAGAATT-3’), ~80 ng cDNA, and 1.25 U Taq polymerase (Roche Applied Science; Indianapolis, IN). The PCR reaction conditions were: one cycle of 95°C for 2 minutes; 35 cycles of 95°C for 10 seconds, 58°C for 10 seconds, and 72°C for 30 seconds; and a final extension at 72°C for 5 minutes. The same PCR was performed with an extension time of 60 seconds rather than 30 seconds for the primer pair rnsbF (5’-AGTTGCTCTGAAAGGTCGGACAA-3’) and UXR anchor, as well as the pair rnlaF (5’-CATTAAGTGGATGCCTATATATTGAATG-3’) and UXR anchor. Aliquots of each reaction were visualized in 2% agarose gels with SYBR Green using a Typhoon imager (GE Healthcare, Waukesha, WI). PCR products corresponding to expected sizes were cloned into plasmid pSC-Amp/Kan using the StrataClone PCR cloning kit (Stratagene; Santa Clara, CA) and sequenced.

Protein ORFs were identified using a combination of BLAST homology to either the NCBI nrdb or "mitochondrial" proteins from Uniprot (The Uniprot Consortium 2011), when homology could be identified. Where no homology was detected, ORFs were also predicted by a custom Python script, which provides a sliding window score for the probability of being a coding sequence, and by automatic ORF predictions in Geneious (Drummond et al. 2009).
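The ORF-calling step can be illustrated with a minimal sketch. This is not the authors' custom script (its coding-probability score is not described in the text); it is a plain six-frame scan under stated assumptions: the Tetrahymena start codon set used in these Methods (ATG, ATA, ATT, GTG, TTG), a 150 bp minimum length, and the stop codons of NCBI translation table 4 (TAA and TAG, with TGA read as tryptophan). The function names and the coordinate convention are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Assumed start codons (the Tetrahymena set) and table 4 stop codons.
# In table 4, TGA encodes tryptophan, so it is deliberately not a stop here.
START_CODONS = {"ATG", "ATA", "ATT", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG"}
MIN_ORF_BP = 150

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = MIN_ORF_BP) -> List[Tuple[str, int, int]]:
    """Scan all six frames and return (strand, start, end) for each ORF.

    Coordinates are 0-based, half-open, and relative to the scanned strand.
    Each ORF runs from the first start codon after the previous stop to the
    next in-frame stop (stop codon included); ORFs without a stop are skipped.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


if __name__ == "__main__":
    # Toy sequence (invented): an in-frame TGA must not terminate the ORF.
    toy = "ATG" + "GCA" * 50 + "TGA" + "GCA" * 2 + "TAA"
    for strand, start, end in find_orfs(toy):
        print(strand, start, end)
```

Taking the first start codon after a stop gives the longest candidate ORF per stop, which is a common convention when the true start codon is unknown; a real annotation would combine such candidates with homology evidence.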
Since we do not know which start codons are employed in the Oxytricha mitochondrial genome, we have predicted start codons based on the start codons used in Tetrahymena (ATG, ATA, ATT, GTG, TTG). Predicted ORFs were at least 150 bp long.

We were able to detect homologs of most of the ciliate mitochondrial protein coding genes using conventional BLAST-based homology searches. However, there are still a number of additional genes that are so divergent that they fall within or beyond the "twilight zone" of protein sequence similarity (Rost 1999), where BLAST searches alone are unable to detect homology. An additional complication in these genomes is the presence of split genes, which may reduce sequence search sensitivity by shortening the regions available for local sequence alignment. We therefore used the more sensitive search technique provided by the HHpred web server (Soding et al. 2005), which uses a combination of PSI-BLAST (Altschul et al. 1997) and HHsearch (an HMM-profile based search tool; Soding 2005). The latter tool is one of the fastest protein structure prediction tools with reasonable prediction accuracy (Hildebrand et al. 2009) and was recently useful in identifying an additional ciliate mitochondrial gene containing an rps3 C-terminal domain (de Graaf et al. 2009). We also used Quickphyre (Kelley and Sternberg 2009) with default parameters to attempt to find homologs for a limited number of ORFs. Two additional techniques assisted us in classifying "unknown" ORFs in Oxytricha and other ciliates: transitive homology relationships ("chains of homology"; Brunk et al. 2003), and the inference of orthology based on extensive synteny within spirotrichs and oligohymenophorans (and, to a lesser degree, between these classes).

We used tRNAscan-SE (Lowe and Eddy 1997) with default parameters and the “mitochondrial/chloroplast” source option to identify tRNAs in the Oxytricha mitochondrial genome. Quikfold, from the UNAfold package on the DINAMelt server (Markham and Zuker 2005), was used to predict the mitochondrial plasmid DNA hairpins with the temperature set to 20°C, [Na+] = 1 M, and [Mg2+] = 0 M. Transmembrane helices were predicted using TMHMM2 (Krogh et al. 2001) with default parameters. PAML version 3.15 was used to estimate dn/ds ratios (Yang 1997), in pairwise run mode with standard parameters, except that the genetic code was set to translation table=4.

Results

Structure of the ciliate mitochondrial chromosome

The ~70 kb Oxytricha trifallax mitochondrial genome shares a number of structural features with the existing ciliate mitochondrial genomes (Figure 1; Genbank accession JN383842). As in the Tetrahymena and Euplotes mitochondrial genomes, the Oxytricha mitochondrial genes are predominantly or exclusively arranged in two transcriptional directions, diverging from an approximately central location, while both the Paramecium mitochondrial genome and Nyctotherus hydrogenosome genome have one primary direction of transcription (Figure 1).

The Oxytricha mitochondrial DNA has a relatively high AT content (76%, excluding telomeres), as is typical for mitochondria in general (Gray et al. 2004). To date there seems to be little taxonomic consistency in mitochondrial genome base composition within ciliates: Tetrahymena pyriformis has an AT content similar to O. trifallax at 79% (Burger et al. 2000), while P. tetraurelia (Pritchard et al. 1990) and E. minuta (de Graaf et al. 2009) have a considerably lower AT content at 59% and 64% respectively. The Nyctotherus hydrogenosome DNA (Genbank accession: GU057832.1) is also less AT rich (58.5%).

In all ciliate mitochondrial genomes, including that of Oxytricha, there is either a central (in Tetrahymena (Burger et al. 2000), Euplotes (de Graaf et al.
2009), and the hydrogenosome of Nyctotherus (de Graaf et al. 2011)) or terminal (in Paramecium (Pritchard et al. 1990)) region bearing low sequence complexity repeats. In Paramecium (Pritchard and Cummings 1981; Goddard and Cummings 1975; Goddard and Cummings 1977) and Tetrahymena (Arnberg et al. 1974) the AT-rich region coincides with the origin of DNA replication (in Tetrahymena it is contained within the largest mitochondrial ORF, ymf77, encoding a 1386 aa protein of unknown function, which is translated (Smith et al. 2007)). The Tetrahymena paravorax mitochondrial genome contains the longest AT-rich stretch, ~1 kb of 96.5% AT sequence adjacent to the major site of change in transcription direction (Moradian et al. 2007). The central repeats in T. pyriformis, T. pigmentosa and T. malaccensis are shorter, at a few hundred base pairs each. In Oxytricha, the central region is a ~140 bp long stretch of pure AT, composed of degenerate repeats of the unit (written as a POSIX regular expression) ((AAAT)+(AT)+){4,}, which contains stretches of potentially self-complementary repeats, typically palindromes such as TATA, TATATA and TATATATA. The presence of DNA structures that would be refractory to DNA polymerase is indicated by our difficulty in amplifying across this region using conventional PCR. The Euplotes central repeat region (>1 kb) is more GC-rich than that of the other ciliate mitochondrial regions (~83.5% AT), and is composed of semi-palindromic 18 bp repeats (TANNATGTATACATNNTA). Paramecium possesses pure AT repeats in its terminal region (TATTTATTAAAATAAAAAAATATAAATATATTAA). The Nyctotherus hydrogenosome repeat is considerably more GC rich (46.7% AT) than all the other ciliate mitochondrial genome repeats. Since the hydrogenosome genome is still incomplete, it is possible that terminal AT-rich repeats are missing from this genome.

The O.
trifallax mitochondrial genome is capped by telomeres consisting of 35 bp repeats of CGACTCCTCTATCCTCATCCTAGACTCCGCTTACT, with an unknown repeat number (the longest assembled mitochondrial telomeric repeat consists of approximately 15 repeat units), and appears to be linear, like the mitochondrial genome of Tetrahymena. As in Tetrahymena and Paramecium, we have found no macronuclear genome-encoded telomerase RNA with this repeat. The telomeric repeat units are in the same size range as those for a variety of Tetrahymena species (31-53 bp) (Morin and Cech 1988), but more GC-rich (51.4% vs. 26.0-40.0%). No sequence data for similar telomeric repeats has been published for Paramecium, for which a different end-replication model, based on cross-links between the two DNA strands, has been proposed (Pritchard and Cummings 1981; Nosek et al. 1998), nor for Euplotes or Nyctotherus.

Like Tetrahymena, the Oxytricha mitochondrial genome also has a terminal inverted repeat just inside the telomeric repeats, comprising a somewhat smaller region (~1800 bp; 87.8% identical, including a 96 bp indel) (Figure 1) than Tetrahymena’s (~2680 bp). This region is roughly bounded by a trnC and a putative trnC pseudogene (trnC-ψ). The Tetrahymena inverted repeat largely consists of the large subunit ribosomal RNAs and also contains tRNAs, including trnL paralogs, while Oxytricha’s appears to consist largely of protein-coding ORFs of unknown function. The presence of potentially unrelated terminal inverted duplicated genes in both Oxytricha and Tetrahymena suggests that this region may be an important source for gene duplications in these genomes. Aside from ciliates, terminal inverted repeats (TIRs) are characteristic of many linear mitochondrial genomes from diverse eukaryotes, including yeasts, such as Pichia pijperi (~1.8 kb) and Williopsis saturnus (~1.9 kb) (Dinouel et al.
1993); chytridomycete fungi, such as Hyaloraphidium curvatum (~1.4 kb) (Forget et al. 2002); cnidarians, such as Hydra oligactis (Kayal and Lavrov 2008); slime molds, such as Physarum polycephalum (~0.6 kb) (Takano et al. 1994); and unicellular green algae, such as Chlamydomonas reinhardtii (580 bp) and Polytomella parva, which has a well-conserved terminal inverted repeat of ~1.5 kb shared by the 4 ends of its bipartite mitochondrial genome (Fan and Lee 2002). TIRs appear to be a common characteristic not only of mitochondrial genomes, but of many linear eukaryotic and bacterial plasmids as well (Meinhardt et al. 1990; Meinhardt et al. 1997), and have been proposed to be a solution to the end-replication problem for linear mitochondrial molecules (Dinouel et al. 1993; Vahrenholz et al. 1993).

Ciliate mitochondrial genome synteny

The levels of inter- and intra-clade synteny (Figure 1) of the mitochondrial genomes of the 4 genera representing two ciliate classes, Spirotrichea (Oxytricha and Euplotes) and Oligohymenophorea (Tetrahymena and Paramecium), are consistent with current taxonomic classification. There is extensive collinearity within both the spirotrich and oligohymenophoran mitochondrial genomes, with the amount of collinearity between the mitochondrial genomes of Tetrahymena and Paramecium (Burger et al. 2000) comparable to that between Oxytricha and Euplotes. The mitochondrial genomes of Tetrahymena and Paramecium are largely collinear, with the exception of one large inversion and translocation (nad9-ymf76 in Tetrahymena; nad9-orf105 in Paramecium) (Burger et al. 2000), whereas the Oxytricha and Euplotes genomes are largely collinear in the core region, from nad3 to rnl. The decrease in collinearity between classes reflects the more ancient divergence of the two classes.
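One simple way to put a number on collinearity between two gene orders is the longest common subsequence (LCS) of gene names, sketched below. This is an illustrative measure only, not the method used to produce Figure 1: it ignores strand, so inversions are not modeled, and the two toy gene orders are invented for the example rather than taken from any of the genomes discussed. The function name is hypothetical.

```python
from typing import List, Sequence


def longest_collinear_genes(order_a: Sequence[str], order_b: Sequence[str]) -> List[str]:
    """Return one longest common subsequence of two gene orders.

    Genes in the result appear in the same relative order in both inputs,
    which is a simple proxy for collinearity (strand is ignored).
    """
    n, m = len(order_a), len(order_b)
    # table[i][j] holds the LCS length of order_a[i:] and order_b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if order_a[i] == order_b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner to recover one LCS.
    shared: List[str] = []
    i = j = 0
    while i < n and j < m:
        if order_a[i] == order_b[j]:
            shared.append(order_a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return shared


if __name__ == "__main__":
    # Invented toy orders: nad9 is translocated in the second genome.
    genome_a = ["nad3", "cox1", "rns", "nad9", "rpl16", "rnl"]
    genome_b = ["nad3", "nad9", "cox1", "rns", "rpl16", "rnl"]
    shared = longest_collinear_genes(genome_a, genome_b)
    print(shared, len(shared) / len(genome_a))
```

Dividing the LCS length by the number of shared genes gives a rough collinearity fraction; handling inversions would require signed gene orders and a rearrangement-aware comparison.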
The ciliate mitochondrial gene order is fairly static, considering the ancient evolutionary divergences of these species (as much as 2 billion years since the divergence of oligohymenophorans from spirotrichs (Wright and Lynn 1997)). We propose that the relatively static mitochondrial genome synteny could be exploited as an additional useful classification tool at higher ciliate taxonomic levels.

A large region of six mostly adjacent genes, rps4/ymf76 (T. pyriformis, or ymf81 and ymf85 in P. tetraurelia), rps13, rps19, rpl2, nad10 and rps12, is present in all sequenced ciliate mitochondrial genomes (Figure 1). In T. pyriformis and P. tetraurelia this region also includes 'rps3' (Brunk et al. 2003) (now classified as rps3_a; see “protein-coding genes”) and an unknown gene. The nad7 and rps14 genes are also adjacent in all the ciliate mitochondrial genomes, but are in the inverted transcription direction in the oligohymenophoran mitochondrial genomes. After accounting for the inversion in the Paramecium mitochondrial genome, extensive collinearity is still present between the oligohymenophoran and spirotrich genomes (trnM, rnl, rpl14, nad6, cox1, trnW, rns, the AT-rich replication origin/transcription initiation region, nad9, and rpl16, and the largely adjacent rps4, rps13, rps19, rpl2, nad10, rps12 genes). Taken as a whole, extensive collinearity between the Oxytricha, Euplotes and Tetrahymena mitochondrial genomes, plus the observation of the most extensive genome reduction in Paramecium, points to Paramecium possessing a derived ciliate mitochondrial genome form.

There is limited collinearity (e.g. rpl2, nad7, rps14, rps8, rpl6) between the spirotrich mitochondrial genomes and the Nyctotherus hydrogenosome genome, which may reflect the tumult of the change from a mitochondrion to a hydrogenosome.

Mitochondrial Genome Gene content

Protein-coding genes

As can be seen in Table 1, with the exception of a few gene losses, ciliate mitochondrial genomes share largely the same complement of known protein coding genes. The partially sequenced hydrogenosome genome from Nyctotherus contains a subset of Oxytricha mitochondrial protein-coding genes: Nyctotherus has lost genes required for aerobic metabolism, in particular the cox genes (Boxma et al. 2005; de Graaf et al. 2011). Details of the identification and annotation of previously undiscovered or unannotated proteins in ciliate mitochondrial genomes and the Nyctotherus hydrogenosomal genome are provided in Supplementary Table 1. In total, we have been able to annotate 7 previously unidentified genes in Euplotes, 6 in Tetrahymena and Paramecium, and 3 in Nyctotherus.

Oxytricha’s complement of small ribosomal proteins, in particular, is fairly complete compared to other protist repertoires (Gray et al. 2004): all ribosomal proteins except for rps1 and rps11 have been identified in all four sequenced ciliate mitochondrial genomes. We found homologs for all but one (ymf61) of the Tetrahymena putative ribosomal proteins, for which no homologs were found using conventional BLAST searches (Brunk et al. 2003). The fact that 3 of the newly-classified ribosomal proteins (rps4, 7, 10) are commonly encoded in protist mitochondrial genomes (Gray et al. 2004), but were missing from the Tetrahymena mitochondrial proteome survey (Smith et al. 2007) (which would have detected nuclear versions of these proteins, had they been transferred to the nucleus), instills confidence in these gene predictions. With the addition of these small subunit ribosomal proteins, most of the common mitochondrially-encoded protist ribosomal proteins (Gray et al. 2004) appear to have been discovered, in the mitochondrial or nuclear genomes in ciliates.

Our annotations of a number of previously-unannotated Tetrahymena mitochondrial-encoded genes, plus the availability of proteomic and bioinformatic identification of nuclear-encoded Tetrahymena mitochondrial genes (Smith et al. 2007), indicate that the remainder of the unknown Tetrahymena ORFs are largely non-ribosomal, in agreement with a previous study (Brunk et al. 2003). These unknown ORFs could be novel mitochondrial proteins or proteins that have diverged beyond our ability to detect homology to known proteins. The sequencing of additional ciliate mitochondrial genomes (particularly from ciliate classes other than spirotrichs and oligohymenophorans) may be beneficial in identifying the remaining unidentified genes in ciliate mitochondrial genomes, especially if HMM-HMM profile comparison tools (such as HHpred) are used to improve the information content and quality of the alignments underlying the query HMM profiles.

Split protein-coding genes in ciliates

rps3: The Euplotes rps3 is unusually long (767 and 768 amino acids for E. minuta and E. crassus respectively) in comparison to the rps3 orthologues found in Oxytricha (~349 aa), Tetrahymena (330 aa) and Paramecium (234 aa), and was shown to contain the C-terminal domain of rps3 in the 5’-terminal half of this gene (de Graaf et al. 2009) (Figure 2). The 3’ half of the Euplotes gene has no detectable similarity to rps3. In Oxytricha, this same gene is divided into a shorter, 5’-terminal portion containing the C-terminal rps3 domain, followed by a longer portion of unknown function. We identified an Oxytricha homolog to a gene previously classified as rps3 (Burger et al. 2000) in Tetrahymena and Paramecium, but disputed as such (Brunk et al. 2003). As for Tetrahymena and Paramecium, and unlike Euplotes (de Graaf et al. 2009), HHpred predicts with high probability (5.1e-06 for Oxytricha) that an N-terminal rps3 domain is present in the mitochondrial genome, in an ORF that we label rps3_a. It is possible that the rps3 N-terminal domain is encoded in a missing portion of the Euplotes mitochondrial genome. It therefore appears that this is another split gene present in most, if not all, sequenced ciliate mitochondrial genomes. Accordingly, the previously disputed rps3 (N-terminal) can now be called rps3_a, and the recently classified C-terminal rps3 portion rps3_b, consistent with the split gene nomenclature in Burger et al. (2000). The long gene annotated as rps3 in Euplotes may represent either a novel gene fusion or an incorrect annotation due to sequencing errors.

The sum of the lengths of the Oxytricha, Tetrahymena and Paramecium rps3 domains is roughly consistent with typical Uniprot rps3 entries (e.g. ~480 aa for Tetrahymena), although there is considerable length variation in rps3 among species, even within fungi alone (Sethuraman et al. 2009). Some of the rps3 genes we inspected appear to be missing a domain (e.g. we found no N-terminal rps3 domain in Schizosaccharomyces pombe). It appears that there is some flexibility in the intervening rps3 domain spacer: in humans the intervening spacer between the N- and C-terminal domains contains a single-stranded nucleic acid binding domain (KH domain) required for stable NF-κB regulatory complex binding, an extra-ribosomal function (Wan et al. 2007). In plants, the N- and C-terminal rps3 domains are separated by a domain of unknown function (Smits et al. 2007). One other case of a split arrangement of the N- and C-terminal rps3 domains has been documented in the slime mold Dictyostelium, where the domains are separated by long peptide sequences of unknown function (Iwamoto et al. 1998; Smits et al. 2007). Unlike Dictyostelium, the ciliate split rps3 ORFs are located some distance from one another, with multiple genes separating them, and in both Oxytricha and Euplotes they are encoded on opposite strands.

nad2: Part of this gene was previously identified in Paramecium, Tetrahymena and Euplotes, but its length varies greatly: the Tetrahymena and Paramecium 'nad2' genes are unusually short, just 166 and 178 aa long, respectively. By contrast the shortest curated nad2 gene in Uniprot is 346 aa in Branchiostoma, while the longest is 538 aa in Ustilago. E. crassus (391 aa) and E. minuta (722 aa) are thought to possess N-terminal extensions with no substantial sequence similarity to other nad2 genes (de Graaf et al. 2009). HHpred searches using the original Tetrahymena nad2 ORF revealed that this ORF contains only a C-terminal portion of nad2. We found an ORF (372 aa) in the Oxytricha mitochondrial genome that appears to be weakly similar to a "putative non-ribosomal" Tetrahymena protein, ymf65 (blastp e-value 0.028 to T. malaccensis nad2), and to the Hydra and Phytophthora nad2 genes (e-values of 1.2 and 4.8 to NCBI’s nrdb, respectively). In Oxytricha, this region is separated from an ORF (167 aa) with blastp hits to the existing annotated ciliate nad2 entries in Genbank (e-values of e-06 to e-05), by a 60 aa unknown ORF. HHpred predictions for either the 372 aa Oxytricha ORF or Tetrahymena ymf65 correspond to the N-terminal half of nad2. ymf65 was previously predicted to have 10 transmembrane helices (Brunk et al. 2003), the same number we obtained for the 372 aa Oxytricha ORF using TMHMM2 (Krogh et al. 2001) (Figure 3). In Paramecium the nad2 N-terminal region appears to be further split into two ORFs: ymf65_a and ymf65_b. Like the transmembrane helix sums, the length sums of the ORFs corresponding to nad2 for Tetrahymena, Paramecium and Oxytricha (526, 518, and ~543 aa) are within the range of eukaryotic nad2 lengths in Uniprot protein sequences.

The annotated nad2 gene from the Nyctotherus hydrogenosome genome (Genbank accession: AJ871267.1) encodes the C-terminal nad2 portion and is a similar length (166 aa) to its Oxytricha, Tetrahymena and Paramecium orthologs. An open reading frame preceding the Nyctotherus nad2 C-terminal region does not share substantial sequence similarity with the Oxytricha, Tetrahymena and Paramecium nad2 N-terminal regions, and is approximately 100 amino acids shorter than them. The N-terminal half of the Nyctotherus nad2 appears to be the ORF currently annotated as orf371.

The sums of the transmembrane helices from the N- and C-terminal nad2 ciliate gene portions, for all but Euplotes (14 for Tetrahymena and Oxytricha, 13 for Paramecium and 15 for Nyctotherus), are in accord with the number predicted for most non-metazoan eukaryotes (13-14, e.g. Arabidopsis thaliana (Uniprot: O05000), Dictyostelium discoideum (O21048)). The overall spacing of the helices in the TMHMM2 profiles of the N- to C-terminal concatenated nad2 sequences from Oxytricha, Nyctotherus, Tetrahymena and Paramecium is also in good agreement.

In Euplotes minuta the entire ORF annotated as nad2 is predicted by TMHMM2 (Krogh et al. 2001) to contain 17 transmembrane helices, while Euplotes crassus appears to have 12. The annotated nad2 from E. crassus is just half the length (391 aa) of E. minuta (774 aa), and appears to encode only the C-terminal portion of the E. minuta nad2 (57.8% pairwise identity). The pairwise alignments of the concatenated translations of the ORFs upstream of the shorter E. crassus nad2 (orf175 and orf147) to the remaining N-terminal portion of E. minuta nad2 are only 18.7% identical, suggesting that these ORFs are either highly divergent or unrelated. Judging from the lengths and transmembrane numbers of nad2 from Oxytricha, Tetrahymena and Paramecium, the E. crassus nad2 is also a split gene, though we have not identified with certainty the location of the missing part. Barring sequencing and annotation errors, these Euplotes nad2 genes may indicate that nad2 can be split or fused in multiple ways. It therefore appears that nad2 may be split to different extents in different ciliate species (possibly independent evolutionary events), though a common nad2 N/C-terminal split appears to be shared by Tetrahymena, Paramecium, Oxytricha and Nyctotherus.

A nad2 split gene is also present in angiosperm mitochondrial genomes (Malek et al. 1997). In these plants nad2 is joined by the trans-splicing of a group II intron (Binder et al. 1992). We have not detected any group II introns in the ciliate mitochondrial genomes by scans of the RFAM (Griffiths-Jones et al. 2005) group II intron model with Infernal (Nawrocki et al. 2009). In Oxytricha, we think it is unlikely that RNA editing removes all the stops necessary to join the nad2 ORFs (at least 2 stop codons would need to be eliminated or read through). Instead it appears that, like nad1 (Seilhamer et al. 1984; Seilhamer et al. 1984; Heinonen et al. 1987; Pritchard et al. 1990; Schnare et al. 1995; Burger et al. 2000), this gene is not trans-spliced (supported by cDNA PCR results, data not shown), and therefore the gene pieces are translated as separate subunits that require co-assembly to form the functional protein structure.

Split rRNA genes

In both Tetrahymena and Paramecium the large and small subunit rRNA genes are further split (Seilhamer et al. 1984; Seilhamer et al. 1984; Heinonen et al. 1987; Pritchard et al. 1990; Schnare et al. 1995; Burger et al. 2000) into large and small portions. The current T. pyriformis and P. tetraurelia Genbank annotations present two different structures for the rns split: Tetrahymena has a short rns_a followed by a long rns_b, whereas Paramecium has a long rns_b followed by a short rns_a. However, the experimental results in Paramecium were misinterpreted, and can be interpreted instead as rns having the same structure in both ciliates (Schnare et al. 1995). For Euplotes, no split rRNAs were detected using local sequence alignments of portions of the identified rRNA sequence (de Graaf et al. 2009), but no experimental support was provided for this conclusion. Northern analysis in Nyctotherus suggests that its SSU rRNA may be fragmented into 3 pieces of 1.7 kb, 750 bp and 600 bp (Akhmanova et al. 1998).

[...] unpublished EST data, indicate that Oxytricha has the same splits identified in the LSU and SSU genes for Paramecium and Tetrahymena (Figure 4). EST data suggest that a long AT-rich DNA spacer (~141 bp; 96% AT) divides Oxytricha rns into two subunits, as in Tetrahymena and Paramecium, because no EST reads span this region. We also verified this split by obtaining RACE products corresponding to the 3’ end of rns_a (Supplementary Figure 2). Sequence alignments suggest that the ribosomal RNA fragment following rns_b is orthologous to the rnl_a fragment of Tetrahymena’s LSU rRNA, which is physically separated from the remaining Oxytricha rnl_b fragment. A short AT-rich tract (~32 bp; 91.6% AT) between the rns_b and rnl_a in Oxytricha is also poorly covered by expression data (unpublished) and hence a likely splitting point.
Alignments of rns from the different ciliate species, including the Nyctotherus rns gene (see NCBI AJ871267.1) suggest that an rnl_b portion is located at the end of the gene annotated as rns in Euplotes. Furthermore, there are discrepancies in length of the annotated Euplotes rnl (2230 bp) in comparison to that of other ciliate rnls (~2550-2600 bp) (see Table 3), These lines of evidence suggest that the Euplotes ribosomal LSU is split like that of Tetrahymena, Paramecium and Oxytricha. Alignments of the ciliate SSU regions suggest that Euplotes could also possess the split SSU. At least one split (between rnl_a and rnl_b) is shared by all the ciliates and likely arose in their common ancestor. Additional experimental evidence is necessary to determine 18 Downloaded from http://gbe.oxfordjournals.org/ at Library Serials Dept on December 17, 2011 Our inspection of alignments to the Tetrahymena and Paramecium split rRNA genes, as well as whether the SSU split is present in Euplotes and hence common to all known ciliate mitochondrial genomes. Mitochondrial genetic code and transfer RNA genes Tetrahymena, these species all seem to share the same mitochondrial genetic code: a small variation on the "mold, protozoan mitochondrial code" ("Table 4" according to NCBI BLAST tables), with a single stop codon (UAA) and a single unused codon (UAG), while UGA encodes tryptophan (Pritchard et al. 1990; Burger et al. 2000; de Graaf et al. 2009). Our BLAST searches for release factors in Oxytricha, Paramecium and Tetrahymena identified a single, nuclear-encoded mitochondrial peptide chain release factor (mtRF). As is typical for mitochondrial release factors, this release factor is more closely related to bacterial peptide chain release factor 1, which is involved in recognition of the UAA and UAG codons (Scolnick et al. 1968), than to standard eRFs. In bacteria and some eukaryotic mitochondria, RF2 recognizes the UAA and UGA codons. 
Codon reassignment from UGA as a stop codon to tryptophan is common in many eukaryotes and has occurred independently in numerous lineages (Massey and Garey 2007). Since both RF1 and RF2 recognize UAA, there is functional redundancy at this codon, which means that RF2 can be lost if UGA is no longer used as a stop codon. This reassignment is more likely to occur in small genomes, such as those in mitochondria, in a loss-andregain-of-codon model (Massey and Garey 2007). 19 Downloaded from http://gbe.oxfordjournals.org/ at Library Serials Dept on December 17, 2011 In contrast to the different nuclear genetic codes in Euplotes vs. Oxytricha, Paramecium and The Oxytricha trifallax mitochondrial genome encodes 11 tRNA genes corresponding to 10 different tRNA species (see Table 2). Like the protein coding genes of known function, the O. trifallax mitochondrial tRNA collection is a superset of those previously discovered in ciliate mitochondrial genomes to date. Tetrahymena pyriformis has 8 tRNA genes corresponding to 7 tRNA species (Burger et al. 2000); Euplotes minuta has at least 7 tRNAs corresponding to 7 species; Paramecium tetraurelia has 4 tRNA genes corresponding to 4 species (Pritchard et al. 1990; Burger et al. 2000). Nycotherus only has 3 hydrogenosome-encoded tRNAs, trnF, trnW and trnY, the same tRNAs that are common to identical, including gaps) located at both non-telomeric ends of the terminal repeat: a short form (73 bp) that tRNAscan-SE recognizes as trnC , and a longer form (80 bp) that tRNAscan-SE designates an unknown tRNA. RNAfold (Hofacker 2003) using default parameters predicts that the former forms the characteristic cloverleaf secondary structure, while the latter may form a non-cloverleaf structure, and hence could be a tRNA pseudogene. tRNAscan-SE (Lowe and Eddy 1997) also predicts two trnM genes for O. trifallax. 
One appears to be orthologous to the trnM in Tetrahymena and Paramecium (trnM_i), while the other (trnM_ii) more closely resembles the 'trnM' of Euplotes (60% pairwise sequence identity vs. 55% for Euplotes 'trnM' to Oxytricha trnM_i). The Tetrahymena/Oxytricha/Paramecium trnM_i orthologs share a characteristic, truncated D stem and loop (Heinonen et al. 1987; Schnare et al. 1995), while the Oxytricha/Euplotes trnM_ii appears to be structurally more similar to typical initiator and elongator trnMs from other eukaryotic mitochondrial genomes, such as those from Reclinomonas. The trnM_ii is the least well conserved tRNA of the orthologs shared between Euplotes and Oxytricha, at 56.2% pairwise similarity vs. an average identify of 70.3% (min 66.7%) for the remaining pairs of tRNA orthologs (trnE, trnF, trnH, trnQ, trnW, trnY; locARNA (Will et al. 2007)), suggesting that this gene has either been evolving 20 Downloaded from http://gbe.oxfordjournals.org/ at Library Serials Dept on December 17, 2011 all ciliate mitochondrial genomes. O. trifallax possesses two versions of the trnC gene (73.5% under relatively relaxed constraints or positive selection since divergence from the common ancestor of these two ciliates. The divergence between the two Oxytricha trnM's (55.9%) is greater than that between typical eukaryotic initiator and elongator trnMs; for instance, in Reclinomonas these genes are 67.1% identical, suggesting that there may be substantial functional divergence between the two classes of ciliate mitochondrial trnMs. Given the substantial divergence of ciliate trnMs, we are hesitant at this point to ascribe the role of initiator or elongator tRNA to any of the ciliate trnMs. During inspection of the Oxytricha mitochondrial genome assembly, we discovered an additional large contig (~4.9 kb) possessing an internal region with substantial sequence similarity – 251 bp at 82% identity – to one of the Oxytricha mitochondrial contigs (position indicated on Figure 1). 
We subsequently confirmed that this contig is a ~5!"#$%&$'()*+,$-'+./(0$12(34,*$5+), which we call mO (Genbank accession: JN383842). The 251 bp region appears to be a 'footprint' of a past recombination event between the plasmid and mitochondrial genome. The region of similarity is reminiscent of a 473 bp sequence shared by the Physarum mitochondrial genome and its linear plasmid mF, which has been shown to permit integration of the linear plasmid into the mitochondrial genome via homologous recombination (Takano et al. 1992). Consistent with a mitochondrial origin, the plasmid genes appear to use the same genetic code as the ~70 kb Oxytricha mitochondrial chromosome. Translation with either the standard or ciliate genetic codes would produce much shorter open reading frames, due to in-frame UGA codons. However, we do note that the tryptophan codon bias (the telltale signature of the mitochondrial genetic code) of these 21 Downloaded from http://gbe.oxfordjournals.org/ at Library Serials Dept on December 17, 2011 A linear mitochondrial plasmid ORFs (12 UGA vs. 9 UGG) is much weaker than that of known or "ciliate-specific" mitochondrial ORFs (i.e. those that appear to have orthologs in other ciliates; 229 UGA vs. 20 UGG). The deviation from the standard Oxytricha mitochondrial tryptophan codon usage suggests that either this plasmid may be a relatively recent acquisition that has not yet acquired the standard mitochondrial codon usage, or that selection on codon usage is weaker on this plasmid. The plasmid contains two large open reading frames. The 3' ORF encodes a linear mitochondrial 8e-15 to a Fusarium proliferatum linear plasmid DNA polymerase (Genbank accession: YP_001718360)). The DNA polymerase has the characteristic "DTDS" residues of the DNA Pol B family active site (Hopfner et al. 1999) and appears to be a member of phage and linear plasmid (including mitochondrial plasmids) DNA polymerases (Kempken et al. 1992) (see next section). 
The 5' ORF has no convincing BLAST or HHpred hits, but appears to be an RNA polymerase based on a QuickPhyre (Kelley and Sternberg 2009) prediction: the best QuickPhyre prediction (e-value 0.41; estimated precision 80%) is to an x-ray crystal structure of a phage N4 virion RNA polymerase, related to the T7-like RNA polymerases (Kazmierczak et al. 2002). Linear plasmids, including mitochondrial ones, bearing both an RNA and DNA polymerase are a typical form (Meinhardt et al. 1990; Handa 2008). We also discovered that the longest ORF in the terminal inverted repeat of the Oxytricha mitochondrial genome, encoding a 411 aa protein, is predicted by Phyre (e-value 0.67; 80% precision) to be related to the same phage N4 RNA polymerase as the mO RNA polymerase (65% precision, evalue 1.4). The sequence similarity shared between the mO RNA polymerase and the TIR protein is a meager 17% (global pairwise alignment, gap opening and extensions penalties of 12 and 3, using the BLOSUM62 matrix). 22 Downloaded from http://gbe.oxfordjournals.org/ at Library Serials Dept on December 17, 2011 plasmid/phage-like DNA polymerase that is easily identifiable by BLAST (BLASTp best-hit e-value We were unable to find any evidence of protein homologs of the mitochondrial plasmid ORFs in either Paramecium or Tetrahymena mitochondrial genome data. Linear mitochondrial plasmids have been identified in P. caudatum, P. jenningsi and P. micromultinucleatum (Endoh et al. 1994; 6.4%(( et al. 1994) but none of their sequences has been published. In a low coverage Stylonychia lemnae genome assembly we were able to find a telomere-lacking contig with a substantial tBLASTn hit to the DNA polymerase ORF (contig12708 (http://lamella.princeton.edu/blast/getseq.cgi?454AllContigs.fna&contig12708); e-value 7e-18). The Oxytricha and not the Oxytricha/Stylonychia macronuclear genetic code or standard genetic code (which would introduce two premature stop codons). 
This Stylonychia BLAST hit suggests that mitochondrial plasmids may be present in other spirotrichous ciliates. The mO 5' terminal ~250 bp contains three types of short semi-palindromic repeats (Figure 5b), which are capable of producing stem-loop structures of similar size to those of the Physarum linear plasmid mF 205 bp terminal inverted repeat region. The 3' end of mO is capped by the same type of telomeric repeats as the main mitochondrial genome. At least one example of short (5 bp), telomere-like repeats on a linear plasmid has been reported for the fungus Fusarium oxysporum (Walther and Kennell 1999) (the Fusarium oxysporum genome does not contain telomeres since it is circular (Marriott et al. 1984)). The Physarum polycephalum mitochondrial plasmid (mF) has longer – 144 bp – subterminal repeats following the plasmid TIRs, and is capable of in vivo linearization of the circular mitochondrial genome by recombining with it (Sakurai et al. 2000). Unlike Physarum, the Oxytricha mitochondrial genome’s telomeric repeats appear to be established, rather than new extensions from the plasmid. This suggests that we have found the first possible case of a stable transfer of a telomere between a mitochondrial genome and a plasmid. 23 Downloaded from http://gbe.oxfordjournals.org/ at Library Serials Dept on December 17, 2011 entire contig appears to be translated with the same mitochondrial genetic code as the one used in Putative macronucleus-encoded mitochondrial DNA and RNA polymerases While no DNA or RNA polymerase genes have been documented in the mitochondrial genomes of have orthologs in the Oxytricha macronuclear genome as well. We sought to clarify the relationship between these putative mitochondrial polymerases and the plasmid-encoded polymerases found in Oxytricha. Mitochondrial DNA polymerases are largely unknown or uncharacterized in most eukaryotes (Shutt and Gray 2006) with the exception of humans and yeast (Kaguni 2004). 
The opisthokont (metazoa/fungi) mitochondrial DNA polymerase (Pol gamma) is a Pol A family DNA polymerase, like bacterial DNA Pol I, but a distinct (Lecrenier and Foury 2000) and divergent member of this family (20-25% identity relative to the E. coli Klenow fragment (Lecrenier et al. 1997)). However, Pol gamma does not appear to exist in many eukaryotic lineages (Burgers et al. 2001), and so a different mitochondrial DNA polymerase must take its place in these organisms. In Arabidopsis and the red alga Cyanidioschyzon merolae a single, putative mitochondrial DNA polymerase, which is not orthologous to Pol gamma, is targeted to both mitochondria and plastids (Elo et al. 2003; Moriyama et al. 2008). These polymerases are more similar to bacterial DNA Pol I polymerases than to Pol gamma (Mori et al. 2005) and form a distinct clade – 'plant organellar polymerases' (POPs; Moriyama et al. 2008) – comprised of diverse eukaryotic members, including the amoebozoan, Dictyostelium discoideum (Shutt and Gray 2006) the heterokont Phytophthora ramorum; diatoms; plants; red alga; and the ciliates T. 24 Downloaded from http://gbe.oxfordjournals.org/ at Library Serials Dept on December 17, 2011 Tetrahymena and Paramecium, there are nuclear-encoded candidates for these genes, which appear to thermophila (Genbank accession: XP_001014571) and P. tetraurelia (Genbank accession: XP_001431083) (Moriyama et al. 2008). The ciliate 'POPs', including that of Oxytricha (Genbank accession: JN383844), appear to possess characteristic mitochondrial targeting signal peptides (Table 4), and therefore are putative ciliate mitochondrial DNA polymerases. A T-odd phage RNA polymerase homolog was identified for Tetrahymena pyriformis during searches complete homolog of the T. pyriformis sequence was predicted in the T. thermophila macronuclear genome assembly (Eisen et al. 2006; Genbank accession: XP_001013489) and subsequently discovered in the T. thermophila mitochondrial proteome (Smith et al. 
2007). Both Paramecium tetraurelia (Genbank accession: XP_001435950) and Oxytricha trifallax (Genbank accession: JN383845) homologs also exist in the respective macronuclear genome assemblies. Mitochondrial target signal prediction software (Mitoprot (Claros and Vincens 1996) and Predotar (Small et al. 2004)) predicts that the Tetrahymena, Paramecium and Oxytricha proteins are mitochondrially targeted (Table 4). The Oxytricha linear plasmid RNA and DNA polymerases are so extremely divergent in comparison to the macronucleus-encoded putative mitochondrial RNA and DNA polymerases that conventional multiple sequence alignments of these genes are unreliable. Mitochondrial polymerases in the nuclear genomes of other organisms appear to be more closely related to the putative mitochondrial, nuclearencoded ciliate polymerases and not to the plasmid polymerases, which appear to be a secondary acquisition. 25 Downloaded from http://gbe.oxfordjournals.org/ at Library Serials Dept on December 17, 2011 for homologues to the yeast mitochondrial and T3/T7 RNA polymerases (Cermakian et al. 1996). A An abundance of subterminal unknown open reading frames Both ends of the Oxytricha mitochondrial genome – corresponding to ~5 kb and ~14.5 kb (or in total just over " of the Oxytricha mitochondrial genome length) – contain almost exclusively ORFs without obvious homologues in any known organism or in the Oxytricha macronuclear genome These regions constitute over half (54%) of the total unknown ORF length in the Oxytricha mitochondrial genome. Tetrahymena and Euplotes mitochondrial genomes and Oxytricha’s larger one. The structure of the Oxytricha mitochondrial genome resembles the core Euplotes mitochondrial genome, with the exception of one large translocated region – the cob-to-nad5 gene block – located at opposite ends of these mitochondrial genomes (Figure 1), with two large blocks of unknown ORFs appended to either end. 
Segmental duplications are evident from a dot plot of the ~14.5 kb telomeric end (Figure 6). The largest of these duplications is closest to the telomeric end and is ~1450 bp long with ~91% pairwise identity. An ~170 bp region represents the sequence that has been duplicated most often. Pairwise identities relative to the first repeat (from the telomeric end) from this region decrease with increasing distance: 93.4%, 88.9%, 74.7%. If we assume that these regions are evolving approximately neutrally, then the duplications closest to the telomeric end are younger than the distal ones. This suggests that the ~14.5 kb region arose, in part, through successive expansions resulting in up to 3 successive terminal duplication events. Curiously, the 251 bp segment shared by the plasmid and mitochondrial genome is located near (~120 bp from) the end of the most recent duplication of these repeats. 26 Downloaded from http://gbe.oxfordjournals.org/ at Library Serials Dept on December 17, 2011 The sum of these end regions accounts for the majority of the size difference between the smaller The largest duplication within the 14.5 kb end contains at least one long ORF (~600 bp), which appears to be evolving under similar levels of evolutionary constraint (dn/ds = 0.288) to that of the pair of ORFs from the terminal inverted repeats (dn/ds = 0.302). In Tetrahymena both non-terminal duplications of nad9 and the terminal inverted repeats appear to be maintained by concerted evolution (Brunk et al. 2003), which is also likely to be the case for the Oxytricha terminal inverted repeat regions. These levels of constraint are somewhat lower than those we recently reported for the micronuclear-encoded Oxytricha TBE transposase paralogs (Nowacki et al. 2009). 
The synonymous substitution rate is also lower in these genes (0.095 for the ~600 bp duplicated ORF and 0.181 for the TIR ORF pair) than in the TBE transposase paralogs (0.287 average), indicating that these genes have duplicated more recently than the TBEs (since ciliate mitochondrial genes evolve more rapidly than their nuclear genes) and/or that substitution has been suppressed by concerted evolution. Assuming no translocations and that the strength of concerted evolution either declines with distance from the telomeres or remains approximately the same throughout the genome, the lower overall substitution rates in the 600 bp duplicated ORF pair, relative to the TIR ORF pair, suggest that these duplications arose both internally to, and after, the TIR ORF pair.

The pattern of sequence conservation in the mitochondrial terminal regions suggests both that purifying selection acting upon a duplicated ORF permitted the detection of duplications, and that selective constraints have been lost in many of the surrounding ORFs, leading to pseudogene formation. Three lines of evidence suggest pseudogenization: (i) the intervening regions between ORFs are longer in the terminal unknown ORF regions (~179 bp mean; ~178 bp std. dev.; excluding zero length regions) than in the central region (~53 bp mean; 63 bp std. dev.); (ii) inter-ORF regions constitute ~19.5% of the terminal unknown ORF regions, whereas these spacers constitute ~5% of the Oxytricha mitochondrial genome excluding the terminal unknown ORF regions, a figure close to that of typical, tightly packed ciliate mitochondrial genomes, such as those of Tetrahymena and Euplotes (Burger et al. 2000; de Graaf et al. 2009); (iii) ORFs in the unknown ORF regions are shorter (419 bp mean; 310 bp std. dev.) than those from the central region (501 bp mean; 412 bp std. dev.).
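The spacer and ORF-length statistics in points (i) to (iii) can be derived from ORF coordinates alone. The following is a minimal Python sketch of one way to do so, not the procedure used in this study; the function names and the toy coordinates are hypothetical, and ORFs are assumed to be given as 1-based inclusive (start, end) pairs within a single region.

```python
from statistics import mean, stdev


def inter_orf_gaps(orfs):
    """Return the lengths of non-zero gaps between consecutive ORFs.

    orfs: list of (start, end) tuples, 1-based inclusive coordinates.
    Abutting or overlapping ORFs contribute no gap, mirroring the
    "excluding zero length regions" convention quoted in the text.
    """
    ordered = sorted(orfs)
    gaps = []
    prev_end = ordered[0][1]
    for start, end in ordered[1:]:
        gap = start - prev_end - 1
        if gap > 0:
            gaps.append(gap)
        prev_end = max(prev_end, end)
    return gaps


def spacer_summary(orfs, region_start, region_end):
    """Summarise spacer and ORF lengths for one region.

    Only gaps between ORFs are counted as spacer; sequence before the
    first ORF or after the last ORF is ignored (an assumption of this sketch).
    Requires at least one non-zero gap.
    """
    gaps = inter_orf_gaps(orfs)
    region_len = region_end - region_start + 1
    return {
        "mean_gap": mean(gaps),
        "sd_gap": stdev(gaps) if len(gaps) > 1 else 0.0,
        "spacer_fraction": sum(gaps) / region_len,
        "mean_orf_len": mean(end - start + 1 for start, end in orfs),
    }


# Hypothetical toy coordinates, not taken from the Oxytricha annotation.
toy_orfs = [(1, 300), (401, 700), (751, 1000)]
summary = spacer_summary(toy_orfs, 1, 1000)
print(summary)
```

Running the same summary separately on the terminal unknown-ORF regions and on the central region would give the two sets of figures that the comparison above relies on.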
In the 14.5 kb subterminal region we also noticed an overabundance of tryptophan UGG (15) vs. UGA (43) codons, in comparison to 20 UGG vs. 229 UGA codons in conserved or ciliate-specific ORFs in the central region (the ~5 kb subtelomeric region has 0 UGG and 17 UGA tryptophan codons). This deviation from the standard tryptophan codon usage in the larger subterminal region could indicate a relatively recent incorporation of foreign genetic material derived from the mO plasmid and/or relaxed constraint associated with possible pseudogene formation.

Discussion

The Oxytricha mitochondrial genome, at ~70 kb, is the largest ciliate mitochondrial genome sequenced to date. It is approximately 22-30 kb larger than the other completely sequenced ciliate mitochondrial genomes; mitochondrial genomes of the distantly related oligohymenophorans Paramecium tetraurelia and Tetrahymena pyriformis are ~40 kb (Pritchard et al. 1990) and ~47 kb (Burger et al. 2000), respectively; the more closely related spirotrichous ciliate Euplotes crassus is < 48 kb (de Graaf et al. 2009). The pattern of duplications within the 14.5 kb subterminal region of the Oxytricha mitochondrial genome suggests that successive duplications toward the telomere have resulted in the formation of paralogs and pseudogenes, and that these duplications are partly responsible for the larger genome size. The presence of a mitochondrial plasmid that contains a region that matches segmental duplications on the primary mitochondrial genome indicates either an association between this plasmid and the duplications, or possibly a higher probability of incorporation of the element in this region of the genome. Such an integration of the plasmid might have only mildly deleterious effects.

Ciliate mitochondrial genomes appear to have high gene densities and are considered to be 'large' (Gray et al.
2004), since they contain a relatively large number of mitochondrial genes (29 known protein-coding genes, 11 tRNAs and 2 rRNAs in Oxytricha). The current expanded set of five split mitochondrial genes in ciliates (nad1, nad2, rps3, lsu and ssu) suggests that there is no specific functional correlation with the presence of split genes in these genomes. At least some of the gene splits (in nad1, rps3 and lsu) occur at approximately the same gene position and therefore appear to have occurred prior to the last common ancestor of these organisms. Additional ciliate mitochondrial genome sequences may reveal more cases of split genes, which are currently hard to detect due to the extreme divergences of ciliate mitochondrial genomes and relative paucity of sequence data from the broader diversity of ciliates.

It was previously shown that the oligohymenophoran cox1, cox2 and cob genes are extremely divergent relative to other eukaryotic mitochondrial genes, and this is partly responsible for the difficulty in classifying a large number of ciliate mitochondrial ORFs (Burger et al. 2000). Extreme divergence appears to be a general property of ciliate mitochondrial protein-coding genes, even for the highly conserved iron-sulfur proteins nad7 and nad10 (the least divergent of the ciliate mitochondrial proteins; Supplementary Figure 3). There appears to be no functional association with such extreme divergence, since genes with unrelated functions (ribosomal, electron transport and protein transport/maturation genes) all exhibit extreme divergences. Though ciliate mitochondrial rRNA genes do not appear to be evolving at exceptional rates (Gray and Spencer 1996) in relation to other eukaryotic mitochondrial rRNAs, their distances and divergence rates may be underestimated, due to saturation (Supplementary Figure 4).
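To see why saturation leads to underestimated distances, it helps to compare the observed proportion of differing sites with a model-corrected distance. The sketch below uses the Jukes-Cantor (JC69) correction purely as an illustrative assumption; the model behind Supplementary Figure 4 is not stated here, and the function name is hypothetical.

```python
import math


def jukes_cantor_distance(p):
    """Return the JC69-corrected distance (substitutions per site).

    p: observed proportion of differing sites between two aligned
    nucleotide sequences. The correction is undefined for p >= 0.75,
    the value at which sequences are fully saturated under JC69.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# As p approaches 0.75, the corrected distance grows much faster than p,
# so raw (uncorrected) differences increasingly underestimate divergence.
for p in (0.05, 0.25, 0.45, 0.60):
    d = jukes_cantor_distance(p)
    print(f"observed p = {p:.2f}  corrected d = {d:.3f}  ratio = {d / p:.2f}")
```

The widening gap between the observed and corrected values at high divergence is the effect referred to as saturation.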
Therefore, gene substitution rates appear to be generally elevated in ciliate mitochondria, irrespective of whether the gene encodes a protein or RNA product. A neutral evolutionary process due to low fidelity replication or error-prone repair would be consistent with the elevated ciliate mitochondrial substitution rates.

There is also evidence to suggest that ciliate nuclear genes have elevated substitution rates relative to those of other eukaryotes, with ciliates such as Oxytricha and Euplotes (which possess a highly fragmented macronuclear genome structure) evolving the most rapidly (Zufall et al. 2006). Unlike the case of mitochondrially-encoded genes, we have not observed evidence of extreme divergences in any of the nuclear-encoded, putatively mitochondrially-targeted, genes that we examined in this study (RF1, mtDNA polymerase, mtRNA polymerase (Moriyama et al. 2008), nad8, nad11). Furthermore, since different DNA polymerase complexes are responsible for nuclear vs. mitochondrial replication, we do not expect a correlation between elevated mitochondrial substitution rates and nuclear substitution rates.

Based on comparisons of the oligohymenophoran and spirotrich mitochondrial genomes, we propose that their common ancestor possessed: (i) a linear mitochondrial genome; (ii) a replication origin within, or in close proximity to, an AT-rich region of low complexity; and (iii) terminal inverted repeats capped by telomeric repeats. In ciliate mitochondrial genomes both the putative replication origin (Arnberg et al. 1974; Goddard and Cummings 1975; Goddard and Cummings 1977; Pritchard and Cummings 1981) and primary region of transcription initiation appear to lie in close proximity to, or coincide with, a low-complexity/repeat region. TATA-like elements in multiple Paramecium species have been proposed as a motif for transcription recognition (Pritchard et al.
1983) but this now seems unlikely given that a different T-odd phage-like eukaryotic mitochondrial RNA polymerase is most likely the primary mitochondrial RNA polymerase, and such phage RNA polymerases are TATA-independent (Cermakian et al. 1996; Shutt and Gray 2006). In Tetrahymena species, a highly conserved, GC box-like region in the central region of divergent transcription has been proposed as a motif that may be responsible for initiating transcription, and possibly also DNA replication (Moradian et al. 2007). Experimental evidence is necessary to pinpoint the precise location of transcription initiation in these genomes.

Both mitochondrial terminal inverted repeats and telomeric sequences, such as those of Tetrahymena and Oxytricha, were proposed to be of foreign origin (Nosek and Tomáška 2003). Nosek and Tomáška also proposed that linear mitochondrial genomes owe their linearity to mobile elements, which would provide both the need and means to replicate linear genomes by providing DNA sequences/structures and a polymerase necessary for replicating linear DNA (Nosek and Tomáška 2003). The Oxytricha linear mitochondrial plasmid appears to lack the terminal inverted repeats characteristic of most known linear plasmids (Meinhardt et al. 1990; Handa 2008). Instead, it has a 5' end with complex repeats and a 3' end with the same telomeric repeats as the primary mitochondrial genome. The latter feature demonstrates that it is possible to transfer telomeric sequence repeats between mitochondrial genomes and linear plasmids. One possible scenario for such a transfer is that the original Oxytricha linear plasmid may have possessed a terminal inverted structure, which was lost during mitochondrial genome integration, followed by capture of a telomere-bearing end from the primary genome. Alternatively, the plasmid may have possessed a similar structure to its current form, with a telomeric repeat sequence that was transferred to the Oxytricha mitochondrial genome during an integration event.

We propose that, as for horizontal gene transfer, the phagotrophic lifestyles of ciliates may predispose them to periodic mitochondrial invasions by mobile elements bearing error-prone DNA polymerases, such as the Oxytricha mO plasmid. These foreign polymerases may in turn interfere with or partially substitute for the primary, higher fidelity mitochondrial DNA polymerase, contributing to the extreme evolutionary divergences observed in ciliate mitochondria.

Acknowledgements

We would like to thank the Hans Lipps lab (Universität Witten/Herdecke), in particular Franziska Jönsson, for providing us with DNA from Stylonychia and Jingmei Wang for general laboratory assistance. This research was supported by NIH grant GM59708 to L.F.L.

References

Akhmanova A, et al. 1998. A hydrogenosome with a genome. Nature. 396(6711):527-528.
Altschul S, et al. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res. 25(17):3389-3402.
Arnberg AC, et al. 1974. An analysis by electron microscopy of intermediates in the replication of linear Tetrahymena mitochondrial DNA. Biochimica Et Biophysica Acta. 361(3):266-276.
Boxma B, et al. 2005. An anaerobic mitochondrion that produces hydrogen. Nature. 434(7029):74-79.
Brunk CF, et al. 2003. Complete sequence of the mitochondrial genome of Tetrahymena thermophila and comparative methods for identifying highly divergent genes. Nucl. Acids Res. 31(6):1673-1682.
Burger G, et al. 2000. Complete Sequence of the Mitochondrial Genome of Tetrahymena pyriformis and Comparison with Paramecium aurelia Mitochondrial DNA. J. Mol. Biol. 297:365-380.
Burgers PMJ, et al. 2001.
Eukaryotic DNA Polymerases: Proposal for a Revised Nomenclature. J. Biol. Chem. 276(47):43487-43490. Cermakian N, et al. 1996. Sequences homologous to yeast mitochondrial and bacteriophage T3 and T7 RNA polymerases are widespread throughout the eukaryotic lineage. Nucl. Acids Res. 24(4):648-654. Claros MG and Vincens P. 1996. Computational method to predict mitochondrially imported proteins and their targeting sequences. Eur. J. Biochem. 241(3):779-786. Dawson D and Herrick G. 1982. Micronuclear DNA sequences of Oxytricha fallax homologous to the macronuclear inverted terminal repeat. Nucleic Acids Res. 10(9):2911-24. Dinouel N, et al. 1993. Linear mitochondrial DNAs of yeasts: closed-loop structure of the termini and possible linear-circular conversion mechanisms. Mol. Cell. Biol. 13(4):2315-2323. Doolittle WF. 1998. You are what you eat: a gene transfer ratchet could account for bacterial genes in eukaryotic nuclear genomes. Trends in Genetics. 14(8):307-311. Drummond AJ, et al. 2009. Geneious v4.7. Available from http://www.geneious.com/. Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucl. Acids Res. 32(50):1792-1797. 33 Downloaded from http://gbe.oxfordjournals.org/ at Library Serials Dept on December 17, 2011 Binder S, et al., 1992. RNA editing in trans-splicing intron sequences of nad2 mRNAs in Oenothera mitochondria. Journal of Biological Chemistry. 267(11):7615-7623. Eisen JA. et al. 2006. Macronuclear Genome Sequence of the Ciliate Tetrahymena thermophila, a Model Eukaryote. PLoS Biol. 4(9):e286. Elo A, et al. 2003. Nuclear Genes That Encode Mitochondrial Proteins for DNA and RNA Metabolism Are Clustered in the Arabidopsis Genome. Plant Cell. 15(7):1619-1631. Endoh H, et al. 1994. Hairpin and dimer structures of linear plasmid-like DNAs in mitochondria of Paramecium caudatum. Current Genetics. 27(1):90-94. Fan J and Lee RW. 2002. 
Mitochondrial Genome of the Colorless Green Alga Polytomella parva: Two Linear DNA Molecules with Homologous Inverted Repeat Termini. Mol Biol Evol. 19(7):9991007. Goddard JM and Cummings DJ. 1977. Mitochondrial DNA replication in Paramecium aurelia. Crosslinking of the initiation end. Journal of Molecular Biology. 109(2):327-344. Goddard JM and Cummings DJ. 1975. Structure and replication of mitochondrial DNA from Paramecium aurelia. Journal of Molecular Biology. 97(4):593-609. de Graaf RM, et al. 2009. The mitochondrial genomes of the ciliates Euplotes minuta and Euplotes crassus. BMC Genomics. de Graaf RM, et al. 2011. The organellar genome and metabolic potential of the hydrogen-producing mitochondrion of Nyctotherus ovalis. Mol Biol Evol. 28(8):2379-2391. Gray MW, Lang BF and Burger G. 2004. Mitochondria of Protists. Annual Review of Genetics. 38(1):477-524. Gray M, et al. 1998. Genome structure and gene content in protist mitochondrial DNAs. Nucl. Acids Res. 26(4):865-878. Gray M and Spencer D. 1996. Organellar Evolution. In: Roberts DMcL et al. editors. Evolution of microbial life : Fifty-fourth Symposium of the Society for General Microbiology held at the University of Warwick. Symposia of the Society for General Microbiology. UK: Cambridge University Press. p. 109-126. Griffiths-Jones S, et al. 2005. Rfam: annotating non-coding RNAs in complete genomes. Nucl. Acids Res. 33(suppl_1):D121-124. Handa H. 2008. Linear plasmids in plant mitochondria: Peaceful coexistences or malicious invasions. Mitochondrion. 8(1):15-25. Heinonen T, et al. 1987. Rearranged coding segments, separated by a transfer RNA gene, specify the two parts of a discontinuous large subunit ribosomal RNA in Tetrahymena pyriformis mitochondria. J. Biol. Chem. 262(6):2879-2887. 34 Downloaded from http://gbe.oxfordjournals.org/ at Library Serials Dept on December 17, 2011 Forget L, et al. 2002. 
Hyaloraphidium curvatum: A Linear Mitochondrial Genome, tRNA Editing, and an Evolutionary Link to Lower Fungi. Mol Biol Evol. 19(3):310-319. Hildebrand, A. et al. 2009. Fast and accurate automatic structure prediction with HHpred. Proteins: Structure, Function, and Bioinformatics. 77(S9):128-132. Hofacker IL. 2003. Vienna RNA secondary structure server. Nucl. Acids Res. 31(13):3429-3431. Hopfner K, et al. 1999. Crystal structure of a thermostable type B DNA polymerase from Thermococcus gorgonarius. Proc Natl Acad Sci USA. 96(7):3600-3605. Horton TL and Landweber LF. 2000. Mitochondrial RNAs of myxomycetes terminate with nonencoded 3$ poly(U) tails. Nucl. Acids Res. 28(23):4750-4754. Kaguni LS. 2004. DNA Polymerase,the mitochondrial replicase. Annual Review of Biochemistry. 73(1):293-320. Kayal E and Lavrov DV. 2008. The mitochondrial genome of Hydra oligactis (Cnidaria, Hydrozoa) sheds new light on animal mtDNA evolution and cnidarian phylogeny. Gene. 410(1):177-186. Kazmierczak K. et al. 2002. The phage N4 virion RNA polymerase catalytic domain is related to single-subunit RNA polymerases. EMBO J. 21(21):5815-5823. Kelley LA and Sternberg MJE. 2009. Protein structure prediction on the Web: a case study using the Phyre server. Nat. Protocols. 4(3):363-371. Kempken F, Hermanns J and Osiewacz H. 1992. Evolution of linear plasmids. Journal of Molecular Evolution. 35(6):502-513. Krogh A, et al. 2001. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. Journal of Molecular Biology. 305(3):567-580. Lecrenier N and Foury F. 2000. New features of mitochondrial DNA replication system in yeast and man. Gene. 246(1-2):37-48. Lecrenier N, van der Bruggen P and Foury F. 1997. Mitochondrial DNA polymerases from yeast to man: a new family of polymerases. Gene. 185(1):147-152. Lowe T and Eddy S. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucl. Acids Res. 25(5):955-964. 
Iwamoto M, et al. 1998. A ribosomal protein gene cluster is encoded in the mitochondrial DNA of Dictyostelium discoideum: UGA termination codons and similarity of gene order to Acanthamoeba castellanii. Current Genetics. 33(4):304-310.
Malek O, Brennicke A and Knoop V. 1997. Evolution of trans-splicing plant mitochondrial introns in pre-Permian times. Proc Natl Acad Sci USA. 94(2):553-558.
Markham NR and Zuker M. 2005. DINAMelt web server for nucleic acid melting prediction. Nucl. Acids Res. 33(suppl_2):W577-581.
Marriot AC, Archer SA and Buck KW. Mitochondrial DNA in Fusarium oxysporum is a 46.5 kilobase pair circular molecule. Journal of General Microbiology. 130(11):3001-3008.
Massey SE and Garey JR. 2007. A comparative genomics analysis of codon reassignments reveals a link with mitochondrial proteome size and a mechanism of genetic code change via suppressor tRNAs. Journal of Molecular Evolution. 64(4):399-410.
Meinhardt F, et al. 1990. Linear plasmids among eukaryotes: fundamentals and application. Current Genetics. 17(2):89-95.
Meinhardt F, Schaffrath R and Larsen M. 1997. Microbial linear plasmids. Applied Microbiology and Biotechnology. 47(4):329-336.
Moradian MM, et al. 2007. Complete mitochondrial genome sequence of three Tetrahymena species reveals mutation hot spots and accelerated nonsynonymous substitutions in ymf genes. PLoS ONE. 2(7):e650.
Mori Y, et al. 2005. Plastid DNA polymerases from higher plants, Arabidopsis thaliana. Biochemical and Biophysical Research Communications. 334(1):43-50.
Morin GB and Cech TR. 1988. Mitochondrial telomeres: Surprising diversity of repeated telomeric DNA sequences among six species of Tetrahymena. Cell. 52(3):367-374.
Morin GB and Cech TR. 1986. The telomeres of the linear mitochondrial DNA of Tetrahymena thermophila consist of 53 bp tandem repeats. Cell. 46(6):873-883.
Moriyama T, et al. 2008. Purification and characterization of organellar DNA polymerases in the red alga Cyanidioschyzon merolae. FEBS Journal. 275(11):2899-2918.
Nawrocki EP, Kolbe DL and Eddy SR. 2009. Infernal 1.0: inference of RNA alignments. Bioinformatics. 25(10):1335-1337.
Nosek J, et al. 1998. Linear mitochondrial genomes: 30 years down the line. Trends in Genetics. 14(5):184-188.
Nosek J and Tomáška Ľ. 2003. Mitochondrial genome diversity: evolution of the molecular architecture and replication strategy. Current Genetics. 44(2):73-84.
Nowacki M, et al. 2009. A Functional Role for Transposases in a Large Eukaryotic Genome. Science. 324(5929):935-938.
Pritchard AE and Cummings DJ. 1981. Replication of linear mitochondrial DNA from Paramecium: sequence and structure of the initiation-end crosslink. Proc Natl Acad Sci USA. 78(12):7341-7345.
Pritchard AE, et al. 1990. Nucleotide sequence of the mitochondrial genome of Paramecium. Nucl. Acids Res. 18(1):173-180.
Pritchard AE, et al. 1983. Inter-species sequence diversity in the replication initiation region of Paramecium mitochondrial DNA. Journal of Molecular Biology. 164(1):1-15.
Ricard G, et al. 2008. Macronuclear genome structure of the ciliate Nyctotherus ovalis: Single-gene chromosomes and tiny introns. BMC Genomics. 9(1):587.
Rost B. 1999. Twilight zone of protein sequence alignments. Protein Engineering. 12(2):85-94.
Sakurai R, et al. 2000. In vivo conformation of mitochondrial DNA revealed by pulsed-field gel electrophoresis in the true slime mold, Physarum polycephalum. DNA Research. 7:83-91.
Schnare MN, et al. 1986. A discontinuous small subunit ribosomal RNA in Tetrahymena pyriformis mitochondria. Journal of Biological Chemistry. 261(11):5187-5193.
Schnare MN, Greenwood SJ and Gray MW. 1995. Primary sequence and post-transcriptional modification pattern of an unusual mitochondrial tRNAMet from Tetrahymena pyriformis. FEBS Letters. 362(1):24-28.
Scolnick E, et al. 1968. Release factors differing in specificity for terminator codons. Proc Natl Acad Sci USA. 61(2):768-774.
Seilhamer JJ, Gutell RR and Cummings DJ. 1984. Paramecium mitochondrial genes. II. Large subunit rRNA gene sequence and microevolution. Journal of Biological Chemistry. 259(8):5173-5181.
Seilhamer JJ, Olsen GJ and Cummings DJ. 1984. Paramecium mitochondrial genes. I. Small subunit rRNA gene sequence and microevolution. Journal of Biological Chemistry. 259(8):5167-5172.
Sethuraman J, et al. 2009. Molecular Evolution of the mtDNA Encoded rps3 Gene Among Filamentous Ascomycetes Fungi with an Emphasis on the Ophiostomatoid Fungi. Journal of Molecular Evolution. 69(4):372-385.
Shutt TE and Gray MW. 2006. Bacteriophage origins of mitochondrial replication and transcription proteins. Trends in Genetics. 22(2):90-95.
Small I, et al. 2004. Predotar: A tool for rapidly screening proteomes for N-terminal targeting sequences. Proteomics. 4:1581-1590.
Smith DG, et al. 2007. Exploring the Mitochondrial Proteome of the Ciliate Protozoon Tetrahymena thermophila: Direct Analysis by Tandem Mass Spectrometry. Journal of Molecular Biology. 374(3):837-863.
Smits P, et al. 2007. Reconstructing the evolution of the mitochondrial ribosomal proteome. Nucl. Acids Res. 35(14):4686-4703.
Söding J. 2005. Protein homology detection by HMM-HMM comparison. Bioinformatics. 21(7):951-960.
Söding J, Biegert A and Lupas AN. 2005. The HHpred interactive server for protein homology detection and structure prediction. Nucl. Acids Res. 33(suppl_2):W244-248.
Suyama Y and Miura K. 1968. Size and structural variations of mitochondrial DNA. Proc Natl Acad Sci USA. 60:235-242.
Takano H, Kawano S and Kuroiwa T. 1992. Constitutive homologous recombination between mitochondrial DNA and a linear mitochondrial plasmid in Physarum polycephalum. Current Genetics. 22(3):221-227.
Takano H, Kawano S and Kuroiwa T. 1994. Complex terminal structure of a linear mitochondrial plasmid from Physarum polycephalum: three terminal inverted repeats and an ORF encoding DNA polymerase. Current Genetics. 25(3):252-257.
Tsukii Y, Endoh H and Yazaki K. 1994. Distribution and genetic variabilities of mitochondrial plasmid-like DNAs in Paramecium. The Japanese Journal of Genetics. 69(6):685-696.
Vahrenholz C, et al. 1993. Mitochondrial DNA of Chlamydomonas reinhardtii: the structure of the ends of the linear 15.8-kb genome suggests mechanisms for DNA replication. Current Genetics. 24(3):241-247.
Walther TC and Kennell JC. 1999. Linear Mitochondrial Plasmids of F. oxysporum Are Novel, Telomere-like Retroelements. Molecular Cell. 4(2):229-238.
Wan F, et al. 2007. Ribosomal Protein S3: A KH Domain Subunit in NF-κB Complexes that Mediates Selective Gene Regulation. Cell. 131(5):927-939.
Will S, et al. 2007. Inferring Noncoding RNA Families and Classes by Means of Genome-Scale Structure-Based Clustering. PLoS Comput Biol. 3(4):e65.
Wright ADG and Lynn DH. 1997. Maximum ages of ciliate lineages estimated using a small subunit rRNA molecular clock: Crown eukaryotes date back to the Paleoproterozoic. Archiv für Protistenkunde. 148(4):329-342.
Yang Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Computer Applications in the Biosciences. 13(5):555-556.
Zufall RA, et al. 2006. Genome architecture drives protein evolution in ciliates. Mol Biol Evol. 23(9):1681-1687.
Rice P, Longden I and Bleasby A. 2000. EMBOSS: The European Molecular Biology Open Software Suite. Trends in Genetics. 16(6):276-277.
Xia X and Xie Z. 2001. DAMBE: Data analysis in molecular biology and evolution. Journal of Heredity. 92:371-373.
The Uniprot consortium. 2011. Ongoing and future developments at the Universal Protein Resource. Nucl. Acids Res. 39(suppl 1):D214-D219.

Figures and Tables

[Figure 1 image: linear gene maps with a 10 kb scale bar. Genome sizes shown: O. trifallax > 69.7 kb; N. ovalis > 41.7 kb; E. minuta > 42.0 kb; T. pyriformis > 47.3 kb; P. aurelia > 40.5 kb. Key: rRNA, known orf, unknown orf, tRNA, repeat, interorf region, unsequenced region, transcription direction. The mO plasmid (rpo, dpo) is drawn separately.]

Figure 1. Gene map of the O. trifallax mitochondrial genome (Genbank accession: JN383842) in comparison to that of representative ciliate mitochondrial genomes (E. minuta - GQ903130; T. pyriformis - AF160864; P. tetraurelia - NC_001324) and the N. ovalis hydrogenosome genome (GU057832.1). Unknown ORFs are as currently annotated in Genbank, without genes that we have newly classified. Split genes are suffixed with an underscore followed by an alphabetic character. O. trifallax ORFs with a lowercase roman numeral are fragments we predict to belong to the same gene, but which are possibly artificially split by sequencing errors. The undetermined lengths of the central repeat regions of O. trifallax and E. minuta are indicated by a dashed diagonal line. tRNAs are indicated by single letter amino acid codes. Collinearity between the genomes is indicated by pale-colored regions, including the collinear, but interrupted, single-genes that are demarcated by finely dashed lines. The ~250 bp region shared by the mitochondrial mO plasmid (JN383843) and the primary genome of O. trifallax is indicated by a red and black dashed line.

[Figure 2 image: rps3_a and rps3_b alignments for P. tetraurelia, T. pyriformis, O. trifallax and E. minuta, with a schematic of the split rps3 genes. Labels in the schematic include orf234, ymf64, a 27 nt "gap" and the rps3 "extension".]

Figure 2. rps3 genes in ciliate mitochondrial genomes (E. minuta - GQ903130; T. pyriformis - AF160864; P. aurelia - NC_001324). The rps3_a and rps3_b multiple sequence alignments are indicated above and below a schematic representation of the split rps3 genes. Regions with substantial sequence similarity are indicated in dark purple, whereas those that are poorly conserved are indicated in pink; the rps3_a and rps3_b parts are on distant loci. The rps3 extension annotated as a part of this gene in Euplotes does not align to any of the other rps3 sequences. Multiple sequence alignments were generated with Muscle (Edgar 2004) with default parameters.

[Figure 3 image: four TMHMM posterior probability plots, for Tetrahymena pyriformis, Paramecium aurelia, Oxytricha trifallax and Nyctotherus ovalis. Y-axis: probability (0 to 1.2); X-axis: position in aa (0 to 500); traces: transmembrane, inside, outside.]

Figure 3. TMHMM2 transmembrane profile predictions for the concatenated nad2 split ORFs from Tetrahymena pyriformis (AF160864), Paramecium tetraurelia (NC_001324), Oxytricha trifallax (JN383842) and Nyctotherus ovalis (GU057832.1). TMHMM2 posterior probabilities are given on the Y-axis; the X-axis length is in amino acids. Concatenation points are indicated by arrows.

[Figure 4 image: schematic of rRNA gene arrangements, with a 200 bp scale bar, for M. musculus (18S, 5.8S, 28S), E. coli (23S), C. reinhardtii (7S, 3S, 23S), P. tetraurelia, T. pyriformis, E. minuta and O. trifallax, showing the rns_a, rns_b, rnl_a and rnl_b segments and neighboring tRNA genes (trnM, trnM_i, trnM_ii, trnY, trnL, trnK).]

Figure 4. Comparison of ciliate split rRNA genes. Solid red and blue bars represent small and large subunit rRNA coding sequences, respectively, drawn approximately to scale; discontinuous red and blue bars represent longer sequences that have been compressed due to figure space constraints; black lines represent intervening sequences which are approximately halved relative to the coding sequences, except the central, large discontinuous intervening region in ciliates, indicated by the dashed line, which represents an extensive, primarily protein-coding, region; tRNA genes are purple. The duplicated large subunit region of Tetrahymena pyriformis is represented here by a single locus: the primary difference between the two loci is a different tRNA succeeding rnl_a (trnY and trnM_i). Homology between the different segments is indicated by pastel-colored parallelograms. Question marks indicate the lack of experimental evidence in Oxytricha or Euplotes supporting or rejecting gene splits. Genbank accession numbers for the rRNA loci are: Mus musculus nuclear genome rRNA (BK000964.1); Escherichia coli genome (NC_000913); Chlamydomonas reinhardtii chloroplast genome (NC_005353.1); Paramecium tetraurelia (NC_001324), Tetrahymena pyriformis (AF160864.1), Euplotes minuta (GQ903130) and Oxytricha trifallax (JN383842) mitochondrial genomes.

[Figure 5 image: panels a to d. a) Plasmid map on a 0 to 5 kb scale with restriction sites ClaI (2074) and BseAI (3784) and the features RNA Polymerase, mO-orf2, DNA Polymerase, mt-telo and the mitochondrial genome sequence similarity region. b) Predicted hairpin structures labeled i, ii and iii. c) and d) Southern blots with a 1 kb ladder, undigested plasmid and restriction-digested lanes; the mobility limit is marked.]

Figure 5. The Oxytricha trifallax linear mitochondrial plasmid. a) The linear plasmid is approximately drawn to scale. The three open reading frames are indicated by arrows; the putative integration site is indicated in teal. The three classes of hairpin-forming sequences are indicated by triangles. b) Quikfold (Markham and Zuker 2005) structures predicted for the 5′ end of the plasmid. c) Southern analysis of the linear plasmid; the digestion product lengths are in agreement with a linear form of the mitochondrial plasmid; the probe also hybridizes to the mitochondrial genome in the mobility limit as it includes the region of sequence similarity shared by the plasmid and the main mitochondrial genome. d) Southern analysis to infer the length of the 5′ end of the plasmid.
[Figure 6 image: self-comparison dotplot; both axes run from 0 to 16000 bp; sequence label: mito_newest_2010_01_25_c_first16kb.]

Figure 6. Segmental duplications in the 16 kb subterminal mitochondrial region of the Oxytricha mitochondrial genome long arm. The dotplot was generated by Dotmatcher (EMBOSS) (Rice et al. 2000) with a window size of 50 and threshold of 150. The axis scales are in base pairs. Along the axes, unknown ORFs are colored orange and known protein coding genes are green. The ~170 bp quadruplication is enclosed by red ellipsoids; the 1450 bp duplication is enclosed by a blue ellipsoid. The footprint of the mO plasmid is indicated in teal; the close proximity of this site to the end of the first region that is quadruplicated is indicated by dashed lines.

Table 1. Ciliate mitochondrially-encoded genes

Species columns: O. trifallax, E. minuta, N. ovalis, T. pyriformis, P. tetraurelia (cell entries are listed in source order; empty cells are not marked)

nad1 a: * * * *
nad1 b: * * * (no split determined) * *
nad2 a: * * * ymf65 ymf65_a + b
nad2 b: * * * “nad2” “nad2”
nad3: * * * * *
nad4: * * * * *
nad4L: * * * * *
nad5: * * * * *
nad6: * * orf236 * *
nad7: * * * * *
nad9: * * * * *
nad10: * * * *
cob: * * * *
cox1: * * * *
cox2: * * * *
atp9: * * * *
ccmF/yejR: * * * *
rps2: *
rps3 a: * * ymf64 ymf64
rps3 b: * orf190 “rps3” “rps3”
rps4: * * ymf76 ymf81 + 85
rps7: * orf170 ymf63 ymf63
rps8: * orf125 ymf74 ymf84
rps10: * orf111 ymf59 ymf59
rps12: * * * *
rps13: * orf102 * *
rps14: * orf49+ * * *
rps19: * orf155 orf199 * *
rpl2: * * * * *
rpl6: * * * *
rpl14: * * * * *
rpl16: * * * * orf262 * *

Ciliate mitochondrially-encoded genes determined in this study and in previous studies (Pritchard et al. 1990; Burger et al. 2000; Brunk et al. 2003; Moradian et al. 2007; de Graaf et al. 2009; de Graaf et al. 2011; Barth and Berendonk 2011). rps2, rps7, rps8 and rps10 are newly discovered ciliate mitochondrial proteins, with previously-unrecognized orthologs in other species.

Table 2. Ciliate mitochondrially-encoded tRNAs

Species columns: O. trifallax, E. minuta, N. ovalis, T. pyriformis, P. tetraurelia (cell entries are listed in source order; empty cells are not marked)

trnC: *
trnE: * *
trnF: * *
trnH: * *
trnK: *
trnL: * *
trnM i: * * *
trnM ii: * *
trnQ: * *
trnW: * * * * *
trnY: * * * * * * * * * *

Table 3. Ciliate split ribosomal RNA segment lengths (bp)

Segment   O. trifallax   E. minuta        T. pyriformis   P. tetraurelia
rnl a     301            281ᵃ             289             280
rnl b     2289           2230             2279            2315
rnl       2590           2511ᵃ (2230)     2568            2595
rns a     200            206ᵃ             208             212ᵇ (1477)
rns b     1426           >1431ᵃ           1407            1415ᵇ (204)
rns       1626           1637ᵃ (2257)     1615            1627

Bracketed values indicate the lengths of the rRNA segments as annotated in the mitochondrial genomes deposited in Genbank (E. minuta - GQ903130 and P. tetraurelia - NC_001324).
ᵃ Length estimates from our sequence alignments.
ᵇ Estimates based on an experimental reassessment (Schnare et al. 1986) of the original P. tetraurelia rRNA split gene structure (Seilhamer et al. 1984).

Table 4. Signal peptide prediction for putative macronuclear-encoded mitochondrial polymerases

                  Macronuclear-encoded      Macronuclear-encoded
                  mtDNA Polymerase          mtRNA Polymerase
Species           Predotar    Mitoprot      Predotar    Mitoprot
O. trifallax      0.99        0.84          0.97        0.86
T. thermophila    0.96        0.46          0.24        0.64
P. tetraurelia    0.41        0.76          0.98        0.78

Mitochondrial signal prediction probabilities are shown. Annotated macronuclear-encoded mitochondrial DNA and RNA polymerase chromosomes have been deposited in Genbank (JN383844 and JN383845). The relatively low probabilities for T. thermophila and P. tetraurelia gene predictions may be due to incorrect gene start predictions (for instance, the T. thermophila mtRNA polymerase has a 200 aa extension compared to both the P. tetraurelia and O. trifallax gene predictions).