UNIVERSIDAD NACIONAL AUTÓNOMA DE MÉXICO
POSGRADO EN CIENCIAS BIOLÓGICAS (Graduate Program in Biological Sciences)
FACULTAD DE CIENCIAS (Faculty of Sciences)

"COMPARATIVE ANALYSIS OF THE SIZE AND CONTENT OF PROKARYOTIC GENOMES"

THESIS SUBMITTED FOR THE ACADEMIC DEGREE OF MASTER OF SCIENCE (BIOLOGY)
PRESENTED BY SARA ERNESTINA ISLAS GRACIANO
ADVISOR: DR. ANTONIO EUSEBIO LAZCANO-ARAUJO REYES
MÉXICO, D. F., AUGUST 2003
THESIS WITH ORIGINAL DEFECTS (TESIS CON FALLA DE ORIGEN)

UNAM, Dirección General de Bibliotecas, Tesis Digitales. Restrictions on use.
ALL RIGHTS RESERVED ©. TOTAL OR PARTIAL REPRODUCTION PROHIBITED.
All material contained in this thesis is protected by the Federal Copyright Law (Ley Federal del Derecho de Autor, LFDA) of the United Mexican States (Mexico). Images, video fragments and other copyrighted material may be used exclusively for educational and informational purposes, and the source must be cited, naming the author or authors. Any other use, such as for profit, reproduction, editing or modification, will be prosecuted and sanctioned by the respective copyright holder.

Acknowledgments
To the memory of my father, who will remain in me forever. To my mother, for her calm and tenderness. To Eduardo and Diego Kazuo, the loves of my life.
I especially thank Dr. Antonio Lazcano for his understanding and his help in my training. Thank you, Toño, for your friendship.
To the members of the examining committee, Dr. Antonio Lazcano, Dr. Víctor Valdes, Dr. Valeria Souza, Dr. Arturo Becerra and Dr. Alicia Negrón, for their valuable comments and suggestions.
To my colleagues at the Microbiology Laboratory, Ana, Arturo, Luis and Erwin, for their support and great sense of humor.
To my coauthors and friends who are now on other paths, Josetxu, Amanda and Héctor. To the new generations, the "tallarines" and the "colateds". To Pilar and Olga for their bibliographic support. To my friends.

Contents
Summary
Abstract
Introduction: Comparative analysis of the size and content of prokaryotic genomes
Factors that modify the amount of DNA: horizontal transfer; endosymbiosis; polymerase slippage; duplication processes (duplication of a gene; duplication of the whole genome)
Methodology
Articles:
Islas S, Becerra A, Leguina J I, and Lazcano A. (1998). Early metabolic evolution: insights from comparative cellular genomics. In: Chela-Flores and F. Raulin (eds), Exobiology: Matter, Energy, and Information in the Origin and Evolution of Life in the Universe, 167-174. Kluwer Academic Publishers, Netherlands.
Islas S, Castillo A, Vázquez H G, and Lazcano A. (2000). On the role of genome duplications in the evolution of prokaryotic chromosomes. In: Chela-Flores et al. (eds), Astrobiology, 289-292. Kluwer Academic Publishers, Netherlands.
Islas S, Velasco A M, Becerra A, Delaye L, and Lazcano A. (2003). Hyperthermophily and the origin and earliest evolution of life. International Microbiology. Accepted for the June issue.
Islas S, Becerra A, Luisi Luigi P, and Lazcano A. Comparative genomics and the gene complement of a minimal cell. Submitted to Origins of Life and Evolution of the Biosphere.
Conclusions
References
Appendix 1: 1a Database; 1b Database references
Appendix 2: 2a Graph 1; 2b Graph 2
Appendix 3:
3a Becerra A, Islas S, Leguina J I, Silva E, and Lazcano A. (1997). Polyphyletic gene losses can bias backtrack characterizations of the cenancestor. J Mol Evol 45: 115-118.
3b Becerra A, Silva E, Lloret L, Islas S, Velasco A M, and Lazcano A. (2000). Molecular biology and the reconstruction of microbial phylogenies: Des liaisons dangereuses? In: Chela-Flores et al. (eds), Astrobiology, 135-150. Kluwer Academic Publishers, Netherlands.

Summary
Comparative analysis of the size and content of prokaryotic genomes
Prokaryotes show considerable variation in genome size. This is due in part to their capacity to modify their DNA content through horizontal transfer, slippage, and the duplication of individual genes and of the whole genome, as well as through rearrangements of the genome itself. Although we do not know what the first organisms were like, their genetic machinery was probably relatively small and their coding capacities restricted. Explaining the mechanisms by which genome size has increased is therefore not an easy task. Pulsed-field gel electrophoresis (PFGE), used since 1985 to estimate genome size, has made DNA-content estimates more precise and has improved the construction of genome maps. This work studies the factors involved in the variation of prokaryotic genome sizes, including the hypothesis that they result from rounds of whole-genome duplication (Ohno 1970; Wallace and Morowitz 1973; Zipkas and Riley 1975; Sparrow and Neuman 1976; Herdman 1985). To that end, we report a statistical analysis of the distribution of 641 bacterial and archaeal genome sizes determined by PFGE. The database of 641 prokaryotes was built from reports published in NCBI, Scirus and Highwire. It was supplemented with data on phylogenetic position, lifestyle, temperature and metabolism. With the data available up to February 2003, we found that prokaryotic genome sizes range from 0.448 Mb (γ-proteobacteria) to 9.7 Mb (α-proteobacteria).
The organisms with the smallest genomes are obligate symbionts and parasites belonging to the γ-proteobacteria, Anaeroplasma, Spiroplasma, Rickettsia and Spirochaetae. Not all organisms with small genomes are anaerobic, which can be explained by a complex series of secondary adaptations that have led to genome reduction. Nevertheless, we found that anaerobic and microaerophilic prokaryotes generally have smaller genomes than aerobes. However, a small genome is not, by its size alone, evidence that an organism is an ancestral form. Likewise, the relatively narrow range of hyperthermophile genome sizes may reveal a tendency for high-temperature environments to constrain DNA content to a specific range (0.5–5.10 Mb), probably through a reduction in average gene size. The largest genomes typically belong to free-living aerobic organisms with complex life cycles. This database is clearly biased (toward organisms available on the WWW) and does not represent the full diversity of prokaryotes. Even so, the genome-size distribution of the sample provides no evidence supporting Herdman's (1985) hypothesis; the results suggest that prokaryotic DNA content did not arise from whole-genome duplications.

Abstract
Comparative analysis of the size and DNA content of prokaryotic genomes
Prokaryotic genome size varies considerably. This variation results from the ability of prokaryotes to modify their DNA content by different means, such as horizontal transfer, slippage, gene duplication, whole-genome duplication and rearrangements of the genome itself. Although it is still unknown what the first cells were like, they were probably endowed with a relatively small genetic machinery with reduced coding capacities.
Thus, explaining the mechanisms through which genome size has increased is not an easy task. Pulsed-field gel electrophoresis (PFGE), used since 1985, is the best technique for constructing genetic maps and for determining DNA content accurately. The purpose of this work was to analyze the factors involved in the variation of prokaryotic genome size, including the hypothesis that this variation results from genome duplication (Ohno 1970; Wallace and Morowitz 1973; Zipkas and Riley 1975; Sparrow and Neuman 1976; Herdman 1985). We report a statistical analysis of a sample of 641 archaeal and bacterial genome sizes determined by PFGE. These sizes were reported in publications indexed in the NCBI/PubMed, Scirus and Highwire databases up to February 2003, and the sample was supplemented with data on phylogeny, lifestyle, temperature and metabolism. In our sample, prokaryotic genome size ranged from 0.448 Mb (γ-proteobacteria) to 9.7 Mb (α-proteobacteria). The organisms with the smallest genomes are obligate symbionts and parasites belonging to the γ-proteobacteria, Anaeroplasma, Spiroplasma, Rickettsia and Spirochaetae groups. Not all organisms with small genomes are anaerobic, which can be explained by a complex series of secondary adaptations that have led to genome reduction. In general, anaerobic and microaerophilic prokaryotes have smaller genomes than aerobic ones. Nevertheless, small size alone does not show that the smallest genomes are ancient forms. Likewise, the relatively narrow range of hyperthermophile genome sizes may reveal a tendency for high-temperature environments to constrain DNA content to a specific range (0.5–5.10 Mb), probably through a reduction in average gene size. The largest genomes typically belong to free-living aerobic organisms with complex life cycles.
The database analyzed here is biased (it includes only organisms available on the WWW) and does not reflect the full prokaryotic biodiversity. Our distribution provides no evidence supporting Herdman's hypothesis, and the results suggest that prokaryotic DNA content did not result from whole-genome duplications.

Comparative analysis of the size and content of prokaryotic genomes
Introduction
A cellular genome can be defined as the total genetic information (DNA) used by an organism to maintain and reproduce itself (Kolsto 1999). Over time, living beings have undergone changes in both the content and the size of their genomes. Although we do not know what the first organisms were like, their genetic machinery was probably relatively small and their coding capacities restricted. Estimating DNA content has been a challenge for several decades. It has relied on different techniques, such as colorimetry, DNA renaturation kinetics, pulsed-field gel electrophoresis (PFGE) (Cantor 1988) and, more recently, complete genome sequencing. This work studies the factors involved in the variation of prokaryotic genome sizes, including the hypothesis that they result from rounds of whole-genome duplication (Ohno 1970; Wallace and Morowitz 1973; Zipkas and Riley 1975; Sparrow and Neuman 1976; Herdman 1985; Trevors 1996). To that end, we report a statistical analysis of the distribution of bacterial and archaeal genome sizes determined by PFGE.
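The Methodology states only that a chi-square test was run in Microsoft Excel. It does not say which tables were tested. The sketch below is minimal and assumes a test of independence between a categorical trait (for example, response to oxygen) and a genome-size class; it computes the Pearson statistic for such a table. The function name and the counts in the usage lines are hypothetical and are not data from this study.

```python
from typing import List, Tuple


def chi_square_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table
    of observed counts (rows: one trait, columns: another)."""
    if not table or not table[0]:
        raise ValueError("table must have at least one row and one column")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if grand_total == 0:
        raise ValueError("table contains no observations")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N.
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected == 0:
                raise ValueError("every row and column must have a non-zero total")
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


if __name__ == "__main__":
    # Hypothetical counts: rows = anaerobic, aerobic; columns = genome < 2 Mb, >= 2 Mb.
    observed = [[30, 10], [20, 40]]
    chi2, dof = chi_square_independence(observed)
    # 3.841 is the 5% critical value of the chi-square distribution with 1 df.
    print(f"chi2 = {chi2:.2f}, df = {dof}, significant at 5%: {dof == 1 and chi2 > 3.841}")
```

The same statistic underlies Excel's CHITEST function. That function takes ranges of observed and expected counts and returns the corresponding p-value.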
When we try to reconstruct the early phases of the evolution of life, we face many open questions about its successive stages. These stages run from the prebiotic processes that led to the first life forms to the appearance of cellular forms with DNA genomes. The first life forms were very probably based on a genetic polymer other than RNA itself, and the transition to DNA probably passed through an intermediate stage in which genomes may have been made of RNA. Prokaryotes are the oldest living beings on Earth, with a fossil record dating back 3.5 billion years (Schopf 1993; Brasier et al. 2002). The paleontological record does not establish precisely what the first living beings were like or what kind of genome the first cells had. It is accepted, however, that the primitive atmosphere lacked free oxygen and may, in fact, have been reducing. This implies that the first living beings were anaerobic heterotrophs (Oparin 1938). Their descendants, in contrast, adapted to an environment with increasing amounts of oxygen released into the atmosphere. This selected for new metabolic capabilities, whose presence is reflected, at least in part, in the variation of prokaryotic genome sizes. On the one hand, prokaryotic genomes have a much greater capacity than eukaryotic ones to acquire genes and DNA segments through horizontal transfer. On the other, they have a relative stability that gives each of them a specific identity. Thus, explaining the mechanisms by which genome size has increased is not an easy task. One might think that "more complex" (multicellular) organisms require more genes, that is, a larger amount of DNA (Petrov 2001). However, some amoebae have 200 times more DNA than humans (but not necessarily more genes).
Unlike the eukaryotic genome, the prokaryotic genome translates almost directly into biochemical and physiological functions and organismal complexity. This is because most prokaryotic sequences are coding regions, that is, they specify functional proteins or RNAs. In other words, in prokaryotes there is a direct positive correlation between the number of genes and genome size (Mira et al. 2001). It follows that larger prokaryotic genomes encode more proteins, regulatory sequences and repair mechanisms, a greater diversity of metabolic pathways, and complex life cycles. In general, organisms with larger replicons* also tend to be metabolic "generalists": they have broad metabolic capabilities and fewer requirements for specific compounds in their culture medium (Shimkets 1997). In contrast, the smallest genomes tend to belong to highly specialized organisms that occupy restricted niches. Examples are parasitic prokaryotes that live in hosts under very particular conditions, such as mycoplasmas and rickettsias. However, this correspondence is neither absolute nor determinant, since the size distributions of the two groups overlap widely. It has been proposed that genomes in the remote past must have been small and encoded enzymes of low specificity, giving those cells maximum biochemical flexibility with a minimal gene content (Jensen 1976).

Prokaryotic genome size varies considerably. It ranges from the smallest genomes, such as that of Mycoplasma genitalium with 580,000 bp (Fraser et al. 1995), to that of Stigmatella erecta with 9,550,000 bp (Neumann et al. 1992). The geometry** of prokaryotic genomes is similarly variable. Until recently, prokaryotes were thought to possess only circular replicons. Some species, however, have linear chromosomes, such as Borrelia burgdorferi (Casjens 1993), Streptomyces lividans (Lin et al. 1993), Rhodococcus fascians (Bendich and Drlica 2000) and Azospirillum (Martin-Didonet et al. 2000). Both forms can coexist in some organisms, such as Agrobacterium, Azospirillum and Streptomyces. The presence of linear genomes in phylogenetically distant groups suggests that they have arisen independently several times. All these organisms may carry extrachromosomal elements, or plasmids, which may in turn be linear or circular and are not essential for the survival of the microorganism. Plasmids usually encode "specific" functions that allow the bacterium or archaeon to adapt to adverse environments. Such functions include antibiotic resistance, fertility (promoting conjugation and transfer of genetic material), virulence, degradation of substances and nitrogen fixation. Plasmid size ranges from 2 kb (approx. 2 genes) to 600 kb, and up to 1,600 kb as in Rhizobium, where such an element is known as a megaplasmid or auxiliary chromosome.

* The term chromosome was coined to designate the appearance of stained genetic material in eukaryotic cells. By analogy with eukaryotes, the circular or linear DNA molecule of prokaryotes is called a chromosome. It can equally be called a replicon, since this term refers to a nucleic acid structure capable of self-replication. Replicons therefore include the chromosomes of eukaryotic and prokaryotic cells, plasmids, and the nucleic acids of viruses. Likewise, the terms prokaryotic genome and prokaryotic chromosome have come to be used interchangeably.
** The geometry of the prokaryotic genome refers to the form in which prokaryotic DNA occurs, which may be circular or linear.
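An ordinary least-squares fit on any set of annotated genomes can check two points made above. The first is the positive correlation between gene number and genome size (Mira et al. 2001). The second is the rule of thumb implicit in the plasmid figures, roughly one gene per kilobase, as in "2 kb, approx. 2 genes". The sketch below is minimal. The function name is hypothetical, and the usage pairs are made-up round numbers, not measurements.

```python
import math
from typing import Sequence, Tuple


def fit_gene_density(sizes_kb: Sequence[float], gene_counts: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit gene_count = slope * size_kb + intercept.

    Returns (slope in genes per kb, intercept, Pearson correlation r).
    """
    n = len(sizes_kb)
    if n != len(gene_counts) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(sizes_kb) / n
    mean_y = sum(gene_counts) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes_kb)
    syy = sum((y - mean_y) ** 2 for y in gene_counts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes_kb, gene_counts))
    if sxx == 0 or syy == 0:
        raise ValueError("sizes and gene counts must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


if __name__ == "__main__":
    # Made-up illustrative pairs: (genome size in kb, number of genes).
    sizes = [600, 1800, 4600, 7000]
    genes = [520, 1700, 4300, 6400]
    slope, intercept, r = fit_gene_density(sizes, genes)
    print(f"{slope:.2f} genes/kb, intercept {intercept:.0f}, r = {r:.3f}")
```

A slope near 1 gene/kb with r close to 1 would be consistent with the direct correlation described above. The fit says nothing, however, about gene content or function.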
The proteobacteria form a phylogenetic cohort divided into several groups, of which the β and γ groups harbor large numbers of plasmids (Moreno 1998). Size alone does not distinguish a plasmid from a chromosome. To be considered a chromosome, a replicon must also carry essential genes, be sufficiently large and control its own replication. For example, Ng et al. (2000) reported the complete DNA sequence of the circular plasmid pNRC100 of the archaeon Halobacterium. They found that it contains 191,346 bp (approx. 186 genes), which are considered essential genes, in addition to a gene for replication. This shows how difficult it is to draw a precise line between plasmids and chromosomes.

Factors that modify the amount of DNA
Increases in DNA content are produced mainly by horizontal transfer, endosymbiosis (in the case of eukaryotes), slippage, and gene and genome duplication events (Fig. 1). These processes are described below.

Fig 1. Factors that modify the amount of DNA in prokaryotes.

Horizontal transfer
Horizontal transfer is the name given to the various processes by which an organism, typically a prokaryote, transfers part of its genetic material to another organism that may or may not belong to the same species (Eisen 2000; Jain et al. 2002). The acquisition and fixation of the transferred material depend on when and where the organisms meet. Not all genes are equally likely to be transferred. For example, genes encoding ribosomal RNA are rarely transferred to other species, whereas housekeeping genes are the most likely to be transferred (Jain et al. 1999).
Comparisons among completely sequenced genomes provide evidence of horizontal gene transfer among the domains Archaea, Bacteria and Eukarya. These comparisons rely on nucleotide composition, codon usage and the phylogenetic distribution of gene families. Such transfers may underlie biochemical and environmental adaptations (Roy 1999), as in the cases of Aquifex aeolicus (a hyperthermophilic bacterium) and the archaeon Methanococcus jannaschii (Aravind et al. 1998). Transfer of genes from Archaea to Aquifex aeolicus and Thermotoga maritima has also been reported (Nelson et al. 1999).

Endosymbiosis
Many prokaryotes tend to associate with eukaryotic cells, which leads to various types of interaction between them. One of these is endosymbiosis, which has played an important role in evolution (Margulis 1993). Mitochondria and chloroplasts are vestiges of free-living prokaryotes. They underwent strong genetic erosion during their evolution, as a result of the loss of unnecessary genes and the transfer of genes to the nucleus. However, no examples of this kind of relationship between two prokaryotes are known so far.

Polymerase slippage
Slippage is a mutation that occurs during DNA replication. Mispairing of the strands produces a gain or loss of genetic material in a second round of replication. It characteristically produces nucleotide regions, and therefore protein regions, that are biased toward repeated segments; these are better known as low-complexity sequences. Such sequences provide a source of phenotypic and genetic variability in the evolution of genome size (Tautz et al. 1986; Hancock 1995; Becerra et al., in prep.).
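Low-complexity regions such as those generated by slippage are often flagged by measuring compositional complexity along a sequence. The sketch below is minimal and assumes Shannon entropy in a sliding window as the complexity measure. This is a simplification, not the SEG program or the method of the works cited above. The window length and threshold are arbitrary illustrative values.

```python
import math
from collections import Counter
from typing import List, Tuple


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per symbol) of the symbol composition of seq."""
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq: str, window: int = 12, threshold: float = 1.5) -> List[Tuple[int, float]]:
    """Return (start, entropy) for every window whose entropy is below threshold.

    For DNA the maximum entropy is 2 bits (four equally frequent bases).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        h = shannon_entropy(seq[start:start + window])
        if h < threshold:
            hits.append((start, h))
    return hits


if __name__ == "__main__":
    # Illustrative sequence: a (CA)n microsatellite flanked by mixed sequence.
    dna = "GATCCGTAAGCT" + "CA" * 10 + "TTGACGGATCAG"
    for start, h in low_complexity_windows(dna):
        print(start, round(h, 2))
```

A dinucleotide repeat such as (CA)n scores 1 bit per base, well below the 2-bit maximum. This is why windows falling inside the repeat are reported.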
Duplication processes
The evolutionary significance of gene duplication was recognized long ago by Haldane (1932). He suggested that extra (redundant) copies of genetic material could acquire new functions through successive mutations (Ohno 1970; Li 1980). Gene duplication is the most important mechanism for generating new genes during genome evolution, and it can operate at different levels, as shown in Table 1.

Table 1. Types of duplication (P, prokaryote; E, eukaryote)
Region of a gene: internal duplication (P and E)
Complete gene: complete duplication (P and E)
Region of a chromosome: polysomy (P and E)
Entire chromosome: aneuploidy (E)
Whole genome: polyploidy (P and E)

Duplication of a gene
Gene amplification is a widespread genetic phenomenon in prokaryotes. Duplications can arise by unequal recombination between two DNA molecules at a replication fork. Recombination occurs between two different copies of a short repeated sequence (the thick lines in Fig. 2), producing gene amplification or a tandem duplication (Romero and Palacios 1977; Anderson and Roth 1977).

Fig 2. Model of gene duplication during replication of the bacterial genome (diagram of a replication fork). Duplication of a gene (grey rectangles) can occur between the two daughter double helices. Unequal recombination takes place between two short repeated sequences (thick lines), leaving a duplication in one daughter molecule and a loss in the other.

Once a gene is duplicated, the new copy can mutate, and new enzymatic functions may eventually emerge, as with hisA and hisF (Alifano et al. 1996). Alternatively, the copy may simply undergo elongation, as in the case of carB (Lawson et al. 1996).

Duplication of the whole genome
Whole-genome duplication results in a rapid expansion of the number of genes.
In his book Evolution by Gene Duplication, Ohno (1970) proposed that new genes were produced during whole-genome duplication events, and that these events were a prerequisite for major evolutionary transitions. It has been suggested that whole-genome duplication in prokaryotes could occur by (a) crossing over between identical circular genomes; (b) head-to-tail joining of two identical linear genomes; or (c) a mode of replication of the primitive bacterial genome similar to that used by some present-day bacteriophages (Zipkas and Riley 1975). The result of whole-genome duplication is known as ploidy. The term has conventionally been defined for eukaryotic cells and refers to the number of sets of homologous chromosomes in an organism. In eukaryotes, polyploidy results from non-disjunction of genetic material during meiosis, which increases DNA content. According to the number of homologous chromosome sets they contain, cells can be haploid (n), like gametophytes and most prokaryotes; diploid (2n), like the mouse; or polyploid, like some tree frogs. Polyploidy has been widely observed in distantly related groups, such as yeasts, plants (angiosperms and pteridophytes) and animals (ostracods and some amphibians); that is, it has a polyphyletic origin (Soltis and Soltis 1999). Most prokaryotes are haploid. However, distinguishing this process in the cell cycles of prokaryotes and eukaryotes can be complex. In some prokaryotes, ploidy has been interpreted as a consequence of a mismatch within the cell cycle, in which cells grow faster than they replicate their DNA (Trun 1999).
That is, when E. coli divides every 60 minutes, each new cell inherits one chromosome. If E. coli grows faster than the time needed to replicate its DNA, cells inherit chromosomes that already carry replication forks, and two situations can arise:
a) the new cell inherits a chromosome with more than one fork but a single terminus, in which case the cell is haploid; or
b) the new cell inherits several replicating chromosomes, in which case it is technically diploid (see Trun 1999; Fig. 3).

Fig 3. State of the chromosome at different growth rates in Escherichia coli, modified from Trun (1999). Columns: doubling time (60 and 20 minutes), DNA synthesis, number of chromosomes, cell division.

Table 2. Number of genome equivalents per cell in prokaryote populations at different phases of the growth cycle (Bendich and Drlica 2000).
Organism: stationary phase (genome equivalents at onset); exponential phase (genome equivalents at end)
Escherichia coli: 2, 4, 8; 11
Methanococcus jannaschii: 3; 7
Deinococcus radiodurans: 4; 10
Synechococcus PCC6301: not given; 8
Desulfovibrio gigas: not given; 9, 17
Borrelia hermsii: not given; 16
Azotobacter vinelandii: not given; 80-100

The most remarkable known case is the bacterium Epulopiscium fishelsoni. Its DNA content varies by 4 to 5 orders of magnitude between individuals at different stages of the life cycle and can considerably exceed the amount of DNA in a mammalian nucleus (Bresler 1998). An increase in the amount of genetic material does not by itself make the genome more complex. After all, the organism simply ends up with one or more extra copies of its genome rather than new genes with new metabolic capabilities.
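The E. coli case above relates doubling time to the number of genome copies per cell. This relation is usually formalized with the Cooper-Helmstetter model of the bacterial cell cycle, which is not part of this thesis. The sketch below assumes a chromosome replication time C of about 40 minutes and a replication-to-division interval D of about 20 minutes (textbook values for E. coli). It also assumes an exponentially growing culture with doubling time τ. Under these assumptions, the origin-to-terminus ratio is 2^(C/τ), and the average number of genome equivalents per cell is τ/(C ln 2) · (2^((C+D)/τ) − 2^(D/τ)). The function names are illustrative.

```python
import math


def ori_ter_ratio(c_min: float, tau_min: float) -> float:
    """Origins per terminus in an exponentially growing culture: 2**(C/tau)."""
    if c_min <= 0 or tau_min <= 0:
        raise ValueError("C and tau must be positive")
    return 2.0 ** (c_min / tau_min)


def genome_equivalents_per_cell(c_min: float, d_min: float, tau_min: float) -> float:
    """Average genome equivalents per cell under the Cooper-Helmstetter model:
    tau / (C ln 2) * (2**((C + D) / tau) - 2**(D / tau))."""
    if c_min <= 0 or tau_min <= 0 or d_min < 0:
        raise ValueError("C and tau must be positive and D non-negative")
    return (tau_min / (c_min * math.log(2))) * (
        2.0 ** ((c_min + d_min) / tau_min) - 2.0 ** (d_min / tau_min)
    )


if __name__ == "__main__":
    C, D = 40.0, 20.0  # assumed E. coli replication (C) and division (D) periods, minutes
    for tau in (60.0, 40.0, 20.0):
        print(f"tau = {tau:.0f} min: ori/ter = {ori_ter_ratio(C, tau):.2f}, "
              f"genome equivalents/cell = {genome_equivalents_per_cell(C, D, tau):.2f}")
```

With these assumed values the model gives about 1.6 genome equivalents per cell at τ = 60 min and about 4.3 at τ = 20 min. Faster growth therefore raises the number of genome copies without any genome duplication in the evolutionary sense.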
When a genome is duplicated, many circumstances can determine the fate of the extra copies of its sequences. These fates include (a) the formation of pseudogenes; (b) the acquisition of new functions; and (c) the acquisition of similar functions. The importance of whole-genome duplication has become controversial, because evidence that such events occurred is hard to find, even in completely sequenced genomes. The most recent evidence for whole-genome duplication, which is not accepted by everyone, comes from the analysis of two completely sequenced eukaryotic genomes. The first is Saccharomyces cerevisiae, analyzed by Wolfe and Shields (1997). They identified 55 duplicated blocks comprising 376 gene pairs, each block containing at least 3 genes in the same order, and concluded that the duplication occurred 100 million years ago. The second is Arabidopsis thaliana, in which 24 large duplicated regions were found in a genome considered relatively compact (Simillion et al. 2002). This discovery suggested that whole-genome duplication and subsequent contraction have been an important factor in the evolution of plant genomes. After whole-genome duplication was described as the major evolutionary force in vertebrates (Ohno 1970), the idea was extrapolated to explain the evolution of prokaryotes. This postulate was the central working hypothesis of several studies, which have generally followed two different methodological approaches: (a) analysis of the genomic arrangement of a single organism (Zipkas and Riley 1975); and (b) analysis of the genome-size distribution of a set of organisms with known genome sizes (Wallace and Morowitz 1973; Sparrow and Neuman 1976; Herdman 1985).
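In the second approach, a doubling hypothesis predicts that genome sizes cluster at values spaced by factors of two, that is, at regular intervals on a log2 scale. The sketch below shows one way to bin a set of sizes and look for such peaks. The function names, bin width and example sizes are illustrative assumptions, not the procedure used by the cited authors.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def log2_size_histogram(sizes_mb: Sequence[float], bins_per_doubling: int = 4) -> Dict[float, int]:
    """Count genome sizes in equal-width bins on a log2 scale.

    Keys are the lower edge of each bin in log2(Mb); one unit equals one doubling.
    """
    if bins_per_doubling < 1:
        raise ValueError("bins_per_doubling must be >= 1")
    counts: Counter = Counter()
    for size in sizes_mb:
        if size <= 0:
            raise ValueError("genome sizes must be positive")
        index = math.floor(math.log2(size) * bins_per_doubling)
        counts[index / bins_per_doubling] += 1
    return dict(sorted(counts.items()))


def doublings_between(smaller_mb: float, larger_mb: float) -> float:
    """Number of doublings separating two genome sizes (log2 of their ratio)."""
    return math.log2(larger_mb / smaller_mb)


if __name__ == "__main__":
    sizes = [0.58, 0.75, 1.1, 1.6, 2.3, 4.6, 9.5]  # illustrative values in Mb
    for edge, n in log2_size_histogram(sizes).items():
        print(f"log2(Mb) >= {edge:+.2f}: {'#' * n}")
    print(f"{doublings_between(0.58, 9.5):.2f} doublings from smallest to largest")
```

Under strict doubling, occupied bins would recur at intervals of 1.0 on the log2 axis. A broad, continuous distribution instead argues against the hypothesis.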
Regarding the arrangement of sequences in the genome, Zipkas and Riley (1975) proposed that E. coli underwent two sequential genome duplications in the past. In their view, these duplications are reflected in the positions of some genes on the circular map of the bacterial genome. Examining the genetic map of E. coli, they found that pairs of biochemically related genes tend to lie 90 or 180 degrees apart. However, we know that genomic changes and rearrangements, such as gene movement and changes in gene order, do occur (Fani et al. 1998). If genome duplication did take place, such rearrangements must have been infrequent, or highly symmetrical throughout the genome. Only then would the arrangement have been conserved, so that the distribution of genes every 90 or 180 degrees is not random.

Wallace and Morowitz (1973), on the other hand, analyzed the genome-size distribution of 98 bacterial species in the families Achromobacteraceae, Azotobacteraceae, Bacillaceae, Brevibacteriaceae, Brucellaceae, Corynebacteriaceae, Enterobacteriaceae, Lactobacillaceae, Micrococcaceae, Neisseriaceae, Rhizobiaceae, Mycoplasmataceae, Nitrobacteraceae, Pseudomonadaceae and Spirillaceae. These genome sizes had been estimated by renaturation kinetics and electron microscopy. From their analysis they proposed a scheme in which the smallest genomes (5 × 10^9 daltons), those of mycoplasmas, represent primitive life forms. Their example was Acholeplasma laidlawii, a saprophytic mycoplasma isolated from sewage. They called this basic unit the genesistron and concluded that subsequent evolution occurred by DNA duplication. According to Wallace and Morowitz, Acholeplasma can be considered an intermediate in the evolution from protokaryotic cells (that is, a prebacterial ancestral form) to prokaryotes.
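Zipkas and Riley's observation concerns angular distances on the circular E. coli linkage map, conventionally scaled to 100 minutes, so that 1 minute corresponds to 3.6 degrees. The sketch below measures the separation between two loci and flags it as close to 90 or 180 degrees. The loci positions and the 10-degree tolerance are hypothetical. A real test would compare the observed frequency of such pairs with the frequency expected for randomly placed genes.

```python
from typing import Iterable


def map_separation_degrees(pos_a_min: float, pos_b_min: float, map_length_min: float = 100.0) -> float:
    """Shortest angular distance (degrees) between two loci on a circular map.

    Positions are in map minutes; the default assumes a 100-minute map.
    """
    diff = abs(pos_a_min - pos_b_min) % map_length_min
    shortest = min(diff, map_length_min - diff)
    return shortest * 360.0 / map_length_min


def near_target_angle(sep_deg: float, targets: Iterable[float] = (90.0, 180.0),
                      tolerance_deg: float = 10.0) -> bool:
    """True if the separation lies within tolerance of any target angle."""
    return any(abs(sep_deg - t) <= tolerance_deg for t in targets)


if __name__ == "__main__":
    # Hypothetical map positions (minutes) for three pairs of related genes.
    pairs = [(12.0, 37.5), (5.0, 95.0), (20.0, 71.0)]
    for a, b in pairs:
        sep = map_separation_degrees(a, b)
        print(f"{a:5.1f} vs {b:5.1f} min: {sep:6.1f} deg, near 90/180: {near_target_angle(sep)}")
```

The shortest arc is used because the map is circular. Loci at 5 and 95 minutes are therefore 36 degrees apart, not 324.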
Sparrow and Neuman (1976) analyzed whole-genome duplications across many different genomes using a logarithmic distribution of DNA per genome. The species were compiled from various reports that expressed DNA content in a variety of units, which were converted to picograms to make them comparable. Their analysis covered 23 widely separated phylogenetic groups, including viroids, viruses, bacteria, fungi, algae, protozoa, sponges, nematodes, insects, "lower" chordates, coelenterates, angiosperms, vertebrates, molluscs, echinoderms, annelids, crustaceans and gymnosperms. The DNA distribution tends to form several peaks at multiple values, which appear to represent DNA duplications within groups. In eukaryotes these duplications are independent of polyploidy, so the phenomenon has been called cryptopolyploidy; it denotes an increase in genome size through an increase in chromosome size. Sparrow and Neuman (1976) also compared theoretical duplications with the minimum DNA values of each group. They observed a cyclical trend over 8 orders of magnitude, starting from 300 nucleotides (1.65 × 10^-7 pg), the value calculated for an RNA viroid, which they interpreted as a basic ancestral genome. Herdman (1985), in contrast, studied the distribution of 605 genomes from diverse bacterial strains. Their sizes had been estimated by various methods, mostly renaturation kinetics. The distribution was discontinuous, with modal peaks at (1) 0.5, (2) 1.0–1.25 and (3) 2.5–2.75, (4) a longer tail extending above 4.75, and (5) very few genomes between 6 and 8.5 (all values in units of 10^9 daltons). Herdman further postulated that changes in genome size result from two main processes: (a) fusion of genomes or (b) duplication of small ancestral genomes.
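The older studies report genome sizes as mass (daltons or picograms), while PFGE data are given in base pairs. The sketch below covers both conversions. It assumes average masses of about 330 Da per RNA nucleotide and about 650 Da per DNA base pair; these are approximate, and the exact figures depend on base composition and counter-ions. With these assumptions, 300 nucleotides give about 1.64 × 10^-7 pg, close to Sparrow and Neuman's viroid value. Herdman's first mode of 0.5 × 10^9 daltons corresponds to roughly 0.77 Mb.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
RNA_NT_DA = 330.0  # assumed mean mass of one RNA nucleotide (Da)
DNA_BP_DA = 650.0  # assumed mean mass of one DNA base pair (Da)


def nucleotides_to_picograms(n_nt: float, nt_mass_da: float = RNA_NT_DA) -> float:
    """Mass in picograms of a single molecule of n_nt nucleotides."""
    # A mass in Da equals a molar mass in g/mol; dividing by Avogadro gives grams per molecule.
    grams = n_nt * nt_mass_da / AVOGADRO
    return grams * 1e12


def daltons_to_megabases(mass_da: float, bp_mass_da: float = DNA_BP_DA) -> float:
    """Convert a double-stranded genome mass in daltons to megabase pairs."""
    return mass_da / bp_mass_da / 1e6


if __name__ == "__main__":
    print(f"300-nt viroid: {nucleotides_to_picograms(300):.3e} pg")
    for mode_da in (0.5e9, 1.0e9, 2.5e9, 4.75e9):
        print(f"{mode_da:.2e} Da ~ {daltons_to_megabases(mode_da):.2f} Mb")
```

Changing the assumed per-base masses shifts every converted value proportionally. Ratios between sizes, which are what a doubling test looks at, are unaffected.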
Herdman also noted that this process occurred independently in different bacterial groups. In his view, the shift from anaerobic to aerobic metabolism took place separately in each bacterial line of descent and was accompanied by one or more doublings of genome size. He nonetheless pointed out that additional mechanisms also contribute to changes in genome size. As shown in the attached papers, the data now available have substantially modified both this methodology and its conclusions.
Methodology
A database was built with the genome sizes of 641 prokaryotic organisms determined by pulsed-field gel electrophoresis (PFGE). The sizes were taken from articles indexed up to February 2003 in NCBI PubMed (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed), Scirus (for scientific information only, http://www.scirus.com/) and HighWire (library of the sciences and medicine, http://intl.highwire.org/). Each organism's entry was completed with four further fields:
(1) its phylogenetic position;
(2) its lifestyle: obligate symbiont, obligate parasite, or free-living;
(3) its growth temperature range: mesophile 25-44 °C, thermophile 45-70 °C, or hyperthermophile > 70 °C;
(4) its response to oxygen, based on original reports and on Bergey's Manual of Determinative Bacteriology (Holt et al., 1994): anaerobe (negative response to oxygen), microaerophile (requires < 21% oxygen), facultative anaerobe (dual response to oxygen), or aerobe (requires 21% oxygen).
A statistical analysis (chi-square test) was performed using Microsoft Excel.
EARLY METABOLIC EVOLUTION: INSIGHTS FROM COMPARATIVE CELLULAR GENOMICS
S. ISLAS, A. BECERRA, J. I. LEGUINA, and A. LAZCANO
Facultad de Ciencias-UNAM, Apdo. Postal 70-407, Cd. Universitaria 04510, México D.F., MEXICO
1. Introduction
The use of small-subunit rRNAs as molecular markers has led to universal phylogenies in which all known organisms can be grouped into one of three major cell lineages. These are the eubacteria, the archaebacteria, and the eukaryotic nucleocytoplasm, now referred to as the domains Bacteria, Archaea, and Eucarya, respectively (Woese et al., 1990). A description of the last common ancestor (LCA, i.e., the cenancestor) of these three primary kingdoms may be inferred from the distribution of homologous characters among its descendants. Together with the fragmentary information available from other organisms, the complete genome sequences now in the public databases allow such characterizations. In some cases they can even provide insights into the nature of the cenancestor's predecessors. Here we discuss the basic assumptions and strategies used in such approaches and apply them to the evolutionary assemblage of arginine biosynthesis. Additional aspects of the evolution of metabolic routes have been discussed in Peretó et al. (1997).
2. Some problems in comparative genomic analysis
Even before complete genome sequences became available, many biosynthetic enzymes were known to occur in all three primary lines of descent. Their distribution had already suggested that the cenancestor was comparable to modern prokaryotes in its biological complexity, ecological adaptability, and evolutionary potential (Lazcano, 1995). However, the three primary domains differ in their metabolic repertoire and gene expression mechanisms (cf. Olsen and Woese, 1997). These differences show that the characterization of the LCA is an unfinished task, and that strong statements and broad generalizations should be avoided.
J. Chela-Flores and F. Raulin (eds.), Exobiology: Matter, Energy, and Information in the Origin and Evolution of Life in the Universe, 167-174. © 1998 Kluwer Academic Publishers. Printed in the Netherlands.
In principle, backtrack reconstructions of ancestral states can be achieved with a simple, straightforward methodology. Given complete genome sequences from the three primary domains, the cenancestor is defined by the properties shared by all living organisms, minus those that are the outcome of convergent evolution and those acquired by horizontal transfer (Figure 1). However, cross-genomic analysis can be hampered by unidentified proteins encoded by rapidly evolving sequences, as well as by the properties of a given genomic dataset. Inferences on the nature of the LCA can also be biased by the reduced DNA content of parasites and pathogens such as the mycoplasmas. These organisms have been selected as models precisely because of their small, compact genomes (Becerra et al., 1997). Shotgun sequencing has led to an impressive growth of the databases in a very short time. Nevertheless, more complete genome sequences, reflecting a broader cross-section of biological diversity, are still required.
Figure 1. The intersection of the complete sequence spaces of the three domains defines the gene complement of the common ancestor (LCA). Identification of rapidly evolving sequences would lead to a bigger set of ancestral genes (hatched areas).
The functions of many open reading frames (ORFs) from complete genome sequencing projects have been tentatively identified. These identifications rest on computer searches for structural similarities to known sequences in databases. Many more ORFs remain unidentified (30 to 50%, depending on the organism). Such databases are collections of the sequences that make up biological systems. However, understanding how each component works is not enough to describe properly how the entire system operates (Kanehisa, 1997). For instance, in the Bacillus subtilis tryptophan operon no sequence encodes the glutamine amidotransferase required for anthranilate biosynthesis.
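The definition of the cenancestor given above can be read as a set operation: shared properties, minus convergent ones, minus horizontally transferred ones. The minimal Python sketch below takes that reading, assuming each domain is represented as a set of gene-family labels. The labels and the convergent and transferred assignments are hypothetical. The sketch also shows why unidentified, rapidly evolving sequences matter. A family missed by homology searches in any one domain drops out of the intersection and shrinks the inferred ancestral set, which corresponds to the hatched areas of Figure 1.

```python
from typing import Dict, Iterable, Set


def infer_lca_complement(
    domain_genes: Dict[str, Iterable[str]],
    convergent: Iterable[str] = (),
    transferred: Iterable[str] = (),
) -> Set[str]:
    """Gene families shared by all domains, minus convergent and transferred ones."""
    shared = set.intersection(*(set(genes) for genes in domain_genes.values()))
    return shared - set(convergent) - set(transferred)


# Hypothetical gene-family labels; these are not real annotations.
domains = {
    "Bacteria": {"fam1", "fam2", "fam3", "fam4"},
    "Archaea": {"fam1", "fam2", "fam3", "fam5"},
    "Eucarya": {"fam1", "fam2", "fam3", "fam6"},
}

if __name__ == "__main__":
    lca = infer_lca_complement(domains, convergent={"fam3"}, transferred={"fam2"})
    print(sorted(lca))
```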
The absence of this gene from the operon would pose a problem for metabolic reconstructions based on comparative genomics. Biochemical experimentation, however, has shown that in B. subtilis the required gene is shared with the folate biosynthetic route, in whose operon it is located (Crawford, 1989). As summarized in Table I, understanding of the evolutionary development of metabolism can be obscured by a complex series of changes. These include enzymatic additions, secondary losses, pathway replacements, and functional redundancies. Further complications can result from:
(a) intraspecific enzyme substitutions involving paralogous proteins;
(b) the possibility that extant enzymes may have participated in alternative routes that no longer exist or remain to be discovered (Zubay, 1993; Becerra and Lazcano, 1997);
(c) homologous enzymes endowed with widely different catalytic properties (see below);
(d) intracellular horizontal transfer within nucleated cells (Embley et al., 1997).
TABLE I. Some processes in metabolic evolution.
Process | Examples | Reference
Addition of enzymatic step(s) | Oxygen-dependent cholesterol biosynthesis | Bloch (1994); Ourisson and Nakatani (1994)
Addition of enzymatic step(s) | Archaeal biosynthesis of 2,3-di-O-phytanyl-sn-glycerol | Stetter (1996)
Loss of routes and enzymes | Purine biosynthesis in parasites | Becerra et al. (1997)
Pathway replacement | Aerobic instead of anaerobic biosynthesis of monounsaturated fatty acids | Bloch (1994)
Pathway replacement | Fungal lysine biosynthesis | Vogel (1960)
Functional redundancies | Phosphatidylcholine biosynthesis | Bloch (1994)
Functional redundancies | Imidazole biosynthesis in purine and histidine biosyntheses | Peretó et al. (1997)
3. Did metabolism evolve backwards?
The first attempt to explain the emergence of metabolic pathways was made by Horowitz (1945). He suggested that biosynthetic enzymes had been acquired through gene duplications that took place in the reverse of the order found in extant pathways.
This idea, also known as the retrograde hypothesis, established an evolutionary connection between the primitive soup and the development of metabolic pathways. It is frequently invoked in descriptions of early biological evolution (cf. Peretó et al., 1997). Prompted by the discovery of operons, Horowitz (1965) restated his model, arguing that it was supported by two observations. One was the overlap between the chemical structures of the products and substrates of enzymes catalyzing successive reactions; the other was the clustering of functionally related genes. Some operon-like gene clusters are indeed found in both bacterial and archaeal genomes. However, whole-genome comparisons between distant prokaryotes have shown that extensive shuffling events can easily erode gene order (Mushegian and Koonin, 1996). Consequently, the distribution in prokaryotic chromosomes of homologous genes encoding pathway enzymes cannot be used to (dis)prove the Horowitz hypothesis. Nevertheless, if the enzymes catalyzing successive steps of a pathway arose through a series of gene duplications (Horowitz, 1965), they must share structural similarities. The known examples that satisfy this condition, as confirmed by sequence comparisons, are limited to a few pairs of enzymes and have been discussed elsewhere (cf. Peretó et al., 1997).
4. The patchwork assemblage of biosynthetic routes
An alternative interpretation of the role of gene duplication in the evolution of metabolism is the so-called patchwork hypothesis (cf. Jensen, 1976). According to this scheme, biosynthetic routes were assembled by primitive catalysts that could react with a wide range of chemically related substrates. Under laboratory conditions, the recruitment of enzymes from different metabolic pathways to serve novel catabolic routes under strong selective pressure is well documented.
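The two hypotheses make different predictions about where sequence similarity should be found. The retrograde model predicts homology between enzymes that catalyze successive steps of one pathway. The patchwork model instead predicts similarity between enzymes that catalyze chemically similar reactions, often in different pathways. The Python sketch below shows the first kind of comparison. It computes percent identity between consecutive enzymes, assuming the sequences are already aligned and that gap positions are ignored. The toy sequences are hypothetical. Real analyses rely on alignment programs and tests of statistical significance, not on raw identity.

```python
from typing import List, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def successive_step_identities(pathway: List[Tuple[str, str]]) -> List[Tuple[str, str, float]]:
    """Compare each enzyme with the enzyme that catalyzes the next step of the pathway."""
    return [
        (step1, step2, percent_identity(seq1, seq2))
        for (step1, seq1), (step2, seq2) in zip(pathway, pathway[1:])
    ]


# Hypothetical, pre-aligned toy sequences for a three-step pathway.
toy_pathway = [
    ("step1", "MKLV-AGHE"),
    ("step2", "MKLVTAGHQ"),
    ("step3", "PRSWTQDNE"),
]

if __name__ == "__main__":
    for s1, s2, pid in successive_step_identities(toy_pathway):
        print(f"{s1} -> {s2}: {pid:.1f}% identity")
```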
Repeated occurrences of homologous enzymes in different pathways provide independent evidence of patchwork tinkering. Data from the ongoing genome projects have already shown that a large portion of each organism's genes are related to one another, as well as to genes in distantly related species. Gene duplication and recruitment played a central role in the assemblage of histidine anabolism (Alifano et al., 1996) and of the purine nucleotide salvage pathways (Becerra and Lazcano, 1997). As discussed in the following section, this conclusion can be extended to arginine biosynthesis.
5. Gene duplication and arginine anabolism
The phylogenetic distribution of the arginine biosynthetic genes suggests that this route was already present in the LCA. Hence, its absence in both Helicobacter pylori and the mycoplasmas probably reflects polyphyletic secondary losses. The same chemical steps of arginine biosynthesis have been found in all organisms studied, but two different strategies for deacetylating the intermediate N-acetylornithine have been described. In enterobacteria, the genus Bacillus, and the archaeon Sulfolobus solfataricus, this reaction is catalyzed by N-acetylornithinase, the gene product of argE (Figure 2). In other prokaryotes and in fungi, the acetyl group is instead removed by ornithine-glutamate acetyltransferase. There is no evidence of a phylogenetic relationship between these two enzymes. Another variation in this pathway occurs in the E. coli K12 strain. There, two homologous genes (argI, argF) encode a family of four trimeric isoenzymes that bind L-ornithine and carbamoyl phosphate to produce L-citrulline. [...]
This analysis was performed using the database assembled by Riley and Labedan (1996), who compared the 1,862 E. coli protein sequences available in the SwissProt databank as of April 1996. They concluded that 52.11% of all the protein sequences studied had resulted from gene duplications, and they classified these into paralogous families defined by sequence similarity. Their list includes 112 small families with only two sequences, 38 with three, 41 with three to seven, and 13 large families. As Riley and Labedan noted, most members of paralogous families share comparable biochemical properties. Only a scant 1.23% of homologous protein pairs display what appear to be different functions.
Figure 2. Arginine biosynthesis. Paralogs of the arginine biosynthetic genes are indicated within parentheses. The pathway steps shown are:
L-glutamate + acetyl-CoA → N-acetylglutamate (N-acetylglutamate synthase, argA (argB));
→ N-acetyl-γ-glutamylphosphate (N-acetylglutamate kinase, argB (argA));
→ N-acetylglutamate-γ-semialdehyde (N-acetylglutamatephosphate reductase);
→ N-acetylornithine (N-acetylornithine aminotransferase, argD (bioA, gabT, hemL));
→ L-ornithine + acetate (N-acetylornithinase, argE (dapE));
+ carbamoyl phosphate → L-citrulline (ornithine carbamoyltransferase, argI (pyrB));
+ L-aspartate, ATP → L-argininosuccinate (argininosuccinate synthase);
→ L-arginine + fumarate (argininosuccinase, argH (aspA, fumC)).
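Riley and Labedan's grouping of proteins into paralogous families can be illustrated with a union-find sketch in Python. Their exact clustering criteria are not described here. The sketch therefore assumes single-linkage grouping: any two proteins with a significant similarity hit end up in the same family. This is a common simplification, not necessarily their procedure. The protein identifiers and hits are hypothetical. The sketch reports the families, their size distribution, and the fraction of proteins that belong to some family (the quantity behind the 52.11% figure).

```python
from collections import Counter, defaultdict
from typing import Iterable, List, Tuple


def paralogous_families(proteins: Iterable[str], similar_pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group proteins into families by single linkage over similarity hits (union-find)."""
    parent = {p: p for p in proteins}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for p in parent:
        groups[find(p)].append(p)
    # Singletons have no paralogs, so only groups of two or more count as families.
    return [sorted(g) for g in groups.values() if len(g) > 1]


def duplicated_fraction(n_proteins: int, families: List[List[str]]) -> float:
    """Fraction of all proteins that belong to some paralogous family."""
    return sum(len(f) for f in families) / n_proteins


# Hypothetical protein identifiers and similarity hits.
proteins = ["p1", "p2", "p3", "p4", "p5", "p6"]
hits = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]

if __name__ == "__main__":
    fams = paralogous_families(proteins, hits)
    print(fams)
    print(Counter(len(f) for f in fams))  # number of families of each size
    print(f"{100 * duplicated_fraction(len(proteins), fams):.2f}% of proteins in families")
```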
We repeated Riley and Labedan's analysis, examining all the characterized paralogous genes exhaustively and excluding from our sample 88 ORFs reported as hypothetical proteins. The resulting set was [...]
[Figure 3: sequence alignment of argininosuccinase (EC 4.3.2.1), aspartate ammonia-lyase (EC 4.3.1.1) and fumarate hydratase (EC 4.2.1.2).]
Figure 3. A three-member family of E. coli paralogous enzymes with different catalytic properties. The sequences were aligned using the MACAW program. The regions with statistically significant sequence similarity are shown in black.
7. Conclusions
Homologous enzymes that catalyze similar biochemical reactions are found in many different anabolic pathways. This finding supports the idea that enzyme recruitment took place on a massive scale during the early development of anabolic pathways. The available genomic databases support the same conclusion: their analysis suggests that approximately 50% of cellular DNA is the outcome of paralogous duplications that may have preceded the divergence of the three primary domains. Such high levels of redundancy suggest that more phylogenetic information older than the cenancestor itself may survive than has been realized. This information may provide fresh insights into a crucial but largely unexplored stage of early biological evolution.
Acknowledgements
The work of J. I. L. has been supported in part by the Consejo Superior de Investigaciones Científicas (CSIC, Madrid, Spain). This paper was completed during a leave of absence of one of us (A. L.) as Visiting Professor at the Institut Pasteur (Paris). During that leave he enjoyed the hospitality of Professor Henri Buc and his associates at the Unité de Physicochimie des Macromolécules Biologiques. Support from the Manlio Cantarini Foundation (A. L.) is gratefully acknowledged. A. L. is an Affiliate of the NSCORT (NASA Specialized Center for Research and Training) in Exobiology at the University of California, San Diego.
(that takes part in the synthesis and interconversion of aspartate and asparagine), and fumarate hydratase (which participates in the tricarboxylic acid cycle). [...] These enzymes catalyze different reversible reactions (non-hydrolytic cleavage, elimination, and hydration, respectively). However, all three use fumarate as a substrate. This suggests that the structural basis of their sequence similarity may be a homologous binding site for this compound.
References
Alifano, P., Fani, R., Liò, P., Lazcano, A., Bazzicalupo, M., Carlomagno, M. S., and Bruni, C. B. (1996) Histidine biosynthetic pathway and genes: structure, regulation, and evolution, Microbiol. Rev. 60, 44-69.
Becerra, A. and Lazcano, A. (1997) The role of gene duplication in the evolution of purine nucleotide salvage pathways, Origins Life Evol. Biosph. (in press).
Becerra, A., Islas, S., Leguina, J. I., Silva, E., and Lazcano, A. (1997) Polyphyletic gene losses can bias backtrack characterizations of the cenancestor, J. Mol. Evol. 45, 115-118.
Bloch, K. (1994) Blondes in Venetian Paintings, the Nine-Banded Armadillo, and Other Essays in Biochemistry, Yale University Press, New Haven.
Crawford, I. P. (1989) Evolution of a biosynthetic pathway: the tryptophan paradigm, Annu. Rev. Microbiol. 43, 567-600.
Embley, T. M., Horner, D. A., and Hirt, R. P. (1997) Anaerobic eukaryote evolution: hydrogenosomes as biochemically modified mitochondria? TREE 12, 437-441.
Glansdorff, N. (1996) Biosynthesis of arginine and polyamines, in F. C. Neidhardt (ed.), Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, ASM Press, Washington, D.C., pp. 408-433.
Horowitz, N. H. (1945) On the evolution of biochemical syntheses, Proc. Natl. Acad. Sci. USA 31, 153-157.
Horowitz, N. H. (1965) The evolution of biochemical syntheses: retrospect and prospect, in V. Bryson and H. J. Vogel (eds.), Evolving Genes and Proteins, Academic Press, New York, pp. 15-23.
Jensen, R. A. (1976) Enzyme recruitment in the evolution of new function, Annu. Rev. Microbiol. 30, 409-425.
Kanehisa, M. (1997) A database for post-genome analysis, Trends Genet. 13, 375-376.
Lazcano, A. (1995) Cellular evolution during the early Archean: what happened between the progenote and the cenancestor? Microbiología SEM 11, 185-198.
Mushegian, A. R. and Koonin, E. V. (1996) Gene order is not conserved in bacterial evolution, TIGS 12, 289-290.
Neidhart, D. J., Kenyon, G. L., Gerlt, J. A., and Petsko, G. A. (1990) Mandelate racemase and muconate lactonizing enzyme are mechanistically different and structurally homologous, Nature 347, 692-694.
Olsen, G. J. and Woese, C. R. (1997) Archaeal genomics: an overview, Cell 89, 991-994.
Ourisson, G. and Nakatani, Y. (1994) The terpenoid theory of the origin of cellular life: the evolution of terpenoids to cholesterol, Chemistry & Biology 1, 11-23.
Peretó, J., Fani, R., Leguina, J. I., and Lazcano, A. (1997) Enzyme evolution and the development of metabolic pathways, in A. Cornish-Bowden (ed.), Cell-Free Fermentation and the Growth of Biochemistry: Essays in Honour of Eduard Buchner, Publicacions de la Universitat de València, Valencia, Spain (in press).
Riley, M. and Labedan, B. (1996) Escherichia coli gene products: physiological functions and common ancestries, in F. C. Neidhardt (ed.), Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, ASM Press, Washington, D.C., pp. 2118-2202.
Stetter, K. O. (1996) Hyperthermophiles in the history of life, in G. R. Bock and J. A. Goode (eds.), Evolution of Hydrothermal Ecosystems on Earth (and Mars?), Wiley, Chichester, pp. 1-10.
Vogel, H. J. (1960) Two modes of biosynthesis among lower fungi: evolutionary significance, Biochim. Biophys. Acta 41, 172-173.
Woese, C. R., Kandler, O., and Wheelis, M. L. (1990) Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya, Proc. Natl. Acad. Sci. USA 87, 4576-4579.
Zubay, G. (1993) To what extent do biochemical pathways mimic prebiotic pathways?, Chemtracts - Biochem. Mol. Biol. 4, 317-323.
[...] 21% of oxygen), facultative anaerobes (organisms with a dual response to free oxygen), and aerobes (organisms that require ≥ 21% of oxygen). Statistical analyses were performed using the χ² test in Microsoft Excel™.
3. Results
[Figure: y-axis: Frequency (0 to 50)]
[...] Gene Duplication. Springer Verlag, New York.
Oparin, A. I. (1938) The Origin of Life, MacMillan, New York.
Shimkets, L. J. (1998) Structure and sizes of the genomes of the Archaea and Bacteria. In: de Bruijn, F. et al. (eds.), Bacterial Genomes: Physical Structure and Analysis. Chapman & Hall, New York.
Wallace, D. C. and Morowitz, H. J. (1973) Genome size and evolution. Chromosoma 40: 121-126.
Zipkas, D. and Riley, M. (1975) Proposal concerning mechanism of evolution of the genome of Escherichia coli. Proc. Natl. Acad. Sci. USA 72(4): 1354-1358.
Int Microbiol. 2003 Jun 28 [Epub ahead of print].
Hyperthermophily and the origin and earliest evolution of life.
Islas S, Velasco AM, Becerra A, Delaye L, Lazcano A.
Facultad de Ciencias, UNAM, Ciudad Universitaria, Apdo. Postal 70-407, 04510, Mexico D.F., Mexico.
The possibility of a high-temperature origin of life has gained support based on indirect evidence of a hot, early Earth and on the basal position of hyperthermophilic organisms in rRNA-based phylogenies. However, although the availability of more than 80 completely sequenced cellular genomes has led to the identification of hyperthermophilic-specific traits, such as a trend towards smaller genomes, reduced protein-encoding gene sizes, and glutamic-acid-rich simple sequences,
none of these characteristics are in themselves an indication of primitiveness. There is no geological evidence for the physical setting in which life arose, but current models suggest that the Earth's surface cooled down rapidly. Moreover, at 100 °C the half-lives of several organic compounds, including ribose, nucleobases, and amino acids, which are generally thought to have been essential for the emergence of the first living systems, are too short to allow for their accumulation in the prebiotic environment. Accordingly, if hyperthermophily is not truly primordial, then heat-loving lifestyles may be relics of a secondary adaptation that evolved after the origin of life, and before or soon after the separation of the major lineages.
PMID: 12838394 [PubMed - as supplied by publisher]
[...] Vibrio cholerae (Vic, Vic2), Haemophilus influenzae (Hai), Salmonella typhi (Sat), Salmonella typhimurium LT2 (Sat2), Escherichia coli K12 (Eco2), Escherichia coli O157:H7 (Eco7), Escherichia coli O157:H7 EDL933 (Eco3), Yersinia pestis KIM (YepM), Yersinia pestis CO92 (Yep2), Pseudomonas aeruginosa (Psa), Xanthomonas citri (Xac), Xanthomonas campestris (Xaa), Pasteurella multocida (Pam), Buchnera aphidicola Sg (Bua), Buchnera sp. (Bus); ε-Proteobacteria: Campylobacter jejuni (Caj), Helicobacter pylori 26695 (Hep5), Helicobacter pylori J99 (Hep9); green sulfur bacteria: Chlorobium tepidum TLS (Cht); gram-positive, low-GC: Streptococcus pneumoniae TIGR4 (Stp4), Streptococcus pneumoniae R6 (Stp6), Streptococcus pyogenes MGAS315 (Sty5), Listeria innocua (Lii), Listeria monocytogenes (Lim), Thermoanaerobacter tengcongensis (Thl), Staphylococcus aureus MW2 (Sta2), Staphylococcus aureus Mu50 (Sta0), Staphylococcus aureus N315 (Sta5), Lactococcus lactis (Lal), Streptococcus pyogenes (Sto),
Clostridium perfringens (Clp), Clostridium acetobutylicum (Cla), Bacillus subtilis (Bsu), Bacillus halodurans (Bah), Mycoplasma pneumoniae (Myn), Mycoplasma genitalium (Myg), Mycoplasma pulmonis (Myp), Ureaplasma urealyticum (Uru); gram-positive, high-GC: Streptomyces coelicolor (Stc), Mycobacterium tuberculosis CDC1551 (Myt1), Mycobacterium tuberculosis H37Rv (Mytv), Mycobacterium leprae (Myl); radioresistant bacteria: Deinococcus radiodurans (Der1, Der2); Fusobacteria: Fusobacterium nucleatum (Fun); cyanobacteria: Synechocystis PCC6803 (Syn), Thermosynechococcus elongatus (The), Nostoc sp. (Nos); actinobacteria: Corynebacterium glutamicum (Cog); chlamydia: Chlamydophila pneumoniae AR39 (Chp9), Chlamydophila pneumoniae J138 (Chp8), Chlamydophila pneumoniae CWL029 (Chp2), Chlamydia trachomatis (Chr), Chlamydia muridarum (Chm); spirochetes: Borrelia burgdorferi (Bob), Treponema pallidum (Trp).
Like their mesophilic counterparts, hyperthermophilic genes are endowed with simple sequences, i.e., homopolymeric tracts and tandem arrays of multiple short repeat motifs. These low-complexity regions originate in mutational processes that take place during DNA replication, such as slipped-strand mispairing and unequal crossing-over. Such processes are known to be a major source of genetic variation in pathogenic prokaryotes [31]. We analyzed all the completely sequenced hyperthermophilic and thermophilic genomes available as of December 2002. The analysis shows that the natural amino acid composition of each proteome is enhanced with respect to its corresponding simple sequences. These simple sequences have a compositional bias, shown by the abundance of small, α-helix-forming amino acids, i.e., alanine, leucine, lysine, serine and glutamic acid (Becerra, Cocho, Delaye and Lazcano, unpublished results). As shown in Fig. 3, however, simple sequences in hyperthermophiles are clearly enriched in glutamic acid.
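How a compositional bias in simple sequences can be measured is illustrated by the Python sketch below. It flags low-complexity stretches with a sliding-window Shannon-entropy filter and then computes amino acid frequencies among the flagged residues. This is a simplified stand-in, not an implementation of SEG, which uses a two-stage trigger-and-extension procedure. The 12-residue window and the 2.2-bit threshold are assumptions loosely modeled on SEG's commonly used settings. The toy sequence, with a glutamic acid tract between diverse flanks, is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Set

WINDOW = 12        # assumed window length, loosely modeled on SEG's usual setting
MAX_ENTROPY = 2.2  # assumed threshold in bits, loosely modeled on SEG's usual setting


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits) of the residue composition of a segment."""
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in Counter(segment).values())


def low_complexity_positions(seq: str, window: int = WINDOW, max_entropy: float = MAX_ENTROPY) -> Set[int]:
    """Indices covered by at least one window whose entropy falls below the threshold."""
    flagged: Set[int] = set()
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < max_entropy:
            flagged.update(range(start, start + window))
    return flagged


def relative_abundance(seq: str, positions: Set[int]) -> Dict[str, float]:
    """Amino acid frequencies among the flagged (low-complexity) residues."""
    counts = Counter(seq[i] for i in positions)
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()} if total else {}


# Hypothetical toy protein: diverse flanks around a glutamic acid (E) tract.
toy = "ACDFGHIKLMNP" + "E" * 12 + "QRSTVWYACDFG"

if __name__ == "__main__":
    positions = low_complexity_positions(toy)
    abundance = relative_abundance(toy, positions)
    for aa, freq in sorted(abundance.items(), key=lambda kv: -kv[1]):
        print(f"{aa}: {freq:.2f}")
```

On the toy sequence, the flagged region also picks up a few flanking residues at each edge, so glutamic acid is the most abundant residue but not the only one. Averaging such per-proteome frequencies over mesophiles and over hyperthermophiles gives bar charts comparable in form to Fig. 3.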
The α-helix structure of glutamic acid homopolymers is stable at acid pH values [35]. This stability probably explains why, with the exception of Thermoplasma acidophilum, the simple sequences of acid-resistant heat-loving prokaryotes tend to be rich in this amino acid. The enrichment of glutamic acid in extremophilic simple sequences explains its relative abundance in hyperthermophilic genomes, as noted by Tekaia et al. [48].
[Fig. 3: bar chart; x-axis: relative abundance of amino acids]
Fig. 3. Relative abundances of amino acids in simple sequences in all available proteomes as of December 2002. Simple sequences were identified using the SEG program [53], which identifies low-complexity regions in which an enhanced concentration of short repeats not due to chance events can be detected. White bars show the average simple-sequence amino acid composition of mesophiles, and dark bars show that of hyperthermophilic prokaryotes (80 °C or more). The hyperthermophilic species represented here are Pyrococcus furiosus, P. horikoshii, P. abyssi, Aeropyrum pernix, Methanococcus jannaschii, Archaeoglobus fulgidus, Sulfolobus solfataricus and S. tokodaii.
The faulty records of Archaean life
As recent debates show, identifying the oldest paleontological traces of life can be a highly contentious issue. The early Archaean geological record is scarce, and most of the preserved rocks have been metamorphosed to a considerable extent. Nevertheless, the evidence suggests that life emerged on Earth as soon as it was possible to do so. Some microstructures in the 3.5 × 10^9-year-old Apex sediments of the Australian Warrawoona formation have been interpreted as cyanobacterial remnants [38], although their biological origin has been questioned [3]. Even so, additional paleontological evidence shows that highly diverse microbial communities were thriving during the early and middle Archaean [32].
Unfortunately, the geological record is unlikely to provide data on how life originated. There is no direct evidence of the environmental conditions on the Earth at the time of the origin of life, nor any fossil record of the evolutionary processes that preceded the appearance of the first cells. Direct information is lacking on the composition of the terrestrial atmosphere during that period. It is also lacking on the temperature, ocean pH values, and other general and local conditions that may or may not have been important for the emergence of living systems. The attributes of the first living organisms are likewise unknown. They were probably simpler than any cell now alive. They may have lacked protein-based catalysis and perhaps even the familiar genetic macromolecules with their sugar-phosphate backbones. Possibly the only property they shared with extant organisms was the structural complementarity between the monomeric subunits of replicative informational polymers. An example is the joining together of residues in a growing chain whose sequence is directed by preformed polymers. Such ancestral polymers may not even have involved nucleotides. Accordingly, the most basic questions about the origin of life concern much simpler replicating entities. These entities predated, by a long series of evolutionary events, the oldest recognizable heat-loving prokaryotes represented in molecular phylogenies. The rooting of universal cladistic trees determines the direction of evolutionary change and allows ancestral characters to be distinguished from derived ones. Determining the rooting point of a tree normally imparts polarity to most or all characters. It is, however, important to distinguish between ancient and primitive organisms.
Organisms located near the root of universal rRNA-based trees are cladistically ancient, but they are not endowed with a primitive molecular genetic apparatus, nor do they appear to be more rudimentary in their metabolic abilities than their aerobic counterparts. Primitive living systems would initially refer to pre-RNA worlds, in which life may have been based on polymers using backbones other than ribose-phosphate and possibly bases different from adenine, uracil, guanine and cytosine, followed by a stage in which life was based on RNA as both genetic material and catalyst [23]. Molecular cladistics may provide clues to some very early stages of biological evolution, but it is difficult to see how the applicability of this approach can be extended beyond a threshold that corresponds to a period of cellular evolution in which protein biosynthesis was already in operation, i.e., an RNA/protein world. Older stages are not yet amenable to molecular phylogenetic analysis. A cladistic approach to the origin of life itself is not feasible, since all possible intermediates that may have once existed have long since vanished.

Was the last common ancestor a hyperthermophile?

The variations of traits common to extant species can be easily explained as the outcome of divergent processes from an ancestral life form that existed prior to the separation of the three major biological domains, i.e., the last common ancestor (LCA) or cenancestor. No paleontological remains will bear testimony of its existence, as the search for a fossil of the cenancestor is bound to prove fruitless. From a cladistic viewpoint, the LCA is merely an inferred inventory of features shared among extant organisms, all of which are located at the tips of the branches of molecular phylogenies.
However, if the term "universal distribution" is restricted to its most obvious sense, i.e., that of traits found in all completely sequenced genomes now available, then, quite unexpectedly, the resulting repertoire is formed by relatively few features and by incompletely represented biochemical processes [8]. Surprisingly, some of the most likely a priori candidates for strict universality, such as the sequences involved in DNA replication, have also turned out to be poorly preserved [11]. Analysis of an increasingly large number of completely sequenced cellular genomes has revealed major discrepancies in the topology of rRNA trees. Very often these differences have been interpreted as evidence of horizontal gene-transfer events between different species, questioning the feasibility of reconstructing and properly understanding early biological history [10]. There is clear evidence that genomes have a mosaic-like nature whose components come from a wide variety of sources. Depending on their advocates, a wide spectrum of mix-and-match recombination processes has been described, ranging from the lateral transfer of a few genes via conjugation, transduction or transformation, to cell fusion events involving organisms from different domains. The resulting reticulate phylogenies greatly complicate the inference of cenancestral traits. Driven in part by the impact of lateral gene acquisition, as revealed by the discrepancies of different gene phylogenies with the canonical rRNA tree, and in part by the surprising complexity of the universal ancestor, as suggested by direct backtrack characterizations of the oldest node of universal cladograms, Woese [53] has argued that the LCA was not a single organismic entity, but rather a highly diverse population of metabolically complementary cellular progenotes endowed with multiple small, linear, chromosome-like genomes that benefited from massive multidirectional horizontal transfer events.
According to this viewpoint, the essential features of translation and of metabolic pathways developed before the earliest branching event, but what led to the three domains was not a single ancestral lineage but a rapidly differentiating community of genetic entities. This communal ancestor occupied, as a whole, the node located at the bottom of the universal tree, in which decreasing sequence exchange and increasing genetic isolation would eventually lead to the observed tripartite division of the biosphere. Did the hypothetical communal progenote ancestor proposed by Woese [53] diverge sharply into the three domains soon after the appearance of the code and the establishment of translation? Not necessarily, since inventories of LCA genes clearly include sequences that originated in different pre-cenancestral epochs. The origin of the mutant sequences ancestral to those found in all extant species and the divergence of the Bacteria, Archaea, and Eukarya were not synchronous events, i.e., the separation of the primary domains took place later, perhaps even much later, than the appearance of the genetic components of their LCA [8]. Universal gene-based phylogenies ultimately reach a single universal entity, but the bacterial-like LCA [8] that we favor was not alone. Company must have been provided by its siblings, a population of entities similar to it that existed throughout the same period. They may not have survived, but some of their genes did if they became integrated into the LCA genome via lateral transfer. The cenancestor should thus be considered the evolutionary outcome of a series of ancestral events, including lateral gene transfer, gene losses, and paralogous duplications, that took place before the separation of Bacteria, Archaea, and Eukarya. Comparisons of combined ortholog protein data sets that exclude sequences which may have undergone lateral transfer are consistent with rRNA trees [5].
Genomic trees also exhibit an excellent broad-level agreement with rRNA-based phylogenies [47]. Genomic trees are not cladograms but phenograms, i.e., hierarchical representations of similarities and differences in gene content, in which the presence or absence of a sequence is counted as a character. Since different lineages evolve at different rates, such overall similarity may be an equivocal indicator of genealogical relationships. Nevertheless, these trees are rooted in the same area as rRNA phylogenies, which suggests that massive lateral transfer events between distant groups have not obliterated the early history of life. Thus, although hyperthermophiles may be displaced from their basal position if molecular markers other than elongation factors or ATPase subunits are employed, or if alternative phylogeny-building methodologies are used [4], it can still be argued that rRNA-based phylogenies provide one of the best-preserved historical records of cell evolution [5]. The recognition that the deepest branches in rooted universal phylogenies are occupied by hyperthermophiles does not by itself provide conclusive proof of a heat-loving LCA. Analysis of the correlation between the optimal growth temperature of prokaryotes and the G+C nucleotide content of 40 rRNA sequences through a complex Markov model has led Galtier et al. [16] to conclude that the universal ancestor was a mesophile. This possibility has been contested by Di Giulio [9], who has argued for a thermophilic or hyperthermophilic LCA. However, since the time factor is absent from the methodology developed by Galtier et al. [16], the inferred low G+C content of the cenancestral rRNA does not necessarily belong to the cenancestor itself, but may correspond to a mesophilic predecessor located along the trunk of the universal tree.
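The phenogram idea above, in which gene presence or absence is the character, can be illustrated with a small sketch. It assumes toy genomes given as sets of hypothetical protein-family labels, scores distance as one minus the fraction of families shared relative to the smaller genome (one of several possible normalizations, chosen here only for illustration), and clusters with UPGMA. Neither the distance nor the clustering method is claimed to be the one used to build the genomic trees cited in [47].

```python
from itertools import combinations


def gene_content_distance(a: set[str], b: set[str]) -> float:
    """1 - (shared families / families in the smaller genome); 0 means identical content."""
    return 1.0 - len(a & b) / min(len(a), len(b))


def upgma(genomes: dict[str, set[str]]):
    """Cluster genomes by gene-content distance with UPGMA; returns a nested tuple."""
    # Each active cluster label maps to (subtree, number of genomes it contains).
    clusters = {name: (name, 1) for name in genomes}
    dist = {
        frozenset((x, y)): gene_content_distance(genomes[x], genomes[y])
        for x, y in combinations(sorted(genomes), 2)
    }
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair)
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        del dist[pair]
        merged = f"({a},{b})"
        for c in clusters:  # size-weighted average distance to the merged cluster
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            dist[frozenset((merged, c))] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        clusters[merged] = ((tree_a, tree_b), n_a + n_b)
    return next(iter(clusters.values()))[0]


# Toy genomes with hypothetical family labels (not real data).
toy = {
    "A": {"f1", "f2", "f3", "f4"},
    "B": {"f1", "f2", "f3", "f5"},
    "C": {"f1", "f6", "f7", "f8"},
}
print(upgma(toy))
```

UPGMA assumes a roughly constant rate of change across lineages, which is exactly why, as noted above, overall similarity can mislead when lineages evolve at different rates.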
Chemical evolution and extreme environments

The hypothesis that the first organisms were anaerobic heterotrophs is based on the assumption that abiotic organic compounds were a necessary precursor for the appearance of life. The first successful synthesis of organic compounds under plausible primordial conditions was accomplished 50 years ago by the action of electric discharges acting for a week on a mixture of CH₄, NH₃, H₂, and H₂O; it led to a complex mixture of monomers that included racemic mixtures of several protein amino acids, in addition to hydroxy acids, urea and other molecules [27]. Prebiotic synthesis of amino acids largely proceeds by a Strecker synthesis that involves the aqueous-phase reactions of highly reactive intermediates:

$\mathrm{RCHO + HCN + NH_3 \rightarrow RCH(NH_2)CN + H_2O}$ (Structure 1)

$\mathrm{RCH(NH_2)CN + 2\,H_2O \rightarrow RCH(NH_2)COOH + NH_3}$ (Structure 2)

Detailed studies of the equilibrium and rate constants of these reactions demonstrated that both amino acids and hydroxy acids can be synthesized at high dilutions of HCN and aldehydes in a simulated primitive ocean. The reaction rates depend on temperature, pH, and the HCN, NH₃, and aldehyde concentrations, and are rapid on a geological time scale; the half-lives for the hydrolysis of the intermediate products in the reactions, amino nitriles and hydroxy nitriles, are less than 1,000 years at 0 °C, and there are no known slow steps [30]. The remarkable ease with which adenine can be synthesized by the aqueous polymerization of ammonium cyanide demonstrated the significance of HCN and its derivatives in prebiotic chemistry [33]. As summarized elsewhere [30], the prebiotic importance of HCN has been further substantiated by the discovery that the hydrolytic products of its polymers include amino acids, purines, and pyrimidines.
The reaction of cyanoacetylene or cyanoacetaldehyde (the hydrolysis product of cyanoacetylene) with urea leads to high yields of cytosine and uracil, especially under simulated evaporating-pond conditions, which increase the urea concentration [36]. The ease of formation under reducing conditions (CH₄ + N₂, NH₃ + H₂O, or CO₂ + H₂ + N₂) of amino acids, purines, and pyrimidines in one-pot reactions strongly suggests that these molecules were present in the prebiotic broth. In addition, experimental evidence suggests that urea, alcohols, sugars formed by the non-enzymatic condensation of formaldehyde, a wide variety of aliphatic and aromatic hydrocarbons, carboxylic acids, and branched and straight-chain fatty acids, including some membrane-forming compounds, were also components of the primitive soup. The remarkable coincidence between the molecular constituents of living organisms and those synthesized in prebiotic experiments is too striking to be fortuitous, and the robustness of this type of chemistry is supported by the occurrence of most of these biochemical compounds in the 4.5 × 10⁹-year-old Murchison carbonaceous meteorite, which also yielded evidence of liquid water in its parent body [12]. A major advantage of high temperatures is that chemical reactions go faster, and the primitive enzymes, once they appeared, could thus have been less efficient but nonetheless effective. However, the price paid is manifold; high-temperature regimes would lead to: (a) reduced concentrations of volatile intermediates, such as HCN, H₂CO and NH₃; (b) lower steady-state concentrations of prebiotic precursors like HCN, which at temperatures a little above 100 °C undergoes hydrolysis to formamide and formic acid and, in the presence of ammonia, to NH₄HCO₂:

$\mathrm{HCN + H_2O \rightarrow HCONH_2}$; $\mathrm{HCONH_2 + H_2O \rightarrow HCOOH + NH_3 \rightarrow NH_4HCO_2}$ (Structure 3);
(c) instability of reactive chemical intermediates like amino nitriles (RCH(NH₂)CN), which play a central role in the Strecker synthesis of amino acids (see Structure 1); and (d) loss of organic compounds by thermal decomposition and diminished stability of genetic polymers. Extremophilic genomes are protected against thermal decomposition by a number of enzyme-dependent mechanisms [18], but these would not have been available during prebiotic times or at the time of the origin of life. In fact, the existence of an RNA world with ribose appears to be incompatible with a (hyper)thermophilic environment [29]. Survival of nucleic acids is limited by the hydrolysis of phosphodiester bonds [24], and the stability of Watson-Crick helices (or their pre-RNA equivalents) is strongly diminished at high temperatures. For an RNA-based biosphere, the reduced thermal stability of ribose and other sugars on the geologic time scale is the worst problem, but the situation is equally bad for pyrimidines, purines and some amino acids. As summarized elsewhere [23, 39], measurements by different groups have shown that the half-life of ribose at 100 °C and pH 7 is only 73 min, and other sugars (2-deoxyribose, ribose 5-phosphate, and ribose 2,4-biphosphate) have comparable half-lives. The half-life for hydrolytic deamination of cytosine at 100 °C is 19-21 days, whereas the half-life of uracil at 100 °C is approximately 12 years. At 100 °C the thermal stability of purines is also reduced: 204-365 days for adenine, with guanine having a low half-life.

A hyperthermophilic pyrite-dependent origin of life?

An alternative solution to the problem of the low half-lives of biochemical monomers at temperatures of 100 °C or more is to assume an autotrophic origin of life. Such proposals are periodically resurrected, but they are generally made without supportive evidence. The most elaborate chemoautotrophic origin-of-life scheme has been proposed by Wächtershäuser [50].
According to this hypothesis, life began with the appearance of an autocatalytic, two-dimensional chemolithotrophic metabolic system based on the formation of the highly insoluble mineral pyrite. The synthesis in activated form of organic compounds such as amino acid derivatives, thioesters and keto acids is assumed to have taken place on the surface of FeS and FeS₂ in environments that resembled those of deep-sea hydrothermal vents. Replication followed the appearance of non-organismal, iron-sulfide-based, two-dimensional life, in which chemoautotrophic carbon fixation took place by a reductive citric acid cycle, or reverse Krebs cycle, of the type originally described for the photosynthetic green sulfur bacterium Chlorobium limicola. Molecular phylogenetic trees show that this mode of carbon fixation and its modifications (such as the reductive acetyl-CoA or the reductive malonyl-CoA pathways) are found in anaerobic archaea and the most deeply divergent eubacteria, which has been interpreted as evidence of its primitive character [25]. The reaction $\mathrm{FeS + H_2S \rightarrow FeS_2 + H_2}$ is a very favorable one. It has an irreversible, highly exergonic character, with a standard free-energy change $\Delta G^{\circ} = -9.23$ kcal/mol, which corresponds to a reduction potential $E^{\circ} = -620$ mV. Thus, the FeS/H₂S combination is a strong reducing agent, and has been shown to provide an efficient source of electrons for the reduction of organic compounds under mild conditions. Pyrite-mediated CO₂ reduction to amino acids, purines and pyrimidines is yet to be achieved. However, as reviewed elsewhere [6, 20, 25], the FeS/H₂S combination has been shown to: (a) reduce nitrate and acetylene; (b) induce peptide-bond formation that results from the activation of amino acids with carbon monoxide and (Ni,Fe)S; and (c) induce the synthesis of acetic acid and pyruvic acid from CO under simulated hydrothermal conditions in the presence of sulfide minerals [6, 20, 25].
However, support for Wächtershäuser's central tenets is meager. Life does not consist solely of metabolic cycles, and none of these experiments prove that enzymes and nucleic acids are the evolutionary outcome of multistep autocatalytic metabolic cycles bound to the surface of FeS/FeS₂ or some other mineral. In fact, experiments using the FeS/H₂S combination are also compatible with a more general, modified model of the primitive soup in which pyrite formation is recognized as an important source of electrons for the reduction of organic compounds [2].

Summary and conclusions

As the initially molten young Earth cooled down, global temperatures of 100 °C must have been reached but could not have persisted for more than 20 million years [42]. Deep-sea hydrothermal vents and other local high-temperature milieus have existed throughout the history of the planet and have played a major role in shaping the early environments. However, the half-lives of amino acids, nucleobases, and genetic polymers against thermal decomposition are very short on the geological time scale and argue against a hot origin of life in such extreme environments. Since high salt concentrations protect DNA and RNA against heat-induced damage [26, 46], this and other non-biological mechanisms, such as adsorption to mineral surfaces and the formation of clay-nucleic acid complexes [15], might have played a significant role in the preservation of organic compounds and genetic polymers in the primitive environments. However, such mechanisms would be inefficient at temperatures above 100 °C. Because adsorption involves the formation of weak noncovalent bonds, mineral-based concentration and protection would have been most effective at low temperatures [43]; at high temperatures any adsorbed monomers would drift away into the surrounding aqueous environment and become hydrolyzed.
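The half-lives quoted earlier for 100 °C can be turned into surviving fractions to show how fast such decomposition is on a geological time scale. This sketch assumes simple first-order decay, N(t)/N₀ = 0.5^(t/t½), and uses 20 days as the midpoint of the 19-21 day range given for cytosine deamination; it illustrates the arithmetic only and is not a kinetic model taken from the cited studies.

```python
def fraction_remaining(elapsed: float, half_life: float) -> float:
    """First-order decay: N(t)/N0 = 0.5 ** (t / t_half). Use the same time unit for both."""
    return 0.5 ** (elapsed / half_life)


MINUTES_PER_DAY = 24 * 60

# Half-lives at 100 °C quoted in the text.
ribose_after_one_day = fraction_remaining(MINUTES_PER_DAY, 73)  # ribose, pH 7: 73 min
cytosine_after_one_year = fraction_remaining(365, 20)           # cytosine: ~19-21 days (midpoint)
uracil_after_one_century = fraction_remaining(100, 12)          # uracil: ~12 years

for name, value in [("ribose, 1 day", ribose_after_one_day),
                    ("cytosine, 1 year", cytosine_after_one_year),
                    ("uracil, 100 years", uracil_after_one_century)]:
    print(f"{name}: {value:.1e}")
```

Under this assumption, even uracil, the most stable of these compounds, would be reduced to about 0.3% of its initial amount within a century at 100 °C.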
However, some minerals could also have the opposite effect: as shown by the Cu²⁺-montmorillonite-catalyzed decomposition of adenine to hypoxanthine [45], the association of organic compounds with some minerals may in fact reduce their half-lives. If hyperthermophily is not truly primordial, then heat-loving lifestyles may be relics of a secondary adaptation that evolved after the origin of life and before or soon after the separation of the major lineages. As argued here, the so-called root of universal trees does not correspond to the first living system, but is the tip of a trunk of still undetermined length in which the history of a long (but not necessarily slow) series of archaic evolutionary events, such as an explosion of gene families and multiple events of lateral gene transfer, is still preserved. Is it possible that traces of the emergence of hyperthermophily persist in the molecular records of earliest biological evolution somewhere along the trunk of rRNA-based phylogenetic trees? If hyperthermophiles were not the first organisms, then their basal position in molecular trees could be explained as: (a) a relic from early Archaean high-temperature regimes that may have resulted from a severe impact regime [17, 41]; (b) adaptation of Bacteria to extreme environments by lateral transfer of reverse gyrase [14] and other thermoadaptive traits from heat-loving Archaea; or (c) outcompetition of older mesophiles by hyperthermophiles originally adapted to stress-inducing conditions other than high temperatures [29]. Although there have been considerable advances in the understanding of chemical processes that may have taken place before the emergence of the first living systems, life's beginnings are still shrouded in mystery. Like vegetation in a mangrove, the roots of universal phylogenetic trees are submerged in the muddy waters of the prebiotic broth, but how the transition from the non-living to the living took place is still unknown.
Given the huge gap in current descriptions of the evolutionary transition between the prebiotic synthesis of biochemical compounds and the LCA of all extant living beings, it is probably naive to attempt to describe the origin of life and the nature of the first living systems from molecular phylogenies. A high-temperature origin of life may be possible, but if this was the case then it could not have involved the usual purines and pyrimidines, or other biochemical monomers.

Acknowledgements

AL is an affiliate of the NSCORT-University of California, San Diego. This paper was completed during a sabbatical leave of absence in which one of us (AL) enjoyed the hospitality of Stanley L. Miller and his associates at the University of California, San Diego. Support from the National Aeronautics and Space Administration Specialized Center of Research and Training in Exobiology (NSCORT) is gratefully acknowledged.

References

1. Achenbach-Richter L, Gupta R, Stetter KO, Woese CR (1987) Were the original eubacteria thermophiles? Syst Appl Microbiol 9:34-39
2. Bada JL, Lazcano A (2002) Some like it hot, but not biomolecules. Science 296:1982-1983
3. Brasier MD, Green OR, Jephcoat AP, Kleppe AK, van Kranendonk MJ, Lindsay JF, Steele A, Grassineau NV (2002) Questioning the evidence for Earth's oldest fossils. Nature 416:76-81
4. Brochier C, Philippe H (2002) A non-hyperthermophilic ancestor for Bacteria. Nature 417:244
5. Brown JR, Douady CJ, Italia MJ, Marshall WE, Stanhope MJ (2001) Universal trees based on large combined protein sequence data sets. Nat Genet 28:281-285
6. Cody GD, Boctor NZ, Filley TR, Hazen RM, Scott JH, Sharma A, Yoder HS Jr (2000) Primordial carbonylated iron-sulfur compounds and the synthesis of pyruvate. Science 289:1337-1340
7. Corliss JB, Baross JA, Hoffman SE (1981) An hypothesis concerning the relationship between submarine hot springs and the origin of life on Earth. Oceanologica Acta (Suppl.) 4:59-69
8. Delaye L, Becerra A, Lazcano A (2002) The nature of the last common ancestor. In: Ribas de Pouplana L (ed) The genetic code and the origin of life. Landes Bioscience, Georgetown (in press)
9. Di Giulio M (2000) The universal ancestor lived in a thermophilic or hyperthermophilic environment. J Theor Biol 203:203-213
10. Doolittle WF (1999) Phylogenetic classification and the universal tree. Science 284:2124-2129
11. Edgell DR, Doolittle WF (1997) Archaea and the origin(s) of DNA replication proteins. Cell 89:995-998
12. Ehrenfreund P, Irvine W, Becker L, Blank J, Brucato J, Colangeli L, Derenne S, Despois D, Dutrey A, Fraaije H, Lazcano A, Owen T, Robert F (2002) Astrophysical and [...] 107-109

Facultad de Ciencias
Departamento de Biología Evolutiva
E-mail: alar@correo.unam.mx
FAX 525155/622.4828

February 7, 2003

Re: "Comparative genomics and the gene complement of a minimal cell", by S. Islas, A. Becerra, P.L. Luisi and A. Lazcano

Prof. Dr. Peter Walde
Institute of Polymers Materials Science
ETH-Zentrum, CNB D 90.2
Universitaetstrasse 6
CH-8092 Zurich
Switzerland
Tel: +41-1-63 204 73

Dear Peter,

Enclosed please find the printed copies of the corrected version of the manuscript "Comparative genomics and the gene complement of a minimal cell", by Sara Islas, Arturo Becerra, Pier Luigi Luisi and Antonio Lazcano, together with a diskette in which the corresponding file may be found in Word. In preparing this version we have (a) rewritten the references following the format required by Origins of Life and Evolution of the Biosphere; (b) added a column to Table 2 giving the number of ORFs found in each of the genomes listed there, in order to facilitate the comprehension of the column in which the % of redundancies is shown; and (c) added a short explanation in the Material and Methods section of the way the % of redundancies was estimated.
I trust you will find everything in order, but please feel free to contact me if any further information is required.

With warmest personal regards and many thanks indeed,

Antonio Lazcano
Professor

Enclosures

Comparative genomics and the gene complement of a minimal cell

Sara Islas¹, Arturo Becerra¹, P. Luigi Luisi² and Antonio Lazcano¹*

¹Facultad de Ciencias, UNAM, Apdo. Postal 70-407, Cd. Universitaria, 04510 México D.F., MEXICO
*Corresponding author. E-mail: alar@correo.unam.mx
²ETH-Zentrum, Institut für Polymere, Universitätstrasse 6, CH-8092 Zurich, Switzerland

Abstract

The concept of a minimal cell is discussed from the viewpoint of comparative genomics. Analysis of published DNA content values determined for 641 different archaeal and bacterial species by pulsed-field gel electrophoresis has led to a more precise definition of the genome size ranges of free-living and host-associated organisms. DNA content is not an indicator of phylogenetic position. However, the smallest genomes in our sample do not have a random distribution in rRNA-based evolutionary trees, and are found mostly (a) in the basal branches of the tree, where thermophiles are located, and (b) in late clades, such as those of Gram-positive bacteria. While the smallest-known genome size for an endosymbiont is only 450 kb, no free-living prokaryote has been described with a genome < 1450 kb. Estimates of the size of a minimal gene complement can provide important insights into the primary biological functions required for a sustainable, reproducing cell nowadays and throughout evolutionary time, but the definition of a minimal cell depends on specific environments.

Key words: minimum gene set, minimal cellular genomes, genetic redundancy, DNA content

1. Introduction

Defining the properties of a minimal cell is a notoriously complex question, related not only to the understanding of the essential properties of a living system but also germane to the issue of the origin of life and the early stages of cellular evolution. Several different, complementary approaches to this problem are already feasible or may become available in the near future, including the development of experimental systems based on populations of replicating polymers such as RNA molecules (Joyce, 2002); the in vitro synthesis of artificial cells which can metabolize, multiply and adapt (Szostak et al., 2001; Pohorille and Deamer, 2002); the empirical characterization of intracellular endosymbionts and obligate parasites with highly streamlined genomes (Morowitz, 1967; Morowitz and Wallace, 1973; Mira et al., 2001; Gil et al., 2002); the trimming of extant prokaryotic genomes by knock-out experiments and transposon mutagenesis (Itaya, 1995; Hutchinson et al., 1999); and the recently advertised attempt to design a novel form of life with a completely artificial genome (Marshall, 2002). The characteristics of a minimal cell may be inferred from the existence of the basic components required for reproduction and self-maintenance under given environmental conditions (Luisi et al., 2002). From the viewpoint of comparative genomics, the characterization of a minimal cell is equivalent to the identification of the minimum number of genes required by a unicellular organism. Such estimates can provide important insights into the primary biological functions required for a sustainable, reproducing cell nowadays and throughout evolutionary time. However, the definition of a minimal genome is determined to a considerable extent by the specific environment in which the presumed minimal cell is found (Space Science Board/National Research Council, 1999; Riley and Serres, 2000).
Free-living unicellular organisms may exist with genomes smaller than the 1.45 Mb lower limit exhibited by extant prokaryotes (see below), but all the available evidence suggests that nowadays reduced, highly streamlined genomes like those of Buchnera and the mycoplasmas are viable only in the permissive, nutrient-rich, stable intracellular environment of their hosts (Mira et al., 2001; Gil et al., 2002). However, the situation must have been different during the earliest stages of biological evolution, when simpler free-living cells with genomes even smaller than those of Buchnera and Mycoplasma genitalium are assumed to have proliferated. A minimal gene set can be estimated from the presence or absence of homologous genes based on whole-genome computational sequence comparisons (Mushegian and Koonin, 1996) and, similarly, by determining the set of sequences shared among fully sequenced proteomes, i.e., the universal protein families (Kyrpides et al., 1999; Hutchinson et al., 1999). Significant variations may exist between the lengths of prokaryotic genes (Tekaia et al., 2002). However, to a first approximation bacterial genes may be considered of similar size and tightly packed, i.e., the number of prokaryotic genes is proportional to genome size (Casjens, 1998). Hence, additional insights into the minimal amount of DNA required by extant cells may also be achieved by a statistical analysis of prokaryotic genome sizes (Herdman, 1985; Casjens, 1998; Shimkets, 1998). Previous attempts to analyze the distribution of bacterial DNA content were based on a sample of 603 prokaryotic genome sizes derived by different methodologies with very different degrees of accuracy, such as renaturation kinetics and colorimetric techniques (Herdman, 1985).
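Both estimates described in the preceding paragraph can be sketched in a few lines. The first treats each proteome as a set of protein-family identifiers (hypothetical labels here) and takes their intersection as the strictly universal set; the second converts genome size into an approximate gene count under the tight-packing assumption. The default of about 1 kb per gene is an assumed round figure for illustration, not a value given in the text.

```python
from functools import reduce


def universal_families(proteomes: dict[str, set[str]]) -> set[str]:
    """Families present in every proteome, i.e. the strictly universal ones."""
    return reduce(set.intersection, proteomes.values())


def estimated_gene_count(genome_kb: float, kb_per_gene: float = 1.0) -> int:
    """Gene number ~ genome size / mean gene length (assumes tightly packed genes)."""
    return round(genome_kb / kb_per_gene)


# Hypothetical proteomes described by family labels (illustrative only).
proteomes = {
    "org1": {"fam1", "fam2", "fam3", "fam4"},
    "org2": {"fam1", "fam2", "fam4"},
    "org3": {"fam1", "fam2", "fam5"},
}
print(sorted(universal_families(proteomes)))
print(estimated_gene_count(580), estimated_gene_count(1450))
```

Because an intersection can only shrink as proteomes are added, strictly universal sets tend to be small, and a single lineage's gene loss is enough to remove a family from the set.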
With the development of pulsed-field gel electrophoresis (PFGE), a technique that allows the separation and analysis of large DNA fragments and the direct study of the physical structure of genomes, however, the accuracy in the determination of genome sizes has been significantly improved. Here we report the results of an analysis of a database of 641 prokaryote genome sizes determined by PFGE that we have compiled from the published literature, and discuss its significance in providing insights into a minimal cellular genome. The approach developed here is very similar to that reported by Shimkets (1998), and may be considered complementary to it. We also discuss how the high levels of genetic redundancy detected in all sequenced genomes can provide insights into simpler living systems, hypothesized to have existed prior to the divergence of the three major domains, which lacked the large sets of enzymes and the sophisticated regulatory abilities of contemporary organisms.

2. Material and methods

A genome size database has been constructed with the 641 prokaryotic DNA content values determined by PFGE reported in publications included in the NCBI/PubMed database (http://www.ncbi.nlm.nih.gov/PubMed/) as of November 2002. The organisms in this database have been divided into four major groups: (i) free-living Archaea and Bacteria (including pathogens and symbionts that remain separate and have free-living stages); (ii) thermophilic prokaryotes (optimal growth temperature > 45 °C); (iii) obligate parasites; and (iv) endosymbionts, excluding mitochondria and chloroplasts. The information was completed with the phylogenetic position (not shown) and lifestyle of each organism, based both on the original reports and on data from Bergey's Manual of Determinative Bacteriology (Holt et al., 1994). The database is periodically updated and is available upon request.
We have estimated the levels of genetic redundancy in the smallest genomes of endosymbionts and obligate parasites using the database of levels of paralogy (Total Proteins Hits) available from The Institute for Genomic Research (TIGR, http://www.tigr.org). All the ORFs in a given genome, whether annotated or not, were compared using BLAST; to be considered redundant, an ORF had to exhibit at least 60% sequence similarity (P < 0.0001) to another ORF of the same genome. The results of this comparison are shown in Table 2, where the sizes of some of the smallest known cellular genomes are indicated in kb, together with the number of ORFs, the number of redundant ORFs found in each genome, and the corresponding percentage per genome.

3. Results

The genome size distribution in our database is shown in Figure 1. The DNA content values of free-living prokaryotes vary over a nearly sevenfold range, from Halomonas halmophila, a moderately halophilic gamma-proteobacterium endowed with a small 1450 kb genome (Mellado et al., 1998), to the 9700 kb genome of Azospirillum lipoferum Sp59b (Martin-Didonet et al., 2000). The widest range of genome sizes is exhibited by the proteobacteria, from the 450 kb Buchnera genome to the largest ones in the sample, which correspond to aerobic organisms with complex life cycles that can include the formation of spores and mycelia. There are no reports of archaeal genomes as large as those of Azospirillum and Stigmatella, perhaps owing to incomplete sampling. All the archaeal genomes in our sample are small and fall within the 500 to 5100 kb range. This size range in fact corresponds to that of thermophilic bacterial and archaeal genomes, where the lower and upper limits appear to correspond to extreme cases, i.e., the 500 kb chromosome of the thermophilic ectosymbiont Nanoarchaeum equitans (Huber et al., 2002), and the 5100 kb genome of the facultative thermophile Methanosarcina acetivorans (Sowers et al., 1988).
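The redundancy criterion described in Material and methods can be sketched as follows, assuming the all-against-all BLAST results have already been parsed into simple hit records. The `Hit` record, the choice to count both members of a qualifying paralogous pair as redundant, and the exclusion of self-hits are assumptions of this sketch; only the thresholds (at least 60% similarity, P < 0.0001) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One parsed BLAST hit between two ORFs of the same genome (hypothetical record)."""
    query: str
    subject: str
    similarity: float  # percent similarity
    p_value: float


def redundancy(orfs: list[str], hits: list[Hit],
               min_similarity: float = 60.0, max_p: float = 1e-4) -> tuple[int, float]:
    """Return (number of redundant ORFs, percentage of the genome's ORFs)."""
    redundant: set[str] = set()
    for h in hits:
        if (h.query != h.subject              # ignore self-hits
                and h.similarity >= min_similarity
                and h.p_value < max_p):
            redundant.update((h.query, h.subject))
    redundant &= set(orfs)                    # keep only ORFs of this genome
    return len(redundant), 100.0 * len(redundant) / len(orfs)


orfs = ["orf1", "orf2", "orf3", "orf4"]
hits = [
    Hit("orf1", "orf1", 100.0, 0.0),   # self-hit, ignored
    Hit("orf1", "orf2", 75.0, 1e-10),  # qualifies
    Hit("orf3", "orf4", 40.0, 1e-6),   # below 60% similarity
]
print(redundancy(orfs, hits))
```

The second value returned would play the role of the per-genome percentage reported alongside the ORF counts.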
Classifying endosymbionts as a separate group shows that their genome size distribution overlaps with that of obligate parasites (Figure 1). However, their DNA content can be significantly smaller than that of the smallest parasites, i.e., the mycoplasmas. The smallest known cellular genome is only 450 kb and corresponds to the obligate endosymbiotic proteobacterium Buchnera spp. (Gil et al., 2002). This is significantly smaller than the 580 kb lower limit of the Mollicutes, set by the obligate parasite Mycoplasma genitalium (Fraser et al., 1995). Other groups with reduced genome sizes are the rickettsiae and several spirochetes. We have grouped other obligate parasites and organisms with stringent growth requirements with the mycoplasmas. Their DNA content, however, can reach values as large as the 5016 kb of Mycobacterium intracellulare (Kim et al., 1996).

4. Discussion

The data summarized in Figure 1 are clearly biased and do not accurately reflect the actual levels of prokaryotic diversity. Because of their medical and economic significance for humans, animals and crop plants, pathogens and parasites are clearly overrepresented in our sample. Moreover, several of the categories used here to group the species in our sample overlap in the 2000 to 3000 kb region of Figure 1. This shows that prokaryotes with similar genome sizes but different lifestyles can have very different gene complements. In spite of these limitations, the data summarized in Figure 1 provide useful insights into the evolution of prokaryotic DNA content and the size of a minimal cellular gene set. Considerable variations in DNA content may exist even among closely related bacterial species and strains (Bergthorsson and Ochman, 1995; Casjens, 1998). However, genera like Helicobacter and Streptomyces show that this is not always the case (Shimkets, 1998). The range of bacterial genome sizes is clearly less constrained than that of archaeal chromosomes. Our results also show the unsurpassed genome plasticity of the proteobacterial clade. Some members of the group, like the myxobacteria, have greatly expanded their coding capacities as they adapted to oxygen-rich environments and developed complex life cycles. Others, like Buchnera, have followed the opposite direction and lost considerable amounts of DNA as they adapted to an intracellular environment (Gil et al., 2002). The thermophilic bacterial and archaeal genomes tend to be relatively small, with the lower limit represented by the 500 kb chromosome of the thermophilic ectosymbiont Nanoarchaeum equitans (Huber et al., 2002). The 5100 kb genome of the facultative thermophile Methanosarcina acetivorans is probably atypical. However, the size range of thermophilic genomes

[Figure 1: distribution of prokaryotic genome sizes; x-axis: genome size (kb), 500 to 9500; y-axis: 0 to 50.]

Table 1. Some miniature cellular genomes
Species | Genome size (kb) | Lifestyle | Reference
Mycoplasma genitalium | 580 | obligate parasite | Fraser et al., 1995
Buchnera spp. | 450 | endosymbiont | Gil et al., 2002
Cryptomonad nucleomorph | 551 | secondary endosymbiont | Douglas et al., 2001

Table 2. Genetic redundancies in small genomes of endosymbionts and obligate parasites*

…0.5–1.5 Mb belong to the Mycoplasmatales, Proteobacteria, Spirochaetales, Chlamydiales and Acholeplasmatales. Nevertheless, we found larger DNA contents in other obligate parasites, such as the 5.016 Mb of Mycobacterium intracellulare (Kim et al. 1996). We excluded all organisms of 0.5–1.5 Mb because of their obligate dependence on a host. The minimum DNA content of a free-living organism then starts at 1.45 Mb.

If our data supported the hypothesis of whole-genome duplication, we would expect the distribution to show peaks with modal values of approximately 3.2, 6.4 and 12.8 Mb. This is clearly not the case. Moreover, if it were, each bar of the distribution, or at least nearby intervals, should show a certain phylogenetic homogeneity. That homogeneity would reflect not only ancestral relationships but also a steadily increasing genomic (metabolic) "complexity", which the dynamics of the genome itself would make difficult to detect.

On the other hand, some prokaryotes have been observed to duplicate their whole genome temporarily under laboratory conditions (Trun 1999), whether or not the duplication is maintained. Examples are Azotobacter vinelandii, Deinococcus radiodurans and Methanococcus jannaschii (Bendich and Drlica 2000). This "possibility" does not necessarily mean that whole-genome duplication was the only, or the most important, mechanism by which the earliest cells acquired genetic material on a large scale.

In this work, the range of prokaryotic genome sizes available up to March 2003 extends from 0.448 Mb (Buchnera sp. CCE) to 9.7 Mb (Azospirillum lipoferum Sp59b) (Islas et al. 2003a; Islas et al. 2003b). As mentioned above, the smallest free-living prokaryotes in the sample start at 1.45 Mb (Halomonas halmophila) (Islas et al. 2003b). This does not mean that this value corresponds to the minimum amount of DNA that the common ancestor of all living beings may have had, much less the "first living organism". Rather, we intend to show that there are additional genomic features that deserve to be taken into account when addressing the problem of a minimal cell.
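The argument above compares the observed distribution with the peaks expected under successive whole-genome duplications. A minimal Python sketch of that comparison follows. The ancestral unit of 1.6 Mb is an assumption: the text does not state it, but doubling it reproduces the 3.2, 6.4 and 12.8 Mb peaks quoted above. The function names are hypothetical. Distances are measured in doublings (log2 units) because each duplication multiplies genome size by two.

```python
import math
from typing import List, Sequence


def doubling_peaks(base_mb: float, n_doublings: int) -> List[float]:
    """Modal genome sizes predicted after 1..n successive whole-genome duplications."""
    return [base_mb * 2 ** k for k in range(1, n_doublings + 1)]


def log2_distance_to_nearest_peak(size_mb: float, peaks: Sequence[float]) -> float:
    """How far, in doublings, an observed genome size lies from the closest predicted peak."""
    if size_mb <= 0:
        raise ValueError("genome size must be positive")
    return min(abs(math.log2(size_mb / p)) for p in peaks)


# Assumed ancestral unit: 1.6 Mb reproduces the 3.2, 6.4 and 12.8 Mb peaks quoted in the text.
PEAKS = doubling_peaks(1.6, 3)
```

Under a strict duplication model, the distances of observed sizes would cluster near 0. Sizes spread evenly on a logarithmic scale between the first and last peak would instead give distances roughly uniform between 0 and 0.5.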
Prokaryotes within the same genome-size interval may belong to the same taxonomic group, such as the Mycoplasmatales. Conversely, a single genome-size interval may contain different taxonomic groups (Appendix 2b). Bacterial strains can also vary over a wide range, as in Burkholderia cepacia (4.6–8.6 Mb). We therefore conclude that genome size alone does not correlate with specific metabolic capabilities. DNA content reflects a broad metabolic diversity: some organisms share common biosynthetic pathways, whereas others differ in certain enzymatic steps because of secondary losses, enzyme additions, functional redundancies, and so on (Islas et al. 1998).

Estimating the minimal DNA content of a cell is therefore of interest because it may allow such a cell to be placed retrospectively in the earliest stages of the evolution of life. This would be relevant not only for calculating the number of genes. It would also help characterize the metabolic capabilities that such an "organism" might have had in a given environment, and evaluate its potential for change, as has been attempted in defining the last common ancestor.

The nature of the last common ancestor has held the attention of several research groups in a continuing debate over theory and methods. The distribution of some biosynthetic enzymes present in the three domains, Archaea, Bacteria and Eukarya, has made it possible to outline a profile of the cenancestor. This profile is comparable to that of modern prokaryotes in terms of biological complexity, ecological adaptability and evolutionary potential (Lazcano 1995). However, incorrect interpretations can arise when quantifying and identifying its traits, as described below.
Mushegian and Koonin (1996) attempted to estimate the amount of DNA required to maintain a minimal cell. They compared the only two genomes completely sequenced at that time, Haemophilus influenzae and Mycoplasma genitalium, and published an inventory of 256 genes from which some biosynthetic pathways were absent. These results were used to propose that the last common ancestor had an RNA genome. However, the work was rigorously criticized by Becerra et al. (1997). Both genomes belong to bacteria that parasitize humans and have lost large amounts of DNA. Moreover, the absence from the sample of essential eukaryotic and archaeal proteins involved in replication is not evidence that the cenancestor had an RNA genome.

Lifestyle can affect the inferred gene inventory of the last common ancestor, for example through gene loss, adaptation to intracellular microenvironments, or free living in very specific environments. An intracellular lifestyle, as in obligate parasites and obligate symbionts, restricts the possibility of acquiring genes from other organisms by horizontal transfer. Such organisms may also lose insertion sequences and phage-related sequences, which give the genome its own rearrangements and capacity for change (Stepkowski and Legocki 2001).

As for the response to oxygen, not all organisms with small genomes are anaerobic. This can be explained by a complex series of secondary adaptations that have led to the polyphyletic reduction of their genomes (Becerra et al. 2000; Islas et al. 2000). Nevertheless, we found that, in general, anaerobic, microaerophilic and facultatively anaerobic prokaryotes have smaller genomes than aerobes. The small size of a genome does not, in itself, make it a model of an ancestral life form.
Thus, neither mycoplasmas nor rickettsiae are good models of ancestral Archean organisms. Still less can they represent a minigenome from which prokaryotes evolved through whole-genome duplications. Likewise, the relatively narrow range of hyperthermophile genome sizes may reveal a tendency for high-temperature environments to restrict DNA content to a specific interval (0.5–5.10 Mb). This is probably due to a reduction in the average size of their genes (Islas et al. 2003a).

It has been suggested that the first microorganisms were anaerobic heterotrophs, and that the availability of oxygen promoted the appearance of new metabolic capabilities (Oparin, 1938). Our results show a correlation between genome size and oxygen response. Obligate anaerobes, microaerophiles and facultative anaerobes clearly have smaller genomes than aerobic prokaryotes, although there is considerable variation and overlap among them (Islas et al. 2000). The greatest prokaryotic taxonomic diversity lies in the 1.5 to 4.0 Mb range, where all four types of oxygen response are present.

In contrast with other groups such as the hyperthermophilic Archaea, the proteobacteria have successfully exploited a wide range of genome sizes. Some of their members, such as the myxobacteria, have undergone the greatest expansion of their coding capacities as they adapted to oxygen-rich environments and developed complex life cycles. Others, such as Buchnera, have followed the opposite direction toward maximal genome reduction, because adaptation to intracellular life entails the loss of considerable amounts of DNA. All other prokaryotic groups fall within this wide range.
On the other hand, the largest genomes (>6.5 Mb) correspond to free-living bacteria with complex life cycles. These must have evolved once significant amounts of free oxygen became available in the Precambrian environment. In no case, however, is there evidence that the DNA content of strict aerobes results from whole-genome duplications. So far, no archaeal genomes comparable in size to those of Stigmatella have been reported in the literature (Casjens, 1998). It is, of course, not clear whether this reflects the evolutionary strategies of the domain Archaea, and our interpretation may be limited by current descriptions of prokaryotic diversity. In fact, the database analyzed here is skewed by the medical and economic significance of the organisms and does not accurately reflect prokaryotic biodiversity. Pathogenic and parasitic organisms, in particular, are clearly overrepresented because of their medical importance and economic impact.

However, our sample includes a large number of microaerophilic and facultatively anaerobic organisms from different phylogenetic groups. This probably reflects successful adaptation to increasingly high levels of oxygen in the Earth's atmosphere. We can therefore conclude that the expansion of aerobic environments during the Precambrian led not only to the diversification of bacteria adapted to novel conditions, but also to the evolution and maintenance of large genomes. As mentioned above, the genome sizes of extremophilic prokaryotes tend to fall within a more or less defined interval of about 0.5–5.10 Mb. The limits of this interval correspond to Nanoarchaeum equitans and Methanosarcina acetivorans, respectively.
Nevertheless, this interval does not necessarily imply a correlation between genome size and an extreme, (hyper)thermophilic, microbial lifestyle, because other groups of prokaryotes with the same genome sizes are mesophiles. One clear feature is that thermophilic and hyperthermophilic genomes have shorter coding sequences (283 ± 5.8) than mesophilic prokaryotes (340 ± 9.4). However, reduced gene size in extremophiles is a polyphyletic trait, because it is also found in mesophiles from different groups. Simple sequences are also found in both mesophilic and (hyper)thermophilic organisms. In the hyperthermophiles, with the exception of Thermoplasma acidophilum, these simple sequences contain large amounts of glutamic acid, which is stable under acidic pH conditions.

It is worth remembering that the so-called root of the universal tree does not correspond to the first living system. It only marks the tip of a trunk of undetermined length, along which a succession of very ancient evolutionary events took place, such as the emergence of gene families and horizontal transfer. The basal position of hyperthermophiles in rRNA trees can therefore be explained by:
(a) the intense impacts to which the early Archean Earth was subjected;
(b) an adaptive response of bacteria resulting from the horizontal transfer of reverse gyrase from the Archaea; and
(c) competition between older mesophiles and hyperthermophiles adapted to high temperatures.
If genes can move from one organism to another by horizontal transfer, some genes may have spread so widely that they appear to be part of the last common ancestor, when in fact they are more recent.
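The observation that simple sequences in hyperthermophiles are rich in glutamic acid can be made concrete with a small scan of protein sequences. The following is a minimal Python sketch under a deliberately simplified, assumed definition: a "simple sequence" is taken to be a run of at least five identical amino acids. The criteria used in the original analysis may differ. The function names and the default run length are hypothetical.

```python
import re
from typing import List, Tuple


def homopolymer_runs(protein: str, min_len: int = 5) -> List[Tuple[str, int, int]]:
    """Return (residue, start, length) for every run of one amino acid at least min_len long."""
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # ([A-Z]) captures a residue; \1{n,} requires it to repeat at least n more times.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), len(m.group(0))) for m in pattern.finditer(protein.upper())]


def glutamate_share_of_runs(protein: str, min_len: int = 5) -> float:
    """Fraction of the residues inside homopolymeric runs that are glutamate (E)."""
    runs = homopolymer_runs(protein, min_len)
    total = sum(length for _, _, length in runs)
    if total == 0:
        return 0.0  # no simple sequences under this definition
    return sum(length for aa, _, length in runs if aa == "E") / total
```

For example, in the made-up sequence "MKEEEEEEAQQQQQL" the sketch finds a six-residue glutamate run and a five-residue glutamine run, so 6 of the 11 run residues are glutamate.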
We are still far, however, from fully understanding the origin and attributes of the first living beings. Evidence from metabolic pathways, biochemical data, life cycles, the paleontological record and other sources is not always available to integrate and correctly interpret the steps and events that prokaryotes followed during their evolution.

References

Alifano P, Fani R, Lio P, Lazcano A, Bazzicalupo M, Carlomagno MS, Bruni CB (1996) Histidine biosynthetic pathway and genes: structure, regulation, and evolution. Microbiol Rev 60(1):44–69.
Anderson RP, Roth JR (1977) Tandem genetic duplications in phage and bacteria. Annu Rev Microbiol 31:473–505.
Aravind L, Tatusov RL, Wolf YI, Walker DR, Koonin EV (1998) Evidence for massive gene exchange between archaeal and bacterial hyperthermophiles. Trends Genet 14:442–444.
Becerra A, Islas S, Leguina JI, Silva E, Lazcano A (1997) Polyphyletic gene losses can bias backtrack characterizations of the cenancestor. J Mol Evol 45:115–118.
Becerra A, Cocho G, Delaye L, Lazcano A. Simple sequences: it is something you have, whether you like it or not. Submitted.
Bendich AJ, Drlica K (2000) Prokaryotic and eukaryotic chromosomes: what's the difference? BioEssays 22:481–486.
Brasier MD, Green OR, Jephcoat AP, Kleppe AK, Van Kranendonk MJ, Lindsay JF, Steele A, Grassineau NV (2002) Questioning the evidence for Earth's oldest fossils. Nature 416:76–81.
Bresler V, Montgomery WL, Fishelson L, Pollak PE (1998) Gigantism in a bacterium, Epulopiscium fishelsoni, correlates with complex patterns in arrangement, quantity, and segregation of DNA. J Bacteriol 180:5601–5611.
Cantor CR, Smith CL, Mathew MK (1988) Pulsed-field gel electrophoresis of very large DNA molecules. Annu Rev Biophys Biophys Chem 17:287–304.
Casjens S, Huang W (1993) Linear chromosomal physical and genetic map of Borrelia burgdorferi, the Lyme disease agent. Mol Microbiol 8:967–980.
Eisen JA (2000) Horizontal gene transfer among microbial genomes: new insights from complete genome analysis. Curr Opin Genet Dev 10:606–611.
Fani R, Mori E, Tamburini E, Lazcano A (1998) Evolution of the structure and chromosomal distribution of histidine biosynthetic genes. Orig Life Evol Biosph 28:555–570.
Fraser CM, Gocayne JD, White O, Adams MD, Clayton RA, Fleischmann RD, Bult CJ, Kerlavage AR, Sutton G, Kelley JM, et al. (1995) The minimal gene complement of Mycoplasma genitalium. Science 270:397–403.
Haldane JBS (1932) The Causes of Evolution. Princeton University Press, second printing, p. 222.
Hancock JM (1995) The contribution of slippage-like processes to genome evolution. J Mol Evol 41:1038–1047.
Herdman M (1985) The evolution of bacterial genomes. In: Cavalier-Smith T (ed) The Evolution of Genome Size. John Wiley, London.
Holt JG, Krieg NR, Sneath PHA, Staley JT, Williams ST (1994) Bergey's Manual of Determinative Bacteriology. Williams & Wilkins.
Islas S, Castillo A, Vázquez HG, Lazcano A (2000) On the role of genome duplications in the evolution of prokaryotic chromosomes. In: Chela-Flores J et al. (eds) Astrobiology, 289–292. Kluwer Academic Publishers, Netherlands.
Islas S, Velasco AM, Becerra A, Delaye L, Lazcano A (2003a) Hyperthermophily and the origin and earliest evolution of life. International Microbiology. Accepted for the June issue.
Islas S, Becerra A, Luisi PL, Lazcano A (2003b) Comparative genomics and the gene complement of a minimal cell. Submitted to Origins of Life and Evolution of the Biosphere.
Jain R, Rivera MC, Lake JA (1999) Horizontal gene transfer among genomes: the complexity hypothesis. Proc Natl Acad Sci USA 96(7):3801–3806.
Jain R, Rivera MC, Moore JE, Lake JA (2002) Horizontal gene transfer in microbial genome evolution. Theor Popul Biol 61:489–495.
Jensen RA (1976) Enzyme recruitment in evolution of new function. Annu Rev Microbiol 30:409–425.
Kim JR, Kang BS, Ko JH, Park JS, Kim SJ, Bai GH, Chung TH, Nam KS, Choi YK, Choi IS, Chung TW, Lee YC, Kim CH (1996) Genomic heterogeneity in clinical strains of Mycobacterium tuberculosis, M. terrae complex, M. gordonae, M. avium-intracellulare complex, and M. fortuitum by pulsed-field gel electrophoresis. J Biochem Mol Biol 29:569–573.
Kolsto AB (1999) Time for a fresh look at the bacterial chromosome. Trends Microbiol 7(6):223–226.
Lawson et al. (1996) Phylogenetic analysis of CarB genes: complex evolutionary history includes an internal duplication within a gene which can root the tree of life. Mol Biol Evol 13(7):970–977.
Li WH (1997) Molecular Evolution. Sinauer Associates, Sunderland, MA, USA.
Lin YS, Kieser HM, Hopwood DA, Chen CW (1993) The chromosomal DNA of Streptomyces lividans 66 is linear. Mol Microbiol 10:923–933.
Margulis L (1993) Symbiosis in Cell Evolution, 2nd ed. Freeman, New York.
Martin-Didonet GC, Chubatsu SL, Souza ME, Kleina M, Rego MG, Rigo UL, Yates GM, Pedroza OF (2000) Genome structure of the genus Azospirillum. J Bacteriol 182(14):4113–4116.
Mira A, Ochman H, Moran NA (2001) Deletional bias and the evolution of bacterial genomes. Trends Genet 17(10):589–596.
Moreno E (1998) Genome evolution within the alpha proteobacteria: why do some bacteria not possess plasmids and others exhibit more than one different chromosome? FEMS Microbiol Rev 22:255–275.
Nelson KE, Clayton RA, Gill SR, Gwinn ML, Dodson RJ, Haft DH, Hickey EK, Peterson JD, Nelson WC, Ketchum KA, et al. (1999) Evidence for lateral gene transfer between Archaea and bacteria from genome sequence of Thermotoga maritima. Nature 399:323–329.
Neumann B, Pospiech A, Schairer HU (1992) Size and stability of the genomes of the myxobacteria Stigmatella aurantiaca and Stigmatella erecta. J Bacteriol 174:6307–6310.
Ng WV, Kennedy SP, Mahairas GG, Berquist B, Pan M, Shukla HD, Lasky SR, Baliga NS, Thorsson V, Sbrogna J, Swartzell S, Weir D, Hall J, Dahl TA, Welti R, Goo YA, Leithauser B, Keller K, Cruz R, Danson MJ, Hough DW, Maddocks DG, Jablonski PE, Krebs MP, Angevine CM, Dale H, Isenbarger TA, Peck RF, Pohlschroder M, Spudich JL, Jung KW, Alam M, Freitas T, Hou S, Daniels CJ, Dennis PP, Omer AD, Ebhardt H, Lowe TM, Liang P, Riley M, Hood L, DasSarma S (2000) Genome sequence of Halobacterium species NRC-1. Proc Natl Acad Sci USA 97(22):12176–12181.
Ohno S (1970) Evolution by Gene Duplication. Springer-Verlag, New York.
Oparin AI (1938) The Origin of Life. Macmillan, New York, USA.
Petrov DA (2001) Evolution of genome size: new approaches to an old problem. Trends Genet 17(1):23–28.
Romero D, Palacios R (1997) Gene amplification and genomic plasticity in prokaryotes. Annu Rev Genet 31:91–111.
Roy PH (1999) Horizontal transfer of genes in bacteria. Microbiology Today 26:168–170.
Schopf JW (1993) Microfossils of the Early Archean Apex chert: new evidence of the antiquity of life. Science 260:640–646.
Shimkets LJ (1998) Structure and sizes of genomes of the Archaea and Bacteria. In: de Bruijn FJ, Lupski JR, Weinstock GM (eds) Bacterial Genomes: Physical Structure and Analysis. Chapman & Hall, New York.
Simillion C, Vandepoele K, Van Montagu MCE, Zabeau M, Van de Peer Y (2002) The hidden duplication past of Arabidopsis thaliana. Proc Natl Acad Sci USA 99(21):13627–13632.
Soltis DE, Soltis PS (1999) Polyploidy: recurrent formation and genome evolution. Trends Ecol Evol 14(9):348–352.
Sparrow AH, Nauman AF (1976) Evolution of genome size by DNA doublings. Science 192(4239):524–529.
Stepkowski T, Legocki A (2001) Reduction of bacterial genome size and expansion resulting from obligate intracellular lifestyle and adaptation to soil habitat. Acta Biochim Pol 48(2):367–381.
Tautz D, Trick M, Dover GA (1986) Cryptic simplicity in DNA is a major source of genetic variation. Nature 322:652–656.
Trevors JT (1996) Genome size in bacteria. Antonie van Leeuwenhoek 69:293–303.
Trun NJ (1999) Genome ploidy. In: de Bruijn FJ, Lupski JR, Weinstock GM (eds) Bacterial Genomes: Physical Structure and Analysis. Chapman & Hall, New York.
Wallace DC, Morowitz HJ (1973) Genome size and evolution. Chromosoma 40:121–126.
Wolfe KH, Shields DC (1997) Molecular evidence for an ancient duplication of the entire yeast genome. Nature 387:708–713.
Zipkas D, Riley M (1975) Proposal concerning mechanism of evolution of the genome of Escherichia coli. Proc Natl Acad Sci USA 72(4):1354.

Database of prokaryotic diversity according to genome size, and its relation to metabolism and some environmental properties.
Name: genus and species. G.S.: genome size, in kb. GROUP: main bacterial category according to Bergey's Manual. ORDER: taxonomic category of each organism (NCBI). TEMP: optimal growth temperature category (mesophilic, 25–44 °C; thermophilic, 45–70 °C; hyperthermophilic, 71–115 °C). LS: lifestyle (obligate parasite, obligate symbiont or free-living). MET.: oxygen response (metabolism). REF: reference in which the genome size is reported.

Name | G.S. | GROUP | ORDER | TEMP | LS | MET. | REF
Buchnera sp. CCE | 448 | G- | gamma-proteobacteria | M | SO | A | Gil R et al. (2002)
Buchnera sp. CCU | 476 | G- | gamma-proteobacteria | M | SO | A | Gil R et al. (2002)
[Subsequent database rows are illegible in the scanned source.]
"teria g roh.·oha1..11,.•ria pr o teobal..'l~ri;a g prol~ohach..'ria g rnk· h;Kt..:-ri;i 'FB ro11.: ho:h.:h:1 ia g r h.· ..-t1.·1'ia nll.' ha.:kria h nl1.·ohad,:ri <1 g l."ohact.:ria ;,1.·illus ' l stri i 1.· ha1..·t..:ria g pr ot.:o bach~ria r h: 1..11.'ria proti: o had~ria l! a<.1cria r t obacteria r t actc!fia r<1ti:obacteria i l..'hadales r t obactcria FB FB prot~ o ha'-"teria cti obacteria M ~I M ~I ~ I ~ I "' ~I \1 \ 1 \1 ~I ,\ I \I \1 "' ,\ I ,\1 \ 1 "' "' ~ I "' ~ I ~! ~! ~I FL FL FL FL ro Fl. FI . ro l'O FI . FI. FI . FI. FI. FL FL FL FI. Fl. Fl. L l. l. ro l. L ro A A ..\ .-\ .·\ .-\ L\ L\ \11 .. \ F.-\ .·\ ,\ .-\ F.-\ L\ F.-\ .-\ .·\ .. \ F..\ A F . .\ A .-\ I I ,\ .-\ inard,M., t l 97) inard.M .. t l ( 997) kada.N .. t al 91) ·11tong L. tlmcheary .Pang . 997) acioglu . asi ., ta l . 96) o d !IN, rc berton H 98) Oinard , ~ t.. l l 97) o :tld.: . ., lt.: an l. amid . 991) 13.:rgtih>r .... on l '. :md dunan 11. ( 19 9~) ;t .: zza an i\!. d ;11 1 1)97 ) ( irntlrn1.·~. () . mi ·1 ·onunl .. ·r. 11 . 9 1 ) l.iu I.. amk·n;.on "-E 95) (ii nJ.7\1.. t ')97) llp: \\\\\\ .;1ps n .. ·t 11r g ~111 \1111. · ka1 1\: llud, h11hkria,:.:pai1.-a rq1li i11:1rJ.\I.. .:tal ( 97) \ ·ary r. :1) l.krgthor.i;;.s1111 1. r. mt (k n1an 11 . ( 1 1 )9~ ) i1h. . .. 87) ~ 1:.ildúna .. and Tümmkr. 1 L ( 199 1 ) tirothth:s. D . . a11d Tümmk·r. IL ( 199 1) Grothu .. ·s. D . . and T fin unJ..:r. 1 L ( 1991 ) T<1ylor. .- \ .. Barhour.G . .- \nd Thnmas.D . J 99 1 Kimm . etal (1996). \\'iJjajn R. Suw:rnto :\. Tjah.iono H. ( 1999). Kim JR. 5G 1 Escherichia co/i ffC0/151 5(,:! ('/ostridium ucetoh11tyluw11 l'•./Cf'.?f>:! 563 F:.fcherichia co/; ECOR./O 56./ /'h.)·llohc1cren11m m..i-rsmacearnm llT('( ·.¡3590 565 /frie llhu 1lmr;,1x1emi/ .\" 5300 G • 5300 G · 5300 (]. 5:130 (i. 5JJ7 (j ' 5340 (;. ))72 r;. 5400 G · 56 (1 Hurkholdcriu C(l//iu\'l1r nili1co/a XCl'/'U 963 ~550 G- 569 .·lgobacterium 11tmc!f<1 l·1ens fn:.('/·1JI':! 
'" }/ 5 -1) b'uclllus cere11 s * 5-¡ Hurkholclr.::na cepmcu ' ºJ:P5."' / 5-y liurkholcleno c.:cpa1c<1 l .. \f<.il./:93(/-l'.366J 5-3 .·lgrohacteri11m rnhi .. /'fCC I 33.•5 5- ./ lgrob"c1eri11111r<1Jwhl1c1er111". J( ·nu' _.,./ /-1 5-5 F' . ~ cmlomonas c:h/cwof1J'l11 .( /iS.\J50()SJ 5-6 :\fycohauerili1n m·i w11 5-- R1:hobi11m galegac 5-8 A:r>spm/Jwn haloproeferens 5 -9 Pse1tJomonas oentRino.rn D.\~~ 11 -o- 580 l'seudomonas aeruginosa PAO 581 Agrobac1eri11111 radiohac1er /w / ATCC23308 582 Agrobacterium radiobacter C58 583 Achromobac/er mh/andii DS,\/~53 584 Pseudomonas pulida KT2440 585 Rurkholderia C3430 586 811rkholderia cepuica /..¡\f(¡ I 4280(fl "365) 5¡r Anabacna :i:.p.Pcc-¡ :!O 5550 (.i . ."700 (.i . 5700 (j . 5700 (j . ~715 G - '7~0 (; . 'XOO G • 58.18 G ' 5892 e;. 5900 G- 5900 G • ,900 e;. '900 G · 5900 G · 6000 G- 6000 G- 6100 G- 6300 G- 6400 G · b proteobacteria CFB CFB g prnl~ot'l:h:h:ria Bacillos· C lostrikoh.1 c.1 .. ·ria Bal·illus Clvstridium h prot"oha ... ·t .. ·ri;i g prot..·oha ... ·1i..·1ü g prot .. ·ob;i.:t.:ri a a prnt.:ohal ·t.: ri ~1 lhcil ltt!' Ch ~: .:tridium h pn,l .. '1..lha...-t ... ·1i:1 b pruh.'\lh:11.."k'ri;1 a rrnt('úbal't1.·ria a prokoha1:kri:t gprot('Oti<.11.·l('fiil _.\c ti11ol'la1.:t1..·ri:t a prot..·obad.:ria a proh.·oha1..·11.·ria g prouohal't"ri~1 g proh.·oh:ickria a proteohacleria a proteobact"ria b proteobacteria g proteobacteria b proteobactC'f"ia b proteobI<' ·\B. ( 199 . ~) hnp: ·. \\W\\ . < 1p s nc1.. \ 1 · ~ onlin..:- k:ttur ... · lhirkfo1hkri;1...-..·p.1i ; a r .. ·pli (imthu..:-s. J) .. and ·1·im11nkr. Ji. . ( 191) 1) Chamo..·k C. ( 199~ ) Jumas·Bilak .1·: .. d :11 ( 1998) Kl)h:tn .. ·\ ( ih1nstad :\ .. Oppq~aard . 11 l l 9')!J \ hup: '"'" ·;q1sn('\.11rt: nnlin .. • f ~: :itur .: Hurl...hl1ld ... ·ri :11.:q1:ii .. :a r .. ·pli hllp: \\W\\ .;1psnd.l)f"g onlin(' foaturl.' l~url..hl1Jd .. ·ria .. · ... ·pai ... ·a r .. ·rli Jum9 1 l Kim .IR. .1 51)8 l'.ü:11d11monw· c:epmca /).\º.H5fJ/SfJ "- 'J' ) l ~ hi . 
": ohmm f~!t:11111ino . ,(11w11 /•1 · r1·!fr1/i1 .·ITCC / ./JS(J 6110 f' ,,-,,udo111onas gl01he1 /.JS.\/5011/ ./ (¡{J! Hurkhofderia cepa1c·<1 ( ·55r,s 6()_1 . l:tl.\'fJll"il!wn hrasilcnse s¡r 6 0.~ /Jaue1mdes ovaws 60./ .-l:ospiri/111111 brasiliense Cd 605 U11rkholderia cepaica ('5_1 ··.¡ 606 lJ11rkholderia cepaicc1 C/::/'0_1./ Nr Burkholderia cepaica HC I I 60/1 811rkholderia cepaica ATCl '!9~l4rDBOI) 609 Ralstonia eutropha H 16 6 / ú .-1zospm1lum brasihense Sp'J.J5 611 Burkholdena cepaica CU'5 1 I 612 Azospirillwn amazonenseY6 613 Agrobacterium rhizogenes bv2ATCC 11325 614 Agrobacterium rhizogenes K84 615 Azospiril/um amazonensoY2 616 Mycobacter/um gordonae 61 i Burkholderia cepaica BcF 6400 G ' 6400 G- 6435 G - 6500 Ci-+ 65011 Ci - 5000 G • 6500 G- 6600 li - 6700 ti- 6700 G- 6700 G · GSOO (;. 6~0() ¡;. 6800 {i- r,soo u- 6900 G - 6900 G- 7000 G- 7000 G- 7000 (i. 7100 G- 7100 G- 7100 G- 7200 G- 7200 G- 723S G- 726S G- 7300 O- 739S O+ 7400 O- Actinobacteria b proteobacteria a proteobacteria Bacillus/Clostridium b proteobactcria Bacillus..'Clostridimn a protcoba..:teria a prokoh<11..·ti:-ria h prot...-obadaia a protcoha..:h.·ria h prnt1..·oh;11..·1cria " prot ... ·oba\.'kri;i h prol..:ohadi:ri;1 h prnl1..'l)h:i1..·t1:.·ria ;1 prokoh•11..·t...·ria CFB a proleob:i1,.1.~ria h proteoba1..·t..:ri<1 h proteohacti:rin h proleoba1..1.eria b proteoba..:terin b prouobacteria a proteobacteria b proteobacteria a proteobacteria a proteobacteria a proteobactcria a proteobacteria Actinobacteria b proteobacteria M ~I M M ~I M ~I ~I ~I ~' ~I ~I ~I \1 ~I ~I ~I ~I ~I ~I ~I ~I ~I ~I ~I M M M ~I M FL FL FL FL FL FL FL FL FL FI. FL FI. FI . H. FL FL FI. FL FI. FL. FL FL FL FL FL FL FL FL PO FL A Bigey F. et al ( 1995) A http: ·1www.apsnet.org.'online/foaturc:Burk.holderiacepaica 1repli A Jumas-Bilnk.E.. et al (1998) ANA Young ~I and Cole (1993) .-\ Song'5iYila S. Dharakul T. (2000) A Leblond.Pet al .( 1990). . .\ Jumas-Bilak,E.. 
[Appendix 2, graph: distribution of prokaryotic genome sizes by taxonomic group. The x-axis is GENOME SIZE (Mb), from 0.5 to 9.5, and one labelled point is Halomonas halmophila. Group key: 1 Thermophilic oxygen reducers; 2 Thermotogales; 3 Thermus/Deinococcus; 4 Leptospirillum-Nitrospira; 5 CFB; 6 Green sulphur bacteria; 7 Planctomycetales; 8 Purple non-sulphur bacteria; 9 Cyanobacteria; 10 Fibrobacter and Acidobacterium; 11 Spirochetes; 12 Chlamydiales/Verrucomicrobia; 13 Fusobacteria; 14 a proteobacteria; 15 b proteobacteria; 16 g proteobacteria; 17 d proteobacteria; 18 e proteobacteria; 19 Parachlamydia; 20 Unclassified pseudomonads; 21 Actinobacteria; 22 Bacillus/Clostridium; 23 Mycoplasmatales; 24 Acholeplasmatales; 25 Unclassified Mollicutes; 26 Pyrodictiales; 27 Thermoproteales; 28 Sulfolobales; 29 Halobacteriales; 30 Methanobacteriales; 32 Methanosarcinales; 33 Archaeoglobales; 34 Thermococcales; 35 Unknown Archaea.]

J Mol Evol (1997) 45:115-118

Appendix 3a

Point Counter Point

Polyphyletic Gene Losses Can Bias Backtrack Characterizations of the Cenancestor

Mushegian and Koonin (1996) have recently published the results of a detailed comparison of the complete genomes of Haemophilus influenzae and Mycoplasma genitalium, together with the fragmentary data from other organisms available as of March 1996.
After parasite-specific sequences were discarded, the final outcome was an inventory of 256 genes. This inventory may resemble the genetic complement of the ancestor of Gram-positive and Gram-negative bacteria, and probably also the amount of DNA required today to sustain a minimal cell. Since most of these sequences have eukaryotic and/or archaeal homologs, Mushegian and Koonin discuss how this figure may be reduced to describe the genome of the last common ancestor (LCA) of the Bacteria, Archaea, and Eucarya, that is, the cenancestor. They also suggest how insights on even earlier stages of evolution can be achieved. More and more cellular genomes are being completely mapped and sequenced at a rapid pace, so the assumptions and strategies used in such approaches merit considerable attention. As argued here, important pitfalls can be avoided if the polyphyletic gene losses that have taken place in widely separated lineages are properly acknowledged.

The Cenancestor Probably Had a DNA Genome

The backtrack methodology proposed by Mushegian and Koonin (1996) is quite straightforward. It rests partly on the idea that genes not found in both bacteria and eucarya, or in both bacteria and archaea, were probably absent from the cenancestor. The unstated assumption is that the archaea and eucarya are sister groups, an evolutionary relationship supported by an increasingly large amount of molecular data. However, such an approach can inadvertently miss nuclear-encoded genes that may have been part of the LCA but were lost independently in both the bacterial and archaeal domains, or that are not present in the prokaryotic genomes of a given data set. For instance, their sample lacked eukaryotic or archaeal homologs of several key proteins involved in DNA replication. This absence led Mushegian and Koonin to speculate that the cenancestor may have had an RNA genome.
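The backtrack rule described above can be sketched as a set operation on presence/absence data. The following is a minimal, hypothetical Python illustration and not Mushegian and Koonin's actual procedure. The function and variable names are invented for this sketch, and the species-to-domain labels are the standard ones. The only presence/absence calls used are those this letter reports for the GMP reductase gene (guaC). The sketch shows how the same rule returns opposite verdicts depending on which genomes are sampled.

```python
# Hypothetical sketch of the backtrack inference rule; not the original authors' code.
# Presence/absence calls for guaC are only those reported in the letter.

DOMAIN = {
    "Haemophilus influenzae": "Bacteria",
    "Mycoplasma genitalium": "Bacteria",
    "Escherichia coli": "Bacteria",
    "Methanococcus jannaschii": "Archaea",
    "Saccharomyces cerevisiae": "Eucarya",
    "Tritrichomonas foetus": "Eucarya",
    "Trypanosoma cruzi": "Eucarya",
    "Leishmania mexicana": "Eucarya",
    "Homo sapiens": "Eucarya",
}

# Species in which GMP reductase (guaC) is reported as present.
GUAC_CARRIERS = {
    "Escherichia coli",
    "Tritrichomonas foetus",
    "Trypanosoma cruzi",
    "Leishmania mexicana",
    "Homo sapiens",
}


def domains_with_gene(sample, carriers):
    """Return the set of domains in which at least one sampled genome carries the gene."""
    return {DOMAIN[species] for species in sample if species in carriers}


def backtrack_keeps(sample, carriers):
    """Keep a gene for the cenancestor only if it occurs in Bacteria and Eucarya,
    or in Bacteria and Archaea. Archaea plus Eucarya alone does not qualify,
    because the rule assumes these two domains are sister groups."""
    doms = domains_with_gene(sample, carriers)
    return "Bacteria" in doms and ("Eucarya" in doms or "Archaea" in doms)


# Restricted sample: the four species in which guaC is reported as absent.
restricted_sample = [
    "Haemophilus influenzae",
    "Mycoplasma genitalium",
    "Methanococcus jannaschii",
    "Saccharomyces cerevisiae",
]
# Wider sample: add the widely separated species that do carry guaC.
wider_sample = restricted_sample + sorted(GUAC_CARRIERS)

print(backtrack_keeps(restricted_sample, GUAC_CARRIERS))  # False
print(backtrack_keeps(wider_sample, GUAC_CARRIERS))       # True
```

With the restricted sample, guaC is scored as absent from the cenancestor. Adding E. coli and the eukaryotes that carry the gene reverses the inference. This reversal is the sampling bias that polyphyletic gene losses introduce.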
Several objections can be raised against this conclusion:

(1) Many ancient, large proteins found in all three domains share sequence similarities. This suggests that considerable fidelity already existed in the operative genetic system of their common ancestor, and such fidelity is unlikely to be found in RNA-based genetic systems.

(2) Sequence analysis and biochemical characterization of a ribonucleotide reductase from the archaeon Pyrococcus furiosus have shown that this enzyme shares considerable similarities with both its eubacterial and eukaryotic counterparts (Riera et al. 1997).

(3) As Mushegian and Koonin (1996) themselves underlined, their analysis was performed before any complete archaeal or eucaryal genomes became available in the public databases, and it should thus be considered preliminary. Indeed, release of the entire Methanococcus jannaschii genome has allowed the identification of one archaeal DNA polymerase. This polymerase exhibits sequence similarity and shares three conserved motifs with the eubacterial DNA polymerase II and with the eukaryotic α, γ, and ε polymerases (Bult et al. 1996).

Taken together, these results suggest that DNA genomes, and polymerases with both proofreading and synthesizing functions, evolved prior to the divergence of the three primary kingdoms.

To Salvage or Not to Salvage

Until a more complete data set is available, backtrack inferences on the nature of the cenancestor should be considered preliminary. They may also be biased by the reduced genomic content of parasites, many of which have undergone multiple secondary losses. For instance, de novo purine nucleotide biosynthesis is probably one of the oldest metabolic pathways, but it is also one of the most easily lost by a wide range of obligate symbionts and parasites. Such polyphyletic streamlining has taken place in H. influenzae and, to an even greater degree, in M. genitalium, and failure to recognize it can lead to some misunderstanding.
It would be tempting, for instance, to interpret the absence of purine biosynthesis in the minimal set defined by Mushegian and Koonin (1996) as evidence about the first life-forms. On this reading, their growth and reproduction depended on the heterotrophic uptake of nucleotides present in the primitive soup (see, for instance, Pennisi 1996). However, such conclusions would be at odds with the difficulty of synthesizing and accumulating ribose under primitive conditions, and the same difficulty applies to purine and pyrimidine ribosides. These problems suggest that none of these compounds is truly prebiotic (cf. Lazcano and Miller 1996).

The phylogenetic distribution of purine nucleotide salvage enzymes can also lead to some confusion regarding the cenancestor's metabolic capabilities. Based on their data set, Mushegian and Koonin (1996) conclude that their minimal cell had the complete nucleotide salvage pathways for all bases except thymine. Adenine deaminase (ADA) catalyzes the hydrolytic deamination of adenine into hypoxanthine. Because it is absent in both H. influenzae and M. genitalium, it was not included in such an inventory. However, the ADA gene is found in other nonpathogenic Gram-positive and Gram-negative bacteria, so it may have been part of the LCA genome. The same is probably true of guaC, the gene encoding GMP reductase. Since GMP reductase is not found in H. influenzae, M. genitalium, M. jannaschii, and Saccharomyces cerevisiae, it could be argued that the cenancestor lacked guaC. Such a conclusion is not supported by the presence of GMP reductase in a group of widely separated species that includes Escherichia coli, Tritrichomonas foetus, Trypanosoma cruzi, Leishmania mexicana, and humans (Berens et al. 1995). Even organisms with close phylogenetic affinities can differ in their salvage abilities.
Hypoxanthine and guanine phosphoribosyltransferase activities have been found in cell extracts of the euryarchaeote Methanococcus voltae (Bowen et al. 1996). However, the corresponding genes appear to be absent in the closely related M. jannaschii, where the only recognizable purine phosphoribosyltransferase gene is that of adenine PRTase (Bult et al. 1996).

Molecular Phylogenies Are Not Rooted in the Origin of Life

The pioneering work of Mushegian and Koonin (1996) is an important improvement over previous attempts to characterize the LCA (Lazcano 1995, and references therein). It could be improved further by systematic efforts to identify the streamlining processes that have led to polyphyletic gene losses in widely separated species. This may be particularly significant given the choice of model organisms whose entire DNA is being sequenced, since some of them have been selected because of their relatively small, compact genomes. In a few years, larger volumes of genomic data reflecting a broader cross-section of biological diversity are expected to become available. These data will allow more precise descriptions of the gene complements of ancestral states. They will also help us understand the effects of parasitism on genomes and the dynamics of gene losses.

Genome sequencing and analysis is rapidly becoming a key element in our understanding of early biological evolution. However, it is difficult to see how its applicability can be extended beyond the threshold set by the period of evolution in which protein biosynthesis was already in operation. Older stages are not yet amenable to this type of analysis. The first life-forms were probably simpler than any cell now alive. They may have lacked not only familiar traits like protein catalysts, but perhaps even genetic macromolecules with ribose-phosphate backbones (Lazcano and Miller 1996).
Our understanding of the evolutionary transition between the prebiotic synthesis of organic compounds and the cenancestor still has a huge gap. Given this gap, the temptation to describe the very first living systems based solely on molecular cladistics should be carefully avoided.

Acknowledgments. We thank Dr. Marc Fontecave and his coworkers for providing us with their results prior to publication. The work of J.I.L. has been supported by the Consejo Superior de Investigaciones Científicas (CSIC, Madrid, Spain). A.L. is an Affiliate of the NSCORT (NASA Specialized Center for Research and Training) in Exobiology at the University of California, San Diego.

References

Berens RL, Krug EC, Marr JJ (1995) Purine and pyrimidine metabolism. In: Marr JJ, Müller M (eds) Biochemistry and molecular biology of parasites. Academic Press, London, pp 89-117
Bowen TL, Lin WC, Whitman WB (1996) Characterization of guanine and hypoxanthine phosphoribosyltransferase activities in Methanococcus voltae. J Bacteriol 178:2521-2556
Bult CJ, White O, Olsen GJ, Zhou L, Fleischmann RD, Sutton GG, Blake JA, FitzGerald LM, Clayton RA, Gocayne JD, Kerlavage AR, Dougherty BA, Tomb JF, Adams MD, Reich CI, Overbeek R, Kirkness EF, Weinstock KG, Merrick JM, Glodek A, Scott JL, Geoghagen NSM, Weidman JF, Fuhrmann JL, Nguyen D, Utterback TR, Kelley JM, Peterson JD, Sadow PW, Hanna MC, Cotton MD, Roberts KM, Hurst MA, Kaine BP, Borodovsky M, Klenk HP, Fraser CM, Smith HO, Woese CR, Venter JC (1996) Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science 273:1058-1073
Lazcano A (1995) Cellular evolution during the early Archean: what happened between the progenote and the cenancestor? Microbiología SEM 11:1-13
Lazcano A, Miller SL (1996) The origin and early evolution of life: prebiotic chemistry, the pre-RNA world, and time. Cell 85:793-798
Mushegian AR, Koonin EV (1996) A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc Natl
Acad Sci USA 93:10268-10273
Pennisi E (1996) Seeking life's bare (genetic) necessities. Science 272:1098-1099
Riera J, Robb FT, Weiss R, Fontecave M (1997) Ribonucleotide reductase in the archaeon Pyrococcus furiosus: a critical enzyme in the evolution of DNA genomes. Proc Natl Acad Sci USA 94:475-478

Arturo Becerra, Sara Islas, José Ignacio Leguina, Ervin Silva, Antonio Lazcano
Facultad de Ciencias, Universidad Nacional Autónoma de México, Apartado Postal 70-407, Cd. Universitaria, 04510 México D.F., México

PAGES MISSING

Appendix 3b

MOLECULAR BIOLOGY AND THE RECONSTRUCTION OF MICROBIAL PHYLOGENIES: Des Liaisons Dangereuses?

A. BECERRA, E. SILVA, L. LLORET, S. ISLAS, A. M. VELASCO, and A. LAZCANO
Facultad de Ciencias, UNAM, Apdo. Postal 70-407, Cd. Universitaria, 04510 México, D.F., MEXICO

1. Introduction

Only half a century after the DNA double chain model was first suggested, molecular biology has become one of the most provocative, rapidly developing fields of scientific research. It has led to tantalizing new findings on processes and mechanisms at the molecular level, and also to major conceptual revolutions in the life sciences. Is there any hope of developing methodological approaches and theoretical frameworks that make sense of the overwhelming and growing body of data this relatively new field is producing? And can these data then be used to develop a more integrative, truly multidisciplinary understanding of biological phenomena? As Peter Bowler wrote a few years ago, Charles Darwin and his followers were acutely aware that "evolutionism's strength as a theory came from its ability to make sense out of a vast range of otherwise meaningless facts" (Bowler, 1990). This situation has not changed. Evolutionary biology may be in a state of major turmoil, but its unifying powers have not diminished at all.
In fact, these unifying powers probably offer one of the most promising ways of overcoming the perils of reductionism that have plagued molecular biology since its inception. Molecular approaches to evolutionary issues are a century old. The possibility of successfully blending the two fields may have been first suggested by the American-born British biologist and physician George H. F. Nuttall. In 1904 he published a book summarizing a detailed comparison of blood proteins, which he had used to reconstruct the evolutionary relationships of animals. "In the absence of palaeontological evidence", wrote Nuttall (1904), "the question of the interrelationship amongst animals is based upon similarities of structure in existing forms. In judging of these similarities, the subjective element may largely enter, in evidence of which we need but look at the history of the classification of the Primates". Nuttall believed that this subjective element could be successfully overcome by constructing a phylogeny based not on form but on the immunological reactions of blood-related proteins.

The comparative analysis of biochemical properties, of metabolic pathways and, in a few cases, of morphological characteristics had provided some useful insights on the evolutionary relationships among certain microorganisms. Even so, until a few years ago the reconstruction of bacterial phylogenies and the understanding of microbial taxonomy were both viewed with considerable skepticism. This situation changed dramatically with the recognition that protein and nucleic acid sequences are historical documents of unsurpassed evolutionary significance (Zuckerkandl and Pauling, 1965). That recognition has led to a radical renovation of the phylogeny, classification, and systematics of prokaryotic and eukaryotic microbes (Woese, 1987).

J. Chela-Flores et al. (eds.), Astrobiology, 135-150. © 2000 Kluwer Academic Publishers. Printed in the Netherlands.
However, these changes have also sparked new debates and have led to an increased appreciation that the scope and limits of molecular cladistic methodologies require clarification. Current controversies address the characteristics of the first organisms, the origin of the different components of the eukaryotic cell, and the soundness of traditional taxonomic systems. As these controversies show, realizing the full potential of molecular cladistics will depend on methodological refinements to the algorithms used for reconstructing evolutionary history from molecular data. It will depend equally on a critical reexamination of the field's theoretical framework, which includes a number of central concepts, most of them grafted from classical evolutionary theory into molecular biology. Here we discuss some of these issues and briefly review some of the major contributions they have prompted in our understanding of previously uncharacterized early periods of biological evolution.

2. On the nature of eukaryotic cells

The awareness that genomes are extraordinarily rich historical documents, from which a wealth of evolutionary information can be retrieved, has widened the range of phylogenetic studies to previously unsuspected heights. Efficient nucleic acid sequencing techniques now allow the rapid sequencing of complete cellular genomes, and computer science has blossomed simultaneously and independently. Together, these developments have led to an explosive growth of databases and of sophisticated new tools for exploiting them. They have also led to the recognition that different macromolecules may be uniquely suited as molecular chronometers for constructing nearly universal phylogenies. A major achievement of this approach has been the evolutionary comparison of small subunit ribosomal RNA (rRNA) sequences, which has allowed the construction of a trifurcated,
unrooted tree in which all known organisms can be grouped into one of three major (apparently) monophyletic cell lineages: the eubacteria, the archaebacteria, and the eukaryotic nucleocytoplasm. These lineages are now referred to as new taxonomic categories, i.e., the domains Bacteria, Archaea, and Eucarya, respectively (Woese et al., 1990). There is strong evidence that the identification of these lineages is not an artifact of the reductionist extrapolation of information derived from one single molecule. Trees based on whole-genome information have confirmed rRNA-based phylogenies at a broad level (Snel et al., 1999; Tekaia et al., 1999). Even so, the congruence between rRNA genes and other molecules is not always ideal, and anomalous phylogenies have been reported (Rivera and Lake, 1992; Gupta and Golding, 1993). For the time being there is no general explanation for these peculiar topologies, and we should keep in mind that we may have to restrict ourselves to empirical characterizations of such cases. Nevertheless, a large variety of phylogenetic trees have confirmed the existence of the three primary cellular lines of evolutionary descent (Doolittle and Brown, 1994), between which extensive horizontal transfer events have taken place (Doolittle, 1999). These trees were constructed from DNA and RNA polymerases, elongation factors, F-type ATPase subunits, heat-shock and ribosomal proteins, and an increasingly large set of genes encoding enzymes involved in biosynthetic pathways. The resulting tripartite taxonomic description of the living world, fostered by Woese and his followers, has been disputed by a number of workers. They contend that both eubacteria and archaebacteria are bona fide prokaryotes, regardless of the peculiarities that separate them at the molecular level (Mayr, 1990; Margulis and Guerrero, 1991; Cavalier-Smith, 1992).
Furthermore, dichotomous molecular phylogenetic trees by their very nature cannot include the anastomosing branches that correspond to the lineages that gave rise to the different components of eukaryotic cells. Accordingly, Margulis and Guerrero (1991) have argued that molecular cladistics, although now a prime force in systematics, cannot by itself produce phylogenetically accurate taxonomic classifications. Such classifications should be based not only on the evolutionary comparison of macromolecules, but also on metabolic pathways, chromosomal cytology, ultrastructural morphology, biochemical data, life cycles, and, when available, paleontological and geochemical evidence. Molecular phylogenies have confirmed the endosymbiotic origin of plastids and mitochondria. A number of trees also suggest that a major portion of the eukaryotic nucleocytoplasm originated from an archaebacteria-like cell whose descendants form the monophyletic eucaryal branch (Gogarten-Boekels and Gogarten, 1994). As Woese and his collaborators assert, the presence of endosymbionts is of critical importance to the eukaryotes, yet it is undeniable that the latter "have a unique, meaningful phylogeny" (Wheelis et al., 1992). That view assumes an absolute continuity between the nucleocytoplasm and its direct ancestor. By contrast, the holistic arguments advocated by Margulis and Guerrero (1991), Cavalier-Smith (1992), and others emphasize the evolutionary emergence of a novel type of cell as a result of endosymbiotic events. According to the latter view, the key transitional event leading to eukaryosis was the evolutionary acquisition of heritable intracellular symbionts. On this view, the eucaryal branch does not represent eukaryotic cells as a whole, any more than fungal hyphae or phycobionts such as the Trebouxia algal cells exhibit, by themselves, all the phenotypic and genetic characteristics of a lichen thallus.
Of course, antagonistic taxonomies have coexisted more or less peacefully throughout the history of biology. However, the urgent need to critically revise current classificatory systems cannot be overemphasized. Modern taxonomic schemes need to acknowledge the existence of three major cell lineages, and also the eukaryotic divergence patterns, which appear to be the result of rapid bursts of speciation (Sogin, 1994). Any such modification of biological classification requires recognizing the functional and anatomical continuity between the eukaryotic cytoplasm and the intranuclear environment. It also requires recognizing that membrane-bounded nuclei likely evolved as a byproduct of permanent intracellular associations. In fact, extant amitochondrial eukaryotes such as Giardia and Trichomonas appear to have had mitochondria in the past (Germont et al., 1997), and they still harbor permanent intracellular bacterial endosymbionts (Margulis, 1993). These amitochondrial cells may include the microaerophilic, amitotic, multinucleated giant amoeba Pelomyxa palustris. All of them are located in the lowest branches of the eucarya, and they contain several types of intracellular prokaryotes that may be the functional equivalents of mitochondria. The ubiquity of endosymbionts suggests that they may have played a critical role in the evolutionary development of nucleated cells. This hypothesis is amenable to observational and experimental tests. One test is to study the possible bacterial affinities of the membrane-bounded hydrogenosomes that are known to multiply by binary division in the Trichomonas cytoplasm (Müller, 1988). Another is to search for prokaryotic endosymbionts in species of Parabasalia, Retortamonads, Diplomonads, Calonymphids, and other protist taxa, some of which may have evolved prior to mitochondrial acquisition.

3. The root of the tree or the tip of the trunk?
The construction of the unrooted rRNA tree showed that no single major branch predates the other two, and all three derive from a common ancestor. It was thus concluded that the latter was a progenote, which was defined as a hypothetical entity in which phenotype and genotype still had an imprecise, rudimentary linkage relationship (Woese and Fox, 1977). According to this view, the differences found among the transcriptional and translational machineries of eubacteria, archaebacteria, and eukaryotes were the result of evolutionary refinements that took place separately in each of these primary branches of descent after they had diverged from their universal ancestor (Woese, 1987). From an evolutionary point of view it is reasonable to assume that at some point in time the ancestors of all forms of life must have been less complex than even the simplest extant cells, but our current knowledge of the characteristics shared between the three lines has shown that the conclusion that the last common ancestor was a progenote was premature. This interpretation, based on rRNA trees for which no outgroups have been discovered, has been definitively superseded (Woese, 1993). A partial description of the last common ancestor of eubacteria, archaebacteria, and eukaryotes may be inferred from the distribution of homologous traits among its descendants. The set of such genes that have been sequenced and compared is still small,
but the sketchy picture that has already emerged suggests that the most recent common ancestor of all extant organisms, or cenancestor, as defined by Fitch and Upper (1987), was a rather sophisticated cell with at least (a) DNA polymerases endowed with proof-reading activity; (b) a ribosome-mediated translation apparatus and an oligomeric RNA polymerase; (c) membrane-associated ATP production; (d) signalling molecules such as cAMP and insulin-like peptides; (e) RNA processing enzymes; and (f) biosynthetic pathways leading to amino acids, purines, pyrimidines, coenzymes, and other key molecules in metabolism (cf. Lazcano, 1995). Although the possibility of horizontal transfer should always be kept in mind, the traits listed above are far too numerous and complex to assume that they evolved independently or that they are the result of massive multidirectional horizontal transfer events which took place before the earliest speciation events recorded in each of the three lineages. Their presence suggests that the cenancestor was not a direct, immediate descendant of the RNA world, a protocell, or any other pre-life progenitor system. Very likely, it was already a complex organism, much akin to extant bacteria, and must be considered the last of a long line of simpler earlier cells for which no modern equivalent is known. Unfortunately, the characteristics of the evolutionary predecessors of the cenancestor cannot be inferred from the plesiomorphic traits found in the space defined by rRNA sequences. Although trees constructed from such universally shared characters appear to be free of internal inconsistencies, the lack of outgroups leads to topologies that specify branching relationships but not the position of the ancestral phenotype. Thus, such trees cannot be rooted. This phylogenetic cul-de-sac may be overcome by using paralogous genes,
which are sequences that diverge not through speciation but after a duplication event. As noted over twenty years ago by Schwartz and Dayhoff (1978), rooted trees can be constructed by using one set of paralogous genes as an outgroup for the other set, a rate-independent cladistic methodology that expands the monophyletic grouping of the sequences under comparison. This approach was used independently a few years ago by Iwabe et al. (1989) and Gogarten et al. (1989), who analyzed paralogous genes encoding (a) the two elongation factors (EF-G and EF-Tu) that assist in protein biosynthesis; and (b) the alpha and beta hydrophilic subunits of F-type ATP synthetases. Using different tree-constructing algorithms, both teams independently placed the root of the universal trees between the eubacteria, on the one side, and archaebacteria and eukaryotes on the other. Their results imply that eubacteria are the oldest recognizable cellular phenotype, and that specific phylogenetic affinities exist between the archaea and the eucarya. This branching order, which was promptly adopted by Woese et al. (1990), appears to be consistent with structural and functional similarities which are known to exist in the translation and replication machineries of both archaebacteria and eukaryotes (Ouzounis and Sander, 1992; Kaine et al., 1994). However, the issue is far from solved, and has in fact been further complicated by the availability of completely sequenced genomes. The situation is further aggravated by the fact that the phylogenetic analysis of sets of ancestral paralogous genes other than the elongation factors and the ATPase hydrophilic subunits has challenged the conclusion that universal trees are rooted in the eubacterial branch (cf. Forterre et al., 1993).
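The paralogous-outgroup rooting described above can be illustrated with a small sketch. It assumes a fully resolved unrooted tree stored as an edge list, with leaves named by paralogue and domain (hypothetical labels such as `Tu_B` for a bacterial EF-Tu and `G_A` for an archaeal EF-G); the toy topology was chosen to mirror the result reported by Iwabe et al. (1989) and Gogarten et al. (1989) and is not their data. The function `root_with_outgroup` (a hypothetical name) looks for the single edge that separates all copies of one paralogue from the other, places the root there, and returns the two clades that diverge first within the ingroup.

```python
from collections import defaultdict


def leaves_beyond(adj, node, parent):
    """Return the leaves reachable from `node` without going back through `parent`."""
    if parent is not None and len(adj[node]) == 1:
        return {node}  # a leaf has a single neighbour
    found = set()
    for nxt in adj[node]:
        if nxt != parent:
            found |= leaves_beyond(adj, nxt, node)
    return found


def root_with_outgroup(edges, outgroup):
    """Root an unrooted tree on the edge that isolates `outgroup`.

    Returns the basal clades of the ingroup, i.e. the two lineages that
    diverge first once the paralogous outgroup has fixed the root.
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    outgroup = set(outgroup)
    for a, b in edges:
        for near, far in ((a, b), (b, a)):
            if leaves_beyond(adj, far, near) == outgroup:
                # `near` is where the ingroup attaches to the root edge
                return [leaves_beyond(adj, nxt, near) for nxt in adj[near] if nxt != far]
    raise ValueError("outgroup is not a clade on this tree")


# Hypothetical unrooted tree of EF-Tu (Tu_) and EF-G (G_) paralogues from
# Bacteria (_B), Archaea (_A) and Eucarya (_E); u1, u2, g1, g2 are internal nodes.
TOY_EDGES = [
    ("u1", "Tu_B"), ("u1", "u2"), ("u2", "Tu_A"), ("u2", "Tu_E"),
    ("u1", "g1"), ("g1", "G_B"), ("g1", "g2"), ("g2", "G_A"), ("g2", "G_E"),
]

if __name__ == "__main__":
    # Using the EF-G copies as the outgroup roots the EF-Tu subtree.
    for clade in sorted(root_with_outgroup(TOY_EDGES, {"G_B", "G_A", "G_E"}), key=sorted):
        print(sorted(clade))
```

On this toy tree the EF-Tu subtree splits into the bacterial copy on one side and the archaeal plus eucaryal copies on the other, which is the kind of result the text describes. Nothing in the sketch depends on branch lengths, which is why the method is rate-independent; its answer is only as good as the assumption that each paralogous set is itself monophyletic.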
While the sequences of the products of genes involved in the transcriptional and translational molecular machinery of eukaryotes appear to be closer to those of the archaea than to those of the eubacteria, other sequences, such as those encoding heat-shock proteins and several enzymes, suggest the existence of phylogenetic affinities between archaebacteria and Gram-positive bacteria. No support for a particular topology was detected when mean interdomain distance analysis was used to analyze a set of approximately forty genes common to the three lineages (Doolittle and Brown, 1994). The lack of congruence between different universal phylogenies may be the result not only of the statistical problems involved in the alignment and comparison of a large number of sequences that may have diverged more than 3.5 × 10^9 years ago, but also of even older additional paralogous duplications (Forterre et al., 1993) and of horizontal gene transfer events (Doolittle, 1999), both of which may be obscuring the natural relationships between the lineages. Given the likelihood that microbial phylogenetic analysis will increase its reliance on paralogous duplicates to define outgroups and character polarities (Sidow and Bowman, 1991), detailed studies should be devoted to assessing the validity and limits of this cladistic methodology. Minor differences in the basic molecular processes of the three main cell lines can be distinguished, but all known organisms, including the oldest ones, share the same essential features of genome replication, gene expression, basic anabolic reactions, and membrane-associated ATPase-mediated energy production. The molecular details of these universal processes not only provide direct evidence of the monophyletic origin of all extant forms of life, but also imply that the sets of genes encoding the components of these complex traits were frozen a long time ago, i.e., major changes in them are very strongly selected against and are lethal.
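A simplified reading of a mean interdomain distance comparison may help to show why such an analysis can fail to favour any topology. In this sketch we assume that the statistic is simply the average distance between sequences drawn from two different domains, and that, under a molecular clock, the pair of domains with the smallest mean distance would be read as sister groups; the procedure actually used by Doolittle and Brown (1994) may differ in its details. The sequence names, distances, and the function name `mean_interdomain_distances` are hypothetical.

```python
from statistics import mean


def mean_interdomain_distances(dist, domain_of):
    """Average the pairwise distances between sequences from different domains.

    `dist` maps (seq_x, seq_y) pairs to a distance; `domain_of` maps each
    sequence name to its domain. Pairs within a single domain are ignored.
    """
    by_pair = {}
    for (x, y), d in dist.items():
        a, b = sorted((domain_of[x], domain_of[y]))
        if a != b:
            by_pair.setdefault((a, b), []).append(d)
    return {pair: mean(ds) for pair, ds in by_pair.items()}


# Hypothetical distances for one gene family (illustrative values, not real data).
DOMAIN_OF = {"b1": "Bacteria", "b2": "Bacteria", "a1": "Archaea", "e1": "Eucarya"}
DIST = {
    ("b1", "a1"): 0.9, ("b2", "a1"): 0.8,
    ("b1", "e1"): 1.0, ("b2", "e1"): 1.1,
    ("a1", "e1"): 0.6, ("b1", "b2"): 0.3,
}

if __name__ == "__main__":
    means = mean_interdomain_distances(DIST, DOMAIN_OF)
    for pair, value in sorted(means.items()):
        print(pair, round(value, 3))
    # Under a clock assumption, the closest pair would be read as sister domains.
    print("closest pair:", min(means, key=means.get))
```

With these invented numbers the archaeal and eucaryal sequences come out closest. When, across many gene families, the three means are nearly equal or favour different pairs, no topology is supported, which is the outcome reported in the text.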
Biological evolution prior to the divergence of the three domains was not a continuous, unbroken chain of progressive transformation steadily proceeding towards the cenancestor. However, no evolutionary intermediate stages or ancient simplified versions of the basic biological processes have been discovered in extant organisms. Nevertheless, clues to the genetic organization and biochemical complexity of the earlier entities from which the cenancestor evolved may be derived from the analysis of paralogous sequences. Their presence in the three cell lineages implies not only that their last common ancestor was a complex cell already endowed, among others, with pairs of homologous genes encoding two elongation factors, two ATPase hydrophilic subunits, two sets of glutamate dehydrogenases, and the A and B DNA polymerases, but also that the cenancestor itself must have been preceded by simpler cells in which only one copy of each of these genes existed. In other words, Archean paralogous genes provide evidence of the existence of ancient organisms in which ATPases lacked the regulatory properties of their alpha subunit, protein synthesis took place with only one elongation factor, and the enzymatic machinery involved in the replication and repair of DNA genomes had only one polymerase ancestral to the E. coli DNA polymerases I and II. By definition, the node located at the bottom of the cladogram is the root of a phylogenetic tree, and corresponds to the common ancestor of the group under study. But names may be misleading. The recognition that basic biological processes like DNA replication, protein biosynthesis, and ATP production require today the products of pairs of genes which arose by paralogous duplications during the early Archean
implies that what we have been calling the root of universal trees is in fact the tip of a trunk of unknown length in which the history of a long (but not necessarily slow) series of archaic evolutionary events may still be recorded. The inventory of paralogous genes that duplicated during this previously uncharacterized stage of biological evolution appears to include, in addition to elongation factors, ATPase subunits, and DNA polymerases, the sequences encoding heat shock proteins, ferredoxins, dehydrogenases, DNA topoisomerases, several pairs of aminoacyl-tRNA synthetases, and enzymes involved in nitrogen metabolism and amino acid biosynthesis. It is noteworthy that this list also includes aspartate transcarbamoylase, an enzyme which, together with carbamoyl phosphate synthetase (whose large subunit is itself the product of an internal, i.e., partial, paralogous duplication), catalyzes the initial steps of pyrimidine biosynthesis (García-Meza et al., 1995). Thus, prior to the early duplication events that led to what may be a rather large number of cenancestral paralogous sequences, simpler living systems existed which lacked the large sets of enzymes and the sophisticated regulatory abilities of contemporary cells. Although lateral transfer of coding sequences may be almost as old as life itself, gene duplication followed by divergence probably played a dominant role in the accretion of complex genomes, and may have led to a rapid rate of microbial evolution. If it is assumed that the rate of gene duplicative expansion of ancient cells was comparable to present-day values, which are of 10^-5 to 10^-3 gene duplications per gene per cell generation (Stark and Wahl, 1984), the maximum time required to go from a hypothetical 100-gene organism to one endowed with a filamentous cyanobacterial-like genome of approximately 7000 genes would be less than ten million years (Lazcano and Miller, 1994).
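The time estimate above can be checked against a deliberately naive model, which is our own assumption and not the calculation of Lazcano and Miller (1994): every gene duplicates at a constant rate r per cell generation and every duplicate is retained, so a genome of N_0 genes grows as N_g = N_0 (1 + r)^g, and going from 100 to 7000 genes takes g = ln(70) / ln(1 + r) generations. The sketch ignores the fixation of duplicates in populations, gene loss, and selection; the generation time is an assumed parameter, and the function names are hypothetical.

```python
import math


def generations_to_expand(n_start, n_end, rate):
    """Generations for a genome to grow from n_start to n_end genes.

    Deterministic sketch: every gene duplicates with probability `rate` per
    cell generation and every duplicate is kept, so
    N_g = n_start * (1 + rate) ** g.
    """
    return math.ceil(math.log(n_end / n_start) / math.log1p(rate))


def generations_to_years(generations, generation_time_days):
    """Convert a number of cell generations into years."""
    return generations * generation_time_days / 365.25


if __name__ == "__main__":
    for rate in (1e-3, 1e-5):  # the range of duplication rates quoted in the text
        g = generations_to_expand(100, 7000, rate)
        # Assumed, deliberately slow generation time of one year.
        years = generations_to_years(g, 365.25)
        print(f"rate={rate:g}: {g} generations, {years:,.0f} years at 1 generation/year")
```

At the slowest quoted rate the model needs roughly 4.2 × 10^5 generations, and at 10^-3 about 4.3 × 10^3, so even with an assumed generation time as long as one year the naive count stays well below the ten-million-year upper bound given in the text.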
Although there are no published data on the rate of formation of new enzymatic activities resulting from gene duplication events under either neutral or positive selection conditions, the role of duplicates in the generation of evolutionary novelties is well established. Once a gene duplicates, one of the copies may be free to accumulate non-lethal mutations and acquire new additional properties, which could lead to its specialization or recruitment into a new role. The data summarized here support the idea that primitive biosynthetic pathways were mediated by small, inefficient enzymes of broad substrate specificity (Jensen, 1976). Broader substrate ranges may not have been a disadvantage, since relatively unspecific enzymes may have helped ancestral cells with reduced genomes overcome their limited coding abilities (Ycas, 1974). The discovery that homologous enzymes catalyzing similar biochemical reactions are part of different anabolic pathways supports the idea that enzyme recruitment took place during the early development of several basic anabolic pathways. Evolutionary tinkering with the products of duplication events apparently had a major role in metabolic evolution. This is supported by the analysis of complete genome sequences, which has shown that a large proportion of gene content is the outcome of duplication events (Tekaia and Dujon, 1999). Such high levels of redundancy represent an illuminating possibility and suggest that the wealth of phylogenetic information older than the cenancestor may be larger than realized, and its analysis may provide fresh insights into a crucial but largely undefined stage of early biological evolution during which major biosynthetic pathways emerged and became fixed. There is a major exception to the above conclusion. True fungi, euglenids, and chytridiomycetes synthesize lysine via an eight-step pathway in which α-aminoadipate (AAA) is an intermediate.
This route is different from the seven-step diaminopimelate pathway used by bacteria, plants, and most protists (Bhattacharjee, 1985). The phylogenetic distribution of these two pathways suggests that the AAA route is the more recent one. Accordingly, if the patchwork assembly of metabolic pathways (Jensen, 1976) is valid, then it can be predicted that the enzymes catalyzing the AAA route should be homologous to those participating in other major biosynthetic routes. The recognition that enzyme recruitment may have played a major role in metabolic evolution calls, however, for some caution in phylogenetic inferences. Although in some cases metabolic pathways may be successfully used to assess the phylogenetic relationships of prokaryotes (DeLey, 1968; Margulis, 1993), the possibility that some of the enzymes of archaic pathways may have survived in unusual organisms (Keefe et al., 1994), or that important portions of extant metabolic routes may have been assembled by a patchwork process (Jensen, 1976), suggests that considerable prudence should be exercised when attempting to describe the physiology of truly primordial organisms by simple direct back-extrapolation of extant metabolism.

4. Molecular cladistics and the origin of life: is there any connection?

"All the organic beings which have ever lived on this Earth", wrote Charles Darwin in the Origin of Species, "may be descended from some primordial form". Although the placement of the root of universal trees is a matter of debate, the development of molecular cladistics has shown that despite their overwhelming diversity and tremendous differences, all organisms are ultimately related and descend from Darwin's primordial ancestor. But what was the nature of this progenitor?
The heterotrophic hypothesis suggested by Oparin (1938) not only gave birth to a whole new field devoted to the study of the origin of life, but also played a central role in shaping several influential taxonomic schemes and different bacterial phylogenies (Margulis, 1993). Although the central role of glycolysis and the wide phylogenetic distribution of at least some of its molecular components are strong indications of its antiquity (Fothergill-Gilmore and Michels, 1993), it is no longer possible to support the ad hoc identification of putative primordial traits to assume that the first living system was a Clostridium-like anaerobic fermenter or a Mycoplasma type of cell (cf. Lazcano et al., 1992). Like vegetation in a mangrove, the roots of universal phylogenetic trees are submerged in the muddy waters of the prebiotic broth, but how the transition from the non-living to the living took place is still unknown. Indeed, we are still very far from understanding the origin and attributes of the first living beings, which may have lacked even the most familiar features of extant cells. For instance, protein synthesis is such an essential characteristic of cells that it is frequently argued that its origin should be considered synonymous with the emergence of life itself. However, the discovery of the catalytic activities of RNA molecules has lent considerable support to the possibility that during early stages of biological evolution living systems were endowed with a primitive replicating and catalytic apparatus devoid of both DNA and proteins. The scheme may be even more complex, since RNA itself may have been preceded by simpler genetic macromolecules lacking not only the familiar 3',5' phosphodiester backbones of nucleic acids, but perhaps even today's bases (Lazcano and Miller, 1996).
Although molecular cladistics may provide clues to some late steps in the development of the genetic code, it is difficult to see how the applicability of this approach can be extended beyond a threshold that corresponds to a period of cellular evolution in which protein biosynthesis was already in operation. Older stages are not yet amenable to molecular phylogenetic analysis. Although there have been considerable advances in the understanding of chemical processes that may have taken place before the emergence of the first living systems, life's beginnings are still shrouded in mystery. A cladistic approach to this problem is not feasible, since all possible intermediates that may have once existed have long since vanished. The temptation to do otherwise is best resisted. Given the huge gap existing in current descriptions of the evolutionary transition between the prebiotic synthesis of biochemical compounds and the cenancestor (Lazcano, 1994), it is naive to attempt to describe the origin of life and the nature of the first living systems from the available rooted phylogenetic trees. Nevertheless, there have been several recent attempts to use macromolecular data to support claims on the hyperthermophily of the first living organisms and the idea of a hot origin of life. The examination of the prokaryotic branches of unrooted rRNA trees had already suggested that the ancestors of both eubacteria and archaebacteria were extreme thermophiles, i.e., organisms that grow optimally at temperatures of 90 °C and above (Achenbach-Richter et al., 1987). Rooted universal phylogenies appear to confirm this possibility, since heat-loving bacteria occupy short branches in the basal portion of molecular cladograms (Stetter, 1994).
Such a correlation between hyperthermophily and primitiveness has lent support to the idea that heat-loving lifestyles are relics from early Archean high-temperature regimes that may have resulted from a severe impact regime (Sleep et al., 1989). It has also been interpreted as evidence of a high-temperature origin of life, which according to these hypotheses took place in extreme environments such as those found today in deep-sea vents (Holm, 1992) or in other sites in which mineral surfaces may have fueled the appearance of primordial chemoautolithotrophic biological systems (Wächtershäuser, 1990). Such ideas are not totally without precedent. The possibility that the first heterotrophs may have evolved in a sizzling-hot environment is in fact an old suggestion (Harvey, 1924). Despite their long genealogy, these hypotheses have not been able to bypass the problem of the chemical decomposition faced by amino acids, RNA, and other thermolabile molecules, which have very short lifetimes under such extreme conditions (Miller and Bada, 1988). Although no mesophilic organisms older than heat-loving bacteria have been discovered, it is possible that hyperthermophily is a secondary adaptation that evolved in early geological times (Sleep et al., 1989; Confalonieri et al., 1993; Lazcano, 1993). Such a possibility is in fact strongly supported by the recent phylogenetic analysis of the G+C content of rRNA genes, which suggests that the last common ancestor was not a hyperthermophilic organism (Galtier et al., 1999). In fact, hyperthermophiles not only share the same basic features of the molecular machinery of all other forms of life; they also require a number of specific biochemical adaptations. Any theory on the hot origin of life must address the question of how such traits, or their evolutionary predecessors, arose spontaneously in the prebiotic environment. Such adaptations may include histone-like proteins, RNA-modifying enzymes, and reverse gyrase,
a peculiar ATP-dependent enzyme that twists DNA into a positively supercoiled conformation (Confalonieri et al., 1993). Clues to the origin of hyperthermophily may be hidden in this list, and its evolutionary analysis may contribute to the understanding of the rather surprising phylogenetic distribution of the immediate mesophilic descendants of heat-loving prokaryotes, which shows that at least five independent abandonment events of hyperthermophilic traits took place in widely separated branches of universal trees, one of which corresponds to the eukaryotic nucleocytoplasm (García-Meza et al., 1995). The antiquity of hyperthermophiles appears to be well established, but there is no evidence that they have a primitive molecular genetic apparatus. Thus, the most basic questions pertaining to the origin of life relate to much simpler replicating entities predating, by a long series of evolutionary events, the oldest recognizable heat-loving bacteria. Why hyperthermophiles are located at the base of universal trees is still an open question, but the possibility that adaptation to extreme environments is part of the evolutionary innovations that appeared in the trunk of the tree cannot be entirely dismissed. The phylogenetic distribution of heat-loving bacteria is no evidence by itself of a hot origin of life, any more than the presence in the hyperthermophilic archaeon Sulfolobus solfataricus of a gene encoding a thermostable B-type DNA polymerase endowed with 3'-5' exonuclease activity (Pisani et al., 1992) can be interpreted to imply that the first living organism had a DNA genome.

5. Final remarks

Although in the past few years the relationship between molecular biology and microbial phylogenetics has been embittered by frequent clashes and antagonism, the development of rapidly growing sequence databanks has provided a unique view of the evolution of bacterial and eukaryotic microorganisms, and has opened new perspectives in several major fields of life sciences. Molecular evolution was originally the outcome of the wedding of molecular biology with neo-Darwinian theory, but it has been rapidly transformed into a field of scientific enquiry in its own right. However, its full development requires not only less expensive, more rapid macromolecular sequencing techniques and more powerful computer algorithms for constructing phylogenetic trees, but also an awareness of its unstated assumptions and more precise definitions of its conceptual framework. As summarized by Patterson (1988), the theoretical foundations of molecular cladistics have been based on a number of central concepts, most of which were inherited from older disciplines such as physiology, anatomy, and neo-Darwinism. Homology, which is one of the key concepts in evolutionary theory, was originally used by Wolfgang Goethe, Étienne Geoffroy Saint-Hilaire, Richard Owen, and others to describe structural resemblance to an archetype (Donoghue, 1992). In recent years it has not only been repeatedly confused with sequence similarity (Reeck et al., 1987), but is also used to describe a wider range of possible evolutionary relationships that include species or gene phylogeny. In fact, some classes of homology that describe phenomena at the molecular genetic level may have no exact equivalent in orthodox evolutionary analysis of morphological traits. One such case is paralogy, a term coined by Fitch (1970) to describe the diversification of genes following duplication events.
Since paralogy provides evidence of gene duplication but not of speciation events, it is the basis for inferring evolutionary relationships among genes, not among species. Recognition of this distinction has led to repeated recommendations to avoid paralogous sequences in phylogenetic analysis. However, the use of paralogous duplicates in outgroup analyses for determining the evolutionary polarity of character states in universal phylogenies (Gogarten et al., 1989; Iwabe et al., 1989) has rekindled keen theoretical interest in their advantageous properties. Their use, however, does pose some risks. The naive assumption that only one paralogous duplication has taken place in the set of sequences under consideration may lead to incorrect topologies (Forterre et al., 1993). Indeed, the incorporation into a tree of genes that are the result of unrecognized multiple paralogous events may be even more insidious than the problems derived from convergent evolution and lateral gene transfer, since the latter phenomena are much more easily identified at the molecular level. The recognition that paralogous duplicates expand a monophyletic group of sequences raises a number of issues not encountered in classical evolutionary analysis. From a (classical) cladistic point of view, a character that is found only in outgroups is primitive. Nonetheless, in molecular phylogenetic analysis this may not always be the case. Such a rule would hold if multiple paralogous duplications have taken place, and if one (or several) of the older sequences is used as an outgroup for an unrooted tree of younger sequences. This would be the case, for instance, if a myoglobin sequence is used to root alpha (or beta) haemoglobin trees. However, this rule would not hold if an alpha haemoglobin sequence (or a set of them) is used as an outgroup for the beta haemoglobin tree, or vice versa.
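The distinction between duplication and speciation nodes can be made operational with the species-overlap rule, a common heuristic that is our choice here and is not discussed in the text: in a rooted gene tree, a node whose two descendant lineages share at least one species must be a duplication, so its descendants are paralogues; otherwise it is treated as a speciation, and its descendants are orthologues. The sketch applies the rule to a toy globin tree rooted with myoglobin; the leaf naming convention (`<gene>_<species>`) and the function names are hypothetical.

```python
def species_of(leaf):
    """Hypothetical naming convention: '<gene>_<species>'."""
    return leaf.split("_", 1)[1]


def label_events(tree, events):
    """Label each internal node of a rooted binary gene tree.

    A node whose two subtrees share a species is a duplication (its
    descendants are paralogues); otherwise it is a speciation (orthologues).
    Appends (node, label) pairs to `events` and returns the node's species set.
    """
    if isinstance(tree, str):
        return {species_of(tree)}
    left, right = tree
    s_left = label_events(left, events)
    s_right = label_events(right, events)
    events.append((tree, "duplication" if s_left & s_right else "speciation"))
    return s_left | s_right


# Myoglobin (Mb) used as the outgroup of alpha (Ha) and beta (Hb) haemoglobins;
# gene names and species are illustrative only.
GLOBIN_TREE = ("Mb_human", (("Ha_human", "Ha_mouse"), ("Hb_human", "Hb_mouse")))

if __name__ == "__main__":
    events = []
    label_events(GLOBIN_TREE, events)
    for node, kind in events:
        print(kind, node)
```

Both the node joining the alpha and beta clades and the root joining them to myoglobin are labelled duplications, so only the alpha-only and beta-only subtrees, whose nodes are speciations, carry information about species relationships. The rule assumes that the gene tree is correctly rooted, and it can miss duplications when paralogues have been lost in some lineages.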
As with the alpha and beta haemoglobins, neither set of paralogues is older than its homologue in universal phylogenetic trees derived from elongation factors (Iwabe et al., 1989). In this case, the reconstruction of ancestral character states from dichotomously varying paralogous genes does not come from the analysis of the outgroup, but may be inferred from the realization that the root of the tree must have been preceded by an even older, more primitive condition in which only one copy of the gene existed, prior to the paralogous duplication. Recognition of this fact is likely to play a central role in the future understanding of enzyme evolution during the early Archean. Although it is true that the raw material for molecular cladistic analysis is restricted to sequences derived from living organisms (or from fossil samples from which ancient preserved DNA can be retrieved) and cannot be applied to extinct groups of organisms, the construction of trees derived from archaic paralogous sequences may allow us to infer evolution prior to the earliest detectable nodes. The flourishing of molecular techniques has led to a proliferation not only of sequences of isolated molecular constituents of living organisms, but also of completely sequenced genomes. This is a storehouse of data that has already provided considerable insights into the phylogeny and the diversity of microbes. But because of its very nature, molecular cladistics separates clusters of adaptive characters into a nested hierarchical set which is expected to reflect the temporal sequence of their evolutionary acquisition. However fruitful, such an approach has all the demerits of a reductionist one-trait approach to biological evolution, chastised in early literature as "partial phylogeny", and since the birth of molecular phylogeny it has rarely been used to attempt a truly integrative analysis of complete character complexes.
Such limitations may be overcome in several ways, some of which are part of intellectual traditions deeply rooted in comparative biology. As Georges Cuvier contended in his 1805 lectures in Comparative Anatomy, the appearance of the whole skeleton can be deduced up to a certain point by examination of a single bone. The success that Cuvier had in such anatomical reconstructions is legendary, and was based not only on his unsurpassed knowledge and intuition, but also on what he termed the "correlation of parts", i.e., the full recognition of a functional coordination of the parts of the body of a given animal (Young, 1992). Such correlation of parts is not restricted to bones and muscles: at subcellular levels, it underlies the functional coordination among the molecular components of multigenic traits such as metabolic pathways and protein biosynthesis. As shown by the intimate relationship between the biosyntheses of valine and isoleucine, their triplet assignments, and the phylogenetic proximity of their aminoacyl-tRNA synthetases, inquiries into the early evolution of the genetic code and other basic features of living systems should be pursued not only by determining the molecular phylogenies of some of their isolated components or by mathematical discussions spiced with a distinct Pythagorean flavor, but also through the integrative analysis of character complexes. But for all its foibles, the relationship between molecular biology and evolutionary theory has opened new, unsuspected avenues of intellectual exploration. Never before has such a wealth of methodological approaches and empirical data been available to the students of life's phenomena. In part because of this prosperity, systematics and evolutionary biology, two of the most broadly oriented fields of life sciences, are now in a state of intellectual agitation.
The symptoms are manifold: it is possible that the traditional species concept may not apply to prokaryotes, time-cherished concepts like that of the existence of kingdoms are under fire, and the origin and taxonomic position of mobile genetic elements are unknown. There is an increased awareness that the understanding of the processes underlying the generation of evolutionary novelties and the origin of ontogenic patterns cannot be restricted to classical neo-Darwinian explanations. We are living in the midst of hectic times in which epoch-making debates are reshaping the future of the life sciences, and the development of a more integrated molecular biology may be a never-ending story. It is said that to wish someone to live in interesting times is one of the most terrible of all Chinese curses. Whatever the outcome of current discussions and debates, for biology the putative Oriental curse may turn out to be nothing less than an intellectual blessing.

Acknowledgments

We are indebted to Dr. Lynn Margulis for her critical reading of the manuscript and many suggestions. Support from the UNAM-DGAPA Project PAPIIT-IN213598 is gratefully acknowledged.

6. References

Achenbach-Richter, L., Gupta, R., Stetter, K. O., and Woese, C. R. (1987) Were the original eubacteria thermophiles? System. Appl. Microbiol. 9, 34-39
Bhattacharjee, J. K. (1985) α-Aminoadipate pathway for the biosynthesis of lysine in lower eukaryotes, CRC Crit. Rev. Microbiol. 12, 131-151
Bowler, P. J. (1990) Charles Darwin: The Man and His Influence, Basil Blackwell, Oxford
Confalonieri, F., Elie, C., Nadal, M., Bouthier de la Tour, C., Forterre, P., and Duguet, M. (1993) Reverse gyrase: a helicase-like domain and a type I topoisomerase in the same polypeptide, Proc. Natl. Acad. Sci. USA 90, 4753-4758
DeLey, J. (1968) Molecular biology and bacterial phylogeny, in T. Dobzhansky, M. K. Hecht, and W. C. Steere (eds), Evolutionary Biology, Appleton-Century-Crofts, New York, pp. 104-156
Doolittle, W. F. (1999) Phylogenetic classification and the universal tree, Science 284, 2124-2128
Doolittle, W. F. and Brown, J. R. (1994) Tempo, mode, the progenote, and the universal root, Proc. Natl. Acad. Sci. USA 91, 6721-6728
Donoghue, M. J. (1992) Homology, in E. Fox Keller and E. A. Lloyd (eds), Keywords in Evolutionary Biology, Harvard University Press, Cambridge, pp. 170-179
Fitch, W. M. (1970) Distinguishing homologous from analogous proteins, Syst. Zool. 19, 99-113
Fitch, W. M. and Upper, K. (1987) The phylogeny of tRNA sequences provides evidence of ambiguity reduction in the origin of the genetic code, Cold Spring Harbor Symp. Quant. Biol. 52, 759-767
Forterre, P., Benachenhou-Lahfa, N., Confalonieri, F., Duguet, M., Elie, C., and Labedan, B. (1993) The nature of the last universal ancestor and the root of the tree of life, still open questions, BioSystems 28, 15-32
Galtier, N., Tourasse, N., and Gouy, M. (1999) A nonhyperthermophilic common ancestor to extant life forms, Science 283, 220-221
García-Meza, V., González-Rodríguez, A., and Lazcano, A. (1995) Ancient paralogous duplications and the search for Archean cells, in G. R. Fleischaker, S. Colonna, and P. L. Luisi (eds), Self-Reproduction of Supramolecular Structures: From Synthetic Structures to Models of Minimal Living Systems, Kluwer, Amsterdam, pp. 231-246
Germot, A., Philippe, H., and Le Guyader, H. (1997) Evidence for the loss of mitochondria in Microsporidia from a mitochondrial-type HSP70 in Nosema locustae, Mol. Biochem. Parasitol. 87, 159-168
Gogarten-Boekels, M. and Gogarten, J. P. (1994) The effects of heavy meteorite bombardment on the early evolution of life: a new look at the molecular record, Origins Life Evol. Biosph. 25, 78-83
Gogarten, J. P., Kibak, H., Dittrich, P., Taiz, L., Bowman, E. J., Bowman, B. J., Manolson, M. F., Poole, R. J., Date, T., Oshima, T., Konishi, J., Denda, K., and Yoshida, M. (1989) Evolution of the vacuolar H+-ATPase: implications for the origin of eukaryotes, Proc. Natl. Acad. Sci. USA 86, 6661-6665
Gupta, R. S. and Golding, G. B. (1993) Evolution of HSP70 gene and its implications regarding relationships between archaebacteria, eubacteria, and eukaryotes, J. Mol. Evol. 37, 573-582
Harvey, R. B. (1924) Enzymes of thermal algae, Science 60, 481-482
Holm, N. G., ed. (1992) Marine Hydrothermal Systems and the Origin of Life, Kluwer Academic Publ., Dordrecht
Iwabe, N., Kuma, K., Hasegawa, M., Osawa, S., and Miyata, T. (1989) Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated genes, Proc. Natl. Acad. Sci. USA 86, 9355-9359
Jensen, R. A. (1976) Enzyme recruitment in the evolution of new function, Ann. Rev. Microbiol. 30, 409-425
Kaine, B. P., Mehr, I. J., and Woese, C. R. (1994) The sequence, and its evolutionary implications, of a Thermococcus celer protein associated with transcription, Proc. Natl. Acad. Sci. USA 91, 3854-3856
Kandler, O. (1994) The early diversification of life, in S. Bengtson (ed), Early Life on Earth, Nobel Symposium No. 84, Columbia University Press, New York, pp. 124-131
Keefe, A. D., Lazcano, A., and Miller, S. L. (1994) Evolution of the biosynthesis of the branched-chain amino acids, Origins Life Evol. Biosph. 25, 99-110
Lazcano, A. (1993) Biogenesis: some like it very hot, Science 260, 1154-1155
Lazcano, A. (1994) The transition from non-living to living, in S. Bengtson (ed), Early Life on Earth, Nobel Symposium No. 84, Columbia University Press, New York, pp. 60-69
Lazcano, A. (1995) Cellular evolution during the early Archean: what happened between the progenote and the cenancestor? Microbiologia SEM 11, 1-13
Lazcano, A., Fox, G. E., and Oró, J. (1992) Life before DNA: the origin and evolution of early Archean cells, in R. P. Mortlock (ed), The Evolution of Metabolic Function, CRC Press, Boca Raton, pp. 237-295
Lazcano, A. and Miller, S. L. (1994) How long did it take for life to begin and evolve to cyanobacteria? J. Mol. Evol. 39, 546-554
Lazcano, A. and Miller, S. L. (1996) The origin and early evolution of life: prebiotic chemistry, the pre-RNA world, and time, Cell 85, 793-798
Margulis, L. (1993) Symbiosis in Cell Evolution, W. H. Freeman, New York
Margulis, L. and Guerrero, R. (1991) Kingdoms in turmoil, New Scientist 132, 46-50
Mayr, E. (1990) A natural system of organisms, Nature 348, 491
Miller, S. L. and Bada, J. L. (1988) Submarine hot springs and the origin of life, Nature 334, 609-611
Müller, M. (1988) Energy metabolism of protozoa without mitochondria, Ann. Rev. Microbiol. 42, 465-488
Nuttall, G. H. F. (1904) Blood Immunity and Blood Relationship: A Demonstration of Certain Blood-Relationships amongst Animals by Means of the Precipitation Test for Blood, Cambridge University Press, Cambridge
Oparin, A. I. (1938) The Origin of Life, MacMillan, New York
Ouzounis, C. and Sander, C. (1992) TFIIB, an evolutionary link between the transcription machineries of archaebacteria and eukaryotes, Cell 71, 189-190
Patterson, C. (1988) Homology in classical and molecular biology, Mol. Biol. Evol. 5, 603-625
Pisani, F. M., De Martino, C., and Rossi, M. (1992) A DNA polymerase from the archaeon Sulfolobus solfataricus shows sequence similarity to family B DNA polymerases, Nucleic Acids Res. 20, 2711-2716
Reeck, G. R., de Haën, C., Teller, D. C., Doolittle, R. F., Fitch, W. M., Dickerson, R. E., Chambon, P., McLachlan, A. D., Margoliash, E., Jukes, T. H., and Zuckerkandl, E. (1987) "Homology" in proteins and nucleic acids: a terminology muddle and a way out of it, Cell 50, 667
Rivera, M. C. and Lake, J. A. (1992) Evidence that eukaryotes and eocyte prokaryotes are immediate relatives, Science 257, 74-76
Schwartz, M. and Dayhoff, M. O. (1978) Origins of prokaryotes, eukaryotes, mitochondria, and chloroplasts, Science 199, 395-403
Sidow, A. and Bowman,
B. H. ( 1991) Molecular phylogeny, Cu"ent Opinion Genet. Develop. 1, 451-456 Sleop. :--i . H .. Zahnle. K. J .. Kastings. J. F .. and Morowitz.. H. J. (1989) Annihilation of ecosystems by large asteroid impacts on thc early Earth. Nature 342. 139-142 Snel. B .. Bork. P .. and Huynen, M. A. (1999) Genome phylogeny based on gene content, Nature Genetics 21. 108-110 Sogin. \l. L. (1994) TI1e origin ofeukaryotes and evolution into major kingdorns, in S. Bengtson (ed), Early Lite on Earth . .Vobel Sympomw1 ,'.·o. 84. Columbia University Press, New York. pp.181-192 Stark. G. R .. and \Vahl. G. M. (1984) Gene amplification, Ann. Rev. Biochem. 53, 447-491 Stener. K. O. ( 1994) The lesson of archaebacteria. in S. Bengtson (ed), Early Life on Earth, Nobel Symposium No. 84. Columbia l 1niversity Press. New York. pp. 114-122 Tekaia. F. and Dujon. B. (1999) Pervasiveness of gene conservation and persistencc of duplicates in cellular genomes.J. Moi. Evo/. 49. 591-600 Tekaia. F .. Lazcano. A .. and Dujon. B. (1999) The genomic tree as revealed from whole proteome comparisons. c;enome Research 9. 550-557 150 BECERRA ET AL \\'achtershauser. G. ( 1990) The case for the chemoautotrophic origins of life in an iron-sulfur world. Ongins o[L!fe Evo/. Biosph. 20, 173-182 Wallace, D. C. and Morowitz. N. H. ( 1973) Genome size and evolution. Chromosoma 40. 121-126 Wheelis. 1\1. L.. Kandler. O .. and Woese, C. R. (1992) On the nature of global classification. Proc. ,\ia1/ Acad. Sc1. USA 89, 2930-2934 Woese. C. R. ( 1987) Bacteria! evolution. Microbio/. Reviews 51. 221-271 \Voese. C. R. (1993) The archae~. their history and significance. in M. Kates. D. J. Kushner. and A T. Matheson (eds), The 8iochem1stry o[ the Archaea (Archaebacteria), Elsevier Science Publishers. Amsterdam. pp. vii-xxix Woese. C. R. and Fox, G. E. (1977) The concept of cellular evolution,Jour. Mol. Evo/. 10, 1-6 Woese. C. R .. Kandler. O., and Wheelis. M. L. (1990) Towards a natural system of organisms. 
proposal for the domains Archaea. Bacteria. and Eucarya. Proc. Nacl. Acad. Set. USA 87, 4576-4579 Ycas. M. (1974) On the earlier states ofthe biochemical svstern. J. Theor. Bio/. 44. 145-160 Young, D. ( 1992) The D1scovery of Evolucion. Natural Hi~tory Museum Publications, Cambridge Zuckerkandl. E. and Pauling. L. (1965) Molecules as documents of evolutionary history, J. Theorec. Biol. 8. 357-366