UNIVERSIDAD NACIONAL AUTÓNOMA DE MÉXICO
Instituto de Biotecnología
Cuernavaca, Morelos, 1992

Characterization, manipulation and possible role in Nature of the pac gene and its product, the enzyme penicillin acylase of Escherichia coli

THESIS
submitted to obtain the degree of Doctor in Biotechnology by Enrique Merino Pérez

UNAM, Dirección General de Bibliotecas, Tesis Digitales
Restrictions on use
ALL RIGHTS RESERVED © TOTAL OR PARTIAL REPRODUCTION PROHIBITED
All material contained in this thesis is protected by the Federal Copyright Law (Ley Federal del Derecho de Autor, LFDA) of the United Mexican States (Mexico). Images, video fragments and other material subject to copyright protection may be used exclusively for educational and informational purposes, and the source from which they were obtained must be cited, naming the author or authors. Any other use, such as for profit, reproduction, editing or modification, will be prosecuted and sanctioned by the respective copyright holder.

Contents

Chapter I. Prologue and Objectives
Chapter II. Background and Justifications
II.1. β-Lactam antibiotics, natural and semisynthetic
II.2. Importance of the enzyme penicillin acylase (PA)
II.2.1. Technological aspect
II.2.2. Current state of knowledge about the pac gene and its product, the enzyme PA
Chapter III. Mechanisms and elements involved in the regulation of penicillin acylase activity
III.1. Introduction
III.1.1. Structure of the E. coli pac gene and of its 5' and 3' regulatory regions
III.1.2. Structure of the PA enzyme of E. coli
III.2. Results and Discussion
III.2.1. Studies on the transcriptional regulation of the pac gene
III.2.1.1. Studies of pac-lacZ gene fusions
III.2.1.2. Quantification of the pac-specific mRNA
III.2.1.4. Analysis of the 5' regulatory region of the pac gene
III.2.1.5. Presence of the pac gene in E. coli strains
III.2.2. Studies of the post-transcriptional regulation of PA activity
III.2.2.1. Determination of the specific activity of PA in cell extracts
III.2.2.2. Study of PA hybrid proteins
III.3. Particular conclusions
Chapter IV. Increase of penicillin acylase activity by genetic engineering techniques
IV.1. Introduction
IV.2. Results and Discussion
IV.2.1. Construction of molecular vehicles to achieve overexpression of the pac gene
IV.2.2. Studies of fermentation processes of strains with PA activity
IV.2.2.1. Effect of the concentration and time of addition of the system's inducer on PA activity
IV.2.2.2. Effect of the genetic background on PA activity
IV.3. Particular conclusions
Chapter V. Possible role of the enzyme penicillin acylase in Nature
V.1. Introduction
V.1.1. Distribution, properties and mechanism of action of the PA enzyme
V.1.2. Pathway of phenylacetic acid utilization in Pseudomonas and E. coli
V.2. Results and Discussion
V.2.1. Growth of E. coli strains in the presence of phenylated compounds
V.2.2. Possible physiological role of the PA enzyme
V.3. Particular conclusions
Chapter VI. General conclusions and perspectives
Chapter VII. Other studies carried out during the Doctorate
- Origin and evolution of the transmission of genetic information
- Study of overlapping genes and their evolutionary implications
Bibliography
Index of Tables and Figures
Nomenclature
Appendix I. Articles that include elements of this Thesis
- The role of penicillin amidases in Nature and in industry. Trends in Biochemical Sciences.
- Carbon regulation and the role in Nature of the E. coli penicillin acylase (pac) gene. Molecular Microbiology.
- A general, PCR-based method for single or combinatorial oligonucleotide-directed mutagenesis on pUC/M13 vectors. BioTechniques.
- Recovery of DNA from agarose gels stained with Methylene Blue. BioTechniques.
Appendix. Other articles written during the doctoral period
- New insights on the Comma-less theory. Origins of Life.
- Are overlapping open reading frames remnants of a primaeval genetic code system? Manuscript in preparation.

Chapter I. Prologue and Objectives

Penicillin acylase (PA) (EC 3.5.1.11) is the enzyme used industrially to hydrolyze penicillin G (PenG) into phenylacetic acid (AFA) and 6-aminopenicillanic acid (6-APA). The latter is an intermediate in the production of semisynthetic penicillins, which are today the most important type of antibiotic for the treatment of infectious diseases. PA has characteristics that are unique among enzymes of prokaryotic origin: the active form of the enzyme is a heterodimer, the product of the processing of a common precursor.

Alongside the importance of this enzyme, the study of the gene that encodes it, pac, is an excellent model for understanding, at the molecular level, some of the mechanisms and elements involved in gene expression, since the gene is modulated by several regulatory systems. In addition, understanding the organization and regulation of this gene is important for obtaining genetically modified microbial strains that overproduce PA activity. For these reasons, the pac gene and its enzyme PA are a study model of great importance in both the basic and the technological fields.

On this basis, the General Objective of this Doctoral Thesis has been to obtain information on the mechanisms and elements involved in the regulation of the PA enzyme, and on how this knowledge, through the manipulation of its structural gene pac, allows bacterial strains that overproduce PA activity to be obtained.
It is also part of the General Objective of this Thesis to elucidate the possible physiological role of the PA enzyme in Nature.

The particular objectives of this Thesis are:
- To integrate the most relevant published information on PA enzymes and on the structural genes that encode them in different organisms, especially in E. coli.
- To study some of the most important mechanisms and elements involved in the regulation of PA activity:
  a) Organization and regulation of the pac gene.
  b) Structure of PA and its post-transcriptional regulation.
- To obtain E. coli strains that overproduce PA activity.
- To propose the possible role of the PA enzyme in Nature.

The content of this Thesis is organized in chapters, each of which addresses the stated objectives in particular. Finally, Appendix I includes the articles written during the doctoral period. The first two articles are directly related to the Doctoral Thesis. The next two articles concern methodological innovations developed to facilitate part of the experimental work of the Thesis. The last two articles were carried out in parallel with this Thesis and belong to another research line of our laboratory, whose aim is to study the origin and evolution of the first mechanisms of transmission of genetic information.

Chapter II. Background and Justification

II.1. β-Lactam antibiotics, natural and semisynthetic

The discovery of antimicrobial agents marked the advent of a new era in medical practice. Since then, the control of infectious diseases has been based on the choice among a large number of compounds that inhibit bacterial growth, which have been given the generic name of antibiotics.
Since the discovery of penicillin by Fleming in 1929, hundreds of antibiotics have been discovered or generated by semisynthetic routes. By 1981, about 20,000 penicillins, 4,000 cephalosporins, 1,000 rifampicins, 500 chloramphenicols, 500 kanamycins and 250 tetracyclines had been developed. In that same year, antibiotic production reached approximately 25,000 tons, of which about 12,000 corresponded to penicillins, 5,000 to tetracyclines, 1,200 to cephalosporins and 800 to erythromycins.

Within this group of antibiotics, the β-lactams (penicillins and cephalosporins) are characterized by a β-lactam ring in their molecule (Fig. 1). The diversity of this type of compound arises from the modification of its basic nucleus (Fig. 2). The β-lactam ring gives the antibiotic the ability to interact covalently with periplasmic proteins (PBPs) involved in the terminal stages of bacterial cell wall synthesis.

The strains used industrially to increase the production levels of these antibiotics have been obtained through large-scale development programs. For example, over 40 years the company Gist-Brocades (currently the leader in penicillin production) has improved the productivity of its process 1,000-fold.

However, after the 1960s, the excessive use of β-lactam antibiotics in clinical therapy, as well as their application in other areas such as food preservation and animal nutrition, led to the selection of microorganisms resistant to them. The most common mechanism of bacterial resistance to these antibiotics is their inactivation through cleavage of the amide bond of the β-lactam ring by β-lactamase enzymes. Currently, these enzymes constitute one of the most serious problems in clinical therapy, since they occur in a wide variety of pathogenic bacteria.
Figure 1. General structure of a β-lactam antibiotic. a) penicillins, b) cephalosporins, c) monobactams. The β-lactam group is indicated with thick lines.

The problem has been partially solved with the use of new semisynthetic β-lactam antibiotics with modified structures, which are poor substrates for these enzymes compared with the natural antibiotics. However, it has been found that point changes in β-lactamase genes can modify the spectrum of their specificity, so it is clear that no β-lactam antibiotic will be able to remain on the market indefinitely (refs. 88, 53, 89, 74). Penicillins produced by the semisynthetic route are currently the antibiotics that dominate the market, being not only the cheapest but also the least toxic. In addition, it is important to mention that the clinical use of a new and potent β-lactamase inhibitor, clavulanic acid, combined with the existing semisynthetic penicillins, makes it possible in many cases to rescue the use of several of them.

Figure 2. Chemical structure of some semisynthetic penicillins. The upper part of the figure shows the general structure of the semisynthetic penicillins. The lower part shows the different types of radicals (R) that give rise to the corresponding semisynthetic penicillins. Penicillins shown: carbenicillin, propicillin, methicillin, ampicillin, amoxicillin, oxacillin, cloxacillin, dicloxacillin and flucloxacillin.

II.2. Importance of the enzyme penicillin acylase

II.2.1. Technological aspect
The first report of obtaining 6-APA from a fermentation process dates from 1959, when Batchelor and coworkers obtained it using a strain of Penicillium chrysogenum (ref. 6). From then on, the manufacture of semisynthetic penicillins became possible, bringing about a great revolution in therapeutic practice. It is difficult to think of similar examples in which the use of an enzyme is responsible for a contribution of such importance to medicine. Despite its great success, the low yield and the complex procedures involved in isolating 6-APA made the process developed by Batchelor and coworkers unprofitable at the time. Today, 6-APA is produced by chemical or enzymatic hydrolysis of penicillins G and V. Enzymatic production of 6-APA is currently preferred over chemical procedures because, besides being cheaper, it avoids the substances used in the chemical process, which are highly polluting and therefore increasingly restricted.

The enzymatic process for obtaining 6-APA is based, as already mentioned, on the hydrolysis of PenG by the PA enzyme. The products of this reaction are AFA and 6-APA (Fig. 3). The latter, when re-acylated at the amino group of position 6 with a particular side chain, makes it possible to produce a wide variety of β-lactam compounds (Figs. 2 and 3).

Figure 3. Scheme of the action of the enzyme penicillin acylase on the penicillin G molecule. The figure shows the site of hydrolysis of PenG by the PA enzyme, as well as the products of this hydrolysis, AFA and 6-APA.
In addition, the development of enzyme immobilization technology has allowed the preparation of biocatalysts in which the enzyme in question can be reused several times. The industrial production of 6-APA with immobilized PA has been one of the first successes in the commercial application of this technology.

II.2.2. Current state of knowledge about the pac gene and its product, the enzyme PA

In E. coli, the active form of this enzyme consists of two heterologous subunits, the product of the proteolytic processing of a common precursor (Fig. 4). This form of activation is similar to that of some eukaryotic hormones, such as insulin, or of some zymogens, such as fibrinogen or trypsinogen. Among prokaryotic organisms, however, this type of processing can be considered infrequent. It occurs in enzymes capable of deacylating penicillins or cephalosporins, such as the PenG acylases of E. coli, Arthrobacter viscosus, Bacillus megaterium, Proteus rettgeri and Kluyvera citrophila, and the GK16 glutaryl 7-ACA acylase and the SE83 cephalosporin C acylase of Pseudomonas melanogenum.

[Figure 4 diagram: protein precursor of the PA enzyme (PS, Alpha, PC, Beta); proteolytic cleavage of the signal peptide (PS) and of the connector peptide (PC); generation of the two subunits α and β that make up the active PA enzyme]

Figure 4. Processing pathway of the PA protein precursor. Alpha and Beta represent the protein subunits of 24 and 62 kDa, respectively. PC represents the connector peptide and PS the signal peptide. The active form of the PA enzyme corresponds to the alpha and beta subunits correctly associated.
None of the processing intermediates shows PA activity.

Related to the above, it is important to note that the expression of the pac gene is an excellent model for understanding, at the molecular level, some of the elements involved in the expression of this gene and, ultimately, of its protein product. PA activity is modulated by several elements, the most important being induction by AFA, regulation by carbon source, and growth temperature.

Induction by phenylacetic acid: AFA is one of the two products of the hydrolysis of PenG by the PA enzyme (Fig. 3). It has been reported that PA activity in cells grown in AFA (0.1 to 0.3% w/v) is approximately 8 to 10 times higher than that obtained in the absence of this compound. This induction is highly specific for AFA, since many compounds structurally and chemically related to it produce no effect. It has been reported that PA activity is not induced by AFA either in Proteus rettgeri or in Kluyvera citrophila (ref. 29). Nevertheless, when the pac gene of K. citrophila is cloned in E. coli, induction by this compound can be observed. It is also important to mention that AFA is an inhibitor of cell growth and that concentrations above 0.3% are toxic to bacteria. A more detailed study of this regulation, together with the contributions made in this respect during the present Thesis, is included in the article "Carbon regulation and the role in Nature of the Escherichia coli penicillin acylase (pac) gene", attached in Appendix I.

Regulation by carbon source: It has been reported that the expression of the pac gene of E.
coli is under catabolite repression by different carbon sources, such as glucose, lactose, fructose and glycerol. This repression can be completely relieved by the exogenous addition of cAMP. At the start of the present Thesis, it had not been shown whether catabolite repression acted directly on the initiation of transcription of the pac gene or indirectly, through the regulation of other genes whose products are related to the transport of the inducer AFA or to the transport and/or processing of the PA precursor. The effect, direct or indirect, of catabolite repression by glucose on PA activity in cells grown in a medium containing glucose and AFA is approximately 30% relative to cells grown in the absence of glucose. It is interesting to mention that PA activity in P. rettgeri is activated by glucose and repressed by succinate, fumarate and malate, and that when the pac gene of this organism is transformed into E. coli strains, no inhibition by succinate, fumarate or malate is observed, but an apparent catabolite inhibition by glucose does appear. Appendix I includes the article "Carbon regulation and the role in Nature of the Escherichia coli penicillin acylase (pac) gene", which analyzes this type of regulation in detail, including the contributions made during the present Thesis.

Thermoregulation: Growth temperature is a determining factor in the final PA activity. It has been reported that the optimal growth temperature of E. coli for obtaining maximal PA activity is 28 °C.
At higher growth temperatures PA activity decreases, so that in cells grown at 37 °C PA activity is almost nil. This loss of PA activity, however, is not due to heat inactivation of the enzyme, since the optimal reaction temperature of this enzyme is 42 °C. Oh et al. have shown that strains grown at 42 °C and transformed with multicopy plasmids carrying the pac gene under the control of the PL promoter of bacteriophage lambda accumulate the unprocessed PA precursor. From this result, these authors conclude that the thermoregulation of PA activity is mediated at the level of transport and/or processing of the PA precursor. However, it must be kept in mind that these determinations were made with a promoter different from the native one, and that from the experiments performed it is not possible to infer whether such accumulation is due to saturation of the transport and/or processing machinery caused by overexpression of the PA precursor, or to another type of regulation. Valle, F. has shown that the thermoregulation of PA activity is independent of the envY locus, whose product is involved in the thermoregulation of the porin genes ompF, ompC and lamB. García and Buesa (ref. 29) have reported that the pac gene of K. citrophila is thermoregulated in the same way as the pac gene of E. coli, although the decrease in activity of the former is only about 50% when grown at 37 °C. The magnitude of this regulation is the same whether the pac gene of K. citrophila is in K. citrophila or has been cloned in E. coli. It is interesting to mention that Daumy et al. have reported that PA activity of P. rettgeri exists only at 28 °C, since this organism is not able to grow at 37 °C.
[Figure residue: map of a regulatory region with the labels BglII, HpaI, HindIII, CRP, -35, -10, RBS and ATG]

to M13mp19 digested with the same two enzymes and transformed into E. coli DH5α. Twenty-four independent white plaques were picked and verified for the presence of BglII and MluI sites by restriction analysis. Twenty-three clones had both sites incorporated (the other 1 clone was lacking an insert). We also determined the nucleotide sequence of one of the clones and confirmed that it corresponded to what was expected, with the predicted changes at the target sites for mutagenesis and no additional alterations.

We have repeatedly used oligos A, B and C to perform other mutagenesis experiments and also oligo B as a companion to similar schemes on different templates (that is, using a corresponding oligo, analogous to A). We have found the approach fast and reliable, and the only viable alternative to perform combinatorial mutagenesis on nonadjacent sites. We have also extended this procedure to generate five nonadjacent point mutations, encompassing over 1000 bases, with satisfactory results (data not shown).

Additional mutations, not encoded by the mutagenic oligos, have been observed in our laboratory, at rates consistent with previous reports (3). To keep this drawback at a minimum, we suggest using gene fragments as short as possible and trying new, less error-prone, thermally stable polymerases as they become available.

REFERENCES
1. Carter, P., H. Bedouelle and G. Winter. 1985. Improved oligonucleotide site-directed mutagenesis using M13 vectors. Nucleic Acids Res. 13:4431-4443.
2. Giebel, L.B. and R.A. Spritz. 1990. Site-directed mutagenesis using a double-stranded DNA fragment as a PCR primer. Nucleic Acids Res. 18:4947.
3. Higuchi, R., B. Krummel and R.K. Saiki. 1988. A general method of in vitro preparation and specific mutagenesis of DNA fragments: study of protein and DNA interactions. Nucleic Acids Res. 16:7351-7367.
4. Ho, S.N., H.D. Hunt, R.M. Horton, J.K.
Pullen and L.R. Pease. 1989. Site-directed mutagenesis by overlap extension using the polymerase chain reaction. Gene 77:51-59.
5. Kuhn, I., F.H. Stephenson, H.W. Boyer and P.J. Greene. 1986. Positive selection vectors utilizing lethality of the EcoRI endonuclease. Gene 44:252-263.
6. Nelson, R.M. and G.L. Long. 1989. A general method of site-specific mutagenesis using a modification of the Thermus aquaticus polymerase chain reaction. Anal. Biochem. 180:147-151.
7. Perrin, S. and G. Gilliland. 1990. Site-specific mutagenesis using asymmetric polymerase chain reaction. Nucleic Acids Res. 18:7433-7438.
8. Sambrook, J., E.F. Fritsch and T. Maniatis. 1989. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
9. Sarkar, G. and S. Sommer. 1990. The "megaprimer" method of site-directed mutagenesis. BioTechniques 8:404-407.
10. Sayers, J.R., W. Schmidt and F. Eckstein. 1988. 5'-3' exonucleases in phosphorothioate-based oligonucleotide-directed mutagenesis. Nucleic Acids Res. 16:791-802.
11. Zoller, M. and M. Smith. 1984. Oligonucleotide-directed mutagenesis: A simple method using two oligonucleotide primers and a single-stranded DNA template. DNA 3:479-488.

We are grateful to Paul Gaytán for oligonucleotide synthesis and to Sonia Caro for secretarial assistance. This work was supported by a grant from DGSCA, National University of Mexico. Address correspondence to X. Soberón.

Enrique Merino, Joel Osuna, Francisco Bolívar and Xavier Soberón
CIIGB, UNAM
Apdo. Postal 510-3
62271 Col. Miraval
Cuernavaca, Morelos, Mexico

BioTechniques Vol. 12, No. 4 (1992)

[Gel figure residue: lanes A, B, C and D; size markers 587, 458, 434 and 267]

BioTechniques

February 13, 1992

Xavier Soberon, Ph.D.
Universidad Nacional Autónoma de México
UNAM; Apdo. Postal 510-3
Colonia Miraval
62271 Cuernavaca, Morelos
México

RE: MS # BT91-202

Dear Doctor Soberon:

Enclosed is the galley proof of your article entitled "A general, PCR-based method for single or combinatorial...". Please read the galley proof VERY CAREFULLY. Indicate if you disagree with any editing changes, making any necessary corrections directly on the galley proof. Our copy editor may have made certain changes in wording, sentence structure and terminology where it was felt that readability could be improved without changing the technical meaning.

Please indicate your approval of the galley proof without further corrections ("as is") or "with corrections inserted" and return the corrected galley proof to me by February 21, 1992 so that we may include your paper in the upcoming April 1992 issue. We will need the original hard copy of the galley with a cover letter explaining any corrections. Should you need to FAX the galley to us, please send the hard copy and cover letter by mail. When FAXing, do not use a highlighting pen nor write in any of the margins, as these details will not be transmitted. Be sure to program your FAX to its finest setting.

I am also enclosing information concerning reprints, which may be ordered at your convenience. Prepayment is required on all orders to be shipped outside of the U.S. Please call for shipping charges. All payments must be in U.S. funds, drawn on a U.S. bank.

Thank you again for your fine submission.

Sincerely,
Thomas J. Stevens
Assistant Editor

EATON PUBLISHING • 154 E. CENTRAL STREET • NATICK, MA 01760 • (508) 655-8282 • FAX (508) 655-9910

RECOVERY OF DNA FROM AGAROSE GELS STAINED WITH METHYLENE BLUE

Address correspondence to E. Merino.

Noemí Flores, Fernando Valle, Francisco Bolivar and Enrique Merino. Department of Molecular Biology, Instituto de Biotecnología, Universidad Nacional Autónoma de México. Apdo. Postal 510-3, Cuernavaca Mor. CP 62271, México.
Tel. (52)(73)17-2399. FAX (52)(73)17-2388.

Numerous methods have been published for isolating and purifying DNA from agarose gels. However, most of them use ethidium bromide (EtBr) staining and UV light for DNA visualization. This method has some disadvantages due to the carcinogenic effects of EtBr, which complicate the handling and disposal of this chemical and of stained gels. Furthermore, UV radiation also induces mutations that might affect the integrity of the isolated DNA. In this communication we report the application of methylene blue (MB) staining to visualize DNA in agarose gels. This DNA can be easily isolated and used for ligation and transformation.

To test the sensitivity of our staining method, 4 µg of pKX2.23-3 plasmid were digested with EcoRI according to the supplier (Boehringer Mannheim). Serial dilutions of this DNA varying from 1 µg to 65 ng were loaded in two 1% agarose-TBE gels (5 mm thick with 2x6 mm wells). One gel was stained with EtBr using a standard protocol (2). The other gel was stained for 15 minutes with a 0.02% MB solution in distilled water. Afterwards, the gel was de-stained with distilled water for 15 minutes. The EtBr-stained gel was photographed using a UV transilluminator and the MB-stained gel was photographed using a white light box. The results are presented in Figure 1. As can be seen, using MB staining, 62.5 ng of DNA can be visualized easily (see Fig. 1B). However, EtBr is four times more sensitive than MB; therefore the MB staining procedure is mainly applicable when high amounts of DNA are loaded, i.e. preparative gels. It is important to mention that longer de-staining times do not increase resolution.

In order to compare the transformation efficiency of MB-stained with EtBr-stained DNA, 4 µg of M13 mp8 phage DNA digested with EcoRI were loaded into two gels. They were electrophoresed and stained with EtBr or MB, respectively.
Afterwards, the DNA bands were visualized with the appropriate light source, cut out with a razor blade and placed into microcentrifuge tubes. Two different methods were used to purify the DNA from agarose: the phenol-freeze method (1) and the NaI-glass method (according to the suppliers of the GeneClean kit, Bio 101, La Jolla, CA). 0.1 µg of DNA isolated by each of these two methods was ligated and transformed into calcium-chloride competent E. coli JM101. The number of plaques was scored and the results are presented in Table 1. The above procedure was performed three times, with reproducible results. The number of plaques obtained with MB-stained DNA was twice that obtained with EtBr-stained DNA and did not depend on the purification method.

Based on these results, we believe that the MB staining procedure offers a good and safe alternative to obtain highly clonable DNA, even though it is less sensitive than EtBr.

REFERENCES
1. Benson, S.A. 1984. A rapid procedure for isolation of DNA fragments from agarose gels. BioTechniques 2:66-67.
2. Sambrook, J., E.F. Fritsch and T. Maniatis. 1989. Molecular Cloning: A Laboratory Manual, 2nd Edition. Cold Spring Harbor Press, Cold Spring Harbor, NY.

Table 1.

Staining and isolation procedures    No. of plaques (x10^-3)
MB/NaI-glass                         3.0
EtBr/NaI-glass                       1.4
MB/Phenol-freeze                     3.2
EtBr/Phenol-freeze                   1.7

Table 1. Number of plaques obtained after transforming E. coli with DNA isolated from agarose gels stained with MB or EtBr. M13 mp18 DNA was digested with EcoRI and electrophoresed into a 1% agarose-TBE gel. Different lanes were stained with either MB or EtBr. After purification using either the NaI-glass or the phenol-freeze method, 0.1 µg of these DNAs were ligated and used to transform E. coli JM101 cells. The number of plaques obtained with each one of the DNAs is indicated.

Figure 1. DNA stained with MB or EtBr. Serial dilutions of M13 mp8 linear DNA: 1000, 500,
250, 125, 62.5, 31.25 and 16.25 ng (lanes 1-7, respectively), were loaded and electrophoresed into a 1% agarose-TBE gel and stained with either EtBr (A) or MB (B).

ORIGINS OF LIFE AND EVOLUTION OF THE BIOSPHERE
The Journal of the International Society for the Study of the Origin of Life
Editor: JAMES P. FERRIS
Department of Chemistry
Rensselaer Polytechnic Institute
Troy, NY 12180-3590 U.S.A.
Telephone: (518) 276-8493
Telex: 6716050 RPI TROU
BITNET: JFerris@RPITSMTS
Fax: (518) 276-4045

October 11, 1991

Dr. Francisco Bolivar
Dept. of Molecular Biology
Universidad Nacional Autonoma
Apdo. Postal 510-3
Cuernavaca Mor. CP 62271, MEXICO

Dear Dr. Bolivar:

Re: Ms. #475 - New Insights on the Comma-Less Theory (2nd Revision)

Thank you for providing an additional clarification of your manuscript. I have accepted it for publication and forwarded it to the publisher. You will receive page proofs in about two months. Thank you for your contribution to the Journal.

Sincerely yours,
James P. Ferris
Editor

PUBLISHER: KLUWER ACADEMIC PUBLISHERS, P.O. BOX 322, 3300 AA DORDRECHT, HOLLAND; AND P.O. BOX 358, ACCORD STATION, HINGHAM, MA 02018-0358, U.S.A.

NEW INSIGHTS ON THE COMMA-LESS THEORY

Enrique Merino, Paulina Balbás and Francisco Bolivar
Department of Molecular Biology, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Apdo. Postal 510-3, Cuernavaca, Mor. CP 62271, México. Tel. (5273) 17 23 99.

Offprint requests to: Dr. Francisco Bolivar

Key words: Genetic code, Comma-less theory, Sequence analysis.

Summary

The comma-less hypothesis represents a theoretical effort to describe one of the steps in the early evolution of the translation apparatus. This hypothesis emphasizes the advantages that a RNY coding pattern would have provided in a primitive RNA adaptor-catalyst system.
This theory has been debated for years, in both conceptual and statistical terms, and no consensus about its validity has been reached. In this work, a statistical model refuting this theory was reconsidered. The new approach eliminates the bias due to the absence of stop codons in the open reading frame and to the amino acid composition of bacterial genes. The results obtained support the biological significance of the RNY coding pattern.

Introduction
One of the major questions in molecular biology has been the origin and early evolution of the genetic code. The comma-less theory states that in a primeval stage in the evolution of the translation apparatus, only codons of the form RNY were used, thus providing a primitive message in which translatable codons could only be found in one of the three possible reading frames. This theory emphasizes the positional correlations between purines (R) and pyrimidines (Y), and for years the concept has been debated and further elaborated (Crick et al. 1957, 1976; Eigen and Schuster 1978; Shepherd 1981, 1984, 1990; Staden 1984; Wong and Cedergren 1986). With the advent of powerful DNA sequencing techniques and the use of computers to search for reiterative patterns of bases in long stretches of DNA, a prebiotic genetic coding system proposed to be RNY has been supported (Shepherd 1984), although the possibility that the repeating RNY pattern is the outcome of natural selection has not been ruled out (Wong and Cedergren 1986). Remnants of such RNY-rich messages can still be detected almost everywhere in coding regions. Computer programs have exploited this property to discern the actual reading frame in newly determined DNA sequences (Shepherd 1981, 1984, 1990). Briefly, in these programs a test window of DNA is taken and moved along the sequence in steps. In the test window, mutations away from the
RNY pattern in the first and third positions are counted in each reading frame, and the RNY codons are considered to be in the frame with the fewest such mutations. A hypothesis refuting this notion has been presented by Staden (1984). He claims that although this method does indeed generally indicate the correct reading frame, the interpretation of the RNY pattern as a relic of a primeval comma-less genetic code is inaccurate. To demonstrate this position, Staden generated very long random amino acid sequences without codon preferences but exhibiting exactly the average amino acid composition of Dayhoff's protein sequence database. A hypothetical gene was then derived from the amino acid sequence, using the triplets that code for each amino acid with equal probability. He observed that in this gene it was possible to predict the real reading frame with Shepherd's method (1981). From this study Staden concluded that the average amino acid composition and the absence of stop codons in the coding frame, rather than codon preferences, are sufficient to explain why Shepherd's method chooses correctly (Staden 1984). However, this fact alone might not be sufficient to invalidate the comma-less theory because, when the information contained in the DNA sequences of actual coding genes is considered, a different outcome is observed.

Materials and Methods
All the programs used in this study were written in C and run under a UNIX operating system on a DECstation 3100. The information recorded in Genbank release 62 was screened prior to the analysis in order to remove redundant gene sequences. The codon values were calculated according to equation 1 and the results are given in Table 1. The coefficient $\sum V_{RNY} / \sum V_{non\text{-}RNY}$ was estimated independently for every gene, and these results were averaged to obtain the preference ratio.
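The codon value and the per-gene coefficient can be illustrated with a minimal Python sketch. It is not the authors' C program. It assumes that each codon is valued as V = 1 - P(class), where P(RNY) is taken as the fraction of an amino acid's synonymous codons that have the RNY form (an assumed reading of "the number of triplets that code for the given amino acid"), and that the standard genetic code applies. The names `codon_values`, `rny_coefficient` and `mean_coefficient` are hypothetical.

```python
from fractions import Fraction
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks the stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def is_rny(codon):
    """True for purine-any-pyrimidine triplets (R = A/G, Y = C/T)."""
    return codon[0] in "AG" and codon[2] in "CT"


def codon_values():
    """V = 1 - P(class) per codon, assuming equiprobable synonymous codons."""
    values = {}
    for codon, aa in CODON_TABLE.items():
        synonyms = [c for c, a in CODON_TABLE.items() if a == aa]
        p_rny = Fraction(sum(is_rny(c) for c in synonyms), len(synonyms))
        p_class = p_rny if is_rny(codon) else 1 - p_rny
        values[codon] = 1 - p_class  # zero when no RNY/non-RNY choice exists
    return values


VALUES = codon_values()


def rny_coefficient(cds):
    """Sum of V over RNY codons divided by the sum of V over non-RNY codons.

    Returns None when the denominator is zero.
    """
    cds = cds.upper().replace("U", "T")
    sum_rny = sum_non = Fraction(0)
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon not in VALUES:  # skip triplets containing ambiguous bases
            continue
        if is_rny(codon):
            sum_rny += VALUES[codon]
        else:
            sum_non += VALUES[codon]
    return sum_rny / sum_non if sum_non else None


def mean_coefficient(genes):
    """Average of the per-gene coefficients, skipping undefined ones."""
    ratios = [r for r in map(rny_coefficient, genes) if r is not None]
    return sum(ratios) / len(ratios) if ratios else None
```

Under these assumptions, a sequence that uses every synonymous codon of an amino acid equally often, such as `"ATTATCATA"` for isoleucine, gives a coefficient of exactly 1, which is the no-preference baseline; values above 1 indicate a preference for RNY codons.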
Results and Discussion
To show that the information contained in actual DNA sequences is relevant to the evolutionary theory based on the RNY code pattern, we repeated the experiment devised by Staden (1984), using a new approach. Instead of considering the deviations away from the RNY pattern in absolute terms, a numerical value was given to each codon according to the probability of choice between RNY codons, P(RNY), and non-RNY codons, P(non-RNY), for every amino acid of the randomly generated protein, according to the following equations:

(1) $V_{RNY} = 1 - P(RNY)$ or $V_{non\text{-}RNY} = 1 - P(non\text{-}RNY)$

Therefore, the value (V) for those codons where no choice is possible between RNY and non-RNY equals zero (i.e., the stop codons, which are YNR), whereas the value for those codons where a choice is possible depends on the number of triplets that code for the given amino acid. The value (V) for each codon is presented in Table 1. Following the experimental design of Staden, very long amino acid sequences exhibiting exactly the average amino acid composition of Dayhoff's protein sequence database were generated at random. From these amino acid sequences, the corresponding nucleotide sequences were also randomly generated and the numerical values for $\sum V_{RNY}$ and $\sum V_{non\text{-}RNY}$ were calculated. These values were identical; therefore the coefficient $\sum V_{RNY} / \sum V_{non\text{-}RNY}$ equaled 1. However, when this analysis was performed on a gene by gene basis, utilizing all the prokaryotic sequences reported in Genbank, the average coefficient obtained was 1.6. This result implies that even when no bias is introduced due to the presence of stop codons (V=0) or to the amino acid composition per se (V [...]

[...] 3' direction, which meant that both strands could have directed the synthesis of different proteins by acting either directly as messengers, or indirectly as templates for RNA synthesis.
The discovery of present-day genes with the capacity to code for mRNAs in overlapping strands supports this hypothesis (Rak 1982; Inouye 1988; Simons 1988; Henikoff et al. 1986; Adelman 1987; Spencer et al. 1986). There are also some reports where large ORFs (Open Reading Frames) have been detected in the antisense strand of a number of genes, but in most cases no expression of such putative proteins has been demonstrated (Shepherd 1984; Goldstein and Brutlag 1989; Kumamoto and Nault 1989). Moreover, the theoretical coding capacity of both DNA strands, and the properties that proteins coded by both strands should exhibit, have been discussed (Casino et al. 1981; Alff-Steinberger 1984; Tramontano 1984; Sharp 1985; Zull and Smith 1990; Miyajima et al. 1990; Blalock 1990; Brentani 1990; Sloostra and Roubos 1990). Based on these observations, we decided to conduct a computer analysis of the antisense DNA strands of all the bacterial nucleotide sequences present in Genbank to evaluate the frequency of occurrence and some properties of these ORFs. Our search detected a prominent group of bacterial genes containing unusually large overlapping ORFs in the antisense DNA strand. A statistical analysis confirmed that the length of these ORFs was far from random (Tramontano 1984) but, unexpectedly, a considerable group of genes was identified where the ORF overlapped entirely with the coding gene of the sense strand or even extended further. This phenomenon appears to be prevalent in bacteria, where such genes represent 12% of the total bacterial gene entries in the database, although it is not absent in other species.
The genes with totally overlapping ORFs show a higher than average preference for triplets of the RNY form when a choice between RNY and non-RNY codons is possible (ile, thr, ser, gly, val, ala), suggesting that the tendency to use RNY codons and the theoretical capacity to code in both strands may be related. There is also a positive correlation between the GC content of the DNA sequences analyzed and the overlapping percentage, but this fact alone does not account for the observation. Some of these genes have been identified as highly expressed and present an important codon usage bias. We propose a hypothesis which relates the theoretical two-strand coding capacity to the evolutionary process of the genetic coding system.

MATERIALS AND METHODS
All the programs developed for this study were written in C and run under a UNIX operating system on a DECstation 3100. The only source of information used in this study was Genbank release 63. This information was screened prior to the analysis in order to remove redundant nucleotide sequences. Eukaryotic sequences were edited to remove introns. We define an overlapping antisense ORF of a gene as the largest reading frame between two in-phase termination codons (TGA, TAG or TAA) in the 5'→3' direction of the antisense DNA strand. When the antisense ORF extends beyond the initiation or termination triplet of the gene sequence, its length is limited by the coding region of the sense strand. The codon usage for bacterial genes was calculated directly from the information recorded in the database. The GC content of genes was also calculated from the DNA sequences; therefore the values may differ from those obtained experimentally. The RNY index was previously described (Merino et al. 1991).
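The definition of the overlapping antisense ORF given above can be sketched in Python as follows. This is an illustrative reading of the definition, not the authors' C implementation. It assumes that a stop-free stretch reaching an edge of the coding region is simply truncated at that edge, as the text specifies, and that lengths are counted in nucleotides. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Antisense strand of `seq`, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_stop_free_stretch(strand):
    """Length (nt) of the longest stretch free of in-phase stops, over 3 frames.

    A stretch that is open at either end of `strand` is limited by that end,
    mirroring the rule that the ORF length is bounded by the coding region.
    """
    best = 0
    for frame in range(3):
        start = 0  # beginning of the current stop-free stretch
        for i in range(frame, len(strand) - 2, 3):
            if strand[i:i + 3] in STOPS:
                best = max(best, i - start)
                start = i + 3  # resume just after the stop codon
        best = max(best, len(strand) - start)
    return best


def longest_antisense_orf(cds):
    """Largest antisense ORF of a gene, bounded by its coding region."""
    return longest_stop_free_stretch(reverse_complement(cds))


def overlap_percentage(cds):
    """Share of the coding region covered by its largest antisense ORF."""
    return 100.0 * longest_antisense_orf(cds) / len(cds)
```

With this convention a gene whose antisense strand has at least one frame without stop codons scores 100%, which corresponds to the "totally overlapping" class discussed in the Results.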
RESULTS
Size distribution of antisense ORFs
Bacterial sequences of protein-encoding genes present in Genbank were used to generate the complementary DNA strand in order to identify the ORFs in every antisense strand in the 5'→3' direction (antisense ORF). The largest antisense ORF present within each gene was selected to carry out a statistical analysis, from which a modal distribution was obtained, as can be seen in Fig. 1a. Most of the antisense ORFs found are distributed in a Gaussian fashion, where the mean and the standard deviation are 1059 bp and 701 bp, respectively. However, there is a conspicuous group of genes that show a statistically significant antisense ORF length, which is by different criteria far beyond random. A control analysis was designed to verify whether the results obtained were significant or an effect of preferential codon usage and amino acid composition. The amino acid sequence of each bacterial protein in Genbank was used to generate two different hypothetical nucleotide sequence databases, in which the encoded proteins are identical to the real ones in order to avoid any bias due to amino acid composition and gene length. The first hypothetical database contained randomly chosen codons for every amino acid (Fig. 1b), whereas the choice of triplets in the second database was made according to the codon usage exhibited by the actual DNA sequences present in Genbank (Fig. 1c). The differences between Figs. 1b and 1c are a consequence of codon usage, which affects the frequency of the codons TTA, CTA and TCA (stop codons in the antisense strand). However, the difference observed when comparing the hypothetical gene distribution in Fig. 1c with the real gene distribution depicted in Fig. 1a indicates that the length of these antisense ORFs is not exclusively a consequence of either codon usage or amino acid composition.
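The two control databases described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' procedure: it uses the standard genetic code, expects upper-case A/C/G/T input, pools codon counts over all supplied genes to represent "codon usage", and falls back to uniform choice when an amino acid has no observed codons. The helper names `translate`, `codon_usage` and `back_translate` are hypothetical.

```python
import random
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks the stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def codons_of(cds):
    """Split a coding sequence into complete in-frame triplets."""
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def translate(cds):
    """Protein encoded by `cds` (stop codons are written as '*')."""
    return "".join(CODON_TABLE[c] for c in codons_of(cds))


def codon_usage(genes):
    """Pooled codon counts over a collection of coding sequences."""
    usage = Counter()
    for cds in genes:
        usage.update(codons_of(cds))
    return usage


def back_translate(protein, rng, usage=None):
    """Hypothetical gene encoding `protein`.

    Without `usage`, synonymous codons are equiprobable (first control);
    with `usage`, they are drawn in proportion to observed counts (second).
    """
    out = []
    for aa in protein:
        options = SYNONYMS[aa]
        weights = [usage[c] for c in options] if usage else None
        if weights is not None and sum(weights) == 0:
            weights = None  # no observations for this amino acid
        out.append(rng.choices(options, weights=weights)[0])
    return "".join(out)
```

For example, `back_translate(translate(gene), random.Random(0))` yields a sequence of the same length that encodes the same protein, so any change in the antisense ORF distribution reflects codon choice alone and not amino acid composition or gene length.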
Overlapping percentage of the antisense ORFs
The overlapping percentage of every coding gene in the sense strand with its largest antisense ORF was also analyzed, and the results are shown in Fig. 2a. The observed trend matches the one previously observed in Fig. 1a, in that a normal distribution is found for the majority of the genes, while a considerable group of genes have a totally overlapping antisense ORF. In many cases, the antisense ORF extends beyond the initiation or termination codon of the sense strand, and this fact accounts for the size of the peak observed. It is important to note that the lengths of the genes with totally overlapping ORFs are not considerably biased towards smaller sizes (see the inset in Fig. 2a), which supports the view that this is not exclusively a size-dependent phenomenon. An analogous control analysis was also carried out to verify that this distribution was not fortuitous. Fig. 2b shows the distribution obtained for a hypothetical nucleotide sequence database generated on a gene by gene basis with randomly chosen triplets; as can be seen, a smaller peak corresponding to randomly generated 100% overlapping antisense ORFs is observed. The coinciding peak present in Fig. 2c indicates that codon usage imprints a considerable bias on the appearance of overlapping ORFs. Nevertheless, comparison of the peaks in the three graphs in Fig. 2 supports the view that the totally overlapping antisense ORFs are definitely nonrandom and are not exclusively an effect of codon usage, amino acid composition or gene size, so they might possess biological significance.

Analysis of the nucleotide sequences of the genes with totally overlapping antisense ORFs
The positional correlation of purines and pyrimidines and the GC content were the two parameters evaluated in order to assess the significance of the 100% overlapping antisense ORF phenomenon.
Computer programs were devised to scan for the positional correlations between purines and pyrimidines, and to determine whether or not triplets with the pattern RNY were preferentially used in the genes with 100% overlapping ORFs. The algorithm has been described in detail elsewhere (Merino et al. 1991). Suffice it to say that a coefficient was obtained indicating the preferential usage of RNY codons versus non-RNY codons whenever a choice is possible (ile, thr, ser, gly, val, ala). The biases due to the absence of stop codons in the reading frame and to the average amino acid composition of proteins were eliminated. Absence of codon preference results in a value of 1, so deviations from this figure indicate bias by codon preference. When this analysis was performed on a gene by gene basis in the database, the average coefficient obtained was 1.6, indicating the existence of a predilection for codons of the RNY form. However, when the genes with totally overlapping ORFs were tested under identical conditions, a higher coefficient of 2.2 was obtained. Some of these genes have been identified as highly expressed and are also known to be present in the earliest living organisms. Higher GC content might result in a lower abundance of stop codons in the antisense strand, which could be partially responsible for the prominent peak in Fig. 2a. The overall GC content of the bacterial genes was determined to be 51%. Nevertheless, the genes with totally overlapping ORFs in bacteria have an average GC content of 62%.

Correlation study of the RNY and GC contents to overlapping percentage of different species
A detailed study of individual organisms was designed in order to determine the correlation between RNY codons and overlapping percentage of individual species. A positive correlation with a value of 0.57 was obtained and the results are presented in Fig. 4.
The correlation between GC content and overlapping percentage of all the genes reported for those species was also determined. The average values for both parameters were calculated and plotted in Fig. 3, where a positive correlation of 0.87 can be observed.

DISCUSSION
It is well established that a nonrandom coding capacity exists in the complementary DNA strand of some protein-encoding genes (Casino 1981; Tramontano 1984; Alff-Steinberger 1984; Sharp 1985). This capacity is given by large overlapping ORFs, and their frequency in bacterial genes is depicted in Fig. 1a. The data presented thereafter extend these observations by calculating the overlapping percentage of these ORFs, by analyzing the coding pattern exhibited by these genes, and by relating this capacity to the evolutionary implications of the phenomenon. It has been argued that long antisense ORFs are a consequence of the low usage of the codons TTA, CTA and TCA, which are the triplets opposite to the stop codons in the 5'→3' direction (Staden 1984). This tendency is observed when comparing Figs. 2b (no codon bias present) and 2c (bias imposed by bacterial codon usage). However, the fact that such a large number of genes contain a 100% overlapping single ORF (Fig. 2a) suggests that the function of the antisense DNA strand in a previous evolutionary stage was not solely restricted to serving as a necessary conduit for double-stranded semiconservative DNA replication. In this context, we would like to propose that the double-strand coding capacity of DNA molecules may have been advantageous in the initial stage for assaying a higher number of potential coding sequences in order to obtain more proteins capable of biological function. Once function was obtained, the DNA coding sequence became "frozen" and entered a process of natural selection in order to become improved, while the antisense ORF remained as a relic of this stage.
Examination of the coding pattern of the bacterial genes in the database showed two important facts. The first is that no marked global preference of GC over AT (51% GC) is present in the bacterial database. Nevertheless, when the analysis was done with the genes containing totally overlapping ORFs, a value of 62% GC was obtained. This bias was also observed when genes from individual species were studied. A positive correlation of 0.87 between GC content and overlapping percentage was observed (Fig. 3). The second fact observed is the existence of a preferential usage of the RNY codons for those amino acids where a choice between RNY and non-RNY triplets is possible (ile, thr, ser, gly, val, ala). There is a considerable bias towards the use of RNY codons in the bacterial database, described by the 1.6 coefficient value observed. However, the genes containing totally overlapping ORFs have a larger coefficient of 2.2, indicating that these codons are largely preferred. The bias due to preferential usage of RNY codons is also observed when genes from individual species are studied. In this case, a positive correlation of 0.57 between RNY and overlapping percentage was obtained (Fig. 4). The importance of both findings is supported by the observation that measurements of the stability and lifetime of GC and AT pairs imply that in the early phase of primitive translation, GC-rich sequences had the advantage over AT-rich sequences (Kuppers 1983). This is in agreement with the idea that the genes with totally overlapping ORFs retain some properties of the earliest genes; therefore their RNY preference and GC content are expected to be higher. As previously mentioned, the RNY pattern on the sense strand also generates an RNY pattern on the antisense strand in the 5'→3' direction. The marked predilection for codons of the RNY form might provide a device for selection pressure because of the symmetry granted to both DNA strands.
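The strand symmetry invoked here follows directly from base pairing: the reverse complement of a purine-any-pyrimidine triplet is again purine-any-pyrimidine, while stop codons have the YNR form. A short Python check, offered only as an illustration with hypothetical names, makes this explicit.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Opposite strand of `seq`, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_rny(codon):
    """True for purine-any-pyrimidine triplets (R = A/G, Y = C/T)."""
    return codon[0] in "AG" and codon[2] in "CT"


RNY_CODONS = ["".join(c) for c in product("AG", "ACGT", "CT")]
STOP_CODONS = ["TAA", "TAG", "TGA"]

# Every RNY triplet read on the opposite strand is again an RNY triplet.
rny_is_symmetric = all(is_rny(reverse_complement(c)) for c in RNY_CODONS)

# Stop codons are YNR, so none of them can appear among RNY triplets.
stops_are_non_rny = not any(is_rny(c) for c in STOP_CODONS)

# Sense-strand triplets that place a stop codon on the antisense strand.
antisense_stop_partners = sorted(reverse_complement(c) for c in STOP_CODONS)
```

Consequently, a sense strand written entirely in RNY codons cannot place a stop codon in the in-phase frame of the antisense strand; the only sense triplets that do so are TTA, CTA and TCA, none of which is RNY.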
The overlapped, in-phase RNY triplets in both strands might then be a reflection of a primitive structure, not only of the genetic code but of the translation process as well, because both strands could act as templates. This does not imply that both strands should be simultaneously transcribed today, but rather supports our hypothesis that DNAs with the capacity to code for the highest possible number of proteins may have had a selective advantage in an earlier stage of molecular evolution. This conjecture is supported by two independent lines of evidence. The first is the theory of the coevolution of the genetic code and amino acid biosynthesis developed by Wong (1975, 1981). According to this author, not all of the 20 amino acids were originally present in the primeval genetic code, but were accommodated in a series of stages until the present genetic code was established. Remarkably, the earliest amino acids were those which nowadays prevalently retain the RNY form, and none of them produces a stop codon in the antisense strand. Wong questions the value of the comma-less theory as opposed to natural selection. Based on calculated mutation rates of present-day genes, this author discards the possibility that RNY codons are relics of an ancient pattern of genetic coding (Wong and Cedergren 1986). However, in the light of the "frozen stages" hypothesis in the evolution of the genetic code presented above, the results of this study indicate that both a primitive gene structure and natural selection are responsible for the widespread phenomenon of overlapping antisense ORFs. In accordance with this notion is the fact that the RNY message is particularly well preserved in genes which have been required to be heavily expressed throughout their evolutionary history. The ribosomal proteins and the translation-related protein factors studied in detail reveal that the RNY pattern in these proteins is in fact highly preserved.
Although it is prevalent in bacterial species, it may still be identified in phylogenetically distant species as well. The second supporting line of evidence is offered by the observations that interacting sites in eukaryotic hormone-receptor proteins are composed of complementary amino acid sequences that might have originated from a single DNA molecule (Goldstein and Brutlag 1989; Zull and Smith 1990; Blalock 1990; Sloostra and Roubos 1990; Brentani 1990). This molecular recognition theory claims that overlapping ORFs for certain eukaryotic hormones have the capacity to code for putative proteins with complementary hydropathy that might direct the hormone-receptor interaction. Further development and experimental verification of this theory are necessary before its validity can be ascertained, but if this mechanism is indeed present in such an evolved protein interaction system, it is suggestive that it might also be present in other species. In conclusion, the nature of the molecular features now described, and the fact that these are so prevalent in bacteria, strongly suggest that this gene organization offered an evolutionary advantage, which may relate to genetic translation and to the coding capacity of double-stranded molecules. The information provided here also supports the view that the RNY coding pattern was once used in earlier life systems and prevails in present-day genes, although it is not certain whether it is a relic from a previous stage in the organization and evolution of the genetic code and the translation process or a result of natural selection. Owing to the lack of evidence of these primitive stages, every hypothesis must remain under constant survey so that the extensive gap in our empirical knowledge is closed by theory.

Figure 1. Length distribution of the bacterial overlapping ORFs. (a) Actual bacterial gene sequences. (b) Control analysis using a putative database generated with randomly chosen codons for every amino acid.
(c) Control analysis with a putative database generated using the codon usage of the actual bacterial Genbank sequences.

Figure 2. Overlapping percentage distribution of the bacterial overlapping ORFs. (a) Actual bacterial gene sequences. The inset shows the size distribution of genes with totally overlapping ORFs. (b) Control analysis using the putative database containing randomly chosen codons for every amino acid. (c) Control analysis using the putative database generated with the codon usage of the actual bacterial Genbank sequences.

Figure 3. Correlation between the overlapping percentage of the ORFs and the GC content of individual species. The GC content was theoretically calculated using the DNA sequences reported.

Figure 4. Correlation between the overlapping percentage of the ORFs and the RNY index of individual species.

REFERENCES
Adelman, J.P., Bond, C.T., Douglass, J. and Herbert, E. (1987) Two mammalian genes transcribed from opposite strands of the same DNA locus. Science 235:1514-1517.
Alff-Steinberger, C. (1984) Evidence for a coding pattern on the non-coding strand of the Escherichia coli genome. Nucl. Acids Res. 12:2235-2241.
Blalock, J.E. (1990) Complementarity of peptides specified by 'sense' and 'antisense' strands of DNA. Trends Biotechnol. 8:140-144.
Brentani, R. (1990) Information capacity of both DNA strands. Trends Biochem. Sci. 15:463.
Casino, A., Cipollaro, M., Guerrini, A.M., Mastrocinque, G., Spena, A. and Scarlato, V. (1981) Coding capacity of complementary DNA strands. Nucl. Acids Res. 9:1499-1518.
Crick, F.H.C., Griffith, J.S. and Orgel, L.E. (1957) Codes without commas. Proc. Natl. Acad. Sci. USA 43:416-421.
Crick, F.H.C., Brenner, S., Klug, A. and Pieczenik, G. (1976) A speculation on the origin of protein synthesis. Origins Life 7:389-397.
Eigen, M. and Schuster, P. (1978) The hypercycle. A principle of natural self-organization. Naturwissenschaften 65:341-369.
Gibson, T.J. and Lamond, A.I.
(1990) Metabolic complexity in the RNA world and implications for the origin of protein synthesis. J. Mol. Evol. 30:7-15.
Goldstein, A. and Brutlag, D.L. (1989) Is there a relationship between DNA sequences encoding peptide ligands and their receptors? Proc. Natl. Acad. Sci. USA 86:42-45.
Henikoff, S., Keene, M.A., Fechtel, K. and Fristrom, J.W. (1986) Gene within a gene: nested Drosophila genes encode unrelated proteins on opposite DNA strands. Cell 44:33-42.
Kumamoto, C.A. and Nault, A.K. (1989) Characterization of the Escherichia coli protein-export gene secB. Gene 75:167-175.
Kuppers, B.O. (1983) Molecular Theory of Evolution. Springer-Verlag, Berlin.
Miyajima, N., Horiuchi, R., Shibuya, Y., Fukushige, S., Matsubara, K., Toyoshima, K. and Yamamoto, T. (1990) Two erbA homologs encoding proteins with different T3 binding capacities are transcribed from opposite DNA strands of the same genetic locus. Cell 57:31-39.
Merino, E., Balbás, P. and Bolivar, F. (1991) New insights on the comma-less theory. Origins of Life. In press.
Rak, B., Lusky, M. and Hable, M. (1982) Expression of two proteins from overlapping and oppositely oriented genes on transposable DNA insertion element IS5. Nature 297:124-128.
Sharp, P.M. (1985) Does the non-coding strand code? Nucl. Acids Res. 13:1389-1397.
Shepherd, J.C.W. (1984) From primaeval message to present day gene. Symp. Quant. Biol. CSH 52:1099-1108.
Shepherd, J.C.W. (1984) Fossil remnants of a primaeval genetic code in all forms of life? Trends Biochem. Sci. ??:8-10.
Simons, R.W. (1988) Naturally occurring antisense RNA control - a brief review. Gene 72:35-44.
Sloostra, J. and Roubos, E. (1990) Sense-antisense complementarity of hormone-receptor interaction sites. Trends Biotechnol. 8:279-281.
Spencer, C.A., Gietz, R.D. and Hodgetts, R.B. (1986) Overlapping transcription units in the dopa decarboxylase region of Drosophila. Nature 322:279-281.
Staden, R.
(1984) Measurement of the effects that coding for a protein has on a DNA sequence and their use for finding genes. Nucl. Acids Res. 12:551-567.
Tramontano, A., Scarlato, V., Barni, N., Cipollaro, M., Franze, A., Macchiato, M.F. and Cascino, A. (1984) Statistical evaluation of the coding capacity of complementary DNA strands. Nucl. Acids Res. 12:5049-5090.
Tyagi, S. (1981) Origin of translation: the hypothesis of permanently attached adaptors. Origins Life 11:343-351.
Wong, T.F. (1975) A co-evolution theory of the genetic code. Proc. Natl. Acad. Sci. USA 72:1909-1912.
Wong, T.F. (1981) Coevolution of genetic code and amino acid biosynthesis. Trends Biochem. Sci. 6:33-36.
Wong, T.F. and Cedergren, R. (1986) Natural selection versus primitive gene structure as determinant of codon usage. Eur. J. Biochem. 159:175-180.
Zull, J.E. and Smith, S.K. (1990) Is genetic code redundancy related to retention of structural information in both DNA strands? Trends Biochem. Sci. 15:257-261.