SENTRA, a Database of Signal Transduction Proteins
Mark D'Souza,1 Margaret F. Romine,2 Natalia Maltsev1,*
1 Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
2 Environmental Microbiology Group, Pacific Northwest National Laboratory, Richland, WA 99352, USA
ABSTRACT
SENTRA is a database of proteins associated with microbial signal transduction. The database currently includes the classical two-component signal transduction pathway proteins and methyl-accepting chemotaxis proteins, but will be expanded to include other classes of signal transduction systems that are modulated by phosphorylation or methylation reactions. Although the majority of database entries are from prokaryotic systems, eukaryotic proteins with bacterial-like signal transduction domains are also included. Currently SENTRA contains signal transduction proteins from 34 completely or almost completely sequenced prokaryotic genomes, as well as sequences from 243 organisms available in public databases (SWISS-PROT and EMBL). The analysis was carried out within the framework of the WIT2 system, which is designed and implemented to support genetic sequence analysis and comparative analysis of sequenced genomes.
INTRODUCTION
Efficient adaptive response requires the ability to monitor environmental changes that occur both within and exterior to the cell. Bacteria use a variety of mechanisms to sense and transmit information that is important for maintaining homeostatic balance. It is well known that "bacterial signaling proteins are built from modular components, input sensing domains, output effector domains, and transmitter and receiver domains for promoting protein-protein communication. Signaling circuits are assembled by "wiring" these elements in various configurations" (1).
Signal transduction proteins represent one of the most challenging groups of proteins for genetic sequence analysis. Shuffled domains, different combinations of effector domains, and the large number of paralogous sequences in a given genome complicate the assignment of function using traditional methods of genetic sequence analysis. In most cases, it is very difficult to understand the nature of the signal transmitted by a predicted signal transduction protein without extensive wet-lab experiments.
SENTRA, a database of proteins related to signal transduction, was created to address these issues. We used a variety of sequence analysis tools: domain analysis programs, tools for the analysis of transmembrane domains, and tools for the evaluation of gene clusters. The objective was to add sensitivity to the sequence analysis of this class of proteins and to provide additional information that supports an educated guess about the nature of the transmitted signal.
DESCRIPTION OF THE DATABASE
Identification of Proteins Included in SENTRA
The current version of SENTRA contains two classes of signal transduction proteins: sensory transduction histidine kinases and methyl-accepting chemotaxis proteins. These proteins were identified among the proteins of the 34 organisms analyzed in the WIT2 framework, as well as the proteins from the 243 organisms in SWISS-PROT and TrEMBL. We found a total of 1591 sensory transduction histidine kinases and 321 methyl-accepting chemotaxis proteins, which have been deposited in SENTRA.
Putative signal transduction proteins in public databases were identified by Pfscan (2) analysis. Pfscan and all the profiles used were downloaded with the pftools 2.1 package from ISREC; the ProSite profiles are also available individually at the ISREC site (e.g. http://www.isrec.isb-sib.ch/cgi-bin/get_pstprf?PS50110). We used the following ProSite (3) or Pfam A (4) protein profiles to identify signal transduction proteins: the bacterial histidine kinase domain (ProSite profile PS50109|HIS_KIN), the histidine-receiving domain (Pfam profile PF00072|RESPONSE_REG and ProSite profile PS50110|HIS_REC), the methyl-accepting chemotaxis protein (MCP) signaling domain (Pfam profile PF00015|MCPSIGNAL and ProSite profile PS50111|MCP), the CheB-type methyl-esterase (ProSite profile PS50122|CHEB), and the CheR-type MCP methyl-transferase (ProSite profile PS50123|CHER). All proteins from SWISS-PROT release 38 and TrEMBL, as well as putative ORFs from complete or nearly complete prokaryotic genomes currently available in WIT2, that scored at Pfscan level 0 or 1 with at least one of these profiles were deposited in SENTRA.
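The selection rule described above (a match at Pfscan level 0 or 1 with at least one of the listed profiles) can be illustrated with a minimal Python sketch. It is not the pipeline used to build SENTRA. It assumes the Pfscan results have already been parsed into hypothetical (protein identifier, profile accession, level) tuples; the function name, the domain labels, and the toy hits are illustrative assumptions, and only the profile accessions come from the text.

```python
from collections import defaultdict

# Profile accessions named in the text, mapped to illustrative domain labels.
PROFILES = {
    "PS50109": "histidine kinase",
    "PF00072": "response regulator receiver",
    "PS50110": "response regulator receiver",
    "PF00015": "MCP signaling",
    "PS50111": "MCP signaling",
    "PS50122": "CheB methylesterase",
    "PS50123": "CheR methyltransferase",
}

# Match levels accepted for deposition, as stated in the text.
ACCEPTED_LEVELS = frozenset({0, 1})


def select_candidates(hits):
    """Map each accepted protein to the sorted domain labels it matched.

    `hits` is an iterable of (protein_id, profile_id, level) tuples; this
    pre-parsed form is an assumption, not the native Pfscan output format.
    """
    selected = defaultdict(set)
    for protein_id, profile_id, level in hits:
        if profile_id in PROFILES and level in ACCEPTED_LEVELS:
            selected[protein_id].add(PROFILES[profile_id])
    return {pid: sorted(domains) for pid, domains in selected.items()}


# Toy hits (made up for illustration only).
hits = [
    ("orfA", "PS50109", 0),   # accepted
    ("orfA", "PF00072", 1),   # accepted, second domain on the same protein
    ("orfB", "PS50111", -1),  # rejected: level outside the accepted set
    ("orfC", "PF00015", 0),   # accepted
    ("orfD", "PS99999", 0),   # rejected: profile not in the list
]
candidates = select_candidates(hits)
```

Keeping the matched domain labels per protein, rather than a plain yes/no flag, makes it easy to see which proteins combine several signal transduction domains.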
Analysis of Proteins in SENTRA
Analysis of putative signal transduction proteins associated with sequenced genomes in WIT2 was carried out within the framework of the WIT2 system (5). Large-scale homology searches using BLAST (6) and FASTA, both against public databases and among the genomes in WIT2, were used to assist in the assignment of functions to these proteins. Further evidence of function was gathered by analysis of the transmembrane domains using SOSUI (7) and by analysis of the potential association of signal transduction proteins in gene clusters using methods described by Overbeek et al. (8).
The latter technique attempts to infer functional coupling between genes based on the conservation of chromosomal gene clusters between genomes. Our analysis confirms the observation that in many cases sensory transduction proteins occur as part of a sensory transduction cassette or a chemotaxis operon. The composition of a potential gene cluster can provide important information for identifying the probable nature of the transmitted signal, as well as the genes involved in the regulatory cascade. The value of simultaneous cluster analysis over a large number of phylogenetically diverse genomes is exemplified by the chemotaxis signal transduction system in B. subtilis.
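The idea of inferring coupling from conserved gene neighborhoods can be illustrated with a deliberately simplified Python sketch. It is a stand-in for illustration, not the method of Overbeek et al. (8). It assumes each genome has already been reduced to an ordered list of hypothetical gene-family labels, and it only counts how many genomes place two families within a small window of each other. The function name, the window size, the threshold, and the toy gene orders are all assumptions.

```python
from collections import Counter


def conserved_pairs(genomes, window=2, min_genomes=2):
    """Count gene-family pairs that lie close together in several genomes.

    `genomes` maps a genome name to its gene families in chromosomal order.
    Two families are "close" if at most `window` positions apart. A pair is
    reported if it is close in at least `min_genomes` genomes.
    """
    counts = Counter()
    for order in genomes.values():
        seen = set()  # count each pair at most once per genome
        for i, fam_a in enumerate(order):
            for fam_b in order[i + 1 : i + 1 + window]:
                if fam_a != fam_b:
                    seen.add(frozenset((fam_a, fam_b)))
        counts.update(seen)
    return {pair: n for pair, n in counts.items() if n >= min_genomes}


# Toy gene orders (made up for illustration only).
genomes = {
    "genome1": ["cheA", "cheW", "mcp", "x1", "x2", "x3"],
    "genome2": ["y1", "cheW", "cheA", "y2", "y3", "mcp"],
    "genome3": ["z1", "z2", "cheA", "z3", "cheW"],
}
pairs = conserved_pairs(genomes)
```

In this toy input only the cheA/cheW pair stays within the window in more than one genome, so it is the only pair reported. A real analysis would also have to establish which genes correspond to each other across genomes, a step this sketch takes as given.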
All of the proteins deposited in SENTRA have been further analyzed with the remaining profiles in Pfam and ProSite to gather information about other functional domains associated with these proteins.
Access to the Data in SENTRA
All entries in SENTRA are characterized by whether they contain transmembrane domains and by where, within the protein sequence, Pfam or ProSite functional domains are found. For entries extracted from WIT2 genomes, ORFs that lie in close proximity and participate in gene clusters conserved across other genomes are also listed. Each entry is directly linked to the corresponding entry in SWISS-PROT or TrEMBL and, where appropriate, to the entry in WIT2. The results of these analyses can be accessed in a number of ways:
The summary of the occurrence of signal transduction proteins in the membrane is particularly useful for understanding the processing of environmental signals by organisms residing in different environments.
FUTURE PROSPECTS
Future plans include the expansion of the contents of SENTRA to include other types of phosphorylation-based as well as methylation-based transduction systems, PAS proteins, and the like. We also plan to include information on eukaryotic signaling systems. In addition, we are developing a mechanism that will allow experts to efficiently annotate the entries in SENTRA. This annotation will include literature information as well as a summary of expert opinion about a particular entry.
ACKNOWLEDGMENTS
This work was supported by the Office of Biological and Environmental Research, U.S. Department of Energy, under Contract W-31-109-Eng-38.
REFERENCES
1. Parkinson, J. S. (1995) In: Hoch, A. and Silhavy, T. J. (eds.), Two-Component Signal Transduction. ASM Press, Washington, D.C., pp. 9-23.
2. Bucher, P., Karplus, K., Moeri, N. and Hofmann, K. (1996) Comput. Chem., 20, no. 1, 3-23.
3. Hofmann, K., Bucher, P., Falquet, L. and Bairoch, A. (1999) Nucleic Acids Res., 27, 215-219. http://www.expasy.ch/prosite/
4. Bateman, A., Birney, E., Durbin, R., Eddy, S.R., Finn, R.D. and Sonnhammer, E.L. (1999) Nucleic Acids Res., 27, 260-262. http://www.sanger.ac.uk/Software/Pfam/
5. Overbeek, R., Larsen, N., Maltsev, N., Pusch, G.D. and Selkov, E. (1999) In: Letovsky, S. (ed.), Molecular Biology Databases. Kluwer Academic Pub., Dordrecht, pp. 158-163.
6. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990) J. Mol. Biol., 215, 403-410.
7. Hirokawa, T., Boon-Chieng, S. and Mitaku, S. (1998) Bioinformatics, 14, 378-379. http://www.tuat.ac.jp/~mitaku/adv_sosui/
8. Overbeek, R., Fonstein, M., D'Souza, M., Pusch, G.D. and Maltsev, N. (1999) Proc. Natl. Acad. Sci. USA, 96, 2896-2901.