SRP RNA genes are predicted using a two-step procedure. In the first step, a heuristic search for the strongly conserved helix 8 motif of SRP RNA is carried out using the rnabob program of Sean Eddy. The sequences found in this search are then examined further with covariance models [Eddy, S.R. and Durbin, R. (1994) Nucleic Acids Res. 22, 2079-2088] that are based on previously known SRP RNA sequences. The COVE programs are used in this second step.
Rnabob (fast RNA motif searching) source code may be obtained from here. Rnabob descriptors were constructed for the following categories: 1) archaebacteria and eubacteria with GRRA loop, 2) eubacteria with TRRC loop, 3) plants, 4) yeasts, and 5) metazoans. The rnabob source code was slightly modified so that the output includes the sequences flanking the descriptor pattern and can be written in fasta format. The modified program was used to scan the sequence databases with the appropriate descriptor for each class of SRP RNA genes and to produce output in fasta format using the -f option. The -s (skip) option was used to avoid non-significant hits in regions of 'N' repeats.
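The idea behind such a descriptor search can be illustrated with a small Python sketch. It is not rnabob and does not use rnabob's descriptor syntax; it only scans for a toy stem-loop whose loop matches an IUPAC pattern such as GRRA or TRRC, skips windows containing 'N', and reports each hit with flanking sequence in fasta format. The function names, the stem length, and the flank length are all hypothetical choices made for illustration, and only Watson-Crick pairs are accepted in the stem.

```python
import re

# IUPAC nucleotide codes needed for the loop patterns (R = purine, Y = pyrimidine).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def loop_regex(loop):
    """Translate an IUPAC loop pattern such as 'GRRA' into a compiled regex."""
    return re.compile("".join("[" + IUPAC[c] + "]" for c in loop))


def is_stem(left, right):
    """True if `left` pairs with `right` read backwards (Watson-Crick pairs only)."""
    return all(COMPLEMENT.get(a) == b for a, b in zip(left, reversed(right)))


def find_hairpins(seq, loop="GRRA", stem_len=4, flank=5):
    """Yield (start, end, flanked_subsequence) for each toy stem-loop match.

    stem_len and flank are illustrative values, not taken from the real
    descriptors. Windows containing 'N' are skipped so that runs of
    unknown bases cannot produce hits.
    """
    seq = seq.upper()
    pattern = loop_regex(loop)
    size = 2 * stem_len + len(loop)
    for start in range(len(seq) - size + 1):
        window = seq[start:start + size]
        if "N" in window:
            continue
        left = window[:stem_len]
        middle = window[stem_len:stem_len + len(loop)]
        right = window[stem_len + len(loop):]
        if pattern.fullmatch(middle) and is_stem(left, right):
            lo = max(0, start - flank)
            hi = min(len(seq), start + size + flank)
            yield start, start + size, seq[lo:hi]


def to_fasta(name, hits):
    """Format hits as fasta records, one record per match."""
    return "".join(f">{name}_{s}_{e}\n{sub}\n" for s, e, sub in hits)
```

For example, `to_fasta("seq1", find_hairpins("TTTTTGGCCGAAAGGCCTTTTT"))` reports the single GAAA hairpin together with five flanking bases on each side, while the same call with `loop="TRRC"` reports nothing for that sequence.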
The COVE programs were obtained from here. COVE statistical models were built from the SRP RNA gene alignments available on the SRPDB site at http://bio.biomedicine.gu.se/dbs/SRPDB/srprna.html using the -a option of the covet (train) program. The same sequences, unaligned, were used as initial training data for the models. The -m (maximum likelihood) option was used to give more reliable results. Statistical models were built for the following categories: 1) prokaryotes without Alu domain, 2) prokaryotes with Alu domain, 3) yeasts, and 4) non-yeast eukaryotes. Each model was trained on the subset of SRPDB sequences belonging to organisms of that category. Once novel sequences were found, the models were rebuilt on the enlarged set of sequences to obtain higher accuracy. The covels (local score) program was used to search the rnabob fasta outputs for SRP RNA gene candidates, producing a list of the sequences that best matched the statistical model used. The -w (window) option was set to a value slightly larger than the expected maximum size of an SRP RNA. Alignments of the sequences were constructed using the covea (align) program and the appropriate statistical model for each category. The -o (outfile) and -s (scorefile) options were used to write the alignments and the scores of similarity to the model to files.
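Two bookkeeping steps from the search described above, choosing a window slightly larger than the longest expected SRP RNA and listing the best-scoring candidates, can be sketched in Python as follows. This is only an illustration: it does not call or parse the output of any COVE program, and the 10 percent margin, the function names, and the example lengths and scores are assumptions, not values from the procedure.

```python
def window_size(known_lengths, margin_percent=10):
    """Return a window slightly larger than the longest known sequence.

    margin_percent is a hypothetical safety margin; integer arithmetic
    (rounding the margin up) keeps the result exact.
    """
    longest = max(known_lengths)
    return longest + (longest * margin_percent + 99) // 100


def rank_candidates(scored, top=10, min_score=None):
    """Sort (name, score) pairs by descending score and keep the best ones.

    min_score, if given, discards candidates scoring below that threshold.
    """
    kept = [(n, s) for n, s in scored if min_score is None or s >= min_score]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top]


# Made-up lengths and scores, for illustration only.
window = window_size([120, 300, 270])
best = rank_candidates([("a", 12.5), ("b", 40.1), ("c", -3.0)], top=2)
```

With these made-up inputs the window is 330 and the ranked list is `[("b", 40.1), ("a", 12.5)]`.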
Mfold version 3.1 was generously provided by Dr. M. Zuker. It was used to predict the secondary structure of the predicted SRP RNAs. To obtain a folding consistent with the secondary structure constraints implicit in the rnabob descriptors, we supplied these constraints as input to mfold in many instances.
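To make the notion of a helix constraint concrete, the following Python sketch expands a helix, given by its outermost base pair and its number of stacked pairs, into explicit base pairs and a dot-bracket string. It is a generic illustration under our own assumptions (1-based positions, hypothetical function names) and does not reproduce mfold's constraint input syntax.

```python
def helix_pairs(i, j, k):
    """Return the k stacked base pairs of a helix whose outermost pair is i-j.

    Positions are 1-based. The two strands must not overlap.
    """
    if k < 1 or i + k - 1 >= j - k + 1:
        raise ValueError("helix strands overlap or k is not positive")
    return [(i + n, j - n) for n in range(k)]


def dot_bracket(length, pairs):
    """Render base pairs as a dot-bracket string of the given length."""
    structure = ["."] * length
    for a, b in pairs:
        structure[a - 1] = "("
        structure[b - 1] = ")"
    return "".join(structure)


# A 4-pair helix closing a 4-base loop in a 12-base sequence.
pairs = helix_pairs(1, 12, 4)
print(dot_bracket(12, pairs))
```

Here the helix spans positions 1-4 and 9-12, leaving positions 5-8 unpaired as the loop.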