This example applies the REMUS system to identify unique peptide segments from a group of sequences. The step-by-step guide covers the basic strategy for an efficient search. The PDB files of the human RNaseA superfamily (1e21:A, 1gqv:A, 1dyt:A, 1rnf:A, and 1b1i:A) are used as the input data, and the resulting three-dimensional structures are shown and explained in the results. Download the example document here (REMUS-DOCS.doc).

Setting up a REMUS search

  •  Plan the search

Decide on the goals of the unique peptide extraction and prepare the family sequences of interest. The family sequences can be either amino acid or DNA sequences.

  •  Enter the family sequences (or import sequence files)

The REMUS system accepts input sequences in three ways: importing FASTA sequence files, pasting FASTA sequence data directly into the text editor, or importing PDB files. Identification is correct only if the imported file is in an appropriate format. Here we import five PDB files (1e21:A, 1gqv:A, 1dyt:A, 1rnf:A, and 1b1i:A). These five PDB files are included with the REMUS program and can be found in the directory "C:\Program Files\REMUS\PDBFiles" after REMUS is installed. After loading the PDB files, you can click an entry in the "Sources" listbox to view each three-dimensional structure individually.

  •  Choose the automatic or semi-automatic REMUS method

The REMUS system provides automatic and semi-automatic reinforced merging techniques, as well as traditional brute-force searching techniques, for extracting unique peptide motifs. Here we use both the automatic and the semi-automatic methods for illustration.

(1) Automatic REMUS

(a) Select "Tools --> REMUS Method --> Automatic" from the menu bar of the REMUS system. The selected REMUS method is displayed to the right of the "Sources" listbox, and the search status is shown by a progress bar at the bottom of the main screen. Once the program finishes, the results are displayed on the main screen immediately.

(b) After you click "Automatic", the REMUS system runs the search algorithms immediately and displays the results. Click the "Description" button to view an explanation of how the search results are presented. For automatic searching, the identified segments are displayed alternately in blue and dark blue. Click the "OK" button in the Description popup window to close it.

(c) Single-click any unique peptide motif found by the search to open a popup tooltip showing its position and related information. In this example, the tooltip shows "Rnase3_A.pdb:A/DPRDSPRY/3(114)/8", which gives, in order, the name of the PDB file that the segment belongs to, the segment contents, the sequence number (position number), and the length of the identified segment.
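If the tooltip text is copied out of REMUS for further processing, its four slash-separated fields can be split programmatically. The following is a minimal Python sketch, not part of REMUS; the names `MotifTooltip` and `parse_tooltip` are hypothetical, and the field layout is inferred only from the single example string above.

```python
import re
from typing import NamedTuple


class MotifTooltip(NamedTuple):
    pdb_file: str     # PDB file name as shown in the tooltip (with chain suffix)
    segment: str      # residues of the identified segment
    sequence_no: int  # number before the parentheses in the third field
    position: int     # number inside the parentheses in the third field
    length: int       # length of the identified segment


# Assumed layout: file/segment/sequence_no(position)/length
_TOOLTIP = re.compile(
    r"^(?P<file>[^/]+)/(?P<seg>[A-Za-z]+)/(?P<seq>\d+)\((?P<pos>\d+)\)/(?P<len>\d+)$"
)


def parse_tooltip(text: str) -> MotifTooltip:
    """Split a tooltip string into its fields; raise ValueError if it does not fit."""
    match = _TOOLTIP.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized tooltip format: {text!r}")
    return MotifTooltip(
        match["file"],
        match["seg"],
        int(match["seq"]),
        int(match["pos"]),
        int(match["len"]),
    )


info = parse_tooltip("Rnase3_A.pdb:A/DPRDSPRY/3(114)/8")
print(info.segment, info.position, info.length)
```

For the example string, the parsed length field agrees with the number of residues in the segment, which is a convenient sanity check on the inferred layout.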

(d) Double-click any unique peptide motif found by the search to see its position and related information. Here we double-click the first peptide found, "DPRDSPRY"; REMUS highlights the segment's position on the sequence in orange and simultaneously opens a window showing its three-dimensional position.

 

(2) Semi-Automatic REMUS

(a) Select "Tools --> REMUS Method --> Semi Automatic" from the menu bar of the REMUS system. The selected REMUS method is displayed to the right of the "Sources" listbox, and a popup window opens for the parameter settings.

For the semi-automatic REMUS system, the parameter settings are organized into three main phases, shown on the left side of the window: "Grouping Phase", "Searching Phase", and "Merging Phase". Each module and its corresponding settings are discussed and shown in the example.

The main purpose of the grouping module is to build tolerance for amino acid substitutions into the representation of unique peptide segments. This module is optional and can be skipped in the semi-automatic REMUS procedure if tolerant features are not considered. Since the possible substitutions between amino acids are well characterized from the frequencies observed in alignments of related proteins, we provide both a manual assignment interface and an automatic clustering algorithm to assemble the 20 amino acids into several independent groups. If the clustering algorithm is selected, users can assign threshold parameters and the scoring matrix to refer to. A substitution matrix can be selected from the standard BLOSUM/PAM matrices, or users can create a new one based on their own aligned block database.

In this example, we use "set Group Manually" to illustrate the tolerant feature. We define the substitutable groups as ((A), (R,K), (N), (D,Q,E), (C), (G), (H,F,W,Y), (I,L,M,V), (P), (S), (T)), where amino acids within the same group are treated as identical during matching. To enter these settings, we click the "set Group Manually" checkbox, and the system pops up the "Group Input" window. We can then follow the group definitions and assign the amino acids to the appropriate groups. For example, if we click the "group1" option button and then click the "K" and "R" symbols in the top list, the system treats amino acids K and R as substitutable symbols in the following procedures. Similarly, (D,Q,E), (H,F,W,Y), and (I,L,M,V) are assigned to group2, group3, and group4, respectively. The manual grouping function checks the assignments to prevent any amino acid from being assigned to more than one group. After completing the manual group assignment, we click the "OK" button to close the grouping window.
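The effect of grouping can be sketched in a few lines of Python: every amino acid is replaced by one representative symbol of its group, so that substitutable residues compare as equal in later matching. This is an illustrative sketch only, not REMUS code; `build_group_map` and `encode` are hypothetical names, and choosing the first member of each group as its representative is an assumption.

```python
# Substitutable groups from the example above; singletons are listed explicitly.
GROUPS = [
    ("A",), ("R", "K"), ("N",), ("D", "Q", "E"), ("C",), ("G",),
    ("H", "F", "W", "Y"), ("I", "L", "M", "V"), ("P",), ("S",), ("T",),
]


def build_group_map(groups):
    """Map each amino acid to the first member of its group.

    Raises ValueError if an amino acid appears in more than one group,
    mirroring the double-assignment check described above.
    """
    mapping = {}
    for group in groups:
        representative = group[0]
        for amino_acid in group:
            if amino_acid in mapping:
                raise ValueError(f"{amino_acid} is assigned to more than one group")
            mapping[amino_acid] = representative
    return mapping


def encode(sequence, mapping):
    """Rewrite a sequence with group representatives; unknown symbols are kept."""
    return "".join(mapping.get(aa, aa) for aa in sequence.upper())


group_map = build_group_map(GROUPS)
print(encode("KQEL", group_map))  # K->R, Q->D, E->D, L->I
```

After encoding, two segments that differ only by substitutions within a group become identical strings, which is what makes the subsequent uniqueness search tolerant.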

 

For the searching module, the REMUS system employs the Boyer-Moore matching algorithm to extract primary unique patterns with reduced time complexity. Primary unique patterns are defined as the basic elements for the merging operations. Since each group of amino acids is represented by a unique symbol, the algorithm examines all candidate patterns and extracts the positional information of the primary unique patterns by scanning all of the substituted sequences. If the previous module produced fewer than 20 groups, substitution tolerance is taken into account and fewer primary unique patterns can be extracted from the family set. In addition to using the clustering results to describe the features of uniqueness, this module provides a statistical analysis that shows the level of the determined characteristics for each extracted unique pattern. Users can assign the number of mismatches, which should be less than the length of a primary unique pattern. Suppose there are N sequences in a family set Z, and let Z_I denote the Ith sequence in Z. If the length of a primary unique pattern is n and m symbols are reserved for each pattern, i.e. (n-m) mismatches are allowed during matching, then the representative level of uniqueness can be quantified. The calculated percentages represent the level of uniqueness and range from 0% to 100%. All primary unique patterns, with their different representative percentages, are sent to the next module, which merges them based on their neighboring conditions and on the threshold set for the representative level of uniqueness.
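To make the idea of a windowed uniqueness search concrete, the sketch below scans one sequence of a family with a window of length n and scores each window against the other sequences. It is an illustration under stated assumptions, not the REMUS implementation: it uses a naive scan rather than Boyer-Moore, the function names are hypothetical, and the uniqueness level is assumed here to be the percentage of the other sequences in which no window agrees with the pattern in at least m (reserved) positions. The exact formula used by REMUS is not given in this guide.

```python
def agrees(a, b, reserved):
    """True if equal-length strings a and b match in at least `reserved` positions."""
    return sum(x == y for x, y in zip(a, b)) >= reserved


def occurs_in(pattern, sequence, reserved):
    """True if any window of `sequence` agrees with `pattern` (naive scan)."""
    n = len(pattern)
    return any(
        agrees(pattern, sequence[i:i + n], reserved)
        for i in range(len(sequence) - n + 1)
    )


def primary_patterns(family, index, n, reserved):
    """Score every length-n window of family[index].

    Returns (position, pattern, level) tuples, where position is 0-based and
    level is the assumed uniqueness percentage described in the lead-in.
    """
    target = family[index]
    others = [seq for i, seq in enumerate(family) if i != index]
    results = []
    for pos in range(len(target) - n + 1):
        pattern = target[pos:pos + n]
        absent = sum(not occurs_in(pattern, seq, reserved) for seq in others)
        level = 100.0 * absent / len(others) if others else 100.0
        results.append((pos, pattern, level))
    return results


toy_family = ["ACDEF", "ACDGH", "KLMNP"]  # toy sequences, not real proteins
for pos, pattern, level in primary_patterns(toy_family, 0, n=3, reserved=3):
    print(pos, pattern, level)
```

With `reserved` equal to n, only exact matches count; lowering `reserved` allows mismatches, so a window is found in more sequences and its level can only stay the same or drop.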

To decide the length of the primary segment, we can click the "primary length analysis" button to obtain a suggested length for the fundamental unique segment. In this example, the REMUS system suggests "3" as the length of the primary segment (i.e. the window size for string matching). If we require search results with no mismatches, we must enter "3" for "reserved site" in this case. After setting the length of the primary unique segment, we click the "Search" button to obtain the matching results. The matched strings in each sequence are displayed in the text box on the right. Each row represents a primary unique segment and its related information, such as the uniqueness level, sequence number, and position.

In the final merging phase, a merging operation is used instead of the traditional concatenation operation in order to enhance the discrepancies within a family set and to emphasize neighboring relationships. Two matched primary unique patterns can be merged if they share overlapping symbols and both satisfy the minimum representative percentage. To reinforce the uniqueness of the merged patterns, a strict merging operation is also provided to ensure that each subsegment of the merged pattern still possesses unique properties. Assume that two primary unique patterns of the same length n are merged. The strict merging requirements are satisfied only when the number of overlapping symbols equals the primary pattern length minus one, that is, when the length of the merged segment equals n+1. Strictly merged unique patterns are therefore built from fundamental unique descriptors and preserve the most unique characteristics of the sequences. After obtaining all candidate merged segments, the system provides a trimming function that returns a substring of each merged segment with symbols stripped from the beginning and the end. This function evaluates the first and the last (n-1) symbols of each pattern in the merged unique pattern set; these (n-1) symbols are trimmed off when they match another merged unique pattern. Through these three modules, the unique peptide segments of each sequence in a protein family set are located efficiently. After this reinforced processing, the merged unique peptide segments satisfy the criteria for tolerance and for the representative level of uniqueness.
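The difference between the ordinary and the strict merge can be sketched as follows. The code assumes that the start positions of the primary unique patterns (all of length n, with n at least 2, and all already above the minimum percentage) are known for one sequence, and it applies the pairwise merge rule repeatedly along the sequence. It is a hedged illustration with a hypothetical function name; trimming is not implemented.

```python
def merge_patterns(sequence, starts, n, strict=True, min_length=1):
    """Merge primary patterns of length n that start at `starts` (0-based).

    strict=True  : neighbors must overlap by n-1 symbols (starts differ by 1).
    strict=False : neighbors must overlap by at least one symbol
                   (starts differ by at most n-1).
    Returns (start, merged_segment) pairs no shorter than `min_length`.
    """
    max_step = 1 if strict else n - 1
    segments = []    # [start, end) pairs, end exclusive
    previous = None  # start of the last pattern added
    for start in sorted(set(starts)):
        if previous is not None and start - previous <= max_step:
            segments[-1][1] = start + n  # extend the current segment
        else:
            segments.append([start, start + n])
        previous = start
    return [(s, sequence[s:e]) for s, e in segments if e - s >= min_length]


toy = "ABCDEFGHIJ"          # toy sequence
hits = [0, 1, 2, 5, 7]      # hypothetical primary pattern starts, n = 3
print(merge_patterns(toy, hits, 3, strict=True))
print(merge_patterns(toy, hits, 3, strict=False))
```

In the strict case every window of length n inside a merged segment is itself a primary unique pattern, which is why the patterns starting at 5 and 7 stay separate there but are joined by the ordinary merge.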

Here we must assign the minimum length and minimum representative percentage parameters before running the merging operations. In this example, we assign "8" as the minimum length of merged segments and "100%" as the minimum percentage for each primary unique segment. We can then click one of the four merging operations to obtain unique peptide segments under different requirements. If we click the "Trim Strict Merge" button, the merged results are shown in the following window. All other system functions are the same as those introduced for the "automatic REMUS" method.

  • Observe the protein structures

Users can perform a variety of display operations by right-clicking anywhere inside the Protein 3D structure window. A pop-up menu appears with six major categories of functions: Transformation, Display, Set Sequence Color, Lock Color, Rotation Matrix, and Saving. These functions are applied to each protein structure individually.

Category 1: Transformation

Users can perform three basic types of 3D geometric transformation: translation, rotation, and scaling.

Translation

(a) Select "Transformation --> Translation" from the pop-up menu, which opens when you right-click the 3D structure window.

(b) Follow the arrow and left-click the "Translation" item.

(c) Press the left mouse button and drag on the 3D structure.

(d) The 3D structure moves in the direction of the mouse pointer.

Rotation

(a) Select "Transformation --> Rotation" from the pop-up menu, which opens when you right-click the 3D structure window.

(b) Follow the arrow and left-click the "Rotation" item.

(c) Press the left mouse button and drag on the 3D structure.

(d) The 3D structure rotates in the direction of the mouse pointer.

Scaling

(a) Select "Transformation --> Scaling" from the pop-up menu, which opens when you right-click the 3D structure window.

(b) Follow the arrow and left-click the "Scaling" item.

(c) Press the left mouse button and drag on the 3D structure.

(d) The elements of the 3D structure are enlarged if the mouse is dragged to the right and shrunk if it is dragged to the left.
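For readers who want to see what the three transformations do to atomic coordinates, the following Python sketch applies them to a list of (x, y, z) points. It illustrates the standard geometry only; how REMUS maps mouse movement to offsets, angles, or scale factors is not documented here, and the function names, the choice of the z axis for rotation, and scaling about the centroid are assumptions.

```python
import math


def translate(points, dx, dy, dz):
    """Shift every point by the same offset."""
    return [(x + dx, y + dy, z + dz) for x, y, z in points]


def rotate_z(points, degrees):
    """Rotate every point about the z axis by the given angle."""
    angle = math.radians(degrees)
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def scale(points, factor):
    """Scale the points about their centroid, so the structure stays in place."""
    count = len(points)
    cx = sum(p[0] for p in points) / count
    cy = sum(p[1] for p in points) / count
    cz = sum(p[2] for p in points) / count
    return [
        (cx + factor * (x - cx), cy + factor * (y - cy), cz + factor * (z - cz))
        for x, y, z in points
    ]


atoms = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]  # toy coordinates
print(translate(atoms, 1.0, 0.0, -1.0))
print(scale(atoms, 2.0))
print(rotate_z(atoms, 90.0))
```

A factor greater than 1 in `scale` enlarges the structure and a factor between 0 and 1 shrinks it, which corresponds to dragging right or left in step (d) above.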

Category 2: Display

REMUS can display different structure modes and various properties. There are six functions in this category: scaling each point, showing charge properties, showing hydrophilic properties, showing search results, showing secondary structures, and restoring the default settings. Except for "Scaling each point", all of these functions can be applied to the same protein structure simultaneously.

Scaling each point

(a) Select "Display --> Scaling Each Point" from the pop-up menu, which opens when you right-click the 3D structure window.

(b) Follow the arrow and left-click the "Scaling each point" item.

(c) Press the left mouse button and drag on the 3D structure.

(d) Each C-alpha atom is enlarged if the mouse is dragged to the left and shrunk if it is dragged to the right.

Showing charge properties

(a) Select "Display --> Charge" from the pop-up menu, which opens when you right-click the 3D structure window.

(b) Residues that carry a charge are displayed in red.

Showing hydrophilic properties

(a) Select "Display --> Hydrophilic" from the pop-up menu, which opens when you right-click the 3D structure window.

(b) Residues that are hydrophilic are displayed in blue.

Showing search results

(a) Select "Display --> Show Search Results" from the pop-up menu, which opens when you right-click the 3D structure window.

(b) If the system has already extracted unique peptide motifs using "Automatic REMUS", "MERGE", "TRIM MERGE", "STRICT MERGE", or "TRIM STRICT MERGE", the "Show Search Results" function displays the corresponding segments in the same colors as in the text window of the main screen.

Showing secondary structures

(a) Select "Display --> Show Secondary Structure" from the pop-up menu, which opens when you right-click the 3D structure window.

(b) The alpha-helix segments are colored red and the beta-sheet segments are shown in cyan.

Restoring the default settings

(a) Select "To Default Settings" from the pop-up menu, which opens when you right-click the 3D structure window.

(b) The REMUS system returns the 3D structure to its original settings.

Category 3: Select Subsegment

Select "Select Subsegment" from the pop-up menu, which opens when you right-click the 3D structure window. The system opens a dialog box that lets users select a subsegment of the 3D protein structure. The selected subsegment is shown in yellow. There are two ways to specify it:

(a) Set the beginning position and the length of the subsegment, then click the "Display" button to view the selected subsegment.

(b) Type or paste the segment sequence and click the "Find" button. REMUS searches for the pattern and shows it in yellow if the segment is found in the structure.
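The two selection methods correspond to two simple string operations on the 1D sequence of the structure, sketched below in Python. The function names are hypothetical, and treating positions as 1-based is an assumption; this guide does not state how REMUS numbers positions in the dialog box.

```python
def select_by_position(sequence, start, length):
    """Return the subsegment of `length` residues beginning at 1-based `start`."""
    if start < 1 or length < 1 or start + length - 1 > len(sequence):
        raise ValueError("subsegment lies outside the sequence")
    return sequence[start - 1:start - 1 + length]


def find_segment(sequence, pattern):
    """Return the 1-based start of every occurrence of `pattern`, overlaps included."""
    if not pattern:
        return []
    positions = []
    index = sequence.find(pattern)
    while index != -1:
        positions.append(index + 1)
        index = sequence.find(pattern, index + 1)
    return positions


toy = "KETAAAKFERQ"  # toy sequence for illustration
print(select_by_position(toy, 2, 3))
print(find_segment(toy, "AA"))
```

An empty list from `find_segment` corresponds to the case in which the typed segment is not present in the structure, so nothing would be highlighted.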

Category 4: Lock Color

Select "Lock Color" from the pop-up menu, which opens when you right-click the 3D structure window. When this function is selected, the previous color settings are retained. In other words, the system can show charge properties, hydrophilic properties, unique segments found by the search, secondary structures, and the selected subsegment in different colors simultaneously.

Category 5: Rotation Matrix

Users can save the current rotation matrix after performing a series of geometric transformations, or reload an existing rotation matrix. The two operations, export and import, are described below.

Export

(a) Select "Rotation Matrix --> Export..." from the pop-up menu, which opens when you right-click the 3D structure window.

(b) Select an appropriate file path and type a suitable name under which to save the rotation information.

(c) Click "Save". The file containing the current geometric information of the structure is stored on disk under the specified path and file name.

Import

(a) Select "Rotation Matrix --> Import..." from the pop-up menu, which opens when you right-click the 3D structure window.

(b) Select one of the files created by the "Export" function and open it.

(c) The current protein 3D structure rotates automatically according to the specified file.

Category 6: Saving

Users can save the current structure information to new output files: the protein 1D sequence, a PDB file with new coordinates, and the protein secondary structure information. Each option is described below.

Save 1D Sequence

(a) Select "Save 1D Sequence" from the pop-up menu, which opens when you right-click the 3D structure window.

(b) Select a directory path and type an appropriate file name to save the output.

This file contains the 1D sequence information extracted from the original PDB files.

Save Current Coordinates in a PDB File

(a) Select "Save New PDB File" from the pop-up menu, which opens when you right-click the 3D structure window.

(b) Select the same directory path as the imported PDB files.

(c) Click "Save" to create a new PDB file with the currently adjusted 3D coordinates. The saved file has the same name as the selected file, with a ".new" extension.

Save Secondary Structure Information

(a) Select "Save Secondary Info..." from the pop-up menu, which opens when you right-click the 3D structure window.

(b) Select a directory path and type an appropriate file name.

(c) Click "Save" to record the protein secondary structure information, which is retrieved from the original PDB files.

 

Descriptions of the REMUS output

Automatic REMUS

The automatic REMUS system extracts unique peptide motifs with a degree value above the default lower limit after performing the strict merging operation. The segments in light blue or dark blue are the unique peptide motifs that satisfy the strict merging criterion and are longer than the default length limit (8 for amino acid sequences and 15 for DNA sequences), whereas the segments in purple are the overlapping regions between two unique peptide motifs.

Semi-Automatic REMUS

The semi-automatic REMUS system provides four different merging methods to enhance different uniqueness levels. The search results are shown in different colors to distinguish the methods; each case is described below.

Case 1. Merge Operator
The results display the unique peptide motifs with a degree value above the lower limit after the merge operation is performed (i.e. the unique peptide motifs that satisfy the merging criterion). The segments in red or dark red are the unique peptide motifs with a length longer than the lower limit, whereas the segments in green or dark green are the unique peptide motifs with a length shorter than the lower limit.

Case 2. Trim Merge Operator
The results display the unique peptide motifs with a degree value above the lower limit after the merge operation is performed, with the imperfectly unique residues at the two ends of the peptide motifs trimmed. The segments in red or dark red are the unique peptide motifs with a length longer than the lower limit, whereas the segments in green or dark green are the unique peptide motifs with a length shorter than the lower limit. The yellow regions represent the trimmed residues.

Case 3. Strict Merge Operator
The results display the unique peptide motifs with a degree value above the lower limit after the strict merge operation is performed (i.e. only the unique peptide motifs that satisfy the strict merging criterion and are longer than the lower limit are shown). The segments in light blue or dark blue are the unique peptide motifs that satisfy the strict merging criterion and are longer than the lower limit, whereas the segments in purple are the overlapping regions between two unique peptide motifs.

Case 4. Trim Strict Merge Operator
The results display the unique peptide motifs with a degree value above the lower limit after the strict merge operation is performed, with the imperfectly unique residues at the two ends of the peptide motifs trimmed. The segments in light blue or dark blue are the unique peptide motifs that satisfy the strict merging criterion and are longer than the lower limit, whereas the segments in purple are the overlapping regions between two unique peptide motifs. The yellow regions represent the trimmed residues.

 

The priority for ranking the unique peptide motifs

The unique peptide motifs are ranked according to four characteristics: the hydrophilicity of the residues, the charge of the residues, the number of prolines in the motif, and the proximity of the motif to the N- or C-terminus of the protein. A higher rank indicates better antigenicity of the unique peptide motif; such a motif may therefore be a suitable epitope for generating antibodies.
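The four characteristics can be computed for any motif with a few lines of Python, as sketched below. This guide does not specify how REMUS weighs or combines them, so the sketch only extracts the features and does not reproduce the ranking itself. The function name, the residue sets used for "hydrophilic" and "charged", and the 1-based start position are all assumptions made for illustration.

```python
# Assumed residue classes; REMUS may use different definitions.
HYDROPHILIC = set("RKDENQHST")
CHARGED = set("DEKR")


def motif_features(motif, start, protein_length):
    """Compute the four ranking characteristics for one motif.

    `start` is the assumed 1-based position of the motif's first residue in a
    protein of `protein_length` residues.
    """
    if not motif:
        raise ValueError("motif must not be empty")
    end = start + len(motif) - 1
    return {
        # fraction of residues in the assumed hydrophilic set
        "hydrophilic_fraction": sum(aa in HYDROPHILIC for aa in motif) / len(motif),
        # number of residues in the assumed charged set
        "charged_count": sum(aa in CHARGED for aa in motif),
        # number of prolines in the motif
        "proline_count": motif.count("P"),
        # residues between the motif and the nearer terminus
        "distance_to_terminus": min(start - 1, protein_length - end),
    }


# Hypothetical placement of the example motif in a 100-residue protein.
print(motif_features("DPRDSPRY", start=10, protein_length=100))
```

A ranking could then sort motifs on these values, but the direction and weight of each characteristic would have to be taken from the REMUS documentation or chosen by the user.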