This is an example of applying the REMUS system to identify unique peptide segments from a group of sequences. The step-by-step guide covers the basic strategy for an efficient search. The PDB files of the human RNase A superfamily (1e21:A, 1gqv:A, 1dyt:A, 1rnf:A, and 1b1i:A) are used as the input data, and the three-dimensional structures of the identified segments are shown and explained in the results. Download the example document here (REMUS-DOCS.doc).
Setting up a REMUS search
Decide the goals of unique peptide extraction and prepare the family sequences of interest. The family sequences can be either amino acid or DNA sequences.
The REMUS system accepts input sequences in three ways: importing FASTA sequence files, pasting FASTA sequence data directly into the text editor, or importing PDB files. Correct identification requires an appropriate input file format. Here we import five PDB files (1e21:A, 1gqv:A, 1dyt:A, 1rnf:A, and 1b1i:A). These five PDB files are included with the REMUS program and can be found in the directory "C:\Program Files\REMUS\PDBFiles" after installing REMUS.
After loading the PDB files, you can click an entry in the "Sources" listbox to view its three-dimensional structure individually.
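REMUS parses these inputs itself. To show what the two file-based routes contain, here is a minimal Python sketch (not REMUS code) that reads FASTA records and extracts a chain's one-letter sequence from the C-alpha ATOM records of a PDB file. The helper names `parse_fasta` and `pdb_chain_sequence` are hypothetical, and reading the sequence from C-alpha atoms rather than SEQRES records is an assumption.

```python
# Hypothetical helpers for illustration; not part of REMUS.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def pdb_chain_sequence(pdb_text, chain):
    """One-letter sequence of a chain, read from its C-alpha ATOM records."""
    residues, seen = [], set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        # PDB fixed columns: atom name 13-16, residue name 18-20,
        # chain ID 22, residue number 23-26, insertion code 27.
        if line[12:16].strip() == "CA" and line[21] == chain:
            key = line[22:27]
            if key not in seen:
                seen.add(key)
                residues.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(residues)
```

Non-standard residue names map to "X" here; that fallback is a design choice of this sketch.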
The REMUS system provides automatic/semi-automatic reinforced merging techniques and traditional brute-force searching techniques for extracting unique peptide motifs. Here we illustrate both the automatic and the semi-automatic methods.
(1) Automatic REMUS
(a) Select "Tools --> REMUS Method --> Automatic" from the menu bar of the REMUS system. The selected REMUS method is displayed to the right of the "Sources" listbox, and the search status is shown by the progress bar at the bottom of the main screen. Once the search finishes, the results are displayed on the main screen immediately.
(b) After you click "Automatic", the REMUS system runs the search algorithms immediately and displays the results. You can click the "Description" button to see how the search results are presented. For automatic searching, the identified segments are displayed alternately in blue and dark blue. Click the "OK" button in the Description popup window to close it.
(c) You can single-click any unique peptide motif in the results to open a popup tooltip showing its position and related information. In this example, the popup tooltip shows "Rnase3_A.pdb:A/DPRDSPRY/3(114)/
(d) You can double-click any unique peptide motif in the results to see its position and related information. Here we double-click the first motif found, "DPRDSPRY"; REMUS highlights the segment's position in the sequence in orange and simultaneously opens a window showing its three-dimensional position.
(2) Semi-Automatic REMUS
(a) Select "Tools --> REMUS Method --> Semi Automatic" from the menu bar of the REMUS system. The selected REMUS method is displayed to the right of the "Sources" listbox, and a popup window for parameter settings appears.
For the semi-automatic REMUS system, parameter settings are organized into three main phases, shown on the left side of the window: "Grouping Phase", "Searching Phase", and "Merging Phase". Each module and its settings are discussed and shown in the example.
The main purpose of the grouping module is to incorporate tolerant features (substitutable amino acids) into the representation of unique peptide segments. This grouping module is optional and can be skipped in semi-automatic REMUS procedures if tolerant features are not needed. Since the likely substitutions between amino acids, derived from observed frequencies in alignments of related proteins, are well defined, we provide both a manual assignment interface and an automatic clustering algorithm to assemble the 20 amino acids into several independent groups. If the clustering algorithm is selected, users can assign threshold parameters and the reference scoring matrix. The substitution matrix can be selected from the standard BLOSUM/PAM matrices or created from the user's own database of aligned blocks. In this example, we use "set Group Manually" to illustrate the tolerant feature. Here we define the substitutable groups as ((A), (R,K), (N), (D,Q,E), (C), (G), (H,F,W,Y), (I,L,M,V), (P), (S), (T)), where amino acids within the same cluster are treated as identical during matching. To obtain these settings, we click the "set Group Manually" checkbox, and the system pops up the "Group Input" window. We then follow the group definitions to assign the amino acids to the appropriate groups. For example, we click the option button of "group
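The effect of the manual grouping can be sketched as rewriting each sequence in a reduced alphabet, one symbol per group, so that amino acids in the same group compare as equal. This is a minimal Python sketch, not REMUS code: the groups are the ones defined in this example, while the helper names and the choice of lowercase letters as group symbols are hypothetical.

```python
# Manual groups from this example; members of a group match each other.
GROUPS = ["A", "RK", "N", "DQE", "C", "G", "HFWY", "ILMV", "P", "S", "T"]


def build_group_map(groups):
    """Map each amino acid to one symbol per group (hypothetical symbols a, b, c, ...)."""
    mapping = {}
    for index, members in enumerate(groups):
        symbol = chr(ord("a") + index)
        for aa in members:
            if aa in mapping:
                raise ValueError(f"{aa} is assigned to more than one group")
            mapping[aa] = symbol
    return mapping


def to_group_symbols(sequence, mapping):
    """Rewrite a sequence in the reduced alphabet; unknown residues become '?'."""
    return "".join(mapping.get(aa, "?") for aa in sequence.upper())


mapping = build_group_map(GROUPS)
print(len(mapping), len(set(mapping.values())))  # → 20 11
print(to_group_symbols("DPRDSPRY", mapping))     # → dibdjibg
```

With 11 symbols instead of 20, R and K (for example) become the same symbol, so windows that differed only by such a substitution are no longer distinguished.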
For the searching module, the REMUS system employs the Boyer-Moore matching algorithm to extract primary unique patterns while reducing time complexity. Primary unique patterns are the basic elements for merging operations. Since each set of grouped amino acids is represented by a unique symbol, the algorithm examines all candidate patterns and extracts the positional information of primary unique patterns by scanning all substituted sequences. If the previous module produces fewer than 20 groups, substitutions within a group are tolerated, and fewer primary unique patterns can be extracted from the family set. In addition to using the clustering results to describe the features of uniqueness, this module provides a statistical analysis showing the level of uniqueness determined for each extracted pattern. Users can assign the number of mismatches, which should be less than the length of a primary unique pattern. Suppose there are $N$ sequences in a family set $Z$, with $Z_I$ denoting the $I$th sequence in $Z$; the length of a primary unique pattern is $n$, and $m$ symbols are reserved for each pattern, i.e. up to $(n-m)$ mismatches are allowed during matching. The representative level of uniqueness can then be quantified. These quantitative percentages represent the level of uniqueness and range from 0% to 100%. All primary unique patterns, with their representative percentages, are sent to the next module, which performs merge operations based on their neighboring conditions and the threshold settings for representative levels of uniqueness.
To decide the length of the primary segment, click the "primary length analysis" button to obtain a suggested length for the fundamental unique segment. In this example, the REMUS system suggests "3" as the length of the primary segment (i.e. the window size for string matching). If we require search results with no mismatches, we must enter "3" for "reserved site" in this case. After setting the length parameters for the primary unique segment, click the "Search" button to obtain the matching results. The matched strings in each sequence are displayed in the text box on the right. Each row represents a primary unique segment and its related information, such as uniqueness level, sequence number, and position.
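The guide names Boyer-Moore matching but does not spell out the procedure. The sketch below is one plausible reading, not REMUS's implementation: it assumes a primary unique pattern is a length-$n$ window (in the grouped alphabet) that occurs in exactly one sequence of the family, and it uses the simpler Boyer-Moore-Horspool variant instead of full Boyer-Moore. The reserved-site check counts a hit when at least $m$ of the $n$ symbols agree, i.e. at most $n - m$ mismatches. All function names are hypothetical.

```python
# Hypothetical sketch; Horspool is a simplified Boyer-Moore variant.
def horspool_find_all(pattern, text):
    """Return all start positions of pattern in text (Boyer-Moore-Horspool)."""
    n, t = len(pattern), len(text)
    if n == 0 or n > t:
        return []
    # Bad-character shift: distance from a symbol's last occurrence
    # (excluding the final position) to the end of the pattern.
    shift = {ch: n - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits, pos = [], 0
    while pos <= t - n:
        if text[pos:pos + n] == pattern:
            hits.append(pos)
        pos += shift.get(text[pos + n - 1], n)
    return hits


def primary_unique_patterns(sequences, n):
    """Map each length-n window found in exactly one sequence to
    (sequence index, start positions). Assumed definition of uniqueness."""
    candidates = {s[i:i + n] for s in sequences for i in range(len(s) - n + 1)}
    unique = {}
    for window in sorted(candidates):
        owners = {}
        for idx, seq in enumerate(sequences):
            hits = horspool_find_all(window, seq)
            if hits:
                owners[idx] = hits
        if len(owners) == 1:
            (idx, hits), = owners.items()
            unique[window] = (idx, hits)
    return unique


def matches_with_reserved(window, text, m):
    """Start positions where at least m of the len(window) symbols agree,
    i.e. at most len(window) - m mismatches."""
    n = len(window)
    return [i for i in range(len(text) - n + 1)
            if sum(a == b for a, b in zip(window, text[i:i + n])) >= m]
```

In this setting the sequences passed in would already be rewritten in the grouped alphabet, and entering a reserved-site value equal to the window length (3 in the example) reduces `matches_with_reserved` to exact matching.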
For the final merging phase, the merge operation is designed to emphasize the differences within a family set and the neighboring relationships between patterns, instead of using traditional concatenation. Two matched primary unique patterns can be merged if they share overlapping symbols and both satisfy the minimum representative percentage criterion. To reinforce the uniqueness of the merged patterns, a strict merging operation is also provided, which ensures that each subsegment of the merged pattern still possesses unique properties. Assume two primary unique patterns of the same length $n$ are merged. The strict merging requirement is satisfied only when the overlap length equals $n-1$, i.e. when the merged segment has length $n+1$. Therefore, strictly merged unique patterns are built from fundamental unique descriptors and preserve the most unique characteristics of the sequences. After all candidate merged segments are obtained, the system provides a trimming function that returns a substring of each merged segment with symbols stripped from the beginning and the end. The function evaluates the first and last $(n-1)$ symbols of each pattern in the merged unique pattern set; these $(n-1)$ symbols are trimmed off when they match another merged unique pattern. Through these three modules, the unique peptide segments of each sequence in a protein family set are located efficiently. After this reinforced processing, the merged unique peptide segments satisfy the criteria of tolerance and the representative level of uniqueness.
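The merging rules can be sketched on the start positions of primary unique windows within one sequence. This is a minimal Python sketch under stated assumptions, not REMUS code: "merge" joins windows that share at least one symbol, "strict merge" joins only windows whose overlap is $n-1$ (starts one position apart), and the trimming rule is an interpretation of the description above. The minimum representation percentage filter is omitted, and all names are hypothetical.

```python
# Hypothetical sketch of the merge rules described above; not REMUS code.
def merge_windows(starts, n, strict=False):
    """Merge length-n windows on one sequence into (start, end) segments.

    strict=False: windows sharing at least one symbol are merged.
    strict=True: only windows overlapping by n - 1 symbols (starts one apart)
    are merged, so every length-n subsegment is itself a primary unique pattern.
    """
    max_step = 1 if strict else n - 1
    merged = []  # entries: [segment start, segment end, last window start]
    for s in sorted(set(starts)):
        if merged and s - merged[-1][2] <= max_step:
            merged[-1][1] = s + n
            merged[-1][2] = s
        else:
            merged.append([s, s + n, s])
    return [(first, end) for first, end, _ in merged]


def filter_min_length(segments, min_len):
    """Keep segments at least min_len symbols long (8 in this example)."""
    return [(a, b) for a, b in segments if b - a >= min_len]


def trim(segment, others, n):
    """Assumed trimming rule: drop the first and/or last n - 1 symbols of a
    merged segment when they also occur in another merged segment."""
    k = n - 1
    if k <= 0:
        return segment
    if any(segment[:k] in other for other in others):
        segment = segment[k:]
    if len(segment) > k and any(segment[-k:] in other for other in others):
        segment = segment[:-k]
    return segment
```

With $n = 3$ and a minimum length of 8, a strictly merged segment needs six consecutive window starts, since its length is (last start - first start) + 3.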
Here we have to assign the minimum length and minimum representation percentage parameters before the merging operations. In this example, we assign "8" as the minimum length of merged segments and "100%" as the minimum percentage for each primary unique segment. Then we can click one of the four merging operations to obtain unique peptide segments that meet different requirements. If we click the "Trim Strict Merge" button, the merged results are shown in the following window. All other system functions are the same as those introduced for the "automatic REMUS" methodology.
Users can perform a variety of display operations by right-clicking anywhere inside the Protein 3D structure window. This opens a pop-up menu with six major function categories: Transformation, Display, Set Sequence Color, Lock Color, Rotation Matrix, and Saving. These functions are applied to each protein structure individually.
Category 1: Transformation
Users can perform three basic types of 3D geometric transformation: translation, rotation, and scaling.
Translation
(a) Select "Transformation --> Translation" from the pop-up menu by right-clicking the 3D structure window.
(b) Following the arrow, left-click the "Translation" item.
(c) Press and drag the left mouse button on the 3D structure.
(d) You should now observe that the 3D structure moves in the direction of the mouse pointer.
Rotation
(a) Select "Transformation --> Rotation" from the pop-up menu by right-clicking the 3D structure window.
(b) Following the arrow, left-click the "Rotation" item.
(c) Press and drag the left mouse button on the 3D structure.
(d) You should now observe that the 3D structure rotates in the direction of the mouse pointer.
Scaling
(a) Select "Transformation --> Scaling" from the pop-up menu by right-clicking the 3D structure window.
(b) Following the arrow, left-click the "Scaling" item.
(c) Press and drag the left mouse button on the 3D structure.
(d) The elements of the 3D structure are dilated if the mouse is dragged to the right, or shrunk if the mouse is dragged to the left.
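The three mouse-driven transformations correspond to standard geometric operations on the atomic coordinates. A minimal sketch in plain Python, assuming coordinates are (x, y, z) tuples, rotation about the z axis only, and scaling about the centroid; how REMUS maps mouse movement to these parameters is not described in this guide, and the function names are hypothetical.

```python
import math


# Hypothetical sketch of the three geometric transformations; not REMUS code.
def translate(coords, dx, dy, dz):
    """Shift every point by (dx, dy, dz)."""
    return [(x + dx, y + dy, z + dz) for x, y, z in coords]


def rotation_z(angle_deg):
    """3x3 matrix for a counter-clockwise rotation about the z axis."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_matrix(coords, m):
    """Multiply each point (as a column vector) by the 3x3 matrix m."""
    return [tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z for r in range(3))
            for x, y, z in coords]


def scale(coords, factor):
    """Dilate (factor > 1) or shrink (factor < 1) about the centroid (assumed)."""
    k = len(coords)
    cx, cy, cz = (sum(p[i] for p in coords) / k for i in range(3))
    return [(cx + factor * (x - cx), cy + factor * (y - cy), cz + factor * (z - cz))
            for x, y, z in coords]
```

Accumulating successive rotations as one 3x3 matrix is also what makes it possible to export and re-import an orientation, as in the Rotation Matrix category.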
Category 2: Display
REMUS can display different structure modes and various properties. There are six functions in this category: scaling each point, showing charge properties, showing hydrophilic properties, showing search results, showing secondary structures, and recovering the default settings. Except for "Scaling each point", all of these functions can be applied to the same protein structure simultaneously.
Scaling each point
(a) Select "Display --> Scaling Each Point" from the pop-up menu by right-clicking the 3D structure window.
(b) Following the arrow, left-click the "Scaling each point" item.
(c) Press and drag the left mouse button on the 3D structure.
(d) Each C-alpha atom is dilated if the mouse is dragged to the left, or shrunk if the mouse is dragged to the right.
Showing charge properties
(a) Select "Display --> Charge" from the pop-up menu by right-clicking the 3D structure window.
(b) Residues with charge properties are displayed in red.
Showing hydrophilic properties
(a) Select "Display --> Hydrophilic" from the pop-up menu by right-clicking the 3D structure window.
(b) Residues with hydrophilic properties are displayed in blue.
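The two property displays amount to a per-residue color rule. The guide does not list which residues REMUS treats as charged or hydrophilic, so the residue sets below are assumptions (a common textbook choice), as is the default color for unmarked residues; the function name is hypothetical.

```python
# Assumed residue classes: the guide does not list REMUS's definitions.
CHARGED = set("DEKR")            # acidic and basic side chains at neutral pH
HYDROPHILIC = set("DEKRHNQST")   # a common polar/charged selection


def residue_colors(sequence, mode, default="white"):
    """One color per residue for the "Charge" (red) or "Hydrophilic" (blue) display.

    The default color for unmarked residues is a hypothetical choice.
    """
    if mode == "charge":
        members, color = CHARGED, "red"
    elif mode == "hydrophilic":
        members, color = HYDROPHILIC, "blue"
    else:
        raise ValueError("mode must be 'charge' or 'hydrophilic'")
    return [color if aa in members else default for aa in sequence.upper()]
```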
Showing search results
(a) Select "Display --> Show Search Results" from the pop-up menu by right-clicking the 3D structure window.
(b) If the system has performed unique peptide motif extraction by "Automatic REMUS", "MERGE", "TRIM MERGE", "STRICT MERGE", or "TRIM STRICT MERGE", the "Show Search Results" function displays the corresponding segments in the same colors as in the main screen text window.
Showing secondary structures
(a) Select "Display --> Show Secondary Structure" from the pop-up menu by right-clicking the 3D structure window.
(b) Alpha-helix segments are colored red and beta-sheet segments cyan.
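This guide does not say how REMUS obtains secondary structure; one common source is the HELIX and SHEET records of the PDB file. A minimal sketch, assuming those standard fixed-column records (insertion codes ignored), that assigns the red/cyan colors described above; the function name is hypothetical.

```python
# Hypothetical sketch: reads standard PDB HELIX/SHEET records (fixed columns,
# insertion codes ignored). REMUS's own source of secondary structure is assumed.
def secondary_structure_colors(pdb_text, chain):
    """Map residue numbers of `chain` to "red" (helix) or "cyan" (sheet strand)."""
    colors = {}
    for line in pdb_text.splitlines():
        if len(line) < 37:
            continue
        if line.startswith("HELIX ") and line[19] == chain:
            # HELIX: initChainID col 20, initSeqNum cols 22-25, endSeqNum cols 34-37
            start, end, color = int(line[21:25]), int(line[33:37]), "red"
        elif line.startswith("SHEET ") and line[21] == chain:
            # SHEET: initChainID col 22, initSeqNum cols 23-26, endSeqNum cols 34-37
            start, end, color = int(line[22:26]), int(line[33:37]), "cyan"
        else:
            continue
        for resnum in range(start, end + 1):
            colors[resnum] = color
    return colors
```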
Recovering the default settings
(a) Select "To Default Settings" from the pop-up menu by right-clicking the 3D structure window.
(b) The REMUS system returns the 3D structure to its original settings.
Category 3: Select Subsegment
Select "Select Subsegment" from the pop-up menu by right-clicking the 3D structure window, and the system pops up a dialog box that lets users select a subsegment of the 3D protein structure. There are two ways to select a subsegment, which is then shown in yellow:
(1) Set the beginning position and the length of the subsegment, then click the "Display" button to view the selected subsegment.
(2) Type or paste the segment sequence and click the "Find" button. REMUS searches for the pattern and shows it in yellow if the segment is found in the structure.
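The two selection methods correspond to slicing by position and searching for a pattern. A minimal sketch, assuming 1-based positions in the dialog (the guide does not say) and highlighting every exact occurrence of a typed pattern (REMUS may show only the first); both function names are hypothetical.

```python
# Hypothetical sketch of the two subsegment selection methods; not REMUS code.
def subsegment_by_position(sequence, begin, length):
    """Method 1: (start, end) slice bounds for a 1-based begin position (assumed)."""
    if begin < 1 or length < 1 or begin - 1 + length > len(sequence):
        raise ValueError("subsegment lies outside the sequence")
    return begin - 1, begin - 1 + length


def subsegment_by_pattern(sequence, pattern):
    """Method 2: (start, end) of every exact occurrence of pattern, or []."""
    pattern = pattern.strip().upper()
    hits, start = [], sequence.find(pattern)
    while pattern and start != -1:
        hits.append((start, start + len(pattern)))
        start = sequence.find(pattern, start + 1)
    return hits
```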
Category 4: Lock Color
Select "Lock Color" from the pop-up menu by right-clicking the 3D structure window. When this function is selected, the previous color settings are retained. In other words, the system can show charge properties, hydrophilic properties, searched unique segments, secondary structures, and the selected subsegment in different colors simultaneously.
Category 5: Rotation Matrix
Users can save the current rotation matrix after performing a series of geometric transformations, or reload an existing rotation matrix. The two operations are described below.
Export
(a) Select "Rotation Matrix --> Export..." from the pop-up menu by right-clicking the 3D structure window.
(b) Select an appropriate file path and type a file name for the rotation information.
(c) Click "Save". The file containing the current geometric information of the structure is stored on your disk under the specified path and file name.
Import
(a) Select "Rotation Matrix --> Import..." from the pop-up menu by right-clicking the 3D structure window.
(b) Select one of the files created by the "Export" function and open it.
(c) The current protein 3D structure rotates automatically according to the specified file.
Category 6: Saving
Users can save the current structure information as new output files: the protein 1D sequence, a PDB file with new coordinates, and the protein secondary structure information. These options are described below.
Save 1D Sequence
(a) Select "Save 1D Sequence" from the pop-up menu by right-clicking the 3D structure window.
(b) Select a directory path and type an appropriate file name to save the output. This file contains the 1-D sequence information extracted from the original PDB file.
Save Current Coordinates in a PDB File
(a) Select "Save New PDB File" from the pop-up menu by right-clicking the 3D structure window.
(b) Select the same directory path as the imported PDB files.
(c) Click "Save" to create a new PDB file with the current adjusted 3D coordinates. The saved file has the same name as the selected file, with a ".new" extension.
Save Secondary Structures Information
(a) Select "Save Secondary Info..." from the pop-up menu by right-clicking the 3D structure window.
(b) Select a directory path and type an appropriate file name.
(c) Click "Save" to record the protein secondary structure information retrieved from the original PDB file.
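Saving a PDB file with the current coordinates amounts to rewriting the x, y, z fields (columns 31-54, three 8.3f fields) of each ATOM/HETATM record. A minimal sketch, assuming the current orientation is a 3x3 rotation matrix plus an optional translation; appending ".new" to the full file name is one reading of the naming rule above, REMUS's own rotation-matrix file format is not documented here, and the function names are hypothetical.

```python
# Hypothetical sketch; not REMUS code.
def rewrite_coordinates(pdb_text, matrix, shift=(0.0, 0.0, 0.0)):
    """Return PDB text with ATOM/HETATM coordinates replaced by matrix*xyz + shift."""
    out = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")) and len(line) >= 54:
            x, y, z = float(line[30:38]), float(line[38:46]), float(line[46:54])
            new = [matrix[r][0] * x + matrix[r][1] * y + matrix[r][2] * z + shift[r]
                   for r in range(3)]
            # Coordinates occupy columns 31-54 as three 8.3f fields.
            line = line[:30] + "".join(f"{v:8.3f}" for v in new) + line[54:]
        out.append(line)
    return "\n".join(out) + "\n"


def save_new_pdb(path, matrix, shift=(0.0, 0.0, 0.0)):
    """Write the transformed structure as <path>.new (assumed naming)."""
    with open(path) as handle:
        text = handle.read()
    with open(path + ".new", "w") as handle:
        handle.write(rewrite_coordinates(text, matrix, shift))
```

Records other than ATOM/HETATM (headers, HELIX/SHEET, and so on) are copied unchanged, so the secondary structure annotation survives the rewrite.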
Descriptions of the REMUS output
Automatic REMUS
The automatic REMUS system extracts unique peptide motifs whose degree value is above the default lower limit after performing the strict merging operation. The segments in light blue or dark blue are unique peptide motifs that satisfy the strict merging criteria and are longer than the default length limits (8 for amino acid and 15 for DNA sequences), whereas the segments in purple are overlapping regions between two unique peptide motifs.
Semi-Automatic REMUS
The semi-automatic REMUS system provides four different merging methods that enforce different levels of uniqueness. The search results are shown in different colors to distinguish their outcomes; each case is described in the following section.
Case 1. Merge Operator
Case 2. Trim Merge Operator
Case 3. Strict Merge Operator
Case 4. Strict Trim Merge Operator
The priority for ranking the unique peptide motifs
The unique peptide motifs are ranked according to four characteristics: the hydrophilicity and charge of the residues, the number of prolines in the motif, and the proximity of the motif to the N- or C-terminus of the protein. A higher position in the ranking indicates better antigenicity of the unique peptide motif; hence the motif may be a suitable epitope for generating antibodies.
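The guide names the four ranking criteria but not how they are weighted or combined. A minimal sketch, assuming they are applied lexicographically in the listed order, with more hydrophilic, more charged, more prolines, and closer to either terminus all ranking higher; the residue sets, the direction of each criterion, and all names here are assumptions, not REMUS's documented scoring.

```python
from dataclasses import dataclass

# Assumed residue classes and criterion directions; REMUS's actual
# scoring is not specified in this guide.
HYDROPHILIC = set("DEKRHNQST")
CHARGED = set("DEKR")


@dataclass
class Motif:
    sequence: str
    start: int           # 0-based start position in the protein
    protein_length: int  # total residues in the protein


def rank_key(motif):
    """Tuple compared lexicographically: criteria in the order listed above."""
    seq = motif.sequence.upper()
    hydrophilic = sum(aa in HYDROPHILIC for aa in seq) / len(seq)
    charged = sum(aa in CHARGED for aa in seq) / len(seq)
    prolines = seq.count("P")
    end = motif.start + len(seq)
    terminal_distance = min(motif.start, motif.protein_length - end)
    # Negate the distance so that motifs nearer a terminus sort higher.
    return (hydrophilic, charged, prolines, -terminal_distance)


def rank_motifs(motifs):
    """Highest-ranked (assumed most antigenic) motifs first."""
    return sorted(motifs, key=rank_key, reverse=True)
```

A weighted sum of the four features would be an equally plausible design; the lexicographic order simply follows the "priority" wording of the heading.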