PAComplex: Document

Introduction:

What can PAComplex do for you?

PAComplex is a peptide antigen search server for analyzing novel peptide antigens and inferring homologous peptides. To the best of our knowledge, PAComplex is the first web server that investigates both peptide-MHC and peptide-TCR interfaces to infer peptide antigens and homologous peptide antigens of a query from complete pathogen genome databases and experimental peptide databases. For a query, PAComplex shows the detailed atomic interactions, binding models, homologous peptide antigens, and joint Z-value of each peptide candidate. In contrast, most current peptide-antigen prediction web servers, with the exception of MODPROPEP, provide only ranks and scores for the query protein sequence or peptides (Table 1).

Table 1. Comparing PAComplex with four web servers for MHC-peptide interactions
Features	PAComplex	SYFPEITHI	SVMHC	PREDEP	MODPROPEP
Input format	Protein sequence / Peptide sets	Protein sequence	Protein sequence / Swiss-Prot AC/ID / RefSeq ID	Protein sequence	Protein sequence / Peptide sets
Technique / Algorithm	Template-based model and Homologous search	Motif pattern	Machine learning	Protein sequence / Peptide sets	Protein sequence / Peptide sets
Peptide-MHC	Yes	Yes	Yes	Yes	Yes
Peptide-TCR	Yes	No	No	No	No
Cutoff of prediction	Yes (Joint Z-value ≥ 4)	No	Yes (Score > 0)	No	No
Binding model visualization	Yes	No	No	No	Yes
Interaction type	Hydrogen bonds and VDW forces	No	No	No	No
Homologous peptides	Yes	No	No	No	No
Database search	Yes	No	No	No	No

Input Format:

There are two ways to submit data for prediction: a protein sequence can be pasted in FASTA format, or peptides of a specific length can be entered as a list.

FASTA format of protein sequence (example 1: quick link to search result)

The input sequence is in FASTA format, which consists of a single-line description followed by lines of sequence data. The first character of the description line is a ">" symbol in the first column. For example:

>sp|P03155|DPOL_HBVD1 Protein P (Fragment) OS=Hepatitis B virus genotype D subtype adw (isolate United Kingdom/adyw/1979) GN=P PE=1 SV=1
MPLSYQRFRRLLLLDDEAGPLEEELPRLADEDLNRRVAEDLNLGNLNVSIPWTHKVGNFT
GLYSSTVPVFNPHWKPPSFPNIHLHQDIIKKCEQFVGPLTVNEKRRLKLIMPARFYPNFT
KYLPLDKGIKPYYPEHLVNHYFQTRHYLHTLWKAGVLYKRVSTHSASFCGSPYSWEQELQ
HGAESFHQQSSGILSRPPVGSSLQSKHQQSRLGLQSQQGHLARRQQGRSWSIRARVHPTA
RRPFGVEPSGSGHNANLASKSASCLYQSPVRTAAYPAVSTSENHSSSGHALELHNLPPNS
ARSQSERPVFPCWWLQFRDSKPCSDYYLSHIVNLLEDWGPCAEHGEHHIRIPRTPARVTG
GVFLVDKNPHNTAESRLVVDFSQFSRGNYRVSWPKFAVPNLQSLTNLLSSNLSWLSLDVS
AAFYHLPLHPAAMPHLLVGSSGLSRYVARLSSNSRIINHQHGILQNLHDSCSRNLYVSLL
LLYKTFGWKLHLYSHPIILGFRKIPMGVGLSPFLLAQFTSAICSVVRRAFPHCLAFSYMD
DVVLGAKSVQHLESLFTAVTNFLLSLGIHLNPNKTKRWGYSLNFMGYVIGCWGSLPQDHI
IHKIKECFRKLPVHRPIDWKVCQRIVGLLGFAAPFTQCGYPALMPLYACIQSKQAFTFSP
TYKAFLCKQYLNLYPVAEQRPGLCQVFADATPTGWGLVMGHQRMRGTFLAPLPIHTAELL
AACFARSRSGANILGTDNSVVLSRKYTSFP
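
Before submitting, it can help to check that a sequence follows the FASTA layout described above. The following is a minimal sketch, not the server's own parser; the function name parse_fasta and its error handling are assumptions made for illustration.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) tuples from FASTA text.

    Hypothetical helper: it only checks the layout described above
    (a '>' description line followed by lines of sequence data).
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
        else:
            raise ValueError("sequence data found before a '>' description line")
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>sp|P03155|DPOL_HBVD1 Protein P (Fragment)
MPLSYQRFRRLLLLDDEAGPLEEELPRLADEDLNRRVAEDLNLGNLNVSIPWTHKVGNFT
GLYSSTVPVFNPHWKPPSFPNIHLHQDIIKKCEQFVGPLTVNEKRRLKLIMPARFYPNFT
"""
records = parse_fasta(example)
description, sequence = records[0]
print(description)
print(len(sequence))
```

The sequence lines are joined into one string, so line wrapping in the pasted input does not affect the result.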

The list of peptide sequences (example 3)

The list consists of peptides of a specific length, written in single-letter amino acid code. For example:

ALWGFFPVL

SLLMWITQV

GILGFVFTL

RLWHYPCTI

SIVAYTMSL

DLMGYIPAV

IMSSFEFQV

ALWDSNFFT

CNYSKFWYL

YLVIYLNRT

MQWLTQYYI
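
A peptide list like the one above can be checked locally before submission. This is a minimal sketch under two assumptions that the page does not state explicitly: that all peptides in one list must share the same length, and that only the 20 standard amino acid letters are accepted. The name read_peptides is hypothetical.

```python
# The 20 standard amino acids in single-letter code (assumed accepted alphabet).
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def read_peptides(text):
    """Return the peptides in `text`, one per non-blank line.

    Raises ValueError if the list is empty, if the peptides differ in
    length, or if a peptide contains a non-standard letter.
    """
    peptides = [line.strip().upper() for line in text.splitlines() if line.strip()]
    if not peptides:
        raise ValueError("no peptides given")
    length = len(peptides[0])
    for peptide in peptides:
        if len(peptide) != length:
            raise ValueError(f"{peptide!r} is not {length} residues long")
        unknown = set(peptide) - STANDARD_AMINO_ACIDS
        if unknown:
            raise ValueError(f"{peptide!r} contains unknown letters: {sorted(unknown)}")
    return peptides


peptides = read_peptides("ALWGFFPVL\n\nSLLMWITQV\n\nGILGFVFTL\n")
print(peptides)
```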

Scoring function:

Joint Z-value

To evaluate the complex similarity between TCR-pMHC and TCR-p'MHC, we define the joint Z-value (J_z) as:

The Z_MHC and Z_TCR of a TCR-p'MHC candidate with interaction score (E) are calculated as (E − μ)/σ, where μ is the mean and σ is the standard deviation of the interaction scores of 10,000 random interfaces. For a TCR-pMHC template collected from the Protein Data Bank (PDB), these 10,000 random interfaces are generated by substituting residues with other amino acids drawn according to the amino acid composition derived from UniProt.
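
The Z-value calculation above can be illustrated with a short sketch. It is not the PAComplex implementation: the scoring function toy_score is a hypothetical stand-in for the real interaction score E, the uniform COMPOSITION is a placeholder for the UniProt-derived amino acid composition, and the use of the population standard deviation is an assumption. Only the formula (E − μ)/σ and the 10,000 random samples come from the text.

```python
import random
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Placeholder: uniform frequencies, NOT the UniProt-derived composition.
COMPOSITION = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def random_peptide(length, rng):
    """Draw a peptide whose residues follow COMPOSITION."""
    residues = rng.choices(list(COMPOSITION), weights=list(COMPOSITION.values()), k=length)
    return "".join(residues)


def toy_score(peptide):
    """Hypothetical stand-in for the interaction score E.

    Counts large aliphatic and aromatic residues; the real score uses
    the knowledge-based scoring matrices.
    """
    return sum(1.0 for aa in peptide if aa in "VLIMFYW")


def z_value(score, background_scores):
    """Return (E - mu) / sigma against a background score distribution."""
    mu = statistics.mean(background_scores)
    sigma = statistics.pstdev(background_scores)  # assumption: population SD
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    return (score - mu) / sigma


rng = random.Random(0)  # fixed seed so the sketch is reproducible
background = [toy_score(random_peptide(9, rng)) for _ in range(10000)]
z = z_value(toy_score("GILGFVFTL"), background)
print(round(z, 2))
```

In this sketch the same function would be applied twice, once with peptide-MHC scores and once with peptide-TCR scores, to obtain Z_MHC and Z_TCR.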

Knowledge-based scoring matrices for peptide-MHC interactions

Knowledge-based peptide-MHC interaction scoring matrices: (A) sidechain-sidechain van der Waals scoring matrix; (B) sidechain-backbone van der Waals scoring matrix; (C) sidechain-sidechain special-bond scoring matrix; (D) sidechain-backbone special-bond scoring matrix. The sidechain-sidechain scoring matrices are symmetric, whereas the sidechain-backbone scoring matrices are nonsymmetric. In the sidechain-sidechain van der Waals scoring matrix, the scores are high if large aliphatic residues (i.e. Val, Leu, Ile, and Met) interact with large aliphatic residues, or if aromatic residues (i.e. Phe, Tyr, and Trp) interact with aromatic residues. In contrast, the scores are low when nonpolar residues interact with polar residues. In the sidechain-sidechain special-bond scoring matrix, the scores are high when the interacting residues form a disulfide bond (i.e. Cys with Cys) or when basic residues (i.e. Arg, Lys, and His) interact with acidic residues (Asp and Glu). The scores are zero if nonpolar residues interact with other residues.
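
The difference between the symmetric and nonsymmetric matrices shows up in how a residue pair is looked up. The sketch below illustrates this only; every score in it is a made-up placeholder rather than a value from the PAComplex matrices, and the function names and the default of 0.0 for missing pairs are assumptions.

```python
# Placeholder scores for illustration only; NOT values from the PAComplex matrices.

# Symmetric sidechain-sidechain matrix: each pair is stored once, keyed by the
# alphabetically sorted residue names, so (a, b) and (b, a) give the same score.
SIDECHAIN_SIDECHAIN = {
    ("LEU", "VAL"): 1.0,
    ("PHE", "TRP"): 1.0,
}

# Nonsymmetric sidechain-backbone matrix: the key order matters,
# (residue contributing the sidechain, residue contributing the backbone).
SIDECHAIN_BACKBONE = {
    ("LEU", "GLY"): 0.5,
}


def sidechain_sidechain_score(a, b):
    """Order-independent lookup; missing pairs default to 0.0 (assumption)."""
    return SIDECHAIN_SIDECHAIN.get(tuple(sorted((a, b))), 0.0)


def sidechain_backbone_score(sidechain_residue, backbone_residue):
    """Order-dependent lookup; missing pairs default to 0.0 (assumption)."""
    return SIDECHAIN_BACKBONE.get((sidechain_residue, backbone_residue), 0.0)


print(sidechain_sidechain_score("VAL", "LEU"), sidechain_sidechain_score("LEU", "VAL"))
print(sidechain_backbone_score("LEU", "GLY"), sidechain_backbone_score("GLY", "LEU"))
```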

Knowledge-based scoring matrix for peptide-TCR interactions

This residue-based matrix is derived from a non-redundant set of 62 structural antigen-antibody complexes constructed by Ponomarenko et al. The interface prefers aromatic residues (i.e. Phe, Trp, and Tyr), which interact with aliphatic residues (i.e. Ala, Val, Leu, Ile, and Met) or long side-chain polar residues (i.e. Gln, His, Arg, Lys, and Glu) to form strong van der Waals forces (yellow boxes). Additionally, the scores are high if basic residues (i.e. Arg and Lys) interact with acidic residues (i.e. Asp and Glu). In contrast, the scores are low (purple box) when nonpolar residues interact with polar residues.