Secreted proteins

Secretory proteins, together making up the secretome, can be defined as proteins that are actively transported out of the cell. In humans, cells such as endocrine cells and B-lymphocytes are specialized in protein secretion, but all cells secrete proteins to a certain extent. Proteins that are secreted from the cell play a crucial role in many physiological, developmental and pathological processes and are important for both intercellular and intracellular communication. In addition to being a rich source of new therapeutics and drug targets, a large fraction of the blood diagnostic tests used in the clinic are directed towards secreted proteins, emphasizing the importance of this class of proteins for medicine and biology. Medically important secreted proteins include cytokines, coagulation factors, growth factors and other signaling molecules. We predict 1708 genes, or 9% of the human proteome, to encode secreted proteins based on results from multiple prediction methods.

Function of the secretory pathway

The most common secretion pathway is the secretory pathway (Figure 1). Newly synthesized proteins are transported from the endoplasmic reticulum (ER) through the Golgi apparatus and packed into vesicles. The vesicles are then transported to the plasma membrane, where vesicle and plasma membrane merge, releasing the proteins into the extracellular space (exocytosis). The signal sequence that targets proteins for secretion to the ER is called a signal peptide (SP) and consists of a short, hydrophobic N-terminal sequence, which is inserted into the ER membrane and subsequently cleaved off from the protein (von Heijne G. 1985). Membrane proteins may also contain an SP, but most often the N-terminal transmembrane (TM) region functions as the signal sequence. The signal sequences are recognized by chaperone proteins that guide the synthesizing ribosomes to the rough ER, where co-translational translocation of the protein sequence occurs in a protein complex named the translocon (Johnson AE et al, 1999). Membrane proteins are transferred to the lipid bilayer of the ER membrane via the translocon, whereas secretory proteins are transported into the ER lumen. Proteins that pass the quality control in the ER lumen are transported via vesicles to the Golgi apparatus, where they are further modified and sorted for transport to their final destination, which most often is the plasma membrane, the lysosomes or secretion from the cell.


Figure 1. Overview of the secretory pathway.

The functions of protein secretion are diverse. Signaling between or within cells via secreted signaling molecules can be paracrine, autocrine, endocrine or neuroendocrine depending on the target (Nussey S et al, 2001). Among the most important signaling proteins that act by binding to receptors on the surface of target cells are cytokines, kinases, hormones and growth factors (Farhan H et al, 2011). A large fraction of the clinically approved treatment regimens today use drugs directed towards (or consisting of) secreted proteins or cell surface-associated membrane proteins. Out of the 754 protein targets with known pharmacological action for approved drugs on the market at present (Wishart DS et al, 2006), 163 are predicted to be secreted.

Secreted proteins are often enriched in the organelles of the secretory pathway (ER, Golgi apparatus, vesicles) before they are released into the extracellular space. This enables detection of the protein by immunofluorescence (IF), although its final destination lies outside of the cell. In Figure 2, IF images of three predicted secreted proteins are shown.


CHGB - SH-SY5Y

SCG3 - SH-SY5Y

NPY - SH-SY5Y

Figure 2. Examples of three different predicted secreted proteins are shown in the neuron-like SH-SY5Y cell line: CHGB and SCG3 are found in secretory vesicles, while NPY is enriched in the Golgi apparatus.

Prediction of secreted proteins

Secreted proteins can often be identified based on their SPs, which have a number of features suitable for computational prediction models. The SP is typically 15-30 amino acids long and primarily recognized by a short hydrophobic and mostly positive N-terminal alpha-helix (n-region) combined with a hydrophobic h-region and a C-terminal polar uncharged c-region (Emanuelsson O et al, 2007). Many algorithms use these features to predict the location of an SP in a protein, and a number of methods also incorporate an SP prediction model into their transmembrane (TM) topology prediction algorithm to distinguish more reliably between an SP and a TM segment.
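The features described above can be illustrated with a toy calculation. The sketch below is not one of the published predictors: it only assumes the standard Kyte-Doolittle hydropathy scale, scans the first 30 residues for the most hydrophobic 8-residue window (a stand-in for the h-region) and counts the net charge of the first five residues (a stand-in for the n-region). The function names, window sizes and both example sequences are hypothetical and chosen purely for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydropathy(seq, n_term=30, window=8):
    """Highest mean hydropathy of any window within the first n_term residues.

    Returns None if the N-terminal region is shorter than the window.
    """
    region = seq[:n_term].upper()
    if len(region) < window:
        return None
    return max(
        sum(KD[aa] for aa in region[i:i + window]) / window
        for i in range(len(region) - window + 1)
    )


def n_region_charge(seq, n_len=5):
    """Net charge of the first n_len residues: (K + R) minus (D + E)."""
    head = seq[:n_len].upper()
    positive = sum(head.count(aa) for aa in "KR")
    negative = sum(head.count(aa) for aa in "DE")
    return positive - negative


# Hypothetical sequences, invented for this example only.
sp_like = "MKRLLLLALLAVLLAGSSAQDTEEK"         # positive start, hydrophobic core
cytosolic_like = "MSDEEKRKQDSGNEETPKDSSQERD"  # hydrophilic throughout

for name, seq in [("sp_like", sp_like), ("cytosolic_like", cytosolic_like)]:
    print(name, round(max_window_hydropathy(seq), 2), n_region_charge(seq))
```

Real predictors such as SignalP, Phobius and SPOCTOPUS use trained statistical models rather than fixed thresholds, so a score like this should only be read as an intuition for why the SP is computationally recognizable.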

The human 'secretome' can be defined as all genes encoding at least one secreted protein. It has been analyzed here by performing a whole-proteome scan using three methods for signal peptide prediction: SignalP 4.0 (Petersen TN et al, 2011), Phobius (Käll L et al, 2004) and SPOCTOPUS (Viklund H et al, 2008), which have all been shown to give reliable prediction results in comparative analyses. A majority decision-based method (MDSEC) has been constructed using the results from the three SP prediction methods to obtain a list of predicted secreted proteins (Uhlén M et al, 2015). All proteins with an SP predicted by at least two of the three methods are considered secreted. These were further annotated to exclude from the set genes that are predicted to reside in intracellular locations such as the ER or Golgi apparatus despite having a signal peptide prediction. Since signal peptides are found both in secreted proteins and in certain types of membrane proteins, the results were filtered using the majority decision-based method (MDM) for membrane protein topology prediction (Fagerberg L et al, 2010). All proteins with a predicted SP in combination with a predicted TM region according to the MDM are considered membrane-spanning and therefore not secreted. The numbers of genes encoding a predicted secreted protein according to each of the three methods, the MDSEC, and the annotated secretome are shown in Table 1. The resulting lists of predicted secreted proteins and predicted membrane proteins were used as a classification of the human proteome.
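The decision logic described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual HPA pipeline: the record layout, the field names, the class labels and the three example entries are hypothetical, and the per-method predictions are assumed to be available already as booleans.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Per-protein predictor outputs (hypothetical record layout)."""
    signalp: bool      # SP predicted by SignalP
    phobius: bool      # SP predicted by Phobius
    spoctopus: bool    # SP predicted by SPOCTOPUS
    mdm_tm: bool       # TM region predicted by the membrane MDM
    annotated_intracellular: bool = False  # e.g. ER or Golgi resident


def classify(p: Prediction) -> str:
    """Majority vote on the SP, then the membrane and annotation filters."""
    sp_votes = sum([p.signalp, p.phobius, p.spoctopus])
    if sp_votes < 2:
        return "no signal peptide"
    if p.mdm_tm:
        # SP plus a predicted TM region: membrane-spanning, not secreted.
        return "membrane"
    if p.annotated_intracellular:
        # SP predicted, but annotation places the protein inside the cell.
        return "intracellular"
    return "secreted"


# Hypothetical example entries, invented for illustration.
examples = {
    "PROTEIN_A": Prediction(True, True, False, mdm_tm=False),
    "PROTEIN_B": Prediction(True, True, True, mdm_tm=True),
    "PROTEIN_C": Prediction(True, False, False, mdm_tm=False),
}

for name, pred in examples.items():
    print(name, classify(pred))
```

The order of the two filters does not change which proteins end up as "secreted"; it only affects the label given to a protein that fails both.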

Table 1. Prediction of the human secretome by three different prediction methods for signal peptides as well as the MDSEC and the final prediction resulting from manual annotation.

Protein class                              Number of genes   Number of proteins   Source
Predicted secreted proteins                1708              4361                 HPA
Secreted proteins predicted by MDSEC       2943              6743                 HPA
SignalP predicted secreted proteins        2525              5816                 SignalP
Phobius predicted secreted proteins        3338              7613                 Phobius
SPOCTOPUS predicted secreted proteins      3710              8165                 SPOCTOPUS
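To make the effect of the annotation step concrete, the counts in Table 1 can be compared directly. The short sketch below uses only the numbers from the table; the variable names are illustrative.

```python
# (genes, proteins) per row, copied from Table 1.
table1 = {
    "Predicted secreted proteins": (1708, 4361),
    "Secreted proteins predicted by MDSEC": (2943, 6743),
    "SignalP predicted secreted proteins": (2525, 5816),
    "Phobius predicted secreted proteins": (3338, 7613),
    "SPOCTOPUS predicted secreted proteins": (3710, 8165),
}

final_genes, final_proteins = table1["Predicted secreted proteins"]
mdsec_genes, _ = table1["Secreted proteins predicted by MDSEC"]

# Fraction of MDSEC-predicted genes that remain in the final secretome.
retained = final_genes / mdsec_genes
# Average number of listed proteins per gene in the final secretome.
proteins_per_gene = final_proteins / final_genes

print(f"Retained after annotation: {retained:.1%}")
print(f"Proteins per gene: {proteins_per_gene:.2f}")
```

Roughly 58% of the genes predicted by the MDSEC remain in the final set, which shows how many signal peptide-containing proteins are removed as intracellular.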

Expression levels of secreted proteins in tissue

An analysis of tissue distribution categories based on RNA-seq data shows that a larger fraction of the genes encoding secreted proteins belongs to the tissue enhanced, tissue enriched or group enriched genes, compared to all genes presented in the Cell Atlas (Uhlén M et al, 2015) (Figure 3). Only a relatively small portion of the genes in the secretome show low tissue specificity. This is in agreement with tissue-specific functions for many secreted proteins. The secreted class contains many of the most abundantly expressed genes, and the highest expression levels of secreted proteins are found in pancreas and salivary gland.

[Bar plot. Categories: Not detected, Low tissue specificity, Tissue enhanced, Group enriched, Tissue enriched. Axis: percentage of genes (0-100%). Series: Secreted proteins, All localized genes.]

Figure 3. Bar plot showing the percentage of genes in different tissue specificity categories for secreted protein-coding genes, compared to all genes in the Cell Atlas. Asterisk marks a statistically significant deviation (p≤0.05) in the number of genes in a category based on a binomial statistical test.
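The caption does not spell out how the binomial test was set up, so the following is only a sketch under assumptions: the null proportion is taken to be the fraction of all Cell Atlas genes in a category, the test is an exact two-sided test, and the observed count and background fraction are hypothetical placeholders rather than values from the figure. Only the total of 1708 secreted genes comes from the text.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial probability mass, computed in log space to avoid overflow."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))


def binom_test_two_sided(k, n, p):
    """Exact two-sided p-value: total mass of outcomes no likelier than k."""
    observed = binom_pmf(k, n, p)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    total = sum(
        pmf for pmf in (binom_pmf(i, n, p) for i in range(n + 1))
        if pmf <= observed * tolerance
    )
    return min(1.0, total)


n_secreted = 1708           # secreted genes (from the text)
k_in_category = 700         # HYPOTHETICAL count of secreted genes in a category
background_fraction = 0.25  # HYPOTHETICAL fraction among all Cell Atlas genes

p_value = binom_test_two_sided(k_in_category, n_secreted, background_fraction)
print("significant" if p_value <= 0.05 else "not significant")
```

Working in log space matters here because the binomial coefficients for n = 1708 are far too large to convert to floating point directly.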

Relevant links and publications

Thul PJ et al, 2017. A subcellular map of the human proteome. Science.
PubMed: 28495876 DOI: 10.1126/science.aal3321

Emanuelsson O et al, 2007. Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc.
PubMed: 17446895 DOI: 10.1038/nprot.2007.131

Fagerberg L et al, 2010. Prediction of the human membrane proteome. Proteomics.
PubMed: 20175080 DOI: 10.1002/pmic.200900258

Farhan H et al, 2011. Signalling to and from the secretory pathway. J Cell Sci.
PubMed: 21187344 DOI: 10.1242/jcs.076455

Johnson AE et al, 1999. The translocon: a dynamic gateway at the ER membrane. Annu Rev Cell Dev Biol.
PubMed: 10611978 DOI: 10.1146/annurev.cellbio.15.1.799

Käll L et al, 2004. A combined transmembrane topology and signal peptide prediction method. J Mol Biol.
PubMed: 15111065 DOI: 10.1016/j.jmb.2004.03.016

Nussey S et al, 2001. Endocrinology: An Integrated Approach. Oxford: BIOS Scientific Publishers.

Petersen TN et al, 2011. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods.
PubMed: 21959131 DOI: 10.1038/nmeth.1701

Uhlén M et al, 2015. Tissue-based map of the human proteome. Science.
PubMed: 25613900 DOI: 10.1126/science.1260419

Viklund H et al, 2008. SPOCTOPUS: a combined predictor of signal peptides and membrane protein topology. Bioinformatics.
PubMed: 18945683 DOI: 10.1093/bioinformatics/btn550

von Heijne G. 1985. Signal sequences. The limits of variation. J Mol Biol.
PubMed: 4032478 

Wishart DS et al, 2006. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res.
PubMed: 16381955 DOI: 10.1093/nar/gkj067