The cell line transcriptome

The word transcriptome refers to the full set of transcribed RNA molecules within a cell at a given time point. In contrast to the genome, which is largely stable across the different cells of an organism, the transcriptome varies greatly. This plasticity has made the transcriptome appealing to study, owing to its potential to serve as a proxy for cellular identity and diversity. In the Cell Atlas, all 19670 protein-coding human genes are classified according to their expression across a large number of in vitro cultured cell lines (Figure 1, Thul PJ et al, 2017). The cell lines were harvested during the log phase of growth, and high-quality extracted mRNA was used as input material for library construction and subsequent sequencing. The expression level of gene-specific transcripts is given as NX values. Genes with an NX value ≥1 are considered detected. Altogether, the transcriptomes of 64 cell lines have been analyzed to form the basis for the different expression categories.
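The detection rule can be sketched in a few lines of Python. This is a minimal illustration only: the table `nx`, the gene names and the NX values are hypothetical placeholders rather than Cell Atlas data, and the helper `detected_genes` is an assumed name. Only the NX ≥1 threshold comes from the text.

```python
# Hypothetical NX values (gene -> cell line -> NX); not real Cell Atlas data.
nx = {
    "GENE_A": {"HeLa": 12.4, "Hep G2": 0.3, "U-2 OS": 5.1},
    "GENE_B": {"HeLa": 0.0, "Hep G2": 48.9, "U-2 OS": 0.7},
    "GENE_C": {"HeLa": 1.0, "Hep G2": 2.2, "U-2 OS": 3.5},
}

DETECTION_THRESHOLD = 1.0  # genes with NX >= 1 are considered detected


def detected_genes(nx_table, cell_line, threshold=DETECTION_THRESHOLD):
    """Return the sorted gene names whose NX in `cell_line` meets the threshold."""
    return sorted(
        gene
        for gene, values in nx_table.items()
        if values.get(cell_line, 0.0) >= threshold
    )


hela_detected = detected_genes(nx, "HeLa")
```

Counting the entries of such a list for each cell line would give a per-cell-line number of detected genes of the kind reported in Table 1.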

Approximately one third of all protein-coding genes (n=6213) were expressed in all cell lines, indicating a vital function for the corresponding proteins (Figure 1). About 2% (n=454) of all genes were not detected in any of the analyzed cell lines, suggesting that the corresponding proteins are only expressed in highly specialized cell types, during specific developmental stages or under specific conditions such as cell stress. About 8% (n=1523) of the protein-coding genes show a more restricted pattern of expression across the analyzed cell lines, with some expressed in only a few cell lines or even in just a single one. Table 1 shows the expression profile of each analyzed cell line as the number of detected genes, cell line enriched genes, group enriched genes and cell line enhanced genes.
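The quoted shares can be checked against the gene counts with a short calculation. The sketch below uses only the counts given in the text; the helper names `share` and `detection_breadth` are assumptions made for illustration, and `detection_breadth` covers only the two extreme cases (detected everywhere or nowhere), not the criteria behind the enriched and enhanced categories, which are not described here.

```python
TOTAL_GENES = 19670  # protein-coding genes classified in the Cell Atlas

# Gene counts quoted in the text.
reported = {
    "expressed in all cell lines": 6213,
    "not detected in any cell line": 454,
    "restricted expression": 1523,
}


def share(count, total=TOTAL_GENES):
    """Percentage of `total` represented by `count`, rounded to one decimal."""
    return round(100 * count / total, 1)


shares = {label: share(count) for label, count in reported.items()}

# Genes detected in at least one cell line.
detected_somewhere = TOTAL_GENES - reported["not detected in any cell line"]


def detection_breadth(nx_by_cell_line, threshold=1.0):
    """Label one gene by how many cell lines detect it (NX >= threshold)."""
    n_detected = sum(1 for value in nx_by_cell_line.values() if value >= threshold)
    if n_detected == 0:
        return "not detected"
    if n_detected == len(nx_by_cell_line):
        return "detected in all"
    return "detected in some"
```

With these counts the shares come to 31.6%, 2.3% and 7.7%, consistent with the rounded figures of one third, 2% and 8%.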

Figure 1. Pie chart showing the number of genes in the different RNA-based categories of gene expression in the panel of cell lines.

Table 1. Table showing the number of detected genes per cell line based on RNA sequencing (NX ≥1), and the number of genes in the enriched and enhanced categories.

Cell line | Detectable genes | Enriched genes | Group enriched genes | Enhanced genes
A-431 | 11381 | 8 | 29 | 282
A549 | 11772 | 12 | 39 | 330
AF22 | 11840 | 28 | 96 | 544
AN3-CA | 11358 | 21 | 33 | 354
ASC diff | 11377 | 31 | 69 | 571
ASC TERT1 | 11406 | 2 | 41 | 479
BEWO | 11784 | 66 | 116 | 624
BJ | 11660 | 3 | 22 | 272
BJ hTERT+ | 11574 | 18 | 40 | 397
BJ hTERT+ SV40 Large T+ | 11313 | 0 | 11 | 120
BJ hTERT+ SV40 Large T+ RasG12V | 11362 | 1 | 10 | 140
CACO-2 | 11534 | 27 | 89 | 460
CAPAN-2 | 11997 | 19 | 65 | 543
Daudi | 10316 | 14 | 79 | 386
EFO-21 | 12282 | 22 | 75 | 455
fHDF/TERT166 | 11445 | 8 | 25 | 390
HaCaT | 11772 | 25 | 87 | 479
HAP1 | 11300 | 6 | 48 | 254
HBEC3-KT | 11149 | 6 | 30 | 255
HBF TERT88 | 10889 | 0 | 3 | 106
HDLM-2 | 11128 | 89 | 81 | 577
HEK 293 | 11923 | 16 | 41 | 414
HEL | 11170 | 60 | 114 | 475
HeLa | 11871 | 20 | 45 | 394
Hep G2 | 11372 | 106 | 135 | 474
HHSteC | 11304 | 6 | 33 | 326
HL-60 | 10200 | 4 | 29 | 223
HMC-1 | 11545 | 75 | 104 | 673
HSkMC | 11801 | 16 | 73 | 495
hTCEpi | 11291 | 20 | 57 | 382
hTEC/SVTERT24-B | 11325 | 2 | 11 | 167
hTERT-HME1 | 10825 | 4 | 22 | 236
HUVEC TERT2 | 11102 | 16 | 72 | 352
K-562 | 10734 | 24 | 75 | 329
Karpas-707 | 11098 | 37 | 97 | 659
LHCN-M2 | 11204 | 12 | 26 | 262
MCF7 | 11374 | 14 | 30 | 466
MOLT-4 | 10412 | 38 | 54 | 279
NB-4 | 11282 | 29 | 88 | 518
NTERA-2 | 12338 | 57 | 143 | 579
PC-3 | 11739 | 9 | 42 | 339
REH | 10922 | 20 | 61 | 345
RH-30 | 11213 | 39 | 49 | 362
RPMI-8226 | 11111 | 36 | 96 | 502
RPTEC TERT1 | 11748 | 41 | 67 | 456
RT4 | 11639 | 46 | 84 | 545
SCLC-21H | 12421 | 113 | 195 | 810
SH-SY5Y | 12204 | 62 | 134 | 656
SiHa | 11428 | 4 | 27 | 244
SK-BR-3 | 11253 | 42 | 70 | 569
SK-MEL-30 | 11415 | 32 | 47 | 373
T-47d | 11783 | 24 | 61 | 517
THP-1 | 11544 | 38 | 88 | 447
TIME | 11362 | 5 | 61 | 441
U-138 MG | 11448 | 7 | 16 | 258
U-2 OS | 12626 | 42 | 74 | 452
U-2197 | 11403 | 22 | 40 | 368
U-251 MG | 11118 | 2 | 11 | 141
U-266/70 | 11680 | 52 | 118 | 732
U-266/84 | 11076 | 32 | 97 | 479
U-698 | 10248 | 22 | 65 | 386
U-87 MG | 11807 | 17 | 36 | 422
U-937 | 10940 | 22 | 83 | 393
WM-115 | 11697 | 17 | 44 | 366

The cell line transcriptome was compared with the transcriptome of 37 different normal tissues and organs (Uhlén M et al, 2015). A total of 61 genes were expressed only in cell lines and not in any of the analyzed normal tissue types. These genes serve as an interesting starting point for studying the function and role of the corresponding proteins in human biology. Furthermore, 299 genes were found to be expressed only in normal human tissues and not in any of the analyzed cell lines. Several of the proteins corresponding to these genes have functions associated with differentiated cells in specialized tissues or subcompartments of tissues, exemplified by ACR (acrosin), the major proteinase present in the acrosome of mature spermatozoa in normal testis.

  • 61 genes found only in cell lines and not tissues
  • 299 genes found only in tissues and not cell lines
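The comparison behind these two gene lists amounts to a pair of set differences. The sketch below is a minimal illustration under stated assumptions: the gene names and NX values are hypothetical, the helper `detected_anywhere` is an assumed name, and applying the same NX ≥1 cutoff to the tissue data is an assumption rather than something stated in the text.

```python
def detected_anywhere(nx_table, threshold=1.0):
    """Genes with NX >= threshold in at least one sample of `nx_table`."""
    return {
        gene
        for gene, values in nx_table.items()
        if any(value >= threshold for value in values)
    }


# Hypothetical NX values per gene, one entry per sample; not real data.
cell_line_nx = {"GENE_A": [3.2, 0.0], "GENE_B": [12.0, 8.1], "GENE_C": [0.2, 0.9]}
tissue_nx = {"GENE_A": [0.1, 0.4], "GENE_B": [5.5, 0.0], "GENE_C": [0.0, 21.7]}

in_cell_lines = detected_anywhere(cell_line_nx)
in_tissues = detected_anywhere(tissue_nx)

only_in_cell_lines = in_cell_lines - in_tissues  # analogous to the 61 genes
only_in_tissues = in_tissues - in_cell_lines     # analogous to the 299 genes
```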

A diversity of cell lines

The 64 different cell lines used in the Human Protein Atlas have been selected to represent various cell populations in different tissue types and organs of the human body. The vast majority of the selected cell lines have been derived from human cancer and are thus best described as human cancer cell lines with limited resemblance to normal cell types. Cell lines are in general adapted to cultivation in vitro and can only approximate the lives of normal cells that perform their function in a complex tissue context. As cancer is a composite tissue with heterogeneous cancer cell populations in addition to the stromal component, it is not surprising that several features of a normal cell corresponding to the putative progenitor cell are lacking in the corresponding cancer-derived cell line.

Despite the evident differences between primary cells in tissue and in vitro cultured cell lines, a global analysis based on unbiased hierarchical clustering (Figure 2) shows that cell lines do in fact cluster as expected from similarities in origin and phenotype of the cancer cells from which the respective cell line was derived. This can be exemplified by the derivatives of the isogenic BJ fibroblast model that mimics the four stages of malignant transformation (normal, immortalized, transformed and metastasizing) by cumulative addition of defined genetic elements (Hahn WC et al, 1999). At the highest level of separation, cell lines that grow in suspension and represent hematopoietic and lymphoid cell systems cluster together and separate into two major clusters depending on myeloid or lymphoid origin/phenotype. Moreover, several related cell lines cluster together, such as the versions of immortalized and transformed fibroblastic cell lines (BJ derivatives), glioma (U-138 MG and U-251 MG), melanoma (WM-115 and SK-MEL-30), breast cancer (SK-BR-3, MCF7 and T-47d) and endothelial cell lines (TIME and HUVEC TERT2).
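To illustrate how hierarchical clustering groups samples with similar expression profiles, the following pure-Python sketch performs agglomerative clustering on toy data. The distance measure (1 minus Pearson correlation) and the average linkage are assumptions chosen for illustration; the text does not state which settings were used for Figure 2. The sample names and expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cluster(profiles):
    """Average-linkage agglomerative clustering on 1 - Pearson distance.

    `profiles` maps a sample name to its expression vector. Returns the
    merge history as (members_a, members_b) tuples in merge order.
    """
    dist = {
        frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }

    def linkage(pair):
        # Mean pairwise distance between the members of two clusters.
        a, b = pair
        total = sum(dist[frozenset((i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b))
    return merges


# Hypothetical expression vectors for four samples of two origins.
profiles = {
    "lymphoid_1": [9.0, 8.5, 0.5, 0.2, 3.0],
    "lymphoid_2": [8.7, 9.1, 0.4, 0.6, 2.8],
    "fibroblast_1": [0.3, 0.8, 7.9, 8.8, 3.1],
    "fibroblast_2": [0.6, 0.4, 8.4, 8.1, 2.9],
}
merges = cluster(profiles)
```

In this toy example the two lymphoid-like samples and the two fibroblast-like samples are each merged first, and the two resulting groups are joined only in the final step, which is the kind of origin-driven grouping described for the real cell lines.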

The selection of human cancer cell lines for the Cell Atlas aimed to match the origin and phenotype of the solid cancer types represented in the Pathology Atlas of the Human Protein Atlas (Uhlen M et al, 2017). Special emphasis has been placed on representing cells of the hematopoietic and immune system, as the corresponding tumor types are more sparsely represented in the Pathology Atlas. Data from altogether 7 and 8 cell lines representing different stages of myeloid and lymphoid differentiation, respectively, have been generated and analyzed. In addition to cancer-derived cell lines, there are also a number of cell lines that have been generated through in vitro protocols for immortalization of growing cells as well as stem cells. Details regarding the different cell lines can be found here.


Figure 2. Hierarchical clustering based on RNA sequencing data for the 64 cell lines. The color of the cell line name represents its origin: light purple - lymphoid, grey - myeloid, dark blue - lung, dark green - brain, light blue - sarcoma, light green - renal, urinary and male reproductive system, red - breast and female reproductive system, beige - skin, orange - miscellaneous, mint green - endothelial, yellow - fibroblast, light orange - abdominal. Cells immortalized by the introduction of telomerase are indicated by an asterisk.

Cell line enriched genes

A majority of the cell line enriched genes also belong to the tissue elevated gene expression categories (tissue enriched, group enriched and tissue enhanced). The expression pattern in normal tissues and the function of these proteins relate to the specific traits and functions of the corresponding normal tissue type and organ. Examples are presented in Figure 3 and include:

  • The secreted proteins AHSG and ALB, which are only expressed in normal liver and the liver-derived cell line Hep G2, where immunofluorescent analysis shows localization to the Golgi apparatus and to the Golgi apparatus together with the ER, respectively.
  • The transcription factor HOXB13, which is only expressed in the nuclei of prostate, colon and rectum tissue as well as in the prostate-derived cell line PC-3.
  • The adhesion glycoprotein CDH15, which is enriched in skeletal muscle tissue and in the sarcoma cell line RH-30.
  • The enzyme TYR, which is exclusively expressed in skin and in the melanoma-derived cell line SK-MEL-30.
  • The epidermal growth factor receptor EGFR, which is enriched in female tissues and skin, and in the skin-derived cell line A-431.

The RNA-seq data for all 64 cell lines, which together express 98% (n=19216) of all protein-coding human genes, are presented in the Cell Atlas. The data can be used as a tool for selecting suitable cell lines for an experiment involving a particular gene or pathway, or for further studies on the transcriptome of established human cell lines.
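One simple way to use such data for cell line selection is to rank the cell lines by the expression of the gene of interest. The sketch below is an illustration under stated assumptions: the cell line labels and NX values are hypothetical, the helper `rank_cell_lines` is an assumed name, and the NX ≥1 detection cutoff is reused as the filter.

```python
# Hypothetical NX values for one gene of interest; not real Cell Atlas data.
gene_nx = {
    "cell_line_1": 0.4,
    "cell_line_2": 35.2,
    "cell_line_3": 1.0,
    "cell_line_4": 7.8,
}


def rank_cell_lines(nx_by_cell_line, threshold=1.0):
    """Cell lines in which the gene is detected, highest NX first."""
    detected = [
        (name, value)
        for name, value in nx_by_cell_line.items()
        if value >= threshold
    ]
    return sorted(detected, key=lambda item: item[1], reverse=True)


candidates = rank_cell_lines(gene_nx)
```

The first entries of `candidates` would be natural choices for experiments that need endogenous expression of the gene, while the cell lines filtered out could serve as negative controls.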


[Figure 3 panels: AHSG, ALB, HOXB13, CDH15, TYR and EGFR in tissue; AHSG - Hep G2, ALB - Hep G2, HOXB13 - PC-3, CDH15 - RH-30, TYR - SK-MEL-30 and EGFR - A-431 in cell lines.]

Figure 3. Examples of proteins with enriched expression in a cell line and the corresponding tissue of origin. The proteins are AHSG, ALB, HOXB13, CDH15, TYR, and EGFR. The immunohistochemical (IHC) staining shows the protein expression pattern in tissue in brown. The immunofluorescent (IF) staining shows the protein subcellular expression pattern in cell lines in green. The nucleus and microtubules are shown in blue and red respectively in the IF images.

Relevant links and publications

Thul PJ et al, 2017. A subcellular map of the human proteome. Science.
PubMed: 28495876 DOI: 10.1126/science.aal3321

Uhlén M et al, 2015. Tissue-based map of the human proteome. Science.
PubMed: 25613900 DOI: 10.1126/science.1260419

Hahn WC et al, 1999. Creation of human tumour cells with defined genetic elements. Nature.
PubMed: 10440377 DOI: 10.1038/22780

Uhlen M et al, 2017. A pathology atlas of the human cancer transcriptome. Science.
PubMed: 28818916 DOI: 10.1126/science.aan2507

Cellosaurus