The expression of the vast majority of the genes remains unchanged during the complex process of tumorigenesis. Indeed, a pioneering study found that the expression of no more than 1–1.5 percent of the genes was significantly altered in colon cancer, as compared to normal colon tissue.34 Nonetheless, the analysis of human cancer with the techniques described above typically identifies hundreds of genes differentially expressed between normal tissues and malignant specimens. Clearly, prioritization rules according to the identity of the genes uncovered or according to specific sets of criteria must be implemented. Efficient techniques for the validation of these genes have been described in the previous section. In addition to the identification of specific differentially expressed genes, global expression profiles can be used for tumor taxonomy. This section will illustrate these principles through the description of some of the most important results of gene profiling obtained in common malignancies.
By using SAGE, one study analyzed more than 300,000 transcripts derived from colorectal cancers, pancreatic cancer, and normal colon epithelium. While all the abundant transcripts (more than five copies per cell) represented 75 percent of the mRNA mass, the rare transcripts were responsible for much of the diversity of gene expression: 86 percent of all the different genes were expressed at less than five copies per cell. Interestingly, and perhaps unexpectedly, most of the genes were expressed at similar levels between normal and cancer cells. Indeed, in the case of normal and neoplastic colon, only 548 genes were differentially expressed (less than 1.5 percent of the transcripts present in a given cell). Many genes elevated in cancer represented products known to be involved in growth and proliferation, while genes found in the normal colon were often related to differentiation. Importantly, many of the individual genes found to be differentially regulated may represent targets for mechanism-based therapy or biomarkers for diagnosis. In another study, an Affymetrix oligonucleotide array containing 6500 genes was used to investigate 40 colon tumors and normal colon tissues.66 Using two-way clustering, clusters of genes with similar patterns of expression were identified. Some of these clusters may represent the activation of molecular pathways relevant to colon tumorigenesis.
Gene expression in breast cancer has been monitored by using differential display, cDNA arrays, and SAGE in a variety of experimental systems. In a SAGE study of normal and neoplastic breast tissue, at least 50,000 transcripts were analyzed from 4 libraries and highly differentially expressed genes were identified. Small custom arrays were used to validate the genes identified. Claudin-7 was found up-regulated more than a hundredfold in 85 percent and 60 percent of the primary and metastatic tumors, respectively. While differences in gene expression levels can be subtle in other diseases, many genes appear to be vastly differentially regulated in cancer. Similarly, another study used a combination of differential display and cDNA arrays to gain a better understanding of gene expression patterns in breast cancer.67 This study identified 700 genes differentially expressed between normal and cancer cells, and a cDNA array containing 107 of these genes was constructed. Most of the genes highly expressed in normal cells, and down-regulated in cancer cells, represented genes important for cell adhesion, communication, and maintenance of cell shape. In contrast, most of the genes elevated in cancer were those encoding enzymes involved in metabolism, macromolecular synthesis, and disruption of the extracellular matrix. By using the custom cDNA array, clusters of genes were identified that were associated with relevant clinical parameters, such as estrogen receptor (ER) status, stage, and tumor size. Overall, gene expression patterns allowed the clustering of breast tumors into two major groups that differed in their ER status.
Other studies with large cDNA arrays contributed significantly to our understanding of gene expression in breast cancer.68,69 Many clusters of genes with related expression patterns were identified. For example, an interferon (IFN)-regulated gene cluster and a proliferation cluster (correlated with mitotic index) were found. Importantly, gene expression clusters corresponding to noncancer tissue such as stroma, lymphocytes, and endothelium were also recognized. These issues are important when primary tumors are analyzed because the gene expression profiles represent a complex environment of interacting tissues. While the large arrays failed to distinguish multiple tumor categories, a smaller, more focused array containing 496 genes clustered the tumors into two groups according to their ER status,69 in a fashion reminiscent of the differential display study described above.67 The small array further divided ER-negative breast cancers into two groups. It is unclear whether these two categories may have divergent clinical characteristics, but this experiment emphasizes the power of these techniques in cancer taxonomy. These results suggest the possibility that gene expression patterns may be used effectively for diagnosis and therapeutic decisions in breast cancer. In yet another study, laser capture microdissection, an important validation tool, was used in combination with cDNA arrays for the identification of genes relevant to breast tumorigenesis.70
Germ-line mutations in BRCA1 and BRCA2 confer a significant risk of breast and ovarian cancer. Using a large cDNA array, it was recently shown that BRCA1 and BRCA2 tumors could be distinguished from each other and from sporadic breast cancer on the basis of gene expression profiles.71 Indeed, all the tumors with a BRCA1 mutation, and 14 of 15 without the mutation, were appropriately recognized in the BRCA1 classification. Similarly, accurate classification was obtained with BRCA2 tumors. A total of 176 genes were found differentially regulated between BRCA1 and BRCA2 tumors. Interestingly, BRCA1 tumors exhibited increased expression of genes involved in response to cellular stress. A sporadic tumor, which clustered with BRCA1 tumors, proved to exhibit hypermethylation of the BRCA1 promoter. Gene profiling may thus help in the identification of breast cancer genetic status, including the identification of a BRCA1- or BRCA2-like phenotype. These different categories may be useful in patient management, as patients with BRCA1-like tumors may require more rigorous follow-up.
The vast majority of ovarian cancers are diagnosed in late stages and a major emphasis of functional genomics approaches is to identify biomarkers. Schummer et al. constructed a cDNA array consisting of 21,500 randomly selected transcripts from ovarian cDNA libraries.72 The vast majority of genes were expressed at similar levels in ovarian cancer and ovarian surface epithelium, the presumed normal counterpart of ovarian cancer. However, they were able to identify cDNAs that were overexpressed more than 2.5-fold in at least 50 percent of the tumors. These clones also had low levels of expression in nonovarian tissues. Many of these cDNAs were novel and corresponded to ESTs, and others had previously been implicated in various cancers. One gene, HE4, a protease inhibitor, emerged as a promising candidate because it was highly up-regulated in many ovarian tumors and was found at low levels in other tissues. These findings were subsequently confirmed and extended in a SAGE study of ovarian cancer.73 An analysis was done of 385,000 transcripts from 10 different ovarian libraries, and differentially expressed genes were identified using a strict set of criteria. Selected genes had to be high in all three primary ovarian cancers and low in all three nonmalignant specimens. Twenty-seven genes were identified that met these criteria and that were overexpressed more than tenfold in ovarian tumors. Interestingly, a majority of those genes were predicted to encode membrane or secreted proteins, making them candidates for biomarkers or tumor targeting. Many of these secreted genes encoded protease inhibitors. Another study using a combination of cDNA-RDA and cDNA arrays also found a large number of genes encoding secreted products to be elevated in ovarian cancer.74
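The kind of strict selection rule described for the SAGE study can be sketched in a few lines. The exact cutoffs used in that study are not given here, so the following is only one possible reading: a gene passes if its lowest level among the tumors is more than tenfold above its highest level among the nonmalignant specimens. The gene names, tag counts, and the `floor` parameter are hypothetical.

```python
def passes_strict_criteria(tumor_levels, normal_levels, min_fold=10.0, floor=1.0):
    """Return True if every tumor exceeds every normal specimen by more than min_fold.

    floor (an assumed parameter) replaces very low normal levels so that
    undetected tags do not cause a division by zero.
    """
    lowest_tumor = min(tumor_levels)
    highest_normal = max(max(normal_levels), floor)
    return lowest_tumor / highest_normal > min_fold


# Hypothetical normalized tag counts: (three tumors, three nonmalignant specimens).
profiles = {
    "GENE_A": ([120, 95, 210], [2, 0, 5]),   # consistently high in tumors
    "GENE_B": ([150, 4, 300], [1, 2, 0]),    # one tumor is low, so it fails
    "GENE_C": ([40, 55, 60], [30, 45, 50]),  # similar in both groups
}

candidates = sorted(
    gene for gene, (tumors, normals) in profiles.items()
    if passes_strict_criteria(tumors, normals)
)
print(candidates)
```

Requiring the rule to hold for every specimen, rather than on average, is what makes the filter strict: a single discordant tumor such as the one in `GENE_B` removes the gene from the candidate list.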
Prostate cancer initially responds to hormone therapy but typically becomes refractory to this therapy and develops into an androgen-independent tumor.75 The elucidation of the molecular mechanisms accompanying this phenomenon has begun, but our understanding is still incomplete. To monitor gene expression changes that are associated with hormone-independent growth, the androgen-independent prostate cancer xenograft model CWR22-R and its parental androgen-dependent xenograft CWR22 were analyzed by cDNA microarray.76,77 Hybridization to a large cDNA array (10,000 clones) revealed that the expression of 160 genes was altered in CWR22 upon androgen removal. The pattern of gene expression changes suggested that the CWR22 cells were undergoing growth arrest upon androgen removal. Interestingly, the majority of these genes were expressed at similar levels between CWR22 and CWR22-R, suggesting that CWR22-R had adapted to growth without androgen and had reentered the cell cycle.77 Comparison of genes differentially expressed between CWR22 and CWR22-R allows the identification of genes that may be crucial in the progression from androgen dependence to androgen independence.76,77 Some of these genes were found to be involved in thyroid hormone receptor signaling. IGFBP2 and HSP27 were also found elevated in CWR22-R and were validated through the use of tissue microarrays.76
The classification of acute leukemias has long relied on the identity of the precursors. Lymphoid precursors give rise to acute lymphoblastic leukemia (ALL) while myeloid precursors give rise to acute myeloid leukemia (AML). The treatment regimens for these classes are distinct and accurate classification of these tumors can have a significant impact on survival. A cDNA array consisting of 6817 genes was used in order to determine whether global patterns of gene expression could be used to distinguish various classes in leukemias.78 The original data set, consisting of 38 bone marrow samples (27 ALL, 11 AML), demonstrated that a large number of genes appeared to be correlated with the AML-ALL class distinction. The 50 genes most closely correlated with class distinction were chosen and a class predictor algorithm was developed. In cross-validation analysis by using the original 38 bone marrow samples, 36 of these samples were correctly assigned to the clinical category (AML or ALL). The 50-gene predictor correctly predicted the tumor class in 29 of 34 additional acute leukemias. Interestingly, the number of genes included for prediction was not crucial, as the same results were obtained with predictors containing anywhere from 10 to 200 genes.
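The text above does not spell out how the class predictor works. One plausible formulation, sketched here as an assumption rather than as the exact method of the cited study, ranks genes by a signal-to-noise score between the two classes, lets each selected gene cast a weighted vote, and estimates accuracy by leave-one-out cross-validation. The expression values, the two-gene predictor size, and all function names are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(values_a, values_b):
    """(mean_a - mean_b) / (sd_a + sd_b); a large magnitude means good separation."""
    return (mean(values_a) - mean(values_b)) / (stdev(values_a) + stdev(values_b))


def train(samples, labels, n_genes):
    """Keep the n_genes with the largest absolute signal-to-noise score."""
    scored = []
    for g in range(len(samples[0])):
        all_vals = [s[g] for s, y in zip(samples, labels) if y == "ALL"]
        aml_vals = [s[g] for s, y in zip(samples, labels) if y == "AML"]
        weight = signal_to_noise(all_vals, aml_vals)
        midpoint = (mean(all_vals) + mean(aml_vals)) / 2
        scored.append((abs(weight), g, weight, midpoint))
    scored.sort(reverse=True)
    return [(g, weight, midpoint) for _, g, weight, midpoint in scored[:n_genes]]


def predict(model, sample):
    """Each gene votes weight * (expression - midpoint); a positive sum favors ALL."""
    total = sum(weight * (sample[g] - midpoint) for g, weight, midpoint in model)
    return "ALL" if total > 0 else "AML"


def leave_one_out_accuracy(samples, labels, n_genes):
    """Hold out each sample in turn, reselect genes on the rest, and predict it."""
    correct = 0
    for i in range(len(samples)):
        model = train(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:], n_genes)
        correct += predict(model, samples[i]) == labels[i]
    return correct / len(samples)


# Hypothetical expression values: rows are samples, columns are four genes.
samples = [
    [9.0, 1.0, 5.0, 5.2], [8.5, 1.5, 5.1, 4.8], [9.2, 0.8, 4.9, 5.0],  # ALL
    [2.0, 7.0, 5.0, 5.1], [1.5, 7.5, 5.2, 4.9], [2.3, 6.8, 4.8, 5.3],  # AML
]
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]

accuracy = leave_one_out_accuracy(samples, labels, n_genes=2)
print(accuracy)
```

Reselecting the genes inside every cross-validation fold matters: choosing them once on all samples would let the held-out sample influence its own prediction and inflate the apparent accuracy.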
An important issue was whether gene expression profiling could be used to determine these classes without a priori knowledge of their existence. This is important because many cancers (prostate, for example) have a variable response to treatment, but cannot readily be divided into classes using current methods. Using self-organizing maps (SOMs), it was possible to identify two categories of acute leukemia that essentially fell along the known ALL-AML classes.78 Gene profiling could thus identify the two classes of leukemia without previous biological information. Astonishingly, considering the number of clinical specimens studied, SOM could further stratify the acute leukemia classes into four clusters. A first cluster corresponded to AML, a second cluster to T-lineage ALL, and two additional clusters corresponded to B-lineage ALL. The AML, T-cell ALL, and B-cell ALL are the most important clinical distinctions among acute leukemias. These studies demonstrate that gene profiling can accurately identify new classes of cancers (class discovery) and assign tumors to known classes (class prediction). Unfortunately, clinical outcome was not strongly correlated with a particular expression signature. In any event, because leukemic cells can easily be obtained as a relatively pure population, these findings may have immediate and important clinical application.
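Class discovery with a self-organizing map can be illustrated with a deliberately small sketch. It is not the implementation used in the cited study: it assumes a one-dimensional map with two nodes, seeds the nodes from the first and last samples to stay deterministic (real SOMs usually start from random values and use larger grids), and runs on hypothetical three-gene profiles. No class labels are supplied to the algorithm.

```python
def nearest_node(nodes, sample):
    """Index of the node closest to the sample (squared Euclidean distance)."""
    return min(
        range(len(nodes)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(nodes[j], sample)),
    )


def train_som(samples, n_nodes=2, epochs=20, rate=0.5):
    """Train a one-dimensional SOM (n_nodes >= 2) and return its node vectors."""
    # Seed nodes from evenly spaced samples so the sketch is deterministic.
    step = (len(samples) - 1) // (n_nodes - 1)
    nodes = [list(samples[i * step]) for i in range(n_nodes)]
    for epoch in range(epochs):
        lr = rate * (1 - epoch / epochs)          # learning rate decays over time
        radius = 1 if epoch < epochs // 2 else 0  # neighbors move only early on
        for s in samples:
            best = nearest_node(nodes, s)
            for j, node in enumerate(nodes):
                if abs(j - best) <= radius:
                    pull = lr if j == best else lr * 0.5
                    for k in range(len(node)):
                        node[k] += pull * (s[k] - node[k])
    return nodes


# Hypothetical profiles; the first three and last three samples differ strongly.
samples = [
    [8.0, 1.0, 2.0], [7.5, 1.5, 2.2], [8.3, 0.7, 1.9],
    [1.0, 9.0, 2.1], [1.4, 8.6, 2.0], [0.8, 9.3, 1.8],
]

nodes = train_som(samples)
assignments = [nearest_node(nodes, s) for s in samples]
print(assignments)
```

Each sample is assigned to the node it ends up closest to, so the nodes play the role of the discovered classes. Asking for more nodes is the analogue of the finer four-cluster stratification described above.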
Diffuse large B-cell lymphoma (DLBCL) is the most common non-Hodgkin lymphoma subtype. DLBCLs are highly heterogeneous, but attempts at further subclassification have failed. A cDNA array containing 17,856 clones was constructed from various lymphoid cell cDNA libraries.79 DLBCL exhibited a distinct and complex pattern of gene expression and displayed a lymph node signature. Importantly, reclustering the tumors by using genes of the germinal center (GC) B-cell cluster yielded two subtypes: the GC B-like DLBCLs and the activated B-like DLBCLs. The expression of no single gene correlated well with the new subtypes; only the analysis of the patterns of a large number of genes could identify these novel groups. Interestingly, these novel subtypes exhibited marked differences in prognosis. Indeed, 76 percent of GC B-like DLBCL patients were alive after 5 years, while only 16 percent of the activated B-like DLBCL patients were still alive after the same period. Gene profiling thus provides a new classification scheme for DLBCL that defines prognostic categories. The molecular and clinical differences are significant and suggest that GC B-like DLBCL and activated B-like DLBCL may represent distinct diseases. Although this last example represents a clear case in which a molecular signature involving a large number of genes can be of use clinically, there are also examples of gene profiling identifying individual genes for diagnosis. For example, a recent study used Atlas cDNA arrays (Clontech) to identify the gene clusterin as a marker for anaplastic large-cell lymphomas.80
Thirty-one melanomas and 7 controls were hybridized to a cDNA array that contained probes for nearly 7000 genes.81 Although no classification schemes for melanoma existed, the gene expression data and hierarchical clustering analysis subdivided the tumors into two groups of 12 and 19. These two groups were analyzed for association with several clinical parameters, such as age and survival, but no associations were found. However, the larger cluster of tumors was predicted, from its expression signature, to consist of tumors with reduced motility and invasiveness. Indeed, these two groups showed differential responses in their ability to migrate into scratch wounds, contract collagen gels, and form tubular networks. Although the analysis did not show association with known clinical parameters, it nonetheless enabled the classification of melanoma into distinct and important classes related to the motility of the tumor cells, and identified genes that may play a role in the invasive ability of this cancer. Further analyses may allow the identification of optimized treatment for each of the classes or other parameters of clinical relevance for melanoma patients.
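Hierarchical clustering of tumors by expression profile can be sketched as follows. The distance measure and linkage rule of the cited study are not stated here, so this example assumes a common choice: average-linkage agglomerative clustering on one minus the Pearson correlation, stopped when two groups remain. The five tumor profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def hierarchical_clusters(profiles, n_clusters=2):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]

    def distance(c1, c2):
        # Average linkage: mean correlation distance over all cross-cluster pairs.
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(1 - pearson(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        a, b = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: distance(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical profiles: rows are tumors, columns are four genes.
profiles = [
    [5.0, 4.8, 1.0, 1.2],  # tumor 0
    [4.6, 5.1, 0.9, 1.1],  # tumor 1
    [1.1, 0.8, 5.2, 4.9],  # tumor 2
    [5.2, 4.9, 1.3, 0.8],  # tumor 3
    [0.9, 1.2, 4.7, 5.0],  # tumor 4
]

groups = hierarchical_clusters(profiles, n_clusters=2)
print(groups)
```

A correlation-based distance groups tumors by the shape of their expression pattern rather than by absolute signal intensity, which is one reason it is often preferred for array data.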
In another study, melanoma cells were selected for high metastatic potential in vitro and analyzed using cDNA arrays.82 Several genes involved in extracellular matrix assembly were elevated, including RhoC, which single-handedly enhanced metastasis when overexpressed in melanoma cells. A better understanding of gene expression in highly metastatic cells may lead to improved therapeutic strategies aimed at preventing invasion and metastasis. cDNA arrays and other gene profiling methods will undoubtedly continue to play a major role in this endeavor.
Most expression profiling for brain tumors has been applied to glioblastoma multiforme (GBM). DNA arrays,83–85 SAGE,40,86 and tissue arrays55 have all been applied to the study of the genes expressed in GBM and normal neural tissue. Even if the biological implications of the revealed patterns are not yet clear, there are practical uses for these data. One example is the use of large-scale expression data to find potential tumor markers or antigens for GBM.51 It is also likely that the pattern of expression will be useful for the classification of brain tumors, including the molecularly heterogeneous GBM classification.87
Brain tumors other than GBM have been studied by expression profiling. The major malignant pediatric brain tumor, medulloblastoma, has been studied by SAGE.88 Detailed SAGE expression profiles are also available for medulloblastomas and a variety of gliomas at the CGAP SAGEmap database.40
A series of 60 cancer cell lines of various histologic origins, known as the NCI60, forms the basis of the National Cancer Institute's cancer drug-screening program.89 Gene expression in these lines was studied by using a cDNA microarray consisting of approximately 8000 different genes.90 Except for breast and non–small cell lung carcinoma cell lines, the gene expression patterns clustered the lines according to their presumed histologic origin. The patterns of gene expression in the different tissues were thus sufficiently conserved in the cell lines for them to be grouped together, although it is clear that the establishment of cancer lines is accompanied by changes in gene expression patterns. The clustering of the cell lines depended on the exact genes included in the analysis and other studies have shown that cell lines are significantly different from the tissue of origin in colon66 and ovarian cancer.73,74 In any event, analysis of the 60 cell lines allowed the identification of coordinately regulated clusters of genes. The clusters could be labeled according to the genes present in the cluster (proliferation cluster, interferon cluster) or to the patterns of expression of these genes (epithelial cluster, melanoma cluster). Much information might be gained concerning the microenvironment of tumors by comparing expression patterns between primary tumors and their corresponding in vitro cultures or cell lines.
The findings with the NCI60 described above validate the use of cell lines for in vitro manipulation such as treatment with hormone or chemotherapeutic drugs. Indeed, the same 60 cell lines were clustered according to the growth inhibitory activity (GI50) of 1400 compounds.91 The cell lines no longer clustered according to their tissue of origin, but according to their drug response. When a subset of these drugs with known mechanisms was used for analysis, several clusters corresponding to mechanisms of action emerged. This could clearly help to identify the mechanism of action of uncharacterized drugs. For example, 5-FU appeared with the RNA synthesis inhibitors, suggesting that the main activity of 5-FU may be as an RNA synthesis inhibitor. Further analysis allowed the identification of associations between clusters of genes and clusters of drugs. These relationships may help to identify a genetic basis for certain drug action. For example, an inverse relationship was found between dihydropyrimidine dehydrogenase (DPYD) and 5-FU potency. DPYD is a rate-limiting enzyme in 5-FU degradation. Most cell lines expressing low levels of DPYD were sensitive to 5-FU. DPYD may become useful as a prognostic marker.
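An association between a gene and a drug of the kind described above can be expressed as a correlation across cell lines. The following sketch assumes the Pearson coefficient as the measure of association, which the text does not specify, and uses invented values for six cell lines. `GENE_X` is a hypothetical comparison gene, and potency is expressed as the negative logarithm of the GI50, so that higher values mean a more potent drug.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical expression levels in six cell lines (arbitrary units).
expression = {
    "DPYD": [0.5, 1.0, 2.0, 3.5, 4.0, 5.5],
    "GENE_X": [2.0, 3.1, 2.4, 2.9, 2.2, 2.7],
}

# Hypothetical drug potency in the same six cell lines, as -log10(GI50).
potency = {
    "5-FU": [6.8, 6.5, 6.0, 5.2, 5.1, 4.4],
}

# Gene-by-drug correlation table; strongly negative entries flag inverse relationships.
table = {
    (gene, drug): pearson(levels, response)
    for gene, levels in expression.items()
    for drug, response in potency.items()
}

for (gene, drug), r in sorted(table.items()):
    print(f"{gene} vs {drug}: r = {r:.2f}")
```

With real data the same table would span thousands of genes and hundreds of compounds, and clustering its rows and columns is one way to relate groups of genes to groups of drugs.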
Endothelial cells provide the blood supply to solid tumors and are therefore highly relevant to the process of tumorigenesis. A better understanding of angiogenesis may thus provide tools in the fight against cancer. SAGE was used to identify genes differentially expressed in vivo between endothelial cells derived from normal and malignant colorectal tissue.92 The study showed that at least 79 different genes are significantly differentially expressed between these tissues, including 46 that were specifically elevated in tumor-associated endothelial cells. On the basis of these results, it was concluded that neoplastic and normal endothelium are fundamentally different at the molecular level, suggesting that these differences may be clinically relevant. Nine SAGE tags elevated in the tumor corresponded to novel, uncategorized genes. These genes were named tumor endothelial marker (TEM), and designated TEM-1 to TEM-9. Further experiments confirmed the tumor endothelium-specific expression of these genes, not only for colorectal tumors, but also for other major tumor types. These TEMs, or other genes identified in this study, may become targets of antiangiogenic therapies.
Subtractive hybridization techniques and cDNA arrays have also been used for studying the process of angiogenesis.93,94 Overall, many known and novel genes have been implicated in this process. These candidates await testing as targets for therapeutic interventions.
Gene Profiling Techniques in the Identification of Targets of Specific Oncogenic Molecular Pathways
A main application of techniques such as differential display, SAGE, and cDNA microarrays has been the identification of downstream targets of specific pathways. For example, SAGE was used to identify many genes whose expression is believed to mediate p53-induced apoptosis.41 Many of these genes were novel and predicted to encode proteins involved in oxidative stress, providing a new paradigm for the mechanism of p53-mediated apoptosis. Similarly, SAGE was used to identify downstream targets of the APC/β-catenin pathway, a pathway activated in the vast majority of colon cancers.45,46 c-MYC and PPARδ were both identified as direct transcriptional targets of the TCF-β-catenin transcription complex and provided important mechanistic insights into colon tumorigenesis.
cDNA arrays have also been used to identify genes relevant to specific cancer pathways. For example, superoxide dismutase was identified as a target of estrogen derivatives that could kill leukemia cells.95 In a different approach, ER-responsive breast cancer cells were treated with estrogen and analyzed by SAGE for expression changes, leading to the identification of many, possibly useful, estrogen-regulated genes.48 Differential display was used in the identification of genes involved in Ras transformation.96,97 Drug resistance has also been studied extensively by gene profiling and genes relevant to cisplatin and taxol resistance have been identified.98,99 There is no doubt that gene profiling techniques will play a major role in the dissection of the myriad molecular pathways important in human cancer. The examples above represent a minute fraction of the efforts that have already been dedicated toward this goal.