Exfoliated cytologic specimens from the mouth (buccal) epithelium may contain viable cells, permitting assay of gene expression for direct and noninvasive measurement of gene-environment interactions, such as those involving inhalation (e.g., tobacco smoke) exposures. We determined specific mRNA levels in exfoliated buccal cells collected by cytologic brush, using a recently developed RNA-specific real-time quantitative reverse transcription-PCR strategy. In a pilot study, the metabolic activity of exfoliated buccal cells was verified in vitro by the 3-[4,5-dimethylthiazol-2-yl]-2,5-diphenyltetrazolium assay. Transcriptional activity was also observed in vivo: in the one subject examined, timed exposure to mainstream tobacco smoke induced CYP1B1 expression in serially collected buccal samples. For a set of 11 subjects, the qualitative detection of mRNA for nine genes encoding carcinogen- and oxidant-metabolizing enzymes in buccal cells was then shown to correlate with that in laser-microdissected lung from the same individuals (χ2 = 52.91, P < 0.001). Finally, quantitative real-time reverse transcription-PCR assays for seven target gene transcripts (AhR, CYP1A1, CYP1B1, GSTM1, GSTM3, GSTP1, and GSTT1) and three reference gene transcripts [glyceraldehyde-3-phosphate dehydrogenase (GAPDH), β-actin, and 36B4] were performed on buccal specimens from 42 subjects. In multivariate analyses, gender, tobacco smoke exposure, and other factors were associated, on a gene-specific basis, with the expression levels of CYP1B1, GSTP1, and other transcripts, but substantial interindividual variability in mRNA expression remained unexplained. Within the power limits of this pilot study, the gene expression signature was not clearly predictive of lung cancer case or control status. This noninvasive, quantitative method may be incorporated into high-throughput human applications for probing gene-environment interactions associated with cancer.
There is considerable effort underway to identify noninvasively obtained biomarkers for assaying (1) the functional internal dose of an environmental exposure; (2) biological effect; (3) risk for disease; (4) presence of preclinical disease; (5) diagnosis of overt disease; (6) efficacy of drug effect; or (7) toxicity from xenobiotics, including drugs. The search for easily accessible exfoliated biological samples amenable to analysis, in addition to traditional source tissues such as blood, constitutes a major initiative in cancer research. Mutation and gene expression analyses have begun on exfoliated cells from a variety of internal organs, including, for example, sputum as a surrogate for lung (1, 2), urine as a surrogate for bladder (3, 4), cervical secretions as a surrogate for the uterine cervix (5), and stool as a surrogate for colon (6).
Buccal mucosal cells (mouth cells) are epithelial in origin, are exposed to tobacco smoke and other environmental toxicants (as well as to nutrients and drugs, both by direct mucosal contact and via the circulation), and are easily accessible for analysis. Exfoliated buccal cells have been used as sources of genomic DNA in a variety of settings (7, 8, 9, 10). Buccal cell gene expression has been documented in surgical specimens, explants, and primary cultures of malignant cells (11, 12, 13). However, we are unaware of any reports of successful, quantitative gene expression studies of exfoliated buccal cytologic specimens, brushed or otherwise, collected noninvasively.
Tobacco-related carcinogenesis in the human lung and other susceptible epithelia begins with the Phase I bioactivation of inhaled carcinogens and the incomplete quenching of those same bioactivated species by Phase II enzymes (14). In the human lung, cytochrome P450 1B1 (CYP1B1) and glutathione S-transferase P1 (GSTP1) are among the most abundantly expressed Phase I and Phase II enzymes, respectively, at both the mRNA and protein levels (15). Oral cavity carcinogenesis appears to commence with a similar metabolic pathway (16, 17).
In this article, we describe reproducible, quantifiable gene expression of carcinogen- and oxidant-metabolizing enzymes in brushed exfoliated buccal cells from human volunteers. This technology enables direct measurement of gene-environment interactions. The main strategies are the use of a cytologic brush for enhanced collection of viable cells, aggressive RNA preservation, and meticulously optimized reverse transcription and PCR conditions, with both qualitative and quantitative assays based on an RNA-specific reverse transcription-PCR (RT-PCR) strategy (18). We demonstrate this buccal cell expression-signature approach in a cancer biomarker development context through multivariate analyses of carcinogen- and oxidant-metabolizing enzyme gene expression in individuals enrolled in a lung cancer case-control study.
MATERIALS AND METHODS
Recruitment and Diagnoses.
All procedures were performed under the auspices of the Albany Medical Center and New York State Department of Health institutional review boards. Recruits were part of the pulmonary medicine or thoracic surgery practices at Albany Medical Center and were otherwise destined for bronchoscopy or lung resectional surgery for clinical indications. After informed consent was obtained, the subject was interviewed, buccal mucosa was sampled by cytologic brush, and the subject was phlebotomized at the same encounter, in advance of the lung sampling procedure. Pathological diagnoses on lung tissue were available for 40 of 42 subjects; 2 of 42 individuals were termed lung cancer controls on clear, unambiguous clinical grounds alone. All lung cancer diagnoses were new (incident) bronchogenic carcinoma diagnoses at any clinical stage; enrollment, interview, and biospecimen collection occurred before any therapeutic interventions.
Interview Data Collected.
Subjects provided information on mainstream tobacco smoking history (type, amount, duration, and quitting date); environmental tobacco smoke exposure history; preexisting lung diagnoses (chronic obstructive pulmonary disease and interstitial lung disease); dietary history (daily average ingested servings of fruits and vegetables, cups of tea, supplements, and restrictions); asbestos- and radon-exposure history; occupational history; family history of cancer; and medication history (summarized in Table 1). Current tobacco exposure was verified precisely by assay of plasma nicotine and cotinine in the Analytic Chemistry Laboratory of our institution, as described below.
Cytologic Buccal Epithelium Specimen Collection
Buccal biospecimen sampling by cytologic brush was performed by trained research nurses at a tertiary care center (Albany Medical Center).
A cytologic brush (e.g., Cytobrush Plus GT, Medscand Inc., Hollywood, FL) with tapering plastic bristles and a blunt end was placed in the mouth. Sufficient lateral pressure was applied to contact the buccal mucosa and bend the plastic shaft slightly. The brush was spun in place for 10 seconds in a single direction, while consistent pressure was applied to the mucosa. There were no significant episodes of irritation or bleeding. The brush was then withdrawn, and immediately plunged into the RNA preservative (e.g., RNAlater, Ambion, Austin, TX) at room temperature, and frozen for later use. Duplicate brushes were collected from each subject, one from each side of the inner cheek, at a single preoperative or prebronchoscopy time point, for these expression studies.
Laser-Capture Microdissected Lung Tissue
Lung tissue from a subset of subject donors (n = 11) was studied for qualitative gene expression. Nonmalignant lung tissue from surgical resections or fiberoptic endobronchial biopsies was flash-frozen within 10 to 15 minutes of resection and, within a single day for each sample, subjected to frozen microtome sectioning at −25°C, alcohol-based staining, and laser-capture microdissection (Pixcell IIe, Arcturus, Mountain View, CA; ref. 19). We sampled four areas of each frozen tissue section, using 200 pulses per area (each 30-μm pulse yielding 2 to 4 cells), so that each cap (fitting a 500-μL tube containing an RNA extraction buffer, as described below) held 1,600 to 3,200 cells for qualitative mRNA RT-PCR. All lung samples in the study were confirmed as nonmalignant by gross and microscopic inspection.
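The cell yield per cap quoted above follows directly from the sampling scheme. The minimal Python sketch below makes that arithmetic explicit; the function name `cells_per_cap` is hypothetical, and the inputs (4 areas, 200 pulses per area, 2 to 4 cells per pulse) are taken from the text.

```python
def cells_per_cap(areas: int, pulses_per_area: int,
                  cells_per_pulse: tuple) -> tuple:
    """Return the (minimum, maximum) number of cells captured on one cap.

    areas           -- number of areas sampled per frozen section
    pulses_per_area -- laser pulses fired in each area
    cells_per_pulse -- (low, high) number of cells captured by one pulse
    """
    total_pulses = areas * pulses_per_area
    low, high = cells_per_pulse
    return total_pulses * low, total_pulses * high


if __name__ == "__main__":
    # Scheme from the text: 4 areas x 200 pulses, 2 to 4 cells per 30-um pulse
    print(cells_per_cap(4, 200, (2, 4)))  # (1600, 3200)
```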
The 3-[4,5-Dimethylthiazol-2-yl]-2,5-Diphenyltetrazolium (MTT) Metabolism Assay
The in vitro MTT assay of buccal cells assessed nonspecific mitochondrial metabolism based on succinate dehydrogenase activity, relying on the conversion of a yellow, water-soluble tetrazolium salt (MTT bromide, thiazolyl blue; Sigma, St. Louis, MO) to an insoluble purple crystalline formazan (Fig. 1). In this application, 50 μL of stock MTT (5 mg/mL in PBS) was added to 200 μL of a brushed buccal cell suspension and incubated for 4 hours at 37°C. The percentage of stained cells was then determined by microscopic observation (20).
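The MTT working concentration implied by these volumes can be checked with a simple dilution calculation (C1V1 = C2V2): 50 μL of a 5 mg/mL stock in 250 μL total gives 1 mg/mL during the incubation. In the sketch below, the helper names are hypothetical and the cell counts passed to `percent_stained` are invented for illustration; they are not study data.

```python
def final_concentration(stock_conc: float, stock_ul: float, sample_ul: float) -> float:
    """Concentration of a reagent after adding stock_ul of stock to sample_ul (C1V1 = C2V2)."""
    return stock_conc * stock_ul / (stock_ul + sample_ul)


def percent_stained(stained: int, counted: int) -> float:
    """Percentage of counted cells showing purple formazan deposits."""
    if counted <= 0:
        raise ValueError("at least one cell must be counted")
    return 100.0 * stained / counted


if __name__ == "__main__":
    # 50 uL of 5 mg/mL MTT added to 200 uL of brushed buccal cell suspension
    print(final_concentration(5.0, 50.0, 200.0))  # 1.0 (mg/mL)
    # Hypothetical microscope counts, for illustration only
    print(percent_stained(24, 200))  # 12.0
```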
The RNAeasy (Qiagen, Valencia, CA) total RNA extraction kit, with minor adaptations of the manufacturer's directions, was used for the repeated-measures time-course study of the single subject, the 11-sample subset of laser-capture microdissected lung epithelium specimens analyzed qualitatively, and the 42 buccal cytologic sample sets analyzed quantitatively.
The buccal cytologic brush was thawed at 37°C for 2 minutes and spun at 20,000 × g in a microcentrifuge for 5 minutes; the brush was then removed while being wiped carefully along the sides of the tube, to maximize sample recovery. The 20,000 × g spin was repeated for 5 minutes, the supernatant was decanted, and 300 μL of RLT/β-mercaptoethanol (βME) buffer (RNAeasy kit) was added to the pellet, which was vortexed for 10 seconds.
For the subset of subjects (n = 11) providing laser-capture microdissected lung specimens for initial qualitative RNA-specific RT-PCR assays of the nine target and three reference housekeeper genes, the Arcturus Capsure cap was placed, immediately after alveolar tissue microdissection, in a 500-μL microfuge tube preloaded with 100 μL of RLT/βME buffer, inverted, and carried through the following steps in a manner identical to that used for the buccal cytologic brush samples.
The lysate was passed through a 22-gauge needle 5 to 10 times, and 1 μg of poly-C RNA was added. An equal volume of 70% EtOH was added to the lysate, and the solution was vortexed for 10 seconds and allowed to stand at room temperature for 1 minute. The lysate/EtOH mixture was applied to the column and centrifuged at 8,000 × g for 30 seconds. Five hundred microliters of RW1 buffer (RNAeasy kit) was added to the column, followed by centrifugation at 8,000 × g for 15 seconds; 500 μL of RPE/EtOH buffer (RNAeasy kit) was then added, followed by centrifugation at 8,000 × g for 30 seconds. The column was placed into a new 2-mL collection tube and centrifuged at 10,000 × g for 2 minutes to dry the membrane. The column was then placed in an RNase-free, nonstick 1.5-mL tube (Ambion); 30 μL of kit-supplied H2O (prewarmed to 50 to 55°C) was added to the top of the column, which was allowed to stand at room temperature for 2 minutes. RNA was eluted by centrifugation at 10,000 × g for 2 minutes. Because quantitation of total RNA was challenging for these precious subject samples, a representative set of buccal brush extracts was quantitated by comparison with serial dilutions of a standard total RNA pool (Clontech, Palo Alto, CA). After isolation, total RNA was immediately frozen on dry ice and stored at −80°C until use. Relative concentrations of target transcripts were normalized to internal and external reference RNA transcript controls in the real-time quantitative PCR steps, as described below.
RNA-Specificity in Transcript Amplification
Many human transcripts [β-actin, GAPDH, acidic ribosomal phosphoprotein P0 (36B4), hypoxanthine phosphoribosyl transferase, glutathione S-transferase M1 (GSTM1), glutathione S-transferase P1 (GSTP1), glutathione peroxidase, and others] have homologous, nontranscribed genomic DNA (gDNA) sequences termed pseudogenes. Because total RNA extracts from tissue are very commonly contaminated with gDNA, which is then carried over into the PCR, RT-PCR assays for these transcripts can yield confounded, false-positive results. DNase treatment of the RNA specimen is often incompletely effective; additionally, small-volume, low-concentration total RNA specimens may not tolerate DNase strategies (18). An RNA-specific RT-PCR technique is therefore advisable (Fig. 2). We have previously published an RNA-specific approach that does not require DNase treatment (18), a particularly important feature for assaying these precious cytologic specimens. Primer sets used in qualitative and quantitative real-time RT-PCR are listed in Table 2.
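The idea behind a tag-based RNA-specific design can be illustrated with a toy sequence model. The sketch below assumes a design in which the RT primer carries a 5' tag absent from the genome and the PCR reverse primer matches that tag; this is one common way to achieve RNA specificity, but the sequences, names (`amplifiable`, `TAG`, `FWD`, and so on), and exact design are hypothetical and are not taken from ref. 18 or Table 2. A template counts as amplifiable only if the forward primer occurs on the sense strand and the reverse primer's binding site (its reverse complement) lies downstream. Because a processed pseudogene can carry its own poly(A) tract, an untagged oligo(dT) reverse primer can still amplify contaminating gDNA, whereas the tag primer cannot.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (uppercase A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def amplifiable(sense: str, fwd: str, rev: str) -> bool:
    """True if fwd lies on the sense strand with the reverse-primer site downstream.

    The reverse primer anneals to the sense strand, so its site on that strand
    is the reverse complement of the primer sequence.
    """
    start = sense.find(fwd)
    if start == -1:
        return False
    return sense.find(revcomp(rev), start + len(fwd)) != -1


# Hypothetical toy sequences (not real genes or published primers)
EXON = "ATGGCCACCCTGAAGGAGAAGCTGTGCTACGTGGCC"
POLY_A = "A" * 20
TAG = "GACTCGAGGTACCAGTC"   # hypothetical 5' tag on the RT primer (TAG + oligo(dT))
FWD = "GCCACCCTGAAGGAGAAG"  # hypothetical gene-specific forward primer
OLIGO_DT = "T" * 18         # untagged reverse primer, for contrast

MRNA = EXON + POLY_A
# Sense strand of double-stranded cDNA primed with TAG + oligo(dT)
CDNA_SENSE = MRNA + revcomp(TAG)
# Processed pseudogene in gDNA: same exon, its own poly(A) tract, a genomic flank
PSEUDOGENE = EXON + POLY_A + "TTGCCATGCA"

if __name__ == "__main__":
    print("cDNA, tag primer:      ", amplifiable(CDNA_SENSE, FWD, TAG))       # True
    print("pseudogene, tag primer:", amplifiable(PSEUDOGENE, FWD, TAG))       # False
    print("pseudogene, oligo(dT): ", amplifiable(PSEUDOGENE, FWD, OLIGO_DT))  # True
```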
Reverse Transcription Protocol
The following is a minor modification of the manufacturer's recommended protocol for SuperScript II reverse transcriptase (RNase H−; Life Technologies, Inc., Invitrogen). All steps were performed in a thermal cycler [Perkin-Elmer 9700 or LightCycler (Roche), as appropriate]. In a 0.2-mL PCR reaction tube, the following were combined: 1 μL of universal RT primer (100 μmol/L) or oligo(dT) (0.5 μg/μL), RNA (1 ng to 5 μg), and RNase/DNase-free water to a final volume of 11 μL. The RNA was denatured at 65°C for 5 minutes and then chilled at 4°C for 5 minutes. The following were then added: 4 μL of 5× SuperScript II First Strand Buffer (Invitrogen Life Technologies, Inc., Grand Island, NY), 2 μL of 0.1 mol/L DTT, and 2 μL of deoxynucleoside triphosphate mix (10 mmol/L). After 8 μL of this master mix was aliquoted to each RT reaction, the tubes were incubated at 42°C for 2 minutes, 1 μL of SuperScript II RT was added to each tube, and each tube was mixed gently by pipetting up and down. Incubation at 42°C for 50 minutes was followed by incubation at 70°C for 15 minutes. To remove the RNA strand hybridized to the cDNA, 1 μL of RNase H was added, and the mixture was incubated at 37°C for 20 minutes.
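The RT set-up can be checked by simple volume accounting. The Python sketch below tallies the components listed above (before RNase H is added) and computes each final concentration as stock × volume / total volume; it assumes the 0.1 mol/L DTT stock corresponds to 100 mmol/L and reports the deoxynucleoside triphosphate concentration in the same units as the stated stock. The dictionary layout and function name are illustrative only. With these inputs the reaction totals 20 μL, with 1× buffer, 10 mmol/L DTT, and 1 mmol/L deoxynucleoside triphosphate mix.

```python
# (volume in uL, stock concentration, unit); None means no concentration tracked
RT_COMPONENTS = {
    "primer + RNA + water": (11.0, None, None),
    "5x First Strand Buffer": (4.0, 5.0, "x"),
    "DTT": (2.0, 100.0, "mmol/L"),      # 0.1 mol/L stock
    "dNTP mix": (2.0, 10.0, "mmol/L"),
    "SuperScript II RT": (1.0, None, None),
}


def reaction_summary(components: dict) -> tuple:
    """Return the total volume and the final concentration of each tracked component."""
    total = sum(vol for vol, _, _ in components.values())
    finals = {
        name: (stock * vol / total, unit)
        for name, (vol, stock, unit) in components.items()
        if stock is not None
    }
    return total, finals


if __name__ == "__main__":
    total_ul, finals = reaction_summary(RT_COMPONENTS)
    print(f"total volume: {total_ul} uL")
    for name, (conc, unit) in finals.items():
        print(f"{name}: {conc:g} {unit}")
```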
Qualitative Block Thermocycler-PCR Protocol
For a single 25-μL reaction, the following were mixed at room temperature in a PCR reaction tube: 0.75 μL of 50 mmol/L MgCl2 (Invitrogen), for a final concentration of 1.5 mmol/L MgCl2 (the amount should be optimized for each transcript); 2.4 μL of 10× PCR buffer (Life Technologies, Inc., Invitrogen); 18.6 μL of RNase/DNase-free H2O; 1.6 μL of deoxynucleoside triphosphate mix (10 mmol/L each); 0.4 μL of Platinum Taq DNA polymerase (Invitrogen); 0.25 μL of primer mix (50 pmol/μL stock of each of the combined forward and reverse primers); and 1 μL of template cDNA derived from total RNA. The thermocycling protocol did not vary with the transcript primer set used in the qualitative PCR protocol. All block thermocycler temperature ramps were performed at 3.5°C/second. The generic program began with an initial denaturation at 94°C for 60 seconds, followed by 50 cycles of denaturation at 94°C for 10 seconds, annealing at 58°C for 15 seconds, and extension at 72°C for 30 seconds. PCR products for qualitative purposes were run on a 2% agarose-EtBr gel and visualized under UV light; products visible to the unaided eye as a single band of the appropriate size were scored as "positive."
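Because the MgCl2 amount should be optimized for each transcript, a small helper that recomputes the MgCl2 volume for a chosen final concentration while keeping the reaction at 25 μL may be convenient. The sketch below assumes that water is the component adjusted to compensate; that choice and the names used are illustrative. At the stated 0.75 μL of 50 mmol/L stock it reproduces the 1.5 mmol/L final concentration.

```python
TOTAL_UL = 25.0
MGCL2_STOCK_MM = 50.0

# Per-reaction recipe from the text (uL)
BASE_RECIPE_UL = {
    "MgCl2 (50 mmol/L)": 0.75,
    "10x PCR buffer": 2.4,
    "water": 18.6,
    "dNTP mix": 1.6,
    "Platinum Taq": 0.4,
    "primer mix": 0.25,
    "cDNA template": 1.0,
}


def recipe_for_mgcl2(target_mm: float) -> dict:
    """Return a 25-uL recipe with MgCl2 at target_mm (mmol/L, final).

    Hypothetical helper: any change in MgCl2 volume is taken from (or returned
    to) the water so that the total volume stays at TOTAL_UL.
    """
    recipe = dict(BASE_RECIPE_UL)  # copy so the base recipe is not modified
    mg_ul = target_mm * TOTAL_UL / MGCL2_STOCK_MM
    recipe["water"] -= mg_ul - recipe["MgCl2 (50 mmol/L)"]
    recipe["MgCl2 (50 mmol/L)"] = mg_ul
    if recipe["water"] < 0:
        raise ValueError("target MgCl2 concentration does not fit in 25 uL")
    return recipe


if __name__ == "__main__":
    for target in (1.5, 2.0, 2.5):
        r = recipe_for_mgcl2(target)
        print(f"{target} mmol/L: MgCl2 {r['MgCl2 (50 mmol/L)']:.2f} uL, "
              f"water {r['water']:.2f} uL, total {sum(r.values()):.2f} uL")
```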
Quantitative Real-Time cDNA-PCR Protocol Adapted to the Lightcycler (Roche)
For a single 25-μL reaction, we combined the following in a glass capillary PCR reaction tube (Roche): 5.0 μL of 5× one-step RT-PCR buffer (Qiagen), to yield 2.5 mmol/L MgCl2; 15.5 μL of RNase/DNase-free H2O; 1.6 μL of deoxynucleoside triphosphate mix (10 mmol/L of each nucleotide triphosphate); 1.25 μL of SYBR Green dye (1:10,000); 0.4 μL of Platinum Taq DNA polymerase (Life Technologies, Inc., Invitrogen); 0.25 μL of primer mix (50 pmol/μL of each oligo of the primer pair); and 1.0 μL of template cDNA from the RT reaction. Alternatively, 0.75 μL per reaction of a 50 mmol/L MgCl2 solution (for 1.5 mmol/L MgCl2) plus 2.4 μL per reaction of a 10× PCR buffer (both Invitrogen) can be used, with 18.0 μL of RNase/DNase-free H2O; however, the proprietary Qiagen 5× buffer demonstrated superior sensitivity (data not shown) and was used exclusively in this study. A master mix of MgCl2, PCR buffer, H2O, deoxynucleoside triphosphate mix, and Platinum Taq DNA polymerase was prepared for a series of uniplex reactions in separate capillary tubes. After loading, the capillary tubes were spun at 700 × g for 5 seconds and placed in the LightCycler carousel.
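Preparing a master mix for a batch of uniplex capillary reactions is a scaling calculation. In the sketch below, the shared components are multiplied by the number of reactions plus one spare per ten reactions to cover pipetting losses; the spare-reaction rule, and the assumption that SYBR Green, primers, and template are pipetted into each capillary separately, are ours rather than the protocol's. The per-capillary aliquot (22.5 μL) plus the per-tube additions (2.5 μL) restores the 25-μL reaction volume.

```python
# Shared per-reaction components (uL) for the LightCycler protocol, assuming the
# Qiagen 5x buffer, which supplies the MgCl2
MASTER_MIX_UL = {
    "5x one-step RT-PCR buffer": 5.0,
    "water": 15.5,
    "dNTP mix": 1.6,
    "Platinum Taq": 0.4,
}
# Assumed to be added to each capillary separately
PER_TUBE_UL = {"SYBR Green (1:10,000)": 1.25, "primer mix": 0.25, "cDNA template": 1.0}


def master_mix(n_reactions: int, extra_every: int = 10) -> tuple:
    """Scale the master mix for n_reactions plus one spare per extra_every reactions."""
    if n_reactions < 1:
        raise ValueError("need at least one reaction")
    spares = (n_reactions + extra_every - 1) // extra_every  # integer ceiling
    n_total = n_reactions + spares
    return n_total, {name: vol * n_total for name, vol in MASTER_MIX_UL.items()}


if __name__ == "__main__":
    n_total, volumes = master_mix(24)
    print(f"prepare for {n_total} reactions")
    for name, vol in volumes.items():
        print(f"  {name}: {vol:.1f} uL")
    print(f"dispense {sum(MASTER_MIX_UL.values()):.1f} uL per capillary, then add "
          f"{sum(PER_TUBE_UL.values()):.1f} uL per tube")
```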
PCR was performed using thermocycling conditions optimized for each target. Our generic conditions were one denaturation cycle (up-ramp 20°C/second) at 95°C for 30 seconds, followed by 50 PCR cycles, each consisting of an up-ramp (20°C/second) to denature at 95°C for 10 seconds, a down-ramp (3.5°C/second) to anneal at 58°C, 59°C, or 60°C (depending on the transcript), and an up-ramp (3.5°C/second) to extend at 72°C for 30 seconds. Melting analysis (one cycle) was as follows: up-ramp (20°C/second) to denature at 95°C for 10 seconds, down-ramp (20°C/second) to anneal at 58°C, and then a slow up-ramp (0.1°C/second) to 95°C with continuous fluorescence acquisition. For a run to be scored as positive and its quantitative data entered, the PCR product had to display a single peak with the characteristic melting temperature on LightCycler melting analysis, corroborated by agarose-EtBr gel electrophoresis showing a single product of the appropriate size. Each buccal brush extract was amplified for each target or reference mRNA transcript by RT-PCR in triplicate, on at least one occasion.
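The acceptance rule above requires two independent confirmations of a single specific product. A minimal Python sketch of that decision logic follows; the temperature and size tolerances, and the names `RunEvidence` and `score_run`, are hypothetical, since the text does not give numeric acceptance windows. Runs scored "negative" would then be assigned the censored crossover threshold used for negative samples in the quantitative analysis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunEvidence:
    melt_peaks_c: List[float]  # melting temperatures of peaks in the melt curve
    gel_bands_bp: List[int]    # sizes of bands seen on the agarose-EtBr gel


def score_run(ev: RunEvidence, expected_tm_c: float, expected_bp: int,
              tm_tol_c: float = 1.0, bp_tol_frac: float = 0.1) -> str:
    """Classify a run as 'positive', 'negative', or 'reject'.

    positive: exactly one melt peak near the expected Tm AND exactly one gel
              band near the expected size (both criteria are required)
    negative: no melt peak and no gel band (no transcript signal)
    reject:   anything else (extra peaks, wrong product size, ...)
    The default tolerances are hypothetical, not values from the study.
    """
    if not ev.melt_peaks_c and not ev.gel_bands_bp:
        return "negative"
    one_peak = (len(ev.melt_peaks_c) == 1
                and abs(ev.melt_peaks_c[0] - expected_tm_c) <= tm_tol_c)
    one_band = (len(ev.gel_bands_bp) == 1
                and abs(ev.gel_bands_bp[0] - expected_bp) <= bp_tol_frac * expected_bp)
    return "positive" if one_peak and one_band else "reject"


if __name__ == "__main__":
    # Hypothetical observations for a product expected at 85.0 C and 150 bp
    print(score_run(RunEvidence([84.6], [152]), 85.0, 150))
    print(score_run(RunEvidence([78.0, 84.9], [150]), 85.0, 150))
    print(score_run(RunEvidence([], []), 85.0, 150))
```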
Repeated Buccal Sampling, Tobacco Exposure, One Subject
One previously tobacco smoke-naive individual self-administered mainstream tobacco smoke by smoking one commercially available cigarette (Marlboro, Philip Morris USA, Richmond, VA) every 2 hours, for a total of four cigarettes over a 6-hour span. Serial buccal swabs, one from each of the four quadrants of the lateral buccal mucosa, were collected at baseline (immediately before exposure) and at 6-hour intervals up to 18 hours after the onset of tobacco smoke exposure (Fig. 3).
Plasma Nicotine and Cotinine Analysis
Plasma nicotine and cotinine levels were measured by isotope dilution high-performance liquid chromatography/electrospray ionization tandem mass spectrometry (ID-HPLC-ESI-MS/MS). Briefly, plasma was spiked with methyl-D3 nicotine and methyl-D3 cotinine as internal standards; after an equilibration period, proteins were precipitated with trichloroacetic acid, and the target compounds were extracted from the alkalinized supernatant with methylene chloride. The organic extract was concentrated, and the solvent was exchanged to toluene. A portion of the extract was injected onto a short C-18 HPLC column, and the eluate was monitored by ESI-MS/MS in positive ion mode. For each native analyte and its isotope-labeled internal standard, the daughter ion produced by collision-induced dissociation of the [M+H]+ molecular ion was monitored and used to quantitate nicotine and cotinine. This method was based on the published Centers for Disease Control NHANES procedure (21). For the LC-MS/MS method used, the detection limit for cotinine was 50 pg/mL serum.
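Isotope dilution quantitates each analyte from the ratio of its signal to that of a known amount of its isotope-labeled internal standard, which corrects for extraction losses and ionization variability. The simplified single-point sketch below assumes equal response for the native and methyl-D3 forms unless a response factor is supplied; a validated method such as the NHANES procedure (21) would instead rely on a multipoint calibration curve. The peak areas and spike amount in the example are hypothetical.

```python
def isotope_dilution_conc(area_native: float, area_labeled: float,
                          spike_ng: float, plasma_ml: float,
                          response_factor: float = 1.0) -> float:
    """Analyte concentration (ng/mL) by isotope dilution.

    area_native / area_labeled is the peak-area ratio of the native analyte to its
    methyl-D3 internal standard; spike_ng is the amount of internal standard added
    to plasma_ml of plasma. response_factor corrects for any response difference
    between native and labeled forms (1.0 is an assumption; in practice it comes
    from calibration).
    """
    if area_labeled <= 0 or plasma_ml <= 0 or response_factor <= 0:
        raise ValueError("areas, volume, and response factor must be positive")
    return (area_native / area_labeled) * spike_ng / (response_factor * plasma_ml)


COTININE_LOD_NG_ML = 0.05  # 50 pg/mL detection limit stated in the text

if __name__ == "__main__":
    # Hypothetical peak areas and spike, for illustration only
    conc = isotope_dilution_conc(area_native=4200.0, area_labeled=8400.0,
                                 spike_ng=10.0, plasma_ml=1.0)
    print(f"cotinine: {conc:.2f} ng/mL, above LOD: {conc >= COTININE_LOD_NG_ML}")
```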
The quantitative LightCycler PCR data generated by the above protocol were analyzed using the standard crossover threshold in the log-linear portion of the fluorescence versus cycle number plot, as published previously (18, 22). As an internal control for potential variability in the reference housekeeper transcripts, each target transcript level in a buccal specimen (e.g., CYP1B1, primer set-a) was normalized to each of three or more internal reference housekeeper transcript levels (GAPDH; β-actin, with each of two different primer sets; and/or 36B4, with each of two different primer sets) in the same sample (Table 3). An external standard RNA sample, from a single stock reference pool of total RNA extracted from dioxin-exposed MCF-7 breast cancer cells, was run in parallel for each target transcript to permit correction for run-to-run experimental variability. This redundant reference normalization yielded three or more parallel crossover-threshold differences for any given target transcript primer set, each performed in triplicate, providing multiple internal checks and one external check (also in replicate) on the quantitation of each gene transcript.
Numerically, the mean of triplicate crossover thresholds of the target was normalized to the mean of triplicate crossover thresholds of the internal reference housekeeper gene (e.g., β-actin, BAUP, or 36B4) using the formula $1/2^{(\mathrm{CRO}_{\mathrm{target}} - \mathrm{CRO}_{\mathrm{reference}})}$, which yields the target transcript concentration relative to the reference transcript (e.g., one-half or one-eighth the quantity of the β-actin internal reference transcript). This ratio for a given target primer set-reference pair was then compared across samples from different individuals. Where there was no transcript signal after real-time RT-PCR melting analysis (confirmed by ethidium bromide gel electrophoresis), the sample was categorized as "negative" and, for calculation purposes, assigned an arbitrary crossover threshold value of 46, the limit of detection of the Roche LightCycler instrument and software system under the 50-cycle PCR protocol.
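The normalization and censoring rules above can be expressed compactly. This Python sketch averages triplicate crossover thresholds, substitutes the value 46 for confirmed nondetects, and applies 1/2^(CRO(target) − CRO(reference)); the function names are hypothetical, and the 2^−ΔC form implicitly assumes approximately 100% amplification efficiency (one doubling per cycle). A target that crosses three cycles after its reference gives a ratio of one-eighth, matching the example in the text.

```python
from statistics import mean
from typing import Optional, Sequence

NONDETECT_CT = 46.0  # value assigned to "negative" runs under the 50-cycle protocol


def mean_ct(replicates: Sequence[Optional[float]]) -> float:
    """Mean crossover threshold of replicates; None marks a confirmed nondetect."""
    return mean(NONDETECT_CT if ct is None else ct for ct in replicates)


def relative_expression(target: Sequence[Optional[float]],
                        reference: Sequence[Optional[float]]) -> tuple:
    """Return (delta_ct, target/reference ratio) using 1 / 2**(Ct_target - Ct_ref).

    The 2**-delta form assumes roughly 100% PCR efficiency for both transcripts.
    """
    delta = mean_ct(target) - mean_ct(reference)
    return delta, 2.0 ** -delta


if __name__ == "__main__":
    # Hypothetical triplicate crossover thresholds, for illustration only
    print(relative_expression([27.0, 27.1, 26.9], [24.0, 23.9, 24.1]))  # (3.0, 0.125)
    print(relative_expression([None, None, None], [30.0, 30.2, 29.8]))
```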
Multivariate models were created using SigmaStat 2.0 software (SPSS, Chicago, IL): logistic models for dichotomous outcome variables (e.g., lung cancer case or control status) and linear models for continuous outcomes (e.g., expression levels expressed as crossover threshold differences). All numerical and statistical intersubject comparisons were made using identical primer sets applied to buccal RNA/cDNA samples from different subjects. For example, CYP1B1 primer set-a crossover thresholds normalized to the GAPDH internal reference crossover thresholds, expressed in crossover threshold difference units from triplicate runs, were compared across the 42 subjects and modeled separately from CYP1B1 primer set-b crossover thresholds normalized to the MCF-7 external reference for CYP1B1 primer set-b (Table 3). Each of the three or more crossover threshold differences for a given target transcript [e.g., (CYP1B1 primer set-a) − (GAPDH); (CYP1B1 primer set-b) − (β-actin); and (CYP1B1 primer set-b) − (MCF-7 CYP1B1 primer set-b)] was therefore modeled separately, in multivariate fashion.
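The modeling rule (one model per target-reference comparison, never pooled) is sketched below with NumPy ordinary least squares as a stand-in for the SigmaStat linear models actually used. Only point estimates are returned (no P values), logistic models for case-control status are not shown, and the covariates and crossover threshold differences in the example are synthetic.

```python
import numpy as np


def fit_linear_model(delta_ct, covariates):
    """Ordinary least-squares fit of delta-Ct on covariates, with an intercept.

    delta_ct   -- one value per subject
    covariates -- array-like of shape (n_subjects, n_covariates)
    Returns the coefficient vector [intercept, b1, b2, ...].
    """
    y = np.asarray(delta_ct, dtype=float)
    x = np.asarray(covariates, dtype=float)
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def fit_each_comparison(comparisons, covariates):
    """Fit one separate model per target-reference comparison."""
    return {name: fit_linear_model(values, covariates)
            for name, values in comparisons.items()}


if __name__ == "__main__":
    # Synthetic data: columns are [current smoker (0/1), age in years]
    covariates = [[1, 50], [0, 61], [1, 57], [0, 66], [1, 49], [0, 70]]
    comparisons = {
        "(CYP1B1 set-a) - (GAPDH)": [6.1, 8.0, 6.4, 8.3, 5.9, 8.6],
        "(CYP1B1 set-b) - (beta-actin)": [7.2, 9.1, 7.6, 9.0, 7.0, 9.8],
    }
    for name, coef in fit_each_comparison(comparisons, covariates).items():
        print(name, np.round(coef, 3))
```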
Associations from the multivariate models are reported in Table 4 only when there was consensus among two or more different target-to-reference crossover threshold comparisons; for example, results from both the [(CYP1B1 primer set-b) − (GAPDH)] and [(CYP1B1 primer set-b) − (MCF-7 CYP1B1 primer set-b)] pairs had to be in qualitative agreement. Reported values are from the most conservative of the two or more concordant multivariate models for each target gene (e.g., CYP1B1) transcript. Results are reported for modeling of the full set of 42 subjects, of current smokers alone, and of subjects whose buccal expression levels were in the lowest (<25th percentile) or highest (>75th percentile) quartile.
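The reporting rule can be made explicit as a small filter. In this sketch, "qualitative agreement" is interpreted as statistical significance in the same direction in at least two comparisons, and "most conservative" as the concordant result with the largest P value; both interpretations, the 0.05 threshold, and the names are assumptions rather than details stated in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Association:
    comparison: str     # e.g. "(CYP1B1 set-b) - (GAPDH)"
    coefficient: float  # model estimate for the covariate of interest
    p_value: float


def consensus(results: List[Association], alpha: float = 0.05) -> Optional[Association]:
    """Return the most conservative concordant result, or None if there is no consensus.

    Assumed rule: at least two comparisons significant at alpha with the same
    sign; among those, the result with the largest P value is reported.
    """
    significant = [r for r in results if r.p_value < alpha]
    if len(significant) < 2:
        return None
    if len({r.coefficient > 0 for r in significant}) != 1:
        return None  # directions disagree
    return max(significant, key=lambda r: r.p_value)


if __name__ == "__main__":
    # Hypothetical model outputs, for illustration only
    results = [
        Association("(CYP1B1 set-b) - (GAPDH)", 0.8, 0.01),
        Association("(CYP1B1 set-b) - (MCF-7 CYP1B1 set-b)", 0.6, 0.03),
        Association("(CYP1B1 set-a) - (beta-actin)", -0.2, 0.40),
    ]
    print(consensus(results))
```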
Subject characteristics, categorized by historical smoking status, are described in Table 1. Multivariate modeling of the two outcome variables, individual transcript levels and case-control status, was used to control for the observed differences in subject characteristics. For example, the difference in age between current smokers (53.9 years) and nonsmokers (63.0 years) was accounted for, as described below, in determining the relationship in current cigarette smokers between plasma nicotine levels (i.e., very recent tobacco smoke exposure) and CYP1B1 transcript levels (Table 4, bottom panel).
Buccal cell viability, assessed by trypan blue exclusion in representative exfoliated buccal cell samples, was 20% after 5 minutes of trypan blue exposure (data not shown). Phase contrast microscopy with the MTT succinate dehydrogenase mitochondrial metabolism assay suggested that a similar fraction (∼10 to 15%) of cells collected by cytologic brush were metabolically active (Fig. 1). After optimization, total RNA isolation from representative cytologic brushes yielded ∼0.75 to 1.5 μg per brush, by A260/A280 spectrophotometry scaled to an aliquot of a commercially available total RNA standard of known concentration extracted from human lung (Clontech). No RNA expression of any transcript was detectable in nonviable buccal cells (e.g., exfoliated buccal cells collected by mouthwash rinse rather than by brush; data not shown).
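Two conversions help put these yields in context. Using the conventional approximation that one A260 unit corresponds to roughly 40 μg/mL of RNA in a 1-cm path, absorbance can be converted to concentration; and dividing the stated yield by the 30-μL elution volume gives the average concentration available to the RT step, about 25 to 50 ng/μL. The 40 μg/mL factor is a standard approximation rather than a value reported in this study, and the helper names are hypothetical.

```python
A260_UG_PER_ML_RNA = 40.0  # conventional factor: 1 A260 unit ~ 40 ug/mL RNA (1-cm path)


def rna_conc_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """RNA concentration (ug/mL) from absorbance at 260 nm."""
    return a260 * A260_UG_PER_ML_RNA * dilution_factor


def eluate_ng_per_ul(yield_ug: float, eluate_ul: float = 30.0) -> float:
    """Average RNA concentration (ng/uL) when yield_ug is eluted in eluate_ul."""
    return yield_ug * 1000.0 / eluate_ul


if __name__ == "__main__":
    # Range reported above: 0.75 to 1.5 ug per brush, eluted in 30 uL
    for y in (0.75, 1.5):
        print(f"{y} ug/brush -> {eluate_ng_per_ul(y):.0f} ng/uL")
```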
The RNA-specific method for real-time quantitative RT-PCR allowed normalization of each target transcript to a reliable internal reference housekeeper transcript that reflected cellular and RNA integrity and quantity and was not confounded by pseudogene sequences from genomic DNA contaminating the RNA sample. Because pseudogene sequences can be carried over into the PCR and confound standard-design primers, our laboratory's universal RT-PCR technique (18) was used; it avoided the need for DNase treatment or other isolation or purification procedures on these precious cytologic samples and yielded RNA-specific signal, as demonstrated by the absence of PCR products in the "no-RT" controls for both buccal cytologic and laser-capture microdissection samples (Fig. 2, Panel A). Relative quantitation of target to reference housekeeper transcript, a commonly accepted approach in gene expression studies, was therefore feasible in these specimens with an RNA-specific method (Fig. 2, Panel B). Intersubject differences in reference transcripts (e.g., GAPDH) generally varied by less than $2^3$-fold, i.e., by fewer than three crossover threshold cycles (data not shown).
In the pilot study comparing qualitative buccal cell expression with that in laser-capture microdissected human lung, we studied the aromatic hydrocarbon receptor (AhR), cytochrome P450 1A1 (CYP1A1), cytochrome P450 1B1 (CYP1B1), glutathione S-transferase M1 (GSTM1), glutathione S-transferase M3 (GSTM3), glutathione S-transferase P1 (GSTP1), glutathione S-transferase T1 (GSTT1), NAD(P)H:quinone oxidoreductase 1 (NQO1), and glutathione peroxidase. Among the nine carcinogen- and oxidant-metabolizing genes studied in this initial subset of individuals (n = 11) providing paired buccal-lung samples, expression was closely concordant between the two tissues: genes expressed in the laser-capture microdissected lung epithelium of an individual were also expressed in that individual's buccal cells (48 tissue-concordant pairs), and, conversely, genes not expressed in the lung tissue were not expressed in the buccal cells (34 tissue-concordant pairs). There were small numbers of tissue-discordant pairs [six lung (+)/buccal (−) pairs and four lung (−)/buccal (+) pairs; Fisher's exact test, P < 0.001; χ2 = 52.91, P < 0.001]. Of these 10 discordant pairs, most involved aromatic hydrocarbon receptor expression (data not shown); the other genes were expressed largely concordantly in the two tissues.
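The concordance analysis rests on a 2 × 2 table of lung versus buccal expression calls. The standard-library Python sketch below rebuilds that table from the pair counts given above and computes Pearson's χ2 with and without Yates' continuity correction, using the identity that the upper tail of a 1-degree-of-freedom χ2 distribution equals erfc(√(x/2)). With these counts it gives about 52.6 (corrected) and 55.7 (uncorrected), both with P far below 0.001. Fisher's exact test is not implemented here, and the function names are hypothetical.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int, yates: bool = True) -> float:
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]] (1 degree of freedom)."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # continuity correction
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * diff ** 2 / denom


def chi2_p_value_df1(stat: float) -> float:
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Rows: lung (+), lung (-); columns: buccal (+), buccal (-)
    a, b, c, d = 48, 6, 4, 34
    print(f"concordance: {(a + d) / (a + b + c + d):.3f}")
    for yates in (True, False):
        stat = chi2_2x2(a, b, c, d, yates=yates)
        print(f"Yates={yates}: chi2 = {stat:.2f}, P = {chi2_p_value_df1(stat):.1e}")
```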
In the pilot study of repeated cytologic sampling of the buccal mucosa of a single, previously smoke-naive subject over the course of tobacco smoke exposure, clear induction of CYP1B1 and slight, transient inhibition of GSTP1 transcript levels were demonstrated (Fig. 3).
In the multivariate analysis of all 42 subjects, ongoing tobacco smoke exposure correlated with elevated CYP1B1 mRNA levels and with depressed GSTP1 mRNA levels in buccal cells (Table 4, top). These findings were also observed in the repeated-measures study of the single subject exposed under controlled conditions (Fig. 3, Panel B). Aromatic hydrocarbon receptor levels increased with age in the multivariate analysis.
In the multivariate analysis limited to the 19 current smokers (Table 4, bottom), plasma nicotine levels correlated with CYP1B1 mRNA levels. Female gender was also associated with enhanced CYP1B1 levels. Enhanced buccal GSTP1 levels were associated with a diagnosis of lung cancer. Aromatic hydrocarbon receptor levels increased with age in this smoking subgroup.
Subjects in the lowest quartile (<25th percentile) of quantitative gene expression were compared with those in the highest quartile (>75th percentile), and the two groups were regressed on known or suspected factors relating to lung carcinogenesis and to gene expression in this pathway, as previously performed for all 42 subjects and for the 19 current smokers (Table 4; see above). Within the power limits of the study, no single gene expression pattern in these high or low expressors emerged as predictive of case or control status. In explaining the variance of buccal gene transcript expression levels, one positive finding was that subjects in the lowest quartile of GSTP1 expression were more likely to be current smokers (P < 0.01), again consistent with the findings in Table 4 (all subjects, top) and Fig. 3, Panel B (time course of controlled exposure, single subject).
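The quartile comparison begins by selecting the extreme expressers for each transcript. A minimal sketch using Python's `statistics.quantiles` (default "exclusive" method) follows; the percentile method, the choice to rank subjects on relative expression rather than on raw crossover threshold differences, and the function name are assumptions. The selected groups would then be compared on smoking status and the other factors named above.

```python
from statistics import quantiles
from typing import Dict, List, Tuple


def extreme_quartiles(expression: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split subjects into lowest (<25th pct) and highest (>75th pct) expressers.

    expression maps subject ID -> relative expression (e.g. 2**-delta_ct), so that
    larger values mean more transcript. If raw delta-Ct values were used instead,
    the two groups would swap, because a larger delta-Ct means less transcript.
    """
    q1, _, q3 = quantiles(expression.values(), n=4)
    low = sorted(s for s, v in expression.items() if v < q1)
    high = sorted(s for s, v in expression.items() if v > q3)
    return low, high


if __name__ == "__main__":
    # Hypothetical relative GSTP1 expression values, for illustration only
    gstp1 = {"s1": 0.02, "s2": 0.40, "s3": 0.11, "s4": 0.75,
             "s5": 0.05, "s6": 0.31, "s7": 0.09, "s8": 0.22}
    print(extreme_quartiles(gstp1))
```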
We report a method by which gene expression can be assayed from noninvasively collected, buccal brush exfoliated specimens for a variety of purposes. This method could be incorporated into population studies of inhalation or ingestion exposure-gene interaction or into direct studies of human oral biology and carcinogenesis; it could be used in chemopreventive or therapeutic drug monitoring; or it could potentially enable the use of buccal cells as a surrogate for less accessible organs, such as the human lung or esophagus.
Validity, precision, and reliability were conferred to the approach by the following:
(1) the demonstration of cell viability (23) and metabolic activity of representative samples of buccal cells collected by cytologic brush, thus making gene expression plausible in these exfoliated buccal cell samples;
(2) the optimization of total RNA isolation and downstream analyses;
(3) the quantification of gene transcript using a RNA-specific real-time quantitative RT-PCR strategy, and scaling of target gene transcript to a reliable and unconfounded internal reference housekeeper transcript, reflective of both RNA amount and integrity;
(4) the avoidance of the need for DNase treatment or other isolation or purification procedures on these valuable cytologic samples;
(5) the quantitation of replicate target transcripts across three or more different reference housekeeper transcripts in these specimens;
(6) the exclusion, through primer design, of oral bacterial and yeast sequences as confounders of the RT-PCR;
(7) the plausibility of the observed expression patterns as they relate to human exposures and biology; and
Cell viability was tested on a small subset of samples and was not routinely assayed for all 42 sample sets; each sample was too limited in amount to be used in both dye-exclusion and mitochondrial metabolism assays as well as in the mRNA expression studies. However, because the studies used internal reference "housekeeper" transcripts as legitimate measures of RNA integrity and amount, and these reference transcripts were in turn judged to reflect cell integrity (RNA is among "the first to go" in cell apoptosis and death), we propose that the unquantitated cell viability is not a major limitation of the approach. Consistent with this tenet, in our experience no RNA expression of any transcript is measurable in nonviable buccal cells (e.g., exfoliated buccal cells collected by mouthwash rather than by brush). Similarly, because the approach quantitates the target transcript against an appropriate internal reference transcript, if a cell dies, GAPDH or another reference transcript would presumably be degraded at the same rate as the CYP1B1 transcript, so the ratio between the two should remain relatively stable.
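The ratio argument can be written out under the assumption stated above, namely that target and reference transcripts are degraded at the same first-order rate k after cell death (T0 and R0 denote their amounts at the time of death), and assuming equal, approximately 100% amplification efficiency. The crossover threshold difference, and therefore the relative expression 1/2^ΔC, is then independent of time; the last line shows how the ratio would drift if the two degradation rates differed.

```latex
\begin{aligned}
T(t) &= T_0\,e^{-k t}, \qquad R(t) = R_0\,e^{-k t}
  \;\;\Longrightarrow\;\; \frac{T(t)}{R(t)} = \frac{T_0}{R_0},\\[4pt]
\Delta C &= C_{\mathrm{target}} - C_{\mathrm{reference}}
  = -\log_2\frac{T(t)}{R(t)} = -\log_2\frac{T_0}{R_0}
  \quad\text{(independent of } t\text{)},\\[4pt]
\text{if } k_T \neq k_R:\quad \frac{T(t)}{R(t)} &= \frac{T_0}{R_0}\,e^{-(k_T - k_R)\,t}.
\end{aligned}
```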
In our initial application of the exfoliated buccal cell expression technique, the qualitative presence of a specific transcript in buccal cells predicted its presence in the microdissected lung epithelium of the same individual. In the pilot quantitation study, a single, previously tobacco smoke-naive subject was exposed to mainstream tobacco smoke. This exposure coincided with clear induction of CYP1B1 and slight down-regulation of GSTP1 mRNA levels in serially sampled exfoliated buccal cells, and both changes reverted to baseline within hours after smoke exposure ended. The same pattern was seen in the subsequent observational study of 42 enrolled subjects, as well as in the subset of that group who were current smokers. In general, markers of mainstream cigarette smoke exposure (plasma nicotine and cotinine) were associated with increased CYP1B1 and decreased GSTP1 mRNA levels in brushed buccal cells. Unfortunately, current technologies are not yet sensitive enough to measure protein levels or enzyme activity reliably in these very limited exfoliated cell samples. Nevertheless, the potential for a Phase I/Phase II metabolic imbalance is notable and warrants follow-up studies.
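Serial changes like those described above are often expressed as a fold change relative to the pre-exposure sample, using the comparative (ΔΔCt) method. The sketch below illustrates that calculation under stated assumptions and does not reconstruct the study's own computations. It assumes equal, near-100% amplification efficiency for the target and reference assays and uses a single reference transcript. The helper names and Ct values are hypothetical, chosen only to show an induction (CYP1B1-like) and a modest decrease (GSTP1-like).

```python
def fold_change(target_ct, ref_ct, baseline_target_ct, baseline_ref_ct, efficiency=2.0):
    """Comparative-Ct fold change of a sample relative to a baseline sample.

    delta_ct = target Ct - reference Ct within each sample;
    delta_delta_ct = sample delta_ct - baseline delta_ct;
    fold change = efficiency ** (-delta_delta_ct).
    Assumes target and reference assays amplify with the same efficiency.
    """
    delta_sample = target_ct - ref_ct
    delta_baseline = baseline_target_ct - baseline_ref_ct
    return efficiency ** (-(delta_sample - delta_baseline))

def serial_fold_changes(samples, efficiency=2.0):
    """Fold change of each serial (target_ct, ref_ct) pair relative to the first (baseline) pair."""
    base_target, base_ref = samples[0]
    return [fold_change(t, r, base_target, base_ref, efficiency) for t, r in samples]

# Hypothetical serial samples: (target Ct, GAPDH Ct) before, during, and after exposure.
cyp1b1_series = [(30.0, 20.0), (27.0, 20.0), (30.0, 20.0)]
gstp1_series = [(25.0, 20.0), (26.0, 20.0), (25.0, 20.0)]

print(serial_fold_changes(cyp1b1_series))  # [1.0, 8.0, 1.0]
print(serial_fold_changes(gstp1_series))   # [1.0, 0.5, 1.0]
```

A fold change above 1 indicates induction relative to baseline, and a value below 1 indicates down-regulation. A return to 1.0 in the last sample corresponds to reversion to baseline.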
There was a high degree of intersubject variability (range >10- to 1,000-fold) in expression levels among individuals who had virtually identical tobacco exposures, as measured by plasma nicotine and cotinine; this variability was not accounted for by other exposure, dietary, demographic, or clinical factors in the multivariate analyses. However, given the inherent statistical power limitations of a pilot study and other factors discussed below, we have withheld judgment on the impact of these factors on gene transcript levels in these buccal samples.
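One way to quantify how much interindividual variability a set of covariates leaves unexplained is to fit a linear model to log-transformed expression and inspect R². The following dependency-free sketch illustrates that idea; it is not the study's actual multivariate analysis. Several choices here are hypothetical assumptions for illustration: the log2 scale, the covariate coding (intercept, female gender as 0/1, plasma cotinine), the helper names, and all data values.

```python
import math

def fold_range(values):
    """Ratio of the highest to the lowest relative expression value across subjects."""
    positive = [v for v in values if v > 0]
    if len(positive) < 2:
        raise ValueError("need at least two positive expression values")
    return max(positive) / min(positive)

def ols(X, y):
    """Ordinary least squares: solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination.

    X is a list of rows; include a leading 1.0 in each row for the intercept.
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]

def r_squared(X, y, beta):
    """Fraction of the variance in y explained by the fitted linear model."""
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_total = sum((v - mean_y) ** 2 for v in y)
    ss_resid = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1.0 - ss_resid / ss_total

# Hypothetical relative CYP1B1 expression for six subjects (not study data).
expression = [0.02, 0.5, 3.0, 0.1, 15.0, 1.2]
covariates = [  # [intercept, female (0/1), plasma cotinine (arbitrary units)]
    [1.0, 0.0, 10.0],
    [1.0, 1.0, 12.0],
    [1.0, 0.0, 200.0],
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 180.0],
    [1.0, 0.0, 0.0],
]
log_expr = [math.log2(v) for v in expression]
beta = ols(covariates, log_expr)
r2 = r_squared(covariates, log_expr, beta)
print(f"fold range: {fold_range(expression):.0f}")
print(f"R^2: {r2:.2f}; unexplained fraction: {1 - r2:.2f}")
```

The quantity 1 − R² is the share of log-scale variability left unexplained by the covariates. With a small pilot sample, this estimate is itself imprecise.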
For example, attempts in the current study to control for ad libitum dietary exposures (e.g., self-reported servings of fruits, vegetables, and tea per average day) did not reveal any obvious relationships between these dietary factors and gene expression. However, simple dietary histories can be inaccurate, and the resulting misclassification erodes statistical power. Dietary intake of fruits and vegetables, which supply chemopreventive agents such as isothiocyanates and polyphenols (including catechins), has been associated with up-regulation of Phase II enzymes such as GSTP1 in a variety of experimental and mechanistic in vitro and in vivo models (25). Although no such effect was apparent in the current study, the expression-phenotyping approach may prove useful for monitoring chemoprevention strategies in future studies with more detailed dietary exposure assessment.
Among the potential factors influencing case-control status, single nucleotide polymorphisms were not explored. We make the explicit assumption that the single nucleotide polymorphisms most relevant to gene transcript levels, if any are relevant, would likely be promoter region single nucleotide polymorphisms. Coding region single nucleotide polymorphisms are less likely to affect transcript levels directly, although 3′ untranslated region single nucleotide polymorphisms may do so. An initial study from our laboratory on the functional effects of 5′ promoter region single nucleotide polymorphisms in the CYP1B1 and CYP1A1 genes did not show highly significant effects of these polymorphisms on reporter construct expression (26). Promoter single nucleotide polymorphisms therefore remain possible, unexplored causes of the wide interindividual expression differences observed in this exfoliated buccal cell expression study, but based on current data we do not consider them likely explanations. Nevertheless, single nucleotide polymorphisms that have been clearly correlated with health risk (e.g., GSTP1 genotype and lung cancer) could legitimately be incorporated into a larger future study, whether they lie in the 5′ promoter region, the coding region, or the 3′ untranslated region. Such a study could assess how accurately the combination of genotype and buccal cell expression phenotype in carcinogenesis-relevant pathways serves as a noninvasively measurable risk factor for airway malignancy.
The study also included quantitative assays of aryl hydrocarbon receptor (AhR) transcript, which encodes a known transcriptional regulator of genes with xenobiotic response elements in their promoters (such as CYP1B1). AhR transcript levels did not contribute to the models of CYP1B1 intersubject expression variability. This suggests that other, as-yet unmeasured biological or genetic variables influence the regulation of these genes, with potentially important metabolic, cell survival, and phenotypic consequences. The exfoliated buccal cell technique lends itself to studying these factors in human populations.
A family history of tobacco-related malignancies (e.g., lung, head and neck, bladder, pancreas, and stomach cancers) was chosen as an easily obtained surrogate for inherited factors, such as carcinogen metabolism patterns or other susceptibility factors for tobacco-related malignancy. Family history did not emerge as significant consistently enough in any model set or subgroup to be reported. However, the limited power of this pilot study does not rule out the emergence, in larger subsequent studies, of inherited or selected gene expression patterns that confer penetrant phenotypes.
Consistent with previous observational and mechanistic data from this and other laboratories, female gender correlated with elevated mRNA levels of the carcinogen- and estrogen-bioactivating enzyme CYP1B1 (15, 27, 28). The absence of a correlation between female gender and GSTP1 mRNA levels has also been observed in other contexts (15).
Therefore, because of a number of biological and practical factors, we reserve judgment on expected associations that were not found (e.g., between high Phase I/low Phase II gene expression and incident lung cancer diagnosis) until larger studies are completed. These factors include the limited power inherent in a pilot study, which precludes the substratification needed to avoid confounding and bias toward the null.
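To illustrate why wide interindividual variability limits a pilot study, consider the standard normal-approximation formula for comparing two group means. It gives the number of subjects needed per group as n = 2 (z_{1−α/2} + z_{1−β})² σ² / δ², where σ is the standard deviation of (log-scale) expression and δ is the case-control difference to be detected. The sketch below is a generic calculation, not the power analysis used in this study. The function name and the σ and δ values in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Subjects per group for a two-sided, two-sample comparison of means (normal approximation).

    sigma: standard deviation of the outcome (e.g., log2 expression) in each group.
    delta: smallest between-group difference in means worth detecting, same units as sigma.
    """
    if sigma <= 0 or delta <= 0:
        raise ValueError("sigma and delta must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

# Hypothetical: detect a 2-fold (1 log2 unit) case-control difference.
print(n_per_group(sigma=1.0, delta=1.0))  # 16 per group
print(n_per_group(sigma=2.5, delta=1.0))  # 99 per group
```

Because n scales with σ², doubling the spread of log-scale expression roughly quadruples the required sample size. This is one reason a pilot-scale study can miss real associations.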
In summary, we have demonstrated a technique for quantifying human gene expression signatures of gene-environment interaction by noninvasive means. We envisage a variety of potential applications of this technique to asymptomatic or clinical populations in public health or clinical settings: (1) determination of cancer susceptibility; (2) detection of markers of early carcinogenesis; and (3) assessment of the efficacy or toxicity of chemopreventive or therapeutic agents. Each is an area ripe for further investigation in pursuit of these potential benefits.
We acknowledge the exceptional work of research nurses Angela Sheehan, Kathy Mokhiber, and Ann Venezia for subject enrollment, biospecimen collection, and study coordination. Additional acknowledgment is made to Dr. Andrew Reilly of the Wadsworth Bioinformatics and Statistics Core for statistical advice, to the Molecular Genetics core at Wadsworth Center, and to clinical colleagues Drs. Anthony Malanga, Thomas Smith, Jonathan Rosen, Scott Beegle, et al. in Pulmonary Medicine, and Dr. Riivo Ilves, et al. in the Thoracic Surgery sections at Albany Medical Center, who participated in subject enrollment.
Grant support: NIH/National Cancer Institute Grant R21 CA 94714 (S. Spivack), NIH/National Cancer Institute Grant R01 CA 10618 (S. Spivack), and American Lung Association-Research Grant (S. Spivack).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Requests for reprints: Simon D. Spivack, Laboratory of Human Toxicology & Molecular Epidemiology, E622, Wadsworth Center, New York State Department of Health, Empire State Plaza, Albany, NY 12201-0509. E-mail:
- Received May 19, 2004.
- Revision received July 12, 2004.
- Accepted July 21, 2004.
- ©2004 American Association for Cancer Research.