Evaluation of Models for the Prediction of
Breast Cancer Development in Women at High Risk
Matthew S. Mayo, PhD
Kansas Cancer
Institute and Department of Preventive Medicine
Bruce F. Kimler, PhD
Department of Radiation Oncology
Carol J. Fabian, MD
Division of Clinical
Oncology, and Department of Internal Medicine
University of Kansas
Medical Center
3901 Rainbow
Boulevard
Kansas City, Kansas
66160-7312
KEY WORDS: Gail risk, fine needle
aspiration, atypia, cytology, logistic regression, proportional hazards
regression
ABSTRACT
In this manuscript we evaluate
models for the prediction of breast cancer in women with major risk factors for
the disease utilizing random periareolar fine needle aspiration (FNA) cytology
along with the original and modified Gail risk assessment. Utilizing logistic
regression, we compare the accuracy in prediction of breast cancer development
using the original Gail risk compared to modifications suggested by Gail et al
(1989) while also utilizing results from FNAs. The predictive ability of these
factors for time to disease onset is compared using Cox's proportional hazards
model. Modification of the traditional Gail risk and utilizing cytology
obtained by FNAs results in improved logistic and Cox's proportional hazards
regression models. Therefore, utilization of random periareolar FNA cytology in
conjunction with the modified Gail risk assessment improves the short-term
prediction of breast cancer in women at increased risk of the disease.
INTRODUCTION
Currently, it is
estimated that the incidence of breast cancer in women is 111 per 100,000
women. Twenty-nine percent of those women diagnosed with breast cancer will
succumb to the disease within 5 years.1 Breast cancer was
traditionally the leading cause of cancer-related death among all women until
it was surpassed by lung cancer in the 1980s. It continues to be the leading
cause of cancer-related death among women aged 40 to 55. According to the
Surveillance, Epidemiology, and End Results (SEER) registry, the lifetime risks
of a breast cancer diagnosis and of death from breast cancer in women are 12.64%
and 3.57%, respectively.2 Thus, there is a need for statistical
models to accurately assess a woman's risk of breast cancer.
Gail and coworkers3 developed a model to estimate
the relative risk of breast cancer in white women undergoing annual screening.
They determined the major predictors of risk in this population were a family
history of breast cancer in a first-degree relative, previous benign breast
biopsies, a late age at first live birth, and early menarche. From these
factors they created a model to estimate a woman's risk of breast cancer at 10,
20, and 30 years from her current age. They noted that the data used in the
original model included women with and without atypical hyperplasia (AH). In
their modified model, women with a prior biopsy showing AH have their relative
risk multiplied by 1.82, resulting in a modified Gail risk at 10, 20, and 30
years from her current age. The Gail model does not include risk modification factors
for prior breast cancer, age at breast cancer diagnosis, lobular carcinoma in
situ, second- or third-degree relative with breast cancer, relatives with
ovarian cancer, or hormone replacement therapy history, all factors previously
shown to increase breast cancer risk. Thus, women with prior in situ or
invasive cancer as their major risk factor, those with a strong paternal family
history of breast cancer, or those from a hereditary breast ovarian family may
have their risk substantially underestimated. The Gail model also does not take
into account lifestyle changes that may be associated with risk reduction such
as prophylactic oophorectomy in premenopausal women or prevention treatment
with tamoxifen.
Recently, it has been suggested that tissue-based biomarkers
are needed to enhance the prediction of short-term risk of breast cancer
development.4-6 Candidate markers should be both biologically
plausible and statistically associated with cancer or precancerous development.4
Potential surrogate endpoint biomarkers should also be (a) obtained from
minimally invasive procedures, (b) easily quantifiable, (c) present at a
reasonable rate in at-risk individuals, and (d) reversible with successful
interventions.4-7
Nipple aspiration and fine-needle aspiration (FNA) are minimally
invasive and inexpensive techniques that can be performed repeatedly with
limited morbidity. Atypical cytology from nipple aspiration has been shown to
be associated with increased breast cancer risk although approximately 40% of
the aspirates are acellular.8,9 Random FNA is currently being
evaluated as a technique for obtaining repeated breast tissue samples in risk
prediction and chemoprevention clinical trials.7,10-14
We have demonstrated that random periareolar FNA cytology
can be used in conjunction with the modified Gail risk assessment for the
short-term prediction of breast cancer in women at high risk of breast cancer.15
In this article, we show that utilizing FNA data does enhance the prediction of
breast cancer in women at high risk of breast cancer in comparison to the
original and modified Gail risk assessment models. We detail the population in
our cohort and then compare demographic and clinical variables between those
women who have progressed to breast cancer and those who have not. We also
define the models that are compared and discuss the results.
Population
Four hundred eighty women at
increased risk for breast cancer because of a family history of breast cancer,
prior precancerous biopsy, and/or prior invasive cancer were enrolled from
August 1989 to January 1999. All women had a mammogram interpreted as not
suspicious for breast cancer within 12 months prior to entry. Random
periareolar FNAs were performed at study entry, and cells were characterized
cytologically as nonproliferative, epithelial hyperplasia, or epithelial
hyperplasia with atypia.16 The average follow-up time for these
women is 42.5 months, during which time 20 women have been subsequently
diagnosed with invasive breast cancer or ductal carcinoma in situ. Detailed
methodology regarding subject eligibility, FNA technique, tissue preparation
and cytologic characterization have been previously published.11,17,18
Table 1 details
the demographic, family history, and random FNA cytologic characteristics of
this population. From these data, we see that the average age is 44.31 years,
with an average original 10-year Gail risk of 4.56% and an average modified
10-year Gail risk of 5.44%. Of the women, 95.2% are white, 59.6% were
premenopausal at entry, and 83.5% were not on hormone replacement therapy at
entry. At least one first-degree or two second-degree relatives with breast
cancer were reported for 75.6% of the women, 22.5% had a prior precancerous
mastopathy (AH or lobular carcinoma in situ), and 17.1% had prior breast
cancer; 14.2% of the women had multiple risk factors. FNAs determined that
21.2% of the women had epithelial hyperplasia with atypia, 70.6% had at least
one positive biomarker, and 35.8% had evidence of multiple biomarker
abnormalities.
{INSERT TABLE 1}
Table 2 compares
characteristics between those women in whom breast cancer was subsequently
clinically detected and those women who have not been subsequently clinically
diagnosed with breast cancer. Continuous measures are compared using the
two-sample t-test, and dichotomous variables are compared using Fisher's exact
test.19 There is no significant difference in age,
length of follow-up, race, menopausal status, hormone replacement therapy,
incidence of one first-degree or two second-degree relatives with breast
cancer, rate of prior breast cancer, at least one positive biomarker or
multiple biomarker abnormality. However, as noted previously,15 both
the original and modified 10-year Gail risks were significantly higher in those
women who have subsequently developed breast cancer. Also, women with a prior
precancerous mastopathy, with multiple risk factors, or with epithelial
hyperplasia with atypia in their random periareolar FNA were more likely to
develop breast cancer.
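The Fisher's exact comparison for atypia can be recomputed from the counts in Table 2. The following is a minimal Python sketch using only the standard library, not the SAS procedure the authors ran; the function name and the two-sided rule (summing every table that is no more probable than the observed one) are assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_observed * (1 + 1e-7))


# Hyperplasia with atypia from FNA (yes, no) by outcome, counts from Table 2:
# with cancer 12 yes / 8 no, without cancer 90 yes / 370 no.
p_value = fisher_exact_two_sided(12, 8, 90, 370)
print(f"Fisher's exact P = {p_value:.5f}")
```

The result is well below .001, in line with the small P value reported for this variable in Table 2.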
{INSERT TABLE 2}
Predicting Breast Cancer
In this paper we compare both
logistic regression20,21 and Cox proportional hazards regression22,23
models for the prediction of breast cancer in our cohort. Three models will be
compared: (1) 10-year Gail risk (Original), (2) 10-year Gail risk (Modified),
and (3) the model selected by a stepwise procedure with a 5% significance level to
enter and to leave the model. Both logistic regression and Cox proportional hazards
regression models are fit using SAS® software24 with PROC LOGISTIC and PROC PHREG,
respectively.25,26
Each of these methods allows for performing stepwise procedures when given a
set of explanatory variables.
{INSERT TABLE 3}
Logistic Regression Models
Logistic regression is a
statistical modeling procedure that allows for the modeling of a categorical
response variable based on a set of explanatory variables. In our circumstance,
we used logistic regression to model the dichotomous response variable cancer. The
logistic regression model can be written in the following form

$$\log\left(\frac{\pi}{1-\pi}\right) = \alpha + \beta_1 x_1 + \cdots + \beta_p x_p$$

where
$\pi$ is the probability of breast cancer,
$\alpha$ is the intercept,
$\beta_1, \ldots, \beta_p$ are the $p$ regression parameters, and
$x_1, \ldots, x_p$ are the $p$ explanatory variables.
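Two standard consequences of the logit form are useful when reading the tables. Writing $\pi$ for the probability of breast cancer and $\alpha$ for the intercept (symbols assumed here, since they are not legible in every copy of the text), the model can be solved for the probability, and a one-unit increase in a single explanatory variable $x_j$ multiplies the odds by $\exp(\beta_j)$, which is the odds ratio:

```latex
\pi = \frac{\exp(\alpha + \beta_1 x_1 + \cdots + \beta_p x_p)}
           {1 + \exp(\alpha + \beta_1 x_1 + \cdots + \beta_p x_p)},
\qquad
\frac{\pi(x_j + 1) / \{1 - \pi(x_j + 1)\}}{\pi(x_j) / \{1 - \pi(x_j)\}} = \exp(\beta_j)
```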
Three models,
given in Table 3, are compared for their ability to predict breast cancer.
Model 1 is a simple logistic regression model using only Gail's original
formulation of 10-year risk to predict breast cancer, 10-year Gail risk
(Original). Model 2 is also a simple logistic regression model using only
Gail's 10-year risk modified for AH to predict breast cancer, 10-year Gail
risk (Modified). Model 3 was determined by stepwise logistic regression. The
stepwise logistic procedure determined the best model and included two
explanatory variables, epithelial hyperplasia with atypia from FNA (Atypia from
FNA), and 10-year Gail risk (Modified).
When comparing
logistic regression models, multiple tests and/or statistics can be used.20,25
We will look at minimizing the -2Log Likelihood, maximizing the concordant
percentage, and maximizing the area under the receiver operating
characteristics (ROC) curve20 in determining the best model for the
prediction of breast cancer. Table 4 details this information for the three
models considered.
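The concordant percentage and the area under the ROC curve can both be read as pairwise comparisons: among all pairs of one woman who developed cancer and one who did not, a pair is concordant when the model gives the higher predicted probability to the woman with cancer. The sketch below assumes this usual pairwise definition, with ties counted as one half toward the area under the curve; the predicted probabilities are made-up illustrative values, not study data, and the function name is hypothetical.

```python
def concordance_summary(event_scores, nonevent_scores):
    """Return (fraction concordant, area under the ROC curve) from all pairs."""
    concordant = discordant = tied = 0
    for event in event_scores:
        for nonevent in nonevent_scores:
            if event > nonevent:
                concordant += 1      # model ranks the case above the non-case
            elif event < nonevent:
                discordant += 1      # model ranks the pair the wrong way round
            else:
                tied += 1            # model cannot separate the pair
    pairs = concordant + discordant + tied
    return concordant / pairs, (concordant + 0.5 * tied) / pairs


# Illustrative predicted probabilities only (not values from the cohort).
event_scores = [0.30, 0.10, 0.05]
nonevent_scores = [0.05, 0.02, 0.10, 0.01]
pct_concordant, auc = concordance_summary(event_scores, nonevent_scores)
print(f"concordant = {pct_concordant:.3f}, AUC = {auc:.3f}")
```

Because tied pairs add to the area but not to the concordant count, the area under the curve is at least as large as the concordant fraction, which matches the pattern in Table 4.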
{INSERT TABLE 4}
As can be seen
from Table 4, model 3 outperforms the other models on all three criteria, and the
10-year Gail risk (Modified) outperforms the 10-year Gail risk (Original).
Model 2 is nested within model 3, so we can also test whether the
addition of epithelial hyperplasia with atypia from FNA to the logistic
regression provides a significant improvement.20,21 Subtracting the
-2Log Likelihood of model 3 from the -2Log Likelihood of model 2 (156.84 - 145.24
= 11.60) gives a one degree of freedom chi-square test that shows a significant
improvement of model 3 over model 2 (P < .001). The ROC curves in Figure 1 also
show that model 3 is clearly the best in terms of this criterion.
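The nested-model comparison can be reproduced from the -2Log Likelihood values in Table 4. This is a minimal Python sketch rather than the SAS output; it relies on the fact that a chi-square variable with one degree of freedom is a squared standard normal, so its upper-tail probability is erfc(sqrt(s/2)). The function and variable names are illustrative.

```python
import math


def chi_square_1df_p_value(statistic):
    """Upper-tail P value of a chi-square statistic with one degree of freedom."""
    # For 1 df, P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


neg2loglik_model2 = 156.84  # Table 4: 10-year Gail risk (Modified)
neg2loglik_model3 = 145.24  # Table 4: Modified Gail risk + atypia from FNA

# Likelihood ratio statistic for adding atypia from FNA to model 2.
lr_statistic = neg2loglik_model2 - neg2loglik_model3
p_value = chi_square_1df_p_value(lr_statistic)
print(f"LR chi-square = {lr_statistic:.2f}, P = {p_value:.4f}")
```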
{INSERT FIGURE 1}
Table 5 gives
the odds ratio and 95% confidence interval for the odds ratio for the
explanatory variables in each of the three models considered. Table 5 also
gives the P value associated with
testing whether or not the corresponding parameters equal zero.20,21,25
The extremely high odds ratio associated with atypia from FNA is consistent
with it being the first explanatory variable to enter in the stepwise
regression procedure. A more than fivefold increase in the odds of breast cancer
makes atypia from FNA not only a highly statistically significant predictor but
also a clinically significant one.
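To see how the two odds ratios in model 3 combine, the sketch below compares two hypothetical risk profiles. It assumes the Gail odds ratio in Table 5 applies per one-percentage-point increase in modified 10-year Gail risk (the table does not state the unit), and the two profiles are invented for illustration, not subjects from the cohort. The intercept is not needed because it cancels in the ratio.

```python
import math

# Odds ratios for model 3 as reported in Table 5.
ODDS_RATIOS = {"gail_modified": 1.094, "atypia": 5.176}


def relative_odds(profile_a, profile_b, odds_ratios=ODDS_RATIOS):
    """Odds of breast cancer for profile_a relative to profile_b."""
    # Each coefficient is the log of its odds ratio; the intercept cancels.
    log_ratio = sum(
        math.log(odds_ratios[name]) * (profile_a[name] - profile_b[name])
        for name in odds_ratios
    )
    return math.exp(log_ratio)


# Hypothetical profiles: modified Gail risk in percent, atypia coded 0/1.
woman_a = {"gail_modified": 8.0, "atypia": 1}
woman_b = {"gail_modified": 4.0, "atypia": 0}
ratio = relative_odds(woman_a, woman_b)
print(f"Relative odds (A vs. B) = {ratio:.2f}")
```

Under these assumptions the relative odds come to about 7.4, most of which is contributed by the atypia term.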
{INSERT TABLE 5}
Cox Proportional Hazards Regression
Cox developed the proportional
hazards regression model to allow for use of explanatory variables in predicting
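a time-to-event response. The model may be expressed as

$$h_i(t) = h_0(t) \exp(\beta_1 x_1 + \cdots + \beta_p x_p)$$

where
$h_i(t)$ is the hazard for the
$i$th individual at time $t$,
$h_0(t)$ is the nonnegative
baseline hazard function,
$\beta_1, \ldots, \beta_p$ are the $p$ regression parameters, and
$x_1, \ldots, x_p$
are the $p$ explanatory variables for that individual.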
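The hazard ratios reported for this model follow from the same form. As a brief worked step, using $x_{ij}$ as assumed notation for the value of the $j$th explanatory variable in individual $i$, the baseline hazard cancels when the hazards of two individuals $i$ and $k$ are compared, so the ratio does not depend on $t$:

```latex
\frac{h_i(t)}{h_k(t)}
  = \frac{h_0(t)\exp(\beta_1 x_{i1} + \cdots + \beta_p x_{ip})}
         {h_0(t)\exp(\beta_1 x_{k1} + \cdots + \beta_p x_{kp})}
  = \exp\{\beta_1 (x_{i1} - x_{k1}) + \cdots + \beta_p (x_{ip} - x_{kp})\}
```

For a dichotomous variable such as atypia from FNA, with all other variables equal, this reduces to $\exp(\beta_j)$, the hazard ratio.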
We
compare the same three models as in the previous section, but now the response
variable is time to breast cancer diagnosis. Table 6 gives the -2Log likelihood
for each of the three models as well as the P
value for the likelihood ratio chi-square test.23,26 As was the
case with logistic regression models, model 3 is the best for the prediction of
time to breast cancer diagnosis.
{INSERT TABLE 6}
Table 7 gives
the hazard ratio for each of the variables along with the corresponding 95%
confidence intervals. Table 7 also includes the P value for testing whether or not the regression parameter
associated with each explanatory variable(s) in the models is equal to zero.23,26
These results mirror those of the logistic regression models in the previous
subsection. Again, the extremely high hazard ratio associated with Atypia from
FNA is consistent with it being the first variable entered in the stepwise
procedure. These results further support the use of FNA to aid in the
prediction of breast cancer in women.
{INSERT TABLE 7}
Conclusion
The utilization of cytologic
information from random periareolar FNAs, especially epithelial hyperplasia
with atypia, enhances the ability to predict breast cancer in women with major
risk factors for breast cancer. In this cohort of women, using Gail's 10-year
risk assessment modified for AH along with epithelial hyperplasia with atypia
from random periareolar FNA provides the best prediction models for breast cancer
development and time to breast cancer development. The model appears robust, since
the stepwise procedures for both the logistic and Cox proportional hazards
regression models select the same explanatory variables.
It should be
noted that this is a single cohort of women at a single institution, and
multi-institutional studies should be performed. Further follow-up of this
cohort, during which more breast cancers are expected to be diagnosed, will allow us to
re-evaluate these models and determine whether other factors play a role in the
prediction of breast cancer.
REFERENCES
1. American
Cancer Society: Cancer Facts and
Figures-2000. Atlanta, Georgia, American Cancer Society Incorporated, 2000.
2. Miller
BA: Racial/ethnic patterns of cancer in the United States 1988-1992. Surveillance, Epidemiology, and End Results
(SEER) Monograph. Bethesda, MD, National Cancer Institute, 1996.
3. Gail
MH, Brinton LA, Byar DP, et al: Projecting individualized probabilities of
developing breast cancer for white females who are being examined annually. J Natl Cancer Inst 81(24):1879-1886,
1989.
4. Freedman
LS, Schatzkin A, Shiffman MH: Statistical validation of intermediate markers of
precancer for use as endpoints in chemoprevention trials. J Cellular Biochem 16(Supplement G):27-32, 1992.
5. Kelloff
GJ, Boone CW, Steele VE, et al: Mechanistic considerations in chemopreventive
drug development. J Cellular Biochem
20(Supplement G):1-24, 1994.
6. Kelloff
GJ, Boone CW, Crowell JA, et al: Risk biomarkers and current strategies for
cancer chemoprevention. J Cellular Biochem
25:1-14, 1996.
7. Fabian
CJ, Kimler BF, Elledge RM, et al: Models for early chemoprevention trials in
breast cancer. Hematol/Oncol Clin North
Am 12:993-1017, 1998.
8. Wrensch
M, Petrakis NL, King EB, et al: Breast cancer risk associated with abnormal
cytology in nipple aspirates of breast fluid and prior history of breast
biopsy. Am J Epidemiol 137:829-833,
1993.
9. Sauter
ER, Ross E, Daly M, et al: Nipple aspirate fluid: A promising non-invasive
method to identify cellular markers of breast cancer risk. Br J Cancer 76(4):494-501, 1997.
10. Fabian CJ, Kamel S,
Kimler BF, McKittrick R: Potential use of biomarkers in breast cancer risk
assessment and chemoprevention trials. Breast
J 1:236-242, 1995.
11. Fabian CJ, Zalles C,
Kamel S, et al: Breast cytology and biomarkers obtained by random fine needle
aspiration: Use in risk assessment and early chemoprevention trials. J Cellular Biochem Suppl 28-29:101-110,
1997.
12. Khan SA, Masood S,
Miller L, Numann P: Occult epithelial proliferation of the breast detected by
random FNA. Proc Am Assoc Cancer Res 37:251,
1996.
13. Marshall CJ, Schumann
GB, Ward JH, et al: Cytologic identification of clinically occult proliferative
breast disease in women with a family history of breast cancer. Am J Clin Pathol 95:157-165, 1991.
14. Martino S, Ensley JF,
Weaver D, et al: Cellular DNA content characteristics of needle aspirates from
patients at high-risk for developing breast cancer. Proc Am Assoc Cancer Res 30:256, 1989.
15. Fabian CJ, Kimler BF,
Zalles CM, et al: Improved prediction of breast cancer risk based on random
periareolar fine needle aspiration cytology. J Natl Cancer Inst 92(15):1217-1227, 2000.
16. Zalles C, Kimler BF,
Kamel S, et al: Cytologic patterns in random aspirates from women at high and
low risk for breast cancer. Breast J 1:343-349,
1995.
17. Fabian CJ, Zalles C,
Kamel S, et al: Biomarker and cytologic abnormalities in women at high and low
risk for breast cancer. J Cellular
Biochem 17(Suppl G):153-160, 1993.
18. Fabian CJ, Zalles C,
Kamel S, et al: Prevalence of aneuploidy, overexpressed ER, and overexpressed
EGFR in random breast aspirates of women at high risk and low risk for breast
cancer. Breast Cancer Res Treatment 30:263-274,
1994.
19. Lehmann EL: Testing Statistical Hypotheses, ed 2.
New York, Chapman & Hall, 1994.
20. Agresti A: Categorical Data Analysis. New York,
Wiley, 1990.
21. Zelterman D: Models for Discrete Data. Oxford,
Clarendon Press, 1999.
22. Cox DR: Regression
models for life tables. J Royal
Statistical Soc 34:187-220, 1972.
23. Lee ET: Statistical Methods for Survival Data
Analysis. New York, Wiley, 1992.
24. SAS®: The SAS® System for Windows, Release 8.00. Cary, North Carolina, SAS Institute
Incorporated, 2000.
25. Stokes ME, Davis CS,
Koch GG: Categorical Data Analysis Using the SAS®
System. Cary, North Carolina, SAS Institute Incorporated, 1995.
26. Allison PD: Survival Analysis Using the SAS® System: A Practical Guide. Cary, North
Carolina, SAS Institute Incorporated, 1995.
Table 1: Demographics of 480 High-Risk Breast Cancer Subjects*

Characteristic | Mean (SD) or n (%)
Age (years) | 44.31 (8.59)
10-Year Gail Risk (Original), % | 4.56 (3.58)
10-Year Gail Risk (Modified), % | 5.44 (4.69)
Follow-up in Months | 42.53 (29.68)
Race |
  White (Non-Hispanic) | 457 (95.2)
  Other | 23 (4.8)
Menopausal Status at Entry |
  Pre | 286 (59.6)
  Post | 194 (40.4)
On Hormone Replacement Therapy at Entry |
  No | 401 (83.5)
  Yes | 79 (16.5)
At Least One First-Degree or Two Second-Degree Relatives with Breast Cancer |
  No | 117 (24.4)
  Yes | 363 (75.6)
Prior Precancerous Mastopathy |
  No | 372 (77.5)
  Yes | 108 (22.5)
Prior Breast Cancer |
  No | 398 (82.9)
  Yes | 82 (17.1)
Multiple Risk Factors |
  No | 412 (85.8)
  Yes | 68 (14.2)
Hyperplasia with Atypia from FNA |
  No | 378 (78.8)
  Yes | 102 (21.2)
At Least One Positive Biomarker from FNA |
  No | 141 (29.4)
  Yes | 339 (70.6)
Evidence of Multiple Biomarker Abnormality from FNA |
  No | 308 (64.2)
  Yes | 172 (35.8)
Cancer other than LCIS |
  No | 460 (95.8)
  Yes | 20 (4.2)

*Data are summarized as mean (standard deviation) for
continuous variables and n (%) for dichotomous variables.
FNA = fine needle aspiration.
LCIS = lobular carcinoma in situ.
Table 2: Comparison
of Characteristics Between Women Who Have Been Subsequently Diagnosed with
Breast Cancer (Cancer) and Those Women Who Have Not (Without Cancer)*

Variable | With Cancer (n=20) | Without Cancer (n=460) | P Value
Age (years) | 46.35 (7.89) | 44.22 (8.62) | .2521
10-Year Gail Risk (Original), % | 6.96 (4.40) | 4.46 (3.51) | .0208
10-Year Gail Risk (Modified), % | 9.26 (6.27) | 5.27 (4.54) | .0108
Follow-up in Months | 43.54 (24.54) | 42.48 (29.90) | .8532
Race | | | 1.0000
  White (Non-Hispanic) | 19 (95.0) | 438 (95.2) |
  Other | 1 (5.0) | 22 (4.8) |
Menopausal Status at Entry | | | .3638
  Pre | 14 (70.0) | 272 (59.1) |
  Post | 6 (30.0) | 188 (40.9) |
On Hormone Replacement Therapy at Entry | | | .7564
  No | 16 (80.0) | 385 (83.7) |
  Yes | 4 (20.0) | 75 (16.3) |
At Least One First-Degree or Two Second-Degree Relatives with Breast Cancer | | | .7937
  No | 4 (20.0) | 113 (24.6) |
  Yes | 16 (80.0) | 347 (75.4) |
Prior Precancerous Mastopathy | | | .0054
  No | 10 (50.0) | 362 (78.7) |
  Yes | 10 (50.0) | 98 (21.3) |
Prior Breast Cancer | | | .2228
  No | 19 (95.0) | 379 (82.4) |
  Yes | 1 (5.0) | 81 (17.6) |
Multiple Risk Factors | | | .0143
  No | 13 (65.0) | 399 (86.7) |
  Yes | 7 (35.0) | 61 (13.3) |
Hyperplasia with Atypia from FNA | | | .0001
  No | 8 (40.0) | 370 (80.4) |
  Yes | 12 (60.0) | 90 (19.6) |
At Least One Positive Biomarker from FNA | | | .2101
  No | 3 (15.0) | 138 (30.0) |
  Yes | 17 (85.0) | 322 (70.0) |
Evidence of Multiple Biomarker Abnormality from FNA | | | .2328
  No | 10 (50.0) | 298 (64.8) |
  Yes | 10 (50.0) | 162 (35.2) |

*Data are summarized as mean (standard deviation) for
continuous variables and n (%) for dichotomous variables. Continuous variables
are compared via the two-sample t-test and dichotomous variables are compared
by Fisher's exact test.
FNA = fine needle aspiration.
Table 3: Models for
Prediction of Breast Cancer Development and Time to Breast Cancer Development

Model | Variable(s)
1 | 10-Year Gail Risk (Original)
2 | 10-Year Gail Risk (Modified)
3* | 10-Year Gail Risk (Modified) + Atypia from FNA

*Model 3 was
determined to be best by stepwise logistic and stepwise Cox's proportional
hazards regression.
FNA = fine needle aspiration.
Table 4:
Performance of Logistic Regression Models for Prediction of Breast Cancer

Model | -2Log Likelihood | Likelihood Ratio Chi-Square (P Value) | % Concordant | Area under ROC Curve
1 | 159.38 | 6.90 (.0086) | 65.5 | 0.678
2 | 156.84 | 9.43 (.0021) | 72.0 | 0.741
3 | 145.24 | 21.03 (<.0001) | 79.0 | 0.797

ROC = receiver operating characteristics.
Table 5: Summary of
Logistic Regression Models for Prediction of Breast Cancer

Model | Variable | Odds Ratio (95% CI) | P Value
1 | 10-Year Gail Risk (Original) | 1.137 (1.036, 1.237) | .0038
2 | 10-Year Gail Risk (Modified) | 1.114 (1.043, 1.184) | .0006
3 | 10-Year Gail Risk (Modified) | 1.094 (1.021, 1.167) | .0075
  | Atypia from FNA | 5.176 (2.030, 13.788) | .0006

CI = confidence interval; FNA = fine needle aspiration.
Table 6:
Performance of Cox Proportional Hazard Regression Models for Prediction of Time
to Breast Cancer Diagnosis

Model | -2Log Likelihood | Likelihood Ratio Chi-Square (P Value)
1 | 206.19 | 9.47 (.0021)
2 | 204.06 | 11.60 (.0007)
3 | 191.85 | 23.81 (<.0001)
Table 7: Summary of
Cox Proportional Hazard Regression Models for Prediction of Time to Breast
Cancer Diagnosis

Model | Variable | Hazard Ratio (95% CI) | P Value
1 | 10-Year Gail Risk (Original) | 1.157 (1.071, 1.249) | .0002
2 | 10-Year Gail Risk (Modified) | 1.118 (1.061, 1.178) | <.0001
3 | 10-Year Gail Risk (Modified) | 1.099 (1.040, 1.162) | .0009
  | Atypia from FNA | 5.087 (2.041, 12.679) | .0005

CI = confidence interval; FNA = fine needle aspiration.