the Journal of Applied Research
in Clinical and Experimental Therapeutics

Evaluation of Models for the Prediction of Breast Cancer Development in Women at High Risk

Matthew S. Mayo, PhD

Kansas Cancer Institute and Department of Preventive Medicine

Bruce F. Kimler, PhD

Department of Radiation Oncology

Carol J. Fabian, MD

Division of Clinical Oncology, and Department of Internal Medicine

University of Kansas Medical Center

3901 Rainbow Boulevard

Kansas City, Kansas 66160-7312

KEY WORDS: Gail risk, fine needle aspiration, atypia, cytology, logistic regression, proportional hazards regression

ABSTRACT

In this manuscript we evaluate models for the prediction of breast cancer in women with major risk factors for the disease utilizing random periareolar fine needle aspiration (FNA) cytology along with the original and modified Gail risk assessment. Utilizing logistic regression, we compare the accuracy in prediction of breast cancer development using the original Gail risk compared to modifications suggested by Gail et al (1989) while also utilizing results from FNAs. The predictive ability of these factors for time to disease onset is compared using Cox's proportional hazards model. Modification of the traditional Gail risk and utilizing cytology obtained by FNAs results in improved logistic and Cox's proportional hazards regression models. Therefore, utilization of random periareolar FNA cytology in conjunction with the modified Gail risk assessment improves the short-term prediction of breast cancer in women at increased risk of the disease.

INTRODUCTION

Currently, it is estimated that the incidence of breast cancer in women is 111 per 100,000 women. Twenty-nine percent of those women diagnosed with breast cancer will succumb to the disease within 5 years.¹ Breast cancer was traditionally the leading cause of cancer-related death among all women until it was surpassed by lung cancer in the 1980s. It continues to be the leading cause of cancer-related death among women aged 40 to 55. According to the Surveillance, Epidemiology, and End Results (SEER) registry, the lifetime risks for a breast cancer diagnosis and for death from breast cancer in women are 12.64% and 3.57%, respectively.² Thus, there is a need for statistical models to accurately assess a woman's risk of breast cancer.

Gail and coworkers³ developed a model to estimate the relative risk of breast cancer in white women undergoing annual screening. They determined the major predictors of risk in this population were a family history of breast cancer in a first-degree relative, previous benign breast biopsies, a late age at first live birth, and early menarche. From these factors they created a model to estimate a woman's risk of breast cancer at 10, 20, and 30 years from her current age. They noted that the data used in the original model included women with and without atypical hyperplasia (AH). In their modified model, women with a prior biopsy showing AH have their relative risk multiplied by 1.82, resulting in a modified Gail risk at 10, 20, and 30 years from her current age. The Gail model does not include risk modification factors for prior breast cancer, age at breast cancer diagnosis, lobular carcinoma in situ, second- or third-degree relative with breast cancer, relatives with ovarian cancer, or hormone replacement therapy history, all factors previously shown to increase breast cancer risk. Thus, women with prior in situ or invasive cancer as their major risk factor, those with a strong paternal family history of breast cancer, or those from a hereditary breast ovarian family may have their risk substantially underestimated. The Gail model also does not take into account lifestyle changes that may be associated with risk reduction such as prophylactic oophorectomy in premenopausal women or prevention treatment with tamoxifen.

Recently, it has been suggested that tissue-based biomarkers are needed to enhance the prediction of short-term risk of breast cancer development.^4-6 Candidate markers should be both biologically plausible and statistically associated with cancer or precancerous development.⁴ Potential surrogate endpoint biomarkers should also be (a) obtained from minimally invasive procedures, (b) easily quantifiable, (c) present at a reasonable rate in at-risk individuals, and (d) reversible with successful interventions.^4-7

Nipple or fine-needle aspiration (FNA) are minimally invasive and inexpensive techniques that can be performed repeatedly with limited morbidity. Atypical cytology from nipple aspiration has been shown to be associated with increased breast cancer risk although approximately 40% of the aspirates are acellular.^8,9 Random FNA is currently being evaluated as a technique for obtaining repeated breast tissue samples in risk prediction and chemoprevention clinical trials.^7,10-14

We have demonstrated that random periareolar FNA cytology can be used in conjunction with the modified Gail risk assessment for the short-term prediction of breast cancer in women at high risk of breast cancer.¹⁵ In this article, we show that utilizing FNA data does enhance the prediction of breast cancer in women at high risk of breast cancer in comparison to the original and modified Gail risk assessment models. We detail the population in our cohort and then compare demographic and clinical variables between those women who have progressed to breast cancer and those who have not. We also define the models that are compared and discuss the results.

Population

Four hundred eighty women at increased risk for breast cancer because of a family history of breast cancer, prior precancerous biopsy, and/or prior invasive cancer were enrolled from August 1989 to January 1999. All women had a mammogram interpreted as not suspicious for breast cancer within 12 months prior to entry. Random periareolar FNAs were performed at study entry, and cells were characterized cytologically as nonproliferative, epithelial hyperplasia, or epithelial hyperplasia with atypia.¹⁶ The average follow-up time for these women is 42.5 months, during which time 20 women have been subsequently diagnosed with invasive breast cancer or ductal carcinoma in situ. Detailed methodology regarding subject eligibility, FNA technique, tissue preparation, and cytologic characterization has been previously published.^11,17,18

Table 1 details the demographic, familial history, and random FNA cytologic characteristics of this population. From these data, we see that the average age is 44.31 years, with an average original 10-year Gail risk of 4.56% and an average modified 10-year Gail risk of 5.44%. Of the women, 95.2% were white, 59.6% were premenopausal at entry, and 83.5% were not on hormone replacement therapy at entry. In addition, 75.6% of the women had at least one first-degree or two second-degree relatives with breast cancer, 22.5% had a prior precancerous mastopathy (AH or lobular carcinoma in situ), and 17.1% had prior breast cancer. This resulted in 14.2% of the women having multiple risk factors. FNAs determined that 21.2% of the women had epithelial hyperplasia with atypia, 70.6% had at least one positive biomarker, and 35.8% had evidence of multiple biomarker abnormalities.

{INSERT TABLE 1}

Table 2 compares characteristics between those women in whom breast cancer was subsequently clinically detected and those women who have not been subsequently clinically diagnosed with breast cancer. Continuous measures are compared using the two-sample t-test, and dichotomous variables are compared using Fisher's exact test.¹⁹ As can be seen there is not a significant difference in age, length of follow-up, race, menopausal status, hormone replacement therapy, incidence of one first-degree or two second-degree relatives with breast cancer, rate of prior breast cancer, at least one positive biomarker or multiple biomarker abnormality. However, as noted previously,¹⁵ both the original and modified 10-year Gail risks were significantly higher in those women who have subsequently developed breast cancer. Also, women with a prior precancerous mastopathy, with multiple risk factors, or with epithelial hyperplasia with atypia in their random periareolar FNA were more likely to develop breast cancer.
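As an illustration of the Fisher's exact test used for the dichotomous comparisons, the following is a minimal sketch in Python rather than the software used for the published analysis. The function name `fisher_exact_two_sided` is hypothetical, and the two-sided rule used here (summing the probabilities of all tables no more likely than the observed one) is an assumption about the convention; the counts are the atypia rows of Table 2.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed convention: sum the hypergeometric probabilities of every table
    (with the observed margins) that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Probability that the top-left cell equals x with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Table 2, hyperplasia with atypia from FNA:
# with cancer: 12 yes, 8 no; without cancer: 90 yes, 370 no.
p_atypia = fisher_exact_two_sided(12, 8, 90, 370)
print(f"Fisher's exact P value for atypia: {p_atypia:.4f}")
```

The same function can be applied to any other dichotomous row pair of Table 2 by substituting the four cell counts.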

{INSERT TABLE 2}

Predicting Breast Cancer

In this paper we compare both logistic regression^20,21 and Cox proportional hazards regression^22,23 models for the prediction of breast cancer in our cohort. Three models will be compared: (1) 10-year Gail risk (Original), (2) 10-year Gail risk (Modified), and (3) model selected by stepwise procedure with a 5% significance to enter and leave the model. Both logistic regression and Cox proportional hazards regression models are fit using SAS® software²⁴ using PROC LOGISTIC and PROC PHREG, respectively.^25,26 Each of these methods allows for performing stepwise procedures when given a set of explanatory variables.

{INSERT TABLE 3}

Logistic Regression Models

Logistic regression is a statistical modeling procedure that allows for the modeling of a categorical response variable based on a set of explanatory variables. In our circumstance, we used logistic regression to model the dichotomous response variable cancer. The logistic regression model can be written in the following form

$$\log\left(\frac{\pi}{1-\pi}\right) = \alpha + \beta_1 x_1 + \cdots + \beta_p x_p$$

where

$\pi$ is the probability of breast cancer,

$\alpha$ is the intercept,

$\beta_1, \ldots, \beta_p$ are the p regression parameters,

$x_1, \ldots, x_p$ are the p explanatory variables.
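To make the logistic form concrete, the sketch below converts a linear predictor into a probability by inverting the log-odds. It is only an illustration: the function name `logistic_probability` is hypothetical, and the intercept and coefficients in the example call are made-up values, since the fitted intercepts are not reported in this article.

```python
import math


def logistic_probability(intercept, betas, xs):
    """Probability implied by log(p / (1 - p)) = intercept + sum(beta_j * x_j)."""
    linear_predictor = intercept + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical values for illustration only (not fitted estimates):
# intercept -4.0, coefficient 0.09 per point of 10-year Gail risk,
# coefficient 1.6 for atypia (coded 1 = present, 0 = absent).
example_probability = logistic_probability(-4.0, [0.09, 1.6], [5.0, 1])
print(f"Illustrative predicted probability: {example_probability:.3f}")
```

Taking the log-odds of the returned probability recovers the linear predictor, which is why each coefficient is interpreted on the log-odds scale.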

Three models, given in Table 3, are compared for their ability to predict breast cancer. Model 1 is a simple logistic regression model using only Gail's original formulation of 10-year risk to predict breast cancer, 10-year Gail risk (Original). Model 2 is also a simple logistic regression model using only Gail's 10-year risk modified for AH to predict breast cancer, 10-year Gail risk (Modified). Model 3 was determined by stepwise logistic regression. The stepwise logistic procedure determined the best model and included two explanatory variables, epithelial hyperplasia with atypia from FNA (Atypia from FNA), and 10-year Gail risk (Modified).

When comparing logistic regression models, multiple tests and/or statistics can be utilized.^20,25 We will look at minimizing the -2Log Likelihood, maximizing the concordant percentage, and maximizing the area under the receiver operating characteristics (ROC) curve²⁰ in determining the best model for the prediction of breast cancer. Table 4 details this information for the three models considered.
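The concordant percentage and the area under the ROC curve can both be obtained by comparing every woman who developed cancer with every woman who did not. The following is a minimal sketch of that pairwise calculation, assuming ties count as one half toward the area; statistical packages may handle ties or grouping of predicted probabilities differently. The function name and the toy data are hypothetical and are not drawn from the cohort.

```python
def concordance_summary(predicted, outcomes):
    """Pairwise concordance between predicted probabilities and 0/1 outcomes.

    A pair (event, non-event) is concordant when the event has the higher
    predicted probability, discordant when lower, and tied otherwise.
    """
    events = [p for p, y in zip(predicted, outcomes) if y == 1]
    nonevents = [p for p, y in zip(predicted, outcomes) if y == 0]
    concordant = discordant = tied = 0
    for p_event in events:
        for p_nonevent in nonevents:
            if p_event > p_nonevent:
                concordant += 1
            elif p_event < p_nonevent:
                discordant += 1
            else:
                tied += 1
    pairs = len(events) * len(nonevents)
    return {
        "percent_concordant": 100.0 * concordant / pairs,
        "percent_discordant": 100.0 * discordant / pairs,
        "percent_tied": 100.0 * tied / pairs,
        # Area under the ROC curve, counting each tie as half a concordance.
        "auc": (concordant + 0.5 * tied) / pairs,
    }


# Toy data for illustration only: 2 events and 3 non-events.
toy = concordance_summary([0.9, 0.4, 0.4, 0.2, 0.1], [1, 1, 0, 0, 0])
print(toy)
```

Under this definition the area under the ROC curve is at least the concordant fraction, with equality when no pairs are tied.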

{INSERT TABLE 4}

As can be seen from Table 4, model 3 outperforms the other models on all three categories. The 10-year Gail risk (Modified) outperforms the 10-year Gail risk (Original). Model 2 is a subset of model 3 that can also be tested to determine if the addition of epithelial hyperplasia with atypia from FNA into the logistic regression provides a significant improvement.^20,21 Subtracting the -2Log Likelihood of model 3 from the -2Log Likelihood from model 2 we get a one degree of freedom chi-square test that shows a significant improvement of model 3 over model 2. From Figure 1 we can see from the ROC curves that model 3 is clearly the best in terms of this criterion.
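The nested-model comparison described above can be reproduced from the -2Log Likelihood values in Table 4. The sketch below is an illustration in Python, not the SAS procedure used in the analysis; it relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so its upper-tail probability can be written with the complementary error function. The function name is hypothetical.

```python
import math


def likelihood_ratio_test_1df(neg2ll_reduced, neg2ll_full):
    """Likelihood ratio chi-square (1 df) for a nested pair of models.

    The statistic is the drop in -2 log likelihood when one parameter is
    added; for 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    """
    statistic = neg2ll_reduced - neg2ll_full
    if statistic < 0:
        raise ValueError("the full model should not fit worse than the reduced model")
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Table 4: model 2 (-2Log Likelihood 156.84) versus model 3 (145.24).
lr_statistic, lr_p_value = likelihood_ratio_test_1df(156.84, 145.24)
print(f"Chi-square = {lr_statistic:.2f}, P = {lr_p_value:.4f}")
```

The statistic is 156.84 - 145.24 = 11.60, which exceeds the 5% critical value of 3.84 for one degree of freedom.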

{INSERT FIGURE 1}

Table 5 gives the odds ratio and 95% confidence interval for the odds ratio for the explanatory variables in each of the three models considered. Table 5 also gives the P value associated with testing whether or not the corresponding parameters equal zero.^20,21,25 The extremely high odds ratio associated with atypia from FNA is consistent with it being the first explanatory variable to enter in the stepwise regression procedure. An increase of over fivefold in the odds makes atypia from FNA not only a highly statistically significant predictor of breast cancer but also a clinically significant one.
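Because each odds ratio is the exponential of a regression parameter, odds ratios for larger changes in a variable follow by scaling on the log scale. The sketch below illustrates this with the model 3 estimates from Table 5, under the assumption that the Gail risk odds ratio is expressed per one percentage point of 10-year risk; the function name and the 5-point comparison are hypothetical choices made for illustration.

```python
import math


def odds_ratio_for_change(odds_ratio_per_unit, delta):
    """Odds ratio for a change of `delta` units, given the per-unit odds ratio."""
    beta = math.log(odds_ratio_per_unit)  # regression parameter on the log-odds scale
    return math.exp(beta * delta)


# Table 5, model 3: 10-year Gail risk (Modified) 1.094, Atypia from FNA 5.176.
or_gail_5_points = odds_ratio_for_change(1.094, 5)
# Odds multiplier for atypia together with a 5-point higher modified Gail risk,
# relative to no atypia and the lower Gail risk (no interaction term assumed).
or_combined = 5.176 * or_gail_5_points
print(f"5-point Gail increase: {or_gail_5_points:.2f}; with atypia: {or_combined:.2f}")
```

The combined figure assumes the two effects multiply on the odds scale, which is what a logistic model without an interaction term implies.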

{INSERT TABLE 5}

Cox Proportional Hazards Regression

Cox developed the proportional hazards regression model to allow for use of explanatory variables in predicting a time-to-event response. The model may be expressed as

$$h_i(t) = h_0(t)\exp\left(\beta_1 x_{i1} + \cdots + \beta_p x_{ip}\right)$$

where

$h_i(t)$ is the hazard for the i^th individual at time t,

$h_0(t)$ is the nonnegative baseline hazard function,

$\beta_1, \ldots, \beta_p$ are the p regression parameters,

$x_{i1}, \ldots, x_{ip}$ are the p explanatory variables for the i^th individual.
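A key property of the proportional hazards form is that the baseline hazard cancels when two individuals are compared, so their hazard ratio depends only on the regression parameters and the explanatory variables. The sketch below illustrates this using the model 3 hazard ratios from Table 7; the two women, the baseline hazard values, and the function name are hypothetical and chosen only for illustration.

```python
import math


def hazard(baseline_hazard, betas, xs):
    """Hazard h(t) = h0(t) * exp(sum(beta_j * x_j)) at a fixed time t."""
    return baseline_hazard * math.exp(sum(b * x for b, x in zip(betas, xs)))


# Regression parameters recovered as logs of the Table 7 model 3 hazard ratios:
# 10-year Gail risk (Modified) 1.099 and Atypia from FNA 5.087.
betas = [math.log(1.099), math.log(5.087)]

# Hypothetical women: modified Gail risk 10 with atypia vs. risk 5 without.
woman_a = [10.0, 1]
woman_b = [5.0, 0]

# The ratio is the same whatever value the baseline hazard takes at time t.
ratio_low_baseline = hazard(0.001, betas, woman_a) / hazard(0.001, betas, woman_b)
ratio_high_baseline = hazard(0.05, betas, woman_a) / hazard(0.05, betas, woman_b)
print(f"Hazard ratio: {ratio_low_baseline:.2f} (baseline-independent)")
```

This is why the model can be fit and interpreted without specifying the form of the baseline hazard function.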

We compare the same three models as in the previous section, but now the response variable is time to breast cancer diagnosis. Table 6 gives the -2Log Likelihood for each of the three models as well as the P value for the likelihood ratio chi-square test.^23,26 As was the case with logistic regression models, model 3 is the best for the prediction of time to breast cancer diagnosis.

{INSERT TABLE 6}

Table 7 gives the hazard ratio for each of the variables along with the corresponding 95% confidence intervals. Table 7 also includes the P value for testing whether or not the regression parameter associated with each explanatory variable(s) in the models is equal to zero.^23,26 These results mimic those of the logistic regression models in the previous subsection. Again, the extremely high hazard ratio associated with Atypia from FNA is consistent with it being the first variable entered in the stepwise procedure. These results further enhance the use of FNA to aid in the prediction of breast cancer in women.

{INSERT TABLE 7}

Conclusion

The utilization of cytologic information from random periareolar FNAs, especially epithelial hyperplasia with atypia, enhances the ability to predict breast cancer in women with major risk factors for breast cancer. In this cohort of women, using Gail's 10-year risk assessment modified for AH along with epithelial hyperplasia with atypia from random periareolar FNA provides the best prediction models for breast cancer development and time to breast cancer development. The model seems robust since the stepwise procedures for both the logistic and Cox proportional hazards regression models select the same explanatory variables.

It should be noted that this is a single cohort of women at a single institution and multi-institutional studies should be performed. Further follow-up on this cohort, which will reveal more breast cancer incidents, will allow us to re-evaluate these models and determine if other factors may play a role in the prediction of breast cancer.

REFERENCES

1. American Cancer Society: Cancer Facts and Figures-2000. Atlanta, Georgia, American Cancer Society Incorporated, 2000.

2. Miller BA: Racial/ethnic patterns of cancer in the United States 1988-1992. Surveillance, Epidemiology, and End Results (SEER) Monograph. Bethesda, MD, National Cancer Institute, 1996.

3. Gail MH, Brinton LA, Byar DP, et al: Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst 81(24):1879-1886, 1989.

4. Freedman LS, Schatzkin A, Shiffman MH: Statistical validation of intermediate markers of precancer for use as endpoints in chemoprevention trials. J Cellular Biochem 16(Supplement G):27-32, 1992.

5. Kelloff GJ, Boone CW, Steele VE, et al: Mechanistic considerations in chemopreventive drug development. J Cellular Biochem 20(Supplement G):1-24, 1994.

6. Kelloff GJ, Boone CW, Crowell JA, et al: Risk biomarkers and current strategies for cancer chemoprevention. J Cellular Biochem 25:1-14, 1996.

7. Fabian CJ, Kimler BF, Elledge RM, et al: Models for early chemoprevention trials in breast cancer. Hematol/Oncol Clin North Am 12:993-1017, 1998.

8. Wrensch M, Petrakis NL, King EB, et al: Breast cancer risk associated with abnormal cytology in nipple aspirates of breast fluid and prior history of breast biopsy. Am J Epidemiol 137:829-833, 1993.

9. Sauter ER, Ross E, Daly M, et al: Nipple aspirate fluid: A promising non-invasive method to identify cellular markers of breast cancer risk. Br J Cancer 76(4):494-501, 1997.

10. Fabian CJ, Kamel S, Kimler BF, McKittrick R: Potential use of biomarkers in breast cancer risk assessment and chemoprevention trials. Breast J 1:236-242, 1995.

11. Fabian CJ, Zalles C, Kamel S, et al: Breast cytology and biomarkers obtained by random fine needle aspiration: Use in risk assessment and early chemoprevention trials. J Cellular Biochem Suppl 28-29:101-110, 1997.

12. Khan SA, Masood S, Miller L, Numann P: Occult epithelial proliferation of the breast detected by random FNA. Proc Am Assoc Cancer Res 37:251, 1996.

13. Marshall CJ, Schumann GB, Ward JH, et al: Cytologic identification of clinically occult proliferative breast disease in women with a family history of breast cancer. Am J Clin Pathol 95:157-165, 1991.

14. Martino S, Ensley JF, Weaver D, et al: Cellular DNA content characteristics of needle aspirates from patients at high-risk for developing breast cancer. Proc Am Assoc Cancer Res 30:256, 1989.

15. Fabian CJ, Kimler BF, Zalles CM, et al: Improved prediction of breast cancer risk based on random periareolar fine needle aspiration cytology. J Natl Cancer Inst 92(15):1217-1227, 2000.

16. Zalles C, Kimler BF, Kamel S, et al: Cytologic patterns in random aspirates from women at high and low risk for breast cancer. Breast J 1:343-349, 1995.

17. Fabian CJ, Zalles C, Kamel S, et al: Biomarker and cytologic abnormalities in women at high and low risk for breast cancer. J Cellular Biochem 17(Suppl G):153-160, 1993.

18. Fabian CJ, Zalles C, Kamel S, et al: Prevalence of aneuploidy, overexpressed ER, and overexpressed EGFR in random breast aspirates of women at high risk and low risk for breast cancer. Breast Cancer Res Treatment 30:263-274, 1994.

19. Lehmann EL: Testing Statistical Hypotheses, ed 2. New York, Chapman & Hall, 1994.

20. Agresti A: Categorical Data Analysis. New York, Wiley, 1990.

21. Zelterman D: Models for Discrete Data. Oxford, Clarendon Press, 1999.

22. Cox DR: Regression models for life tables. J Royal Statistical Soc 34:187-220, 1972.

23. Lee ET: Statistical Methods for Survival Data Analysis. New York, Wiley, 1992.

24. SAS®: The SAS® System for Windows, Release 8.00. Cary, North Carolina, SAS Institute Incorporated, 2000.

25. Stokes ME, Davis CS, Koch GG: Categorical Data Analysis Using the SAS® System. Cary, North Carolina, SAS Institute Incorporated, 1995.

26. Allison PD: Survival Analysis Using the SAS® System: A Practical Guide. Cary, North Carolina, SAS Institute Incorporated, 1995.

Table 1: Demographics of 480 High-Risk Breast Cancer Subjects*

Age	44.31 (8.59)
10-Year Gail Risk (Original)	4.56 (3.58)
10-Year Gail Risk (Modified)	5.44 (4.69)
Follow-up in Months	42.53 (29.68)
Race
White (Nonhispanic)	457 (95.2)
Other	23 (4.8)
Menopausal Status at Entry
Pre	286 (59.6)
Post	194 (40.4)
On Hormone Replacement Therapy at Entry
No	401 (83.5)
Yes	79 (16.5)
At Least One First or Two Second-Degree Relatives with Breast Cancer
No	117 (24.4)
Yes	363 (75.6)
Prior Precancerous Mastopathy
No	372 (77.5)
Yes	108 (22.5)
Prior Breast Cancer
No	398 (82.9)
Yes	82 (17.1)
Multiple Risk Factors
No	412 (85.8)
Yes	68 (14.2)
Hyperplasia with Atypia from FNA
No	378 (78.8)
Yes	102 (21.2)
At Least One Positive Biomarker from FNA
No	141 (29.4)
Yes	339 (70.6)
Evidence of Multiple Biomarker Abnormality from FNA
No	308 (64.2)
Yes	172 (35.8)
Cancer other than LCIS
No	460 (95.8)
Yes	20 (4.2)

*Data are summarized as mean (standard deviation) for continuous variables and n (%) for dichotomous variables.

FNA = fine needle aspiration.

LCIS = lobular carcinoma in situ

Table 2: Comparison of Characteristics Between Women Who Have Been Subsequently Diagnosed with Breast Cancer (With Cancer) and Those Women Who Have Not (Without Cancer)*

Variable	With Cancer (n=20)	Without Cancer (n=460)	P Value
Age	46.35 (7.89)	44.22 (8.62)	.2521
10-Year Gail Risk (Original)	6.96 (4.40)	4.46 (3.51)	.0208
10-Year Gail Risk (Modified)	9.26 (6.27)	5.27 (4.54)	.0108
Follow-up in Months	43.54 (24.54)	42.48 (29.90)	.8532
Race			1.0000
White Non-Hispanic	19 (95.0)	438 (95.2)
Other	1 (5.0)	22 (4.8)
Menopausal Status at Entry			.3638
Pre	14 (70.0)	272 (59.1)
Post	6 (30.0)	188 (40.9)
On Hormone Replacement Therapy at Entry			.7564
No	16 (80.0)	385 (83.7)
Yes	4 (20.0)	75 (16.3)
At Least One First-Degree or Two Second-Degree Relatives with Breast Cancer			.7937
No	4 (20.0)	113 (24.6)
Yes	16 (80.0)	347 (75.4)
Prior Precancerous Mastopathy			.0054
No	10 (50.0)	362 (78.7)
Yes	10 (50.0)	98 (21.3)
Prior Breast Cancer			.2228
No	19 (95.0)	379 (82.4)
Yes	1 (5.0)	81 (17.6)
Multiple Risk Factors			.0143
No	13 (65.0)	399 (86.7)
Yes	7 (35.0)	61 (13.3)
Hyperplasia with Atypia from FNA			.0001
No	8 (40.0)	370 (80.4)
Yes	12 (60.0)	90 (19.6)
At Least One Positive Biomarker from FNA			.2101
No	3 (15.0)	138 (30.0)
Yes	17 (85.0)	322 (70.0)
Evidence of Multiple Biomarker Abnormality from FNA			.2328
No	10 (50.0)	298 (64.8)
Yes	10 (50.0)	162 (35.2)

*Data are summarized as mean (standard deviation) for continuous variables and n (%) for dichotomous variables. Continuous variables are compared via the two-sample t-test and dichotomous variables are compared by Fisher's exact test.

FNA = fine needle aspiration.

Table 3: Models for Prediction of Breast Cancer Development and Time to Breast Cancer Development

Model	Variable(s)
1	10-Year Gail Risk (Original)
2	10-Year Gail Risk (Modified)
3*	10-Year Gail Risk (Modified) + Atypia from FNA

*Model 3 was determined to be best by stepwise logistic and stepwise Cox's proportional hazards regression.

FNA = fine needle aspiration.

Table 4: Performance of Logistic Regression Models for Prediction of Breast Cancer

Model	-2Log Likelihood	Likelihood Ratio Chi-square (P Value)	% Concordant	Area under ROC Curve
1	159.38	6.90 (.0086)	65.5	0.678
2	156.84	9.43 (.0021)	72.0	0.741
3	145.24	21.03 (<.0001)	79.0	0.797

ROC = receiver operating characteristics.

Table 5: Summary of Logistic Regression Models for Prediction of Breast Cancer

Model	Variable	Odds Ratio (95% CI)	P Value
1	10-Year Gail Risk (Original)	1.137 (1.036, 1.237)	.0038
2	10-Year Gail Risk (Modified)	1.114 (1.043, 1.184)	.0006
3	10-Year Gail Risk (Modified)	1.094 (1.021, 1.167)	.0075
3	Atypia from FNA	5.176 (2.030, 13.788)	.0006

CI = confidence interval; FNA = fine needle aspiration.

Table 6: Performance of Cox Proportional Hazard Regression Models for Prediction of Time to Breast Cancer Diagnosis

Model	-2Log Likelihood	Likelihood Ratio Chi-Square (P Value)
1	206.19	9.47 (.0021)
2	204.06	11.60 (.0007)
3	191.85	23.81 (<.0001)

Table 7: Summary of Cox Proportional Hazard Regression Models for Prediction of Time to Breast Cancer Diagnosis

Model	Variable	Hazard Ratio (95% CI)	P Value
1	10-Year Gail Risk (Original)	1.157 (1.071, 1.249)	.0002
2	10-Year Gail Risk (Modified)	1.118 (1.061, 1.178)	<.0001
3	10-Year Gail Risk (Modified)	1.099 (1.040, 1.162)	.0009
3	Atypia from FNA	5.087 (2.041, 12.679)	.0005

CI = confidence interval; FNA = fine needle aspiration.
