Richard Mark Kirkner
February 10, 2021
Higher coffee consumption is associated with a lower risk of heart failure, according to a machine-learning-based algorithm that analyzed data from three large observational trials.
The study is significant because it underscores the potential of big data for individualizing patient management, lead investigator David Kao, MD, said in an interview. "Coffee consumption actually was predictive on top of known risk factors originally identified from those three trials," he said. "We in fact adjusted for the scores that are commonly used to predict heart disease, and coffee consumption remained a predictor even on top of that."
The study used supervised machine learning to analyze data on diet and other variables from three well-known observational studies: the Framingham Heart Study (FHS), the Cardiovascular Health Study (CHS), and the Atherosclerosis Risk in Communities (ARIC) study. The goal of the study, published online on Feb. 9, 2021, was to identify potential novel risk factors for incident coronary heart disease, stroke, and heart failure.
"The main difference of the relationship between coffee and heart disease, compared with prior analyses, is that we're able to find it in these well-known and well-accepted studies that have helped us find risk factors before," Kao said.
The study included 2,732 FHS participants aged 30–62 years, 3,704 CHS patients aged 65 and older, and 14,925 ARIC subjects aged 45–64, all of whom had no history of cardiovascular disease events when they enrolled. Primary outcomes for the machine-learning study were times to incident coronary heart disease, heart failure, and stroke.
Mathematics, Not Hypotheses
To compensate for variations in methodologies among the three observational trials, the study used 204 data measurements collected at the first FHS exam, including 16 dietary variables, for which similar data were collected in the other two studies.
The machine-learning model used what's known as a random forest analysis to identify the leading potential risk factors from among the 204 variables. To confirm findings across studies, the authors used a technique called "data harmonization" to smooth variations in the methodologies of the trials, not only in participant age and in the duration and dates of the trials, but also in how data on coffee consumption were gathered. For example, FHS collected those data as cups per day, whereas CHS and ARIC collected them as monthly, weekly, and daily consumption. The study converted the coffee consumption data from CHS and ARIC to cups per day to conform to the FHS data.
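The unit conversion described above can be illustrated with a short sketch. The article does not report the conversion factors the authors used, so the divisors below (7 days per week, 30.4 days per month) and the function name are assumptions made for illustration only.

```python
# Hypothetical harmonization helper: converts a reported coffee intake
# to cups per day. The divisors are assumed, not taken from the study.
DAYS_PER_PERIOD = {
    "day": 1.0,
    "week": 7.0,
    "month": 30.4,  # assumed average month length
}


def to_cups_per_day(cups, period):
    """Return intake in cups per day for a count reported per day, week, or month."""
    if period not in DAYS_PER_PERIOD:
        raise ValueError(f"unknown reporting period: {period!r}")
    return cups / DAYS_PER_PERIOD[period]


# Example: a participant who reported 14 cups per week
weekly_reporter = to_cups_per_day(14, "week")
print(f"{weekly_reporter:.2f} cups per day")
```

Putting every cohort on the same scale is what allows a per-cup association found in one study to be checked against the others.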
Random forest analysis is a type of machine learning that randomly creates a cluster of decision trees -- the "forest" -- to determine which variables, such as dietary factors, are important in predicting a result. The analysis uses mathematics, not hypotheses, to identify important variables.
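As a rough illustration of how a random forest ranks variables, the sketch below fits one to synthetic data with scikit-learn and prints each variable's importance score. It is not the authors' model. The variable names and data are invented, and the outcome is a simple yes/no label, whereas the study analyzed times to events. By construction, only the "age" and "coffee_cups_per_day" columns influence the simulated outcome.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000

# Synthetic, hypothetical variables; not data from FHS, CHS, or ARIC.
feature_names = ["age", "coffee_cups_per_day", "noise_a", "noise_b"]
X = np.column_stack([
    rng.normal(60, 8, n),    # age in years
    rng.integers(0, 5, n),   # 0 to 4 cups per day
    rng.normal(0, 1, n),     # unrelated variable
    rng.normal(0, 1, n),     # unrelated variable
])

# Simulated outcome: risk rises with age and falls with coffee (assumed effects).
log_odds = 0.15 * (X[:, 0] - 60) - 0.8 * (X[:, 1] - 2)
y = rng.random(n) < 1 / (1 + np.exp(-log_odds))

# Each tree sees a bootstrap sample and a random subset of variables per split.
forest = RandomForestClassifier(n_estimators=200, max_depth=3, random_state=0)
forest.fit(X, y)

# Impurity-based importance: higher means the variable contributed more to the splits.
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

The ranking comes from how often and how usefully each variable splits the data across the trees, which is the sense in which the method relies on mathematics rather than a prior hypothesis.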
Heart Failure and Stroke Risk Reduced
In this study, the analysis determined that each daily cup of caffeinated coffee was linked with a 5% reduction in the risk of heart failure (hazard ratio, 0.95; P = .02) and a 6% reduction in stroke risk (HR, 0.94; P = .02), but was not significantly associated with the risk for coronary heart disease or cardiovascular disease.
When the data were adjusted for the FHS CVD risk score, increasing coffee consumption remained significantly associated with the same lower risk of heart failure (P = .03), but the association with stroke was no longer significant (P = .33).
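The per-cup percentages follow directly from the hazard ratios: a hazard ratio below 1 corresponds to a relative reduction of (1 - HR) x 100%. A minimal sketch of that arithmetic, using only the two hazard ratios reported in the article (the function name is illustrative):

```python
def percent_risk_change(hazard_ratio):
    """Relative change in hazard, in percent; negative values mean lower risk."""
    return (hazard_ratio - 1.0) * 100.0


# Hazard ratios per daily cup of caffeinated coffee, as reported in the article
reported = {"heart failure": 0.95, "stroke": 0.94}

for outcome, hr in reported.items():
    print(f"{outcome}: HR {hr:.2f} -> {percent_risk_change(hr):+.0f}% per cup")
```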
While the study supports an association between coffee consumption and heart failure risk, it doesn't establish causation, noted Alice H. Lichtenstein, DSc, director and senior scientist at the Cardiovascular Nutrition Laboratory at Tufts University, Boston. "The authors could not rule out the possibility that caffeinated coffee intake was a proxy for other heart-healthy lifestyle behaviors," Lichtenstein said. "Perhaps the best message from the study is that there appears to be no adverse effects of drinking moderate amounts of caffeinated coffee, and there may be benefits."
Machine Learning Mines Observational Trials
Kao explained the rationale for applying a machine-learning algorithm to the three observational trials. "When these trials were designed in general, they had an idea of what they were looking for in terms of what might be a risk factor," said Kao, of the University of Colorado at Denver, Aurora. "What we were interested in doing was to look for risk factors that nobody really thought about ahead of time and let the data show us what might be a predictor without any bias of what we imagined to be true."
He described the role of machine learning in extracting and "filtering" data from the trials. "Machine learning allows us to look at a very large number of factors or variables and identify the most important ones in predicting a specific outcome," he said. This study evaluated the 204 variables and focused on dietary factors because they're modifiable.
"We looked at them in these different studies where we could, and coffee was the one that was reproducible in all of them," he said. "Machine learning helped filter down these very large numbers of variables in ways you can't do with traditional statistics. It's useful in studies like this because they gather thousands and thousands of variables that generally nobody uses, but these methods allow you to actually do something with them -- to determine which ones are most important."
He added: "These methods I think will take us toward personalized medicine where you're really individualizing a plan for keeping a patient healthy. We still have a lot of work to do, but there's a lot of promise for really helping each of us to figure out the ways we can become the healthiest that we can be."
The study was supported with funding from the National Heart, Lung, and Blood Institute and the American Heart Association. Kao and coauthors, as well as Lichtenstein, had no relevant financial relationships to disclose.