Belief in big data

Don’t assume more data is better data.

Routinely collected data, also called “big data” (information from large databases, for example of prescriptions) and “real-world data” (data collected routinely, for example from hospital visits), can provide a lot of information. However, claims based on these kinds of data can be misleading.


Sometimes people make claims based on “big data” or “real-world data”, and we need to be careful when deciding whether these claims are reliable. Although these sources contain a lot of data, they often lack the detail needed to be sure that a health action caused a particular benefit or harm, even when there appears to be an association between them. This is because such data can only tell us about factors that someone already knew about and measured. They cannot account for unmeasured factors that differ between the people being compared and that could explain the association (confounders).

Describing routinely collected data as “real-world data” implies that data collected in carefully designed fair comparisons do not come from the real world. Databases of routinely collected data may indeed include a wider range of people than data collected in fair comparisons of treatments. However, routine data collection is rarely planned to capture the information needed to ensure a fair comparison, and randomized trials can also be designed to include a wide range of people.


Real-world (routinely collected) data have been used to compare different types of surgery for blocked or narrowed blood vessels in the heart (heart bypass surgery). These comparisons found that using two blood vessels to bypass the blockages, compared with using one, was associated with a lower risk of dying within one year. This might suggest that using two blood vessels is better. However, a more likely explanation is that the association was due to confounders that had not been measured in the routinely collected data. Using two blood vessels instead of one makes the surgery more complex and invasive, and it is likely that surgeons tended to reserve it for patients they considered healthier and expected to live longer. A large fair comparison of using two blood vessels versus one found little or no difference in survival after 10 years.

Remember: Just because an association between a benefit or harm and something people do for their health (a health action) is found using “big data” or “real-world data”, it does not mean that the health action caused the benefit or harm.
