The digital world allows unprecedented access to vast amounts of data on real-world conditions that were beyond imagination just a few years ago (e.g., sensor data from fitness devices available for analysis by insurance providers). The life science industry is interested in using this type of data, named “Real-World Data” (RWD), and is currently pioneering its integration into experimental and regulatory pipelines. The development and approval of drugs, treatments, and therapies are driven by various regulatory requirements and are therefore risky and costly. RWD opens new possibilities for providing clinical evidence regarding the use and potential benefits or risks of new drugs, treatments, and therapies outside the context of prescriptive randomized clinical trials (RCTs). The use of RWD to inform health-related decisions is defined as “Real-World Evidence” (RWE). An essential question related to RWE remains unanswered: what challenges are faced when deriving evidence from RWD? To better understand this question, a reflection on the RWD risk exposure from a business perspective is valuable. In Figure 1, the risks associated with RWD are categorized into three areas. Ordered by when they are perceived, from short to long term, these aggregated risks relate to “Compliance Controversies,” “Registration Failure,” and “Business Model Disruption,” respectively (Fig. 1 shows the risk fields in the context of RWD).
The inner layer of Figure 1 deals with “data-related” risks that may lead to compliance violations, for example, when certain requirements cannot be met in an audit. The middle layer of the model aims at “approval-related” risks in relation to the development processes of new drugs and/or therapies, for example, when certain process adjustments due to the use of RWD are not adequate. The outer layer of the risk categorization is “business model-related” and deals with the RWD potential to disrupt the life science industry, for example, when new competitors (e.g., Apple) enter the market with superior RWD (e.g., from health apps in combination with wearable devices, such as smartwatches).
The aggregation of RWD risks in three “layers” helps to show how RWD may expose the life science industry to ever-increasing business risk over time. Yet, this abstracted model is not intended to exclude risks that could also occur later in time (e.g., data-related risks) or to ignore risks that may occur earlier (e.g., approval-related risks). In this study, we primarily focus on the “data-related” and “approval-related” layers, as these short- and medium-term risks of using RWD are currently more concrete in the life science industry.
Definitions and Concepts
As various definitions in relation to RWD are available, the following section provides an overview of definitions and concepts which the authors deem relevant for the purpose of this study.
RWD is a term whose definition varies depending on the source used. The Association of the British Pharmaceutical Industry, an influential British organization representing British biopharmaceutical research companies, defines RWD as “data obtained by any non-interventional methodology that describes what is happening in normal clinical practice.” This definition refers to medical research; the term “non-interventional” can be explained as procedures “without any intervention during the course.” In contrast, RAND Europe, a reputable nonprofit institution that helps improve policy and decision-making through research and analysis, refers to RWD as an “umbrella” term that stands for different types of healthcare-related data that are not collected in the context of conventional RCTs. The International Society for Pharmacoeconomics and Outcome Research, a relevant player that promotes health economics and outcomes research to improve health decision-making, defines RWD simply as “Data used for decision making that are not collected in conventional RCTs.”
RWE is generated through the use of RWD to draw meaningful health-related conclusions. The surge of electronic health records (EHRs), as well as other technologies, enables researchers to better understand the real-world patient’s experience. For example, a smartphone can be used to measure the distance traveled by a patient to determine the patient’s fitness activity. In contrast, the New England Journal of Medicine excluded data from clinical research settings, such as in EHRs, from its definition of RWE. Similarly, researchers from the US Food and Drug Administration (FDA) defined RWE as “information on healthcare that is derived from the multiple sources outside typical clinical research settings, including EHRs, claims and billing data, product and disease registries, and data gathered through personal devices and health applications.”
The “Efficacy-Effectiveness Gap” is the discrepancy between the real-life effectiveness of a drug once it is available on the market and the efficacy of the same drug in a standardized environment under ideal conditions in the context of RCTs. The efficacy-effectiveness gap poses a challenge to the decision-making process of drug licensing when it is based solely on the efficacy analyses of RCTs, as it relates to “lower than anticipated efficacy or a higher than anticipated incidence or severity of adverse effects.” GetReal, an innovative medicines initiative, launched a study that aspires to advance the awareness of how to harmonize evidence to back efficacy and effectiveness and to propose operational solutions.
An in-depth and systematic literature search was performed to identify potential challenges and risks in relevant academic publications. The study focused on RWD in the health sector. Our approach was to concentrate on pertinent parameters, and we finally developed six leading parameters as detailed in Table 1. Further, sufficient inclusion and exclusion criteria were used to extract relevant materials from “gray” (nonacademic) literature as detailed in Table 2.
In the publications categorized as relevant, which included the terms or abbreviations “RWD” and/or “RWE,” we found a total of 41 challenges relating to RWD and RWE that were mentioned repeatedly. After preprocessing the results, clustering similar challenges, and removing duplicates, 16 unique challenges were identified. To achieve these results, the authors allocated a one-word identifier to each challenge and used text analysis to derive statistics about the results. In Table 3, the identified challenges are mapped into three “classical” categories, which will be discussed in the following sections. The “Occurrence” column indicates the absolute frequency with which a key challenge appeared in the analyzed literature, with the related percentage indicated in brackets.
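The occurrence statistics described above can be illustrated with a minimal sketch. It assumes that each mention of a challenge in the literature has already been reduced to its one-word identifier; the identifiers and counts below are hypothetical and are not the study’s data, and the function name `occurrence_table` is introduced only for illustration.

```python
from collections import Counter


def occurrence_table(mentions):
    """Return (identifier, absolute frequency, percentage) rows, most frequent first."""
    counts = Counter(mentions)
    total = sum(counts.values())
    return [
        (label, n, round(100 * n / total, 1))
        for label, n in counts.most_common()
    ]


# Hypothetical one-word identifiers, one entry per mention in the literature.
mentions = ["bias", "quality", "bias", "privacy", "quality", "bias", "cost", "standards"]

for label, n, pct in occurrence_table(mentions):
    # Absolute frequency with the related percentage in brackets.
    print(f"{label:<10} {n} ({pct}%)")
```

Clustering similar challenges and removing duplicates would have to happen before this step; the sketch covers only the final counting.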
To further categorize the identified key challenges that risk managers are likely to face when RWD is discussed, we developed a criteria schema, the so-called RWD Challenges Radar, as visualized in Figure 2. The RWD Challenges Radar is based on three foundational categories/views that guide the information systems discipline, known as the “confluence of people, organizations, and technology.” Each category consists of various subcategories originating from the literature search. The size of each subcategory is intended to indicate the relevance or scope of the current discussion on the respective topic; the larger the field, the more important the topic appears to be.
RWD Challenges Radar
“Risk is a necessary part of doing business, and in a world where enormous amounts of the data are being processed at increasingly rapid rates, identifying and mitigating risks is a challenge for any company.” Moreover, truly understanding the implications of a risk, let alone identifying it, is often troublesome. Yet, as “risk management may be called both an art and a science,” we developed the RWD Challenges Radar as detailed in Figure 2 to provide an insightful overview of relevant risks regarding the application of RWD. As the radar is an abstract representation of a very specific aspect of RWD, namely “risk,” we believe that this specific focus will help to unveil the challenges associated with RWD.
The prototypical RWD Challenges Radar is, as such, a first attempt to visualize the relevant risks of RWD for decision-makers and other stakeholders, as seen from an organizational, technological, and people’s perspective. Furthermore, based on the underlying insights of the RWD Challenges Radar, an RWD Challenges Cockpit can be constructed. Such a dashboard-type solution would automatically capture, classify, assess, and visualize the quality of certain RWD. The RWD Challenges Radar fits the various stages of the drug development process and will enable RWD users to be fully aware of the challenges and risks related to the data while taking full advantage of the potential of RWD. In the following sections, the three main categories and related parameters of the RWD Challenges Radar are described.
The process of converting RWD into RWE to be used in a regulatory context needs to be embedded in adequate organizational structures. This leads into the second circle of the RWD Challenges Radar and the six related parameters: (1) availability of data quality mechanisms, (2) availability of suitable standards, (3) active coordination, (4) governance arrangements, (5) alignment with compliance requirements, and (6) cost considerations.
Various stakeholders are concerned about the quality of RWD. Research suggests that these concerns primarily originate from low-quality patient registries. Therefore, more robust RWD quality assurance processes are needed to facilitate the derivation of evidence from RWD. Moreover, RWD from observational studies is considered of lower quality and thus less important than data from RCTs [4, 9]. Incomplete data are another factor profoundly affecting the quality of RWD. Some RWD sources are vulnerable to misclassification or systematic omissions, which further widens the gaps in the data (e.g., claims data may record whether a patient had a test but do not reveal any details of the test result). Even though gaps or missing data could be filled in, doing so could introduce new issues, such as bias, a challenge that is presented in a later section. Additionally, claims databases may lead to quality issues as they bear the risk of incomplete and inconsistent data. For example, claims databases by nature lack information on the severity of clinical diseases and patients’ lifestyles.
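As a purely illustrative sketch of one simple quality assurance mechanism, the share of records with a usable value can be computed per field. The record layout, the field names, and the function `field_completeness` are assumptions made for this example; they do not come from the cited studies.

```python
def field_completeness(records, fields):
    """Return, per field, the share of records that hold a non-missing value."""
    n = len(records)
    if n == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) not in (None, "")) / n
        for field in fields
    }


# Hypothetical claims records: the test is billed, but its result is never recorded.
claims = [
    {"patient_id": "p1", "test_ordered": True, "test_result": None},
    {"patient_id": "p2", "test_ordered": True, "test_result": None},
    {"patient_id": "p3", "test_ordered": False, "test_result": None},
]

for field, share in field_completeness(claims, ["patient_id", "test_ordered", "test_result"]).items():
    print(f"{field}: {share:.0%} complete")
```

A check of this kind only reveals systematic omissions; it does not detect misclassification, which would require comparison against an independent source.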
Data standardization is important on a fundamental level that extends across data collection, processing, quality, terminology, design principles, the conduct of data collection, and RWE reporting [3, 15-17]. Currently, there is a large gap in data standardization between institutions, which reduces the quality of RWD compared to the data originating from RCTs. Consequently, if RWE cannot be utilized to ascertain the effectiveness of compared medical treatments, there will be less incentive to generate, gather, and use RWD. Nevertheless, there are some recent efforts by regulators, such as the FDA, to introduce data standards to increase the use of RWD and RWE in the drug development and regulatory life cycle.
There is a lack of coordination between different organizations on a national and international level regarding the translation of RWD into evidence. This is due to insufficient interactions between research groups, leading to inadequate evidence derivation from a limited research capacity. The European Medicines Agency has defined several challenges for existing registries to be utilized as evidence, one of which is the lack of harmonization and coordination between healthcare providers (HCPs) [16, 19]. Additional research indicates that there is no coordination between healthcare organizations on an international level. This is one of the most significant barriers to the capture and use of RWD.
Legal frameworks and governance arrangements for RWD access are vital to allow different groups to access the required data in time to optimize healthcare for patients. Yet, RWD is generated by different sources, such as academic institutions, hospitals, and private individuals; consequently, sought-after RWD may not be accessible to all stakeholders. RWD is concentrated mostly in hospital, pharmaceutical, and university databases, that is, in entities that might have neither the funding nor the interest to conduct observational research. Thus, access to these databases from outside the respective organizations might be limited. Therefore, access to RWD sources is highly related to the type of interactions that are in place between different stakeholders in the organizations. Moreover, most databases are accessible only to researchers from academia upon request and are not offered to other types of groups. Research suggests that this may be because RWD is being used for purposes other than those for which it was initially intended.
Recent changes in data privacy legislation in Europe, such as the General Data Protection Regulation (GDPR), should be considered for RWD. These rules pose a challenge to the collection and analysis of RWD. For example, the GDPR’s data minimization principle, which indicates that data must be “relevant and limited to what is necessary,” can conflict with the objectives of research groups, for whom accurate analyses and results require ever more datasets. Additional rules applying to the collection of RWD concerning “consent” and “purpose limitation” could restrict social media data gathering. Ethical concerns and the fear of data misuse hinder the gathering of patients’ personal data. This has led to some unsuccessful initiatives, such as the Netherlands’ attempt to develop a national electronic record system to facilitate the exchange of information between different healthcare entities, such as hospitals, insurance companies, and pharmaceutical firms [3, 15]. A specific concern is that commercial organizations may interpret the health datasets misleadingly, in ways that contradict the original aim of the project. Additionally, several privacy frameworks have been built to further protect patient data, such as the OECD Privacy Framework. However, due to its sensitive nature, health data is always a significant concern for regulators around the globe. Data privacy legislation will continue to be a major challenge that stakeholders must take into consideration when collecting data from any source.
The International Society for Pharmacoeconomics and Outcome Research real-world task force’s article on “Evidence costs money” states that the most critical question for gathering and analyzing RWE is “who will pay for it?” One of the tools suggested to evaluate the costs and benefits of RWD is value-of-information analysis, which offers an approach to determine the type of data to collect, the amount of time collecting it would take, and whether collecting a specific kind of data will ultimately improve the expected benefit. Pfizer, one of the top 10 pharmaceutical companies, mentioned that the costs of RWD analysis could be quite high, for example, in prospective noninterventional studies. Other researchers recommend real-time monitoring of patients as a way to reduce the costs of evidence generation: for example, wearable devices, such as smartwatches, can routinely collect RWD.
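To make the idea of value-of-information analysis more concrete, the following minimal sketch computes the expected value of perfect information (EVPI), one standard quantity in such analyses. Whether the cited authors used this particular measure is an assumption; the decisions, scenarios, net benefits, and probabilities are hypothetical.

```python
def expected_value_of_perfect_information(net_benefit, probabilities):
    """EVPI for a discrete decision problem.

    net_benefit[d][s] is the net benefit of decision d if scenario s is true;
    probabilities[s] is the current belief that scenario s is true.
    """
    # Value of the best single decision under current uncertainty.
    expected_per_decision = [
        sum(p * nb for p, nb in zip(probabilities, row)) for row in net_benefit
    ]
    value_now = max(expected_per_decision)

    # Value if the true scenario were known before deciding.
    value_with_perfect_information = sum(
        p * max(row[s] for row in net_benefit)
        for s, p in enumerate(probabilities)
    )
    return value_with_perfect_information - value_now


# Hypothetical example: two decisions (rows) and two equally likely scenarios (columns).
net_benefit = [
    [100, 20],  # decision A: adopt the new therapy
    [60, 60],   # decision B: keep the standard of care
]
probabilities = [0.5, 0.5]

evpi = expected_value_of_perfect_information(net_benefit, probabilities)
print(f"EVPI: {evpi}")
```

Under these assumptions, collecting additional RWD would be worthwhile only if its cost stays below the EVPI, since no dataset can be worth more than perfect information.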
The generation of RWD is closely tied to technology as “smart” devices become available to the populace, but technological challenges also hinder the use of RWD, by commercial institutions in particular, in the form of (1) “complexity” and (2) “cybersecurity.”
Another technological challenge that is hindering the advancement of RWD is the heterogeneity of data formats between different sources and countries. The FDA recognized the importance of having a common data model, along with standard representations like coding schemes and common terminologies, to maximize the utility of RWD. Some organizations, like the Institute for Clinical and Economic Review, have already started requesting that drug companies provide RWD in a specific format with the aim of increasing the integration of different data types. Unifying data formats from observational databases can be useful in comparative research to answer questions related to the cause of an observed effect. An initiative called Observational Health Data Sciences and Informatics has introduced a common data model called Observational Medical Outcomes Partnership, which enables separate databases to be systematically analyzed.
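The principle behind a common data model can be sketched as follows. The source names, field names, date formats, and target schema are hypothetical and deliberately simplified; they do not reproduce the actual table definitions of the Observational Medical Outcomes Partnership model.

```python
from datetime import datetime

# Hypothetical per-source mappings from local field names to a shared schema.
FIELD_MAPS = {
    "hospital_a": {"pid": "person_id", "dx": "condition_code", "visit": "event_date"},
    "registry_b": {"patient": "person_id", "diagnosis_code": "condition_code", "date": "event_date"},
}

# Hypothetical local date formats, normalized to ISO 8601 in the shared schema.
DATE_FORMATS = {"hospital_a": "%d.%m.%Y", "registry_b": "%Y-%m-%d"}


def to_common_model(source, record):
    """Rename fields and normalize the date of one record from a known source."""
    mapping = FIELD_MAPS[source]
    common = {target: record[local] for local, target in mapping.items()}
    parsed = datetime.strptime(common["event_date"], DATE_FORMATS[source])
    common["event_date"] = parsed.date().isoformat()
    common["source"] = source  # keep provenance for later quality checks
    return common


rows = [
    to_common_model("hospital_a", {"pid": "17", "dx": "E11", "visit": "03.02.2020"}),
    to_common_model("registry_b", {"patient": "42", "diagnosis_code": "E11", "date": "2020-02-03"}),
]
for row in rows:
    print(row)
```

Once both sources share field names and value formats, a single analysis script can run unchanged against either of them, which is the practical benefit a common data model aims for. Harmonizing the coding schemes themselves (here assumed to be identical) is a separate and usually harder step.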
Cybersecurity is an important measure that must be considered when collecting a vast amount of sensitive data. In the case of RWD, these measures relate to unauthorized access or alteration, data theft, and data encryption. These are underlying factors in cyberattacks, such as the ransomware “WannaCry” cyberattack in 2017. Research has shown that >40% of healthcare organizations have experienced a cyberattack involving the WannaCry cryptoworm. Data breaches are a danger to data integrity, confidentiality, and availability and, as such, are also a threat to the adoption and advancement of RWE. Furthermore, data abuse can be triggered both by external factors, such as criminal cyberattacks, and by internal factors, such as internal employees. Kaspersky, a vendor of security solutions, further explains this vulnerability in the study “How Employees are Making Businesses Vulnerable from within.” The study demonstrates the dangers of irresponsible employees and shows that 52% of businesses admit that employees are their biggest weakness in IT security. Therefore, cybersecurity should become more robust and resilient to further advance the use of RWD. Attacks on HCPs’ databases are constant and pose a threat to RWD integrity and availability, which can harm the reputation of data sources and thus impede the use of RWD.
From the people’s perspective, many new challenges also arise: a lack of public (1) “awareness” of the positive uses of RWD, which hinders its use; the necessity of (2) “trust”; (3) “expertise” in analyzing RWD; and the potential for (4) “data bias.”
An important area requiring collaboration for the adoption of RWD is raising the public awareness of health data and its benefits. This naturally comes side by side with educating the public about privacy and data protection. Moreover, awareness among healthcare professionals is just as important. Research reveals that the lack of awareness among health data controllers can be detrimental to RWD access and use. An example of the French personal health record, the Dossier Medical Personnel, highlights this concern: “by the end of 2013, the target number of health records was not reached due to a lack of political visibility of both patients and professionals.” Nevertheless, several firms currently seem to promote easy access to electronic health data and registries around the globe. Thus, the awareness of RWD is increasing along with the number of research groups specializing in analyzing it.
When applying RWD, it needs to be clear how RWE can be helpful, but also how it can be misleading. This goes hand in hand with the ability to determine the level of trust that can be placed in any analysis of RWD, whether it draws on one’s own data or on data from another source. The FDA issued guidance on the use of “electronic records and electronic signature in clinical investigations,” which discusses methods to ensure that RWD is trustworthy and reliable. This guidance recommends a risk-based method for the decision-making process when validating RWD or implementing audit trails for digital health data. Although the connotation of “trust” is manifold, for the purposes of the RWD Challenges Radar it can be summarized in two dimensions: the RWD-evidence dimension and the RWD-analytics dimension.
To rely on evidence from RWD, one must first be able to understand the data and have the skills to analyze it and generate valuable information that can be used in a decision-making process. However, research shows that the skills needed to exploit the maximum benefits of RWD are not in “abundant supply within the pharmaceutical industry.” These skills must include domain knowledge, healthcare information technology, and methodological and technical expertise. A further study, in which interviews were conducted with several healthcare stakeholders, confirmed that there is a lack of expertise in the RWD analysis domain, giving the example of “innocent misinterpretation,” in which analysts mistook relationships for causality. It is indeed vital to mention that an excellent understanding of accessible databases supports the assumptions of the validity of these databases [3, 15]. Research also referred to the lack of higher education programs on data analysis of RWD and the insufficient research capacity as a significant challenge facing RWD. Nevertheless, several initiatives aim to combine expertise from information technology and healthcare to analyze different databases and facilitate a fruitful collaboration between other HCPs.
Credible evidence generated from RWD must be of high quality and free from any form of bias throughout the entire process of translating data into evidence. At around 13%, bias was one of the most frequently recurring challenges mentioned in the academic literature (see Table 3). Even if data quality is ascertained and privacy concerns are addressed, selection bias is still regarded as the best-known and most challenging risk facing the adoption of RWD [3, 4, 18]. Earlier research revealed proof of reporting bias in several disease areas, such as depression, bipolar disorder, and many others, through the withholding of study data by drug manufacturers and regulatory bodies. As such, bias is an issue that has been known in data analysis for decades. In addition to selection and reporting bias, further forms of bias, for example, information bias, can manifest themselves in observational studies. In its framework, the FDA stated that randomization is the key to preventing bias when allocating interventions by making “study groups balanced for risk factors for the targeted outcome.”
In today’s digital world, the unprecedented access to vast amounts of RWD has led the life science industry to take a lasting interest in exploring the new possibilities of providing clinical evidence for the development and approval of drugs, treatments, and therapies outside the context of “traditional” RCTs. However, our study shows that numerous different challenges must be considered when using RWD. The term RWD must be related to the application field and the associated industry. In particular, the evidence associated with RWD (i.e., RWE) needs to be considered and aligned with the related industry. Furthermore, there are currently no established and globally accepted instruments and standards. Concerning the use of RWD in the health and life science industry, it is also essential to consider the efficacy-effectiveness gap and to develop solutions or controls for it. An important finding is that the regulatory landscape must be carefully assessed before utilizing RWD. As there are constant changes and adjustments, both nationally and internationally, regulatory requirements need to be systematically reviewed and their implementation monitored. While monitoring the potential associated risks is vital for fulfilling the regulatory requirements, other areas that deserve special attention could also be identified. We categorized them with a focus on organizational, technological, and people-oriented challenges; for each of the categories, the outlined units of analysis stand for future research tasks. The prototypical RWD Challenges Radar (Fig. 2) that we have developed is a first attempt to visualize the relevant aspects for decision-makers and other stakeholders, as seen from an organizational, technological, and people’s perspective.
As this study discusses RWD from a risk perspective, it does not elaborate on the domain of risk management theories and practices. Additional insight may be established by linking the application of the RWD Challenges Radar to risk management concepts, such as the GRC Capability Model and the COSO-ERM model. The field of RWD is growing rapidly, and given our literature search strategy and inclusion criteria, various sources can and will emerge that are of relevance to the RWD Challenges Radar and its further development (e.g., the proceedings from the US National Academies Workshop on RWE). As the RWD Challenges Radar is still prototypical, we have not yet extensively tested and validated it in a business context with RWD decision-makers and other stakeholders. As a next step, a design-science-based continuation of the development, also toward the construction of an RWD Challenges Cockpit, would be valuable. In addition, more emphasis could have been put on pharmaceutical companies, which have a very strong interest in learning more about the indication areas and their medicinal products and in developing new diagnostic tests that could help to establish personalized medicines. A recent trend is that pharmaceutical companies are buying data companies. Next, a specific additional challenge is related to the accessibility of the data, especially if the RWD (derived from observational studies, etc.) is owned by a company that may not have an interest in sharing the data with outside researchers. Finally, the analysis of RWD regarding adverse events could become very significant for the field of pharmacovigilance (e.g., analysis of the effect of comorbidities, patients’ habits, and parallel intake of various medications).
In this study, we primarily focused on the “inner layers” of the RWD Risk fields as set out in Figure 1 (i.e., being the “data-related” and “approval-related” risks of using RWD). However, it is our expectation that the ability to deal with RWD risks in relation to the business model in the course of time will eventually be the “game-changer” for the life science industry.
Conflict of Interest Statement
The authors have no conflicts of interest to declare.
The research for this work was financially supported by the foundation “Stiftung FHNW” (www.stiftungfhnw.ch).
F.G., P.M.A., B.S., E.M., L.B., and A.H. made substantial contributions to the conception or design of the work and to the acquisition, analysis, or interpretation of the data for the work. F.G., P.M.A., B.S., E.M., L.B., and A.H. drafted this work and revised it critically for important intellectual content. F.G., P.M.A., B.S., E.M., L.B., and A.H. gave final approval of the version to be published. F.G., P.M.A., B.S., E.M., L.B., and A.H. agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.