Getting Unstuck on the Path to Digital Transformation
Data integrity is an ongoing concern across all R&D organizations, no matter what part of the research lifecycle they’re navigating. The concern extends beyond the potential for delayed timelines or cost overruns to something bigger: establishing a culture of quality; ensuring product efficacy and patient safety; and being a trusted brand, partner, or provider.
Prioritizing Data Integrity in the Lab
Good data practices throughout the R&D process can positively impact data integrity in the lab. Companies must be able to defend the fidelity and confidentiality of all records and data, including raw data, metadata, and transformed data, generated throughout a product’s entire lifecycle, starting with the earliest points in research. To do this, companies must have the right processes and technologies in place to ensure proper:
Data integrity - How is the completeness, consistency, validity, and accuracy of data impacted by the way it is produced, captured, quality checked, transformed, and traced?
Data governance - How does the company manage and track who has access to what data, via what means, how it is used, and to what degree?
Data security - How is data encrypted, transferred, stored, and backed up?
These factors, each challenging in its own right, are all intertwined, adding to the complexity of upholding good data practices in the modern lab.
A Shifting Data Management Landscape
As R&D organizations digitize their data to make analytics at scale possible, best practices for data management must also evolve. Teams must have clear strategies for identifying and mitigating threats to data integrity, including technological, managerial, and external risks. This is no small task. In fact, in pharmaceuticals, the U.S. Food and Drug Administration (FDA) reports increasing data integrity violations in recent years.[1-3] Data integrity is at risk in many cases because the complexity of R&D data, processes, and technologies presents numerous opportunities for good data practices to go awry. The most common types of warnings and violations cited by the FDA include data loss; missing metadata; non-contemporaneous collection or backdating; data deletion and copying; sample elimination or reprocessing; poorly investigated out-of-specification results; data access and security issues; and inadequate or disabled audit trails.[1-3] Missteps like these at any point in the R&D process can undermine the validity of the research as a whole. Data integrity and security breaches could lead to incorrect or irreproducible research results, raise concerns about patient safety and product efficacy, or generate violations that might cause a drug to be rejected at submission or pulled from the market later.[1]
Factors Impacting Good Data Practices
With stakes so high, it’s important to assess what’s inhibiting a culture of data quality. Three key factors complicate good data practices:
Multimodal R&D creates huge volumes of disparate data that need proper handling.
Increased collaboration means data must be shared more widely, and shared with security and privacy in mind.
Artificial intelligence (AI) is changing how data are used to drive innovation.
Let’s explore each factor.
1. Multimodal R&D
Companies hoping to drive innovation are diversifying their R&D efforts and working across different areas of science with novel modalities. As a result, data are pouring in from wide-ranging sources via different means and in different formats. An organization or institution may have several different internal research groups collecting data from thousands of pieces of specialty equipment or instruments; in parallel, it could also be undertaking complex post-acquisition or legacy-data migration activities, all while working with multiple external CROs that have their own distinct systems and processes. All of these different data come from teams that work not only across different modalities and specialty areas of science, but also across different locations globally, each with its own compliance standards and regulations. This incredible volume and diversity of multimodal R&D data create lab integration and data management challenges that risk compromising data integrity and security. Many companies are struggling to keep pace with the vast volume of diverse data and metadata needed to inform decision making throughout the R&D process.
2. Collaboration
Ensuring the success of R&D at scale means improving data flow between research groups so they can build on their collective knowledge. The importance of data sharing in advancing science was recently underscored by the United States National Institutes of Health (NIH), which established new 2023 data management and sharing policies to confirm findings, encourage reuse, and spur innovation.[4] Whether it’s chemists and biologists collaborating on chemically modified biologics, or internal and external partners working on projects across modalities and diseases, teamwork is more important than ever; unfortunately, it’s not always easy. Many R&D groups that have long worked in relative isolation are now required to collaborate and share data, which requires shifts in mindset and culture. It also requires a shift in governance and execution. Bespoke and insulated research teams don’t have the systems and processes in place to share and hand off well-annotated data while at the same time controlling access, tracking changes, and ensuring good data practices are followed by all participants and collaborators. For many companies, it’s hard to facilitate efficient and secure data sharing that doesn’t compromise data integrity. Even the most erudite collaborators interact with instruments, software, workflows, and data types in ways that don’t align with each other, which complicates collaboration. Structured and unstructured data end up scattered in multiple repositories and across different mediums rather than within a secure, centralized, standardized data pool that appropriate collaborators can access and that leverages a well-defined data governance framework. Data sharing challenges have grown so common that they’ve prompted calls to establish better data management standards.
One well-known example is the FAIR guiding principles for scientific data management, which promote the adoption of technology and processes that make all data findable, accessible, interoperable, and reusable by humans and machines alike.[5] Becoming FAIR compliant requires changes in the format, model, and storage of data, as well as in the ways that instruments, software, and systems are integrated. While this can seem overwhelming, the change can be made incrementally; it’s not an all-or-nothing proposition. Whether a company is building a comprehensive FAIR-compliant informatics ecosystem or adopting a data analysis and graphing solution that embraces FAIR data principles, moves toward implementing FAIR-aligned methods can pay dividends in time savings, reproducibility of research, improved knowledge sharing, and AI-readiness.
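As one small illustration of an incremental step toward FAIR-aligned data, the sketch below checks whether a dataset's metadata record carries at least one field for each of the four FAIR goals. The field names (identifier, access_url, format, license), the function name, and the one-field-per-principle mapping are hypothetical simplifications chosen for this example; the FAIR principles themselves do not prescribe these names.

```python
# Hypothetical mapping: one illustrative metadata field per FAIR goal.
# Real FAIR assessments cover far more than four fields.
REQUIRED_FIELDS = {
    "identifier": "findable",
    "access_url": "accessible",
    "format": "interoperable",
    "license": "reusable",
}


def missing_fair_fields(metadata: dict) -> dict:
    """Map each missing or empty field to the FAIR goal it supports."""
    return {
        name: principle
        for name, principle in REQUIRED_FIELDS.items()
        if not str(metadata.get(name) or "").strip()
    }


# Example record with placeholder values (not a real dataset).
record = {
    "identifier": "doi:10.1234/example",
    "access_url": "",
    "format": "text/csv",
}
gaps = missing_fair_fields(record)
for name, principle in gaps.items():
    print(f"missing '{name}' (needed to be {principle})")
```

A check like this could run whenever data are registered, so gaps are flagged at capture time rather than discovered when someone tries to reuse the data.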
3. Artificial Intelligence
As AI arrives in R&D, organizations and institutions will need data infrastructures to capture and manage the proprietary data that will differentiate their research in an AI-everywhere world. For many universities and health companies, becoming AI-ready means first adopting technology and process changes to support exponential growth in data volumes, elimination of data silos, integration of bespoke software and systems, and normalization of data. The ultimate goal is that any data created and captured throughout the R&D process will be trustworthy, well-structured, correlated, shareable, and model-ready. While achieving these aligned data standards is uniquely challenging in scientific R&D because of the complexity of the workflows, data types, software, and systems, it is nonetheless essential. Global compliance regulations are currently being updated to guide the use of AI and ML in medical and general research.[6-9] In March 2024, the EU passed an overarching Artificial Intelligence Act. This landmark law aims to protect human health, safety, and fundamental rights as AI is increasingly relied upon for innovation across a broad spectrum of industries, academia, government, and civil organizations.[9] Now is the time for companies to ensure that their existing systems and processes can meet the regulatory and ethical challenges of using AI in research, including assurance of data integrity, security, traceability, and bias limitation.
Good Data Practices
Alignment of data management and integrity is vital to long-term research success and to preparation for the automated, connected, and collaborative future of research. When evaluating systems to support these imperatives, consider those that:
Support research transparency, credibility, and reproducibility by ensuring complete data capture.
Automate results and metadata collection from instruments and other lab systems wherever possible.
Tie and track results back to their precise samples and fully documented experiments.
Aggregate all relevant R&D data into intelligent, correlated, model-ready data structures.
Give scientists tools to easily manage, search, and visualize their R&D data.
Unite applications that produce and analyze data within one secure data-management platform.
Centralize and store data securely, with end-to-end encryption in transfer and at rest.
Configure checks and balances throughout the R&D process using features such as audit trails, QC/QA and SOP checks, signature requirements, permission and access controls for different data sets and functionality, project codes and aliases, encrypted reports, and secure dashboards.
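To make the audit-trail idea in the list above concrete, here is a minimal sketch of a tamper-evident log in which each entry stores a hash of its own content plus the hash of the entry before it. This is one common technique (hash chaining), not a description of how any particular product implements audit trails; the class name, field names, and sample users are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

FIELDS = ("user", "action", "record_id", "timestamp")


def _digest(payload: dict, prev_hash: str) -> str:
    """Hash an entry's content together with the previous entry's hash."""
    body = json.dumps(payload, sort_keys=True) + prev_hash
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


@dataclass
class AuditTrail:
    """Append-only log; each entry is chained to the one before it."""
    entries: list = field(default_factory=list)

    def record(self, user: str, action: str, record_id: str, timestamp: str) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else ""
        payload = {"user": user, "action": action,
                   "record_id": record_id, "timestamp": timestamp}
        entry = {**payload, "prev_hash": prev_hash,
                 "hash": _digest(payload, prev_hash)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Return True if no stored entry was edited, reordered, or dropped
        from the start or middle. Truncating the tail is NOT detected here;
        that would need the latest hash to be stored somewhere else."""
        prev_hash = ""
        for entry in self.entries:
            payload = {key: entry[key] for key in FIELDS}
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != _digest(payload, prev_hash):
                return False
            prev_hash = entry["hash"]
        return True


# Hypothetical usage: two edits to a sample record, then a backdating attempt.
trail = AuditTrail()
trail.record("analyst_a", "create", "S-001", "2024-02-06T10:00:00Z")
trail.record("analyst_b", "update", "S-001", "2024-02-06T11:00:00Z")
intact_before = trail.verify()

trail.entries[0]["timestamp"] = "2024-02-05T10:00:00Z"  # simulate backdating
intact_after = trail.verify()
```

In this sketch, changing a timestamp after the fact breaks the chain, so the backdating attempt is detectable on the next verification pass. A production system would also need access controls and secure storage for the log itself.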
References
1. Chen, S. Culture of Quality: Data Integrity and CGMP Compliance. U.S. Food and Drug Administration, SBIA Generic Drug Forum. April 26, 2022. (Accessed 02/06/2024)
2. Neumeyer, M. Data Integrity: 2020 FDA Data Integrity Observations in Review. American Pharmaceutical Review. June 23, 2020.
3. Vazquez, M.; Rayser, J. Regulatory warning letters in pharma: What can we learn post-COVID? Cleanroom Technology. July 27, 2022.
4. 2023 NIH Data Management and Sharing Policy. National Institutes of Health. (Accessed 02/06/2024)
5. Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data, 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
6. Using Artificial Intelligence & Machine Learning in the Development of Drug & Biological Products. Discussion Paper and Request for Feedback. U.S. Food & Drug Administration. 2023. (Accessed 02/06/2024)
7. Artificial Intelligence in Drug Manufacturing. Discussion Paper. FDA Center for Drug Evaluation and Research. 2023. (Accessed 02/06/2024)