Earlier this year I attended the Data + AI Summit, hosted by data and AI company Databricks. During his keynote, Databricks CEO Ali Ghodsi revealed that 85 percent of AI use cases still aren’t in production. That isn't terribly surprising. Many companies want to leverage the power of AI, but they lack the tools and expertise to turn those use cases into finished, production-ready products within their industries.
Companies need control over their data at its source, and over the AI models that run on it, to ensure security, privacy compliance, and accurate decision-making. Managing data effectively can give an organization a competitive edge, reduce costs, and provide greater flexibility when integrating applications.
Advancing Scientific Intelligence in Life Sciences
That's why we have just announced a strategic partnership with Databricks. Dotmatics is one of the very first "Built on Databricks" partners, and Dotmatics Luma is built on the Databricks platform, a modern, market-leading AI cloud that is well suited to scientific data.
Michael Sanky, VP of Healthcare & Life Sciences at Databricks, said, “Dotmatics Luma exemplifies the transformative potential of building on Databricks, and it’s incredible to watch its adoption among the biopharmaceutical community who are excited to harness the full power of their data.”
Databricks recognizes the value that other companies can bring by building tools on top of its ecosystem. It is committed to bringing the best technology solutions to market in every industry, which includes working with partners like Dotmatics that are building the next generation of data-driven applications for life sciences.
What does that mean for our customers? It means they can use solutions built on data intelligence.
Blending Scientific Intelligence + Data Intelligence
But Dotmatics goes one step further: we know that in drug discovery, scientific intelligence is just as essential as data intelligence. For organizations to harness the possibilities of generative AI, their tools must be intimately familiar with scientific workflows and domain expertise. To bring AI use cases into production more quickly, you need to be able to feed a giant funnel of scientific data into the cloud, make that data AI-ready, and leverage the new AI capabilities offered by advanced data intelligence tools.
Today Dotmatics does that through:
Luma Platform organizes the data and provides governed, configured access and data flow between different customer ecosystems.
Lab Connect serves as a massive data funnel for scientific data into the Luma and Databricks ecosystem, allowing for structured access to hundreds of different types of data from scientific instruments and research tools, including many of our own best-in-class tools.
The result is well-organized, secure data, ready for Delta Sharing into the customer’s own ecosystem and for AI, BI, or notebook use cases. Each of these is a massively challenging task. But today Dotmatics' relationship with Databricks is creating some really exciting possibilities for the future of drug discovery, a few of which we’ll preview here:
Use MLflow to Store Training Runs and Apply Security Permissions
MLflow helps data scientists and engineers manage the process of developing machine learning (ML) models. Think of it as a notebook, toolbox, and showroom combined, all designed to streamline the messy, iterative process of ML. When you're testing different approaches to training your ML model (for example, changing parameters, algorithms, or data), it keeps track of what you did and the results. This helps you compare experiments and pick the best one. Once you've built a model you like, MLflow saves it in a standard format so it can be reused or shared with others. That makes it easier to put your trained model into action, whether in a web app, a batch processing job, or another system. MLflow also provides a centralized place for teams to document and share their work, making it easier for others to understand and build on your progress.
We think our customers will particularly like the ability to apply security permissions that control who may access each model. It's also easy to use, requiring only one line of code: any model stored in Unity Catalog can be served right away. One use case we've been using internally is chemical structure-activity prediction. With this method of storing and serving models, it's very easy to run predictions for every new structure added to the system.
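The tracking loop described above can be sketched in a few lines. This is a minimal, standard-library-only illustration of the idea, not MLflow itself: the `ExperimentTracker` class, the `max_depth` parameter, the `r2` metric, and the scores are all made-up placeholders. In the real library, the corresponding calls are `mlflow.start_run()`, `mlflow.log_param()`, and `mlflow.log_metric()`, with `mlflow.search_runs()` for comparing runs.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Run:
    """One training attempt: the settings that were tried and what they scored."""
    run_id: int
    params: Dict[str, object] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)

class ExperimentTracker:
    """Toy in-memory tracker illustrating what an experiment tracker records."""

    def __init__(self) -> None:
        self.runs: List[Run] = []

    def start_run(self) -> Run:
        run = Run(run_id=len(self.runs))
        self.runs.append(run)
        return run

    def best_run(self, metric: str, higher_is_better: bool = True) -> Run:
        # Compare only runs that actually logged the metric.
        candidates = [r for r in self.runs if metric in r.metrics]
        if not candidates:
            raise ValueError(f"no run logged metric {metric!r}")
        pick = max if higher_is_better else min
        return pick(candidates, key=lambda r: r.metrics[metric])

tracker = ExperimentTracker()
# Placeholder scores for three hypothetical settings of one parameter.
for depth, score in [(3, 0.71), (5, 0.78), (8, 0.74)]:
    run = tracker.start_run()
    run.params["max_depth"] = depth
    run.metrics["r2"] = score

best = tracker.best_run("r2")
print(best.run_id, best.params)  # 1 {'max_depth': 5}
```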
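The permission-then-serve pattern can be illustrated with a toy, in-memory registry. It is a sketch under stated assumptions, not Unity Catalog's API: `GovernedModelRegistry`, the three-level name `research.sar.activity_model`, the principal names, and `toy_activity_model` (which simply counts aromatic carbons in a SMILES string and is not a real structure-activity model) are all hypothetical. With MLflow and Unity Catalog, registration is done by setting the registry URI to `databricks-uc` and passing a three-level `registered_model_name` to a model flavor's `log_model` call, with access governed by Unity Catalog grants.

```python
from typing import Callable, Dict, List, Set

class GovernedModelRegistry:
    """Hypothetical stand-in for a governed registry: models are stored under a
    three-level name, and only principals granted access may load them."""

    def __init__(self) -> None:
        self._models: Dict[str, Callable[[str], float]] = {}
        self._grants: Dict[str, Set[str]] = {}

    def register(self, name: str, model: Callable[[str], float]) -> None:
        self._models[name] = model
        self._grants.setdefault(name, set())

    def grant(self, name: str, principal: str) -> None:
        self._grants[name].add(principal)

    def load(self, name: str, principal: str) -> Callable[[str], float]:
        # Enforce permissions before handing out the model.
        if principal not in self._grants.get(name, set()):
            raise PermissionError(f"{principal} may not use {name}")
        return self._models[name]

def toy_activity_model(smiles: str) -> float:
    """Placeholder score: number of aromatic carbons ('c') in a SMILES string."""
    return float(smiles.count("c"))

MODEL_NAME = "research.sar.activity_model"  # hypothetical catalog.schema.model
registry = GovernedModelRegistry()
registry.register(MODEL_NAME, toy_activity_model)
registry.grant(MODEL_NAME, "chemist@example.com")

def on_new_structures(smiles_batch: List[str], principal: str) -> Dict[str, float]:
    """Score every newly added structure with the governed model."""
    model = registry.load(MODEL_NAME, principal)
    return {s: model(s) for s in smiles_batch}

scores = on_new_structures(["c1ccccc1O", "CCO"], "chemist@example.com")
print(scores)  # {'c1ccccc1O': 6.0, 'CCO': 0.0}
```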
Write SQL Queries to Expedite the R&D Process
Most people know that AI companies offer APIs for accessing their large language models (LLMs), but Databricks goes a step further by integrating those models directly into SQL queries. That can open up new, unexplored possibilities along the R&D pathway to drug discovery.
Imagine that you have 10,000 description fields and you want to know which ones to look into more closely. It could take a few hours to write a script that pulls those fields from the database, submits each one to an API endpoint, and collects the results. Instead, our customers can use Luma’s dataflows and the SQL-based transformations within them. You can write a SQL query that asks your AI model to make a prediction as part of the query's answer. Instead of just pulling data, the query works like: "Hey AI, based on this data, what do you predict?" The result combines your data with the AI's prediction, right in the query.
It’s the difference between typing 10,000 different questions into ChatGPT one at a time and asking all 10,000 at once and getting the responses back quickly. Dotmatics Luma makes this easy, and this functionality is available in our platform today.
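The idea of a model call inside a SQL query can be tried locally with Python's built-in `sqlite3` module, which lets you register a Python function and call it from SQL. This is a local stand-in, not Luma or Databricks: on Databricks, the comparable built-in is the `ai_query` SQL function, which sends each row's input to a model serving endpoint. Here the hypothetical `flag_for_review` function uses a simple keyword rule in place of an LLM, and the table and descriptions are invented.

```python
import sqlite3

def flag_for_review(description: str) -> str:
    """Hypothetical stand-in for an LLM call: a keyword rule that flags
    descriptions mentioning toxicity. A real setup would call a model here."""
    return "review" if "toxic" in description.lower() else "skip"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compounds (id INTEGER PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO compounds (description) VALUES (?)",
    [
        ("Potent binder with a clean selectivity profile",),
        ("Hepatotoxic at high dose in rat study",),
        ("Poor solubility, low priority",),
    ],
)

# Make the Python function callable from SQL as flag_for_review(text).
conn.create_function("flag_for_review", 1, flag_for_review)

# The prediction is computed per row, inside the query itself.
rows = conn.execute(
    "SELECT id, flag_for_review(description) AS decision FROM compounds ORDER BY id"
).fetchall()
print(rows)  # [(1, 'skip'), (2, 'review'), (3, 'skip')]
```

Because the model is just another SQL function, the same query can filter, join, or aggregate on its output, for example with `WHERE flag_for_review(description) = 'review'`.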
Enhancing AI with Scientific Context
Similarly, when you ask an LLM like ChatGPT a question, you can define functions that help it answer more accurately and with deeper context. Dotmatics is designing the equivalent for life sciences R&D: scientific functions that the AI can execute while answering a question. That includes giving the AI governed, controlled access to your data, should you choose to do so. We’re also building the ability to give the AI scientific capabilities, such as statistical calculations or gating of flow cytometry data, for even greater understanding.
That could be a real game changer. Giving the AI the tools to answer deep, domain-specific questions about our customers' data is what will supercharge their decision-making workflows.
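Function calling (also called tool use) works by letting the model reply with a structured request to run a named function; the application executes it and returns the result to the model. The sketch below shows only the application side, under assumptions: the JSON shape `{"name": ..., "arguments": ...}` is loosely modeled on common function-calling formats, the model's output is simulated, and `summarize` and `rectangle_gate` are hypothetical tools. Real flow cytometry gating is usually more sophisticated than a single rectangular gate.

```python
import json
import statistics
from typing import Any, Callable, Dict, List

def summarize(values: List[float]) -> Dict[str, float]:
    """Statistical tool: mean and sample standard deviation of a list of values."""
    return {"mean": statistics.mean(values), "stdev": statistics.stdev(values)}

def rectangle_gate(events: List[Dict[str, float]], x: str, y: str,
                   x_min: float, x_max: float,
                   y_min: float, y_max: float) -> Dict[str, float]:
    """Flow cytometry tool: count events inside a rectangular gate drawn on
    two channels (for example forward and side scatter)."""
    inside = [e for e in events
              if x_min <= e[x] <= x_max and y_min <= e[y] <= y_max]
    return {"in_gate": len(inside), "fraction": len(inside) / len(events)}

# Tools the model is allowed to call, keyed by name.
TOOLS: Dict[str, Callable[..., Any]] = {
    "summarize": summarize,
    "rectangle_gate": rectangle_gate,
}

def dispatch(tool_call_json: str) -> str:
    """Run a tool call emitted by the model and return the result as JSON,
    ready to be sent back to the model as context for its answer."""
    call = json.loads(tool_call_json)
    if call["name"] not in TOOLS:
        raise KeyError(f"unknown tool: {call['name']}")
    return json.dumps(TOOLS[call["name"]](**call["arguments"]))

# Simulated model output (no LLM is involved in this sketch).
events = [
    {"FSC": 50_000, "SSC": 10_000},
    {"FSC": 120_000, "SSC": 30_000},
    {"FSC": 200_000, "SSC": 150_000},
]
stats_call = json.dumps({"name": "summarize", "arguments": {"values": [2.0, 4.0, 6.0]}})
gate_call = json.dumps({"name": "rectangle_gate", "arguments": {
    "events": events, "x": "FSC", "y": "SSC",
    "x_min": 100_000, "x_max": 250_000, "y_min": 0, "y_max": 100_000}})
print(dispatch(stats_call))  # {"mean": 4.0, "stdev": 2.0}
print(dispatch(gate_call))
```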
Build Confidence with Operational Monitoring for AI
Operational monitoring is essential for ensuring that AI systems run reliably and effectively in real-world environments. It catches issues early, such as a broken data pipeline or a model whose accuracy starts to drift. In regulated industries, it’s key to staying compliant and audit-ready. It also helps build trust, because people rely on AI that is consistent and reliable. Simply put, it’s how AI stays useful, ethical, and effective.
Databricks recently introduced operational monitoring through an interface that lets you see what the AI was “thinking” at each step as it answers a question. Dotmatics Luma will be able to use this new Databricks functionality to uncover how the AI was working. Though it is not a customer-facing feature at this time, that level of detail is important for understanding what the AI was trying to do at different points in the process. It provides deep observability into whether the AI is getting answers right or wrong, which is exactly what you need to build a good product.
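One common way to catch drift is to compare the distribution of a model's inputs or predictions in production against a baseline. The sketch below uses the population stability index (PSI), one of several standard drift metrics and not necessarily what Databricks uses: PSI sums `(a - e) * ln(a / e)` over bins, where `e` and `a` are the baseline and live fractions in each bin. The bin edges, sample values, and 0.25 alert threshold (a widely used rule of thumb) are assumptions.

```python
import bisect
import math
from typing import List, Sequence

def bin_fractions(values: Sequence[float], edges: Sequence[float],
                  eps: float = 1e-6) -> List[float]:
    """Fraction of values in each bin. `edges` are sorted interior cut points,
    so there are len(edges) + 1 bins covering the whole number line.
    Empty bins get a tiny `eps` so the logarithm below stays finite."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return [max(c / len(values), eps) for c in counts]

def population_stability_index(baseline: Sequence[float], live: Sequence[float],
                               edges: Sequence[float]) -> float:
    """PSI = sum over bins of (live - baseline) * ln(live / baseline)."""
    e = bin_fractions(baseline, edges)
    a = bin_fractions(live, edges)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

PSI_ALERT = 0.25  # assumed alert threshold, a common rule of thumb

baseline = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]  # invented scores at launch
live = [0.8, 0.85, 0.9, 0.95, 0.7, 0.9, 0.85, 0.8]   # invented scores this week
edges = [0.25, 0.5, 0.75]

psi = population_stability_index(baseline, live, edges)
print(f"PSI = {psi:.2f}", "-> investigate drift" if psi > PSI_ALERT else "-> stable")
```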
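Seeing what the AI was “thinking” at each step amounts to recording a trace: one span per step, with its inputs, outputs, and timing. The standard-library sketch below records flat spans with a decorator; `traced`, `retrieve`, and `answer` are hypothetical names, and the pipeline steps return canned outputs instead of calling real retrieval or an LLM. MLflow offers a production version of this idea, MLflow Tracing, which includes an `mlflow.trace` decorator.

```python
import functools
import time
from typing import Any, Callable, Dict, List

TRACE: List[Dict[str, Any]] = []  # one dict ("span") per completed step

def traced(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Record each call's inputs, output, and duration as a span."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACE.append({
            "step": fn.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": result,
            "seconds": time.perf_counter() - start,
        })
        return result
    return wrapper

@traced
def retrieve(question: str) -> List[str]:
    # Hypothetical retrieval step returning canned context.
    return ["assay IC50 table", "compound registry"]

@traced
def answer(question: str) -> str:
    context = retrieve(question)
    # Hypothetical generation step; a real system would call an LLM here.
    return f"Answer drawing on {len(context)} sources"

answer("Which compounds are most potent?")
# Inner steps finish first, so they appear first in the trace.
for span in TRACE:
    print(span["step"], "->", span["output"])
```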
This is all just the beginning. In the coming year, Dotmatics and Databricks will continue to unlock new opportunities to enable scientific research organizations to stay ahead of the curve in data science, bench science, and advanced AI-driven insights. Together, we’re bridging data and scientific intelligence to drive innovation and accelerate breakthroughs in drug discovery and beyond.
If you’re interested in learning more about how this partnership can transform data-driven discovery, watch the on-demand webinar, “Building Breakthroughs: Harnessing Data and AI for Innovation.”