This post was written with Avinash Erupaka from Bayer (IT PH, Drug Innovation platform).
How can pharmaceutical companies unlock the full potential of their data to drive breakthrough innovations? Bayer, a global leader in health and nutrition, is dedicated to tackling the pressing challenges of our time, including a growing and aging population and the strain on our planet’s ecosystems. Its mission of “Health for All, Hunger for None” drives its commitment to addressing societal and environmental needs through groundbreaking research. Bayer is focused on developing innovative solutions that make a tangible difference in the world and create value for its customers, employees, and stakeholders. Headquartered in Leverkusen, Germany, Bayer operates across 80 countries and is pioneering a data science ecosystem that transforms how research teams access, analyze, and derive insights from complex scientific data.
By harnessing the power of data, analytics, artificial intelligence and machine learning (AI/ML), and generative AI, Bayer is creating a cloud-based Pharma R&D Data Science Ecosystem (DSE) on AWS that powers cutting-edge technologies and concepts with robust data management. In doing so, R&D teams can fully realize the potential of unified data and analytics.
In this post, we discuss how Bayer used the next generation of SageMaker to build a solution that unified data ingestion, storage, analytics, and AI/ML workflows. Built on data mesh principles, Bayer’s DSE integrates advanced data ingestion, storage, analytics, and ML workflows to enable agile experimentation and scalable insight generation. It democratizes access to analytics, fosters cross-Region collaboration, and provides flexible integration of structured, semi-structured, and unstructured data.
Challenges in pharmaceutical research
In pharmaceutical research, data has become the most critical asset for driving innovation. However, managing this data effectively presents unprecedented challenges, and traditional data management approaches are becoming increasingly inadequate for complex, global research initiatives. Many pharma R&D teams face a complex ecosystem of data and analytics related obstacles that hinder scientific discovery and operational efficiency:
- Siloed datasets – Research datasets are siloed across domains, limiting reuse and slowing discovery.
- Multiple data modalities – Clinical trial data (structured), real-world evidence (semi-structured), and genomic files (unstructured) existed in isolation, complicating integration and analysis.
- Inflexible ingestion capabilities – Systems could not flexibly support the full mix of batch processing (such as trial data), real-time data streams (for example, from lab equipment), and event-driven ingestion (such as regulatory updates).
- Rising R&D costs – Disparate technologies and disconnected systems create operational inefficiencies and increased licensing and maintenance costs.
- Inconsistent landscape to fully use ML – The absence of a unified data architecture and standardized, domain-agnostic MLOps workflows means that data and analytics innovation is often ad hoc and non-repeatable. Teams lack a streamlined way to scale successful patterns, resulting in redundant efforts, longer development cycles, and missed opportunities for cross-domain synergy.
- Disconnected architectures – Software solutions are not integrated into the broader unified ecosystem, resulting in silos, redundancies, and inefficiencies.
Recognizing these systemic challenges, Bayer embarked on a transformative journey. DSE is not just a technological solution, but a strategic reimagining of how research data and analytics can be used across a global organization. By bringing together cutting-edge technologies, standardized frameworks, a collaborative data mesh, and lakehouse architecture, Bayer set out to help researchers and engineers accelerate pharmaceutical innovation.
Finding a solution with the next generation of SageMaker
Bayer envisioned a unified data science ecosystem that would provide the following:
- A unified collaborative development experience for all data scientists regardless of their location or specialization
- Seamless access to both structured and unstructured data through a consistent interface
- Built-in governance and compliance controls appropriate for pharmaceutical research
- Scalable compute resources to handle the most complex analytical workloads
Bayer conducted a comprehensive evaluation of various solutions before selecting the next generation of SageMaker as the cornerstone of their new data science ecosystem. Although other options had merits, Bayer prioritized the following capabilities:
- Access to multimodal data – Essential for genomics, proteomics, and advanced biomarker research
- Centralized asset marketplace – Central hub to discover and reuse data, features, models, and other business assets
- Integrated tooling ecosystem – Streamlined access to key tools like Git, ETL, MLflow, and generative AI application builders in one place
- Multi-domain and cross-Region support – Critical for global research collaboration
- Price-performance – Necessary for sustainable, long-term scaling
The capabilities of Amazon SageMaker Unified Studio and Amazon SageMaker Catalog aligned with Bayer’s vision of decentralized mesh execution combined with centralized discovery and governance. They enabled teams to work with their preferred tools, such as Jupyter Notebooks or workflow builders, while maintaining discoverability and reusability of assets.
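The pattern of decentralized ownership with centralized discovery can be pictured with a small sketch. The Python below is a toy, in-memory model only: it does not call SageMaker Catalog or any AWS API, and the `DataProduct` and `Catalog` classes, the dataset names, and the team names are all hypothetical, chosen for illustration. Each domain team publishes its own assets, and any researcher can search across every domain from one place.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """A domain-owned asset published for discovery (hypothetical model)."""
    name: str
    domain: str    # e.g. "omics", "clinical-trials", "regulatory"
    owner: str     # the domain team accountable for the asset
    modality: str  # "structured", "semi-structured", or "unstructured"
    tags: frozenset = field(default_factory=frozenset)


class Catalog:
    """Central index: each domain publishes, everyone can search."""

    def __init__(self):
        self._products = {}

    def publish(self, product):
        # Names only need to be unique within their owning domain.
        key = (product.domain, product.name)
        if key in self._products:
            raise ValueError(f"{product.domain}/{product.name} is already published")
        self._products[key] = product

    def search(self, *, tag=None, modality=None):
        """Return matching products from all domains, sorted by domain then name."""
        hits = [
            p for p in self._products.values()
            if (tag is None or tag in p.tags)
            and (modality is None or p.modality == modality)
        ]
        return sorted(hits, key=lambda p: (p.domain, p.name))


catalog = Catalog()
catalog.publish(DataProduct("rna_seq_counts", "omics", "omics-team",
                            "unstructured", frozenset({"biomarker"})))
catalog.publish(DataProduct("trial_outcomes", "clinical-trials", "clinical-team",
                            "structured", frozenset({"biomarker", "efficacy"})))
catalog.publish(DataProduct("label_updates", "regulatory", "regulatory-team",
                            "semi-structured", frozenset({"compliance"})))

# One query spans assets owned by two different domain teams.
biomarker_assets = catalog.search(tag="biomarker")
print([f"{p.domain}/{p.name}" for p in biomarker_assets])
# ['clinical-trials/trial_outcomes', 'omics/rna_seq_counts']
```

The point of the sketch is the split of responsibilities: ownership stays with the publishing domain (the `owner` field), while discovery goes through a single shared index.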
Solution overview
This section describes the key features and architecture of Bayer’s DSE built on SageMaker. The DSE solution addresses the identified challenges through a multi-layered architecture:
- Breaking down data silos – Multimodal data ingestion capabilities of the solution break down data silos by enabling unified storage and processing of structured, semi-structured, and unstructured data through batch, streaming, and event-driven pipelines.
- Handling diverse data modalities – A hybrid lakehouse architecture, built on Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and Amazon Redshift, provides a flexible foundation for handling diverse data modalities and maturities while providing data consistency and accessibility.
- Reducing costs through standardization – To address rising R&D costs and operational inefficiencies, pre-wired analytical workbenches offer standardized templates and integrated development environments (IDEs) that reduce redundancy and accelerate workflow development.
- Unlocking AI/ML with Amazon SageMaker AI and Amazon Bedrock – Advanced AI/ML capabilities, powered by Amazon SageMaker AI and Amazon Bedrock, create a standardized, domain-agnostic MLOps environment that enables repeatable innovation and cross-domain synergy.
- Managing tools ecosystem with end-to-end observability – Robust governance and observability features provide compliance and system reliability while integrating previously disconnected tools into a unified, well-monitored ecosystem that breaks down architectural silos and promotes efficient resource utilization.
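As a rough illustration of the first two points, the sketch below shows one way a single entry point could give data a predictable landing location no matter how it arrived. It is not Bayer's implementation: the `IngestRequest` type, the `STORAGE_FOR` mapping, the dataset names, and the path layout are all assumptions made for this example, and nothing here touches Amazon S3, Iceberg, or Redshift.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    BATCH = "batch"          # e.g. periodic trial data loads
    STREAMING = "streaming"  # e.g. readings from lab equipment
    EVENT = "event"          # e.g. a regulatory update notification


class Modality(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


# Assumption for illustration: table-like data lands in table storage and
# raw files stay in object storage. The post does not describe the real mapping.
STORAGE_FOR = {
    Modality.STRUCTURED: "tables",
    Modality.SEMI_STRUCTURED: "tables",
    Modality.UNSTRUCTURED: "objects",
}


@dataclass(frozen=True)
class IngestRequest:
    domain: str
    dataset: str
    mode: Mode
    modality: Modality


def landing_location(req: IngestRequest) -> str:
    """Derive the location from domain, dataset, and modality only.

    The ingestion mode is deliberately ignored, so batch, streaming, and
    event-driven pipelines all write into the same shared layout.
    """
    return f"{STORAGE_FOR[req.modality]}/{req.domain}/{req.dataset}"


requests = [
    IngestRequest("clinical-trials", "trial_outcomes", Mode.BATCH, Modality.STRUCTURED),
    IngestRequest("lab", "instrument_readings", Mode.STREAMING, Modality.SEMI_STRUCTURED),
    IngestRequest("omics", "genome_files", Mode.BATCH, Modality.UNSTRUCTURED),
    IngestRequest("regulatory", "label_updates", Mode.EVENT, Modality.SEMI_STRUCTURED),
]

locations = {r.dataset: landing_location(r) for r in requests}
for dataset, location in locations.items():
    print(f"{dataset} -> {location}")
```

Keeping the ingestion mode out of the location is the design choice worth noting: consumers look for data by domain and dataset, not by the pipeline that produced it.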
The DSE architecture implements data mesh principles where data domains (omics, regulatory, clinical trials) are treated as products, with ownership and management responsibilities assigned to domain experts. These domains are decentralized for execution but remain discoverable and reusable through SageMaker Catalog. At the core of the architecture is a hybrid mesh lakehouse that combines Amazon S3 and Iceberg, providing the flexibility to handle both structured and unstructured data efficiently. SageMaker Unified Studio provides an analytical layer where researchers can access the full suite of tools needed for their work. The following diagram illustrates this architecture.
Impact
The first phase of Bayer’s DSE confirmed the next generation of SageMaker as a robust foundation for the ecosystem, one designed to balance decentralized innovation with centralized governance through a scalable data mesh architecture. With this solution, Bayer can catalog and manage multimodal data assets (including structured and unstructured data, ML features, models, and custom scientific assets) with context-rich metadata across diverse Pharma R&D domains. Bayer is now positioned to onboard over 300 TB of biomarker data and integrate siloed omics, clinical, and chemistry data repositories into a cohesive environment. With integrated tools like JupyterLab Spaces, MLflow, and SageMaker AI Studio, the DSE platform is laying the groundwork for a comprehensive, GxP-aware ML workbench, paving the way to operationalize over 25 high-value ML use cases and support more than 100 data scientists across the organization.
“The Data Science Ecosystem is essential for developing our medicines,” says Daniel Gusenleitner, Project Lead for the R&D Data Science Ecosystem. “It enhances our business workflows with advanced analytics, helping us accelerate the search for new therapies. By integrating data from the entire research and development process, we improve the chances of technical success and ensure our efforts are efficient. Unlocking our data also facilitates target discovery, leading to groundbreaking advancements in patient care.”
Next steps
Bayer has successfully started their Data Science Ecosystem on the next generation of Amazon SageMaker and is working to onboard the first use case of advanced biomarker research. Building on this strong foundation, Bayer is also accelerating the evolution of the DSE solution with the following key enhancements:
- Federated catalogs and cross-domain integration – Enabling search and reuse of data assets across therapeutic areas and business units
- Advanced ontology and semantic layer – Enriching metadata with domain knowledge to support AI-based search, discovery, and reasoning
- Adoption of generative and agentic AI workflows – Driving novel drug discovery and accelerating hypothesis generation
Conclusion
By using the next generation of Amazon SageMaker to build their cloud-based Data Science Ecosystem, Bayer is creating a foundation for faster, more efficient research and discovery. Amazon SageMaker is unifying diverse data types, enabling global collaboration, and standardizing ML workflows to help position Bayer at the forefront of data-driven innovation.
To learn more and get started with the next generation of SageMaker, refer to Amazon SageMaker or the AWS console.

