
SAP data ingestion and replication with AWS Glue zero-ETL


Organizations increasingly need to ingest and gain faster access to insights from SAP systems without maintaining complex data pipelines. AWS Glue zero-ETL with SAP now supports data ingestion and replication from SAP data sources such as Operational Data Provisioning (ODP) managed SAP Business Warehouse (BW) extractors, Advanced Business Application Programming (ABAP) Core Data Services (CDS) views, and other non-ODP data sources. Zero-ETL data replication and schema synchronization writes extracted data to AWS services like Amazon Redshift, Amazon SageMaker lakehouse, and Amazon S3 Tables, eliminating the need for manual pipeline development. This creates a foundation for AI-driven insights when used with AWS services such as Amazon Q and Amazon Quick Suite, where you can use natural language queries to analyze SAP data, create AI agents for automation, and generate contextual insights across your enterprise data landscape.

In this post, we show how to create and monitor a zero-ETL integration with various ODP and non-ODP SAP sources.

Solution overview

The key component of SAP integration is the AWS Glue SAP OData connector, which is designed to work with SAP data structures and protocols. The connector provides connectivity to ABAP-based SAP systems and adheres to SAP security and governance frameworks. Key features of the AWS SAP connector include:

  • Uses the OData protocol for data extraction from various SAP NetWeaver systems
  • Managed replication for complex SAP data models such as BW extractors (such as 2LIS_02_ITM) and CDS views (such as C_PURCHASEORDERITEMDEX)
  • Handles both ODP and non-ODP entities using SAP change data capture (CDC) technology

The SAP connector works with both AWS Glue Studio and AWS managed replication with zero-ETL. Self-managed replication in AWS Glue Studio provides full control over data processing units, replication frequencies, price-performance tuning, page size, data filters, destinations, file formats, data transformation, and writing your own code with a chosen runtime. AWS managed data replication in zero-ETL removes the burden of custom configurations and provides an AWS managed alternative, allowing replication frequencies between 15 minutes and 6 days. The following solution architecture demonstrates the approaches of ingesting ODP and non-ODP SAP data using zero-ETL from various SAP sources and writing to Amazon Redshift, SageMaker lakehouse, and S3 Tables.

Change data capture for ODP sources

SAP ODP is a data extraction framework that enables incremental data replication from SAP source systems to target systems. The ODP framework allows applications (subscribers) to request data from supported objects, such as BW extractors, CDS views, and BW objects, in an incremental manner.

AWS Glue zero-ETL data ingestion begins with executing a full initial load of entity data to establish the baseline dataset in the target system. After the initial full load is complete, SAP provisions a delta queue known as the Operational Delta Queue (ODQ), which captures data changes, including deletions. The delta token is sent to the subscriber during the initial load and persisted within the zero-ETL internal state management system.

Incremental processing retrieves the last saved delta token from the state store, then sends a delta change request to SAP with this token over the OData protocol. The system processes the returned INSERT/UPDATE/DELETE operations through the SAP ODQ mechanism and receives a new delta token from SAP, even in scenarios where no records were changed. This new token is persisted in the state management system after successful ingestion. In error scenarios, the system preserves the current delta token state, enabling retries without data loss.
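
The token lifecycle described above can be sketched in a few lines. This is an illustrative model, not the actual zero-ETL internals; the class, function, and token names are invented for the example:

```python
# Illustrative sketch of the ODP delta-token lifecycle; names are hypothetical.
class DeltaTokenStateStore:
    """Minimal in-memory stand-in for the zero-ETL state management system."""
    def __init__(self):
        self._tokens = {}

    def get_token(self, entity):
        return self._tokens.get(entity)

    def save_token(self, entity, token):
        self._tokens[entity] = token


def apply_changes(changes):
    """Stand-in for writing to Redshift, SageMaker lakehouse, or S3 Tables."""
    for op, row in changes:
        pass  # the real system applies INSERT/UPDATE/DELETE on the target


def run_incremental_cycle(store, entity, fetch_delta):
    """One cycle: read the last token, request changes from SAP's ODQ,
    apply them, and persist the new token only after success."""
    last_token = store.get_token(entity)
    try:
        # fetch_delta stands in for the OData delta request; SAP returns the
        # changed rows plus a new token, even when no rows changed.
        changes, new_token = fetch_delta(entity, last_token)
        apply_changes(changes)
        store.save_token(entity, new_token)  # checkpoint only on success
        return len(changes)
    except Exception:
        # On error the old token is preserved, so the next run retries the
        # same delta window without data loss.
        return None
```

The key property mirrored here is that the checkpoint advances only after a successful write, which is what makes retries safe.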

The following screenshot illustrates a successful initial load followed by four incremental data ingestions on the SAP system.

Change data capture for non-ODP sources

Non-ODP structures are OData services that aren't ODP enabled. These are APIs, functions, views, or CDS views that are exposed directly without the ODP framework. Data can be extracted using this mechanism; however, incremental data extraction depends on the nature of the object. If the object, for example, contains a "last modified date" field, it is used to track changes and support incremental data extraction.

AWS Glue zero-ETL provides out-of-the-box incremental data extraction for non-ODP OData services, provided the entity includes a field to track changes (last modified date or time). For such SAP services, zero-ETL provides two approaches for data ingestion: timestamp-based incremental processing and full load.

Timestamp-based incremental processing

Timestamp-based incremental processing uses customer-configured timestamp fields in zero-ETL to optimize the data extraction process. The zero-ETL system establishes a starting timestamp that serves as the foundation for subsequent incremental processing operations. This timestamp, known as the watermark, is key to maintaining data consistency. The query construction mechanism builds OData filters based on timestamp comparisons. These queries extract records that were created or modified since the last successful processing run. The system's watermark management tracks the highest timestamp value from each processing cycle and uses it as the starting point for subsequent runs. The zero-ETL system performs an upsert on the target using the configured primary keys. This approach ensures correct handling of updates while maintaining data integrity. After each successful target system update, the watermark timestamp is advanced, creating a reliable checkpoint for future processing cycles.
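
A minimal sketch of this watermark mechanism follows. The OData `$filter` syntax shown is the standard query option, but the timestamp field name and function names are illustrative assumptions:

```python
# Hypothetical sketch of timestamp-based incremental processing.
def build_incremental_filter(timestamp_field, watermark_iso):
    """Build an OData $filter selecting rows changed since the watermark."""
    return f"$filter={timestamp_field} gt datetime'{watermark_iso}'"


def process_cycle(rows, timestamp_field, watermark_iso):
    """Upsert rows newer than the watermark and advance it to the highest
    timestamp seen, creating the checkpoint for the next run."""
    changed = [r for r in rows if r[timestamp_field] > watermark_iso]
    # If nothing changed, the watermark stays put rather than regressing.
    new_watermark = max((r[timestamp_field] for r in changed),
                        default=watermark_iso)
    return changed, new_watermark
```

The essential behavior is that only rows past the watermark are transferred, and the watermark only ever moves forward.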

However, the timestamp-based approach has a limitation: it can't track physical deletions, because SAP systems don't maintain deletion timestamps. In scenarios where timestamp fields are either unavailable or not configured, the system transitions to a full load with upsert processing.

Full load

The full load approach serves as both a standalone approach and a fallback mechanism when timestamp-based processing is not feasible. This method involves extracting the entire entity dataset during each processing cycle, making it suitable for scenarios where change tracking is not available or required. The extracted dataset is upserted in the target system. The upsert logic handles both new record insertions and updates to existing records.
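
The upsert semantics of a full load can be modeled with a dictionary keyed on the primary key. This is a sketch of the behavior, not the connector's implementation:

```python
# Illustrative upsert: new keys are inserted, existing keys are overwritten.
def upsert(target, rows, key_fields):
    """Merge a full extract into the target table keyed on the primary key."""
    for row in rows:
        key = tuple(row[k] for k in key_fields)
        target[key] = row  # insert or update; a full load never deletes
    return target
```

Note that, consistent with the limitation described earlier, nothing is ever removed from the target: rows deleted in SAP simply stop appearing in the extract but remain in the target.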

When to choose incremental or full load

The timestamp-based incremental processing approach offers optimal performance and resource utilization for large datasets with frequent updates. Data transfer volumes are reduced through the selective transfer of only changed records, resulting in less network traffic. This optimization directly translates into lower operational costs. Full load with upsert enables data synchronization in scenarios where incremental processing is not feasible.

Together, these approaches form a complete solution for zero-ETL integration with non-ODP SAP structures, addressing the varied requirements of enterprise data integration scenarios. Organizations using these approaches should evaluate their specific use cases, data volumes, and performance requirements when choosing between the two. The following diagram illustrates the SAP data ingestion workflow.

Flowchart diagram showing a data replication process. Starts with 'Entity Selected for Replication' at the top, flows to 'Initial Snapshot' step, then branches based on a decision 'Entity supports ODP?' into three paths: left path shows 'ODP Setup' leading to 'ODP Incremental Processing', middle path shows 'Timestamp based Incremental Setup' leading to 'Timestamp based Incremental Processing', and right path shows 'Full Load Setup' leading to 'Full Load Processing'. Each processing path includes an 'Integration Active?' decision point that loops back if yes, or flows to 'Error Recovery' at the bottom if no. The diagram uses rounded rectangles for processes, diamonds for decisions, and arrows showing flow direction.
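
The decision flow in the diagram reduces to a small selection function. The mode names below are illustrative labels, not official API values:

```python
# Hypothetical sketch of the replication-mode decision from the diagram:
# ODP wins, then timestamp-based incremental, then full load as fallback.
def choose_replication_mode(supports_odp, timestamp_field=None):
    if supports_odp:
        return "ODP_INCREMENTAL"        # delta tokens via the ODQ
    if timestamp_field:
        return "TIMESTAMP_INCREMENTAL"  # watermark on the configured field
    return "FULL_LOAD"                  # full extract with upsert each cycle
```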

Observing SAP zero-ETL integrations

AWS Glue maintains state management, logs, and metrics using Amazon CloudWatch Logs. For instructions to configure observability, refer to Monitoring an integration. Make sure AWS Identity and Access Management (IAM) roles are configured for log delivery. The integration is monitored both at source ingestion and when writing to the chosen target.

Monitoring source ingestion

The integration of AWS Glue zero-ETL with CloudWatch provides monitoring capabilities to track and troubleshoot data integration processes. Through CloudWatch, you can access detailed logs, metrics, and events that help identify issues, monitor performance, and maintain the operational health of your SAP data integrations. Let's look at a few examples of success and error scenarios.

Scenario 1: Missing permissions on your role

This error occurred during a data integration process in AWS Glue when attempting to access SAP data. The connection encountered a CLIENT_ERROR with a 400 Bad Request status code, indicating that the role is missing permissions:

{
    "eventTimestamp": 1755031897157,
    "integrationArn": "arn:aws:glue:us-east-2:012345678901:integration:1da4dccd-96ce-4661-8ef1-bf216623d65f",
    "sourceArn": "arn:aws:glue:us-east-2:012345678901:connection/SAPOData-sap-glue-dev",
    "degree": "ERROR",
    "messageType": "IngestionFailed",
    "particulars": {
        "loadType": "",
        "errorMessage": "You wouldn't have the required permissions to entry the glue connection. just be sure you have the right IAM permissions to entry AWS Glue assets.",
        "errorCode": "CLIENT_ERROR"
    }
}

Scenario 2: Broken delta links

The CloudWatch log indicates a problem with missing delta tokens during data synchronization from SAP to AWS Glue. The error occurs when attempting to access the SAP sales document item table FactsOfCSDSLSDOCITMDX through the OData service. The absence of delta tokens, which are needed for incremental data loading and tracking changes, has resulted in a CLIENT_ERROR (400 Bad Request) when the system tried to open the data extraction API RODPS_REPL_ODP_OPEN:

{
    "eventTimestamp": 1760700305466,
    "integrationArn": "arn:aws:glue:us-east-1:012345678901:integration:f62e1971-092c-46a3-ba88-d32f4c6cd649",
    "sourceArn": "arn:aws:glue:us-east-1:012345678901:connection/SAPOData-sap-glue-dev",
    "degree": "ERROR",
    "messageType": "IngestionFailed",
    "particulars": {
        "tableName": "/sap/opu/odata/sap/Z_C_SALESDOCUMENTITEMDEX_SRV/FactsOfCSDSLSDOCITMDX",
        "loadType": "",
        "errorMessage": "Acquired an error from SAPOData: Couldn't open information entry through extraction API RODPS_REPL_ODP_OPEN. Standing code 400 (Unhealthy Request).",
        "errorCode": "CLIENT_ERROR"
    }

Scenario 3: Client errors on SAP data ingestion

This CloudWatch log shows a client exception scenario where the SAP entity EntityOf0VENDOR_ATTR cannot be located or accessed through the OData service. This CLIENT_ERROR occurs when the AWS Glue connector attempts to parse the response from the SAP system but fails, due to either the entity not existing in the source SAP system or the SAP instance being temporarily unavailable:

{
    "eventTimestamp": 1752676327649,
    "integrationArn": "arn:aws:glue:us-east-1:012345678901:integration:9f1acbc0-599f-47d2-8e84-e9779976af59",
    "sourceArn": "arn:aws:glue:us-east-1:012345678901:connection/SAPOData-sap-glue-dev",
    "degree": "ERROR",
    "messageType": "IngestionFailed",
    "particulars": {
        "tableName": "/sap/opu/odata/sap/ZVENDOR_ATTR_SRV/EntityOf0VENDOR_ATTR",
        "loadType": "",
        "errorMessage": "Knowledge learn from supply failed for entity /sap/opu/odata/sap/ZVENDOR_ATTR_SRV/EntityOf0VENDOR_ATTR utilizing connector SAPOData; ErrorMessage: Glue connector returned consumer exception. The response from the connector software could not be parsed.",
        "errorCode": "CLIENT_ERROR"
    }
}
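
Failure events like the three above can be filtered programmatically once log delivery is configured. The sketch below assumes events have already been fetched (for example with the CloudWatch Logs `filter_log_events` API) as JSON strings; the helper name is illustrative:

```python
import json

def summarize_failures(log_events):
    """Collect (errorCode, table) pairs from IngestionFailed events.

    Connection-level failures (like Scenario 1) carry no tableName, so a
    placeholder is used for those.
    """
    failures = []
    for raw in log_events:
        event = json.loads(raw)
        if event.get("messageType") != "IngestionFailed":
            continue
        details = event.get("details", {})
        failures.append((details.get("errorCode"),
                         details.get("tableName", "<connection-level>")))
    return failures
```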

Monitoring target writes

Zero-ETL employs different monitoring mechanisms depending on the target system. For Amazon Redshift targets, it uses the svv_integration system view, which provides detailed information about integration status, job execution, and data movement statistics. When working with SageMaker lakehouse targets, zero-ETL tracks integration states through the zetl_integration_table_state table, which maintains metadata about synchronization status, timestamps, and execution details. Additionally, you can use CloudWatch logs to monitor integration progress, capturing information about successful commits, metadata updates, and potential issues during the data writing process.

Scenario 1: Successful processing on SageMaker lakehouse target

The CloudWatch logs show successful data synchronization activity for the plant table using CDC mode. The first log entry (IngestionCompleted) confirms the successful completion of the ingestion process at timestamp 1757221555568, with a last sync timestamp of 1757220991999. The second log (IngestionTableStatistics) provides detailed statistics of the data modifications, showing that during this CDC sync 300 new records were inserted, 8 records were updated, and 2 records were deleted from the target database gluezetl. This level of detail helps in monitoring the volume and types of changes being propagated to the target system.

{
    "eventTimestamp": 1757221555568,
    "integrationArn": "arn:aws:glue:us-east-1:012345678901:integration:b7a1c69a-e180-4d27-b71d-5fcf196d9d2d",
    "sourceArn": "arn:aws:glue:us-east-1:012345678901:connection/mam301",
    "targetArn": "arn:aws:glue:us-east-1:012345678901:database/gluezetl",
    "degree": "VERBOSE",
    "messageType": "IngestionCompleted",
    "particulars": {
        "tableName": "plant",
        "loadType": "CDC",
        "message": "Efficiently accomplished ingestion",
        "lastSyncedTimestamp": 1757220991999,
        "consumedResourceUnits": "10"
    }
}

{
    "eventTimestamp": 1757222506936,
    "integrationArn": "arn:aws:glue:us-east-1:012345678901:integration:b7a1c69a-e180-4d27-b71d-5fcf196d9d2d",
    "sourceArn": "arn:aws:glue:us-east-1:012345678901:connection/mam301",
    "targetArn": "arn:aws:glue:us-east-1:012345678901:database/gluezetl",
    "degree": "INFO",
    "messageType": "IngestionTableStatistics",
    "particulars": {
        "tableName": "plant",
        "loadType": "CDC",
        "insertCount": 300,
        "updateCount": 8,
        "deleteCount": 2
    }
}
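
The IngestionTableStatistics events lend themselves to simple aggregation across sync cycles. A sketch, again assuming events fetched from CloudWatch Logs as JSON strings and an illustrative helper name:

```python
import json

def table_change_totals(log_events):
    """Total inserts/updates/deletes per table from statistics events."""
    totals = {}
    for raw in log_events:
        event = json.loads(raw)
        if event.get("messageType") != "IngestionTableStatistics":
            continue
        d = event["details"]
        t = totals.setdefault(d["tableName"],
                              {"insert": 0, "update": 0, "delete": 0})
        t["insert"] += d.get("insertCount", 0)
        t["update"] += d.get("updateCount", 0)
        t["delete"] += d.get("deleteCount", 0)
    return totals
```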

Scenario 2: Metrics on the Amazon SageMaker lakehouse target

The zetl_integration_table_state table in SageMaker lakehouse provides a view of integration status and data modification metrics. In this example, the table shows a successful integration for an SAP CDS view table with integration ID 62b1164f-5b85-45e4-b8db-9aa7ab841e98 in the testdb database. The record indicates that at timestamp 1733000485999, 10 records were processed (recent_ingestion_record_count: 10), with no updates or deletions (both counts at 0). This table serves as a monitoring tool, providing a centralized view of integration states and detailed statistics about data modifications, making it easy to track and verify data synchronization activities in the lakehouse.

+---+--------------------------------------+---------------+----------------------------------------------------------+-----------+--------+-----------------+-------------------------------+------------------------------+------------------------------+------------------------------+
| # | integration_id                       | target_database | table_name                                               | table_state | reason | last_updated_timestamp | recent_ingestion_record_count | recent_insert_record_count | recent_update_record_count | recent_delete_record_count |
+---+--------------------------------------+---------------+----------------------------------------------------------+-----------+--------+-----------------+-------------------------------+------------------------------+------------------------------+------------------------------+
| 2 | 62b1164f-5b85-45e4-b8db-9aa7ab841e98 | testdb        | _sap_opu_odata_sap_zcds_po_scl_new_srv_factsofzmmpurordsldex | SUCCEEDED |        | 1733000485999   | 10                            | 0                            | 0                            | 0                            |
+---+--------------------------------------+---------------+----------------------------------------------------------+-----------+--------+-----------------+-------------------------------+------------------------------+------------------------------+------------------------------+

Scenario 3: Redshift monitoring uses two system views to track zero-ETL integration status

svv_integration provides a high-level overview of the integration status, showing that integration ID 03218b8a-9c95-4ec2-81ad-dd4d5398e42a has successfully replicated 18 tables with no failures, and the last checkpoint was at transaction sequence 1761289852999.

+--------------------------------------+---------------+-----------+-----------------+-------------+----------------------------------------------+-------------------------+-----------------------+---------------+------------------+-----------------+-----------------+------------------+-----------------+-----------------+
| integration_id                       | target_database | source    | state           | current_lag | last_replicated_checkpoint                   | total_tables_replicated | total_tables_failed | creation_time | refresh_interval | source_database | is_history_mode | query_all_states | truncatecolumns | accept_invchars |
+--------------------------------------+---------------+-----------+-----------------+-------------+----------------------------------------------+-------------------------+-----------------------+---------------+------------------+-----------------+-----------------+------------------+-----------------+-----------------+
| 03218b8a-9c95-4ec2-81ad-dd4d5398e42a | test_case     | GlueSaaS  | CdcRefreshState | 771754      | {"txn_seq":"1761289852999","txn_id":"0"}     | 18                      | 0                     | 22:54.7       | 0                |                 | FALSE           | FALSE            | FALSE           | FALSE           |
+--------------------------------------+---------------+-----------+-----------------+-------------+----------------------------------------------+-------------------------+-----------------------+---------------+------------------+-----------------+-----------------+------------------+-----------------+-----------------+

svv_integration_table_state offers table-level monitoring details, showing the status of individual tables within the integration. In this case, the SAP material group text entity table is in the Synced state, with its last replication checkpoint matching the integration checkpoint (1761289852999). The table currently shows 0 rows and 0 size, suggesting it's newly created.

+--------------------------------------+---------------+-------------+--------------------------------------------------------------+-------------+----------------------------------------------+--------+-----------------------+------------+------------+-----------------+
| integration_id                       | target_database | schema_name | table_name                                                   | table_state | table_last_replicated_checkpoint             | reason | last_updated_timestamp | table_rows | table_size | is_history_mode |
+--------------------------------------+---------------+-------------+--------------------------------------------------------------+-------------+----------------------------------------------+--------+-----------------------+------------+------------+-----------------+
| 03218b8a-9c95-4ec2-81ad-dd4d5398e42a | test_case     | public      | /sap/opu/odata/sap/ZMATL_GRP_1_SRV/EntityOf0MATL_GRP_1_TEXT | Synced      | {"txn_seq":"1761289852999","txn_id":"0"}     |        | 23:03.8               | 0          | 0          | FALSE           |
+--------------------------------------+---------------+-------------+--------------------------------------------------------------+-------------+----------------------------------------------+--------+-----------------------+------------+------------+-----------------+

Together, these views provide a comprehensive solution for monitoring both overall integration health and individual table synchronization status in Amazon Redshift.
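
A monitoring row from either target can be reduced to a simple health signal. The sketch below is illustrative: it assumes rows fetched from svv_integration_table_state (or the lakehouse equivalent) as dictionaries, and the state names mirror the examples above:

```python
# Hypothetical classifier over a monitoring row; state names follow the
# examples in this post (Synced for Redshift, SUCCEEDED for the lakehouse).
def table_sync_health(row):
    state = row.get("table_state")
    if state in ("Synced", "SUCCEEDED"):
        if row.get("table_rows", 0) == 0:
            return "healthy-empty"  # newly created table, no rows yet
        return "healthy"
    return f"attention: {state} ({row.get('reason') or 'no reason given'})"
```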

Prerequisites

In the following sections, we walk through the steps required to set up an SAP connection and use that connection to create a zero-ETL integration. Before implementing this solution, you should have the following in place:

  • An SAP account
  • An AWS account with administrator access
  • Create an S3 Tables target and associate the S3 bucket sap_demo_table_bucket as a location of the database
  • Update AWS Glue Data Catalog settings using the following IAM policy for fine-grained access control of the Data Catalog for zero-ETL
  • Create an IAM role named zero_etl_bulk_demo_role, to be used by zero-ETL to access data from your SAP account
  • Create the secret zero_etl_bulk_demo_secret in AWS Secrets Manager to store SAP credentials
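
The IAM policy and role definitions referenced above are not reproduced in this excerpt. As a hedged illustration only, a role assumed by AWS Glue zero-ETL typically needs a trust policy like the following; the service principal and the absence of conditions here are assumptions to verify against the AWS Glue documentation before use:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "glue.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```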

Create a connection to the SAP instance

To set up a connection to your SAP instance and provide data access details, complete the following steps:

  1. On the AWS Glue console, in the navigation pane under Data catalog, choose Connections, then choose Create connection.
  2. For Data sources, select SAP OData, then choose Next.

  3. Enter the SAP instance URL.
  4. For IAM service role, choose the role zero_etl_bulk_demo_role (created as a prerequisite).
  5. For Authentication Type, choose the authentication type that you're using for SAP.
  6. For AWS Secret, choose the secret zero_etl_bulk_demo_secret (created as a prerequisite).
  7. Choose Next.

  8. For Name, enter a name, such as sap_demo_conn.
  9. Choose Next.

Create zero-ETL integration

To create the zero-ETL integration, complete the following steps:

  1. On the AWS Glue console, in the navigation pane under Data catalog, choose Zero-ETL integrations, then choose Create zero-ETL integration.
  2. For Data source, select SAP OData, then choose Next.

  3. Choose the connection name and IAM role that you created in the previous step.
  4. Choose the SAP objects you want in your integration. Non-ODP objects are configured for either full load or incremental load, and ODP objects are automatically configured for incremental ingestion.
    1. For full load, leave Incremental update field set as No timestamp field selected.

    2. For incremental load, choose the edit icon for Incremental update field and choose a timestamp field.

    3. For ODP entities that support delta tokens, the incremental update field is pre-selected, and no customer action is necessary.



      When creating a new integration using the same SAP connection and entity in the data filter, you will be unable to select a different incremental update field from the first integration.
  5. For Target details, choose sap_demo_table_bucket (created as a prerequisite).
  6. For Target IAM role, choose sap_demo_role (created as a prerequisite).
  7. Choose Next.

  8. In the Integration details section, for Name, enter sap-demo-integration.
  9. Choose Next.

  10. Review the details and choose Create and launch integration.

The newly created integration is shown as Active in about a minute.
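
The console steps above can also be scripted. The sketch below assumes the Glue CreateIntegration API as exposed by boto3 (create_integration with IntegrationName, SourceArn, and TargetArn); verify the current parameter names in the boto3 reference before relying on it:

```python
# Hedged sketch: parameter names follow the Glue CreateIntegration API as
# documented for zero-ETL; confirm against the current boto3 reference.
def launch_integration(glue_client, name, source_connection_arn, target_arn):
    """Create and launch a zero-ETL integration between an SAP OData
    connection and a Glue-managed target."""
    response = glue_client.create_integration(
        IntegrationName=name,
        SourceArn=source_connection_arn,  # the SAP OData connection ARN
        TargetArn=target_arn,             # e.g. the target database ARN
    )
    return response["IntegrationArn"]
```

With real credentials this would be called as `launch_integration(boto3.client("glue"), "sap-demo-integration", connection_arn, database_arn)`.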

Clean up

To clean up your resources, complete the following steps. This process will permanently delete the resources created in this post; back up important data before proceeding.

  1. Delete the zero-ETL integration sap-demo-integration.
  2. Delete the S3 Tables target bucket sap_demo_table_bucket.
  3. Delete the Data Catalog connection sap_demo_conn.
  4. Delete the Secrets Manager secret zero_etl_bulk_demo_secret.

Conclusion

You can now transform your SAP data analytics without the complexity of traditional ETL processes. With AWS Glue zero-ETL, you can gain immediate access to your SAP data while maintaining its structure across S3 Tables, SageMaker lakehouse, and Amazon Redshift. Your teams can use ACID-compliant storage with time travel capabilities, schema evolution, and concurrent reads/writes at scale, while keeping data in cost-effective cloud storage. The solution's AI capabilities through Amazon Q and SageMaker can help your business create on-demand data products, run text-to-SQL queries, and deploy AI agents using Amazon Bedrock and Quick Suite.

To learn more, refer to the following resources:

Ready to modernize your SAP data strategy? Explore AWS Glue zero-ETL and enrich your organization's data analytics capabilities.


About the authors

Shashank Sharma

Shashank is an Engineering Leader with over 15 years of experience delivering data integration and replication solutions for first-party and third-party databases and SaaS for enterprise customers. He leads engineering for AWS Glue zero-ETL and Amazon AppFlow.

Parth Panchal

Parth is an experienced Software Engineer with over 10 years of development experience, specializing in AWS Glue zero-ETL and SAP data integration solutions. He excels at diving deep into complex data replication challenges, delivering scalable solutions while maintaining high standards for performance and reliability.

Diego Lombardini

Diego is an experienced Enterprise Architect with over 20 years' experience across SAP technologies, specializing in SAP innovation and data and analytics. He has worked both as a partner and as a customer, giving him a complete perspective on what it takes to sell, implement, and run systems and organizations. He is passionate about technology and innovation, focusing on customer outcomes and delivering business value.

Abhijeet Jangam

Abhijeet is a Data and AI leader with 20 years of SAP techno-functional experience leading strategy and delivery across multiple industries. With dozens of SAP implementation experiences, he brings broad functional process knowledge along with deep technical expertise in application development, data engineering, and integrations.
