Customers often need to augment and enrich SAP source data with other non-SAP source data. Such analytic use cases can be enabled by building a data warehouse or data lake. Customers can now use the AWS Glue SAP OData connector to extract data from SAP. The SAP OData connector supports both on-premises and cloud-hosted (native and SAP RISE) deployments. By using the AWS Glue OData connector for SAP, you can work seamlessly with your data on AWS Glue and Apache Spark in a distributed fashion for efficient processing. AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development.
The AWS Glue OData connector for SAP uses the SAP ODP framework and the OData protocol for data extraction. This framework acts in a provider-subscriber model to enable data transfers between SAP systems and non-SAP data targets. The ODP framework supports full data extraction and change data capture through the Operational Delta Queues (ODQ) mechanism. As a source for data extraction from SAP, you can use SAP data extractors, ABAP CDS views, SAP BW or BW/4HANA sources, HANA information views in SAP ABAP sources, or any ODP-enabled data sources.
SAP source systems can hold historical data and can receive constant updates. For this reason, it’s important to enable incremental processing of source changes. This blog post details how you can extract data from SAP and implement incremental data transfer from your SAP source using the SAP ODP OData framework with source delta tokens.
Solution overview
Example Corp wants to analyze the product data stored in their SAP source system. They want to understand their current product offering, specifically the number of products that they have in each of their material groups. This requires joining data from the SAP material master and material group data sources in their SAP system. The material master data is available for incremental extraction, whereas the material group data is only available as a full load. These data sources should be combined and made available to query for analysis.
Prerequisites
To complete the solution presented in this post, start by completing the following prerequisite steps:
- Configure operational data provisioning (ODP) data sources for extraction in the SAP Gateway of your SAP system.
- Create an Amazon Simple Storage Service (Amazon S3) bucket to store your SAP data.
- In an AWS Glue Data Catalog, create a database called sapgluedatabase.
- Create an AWS Identity and Access Management (IAM) role for the AWS Glue extract, transform, and load (ETL) job to use. The role must grant access to all resources used by the job, including Amazon S3 and AWS Secrets Manager. For the solution in this post, name the role GlueServiceRoleforSAP. Use the following policies:
  - AWS managed policies:
  - Inline policy:
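As a minimal sketch of an inline policy, assuming the job needs read/write access to your SAP data bucket, read access to the SAP credentials secret, and (because the incremental job later in this post stores its delta token in job tags) permission to read and write tags on the job, you could generate a policy document like the one below. The helper name `build_inline_policy`, the action list, and all ARNs are assumptions for illustration, not the policy from this post; review them against your own security requirements and use them alongside the AWS managed policies.

```python
import json


def build_inline_policy(bucket_name: str, secret_arn: str, job_arn_prefix: str) -> dict:
    """Return an illustrative least-privilege IAM policy document.

    ASSUMED permissions (hypothetical, adjust for your account):
      - object read/write/delete plus bucket listing for the SAP data bucket
      - reading the SAP credentials secret
      - reading and writing tags on Glue jobs (used to persist the delta token)
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "SapDataBucketObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
            {
                "Sid": "SapDataBucketList",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                "Sid": "SapCredentialsSecret",
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": [secret_arn],
            },
            {
                "Sid": "DeltaTokenJobTags",
                "Effect": "Allow",
                "Action": ["glue:GetTags", "glue:TagResource"],
                "Resource": [f"{job_arn_prefix}*"],
            },
        ],
    }


if __name__ == "__main__":
    # All values below are placeholders; replace them with your own ARNs and names.
    policy = build_inline_policy(
        bucket_name="my-sap-data-bucket",
        secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:ODataGlueSecret-AbCdEf",
        job_arn_prefix="arn:aws:glue:us-east-1:111122223333:job/",
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the inline policy editor of the IAM console.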
Create the AWS Glue connection for SAP
The SAP connector supports both CUSTOM (this is SAP BASIC authentication) and OAUTH authentication methods. For this example, you will connect with BASIC authentication.
- Use the AWS Secrets Manager console to create a secret called ODataGlueSecret for your SAP source. Details in AWS Secrets Manager should include the elements in the following code. You will need to enter your SAP system username in place of <your SAP username> and its password in place of <your SAP username password>.
- Create the AWS Glue connection GlueSAPOData for your SAP system by selecting the new SAP OData data source.
- Configure the connection with the appropriate values for your SAP source:
  - Application host URL: The host must have the SSL certificates for authentication and validation of your SAP host name.
  - Application service path: /sap/opu/odata/iwfnd/catalogservice;v=2
  - Port number: Port number of your SAP source system.
  - Client number: Client number of your SAP source system.
  - Logon language: Logon language of your SAP source system.
- In the Authentication section, select CUSTOM as the Authentication Type.
- Select the AWS secret created in the previous steps: ODataGlueSecret.

- In the Network Options section, enter the VPC, subnet, and security group used for the connection to your SAP system. For more information on connecting to your SAP system, see Configure a VPC for your ETL job.
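If you prefer to script the secret rather than create it in the console, the following is a minimal sketch using the boto3 Secrets Manager `create_secret` call. The key names in the secret payload (`basicAuthUsername`, `basicAuthPassword`, `basicAuthDisableSSO`, `customAuthenticationType`) are an assumption about the format the AWS Glue SAP OData connector expects for CUSTOM (basic) authentication; confirm them in the AWS Glue connector documentation before use. The helper names are hypothetical, and the client is passed in so the functions can be exercised without AWS credentials.

```python
import json


def build_sap_basic_auth_secret(username: str, password: str) -> str:
    """Build the SecretString for SAP BASIC (CUSTOM) authentication.

    ASSUMPTION: these key names follow the AWS Glue SAP OData custom/basic
    authentication format; verify them against the connector documentation.
    """
    return json.dumps(
        {
            "basicAuthUsername": username,
            "basicAuthPassword": password,
            "basicAuthDisableSSO": "True",
            "customAuthenticationType": "CustomBasicAuth",
        }
    )


def create_sap_secret(secrets_client, name: str, username: str, password: str) -> str:
    """Create the secret with a boto3 Secrets Manager client and return its ARN."""
    response = secrets_client.create_secret(
        Name=name,
        SecretString=build_sap_basic_auth_secret(username, password),
    )
    return response["ARN"]


if __name__ == "__main__":
    import boto3  # imported here so the helpers above work without boto3 installed

    # Replace the placeholders with your SAP credentials before running.
    arn = create_sap_secret(
        boto3.client("secretsmanager"),
        name="ODataGlueSecret",
        username="<your SAP username>",
        password="<your SAP username password>",
    )
    print(arn)
```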
Create an ETL job to ingest data from SAP
In the AWS Glue console, create a new Visual Editor AWS Glue job.
- Go to the AWS Glue console.
- In the navigation pane, under ETL Jobs, choose Visual ETL.
- Choose Visual ETL to create a job in the Visual Editor.
- For this post, edit the default name to be Material Master Job and choose Save.
On your Visual Editor canvas, select your SAP sources.
- Choose the Visual tab, then choose the plus sign to open the Add nodes menu. Search for SAP and add the SAP OData Source.
- Choose the node you just added and name it Material Master Attributes.
- For SAP OData connection, select the GlueSAPOData connection.
- Select the material attributes service and entity set from your SAP source.
- For Entity Name and Sub Entity Name, select the SAP OData entity from your SAP source.
- From the Fields, select Material, Created on, Material Group, Material Type, Old Matl number, GLUE_FETCH_SQ, DELTA_TOKEN and DML_STATUS.
- Enter limit 100 in the filter section to limit the data at design time.
Note that this service supports delta extraction, so Incremental transfer is the default selected option.
After the AWS Glue service role details have been selected, the data preview is available. You can adjust the preview to include the three new available fields, which are:
- glue_fetch_sq: This is a sequence field, generated from the epoch timestamp in the order the record was received, and is unique for each record. You can use it if you need to know or establish the order of changes in the source system.
- delta_token: All records will have this field blank, except for the last passed record, which will contain the value of the ODQ token used to capture any changed records (CDC). This record is not a transactional record from the source and is only there to pass the delta token value.
- dml_status: This will show UPDATED for all newly inserted and updated records from the source and DELETED for records that have been deleted from the source.
For delta-enabled extraction, the last record passed will contain the value DELTA_TOKEN, and the delta_token field will be filled as described above.
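To make the role of these three fields concrete, here is a plain-Python sketch (not part of the connector or the AWS Glue job) that takes one extracted batch as a list of dictionaries, separates the delta token record, orders the remaining records by glue_fetch_sq, and splits them into upserts and deletes based on dml_status. The record shape, the `material` field, and the helper name `split_delta_batch` are hypothetical; the token record is identified only by a non-blank delta_token, as described above.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, object]


def split_delta_batch(records: List[Record]) -> Tuple[Optional[str], List[Record], List[Record]]:
    """Split one extracted batch into (delta_token, upserts, deletes).

    Assumes each record carries the connector-added fields described above:
    glue_fetch_sq (ordering key), delta_token (blank except on the token record),
    and dml_status ("UPDATED" or "DELETED" on data records).
    """
    token: Optional[str] = None
    data: List[Record] = []
    for rec in records:
        if rec.get("delta_token"):
            # The token record is not a source transaction; keep only its token.
            token = str(rec["delta_token"])
        else:
            data.append(rec)

    # Replay changes in the order they were received from the source.
    data.sort(key=lambda r: int(r["glue_fetch_sq"]))
    upserts = [r for r in data if r.get("dml_status") == "UPDATED"]
    deletes = [r for r in data if r.get("dml_status") == "DELETED"]
    return token, upserts, deletes


if __name__ == "__main__":
    batch = [
        {"material": "M2", "glue_fetch_sq": 2, "delta_token": "", "dml_status": "DELETED"},
        {"material": "M1", "glue_fetch_sq": 1, "delta_token": "", "dml_status": "UPDATED"},
        {"material": None, "glue_fetch_sq": 3, "delta_token": "example-token", "dml_status": ""},
    ]
    token, upserts, deletes = split_delta_batch(batch)
    print(token, [r["material"] for r in upserts], [r["material"] for r in deletes])
```

Keeping the token separate from the data records is what lets a downstream step store it for the next run without writing it to the target as if it were a material.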
- Add another SAP OData source connection to your canvas, and name this node Material Group Text.
- Select the material group service and entity set from your SAP source.
- For Entity Name and Sub Entity Name, select the SAP OData entity from your SAP source.
Note that this service supports full extraction, so Full transfer is the default selected option. You can also preview this dataset.

- When previewing the data, notice the language key. SAP passes all languages, so add a filter of SPRAS = 'E' to extract only English. Note that this uses the SAP internal value of the field.
- Add a Change Schema transform node to the canvas after the Material Group Text node.
  - Rename the material group field in the target key to matkl2, so it’s different from your first source.
  - Under Drop, select spras, odq_changemode, odq_entitycntr, dml_status, delta_token and glue_fetch_sq.

- Add a Join transform to your canvas, bringing together both source datasets.
  - Make sure the node parents of both Material Master Attributes and Change Schema have been selected.
  - Select the Join type of Left join.
  - Select the join conditions as the key fields from each source:
    - Under Material Master Attributes, select matkl.
    - Under Change Schema, select matkl2.
You can preview the output to make sure the correct data is being returned. Now you are ready to store the result.
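The language filter, the Change Schema transform, and the Left join above can be mimicked in plain Python to reason about the expected output on a handful of rows. This is only a sketch: AWS Glue runs these steps as Spark transforms, and the assumption that the material group field in Material Group Text is named matkl, plus the hypothetical `material` and `group_text` fields, are for illustration only.

```python
from typing import Dict, List

Row = Dict[str, object]

# Fields removed by the Change Schema step (taken from the Drop list above).
DROPPED_FIELDS = {"spras", "odq_changemode", "odq_entitycntr", "dml_status", "delta_token", "glue_fetch_sq"}


def prepare_group_text(rows: List[Row]) -> List[Row]:
    """Keep English texts (SPRAS = 'E'), rename matkl to matkl2, drop helper fields."""
    prepared = []
    for row in rows:
        if row.get("spras") != "E":
            continue
        out = {k: v for k, v in row.items() if k not in DROPPED_FIELDS and k != "matkl"}
        out["matkl2"] = row["matkl"]
        prepared.append(out)
    return prepared


def left_join(master: List[Row], groups: List[Row]) -> List[Row]:
    """Left join master rows to group rows on master.matkl = groups.matkl2."""
    by_key: Dict[object, List[Row]] = {}
    for g in groups:
        by_key.setdefault(g["matkl2"], []).append(g)
    group_fields = {k for g in groups for k in g}

    joined = []
    for m in master:
        matches = by_key.get(m.get("matkl"))
        if matches:
            joined.extend({**m, **g} for g in matches)
        else:
            # A left join keeps unmatched master rows, with group columns set to None.
            joined.append({**{k: None for k in group_fields}, **m})
    return joined
```

Because the join is a left join, materials whose group has no English text still appear in the output, just without group columns; an inner join would silently drop them.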

- Add the S3 bucket target to your canvas.
- Make sure the node parent is Join.
- For Format, select Parquet.
- For S3 Target Location, browse to the S3 bucket you created in the prerequisites and add materialmaster/ to the S3 target location.
- For the Data Catalog update options, select Create a table in the Data Catalog and on subsequent runs, update the schema and add new partitions.
- For Database, select the name of the AWS Glue database created earlier, sapgluedatabase.
- For Table name, enter materialmaster.
- Choose Save to save your job. Your job should look like the following figure.

Clone your ETL job and make it incremental
After your ETL job has been created, you can clone it and add incremental data handling using delta tokens.
To do this, you will need to modify the job script directly. You will modify the script to add a statement that retrieves the last delta token (stored in a job tag) and adds the delta token value to the request (that is, to the execution of the job), which enables the delta-enabled SAP OData service when retrieving the data on the next job run.
The first execution of the job will not have a delta token value in the tag; therefore, the call will be an initial run, and the resulting delta token will be stored in the tags for future executions.
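The token lifecycle described above (read the token from the job tags, run an initial load when there is none, then store the new token for the next run) can be sketched as follows. This is not the sap_odata_state_management library used later in this post; it is a hypothetical stand-in built on the AWS Glue `get_tags` and `tag_resource` API calls, used through a boto3-style Glue client passed in by the caller. The tag key `sap_delta_token` and the connection option name `DELTA_TOKEN` are assumptions, not names confirmed by the connector or the library.

```python
from typing import Dict, Optional

# ASSUMPTIONS: hypothetical names, not taken from the connector or the helper library.
TOKEN_TAG_KEY = "sap_delta_token"  # job tag that holds the last delta token
TOKEN_OPTION_KEY = "DELTA_TOKEN"   # connection option assumed to carry the token


def read_delta_token(glue_client, job_arn: str) -> Optional[str]:
    """Return the token stored in the job tags, or None on the first run."""
    tags = glue_client.get_tags(ResourceArn=job_arn).get("Tags", {})
    return tags.get(TOKEN_TAG_KEY) or None


def apply_delta_token(options: Dict[str, str], token: Optional[str]) -> Dict[str, str]:
    """Return a copy of the connection options, adding the token when one exists.

    With no token the request is an initial load; with a token it is a delta load.
    """
    updated = dict(options)
    if token:
        updated[TOKEN_OPTION_KEY] = token
    return updated


def save_delta_token(glue_client, job_arn: str, token: Optional[str]) -> None:
    """Persist the newest token so that the next job run performs a delta load."""
    if token:
        glue_client.tag_resource(ResourceArn=job_arn, TagsToAdd={TOKEN_TAG_KEY: token})
```

Passing the client in, rather than creating it inside the helpers, keeps the functions testable with a stand-in object; inside a job you would pass boto3.client("glue") and a job ARN of the form arn:aws:glue:<region>:<account-id>:job/<job-name>.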
- Go to the AWS Glue console.
- In the navigation pane, under ETL Jobs, choose Visual ETL.
- Select the Material Master Job, choose Actions, and select Clone job.

- Change the name of the job to Material Master Job Delta, then choose the Script tab.
- You need to add an additional Python library that will handle storing and retrieving the delta tokens for each job execution. To do this, navigate to the Job details tab, scroll down, and expand the Advanced properties section. In the Python library path, add the following path:
s3://aws-blogs-artifacts-public/artifacts/BDB-4789/sap_odata_state_management.zip

- Now choose the Script tab and choose Edit script in the top right corner. Choose Confirm to confirm that your job will be script-only.

Apply the following changes to the script to enable the delta token.
- Import the SAP OData state management library classes that you added to the Python library path earlier, by adding the following code at line 8.

- The next few steps retrieve and persist the delta token in the job tags so that it can be accessed by the subsequent job execution. The delta token is added to the request back to the SAP source, so that only the incremental changes are extracted. If no token is passed, the load runs as an initial load, and the token is persisted for the next run, which will then be a delta load. To initialize the sap_odata_state_management library, extract the connection options into a variable and update them using the state manager. Do this by adding the following code at line 16 (after the job.init statement).
You can find the <key of MaterialMasterAttributes node> and the <entityName for Material Attribute> in the current generated script under # Script generated for node Material Master Attributes. Be sure to replace these placeholders with the appropriate values.
- Comment out the existing script generated for node Material Master Attributes by adding a #, and add the following replacement snippet.
- To extract the delta token from the dynamic frame and persist it in the job tags, add the following code snippet just above the last line in your script (before job.commit()).
That is what your remaining script ought to appear to be:
- Choose Save to save your changes.
- Choose Run to run your job. Note that there are currently no tags in your job details.
- Wait for your job run to complete successfully. You can see the status on the Runs tab.
- After your job run is complete, you will notice on the Job details tab that a tag has been added. The next job run will read this token and run a delta load.

Query your SAP source data
The AWS Glue job run has created an entry in the Data Catalog, enabling you to query the data immediately.
- Go to the Amazon Athena console.
- Choose Launch Query Editor.
- Make sure you have an appropriate workgroup assigned, or create a workgroup if required.
- Select the sapgluedatabase database and run a query (such as the following) to start analyzing your data.
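As one possible query for the question in the solution overview (how many products exist in each material group), the sketch below counts rows per matkl in the materialmaster table and submits the query through the boto3 Athena `start_query_execution` call. The column name matkl comes from the join step; the helper name and the default `primary` workgroup are assumptions, and the workgroup you use must have a query result location configured.

```python
# Counts materials per material group in the table written by the AWS Glue job.
PRODUCTS_PER_GROUP_SQL = """
SELECT matkl, COUNT(*) AS product_count
FROM materialmaster
GROUP BY matkl
ORDER BY product_count DESC
"""


def start_products_per_group_query(athena_client, workgroup: str = "primary") -> str:
    """Submit the query against sapgluedatabase and return the query execution ID."""
    response = athena_client.start_query_execution(
        QueryString=PRODUCTS_PER_GROUP_SQL,
        QueryExecutionContext={"Database": "sapgluedatabase"},
        WorkGroup=workgroup,
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    import boto3  # imported here so the helper above works without boto3 installed

    print(start_products_per_group_query(boto3.client("athena")))
```

If the delta job appends change records to the same table, a plain COUNT(*) also counts rows whose dml_status is DELETED and may include the delta token record (with an empty matkl), so you may want to filter or deduplicate by material before counting.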

Clean up
To avoid incurring charges, clean up the resources used in this post from your AWS account, including the AWS Glue jobs, SAP OData connection, AWS Glue Data Catalog entry, Secrets Manager secret, IAM role, the contents of the S3 bucket, and the S3 bucket.
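Several of these resources can also be removed with boto3. The following sketch deletes the two jobs, the connection, the Data Catalog table and database, and the secret, assuming you kept the names used in this post; it leaves the IAM role and the S3 bucket for you to remove separately. The `clean_up` helper is hypothetical, and deletion is irreversible, so review the names before running it.

```python
JOB_NAMES = ["Material Master Job", "Material Master Job Delta"]
CONNECTION_NAME = "GlueSAPOData"
DATABASE_NAME = "sapgluedatabase"
TABLE_NAME = "materialmaster"
SECRET_NAME = "ODataGlueSecret"


def clean_up(glue_client, secrets_client) -> None:
    """Delete the AWS Glue and Secrets Manager resources created in this post."""
    for job_name in JOB_NAMES:
        glue_client.delete_job(JobName=job_name)
    glue_client.delete_connection(ConnectionName=CONNECTION_NAME)
    glue_client.delete_table(DatabaseName=DATABASE_NAME, Name=TABLE_NAME)
    # Deleting the database also removes any tables still left in it.
    glue_client.delete_database(Name=DATABASE_NAME)
    # Skips the recovery window; omit ForceDeleteWithoutRecovery to keep one.
    secrets_client.delete_secret(SecretId=SECRET_NAME, ForceDeleteWithoutRecovery=True)


if __name__ == "__main__":
    import boto3  # imported here so clean_up can be tested without boto3 installed

    clean_up(boto3.client("glue"), boto3.client("secretsmanager"))
```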
Conclusion
In this post, we showed you how to create a serverless incremental data load process for multiple SAP data sources. The approach used AWS Glue to incrementally load the data from an SAP source using SAP ODP delta tokens and then load the data into Amazon S3.
The serverless nature of AWS Glue means that there is no infrastructure to manage, and you pay only for the resources consumed while your jobs are running (plus storage costs for outputs). As organizations increasingly become more data driven, this SAP connector can provide an efficient, cost-effective, performant, and secure way to include SAP source data in your big data and analytic outcomes. For more information, see AWS Glue.
About the authors
Allison Quinn is a Sr. ANZ Analytics Specialist Solutions Architect for Data and AI based in Melbourne, Australia, working closely with Financial Services customers in the region. Allison worked for over 15 years with SAP products before concentrating her analytics technical specialty on AWS native services. She’s very passionate about all things data, and about democratizing it so that customers of all types can drive business benefit.
Pavol is an Innovation Solution Architect at AWS, specializing in SAP cloud adoption across EMEA. With over 20 years of experience, he helps global customers migrate and optimize SAP systems on AWS. Pavol develops tailored strategies to transition SAP environments to the cloud, leveraging AWS’s agility, resiliency, and performance. He assists clients in modernizing their SAP landscapes using AWS’s AI/ML, data analytics, and application services to enhance intelligence, automation, and performance.
Partha Pratim Sanyal is a Software Development Engineer with AWS Glue in Vancouver, Canada, specializing in Data Integration, Analytics, and Connectivity. With extensive backend development expertise, he is dedicated to crafting impactful, customer-centric solutions. His work focuses on building features that empower users to effortlessly analyze and understand their data. Partha’s commitment to addressing complex user needs drives him to create intuitive and value-driven experiences that elevate data accessibility and insights for customers.
Diego is an experienced Enterprise Solutions Architect with over 20 years’ experience across SAP technologies, specializing in SAP innovation and data and analytics. He has worked both as a partner and as a customer, giving him a complete perspective on what it takes to sell, implement, and run systems and organizations. He is passionate about technology and innovation, focusing on customer outcomes and delivering business value.
Luis Alberto Herrera Gomez is a Software Development Engineer with AWS Glue in Vancouver, specializing in backend engineering, microservices, and cloud computing. With 7-8 years of experience, including roles as a backend and full-stack developer for several startups before joining Amazon and AWS, Luis focuses on developing scalable and efficient cloud-based applications. His expertise in AWS technologies enables him to design high-performance systems that handle complex data processing tasks. Luis is passionate about leveraging cloud computing to solve challenging business problems.