Last week, we announced the general availability of the integration between Amazon DataZone and AWS Lake Formation hybrid access mode. In this post, we share how this new feature helps you simplify the way you use Amazon DataZone to enable secure and governed sharing of your data in the AWS Glue Data Catalog. We also delve into how data producers can share their AWS Glue tables through Amazon DataZone without needing to register them in Lake Formation first.
Overview of the Amazon DataZone integration with Lake Formation hybrid access mode
Amazon DataZone is a fully managed data management service to catalog, discover, analyze, share, and govern data between data producers and consumers in your organization. With Amazon DataZone, data producers populate the business data catalog with data assets from data sources such as the AWS Glue Data Catalog and Amazon Redshift. They also enrich their assets with business context to make them easy for data consumers to understand. After the data is available in the catalog, data consumers such as analysts and data scientists can search and access this data by requesting subscriptions. When the request is approved, Amazon DataZone can automatically provision access to the data by managing permissions in Lake Formation or Amazon Redshift so that the data consumer can start querying the data using tools such as Amazon Athena or Amazon Redshift.
To manage access to data in the AWS Glue Data Catalog, Amazon DataZone uses Lake Formation. Previously, if you wanted to use Amazon DataZone for managing access to your data in the AWS Glue Data Catalog, you had to onboard your data to Lake Formation first. Now, the integration of Amazon DataZone and Lake Formation hybrid access mode simplifies how you get started with Amazon DataZone by removing the need to onboard your data to Lake Formation first.
Lake Formation hybrid access mode allows you to start managing permissions on your AWS Glue databases and tables through Lake Formation, while continuing to maintain any existing AWS Identity and Access Management (IAM) permissions on these tables and databases. Lake Formation hybrid access mode supports two permission pathways to the same Data Catalog databases and tables:
- In the first pathway, Lake Formation allows you to select specific principals (opt-in principals) and grant them Lake Formation permissions to access databases and tables by opting in
- The second pathway allows all other principals (that are not added as opt-in principals) to access these resources through the IAM principal policies for Amazon Simple Storage Service (Amazon S3) and AWS Glue actions
With the integration between Amazon DataZone and Lake Formation hybrid access mode, if you have tables in the AWS Glue Data Catalog that are managed through IAM-based policies, you can publish these tables directly to Amazon DataZone, without registering them in Lake Formation. Amazon DataZone registers the location of these tables in Lake Formation using hybrid access mode, which allows managing permissions on AWS Glue tables through Lake Formation, while continuing to maintain any existing IAM permissions.
Amazon DataZone allows you to publish any type of asset in the business data catalog. For some of these assets, Amazon DataZone can automatically manage access grants. These assets are called managed assets, and include Lake Formation-managed Data Catalog tables and Amazon Redshift tables and views. Prior to this integration, you had to complete the following steps before Amazon DataZone could treat the published Data Catalog table as a managed asset:
- Identify the Amazon S3 location associated with the Data Catalog table.
- Register the Amazon S3 location with Lake Formation in hybrid access mode using a role with appropriate permissions.
- Publish the table metadata to the Amazon DataZone business data catalog.
The following diagram illustrates this workflow.
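For readers who script these manual steps, the first two (identifying the table's Amazon S3 location and registering it in hybrid access mode) could look roughly like the following minimal Python sketch. It assumes boto3-style `glue` and `lakeformation` clients are passed in. The function names, the role ARN, and the standard `aws` partition in the ARN are illustrative assumptions, not part of the original walkthrough.

```python
from urllib.parse import urlparse


def s3_uri_to_arn(s3_uri: str) -> str:
    """Convert s3://bucket/prefix into arn:aws:s3:::bucket/prefix.

    Assumes the standard "aws" partition (hypothetical helper).
    """
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    return f"arn:aws:s3:::{parsed.netloc}{parsed.path.rstrip('/')}"


def register_table_location_hybrid(glue, lakeformation, database, table, role_arn):
    """Look up a Glue table's S3 location and register it in hybrid access mode.

    glue / lakeformation are expected to behave like
    boto3.client("glue") and boto3.client("lakeformation").
    """
    # Step 1: identify the Amazon S3 location of the Data Catalog table.
    response = glue.get_table(DatabaseName=database, Name=table)
    location = response["Table"]["StorageDescriptor"]["Location"]
    resource_arn = s3_uri_to_arn(location)

    # Step 2: register the location with Lake Formation in hybrid access mode,
    # so existing IAM-based access keeps working for non-opted-in principals.
    lakeformation.register_resource(
        ResourceArn=resource_arn,
        RoleArn=role_arn,  # role with access to the S3 location (assumed to exist)
        HybridAccessEnabled=True,
    )
    return resource_arn
```

With real clients, a call might look like `register_table_location_hybrid(boto3.client("glue"), boto3.client("lakeformation"), "tickitdb", "sales", role_arn)`, where the database name, table name, and role are placeholders for your own resources.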

With Amazon DataZone’s integration with Lake Formation hybrid access mode, you can publish your AWS Glue tables to Amazon DataZone without having to register the Amazon S3 location or add an opt-in principal in Lake Formation yourself, because these steps are delegated to Amazon DataZone. The administrator of an AWS account can enable the data location registration setting under the DefaultDataLake blueprint on the Amazon DataZone console. Now, a data owner or publisher can publish their AWS Glue table (managed through IAM permissions) to Amazon DataZone without the extra setup steps. When a data consumer subscribes to this table, Amazon DataZone registers the Amazon S3 locations of the table in hybrid access mode, adds the data consumer’s IAM role as an opt-in principal, and grants access to the same IAM role by managing permissions on the table through Lake Formation. This makes sure that IAM permissions on the table can coexist with newly granted Lake Formation permissions, without disrupting any existing workflows. The following diagram illustrates this workflow.
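To make the opt-in and grant steps concrete, the following minimal sketch shows roughly equivalent Lake Formation API calls for a single table and a single IAM role. It illustrates the mechanism only and is not a description of how Amazon DataZone implements it internally. The function name and the choice of `SELECT` and `DESCRIBE` permissions are assumptions made for this example.

```python
def grant_hybrid_access(lakeformation, principal_arn, database, table):
    """Opt a principal in to Lake Formation permissions on one table, then grant access.

    lakeformation is expected to behave like boto3.client("lakeformation").
    """
    principal = {"DataLakePrincipalIdentifier": principal_arn}
    resource = {"Table": {"DatabaseName": database, "Name": table}}

    # Add the consumer's IAM role as an opt-in principal for this table.
    # Principals that are not opted in keep using their IAM policies.
    lakeformation.create_lake_formation_opt_in(Principal=principal, Resource=resource)

    # Grant Lake Formation permissions to the same role (read-only here, by assumption).
    lakeformation.grant_permissions(
        Principal=principal,
        Resource=resource,
        Permissions=["SELECT", "DESCRIBE"],
    )
    return principal, resource
```

Because only the opted-in role is evaluated against Lake Formation permissions, other principals that rely on IAM policies for Amazon S3 and AWS Glue are unaffected.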

Solution overview
To demonstrate this new capability, we use a sample customer scenario where the finance team wants to access data owned by the sales team for financial analysis and reporting. The sales team has a pipeline that creates a dataset containing valuable information about ticket sales, popular events, venues, and seasons. We call it the tickit dataset. The sales team stores this dataset in Amazon S3 and registers it in a database in the Data Catalog. Access to this table is currently managed through IAM-based permissions. However, the sales team wants to publish this table to Amazon DataZone to facilitate secure and governed data sharing with the finance team.
The steps to configure this solution are as follows:
- The Amazon DataZone administrator enables the data lake location registration setting in Amazon DataZone to automatically register the Amazon S3 location of the AWS Glue tables in Lake Formation hybrid access mode.
- After the hybrid access mode integration is enabled in Amazon DataZone, the finance team requests a subscription to the sales data asset. The asset shows up as a managed asset, which means Amazon DataZone can manage access to this asset even if the Amazon S3 location of this asset isn’t registered in Lake Formation.
- The sales team is notified of a subscription request raised by the finance team. They review and approve the access request. After the request is approved, Amazon DataZone fulfills the subscription request by managing permissions in Lake Formation. It registers the Amazon S3 location of the subscribed table in Lake Formation hybrid access mode.
- The finance team gains access to the sales dataset required for their financial reports. They can go to their Amazon DataZone environment and start running queries using Athena against their subscribed dataset.
Prerequisites
To follow the steps in this post, you need an AWS account. If you don’t have an account, you can create one. In addition, you must have the following resources configured in your account:
- An S3 bucket
- An AWS Glue database and crawler
- IAM roles for different personas and services
- An Amazon DataZone domain and project
- An Amazon DataZone environment profile and environment
- An Amazon DataZone data source
If you don’t have these resources already configured, you can create them by deploying the following AWS CloudFormation stack:
- Choose Launch Stack to deploy a CloudFormation template.

- Complete the steps to deploy the template and leave all settings as default.
- Select I acknowledge that AWS CloudFormation might create IAM resources, then choose Submit.
After the CloudFormation deployment is complete, you can log in to the Amazon DataZone portal and manually trigger a data source run. This pulls any new or modified metadata from the source and updates the associated assets in the inventory. This data source has been configured to automatically publish the data assets to the catalog.
- On the Amazon DataZone console, choose View domains.
You should be logged in using the same role that was used to deploy the CloudFormation stack. Verify that you are in the same AWS Region.

- Find the domain blog_dz_domain, then choose Open data portal.
- Choose Browse all projects and choose Sales producer project.

- On the Data tab, choose Data sources in the navigation pane.
- Locate and choose the data source that you want to run.
This opens the data source details page.
- Choose the options menu (three vertical dots) next to tickit_datasource and choose Run.
The data source status changes to Running as Amazon DataZone updates the asset metadata.
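The same data source run can also be started programmatically instead of through the portal. The following is a minimal sketch, assuming a boto3-style `datazone` client and that you already know the domain and data source identifiers. The function name and the identifier values in the usage note are hypothetical.

```python
def run_data_source(datazone, domain_id, data_source_id):
    """Start a run of an Amazon DataZone data source and return (run id, status).

    datazone is expected to behave like boto3.client("datazone").
    """
    response = datazone.start_data_source_run(
        domainIdentifier=domain_id,
        dataSourceIdentifier=data_source_id,
    )
    # The run is asynchronous; the returned status reflects its state at start time.
    return response["id"], response["status"]
```

For example, `run_data_source(boto3.client("datazone"), "dzd_example", "ds_example")` would start a run for placeholder identifiers. You would substitute the identifiers of blog_dz_domain and tickit_datasource from your own account.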
Enable hybrid mode integration in Amazon DataZone
In this step, the Amazon DataZone administrator goes through the process of enabling the Amazon DataZone integration with Lake Formation hybrid access mode. Complete the following steps:
- On a separate browser tab, open the Amazon DataZone console.
Verify that you are in the same Region where you deployed the CloudFormation template.
- Choose View domains.
- Choose the domain created by AWS CloudFormation, blog_dz_domain.
- Scroll down on the domain details page and choose the Blueprints tab.
A blueprint defines what AWS tools and services can be used with the data assets published in Amazon DataZone. The DefaultDataLake blueprint is enabled as part of the CloudFormation stack deployment. This blueprint enables you to create and query AWS Glue tables using Athena. For the steps to enable this in your own deployments, refer to Enable built-in blueprints in the AWS account that owns the Amazon DataZone domain.
- Choose the DefaultDataLake blueprint.
- On the Provisioning tab, choose Edit.

- Select Enable Amazon DataZone to register S3 locations using AWS Lake Formation hybrid access mode.
You have the option of excluding specific Amazon S3 locations if you don’t want Amazon DataZone to automatically register them in Lake Formation hybrid access mode.
- Choose Save changes.

Request access
In this step, you log in to Amazon DataZone as the finance team, search for the sales data asset, and subscribe to it. Complete the following steps:
- Return to your Amazon DataZone data portal browser tab.
- Switch to the finance consumer project by choosing the dropdown menu next to the project name and choosing Finance consumer project.
From this step onwards, you take on the persona of a finance user looking to subscribe to a data asset published in the previous step.

- In the search bar, search for and choose the sales data asset.
- Choose Subscribe.

The asset shows up as a managed asset. This means that Amazon DataZone can grant access to this data asset to the finance team’s project by managing the permissions in Lake Formation.
- Enter a reason for the access request and choose Subscribe.
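The subscription request made in the portal has an API counterpart. The following minimal sketch assumes a boto3-style `datazone` client and that the listing identifier of the published sales asset and the identifier of the finance consumer project are already known. The function name and all identifier values are hypothetical.

```python
def request_subscription(datazone, domain_id, listing_id, project_id, reason):
    """Create a subscription request on behalf of a consumer project.

    datazone is expected to behave like boto3.client("datazone").
    Returns the identifier of the new subscription request.
    """
    response = datazone.create_subscription_request(
        domainIdentifier=domain_id,
        requestReason=reason,  # the reason entered in the portal dialog
        subscribedListings=[{"identifier": listing_id}],
        subscribedPrincipals=[{"project": {"identifier": project_id}}],
    )
    return response["id"]
```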

Approve access request
The sales team gets a notification that an access request from the finance team has been submitted. To approve the request, complete the following steps:
- Choose the dropdown menu next to the project name and choose Sales producer project.
You now assume the persona of the sales team, who are the owners and stewards of the sales data assets.
- Choose the notification icon at the top-right corner of the Amazon DataZone portal.
- Choose the Subscription Request Created task.

- Grant access to the sales data asset to the finance team and choose Approve.
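Approval can likewise be scripted. The following minimal sketch assumes a boto3-style `datazone` client and a known subscription request identifier. The function name and the comment text are illustrative assumptions.

```python
def approve_subscription(datazone, domain_id, request_id, comment="Approved for financial reporting"):
    """Accept a pending subscription request as the owning (producer) project.

    datazone is expected to behave like boto3.client("datazone").
    Returns the status reported for the request after the call.
    """
    response = datazone.accept_subscription_request(
        domainIdentifier=domain_id,
        identifier=request_id,
        decisionComment=comment,  # optional note recorded with the decision
    )
    return response["status"]
```

After the request is accepted, Amazon DataZone fulfills the subscription asynchronously, so the Lake Formation grants may take a short time to appear.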

Analyze the data
The finance team has now been granted access to the sales data, and this dataset has been added to their Amazon DataZone environment. They can access the environment and query the sales dataset with Athena, along with any other datasets they currently own. Complete the following steps:
- On the dropdown menu, choose Finance consumer project.
On the right pane of the project overview screen, you can find a list of active environments available for use.
- Choose the Amazon DataZone environment finance_dz_environment.
- In the navigation pane, under Data assets, choose Subscribed.
- Verify that your environment now has access to the sales data.
It may take a few minutes for the data asset to be automatically added to your environment.
- Choose the new tab icon for Query data.

A new tab opens with the Athena query editor.
- For Database, choose finance_consumer_db_tickitdb-<suffix>.
This database will contain your subscribed data assets.

- Generate a preview of the sales table by choosing the options menu (three vertical dots) and choosing Preview table.
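The table preview in the Athena query editor amounts to a small `SELECT ... LIMIT` query. The following minimal sketch starts an equivalent query through the API, assuming a boto3-style `athena` client. The function names, the row limit, and the workgroup parameter are assumptions. Use the workgroup of your Amazon DataZone environment and your actual database name, including its suffix.

```python
def preview_query(database: str, table: str, limit: int = 10) -> str:
    """Build a preview query; identifiers are double-quoted because the
    environment database name contains a hyphen."""
    for name in (database, table):
        if '"' in name:
            raise ValueError(f"unexpected double quote in identifier: {name!r}")
    return f'SELECT * FROM "{database}"."{table}" LIMIT {int(limit)}'


def start_preview(athena, database, table, workgroup):
    """Start the preview query and return its execution id.

    athena is expected to behave like boto3.client("athena").
    """
    response = athena.start_query_execution(
        QueryString=preview_query(database, table),
        QueryExecutionContext={"Database": database},
        WorkGroup=workgroup,  # hypothetical: the environment's Athena workgroup
    )
    return response["QueryExecutionId"]
```

The query runs asynchronously. Results can then be fetched with the Athena `get_query_results` call once the execution has succeeded.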

Clean up
To clean up your resources, complete the following steps:
- Switch back to the administrator role you used to deploy the CloudFormation stack.
- On the Amazon DataZone console, delete the projects used in this post. This will delete most project-related objects like data assets and environments.
- On the AWS CloudFormation console, delete the stack you deployed at the beginning of this post.
- On the Amazon S3 console, delete the S3 buckets containing the tickit dataset.
- On the Lake Formation console, delete the Lake Formation admins registered by Amazon DataZone.
- On the Lake Formation console, delete tables and databases created by Amazon DataZone.
Conclusion
In this post, we discussed how the integration between Amazon DataZone and Lake Formation hybrid access mode simplifies the process of getting started with Amazon DataZone for end-to-end governance of your data in the AWS Glue Data Catalog. This integration helps you bypass the manual steps of onboarding to Lake Formation before you can start using Amazon DataZone.
For more information on how to get started with Amazon DataZone, refer to the Getting started guide. Check out the YouTube playlist for some of the latest demos of Amazon DataZone and short descriptions of the capabilities available. For more information about Amazon DataZone, see How Amazon DataZone helps customers find value in oceans of data.
About the Authors
Utkarsh Mittal is a Senior Technical Product Manager for Amazon DataZone at AWS. He is passionate about building innovative products that simplify customers’ end-to-end analytics journeys. Outside of the tech world, Utkarsh loves to play music, with drums being his latest endeavor.
Praveen Kumar is a Principal Analytics Solution Architect at AWS with expertise in designing, building, and implementing modern data and analytics platforms using cloud-centered services. His areas of interest are serverless technology, modern cloud data warehouses, streaming, and generative AI applications.
Paul Villena is a Senior Analytics Solutions Architect at AWS with expertise in building modern data and analytics solutions to drive business value. He works with customers to help them harness the power of the cloud. His areas of interest are infrastructure as code, serverless technologies, and coding in Python.













