This blog post is co-written with Raj Samineni from ATPCO.
In today’s data-driven world, companies across industries recognize the immense value of data in making decisions, driving innovation, and building new products to serve their customers. However, many organizations face challenges in enabling their employees to discover, get access to, and use data easily with the right governance controls. The significant barriers along the analytics journey constrain their ability to innovate faster and make quick decisions.
ATPCO is the backbone of modern airline retailing, enabling airlines and third-party channels to deliver the right offers to customers at the right time. ATPCO’s reach is impressive, with its fare data covering over 89% of global flight schedules. The company collaborates with more than 440 airlines and 132 channels, managing and processing over 350 million fares in its database at any given time. ATPCO’s vision is to be the platform driving innovation in airline retailing while remaining a trusted partner to the airline ecosystem. ATPCO aims to empower data-driven decision-making by making high-quality data discoverable by every business unit, with the appropriate governance on who can access what.
In this post, using one of ATPCO’s use cases, we show you how ATPCO uses AWS services, including Amazon DataZone, to make data discoverable by data consumers across different business units so that they can innovate faster. We encourage you to read Amazon DataZone concepts and terminologies first to become familiar with the terms used in this post.
Use case
One of ATPCO’s use cases is to help airlines understand what products, including fares and ancillaries (like premium seat selection), are being offered and sold across channels and customer segments. To support this need, ATPCO wants to derive insights around product performance by using three different data sources:
- Airline ticketing data – data on 1 billion airline ticket sales processed by ATPCO
- ATPCO pricing data – 87% of global airline offers are powered by ATPCO pricing data. ATPCO is the industry leader in providing pricing and merchandising content for airlines, global distribution systems (GDSs), online travel agencies (OTAs), and other sales channels for consumers to visually understand differences between various offers.
- De-identified customer master data – ATPCO customer master data that has been de-identified for sensitive internal analysis and compliance.
To generate insights that can then be shared with airlines as a data product, an ATPCO analyst needs to be able to find the right data related to this topic, get access to the data sets, and then use it in a SQL client (like Amazon Athena) to start forming hypotheses and relationships.
Before Amazon DataZone, ATPCO analysts needed to find potential data assets by talking with colleagues; there wasn’t an easy way to discover data assets across the company. This slowed down their pace of innovation because it added time to the analytics journey.
Answer
To address the challenge, ATPCO sought inspiration from a modern data mesh architecture. Instead of a central data platform team with a data warehouse or data lake serving as the clearinghouse of all data across the company, a data mesh architecture encourages distributed ownership of data by data producers who publish and curate their data as products, which can then be discovered, requested, and used by data consumers.
Amazon DataZone provides rich functionality to help a data platform team distribute ownership of tasks so that these teams can choose to operate less like gatekeepers. In Amazon DataZone, data owners can publish their data and its business catalog (metadata) to ATPCO’s DataZone domain. Data consumers can then search for relevant data assets using these human-friendly metadata terms. Instead of access requests from data consumers going to ATPCO’s data platform team, they now go to the publisher or a delegated reviewer to evaluate and approve. When data consumers use the data, they do so in their own AWS accounts, which allocates their consumption costs to the right cost center instead of a central pool. Amazon DataZone also avoids duplicating data, which saves on cost and reduces compliance tracking. Amazon DataZone takes care of all the plumbing, using familiar AWS services such as AWS Identity and Access Management (IAM), AWS Glue, AWS Lake Formation, and AWS Resource Access Manager (AWS RAM) in a way that is fully inspectable by a customer.
The following diagram provides an overview of the solution using Amazon DataZone and other AWS services, following a fully distributed AWS account model, where data sets like airline ticket sales, ticket pricing, and de-identified customer data in this use case are stored in different member accounts in AWS Organizations.

Implementation
Now, we’ll walk through how ATPCO implemented their solution to solve the challenges of analysts discovering, getting access to, and using data quickly to help their airline customers.
There are four parts to this implementation:
- Set up account governance and identity management.
- Create and configure an Amazon DataZone domain.
- Publish data assets.
- Consume data assets as part of analyzing data to generate insights.
Part 1: Set up account governance and identity management
Before you start, compare your current cloud environment, including data architecture, to ATPCO’s environment. We’ve simplified this environment to the following components for the purpose of this blog post:
- ATPCO uses an organization to create and govern AWS accounts.
- ATPCO has existing data lake resources set up in multiple accounts, each owned by different data-producing teams. Having separate accounts helps control access, limits the blast radius if things go wrong, and helps allocate and control cost and usage.
- In each of their data-producing accounts, ATPCO has a common data lake stack: an Amazon Simple Storage Service (Amazon S3) bucket for data storage, an AWS Glue crawler and catalog for updating and storing technical metadata, and AWS Lake Formation (in hybrid access mode) for managing data access permissions.
- ATPCO created two new AWS accounts: one to own the Amazon DataZone domain and another for a consumer team to use for analytics with Amazon Athena.
- ATPCO enabled AWS IAM Identity Center and connected their identity provider (IdP) for authentication.
We’ll assume that you have a similar setup, though you might choose differently to suit your unique needs.
Part 2: Create and configure an Amazon DataZone domain
After your cloud environment is set up, the steps in Part 2 will help you create and configure an Amazon DataZone domain. A domain helps you organize your data, people, and their collaborative projects, and includes a unique business data catalog and web portal that publishers and consumers will use to share, collaborate, and use data. For ATPCO, their data platform team created and configured their domain.
Step 2.1: Create an Amazon DataZone domain
Persona: Domain administrator
Go to the Amazon DataZone console in your domain account. If you use AWS IAM Identity Center for corporate workforce identity authentication, then select the AWS Region in which your Identity Center instance is deployed. Choose Create domain.
- Enter a name and description.
- Leave Customize encryption settings (advanced) cleared.
- Leave the radio button selected for Create and use a new role. AWS creates an IAM role in your account on your behalf with the necessary IAM permissions for accessing Amazon DataZone APIs.
- Leave the quick setup option Set-up this account for data consumption and publishing cleared, because we don’t plan to publish or consume data in our domain account.
- Skip Add new tag for now. You can always come back later to edit the domain and add tags.
- Choose Create Domain.
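If you prefer to script Step 2.1, the following is a minimal Python sketch, assuming the boto3 `datazone` client's `create_domain` operation and an execution role that already exists (the console creates one for you; any ARN you pass is a placeholder of your own). The client is passed in as a parameter so the request-building logic can be checked without calling AWS.

```python
def build_create_domain_request(name, description, execution_role_arn):
    """Build CreateDomain parameters that mirror the console choices in Step 2.1."""
    return {
        "name": name,
        "description": description,
        # With the API you pass an existing execution role instead of the
        # console's "Create and use a new role" option.
        "domainExecutionRole": execution_role_arn,
        # kmsKeyIdentifier is omitted, matching the cleared custom encryption
        # option, and tags are omitted because they can be added later.
    }


def create_domain(client, name, description, execution_role_arn):
    """Call CreateDomain with an injected DataZone client and return the domain ID."""
    request = build_create_domain_request(name, description, execution_role_arn)
    response = client.create_domain(**request)
    return response["id"]
```

In real use you would pass `boto3.client("datazone", region_name=...)`, choosing the Region where your IAM Identity Center instance is deployed.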
After a domain is created, you will see a domain detail page similar to the following. Notice that IAM Identity Center is disabled by default.

Step 2.2: Enable IAM Identity Center for your Amazon DataZone domain and add a group
Persona: Domain administrator
By default, your Amazon DataZone domain, its APIs, and its unique web portal are accessible by IAM principals in this AWS account with the necessary DataZone IAM permissions. ATPCO wanted its corporate employees to be able to use Amazon DataZone with their corporate single sign-on (SSO) credentials without needing secondary federation to IAM roles. AWS IAM Identity Center is the AWS cross-service solution for passing identity provider credentials. You can skip this step if you plan to use IAM principals directly for accessing Amazon DataZone.
Navigate to your Amazon DataZone domain’s detail page and choose Enable IAM Identity Center.
- Scroll down to the User management section and select Enable users in IAM Identity Center. When you do, User and group assignment method options appear below. Turn on Require assignments. This means that you need to explicitly allow (add) users and groups to access your domain. Choose Update domain.
Now let’s add a group to the domain to provide its members with access. Back on your domain’s detail page, scroll to the bottom and choose the User management tab. Choose Add, and select Add SSO Groups from the drop-down.
- Enter the first letters of the group name and select it from the options. After you’ve added the desired groups, choose Add group(s).
- You can confirm that the groups were added successfully on the domain’s detail page, under the User management tab, by switching the drop-down from SSO Users to SSO Groups.
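A scripted version of Step 2.2 might look like the sketch below. It assumes that the console's Require assignments setting corresponds to the `MANUAL` value of `userAssignment` in the `singleSignOn` parameter of `update_domain`, and that `create_group_profile` takes the IAM Identity Center group ID; verify both against the current DataZone API reference before relying on them.

```python
def enable_identity_center(client, domain_id):
    """Switch the domain to IAM Identity Center sign-in with required assignments."""
    client.update_domain(
        identifier=domain_id,
        # Assumption: "Require assignments" in the console maps to MANUAL.
        singleSignOn={"type": "IAM_IDC", "userAssignment": "MANUAL"},
    )


def add_sso_groups(client, domain_id, group_ids):
    """Add Identity Center groups to the domain and return the new profile IDs."""
    profile_ids = []
    for group_id in group_ids:
        response = client.create_group_profile(
            domainIdentifier=domain_id, groupIdentifier=group_id
        )
        profile_ids.append(response["id"])
    return profile_ids
```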
Step 2.3: Associate AWS accounts with the domain for segregated data publishing and consumption
Personas: Domain administrator and AWS account owners
Amazon DataZone supports a distributed AWS account structure, where data assets are segregated from data consumption (such as Amazon Athena usage), and data assets are in their own accounts (owned by their respective data owners). We call these associated accounts. Amazon DataZone and the other AWS services it orchestrates take care of the cross-account data sharing. To make this work, domain and account owners need to perform a one-time account association: the domain needs to be shared with the account, and the account owner needs to configure it for use with Amazon DataZone. For ATPCO, there are four desired associated accounts: three accounts with data assets stored in Amazon S3 and cataloged in AWS Glue (airline ticketing data, pricing data, and de-identified customer data), and a fourth account that is used for an analyst’s consumption.
The first part of associating an account is to share the Amazon DataZone domain with the desired accounts (Amazon DataZone uses AWS RAM to create the resource policy for you). In ATPCO’s case, their data platform team manages the domain, so a team member does these steps.
- To do this in the Amazon DataZone console, sign in to the domain account, navigate to the domain detail page, and then scroll down and choose the Associated Accounts tab. Choose Request association.
- Enter the AWS account ID of the first account to be associated.
- Choose Add another account and repeat the previous step for the remaining accounts to be associated. For ATPCO, there were four accounts to be associated.
- When complete, choose Request Association.

The second part of associating an account is for the account owner to configure their account for use by Amazon DataZone. Essentially, this process means that the account owner is allowing Amazon DataZone to perform actions in the account, like granting access to Amazon DataZone projects after a subscription request is approved.
- Sign in to the associated account and go to the Amazon DataZone console in the same Region as the domain. On the Amazon DataZone home page, choose View requests.
- Select the name of the inviting Amazon DataZone domain and choose Review request.

- Choose the Amazon DataZone blueprint you want to enable. We select Data Lake in this example because ATPCO’s use case has data in Amazon S3 and consumption through Amazon Athena.

- Leave the defaults as-is in the Permissions and resources section. The Glue Manage Access role allows Amazon DataZone to use IAM and Lake Formation to manage IAM roles and permissions to data lake resources after you approve a subscription request in Amazon DataZone. The Provisioning role allows Amazon DataZone to create S3 buckets and AWS Glue databases and tables in your account when you allow users to create Amazon DataZone projects and environments. The Amazon S3 bucket for data lake setting specifies which S3 bucket Amazon DataZone uses when users store data with your account.

- Choose Accept & configure association. This takes you to the associated domains table for this associated account, showing which domains the account is associated with. Repeat this process for the other accounts to be associated.
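The blueprint part of the association (choosing Data Lake and its roles and S3 location) can also be expressed through the API. This is a sketch that assumes it runs with credentials for the associated account, that the AWS managed blueprint is named `DefaultDataLake`, and that the data lake blueprint's Regional parameter key is `S3Location`; the role ARNs and bucket you pass are placeholders. Accepting the AWS RAM share invitation itself is not shown.

```python
def find_blueprint_id(client, domain_id, name="DefaultDataLake"):
    """Look up the ID of an AWS managed environment blueprint by name."""
    response = client.list_environment_blueprints(
        domainIdentifier=domain_id, managed=True, name=name
    )
    for item in response.get("items", []):
        if item["name"] == name:
            return item["id"]
    raise LookupError(f"blueprint {name!r} not found in domain {domain_id}")


def enable_data_lake_blueprint(client, domain_id, region, manage_access_role_arn,
                               provisioning_role_arn, s3_location):
    """Enable the data lake blueprint in this associated account for one Region."""
    blueprint_id = find_blueprint_id(client, domain_id)
    client.put_environment_blueprint_configuration(
        domainIdentifier=domain_id,
        environmentBlueprintIdentifier=blueprint_id,
        enabledRegions=[region],
        manageAccessRoleArn=manage_access_role_arn,  # "Glue Manage Access role"
        provisioningRoleArn=provisioning_role_arn,   # "Provisioning role"
        # Assumed key for the "Amazon S3 bucket for data lake" setting.
        regionalParameters={region: {"S3Location": s3_location}},
    )
    return blueprint_id
```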
After the associations are configured by the account owners, you will see the status reflected in the Associated accounts tab of the domain detail page.

Step 2.4: Set up environment profiles in the domain
Persona: Domain administrator
The final step to prepare the domain is making the associated AWS accounts usable by Amazon DataZone domain users. You do this with an environment profile, which helps less technical users get started publishing or consuming data. It’s like a template, with predefined technical details like blueprint type, AWS account ID, and Region. ATPCO’s data platform team set up an environment profile for each associated account.
To do this in the Amazon DataZone console, the data platform team member signs in to the domain account, navigates to the domain detail page, and chooses Open data portal in the upper right to go to the web-based Amazon DataZone portal.
- Choose Select project in the upper left next to the DataZone icon and select Create Project. Enter a name, like Domain Administration, and choose Create. This takes you to your new project page.
- In the Domain Administration project page, choose the Environments tab, and then choose Environment profiles in the navigation pane. Select Create environment profile.
- Enter a name, such as Sales – Data lake blueprint.
- Select the Domain Administration project as owner, and DefaultDataLake as the blueprint.
- Select the AWS account with sales data as well as the preferred Region for new resources, such as AWS Glue and Athena consumption.
- Leave All projects and Any database selected.
- Finalize your selection by choosing Create Environment Profile.
Repeat this step for each of your associated accounts. As a result, Amazon DataZone users will be able to create environments in their projects to use AWS resources in specific AWS accounts for publishing or consumption.
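Because Step 2.4 is repeated once per associated account, it is a natural candidate for a loop. The sketch below assumes the boto3 `create_environment_profile` operation; the profile names and account IDs you pass are your own. It does not set the publishing scope options (All projects, Any database), so check how the API expresses them if you need to restrict them.

```python
def create_environment_profiles(client, domain_id, admin_project_id, blueprint_id,
                                accounts, region):
    """Create one environment profile per associated account.

    accounts maps a profile name (for example "Sales - Data lake blueprint")
    to the 12-digit ID of the associated AWS account it should target.
    """
    profile_ids = {}
    for profile_name, account_id in accounts.items():
        response = client.create_environment_profile(
            domainIdentifier=domain_id,
            projectIdentifier=admin_project_id,  # owner: Domain Administration project
            environmentBlueprintIdentifier=blueprint_id,
            name=profile_name,
            awsAccountId=account_id,
            awsAccountRegion=region,
        )
        profile_ids[profile_name] = response["id"]
    return profile_ids
```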

Part 3: Publish assets
With Part 2 complete, the domain is ready for publishers to sign in and start publishing the first data assets to the business data catalog so that potential data consumers can find relevant assets to help them with their analyses. We’ll focus on how ATPCO published their first data asset for internal analysis: sales data from their airline customers. ATPCO already had the data extracted, transformed, and loaded into a staged S3 bucket and cataloged with AWS Glue.
Step 3.1: Create a project
Persona: Data publisher
Amazon DataZone projects enable a group of users to collaborate with data. In this part of the ATPCO use case, the project is used to publish sales data as an asset in the project. By tying the eventual data asset to a project (rather than a user), the asset will have long-lived ownership beyond the tenure of any single employee or group of employees.
- As a data publisher, obtain the URL of the domain’s data portal from your domain administrator, navigate to this sign-in page, and authenticate with IAM or SSO. After you’re signed in to the data portal, choose Create Project, enter a name (such as Sales Data Assets), and choose Create.
- If you want to add teammates to the project, choose Add Members. On the Project members page, choose Add Members, search for the relevant IAM or SSO principals, and select a role for them in the project. Owners have full permissions in the project, while contributors are not able to edit or delete the project or control membership. Choose Add Members to complete the membership changes.
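Step 3.1 as a sketch, assuming the boto3 `create_project` and `create_project_membership` operations. The mapping of the console's owner and contributor roles to `PROJECT_OWNER` and `PROJECT_CONTRIBUTOR` is an assumption, and the member identifiers you pass are your own.

```python
DESIGNATIONS = {"owner": "PROJECT_OWNER", "contributor": "PROJECT_CONTRIBUTOR"}


def create_project_with_members(client, domain_id, name, members=()):
    """Create a project and add members.

    members is an iterable of (kind, identifier, role) tuples, where kind is
    "user" or "group" and role is "owner" or "contributor".
    """
    project_id = client.create_project(domainIdentifier=domain_id, name=name)["id"]
    for kind, identifier, role in members:
        if kind == "user":
            member = {"userIdentifier": identifier}
        elif kind == "group":
            member = {"groupIdentifier": identifier}
        else:
            raise ValueError(f"unknown member kind: {kind!r}")
        client.create_project_membership(
            domainIdentifier=domain_id,
            projectIdentifier=project_id,
            member=member,
            designation=DESIGNATIONS[role],
        )
    return project_id
```

Tying membership to groups rather than individual users fits the long-lived ownership goal described above, since people can join or leave the group without touching the project.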
Step 3.2: Create an environment
Persona: Data publisher
A project can contain multiple environments. Amazon DataZone environments are collections of configured resources (for example, an S3 bucket, an AWS Glue database, or an Athena workgroup). They can be useful if you want to manage stages of data production for the same essential data products with separate AWS resources, such as raw, filtered, processed, and curated data stages.
- While signed in to the data portal and in the Sales Data Assets project, choose the Environments tab, and then select Create Environment. Enter a name, such as Processed, referencing the processed stage of the underlying data.
- Select the Sales – Data lake blueprint environment profile the domain administrator created in Part 2.
- Choose Create Environment. Notice that you don’t need any technical details about the AWS account or resources! The creation process might take several minutes while Amazon DataZone sets up Lake Formation, AWS Glue, and Athena.
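Environment creation is asynchronous, which is why it can take several minutes. A minimal polling sketch, assuming the boto3 `create_environment` and `get_environment` operations and an `ACTIVE` status on success; treating any status ending in `_FAILED` as an error is a simplification.

```python
import time


def create_environment_and_wait(client, domain_id, project_id, profile_id, name,
                                poll_seconds=30, max_polls=40):
    """Create an environment from a profile and wait until it is ACTIVE."""
    response = client.create_environment(
        domainIdentifier=domain_id,
        projectIdentifier=project_id,
        environmentProfileIdentifier=profile_id,
        name=name,
    )
    environment_id = response["id"]
    for _ in range(max_polls):
        status = client.get_environment(
            domainIdentifier=domain_id, identifier=environment_id
        )["status"]
        if status == "ACTIVE":
            return environment_id
        if status.endswith("_FAILED"):
            raise RuntimeError(f"environment {environment_id} ended in {status}")
        time.sleep(poll_seconds)  # still CREATING (or another in-progress state)
    raise TimeoutError(f"environment {environment_id} not ACTIVE after {max_polls} polls")
```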
Step 3.3: Create a new data source and run an ingestion job
Persona: Data publisher
In this use case, ATPCO has cataloged their data using AWS Glue. Amazon DataZone can use AWS Glue as a data source. An Amazon DataZone data source (for AWS Glue) is a representation of one or more AWS Glue databases, with the option to set table selection criteria based on table names. Similar to how AWS Glue crawlers scan for new data and metadata, you can run an Amazon DataZone ingestion job against an Amazon DataZone data source (again, AWS Glue) to pull all of the matching tables and technical metadata (such as column headers) as the foundation for one or more data assets. An ingestion job can be run manually or automatically on a schedule.
- While signed in to the data portal and in the Sales Data Assets project, choose the Data tab, and then select Data sources. Choose Create Data Source, enter a name for your data source, such as Processed Sales data in Glue, select AWS Glue as the type, and choose Next.
- Select the Processed environment from Step 3.2. In the database name box, enter a value or select from the suggested AWS Glue databases that Amazon DataZone identified in the AWS account. You can add additional criteria and another AWS Glue database.
- For Publishing settings, select No. This allows you to review and enrich the suggested assets before publishing them to the business data catalog.
- For Metadata generation methods, keep this box selected. Amazon DataZone will provide you with recommended business names for the data assets and their technical schema, so that you publish an asset that’s easier for consumers to find.
- Clear Data quality unless you have already set up AWS Glue Data Quality. Choose Next.
- For Run preference, select to run on demand. You can come back later to run this ingestion job automatically on a schedule. Choose Next.
- Review the selections and choose Create.
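The data source settings above map to a single request. The following sketch builds it under the assumption that the console options correspond to these fields: `publishOnImport=False` for Publishing settings set to No, `recommendation.enableBusinessNameGeneration` for metadata generation, `autoImportDataQualityResult=False` for a cleared Data quality box, and no `schedule` for an on-demand run. The database name in any call is your own.

```python
def build_glue_data_source_request(domain_id, project_id, environment_id, name,
                                   database_name, table_pattern="*"):
    """Build CreateDataSource parameters matching the console choices above."""
    return {
        "domainIdentifier": domain_id,
        "projectIdentifier": project_id,
        "environmentIdentifier": environment_id,  # the Processed environment
        "name": name,
        "type": "GLUE",
        "configuration": {
            "glueRunConfiguration": {
                "relationalFilterConfigurations": [
                    {
                        "databaseName": database_name,
                        # Table selection criteria: include tables matching the pattern.
                        "filterExpressions": [{"type": "INCLUDE", "expression": table_pattern}],
                    }
                ],
                "autoImportDataQualityResult": False,  # Data quality cleared
            }
        },
        "publishOnImport": False,  # Publishing settings: No (review before publishing)
        "recommendation": {"enableBusinessNameGeneration": True},  # metadata generation on
        # No "schedule" key: the ingestion job runs on demand.
    }


def create_glue_data_source(client, **kwargs):
    """Create the data source with an injected DataZone client and return its ID."""
    return client.create_data_source(**build_glue_data_source_request(**kwargs))["id"]
```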
To run the ingestion job for the first time, choose Run in the upper right corner. This starts the job. The run time depends on the number of databases, tables, and columns in your data source. You can refresh the status by choosing Refresh.
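Starting the run and refreshing its status can be scripted as below. This assumes the boto3 `start_data_source_run` operation returns the run `id` and that `get_data_source_run` reports one of `SUCCESS`, `PARTIALLY_SUCCEEDED`, or `FAILED` when the run finishes; the polling interval is arbitrary.

```python
import time

TERMINAL_RUN_STATUSES = {"SUCCESS", "PARTIALLY_SUCCEEDED", "FAILED"}


def run_ingestion_job(client, domain_id, data_source_id, poll_seconds=15, max_polls=120):
    """Start an on-demand data source run and return its final status."""
    run_id = client.start_data_source_run(
        domainIdentifier=domain_id, dataSourceIdentifier=data_source_id
    )["id"]
    for _ in range(max_polls):
        status = client.get_data_source_run(
            domainIdentifier=domain_id, identifier=run_id
        )["status"]
        if status in TERMINAL_RUN_STATUSES:
            return status
        time.sleep(poll_seconds)  # the console's Refresh button, automated
    raise TimeoutError(f"data source run {run_id} still running after {max_polls} polls")
```

Returning the status instead of raising on `PARTIALLY_SUCCEEDED` leaves it to the caller to decide whether a partial ingestion is acceptable before reviewing the inventory.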
Step 3.4: Review, curate, and publish assets
Persona: Data publisher
After the ingestion job is complete, the matching AWS Glue tables are added to the project’s inventory. You can then review the asset, including automated metadata generated by Amazon DataZone, add additional metadata, and publish the asset.
- While signed in to the data portal and in the Sales Data Assets project, go to the Data tab, and select Inventory. You can review each of the data assets generated by the ingestion job. Let’s select the first result. On the asset detail page, you can edit the asset’s name and description to make it easier to find, especially in a list of search results.
- You can edit the Read Me section and add rich descriptions for the asset, with markdown support. This can reduce the number of clarifying questions consumers send to the publisher.
- You can edit the technical schema (columns), including adding business names and descriptions. If you enabled automated metadata generation, you’ll see recommendations here that you can accept or reject.
- After you are done enriching the asset, you can choose Publish to make it searchable in the business data catalog.
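Publishing can also be done in bulk once the assets have been reviewed. This sketch assumes the boto3 `search` operation with `searchScope="ASSET"` returns the project's inventory as `assetItem` entries, and that `create_listing_change_set` with `action="PUBLISH"` publishes an asset to the catalog. It reads only the first page of search results and does not edit names, Read Me text, or schema metadata, which the steps above do by hand.

```python
def list_inventory_asset_ids(client, domain_id, project_id):
    """Return IDs of assets owned by the project (first page of results only)."""
    response = client.search(
        domainIdentifier=domain_id,
        owningProjectIdentifier=project_id,
        searchScope="ASSET",
    )
    return [
        item["assetItem"]["identifier"]
        for item in response.get("items", [])
        if "assetItem" in item
    ]


def publish_assets(client, domain_id, asset_ids):
    """Publish each asset to the business data catalog; return asset -> listing ID."""
    listing_ids = {}
    for asset_id in asset_ids:
        response = client.create_listing_change_set(
            domainIdentifier=domain_id,
            entityIdentifier=asset_id,
            entityType="ASSET",
            action="PUBLISH",
        )
        listing_ids[asset_id] = response["listingId"]
    return listing_ids
```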
Have the data publisher for each asset follow Part 3. For ATPCO, this meant two additional teams followed these steps to get pricing and de-identified customer data into the data catalog.
Part 4: Consume assets as part of analyzing data to generate insights
Now that the business data catalog has three published data assets, data consumers can find available data to start their analysis. In this final part, an ATPCO data analyst finds the assets they need, obtains approved access, and analyzes the data in Athena, forming the precursor of a data product that ATPCO can then make available to their customer (such as an airline).
Step 4.1: Discover and find data assets in the catalog
Persona: Data consumer
As a data consumer, obtain the URL of the domain’s data portal from your domain administrator, navigate to the sign-in page, and authenticate with IAM or SSO. In the data portal, enter text to find data assets that match what you need to complete your analysis. In the ATPCO example, the analyst started by entering ticketing data. This returned the sales asset published above because the description noted that the data was related to “sales, including tickets and ancillaries (like premium seat selection preferences).”
The data consumer reviews the detail page of the sales asset, including the description and human-friendly terms in the schema, and confirms that it’s of use to the analysis. They then choose Subscribe. The data consumer is prompted to select a project for the subscription request; because they don’t have one yet, they follow the same instructions as creating a project in Step 3.1, naming it Product analysis project. Enter a short justification of the request. Choose Subscribe to send the request to the data publisher.
Repeat these search and subscription steps for each of the data assets needed for the analysis. In the ATPCO use case, this meant searching for and subscribing to pricing and customer data.
While waiting for the subscription requests to be approved, the data consumer creates an Amazon DataZone environment in the Product analysis project, similar to Step 3.2. The data consumer selects an environment profile for their consumption AWS account and the data lake blueprint.
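The consumer side of Step 4.1 can be sketched with the boto3 `search_listings` and `create_subscription_request` operations. The response shape (`items` containing `assetListing` entries with a `listingId`) and the project-principal structure are assumptions based on the DataZone API; the project ID and justification text in any call are your own.

```python
def find_listing_ids(client, domain_id, search_text):
    """Search the published catalog and return matching listing IDs (first page only)."""
    response = client.search_listings(domainIdentifier=domain_id, searchText=search_text)
    return [
        item["assetListing"]["listingId"]
        for item in response.get("items", [])
        if "assetListing" in item
    ]


def request_subscription(client, domain_id, listing_id, consumer_project_id, reason):
    """Ask for access to one listing on behalf of the consumer's project."""
    response = client.create_subscription_request(
        domainIdentifier=domain_id,
        subscribedListings=[{"identifier": listing_id}],
        subscribedPrincipals=[{"project": {"identifier": consumer_project_id}}],
        requestReason=reason,  # the short justification the publisher will review
    )
    return response["id"]
```

For example, `find_listing_ids(client, domain_id, "ticketing data")` followed by `request_subscription(...)` for the chosen listing mirrors the search and Subscribe actions in the portal.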
Step 4.2: Review and approve the subscription request
Persona: Data publisher
The next time that a member of the Sales Data Assets project signs in to the Amazon DataZone data portal, they will see a notification of the subscription request. Select that notification or navigate in the Amazon DataZone data portal to the project. Choose the Data tab and Incoming requests, and then the Requested tab to find the request. Review the request and decide to either Approve or Reject it, providing a disposition reason for future reference.
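For publishers who prefer a script, Step 4.2 might look like this sketch. It assumes the boto3 `list_subscription_requests` operation accepts `status="PENDING"` and an `approverProjectId` filter, and it delegates the actual decision to a caller-supplied function, because whether to approve is a human judgment, not something the code should decide.

```python
def review_pending_requests(client, domain_id, publisher_project_id, decide):
    """Approve or reject pending subscription requests for the publisher's project.

    decide(request) must return (approve: bool, comment: str); the comment is
    stored as the disposition reason.
    """
    response = client.list_subscription_requests(
        domainIdentifier=domain_id,
        status="PENDING",
        approverProjectId=publisher_project_id,
    )
    outcomes = {}
    for request in response.get("items", []):
        approve, comment = decide(request)
        action = (client.accept_subscription_request if approve
                  else client.reject_subscription_request)
        action(domainIdentifier=domain_id, identifier=request["id"], decisionComment=comment)
        outcomes[request["id"]] = "ACCEPTED" if approve else "REJECTED"
    return outcomes
```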
Step 4.3: Analyze data
Persona: Data consumer
Now that the data consumer has subscribed to all three data assets needed (by repeating steps 4.1-4.2 for each asset), the data consumer navigates to the Product analysis project in the Amazon DataZone data portal. The data consumer can verify that the project has data asset subscriptions by choosing the Data tab and Subscribed data.

Because the project has an environment with the data lake blueprint enabled in their consumption AWS account, the data consumer will see an icon in the right-side tab called Query Data: Amazon Athena. By selecting this icon, they’re taken to the Amazon Athena console.

In the Amazon Athena console, the data consumer sees the data assets their DataZone project is subscribed to (from steps 4.1-4.2). They use the Amazon Athena query editor to query the subscribed data.
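The same query can be run outside the console with the Athena API. This minimal sketch uses `start_query_execution`, `get_query_execution`, and `get_query_results` from the boto3 `athena` client; the database, workgroup, and table names are placeholders for the ones DataZone creates in the consumer's environment. It reads only the first page of results, and for a `SELECT` the first row returned is the column header.

```python
import time


def run_athena_query(athena, sql, database, workgroup, poll_seconds=2, max_polls=300):
    """Run a query in Athena and return the first page of rows as lists of strings."""
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        WorkGroup=workgroup,
    )["QueryExecutionId"]
    for _ in range(max_polls):
        state = athena.get_query_execution(
            QueryExecutionId=query_id
        )["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"query {query_id} ended in state {state}")
        time.sleep(poll_seconds)  # QUEUED or RUNNING
    else:
        raise TimeoutError(f"query {query_id} did not finish after {max_polls} polls")
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    # NULL cells may come back without a VarCharValue key, so .get yields None.
    return [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows]
```

A call such as `run_athena_query(boto3.client("athena"), "SELECT channel, COUNT(*) FROM sales GROUP BY channel", database="<subscribed database>", workgroup="<environment workgroup>")` would run against the subscribed tables, where every name shown is hypothetical.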

Conclusion
In this post, we walked you through an ATPCO use case to demonstrate how Amazon DataZone allows users across an organization to easily discover relevant data products using business terms. Users can then request access to data and build products and insights faster. By providing self-service access to data with the right governance guardrails, Amazon DataZone helps companies tap into the full potential of their data products to drive innovation and data-driven decision-making. If you’re looking for a way to unlock the full potential of your data and democratize it across your organization, Amazon DataZone can help you transform your business by making data-driven insights more accessible and productive.
To learn more about Amazon DataZone and how to get started, refer to the Getting started guide. See the YouTube playlist for some of the latest demos of Amazon DataZone and short descriptions of the capabilities available.
About the Authors

Brian Olsen is a Senior Technical Product Manager with Amazon DataZone. His 15-year technology career in research science and product has revolved around helping customers use data to make better decisions. Outside of work, he enjoys learning new adventurous hobbies, the most recent being paragliding.

Mitesh Patel is a Principal Solutions Architect at AWS. His passion is helping customers harness the power of analytics, machine learning, and AI to drive business growth. He engages with customers to create innovative solutions on AWS.
Raj Samineni is the Director of Data Engineering at ATPCO, leading the creation of advanced cloud-based data platforms. His work ensures robust, scalable solutions that support the airline industry’s strategic transformational objectives. By leveraging machine learning and AI, Raj drives innovation and data culture, positioning ATPCO at the forefront of technological advancement.
Sonal Panda is a Senior Solutions Architect at AWS with over 20 years of experience in architecting and developing intricate systems, primarily in the financial industry. Her expertise lies in generative AI and in application modernization leveraging microservices and serverless architectures to drive innovation and efficiency.
