[HTML payload içeriği buraya]
31.9 C
Jakarta
Tuesday, May 12, 2026

Apply fine-grained entry and transformation on the SUPER information sort in Amazon Redshift


Amazon Redshift is a quick, scalable, safe, and absolutely managed cloud information warehouse that makes it easy and cost-effective to investigate all of your information utilizing customary SQL and your present ETL (extract, remodel, and cargo), enterprise intelligence (BI), and reporting instruments. Tens of 1000’s of shoppers use Amazon Redshift to course of exabytes of knowledge per day and energy analytics workloads similar to BI, predictive analytics, and real-time streaming analytics.

Amazon Redshift, a cloud information warehouse service, helps attaching dynamic information masking (DDM) insurance policies to paths of SUPER information sort columns, and makes use of the OBJECT_TRANSFORM operate with the SUPER information sort. SUPER information sort columns in Amazon Redshift include semi-structured information like JSON paperwork. Beforehand, information masking in Amazon Redshift solely labored with common desk columns, however now you may apply masking insurance policies particularly to components inside SUPER columns. For instance, you could possibly apply a masking coverage to masks delicate fields like bank card numbers inside JSON paperwork saved in a SUPER column. This enables for extra granular management over information masking in Amazon Redshift. Amazon Redshift offers you extra flexibility in the way you apply information masking to guard delicate data saved in SUPER columns containing semi-structured information.

With DDM assist in Amazon Redshift, you are able to do the next:

  • Outline masking insurance policies that apply customized obfuscation insurance policies, similar to masking insurance policies to deal with bank card, personally identifiable data (PII) entries, HIPAA or GDPR wants, and extra
  • Rework the information at question time to use masking insurance policies
  • Connect masking insurance policies to roles or customers
  • Connect a number of masking insurance policies with various ranges of obfuscation to the identical column in a desk and assign them to totally different roles with priorities to keep away from conflicts
  • Implement cell-level masking through the use of conditional columns when creating your masking coverage
  • Use masking insurance policies to partially or fully redact information, or hash it through the use of user-defined features (UDFs)

On this submit, we display how a retail firm can management the entry of PII information saved within the SUPER information sort to customers based mostly on their entry privilege with out duplicating the information.

Resolution overview

For our use case, we’ve got the next information entry necessities:

  • Customers from the Buyer Service crew ought to be capable to view the order information however not PII data
  • Customers from the Gross sales crew ought to be capable to view buyer IDs and all order data
  • Customers from the Govt crew ought to be capable to view all the information
  • Workers shouldn’t be in a position to view any information

The next diagram illustrates how DDM assist in Amazon Redshift insurance policies works with roles and customers for our retail use case.

The answer encompasses creating masking insurance policies with various masking guidelines and attaching a number of to the identical position and desk with an assigned precedence to take away potential conflicts. These insurance policies might pseudonymize outcomes or selectively nullify outcomes to adjust to retailers’ safety necessities. We seek advice from a number of masking insurance policies being hooked up to a desk as a multi-modal masking coverage. A multi-modal masking coverage consists of three elements:

  • An information masking coverage that defines the information obfuscation guidelines
  • Roles with totally different entry ranges relying on the enterprise case
  • The power to connect a number of masking insurance policies on a person or position and desk mixture with precedence for battle decision

Stipulations

To implement this resolution, you want the next conditions:

Put together the information

To arrange our use case, full the next steps:

  1. On the Amazon Redshift console, select Question editor v2 underneath Explorer within the navigation pane.

In the event you’re acquainted with SQL Notebooks, you may obtain the SQL pocket book for the demonstration and import it to shortly get began.

  1. Create the desk and populate contents:
    -- 1- Create the orders desk
    drop desk if exists public.order_transaction;
    create desk public.order_transaction (
     data_json tremendous
    );
    
    -- 2- Populate the desk with pattern values
    INSERT INTO public.order_transaction
    VALUES
        (
            json_parse('
            {
            "c_custkey": 328558,
            "c_name": "Buyer#000328558",
            "c_phone": "586-436-7415",
            "c_creditcard": "4596209611290987",
            "orders":{
              "o_orderkey": 8014018,
              "o_orderstatus": "F",
              "o_totalprice": 120857.71,
              "o_orderdate": "2024-01-01"
              }
            }'
            )
        ),
        (
            json_parse('
            {
            "c_custkey": 328559,
            "c_name": "Buyer#000328559",
            "c_phone": "789-232-7421",
            "c_creditcard": "8709000219329924",
            "orders":{
              "o_orderkey": 8014019,
              "o_orderstatus": "S",
              "o_totalprice": 9015.98,
              "o_orderdate": "2024-01-01"
              }
            }'
            )
        ),
        (
            json_parse('
            {
            "c_custkey": 328560,
            "c_name": "Buyer#000328560",
            "c_phone": "276-564-9023",
            "c_creditcard": "8765994378650090",
            "orders":{
              "o_orderkey": 8014020,
              "o_orderstatus": "C",
              "o_totalprice": 18765.56,
              "o_orderdate": "2024-01-01"
              }
            }
            ')
        );

Implement the answer

To fulfill the safety necessities, we have to ensure that every person sees the identical information in numerous methods based mostly on their granted privileges. To do this, we use person roles mixed with masking insurance policies as follows:

  1. Create customers and roles, and add customers to their respective roles:
    --create 4 customers
    set session authorization admin;
    CREATE USER Kate_cust WITH PASSWORD disable;
    CREATE USER Ken_sales WITH PASSWORD disable;
    CREATE USER Bob_exec WITH PASSWORD disable;
    CREATE USER Jane_staff WITH PASSWORD disable;
    
    -- 1. Create Person Roles
    CREATE ROLE cust_srvc_role;
    CREATE ROLE sales_srvc_role;
    CREATE ROLE executives_role;
    CREATE ROLE staff_role;
    
    -- notice that public position exists by default.
    -- Grant Roles to Customers
    GRANT ROLE cust_srvc_role to Kate_cust;
    GRANT ROLE sales_srvc_role to Ken_sales;
    GRANT ROLE executives_role to Bob_exec;
    GRANT ROLE staff_role to Jane_staff;
    
    -- notice that regualr_user is hooked up to public position by default.
    GRANT ALL ON ALL TABLES IN SCHEMA "public" TO ROLE cust_srvc_role;
    GRANT ALL ON ALL TABLES IN SCHEMA "public" TO ROLE sales_srvc_role;
    GRANT ALL ON ALL TABLES IN SCHEMA "public" TO ROLE executives_role;
    GRANT ALL ON ALL TABLES IN SCHEMA "public" TO ROLE staff_role;

  2. Create masking insurance policies:
    -- Masks Full Knowledge
    CREATE MASKING POLICY mask_full
    WITH(pii_data VARCHAR(256))
    USING ('000000XXXX0000'::TEXT);
    
    -- This coverage rounds down the given worth to the closest 10.
    CREATE MASKING POLICY mask_price
    WITH(worth INT)
    USING ( (FLOOR(worth::FLOAT / 10) * 10)::INT );
    
    -- This coverage converts the primary 12 digits of the given bank card to 'XXXXXXXXXXXX'.
    CREATE MASKING POLICY mask_credit_card
    WITH(credit_card TEXT)
    USING ( 'XXXXXXXXXXXX'::TEXT || SUBSTRING(credit_card::TEXT FROM 13 FOR 4) );
    
    -- This coverage masks the given date
    CREATE MASKING POLICY mask_date
    WITH(order_date TEXT)
    USING ( 'XXXX-XX-XX'::TEXT);
    
    -- This coverage masks the given cellphone quantity
    CREATE MASKING POLICY mask_phone
    WITH(phone_number TEXT)
    USING ( 'XXX-XXX-'::TEXT || SUBSTRING(phone_number::TEXT FROM 9 FOR 4) );

  3. Connect the masking insurance policies:
    • Connect the masking coverage for the customer support use case:
      --customer_support (can't see buyer PHI/PII information however can see the order id , order particulars and standing and so on.)
      
      set session authorization admin;
      
      ATTACH MASKING POLICY mask_full
      ON public.order_transaction(data_json.c_custkey)
      TO ROLE cust_srvc_role;
      
      ATTACH MASKING POLICY mask_phone
      ON public.order_transaction(data_json.c_phone)
      TO ROLE cust_srvc_role;
      
      ATTACH MASKING POLICY mask_credit_card
      ON public.order_transaction(data_json.c_creditcard)
      TO ROLE cust_srvc_role;
      
      ATTACH MASKING POLICY mask_price
      ON public.order_transaction(data_json.orders.o_totalprice)
      TO ROLE cust_srvc_role;
      
      ATTACH MASKING POLICY mask_date
      ON public.order_transaction(data_json.orders.o_orderdate)
      TO ROLE cust_srvc_role;

    • Connect the masking coverage for the gross sales use case:
      --sales —> can see the shopper ID (non phi information) and all order data
      
      set session authorization admin;
      
      ATTACH MASKING POLICY mask_phone
      ON public.order_transaction(data_json.buyer.c_phone)
      TO ROLE sales_srvc_role;

    • Connect the masking coverage for the employees use case:
      --Workers — > can't see any information concerning the order. all columns masked for them ( we are able to hand choose some columns) to point out the performance
      
      set session authorization admin;
      
      ATTACH MASKING POLICY mask_full
      ON public.order_transaction(data_json.orders.o_orderkey)
      TO ROLE staff_role;
      
      ATTACH MASKING POLICY mask_pii_full
      ON public.order_transaction(data_json.orders.o_orderstatus)
      TO ROLE staff_role;
      
      ATTACH MASKING POLICY mask_pii_price
      ON public.order_transaction(data_json.orders.o_totalprice)
      TO ROLE staff_role;
      
      ATTACH MASKING POLICY mask_date
      ON public.order_transaction(data_json.orders.o_orderdate)
      TO ROLE staff_role;

Take a look at the answer

Let’s verify that the masking insurance policies are created and hooked up.

  1. Test that the masking insurance policies are created with the next code:
    -- 1.1- Verify the masking insurance policies are created
    SELECT * FROM svv_masking_policy;

  2. Test that the masking insurance policies are hooked up:
    -- 1.2- Confirm hooked up masking coverage on desk/column to person/position.
    SELECT * FROM svv_attached_masking_policy;

Now you may take a look at that totally different customers can see the identical information masked in a different way based mostly on their roles.

  1. Take a look at that the shopper assist can’t see buyer PHI/PII information however can see the order ID, order particulars, and standing:
    set session authorization Kate_cust;
    choose * from order_transaction;

  2. Take a look at that the gross sales crew can see the shopper ID (non PII information) and all order data:
    set session authorization Ken_sales;
    choose * from order_transaction;

  3. Take a look at that the executives can see all information:
    set session authorization Bob_exec;
    choose * from order_transaction;

  4. Take a look at that the employees can’t see any information concerning the order. All columns ought to masked for them.
    set session authorization Jane_staff;
    choose * from order_transaction;

Object_Transform operate

On this part, we dive into the capabilities and advantages of the OBJECT_TRANSFORM operate and discover the way it empowers you to effectively reshape your information for evaluation. The OBJECT_TRANSFORM operate in Amazon Redshift is designed to facilitate information transformations by permitting you to control JSON information immediately inside the database. With this operate, you may apply transformations to semi-structured or SUPER information varieties, making it simpler to work with complicated information constructions in a relational database setting.

Let’s take a look at some utilization examples.

First, create a desk and populate contents:

--1- Create the shopper desk 

DROP TABLE if exists customer_json;

CREATE TABLE customer_json (
    col_super tremendous,
    col_text character various(100) ENCODE lzo
) DISTSTYLE AUTO;

--2- Populate the desk with pattern information 

INSERT INTO customer_json
VALUES
    (
        
        json_parse('
            {
                "individual": {
                    "title": "GREGORY HOUSE",
                    "wage": 120000,
                    "age": 17,
                    "state": "MA",
                    "ssn": ""
                }
            }
        ')
        ,'GREGORY HOUSE'
    ),
    (
        json_parse('
              {
                "individual": {
                    "title": "LISA CUDDY",
                    "wage": 180000,
                    "age": 30,
                    "state": "CA",
                    "ssn": ""
                }
            }
        ')
        ,'LISA CUDDY'
    ),
     (
        json_parse('
              {
                "individual": {
                    "title": "JAMES WILSON",
                    "wage": 150000,
                    "age": 35,
                    "state": "WA",
                    "ssn": ""
                }
            }
        ')
        ,'JAMES WILSON'
    )
;
-- 3 choose the information 

SELECT * FROM customer_json;

Apply the transformations with the OBJECT_TRANSFORM operate:

SELECT
    OBJECT_TRANSFORM(
        col_super
        KEEP
            '"individual"."title"',
            '"individual"."age"',
            '"individual"."state"'
           
        SET
            '"individual"."title"', LOWER(col_super.individual.title::TEXT),
            '"individual"."wage"',col_super.individual.wage + col_super.individual.wage*0.1
    ) AS col_super_transformed
FROM customer_json;

As you may see within the instance, by making use of the transformation with OBJECT_TRANSFORM, the individual title is formatted in lowercase and the wage is elevated by 10%. This demonstrates how the transformation makes is simpler to work with semi-structured or nested information varieties.

Clear up

If you’re performed with the answer, clear up your sources:

  1. Detach the masking insurance policies from the desk:
    -- Cleanup
    --reset session authorization to the default
    RESET SESSION AUTHORIZATION;

  2. Drop the masking insurance policies:
    DROP MASKING POLICY mask_pii_data CASCADE;

  3. Revoke or drop the roles and customers:
    REVOKE ROLE cust_srvc_role from Kate_cust;
    REVOKE ROLE sales_srvc_role from Ken_sales;
    REVOKE ROLE executives_role from Bob_exec;
    REVOKE ROLE staff_role from Jane_staff;
    DROP ROLE cust_srvc_role;
    DROP ROLE sales_srvc_role;
    DROP ROLE executives_role;
    DROP ROLE staff_role;
    DROP USER Kate_cust;
    DROP USER Ken_sales;
    DROP USER Bob_exec;
    DROP USER Jane_staff;

  4. Drop the desk:
    DROP TABLE order_transaction CASCADE;
    DROP TABLE if exists customer_json;

Issues and greatest practices

Contemplate the next when implementing this resolution:

  • When attaching a masking coverage to a path on a column, that column have to be outlined because the SUPER information sort. You’ll be able to solely apply masking insurance policies to scalar values on the SUPER path. You’ll be able to’t apply masking insurance policies to complicated constructions or arrays.
  • You’ll be able to apply totally different masking insurance policies to a number of scalar values on a single SUPER column so long as the SUPER paths don’t battle. For instance, the SUPER paths a.b and a.b.c battle as a result of they’re on the identical path, with a.b being the mum or dad of a.b.c. The SUPER paths a.b.c and a.b.d don’t battle.

Check with Utilizing dynamic information masking with SUPER information sort paths for extra particulars on issues.

Conclusion

On this submit, we mentioned the way to use DDM assist for the SUPER information sort in Amazon Redshift to outline configuration-driven, constant, format-preserving, and irreversible masked information values. With DDM assist in Amazon Redshift, you may management your information masking strategy utilizing acquainted SQL language. You’ll be able to reap the benefits of the Amazon Redshift role-based entry management functionality to implement totally different ranges of knowledge masking. You’ll be able to create a masking coverage to establish which column must be masked, and you’ve got the pliability of selecting the way to present the masked information. For instance, you may fully cover all the data of the information, exchange partial actual values with wildcard characters, or outline your individual solution to masks the information utilizing SQL expressions, Python, or Lambda UDFs. Moreover, you may apply conditional masking based mostly on different columns, which selectively protects the column information in a desk based mostly on the values in a number of columns.

We encourage you to create your individual user-defined features for numerous use circumstances and obtain your required safety posture utilizing dynamic information masking assist in Amazon Redshift.


Concerning the Authors

Ritesh Kumar Sinha is an Analytics Specialist Options Architect based mostly out of San Francisco. He has helped clients construct scalable information warehousing and massive information options for over 16 years. He likes to design and construct environment friendly end-to-end options on AWS. In his spare time, he loves studying, strolling, and doing yoga.

Tahir Aziz is an Analytics Resolution Architect at AWS. He has labored with constructing information warehouses and massive information options for over 15+ years. He loves to assist clients design end-to-end analytics options on AWS. Exterior of labor, he enjoys touring and cooking.

Omama Khurshid is an Acceleration Lab Options Architect at Amazon Net Providers. She focuses on serving to clients throughout numerous industries construct dependable, scalable, and environment friendly options. Exterior of labor, she enjoys spending time along with her household, watching motion pictures, listening to music, and studying new applied sciences.

Related Articles

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Latest Articles