
How Zalando innovates their Fast-Serving layer by migrating to Amazon Redshift


While Zalando is now one of Europe’s leading online fashion destinations, it began in 2008 as a Berlin-based startup selling shoes online. What started with just a few brands and a single country quickly grew into a pan-European business, operating in 27 markets and serving more than 52 million active customers.

Fast forward to today, and Zalando isn’t just an online retailer; it’s a tech company at its core. With more than €14 billion in annual gross merchandise volume (GMV), the company realized that to serve fashion at scale, it needed to rely on more than just logistics and inventory. It needed data. And not just to support the business, but to drive it.

In this post, we show how Zalando migrated their fast-serving layer data warehouse to Amazon Redshift to achieve better price-performance and scalability.

The scale and scope of Zalando’s data operations

From personalized size recommendations that reduce returns to dynamic pricing, demand forecasting, targeted marketing, and fraud detection, data and AI are embedded across the organization.

Zalando’s data platform operates at an impressive scale, managing over 20 petabytes of data in its lake and supporting a wide range of analytics and machine learning applications. The data platform hosts more than 5,000 data products maintained by 350 decentralized teams, serving 6,000 monthly users, representing 80% of Zalando’s corporate workforce. As a fully self-service data platform, it provides SQL analytics, orchestration, data discovery, and quality monitoring, empowering teams to build and manage data products independently.

This scale only made the need for modernization more urgent. It was clear that efficient data loading, dynamic compute scaling, and future-ready infrastructure were essential.

Challenges with the existing Fast-Serving Layer (data warehouse)

To enable decisions across analytics, dashboards, and machine learning, Zalando uses a data warehouse that acts as a fast-serving layer and backbone for critical data and reporting use cases. This layer holds about 5,000 curated tables and views, optimized for fast, read-heavy workloads. Every week, more than 3,000 users, including analysts, data scientists, and business stakeholders, rely on this layer for instant insights.

But the incumbent data warehouse wasn’t future proof. It was based on a monolithic cluster setup optimized for peak loads, like Monday mornings, when weekly and daily jobs pile up. As a result, 80% of the time, the system sat underutilized, burning compute and leading to substantial “slack costs” from over-provisioned capacity, with potential monthly savings of over $30,000 if dynamic scaling had been possible. Concurrency limitations resulted in high latency and disrupted business-critical reporting processes. The system’s lack of elasticity led to poor cost-to-utilization ratios, while the absence of workload isolation between teams frequently caused operational incidents. Maintenance and scaling required constant vendor support, making it difficult to manage peak periods like Cyber Week because of instance scarcity. Additionally, the platform lacked modern features such as online query editors and proper auto scaling capabilities, while its slow feature development and limited community support further hindered Zalando’s ability to innovate.

Solving for scale: Zalando’s journey to a modern fast-serving layer

Zalando was looking for a solution that could meet their cost and performance targets through a “simple lift and shift” approach. Amazon Redshift was chosen for the proof of concept (POC) because it addressed autoscaling and concurrency needs while reducing operational effort, and because it could integrate with Zalando’s existing data platform and align with their overall data strategy.

The overall evaluation scope for the Redshift assessment covered the following key areas.

Performance and cost

The evaluation of Amazon Redshift demonstrated substantial performance improvements and cost benefits compared to the old data warehousing platform.

  • Redshift offered 3-5 times faster query execution time.
  • Approximately 86% of distinct queries ran faster on Redshift.
  • In a “Monday morning scenario”, Redshift demonstrated 3 times faster accumulated execution time compared to the existing platform.
  • For short queries, Redshift achieved 100% SLA compliance for queries in the 80-480 second range. For queries up to 80 seconds, 90% met SLA.
  • Redshift demonstrated 5x faster parallel query execution, handling significantly more concurrent queries than the existing data warehouse’s maximum parallelism.
  • For Interactive Usage use cases, Redshift demonstrated strong performance, which is essential for BI tool users, especially in parallel execution scenarios.
  • Redshift features such as Automatic Table Optimization and Automated Materialized Views eliminated the need for data-producing teams to manually optimize the design of tables, making it highly suitable for a central service offering.

Structure

Redshift successfully demonstrated workload isolation, such as separating transformation (ETL) workloads from serving (BI, ad hoc, and so on) workloads using Amazon Redshift data sharing. It also proved its versatility through integration with Spark and common file formats.
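To make the producer/consumer split more concrete, here is a minimal sketch that builds the Redshift data sharing statements a producer warehouse would run, plus the statement a consumer would run to mount the share. The share name, schema name, database name, and namespace IDs are placeholders for illustration only, not Zalando’s actual configuration.

```python
def producer_statements(share: str, schema: str, consumer_namespace: str) -> list[str]:
    """SQL the producer warehouse runs to expose a schema through a datashare."""
    return [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
        f"ALTER DATASHARE {share} ADD ALL TABLES IN SCHEMA {schema};",
        f"GRANT USAGE ON DATASHARE {share} TO NAMESPACE '{consumer_namespace}';",
    ]


def consumer_statement(local_db: str, share: str, producer_namespace: str) -> str:
    """SQL a consumer warehouse runs to mount the producer's datashare as a database."""
    return (
        f"CREATE DATABASE {local_db} FROM DATASHARE {share} "
        f"OF NAMESPACE '{producer_namespace}';"
    )


# Hypothetical names and namespace IDs, for illustration only.
for stmt in producer_statements("serving_share", "curated", "consumer-namespace-id"):
    print(stmt)
print(consumer_statement("curated_db", "serving_share", "producer-namespace-id"))
```

Because each consumer only mounts the share, ETL on the producer and BI or ad hoc queries on the consumers run on separate compute.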

Security

Amazon Redshift successfully demonstrated end-to-end encryption, auditing capabilities, and comprehensive access controls with row-level and column-level security as part of the proof of concept.

Developer productivity

The evaluation demonstrated significant improvements in developer efficiency. A baseline concept for central deployment template authoring and distribution through AWS Service Catalog was successfully implemented. Additionally, Redshift showed impressive agility with its ability to deploy Redshift Serverless endpoints in minutes for ad hoc analytics, enhancing the team’s ability to quickly respond to analytical needs.

Amazon Redshift migration strategy

This section outlines the approach Zalando took to migrate the fast-serving layer to Amazon Redshift.

From monolith to modular: Redesigning with Redshift

The migration strategy involved a complete re-architecture of the fast-serving layer, moving to Amazon Redshift with a multi-warehouse model that separates data producers from data consumers. Key components and principles of the target architecture include:

  1. Workload Isolation: Use cases are isolated by instance or environment, with data shares facilitating data exchange between them. Data shares enable an “easy fan out” of data from the Producer warehouse to various Consumer warehouses. The producer and consumer warehouses can be either Provisioned (such as for BI tools) or Serverless (such as for analysts). This also allows for data sharing between separate legal entities.
  2. Standardized Data Loading: A Data Loading API (proprietary to Zalando) was built to standardize data loading processes. This API supports incremental loading and performance optimizations. Implemented with AWS Step Functions and AWS Lambda, it detects changed Parquet files from Delta Lake metadata and uses Redshift Spectrum to load data into the Redshift Producer warehouse.
  3. Using Redshift Serverless: Zalando aims to use Redshift Serverless wherever possible. Redshift Serverless offers flexibility, cost efficiency, and improved performance, particularly for the lightweight queries prevalent in BI dashboards. It also enables the deployment of Redshift Serverless endpoints in minutes for ad hoc analytics, enhancing developer productivity.
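Zalando’s Data Loading API is proprietary and is not shown in this post. As a rough sketch of the idea behind incremental loading, the following example derives the net set of changed Parquet files from Delta Lake transaction log commits, which are newline-delimited JSON actions such as `add` and `remove`. The function name, its inputs, and the way commits are passed in are assumptions for illustration.

```python
import json


def changed_files(commits: dict[int, str], last_loaded_version: int) -> tuple[set[str], set[str]]:
    """Return (files_to_load, files_to_drop) for commits after last_loaded_version.

    `commits` maps a Delta table version to the raw text of its commit file
    (one JSON action per line). This is a hypothetical helper, not Zalando's API.
    """
    added: set[str] = set()
    removed: set[str] = set()
    for version in sorted(v for v in commits if v > last_loaded_version):
        for line in commits[version].splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                added.add(action["add"]["path"])
            elif "remove" in action:
                path = action["remove"]["path"]
                if path in added:
                    # Added and removed inside the window: never loaded, nothing to do.
                    added.discard(path)
                else:
                    removed.add(path)
    return added, removed


# Made-up commit contents for illustration.
commits = {
    3: '{"add": {"path": "part-a.parquet"}}',
    4: '{"add": {"path": "part-b.parquet"}}\n{"remove": {"path": "part-old.parquet"}}',
    5: '{"remove": {"path": "part-b.parquet"}}\n{"add": {"path": "part-c.parquet"}}',
}
to_load, to_drop = changed_files(commits, last_loaded_version=3)
print(sorted(to_load), sorted(to_drop))
```

Only the files in `to_load` would then need to be read (for example through Redshift Spectrum) rather than the full table.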

The following diagram depicts Zalando’s end-to-end Amazon Redshift multi-warehouse architecture, highlighting the producer-consumer model:

Architecture Diagram

The core migration strategy was a “lift-and-shift” of code, to avoid complex refactoring and meet deadlines.

The main principles used were:

  • Run tasks in parallel whenever possible.
  • Minimize the workload for internal data teams.
  • Decouple tasks to allow teams to schedule work flexibly.
  • Maximize the work done by centrally managed partners.

Three-stage migration approach

The migration is broken down into three distinct stages to manage the transition effectively.

Stage 1: Data replication

Zalando’s priority was creating a complete, synchronized copy of all target data tables from the old data warehouse in Redshift. An automated process was implemented using Changehub, an internal tool built on Amazon Managed Workflows for Apache Airflow (MWAA), which monitors the old system’s logs and syncs data updates to Redshift approximately every 5-10 minutes, establishing the new data foundation without disrupting existing workflows.

Stage 2: Workload migration

The second stage focused on moving business logic (ETL) and MicroStrategy reporting to Redshift to significantly reduce the load on the legacy system. For ETL migration, a semi-automated approach was implemented using the Migvisor code converter to convert the scripts. MicroStrategy reporting was migrated by using MicroStrategy’s (MSTR) capability to automatically generate Redshift-compatible queries based on the semantic layer.

Stage 3: Finalization and decommissioning

The final stage completes the transition by migrating all remaining data consumers and ingestion processes, leading to the full shutdown of the old data warehouse. During this phase, all data pipelines are being rerouted to feed directly into Redshift, and long-term ownership of processes is being transitioned to the respective teams before the old system is fully decommissioned.

Benefits and Results

A major infrastructure change at Zalando occurred on October 30, 2024, switching 80% of analytics reporting from the old data warehouse solution to Redshift. This switch reduced operational risk for the critical Cyber Week period and enabled the decommissioning of the old data warehouse to avoid significant license fees.

The project resulted in substantial performance and stability improvements across the board.

Performance Improvements

Key performance metrics show substantial improvements across several dimensions:

  • Faster Query Execution: 75% of all queries now execute faster on Redshift.
  • Improved Reporting Speed: High-priority reporting queries are significantly faster, with a 13% reduction in P90 execution time and a 23% reduction in P99 execution time.
  • Drastic Reduction in System Load: The overall processing time for MicroStrategy (MSTR) reports has dramatically decreased. Peak Monday morning execution time dropped from 130 minutes to 52 minutes. In the first 4 weeks, the total MSTR job duration was reduced by over 19,000 hours (equivalent to 2.2 years of compute time) compared to the previous system. This has led to far more consistent and reliable performance.
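As a quick sanity check on the figures above, the arithmetic behind the Monday morning reduction and the “2.2 years” equivalence can be worked through as follows. Only the numbers quoted in the list are used; the variable names are illustrative.

```python
# Figures quoted in the post.
peak_before_min = 130
peak_after_min = 52
hours_saved = 19_000

# Relative reduction and speedup of the Monday morning peak.
reduction = (peak_before_min - peak_after_min) / peak_before_min
speedup = peak_before_min / peak_after_min

# Express the saved job hours as continuous compute years (24 h x 365 d).
years_saved = hours_saved / (24 * 365)

print(f"Peak execution time reduced by {reduction:.0%}")      # 60%
print(f"Peak workload finishes {speedup:.1f}x faster")         # 2.5x
print(f"Saved job duration is about {years_saved:.2f} years")  # about 2.17, quoted as 2.2
```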

The following graph shows the elapsed duration of one of the critical Monday morning workloads on the old data warehouse and on Amazon Redshift.

Critical Monday Morning Workload elapsed duration on old-data warehouse as well as Amazon Redshift

Operational stability

Amazon Redshift has proven to be significantly more stable and reliable, successfully meeting the key objective of reducing operational risk.

  • Report Timeouts: Report timeouts, a primary concern, have been virtually eliminated.
  • Critical Business Period Performance: Redshift performed exceptionally well during the high-stress Cyber Week 2024. This is a stark contrast to the old system, which suffered critical, financially impactful failures during the same period in 2022 and 2023.
  • Data Loading: For data producers, the consistency of data loading is critical, as delays can hold up numerous reports and cause direct business impact. The system relies on an “ETL Ready” event, which triggers report processing only after all required datasets have been loaded. Since the migration to Redshift, the timing of this event has become significantly more consistent, improving the reliability of the entire data pipeline.

The following diagram shows the consistency of the ETL Ready event after migrating to Amazon Redshift.

ETL Ready Event Execution times

End user experience

The reduction in total execution time of Monday morning loads has dramatically improved end-user productivity. This is the time needed to process the full batch of scheduled reports (peak load), which translates directly into wait times for end users, because this is when most users need their weekly reports. The following graphs show typical Mondays before and after the switch, and how Amazon Redshift handles the MSTR queue to provide a much better end user experience.

MSTR queue on 28/10/2024 (before switch)

MSTR queue on 02/12/25 (after switch)

Learnings and unexpected challenges

Navigating automatic optimization in a multi-warehouse architecture

One of the most significant challenges Zalando encountered during migration involves Redshift’s multi-warehouse architecture and its interaction with automatic table maintenance. The Redshift architecture is designed for workload isolation: a central producer warehouse for data loading, and multiple consumer warehouses for analytical queries. Data and related objects reside only on the producer and are shared through Redshift data shares.

The core issue: Redshift’s Automatic Table Optimization (ATO) operates exclusively on the producer warehouse. The same applies to other performance features such as Automated Materialized Views and automatic query rewriting. As a result, these optimization processes were unaware of query patterns and workloads on consumer warehouses. For instance, MicroStrategy reports running heavy analytical queries on the consumer side were outside the scope of these automated features. This led to suboptimal data models and significant performance impacts, particularly for tables with AUTO-set distribution and sort keys.

To address this, a two-pronged approach was implemented:

1. Collaborative manual tuning: Zalando worked closely with the AWS Database Engineering team, who provided holistic performance checks and tailored recommendations for distribution and sort keys across all warehouses.

2. Scheduled table maintenance: Zalando implemented a daily VACUUM process for tables with over 5% unsorted data, ensuring data organization and query performance.
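The scheduled maintenance step could look roughly like the sketch below. It assumes rows shaped like the `schema`, `table`, and `unsorted` columns of Redshift’s `SVV_TABLE_INFO` system view have already been fetched. The function name, the sample rows, and the choice of `VACUUM FULL` are assumptions; the post does not say which VACUUM variant Zalando runs.

```python
def vacuum_statements(table_info: list[dict], threshold_pct: float = 5.0) -> list[str]:
    """Build VACUUM statements for tables whose unsorted share exceeds the threshold.

    Each row mirrors SVV_TABLE_INFO: 'schema', 'table', and 'unsorted' (percent,
    which can be None for tables without a sort key).
    """
    candidates = [
        row for row in table_info
        if row.get("unsorted") is not None and row["unsorted"] > threshold_pct
    ]
    # Handle the most unsorted tables first.
    candidates.sort(key=lambda row: row["unsorted"], reverse=True)
    statements = []
    for row in candidates:
        schema, table = row["schema"], row["table"]
        statements.append(f'VACUUM FULL "{schema}"."{table}";')
    return statements


# Made-up rows for illustration.
rows = [
    {"schema": "curated", "table": "orders", "unsorted": 12.5},
    {"schema": "curated", "table": "customers", "unsorted": 1.0},
    {"schema": "curated", "table": "events", "unsorted": 40.0},
    {"schema": "curated", "table": "lookup", "unsorted": None},
]
for stmt in vacuum_statements(rows):
    print(stmt)
```

A daily scheduler would run the generated statements against the producer warehouse, where the data physically resides.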

Additionally, the following data distribution strategy was implemented:

  1. KEY Distribution: Explicitly defined DISTKEY for tables with clear JOIN conditions.
  2. EVEN Distribution: Used for large fact tables without clear join keys.
  3. ALL Distribution: Applied to smaller dimension tables (under 4 million rows).
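The three rules above can be expressed as a small decision helper. This is only a sketch: the post does not state how the rules are prioritized, so the order used here (small tables first, then join key, then EVEN) is an assumption, as are the function and constant names.

```python
from typing import Optional

# Threshold quoted in the post for "smaller dimension tables".
SMALL_TABLE_ROWS = 4_000_000


def choose_distribution(row_count: int, join_key: Optional[str]) -> str:
    """Return a CREATE TABLE distribution clause following the three rules."""
    if row_count < SMALL_TABLE_ROWS:
        # Small tables are copied to every node, so joins never need redistribution.
        return "DISTSTYLE ALL"
    if join_key:
        # Co-locate rows that share the join column.
        return f"DISTSTYLE KEY DISTKEY ({join_key})"
    # Large table with no dominant join column: spread rows evenly.
    return "DISTSTYLE EVEN"


print(choose_distribution(250_000, "country_id"))
print(choose_distribution(900_000_000, "order_id"))
print(choose_distribution(900_000_000, None))
```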

This proactive approach has given Zalando better control over cluster performance and mitigated data skew issues. Zalando is encouraged that AWS is working to include cross-cluster workload awareness in a future Redshift release, which should further optimize the multi-warehouse setup.

CTEs and execution plans

Common Table Expressions (CTEs) are a powerful tool for structuring complex queries by breaking them down into logical, readable steps. Analysis of query performance identified optimization opportunities in CTE usage patterns.

Performance monitoring revealed that Redshift’s query engine would sometimes recompute the logic for a nested or repeatedly referenced CTE from scratch each time it was called within the same SQL statement, instead of writing the CTE’s result to an in-memory temporary table for reuse.

Two strategies proved effective in addressing this challenge:

  • Convert to a materialized view: CTEs used frequently across multiple queries or with particularly complex logic were converted into materialized views (MVs). This precomputes the result, making the data readily available without re-running the underlying logic.
  • Use explicit temporary tables: For CTEs used multiple times within a single, complex query, the CTE’s result was explicitly written into a temporary table at the beginning of the transaction. For example, within MicroStrategy, the “intermediate table type” setting was changed from the default CTE to “Temporary table.”

Implementing either materialized views or temporary tables ensures the complex logic is computed only once. This approach eliminated the recomputation issue and significantly improved the performance of multi-layered SQL queries.
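The temporary table rewrite can be illustrated with a small, runnable sketch. SQLite (through Python’s built-in `sqlite3` module) is used here purely as a stand-in so the example runs anywhere; it says nothing about how Redshift plans either query, and the table and column names are made up. The point is only the shape of the rewrite: a CTE referenced twice becomes one `CREATE TEMP TABLE ... AS` followed by a query that reads the stored result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id INTEGER, amount REAL);
INSERT INTO orders VALUES (1, 10.0), (1, 30.0), (2, 5.0);
""")

# Original shape: the CTE `totals` is referenced twice in one statement.
cte_query = """
WITH totals AS (
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
)
SELECT t.customer_id, t.total
FROM totals t
WHERE t.total = (SELECT MAX(total) FROM totals)
"""

# Rewritten shape: compute the aggregate once into a temporary table...
conn.execute("""
CREATE TEMP TABLE totals_tmp AS
SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id
""")

# ...then reference the stored result as many times as needed.
temp_query = """
SELECT t.customer_id, t.total
FROM totals_tmp t
WHERE t.total = (SELECT MAX(total) FROM totals_tmp)
"""

cte_rows = conn.execute(cte_query).fetchall()
temp_rows = conn.execute(temp_query).fetchall()
print(cte_rows, temp_rows)  # both forms return the same rows
```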

Optimizing memory usage by right-sizing VARCHAR columns

It might seem like a minor detail, but defining the appropriate length for VARCHAR columns can have a surprising and significant impact on query performance. This was discovered firsthand while investigating the root cause of slow queries that were showing high amounts of disk spill.

The issue stemmed from the data loading API tool, which is responsible for syncing data from Delta Lake tables into Redshift. Because Delta Lake’s StringType data type doesn’t have a defined length, the tool defaulted to creating Redshift columns with a very high VARCHAR length (such as VARCHAR(16384)).

When a query is executed, the Redshift query engine allocates memory for in-transit data based on the column’s defined size, not the actual size of the data it contains. This meant that for a column containing strings of only 50 characters but defined as VARCHAR(16384), the engine would reserve a vastly oversized block of memory. This excessive memory allocation led directly to high disk spill, where intermediate query results overflowed from memory to disk, drastically slowing down execution.

To resolve this, a new process was implemented requiring data teams to explicitly define appropriate column lengths during object deployment. Analyzing the actual data and setting realistic VARCHAR sizes (such as VARCHAR(100) instead of VARCHAR(16384)) significantly improved memory utilization, reduced disk spill, and boosted overall query speed. This change underscores the importance of precision in data definition for an optimized Redshift environment.
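One way to arrive at a realistic length is to profile a sample of the actual values, as in the sketch below. It measures UTF-8 byte length because Redshift VARCHAR lengths are defined in bytes, then adds headroom. The function name, the 2x headroom factor, and the minimum of 16 are assumptions for illustration, not Zalando’s actual rule.

```python
import math

REDSHIFT_MAX_VARCHAR = 65535  # maximum VARCHAR length in bytes


def suggest_varchar_length(values, headroom: float = 2.0, minimum: int = 16) -> int:
    """Suggest a VARCHAR length from sampled values, measured in UTF-8 bytes."""
    longest = max(
        (len(v.encode("utf-8")) for v in values if v is not None),
        default=0,
    )
    suggested = max(minimum, math.ceil(longest * headroom))
    return min(suggested, REDSHIFT_MAX_VARCHAR)


sample = ["x" * 50, "Berlin", None, "München"]
length = suggest_varchar_length(sample)
print(f"VARCHAR({length}) instead of VARCHAR(16384)")
print(f"Declared width shrinks by a factor of {16384 / length:.0f}")
```

With a longest observed value of 50 bytes and 2x headroom, the suggestion lands on the VARCHAR(100) example from the post.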

Future outlook

Central to Zalando’s strategy is the shift to a serverless-based warehouse topology. This move enables automatic scaling to meet fluctuating analytical demands, from seasonal sales peaks to new team projects, all without manual intervention. The approach lets data teams focus entirely on generating insights that drive innovation, ensuring platform performance aligns with business growth.

As the platform scales, responsible management is paramount. The integration of AWS Lake Formation creates a centralized governance model for secure, fine-grained data access, enabling safe data democratization across the organization. At the same time, Zalando is embedding a strong FinOps culture by establishing unified cost management processes. This provides data owners with a comprehensive, 360-degree view of their costs across Redshift’s services, empowering them with actionable insights to optimize spending and align it with business value. Ultimately, the goal is to ensure every investment in Zalando’s data platform is maximized for business impact.

Conclusion

In this post, we showed how Zalando’s migration to Amazon Redshift has successfully transformed its data platform, making it a more data-driven fashion tech leader. This move has delivered significant improvements across key areas including enhanced performance, increased stability, reduced operational costs, and improved data consistency. Moving forward, a serverless-based architecture, centralized governance with AWS Lake Formation, and a strong FinOps culture will continue to drive innovation and maximize business impact.

If you’re interested in learning more about Amazon Redshift capabilities, we recommend watching the latest What’s new with Amazon Redshift session in the AWS Events channel to get an overview of the features recently added to the service. You can also explore the self-service, hands-on Amazon Redshift labs to experiment with key Amazon Redshift functionalities in a guided manner.

Contact your AWS account team to learn how we can help you modernize your data warehouse infrastructure.


About the authors

Srinivasan Molkuva

Srinivasan is an Engineering Manager at Zalando with over a decade and a half of expertise in the data domain. He currently leads the Fast Serving Layer team, having successfully managed the transition of critical systems that support the company’s entire reporting and analytical landscape.

Sabri Ömür Yıldırmaz

Ömür is a Senior Software Engineer at Zalando, based in Berlin, Germany. Passionate about solving complex challenges across backend applications and cloud infrastructure, he specializes in the end-to-end lifecycle of critical data platforms, driving architectural decisions to ensure robustness, high performance, scalability, and cost-efficiency.

Prasanna Sudhindrakumar

Prasanna is a Senior Software Engineer at Zalando, based in Berlin, Germany. Prasanna brings years of experience building scalable data pipelines and serverless applications on AWS, and is passionate about designing distributed systems with a strong focus on cost efficiency and performance, with a keen interest in solving complex architectural and platform-level challenges.

Paritosh Kumar Pramanick

Paritosh is a Senior Data Engineer at Zalando, based in Berlin, Germany. He has over a decade of experience spearheading data warehousing initiatives for multinational corporations. He is an expert in transitioning legacy systems to modern, cloud-native architectures, ensuring high performance, data integrity, and seamless integration across global business units.

Saman Irfan

Saman is a Senior Specialist Solutions Architect at Amazon Web Services, based in Berlin, Germany. Saman is passionate about helping organizations modernize their data architectures to drive innovation and business transformation.

Werner Gunter

Werner is a Principal Specialist Solutions Architect at Amazon Web Services, based in Berlin, Germany. As a seasoned data professional, he has helped large enterprises worldwide over the past two decades to modernize their data analytics estates.
