
Nexthink scales to trillions of events per day with Amazon MSK


Real-time data streaming and event processing present scalability and management challenges. AWS offers a broad selection of managed real-time data streaming services to effortlessly run these workloads at any scale.

In this post, Nexthink shares how Amazon Managed Streaming for Apache Kafka (Amazon MSK) empowered them to achieve massive scale in event processing. Experiencing business hyper-growth, Nexthink migrated to AWS to overcome the scaling limitations of on-premises solutions. With Amazon MSK, Nexthink now seamlessly processes trillions of events per day, reaching over 5 GB per second of aggregated throughput.

In the following sections, Nexthink introduces their product and the need for scalability. They then highlight the challenges of their legacy on-premises application and present their transition to a cloud-native software as a service (SaaS) architecture powered by Amazon MSK. Finally, Nexthink details the benefits achieved by adopting Amazon MSK.

Nexthink’s need to scale

Nexthink is the leader in digital employee experience (DEX). The company is shaping the future of work by providing IT leaders and C-levels with insights into employees’ daily technology experiences at the device and application level. This allows IT to evolve from reactive problem-solving to proactive optimization.

The Nexthink Infinity platform combines analytics, monitoring, automation, and more to manage the employee digital experience. By collecting device and application events, processing them in real time, and storing them, our platform analyzes data to solve problems and improve experiences for over 15 million employees across five continents.

In just 3 years, Nexthink’s business grew tenfold, and with the introduction of more real-time data our application had to scale from processing 200 MB per second to 5 GB per second and trillions of events daily. To enable this growth, we modernized our application from an on-premises single-tenant monolith to a cloud-based, scalable SaaS solution powered by Amazon MSK.

The following sections detail our modernization journey, including the challenges we faced and the benefits we realized with our new cloud-native, AWS-based architecture.

The on-premises solution and its challenges

Let’s first explore our previous on-premises solution, Nexthink V6, before examining how Amazon MSK addressed its challenges. The following diagram illustrates its architecture.

[Figure: Nexthink V6 architecture]

V6 was made up of two tightly coupled, monolithic, single-tenant applications written in Java and C++. The portal was a backend-for-frontend Java application, and the core engine was an in-house C++ in-memory database application that also handled device connections, data ingestion, aggregation, and querying. Because it bundled all these functions together, the engine became difficult to manage and improve.

V6 also lacked scalability. Although it was initially designed to support 10,000 devices, some new tenants had over 300,000 devices. We reacted by deploying multiple V6 engines per tenant, which increased complexity and cost, hampered user experience, and delayed time to market. This also led to longer proof of concept and onboarding cycles, which hurt the business.

Additionally, the absence of a streaming platform like Kafka created dependencies between teams through tight HTTP/gRPC coupling. Teams also couldn’t access real-time events before they were ingested into the database, which limited feature development. Finally, we lacked a data buffer, risking data loss during outages. Such constraints impeded innovation and increased risk.

In summary, although the V6 system served its initial purpose, reinventing it with cloud-native technologies became imperative to enhance scalability and reliability, and to foster innovation by our engineering and product teams.

Transitioning to a cloud-native architecture with Amazon MSK

To achieve our modernization goals, after thorough research and several iterations, we implemented an event-driven microservices design on Amazon Elastic Kubernetes Service (Amazon EKS), using Kafka on Amazon MSK for distributed event storage and streaming.

Our transition from the V6 on-premises solution to the cloud-native platform was phased over four iterations:

  • Phase 1 – We lifted and shifted from on premises to virtual machines in the cloud, reducing operational complexity and accelerating proof of concept cycles while transparently migrating customers.
  • Phase 2 – We extended the cloud architecture by implementing new product features with microservices and self-managed Kafka on Kubernetes. However, operating Kafka clusters ourselves proved overly difficult, leading us to Phase 3.
  • Phase 3 – We switched from self-managed Kafka to Amazon MSK, improving stability and reducing operational costs. We realized that managing Kafka wasn’t our core competency or differentiator, and the overhead was high. Amazon MSK enabled us to focus on our core application, freeing us from the burden of undifferentiated Kafka management.
  • Phase 4 – Finally, we eliminated all legacy components, completing the transition to a fully cloud-native SaaS platform. This multi-year journey of learning and transformation took 3 years.

Today, after our successful transition, we use Amazon MSK for two key functions:

  • Real-time data ingestion and processing of trillions of daily events from over 15 million devices worldwide, as illustrated in the following figure.

[Figure: Nexthink architecture, ingestion]

  • Enabling an event-driven system that decouples data producers and consumers, as depicted in the following figure.

[Figure: Nexthink architecture, event-driven]

To further enhance our scalability and resilience, we adopted a cell-based architecture, taking advantage of the wide availability of Amazon MSK across AWS Regions. We currently operate over 10 cells, each representing an independent regional deployment of our SaaS solution. This cell-based approach minimizes the blast radius in case of issues, addresses data residency requirements, and enables horizontal scaling across AWS Regions, as illustrated in the following figure.
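To make the cell idea concrete, here is a minimal Python sketch of how tenants could be placed into cells. It is illustrative only, not Nexthink’s implementation: the cell names, AWS Regions, the "EU"/"US" residency zones, and the least-loaded placement rule are all assumptions. A tenant is only eligible for cells inside its residency zone (data residency), new tenants spread across cells (horizontal scaling), and once placed a tenant stays put, so a problem in one cell only affects the tenants assigned to it.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One independent regional deployment (hypothetical names and regions)."""
    name: str
    aws_region: str
    residency_zone: str  # data-residency boundary the cell serves, e.g. "EU"


# Hypothetical cell inventory; the post only says there are over 10 cells.
CELLS = [
    Cell("cell-eu-1", "eu-central-1", "EU"),
    Cell("cell-eu-2", "eu-west-1", "EU"),
    Cell("cell-us-1", "us-east-1", "US"),
]


def assign_cell(tenant_id: str, residency_zone: str, assignments: dict[str, str]) -> Cell:
    """Place a tenant in the least-loaded cell of its residency zone.

    `assignments` maps tenant_id -> cell name and is updated in place.
    """
    by_name = {cell.name: cell for cell in CELLS}
    # Sticky placement: an existing tenant always stays in the same cell.
    if tenant_id in assignments:
        return by_name[assignments[tenant_id]]
    # Only cells inside the tenant's residency zone are eligible.
    candidates = [cell for cell in CELLS if cell.residency_zone == residency_zone]
    if not candidates:
        raise ValueError(f"no cell serves residency zone {residency_zone!r}")
    load = Counter(assignments.values())
    # Fewest tenants first; ties broken by name so placement is deterministic.
    chosen = min(candidates, key=lambda cell: (load[cell.name], cell.name))
    assignments[tenant_id] = chosen.name
    return chosen
```

With an empty `assignments` dict, a first EU tenant lands in `cell-eu-1`, a second EU tenant in `cell-eu-2`, and asking again for the first tenant returns `cell-eu-1`.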

[Figure: Nexthink architecture, cells]

Benefits of Amazon MSK

Amazon MSK has been crucial in enabling our event-driven design. In this section, we outline the main benefits we gained from adopting it.

Improved data resilience

In our new architecture, data from devices is pushed directly to Kafka topics in Amazon MSK, which provides high availability and resilience. This makes sure that events can be safely received and stored at any time. Our services consuming this data inherit the same resilience from Amazon MSK. If our backend ingestion services face disruptions, no event is lost, because Kafka retains all published messages for the configured retention period. When our services resume, they seamlessly continue processing from where they left off, thanks to the offsets Kafka commits for each consumer group. Combined with Kafka’s delivery semantics, this lets us process messages exactly-once, at-least-once, or at-most-once based on application needs.
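The resume-where-you-left-off behavior can be illustrated with a small in-memory model. This Python sketch is an assumption-laden toy, not the Kafka client API: a list stands in for a partition and a dict stands in for the committed offsets Kafka stores per consumer group. The consumer commits an offset only after its record is processed (at-least-once), so a crash mid-batch leaves the failed record uncommitted and a restarted consumer picks it up again.

```python
class Partition:
    """Append-only log: reading a record never removes it."""

    def __init__(self) -> None:
        self.records: list = []

    def append(self, value) -> int:
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record


class Consumer:
    """At-least-once consumer: commits an offset only after its record is processed."""

    def __init__(self, partition: Partition, committed: dict, group: str) -> None:
        self.partition = partition
        self.committed = committed  # group -> next offset to read (Kafka's convention)
        self.group = group

    def poll_and_process(self, handler, max_records: int = 100) -> None:
        start = self.committed.get(self.group, 0)
        end = min(start + max_records, len(self.partition.records))
        for offset in range(start, end):
            handler(self.partition.records[offset])  # if this raises, nothing is committed
            self.committed[self.group] = offset + 1
```

If the handler fails on the third record, the group’s committed offset stays at 2; a new `Consumer` for the same group then resumes at offset 2. Committing before processing instead would turn this into at-most-once.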

Amazon MSK allows us to tailor data retention to our specific requirements, ranging from seconds to unlimited. This flexibility guarantees uninterrupted data availability to our application, which wasn’t possible with our previous architecture. Additionally, to safeguard data integrity in the event of processing errors or corruption, Kafka enabled us to implement a data replay mechanism, ensuring data consistency and reliability.
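A replay mechanism of this kind typically rewinds a consumer group to a point in time and lets it reprocess from there. The following Python sketch models that in memory; it is not how Nexthink built it. In real Kafka, retention is the `retention.ms` topic setting (`-1` for unlimited), a consumer can look up offsets by timestamp with `offsetsForTimes`, and `kafka-consumer-groups.sh --reset-offsets --to-datetime` rewinds a group from the command line. Here `retention_s=None` stands in for unlimited retention, and timestamps are assumed to be non-decreasing.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TimedPartition:
    """Toy partition with timestamped records and time-based retention.

    retention_s=None models unlimited retention (retention.ms=-1 in Kafka).
    Timestamps are assumed to be non-decreasing.
    """
    retention_s: Optional[float]
    log_start_offset: int = 0  # offsets of deleted records are never reused
    timestamps: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def append(self, ts: float, value) -> int:
        self.timestamps.append(ts)
        self.values.append(value)
        return self.log_start_offset + len(self.values) - 1

    def enforce_retention(self, now: float) -> None:
        if self.retention_s is None:
            return
        # Drop every record older than the retention window.
        expired = bisect_left(self.timestamps, now - self.retention_s)
        del self.timestamps[:expired]
        del self.values[:expired]
        self.log_start_offset += expired

    def offset_for_time(self, ts: float) -> int:
        """Earliest retained offset whose timestamp is >= ts."""
        return self.log_start_offset + bisect_left(self.timestamps, ts)

    def read_from(self, offset: int) -> list:
        return self.values[max(offset - self.log_start_offset, 0):]


def replay_from(partition: TimedPartition, committed: dict, group: str, ts: float) -> list:
    """Rewind one consumer group to `ts` and return the records it will re-read."""
    committed[group] = partition.offset_for_time(ts)
    return partition.read_from(committed[group])
```

Only the rewound group is affected; other groups reading the same partition keep their own offsets, so a replay to repair one downstream service does not disturb the rest.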

Organizational scaling

By adopting an event-driven architecture with Amazon MSK, we decomposed our monolithic application into loosely coupled, stateless microservices communicating asynchronously through Kafka topics. This approach enabled our engineering organization to scale rapidly from just 4–5 teams in 2019 to over 40 teams and approximately 350 engineers today.

The loose coupling between event publishers and subscribers empowered teams to focus on distinct domains, such as data ingestion, identity services, and data lakes. Teams could develop features independently within their domains, communicating through Kafka topics without tight coupling. This architecture accelerated feature development by minimizing the risk of new features impacting existing ones. Teams could efficiently consume events published by others, delivering new capabilities more rapidly while reducing cross-team dependencies.
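The decoupling comes from Kafka consumer groups: every group reading a topic tracks its own position, so the publisher never needs to know who consumes its events. The following toy Python model (a single-partition topic, with hypothetical topic and group names) shows that an existing consumer is unaffected when a new domain starts reading the same topic. A new group starting from the oldest retained record mirrors the consumer setting `auto.offset.reset=earliest`; with `latest`, it would see only new events.

```python
from collections import defaultdict


class Topic:
    """Toy single-partition topic shared by any number of consumer groups."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: list = []
        # Each group keeps its own position; a new group starts at offset 0,
        # which mirrors auto.offset.reset=earliest.
        self.group_offsets: defaultdict = defaultdict(int)

    def publish(self, event) -> None:
        # The producer neither knows about nor waits for its consumers.
        self.records.append(event)

    def consume(self, group: str) -> list:
        start = self.group_offsets[group]
        batch = self.records[start:]
        self.group_offsets[group] = len(self.records)
        return batch
```

For example, an `alerting` group can keep consuming `device-events` incrementally while a `data-lake` group added later reads the full retained history, without either team changing the producer.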

The following figure illustrates the seamless workflow of adding new domains to our system.

[Figure: Adding domains]

Furthermore, the event-driven design allowed teams to build stateless services that could seamlessly auto scale based on MSK metrics like messages per second. This event-driven scalability eliminated the need for extensive capacity planning and manual scaling efforts, freeing up development time.
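As an illustration of metric-driven scaling, the following Python function turns an incoming message rate (for example, the MSK `MessagesInPerSec` CloudWatch metric) into a replica count for a stateless consumer service. The per-replica capacity and minimum replica count are hypothetical tuning inputs, and the post does not say which autoscaler Nexthink uses. One Kafka-specific rule is worth encoding: within a consumer group, each partition is read by at most one consumer, so replicas beyond the partition count would sit idle.

```python
import math


def desired_replicas(messages_in_per_sec: float, per_replica_capacity: float,
                     partitions: int, min_replicas: int = 1) -> int:
    """Replicas needed to keep up with the incoming rate.

    per_replica_capacity: messages/s one replica can sustain (hypothetical, measured per service).
    """
    if per_replica_capacity <= 0:
        raise ValueError("per_replica_capacity must be positive")
    needed = math.ceil(messages_in_per_sec / per_replica_capacity)
    # Consumers beyond the partition count get no partitions in a consumer group.
    return max(min_replicas, min(needed, partitions))
```

At 45,000 messages/s with 10,000 messages/s per replica and 12 partitions, this yields 5 replicas. If the rate outgrows the partition cap, adding partitions rather than replicas is the lever.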

By using an event-driven microservices architecture on Amazon MSK, we achieved organizational agility, enhanced scalability, and accelerated innovation while minimizing operational overhead.

Seamless infrastructure scaling

Nexthink’s business grew tenfold in 3 years, and many new capabilities were added to the product, leading to a substantial increase in traffic from 200 MB per second to 5 GB per second. This 25-fold data growth was enabled by the robust scalability of Amazon MSK. Achieving such scale with an on-premises solution would have been challenging and expensive, if not infeasible.

Attempting to self-manage Kafka imposed unnecessary operational overhead without providing business value. Running it at just 5% of today’s traffic was already complex and required two engineers. At today’s volumes, we estimated needing 6–10 dedicated staff, increasing costs and diverting resources away from core priorities.
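For a sense of scale, the figures above can be worked through directly. This assumes decimal units (1 GB = 1,000 MB, 1 TB = 1,000 GB) and treats 5 GB/s as sustained throughput over a full day, which the post does not state; "trillions" is taken at its lower bound of one trillion events per day.

```latex
\begin{aligned}
\text{traffic growth} &= \frac{5\,\text{GB/s}}{200\,\text{MB/s}} = \frac{5000\,\text{MB/s}}{200\,\text{MB/s}} = 25\times \\
\text{daily volume at 5 GB/s} &= 5\,\text{GB/s} \times 86\,400\,\text{s/day} = 432\,000\,\text{GB/day} \approx 432\,\text{TB/day} \\
\text{5\% of today's traffic} &= 0.05 \times 5000\,\text{MB/s} = 250\,\text{MB/s} \\
10^{12}\ \text{events/day} &= \frac{10^{12}}{86\,400}\ \text{events/s} \approx 1.16 \times 10^{7}\ \text{events/s}
\end{aligned}
```

The 5% point, about 250 MB/s, is close to the 200 MB/s starting point mentioned earlier, which is the scale at which self-managed Kafka already required two engineers.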

Real-time capabilities

By channeling all our data through Amazon MSK, we enabled real-time processing of events. This unlocked capabilities like real-time alerts, event-driven triggers, and webhooks that were previously unattainable. As such, Amazon MSK was instrumental in enabling our event-driven architecture and powering impactful innovations.

Secure data access

With our new architecture, we met our security and data integrity goals. With Kafka ACLs, we enforced strict access controls, allowing consumers and producers to interact only with authorized topics. We based these granular data access controls on criteria like data type, domain, and team.
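One way to express controls by domain, team, and data type is a topic naming convention combined with Kafka’s prefixed ACLs. Kafka ACLs can match a topic by its exact name (`LITERAL`) or by a name prefix (`PREFIXED`). The following Python sketch models that matching for allow rules only; real Kafka also supports deny rules and wildcards. The principals and the `<domain>.<topic>` naming scheme are hypothetical, not Nexthink’s actual layout.

```python
from dataclasses import dataclass
from enum import Enum


class PatternType(Enum):
    LITERAL = "LITERAL"    # matches exactly one topic name
    PREFIXED = "PREFIXED"  # matches every topic starting with the pattern


@dataclass(frozen=True)
class Acl:
    principal: str      # e.g. "User:ingestion" (hypothetical)
    operation: str      # "READ" or "WRITE"
    topic_pattern: str
    pattern_type: PatternType


def is_authorized(acls: list[Acl], principal: str, operation: str, topic: str) -> bool:
    """Default deny: access is granted only if some allow rule matches."""
    for acl in acls:
        if acl.principal != principal or acl.operation != operation:
            continue
        if acl.pattern_type is PatternType.LITERAL and acl.topic_pattern == topic:
            return True
        if acl.pattern_type is PatternType.PREFIXED and topic.startswith(acl.topic_pattern):
            return True
    return False
```

A single prefixed rule such as `WRITE` on `ingestion.` lets a team produce to every topic in its domain, while consumers from other teams get literal `READ` grants on the specific topics they need.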

To securely scale decentralized management of topics, we introduced proprietary Kubernetes Custom Resource Definitions (CRDs). These CRDs enabled teams to independently manage their own topics, settings, and ACLs without compromising security.
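Nexthink’s CRDs are proprietary, so the following Python sketch is purely hypothetical. It shows the kind of reconciliation a controller for a team-owned topic resource might perform. It validates that a team only claims topics under its own name prefix, then translates the spec into real Kafka topic settings (`cleanup.policy`, and `retention.ms`, where `-1` means unlimited) plus the ACLs to grant. The `TeamTopicSpec` fields and the prefix rule are assumptions, and applying the result to a cluster (for example, through the Kafka Admin API) is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TeamTopicSpec:
    """Hypothetical spec of a team-owned topic custom resource."""
    team: str
    name: str
    partitions: int
    retention_hours: Optional[int]  # None means unlimited retention
    compacted: bool = False
    readers: list = field(default_factory=list)  # principals granted READ


def reconcile(spec: TeamTopicSpec) -> dict:
    """Validate a spec and derive the topic settings and ACLs a controller would apply."""
    # Guardrail: teams can only own topics inside their own namespace.
    if not spec.name.startswith(f"{spec.team}."):
        raise PermissionError(f"team {spec.team!r} may only own topics prefixed '{spec.team}.'")
    if spec.partitions < 1:
        raise ValueError("partitions must be >= 1")
    config = {
        "cleanup.policy": "compact" if spec.compacted else "delete",
        "retention.ms": "-1" if spec.retention_hours is None
        else str(spec.retention_hours * 3_600_000),
    }
    # The owning team writes; explicitly listed principals may read.
    acls = [(f"User:{spec.team}", "WRITE", spec.name)]
    acls += [(reader, "READ", spec.name) for reader in spec.readers]
    return {"topic": spec.name, "partitions": spec.partitions, "config": config, "acls": acls}
```

Keeping validation in the controller, rather than granting teams direct cluster admin rights, is what lets self-service scale without weakening the ACL model.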

Amazon MSK encryption made sure that the data remained encrypted at rest and in transit. We also introduced a Bring Your Own Key (BYOK) option, allowing application-level encryption with customer keys for all single-tenant and multi-tenant topics.

Enhanced observability

Amazon MSK gave us great visibility into our data flows. The out-of-the-box Amazon CloudWatch metrics let us see the volume and types of data flowing through each topic and cluster. This helped us quantify the usage of our product features by tracking data volumes at the topic level. The Amazon MSK operational metrics enabled easy monitoring and right-sizing of clusters and brokers. Overall, the rich observability of Amazon MSK facilitated data-driven decisions about architecture and product features.
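Topic-level tracking usually means rolling per-broker samples up to per-topic totals. The following Python sketch assumes the cluster publishes per-topic, per-broker metrics such as `BytesInPerSec` (in Amazon MSK this requires the per-topic-per-broker monitoring level) and that the datapoints for one period have already been fetched from CloudWatch. Fetching them is out of scope here, and the tuple format is an assumption. Summing across brokers gives an approximate per-topic ingest rate and each topic’s share of total volume.

```python
from collections import defaultdict


def per_topic_bytes_in(samples) -> dict:
    """Sum (topic, broker_id, bytes_in_per_sec) samples for one period across brokers.

    Returns {topic: bytes/s}, ordered from highest to lowest volume.
    """
    totals: defaultdict = defaultdict(float)
    for topic, _broker_id, bytes_per_sec in samples:
        totals[topic] += bytes_per_sec
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


def volume_share(totals: dict) -> dict:
    """Fraction of total ingest attributable to each topic."""
    grand_total = sum(totals.values())
    return {topic: (v / grand_total if grand_total else 0.0) for topic, v in totals.items()}
```

If topics map to product features, as the post suggests, the share per topic doubles as a rough usage signal per feature.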

Conclusion

Nexthink’s journey from an on-premises monolith to a cloud SaaS was streamlined by using Amazon MSK, a fully managed Kafka service. Amazon MSK allowed us to scale seamlessly while benefiting from enterprise-grade reliability and security. By offloading Kafka management to AWS, we could stay focused on our core business and innovate faster.

Going forward, we plan to further improve performance, costs, and scalability by adopting Amazon MSK capabilities such as tiered storage and AWS Graviton-based EC2 instance types.

We’re also working closely with the Amazon MSK team to prepare for upcoming service features. Rapidly adopting new capabilities will help us remain at the forefront of innovation while continuing to grow our business.

To learn more about how Nexthink uses AWS to serve its global customer base, explore the Nexthink on AWS case study. Additionally, discover other customer success stories with Amazon MSK by visiting the Amazon MSK blog category.


About the Authors

Moe Haidar is a principal engineer and special projects lead in the CTO office of Nexthink. He has been involved with AWS since 2018 and is a key contributor to the cloud transformation of the Nexthink platform to AWS. His focus is on product and technology incubation and architecture, but he also loves hands-on work to keep his knowledge of technologies sharp and up to date. He still contributes heavily to the code base and loves to tackle complex problems.
Simone Pomata is a Senior Solutions Architect at AWS. He has worked enthusiastically in the tech industry for more than 10 years. At AWS, he helps customers succeed in building new technologies every day.
Magdalena Gargas is a Solutions Architect passionate about technology and solving customer challenges. At AWS, she works mostly with software companies, helping them innovate in the cloud. She participates in industry events, sharing insights and contributing to the advancement of the containerization field.
