Wednesday, May 6, 2026

Proactive monitoring for Amazon Redshift Serverless using AWS Lambda and Slack alerts


Performance issues in analytics environments often remain invisible until they disrupt dashboards, delay ETL jobs, or affect business decisions. For teams running Amazon Redshift Serverless, unmonitored query queues, long-running queries, or unexpected spikes in compute capacity can degrade performance and increase costs if left undetected.

Amazon Redshift Serverless streamlines running analytics at scale by removing the need to provision or manage infrastructure. However, even in a serverless environment, maintaining visibility into performance and usage is essential for efficient operation and predictable costs. While Amazon Redshift Serverless provides advanced built-in dashboards for monitoring performance metrics, delivering notifications directly to platforms like Slack brings another level of agility. Real-time alerts in the team’s workflow enable faster response times and more informed decision-making without requiring constant dashboard monitoring.

In this post, we show you how to build a serverless, low-cost monitoring solution for Amazon Redshift Serverless that proactively detects performance anomalies and sends actionable alerts directly to your chosen Slack channels. This approach helps your analytics team identify and address issues early, often before your users notice a problem.

The solution presented in this post uses AWS services to collect key performance metrics from Amazon Redshift Serverless, evaluate them against thresholds that you can flexibly configure, and notify you when anomalies are detected.

[Figure: scope of solution]

The workflow operates as follows:

  1. Scheduled execution – An Amazon EventBridge rule triggers an AWS Lambda function on a configurable schedule (by default, every 15 minutes during business hours).
  2. Metric collection – The AWS Lambda function gathers metrics including queued queries, running queries, compute capacity (RPUs), data storage usage, table count, database connections, and slow-running queries using Amazon CloudWatch and the Amazon Redshift Data API.
  3. Threshold evaluation – Collected metrics are compared against your predefined thresholds, which reflect acceptable performance and usage limits.
  4. Alerting – When a threshold is exceeded, the Lambda function publishes a notification to an Amazon SNS topic.
  5. Slack notification – Amazon Q Developer in chat applications (formerly AWS Chatbot) delivers the alert to your designated Slack channel.
  6. Observability – Lambda execution logs are stored in Amazon CloudWatch Logs for troubleshooting and auditing.
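The metric collection step can be sketched as follows. This is a minimal illustration, not the code shipped in the CloudFormation template: the CloudWatch namespace, the metric names, and the `Workgroup` dimension are assumptions that you should confirm against the metrics your workgroup actually publishes, and `build_metric_queries` is a hypothetical helper name.

```python
from typing import Dict, List

# Assumed values: confirm the namespace, metric names, and dimension name
# against the metrics visible for your workgroup in the CloudWatch console.
NAMESPACE = "AWS/Redshift-Serverless"
WORKGROUP_METRICS = [
    "QueriesQueued",
    "QueriesRunning",
    "ComputeCapacity",
    "DatabaseConnections",
]


def build_metric_queries(workgroup: str, period_seconds: int = 300) -> List[Dict]:
    """Build the MetricDataQueries list for a CloudWatch GetMetricData call."""
    queries = []
    for index, metric_name in enumerate(WORKGROUP_METRICS):
        queries.append({
            # Query IDs must start with a lowercase letter.
            "Id": f"m{index}",
            "MetricStat": {
                "Metric": {
                    "Namespace": NAMESPACE,
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "Workgroup", "Value": workgroup}],
                },
                "Period": period_seconds,
                # Use the peak value in each period so short spikes are not averaged away.
                "Stat": "Maximum",
            },
            "ReturnData": True,
        })
    return queries
```

The resulting list can be passed as the `MetricDataQueries` argument of `get_metric_data` on a boto3 CloudWatch client, together with `StartTime` and `EndTime`. Batching all metrics into one request keeps the number of API calls per invocation low.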

This architecture is fully serverless and requires no changes to your existing Amazon Redshift Serverless workloads. To simplify deployment, we provide an AWS CloudFormation template that provisions all required resources.

Prerequisites

Before deploying this solution, you must collect information about the existing Amazon Redshift Serverless workgroup and namespace that you want to monitor. To identify your Amazon Redshift Serverless resources:

  1. Open the Amazon Redshift console.
  2. In the navigation pane, choose Serverless dashboard.
  3. Note down your workgroup and namespace names. You’ll use these values when launching this post’s AWS CloudFormation template.

Deploy the solution

You can launch the CloudFormation stack and deploy the solution through the provided link.

GitHub Repo

When launching the CloudFormation stack, complete the following steps in the AWS CloudFormation console:

  1. For Stack name, enter a descriptive name such as redshift-serverless-monitoring.
  2. Review and modify the parameters as needed for your environment.
  3. Acknowledge that AWS CloudFormation might create IAM resources with custom names.
  4. Choose Submit.

CloudFormation parameters

Amazon Redshift Serverless workgroup configuration

Provide details for your existing Amazon Redshift Serverless environment. These values connect the monitoring solution to your Redshift environment. Some parameters come with default values that you can replace with your actual configuration.

| Parameter | Default value | Description |
| --- | --- | --- |
| Amazon Redshift Workgroup Name | (none) | Your Amazon Redshift Serverless workgroup name. |
| Amazon Redshift Namespace Name | (none) | Your Amazon Redshift Serverless namespace name. |
| Amazon Redshift Workgroup ID | (none) | Workgroup ID (UUID) of the Amazon Redshift Serverless workgroup to monitor. Must follow the UUID format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (lowercase hexadecimal with hyphens). |
| Amazon Redshift Namespace ID | (none) | Namespace ID (UUID) of the Amazon Redshift Serverless namespace. Must follow the UUID format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (lowercase hexadecimal with hyphens). |
| Database Name | dev | Target Amazon Redshift database for SQL-based diagnostic and monitoring queries. |
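Before launching the stack, you may want to check that the workgroup and namespace IDs match the required UUID format. The following is a small sketch of such a check; `is_valid_resource_id` is a hypothetical helper and is not part of the template.

```python
import re

# Lowercase hexadecimal groups of 8-4-4-4-12 characters separated by hyphens.
UUID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def is_valid_resource_id(value: str) -> bool:
    """Return True if value matches the lowercase UUID format the template expects."""
    return UUID_PATTERN.fullmatch(value) is not None
```

Uppercase hexadecimal digits are rejected on purpose, because the parameter description calls for lowercase values.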

Monitoring schedule

The default schedule runs diagnostic SQL queries every 15 minutes during business hours, balancing responsiveness and cost efficiency. Running more frequently might increase costs, while less frequent monitoring could delay detection of performance issues. You can adjust this schedule to your actual needs.

| Parameter | Default value | Description |
| --- | --- | --- |
| Schedule Expression | cron(0/15 8-17 ? * MON-FRI *) | EventBridge schedule expression for Lambda function execution. Default runs every 15 minutes, Monday through Friday, 8 AM to 5 PM UTC. |
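If you adjust the schedule, it can help to generate the expression rather than edit it by hand. The sketch below builds an EventBridge cron expression in the same shape as the default; `build_schedule_expression` and its parameters are hypothetical names used only for illustration.

```python
def build_schedule_expression(
    interval_minutes: int = 15,
    start_hour: int = 8,
    end_hour: int = 17,
    days: str = "MON-FRI",
) -> str:
    """Build an EventBridge cron expression that fires every N minutes in an hour range (UTC)."""
    if not 1 <= interval_minutes <= 59:
        raise ValueError("interval_minutes must be between 1 and 59")
    if not 0 <= start_hour <= end_hour <= 23:
        raise ValueError("hours must satisfy 0 <= start_hour <= end_hour <= 23")
    # EventBridge cron fields: minutes hours day-of-month month day-of-week year.
    # Day-of-month is "?" because day-of-week is specified.
    return f"cron(0/{interval_minutes} {start_hour}-{end_hour} ? * {days} *)"
```

Calling `build_schedule_expression()` with no arguments reproduces the default value shown in the table, and `build_schedule_expression(interval_minutes=30)` halves the invocation frequency.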

Threshold configuration

Thresholds should be tuned based on your workload characteristics.

| Parameter | Default value | Description |
| --- | --- | --- |
| Queries Queued Threshold | 20 | Alert threshold for queued queries. |
| Queries Running Threshold | 20 | Alert threshold for running queries. |
| Compute Capacity Threshold (RPUs) | 64 | Alert threshold for compute capacity (RPUs). |
| Data Storage Threshold (MB) | 5242880 | Alert threshold for data storage in MB (default 5 TB). |
| Table Count Threshold | 1000 | Alert threshold for total table count. |
| Database Connections Threshold | 50 | Alert threshold for database connections. |
| Slow Query Threshold (seconds) | 10 | Threshold in seconds for slow query detection. |
| Query Timeout (seconds) | 30 | Timeout for SQL diagnostic queries. |

Tip: Start with conservative thresholds and refine them after observing baseline behavior for one to two weeks.
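The threshold evaluation step can be illustrated with a short sketch. It assumes metrics arrive as a dictionary keyed by hypothetical names and that an alert fires only when a value is strictly greater than its threshold; the deployed Lambda function may compare differently. The default values are taken from the table above.

```python
from typing import Dict, List

# Defaults from the parameter table; the key names are hypothetical.
DEFAULT_THRESHOLDS: Dict[str, float] = {
    "queries_queued": 20,
    "queries_running": 20,
    "compute_capacity_rpus": 64,
    "data_storage_mb": 5_242_880,  # 5 TB expressed in MB
    "table_count": 1000,
    "database_connections": 50,
}


def find_breaches(
    metrics: Dict[str, float],
    thresholds: Dict[str, float] = DEFAULT_THRESHOLDS,
) -> List[str]:
    """Return one human-readable message per metric that exceeds its threshold."""
    breaches = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        # Metrics missing from this collection cycle are skipped, not treated as zero.
        if value is not None and value > limit:
            breaches.append(f"{name}: {value} exceeds threshold {limit}")
    return breaches
```

For example, `find_breaches({"queries_queued": 25, "queries_running": 3})` returns a single message for the queued queries, while a value exactly equal to the threshold produces no alert under this assumption.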

Lambda configuration

Configure the AWS Lambda function settings. The default values are appropriate for most monitoring scenarios. You might need to change them only when troubleshooting.

| Parameter | Default value | Description |
| --- | --- | --- |
| Lambda Memory Size (MB) | 256 | Lambda function memory size in MB. |
| Lambda Timeout (seconds) | 240 | Lambda function timeout in seconds. |

Security configuration – Amazon Virtual Private Cloud (VPC)

If your organization has network isolation requirements, you can optionally enable VPC deployment for the Lambda function. When enabled, the Lambda function runs inside your specified VPC subnets, providing network isolation and allowing access to VPC-only resources.

| Parameter | Default value | Description |
| --- | --- | --- |
| VPC ID | (none) | VPC ID for Lambda deployment (required if EnableVPC is true). The Lambda function will be deployed in this VPC. Make sure that the VPC has appropriate routing (NAT Gateway or VPC endpoints) to allow Lambda to access AWS services like CloudWatch, Amazon Redshift, and Amazon SNS. |
| VPC Subnet IDs | (none) | Comma-separated list of subnet IDs for Lambda deployment (required if EnableVPC is true). |
| Security Group IDs | (none) | Comma-separated list of security group IDs for Lambda (optional). If not provided and EnableVPC is true, a default security group will be created with outbound HTTPS access. Custom security groups must allow outbound HTTPS (port 443) to AWS service endpoints. |

Note that VPC deployment might increase cold start times and requires a NAT Gateway or VPC endpoints for AWS service access. We recommend provisioning interface VPC endpoints (through AWS PrivateLink) for the five services the Lambda function calls, which keeps all traffic private without the recurring cost of a NAT Gateway.
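As a planning aid, the following sketch lists the interface endpoint service names you would likely need for the five services named in the Troubleshooting section (CloudWatch, CloudWatch Logs, Amazon SNS, the Amazon Redshift Data API, and Amazon Redshift Serverless). The service name suffixes are assumptions based on the usual `com.amazonaws.<region>.<service>` pattern; verify each one against the endpoint services available in your Region before creating endpoints.

```python
from typing import Dict

# Assumed endpoint suffixes; verify them in the VPC console for your Region.
ENDPOINT_SUFFIXES: Dict[str, str] = {
    "CloudWatch": "monitoring",
    "CloudWatch Logs": "logs",
    "Amazon SNS": "sns",
    "Amazon Redshift Data API": "redshift-data",
    "Amazon Redshift Serverless": "redshift-serverless",
}


def endpoint_service_names(region: str) -> Dict[str, str]:
    """Map each service the Lambda function calls to its interface endpoint service name."""
    return {
        service: f"com.amazonaws.{region}.{suffix}"
        for service, suffix in ENDPOINT_SUFFIXES.items()
    }
```

Each endpoint also needs a security group that accepts HTTPS (port 443) from the Lambda function’s security group.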

Security configuration – Encryption

If your organization requires encryption of data at rest, you can optionally enable AWS Key Management Service (AWS KMS) encryption for the Lambda function’s environment variables, CloudWatch Logs, and SNS topic. When enabled, the template encrypts each resource using the AWS KMS keys that you provide: either a single shared key for all three services, or individual keys for granular key management and audit separation.

| Parameter | Default value | Description |
| --- | --- | --- |
| Shared KMS Key ARN | (none) | AWS KMS key ARN to use for all encryption (Lambda, Logs, and SNS) unless service-specific keys are provided. This streamlines key management by using a single key for all services. The key policy must grant encrypt/decrypt permissions to Lambda, CloudWatch Logs, and SNS. |
| Lambda KMS Key ARN | (none) | AWS KMS key ARN for Lambda environment variable encryption (optional, overrides SharedKMSKeyArn). Use this for separate key management per service. The key policy must grant decrypt permissions to the Lambda execution role. If not provided, SharedKMSKeyArn will be used when EnableKMSEncryption is true. |
| CloudWatch Logs KMS Key ARN | (none) | AWS KMS key ARN for CloudWatch Logs encryption (optional, overrides SharedKMSKeyArn). Use this for separate key management per service. The key policy must grant encrypt/decrypt permissions to the CloudWatch Logs service. If not provided, SharedKMSKeyArn will be used when EnableKMSEncryption is true. |
| SNS Topic KMS Key ARN | (none) | AWS KMS key ARN for SNS topic encryption (optional, overrides SharedKMSKeyArn). Use this for separate key management per service. The key policy must grant encrypt/decrypt permissions to the SNS service and the Lambda execution role. If not provided, SharedKMSKeyArn will be used when EnableKMSEncryption is true. |
| Enable Dead Letter Queue | False | Optionally enable a dead-letter queue (DLQ) for failed Lambda invocations to improve reliability and security monitoring. When enabled, events that fail after all retry attempts are sent to an SQS queue for investigation and potential replay. This helps prevent data loss, provides visibility into failures, and enables security audit trails for monitoring anomalies. The DLQ retains messages for 14 days. |

Note that AWS KMS encryption requires the key policy to grant appropriate permissions to each consuming service (Lambda, CloudWatch Logs, and SNS).
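The key selection rule described in the table (a service-specific key overrides the shared key, and nothing is encrypted with a customer key unless EnableKMSEncryption is true) can be expressed as a short sketch. The function name and signature are hypothetical; the template itself implements this logic in CloudFormation, not Python.

```python
from typing import Optional


def resolve_kms_key(
    enable_encryption: bool,
    shared_key_arn: Optional[str],
    service_key_arn: Optional[str] = None,
) -> Optional[str]:
    """Return the KMS key ARN a service should use, or None if no customer key applies."""
    if not enable_encryption:
        return None
    # A service-specific key, when provided, takes precedence over the shared key.
    return service_key_arn or shared_key_arn
```

Applying the same rule to Lambda, CloudWatch Logs, and SNS lets you start with one shared key and later split out individual keys without changing the other services.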

  1. On the review page, select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  2. Choose Submit.

Resources created

The CloudFormation stack creates the following resources:

  • EventBridge rule for scheduled execution
  • AWS Lambda function (Python 3.12 runtime)
  • Amazon SNS topic for alerts
  • IAM role with permissions for CloudWatch, Amazon Redshift Data API, and SNS
  • CloudWatch log group for Lambda logs

Note: CloudFormation deployment typically takes 10–15 minutes to complete. You can monitor progress in real time on the Events tab of your CloudFormation stack.

Post-deployment configuration

After the CloudFormation stack has been successfully created, complete the following steps.

Step 1: Record CloudFormation outputs

  1. Navigate to the AWS CloudFormation console.
  2. Select your stack and choose the Outputs tab.
  3. Note the values for LambdaRoleArn and SNSTopicArn. You will need these in subsequent steps.

Step 2: Grant Amazon Redshift permissions

Grant the Lambda function permission to query Amazon Redshift system tables for monitoring data. Complete the following steps to grant the required access:

  1. Navigate to the Amazon Redshift console.
  2. In the left navigation pane, choose Query Editor V2.
  3. Connect to your Amazon Redshift Serverless workgroup.
  4. Run the following SQL commands, replacing <IAM Role Name> with the name of the Lambda IAM role, that is, the final segment of the LambdaRoleArn value from your CloudFormation outputs:
CREATE USER "IAMR:<IAM Role Name>" WITH PASSWORD DISABLE;

GRANT ROLE "sys:monitor" TO "IAMR:<IAM Role Name>";


These commands create an Amazon Redshift user associated with the Lambda IAM role and grant it the sys:monitor Amazon Redshift role. This role provides read-only access to catalog and system tables without granting permissions to user data tables.
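To avoid typing the statements by hand, you could generate them from the LambdaRoleArn output. The sketch below assumes that the database user for an IAM role is named `IAMR:` followed by the role name, and that any role path in the ARN is dropped; both helper functions are hypothetical and are not part of the template.

```python
from typing import List


def database_user_for_role(role_arn: str) -> str:
    """Derive the Amazon Redshift database user name for an IAM role ARN."""
    _, separator, resource = role_arn.partition(":role/")
    if not separator or not resource:
        raise ValueError(f"not an IAM role ARN: {role_arn}")
    # Assumption: only the role name (the final path segment) is used.
    role_name = resource.split("/")[-1]
    return f"IAMR:{role_name}"


def grant_statements(role_arn: str) -> List[str]:
    """Return the CREATE USER and GRANT ROLE statements for the Lambda role."""
    user = database_user_for_role(role_arn)
    return [
        f'CREATE USER "{user}" WITH PASSWORD DISABLE;',
        f'GRANT ROLE "sys:monitor" TO "{user}";',
    ]
```

Paste the two generated statements into Query Editor V2 and run them while connected to the workgroup you want to monitor.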

Step 3: Configure Slack notifications

Amazon Q Developer in chat applications provides native AWS integration and managed authentication, removing custom webhook code and reducing setup complexity. To receive alerts in Slack, configure Amazon Q Developer in chat applications to connect your SNS topic to your preferred Slack channel:

  1. Navigate to Amazon Q Developer in chat applications (formerly AWS Chatbot) in the AWS console.
  2. Follow the instructions in the Slack integration documentation to authorize AWS access to your Slack workspace.
  3. When configuring the Slack channel, make sure that you select the AWS Region where you deployed the CloudFormation stack.
  4. In the Notifications section, select the SNS topic created by your CloudFormation stack (refer to the SNSTopicArn output value).
  5. Keep the default IAM read-only permissions for the channel configuration.

[Figure: SNS topic]

Once configured, alerts automatically appear in Slack whenever thresholds are exceeded.

[Figure: result upon success]
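The post does not show the message format the Lambda function publishes. One plausible approach, sketched below, is the custom notification schema accepted by Amazon Q Developer in chat applications, in which the SNS message body is a JSON document with `version`, `source`, and `content` fields. Treat the field set as an assumption to verify against the current documentation; `build_chat_notification` is a hypothetical helper.

```python
import json
from typing import List


def build_chat_notification(title: str, breaches: List[str]) -> str:
    """Build a JSON message body in the assumed custom notification format."""
    payload = {
        "version": "1.0",
        "source": "custom",
        "content": {
            "textType": "client-markdown",
            "title": title,
            # One bullet per breached threshold keeps the Slack message scannable.
            "description": "\n".join(f"- {breach}" for breach in breaches),
        },
    }
    return json.dumps(payload)
```

The returned string would be passed as the `Message` argument of `publish` on a boto3 SNS client, with `TopicArn` set to the SNSTopicArn output of the stack.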

Cost considerations

With the default configuration, this solution incurs minimal ongoing costs. The Lambda function runs approximately 693 times per month (every 15 minutes during an 8-hour business day, Monday through Friday), resulting in a monthly cost of approximately $0.33 USD. This includes Lambda compute costs ($0.26) and CloudWatch GetMetricData API calls ($0.07). All other services (EventBridge, SNS, CloudWatch Logs, and the Amazon Redshift Data API) are not expected to add to this estimate. The Amazon Redshift Data API has no additional cost beyond the minimal Amazon Redshift Serverless RPU consumption for running the system table queries. You can reduce costs by lowering the monitoring frequency (for example, every 30 minutes) or improve responsiveness by running more frequently (for example, every 5 minutes) with a proportional cost increase.

All costs are estimates and might vary based on your environment. Variations typically occur because queries that scan system tables might take longer or require additional resources depending on the system complexity.
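The invocation count and the effect of changing the interval can be reproduced with a short calculation. This sketch assumes about 21.67 weekdays per month (52 weeks × 5 days ÷ 12 months), which matches the 693 figure above, and scales the $0.33 estimate in proportion to invocation frequency; the function names are hypothetical.

```python
# Average number of weekdays in a month: 52 weeks * 5 days / 12 months.
WEEKDAYS_PER_MONTH = 52 * 5 / 12


def monthly_invocations(interval_minutes: int, hours_per_day: float = 8) -> float:
    """Estimate Lambda invocations per month for a weekday business-hours schedule."""
    runs_per_hour = 60 / interval_minutes
    return runs_per_hour * hours_per_day * WEEKDAYS_PER_MONTH


def estimated_monthly_cost(
    interval_minutes: int,
    baseline_cost: float = 0.33,
    baseline_interval: int = 15,
) -> float:
    """Scale the baseline monthly cost in proportion to invocation frequency."""
    return baseline_cost * baseline_interval / interval_minutes


if __name__ == "__main__":
    for minutes in (5, 15, 30):
        runs = round(monthly_invocations(minutes))
        cost = estimated_monthly_cost(minutes)
        print(f"every {minutes} min: ~{runs} runs, ~${cost:.2f} per month")
```

Under these assumptions, a 30-minute interval roughly halves both the invocation count and the cost, and a 5-minute interval roughly triples them.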

Security best practices

This solution implements the following security controls:

  • IAM policies scoped to specific resource ARNs for the Amazon Redshift workgroup, namespace, SNS topic, and log group.
  • Data API statement access restricted to the Lambda function’s own IAM user ID.
  • Read-only sys:monitor database role for operational metadata access, granted only to the role created by the CloudFormation template.
  • Reserved concurrent executions capped at 5.

To further strengthen your security posture, consider the following enhancements:

  • Enable EnableKMSEncryption to encrypt environment variables, logs, and SNS messages at rest.
  • Enable EnableVPC to deploy the function inside a VPC for network isolation.
  • Audit access through AWS CloudTrail.

Important: This is sample code for non-production usage. Work with your security and legal teams to meet your organizational security, regulatory, and compliance requirements before deployment. This solution demonstrates monitoring capabilities but requires additional security hardening for production environments, including encryption configuration, IAM policy scoping, VPC deployment, and comprehensive testing.

Clean up

To remove all resources and avoid ongoing costs when you no longer need the solution:

  1. Delete the CloudFormation stack.
  2. Remove the Slack integration from Amazon Q Developer in chat applications.

Troubleshooting

  • If no metrics or incomplete SQL diagnostics are returned, verify that the Amazon Redshift Serverless workgroup is active with recent query activity, and confirm in the query editor that the database user has the sys:monitor role (GRANT ROLE sys:monitor TO <user>). Without this role, queries run successfully but return only the data visible to that user’s permissions rather than the full cluster activity.
  • For VPC-deployed functions that fail to reach AWS services, confirm that VPC endpoints or a NAT Gateway are configured for CloudWatch, Amazon Redshift Data API, Amazon Redshift Serverless, SNS, and CloudWatch Logs.
  • If the Lambda function times out, increase the LambdaTimeout and QueryTimeoutSeconds parameters. The default timeout of 240 seconds accommodates most workloads, but clusters with many active queries might require additional time for SQL diagnostics to complete.
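To see how the query timeout interacts with the Lambda timeout, consider how a diagnostic query is typically awaited through the Amazon Redshift Data API: the caller polls `describe_statement` until the statement reaches a terminal status or a deadline passes. The sketch below illustrates that pattern under stated assumptions; it is not the code from the template, `wait_for_statement` is a hypothetical helper, and `client` is expected to be a boto3 `redshift-data` client or any object with a compatible `describe_statement` method.

```python
import time

# Statuses after which the Data API statement will not change again.
TERMINAL_STATES = {"FINISHED", "FAILED", "ABORTED"}


def wait_for_statement(
    client,
    statement_id: str,
    timeout_seconds: float = 30.0,
    poll_interval: float = 0.5,
) -> str:
    """Poll a Data API statement until it finishes or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.describe_statement(Id=statement_id)["Status"]
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"statement {statement_id} still {status} after {timeout_seconds}s"
            )
        time.sleep(poll_interval)
```

Because each diagnostic query can wait up to its own timeout, the Lambda timeout needs to exceed the sum of the query timeouts for all diagnostics that run sequentially in one invocation.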

Conclusion

In this post, we showed how you can build a proactive monitoring solution for Amazon Redshift Serverless using AWS Lambda, Amazon CloudWatch, and Amazon SNS with Slack integration. By automatically collecting metrics, evaluating thresholds, and delivering alerts in near real time to Slack or your preferred collaboration platform, this solution helps detect performance and cost issues early. Because the solution itself is serverless, it aligns with the operational simplicity goals of Amazon Redshift Serverless: scaling automatically, requiring minimal maintenance, and delivering high value at low cost. You can extend this foundation with additional metrics, diagnostic logic, or alternative notification channels to meet your organization’s needs.

To learn more, see the Amazon Redshift documentation on monitoring and performance optimization.


About the authors


Cristian Restrepo Lopez

Cristian is a Solutions Architect at AWS, helping customers build modern data applications with a focus on analytics. Outside of work, he enjoys exploring emerging technologies and connecting with the data community.

Satesh Sonti

Satesh is a Principal Analytics Specialist Solutions Architect based out of Atlanta, specializing in building enterprise data platforms, data warehousing, and analytics solutions. He has over 19 years of experience in building data assets and leading complex data platform programs for banking and insurance clients across the globe.
