Automation for the AWS platform - gain monitoring insight and automatically execute actions

01-Feb-2018 09:07 PM by Lakshmi Narayan J


As you probably know, Site24x7's AWS monitoring capabilities provide complete visibility into resource utilization and performance for key compute resources, storage, and database services powering your application in the Amazon Web Services (AWS) cloud. From here on out, you'll have the power to not only identify issues that might affect application performance, but also automatically invoke operational tasks across multiple AWS resources to resolve them quickly.

Before we look at the various predefined automations and the strategies for using them, we need to understand a bit more about the three main components (events, targets, and actions) that make up our IT automation framework.

Events: Events describe changes in your monitored resource. A number of scenarios can trigger an event: when the life cycle of an instance changes from running to stopped, when a metric exceeds the defined threshold, during manual resource termination, in the event of an EC2 status check failure, and more.
Targets (destination monitors): Targets define where the operational task needs to be executed. Currently, there are five targets supported: Amazon EC2, RDS, ElastiCache, Amazon SNS, and Lambda.
Actions (automation type): This defines the operational task that needs to be performed on the target. The task could be an action like stop, start, or reboot, or an action to invoke another AWS service.
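
To make the relationship between the three components concrete, here is a minimal Python sketch. It is only an illustration of the idea, not Site24x7's implementation: the class names, field names, event kinds, and the plan_actions helper are all hypothetical.

```python
from dataclasses import dataclass

# The five target services named above.
SUPPORTED_TARGETS = {"EC2", "RDS", "ElastiCache", "SNS", "Lambda"}


@dataclass(frozen=True)
class Event:
    """A change observed on a monitored resource (hypothetical shape)."""
    resource_id: str
    kind: str  # e.g. "status_check_failed" or "threshold_breach"


@dataclass(frozen=True)
class AutomationProfile:
    """Ties an event kind to an action on a target (hypothetical shape)."""
    trigger_kind: str
    target_service: str  # one of SUPPORTED_TARGETS
    target_id: str
    action: str  # e.g. "stop", "start", "reboot", "invoke"


def plan_actions(event, profiles):
    """Return (service, target_id, action) for every profile the event triggers."""
    return [
        (p.target_service, p.target_id, p.action)
        for p in profiles
        if p.trigger_kind == event.kind and p.target_service in SUPPORTED_TARGETS
    ]
```

In this sketch, an event only says what happened; the profile decides where (the target) and what (the action) should follow.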

Automation strategies

You can create an automation profile either as a part of your proactive monitoring strategy, where you create fail-safes like triggered reboots to mitigate system impairment, or as a part of your cost optimization strategy, where you identify underutilized resources and save money by stopping them.

Handle status check failures and prevent out-of-memory errors

You can choose to reboot your EC2 instance whenever Amazon detects a hardware or software issue. What makes this reboot action even more powerful is that you can tie it to a metric like memory usage (an agent-only metric that Site24x7 offers as a part of its enhanced EC2 monitoring capabilities) to detect a memory leak and act on it before the performance of your application begins to decline.
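
As a rough illustration of the kind of rule you might tie a reboot to, the sketch below flags an instance only when memory usage stays above a threshold for several consecutive polls, so that a single spike does not trigger a reboot. The function name, the 90% threshold, and the three-sample window are assumptions made for this example; they are not Site24x7 defaults.

```python
def should_reboot(memory_pct_samples, threshold_pct=90.0, consecutive=3):
    """Return True when the most recent `consecutive` samples all exceed the threshold.

    memory_pct_samples: memory usage percentages, oldest first.
    """
    if consecutive <= 0 or len(memory_pct_samples) < consecutive:
        return False  # not enough data to decide
    recent = memory_pct_samples[-consecutive:]
    return all(sample > threshold_pct for sample in recent)
```

For example, the samples 50, 91, 92, 95 would qualify under these assumed settings, while 91, 92, 80 would not, because the latest poll has dropped back below the threshold.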

Automatically stop underutilized EC2 and RDS instances to lower costs

With visibility into usage data, you can determine whether the compute and database resources configured to run your applications are aligned with real demand. To control self-provisioned cloud usage and optimize your environment, you can establish automation to monitor resource usage stats, detect underutilized or unused instances, and shut them down. Also, if you are an AWS Managed Service Partner leveraging Site24x7's MSP platform to monitor your customers' environments, you can assign these stop automation profiles to the monitored resources to help reduce instance hours and lower operational costs.

Use case:

If you are running batch computing jobs like media transcoding, your configured on-demand EC2 instances would only be running at full capacity for a specific period. In this case, you can set thresholds to keep an eye on metric data points including average CPU usage and network I/O, and assign an automation profile to automatically stop the EC2 instance whenever the metric data reaches the level you define. This way, underutilized instances won't sit idle and accrue hourly charges.
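
The batch-job scenario can be sketched in Python as follows. This is an illustration under stated assumptions rather than how Site24x7 performs the stop: the function names and the CPU and network limits are made up for the example, and ec2_client is assumed to be a boto3 EC2 client (or any object exposing the same stop_instances call).

```python
from statistics import mean


def is_underutilized(cpu_pct, network_bytes, cpu_limit=5.0, network_limit=1_000_000):
    """True when average CPU and average network I/O are both below their limits.

    cpu_pct and network_bytes are lists of recent data points.
    The default limits are illustrative only.
    """
    if not cpu_pct or not network_bytes:
        return False  # no data: do not stop anything
    return mean(cpu_pct) < cpu_limit and mean(network_bytes) < network_limit


def stop_if_idle(ec2_client, instance_id, cpu_pct, network_bytes):
    """Stop the instance when it looks idle; return whether a stop was requested."""
    if is_underutilized(cpu_pct, network_bytes):
        # stop_instances is the boto3 EC2 client call for stopping instances.
        ec2_client.stop_instances(InstanceIds=[instance_id])
        return True
    return False
```

Requiring both metrics to be low is a deliberate choice in this sketch: a transcoding job that is waiting on a large download may show low CPU but high network I/O, and should not be stopped.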

Configure and map automation to stop underutilized EC2 instances

Let the right people know

As you may know, Site24x7 already provides a number of methods to notify you of an outage. Notification options range from traditional notification channels like email, SMS, and chat applications to using Webhooks to trigger customized HTTP callbacks. With support for Amazon Simple Notification Service (SNS), you can also send a custom message to a previously created SNS topic (and, in turn, to all endpoints subscribed to that topic) for flexible alerting.
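
For readers who want to see what publishing to a topic involves, here is a small Python sketch. The message fields, the build_alert and notify helpers, and the JSON layout are assumptions for illustration; the actual message Site24x7 sends is whatever you configure in your automation profile. sns_client is assumed to be a boto3 SNS client.

```python
import json


def build_alert(monitor_name, status, reason):
    """Build a (subject, message) pair for an alert (hypothetical format)."""
    # SNS subjects are limited to 100 characters, so truncate defensively.
    subject = f"[{status}] {monitor_name}"[:100]
    message = json.dumps(
        {"monitor": monitor_name, "status": status, "reason": reason}
    )
    return subject, message


def notify(sns_client, topic_arn, monitor_name, status, reason):
    """Publish the alert to an existing SNS topic; subscribers receive it in turn."""
    subject, message = build_alert(monitor_name, status, reason)
    return sns_client.publish(TopicArn=topic_arn, Subject=subject, Message=message)
```

Because delivery fans out from the topic, adding a new recipient only requires a new subscription; nothing in the publishing code changes.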

Execute your workflow

If predefined actions aren't cutting it, you can author a Lambda function and automatically invoke it when a threshold has been met so you get the desired response. For example, you may be running RDS database instances in your development environment; to save costs you can author a Lambda function that creates snapshots and terminates instances. You can then create an action profile to invoke this function whenever the RDS performance metric "connections" drops below a specified value.
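
A Lambda function for the RDS example might look roughly like the sketch below. Treat it as a starting point under stated assumptions: the event key db_instance_id and the "-final" snapshot naming are hypothetical (the payload your function receives depends on how the invocation is configured), and the optional rds_client parameter exists only so the handler can be exercised without AWS credentials. The delete_db_instance call itself is the boto3 RDS client method, which takes a final snapshot before deleting when SkipFinalSnapshot is False.

```python
def lambda_handler(event, context, rds_client=None):
    """Snapshot and delete a development RDS instance named in the event."""
    if rds_client is None:
        import boto3  # available by default in the AWS Lambda Python runtime
        rds_client = boto3.client("rds")

    instance_id = event["db_instance_id"]  # hypothetical payload key
    # Snapshot identifiers must be unique; a fixed suffix is used here for brevity.
    snapshot_id = f"{instance_id}-final"

    # One call covers both steps: RDS creates the final snapshot, then deletes.
    rds_client.delete_db_instance(
        DBInstanceIdentifier=instance_id,
        SkipFinalSnapshot=False,
        FinalDBSnapshotIdentifier=snapshot_id,
    )
    return {"deleted": instance_id, "final_snapshot": snapshot_id}
```

Since deletion is irreversible, it is worth restricting a function like this to development instances, for example by checking a tag or a naming prefix before calling delete_db_instance.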

So what are you waiting for? Sign up for your free 30-day trial of Site24x7, set up automation, track and automatically respond to alert events, and unlock your operational potential!

Resources

For more information on AWS monitoring capabilities and automation, check out these links to our help documentation:
