The Key AWS Lambda Metrics to Monitor

Serverless programming has grown popular, thanks in large part to its usage-based pricing. AWS Lambda is one of the most powerful serverless computing services.

While it offers great benefits in terms of cost and scale, it also presents certain challenges in monitoring—general monitoring metrics such as CPU usage or memory don’t make sense in serverless architecture. This article will introduce the key metrics for monitoring performance and reliability in AWS Lambda.

What is AWS Lambda?

AWS Lambda is a serverless, event-driven computing service that lets us execute code without provisioning or managing server resources and execution environments. It lets us focus on more important things, such as business logic and code features, rather than on the infrastructure that runs them. Underneath, AWS Lambda runs our code on elastic, highly available infrastructure.

AWS Lambda can run code written in most of the popular backend languages and provides out-of-the-box execution environments for them. A Lambda function can be configured to run in response to various AWS events or to API calls made via AWS API Gateway. Functions can also be invoked manually through the console or the AWS CLI.

Lambda functions can also be scheduled to execute at regular intervals to perform tasks such as database cleanup. Lambda integrates with other AWS services such as ELB and S3. With AWS Lambda, you pay only for the compute time you use, which also makes it cost-effective.

Important metrics to monitor for AWS Lambda

One of the main advantages of Lambda is that it runs the code without developers having to provision any compute resources. While this is a great feature, it also means that users aren’t granted access to system-level data. At the same time, AWS Lambda, like other AWS services, also provides specific standard monitoring metrics that can be used to track the usage, performance, and reliability of an AWS Lambda function.

Invocations

The invocation metric is one of the default metrics automatically tracked by AWS. It measures the total number of times a Lambda function was executed, including both successful and failed executions.

This is a great metric to measure Lambda usage and forecast estimated costs for your project. In addition, it can help identify upstream problems in cases where a Lambda has zero invocations but is expected to have some amount of traffic.
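As a rough illustration of the zero-invocation check described above, the sketch below scans a series of per-period invocation counts for sustained gaps. It assumes the counts have already been fetched (for example, the per-minute sum of the Invocations metric). The function name `find_silent_periods` and the `min_consecutive` threshold are hypothetical and not part of any AWS API.

```python
from typing import List, Sequence


def find_silent_periods(invocation_sums: Sequence[float],
                        min_consecutive: int = 3) -> List[int]:
    """Return the start index of every run of at least `min_consecutive`
    periods with zero invocations (hypothetical helper)."""
    starts: List[int] = []
    run_start = None  # index where the current run of zeros began
    for i, count in enumerate(invocation_sums):
        if count == 0:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_consecutive:
                starts.append(run_start)
            run_start = None
    # Handle a run of zeros that lasts until the end of the series.
    if run_start is not None and len(invocation_sums) - run_start >= min_consecutive:
        starts.append(run_start)
    return starts
```

Requiring several consecutive empty periods is a design choice to avoid flagging functions that are simply quiet for a minute; the right value depends on the traffic a function normally receives.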

Duration

The duration metric measures the time (in milliseconds) it takes to complete the execution of a Lambda function. It records the time from invocation to conclusion, which makes it a useful performance measure, similar to the latency metric in a traditional application.

It is also reported as slowest (maximum), average, and fastest (minimum) duration, which gives more insight into how a function behaves. Duration has a direct impact on cost, and it can also help identify functions running close to their timeout values, which may start failing due to timeouts in the future.
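To make the timeout check concrete, here is a minimal sketch that compares the slowest observed duration against a function's configured timeout. The helper names and the 80% warning threshold are assumptions chosen for illustration. The duration is expected in milliseconds (as the metric reports it) and the timeout in seconds.

```python
def timeout_used(max_duration_ms: float, timeout_seconds: float) -> float:
    """Fraction of the configured timeout consumed by the slowest invocation."""
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    return max_duration_ms / (timeout_seconds * 1000.0)


def is_near_timeout(max_duration_ms: float, timeout_seconds: float,
                    threshold: float = 0.8) -> bool:
    """True when the slowest invocation used at least `threshold` of the timeout.

    The 0.8 default is an illustrative assumption, not an AWS recommendation.
    """
    return timeout_used(max_duration_ms, timeout_seconds) >= threshold
```

For example, a function with a 3-second timeout whose slowest invocation took 2,700 ms has used 90% of its budget and would be flagged by this check.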

IteratorAge

The iterator age metric is helpful for Lambda functions that read from a stream, such as Kinesis or DynamoDB Streams. These kinds of streams are used frequently in modern-day applications.

This metric tracks the age of the last record in the event. The age is defined as the length of time between the streaming service receiving a record and the Lambda function receiving the event for that record. High values for this metric indicate that events are being produced faster than Lambda can consume them. This occurs when the consuming Lambda function is slow to process records: the data stays in the stream longer, and the value of the iterator age metric rises.
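The "consumer is falling behind" condition can be sketched as a simple trend check over recent iterator age samples. This is only an illustration: the function name, the one-minute limit, and the choice to require a non-decreasing series are all assumptions, and the samples are assumed to be in milliseconds and ordered oldest first.

```python
from typing import Sequence


def is_falling_behind(iterator_age_ms: Sequence[float],
                      max_age_ms: float = 60_000) -> bool:
    """True when iterator age never decreased over the window and the
    latest sample is above `max_age_ms` (hypothetical threshold)."""
    if len(iterator_age_ms) < 2:
        return False  # not enough samples to judge a trend
    rising = all(later >= earlier
                 for earlier, later in zip(iterator_age_ms, iterator_age_ms[1:]))
    return rising and iterator_age_ms[-1] > max_age_ms
```

A single high sample followed by recovery is not flagged, which keeps short bursts from being treated as a sustained backlog.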

Errors

The error metric tracks the total number of Lambda invocations that result in function errors. A function error occurs when an exception is thrown either by the code Lambda is running or by the Lambda runtime itself. The Lambda runtime can throw exceptions for various reasons, including exceeded timeouts or configuration errors.

It also shows the success rate, defined as the percentage of executions that were successful in a given period. This is useful because the raw error count can mislead: 1,000 errors with a success rate of 99% won’t raise any alarms, but 1,000 errors with a success rate of 10% point to a serious issue. We can also use the error metric to calculate the error rate, defined as the number of errors divided by total invocations over a period.
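The error rate and success rate definitions above reduce to a small calculation. The sketch below assumes the error and invocation counts for the same period are already available; the function names are hypothetical.

```python
def error_rate(errors: int, invocations: int) -> float:
    """Errors divided by total invocations over the same period.

    Returns 0.0 when there were no invocations, to avoid dividing by zero.
    """
    if invocations <= 0:
        return 0.0
    return errors / invocations


def success_rate_percent(errors: int, invocations: int) -> float:
    """Percentage of invocations that completed without a function error."""
    return 100.0 * (1.0 - error_rate(errors, invocations))


if __name__ == "__main__":
    # The same 1,000 errors mean very different things at different volumes.
    for errors, invocations in [(1_000, 100_000), (1_000, 1_111)]:
        rate = success_rate_percent(errors, invocations)
        print(f"{errors} errors / {invocations} invocations: {rate:.1f}% success")
```

With 100,000 invocations, 1,000 errors correspond to a success rate of about 99%; with roughly 1,111 invocations, the same 1,000 errors correspond to a success rate of about 10%.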

ConcurrentExecutions

The ConcurrentExecutions metric tracks the sum of concurrent executions for all the Lambda functions of an account. AWS sets a default limit of 1,000 concurrent executions per region for an AWS account. We can also set function-level concurrency limits and query this metric for those Lambda functions.

Using this metric, we can set correct and judicious concurrency limits on individual Lambda functions and make sure that high concurrency on one Lambda function won’t affect the performance of others by using up the concurrency they need. If the project demands it, we can ask AWS to increase the limit for our account. We can also aggregate this metric over a particular time window to determine the scale of our Lambda functions and estimate costs.
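When choosing a limit, a common rule of thumb is that steady-state concurrency is roughly the request rate multiplied by the average duration. The sketch below applies that estimate and compares it with a limit. The function names are hypothetical, and the 1,000 default simply mirrors the default limit mentioned above.

```python
def estimated_concurrency(requests_per_second: float,
                          avg_duration_ms: float) -> float:
    """Approximate concurrent executions: request rate x average duration."""
    return requests_per_second * (avg_duration_ms / 1000.0)


def concurrency_headroom(requests_per_second: float, avg_duration_ms: float,
                         limit: int = 1000) -> float:
    """Concurrent executions still available before reaching `limit`.

    A negative result suggests requests would be throttled at this load.
    """
    return limit - estimated_concurrency(requests_per_second, avg_duration_ms)
```

For instance, 100 requests per second at an average duration of 500 ms needs about 50 concurrent executions. This is only an average-load estimate; bursts can exceed it, so the observed ConcurrentExecutions values remain the better guide.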

Throttles

This metric tracks the total number of invocation requests that were throttled. Lambda throttles requests when both of the following conditions occur:

There is no Lambda function instance available to process the request;
There is no available concurrency to further scale up the instances, as the concurrent execution limit has been exceeded.

In these cases, Lambda rejects additional requests with a TooManyRequestsException error. These requests are not counted in Invocations or Errors.

Monitoring this metric can help refine the concurrent execution limits for individual functions and for all the Lambda functions combined. As discussed above, the default limit is 1,000, but it can be increased on request.
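Because throttled requests are not counted in Invocations, a throttle rate has to add the two counts together to get the total number of requests. The sketch below shows that calculation under this assumption; the function name is hypothetical.

```python
def throttle_rate(throttles: int, invocations: int) -> float:
    """Share of all incoming requests that were throttled.

    Throttled requests are not included in `invocations`, so the total
    number of requests is the sum of both counts.
    """
    total_requests = throttles + invocations
    if total_requests <= 0:
        return 0.0
    return throttles / total_requests
```

For example, 50 throttles alongside 950 invocations means 5% of the 1,000 incoming requests were rejected, a sign that the concurrency limit may need to be raised.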

ProvisionedConcurrencyInvocations

Since Lambda is a serverless service, it runs code only when needed, that is, when an invocation request comes in. This can cause a cold start problem. A cold start occurs when no containers are running for the requested Lambda function, so new containers must be brought up to serve the invocation request.

This usually occurs in two cases: First, when this is the first invocation request for the Lambda, and thus no containers are available for it. Second, when the Lambda has not been invoked for a while, and as a result AWS shuts down existing containers serving this Lambda for optimization.

Consequently, cold starts often add latency, making Lambda take longer than normal to process requests, especially when new instances must be initialized. This can be mitigated by using provisioned concurrency, which keeps the configured Lambda functions pre-initialized so that they are ready to process requests as they come.

Similar to invocations, this metric tracks the number of invocation requests running on provisioned concurrency. It is an important metric to track, as a sudden drop in provisioned concurrency invocations can mean there is an issue with Lambda or one of the upstream services.

ProvisionedConcurrencyUtilization

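This metric tracks how much of a Lambda function’s provisioned concurrency is in use: the number of provisioned instances processing requests divided by the total provisioned concurrency allocated. It can help monitor a Lambda function’s efficiency in using the provisioned concurrency.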

A Lambda function could be underprovisioned if it constantly exceeds or reaches its provisioned concurrency threshold. Similarly, it could be overprovisioned if the utilization of the provisioned concurrency threshold is low. This metric will help you appropriately adjust the value of provisioned concurrency.
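The under- and overprovisioning conditions above can be sketched as a classification over recent utilization samples, each expressed as a fraction between 0 and 1. The function name, the 0.9 and 0.3 thresholds, and the "at least half of the samples" rule for "constantly" are all illustrative assumptions to be tuned per workload.

```python
from typing import Sequence


def classify_provisioning(utilization: Sequence[float],
                          high: float = 0.9, low: float = 0.3) -> str:
    """Classify provisioned concurrency from utilization samples (0.0 to 1.0)."""
    if not utilization:
        raise ValueError("at least one utilization sample is required")
    # Share of samples at or above the high-water mark.
    hot_share = sum(1 for u in utilization if u >= high) / len(utilization)
    if hot_share >= 0.5:
        return "underprovisioned"
    if max(utilization) <= low:
        return "overprovisioned"
    return "balanced"
```

Using the peak rather than the average for the overprovisioned case is a deliberately conservative choice: a function is only reported as overprovisioned if it never came close to its allocation during the window.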

Conclusion

Tracking and monitoring the right metrics for AWS Lambda is vital for fully exploiting its potential. AWS Lambda provides different configurations for users’ varied needs. The metrics provided by AWS to monitor Lambda performance and usage help users define the correct configuration settings for their Lambdas.

Major problems like cold starts and throttled requests can be mitigated if you regularly monitor the correct metrics, understand what they mean, and take the necessary steps to prevent them. Monitoring the metrics covered in this article can also help identify issues with upstream services or errors in code before they affect your application and the customer experience.
