Google Cloud Anthos Cluster monitoring

Purpose

Google Cloud Anthos monitoring is essential for ensuring the performance, reliability, and efficiency of your applications and infrastructure. For example:

- A high container restart count indicates instability in your containers, such as frequent crashes or misconfigurations.
- Tracking node allocatable memory helps ensure optimal resource allocation and prevent memory saturation, which could lead to application downtime or degraded performance.
- Monitoring total and allocatable CPU cores helps you confirm that applications have sufficient processing power and plan resource scaling when necessary.

By monitoring Google Cloud Anthos, you can proactively troubleshoot issues, optimize resource usage, and enhance the overall performance of your Anthos-based environment. Follow the instructions in this help document to set up Anthos monitoring.

Setup and configuration

Adding an Anthos cluster while configuring a new Google Cloud monitor

If you have not configured a Google Cloud monitor yet, add one by following these steps:

1. Log in to Site24x7.
2. Go to Cloud > GCP > Add GCP Monitor or Admin > Cloud Monitoring > Google Cloud Platform (GCP).
3. Provide a unique display name for identification purposes.
4. Upload a service account JSON file to authenticate Site24x7 for performing resource discovery.
5. Select Anthos Cluster in the Select the Resources for Monitoring list.
6. Select existing Notification Profiles, User Alert Groups, Tags, and IT Automations, or add new ones. You can also integrate Site24x7 alarms with your preferred third-party service.
7. Click Start GCP Monitoring.

Adding an Anthos cluster to an existing Google Cloud monitor

If you already have a Google Cloud monitor configured for the service account, you can add Google Anthos Cluster by following the steps below:

1. Log in to Site24x7.
2. Go to Cloud > GCP, select your Google Cloud monitor, then go to any of the dashboards on the left pane of your Google Cloud monitor.
3. Click the hamburger icon and select Edit.
4. In the Edit GCP Monitor page that opens, select Anthos Cluster from the Select the Resources for Monitoring drop-down, and click Save.
5. After successful configuration, go to Cloud > GCP > Anthos Cluster. Now you can view the discovered Anthos Cluster resources.

Note: It will take 15-30 minutes to discover new GCP resources.
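
Before you upload the service account JSON file mentioned in the setup steps, it can help to confirm locally that the file is a well-formed service account key. The Python sketch below is one way to do that. The function name `find_key_file_problems` and the list of fields it checks are assumptions made for illustration, based on the fields normally present in a Google Cloud service account key. They are not part of Site24x7, and the check does not verify that the account has the permissions needed for resource discovery.

```python
import json

# Fields normally present in a Google Cloud service account key file.
REQUIRED_FIELDS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "token_uri",
)


def find_key_file_problems(raw_json: str) -> list[str]:
    """Return a list of problems found; an empty list means the file looks usable."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]

    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    # A user or external-account credential file is not a service account key.
    if key.get("type") not in (None, "", "service_account"):
        problems.append(f"unexpected type: {key['type']!r}")
    return problems
```

To use it, read the key file as text (for example with `pathlib.Path("key.json").read_text()`, where `key.json` is a placeholder path) and pass the contents to `find_key_file_problems`.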

Polling frequency

Site24x7's Google Anthos Cluster monitor collects metrics data every minute and the statuses of your Google Anthos Cluster resources every five minutes.

Supported metrics

Metric name	Description	Unit	Statistic
Summary tab
Distribution	Represents the distribution of workloads across nodes in the cluster.	Text	N/A
Cluster Type	Specifies whether the cluster is a hybrid, on-premises, or cloud-based environment.	Text	N/A
Pod Volume Utilization	Tracks the percentage of storage space being used by pods.	Percentage	Average
Container Restart Count	Measures the number of times containers have restarted, indicating stability issues.	Count	Total
Node CPU Allocatable Utilization	Shows the percentage of CPU resources allocated for workloads.	Percentage	Average
Node Memory Allocatable Utilization	Reflects the percentage of memory allocated for workloads on a node.	Percentage	Average
Node CPU Total Cores	Indicates the total number of CPU cores available on a node.	Count	Average
Node CPU Allocatable Cores	Displays the number of CPU cores allocated for workloads after reserving system resources.	Count	Average
Node Memory Usage	Tracks the actual memory usage of a node.	Bytes	Average
Node Total Memory	Represents the total memory capacity of the node.	Bytes	Average
Node Allocatable Memory	Measures the memory available for workloads after reserving for system processes.	Bytes	Average
Pod Bytes Transmitted	Indicates the amount of data sent by a pod over the network.	Bytes	Average
Pod Bytes Received	Reflects the amount of data received by a pod over the network.	Bytes	Average
Container Metrics
Container Restart Count	Tracks the number of times a container has restarted, indicating potential issues.	Count	Average
Container Limit Cores	Specifies the maximum CPU cores allocated to a container.	Count	Average
Container Request Cores	Represents the CPU cores requested by a container for its operation.	Count	Average
Container CPU Usage Time	Measures the total CPU time consumed by a container.	Seconds	Average
Container CPU Utilization	Reflects the percentage of CPU resources utilized by a container.	Percentage	Average
Container Memory Limit	Indicates the maximum memory allocated to a container.	Bytes	Average
Container Memory Request	Shows the memory requested by a container for its operation.	Bytes	Average
Container Memory Usage	Tracks the actual memory usage of a container.	Bytes	Average
Container Memory Limit Utilization	Displays the percentage of the memory limit being utilized by a container.	Percentage	Average
Container Memory Request Utilization	Reflects the percentage of the requested memory being utilized by a container.	Percentage	Average
Container Page Faults	Counts the number of memory page faults encountered by a container.	Count	Average
Container Ephemeral Storage	Tracks the temporary storage used by a container.	Bytes	Average
Container Uptime	Measures the total time a container has been running without interruption.	Seconds	Average
Node Metrics
Node Total Cores	Represents the total number of CPU cores available on a node.	Count	Average
Node Allocatable Cores	Indicates the CPU cores allocated for workloads after reserving system resources.	Count	Average
Node CPU Usage Time	Measures the total CPU time consumed by a node.	Seconds	Average
Node CPU Allocatable Utilization	Reflects the percentage of allocatable CPU resources being utilized.	Percentage	Average
Node Memory Usage	Tracks the actual memory usage of a node.	Bytes	Average
Node Total Memory	Represents the total memory capacity of the node.	Bytes	Average
Node Allocatable Memory	Indicates the memory available for workloads after reserving for system processes.	Bytes	Average
Node Memory Allocatable Utilization	Shows the percentage of allocatable memory being utilized.	Percentage	Average
Node Ephemeral Storage Usage	Tracks the temporary storage used by a node.	Bytes	Average
Node Total Ephemeral Storage	Represents the total ephemeral storage capacity of a node.	Bytes	Average
Node Allocatable Ephemeral Storage	Indicates the ephemeral storage available for workloads after reserving system resources.	Bytes	Average
Node Total Inodes	Represents the total number of inodes available on a node.	Count	Average
Node Free Inodes	Tracks the number of free inodes remaining on a node.	Count	Average
Node Bytes Transmitted	Indicates the amount of data sent by a node over the network.	Bytes	Average
Node Bytes Received	Reflects the amount of data received by a node over the network.	Bytes	Average
Pod Bytes Transmitted	Tracks the amount of data sent by pods over the network.	Bytes	Average
Pod Bytes Received	Measures the amount of data received by pods over the network.	Bytes	Average
Pod Volume Capacity	Represents the total storage capacity allocated to pods.	Bytes	Average
Pod Volume Usage	Tracks the actual storage space being used by pods.	Bytes	Average
Pod Volume Utilization	Reflects the percentage of storage capacity being utilized by pods.	Percentage	Average
Configuration Details
Cluster Region	Specifies the geographical region where the cluster is deployed.	Value	N/A
Cluster Type	Indicates whether the cluster is hybrid, on-premises, or cloud-based.	Value	N/A
Created Time	Records the timestamp when the resource was created.	Value	N/A
Distribution	Represents the distribution of workloads across nodes in the cluster.	Value	N/A
Enable Component	Identifies the components enabled in the cluster.	Value	N/A
Entity Tag	Provides a unique identifier for the resource version.	Value	N/A
Evaluation Mode	Indicates the mode used for policy evaluation.	Value	N/A
Fleet Membership	Shows the cluster's membership in a fleet for unified management.	Value	N/A
Kubernetes Version	Displays the version of Kubernetes running on the cluster.	Value	N/A
Monitoring Config	Details the configuration for monitoring the cluster.	Value	N/A
Name	Specifies the name of the resource.	Value	N/A
Platform Version	Indicates the version of the platform hosting the cluster.	Value	N/A
Project ID	Identifies the Google Cloud project associated with the resource.	Value	N/A
State	Reflects the current operational state of the resource.	Value	N/A
Updated Time	Records the timestamp of the last update made to the resource.	Value	N/A
Inventory
Monitor Licensing Category	Shows the license category of this monitor.	Value	N/A
Monitor Group(s) Associated	Displays the associated monitor groups.	Value	N/A
Threshold and Availability Profile	Shows the associated Threshold Profile.	Value	N/A
Notification Profile	Shows the associated Notification Profile.	Value	N/A
User Alert Group	Shows the associated User Alert Group.	Value	N/A
Monitor Creation Time	Displays the time when this monitor was created.	Value	N/A
Last Modified Time	Displays the time when this monitor was last modified.	Value	N/A
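
The utilization metrics in the table are percentages derived from a usage value and a capacity value. As a minimal sketch of that relationship, assuming utilization is simply usage divided by the allocatable (or limit) amount, the Python example below computes a node's memory allocatable utilization. The function name `utilization_percent` and the sample values are hypothetical, and the exact formula used by Google Cloud or Site24x7 is not confirmed here.

```python
def utilization_percent(used: float, capacity: float) -> float:
    """Return used/capacity as a percentage; 0.0 when capacity is not positive."""
    if capacity <= 0:
        return 0.0
    return 100.0 * used / capacity


GIB = 1024 ** 3

# Hypothetical sample values for one node, in bytes.
node_memory_usage = 6 * GIB
node_allocatable_memory = 12 * GIB
node_total_memory = 16 * GIB

# Share of the allocatable memory that workloads are currently using.
allocatable_utilization = utilization_percent(node_memory_usage, node_allocatable_memory)

# Memory held back for system processes: total minus allocatable.
reserved_for_system = node_total_memory - node_allocatable_memory
```

The same function applies to the container metrics, for example Container Memory Usage divided by Container Memory Limit for Container Memory Limit Utilization.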

Threshold configuration

- Global configuration
  1. In the Site24x7 web client, go to the Admin section from the left navigation pane.
  2. Select Configuration Profiles from the left pane and using the drop-down, select Threshold and Availability.
  3. Click Add Threshold Profile in the top-right corner.
  4. For the Monitor Type, select GCP Anthos Cluster.
  5. Now you can set the threshold values for all the metrics listed above.
- Monitor-level configuration
  1. In the Site24x7 web client, go to Cloud > GCP > Anthos Cluster.
  2. Select a resource you would like to set a threshold for, then click the hamburger icon.
  3. Select Edit, which directs you to the Edit GCP Anthos Cluster Monitor page.
  4. Set the threshold values for the metrics using the Threshold and Availability option.
  5. You can also configure IT automation at the attribute level.

IT Automation

Site24x7 provides a set of exclusive IT automation tools that automatically resolve performance degradation issues. These tools react to events proactively rather than waiting for manual intervention. The IT automation tools automate repetitive tasks and automatically remediate threshold breaches. The alarm engine continually evaluates system events for which thresholds are set and executes the mapped automation when there is a breach.

How to configure IT Automation for a monitor
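
The evaluate-then-act behavior described above can be pictured with a short Python sketch. It is an illustration only, not Site24x7's alarm engine: the `ThresholdRule` class, the `evaluate` function, the `restart_workload` action, and the threshold values are all hypothetical. Metric names are taken from the Supported metrics table.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ThresholdRule:
    metric: str                           # metric name from the Supported metrics table
    limit: float                          # a breach occurs when the value exceeds this
    action: Callable[[str, float], str]   # hypothetical automation mapped to the rule


def restart_workload(metric: str, value: float) -> str:
    # Placeholder for a real remediation step.
    return f"automation triggered: {metric}={value}"


def evaluate(sample: dict[str, float], rules: list[ThresholdRule]) -> list[str]:
    """Run the mapped action for every rule whose metric exceeds its limit."""
    results = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is not None and value > rule.limit:
            results.append(rule.action(rule.metric, value))
    return results


rules = [
    ThresholdRule("Container Restart Count", 5, restart_workload),
    ThresholdRule("Node Memory Allocatable Utilization", 90.0, restart_workload),
]
sample = {"Container Restart Count": 8, "Node Memory Allocatable Utilization": 72.5}
outcomes = evaluate(sample, rules)
```

In this sample only the restart count exceeds its limit, so only that rule's action runs.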

Configuration Rules

Editing multiple monitors to associate different monitor groups or add a different tag can be a tedious process. With Site24x7's Configuration Rules, you can automate the configuration settings of your monitoring resources. Also, Site24x7 enables you to create custom rules to track configuration changes continuously and achieve the ideal configuration settings.

How to add Configuration Rules
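
Conceptually, a configuration rule pairs a match condition with the settings to apply to every monitor that satisfies it. The Python sketch below illustrates that idea under stated assumptions: the `Monitor` and `ConfigurationRule` classes, the name-based match criterion, and the sample tags and groups are hypothetical and do not reflect Site24x7's actual rule model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Monitor:
    name: str
    monitor_type: str
    tags: set[str] = field(default_factory=set)
    groups: set[str] = field(default_factory=set)


@dataclass
class ConfigurationRule:
    name_contains: str    # hypothetical match criterion
    add_tags: set[str]
    add_groups: set[str]

    def matches(self, monitor: Monitor) -> bool:
        return self.name_contains in monitor.name

    def apply(self, monitor: Monitor) -> None:
        monitor.tags |= self.add_tags
        monitor.groups |= self.add_groups


def apply_rules(monitors: list[Monitor], rules: list[ConfigurationRule]) -> int:
    """Apply every matching rule and return the number of monitors changed."""
    changed = 0
    for monitor in monitors:
        before = (set(monitor.tags), set(monitor.groups))
        for rule in rules:
            if rule.matches(monitor):
                rule.apply(monitor)
        if (monitor.tags, monitor.groups) != before:
            changed += 1
    return changed


monitors = [
    Monitor("prod-anthos-1", "GCP Anthos Cluster"),
    Monitor("dev-anthos-1", "GCP Anthos Cluster"),
]
rule = ConfigurationRule("prod", {"env:prod"}, {"Production"})
changed_count = apply_rules(monitors, [rule])
```

Running the rule a second time changes nothing, because the tag and group are already present.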

Summary

The Summary tab shows time-series performance data for the metrics listed above. To view the summary:

1. Go to Cloud > GCP > Anthos Cluster.
2. Select a resource.
3. Click the Summary tab.

Reports

Reports provide in-depth data about the various parameters of your monitored resources and help you improve service performance.

To view reports for a Google Anthos Cluster resource:

1. Go to the Reports section on the left navigation pane.
2. Select Google Anthos Cluster from the menu on the left.
3. You can find the Availability Summary Report, Performance Report, and Inventory Report for one selected monitor. Or you can get the Summary Report, Availability Summary Report, Health Trend Report, and Performance Report for all the Google Anthos Cluster monitors.

You can also get reports from the Summary tab of the GCP Anthos Cluster monitor:

1. Click the Summary tab.
2. Get the Availability Summary Report of the monitor by clicking Availability.
3. You can also find the Performance Report of the monitor by clicking any chart title.
