Google Cloud Anthos Cluster monitoring
Purpose
Google Cloud Anthos monitoring is essential for ensuring the performance, reliability, and efficiency of your applications and infrastructure. For example:
- A high container restart count indicates instability in your containers, such as frequent crashes or misconfigurations.
- Tracking node allocatable memory helps ensure optimal resource allocation and prevent memory saturation, which could lead to application downtime or degraded performance.
- Monitoring the total and allocatable CPU cores helps ensure that applications have sufficient processing power and helps you plan resource scaling when necessary.
By monitoring Google Cloud Anthos, you can proactively troubleshoot issues, optimize resource usage, and enhance the overall performance of your Anthos-based environment. Follow the instructions in this help document to set up Anthos monitoring.
Setup and configuration
- Adding an Anthos cluster while configuring a new Google Cloud monitor
If you have not configured a Google Cloud monitor yet, add one by following these steps:
- Log in to Site24x7.
- Go to Cloud > GCP > Add GCP Monitor, or go to Admin > Cloud Monitoring > Google Cloud Platform (GCP).
- Provide a unique display name for identification purposes.
- Upload a service account JSON file to authenticate Site24x7 for performing resource discovery.
- Select Anthos Cluster in the Select the Resources for Monitoring list.
- Select existing Notification Profiles, User Alert Groups, Tags, and IT Automations, or add new ones. You can also integrate Site24x7 alarms with your preferred third-party service.
- Click Start GCP Monitoring.
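Before uploading the service account JSON file, it can help to confirm that the file is well formed. The following is a minimal sketch, not part of Site24x7: the function name `check_service_account_key` and the choice of fields to check are assumptions, based on the fields normally present in a Google Cloud service account key file.

```python
import json

# Fields normally present in a Google Cloud service account key file.
# Which of them to treat as required is an assumption made for this sketch.
REQUIRED_FIELDS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "token_uri",
)


def check_service_account_key(raw_json: str) -> list[str]:
    """Return a list of problems found in the key file contents (empty if none)."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems


# Placeholder values only; a real key file is downloaded from the GCP console.
sample = json.dumps({
    "type": "service_account",
    "project_id": "example-project",
    "private_key_id": "example-key-id",
    "private_key": "example-private-key",
    "client_email": "monitor@example-project.iam.gserviceaccount.com",
    "token_uri": "https://oauth2.googleapis.com/token",
})
print(check_service_account_key(sample))
```

An empty list means the file has the expected shape. It does not confirm that the service account has the permissions Site24x7 needs for discovery.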
- Adding an Anthos cluster to an existing Google Cloud monitor
If you already have a Google Cloud monitor configured for the service account, you can add Google Anthos Cluster by following the steps below:
- Log in to Site24x7.
- Go to Cloud > GCP, select your Google Cloud monitor, then go to any of the dashboards on the left pane of your Google Cloud monitor.
- Click the hamburger icon and select Edit.
- In the Edit GCP Monitor page that opens, select Anthos Cluster from the Select the Resources for Monitoring drop-down, and click Save.
- After successful configuration, go to Cloud > GCP > Anthos Cluster. Now you can view the discovered Anthos Cluster resources.
Note: It will take 15-30 minutes to discover new GCP resources.
Polling frequency
Site24x7's Google Anthos Cluster monitor collects metrics data every minute and the statuses of your Google Anthos Cluster resources every five minutes.
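As a rough illustration of what these intervals mean in practice, the sketch below counts how many polls fall within a given window. The one-minute and five-minute intervals come from the text above. The helper name `polls_in` is hypothetical, and the calculation assumes polls are evenly spaced.

```python
from datetime import timedelta

METRIC_INTERVAL = timedelta(minutes=1)  # metrics data, per the text above
STATUS_INTERVAL = timedelta(minutes=5)  # resource status, per the text above


def polls_in(window: timedelta, interval: timedelta) -> int:
    """Number of complete polling intervals that fit in the window."""
    return window // interval


one_day = timedelta(days=1)
print(polls_in(one_day, METRIC_INTERVAL))  # 1440 metric collections per day
print(polls_in(one_day, STATUS_INTERVAL))  # 288 status checks per day
```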
Supported metrics
Metric name | Description | Unit | Statistic
Summary tab | | |
Distribution | Represents the distribution of workloads across nodes in the cluster. | Text | N/A
Cluster Type | Specifies whether the cluster is a hybrid, on-premises, or cloud-based environment. | Text | N/A
Pod Volume Utilization | Tracks the percentage of storage space being used by pods. | Percentage | Average
Container Restart Count | Measures the number of times containers have restarted, indicating stability issues. | Count | Total
Node CPU Allocatable Utilization | Shows the percentage of allocatable CPU resources being utilized. | Percentage | Average
Node Memory Allocatable Utilization | Reflects the percentage of allocatable memory being utilized on a node. | Percentage | Average
Node CPU Total Cores | Indicates the total number of CPU cores available on a node. | Count | Average
Node CPU Allocatable Cores | Displays the number of CPU cores available for workloads after reserving system resources. | Count | Average
Node Memory Usage | Tracks the actual memory usage of a node. | Bytes | Average
Node Total Memory | Represents the total memory capacity of the node. | Bytes | Average
Node Allocatable Memory | Measures the memory available for workloads after reserving for system processes. | Bytes | Average
Pod Bytes Transmitted | Indicates the amount of data sent by a pod over the network. | Bytes | Average
Pod Bytes Received | Reflects the amount of data received by a pod over the network. | Bytes | Average
Container Metrics | | |
Container Restart Count | Tracks the number of times a container has restarted, indicating potential issues. | Count | Average
Container Limit Cores | Specifies the maximum CPU cores allocated to a container. | Count | Average
Container Request Cores | Represents the CPU cores requested by a container for its operation. | Count | Average
Container CPU Usage Time | Measures the total CPU time consumed by a container. | Seconds | Average
Container CPU Utilization | Reflects the percentage of CPU resources utilized by a container. | Percentage | Average
Container Memory Limit | Indicates the maximum memory allocated to a container. | Bytes | Average
Container Memory Request | Shows the memory requested by a container for its operation. | Bytes | Average
Container Memory Usage | Tracks the actual memory usage of a container. | Bytes | Average
Container Memory Limit Utilization | Displays the percentage of the memory limit being utilized by a container. | Percentage | Average
Container Memory Request Utilization | Reflects the percentage of the requested memory being utilized by a container. | Percentage | Average
Container Page Faults | Counts the number of memory page faults encountered by a container. | Count | Average
Container Ephemeral Storage | Tracks the temporary storage used by a container. | Bytes | Average
Container Uptime | Measures the total time a container has been running without interruption. | Seconds | Average
Node Metrics | | |
Node Total Cores | Represents the total number of CPU cores available on a node. | Count | Average
Node Allocatable Cores | Indicates the CPU cores available for workloads after reserving system resources. | Count | Average
Node CPU Usage Time | Measures the total CPU time consumed by a node. | Seconds | Average
Node CPU Allocatable Utilization | Reflects the percentage of allocatable CPU resources being utilized. | Percentage | Average
Node Memory Usage | Tracks the actual memory usage of a node. | Bytes | Average
Node Total Memory | Represents the total memory capacity of the node. | Bytes | Average
Node Allocatable Memory | Indicates the memory available for workloads after reserving for system processes. | Bytes | Average
Node Memory Allocatable Utilization | Shows the percentage of allocatable memory being utilized. | Percentage | Average
Node Ephemeral Storage Usage | Tracks the temporary storage used by a node. | Bytes | Average
Node Total Ephemeral Storage | Represents the total ephemeral storage capacity of a node. | Bytes | Average
Node Allocatable Ephemeral Storage | Indicates the ephemeral storage available for workloads after reserving system resources. | Bytes | Average
Node Total Inodes | Represents the total number of inodes available on a node. | Count | Average
Node Free Inodes | Tracks the number of free inodes remaining on a node. | Count | Average
Node Bytes Transmitted | Indicates the amount of data sent by a node over the network. | Bytes | Average
Node Bytes Received | Reflects the amount of data received by a node over the network. | Bytes | Average
Pod Bytes Transmitted | Tracks the amount of data sent by pods over the network. | Bytes | Average
Pod Bytes Received | Measures the amount of data received by pods over the network. | Bytes | Average
Pod Volume Capacity | Represents the total storage capacity allocated to pods. | Bytes | Average
Pod Volume Usage | Tracks the actual storage space being used by pods. | Bytes | Average
Pod Volume Utilization | Reflects the percentage of storage capacity being utilized by pods. | Percentage | Average
Configuration Details | | |
Cluster Region | Specifies the geographical region where the cluster is deployed. | Value | N/A
Cluster Type | Indicates whether the cluster is hybrid, on-premises, or cloud-based. | Value | N/A
Created Time | Records the timestamp when the resource was created. | Value | N/A
Distribution | Represents the distribution of workloads across nodes in the cluster. | Value | N/A
Enable Component | Identifies the components enabled in the cluster. | Value | N/A
Entity Tag | Provides a unique identifier for the resource version. | Value | N/A
Evaluation Mode | Indicates the mode used for policy evaluation. | Value | N/A
Fleet Membership | Shows the cluster's membership in a fleet for unified management. | Value | N/A
Kubernetes Version | Displays the version of Kubernetes running on the cluster. | Value | N/A
Monitoring Config | Details the configuration for monitoring the cluster. | Value | N/A
Name | Specifies the name of the resource. | Value | N/A
Platform Version | Indicates the version of the platform hosting the cluster. | Value | N/A
Project ID | Identifies the Google Cloud project associated with the resource. | Value | N/A
State | Reflects the current operational state of the resource. | Value | N/A
Updated Time | Records the timestamp of the last update made to the resource. | Value | N/A
Inventory | | |
Monitor Licensing Category | Shows the license category of this monitor. | Value | N/A
Monitor Group(s) Associated | Displays the associated monitor groups. | Value | N/A
Threshold and Availability Profile | Shows the associated Threshold Profile. | Value | N/A
Notification Profile | Shows the associated Notification Profile. | Value | N/A
User Alert Group | Shows the associated User Alert Group. | Value | N/A
Monitor Creation Time | Displays the time when this monitor was created. | Value | N/A
Last Modified Time | Displays the time when this monitor was last modified. | Value | N/A
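The utilization metrics in the table are percentages, while the usage and allocatable metrics are absolute values. The sketch below shows how the two kinds of metrics would relate if utilization is usage divided by the allocatable amount. That formula is an assumption made for illustration, since this document does not state how Site24x7 or Google Cloud derives the values, and the node figures are made up.

```python
def utilization_percent(used: float, allocatable: float) -> float:
    """Percentage of the allocatable amount that is currently in use."""
    if allocatable <= 0:
        raise ValueError("allocatable must be positive")
    return used / allocatable * 100.0


GIB = 1024 ** 3  # bytes in one gibibyte

# Hypothetical node: 14 GiB allocatable memory, 10.5 GiB currently in use.
node_memory_usage = 10.5 * GIB
node_allocatable_memory = 14 * GIB
print(f"{utilization_percent(node_memory_usage, node_allocatable_memory):.1f}%")  # 75.0%
```

Under this assumption, a node can show moderate usage against its total memory while already being close to its allocatable limit, which is why the allocatable-based percentages are the more useful ones to set thresholds on.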
Threshold configuration
- Global configuration
- In the Site24x7 web client, go to the Admin section from the left navigation pane.
- Select Configuration Profiles from the left pane and, using the drop-down, select Threshold and Availability.
- Click Add Threshold Profile in the top-right corner.
- For the Monitor Type, select GCP Anthos Cluster.
- Now you can set the threshold values for all the metrics listed above.
- Monitor-level configuration
- In the Site24x7 web client, go to Cloud > GCP > Anthos Cluster.
- Select the resource you would like to set a threshold for, then click the hamburger icon.
- Select Edit, which directs you to the Edit GCP Anthos Cluster Monitor page.
- Set the threshold values for the metrics using the Threshold and Availability option.
- You can also configure IT automation at the attribute level.
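Conceptually, a threshold profile pairs each metric with a limit, and a breach occurs when a polled value crosses that limit. The sketch below illustrates the idea only. The `Threshold` class, the `find_breaches` function, and the limit values are hypothetical and do not reflect Site24x7's internal implementation or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str   # metric name as listed in the Supported metrics table
    limit: float  # hypothetical limit; a breach is a sample above this value


def find_breaches(profile: list[Threshold], sample: dict[str, float]) -> list[str]:
    """Return the metrics in the sample whose value exceeds the profile limit."""
    return [
        t.metric
        for t in profile
        if t.metric in sample and sample[t.metric] > t.limit
    ]


# Hypothetical profile and one polled sample.
profile = [
    Threshold("Container Restart Count", 5),
    Threshold("Node Memory Allocatable Utilization", 85.0),
]
sample = {"Container Restart Count": 7, "Node Memory Allocatable Utilization": 62.5}
print(find_breaches(profile, sample))  # ['Container Restart Count']
```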
IT Automation
Site24x7 provides a set of exclusive IT automation tools that automatically resolve performance degradation issues. These tools react to events proactively rather than waiting for manual intervention. The IT automation tools automate repetitive tasks and automatically remediate threshold breaches. The alarm engine continually evaluates system events for which thresholds are set and executes the mapped automation when there is a breach.
How to configure IT Automation for a monitor
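The evaluate-then-execute flow described above can be pictured as a lookup from a breached metric to its mapped action. This is an illustrative sketch under stated assumptions, not Site24x7's alarm engine: the function names, the limits, and the actions are all hypothetical, and the actions only return a description instead of changing anything.

```python
from typing import Callable

Action = Callable[[str, float], str]


def scale_out(metric: str, value: float) -> str:
    # Placeholder for a real remediation step, such as adding nodes.
    return f"scale out: {metric} at {value}"


def run_mapped_automations(
    events: list[tuple[str, float]],
    limits: dict[str, float],
    automations: dict[str, Action],
) -> list[str]:
    """Run the mapped action for every event that breaches its limit."""
    outcomes = []
    for metric, value in events:
        limit = limits.get(metric)
        action = automations.get(metric)
        if limit is None or action is None:
            continue  # no threshold set, or no automation mapped
        if value > limit:
            outcomes.append(action(metric, value))
    return outcomes


# Hypothetical limits, mapping, and event stream.
limits = {"Node CPU Allocatable Utilization": 90.0, "Container Restart Count": 5}
automations = {"Node CPU Allocatable Utilization": scale_out}
events = [
    ("Node CPU Allocatable Utilization", 94.0),  # breach with a mapped action
    ("Container Restart Count", 9),              # breach, but nothing mapped
    ("Node CPU Allocatable Utilization", 40.0),  # within the limit
]
print(run_mapped_automations(events, limits, automations))
```

Only the first event triggers an action here: the second breaches its limit but has no automation mapped, and the third stays within its limit.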
Configuration Rules
Editing multiple monitors to associate different monitor groups or add a different tag can be a tedious process. With Site24x7's Configuration Rules, you can automate the configuration settings of your monitoring resources. Also, Site24x7 enables you to create custom rules to track configuration changes continuously and achieve the ideal configuration settings.
How to add Configuration Rules
Summary
The Summary tab shows performance data for the metrics listed above, organized by time. To view the summary:
- Go to Cloud > GCP > GCP Anthos Cluster.
- Select a resource.
- Click the Summary tab.
Reports
Use reports to get in-depth data on the various parameters of your monitored resources and to improve your service performance.
To view reports for a Google Anthos Cluster resource:
- Go to the Reports section on the left navigation pane.
- Select Google Anthos Cluster from the menu on the left.
- You can find the Availability Summary Report, Performance Report, and Inventory Report for one selected monitor. Alternatively, you can get the Summary Report, Availability Summary Report, Health Trend Report, and Performance Report for all the Google Anthos Cluster monitors.
You can also get reports from the Summary tab of the GCP Anthos Cluster monitor:
- Click the Summary tab.
- Get the Availability Summary Report of the monitor by clicking Availability.
- You can also find the Performance Report of the monitor by clicking any chart title.
Related links
- Google Cloud monitoring help page
- Possible reasons why GCP resources are not added for monitoring in Site24x7
- What permissions should I have in my Google account to enable Site24x7 Google Cloud Platform (GCP) monitoring?
- How to create a service account JSON file to authenticate Site24x7 for discovery of GCP resources
- How to create a service account in the GCP console