Google Cloud Anthos Cluster monitoring

Purpose

Google Cloud Anthos monitoring is essential for ensuring the performance, reliability, and efficiency of your applications and infrastructure. For example:

  • A high container restart count indicates instability in your containers, such as frequent crashes or misconfigurations.
  • Tracking node allocatable memory helps ensure optimal resource allocation and prevents memory saturation, which could lead to application downtime or degraded performance.
  • Monitoring the total and allocatable CPU cores helps you confirm that applications have sufficient processing power and plan resource scaling when necessary.

By monitoring Google Cloud Anthos, you can proactively troubleshoot issues, optimize resource usage, and enhance the overall performance of your Anthos-based environment. Follow the instructions in this help document to set up Anthos monitoring.

Setup and configuration

  • Adding an Anthos cluster while configuring a new Google Cloud monitor

If you have not configured a Google Cloud monitor yet, add one by following these steps:

    1. Log in to Site24x7.
    2. Go to Cloud > GCP > Add GCP Monitor, or go to Admin > Cloud Monitoring > Google Cloud Platform (GCP).
    3. Provide a unique display name for identification purposes.
    4. Upload a service account JSON file to authenticate Site24x7 for performing resource discovery.
    5. Select Anthos Cluster in the Select the Resources for Monitoring list.
    6. Select existing Notification Profiles, User Alert Groups, Tags, and IT Automations, or add new ones. You can also integrate Site24x7 alarms with your preferred third-party service.
    7. Click Start GCP Monitoring.
  • Adding an Anthos cluster to an existing Google Cloud monitor

If you already have a Google Cloud monitor configured for the service account, you can add Google Anthos Cluster by following the steps below:

    1. Log in to Site24x7.
    2. Go to Cloud > GCP, select your Google Cloud monitor, then go to any of the dashboards on the left pane of your Google Cloud monitor.
    3. Click the hamburger icon and select Edit.
    4. In the Edit GCP Monitor page that opens, select Anthos Cluster from the Select the Resources for Monitoring drop-down, and click Save.
    5. After successful configuration, go to Cloud > GCP > Anthos Cluster. You can now view the discovered Anthos Cluster resources.

Note: It will take 15-30 minutes to discover new GCP resources.
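
Before uploading the service account JSON file, it can help to confirm that the file is well formed. The following is a minimal sketch, not part of Site24x7: the function name `check_service_account_key` and the sample values are hypothetical, and the check assumes only the fields that a standard Google Cloud service account key file contains (`type`, `project_id`, `private_key`, and `client_email`).

```python
import json

# Fields present in a standard Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json: str) -> list:
    """Return a list of problems found in the key text; an empty list means it looks usable."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value must be an object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    # A key exported for a service account carries the type "service_account".
    if key.get("type") not in (None, "service_account"):
        problems.append(f"unexpected type: {key['type']!r}")
    return problems


# Hypothetical key content used only for illustration (not a real credential).
sample = json.dumps({
    "type": "service_account",
    "project_id": "example-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "monitor@example-project.iam.gserviceaccount.com",
})
print(check_service_account_key(sample))
```

This only checks the file's structure. It does not verify that the service account has the permissions needed for resource discovery.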

Polling frequency

Site24x7's Google Anthos Cluster monitor collects metrics data every minute and the statuses of your Google Anthos Cluster resources every five minutes.
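
As a rough illustration of what these intervals mean in practice, the sketch below works out how many polls each resource receives per day. The helper name `polls_per_day` is hypothetical. The only inputs taken from this document are the one-minute and five-minute intervals.

```python
MINUTES_PER_DAY = 24 * 60


def polls_per_day(interval_minutes: int) -> int:
    """Number of polls made in one day at a fixed interval."""
    return MINUTES_PER_DAY // interval_minutes


metric_polls = polls_per_day(1)  # metrics data, collected every minute
status_polls = polls_per_day(5)  # resource status, collected every five minutes
print(metric_polls, status_polls)
```

With these intervals, a status change can take up to five minutes to be reflected, while metric charts gain a new data point every minute.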

Supported metrics

Summary tab

| Metric name | Description | Unit | Statistic |
| --- | --- | --- | --- |
| Distribution | Represents the distribution of workloads across nodes in the cluster. | Text | N/A |
| Cluster Type | Specifies whether the cluster is a hybrid, on-premises, or cloud-based environment. | Text | N/A |
| Pod Volume Utilization | Tracks the percentage of storage space being used by pods. | Percentage | Average |
| Container Restart Count | Measures the number of times containers have restarted, indicating stability issues. | Count | Total |
| Node CPU Allocatable Utilization | Shows the percentage of allocatable CPU resources being used by workloads. | Percentage | Average |
| Node Memory Allocatable Utilization | Reflects the percentage of allocatable memory being used on a node. | Percentage | Average |
| Node CPU Total Cores | Indicates the total number of CPU cores available on a node. | Count | Average |
| Node CPU Allocatable Cores | Displays the number of CPU cores available for workloads after reserving system resources. | Count | Average |
| Node Memory Usage | Tracks the actual memory usage of a node. | Bytes | Average |
| Node Total Memory | Represents the total memory capacity of the node. | Bytes | Average |
| Node Allocatable Memory | Measures the memory available for workloads after reserving for system processes. | Bytes | Average |
| Pod Bytes Transmitted | Indicates the amount of data sent by a pod over the network. | Bytes | Average |
| Pod Bytes Received | Reflects the amount of data received by a pod over the network. | Bytes | Average |

Container Metrics

| Metric name | Description | Unit | Statistic |
| --- | --- | --- | --- |
| Container Restart Count | Tracks the number of times a container has restarted, indicating potential issues. | Count | Average |
| Container Limit Cores | Specifies the maximum CPU cores allocated to a container. | Count | Average |
| Container Request Cores | Represents the CPU cores requested by a container for its operation. | Count | Average |
| Container CPU Usage Time | Measures the total CPU time consumed by a container. | Seconds | Average |
| Container CPU Utilization | Reflects the percentage of CPU resources utilized by a container. | Percentage | Average |
| Container Memory Limit | Indicates the maximum memory allocated to a container. | Bytes | Average |
| Container Memory Request | Shows the memory requested by a container for its operation. | Bytes | Average |
| Container Memory Usage | Tracks the actual memory usage of a container. | Bytes | Average |
| Container Memory Limit Utilization | Displays the percentage of the memory limit being utilized by a container. | Percentage | Average |
| Container Memory Request Utilization | Reflects the percentage of the requested memory being utilized by a container. | Percentage | Average |
| Container Page Faults | Counts the number of memory page faults encountered by a container. | Count | Average |
| Container Ephemeral Storage | Tracks the temporary storage used by a container. | Bytes | Average |
| Container Uptime | Measures the total time a container has been running without interruption. | Seconds | Average |
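
The container utilization metrics above relate usage to a limit or a request. The sketch below shows that arithmetic under the assumption that each utilization value is simply usage divided by the limit or request, expressed as a percentage, and that CPU usage time is a cumulative counter in seconds. The function names and sample values are hypothetical and are not taken from Site24x7.

```python
from typing import Optional

MIB = 1024 * 1024


def utilization_percent(used: float, capacity: float) -> Optional[float]:
    """Usage as a percentage of a limit or request; None when no limit or request is set."""
    if capacity <= 0:
        return None
    return used / capacity * 100


def average_cores_used(cpu_seconds_start: float, cpu_seconds_end: float,
                       interval_seconds: float) -> float:
    """Average number of cores in use over an interval, from a cumulative CPU-time counter."""
    return (cpu_seconds_end - cpu_seconds_start) / interval_seconds


# Hypothetical container: 384 MiB in use, 512 MiB limit, 256 MiB request.
memory_usage = 384 * MIB
print(utilization_percent(memory_usage, 512 * MIB))  # memory limit utilization
print(utilization_percent(memory_usage, 256 * MIB))  # memory request utilization
# CPU usage time grew from 1200 s to 1230 s during a 60-second polling interval.
print(average_cores_used(1200.0, 1230.0, 60.0))
```

A request utilization above 100 percent, as in the second call, means the container is using more than it requested, which is possible as long as usage stays under the limit.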

Node Metrics

| Metric name | Description | Unit | Statistic |
| --- | --- | --- | --- |
| Node Total Cores | Represents the total number of CPU cores available on a node. | Count | Average |
| Node Allocatable Cores | Indicates the CPU cores available for workloads after reserving system resources. | Count | Average |
| Node CPU Usage Time | Measures the total CPU time consumed by a node. | Seconds | Average |
| Node CPU Allocatable Utilization | Reflects the percentage of allocatable CPU resources being utilized. | Percentage | Average |
| Node Memory Usage | Tracks the actual memory usage of a node. | Bytes | Average |
| Node Total Memory | Represents the total memory capacity of the node. | Bytes | Average |
| Node Allocatable Memory | Indicates the memory available for workloads after reserving for system processes. | Bytes | Average |
| Node Memory Allocatable Utilization | Shows the percentage of allocatable memory being utilized. | Percentage | Average |
| Node Ephemeral Storage Usage | Tracks the temporary storage used by a node. | Bytes | Average |
| Node Total Ephemeral Storage | Represents the total ephemeral storage capacity of a node. | Bytes | Average |
| Node Allocatable Ephemeral Storage | Indicates the ephemeral storage available for workloads after reserving system resources. | Bytes | Average |
| Node Total Inodes | Represents the total number of inodes available on a node. | Count | Average |
| Node Free Inodes | Tracks the number of free inodes remaining on a node. | Count | Average |
| Node Bytes Transmitted | Indicates the amount of data sent by a node over the network. | Bytes | Average |
| Node Bytes Received | Reflects the amount of data received by a node over the network. | Bytes | Average |
| Pod Bytes Transmitted | Tracks the amount of data sent by pods over the network. | Bytes | Average |
| Pod Bytes Received | Measures the amount of data received by pods over the network. | Bytes | Average |
| Pod Volume Capacity | Represents the total storage capacity allocated to pods. | Bytes | Average |
| Pod Volume Usage | Tracks the actual storage space being used by pods. | Bytes | Average |
| Pod Volume Utilization | Reflects the percentage of storage capacity being utilized by pods. | Percentage | Average |
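
Several node metrics are easier to interpret together. The sketch below derives three figures from raw node values: the memory reserved for system processes, the allocatable memory utilization, and the share of inodes in use. It assumes that allocatable utilization is usage divided by allocatable capacity, as the descriptions above suggest. The `NodeSample` class and its sample values are hypothetical.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class NodeSample:
    """One hypothetical poll of node-level values."""
    total_memory: int        # Node Total Memory (bytes)
    allocatable_memory: int  # Node Allocatable Memory (bytes)
    memory_usage: int        # Node Memory Usage (bytes)
    total_inodes: int        # Node Total Inodes
    free_inodes: int         # Node Free Inodes

    @property
    def reserved_memory(self) -> int:
        """Memory held back for system processes."""
        return self.total_memory - self.allocatable_memory

    @property
    def memory_allocatable_utilization(self) -> float:
        """Memory usage as a percentage of allocatable memory."""
        return self.memory_usage / self.allocatable_memory * 100

    @property
    def inode_usage_percent(self) -> float:
        """Inodes in use as a percentage of all inodes."""
        return (self.total_inodes - self.free_inodes) / self.total_inodes * 100


node = NodeSample(
    total_memory=16 * GIB,
    allocatable_memory=14 * GIB,
    memory_usage=7 * GIB,
    total_inodes=1_000_000,
    free_inodes=750_000,
)
print(node.reserved_memory // GIB, node.memory_allocatable_utilization, node.inode_usage_percent)
```

Watching inode usage alongside storage usage matters because a node can run out of inodes while free disk space remains.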

Configuration Details

| Metric name | Description | Unit | Statistic |
| --- | --- | --- | --- |
| Cluster Region | Specifies the geographical region where the cluster is deployed. | Value | N/A |
| Cluster Type | Indicates whether the cluster is hybrid, on-premises, or cloud-based. | Value | N/A |
| Created Time | Records the timestamp when the resource was created. | Value | N/A |
| Distribution | Represents the distribution of workloads across nodes in the cluster. | Value | N/A |
| Enable Component | Identifies the components enabled in the cluster. | Value | N/A |
| Entity Tag | Provides a unique identifier for the resource version. | Value | N/A |
| Evaluation Mode | Indicates the mode used for policy evaluation. | Value | N/A |
| Fleet Membership | Shows the cluster's membership in a fleet for unified management. | Value | N/A |
| Kubernetes Version | Displays the version of Kubernetes running on the cluster. | Value | N/A |
| Monitoring Config | Details the configuration for monitoring the cluster. | Value | N/A |
| Name | Specifies the name of the resource. | Value | N/A |
| Platform Version | Indicates the version of the platform hosting the cluster. | Value | N/A |
| Project ID | Identifies the Google Cloud project associated with the resource. | Value | N/A |
| State | Reflects the current operational state of the resource. | Value | N/A |
| Updated Time | Records the timestamp of the last update made to the resource. | Value | N/A |

Inventory

| Metric name | Description | Unit | Statistic |
| --- | --- | --- | --- |
| Monitor Licensing Category | Shows the license category of this monitor. | Value | N/A |
| Monitor Group(s) Associated | Displays the associated monitor groups. | Value | N/A |
| Threshold and Availability Profile | Shows the associated Threshold Profile. | Value | N/A |
| Notification Profile | Shows the associated Notification Profile. | Value | N/A |
| User Alert Group | Shows the associated User Alert Group. | Value | N/A |
| Monitor Creation Time | Displays the time when this monitor was created. | Value | N/A |
| Last Modified Time | Displays the time when this monitor was last modified. | Value | N/A |

Threshold configuration

  • Global configuration
    1. In the Site24x7 web client, go to the Admin section from the left navigation pane.
    2. Select Configuration Profiles from the left pane and, using the drop-down, select Threshold and Availability.
    3. Click Add Threshold Profile in the top-right corner.
    4. For the Monitor Type, select GCP Anthos Cluster.
    5. Set the threshold values for the metrics listed above.
  • Monitor-level configuration
    1. In the Site24x7 web client, go to Cloud > GCP > Anthos Cluster.
    2. Select the resource you would like to set a threshold for, then click the hamburger icon.
    3. Select Edit, which directs you to the Edit GCP Anthos Cluster Monitor page.
    4. Set the threshold values for the metrics using the Threshold and Availability option.
    5. You can also configure IT automation at the attribute level.
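
To make the idea of a threshold profile concrete, the sketch below compares polled values against per-metric thresholds. It is a conceptual illustration only: the `Threshold` type, the two severity levels, the status labels, and the sample numbers are all assumptions, and the code does not use or represent any Site24x7 API.

```python
from typing import Dict, NamedTuple


class Threshold(NamedTuple):
    """Hypothetical pair of limits for one metric."""
    trouble: float
    critical: float


# Hypothetical profile: metric name -> thresholds.
profile: Dict[str, Threshold] = {
    "Node Memory Allocatable Utilization": Threshold(trouble=80.0, critical=90.0),
    "Container Restart Count": Threshold(trouble=3, critical=10),
}


def evaluate(profile: Dict[str, Threshold], sample: Dict[str, float]) -> Dict[str, str]:
    """Classify each polled metric that has a threshold; metrics without one are skipped."""
    result = {}
    for metric, value in sample.items():
        limits = profile.get(metric)
        if limits is None:
            continue
        if value >= limits.critical:
            result[metric] = "critical"
        elif value >= limits.trouble:
            result[metric] = "trouble"
        else:
            result[metric] = "ok"
    return result


polled = {
    "Node Memory Allocatable Utilization": 85.0,
    "Container Restart Count": 12,
    "Node Free Inodes": 5000,  # no threshold configured, so it is ignored
}
print(evaluate(profile, polled))
```

A global profile corresponds to one such mapping shared by many monitors, while a monitor-level configuration corresponds to a mapping that applies to a single resource.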

IT Automation

Site24x7 provides a set of exclusive IT automation tools that automatically resolve performance degradation issues. These tools react to events proactively rather than waiting for manual intervention. The IT automation tools automate repetitive tasks and automatically remediate threshold breaches. The alarm engine continually evaluates system events for which thresholds are set and executes the mapped automation when there is a breach.

How to configure IT Automation for a monitor
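
The breach-then-remediate flow described above can be pictured as a lookup from a metric to an action. The sketch below is a simplified model and not Site24x7's alarm engine: the action functions, the `AUTOMATION_MAP` table, and the threshold values are hypothetical, and real automations would invoke scripts or service calls instead of returning strings.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical remediation actions.
def restart_workload(resource: str) -> str:
    return f"restart requested for {resource}"


def scale_out_nodes(resource: str) -> str:
    return f"scale-out requested for {resource}"


# Hypothetical mapping: metric name -> (threshold, automation to run on breach).
AUTOMATION_MAP: Dict[str, Tuple[float, Callable[[str], str]]] = {
    "Container Restart Count": (5, restart_workload),
    "Node CPU Allocatable Utilization": (90.0, scale_out_nodes),
}


def handle_poll(resource: str, metrics: Dict[str, float]) -> List[str]:
    """Run the mapped automation for every metric that exceeds its threshold."""
    results = []
    for metric, value in metrics.items():
        mapping = AUTOMATION_MAP.get(metric)
        if mapping is None:
            continue
        threshold, action = mapping
        if value > threshold:
            results.append(action(resource))
    return results


print(handle_poll("cluster-a", {
    "Container Restart Count": 8,
    "Node CPU Allocatable Utilization": 40.0,
}))
```

Only the restart action runs in this example, because the CPU utilization value stays under its threshold.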

Configuration Rules

Editing multiple monitors to associate different monitor groups or add a different tag can be a tedious process. With Site24x7's Configuration Rules, you can automate the configuration settings of your monitoring resources. Also, Site24x7 enables you to create custom rules to track configuration changes continuously and achieve the ideal configuration settings.

How to add Configuration Rules
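
Conceptually, a configuration rule pairs a matching condition with the settings to apply. The sketch below models that idea for tags and monitor groups. The `Monitor` and `Rule` classes, the matching criteria, and the sample names are hypothetical and do not reflect how Site24x7 stores or evaluates its rules.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Monitor:
    name: str
    monitor_type: str
    tags: set = field(default_factory=set)
    groups: set = field(default_factory=set)


@dataclass
class Rule:
    """Hypothetical rule: match on type and name, then add tags and groups."""
    monitor_type: str
    name_contains: str
    add_tags: frozenset = frozenset()
    add_groups: frozenset = frozenset()

    def matches(self, monitor: Monitor) -> bool:
        return (monitor.monitor_type == self.monitor_type
                and self.name_contains in monitor.name)


def apply_rules(monitors: List[Monitor], rules: List[Rule]) -> int:
    """Apply every matching rule and return how many rule applications changed a monitor."""
    changes = 0
    for monitor in monitors:
        for rule in rules:
            if not rule.matches(monitor):
                continue
            before = (len(monitor.tags), len(monitor.groups))
            monitor.tags |= rule.add_tags
            monitor.groups |= rule.add_groups
            if (len(monitor.tags), len(monitor.groups)) != before:
                changes += 1
    return changes


monitors = [
    Monitor("prod-anthos-east", "GCP Anthos Cluster"),
    Monitor("dev-anthos-west", "GCP Anthos Cluster"),
]
rules = [Rule("GCP Anthos Cluster", "prod", frozenset({"env:prod"}), frozenset({"Production"}))]
print(apply_rules(monitors, rules), monitors[0].tags, monitors[1].tags)
```

Running the same rules a second time changes nothing, which is the property that makes rule-based configuration safe to reapply as new monitors are discovered.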

Summary

The Summary tab will give you the performance data organized by time for the metrics listed above. To view the summary:

  1. Go to Cloud > GCP > Anthos Cluster.
  2. Select a resource.
  3. Click the Summary tab.

Reports

Use the reports to get in-depth data on the various parameters of your monitored resources and to improve your service performance.

To view reports for a Google Anthos Cluster resource:

  1. Go to the Reports section on the left navigation pane.
  2. Select Google Anthos Cluster from the menu on the left.
  3. You can find the Availability Summary Report, Performance Report, and Inventory Report for one selected monitor. Alternatively, you can get the Summary Report, Availability Summary Report, Health Trend Report, and Performance Report for all the Google Anthos Cluster monitors.

You can also get reports from the Summary tab of the GCP Anthos Cluster monitor:

  1. Click the Summary tab.
  2. Get the Availability Summary Report of the monitor by clicking Availability.
  3. You can also find the Performance Report of the monitor by clicking any chart title.
