Help Server Metrics Root Cause Analysis

Root Cause Analysis Report (RCA)

Whenever downtime is detected, a Root Cause Analysis (RCA) report is generated and sent to the user through the configured alerting contact and medium. The RCA gives the actual reason behind the downtime, along with a trace route map to help diagnose connectivity issues.

For example, suppose a server crashes because of high process usage. Site24x7 declares the monitor as Down and sends an RCA to the user. The server monitoring agent collects the top processes by CPU and memory, along with other events recorded before the server crashed, and presents them in the RCA report. This helps speed up troubleshooting and prevent similar performance degradation issues in the future.
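The idea of ranking processes by their recent average usage can be illustrated with a minimal sketch. This is not the Site24x7 agent's implementation; the function name `top_processes`, the sample layout, and the one-minute sampling interval are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def top_processes(samples, metric, limit=5):
    """Rank processes by their average `metric` value across the samples.

    `samples` is a list of snapshots; each snapshot maps a process name to
    a dict of usage percentages (hypothetical layout, not an agent format).
    A process missing from a snapshot is averaged only over the snapshots
    in which it appears.
    """
    readings = defaultdict(list)
    for snapshot in samples:
        for name, usage in snapshot.items():
            readings[name].append(usage[metric])
    averages = {name: mean(values) for name, values in readings.items()}
    # Highest average first, truncated to the requested number of entries.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:limit]


# Made-up snapshots, assumed to be taken once a minute before a crash.
samples = [
    {"java": {"cpu": 80.0, "mem": 30.0}, "nginx": {"cpu": 5.0, "mem": 2.0}, "backup": {"cpu": 10.0, "mem": 40.0}},
    {"java": {"cpu": 90.0, "mem": 35.0}, "nginx": {"cpu": 5.0, "mem": 2.0}, "backup": {"cpu": 20.0, "mem": 40.0}},
    {"java": {"cpu": 100.0, "mem": 40.0}, "nginx": {"cpu": 5.0, "mem": 2.0}, "backup": {"cpu": 30.0, "mem": 40.0}},
]

for name, average in top_processes(samples, "cpu", limit=2):
    print(f"{name}: {average:.1f}% CPU on average")
```

Ranking by an average rather than by the last reading keeps a single short spike from hiding a process that was busy throughout the window.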

The components of an RCA report for Windows and Linux servers are described below:

RCA for a Windows Server:

The components generated in an RCA report when downtime is detected on a Windows server are as follows:

  1. Monitor Details: Basic monitor details, including monitor name, type, IP address, host name, and downtime duration, are listed
    RCA report for Windows 01 
  2. Top Processes by CPU (includes Last 5 minutes average): Graphical representation of the processes consuming the most CPU. A second graph shows the top processes by CPU usage over the last 5 minutes
  3. Top Processes by Memory (includes Last 5 minutes average): Graphical representation of the processes consuming the most memory. A second graph shows the top processes by memory usage over the last 5 minutes
    RCA report for Windows 02 
  4. Disk Details: Lists the disks with their total size and available free space
  5. Hard Disk Status: The size of each hard disk, its current status, and a description of any error that occurred on it are given
    RCA report for Windows 03 
  6. Trace Route: To include trace route analysis in the RCA, the user has to allow firewall access so that a trace route to the plus.site24x7.com domain can be taken. Enabling this lets the user drill down to the actual reason behind connectivity issues and take corrective action as early as possible
    Traceroute for Windows
  7. Event Logs: The type of each event log (Warning, Error, Audit Failure, Critical), its description, the time at which it was written, and its source are recorded
    RCA report for Windows 04 
  8. CPU Fan Status: Current status of the CPU Fan
  9. Logged in Users: The active users on the server are counted and categorized
  10. Software installed in the last 30 days: Software installed on your server in the last 30 days is tabulated
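To show how a trace route helps narrow down a connectivity problem, here is a small sketch that scans trace route output for hops where every probe timed out. It is a simplified illustration, not the parser used in the RCA report: the function name `silent_hops` is hypothetical, the sample output uses documentation-only addresses, and real `traceroute` or `tracert` output has more variations than this handles.

```python
import re

# A hop line starts with the hop number, followed by the probe results.
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")


def silent_hops(trace_output):
    """Return the hop numbers at which no probe received a reply."""
    hops = []
    for line in trace_output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header, footer, or blank line
        hop, rest = int(match.group(1)), match.group(2)
        # Timed-out probes are printed as "*"; answered probes report a time in ms.
        if "*" in rest and " ms" not in rest:
            hops.append(hop)
    return hops


# Made-up sample output for illustration.
sample = """traceroute to example.test (192.0.2.10), 30 hops max, 60 byte packets
 1  192.0.2.1  0.512 ms  0.498 ms  0.470 ms
 2  198.51.100.1  4.102 ms  4.087 ms  4.051 ms
 3  * * *
 4  * * *
"""

print(silent_hops(sample))
```

A silent hop in the middle of a path is not necessarily a fault, because some routers simply do not answer probes. A run of silent hops that continues to the end of the trace is a stronger hint about where connectivity stops.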

RCA for a Linux Server:

The components generated in an RCA report when downtime is detected on a Linux server are as follows:

  1. Monitor Details: Basic monitor details, including monitor name, IP address, host name, reason for the downtime, and downtime duration, are listed
  2. Top Processes by CPU (includes Last 5 minutes average): Graphical representation of the processes consuming the most CPU. A second graph shows the top processes by CPU usage over the last 5 minutes
  3. Top Processes by Memory (includes Last 5 minutes average): Graphical representation of the processes consuming the most memory. A second graph shows the top processes by memory usage over the last 5 minutes
    RCA report for Linux 01
  4. CPU Utilization: Load percentage, context switches per second, and interrupts per second are tabulated
  5. Disk Utilization: Lists the disks with their total size and available free space
  6. Memory Stats: Memory metrics, including total, used, free, buffer free/used, and total virtual free/used, are listed
  7. Network Details: Packets sent/received, the status of the network connection, and transmitted and received traffic are specified
    RCA report for Linux 02
  8. Trace Route: To include trace route analysis in the RCA, the user has to allow firewall access so that a trace route to the plus.site24x7.com domain can be taken. Enabling this lets the user drill down to the actual reason behind connectivity issues and take corrective action as early as possible
    Traceroute for Linux
  9. User Sessions: The active users on the server are counted and categorized
  10. Disk Errors: Disk errors reported by the kernel, including I/O errors and file system errors
  11. Driver Messages: Error messages from the kernel are listed here
  12. Syslogs: The process ID of each syslog entry, its error message, the formatted time, and the severity level are stated
    RCA report for Linux 03
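As an illustration of where memory figures like those in the Memory Stats component can come from on Linux, the sketch below parses text in the `/proc/meminfo` format. It is only an assumption that such a source is relevant here; the document does not say how the agent gathers these values, and the function name `parse_meminfo` and the sample numbers are made up.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = value.split()
        if fields and fields[0].isdigit():
            stats[key.strip()] = int(fields[0])
    return stats


# Made-up sample in the /proc/meminfo layout.
sample_meminfo = """MemTotal:        8000000 kB
MemFree:         2000000 kB
Buffers:          500000 kB
SwapTotal:       4000000 kB
SwapFree:        3000000 kB
"""

stats = parse_meminfo(sample_meminfo)
swap_used_kb = stats["SwapTotal"] - stats["SwapFree"]
print(f"Total: {stats['MemTotal']} kB, free: {stats['MemFree']} kB, swap used: {swap_used_kb} kB")
```

On a real Linux host the same function could be fed the contents of `/proc/meminfo`, for example `parse_meminfo(open("/proc/meminfo").read())`.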
