Automate Server Reboot

Automate the reboot of your server, along with any specific arguments, to ensure continued server performance.

Use Case: 

Consider an e-commerce application being monitored using Site24x7's Service and Process Monitoring. Unexpectedly, the service becomes unresponsive. On analysis, it is found that the memory usage of that service exceeds 90%.

Problem Statement:

Rebooting all the servers manually in a production environment is tedious and often impractical. The process is also time-consuming, and by the time it completes, the misbehaving application may already have affected other application services and, ultimately, all the servers.

Solution:

IT personnel can set up a Strategy option in the Threshold and Availability Profile, with the Process Memory Utilization Threshold set to greater than 90% and the Poll Count set to 3. This can then be associated with a Server Reboot automation.

With this setting, the automation to reboot the server is executed only if the threshold violation still exists after three polls.
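
The poll-count behavior described above can be illustrated with a minimal Python sketch. This is not Site24x7 code: the class name PollCountStrategy, its fields, and the sample readings are hypothetical, and the sketch assumes the violating polls must be consecutive (a healthy poll resets the count).

```python
from dataclasses import dataclass


@dataclass
class PollCountStrategy:
    """Hypothetical model of a threshold with a poll count (not Site24x7 code)."""
    threshold_percent: float = 90.0  # Process Memory Utilization threshold
    poll_count: int = 3              # violating polls required before acting
    violations: int = 0              # current run of polls above the threshold

    def record_poll(self, memory_percent: float) -> bool:
        """Record one poll; return True when the automation should run."""
        if memory_percent > self.threshold_percent:
            self.violations += 1
        else:
            # Assumption: a poll at or below the threshold resets the run.
            self.violations = 0
        return self.violations >= self.poll_count


strategy = PollCountStrategy()
readings = [92.0, 95.5, 88.0, 91.0, 93.0, 97.0]  # sample memory usage per poll
decisions = [strategy.record_poll(r) for r in readings]
print(decisions)  # only the last poll completes a run of three violations
```

In this sketch the first two violations do not trigger a reboot because the third poll is healthy; only the final three readings, all above 90%, satisfy the poll count.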

Tip:

You can choose $LOCALHOST as the Destination Host when monitoring hundreds of servers. This ensures the automation is carried out on any server where there is a threshold violation. This is applicable only for server monitors (agent-based).
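
As a rough illustration of the idea behind $LOCALHOST, the following Python sketch treats it as a placeholder that stands for whichever server reported the violation. The function resolve_destinations and the host names are hypothetical and do not reflect how the Site24x7 agent resolves hosts internally.

```python
LOCALHOST_PLACEHOLDER = "$LOCALHOST"


def resolve_destinations(configured_hosts: list[str], violating_host: str) -> list[str]:
    """Replace the placeholder with the host that reported the violation.

    Hypothetical helper for illustration only; duplicates are dropped while
    the configured order is preserved.
    """
    resolved: list[str] = []
    for host in configured_hosts:
        target = violating_host if host == LOCALHOST_PLACEHOLDER else host
        if target not in resolved:
            resolved.append(target)
    return resolved


# One template with $LOCALHOST covers every monitored server:
print(resolve_destinations(["$LOCALHOST"], "web-042"))
print(resolve_destinations(["$LOCALHOST"], "db-007"))
```

The same template yields a different destination for each violating server, which is why a single automation can serve a large fleet.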

Add Automation

Supported versions: 18.4.0 & above for Windows | 16.6.0 & above for Linux 

  1. Log in to Site24x7 and go to Admin > IT Automation Templates (+). You can also navigate via Server > IT Automation Templates (+).  
  2. Select the Type of Automation as Server Reboot.
  3. Provide a Display Name for identification purposes.
  4. Select Hosts, Tags, or Monitor Groups for executing the server reboot automation.
    For example: In the above case, choose $LOCALHOST to execute the server reboot on any server where the threshold violation occurs. This is applicable only for server monitors (agent-based).
  5. Enter a Time-out period (in seconds) representing the maximum time the agent has to wait for the execution to complete. After that, a time-out error occurs. This will be captured in the email notification, if set to Yes.
    The Time-out is set at 15 seconds by default. You can define a time-out between 1 and 90 seconds.
  6. You can choose to Send an Email of the Automation Result to the user group(s) configured in the notification profile. By default, it is set to No. This email will contain parameters including the automation name, type of automation, incident reason, destination hosts, and more.
    If multiple automations are executed in one data collection, a consolidated email will be sent.
  7. Save the changes.
Once automations are added, you can schedule them to be executed one after the other.
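
To illustrate the Time-out setting from step 5, here is a minimal Python sketch of running a command with a bounded wait. The default of 15 seconds and the range of 1 to 90 seconds come from the steps above; the helper names validate_timeout and run_with_timeout are hypothetical, and a harmless sleeping process stands in for the reboot. It is not the agent's actual implementation.

```python
import subprocess
import sys

DEFAULT_TIMEOUT = 15               # seconds, the documented default
MIN_TIMEOUT, MAX_TIMEOUT = 1, 90   # documented allowed range


def validate_timeout(seconds: int = DEFAULT_TIMEOUT) -> int:
    """Return the time-out if it lies within the allowed range, else raise."""
    if not MIN_TIMEOUT <= seconds <= MAX_TIMEOUT:
        raise ValueError(
            f"time-out must be between {MIN_TIMEOUT} and {MAX_TIMEOUT} seconds"
        )
    return seconds


def run_with_timeout(command: list[str], seconds: int = DEFAULT_TIMEOUT) -> str:
    """Run a command and report 'completed' or 'timed out'."""
    try:
        subprocess.run(command, timeout=validate_timeout(seconds), check=False)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this error.
        return "timed out"
    return "completed"


# Stand-in command that sleeps for 5 seconds instead of rebooting anything.
slow_command = [sys.executable, "-c", "import time; time.sleep(5)"]
result = run_with_timeout(slow_command, seconds=1)
print(result)
```

Because the stand-in command runs longer than the 1-second limit, the sketch reports a time-out, which corresponds to the time-out error that would appear in the email notification.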

Notification Profile Settings:

Configure the following settings in the notification profile:

  • Notify Down/Trouble status after executing IT automations associated to the monitor: When set to Yes, if your monitor still faces an outage even after executing the specified action, you'll be immediately alerted about the Down/Trouble status. 
  • Suppress IT Automation of dependent monitors: When the status of the dependent resource is Down, execution of the IT automation is not performed. 
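
The two settings above can be read as simple decision rules. The Python sketch below is one interpretation, with hypothetical function names and status strings. It models only what is stated here: the suppression rule, and the alerting behavior when the first setting is Yes. What happens when that setting is No is not described in this section, so it is not modeled.

```python
def should_execute_automation(dependent_status: str, suppress_for_dependents: bool) -> bool:
    """Skip the automation when suppression is on and the dependent resource is Down."""
    return not (suppress_for_dependents and dependent_status == "Down")


def should_alert_after_automation(status_after_automation: str) -> bool:
    """Models only the 'Yes' case: alert if the monitor is still Down or in Trouble."""
    return status_after_automation in {"Down", "Trouble"}


print(should_execute_automation("Down", suppress_for_dependents=True))
print(should_execute_automation("Up", suppress_for_dependents=True))
print(should_alert_after_automation("Trouble"))
print(should_alert_after_automation("Up"))
```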

Test Automation

Once you add an automation, go to the IT Automation Summary page (Server > IT Automation Templates) and use the icon for a test run.

The test run is applied to all the hosts selected for the server reboot automation. The exception is when $LOCALHOST is selected as the only host.

Click IT Automation Logs to view the list of automations executed by date.

Map Automation

For an automation to be executed, map it with the desired event. This can be done in two ways:

