What is CPU steal time: Site24x7

CPU steal time, also known as stolen CPU, is the percentage of time a virtual CPU within a cloud server involuntarily waits on a physical CPU for its processing time.

In a cloud environment, a hypervisor acts as an interface between the physical server and its virtualized environment. This software layer is installed on the physical hardware and manages all tasks by allocating CPU time to processes such as virtual machines (VMs), networking operations, storage I/O requests, and more.

CPU steal time occurs when a process is ready to run on the virtual CPU, but the virtual CPU is still waiting for the hypervisor to allocate a physical CPU to it. This typically happens because the hypervisor is servicing another VM.

CPU time in virtual environments

VMs share their resources—such as RAM, hard disk, CPU cycles, etc.—with other VMs on the same host in a virtual environment. If a VM is one of four identical VMs with the same load and size on a physical server, simple math suggests that each VM should be getting 25% of the CPU cycles.

In practice, however, a VM often tries to consume more than its allocated cycles. In rarer cases, it does not even receive the cycles assigned to it. Whenever the VM is ready to run but has to wait for a physical CPU, steal time increases.

Identifying and tracking CPU steal time

It’s almost impossible to avoid some amount of steal time while running a system as a VM in a cloud environment. This is because the VM is sharing the physical server with other VMs.

The VM’s steal time can be seen by running the top command on the Linux terminal:

Fig. 1: Output of top command

The top command presents the system summary along with the list of the processes or threads currently being managed by the Linux kernel.

As highlighted in the above screenshot, steal time is labeled as st or %st. A value of 20.0 st means that 20% of the CPU time in the sampling interval was spent waiting for a physical CPU to be allocated.

The maximum value st can have is 100.0. This is the worst-case scenario, where the virtual CPU does nothing but wait for the hypervisor to allocate a physical CPU. Luckily, this situation is very rare.
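The st figure can also be derived directly from the cumulative counters in /proc/stat, which is useful when scripting a check instead of watching top. The following is a minimal sketch, assuming a Linux kernel that exposes the steal field as the eighth counter on the aggregate cpu line (as documented in proc(5)). The function names are illustrative and not part of any tool mentioned above, and the result should only be expected to approximate what top displays.

```python
import time

# Counter order on the aggregate "cpu" line of /proc/stat, per proc(5).
# The trailing guest counters are skipped because they are already
# included in user and nice.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Return cumulative tick counters from a /proc/stat "cpu" line."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    if len(values) < len(FIELDS):
        raise ValueError("this kernel does not report a steal counter")
    return dict(zip(FIELDS, values))


def steal_percent(before, after):
    """Share of CPU time counted as steal between two samples, in percent."""
    total = sum(after[f] - before[f] for f in FIELDS)
    if total <= 0:
        return 0.0
    return 100.0 * (after["steal"] - before["steal"]) / total


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate cpu line (the first line of /proc/stat)."""
    with open(path) as f:
        return f.readline()


def sample_steal_percent(interval=1.0):
    """Measure steal on the local machine over `interval` seconds."""
    before = parse_cpu_line(read_cpu_line())
    time.sleep(interval)
    after = parse_cpu_line(read_cpu_line())
    return steal_percent(before, after)
```

On a Linux VM, calling sample_steal_percent(5.0) would return the steal percentage over a five-second window. The counters are cumulative since boot, so a single reading is not meaningful; only the difference between two samples is.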

Causes of CPU steal time

There are two major causes of high CPU steal time:

Processes need more than the allocated CPU

When load-heavy processes run on a VM, the CPU cycles allocated to it may not be enough to handle the workload.

Physical server is overloaded by VMs

In this case, the cloud provider has oversubscribed the physical server with VMs, so the physical CPUs cannot keep up with the combined demand.

Unfortunately, it’s difficult to tell which of these two cases applies just by looking at the steal time of a single VM. There are other factors to consider. However, if identical VMs with similar workloads are running on different hosts, comparing their steal times might reveal which case applies.
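One way to make that comparison concrete is sketched below. It is only a heuristic and rests on assumptions not stated above: that the VMs really are identical, that each runs on a different host under a similar load, and that steal percentages have already been collected for each. The function name, the 10% threshold, and the returned labels are hypothetical. The idea is that if peers on other hosts show similarly high steal, the workload probably exceeds the allocation, whereas if only one VM stands out, its host is the more likely culprit.

```python
from statistics import median

# Hypothetical labels for this sketch.
NO_ISSUE = "no significant steal"
UNDETERMINED = "undetermined"
WORKLOAD = "workload likely exceeds allocated CPU"
HOST = "host likely oversubscribed"


def likely_cause(steal_by_vm, vm, threshold=10.0):
    """Guess why `vm` sees steal by comparing it with identical peers.

    steal_by_vm maps a VM name to its steal percentage. The threshold
    is an assumed cut-off for "high" steal, not a fixed rule.
    """
    if steal_by_vm[vm] <= threshold:
        return NO_ISSUE
    peers = [pct for name, pct in steal_by_vm.items() if name != vm]
    if not peers:
        # Nothing to compare against, so the two causes cannot be separated.
        return UNDETERMINED
    if median(peers) > threshold:
        # Peers on other hosts are affected as well.
        return WORKLOAD
    # Only this VM stands out from its peers.
    return HOST
```

For example, likely_cause({"web-1": 25.0, "web-2": 2.0, "web-3": 1.5}, "web-1") points at the host, because the peers on other hosts are barely affected. The result is a hint for further investigation, not a diagnosis.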

Issues with high CPU steal time

A VM with high steal time can suffer significant problems, including:

Slow I/O
Slower page loading time
Slower database querying time
Slower processing of reports
Increased queue size of asynchronous tasks due to the inability to process them quickly
Increased infrastructure cost due to launching more servers to handle the same load

It is wise to have zero tolerance towards high steal time on a server. Stolen CPU slows machines down, even causing them to stop altogether in extreme circumstances.

When to worry

If the CPU steal time is lower than 10%, then there’s nothing to worry about, and the application should run smoothly. However, if steal time stays above 10% for around 20-30 minutes, the VM is probably running slower than expected. Steal time that remains high indicates CPU contention, which can reduce the performance of the application.
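The rule of thumb above can be expressed as a small check over a series of measurements. This is a minimal sketch, assuming steal has already been sampled at regular intervals as (timestamp in seconds, steal percentage) pairs in time order. The function name and default values are illustrative, with the defaults taken from the 10% and 20-minute figures in the text.

```python
def sustained_high_steal(samples, threshold=10.0, min_duration=20 * 60):
    """Return True if steal stayed above `threshold` for `min_duration` seconds.

    samples is an iterable of (timestamp_seconds, steal_percent) pairs
    in chronological order.
    """
    run_start = None  # timestamp where the current high-steal run began
    for timestamp, percent in samples:
        if percent > threshold:
            if run_start is None:
                run_start = timestamp
            if timestamp - run_start >= min_duration:
                return True
        else:
            # A sample at or below the threshold ends the current run.
            run_start = None
    return False
```

Requiring an unbroken run avoids alerting on short spikes, which the text treats as harmless. A stricter or looser policy (for example, tolerating a single low sample) would need a different rule.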

Fixing high CPU steal time

As discussed above, in most cases CPU steal is caused by poor allocation and insufficient resources leading to an overloaded CPU. Now let’s look at some possible fixes:

If there is a sudden increase in CPU steal time, the first possible solution is to manually terminate the VM and relaunch it on another physical server. However, if the root cause is slow or inefficient code in the application, this will only be a temporary fix.
In many cases, overselling of the physical server is the root cause of CPU steal time. In this case, server providers should limit the amount of processing power used by each VM.
If the CPU resources allocated to the VM are not enough to process the requests, increase them by scaling up either the processing time or processor cores.
If there are no financial constraints, upgrading the VM is the quickest and surest way to solve the problem at its root.

Summary

When deploying an application to a virtualized computing environment, CPU steal time is a crucial metric to watch, as it can impact the application in multiple ways. By monitoring steal time and identifying its correct cause, you can take appropriate measures to reduce it.
