Linux Performance Monitoring – vmstat

Monitoring Linux server performance with the ‘vmstat’ command is one of the oldest and most valuable ways to capture memory statistics. ‘vmstat’ stands for ‘Virtual Memory Statistics’, and the utility provides a low-overhead view of system performance. Because of that low overhead, it is practical to keep ‘vmstat’ running on a console even on a very heavily loaded server, where you need to check the health of the system at a glance.

The utility runs in two modes: average mode and sample mode. Average mode reports values averaged since boot. Sample mode measures statistics over a specified interval and is the most useful for understanding performance under a sustained load. By selecting a sampling period, users can observe system activity in near real time.

Command Syntax:

$ vmstat

Output

Figure 01

The output has information about processes, memory, paging, block IO, traps, disks and cpu activity.

Field Description

  • Procs
    • r: The number of runnable processes (running or waiting for run time).
    • b: The number of processes blocked waiting for I/O to complete.
  • Memory
    • swpd: The amount of virtual memory used.
    • free: The amount of idle memory.
    • buff: The amount of memory used as buffers.
    • cache: The amount of memory used as a cache.
    • inact: The amount of inactive memory. (-a option)
    • active: The amount of active memory. (-a option)
  • Swap
    • si: The amount of memory swapped in from disk (/s).
    • so: The amount of memory swapped to disk (/s).
  • IO
    • bi: Blocks received from a block device (blocks/s). This value rises when read operations occur.
    • bo: Blocks sent to a block device (blocks/s). This value rises when write operations occur.
  • System
    • in: The number of interrupts per second, including the clock.
    • cs: The number of context switches per second.
  • CPU: These are percentages of total CPU time.
    • us: Time spent running non-kernel code. (user time, including nice time)
    • sy: Time spent running kernel code. (system time)
    • id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
    • wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
    • st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.

Options

-a: Display active and inactive memory. Active memory is memory that has been used recently. Inactive memory is memory that has not been used recently and is therefore the first candidate to be reclaimed.

Figure 02

-f: Display the number of forks since boot. A fork is the creation of a new process from an existing running one.

Figure 03

-n: Display the header only once rather than periodically.

Figure 04

-s: Display a table of various event counters and memory statistics.

Figure 05

-d: Report disk statistics.

Figure 06

-D: Report some summary statistics about disk activity.

Figure 07

-t: Append a timestamp to each line.

Figure 08

[Delay Value]: The delay between updates, in seconds. If no delay is specified, only one report is printed, with the average values since boot. In the example below, the delay between outputs is 5 seconds.

Figure 09

[Count Value]: The number of reports to print. If a delay is given without a count value, vmstat keeps printing reports until the command is killed. In the example below, the count value is 2, so only 2 reports are displayed.

Figure 10

These are the important options related to the vmstat command.

How to capture vmstat during a load test?

Before starting the load test, run the command below:

$ nohup vmstat [Delay Value] [Count Value] > [File Name] &

Example: If you want to run a load test for 1 hour and capture the server statistics in the vmstatLoadTest.dat file every 10 seconds, the Count Value is calculated as follows:
=> 1 hour = 3600 seconds
=> 3600/10 = 360 // Count Value

The command will then be:

$ nohup vmstat 10 360 > vmstatLoadTest.dat &
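The same arithmetic can be scripted so that the Count Value does not have to be worked out by hand. The following is a minimal sketch, not part of vmstat itself: the function name vmstat_count and its optional third argument (extra seconds to record around the test) are hypothetical. It only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: number of vmstat samples needed to cover a test.
#   $1 = test duration in seconds
#   $2 = sampling interval in seconds (the Delay Value)
#   $3 = optional extra seconds to record around the test (default 0)
vmstat_count() {
    duration=$1
    interval=$2
    padding=${3:-0}
    # Round up so that a final partial interval is still captured.
    echo $(( (duration + padding + interval - 1) / interval ))
}

# 1 hour sampled every 10 seconds, as in the example above.
count=$(vmstat_count 3600 10)
echo "nohup vmstat 10 $count > vmstatLoadTest.dat &"
```

Calling vmstat_count 3600 10 300 would add five minutes of extra samples on top of the one-hour test.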

After the test completes, open the vmstatLoadTest.dat file and analyse the results.

Note: Use -t with the command to include the timestamp.

How to analyse the vmstat file?

You can simply use MS Excel to analyse the vmstat file. Use the ‘Text to Columns’ option to split the stats into columns, and then plot graphs by selecting the required columns.
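As an alternative to splitting the columns by hand, the capture file can be converted to comma-separated values before it is opened in a spreadsheet. The sketch below assumes the default vmstat layout (a banner line starting with "procs", followed by a column-name line starting with "r"); the function name vmstat_to_csv is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a vmstat capture to comma-separated values.
# Reads the files given as arguments, or standard input if none are given.
vmstat_to_csv() {
    awk -v OFS=, '
        # Skip blank lines and the group banner ("procs ---memory--- ...").
        NF == 0 || $1 ~ /^procs/ { next }
        # Keep only the first column-name line; vmstat repeats it periodically.
        $1 == "r" { if (seen_header++) next }
        # Reassigning a field makes awk rebuild the line using OFS.
        { $1 = $1; print }
    ' "$@"
}

# Usage (assumes the capture file from the load test exists):
#   vmstat_to_csv vmstatLoadTest.dat > vmstatLoadTest.csv
```

If the capture was taken with -t, the date and time are likely to arrive as two separate fields, so the header row may end up one column shorter than the data rows.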

  1. The ‘us’ value should be less than 70% (in ideal conditions).
  2. If the ‘wa’ value increases, the CPU is having to wait for the IO resource.
  3. The ‘si’ and ‘so’ values increase when the system swaps memory pages in from, or out to, the swap space on the hard disk in order to free up RAM.
  4. The ‘bi’ and ‘bo’ values change when read and write operations take place.
  5. If the user CPU percentage (us) is high but the number of context switches (cs) does not increase significantly, this could suggest that a single-threaded application used a large amount of processor time for a short period.
  6. If the number of interrupts (in) is high and the number of context switches (cs) is low, a single process is probably making requests to hardware devices. A constantly high user (us) time supports this: with few context switches, the process gets onto the processor and stays there.
  7. If the number of context switches is much higher than the number of interrupts, the kernel is spending considerable time switching between threads.
  8. A high volume of context switches can cause an unhealthy balance of CPU utilization. This is likely the case when the wait on IO percentage (wa) is extremely high and the user percentage (us) is extremely low.
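The first and third checks above can be sketched as a small filter over the capture file. This is an illustration only: the function name vmstat_flag and the US_MAX variable are hypothetical, and the 70% threshold is simply the rule of thumb from the list. Columns are looked up by name from the header line rather than by fixed position.

```shell
#!/bin/sh
# Hypothetical helper: report samples that break two of the rules of thumb.
# US_MAX (default 70) is an assumed, adjustable threshold for user CPU time.
vmstat_flag() {
    awk -v us_max="${US_MAX:-70}" '
        # Column-name line: remember the position of every field name.
        $1 == "r" && $2 == "b" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Data rows start with a number; ignore rows seen before any header.
        $1 ~ /^[0-9]+$/ && ("us" in col) {
            sample++
            if ($col["us"] >= us_max)
                printf "sample %d: us=%d%% at or above %d%%\n", sample, $col["us"], us_max
            if ($col["si"] > 0 || $col["so"] > 0)
                printf "sample %d: swapping (si=%d so=%d)\n", sample, $col["si"], $col["so"]
        }
    ' "$@"
}

# Usage (assumes the capture file from the load test exists):
#   vmstat_flag vmstatLoadTest.dat
```

The first row of a capture holds the averages since boot, so a flag on sample 1 may say little about the test itself.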

Important Points

  1. vmstat does not require special permissions.
  2. The ‘vmstat’ command is part of the ‘procps’ (procps-ng) package, not ‘sysstat’. ‘sysstat’ is a separate set of system monitoring tools, such as iostat, mpstat and sar, that generate CPU and device statistics and reports.
  3. Add some extra time when calculating the Count Value, so that you capture Linux server monitoring stats before, during and after the load test.
