Important Counters for Performance Testing

What are Performance Counters?

In performance testing, performance counters are metrics used to measure the performance of hardware, software, and networks. They track the behavior of individual components of a software system, such as processors, memory, disks, networks, and application components.

Uses of Performance Counters

The operating system or the application collects performance counters to report how a system is performing. Using these counters, a performance tester or engineer can answer questions such as:

How quickly is a system or server responding to requests?
Is a system or server busy or idle?
How much memory is being used?
How often is it accessing the disk?
What is causing delays?

In addition, performance counters are important to identify bottlenecks, diagnose problems, and optimize system performance while conducting performance testing. We can also use these counters to measure the effectiveness of system implementations and to compare different systems for performance. By providing detailed information about a system’s performance, performance counters can help administrators make informed decisions about system upgrades and optimization.

Categories of Performance Counters

In performance testing, performance counters fall into several categories: system metrics, application metrics, network metrics, and disk metrics.

System metrics measure the performance of the system as a whole, such as processor utilization, memory usage, disk I/O, and network bandwidth.
Application metrics measure the performance of a specific application, such as the response time of a web page, the number of requests served, or the amount of data processed.
Network metrics measure the performance of a network, such as the number of packets sent or received, the round-trip time of a packet, or the packet loss rate.
Disk metrics measure the performance of a disk, such as the number of read/write operations, the amount of data read or written, or the read or write latency.

How to collect?

Performance counters can be collected either in real time or from logs. The choice also depends on the environment: a tester typically collects real-time counters while running performance tests in an internal environment, whereas a production environment can use either real-time or log-based counters.

Real-time performance counters are collected continuously and can be used to track system changes over time.
Log-based performance counters are collected at specific points in time, such as when an application starts up or when a task is completed.
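
The two collection styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: read_counter stands for whatever function returns the current counter value on your platform, and the names poll_counter and snapshot_on_event are hypothetical, not part of any monitoring tool.

```python
import time
from typing import Callable, Dict, List, Tuple


def poll_counter(
    read_counter: Callable[[], float],
    count: int,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[Tuple[float, float]]:
    """Real-time style: read the counter `count` times, `interval_s` apart.

    Returns (seconds since polling started, value) pairs.
    """
    samples = []
    start = clock()
    for i in range(count):
        samples.append((clock() - start, read_counter()))
        if i < count - 1:  # no need to wait after the last sample
            sleep(interval_s)
    return samples


def snapshot_on_event(
    event: str,
    read_counter: Callable[[], float],
    clock: Callable[[], float] = time.time,
) -> Dict[str, object]:
    """Log-based style: record one value when a specific event happens."""
    return {"event": event, "time": clock(), "value": read_counter()}
```

For example, poll_counter(read_cpu, count=60, interval_s=1.0) would gather one minute of samples, while snapshot_on_event("task_completed", read_cpu) would record a single value at the moment a task finishes; read_cpu is an assumed helper you would supply.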

Common Performance Counters

CPU Usage: This counter measures the percentage of the total CPU time used by all processes running on the server.
Memory Usage: This counter measures the amount of physical memory being used by the server.
Disk Reads/sec: This counter measures the number of disk read operations per second.
Disk Writes/sec: This counter measures the number of disk write operations per second.
Network Bytes/sec: This counter measures the amount of data sent and received over the network by the server per second.
Disk Queue Length: This counter measures the number of read/write requests that are waiting to be processed by the disk.
Processes/sec: This counter measures the number of new processes created per second.
Context Switches/sec: This counter measures the number of context switches per second.
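
Many of the counters above are rates or percentages derived from two readings of a running total. The sketch below shows that arithmetic, assuming the underlying source exposes cumulative totals (for example, total disk reads or CPU ticks since boot). The function names are hypothetical and chosen only for illustration.

```python
def per_second_rate(prev_count: float, curr_count: float, elapsed_s: float) -> float:
    """Turn two readings of a cumulative counter into a per-second rate.

    Applies to counters such as Disk Reads/sec or Context Switches/sec.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (curr_count - prev_count) / elapsed_s


def cpu_usage_percent(
    prev_busy: float, prev_total: float, curr_busy: float, curr_total: float
) -> float:
    """Percentage of CPU time spent busy between two readings.

    `busy` and `total` are cumulative CPU time totals (assumed inputs).
    """
    total_delta = curr_total - prev_total
    if total_delta <= 0:
        return 0.0  # no time elapsed between readings
    return 100.0 * (curr_busy - prev_busy) / total_delta


# 600 additional reads over 2 seconds
print(per_second_rate(1000, 1600, 2.0))         # 300.0
# 150 busy ticks out of 200 elapsed ticks
print(cpu_usage_percent(200, 1000, 350, 1200))  # 75.0
```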

Common Bottlenecks

CPU Usage:

High CPU Usage due to heavy load from multiple processes: This is a common bottleneck and can be identified by tracking the CPU usage of each process to see which one is using the most resources.
High CPU Usage due to an inefficient algorithm or software design: This can be identified by analyzing the code or algorithm to identify any inefficient processes that are causing an increase in CPU usage.
High CPU Usage due to large data sets: This can be identified by analyzing the data sets to see if any queries or operations are taking longer than usual to complete due to the size of the data set.
High CPU Usage due to insufficient hardware resources: This can be identified by comparing the hardware configuration to the expected workload and determining if additional resources, such as memory or CPU, are needed.
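
As a small illustration of tracking per-process CPU usage, the sketch below ranks processes by their CPU share. It assumes the per-process percentages have already been collected by some monitoring tool; the process names and values are made up for the example.

```python
from typing import Dict, List, Tuple


def top_cpu_consumers(cpu_by_process: Dict[str, float], n: int = 3) -> List[Tuple[str, float]]:
    """Return the n processes with the highest CPU percentage, highest first."""
    ranked = sorted(cpu_by_process.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical snapshot of per-process CPU usage (percent)
snapshot = {"web": 62.5, "db": 21.0, "cron": 1.5, "backup": 9.0}
print(top_cpu_consumers(snapshot, n=2))  # [('web', 62.5), ('db', 21.0)]
```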

Memory Usage:

Low available RAM: This can happen when too many applications are running at the same time and competing for the same RAM resources.
Poorly optimized applications that use too much RAM.
Fragmented Memory: It happens when the system has been running for a long time and memory pages are scattered across too many locations.
Memory Leakage: It occurs when an application does not properly release memory when it is no longer needed.
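
One rough way to spot a possible memory leak is to check whether memory usage keeps growing across evenly spaced samples taken under steady load. The sketch below fits a least-squares line through the samples and flags sustained growth. The threshold of 1 MB per sample is an arbitrary assumption, and a rising trend is only a hint that needs confirmation with a profiler.

```python
from typing import Sequence


def memory_growth_per_sample(samples_mb: Sequence[float]) -> float:
    """Least-squares slope of memory usage over evenly spaced samples."""
    n = len(samples_mb)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples_mb) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples_mb))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def looks_like_leak(samples_mb: Sequence[float], min_growth_mb: float = 1.0) -> bool:
    """Flag steady growth; min_growth_mb is an assumed, tunable threshold."""
    return memory_growth_per_sample(samples_mb) >= min_growth_mb


print(looks_like_leak([100, 102, 104, 106, 108]))  # True: grows 2 MB per sample
print(looks_like_leak([100, 101, 100, 101, 100]))  # False: fluctuates, no trend
```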

Network Usage:

Low Bandwidth: This is when the bandwidth of the network connection is too low to handle the amount of data that is being sent or received.
Congestion: This is when there is too much data or too many users on the network, causing it to slow down.
Poorly Configured Routers: Routers that are not properly configured can cause a bottleneck by limiting the amount of data that can be sent or received.
Outdated Hardware: Old or outdated network hardware can also cause bottlenecks by not being able to process the amount of data being sent or received.

Disk Write Usage:

Speed: Slow disk write speeds, such as when the disk is nearing full capacity.
Concurrency: Too many applications trying to write to the disk at the same time.
Capacity: Insufficient RAM or CPU resources to buffer and process write requests.
Background Tasks: Defragmentation or disk cleanup operations running in the background.

Context Switches:

High frequency of context switches from user space to kernel space: This could indicate that the system is spending too much time in kernel space processing system calls.
High frequency of context switches between two or more processes: This could indicate contention between processes for resources.
High frequency of context switches caused by interrupts: This could indicate that hardware interrupts are consuming too much CPU time.

Bottlenecks can also be identified by comparing two different counters and drawing inferences from the combination. Common examples are listed below.

High CPU Usage vs Low Memory Usage: If the current CPU utilization is high but the memory utilization is low, then this can indicate that the system is being bottlenecked by the CPU.
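High Disk Throughput vs High Latency: If the current disk throughput is high and the disk latency is also high or rising, then this can indicate that the system is being bottlenecked by disk I/O. High throughput with low latency generally means the disk is keeping up with the load.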
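
A comparison of two counters can be expressed as a simple rule. The sketch below covers the CPU-versus-memory case by averaging samples taken over the same test window. The 80% and 50% thresholds are illustrative assumptions, not standard values, and should be tuned for the system under test.

```python
from typing import Sequence


def average(values: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence of samples."""
    if not values:
        raise ValueError("need at least one sample")
    return sum(values) / len(values)


def is_cpu_bound(
    cpu_percent: Sequence[float],
    memory_percent: Sequence[float],
    cpu_high: float = 80.0,    # assumed threshold for "high" CPU
    memory_low: float = 50.0,  # assumed threshold for "low" memory
) -> bool:
    """True when average CPU is high while average memory stays low."""
    return average(cpu_percent) >= cpu_high and average(memory_percent) <= memory_low


print(is_cpu_bound([92, 95, 90], [35, 36, 34]))  # True: CPU is the likely limit
print(is_cpu_bound([40, 45, 50], [35, 36, 34]))  # False: CPU has headroom
```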