Do you really understand CPU utilization on multi-core CPU systems?

2012-08-31 10:55:39 / Category: IT English

When it comes to analyzing and interpreting CPU metrics, many myths, wrong practices and misconceptions prevail. In this post I will touch upon a few critical issues and try to explain the facts and best practices in detail.

A wrong practice
Fetching the system-wide CPU utilization of a server node that is a symmetric multiprocessing (SMP) system, i.e. a node with multiple cores/CPUs sharing the same memory, bus and I/O.

For most performance engineers or testers, the common practice is to fetch the system-wide CPU utilization of a multi-core server node during a load test and then start analyzing the result. Capturing only the system-wide CPU utilization with tools like LoadRunner on your UNIX/Linux server nodes is highly misleading: it can lead to a wrong interpretation of the cause of a bottleneck, and hence to wrong solutions and futile efforts to fix it.

Different software products, such as databases or middleware, have their own proprietary algorithms for dealing with multiple cores and CPUs. When you observe the system-wide CPU utilization to be under the threshold during a load test, it does not always mean that each individual core is equally loaded. Let me illustrate: if a load testing tool shows a CPU utilization of 25% and the server node has a quad-core processor, it does not mean that all cores are equally loaded with processes/threads. A single core might be 100% utilized while the other three sit 100% idle.
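
The quad-core illustration above can be checked with a few lines of arithmetic. This is only a minimal sketch: the function name `summarize_cores` and the 80% threshold are made up for the example and do not come from any monitoring tool.

```python
def summarize_cores(per_core_busy, threshold=80.0):
    """Return (system-wide average, indexes of cores above the threshold).

    per_core_busy: busy percentage of each core, e.g. [100.0, 0.0, 0.0, 0.0].
    threshold: hypothetical alert level in percent (an assumption for this sketch).
    """
    average = sum(per_core_busy) / len(per_core_busy)
    hot = [i for i, busy in enumerate(per_core_busy) if busy > threshold]
    return average, hot


# Quad-core node: one core saturated, the other three idle.
average, hot = summarize_cores([100.0, 0.0, 0.0, 0.0])
print(f"system-wide: {average:.0f}%  cores above threshold: {hot}")
```

The system-wide figure comes out at 25%, comfortably under the threshold, while core 0 is fully saturated. A node with all four cores at 25% would report the same average, which is exactly why the average alone cannot tell the two situations apart.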

How equally or proportionately the individual cores are loaded depends a lot on the underlying parallelism of the application, and on whether the software under test can perform intra-query parallelization across the cores available in the SMP node. For example, Oracle supports intra-query parallelization (i.e. splitting the work of a single, independent query across two or more cores of the SMP node), whereas the MySQL database does not support this feature. As a result, an independent MySQL query is processed on only one core, even if the utilization of that core/CPU exceeds the threshold.

Another example: while performing load tests you might have observed that a few transactions associated with slow queries mysteriously fail at the database node even though the CPU utilization is within the threshold. If you dig deeper into those slow queries and correlate them with the CPU utilization of each individual core, you might see that a few cores are maxing out at 100% while the rest are well below the threshold, which is the root cause of the problem.

Also refer:
http://oreilly.com/catalog/oraclepp/chapter/ch01.html
http://www.forsythesunsolutions.com/node/73

Tools that I recommend for measuring the utilization of individual cores/CPUs:

    prstat

(option -m or -mL): Available on the Solaris operating system; lets you measure CPU utilization on a per-thread basis.

    mpstat

Available for Linux/UNIX-based operating systems. (Note: on Linux, mpstat ships as part of the sysstat package of tools.)

    System Monitor

A GUI-oriented tool readily available on Red Hat Linux operating systems.

Also refer:
http://www.cyberciti.biz/tips/how-do-i-find-out-linux-cpu-utilization.html
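
If none of these tools is at hand on a Linux node, per-core utilization can also be derived from two snapshots of `/proc/stat`. The following is a rough sketch rather than a replacement for mpstat. The function names are invented for this example, and it assumes that time spent in iowait should count as idle (some tools report it separately).

```python
import time


def read_cpu_ticks(stat_text):
    """Parse /proc/stat text into {core_name: (busy_ticks, total_ticks)}."""
    ticks = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Keep per-core lines ("cpu0", "cpu1", ...); skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Columns: user nice system idle iowait irq softirq steal.
        # Guest columns are skipped; the kernel already counts them in user/nice.
        values = [int(v) for v in fields[1:9]]
        total = sum(values)
        # Assumption of this sketch: iowait is treated as idle time.
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        ticks[fields[0]] = (total - idle, total)
    return ticks


def per_core_utilization(before, after):
    """Utilization per core = 100% minus the share of elapsed ticks spent idle."""
    result = {}
    for cpu, (busy_after, total_after) in after.items():
        if cpu not in before:
            continue  # core appeared between snapshots; no baseline to diff against
        busy_before, total_before = before[cpu]
        elapsed = total_after - total_before
        if elapsed > 0:
            result[cpu] = 100.0 * (busy_after - busy_before) / elapsed
    return result


def sample_proc_stat(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart (Linux only)."""
    with open(path) as f:
        first = read_cpu_ticks(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = read_cpu_ticks(f.read())
    return per_core_utilization(first, second)
```

On a Linux server, calling `sample_proc_stat()` during a load test returns a dictionary such as `{"cpu0": ..., "cpu1": ...}`, which makes a single saturated core easy to spot even when the system-wide average looks healthy.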

A misconception
Getting misled by the 100% CPU utilization shown by a monitoring tool.
You may be scratching your head, or even banging it against the monitor, because you cannot find a feasible fix for the high CPU utilization observed on some of your server nodes. Your nodes might be showing 100% utilization even under an unimaginably light load.
Well, the good news is that 100% utilization doesn’t always mean the CPU is a performance bottleneck. On a UNIX or Linux operating system in particular, the CPU is a bottleneck only when the ‘r’ value (run queue) in the vmstat output exceeds the total number of CPUs in the SMP server node. For example, if r=5 but your SMP node has only 4 CPUs or cores, then there is a bottleneck for sure.
The whole idea behind queuing is that if a CPU is not busy when a thread is put into the run queue (r), the thread is immediately executed by that CPU. But if all available CPUs are busy executing threads, incoming threads have to wait in the run queue until a CPU becomes available to process them.
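
The r-versus-CPU-count rule above is easy to automate. Below is a minimal sketch that pulls the first column out of captured vmstat output and compares it with the number of CPUs. The sample text is made-up data in the usual Linux vmstat layout, and the function names are hypothetical.

```python
def run_queue_samples(vmstat_text):
    """Return the 'r' column of every data row in captured vmstat output."""
    samples = []
    seen_header = False
    for line in vmstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            seen_header = True  # column header row; data rows follow it
        elif seen_header and fields[0].isdigit():
            samples.append(int(fields[0]))
    return samples


def is_cpu_bottleneck(r_value, cpu_count):
    """Apply the rule from the text: threads are queuing once r exceeds the CPU count."""
    return r_value > cpu_count


# Made-up capture for illustration only.
SAMPLE_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 812340  10240 402112    0    0     1     2  150  300 60  5 35  0  0
 5  0      0 811900  10240 402200    0    0     0     4  420  910 97  3  0  0  0
"""

CPU_COUNT = 4  # assumed quad-core node; os.cpu_count() would report the real value

for r in run_queue_samples(SAMPLE_VMSTAT):
    verdict = "bottleneck" if is_cpu_bottleneck(r, CPU_COUNT) else "ok"
    print(f"r={r} cpus={CPU_COUNT} -> {verdict}")
```

In the second sample row the CPUs are 100% busy and r=5 exceeds the 4 available CPUs, so threads are waiting. A row showing 100% busy with r at or below 4 would not be flagged by this rule.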

That being said, the next time you see an alarmingly high value in the CPU metrics, ask yourself this question: “What is CPU utilization?”
Answer: CPU utilization = 100% - (% of time spent in the idle task)
If you have understood the above equation, you will never again misinterpret a CPU utilization metric.
When it comes to judging the severity of a CPU-related resource crunch, many engineers forget that the fundamental purpose of an operating system’s process dispatcher or scheduler is to keep CPU utilization high in times of need. High CPU utilization doesn’t always slow down the transaction rate: there are systems on the market, such as IBM System z, that are capable of working in a 100% busy CPU state by exploiting the presence of diverse workloads.
The type of workload that arrives at the CPU queue largely decides the severity of a high CPU utilization scenario. For instance, short-lived, high-priority processes triggered by hundreds of concurrent users can hurt the responsiveness of the application more than a scenario in which heterogeneous processes with different priorities arrive at the processor queue.
A CPU operates at a certain clock speed (frequency) and processes a unit of work at a certain speed. Whether the CPU is 10% utilized or 100% utilized, the processor should ideally deliver the unit of work at the same speed. But since all processes share the same physical resources, such as CPU buses, caches and CPs, the CPU time per transaction increases as CPU utilization becomes high. The speed of thread/process execution is seriously affected only when the total number of processes/threads waiting for the CPU exceeds the total number of processors available in the SMP server node. Again, do not forget to watch the “r” (run queue) value closely during your next load or performance test.

Also refer:
http://blogs.sun.com/partnertech/entry/solaris_performance_primer_process_monitoring
http://www.dba-oracle.com/t_monitoring_unix_cpu.htm

