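First question: should we measure the CPU load at a point in time, or over a period of time?
When drawing a line chart for a report, the x-axis is usually a point in time, so what we would like is the CPU load at that instant. That is hard to obtain, however.
The easier approach is to measure the CPU load between two points in time, that is, over an interval.
For a benchmark, make the interval very short: 1 second or even less. For routine monitoring, the interval can be widened to 1 minute or more.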
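Second question: what do we use to judge the CPU load over a given interval on Linux?
The CPU time counters are kept in a basic unit of time called the jiffy. It is a very short span of time, and its exact length depends on the system (architecture and kernel configuration).
That does not matter much, though: for working out what percentage the load has reached, it is good enough.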
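To see why the tick length does not matter for a percentage, here is a minimal Python sketch. It assumes the counters are exposed in clock ticks whose rate can be queried with os.sysconf("SC_CLK_TCK") (the Python counterpart of the C call sysconf(_SC_CLK_TCK)). The helper names are made up for illustration.

```python
import os


def clock_ticks_per_second():
    """Ask the OS how many clock ticks make up one second (Unix only)."""
    return os.sysconf("SC_CLK_TCK")


def ticks_to_seconds(ticks, hz):
    """Convert a tick count to seconds, given the tick rate hz."""
    return ticks / hz


def share_percent(part_ticks, total_ticks):
    """Share of part in total as a percentage; the tick length cancels out."""
    return part_ticks / total_ticks * 100


if __name__ == "__main__":
    hz = clock_ticks_per_second()
    print("ticks per second:", hz)
    print("250 ticks =", ticks_to_seconds(250, hz), "seconds")
    print("25 of 100 ticks =", share_percent(25, 100), "%")
```

Converting to seconds needs the tick rate, but the percentage is a ratio of two tick counts, so it comes out the same whatever the tick length is.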
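For an explanation of the /proc/stat file, see this article: http://www.linuxhowtos.org/System/procstat.htm.
Key points:
1. The values on the first line, cpu, are the totals of the per-CPU lines (cpu0, cpu1, ...) below it.
2. The meaning of each of the 7 numbers on a line: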
The meanings of the columns are as follows, from left to right:
user: normal processes executing in user mode
nice: niced processes executing in user mode
system: processes executing in kernel mode
idle: twiddling thumbs
iowait: waiting for I/O to complete
irq: servicing interrupts
softirq: servicing softirqs
For the formula, see this answer: http://stackoverflow.com/questions/3017162/how-to-get-total-cpu-usage-in-linux-c
e.g. Suppose at 14:00:00 you have
cpu 4698 591 262 8953 916 449 531
total_jiffies_1 = (sum of all values) = 16400
work_jiffies_1 = (sum of user,nice,system = the first 3 values) = 5551
and at 14:00:05 you have
cpu 4739 591 289 9961 936 449 541
total_jiffies_2 = 17506
work_jiffies_2 = 5619
So the %cpu usage over this period is:
work_over_period = work_jiffies_2 - work_jiffies_1 = 68
total_over_period = total_jiffies_2 - total_jiffies_1 = 1106
%cpu = work_over_period / total_over_period * 100 = 6.1%
This is easy to follow: the resulting fraction multiplied by 100 is the percentage.
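The calculation quoted above can be sketched in Python as follows. The function names are hypothetical, and "work" is taken to be user + nice + system, as in the quoted answer. The two sample lines are the ones from the example.

```python
def parse_cpu_line(line):
    """Split a 'cpu ...' line from /proc/stat into a list of integer counters."""
    fields = line.split()
    return [int(value) for value in fields[1:]]


def cpu_usage_percent(line_before, line_after):
    """CPU usage between two samples, counting user + nice + system as work."""
    before = parse_cpu_line(line_before)
    after = parse_cpu_line(line_after)
    work = sum(after[:3]) - sum(before[:3])
    total = sum(after) - sum(before)
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return work / total * 100


sample_1 = "cpu 4698 591 262 8953 916 449 531"  # taken at 14:00:00
sample_2 = "cpu 4739 591 289 9961 936 449 541"  # taken at 14:00:05
print(round(cpu_usage_percent(sample_1, sample_2), 1))  # prints 6.1
```

On a live system the two lines would come from reading the first line of /proc/stat twice, with a sleep of the chosen interval in between.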
On my machine there are 10 columns in total.
cat /proc/stat
cpu 2065552 1692 636745 10842974 59979 16 6860 0 0 0
cpu0 524690 552 158305 2701823 8912 7 4808 0 0 0
cpu1 511203 670 157274 2703792 31404 1 1179 0 0 0
cpu2 519169 441 155591 2720326 11179 0 438 0 0 0
cpu3 510489 27 165574 2717032 8482 7 435 0 0 0
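As a quick check of point 1 (the cpu line being the total of the per-CPU lines), the output above can be parsed and compared column by column. This is only a sketch; the function names are made up, and the labels cover just the first 7 columns.

```python
FIELD_NAMES = ["user", "nice", "system", "idle", "iowait", "irq", "softirq"]

SAMPLE = """\
cpu 2065552 1692 636745 10842974 59979 16 6860 0 0 0
cpu0 524690 552 158305 2701823 8912 7 4808 0 0 0
cpu1 511203 670 157274 2703792 31404 1 1179 0 0 0
cpu2 519169 441 155591 2720326 11179 0 438 0 0 0
cpu3 510489 27 165574 2717032 8482 7 435 0 0 0
"""


def parse_stat(text):
    """Map each cpu label ('cpu', 'cpu0', ...) to its list of counters."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu"):
            stats[fields[0]] = [int(value) for value in fields[1:]]
    return stats


def aggregate_minus_core_sum(stats):
    """Per column: aggregate 'cpu' value minus the sum over cpu0, cpu1, ..."""
    cores = [values for label, values in stats.items() if label != "cpu"]
    core_sums = [sum(column) for column in zip(*cores)]
    return [total - part for total, part in zip(stats["cpu"], core_sums)]


differences = aggregate_minus_core_sum(parse_stat(SAMPLE))
for name, diff in zip(FIELD_NAMES, differences):
    print(name, diff)
```

For this sample the aggregate line and the per-core sums differ by at most 2 ticks in any column, presumably because of rounding, so treating the first line as the total is fine for load calculations.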
Run man 5 proc, then type /proc/stat and press Enter to search for it. We can see:
/proc/stat
kernel/system statistics. Varies with architecture. Common entries
include:
cpu 3357 0 4313 1362393
The amount of time, measured in units of USER_HZ (1/100ths of a second on most architectures, use sysconf(_SC_CLK_TCK) to obtain the right value), that the system spent in user mode,
user mode with low priority (nice), system mode, and the idle task, respectively. The last value should be USER_HZ times the second entry in the uptime pseudo-file.
In Linux 2.6 this line includes three additional columns: iowait - time waiting for I/O to complete (since 2.5.41); irq - time servicing interrupts (since 2.6.0-test4); softirq - time
servicing softirqs (since 2.6.0-test4).
Since Linux 2.6.11, there is an eighth column, steal - stolen time, which is the time spent in other operating systems when running in a virtualized environment
Since Linux 2.6.24, there is a ninth column, guest, which is the time spent running a virtual CPU for guest operating systems under the control of the Linux kernel.
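Notes:
The 8th column is the time stolen by other operating systems when running in a virtualized environment.
The 9th column is the time used by guest VMs, if this machine is a host.
The 10th column shown on my machine is not covered by this excerpt; newer versions of the man page list it as guest_nice (since Linux 2.6.33), the time spent running a niced guest.
This information is useful, since many servers nowadays are really just virtual machines.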
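With the extra columns in mind, the usage calculation can be extended to show how much of an interval was stolen by the hypervisor. A minimal sketch, assuming lines with at least 8 columns. The two sample lines are invented for illustration and are not measurements, and the function names are hypothetical. It also assumes that guest time is already counted inside the user and nice columns, so columns 9 and 10 are left out of the total to avoid double counting.

```python
COLUMNS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line):
    """Return the first 8 counters of a 'cpu ...' line as a name -> value dict."""
    values = [int(value) for value in line.split()[1:]]
    if len(values) < len(COLUMNS):
        raise ValueError("expected at least 8 columns (kernel with steal time)")
    # Assumption: guest and guest_nice (columns 9 and 10) are already counted
    # in user and nice, so they are ignored here to avoid double counting.
    return dict(zip(COLUMNS, values))


def breakdown_percent(line_before, line_after):
    """Percentage of the interval spent in each of the 8 states."""
    before = parse_cpu_line(line_before)
    after = parse_cpu_line(line_after)
    deltas = {name: after[name] - before[name] for name in COLUMNS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: delta / total * 100 for name, delta in deltas.items()}


# Invented sample lines, not real measurements.
before = "cpu 1000 0 500 8000 100 0 50 350 0 0"
after = "cpu 1100 0 550 8700 120 0 60 470 0 0"
result = breakdown_percent(before, after)
print("steal: %.1f%%" % result["steal"])
print("idle: %.1f%%" % result["idle"])
```

On a VM, a consistently high steal share suggests the host is busy with other guests, which the plain user + nice + system figure would not reveal.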