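Our server had been suffering from overload. Having ruled out the CPU as the bottleneck, the next thing to investigate was IO.
Running iostat -x 5 showed that IO really was heavy: %util stayed at around 100%, which points to a problem. But which process was responsible?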
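To avoid reading the %util column by eye, the iostat output can be filtered. The following is only a rough sketch: the function name busy_devices and the default threshold of 90 are my own assumptions, and it relies on the header row starting with "Device" and containing a "%util" column.

```shell
# Print "device %util" for every device at or above a threshold (default 90).
# Reads `iostat -x` output on stdin. The %util column is located by its
# header name, because the column layout differs between sysstat versions.
busy_devices() {
    awk -v limit="${1:-90}" '
        /^Device/ {
            # Header row: remember which field holds %util.
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        # Data rows: only lines that have enough fields to reach that column.
        col && NF >= col && $col + 0 >= limit + 0 { print $1, $col }
    '
}

# Usage: two 5-second samples, keeping only devices at 90% or more
#   iostat -x 5 2 | busy_devices 90
```

Note that the first report iostat prints is an average since boot, so the later samples are the interesting ones.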
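I found two tools:
iotop is written in Python and requires Python 2.5 or later and a Linux kernel of 2.6.20 or later. My CentOS box only runs kernel 2.6.18, so I could not use it.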
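A quick way to check the kernel requirement before installing anything is to compare the output of uname -r with 2.6.20. This is a minimal sketch; version_ge is a hypothetical helper name, and it only compares the first three numeric fields of the version string.

```shell
# Succeed if dotted version $1 is >= $2, comparing the first three
# numeric fields. Done in awk because `sort -V` is not available in
# older coreutils releases.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, /[^0-9]+/)
        split(b, y, /[^0-9]+/)
        for (i = 1; i <= 3; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0   # all three fields are equal
    }'
}

# uname -r prints something like 2.6.18-<build>; the suffix is ignored.
if version_ge "$(uname -r)" 2.6.20; then
    echo "kernel is new enough for iotop"
else
    echo "kernel is too old for iotop, use dstat instead"
fi
```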
dstat, on the other hand, works well.
Download it:
wget http://dstat.sourcearchive.com/downloads/0.7.2-2/dstat_0.7.2.orig.tar.gz
It can be used as soon as it is unpacked.
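The unpacking step might look like the sketch below. The name of the directory inside the tarball is not something I want to guess, so the helper reads it from the archive. unpack is a hypothetical function name, and the sketch assumes the archive has a single top-level directory.

```shell
# Unpack a gzip-compressed tarball into the current directory and print
# the name of the top-level directory found in the archive.
# `unpack` is a hypothetical helper, not part of dstat.
unpack() {
    tar -xzf "$1" || return 1
    # The first archive entry is normally the top-level directory.
    tar -tzf "$1" | head -n 1 | cut -d/ -f1
}

# Usage, after the wget above:
#   dir=$(unpack dstat_0.7.2.orig.tar.gz)
#   cd "$dir" && ./dstat --list
```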
First, check which modules it includes with this command:
./dstat --list
internal:
aio, cpu, cpu24, disk, disk24, disk24old, epoch, fs, int, int24, io, ipc, load, lock, mem, net, page, page24, proc, raw, socket,
swap, swapold, sys, tcp, time, udp, unix, vm
/usr/sky/soft/dstat/dstat/plugins:
battery, battery-remain, cpufreq, dbus, disk-tps, disk-util, dstat, dstat-cpu, dstat-ctxt, dstat-mem, fan, freespace, gpfs, gpfs-ops, helloworld, innodb-buffer, innodb-io, innodb-ops, lustre, memcache-hits, mysql-io, mysql-keys, mysql5-cmds, mysql5-conn,
mysql5-io, mysql5-keys, net-packets, nfs3, nfs3-ops, nfsd3, nfsd3-ops, ntp, postfix, power, proc-count, qmail, rpc, rpcd, sendmail, snooze, squid, test, thermal, top-bio, top-bio-adv, top-childwait, top-cpu, top-cpu-adv, top-cputime, top-cputime-avg, top-int, top-io, top-io-adv, top-latency, top-latency-avg, top-mem, top-oom, utmp, vm-memctl, vmk-hba, vmk-int, vmk-nic, vz-cpu, vz-io, vz-ubc, wifi
That is a long list.
Here we only need top-io and top-bio:
dstat --top-io -d --top-bio -l
i/o process            | read  writ|block i/o process      |  1m   5m  15m
nginx: work 1407k 1718k|    0 1980k|syslogd         0  200k|4.99 4.97 5.04
nginx: work  603k  878k|    0 1644k|syslogd         0  212k|4.99 4.97 5.04
java          14M   14M|    0 1876k|java            0   14M|5.55 5.09 5.08
java          15M   15M|    0   18M|java            0   15M|5.55 5.09 5.08
java          15M   16M|    0   23M|java            0   16M|5.55 5.09 5.08
java        4216k 4272k|    0   13M|java            0 4272k|5.55 5.09 5.08
java          15M   16M|    0   12M|java            0   16M|5.55 5.09 5.08
java          17M   17M|    0   16M|java            0   17M|5.59 5.10 5.08
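This shows that the java process is the culprit, but finding out which PID it is took some more digging through the documentation.
The top-io-adv plugin (dstat --top-io-adv) also prints the PID of the process: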
./dstat -t --top-io-adv -d -l
09-02 14:32:00|java                  13863   19M   19M 2.2%|    0   15M|6.34 6.05 6.02
09-02 14:32:01|java                  13863   16M   18M 3.2%|    0   20M|6.34 6.05 6.02
09-02 14:32:02|java                  13863   20M   16M 3.4%|    0   13M|6.34 6.05 6.02
09-02 14:32:03|java                  13863   14M   15M 5.9%|    0 1724k|6.34 6.05 6.02
09-02 14:32:04|java                  13863   17M   18M 2.7%|    0   30M|6.34 6.05 6.02
09-02 14:32:05|java                  13863   14M   14M 2.0%|    0   21M|6.32 6.05 6.02
09-02 14:32:06|java                  13863 6840k 3726k 0.8%|    0   10M|6.32 6.05 6.02
09-02 14:32:07|nginx: worker process  1538 1120k 1490k 0.2%|    0 1748k|6.32 6.05 6.02
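With the PID in hand (13863 here), one possible next step is to see which files that process has open. The sketch below is my own addition, not something dstat provides: open_files is a hypothetical helper that walks /proc/<pid>/fd, and for a process owned by another user it most likely needs to be run as root.

```shell
# List the regular files a process currently has open, based on the
# symlinks in /proc/<pid>/fd. Sockets, pipes and devices are skipped.
open_files() {
    for fd in /proc/"$1"/fd/*; do
        target=$(readlink "$fd") || continue
        case $target in
            /*) [ -f "$target" ] && echo "$target" ;;
        esac
    done | sort -u
}

# Usage with the PID reported by top-io-adv:
#   open_files 13863
```

Large log or data files in that list are a reasonable first place to look for the source of the writes.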