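In this section:
A Linux shell script that counts how many times each IP address appears in a log file
Counting with awk
This came up in an interview at Qunar: when the data set is not large, awk is the most convenient tool, but I had not used it for a long time and had forgotten how awk arrays work.
Assume the data is in the following format: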
178.60.128.31 www.google.com.hk
193.192.250.158 www.google.com
210.242.125.35 adwords.google.com
210.242.125.35 accounts.google.com.hk
210.242.125.35 accounts.google.com
210.242.125.35 accounts.l.google.com
64.233.181.49 www.google.com
212.188.10.167 www.google.com
23.239.5.106 www.google.com
64.233.168.41 www.google.com
62.1.38.89 www.google.com
62.1.38.89 chrome.google.com
193.192.250.172 www.google.com
212.188.10.241 www.google.com
37.228.69.57 www.google.com
222.255.120.42 www.google.com
222.255.120.42 www.gstatic.com
212.188.10.167 www.googleapis.com
64.233.181.49 www.googleapis.com
64.233.181.49 fonts.googleapis.com
193.192.250.158 plus.google.com
193.192.250.158 talkgadget.google.com
193.192.250.158 ssl.gstatic.com
193.192.250.158 images-pos-opensocial.googleusercontent.com
193.192.250.158 images1-focus-opensocial.googleusercontent.com
193.192.250.158 images2-focus-opensocial.googleusercontent.com
193.192.250.158 images3-focus-opensocial.googleusercontent.com
193.192.250.158 images4-focus-opensocial.googleusercontent.com
193.192.250.158 images5-focus-opensocial.googleusercontent.com
193.192.250.158 images6-focus-opensocial.googleusercontent.com
193.192.250.158 clients4.google.com
222.255.120.42 google.com
222.255.120.42 apis.google.com
222.255.120.42 clients1.google.com
193.192.250.158 clients2.google.com
193.192.250.158 clients3.google.com
193.192.250.158 clients5.google.com
64.233.181.49 maps.google.com
64.233.181.49 mts0.google.com
64.233.181.49 maps.gstatic.com
The awk code for counting:
awk '{arr[$1]++;}END{for(i in arr){print i , arr[i] }}' test.txt
Output:
[blog@AY1310301904525972ddZ ~]$ awk '{arr[$1]++;}END{for(i in arr){print i , arr[i] }}' test.txt
212.188.10.241 1
64.233.168.41 1
23.239.5.106 1
193.192.250.158 15
178.60.128.31 1
37.228.69.57 1
212.188.10.167 2
193.192.250.172 1
62.1.38.89 2
64.233.181.49 6
210.242.125.35 4
222.255.120.42 5
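The one-liner relies on awk's associative arrays: indexing the array with the first field creates an entry on first use, and ++ increments it. Below is a minimal sketch of the same logic spelled out with comments. It uses a few inline sample lines instead of test.txt, and the function name count_ips is an illustrative assumption, not something from the original.

```shell
#!/bin/sh
# count_ips: hypothetical helper name; reads "IP hostname" lines on stdin.
count_ips() {
    awk '
        # For every input line, use field 1 (the IP) as the array key.
        # An unseen key starts out empty (treated as 0), so ++ makes it 1.
        { count[$1]++ }
        # After the last line, walk the keys; the iteration order is unspecified.
        END { for (ip in count) print ip, count[ip] }
    '
}

# Three sample lines taken from the data format above.
printf '%s\n' \
    '62.1.38.89 www.google.com' \
    '62.1.38.89 chrome.google.com' \
    '37.228.69.57 www.google.com' | count_ips
```

Because "for (ip in count)" visits keys in no particular order, the output lines are not sorted, which is why the unsorted output above looks shuffled.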
Adding a sort (numeric, on the second column):
[blog@AY1310301904525972ddZ ~]$ awk '{arr[$1]++;}END{for(i in arr){print i , arr[i] }}' test.txt | sort -n -k 2
178.60.128.31 1
193.192.250.172 1
212.188.10.241 1
23.239.5.106 1
37.228.69.57 1
64.233.168.41 1
212.188.10.167 2
62.1.38.89 2
210.242.125.35 4
222.255.120.42 5
64.233.181.49 6
193.192.250.158 15
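If only the busiest addresses matter, the sort can be reversed and the list truncated. This is a small sketch under the same two-column assumption as the output above; the helper name top_ips and the inline sample lines are illustrative only.

```shell
#!/bin/sh
# top_ips N: hypothetical helper; prints the N most frequent first fields.
# sort -k2,2nr orders by count, descending; -k1,1 breaks ties by the IP text.
top_ips() {
    awk '{ arr[$1]++ } END { for (i in arr) print i, arr[i] }' |
        sort -k2,2nr -k1,1 |
        head -n "$1"
}

# Show the two most frequent IPs in a tiny sample.
printf '%s\n' \
    '193.192.250.158 plus.google.com' \
    '193.192.250.158 ssl.gstatic.com' \
    '193.192.250.158 clients4.google.com' \
    '64.233.181.49 maps.google.com' \
    '64.233.181.49 mts0.google.com' \
    '37.228.69.57 www.google.com' | top_ips 2
```

For the real file this would be invoked as: top_ips 3 < test.txt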
============= Addendum in response to user [hattah]'s answer ===============
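I tested the efficiency of the two approaches:
In theory, the more data sort has to process, the slower it gets: the sort | uniq -c pipeline sorts every input line, whereas the awk array version only sorts one line per distinct IP.
Measured results, on a larger test.txt than the sample above: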
[blog@AY1310301904525972ddZ ~]$ time awk '{print $1}' test.txt |sort|uniq -c
1380 178.60.128.31
17312 193.192.250.158
1160 193.192.250.172
4640 210.242.125.35
2320 212.188.10.167
1160 212.188.10.241
5734 222.255.120.42
1160 23.239.5.106
1160 37.228.69.57
2320 62.1.38.89
1160 64.233.168.41
6894 64.233.181.49
real 0m0.236s
user 0m0.228s
sys 0m0.004s
[blog@AY1310301904525972ddZ ~]$ time awk '{arr[$1]++;}END{for(i in arr){print i , arr[i] }}' test.txt | sort -n -k 2
193.192.250.172 1160
212.188.10.241 1160
23.239.5.106 1160
37.228.69.57 1160
64.233.168.41 1160
178.60.128.31 1380
212.188.10.167 2320
62.1.38.89 2320
210.242.125.35 4640
222.255.120.42 5734
64.233.181.49 6894
193.192.250.158 17312
real 0m0.025s
user 0m0.022s
sys 0m0.001s
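Before comparing timings it is worth confirming that both pipelines really produce the same counts. The sketch below does that on generated data. The repeat count of 1000, the three sample lines, and the helper names by_sort and by_array are all assumptions made for illustration, and timings on another machine will differ from those shown above.

```shell
#!/bin/sh
# Method 1: sort every line, then count adjacent duplicates with uniq -c.
# The trailing awk swaps the columns so the format matches method 2.
by_sort() {
    awk '{ print $1 }' "$1" | sort | uniq -c | awk '{ print $2, $1 }' | sort
}

# Method 2: count in an awk array, then sort only the distinct keys.
by_array() {
    awk '{ arr[$1]++ } END { for (i in arr) print i, arr[i] }' "$1" | sort
}

# Build a larger input by repeating a few sample lines (hypothetical size).
tmp=$(mktemp)
i=0
while [ "$i" -lt 1000 ]; do
    printf '%s\n' \
        '193.192.250.158 www.google.com' \
        '193.192.250.158 plus.google.com' \
        '64.233.181.49 maps.google.com'
    i=$((i + 1))
done > "$tmp"

# Compare the two results line for line.
if [ "$(by_sort "$tmp")" = "$(by_array "$tmp")" ]; then
    echo "both methods agree"
else
    echo "results differ"
fi
rm -f "$tmp"
```

In bash, each call can be prefixed with the time keyword (for example: time by_array file) to repeat the measurement on your own data.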
The examples above rely mainly on the awk command; for more on its usage, see the following articles: