Verifying that BingPreview is Bing's crawler for updating page snapshots

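Published: 2020-01-03 | Editor: 脚本学堂
This article shows how to verify that BingPreview is the crawler Bing uses to update page snapshots. It is useful when analyzing crawler entries in server logs.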

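I found an entry in a Windows IIS log that appeared to come from a crawler named BingPreview. However, Bing's crawlers normally identify themselves as Bingbot or Msnbot.
After searching English-language sites for information and checking for myself, I confirmed that BingPreview really is one of Bing's crawlers.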

The BingPreview User-Agent as recorded in the log:
IIS6 cs(User-Agent):Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/534++(KHTML,+like+Gecko)+BingPreview/1.0b

The log shows requests from these two client IPs (c-ip):
131.253.38.67, 199.30.16.124
The first IP fetched only Bing's webmaster verification file, /BingSiteAuth.xml; the second crawled the site's pages.
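To pull these entries out of a large log automatically, something like the following Python sketch can help. It assumes the log is in IIS's W3C extended format with a `#Fields:` header that includes `c-ip`, `cs-uri-stem` and `cs(User-Agent)`; the function names are my own and not part of any IIS tool. It undoes the `+`-for-space encoding visible in the User-Agent above and groups the requested paths by client IP.

```python
from collections import defaultdict


def parse_w3c_log(lines):
    """Yield one dict per request from W3C extended log lines.

    Field names are taken from the most recent "#Fields:" directive,
    so the column order does not have to be hard-coded.
    """
    fields = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue  # other directives (#Software, #Date, ...) and blank lines
        values = line.split(" ")
        if len(values) != len(fields):
            continue  # skip malformed or truncated lines
        yield dict(zip(fields, values))


def decode_iis_user_agent(raw):
    """IIS writes spaces inside cs(User-Agent) as '+'.

    This is lossy: a literal '+' in the original string (e.g. "534+")
    also becomes a space, which is fine for token matching.
    """
    return raw.replace("+", " ")


def bingpreview_requests_by_ip(lines):
    """Map each client IP (c-ip) to the paths it requested as BingPreview."""
    hits = defaultdict(list)
    for record in parse_w3c_log(lines):
        user_agent = decode_iis_user_agent(record.get("cs(User-Agent)", ""))
        if "BingPreview" in user_agent:
            hits[record.get("c-ip", "-")].append(record.get("cs-uri-stem", "-"))
    return dict(hits)
```

Any iterable of lines works, so an open file can be passed directly, e.g. `bingpreview_requests_by_ip(open("ex200103.log", encoding="utf-8"))` (hypothetical file name). The result maps each IP to its list of requested paths, which makes a split like the one above (one IP fetching only /BingSiteAuth.xml) easy to spot.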

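Verifying Bingbot:
On Bing's "Verify Bingbot" page, http://www.bing.com/toolbox/verify-bingbot, I entered each of the two IP addresses and got the same result for both:
"Verdict for IP address 131.253.38.67: Yes - this IP address is a verified Bingbot IP address. Name: msnbot-131-253-38-67.search.msn.com";
"Verdict for IP address 199.30.16.124: Yes - this IP address is a verified Bingbot IP address. Name: msnbot-199-30-16-124.search.msn.com".
This shows that BingPreview is indeed one of Bing's crawlers.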
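Bing's page performs this check by hand. A script can do something similar with a reverse DNS lookup followed by a forward lookup: resolve the IP to a host name, require that name to end in `.search.msn.com` (the domain seen in both results above; Bing may use other host names, so treat that suffix as an assumption), then resolve the name again and confirm it maps back to the same IP. A minimal Python sketch, IPv4 only; the function names are hypothetical:

```python
import socket


def is_bing_crawler_hostname(hostname):
    """True if hostname is under search.msn.com (assumed Bing crawler domain)."""
    return hostname.lower().rstrip(".").endswith(".search.msn.com")


def verify_bing_crawler_ip(ip):
    """Reverse DNS, domain check, then forward DNS back to the same IP.

    IPv4 only (gethostbyname_ex returns IPv4 addresses).
    Any lookup failure is treated as "not verified".
    """
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not is_bing_crawler_hostname(hostname):
        return False
    try:
        _name, _aliases, forward_ips = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    # The forward lookup must include the original IP, otherwise a spoofed
    # PTR record alone could pass the check.
    return ip in forward_ips
```

`verify_bing_crawler_ip("199.30.16.124")` needs network access, and its result depends on the DNS records at the time you run it; the addresses in this article come from a log of that period and may no longer belong to Bing.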

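Bing's crawlers look a bit confusing:
Besides "BingPreview", this site's logs also contain "Bingbot" and "Msnbot", and all three appear to be Bing crawlers. An Msnbot entry looks like this (client IP followed by User-Agent): 65.55.217.201 msnbot/2.0b+(+http://search.msn.com/msnbot.htm)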
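Since all three names show up in the same log, a per-crawler tally makes the mix easier to see. The Python sketch below matches the product tokens `BingPreview`, `bingbot` and `msnbot` case-insensitively; the token list and function names are my own assumptions. A User-Agent string is trivial to fake, so a match should still be confirmed with an IP check like the one on Bing's verification page.

```python
from collections import Counter

# Product tokens of the three Bing crawlers seen in this site's logs.
BING_CRAWLER_TOKENS = ("BingPreview", "bingbot", "msnbot")


def bing_crawler_name(user_agent):
    """Return the Bing crawler token found in a User-Agent, or None.

    Case-insensitive, so "Bingbot" and "bingbot" both match. A match only
    means the client claims to be a Bing crawler.
    """
    ua = user_agent.lower()
    for token in BING_CRAWLER_TOKENS:
        if token.lower() in ua:
            return token
    return None


def count_bing_crawlers(user_agents):
    """Count requests per Bing crawler token; other agents are skipped."""
    counts = Counter()
    for ua in user_agents:
        name = bing_crawler_name(ua)
        if name is not None:
            counts[name] += 1
    return counts
```

The raw IIS form (with `+` instead of spaces) can be passed as-is, because the tokens themselves contain no spaces.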

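What kind of crawler is BingPreview?
It turns out that BingPreview is the Bing spider dedicated to updating page snapshots, and its visits are triggered by users of the Bing app on Windows 8. Here is the original post from the Bing blog: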

Page snapshots in Bing Windows 8 app to bring new crawl traffic to sites
Today is a very exciting day as Windows 8 is now generally available to hundreds of millions of people, who will have access to a superb search experience through the preinstalled Bing app. This week we would like to highlight one specific feature that will impact the crawl traffic (visits to your site from our crawlers) we send to your website.
In addition to traditional web search, the Bing app for Windows 8 features a very visual image search feature, allowing users to swipe conveniently through a collection of thumbnails.
On top of this overview of the search results, users have the possibility to switch to a more detailed view by simply tapping on one of the images. The result is a full screen version of the image along with some metadata, including a link to the image source page and a small snapshot of the page.
This page snapshot is the specific feature we would like to highlight this week, as it is generated by our web crawler. Even though our crawler is intelligent enough to reuse components of your site it has already seen in the past, it will occasionally come and visit your pages again, as requested by a Bing app user, in order to get the freshest and most accurate snapshot possible. Therefore, as usage of the Bing app increases, you should expect more and more of this crawl traffic coming your way.
In order to be transparent on what crawl traffic is being generated, and obtain the best results, we are using a different user agent for this specific “snapshot generation” traffic:
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b
Having this page snapshot as part of the “full details” experience is a great way for us to drive traffic to your website as Bing app users look through your images.   As search continues to evolve in a visual, tactile and vocal direction, features such as the Bing App in Windows 8 stand to deliver traffic directly to sites by introducing searchers to sites they hadn’t previously discovered.

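Note: BingPreview can be a real headache. Its visits show up as direct traffic in Baidu Tongji (Baidu's web analytics service) and push the bounce rate up. If you consider the traffic Bing sends you too small or unimportant, you can simply block it.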

How to block the Bing spiders:

Code example (robots.txt):
User-agent: Bingbot
Disallow: /
User-agent: msnbot
Disallow: /
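Before deploying these rules, you can check how a robots.txt parser reads them. The Python sketch below feeds the rules above to the standard library's `urllib.robotparser`. It only shows how Python's parser matches agents: it lower-cases the part of the agent name before the first `/` and checks whether a group's User-agent value (also lower-cased) is a substring of it. Bing's crawlers apply their own matching, so this cannot tell you whether BingPreview actually obeys the Bingbot group. `check_agents` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from this article, verbatim.
ROBOTS_TXT = """\
User-agent: Bingbot
Disallow: /
User-agent: msnbot
Disallow: /
"""


def check_agents(robots_txt, agents, path="/"):
    """Return {agent: allowed?} as judged by Python's robots.txt parser.

    Pass bare product tokens such as "bingbot": a full User-Agent string
    starting with "Mozilla/5.0" would be reduced to "mozilla" and match
    neither group.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return {agent: parser.can_fetch(agent, path) for agent in agents}


if __name__ == "__main__":
    results = check_agents(ROBOTS_TXT, ["bingbot", "msnbot", "BingPreview"])
    for agent, allowed in results.items():
        print(f"{agent:12} {'allowed' if allowed else 'blocked'}")
```

Under Python's rules, `bingbot` and `msnbot` come out blocked while `BingPreview` comes out allowed, because neither "bingbot" nor "msnbot" is a substring of "bingpreview".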