A small Python crawler program

Published: 2019-10-18  Editor: 脚本学堂 (Script School)

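A small Python crawler example built on the urllib2 and re modules. urllib2 does the core work of a Python crawler, which is downloading pages. re performs the regular-expression matching that pulls data out of those pages.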
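urllib2 exists only in Python 2, and the listing below also uses the Python 2 print statement. In Python 3, urlopen lives in urllib.request. Its read() method returns bytes, which must be decoded before re can match str patterns against them. Below is a minimal Python 3 sketch of the fetch step. The helper name fetch_html, the 10-second timeout, and the UTF-8 fallback are assumptions for illustration; they are not part of the original article.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download url and return the body as str.

    Hypothetical helper: a Python 3 counterpart of urllib2.urlopen(url).read().
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        raw = resp.read()  # the body arrives as bytes
        # Prefer the charset declared in the Content-Type header; assume UTF-8 otherwise
        charset = resp.headers.get_content_charset() or "utf-8"
    # errors="ignore" mirrors decode('utf-8', 'ignore') in the original listing
    return raw.decode(charset, errors="ignore")
```

For example, fetch_html('http://www.jb200.com/') returns the page source as a str, which can then be searched with the regular expressions in the listing. This call needs network access.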

Example: a Python crawler program (using the Python urllib2 and re modules)

Code example:

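#!/usr/bin/env python
# Python 2 example: urllib2 downloads a page, re pulls out the <title> text
import urllib2
import re

# Download the page; only the commented-out variants below read from it
response = urllib2.urlopen('http://www.jb200.com/')

text = 'JGood is<title>sdfa</title> a handsome <title> boy, </title>he is cool, clever, and so on...'
text2 = text.replace('y', '')  # drop every 'y' from the sample string
#m = re.search(r'<title>(.*)</title>', response.read())
#m = re.match(r'.*<title>(.*)</title>.*', response.read())  # re.match anchors at the start, hence the leading .*
#m = re.match(r'.*<title>(.*)</title>.*', text2)
m = re.search(r'<title>(.*)</title>', text2)

# (.*) is greedy: group(1) runs from the first <title> to the LAST </title>
if m:
    print m.group(1).decode('utf-8', 'ignore')

#m = re.finditer(r'<title>(.*)</title>', text)
#m = re.finditer(r'<title>([^<title>]*)</title>', text)  # [^<title>] excludes the single characters < t i l e >, not the string "<title>"

# No character of the title text may be followed by "<title>", so a match cannot run into the next title
m = re.finditer(r'<title>((.(?!<title>))*.)</title>', text)
for match in m:
    print match.group(1)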
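The commented-out lines in the listing try several patterns on the same input. The Python 3 sketch below runs them side by side on the sample string text so the differences are visible. Two items are additions for comparison and do not appear in the original listing: the non-greedy (.*?) variant and the extra sample '<title>hello</title>'.

```python
import re

# Sample string from the original listing: it contains two <title> elements
text = ('JGood is<title>sdfa</title> a handsome <title> boy, </title>'
        'he is cool, clever, and so on...')

# Greedy (.*): runs from the first <title> to the LAST </title>
greedy = re.search(r'<title>(.*)</title>', text).group(1)

# re.match is anchored at position 0, so the pattern needs a leading .*;
# that .* is also greedy, so it skips ahead to the LAST <title>
anchored = re.match(r'.*<title>(.*)</title>.*', text).group(1)

# [^<title>] excludes the single characters < t i l e >, not the string "<title>"
char_class = re.findall(r'<title>([^<title>]*)</title>', text)
# ...so it finds nothing once the title text contains one of those letters
char_class_pitfall = re.findall(r'<title>([^<title>]*)</title>', '<title>hello</title>')

# Negative lookahead: no consumed character may be followed by "<title>"
lookahead = [m.group(1) for m in re.finditer(r'<title>((.(?!<title>))*.)</title>', text)]

# Non-greedy (.*?): stops at the nearest </title>
non_greedy = re.findall(r'<title>(.*?)</title>', text)

print(repr(greedy))        # 'sdfa</title> a handsome <title> boy, '
print(repr(anchored))      # ' boy, '
print(char_class)          # ['sdfa', ' boy, ']
print(char_class_pitfall)  # []
print(lookahead)           # ['sdfa', ' boy, ']
print(non_greedy)          # ['sdfa', ' boy, ']
```

Downloaded HTML contains newlines, and by default . does not match \n. Add the re.DOTALL flag if a title can span lines. Without that flag, the re.match variant only succeeds when <title> appears on the first line of the page.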

Page link: http://www.jb200.com/article/30120.html