Python BeautifulSoup scraper code example

Published: 2019-11-08 | Editor: 脚本学堂

Writing a web crawler is a classic exercise in Python programming. This section uses the BeautifulSoup module to build a simple web scraper for you to study and adapt.

Implementing a scraper with Python BeautifulSoup

The BeautifulSoup Python module (documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/) is used here to fetch and parse web page content.

Sample scraper code:

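# coding=utf-8
# Python 3: urlencode and urlopen live in urllib.parse and urllib.request
from urllib.parse import urlencode
from urllib.request import urlopen

from bs4 import BeautifulSoup

url = 'http://www.baidu.com/s'
values = {'wd': '网球'}  # search keyword: "tennis"

encoded_param = urlencode(values)  # 'wd=%E7%BD%91%E7%90%83'
full_url = url + '?' + encoded_param

response = urlopen(full_url)
# Name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(response, 'html.parser')

alinks = soup.find_all('a')  # every <a> tag on the results page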

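The code above fetches Baidu's search results for the keyword 网球 (tennis) and collects every link on the results page.
BeautifulSoup has many useful built-in methods; some of them are shown below.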
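The scraper above stops once it has collected the <a> tags. As a hedged follow-up sketch, the function below (extract_links is a hypothetical helper name, not part of BeautifulSoup) pulls the visible text and the href out of each link. It parses a small inline HTML string instead of a live Baidu page so that it runs offline; a real results page will have different markup.

```python
from bs4 import BeautifulSoup


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href.

    extract_links is a hypothetical helper written for this example.
    """
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    # href=True matches only <a> tags that actually carry an href attribute
    for a in soup.find_all('a', href=True):
        # strip=True trims surrounding whitespace from the link text
        links.append((a.get_text(strip=True), a['href']))
    return links


sample = """
<div>
  <a href="http://example.com/1"> First result </a>
  <a name="anchor-without-href">skip me</a>
  <a href="http://example.com/2">Second result</a>
</div>
"""

for text, href in extract_links(sample):
    print(text, href)
```

The <a> tag without an href is skipped, so only the two real links are printed.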

Features:
1. Building a Tag (node) element

Code example:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
tag = soup.b
type(tag)
# <class 'bs4.element.Tag'>
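A Tag also exposes its element name and its text. This minimal sketch, reusing the same one-tag document with html.parser, reads tag.name and tag.string and then renames the element by assigning to tag.name, which changes how it is serialized.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
tag = soup.b

print(tag.name)    # b
print(tag.string)  # Extremely bold

# Assigning to .name renames the element in the parse tree
tag.name = 'blockquote'
print(tag)         # <blockquote class="boldest">Extremely bold</blockquote>
```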

2. Attributes can be read through tag.attrs, which returns a dictionary

Code example:

tag.attrs
# {'class': ['boldest']}  (class is multi-valued, so its value is a list)

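You can also index the tag like a dictionary, e.g. tag['class']. Note that tag.class does not work: class is a reserved word in Python, and dot access such as tag.b looks up a child tag, not an attribute.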
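One detail worth knowing when reading attributes: BeautifulSoup treats class as a multi-valued attribute, so its value comes back as a list, while an ordinary attribute such as id comes back as a plain string. tag.get() returns None, or a default you supply, instead of raising KeyError when the attribute is missing. A minimal sketch, assuming the built-in html.parser:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="body strikeout" id="intro">Hi</p>', 'html.parser')
p = soup.p

print(p['class'])             # ['body', 'strikeout'] (multi-valued: a list)
print(p['id'])                # intro (ordinary attribute: a string)
print(p.get('title'))         # None: missing attribute, no KeyError
print(p.get('title', 'n/a'))  # n/a: the supplied default
```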

3. Adding, changing, and deleting attributes

Code example:

tag['class'] = 'verybold'
tag['id'] = 1
tag
# <b class="verybold" id="1">Extremely bold</b>

del tag['class']
del tag['id']
tag
# <b>Extremely bold</b>

tag['class']
# KeyError: 'class'
print(tag.get('class'))
# None
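A multi-valued attribute such as class can also be set from a list, and has_attr() gives a boolean check that avoids the KeyError shown above. A hedged sketch with html.parser:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
tag = soup.b

# A list value is joined with spaces when the tag is rendered
tag['class'] = ['bold', 'large']
tag['id'] = 'headline'
print(tag)                 # <b class="bold large" id="headline">Extremely bold</b>

print(tag.has_attr('id'))  # True
del tag['id']
print(tag.has_attr('id'))  # False
print(tag)                 # <b class="bold large">Extremely bold</b>
```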

Finding DOM elements
1. Build a sample document

Code example:

from bs4 import BeautifulSoup

html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p><b>The Dormouse's story</b></p>

<p>Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p>...</p>
"""

# Name the parser explicitly; the results below assume html.parser
soup = BeautifulSoup(html_doc, 'html.parser')
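Besides walking the tree by tag name, BeautifulSoup can search by attribute. The sketch below uses a trimmed copy of the sample links and shows find() with an id, find_all() with class_ (the trailing underscore avoids the Python keyword class), and a CSS selector through select(). select() relies on the soupsieve package, which recent bs4 releases install as a dependency.

```python
from bs4 import BeautifulSoup

html_doc = """<p>
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
</p>"""

soup = BeautifulSoup(html_doc, 'html.parser')

# find() returns the first match, or None if nothing matches
lacie = soup.find('a', id='link2')
print(lacie.string)  # Lacie

# find_all() returns every match; class_ filters on the CSS class
print([a['href'] for a in soup.find_all('a', class_='sister')])

# select() takes a CSS selector and always returns a list
print([a.get_text() for a in soup.select('p > a#link3')])  # ['Tillie']
```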

2. Navigating the tree

Code example:

soup.head
# <head><title>The Dormouse's story</title></head>
soup.title
# <title>The Dormouse's story</title>
soup.body.b
# <b>The Dormouse's story</b>
soup.a
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
soup.find_all('a')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

# .contents is a list of a tag's direct children
head_tag = soup.head
head_tag
# <head><title>The Dormouse's story</title></head>
head_tag.contents
# [<title>The Dormouse's story</title>]
title_tag = head_tag.contents[0]
title_tag
# <title>The Dormouse's story</title>
title_tag.contents
# ["The Dormouse's story"]
len(soup.contents)
# 1
soup.contents[0].name
# 'html'

# Strings (NavigableString objects) have no children
text = title_tag.contents[0]
text.contents
# AttributeError: 'NavigableString' object has no attribute 'contents'

# .children iterates over direct children only
for child in title_tag.children:
    print(child)
# The Dormouse's story

# .descendants recurses into children, grandchildren, and so on
for child in head_tag.descendants:
    print(child)
# <title>The Dormouse's story</title>
# The Dormouse's story

len(list(soup.children))
# 1
# Whitespace-only strings count as descendants, so this total depends on the layout
len(list(soup.descendants))
# 26

# .string returns the single string inside a tag
title_tag.string
# "The Dormouse's story"
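.string only works when a tag has exactly one child string, possibly reached through a single child tag; for a tag with mixed children it returns None, and get_text() is the usual way to collect all of the text. The sketch below, using html.parser and a made-up snippet, also shows moving upward with .parent and sideways with .next_sibling.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<p>Hello <b>big</b> world</p><div><span>only child</span></div>',
    'html.parser',
)

p = soup.p
print(p.string)      # None: <p> has three children
print(p.get_text())  # Hello big world

# .string follows a single child tag down to its text
print(soup.div.string)  # only child

# Navigating upward and sideways from <b>
b = soup.b
print(b.parent.name)          # p
print(b.next_sibling.strip()) # world
```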

Page link: http://www.jb200.com/article/31306.html
