Scraping Images and Page Content from Web Pages with Python (Python Network Programming)


Published: 2020-05-26. Editor: 脚本学堂

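Here are two Python code examples, one for downloading the images on a web page and one for fetching page content, offered as a reference for anyone who needs them.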

1. Scraping images from a web page with Python

Code example:

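# coding: utf-8
# Python 3 version of the example
import re
import urllib.request


def getHTML(url):
    """Download the page at url and return its HTML as text."""
    with urllib.request.urlopen(url) as page:
        # Image URLs are ASCII, so undecodable bytes can safely be replaced.
        return page.read().decode('utf-8', errors='replace')


def getImg(html, imgType):
    """Find image URLs of the given type in html and save each one locally."""
    # Matches src="...<imgType>!slider". The "!slider" suffix is specific to
    # the image URLs of the target site; adjust it for other pages.
    reg = r'src="([^"]+\.' + re.escape(imgType) + r'!slider)"'
    imgList = re.findall(reg, html)
    for x, imgurl in enumerate(imgList):
        print(imgurl)
        # Saved as 0.jpg, 1.jpg, ... in the current directory.
        urllib.request.urlretrieve(imgurl, '%s.%s' % (x, imgType))


html = getHTML("http://www.jb200.com")
getImg(html, 'jpg')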
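
The regular expression in the example above is tied to one site's URL layout and does not handle relative paths. As a hedged alternative, here is a minimal sketch that uses the standard library's `html.parser` to collect every `<img>` `src` and `urllib.parse.urljoin` to resolve it against the page URL. The names `ImgSrcCollector` and `collect_img_urls` and the sample HTML are hypothetical, introduced only for illustration; they are not part of the original article.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            for name, value in attrs:
                if name == 'src' and value:
                    self.srcs.append(value)


def collect_img_urls(html, base_url, img_type):
    """Return absolute URLs of images whose path ends with .<img_type>."""
    parser = ImgSrcCollector()
    parser.feed(html)
    suffix = '.' + img_type.lower()
    urls = []
    for src in parser.srcs:
        absolute = urljoin(base_url, src)
        # Ignore any query string when checking the file extension.
        if absolute.split('?', 1)[0].lower().endswith(suffix):
            urls.append(absolute)
    return urls


# Made-up HTML used only to exercise the helper.
sample = ('<p><img src="/a/1.jpg"><img alt="x" src="http://example.com/2.png">'
          '<img src="b/3.JPG?v=2"></p>')
print(collect_img_urls(sample, 'http://example.com/page/', 'jpg'))
```

Each returned URL could then be passed to `urllib.request.urlretrieve` in the same way as in the example above.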

2. Fetching web page content with Python

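This example fetches the content of a web page over a raw TCP socket. For GBK-encoded pages, the received bytes must be decoded as GBK before they are printed.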

Code example:

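# Python 3 version of the example
import socket


def open_tcp_socket(remotehost, servicename):
    """Open a TCP connection to remotehost on the port registered for servicename."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    portnumber = socket.getservbyname(servicename, 'tcp')
    s.connect((remotehost, portnumber))
    return s


mysocket = open_tcp_socket('www.taobao.com', 'http')
# Send a minimal valid HTTP request (a bare 'hello' is not one). With HTTP/1.0
# the server closes the connection after the response, which ends the loop.
mysocket.sendall(b'GET / HTTP/1.0\r\nHost: www.taobao.com\r\n\r\n')
chunks = []
while True:
    data = mysocket.recv(1024)
    if not data:
        break
    chunks.append(data)
mysocket.close()
# For a GBK-encoded page the bytes must be decoded as GBK. Decoding the whole
# response at once avoids cutting a multi-byte character between recv() calls.
print(b''.join(chunks).decode('gbk', errors='replace'))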
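
A raw socket returns the HTTP headers and the body together, and a 1024-byte `recv()` can end in the middle of a two-byte GBK character. The following is a minimal sketch, not the article's own method, of two ways to cope with this: an incremental decoder from `codecs` that carries a partial character over to the next chunk, and a helper that reads the charset from the `Content-Type` header instead of assuming GBK. The function names `decode_chunks` and `split_response` and the sample response are hypothetical.

```python
import codecs
import re


def decode_chunks(chunks, encoding):
    """Decode an iterable of byte chunks without breaking multi-byte characters."""
    decoder = codecs.getincrementaldecoder(encoding)(errors='replace')
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b'', final=True))  # flush any leftover bytes
    return ''.join(parts)


def split_response(raw, default_encoding='gbk'):
    """Split a raw HTTP response into (header text, body bytes, charset)."""
    head, _, body = raw.partition(b'\r\n\r\n')
    headers = head.decode('iso-8859-1')
    match = re.search(r'charset="?([\w-]+)', headers, re.IGNORECASE)
    charset = match.group(1).lower() if match else default_encoding
    return headers, body, charset


# Made-up response, cut so that one GBK character is split across two chunks.
raw = (b'HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=GBK\r\n\r\n'
       + '网页内容'.encode('gbk'))
headers, body, charset = split_response(raw)
chunks = [body[:3], body[3:]]
print(charset, decode_chunks(chunks, charset))
```

This sketch ignores chunked transfer encoding and compressed bodies, and it does not look at a `<meta charset>` declaration inside the HTML, so a real page may need more handling than shown here.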

