Python urllib Module Explained (Python Modules)

Python urllib Module Explained. Published: 2020-12-17. Editor: 脚本学堂


I. What the module is for:
1. Fetching data from a specified URL
2. Formatting and escaping URL strings

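II. Main functions and classes of the urllib module with __version__ = '1.17':
1. Functions:
(1)def urlopen(url, data=None, proxies=None)
Parameters:
url      A string that conforms to the URL specification (the http, ftp, gopher and local-file schemes are supported)
data     A string of data to send to the URL, in the standard format
         key=value&key1=value1...
         When data is given, the request is sent as a POST; for a GET, append the same string to the URL after a '?'.
proxies  A dictionary of proxy server addresses. If omitted, on Windows the Internet Explorer settings are used.
         Proxy servers that require authentication are not supported.
   Example: proxies = {'http': 'http://www.someproxy.com:3128'}
   This designates http://www.someproxy.com:3128 as the proxy server for http requests.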
Implementation notes:
   This function does its work through the open method that class FancyURLopener inherits from URLopener.

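Return value:
A file-like object
   with the following methods:
   read()
   readline()
   readlines()
   fileno()
   close()
   These behave essentially like the same-named methods of a file object.
   info() returns the MIME headers sent back by the server.
   geturl() returns the real URL. It is called "real" because, for a URL that was redirected, it returns the URL after redirection.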
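
The methods of the returned object can be tried without a network connection. The following is a minimal sketch, assuming Python 3, where urlopen lives in urllib.request rather than in urllib itself. The helper name fetch and the data: URL are made up for the illustration.

```python
from urllib.request import urlopen


def fetch(url):
    """Open url and return (final_url, content_type, body_bytes)."""
    response = urlopen(url)
    try:
        body = response.read()      # the whole body, as bytes
        headers = response.info()   # the MIME headers of the response
        return response.geturl(), headers.get_content_type(), body
    finally:
        response.close()


# A data: URL carries its payload inline, so no server is contacted.
final_url, content_type, body = fetch("data:text/plain,hello%20urllib")
print(final_url)
print(content_type)
print(body)
```

For an http URL that redirects, geturl() would report the address that was finally served rather than the one passed in.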

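(2)def urlretrieve(url, filename=None, reporthook=None, data=None):
Parameters:
url         A string that conforms to the URL specification
filename    Path of the local file in which the data returned from the URL is saved.
            If None, a temporary file is created.

reporthook  A reference to a function. You are free to define its behavior, as long as it accepts three arguments.
            urlretrieve passes it the following:
            first argument: the number of data blocks transferred so far
            second argument: the size of each data block, in bytes
            third argument: the total size of the file (may be -1 in some cases)
data        A string of data to send to the URL, in the standard format
            key=value&key1=value1...
            When data is given, the request is sent as a POST.
Implementation notes:
This function does its work through the retrieve method that class FancyURLopener inherits from URLopener.
Return value:
A tuple (filename, headers)
filename  the local file name: the filename argument if one was given, otherwise the name of the temporary file
headers   the MIME headers sent back by the server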
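
The three reporthook arguments are easiest to understand by recording them. This is a minimal sketch, assuming Python 3 (urlretrieve lives in urllib.request there). The names report, progress and greeting.txt are illustrative, and a data: URL stands in for a real download.

```python
import os
import tempfile
from urllib.request import urlretrieve

progress = []  # one (blocks_so_far, block_size, total_size) tuple per call


def report(block_count, block_size, total_size):
    """Record every progress notification sent by urlretrieve."""
    progress.append((block_count, block_size, total_size))


target = os.path.join(tempfile.mkdtemp(), "greeting.txt")
saved_path, headers = urlretrieve("data:text/plain,hello", target, reporthook=report)

with open(saved_path, "rb") as handle:
    content = handle.read()

print(saved_path == target)  # the filename argument is returned unchanged
print(content)
print(progress)
```

The first call to the hook is made with a block count of 0, before any data has been read, which makes it a convenient place to set up a progress bar.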

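(3)def urlcleanup():
Parameters: none
Implementation notes: this function does its work through the cleanup method that class FancyURLopener inherits from URLopener.
It removes the cache files left behind by urlopen or urlretrieve.
Return value: none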

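(4)def quote(s, safe = '/'):
Parameters:
s     The string to escape
safe  The characters that must be kept as they are
Implementation notes:
According to RFC 2396, the reserved characters of a URL are
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
These characters are not left unescaped in every kind of URL, however,
so you are expected to pass a different safe set depending on the URL you are escaping.
The characters that are never escaped, in any part of a URL, are upper- and lower-case letters, digits, '_', '.' and '-'.
Chinese text can be escaped as well, but it has to be encoded first, for example quote(u'蟒蛇'.encode('gb2312'))

Every character that is neither always safe nor listed in safe is converted to the form %xx, where xx is a two-digit hexadecimal number.
Return value:
The escaped string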
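
The effect of the safe parameter can be seen in a short experiment. This is a sketch assuming Python 3, where quote lives in urllib.parse; the sample strings are made up.

```python
from urllib.parse import quote, unquote_to_bytes

# '/' survives because safe defaults to '/'.
path = quote("a b/c?d=e")
# With an empty safe set, '/' is escaped as well.
strict = quote("a b/c", safe="")
# Non-ASCII text: encode it to bytes first, then escape the bytes.
snake = quote("蟒蛇".encode("gb2312"))

print(path)    # a%20b/c%3Fd%3De
print(strict)  # a%20b%2Fc
print(unquote_to_bytes(snake).decode("gb2312"))  # round trip back to the text
```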

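(5)def quote_plus(s, safe = ''):
Parameters:
s     The string to escape
safe  The characters that must be kept as they are
Implementation notes:
Essentially the same as quote, except that spaces in s are converted to '+' instead of %20.
Return value:
The escaped string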

(6)def unquote(s):
Parameters:
s  The string to unescape
Implementation notes:
The inverse of quote.
Return value:
The unescaped string

(7)def unquote_plus(s):
Parameters:
s  The string to unescape
Implementation notes:
The inverse of quote_plus.
Return value:
The unescaped string
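
The difference between the plain and the _plus variants only shows up around spaces and '+'. A small round-trip sketch, assuming Python 3 (all four functions live in urllib.parse there) and a made-up sample string:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "name=a b&c"

percent = quote(text, safe="")  # space becomes %20
plus = quote_plus(text)         # space becomes +
print(percent)  # name%3Da%20b%26c
print(plus)     # name%3Da+b%26c

# Each decoder undoes its own encoder.
print(unquote(percent) == text)
print(unquote_plus(plus) == text)

# Only unquote_plus treats '+' as an encoded space.
print(unquote("a+b"))       # a+b
print(unquote_plus("a+b"))  # a b
```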

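(8)def urlencode(query,doseq=0):
Parameters:
query  Either a sequence of two-element tuples or a dictionary
doseq  Controls how values that are themselves sequences are turned into strings
Implementation notes:
Joins the key/value pairs into a parameter string in the form that URLs require.
Example:
import urllib
query = (('name','cs'),('age','1'),('height','2'))
encoded = urllib.urlencode(query)
print encoded
query1 = {'name':'cs','age':'1','height':'2'}
print urllib.urlencode(query1)
The two calls produce essentially the same result, but a dictionary does not preserve the order in which its items were written, so its pairs may come out in a different order.
Output:
name=cs&age=1&height=2
age=1&name=cs&height=2

The documentation for doseq is very brief. A quick experiment shows that it changes how sequence values are converted to strings:
query1 = {'name':'cs','age':('a','b'),'height':'1'}
print urllib.urlencode(query1,1)
print urllib.urlencode(query1,0)
Output:
age=a&age=b&name=cs&height=1
age=%28%27a%27%2C+%27b%27%29&name=cs&height=1
With doseq=1 every element of a sequence value becomes its own key=value pair; with doseq=0 the whole sequence is converted with str() and then escaped.
Return value:
The joined parameter string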
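
For readers on Python 3, the same experiment can be repeated with urllib.parse.urlencode. This sketch assumes Python 3.7 or later, where dictionaries keep insertion order, so the output order differs from the Python 2 run above.

```python
from urllib.parse import urlencode

pairs = [("name", "cs"), ("age", "1"), ("height", "2")]
print(urlencode(pairs))  # name=cs&age=1&height=2

query = {"name": "cs", "age": ("a", "b"), "height": "1"}
expanded = urlencode(query, doseq=True)      # one pair per tuple element
stringified = urlencode(query, doseq=False)  # str(tuple), then escaped
print(expanded)     # name=cs&age=a&age=b&height=1
print(stringified)  # name=cs&age=%28%27a%27%2C+%27b%27%29&height=1
```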

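(9)def url2pathname(pathname):
Parameters:
pathname  The path part of a URL, as a string
Implementation notes:
Depending on the operating system, the function decides how to convert the '/' separators of the URL into local ones (for example '\' on Windows).
Apart from that it behaves like unquote (on POSIX systems the string is simply passed to unquote).
Return value:
A string in the format of a local file path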

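(10)def pathname2url(pathname):
Parameters:
pathname  A string in the format of a local file path
Implementation notes:
Depending on the operating system, the function decides how to convert the local path separators (for example '\' on Windows) into '/'.
Apart from that it behaves like quote (on POSIX systems the string is simply passed to quote).
Return value:
A string that conforms to the URL standard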

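(Note: functions 9 and 10 are rarely called directly. They are used inside the interface functions that locate network and local resources in a uniform way.)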
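
A round trip shows the two path functions working in opposite directions. This is a sketch assuming Python 3, where both live in urllib.request; the path docs/my file.txt is made up and does not need to exist.

```python
import os
from urllib.request import pathname2url, url2pathname

# Build the path with the local separator so the sketch is OS-independent.
local_path = os.path.join("docs", "my file.txt")

url_path = pathname2url(local_path)    # escapes the space, uses '/' separators
round_trip = url2pathname(url_path)    # unescapes, restores local separators

print(url_path)
print(round_trip == local_path)
```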

(11)def splittag(url):
(12)def localhost():
(13)def thishost():
(14)def ftperrors():
(15)def unwrap(url):
(16)def splittype(url):
(17)def splithost(url):
(18)def splituser(host):
(19)def splitpasswd(user):
(20)def splitport(host):
(21)def splitnport(host, defport=-1):
(22)def splitquery(url):
(23)def splitattr(url):
(24)def splitvalue(attr):
(25)def splitgophertype(selector):
(26)def getproxies_environment():

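Functions (11) to (26) are utility functions, used mainly to split a URL into its parts. They need to be changed if you add a new protocol; otherwise they are not called directly.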

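2. Classes:
(1)class URLopener:
Description:
    This is only a base class. It provides the basic methods for talking to a server but lacks handling for specific errors.
    If you need a class that reacts to specific errors, inherit from this class
    and add the corresponding error-handling methods in the subclass.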

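Methods:
   def __init__(self, proxies=None, **x509):
   def close(self):
    close simply calls cleanup.
   def cleanup(self):
    If you override this method in a subclass, take care not to use
    any global objects, or objects imported from other modules, inside it,
    because there is no guarantee that it will not be called after those objects have been destroyed.

   def addheader(self, *args):
    Adds a MIME header to be sent with requests.
   def open(self, fullurl, data=None):
    Essentially the same as the urlopen function.
   def open_unknown(self, fullurl, data=None):
   def open_unknown_proxy(self, proxy, fullurl, data=None):
    In open, if the type of the URL passed in (or of the proxy server) cannot be recognized,
    the URL is handed to open_unknown or open_unknown_proxy,
    which raises an exception.
   def retrieve(self, url, filename=None, reporthook=None, data=None):
    Essentially the same as the urlretrieve function.
   def http_error(self, url, fp, errcode, errmsg, headers, data=None):
   def http_error_default(self, url, fp, errcode, errmsg, headers):
    http_error looks for a handler method named http_error_<errcode>, for example http_error_404, and falls back to http_error_default.
    To add error handling, define methods of this form in a subclass of this class.
    The error-handling methods of class FancyURLopener can serve as a reference.
   def open_http(self, url, data=None):
   def open_gopher(self, url):
   def open_file(self, url):
   def open_local_file(self, url):
   def open_ftp(self, url):
   def open_data(self, url, data=None):
    In open, one of the methods above is chosen according to the type of the URL.
    To support a new protocol, add an open_XXX method in a subclass
    and then update utility functions such as splittype and splithost with the relevant information.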
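
The name-based lookup behind open_XXX and http_error_<errcode> can be sketched in a few lines. The following is a simplified illustration of the idea, not urllib's actual source: the classes MiniOpener and DemoOpener, the demo: scheme and the shortened method signatures are all hypothetical.

```python
class MiniOpener:
    """Hypothetical illustration of name-based dispatch; not urllib's code."""

    def open(self, url):
        # Pick the handler from the URL scheme: "demo:..." -> open_demo.
        scheme = url.split(":", 1)[0].replace("-", "_")
        handler = getattr(self, "open_" + scheme, None)
        if handler is None:
            return self.open_unknown(url)
        return handler(url)

    def open_unknown(self, url):
        raise IOError("unknown url type: " + url)

    def http_error(self, url, errcode, errmsg):
        # Pick the handler from the status code: 404 -> http_error_404.
        handler = getattr(self, "http_error_%d" % errcode, None)
        if handler is not None:
            return handler(url, errcode, errmsg)
        return self.http_error_default(url, errcode, errmsg)

    def http_error_default(self, url, errcode, errmsg):
        raise IOError("http error %d: %s" % (errcode, errmsg))


class DemoOpener(MiniOpener):
    """A subclass only has to add methods with the right names."""

    def open_demo(self, url):
        return "opened " + url

    def http_error_404(self, url, errcode, errmsg):
        return "custom 404 page for " + url


opener = DemoOpener()
opened = opener.open("demo:resource")
handled = opener.http_error("demo:resource", 404, "Not Found")
print(opened)
print(handled)
```

Because the lookup goes through getattr, the base class never needs to know which schemes or error codes its subclasses support.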

(2)class FancyURLopener(URLopener):
Description:
Adds error-handling methods for the HTTP protocol. To add your own error handling, just define a method named http_error_%d, where %d is the error code.
This class is a useful reference when writing your own URL-handling class.
Methods:

These error-handling methods are found automatically by the error-handling function of class URLopener.
def __init__(self, *args, **kwargs):
def http_error_default(self, url, fp, errcode, errmsg, headers):
def http_error_302(self, url, fp, errcode, errmsg, headers, data=None):
def redirect_internal(self, url, fp, errcode, errmsg, headers, data):

When the server returns a 302 response, http_error_302 first checks that the redirect limit has not been exceeded (if it has, the response is reported as a 500 error) and then calls redirect_internal to follow the redirect.
def http_error_301(self, url, fp, errcode, errmsg, headers, data=None):
def http_error_303(self, url, fp, errcode, errmsg, headers, data=None):
def http_error_307(self, url, fp, errcode, errmsg, headers, data=None):
def http_error_401(self, url, fp, errcode, errmsg, headers, data=None):
def http_error_407(self, url, fp, errcode, errmsg, headers, data=None):
def retry_proxy_http_basic_auth(self, url, realm, data=None):
def retry_proxy_https_basic_auth(self, url, realm, data=None):
def retry_http_basic_auth(self, url, realm, data=None):
def retry_https_basic_auth(self, url, realm, data=None):
def get_user_passwd(self, host, realm, clear_cache = 0):
def prompt_user_passwd(self, host, realm):
The class calls this method when access requires authorization, asking the user to enter credentials. Subclasses can override it to gain more control.
The source code of this method is as follows:

def prompt_user_passwd(self, host, realm):
    """Override this in a GUI environment!"""
    import getpass
    try:
        user = raw_input("Enter username for %s at %s: " % (realm, host))
        passwd = getpass.getpass("Enter password for %s in %s at %s: " % (user, realm, host))
        return user, passwd
    except KeyboardInterrupt:
        print
        return None, None

Page link: http://www.jb200.com/python/2696.htm
