----------
Problem
----------

You need to access various services via HTTP as a client. For example, downloading data or interacting with a REST-based API.

----------
Solution
----------

For simple things, it's usually easy enough to use the ``urllib.request`` module. For example, to send a simple HTTP GET request to a remote service, do something like this:

.. code-block:: python

    from urllib import request, parse

    # Base URL being accessed
    url = 'http://httpbin.org/get'

    # Dictionary of query parameters (if any)
    parms = {
        'name1': 'value1',
        'name2': 'value2'
    }

    # Encode the query string
    querystring = parse.urlencode(parms)

    # Make a GET request and read the response
    u = request.urlopen(url + '?' + querystring)
    resp = u.read()
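
``parse.urlencode()`` does more than join the pairs with ``&``. It also percent-encodes characters that are not safe in a URL. Here is a minimal sketch that wraps the two steps of the GET example in a helper. ``build_get_url()`` is a hypothetical name, and the value ``'a b&c'`` is made-up sample data chosen to show the escaping.

.. code-block:: python

    from urllib import parse

    def build_get_url(base_url, params):
        """Hypothetical helper: append an encoded query string to base_url."""
        if not params:
            return base_url
        # urlencode() joins key=value pairs with '&' and escapes unsafe characters
        return base_url + '?' + parse.urlencode(params)

    url = build_get_url('http://httpbin.org/get',
                        {'name1': 'value1', 'name2': 'a b&c'})
    print(url)   # http://httpbin.org/get?name1=value1&name2=a+b%26c

By default ``urlencode()`` quotes values with ``parse.quote_plus()``, so spaces become ``+``. Passing ``quote_via=parse.quote`` produces ``%20`` instead, which some services expect.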

If you need to send the query parameters in the request body using a POST method, encode them and supply them as an optional argument to ``urlopen()``, like this:

.. code-block:: python

    from urllib import request, parse

    # Base URL being accessed
    url = 'http://httpbin.org/post'

    # Dictionary of query parameters (if any)
    parms = {
        'name1': 'value1',
        'name2': 'value2'
    }

    # Encode the query string
    querystring = parse.urlencode(parms)

    # Make a POST request and read the response
    u = request.urlopen(url, querystring.encode('ascii'))
    resp = u.read()
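
Whether ``urlopen()`` sends GET or POST depends on whether a request body is supplied. You can check this without touching the network by calling ``Request.get_method()``. The sketch below builds three requests but never sends them. The body must be ``bytes``, which is why the query string is encoded first.

.. code-block:: python

    from urllib import request, parse

    url = 'http://httpbin.org/post'
    body = parse.urlencode({'name1': 'value1', 'name2': 'value2'}).encode('ascii')

    # No data argument: urllib will issue a GET
    get_req = request.Request(url)
    # With a bytes body: urllib switches to POST
    post_req = request.Request(url, data=body)
    # An explicit method= overrides the default choice (Python 3.3+)
    put_req = request.Request(url, data=body, method='PUT')

    print(get_req.get_method())    # GET
    print(post_req.get_method())   # POST
    print(put_req.get_method())    # PUT

If a body is present and no ``Content-Type`` header has been set, urllib's HTTP handler adds ``application/x-www-form-urlencoded`` at send time. That is the content type most form-handling services expect.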

If you need to supply custom HTTP headers in the outgoing request, such as a different ``user-agent`` field, put their values in a dictionary, create a ``Request`` instance and pass it to ``urlopen()``, like this:

.. code-block:: python

    from urllib import request, parse
    ...   # url and querystring defined as in the previous example

    # Extra headers
    headers = {
        'User-agent': 'none/ofyourbusiness',
        'Spam': 'Eggs'
    }

    req = request.Request(url, querystring.encode('ascii'), headers=headers)

    # Make a request and read the response
    u = request.urlopen(req)
    resp = u.read()
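
``Request`` stores header names after passing them through ``str.capitalize()``, so ``'User-Agent'`` and ``'User-agent'`` end up as the same key. This matters when you look a header up again with ``get_header()`` or ``has_header()``, because those lookups are not normalized. The sketch below inspects a request object without sending it. The header values come from the example above, and the mixed-case ``'User-Agent'`` key is a deliberate assumption for illustration.

.. code-block:: python

    from urllib import request

    headers = {
        'User-Agent': 'none/ofyourbusiness',   # mixed case on purpose
        'Spam': 'Eggs'
    }
    req = request.Request('http://httpbin.org/post', b'name1=value1', headers=headers)

    # The name was stored as 'User-agent' by capitalize()
    print(req.has_header('User-agent'))    # True
    print(req.has_header('User-Agent'))    # False
    print(req.get_header('User-agent'))    # none/ofyourbusiness

    # Headers can also be added after the Request has been created
    req.add_header('Accept', 'application/json')
    print(sorted(name for name, _ in req.header_items()))
    # ['Accept', 'Spam', 'User-agent']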

If your interaction with a service is more complicated than this, you should probably look at the ``requests`` library (https://pypi.python.org/pypi/requests). For example, here is equivalent ``requests`` code for the preceding operations:

.. code-block:: python

    import requests

    # Base URL being accessed
    url = 'http://httpbin.org/post'

    # Dictionary of query parameters (if any)
    parms = {
        'name1': 'value1',
        'name2': 'value2'
    }

    # Extra headers
    headers = {
        'User-agent': 'none/ofyourbusiness',
        'Spam': 'Eggs'
    }

    resp = requests.post(url, data=parms, headers=headers)

    # Decoded text returned by the request
    text = resp.text

A notable feature of ``requests`` is that it gives you the content of a response in several forms. As shown, the ``resp.text`` attribute gives you the Unicode-decoded text of the response. If you access ``resp.content`` instead, you get the raw binary content. If you call ``resp.json()`` (a method in current versions of ``requests``), you get the response content interpreted as JSON.
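
Conceptually, the three forms are successive views of the same payload: raw bytes, those bytes decoded with the response's character set, and that text parsed as JSON. The standard-library sketch below mimics those steps on a hard-coded payload. The function name ``decode_body()``, the UTF-8 fallback and the sample body are all assumptions. The real ``requests`` logic, including its fallback when no charset is declared, is more involved.

.. code-block:: python

    import json
    from email.message import Message

    def decode_body(raw, content_type):
        """Hypothetical helper: return (text, parsed_json) for a raw response body."""
        # Let an email Message parse the Content-Type header and its charset
        msg = Message()
        msg['Content-Type'] = content_type
        charset = msg.get_content_charset() or 'utf-8'   # assumed default
        text = raw.decode(charset)
        # Only parse JSON when the media type says so
        data = json.loads(text) if msg.get_content_type() == 'application/json' else None
        return text, data

    raw = '{"greeting": "héllo"}'.encode('utf-8')     # comparable to resp.content
    text, data = decode_body(raw, 'application/json; charset=utf-8')
    print(type(raw).__name__, type(text).__name__)     # bytes str
    print(data['greeting'])                            # héllo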

Here is an example of using ``requests`` to make a HEAD request and extract a few header fields from the response:

.. code-block:: python

    import requests

    resp = requests.head('http://www.python.org/index.html')

    status = resp.status_code
    last_modified = resp.headers['last-modified']
    content_type = resp.headers['content-type']
    content_length = resp.headers['content-length']
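
For comparison, ``urllib.request`` can issue a HEAD request too, if you pass ``method='HEAD'`` to ``Request`` (available since Python 3.3). The sketch below only builds the request. The commented-out lines show how it would be sent, and they assume network access.

.. code-block:: python

    from urllib import request

    req = request.Request('http://www.python.org/index.html', method='HEAD')
    print(req.get_method())   # HEAD
    print(req.selector)       # /index.html

    # Sending it requires network access:
    # with request.urlopen(req) as resp:
    #     status = resp.status
    #     content_type = resp.headers['content-type']

Not every server sends every header. With either library, ``resp.headers.get('last-modified')`` returns ``None`` instead of raising ``KeyError`` when the header is missing.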

Here is a ``requests`` example that logs in to the Python Package Index using basic authentication:

.. code-block:: python

    import requests

    resp = requests.get('http://pypi.python.org/pypi?:action=login',
                        auth=('user', 'password'))
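
HTTP basic authentication (RFC 7617) is simple enough to reproduce by hand. The client sends ``Authorization: Basic`` followed by the Base64 encoding of ``user:password``. The sketch below builds that header with the standard library. ``basic_auth_header()`` is a hypothetical helper, and ``'user'``/``'password'`` are the placeholder credentials from the example. Base64 is an encoding, not encryption, so basic auth credentials should only travel over HTTPS.

.. code-block:: python

    import base64
    from urllib import request

    def basic_auth_header(user, password):
        """Hypothetical helper: build the value of an Authorization header."""
        token = base64.b64encode(f'{user}:{password}'.encode('utf-8')).decode('ascii')
        return 'Basic ' + token

    value = basic_auth_header('user', 'password')
    print(value)   # Basic dXNlcjpwYXNzd29yZA==

    # Attach it to a urllib Request (not sent here)
    req = request.Request('http://pypi.python.org/pypi?:action=login',
                          headers={'Authorization': value})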

Here is an example of using ``requests`` to pass HTTP cookies from one request to the next:

.. code-block:: python

    import requests

    # First request (url as in the earlier examples)
    resp1 = requests.get(url)
    ...

    # Second request with cookies received on the first request
    resp2 = requests.get(url, cookies=resp1.cookies)
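
Passing ``cookies=resp1.cookies`` amounts to taking the cookies a server set with ``Set-Cookie`` and sending them back in a ``Cookie`` header on the next request. The sketch below shows that round trip with the standard ``http.cookies`` module. The header value is made-up sample data. A real client would also honor attributes such as ``Path``, ``Domain`` and ``Expires``, which this sketch ignores.

.. code-block:: python

    from http.cookies import SimpleCookie

    # Value of a Set-Cookie header received with the first response (sample data)
    set_cookie = 'sessionid=abc123; Path=/; HttpOnly'

    jar = SimpleCookie()
    jar.load(set_cookie)

    # Build the Cookie header for the second request from name=value pairs only
    cookie_header = '; '.join(f'{name}={morsel.value}' for name, morsel in jar.items())
    print(cookie_header)   # sessionid=abc123

In ``requests`` itself, a ``requests.Session`` object keeps cookies between requests automatically. In ``urllib``, ``http.cookiejar.CookieJar`` combined with ``urllib.request.HTTPCookieProcessor`` plays the same role.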

Last, but not least, here is an example of using ``requests`` to upload content:

.. code-block:: python

    import requests

    url = 'http://httpbin.org/post'

    # Open the file in binary mode; the with block closes it after the upload
    with open('data.csv', 'rb') as f:
        files = {'file': ('data.csv', f)}
        r = requests.post(url, files=files)
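
When you pass ``files=``, ``requests`` sends a ``multipart/form-data`` body (RFC 7578). The parts are separated by a boundary string, and each part carries a ``Content-Disposition`` header that names the form field and the file. The sketch below builds a single-file body by hand for use with ``urllib``. ``encode_multipart()`` is hypothetical. It handles only one file and does not escape quotes in the file name.

.. code-block:: python

    import uuid
    from urllib import request

    def encode_multipart(field_name, filename, payload, boundary=None):
        """Hypothetical helper: build a single-file multipart/form-data body."""
        boundary = boundary or uuid.uuid4().hex   # random string unlikely to occur in payload
        disposition = (f'Content-Disposition: form-data; name="{field_name}"; '
                       f'filename="{filename}"')
        lines = [
            f'--{boundary}'.encode('ascii'),
            disposition.encode('utf-8'),
            b'Content-Type: application/octet-stream',
            b'',                                  # blank line ends the part headers
            payload,
            f'--{boundary}--'.encode('ascii'),    # closing delimiter
            b'',
        ]
        body = b'\r\n'.join(lines)
        return body, f'multipart/form-data; boundary={boundary}'

    body, content_type = encode_multipart('file', 'data.csv', b'a,b\n1,2\n')
    req = request.Request('http://httpbin.org/post', data=body,
                          headers={'Content-Type': content_type})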

----------
Discussion
----------

For really simple HTTP client code, the built-in ``urllib`` module is usually fine. However, if you need anything beyond simple GET or POST requests, its functionality alone is not enough. This is where a third-party module such as ``requests`` comes in handy.

For example, if you decided to stick entirely with the standard library instead of a third-party library like ``requests``, you might have to implement your code using the low-level ``http.client`` module instead. The following code shows how to execute a HEAD request this way:

.. code-block:: python

    from http.client import HTTPConnection

    c = HTTPConnection('www.python.org', 80)
    c.request('HEAD', '/index.html')
    resp = c.getresponse()

    print('Status', resp.status)
    for name, value in resp.getheaders():
        print(name, value)

    c.close()

Similarly, if you have to write code involving proxies, authentication, cookies, and other details, using ``urllib`` is awkward and verbose. For example, here is code that authenticates to the Python Package Index:

.. code-block:: python

    import urllib.request

    auth = urllib.request.HTTPBasicAuthHandler()
    auth.add_password('pypi', 'http://pypi.python.org', 'username', 'password')
    opener = urllib.request.build_opener(auth)

    r = urllib.request.Request('http://pypi.python.org/pypi?:action=login')
    u = opener.open(r)
    resp = u.read()

    # From here, you can access more pages using opener
    ...
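
The ``urllib`` authentication example above has one catch: ``add_password('pypi', ...)`` only matches if the server's authentication realm is exactly ``'pypi'``. If you don't know the realm, the standard library's ``HTTPPasswordMgrWithDefaultRealm`` accepts ``None`` as a catch-all realm. The sketch below sets that up and checks the credential lookup locally without sending anything. The credentials are the same placeholders, and ``'some-realm'`` is a made-up realm name.

.. code-block:: python

    import urllib.request

    # None as the realm means "use these credentials for any realm at this URL"
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, 'http://pypi.python.org', 'username', 'password')

    auth = urllib.request.HTTPBasicAuthHandler(mgr)
    opener = urllib.request.build_opener(auth)

    # Credentials are found for any page under the registered URL, whatever the realm
    print(mgr.find_user_password('some-realm', 'http://pypi.python.org/pypi'))
    # ('username', 'password')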

Frankly, all of this is much easier in ``requests``.

Testing HTTP client code during development can be frustrating because of all the tricky details you need to handle (e.g., cookies, authentication, headers, encodings). To make this easier, consider using the httpbin service (http://httpbin.org). This site receives your requests and echoes information about them back to you as a JSON response. Here is an interactive example:

.. code-block:: python

    >>> import requests
    >>> r = requests.get('http://httpbin.org/get?name=Dave&n=37',
    ...     headers={'User-agent': 'goaway/1.0'})
    >>> resp = r.json()
    >>> resp['headers']
    {'User-Agent': 'goaway/1.0', 'Content-Length': '', 'Content-Type': '',
    'Accept-Encoding': 'gzip, deflate, compress', 'Connection':
    'keep-alive', 'Host': 'httpbin.org', 'Accept': '*/*'}
    >>> resp['args']
    {'name': 'Dave', 'n': '37'}
    >>>
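
If httpbin.org is unreachable, for example in an offline test suite, a tiny local echo server can play a similar role. The sketch below is a stand-in built on several assumptions, not httpbin itself. ``EchoHandler`` is a hypothetical class that answers GET requests with a JSON document containing only the query arguments and request headers. The client side uses ``urllib`` so that no third-party package is needed.

.. code-block:: python

    import json
    import threading
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib import parse, request


    class EchoHandler(BaseHTTPRequestHandler):
        """Hypothetical stand-in for httpbin's /get: echoes query args and headers."""

        def do_GET(self):
            query = parse.urlsplit(self.path).query
            payload = {
                'args': dict(parse.parse_qsl(query)),
                'headers': dict(self.headers.items()),
            }
            body = json.dumps(payload).encode('utf-8')
            self.send_response(200)
            self.send_header('Content-Type', 'application/json')
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the console quiet during tests


    # Port 0 asks the OS for any free port
    server = HTTPServer(('127.0.0.1', 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address

    # An empty ProxyHandler makes the local request bypass any proxy settings
    opener = request.build_opener(request.ProxyHandler({}))
    req = request.Request(f'http://{host}:{port}/get?name=Dave&n=37',
                          headers={'User-agent': 'goaway/1.0'})
    with opener.open(req) as u:
        resp = json.loads(u.read().decode('utf-8'))

    server.shutdown()
    server.server_close()

    print(resp['args'])   # {'name': 'Dave', 'n': '37'}

Unlike httpbin, this server knows nothing about POST bodies, cookies, or authentication. It only demonstrates the idea of testing a client against a service you control.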
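
Working with a site such as httpbin.org is often preferable to experimenting with a real site. This is especially true if the real site might shut down your account after three failed login attempts (i.e., don't try to learn how to write an HTTP authentication client by logging in to your bank).

Although it's not discussed here, ``requests`` also supports more advanced HTTP client protocols, such as OAuth (through the companion ``requests-oauthlib`` package). The ``requests`` documentation (http://docs.python-requests.org) is excellent, and frankly better than anything that could fit in this short space. Go there for more information.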