----------
Problem
----------
- You want to write C extension code that consumes data from any Python file-like object
- (e.g., normal files, StringIO objects, etc.).
+ You want to write a C extension that reads data from any Python file-like object (e.g., regular files, StringIO objects, and so on).

|

----------
Solution
----------
- To consume data on a file-like object, you need to repeatedly invoke its read() method
- and take steps to properly decode the resulting data.
- Here is a sample C extension function that merely consumes all of the data on a file-like
- object and dumps it to standard output so you can see it:
-
-     #define CHUNK_SIZE 8192
-
-     /* Consume a "file-like" object and write bytes to stdout */
-     static PyObject *py_consume_file(PyObject *self, PyObject *args) {
-         PyObject *obj;
-         PyObject *read_meth;
-         PyObject *result = NULL;
-         PyObject *read_args;
-
-         if (!PyArg_ParseTuple(args,"O", &obj)) {
-             return NULL;
-         }
-
-         /* Get the read method of the passed object */
-         if ((read_meth = PyObject_GetAttrString(obj, "read")) == NULL) {
-             return NULL;
-         }
-
-         /* Build the argument list to read() */
-         read_args = Py_BuildValue("(i)", CHUNK_SIZE);
-         while (1) {
-             PyObject *data;
-             PyObject *enc_data;
-             char *buf;
-             Py_ssize_t len;
-
-             /* Call read() */
-             if ((data = PyObject_Call(read_meth, read_args, NULL)) == NULL) {
-                 goto final;
+ To read data from a file-like object, you need to call its ``read()`` method repeatedly and take care to encode the resulting data correctly.
+
+ Here is a sample C extension function that simply reads all of the data from a file-like object and writes it to standard output:
+
+ ::
+
+     #define CHUNK_SIZE 8192
+
+     /* Consume a "file-like" object and write bytes to stdout */
+     static PyObject *py_consume_file(PyObject *self, PyObject *args) {
+         PyObject *obj;
+         PyObject *read_meth;
+         PyObject *result = NULL;
+         PyObject *read_args;
+
+         if (!PyArg_ParseTuple(args,"O", &obj)) {
+             return NULL;
+         }
+
+         /* Get the read method of the passed object */
+         if ((read_meth = PyObject_GetAttrString(obj, "read")) == NULL) {
+             return NULL;
+         }
+
+         /* Build the argument list to read() */
+         read_args = Py_BuildValue("(i)", CHUNK_SIZE);
+         while (1) {
+             PyObject *data;
+             PyObject *enc_data;
+             char *buf;
+             Py_ssize_t len;
+
+             /* Call read() */
+             if ((data = PyObject_Call(read_meth, read_args, NULL)) == NULL) {
+                 goto final;
+             }
+
+             /* Check for EOF */
+             if (PySequence_Length(data) == 0) {
+                 Py_DECREF(data);
+                 break;
+             }
+
+             /* Encode Unicode as Bytes for C */
+             if ((enc_data=PyUnicode_AsEncodedString(data,"utf-8","strict"))==NULL) {
+                 Py_DECREF(data);
+                 goto final;
+             }
+
+             /* Extract underlying buffer data */
+             PyBytes_AsStringAndSize(enc_data, &buf, &len);
+
+             /* Write to stdout (replace with something more useful) */
+             write(1, buf, len);
+
+             /* Cleanup */
+             Py_DECREF(enc_data);
+             Py_DECREF(data);
+         }
+         result = Py_BuildValue("");
+
+     final:
+         /* Cleanup */
+         Py_DECREF(read_meth);
+         Py_DECREF(read_args);
+         return result;
      }

-             /* Check for EOF */
-             if (PySequence_Length(data) == 0) {
-                 Py_DECREF(data);
-                 break;
-             }
+ To test the code, create a file-like object such as a StringIO instance and pass it in:

-             /* Encode Unicode as Bytes for C */
-             if ((enc_data=PyUnicode_AsEncodedString(data,"utf-8","strict"))==NULL) {
-                 Py_DECREF(data);
-                 goto final;
-             }
+ ::

-             /* Extract underlying buffer data */
-             PyBytes_AsStringAndSize(enc_data, &buf, &len);
-
-             /* Write to stdout (replace with something more useful) */
-             write(1, buf, len);
-
-             /* Cleanup */
-             Py_DECREF(enc_data);
-             Py_DECREF(data);
-         }
-         result = Py_BuildValue("");
-
-     final:
-         /* Cleanup */
-         Py_DECREF(read_meth);
-         Py_DECREF(read_args);
-         return result;
-     }
-
- To test the code, try making a file-like object such as a StringIO instance and pass it in:
-
- >>> import io
- >>> f = io.StringIO('Hello\nWorld\n')
- >>> import sample
- >>> sample.consume_file(f)
- Hello
- World
- >>>
+     >>> import io
+     >>> f = io.StringIO('Hello\nWorld\n')
+     >>> import sample
+     >>> sample.consume_file(f)
+     Hello
+     World
+     >>>

|
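
For readers who find the C loop easier to follow next to its Python equivalent, here is a rough sketch of the same steps in pure Python. The names `consume_file_py` and `out` are illustrative assumptions and are not part of the recipe's `sample` module; `out` is a hypothetical binary sink standing in for the `write(1, buf, len)` call.

```python
import io

CHUNK_SIZE = 8192  # same chunk size as the C example


def consume_file_py(fileobj, out):
    """Read a text-mode file-like object in chunks and write UTF-8 bytes to out.

    Illustrative only: `out` is an assumed binary sink, not part of the recipe.
    """
    read = fileobj.read  # comparable to PyObject_GetAttrString(obj, "read")
    while True:
        data = read(CHUNK_SIZE)  # comparable to PyObject_Call(read_meth, read_args, NULL)
        if len(data) == 0:  # comparable to PySequence_Length(data) == 0 (EOF)
            break
        # comparable to PyUnicode_AsEncodedString(data, "utf-8", "strict")
        out.write(data.encode("utf-8", "strict"))


sink = io.BytesIO()
consume_file_py(io.StringIO("Hello\nWorld\n"), sink)
print(sink.getvalue())
```

The C version does the same work, but it must also release each intermediate object explicitly with `Py_DECREF()`, which Python handles automatically here.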

----------
Discussion
----------
- Unlike a normal system file, a file-like object is not necessarily built around a low-level
- file descriptor. Thus, you can’t use normal C library functions to access it. Instead, you
- need to use Python’s C API to manipulate the file-like object much like you would in
- Python.
- In the solution, the read() method is extracted from the passed object. An argument
- list is built and then repeatedly passed to PyObject_Call() to invoke the method. To
- detect end-of-file (EOF), PySequence_Length() is used to see if the returned result has
- zero length.
- For all I/O operations, you’ll need to concern yourself with the underlying encoding
- and distinction between bytes and Unicode. This recipe shows how to read a file in text
- mode and decode the resulting text into a bytes encoding that can be used by C. If you
- want to read the file in binary mode, only minor changes will be made. For example:
-
- ...
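+ Unlike a normal system file, a file-like object is not necessarily built around a low-level file descriptor.
+ As a result, you cannot use ordinary C library functions to access it.
+ Instead, you need to use Python's C API to manipulate the file-like object, much as you would in Python.
+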
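
The point about file descriptors can be checked from Python. The sketch below assumes a hypothetical helper, `has_file_descriptor`, that is not part of the recipe; it simply reports whether `fileno()` succeeds on an object.

```python
import io
import tempfile


def has_file_descriptor(fileobj):
    """Return True if fileobj exposes an OS-level file descriptor.

    Hypothetical helper for illustration; not part of the recipe.
    """
    try:
        fileobj.fileno()
    except (AttributeError, io.UnsupportedOperation):
        return False
    return True


text_buffer = io.StringIO("Hello\n")
print(has_file_descriptor(text_buffer))  # in-memory object, no descriptor

with tempfile.TemporaryFile() as real_file:
    print(has_file_descriptor(real_file))  # backed by an operating system file

# read() works the same way regardless of what backs the object
print(text_buffer.read())
```

Because `read()` is available in both cases, calling it through the C API is the portable choice.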
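+ In the solution, the ``read()`` method is first extracted from the object that was passed in.
+ An argument list is then built and passed repeatedly to ``PyObject_Call()`` to invoke the method.
+ To detect end-of-file (EOF), ``PySequence_Length()`` is used to check whether the returned object has zero length.
+
+ For all I/O operations, you need to pay attention to the underlying encoding and to the distinction between bytes and Unicode.
+ This recipe shows how to read a file in text mode and encode the resulting text as bytes that C can use.
+ If you want to read the file in binary mode, only minor changes are needed. For example:
+ ::
+
+     ...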
          /* Call read() */
          if ((data = PyObject_Call(read_meth, read_args, NULL)) == NULL) {
              goto final;
@@ -129,15 +131,14 @@ want to read the file in binary mode, only minor changes will be made. For examp
          PyBytes_AsStringAndSize(data, &buf, &len);
      ...

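
The difference between the text-mode and binary-mode variants comes down to whether the chunk returned by `read()` needs an encoding step before C can use it. A minimal Python sketch of that decision follows; `to_bytes` is a hypothetical helper, and the UTF-8 default is an assumption carried over from the solution code.

```python
import io


def to_bytes(chunk, encoding="utf-8"):
    """Return chunk as bytes, encoding it only when it is text.

    Hypothetical helper for illustration; not part of the recipe.
    """
    if isinstance(chunk, str):
        # Text mode: the step PyUnicode_AsEncodedString() performs in C
        return chunk.encode(encoding, "strict")
    # Binary mode: the data is already bytes, so no encoding step is needed
    return bytes(chunk)


text_chunk = io.StringIO("caf\u00e9").read(8192)
binary_chunk = io.BytesIO(b"caf\xc3\xa9").read(8192)
print(to_bytes(text_chunk) == to_bytes(binary_chunk))
```

In a C extension, a comparable check could be made with `PyUnicode_Check()` and `PyBytes_Check()` if one function has to accept both kinds of objects.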
- The trickiest part of this recipe concerns proper memory management. When working
- with PyObject * variables, careful attention needs to be given to managing reference
- counts and cleaning up values when no longer needed. The various Py_DECREF() calls
- are doing this.
- The recipe is written in a general-purpose manner so that it can be adapted to other file
- operations, such as writing. For example, to write data, merely obtain the write()
- method of the file-like object, convert data into an appropriate Python object (bytes or
- Unicode), and invoke the method to have it written to the file.
- Finally, although file-like objects often provide other methods (e.g., readline(),
- read_into()), it is probably best to just stick with the basic read() and write() methods
- for maximal portability. Keeping things as simple as possible is often a good policy
- for C extensions.
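+ The trickiest part of this recipe is getting memory management right.
+ When working with ``PyObject *`` variables, you need to manage reference counts carefully and release values once they are no longer needed.
+ The various ``Py_DECREF()`` calls do exactly that.
+
+ The recipe is written in a general-purpose way so that it can be adapted to other file operations, such as writing.
+ For example, to write data, simply obtain the ``write()`` method of the file-like object, convert the data into an appropriate Python object
+ (bytes or Unicode), and invoke the method to write the data to the file.
+
+ Finally, although file-like objects often provide other methods (e.g., ``readline()`` and ``readinto()``),
+ it is best to stick with the basic ``read()`` and ``write()`` methods for maximum portability.
+ When writing C extensions, keep things as simple as possible.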
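
The write path described above follows the same pattern as the read path: look up the method, prepare an object of the right type, and call it. Here is a small Python sketch under stated assumptions; `write_text` and its `binary` flag are hypothetical names chosen for illustration, and the caller is assumed to know whether the target object expects bytes or text.

```python
import io


def write_text(fileobj, text, binary=False):
    """Write text through the object's write() method.

    Hypothetical helper: `binary` tells us whether the target expects bytes.
    """
    write = fileobj.write  # comparable to PyObject_GetAttrString(obj, "write")
    # Convert the data into the Python object type the target expects
    payload = text.encode("utf-8") if binary else text
    return write(payload)


text_sink = io.StringIO()
write_text(text_sink, "Hello\n")

byte_sink = io.BytesIO()
write_text(byte_sink, "Hello\n", binary=True)

print(repr(text_sink.getvalue()), repr(byte_sink.getvalue()))
```

In C, the equivalent call would pass a `str` or `bytes` object built with the C API to the `write` method, with the same reference-count cleanup as in the read loop.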