|
5 | 5 | ----------
|
6 | 6 | Problem
|
7 | 7 | ----------
|
8 |
| -You want to convert strings from C to Python bytes or a string object. |
| 8 | +How do you convert a C string into a Python bytes object or a string object? |
9 | 9 |
|
10 | 10 | |
|
11 | 11 |
|
12 | 12 | ----------
|
13 | 13 | Solution
|
14 | 14 | ----------
|
15 |
| -For C strings represented as a pair char *, int, you must decide whether or not you |
16 |
| -want the string presented as a raw byte string or as a Unicode string. Byte objects can |
17 |
| -be built using Py_BuildValue() as follows: |
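| 15 | +For a C string represented as a ``char *`` and ``int`` pair, |
| 16 | +you must decide whether it should be presented as a raw byte string or as a Unicode string. |
| 17 | +Bytes objects can be built with ``Py_BuildValue()`` as follows: |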
18 | 18 |
|
19 |
| -char *s; /* Pointer to C string data */ |
20 |
| -int len; /* Length of data */ |
| 19 | +:: |
21 | 20 |
|
22 |
| -/* Make a bytes object */ |
23 |
| -PyObject *obj = Py_BuildValue("y#", s, len); |
| 21 | + char *s; /* Pointer to C string data */ |
| 22 | + int len; /* Length of data */ |
24 | 23 |
|
25 |
| -If you want to create a Unicode string and you know that s points to data encoded as |
26 |
| -UTF-8, you can use the following: |
| 24 | + /* Make a bytes object */ |
| 25 | + PyObject *obj = Py_BuildValue("y#", s, len); |
27 | 26 |
|
28 |
| -PyObject *obj = Py_BuildValue("s#", s, len); |
| 27 | +If you want to create a Unicode string and you know that ``s`` points to UTF-8 encoded data, you can use the following: |
29 | 28 |
|
30 |
| -If s is encoded in some other known encoding, you can make a string using |
31 |
| -PyUnicode_Decode() as follows: |
| 29 | +:: |
32 | 30 |
|
33 |
| -PyObject *obj = PyUnicode_Decode(s, len, "encoding", "errors"); |
| 31 | + PyObject *obj = Py_BuildValue("s#", s, len); |
34 | 32 |
|
35 |
| -/* Examples */ |
36 |
| -obj = PyUnicode_Decode(s, len, "latin-1", "strict"); |
37 |
| -obj = PyUnicode_Decode(s, len, "ascii", "ignore"); |
| 33 | +If ``s`` is in some other known encoding, you can build a string with ``PyUnicode_Decode()`` as follows: |
38 | 34 |
|
39 |
| -If you happen to have a wide string represented as a wchar_t *, len pair, there are a |
40 |
| -few options. First, you could use Py_BuildValue() as follows: |
| 35 | +:: |
41 | 36 |
|
42 |
| -wchar_t *w; /* Wide character string */ |
43 |
| -int len; /* Length */ |
| 37 | + PyObject *obj = PyUnicode_Decode(s, len, "encoding", "errors"); |
44 | 38 |
|
45 |
| -PyObject *obj = Py_BuildValue("u#", w, len); |
| 39 | + /* Examples */ |
| 40 | + obj = PyUnicode_Decode(s, len, "latin-1", "strict"); |
| 41 | + obj = PyUnicode_Decode(s, len, "ascii", "ignore"); |
46 | 42 |
|
47 |
| -Alternatively, you can use PyUnicode_FromWideChar(): |
| 43 | +If you happen to have a wide string represented as a ``wchar_t *, len`` pair, |
| 44 | +there are a few options. First, you can use ``Py_BuildValue()``: |
48 | 45 |
|
49 |
| -PyObject *obj = PyUnicode_FromWideChar(w, len); |
| 46 | +:: |
50 | 47 |
|
51 |
| -For wide character strings, no interpretation is made of the character data—it is assumed |
52 |
| -to be raw Unicode code points which are directly converted to Python. |
| 48 | + wchar_t *w; /* Wide character string */ |
| 49 | + int len; /* Length */ |
| 50 | + |
| 51 | + PyObject *obj = Py_BuildValue("u#", w, len); |
| 52 | + |
| 53 | +Alternatively, you can use ``PyUnicode_FromWideChar()``: |
| 54 | + |
| 55 | +:: |
| 56 | + |
| 57 | + PyObject *obj = PyUnicode_FromWideChar(w, len); |
| 58 | + |
| 59 | +For wide character strings, the character data is not interpreted: it is assumed to be raw Unicode code points, which are converted directly to Python. |
53 | 60 |
|
54 | 61 | |
|
55 | 62 |
|
56 | 63 | ----------
|
57 | 64 | Discussion
|
58 | 65 | ----------
|
59 |
| -Conversion of strings from C to Python follows the same principles as I/O. Namely, the |
60 |
| -data from C must be explicitly decoded into a string according to some codec. Common |
61 |
| -encodings include ASCII, Latin-1, and UTF-8. If you’re not entirely sure of the encoding |
62 |
| -or the data is binary, you’re probably best off encoding the string as bytes instead. |
63 |
| -When making an object, Python always copies the string data you provide. If necessary, |
64 |
| -it’s up to you to release the C string afterward (if required). Also, for better reliability, |
65 |
| -you should try to create strings using both a pointer and a size rather than relying on |
66 |
| -NULL-terminated data. |
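| 66 | +Converting strings from C to Python follows the same principles as I/O. |
| 67 | +That is, data coming from C must be explicitly decoded into a string according to some codec. |
| 68 | +Common encodings include ASCII, Latin-1, and UTF-8. |
| 69 | +If you are not sure of the encoding, or the data is binary, you are probably best off representing it as bytes instead. |
| 70 | +When making an object, Python always copies the string data you provide. |
| 71 | +It is up to you to release the C string afterward, if that is required. |
| 72 | +Also, for better reliability, you should create strings from both a pointer and a size |
| 73 | +rather than relying on NULL-terminated data. |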
| 74 | + |
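The conversions discussed in this recipe can be tried without compiling an extension. The following is a minimal sketch, assuming CPython, where ``ctypes.pythonapi`` exposes the interpreter's own C API. The names ``api``, ``raw``, ``buf`` and the ``as_*`` variables are illustrative and not part of the recipe. Because ``Py_BuildValue()`` is variadic and awkward to call through ctypes, the sketch swaps in ``PyBytes_FromStringAndSize()`` and ``PyUnicode_FromStringAndSize()``, the direct constructors that correspond to the ``"y#"`` and ``"s#"`` format codes. Note that the length argument is declared as ``Py_ssize_t`` here, which is what current CPython versions expect for ``#`` formats.

```python
import ctypes

# Handle to the C API of the running interpreter (CPython only).
api = ctypes.pythonapi

# Declare the signatures so ctypes passes pointers and Py_ssize_t correctly
# and hands the returned PyObject* back as a normal Python object.
api.PyBytes_FromStringAndSize.argtypes = [ctypes.c_char_p, ctypes.c_ssize_t]
api.PyBytes_FromStringAndSize.restype = ctypes.py_object

api.PyUnicode_FromStringAndSize.argtypes = [ctypes.c_char_p, ctypes.c_ssize_t]
api.PyUnicode_FromStringAndSize.restype = ctypes.py_object

api.PyUnicode_Decode.argtypes = [ctypes.c_char_p, ctypes.c_ssize_t,
                                 ctypes.c_char_p, ctypes.c_char_p]
api.PyUnicode_Decode.restype = ctypes.py_object

api.PyUnicode_FromWideChar.argtypes = [ctypes.c_wchar_p, ctypes.c_ssize_t]
api.PyUnicode_FromWideChar.restype = ctypes.py_object

# Pointer plus size: the embedded NUL byte survives, which it would not
# if the length were inferred from NUL termination.
raw = b"ab\x00cd"
as_bytes = api.PyBytes_FromStringAndSize(raw, len(raw))

# Data known to be UTF-8 becomes a str.
utf8 = "naïve".encode("utf-8")
as_utf8 = api.PyUnicode_FromStringAndSize(utf8, len(utf8))

# Other encodings go through PyUnicode_Decode(s, len, encoding, errors).
as_latin1 = api.PyUnicode_Decode(b"caf\xe9", 4, b"latin-1", b"strict")
as_ascii = api.PyUnicode_Decode(b"caf\xe9", 4, b"ascii", b"ignore")

# Wide strings are taken as code points, with no decoding step.
as_wide = api.PyUnicode_FromWideChar("wide", 4)

# The new object holds a copy: changing the C buffer afterward
# does not affect the bytes object that was built from it.
buf = ctypes.create_string_buffer(b"hello")
copied = api.PyBytes_FromStringAndSize(buf, 5)
buf[0] = b"J"
```

In a real extension module the same calls would be made from C, and any buffer allocated on the C side would still have to be released by the caller, since Python keeps only its own copy.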