Python 3 Standard Library: codecs String Encoding and Decoding (5)


)
encoded_file.write(utf8)

# Fetch the buffer contents as a UTF-16 encoded byte string
utf16 = output.getvalue()
print('Encoded to UTF-16:', to_hex(utf16, 2))

# Set up another buffer with the UTF-16 data for reading,
# and wrap it with another EncodedFile.
buffer = io.BytesIO(utf16)
encoded_file = codecs.EncodedFile(buffer, data_encoding='utf-8',
                                  file_encoding='utf-16')

# Read the UTF-8 encoded version of the data.
recoded = encoded_file.read()
print('Back to UTF-8 :', to_hex(recoded, 1))

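This example shows how to read from and write to the separate handles returned by EncodedFile(). Whether the handle is used for reading or writing, file_encoding always refers to the encoding used by the open file handle (passed as the first argument), while data_encoding refers to the encoding of the data passed through the read() and write() calls.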
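The roles of the two encodings can be checked with a short, self-contained round trip. This is a minimal sketch: the sample text and the variable names (writer, reader) are chosen here for illustration only, and the comparison replaces the hex dump used in the example above.

```python
import codecs
import io

text = 'français'
utf8 = text.encode('utf-8')

# Writing: the caller hands over UTF-8 bytes (data_encoding) and the
# wrapped buffer receives UTF-16 bytes (file_encoding).
output = io.BytesIO()
writer = codecs.EncodedFile(output, data_encoding='utf-8',
                            file_encoding='utf-16')
writer.write(utf8)
utf16 = output.getvalue()

# Reading: the wrapped buffer holds UTF-16 bytes (file_encoding) and
# the caller gets UTF-8 bytes back (data_encoding).
reader = codecs.EncodedFile(io.BytesIO(utf16), data_encoding='utf-8',
                            file_encoding='utf-16')
recoded = reader.read()

print('Buffer holds UTF-16:', utf16.decode('utf-16') == text)
print('Read back as UTF-8 :', recoded == utf8)
```

In both directions the arguments are identical; only the direction of the call (write() or read()) changes which side performs the conversion.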

1.6 Non-Unicode Encodings

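Although most of the earlier examples use Unicode encodings, codecs can also be used for many other data translations. For example, Python includes codecs for working with base-64, bzip2, ROT-13, ZIP, and other data formats.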

import codecs
import io
buffer = io.StringIO()
stream = codecs.getwriter('rot_13')(buffer)
text = 'abcdefghijklmnopqrstuvwxyz'
stream.write(text)
stream.flush()
print('Original:', text)
print('ROT-13 :', buffer.getvalue())

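Any transformation that can be expressed as a function taking a single input argument and returning a byte or Unicode string can be registered as a codec. For the 'rot_13' codec, the input should be a Unicode string, and the output is also a Unicode string.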
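As a rough illustration of registering such a transformation, the sketch below defines a codec that simply reverses a string. The codec name 'reverse_text' and the helper names are hypothetical and exist only for this example; codecs.register() and codecs.CodecInfo are the standard library pieces being shown.

```python
import codecs


def _reverse(data, errors='strict'):
    # A codec function returns the converted data together with
    # the number of input items it consumed.
    return data[::-1], len(data)


def _search(name):
    # Search functions receive the normalized codec name and return
    # a CodecInfo for names they recognize, or None otherwise.
    if name == 'reverse_text':  # hypothetical name used for this sketch
        return codecs.CodecInfo(encode=_reverse, decode=_reverse,
                                name='reverse_text')
    return None


codecs.register(_search)

encoded = codecs.encode('abcdef', 'reverse_text')
decoded = codecs.decode(encoded, 'reverse_text')
print('Encoded:', encoded)
print('Decoded:', decoded)
```

Because reversing is its own inverse, the same function serves as both encoder and decoder here; a real codec would normally supply two different functions.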

Using codecs to wrap a data stream provides a simpler interface than working directly with zlib.

import codecs
import io
buffer = io.BytesIO()
stream = codecs.getwriter('zlib')(buffer)
text = b'abcdefghijklmnopqrstuvwxyz\n' * 50
stream.write(text)
stream.flush()
print('Original length :', len(text))
compressed_data = buffer.getvalue()
print('ZIP compressed :', len(compressed_data))
buffer = io.BytesIO(compressed_data)
stream = codecs.getreader('zlib')(buffer)
first_line = stream.readline()
print('Read first line :', repr(first_line))
uncompressed_data = first_line + stream.read()
print('Uncompressed :', len(uncompressed_data))
print('Same :', text == uncompressed_data)

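Not all compression or encoding systems support reading a portion of the data through the stream interface with readline() or read(), because they need to find the end of a compressed segment in order to expand it. If a program cannot hold the entire uncompressed data set in memory, use the incremental access features of the compression library instead of codecs.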
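One way to use the compression library's incremental access directly is zlib.decompressobj(), sketched below. The chunk size of 16 bytes and the running length and checksum are illustrative choices, made so that the decompressed data never has to be held in memory all at once.

```python
import io
import zlib

original = b'abcdefghijklmnopqrstuvwxyz\n' * 50
compressed = zlib.compress(original)

# Feed the compressed stream to the decompressor in small pieces and
# keep only a running length and checksum of the output.
source = io.BytesIO(compressed)
decompressor = zlib.decompressobj()
total = 0
checksum = 0
while True:
    chunk = source.read(16)  # illustrative chunk size
    if not chunk:
        break
    piece = decompressor.decompress(chunk)
    total += len(piece)
    checksum = zlib.crc32(piece, checksum)

# Collect anything still buffered inside the decompressor.
tail = decompressor.flush()
total += len(tail)
checksum = zlib.crc32(tail, checksum)

print('Uncompressed length:', total)
print('Checksum matches   :', checksum == zlib.crc32(original))
```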

1.7 Incremental Encoding

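Some of the encodings provided (especially bz2 and zlib) may dramatically change the length of the data stream as they process it. For large data sets, these encodings work better incrementally, processing one small chunk of data at a time. The IncrementalEncoder/IncrementalDecoder API is designed for this purpose.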

import codecs
import sys
text = b'abcdefghijklmnopqrstuvwxyz\n'
repetitions = 50
print('Text length :', len(text))
print('Repetitions :', repetitions)
print('Expected len:', len(text) * repetitions)
# Encode the text several times to build up a
# large amount of data
encoder = codecs.getincrementalencoder('bz2')()
encoded = []
print()
print('Encoding:', end=' ')
last = repetitions - 1
for i in range(repetitions):
    en_c = encoder.encode(text, final=(i == last))
    if en_c:
        print('\nEncoded : {} bytes'.format(len(en_c)))
        encoded.append(en_c)
    else:
        sys.stdout.write('.')
