Serializing Data with the pickle and cPickle Modules
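Serialization is the process of converting an object into a stream of bytes or characters so that it can be sent over a network or stored on disk, and later recreated together with its state. The reverse process is called deserialization.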
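In Python, the pickle module provides a way to serialize and deserialize Python objects. Pickle is a powerful library that can serialize complex and custom objects that many other libraries cannot handle. Python 2 also ships a cPickle module, which exposes the same functions as pickle but is written in C. In cPickle, the pickler and unpickler are implemented as C functions rather than as classes.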
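As a rough illustration of that claim, the sketch below tries to encode the same object with json and with pickle. The Point class and the record dictionary are hypothetical names made up for this example; json is used only as a familiar point of comparison.

```python
import json
import pickle


class Point:
    """Hypothetical custom class, used only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


# A set, a complex number and a custom instance: none of these are JSON types.
record = {"tags": {"a", "b"}, "z": 3 + 4j, "origin": Point(0, 0)}

try:
    json.dumps(record)
    json_ok = True
except TypeError:
    # By default json only encodes dict, list, str, numbers, bool and None.
    json_ok = False

# pickle handles the same object without any extra configuration.
restored = pickle.loads(pickle.dumps(record))

print("JSON could encode it:", json_ok)
print("Restored tags:", restored["tags"])
print("Restored origin:", restored["origin"].x, restored["origin"].y)
```

The json module could be taught to handle these types with a custom encoder; the point here is only that pickle needs no such help for ordinary Python objects.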
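Differences between pickle and cPickle:
- pickle is implemented in Python using classes, whereas cPickle is written as C functions. As a result, cPickle can be many times faster than pickle.
- pickle is available in both Python 2.x and Python 3.x, whereas cPickle exists only in Python 2.x. In Python 3.x the C implementation lives in the _pickle module, and pickle uses it automatically when it is available, so importing pickle is normally all that is needed.
- cPickle does not support subclassing of the Pickler and Unpickler, because in cPickle they are functions rather than classes. If subclassing is not needed, cPickle is the better choice; otherwise use pickle.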
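The speed difference can still be observed on Python 3, where the C and pure-Python implementations live side by side. The sketch below is a rough timing comparison, not a benchmark: it assumes CPython 3, where pickle.Pickler is the C-accelerated class and pickle._Pickler is the pure-Python one. The latter is a private name and an implementation detail, so treat its use here as an assumption. The data payload and the dump_with helper are made up for the example.

```python
import io
import pickle
import timeit

# Sample payload; its shape is arbitrary and chosen only for this illustration.
data = [{"id": i, "values": [i * 0.5, i * 1.5]} for i in range(2000)]


def dump_with(pickler_cls):
    """Serialize `data` with the given Pickler class and return the bytes."""
    buffer = io.BytesIO()
    pickler_cls(buffer, protocol=pickle.HIGHEST_PROTOCOL).dump(data)
    return buffer.getvalue()


# pickle.Pickler is the C-accelerated class when the _pickle module is available.
# pickle._Pickler is the pure-Python class (private name, CPython detail).
fast = timeit.timeit(lambda: dump_with(pickle.Pickler), number=20)
slow = timeit.timeit(lambda: dump_with(pickle._Pickler), number=20)

print("pickle.Pickler : %.4f s" % fast)
print("pickle._Pickler: %.4f s" % slow)

# A stream written by the pure-Python pickler can be read back by pickle.loads.
assert pickle.loads(dump_with(pickle._Pickler)) == data
```

The exact timings depend on the machine and the data, so no figures are quoted here.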
Because pickle and cPickle share the same interface, we can use them in the same way. The following sample code is provided for reference:
Python3
try:
    # Python 2.x: cPickle is available by default
    import cPickle as pickle
except ImportError:
    # Python 3.x: cPickle is not available
    import pickle
import random


# A custom class to demonstrate pickling
class ModelTrainer:
    def __init__(self):
        self.weights = [0, 0, 0]

    def train(self):
        # Replace every weight with a random value
        for i in range(len(self.weights)):
            self.weights[i] = random.random()

    def get_weights(self):
        return self.weights


# Create an object
model = ModelTrainer()
# Populate the data
model.train()
print('Weights before pickling', model.get_weights())
# Open a file in binary write mode and pickle the object into it
with open('model.pkl', 'wb') as p_file:
    pickle.dump(model, p_file)
# Deserialize the object from the file
with open('model.pkl', 'rb') as file:
    new_model = pickle.load(file)
print('Weights after pickling', new_model.get_weights())
Output:
Weights before pickling [0.6089721131909885, 0.7891019431265203, 0.5653418337976294]
Weights after pickling [0.6089721131909885, 0.7891019431265203, 0.5653418337976294]
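In the code above, we created a custom class, ModelTrainer, that initializes a list of zeros. The train() method fills the list with random values, and the get_weights() method returns them. Next, we created the model object and printed the generated weights. We then opened a new file in "wb" (write bytes) mode, and the dump() method wrote the object to that file as a byte stream. To verify the result, we loaded the file into a new object and printed its weights.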
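The pickle module is very powerful for Python objects. However, it stores only an object's data together with a reference to its class (the module and class name), not the class definition itself. As a result, a custom object cannot be loaded unless its class definition is available to the loading script. The following example shows unpickling failing: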
Python3
try:
    # Python 2.x: cPickle is available by default
    import cPickle as pickle
except ImportError:
    # Python 3.x: cPickle is not available
    import pickle
# Deserialize the file without defining ModelTrainer first
with open('model.pkl', 'rb') as file:
    new_model = pickle.load(file)
print('Weights of model', new_model.get_weights())
Output:
Traceback (most recent call last):
  File "des.py", line 12, in
    new_model = pickle.load(file)
AttributeError: Can't get attribute 'ModelTrainer' on
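The error occurs because the current script does not know the object's class. Pickle saves only the data inside the object; it does not save the methods or the class structure.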
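One way to see this is to look inside the pickled bytes. The sketch below is a simplified check rather than a full analysis of the pickle format: it pickles an instance and searches the raw bytes for the class name, an attribute name, and a method name. The weights used here are fixed placeholder values.

```python
import pickle


class ModelTrainer:
    def __init__(self):
        self.weights = [0.1, 0.2, 0.3]

    def get_weights(self):
        return self.weights


payload = pickle.dumps(ModelTrainer())

# The class is recorded by name so that it can be looked up again on load.
has_class_name = b"ModelTrainer" in payload
# Instance data is stored, keyed by attribute name.
has_attribute_name = b"weights" in payload
# Methods belong to the class, not the instance, so they are not stored.
has_method_name = b"get_weights" in payload

print("class name stored:    ", has_class_name)
print("attribute name stored:", has_attribute_name)
print("method name stored:   ", has_method_name)
```

Since only the name of the class travels with the data, the loading script has to be able to find a class with that name in the same module.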
To fix the error, the script must have access to the class definition. The following example loads the custom object correctly:
Python3
try:
    # Python 2.x: cPickle is available by default
    import cPickle as pickle
except ImportError:
    # Python 3.x: cPickle is not available
    import pickle
import random


# If the original file is available, we can import the class from it
# instead of redefining it here.
# A custom class to demonstrate pickling
class ModelTrainer:
    def __init__(self):
        self.weights = [0, 0, 0]

    def train(self):
        for i in range(len(self.weights)):
            self.weights[i] = random.random()

    def get_weights(self):
        return self.weights


# Deserialize the object from the file
with open('model.pkl', 'rb') as file:
    new_model = pickle.load(file)
print('Weights of model', new_model.get_weights())
Output:
Weights of model [0.6089721131909885, 0.7891019431265203, 0.5653418337976294]
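We supplied the definition of the ModelTrainer class. The script can now resolve the class, so pickle can rebuild the object and restore its saved attributes (by default it does this without calling __init__() again). Instead of retyping the whole class, we can simply import it from the file that defines it.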
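One detail worth checking is how the object is rebuilt. By default, unpickling an instance does not run __init__(); it creates an empty instance and restores the saved attributes. The sketch below uses a hypothetical Counter class, invented for this example, that counts how often its __init__() runs.

```python
import pickle


class Counter:
    """Hypothetical class that records how often __init__ runs."""

    init_calls = 0

    def __init__(self):
        Counter.init_calls += 1
        self.value = 42


original = Counter()  # __init__ runs once here
restored = pickle.loads(pickle.dumps(original))

# The attribute comes back, but __init__ was not called a second time.
print("__init__ calls:", Counter.init_calls)
print("restored value:", restored.value)
```

This matters when __init__() has side effects such as opening files or connections: those are not repeated on load, so any such resources have to be set up again separately.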
Serializing to a byte string
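We can also serialize an object to a byte string in memory instead of a file. The pickle and cPickle modules provide the dumps() and loads() functions for this. dumps() takes an object as its argument and returns the serialized form (a bytes object in Python 3, a str in Python 2). loads() does the reverse: it takes the serialized bytes and returns the reconstructed object. The code below serializes a custom object to a byte string.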
Python3
try:
    # Python 2.x: cPickle is available by default
    import cPickle as pickle
except ImportError:
    # Python 3.x: cPickle is not available
    import pickle
import random


# A custom class to demonstrate pickling
class ModelTrainer:
    def __init__(self):
        self.weights = [0, 0, 0]

    def train(self):
        for i in range(len(self.weights)):
            self.weights[i] = random.random()

    def get_weights(self):
        return self.weights


# Create an object
model = ModelTrainer()
# Populate the data
model.train()
print('Weights before pickling', model.get_weights())
# Pickle the object into an in-memory byte string
byte_string = pickle.dumps(model)
print("The bytes of object are:", byte_string)
# Deserialize the object from the same byte string
new_model = pickle.loads(byte_string)
print('Weights after depickling', new_model.get_weights())
Output:
Weights before pickling [0.923474126606742, 0.34909608824193983, 0.3761122243447367]
The bytes of object are: b'\x80\x03c__main__\nModelTrainer\nq\x00)\x81q\x01}q\x02X\x07\x00\x00\x00weightsq\x03]q\x04(G?\xed\x8d\x19\x9c\x8fL\xc3G?\xd6W\x97\x1e\x8aHHG?\xd8\x129\x01\xcb\xee\xf2esb.'
Weights after depickling [0.923474126606742, 0.34909608824193983, 0.3761122243447367]
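The relationship between the file-based and in-memory functions can be sketched with built-in types only. The example below assumes Python 3 and uses an arbitrary settings dictionary invented for illustration. It shows that dumps() returns a bytes object, that loads() returns an equal but separate copy, and that the same bytes can also be read by load() through a file-like object.

```python
import io
import pickle

# Arbitrary sample data, made up for this illustration.
settings = {"name": "demo", "weights": [0.25, 0.5, 0.75]}

payload = pickle.dumps(settings)   # in-memory serialization
restored = pickle.loads(payload)   # in-memory deserialization

# load() reads from any binary file-like object, including an in-memory buffer.
from_stream = pickle.load(io.BytesIO(payload))

print("payload type:", type(payload).__name__)
print("equal to original:", restored == settings)
print("same object as original:", restored is settings)
print("load() gives the same result:", from_stream == restored)
```

Because the result is a copy, changing restored afterwards does not affect settings.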