📜  PyBrain – Dataset Types

📅  Last modified: 2022-05-13 01:54:22.858000             🧑  Author: Mango


Datasets make it easy to access training, testing, and validation data. Rather than leaving you to work with raw arrays, PyBrain provides richer data structures that make your data easier to handle.

Datasets in PyBrain

The most commonly used datasets that PyBrain supports are SupervisedDataSet and ClassificationDataSet.

SupervisedDataSet: consists of "input" and "target" fields. It is the simplest form of dataset and is intended for supervised learning tasks. The pattern sizes of both fields must be set at creation time:

Python3
from pybrain.datasets import SupervisedDataSet

# Create a dataset with 3-dimensional inputs and 2-dimensional targets
DS = SupervisedDataSet(3, 2)

# Append an (input, target) pair; both fields are added together
DS.appendLinked([1, 2, 3], [4, 5])

print(len(DS))        # 1
print(DS['input'])    # [[1. 2. 3.]]
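To make the linked-fields idea concrete, here is a minimal pure-Python sketch of how such a dataset could behave. This is an illustration only (the class name `MiniSupervisedDataSet` is hypothetical), not PyBrain's actual implementation:

```python
# Illustrative sketch of a linked (input, target) dataset --
# not PyBrain's actual implementation.
class MiniSupervisedDataSet:
    def __init__(self, indim, targetdim):
        self.indim = indim
        self.targetdim = targetdim
        self.data = {'input': [], 'target': []}

    def appendLinked(self, inp, target):
        # Both fields are appended together so the rows stay aligned
        assert len(inp) == self.indim and len(target) == self.targetdim
        self.data['input'].append(list(inp))
        self.data['target'].append(list(target))

    def __len__(self):
        return len(self.data['input'])

    def __getitem__(self, field):
        return self.data[field]


DS = MiniSupervisedDataSet(3, 2)
DS.appendLinked([1, 2, 3], [4, 5])
print(len(DS))        # 1
print(DS['input'])    # [[1, 2, 3]]
```

The key property mirrored here is that a sample can only be added to both fields at once, so the input and target rows can never fall out of sync.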


ClassificationDataSet: mainly used for classification problems. Besides the input and target fields, it maintains an extra field called "class", which is an automatic backup of the given targets. For example, the output will be 1 or 0, i.e., each sample is assigned to exactly one class based on the given input.
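The "class" backup and the one-of-many (one-hot) target encoding performed by `_convertToOneOfMany()` can be sketched in plain NumPy. The helper name `to_one_of_many` is hypothetical; this illustrates the idea, not PyBrain's internal code:

```python
import numpy as np

def to_one_of_many(targets, nb_classes):
    """One-hot encode integer class labels, keeping the originals
    as a separate 'class' field (hypothetical helper mirroring the
    idea behind ClassificationDataSet._convertToOneOfMany)."""
    targets = np.asarray(targets, dtype=int).ravel()
    encoded = np.zeros((len(targets), nb_classes))
    # Set a single 1.0 per row, at the column of that row's class
    encoded[np.arange(len(targets)), targets] = 1.0
    return encoded, targets  # (new 'target' field, backed-up 'class' field)


target, cls = to_one_of_many([0, 2, 1], 3)
print(target)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
print(cls)   # [0 2 1]
```

The one-hot targets are what a softmax output layer is trained against, while the backed-up integer labels are what the error percentage is computed from.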

Python3

# Importing all the necessary libraries
from sklearn import datasets
from pybrain.datasets import ClassificationDataSet
from pybrain.utilities import percentError
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.structure.modules import SoftmaxLayer
from numpy import ravel
  
# Loading iris dataset from sklearn datasets
iris = datasets.load_iris()
  
# Defining feature variables and target variable
X_data = iris.data
y_data = iris.target
  
# Defining classification dataset model
classification_dataset = ClassificationDataSet(4, 1, nb_classes=3)
  
# Adding samples to the classification dataset
for i in range(len(X_data)):
    classification_dataset.addSample(ravel(X_data[i]), y_data[i])
  
# Splitting data into testing (30%) and training (70%) sets.
# splitWithProportion returns plain SupervisedDataSets, so the
# ClassificationDataSets are rebuilt below.
testing_data, training_data = classification_dataset.splitWithProportion(0.3)
  
# Classification dataset for test data
test_data = ClassificationDataSet(4, 1, nb_classes=3)
  
# Adding samples to the testing classification dataset
for n in range(0, testing_data.getLength()):
    test_data.addSample(testing_data.getSample(
        n)[0], testing_data.getSample(n)[1])
  
# Classification dataset for train data
train_data = ClassificationDataSet(4, 1, nb_classes=3)
  
# Adding samples to the training classification dataset
for n in range(0, training_data.getLength()):
    train_data.addSample(training_data.getSample(
        n)[0], training_data.getSample(n)[1])
  
# Convert targets to one-of-many (one-hot) encoding; the original
# class labels are backed up in the 'class' field
test_data._convertToOneOfMany()
train_data._convertToOneOfMany()
  
# Building a feed-forward network with a 4-neuron hidden layer
# and a softmax output layer
build_network = buildNetwork(
    train_data.indim, 4, train_data.outdim, outclass=SoftmaxLayer)
  
# Building a backproptrainer on training data
trainer = BackpropTrainer(
    build_network, dataset=train_data, learningrate=0.01, verbose=True)
  
# Training for 20 epochs on the training data
trainer.trainEpochs(20)
  
# Evaluating the error percentage on the test data
print('Error percentage on testing data=>', percentError(
    trainer.testOnClassData(dataset=test_data), test_data['class']))

Output:

Total error:  0.0892390931641
Total error:  0.0821479733597
Total error:  0.0759327938967
Total error:  0.0722385583142
Total error:  0.0690818068826
Total error:  0.0667645311923
Total error:  0.0647079622731
Total error:  0.0630345245312
Total error:  0.0608030839912
Total error:  0.0595356750412
Total error:  0.0586635639408
Total error:  0.0573043661487
Total error:  0.0559188704413
Total error:  0.0548155819544
Total error:  0.0535537679931
Total error:  0.0527051106108
Total error:  0.0515783629912
Total error:  0.0501025301423
Total error:  0.0499123823243
Total error:  0.0482250742606
Error percentage on testing data=> 20.0
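The splitting and scoring steps used above can also be sketched in plain NumPy. The helper names `split_with_proportion` and `percent_error` are hypothetical; they illustrate the behavior of PyBrain's `splitWithProportion` and `percentError`, not the library's internals:

```python
import numpy as np

def split_with_proportion(inputs, targets, proportion, seed=0):
    """Randomly split rows, the first part receiving `proportion`
    of the samples (illustrative sketch of splitWithProportion)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(inputs))
    cut = int(len(inputs) * proportion)
    left, right = idx[:cut], idx[cut:]
    return (inputs[left], targets[left]), (inputs[right], targets[right])

def percent_error(predicted, actual):
    """Percentage of mismatched class labels (same idea as
    pybrain.utilities.percentError)."""
    predicted = np.asarray(predicted).ravel()
    actual = np.asarray(actual).ravel()
    return 100.0 * np.mean(predicted != actual)


# Toy data: 10 samples with 2 features each, alternating labels
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

(test_X, test_y), (train_X, train_y) = split_with_proportion(X, y, 0.3)
print(len(test_X), len(train_X))                  # 3 7
print(percent_error([0, 1, 1, 0], [0, 1, 0, 0]))  # 25.0
```

This also shows why the article's code unpacks the split as `testing_data, training_data`: with a proportion of 0.3, the first returned dataset holds the smaller 30% share.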