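Many of us learn C++ as our first language, but for tasks such as data analysis and machine learning, Python becomes the go-to language because of its simplicity and its large collection of ready-made libraries.
But can C++ be used for machine learning as well? And if so, how?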
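Prerequisites:
- C++ Boost library: a powerful C++ library used for a wide range of purposes, such as heavy mathematical operations. Refer to the library's installation guide to set it up.
- mlpack C++ library: a small, scalable C++ machine learning library. Refer to the library's installation guide to set it up.
  Note: set USE_OPENMP=OFF when installing mlpack. Don't worry, the installation guide explains how to do this.
- Sample CSV data file: mlpack does not ship with any built-in sample datasets, so we have to supply our own.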
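Our Model
The code we are writing takes a simple dataset of vectors and finds the nearest neighbor of each data point.
The training step is marked by a comment in the code.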
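To make "nearest neighbor" concrete before bringing in mlpack, here is a minimal brute-force sketch in plain C++. The names `Point`, `manhattanDistance` and `nearestNeighbor` are illustrative assumptions, not part of mlpack, and the tie-breaking rule (lowest index wins) is a choice made for this sketch only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

// L1 (Manhattan) distance: the sum of absolute coordinate differences.
double manhattanDistance(const Point& a, const Point& b)
{
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d)
        sum += std::fabs(a[d] - b[d]);
    return sum;
}

// Index of the point closest to points[query], excluding the query itself.
// Ties go to the lowest index (an assumption of this sketch).
// Returns points.size() when there is no other point to compare against.
std::size_t nearestNeighbor(const std::vector<Point>& points, std::size_t query)
{
    std::size_t best = points.size();
    double bestDist = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < points.size(); ++i)
    {
        if (i == query)
            continue;
        const double dist = manhattanDistance(points[query], points[i]);
        if (dist < bestDist)
        {
            bestDist = dist;
            best = i;
        }
    }
    return best;
}
```

This compares every pair of points, which is fine for a handful of vectors. A library such as mlpack can use tree-based search structures instead, which is one reason to prefer it on larger datasets.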
Input: a file named data.csv containing a dataset of vectors, one vector per line.
The file contains the following data:
3, 3, 3, 3, 0
3, 4, 4, 3, 0
3, 4, 4, 3, 0
3, 3, 4, 3, 0
3, 6, 4, 3, 0
2, 4, 4, 3, 0
2, 4, 4, 1, 0
3, 3, 3, 2, 0
3, 4, 4, 2, 0
3, 4, 4, 2, 0
3, 3, 4, 2, 0
3, 6, 4, 2, 0
2, 4, 4, 2, 0
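Each line of the file is one data point with five comma-separated values. As a rough illustration of that layout, the sketch below reads such rows into vectors using only the standard library. `parseCsv` is a hypothetical helper for this example, not an mlpack function, and it assumes well-formed numeric input with no header row and no empty fields.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads comma-separated numeric rows from a stream.
// Each non-blank line becomes one point (one vector of doubles).
std::vector<std::vector<double>> parseCsv(std::istream& in)
{
    std::vector<std::vector<double>> rows;
    std::string line;
    while (std::getline(in, line))
    {
        // Skip blank lines.
        if (line.find_first_not_of(" \t\r") == std::string::npos)
            continue;

        std::vector<double> row;
        std::stringstream fields(line);
        std::string field;
        while (std::getline(fields, field, ','))
            row.push_back(std::stod(field)); // stod skips leading spaces

        // Every point must have the same number of dimensions.
        assert(rows.empty() || row.size() == rows.front().size());
        rows.push_back(row);
    }
    return rows;
}
```

With mlpack none of this is needed, because its loader reads the file directly. One thing to keep in mind is that mlpack stores each point as a column of the matrix, so by default a row of the CSV file ends up as a column after loading.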
Code:
#include <mlpack/core.hpp>
#include <mlpack/methods/neighbor_search/neighbor_search.hpp>
using namespace std;
using namespace mlpack;
// NeighborSearch and NearestNeighborSort
using namespace mlpack::neighbor;
// ManhattanDistance
using namespace mlpack::metric;
void mlModel()
{
    // Armadillo is a C++ linear algebra library;
    // mlpack uses its matrix data type.
    arma::mat data;

    /*
    data::Load imports data into mlpack.
    It takes 3 parameters:
        1. Filename = name of the file to be used
        2. Matrix = matrix to hold the data in the file
        3. fatal = true if you want it to throw an exception
           if there is an issue
    */
    data::Load("data.csv", data, true);

    /*
    Create a NeighborSearch model. The parameters of the
    model are specified with templates:
        1. Sorting method: "NearestNeighborSort" - this
           class sorts by increasing distance.
        2. Distance metric: "ManhattanDistance" - the
           L1 distance, the sum of absolute differences.
        3. Pass the reference dataset (the vectors to
           be searched through) to the constructor.
    */
    NeighborSearch<NearestNeighborSort, ManhattanDistance> nn(data);
    // In the above line we trained our model, that is,
    // fitted the data to the model.
    // Now we will predict.

    arma::Mat<size_t> neighbors; // Matrices to hold
    arma::mat distances;         // the results

    /*
    Find the nearest neighbors. The arguments are:
        1. k = 1, the number of neighbors to find
        2. Matrices to hold the results, in this case
           neighbors and distances
    */
    nn.Search(1, neighbors, distances);
    // In the above line we find the nearest neighbor of each point.

    // Print out each neighbor and its distance.
    for (size_t i = 0; i < neighbors.n_elem; ++i)
    {
        std::cout << "Nearest neighbor of point " << i << " is point "
                  << neighbors[i] << " and the distance is "
                  << distances[i] << ".\n";
    }
}

int main()
{
    mlModel();
    return 0;
}
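The first argument of the search call is k, the number of neighbors to return for each point, and the program above uses k = 1. As a hedged illustration of what a larger k means, here is a brute-force sketch in plain C++ that returns the k closest points in order of increasing distance. `kNearest`, `Neighbors` and `l1Distance` are names invented for this example and are not mlpack API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Point = std::vector<double>;
// (index, distance) pairs, closest first.
using Neighbors = std::vector<std::pair<std::size_t, double>>;

// L1 (Manhattan) distance between two points of equal dimension.
double l1Distance(const Point& a, const Point& b)
{
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d)
        sum += std::fabs(a[d] - b[d]);
    return sum;
}

// The k points closest to points[query], excluding the query itself.
// Returns fewer than k entries when the dataset is too small.
Neighbors kNearest(const std::vector<Point>& points, std::size_t query, std::size_t k)
{
    assert(query < points.size());
    Neighbors candidates;
    for (std::size_t i = 0; i < points.size(); ++i)
        if (i != query)
            candidates.emplace_back(i, l1Distance(points[query], points[i]));

    k = std::min(k, candidates.size());
    // Order by increasing distance; break ties by lower index
    // (a choice made for this sketch).
    std::partial_sort(
        candidates.begin(),
        candidates.begin() + static_cast<std::ptrdiff_t>(k),
        candidates.end(),
        [](const Neighbors::value_type& a, const Neighbors::value_type& b) {
            return a.second != b.second ? a.second < b.second : a.first < b.first;
        });
    candidates.resize(k);
    return candidates;
}
```

With k = 1 this reduces to the single nearest neighbor that the program above prints for each point.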
Save the mlpack program above as knn_example.cpp and compile it from a terminal/CMD with the following command:
g++ knn_example.cpp -o knn_example -std=c++11 -larmadillo -lmlpack -lboost_serialization
Then run it with:
./knn_example
Output:
Nearest neighbor of point 0 is point 7 and the distance is 1.
Nearest neighbor of point 1 is point 2 and the distance is 0.
Nearest neighbor of point 2 is point 1 and the distance is 0.
Nearest neighbor of point 3 is point 10 and the distance is 1.
Nearest neighbor of point 4 is point 11 and the distance is 1.
Nearest neighbor of point 5 is point 12 and the distance is 1.
Nearest neighbor of point 6 is point 12 and the distance is 1.
Nearest neighbor of point 7 is point 10 and the distance is 1.
Nearest neighbor of point 8 is point 9 and the distance is 0.
Nearest neighbor of point 9 is point 8 and the distance is 0.
Nearest neighbor of point 10 is point 9 and the distance is 1.
Nearest neighbor of point 11 is point 4 and the distance is 1.
Nearest neighbor of point 12 is point 9 and the distance is 1.
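The distances in this output can be checked by hand. For example, point 0 is (3, 3, 3, 3, 0) and point 7 is (3, 3, 3, 2, 0), so their Manhattan distance is |3 - 2| = 1. The sketch below repeats that check for all 13 points with a brute-force loop in plain C++. `kData` and `nearestDistances` are names made up for this example. Only the distances are compared: several points have more than one neighbor at the same distance (point 0 is equally close to points 3 and 7), so the index reported for a tie can depend on the search implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// The 13 rows of data.csv in file order (point 0 is the first row).
const std::vector<std::vector<double>> kData = {
    {3, 3, 3, 3, 0}, {3, 4, 4, 3, 0}, {3, 4, 4, 3, 0}, {3, 3, 4, 3, 0},
    {3, 6, 4, 3, 0}, {2, 4, 4, 3, 0}, {2, 4, 4, 1, 0}, {3, 3, 3, 2, 0},
    {3, 4, 4, 2, 0}, {3, 4, 4, 2, 0}, {3, 3, 4, 2, 0}, {3, 6, 4, 2, 0},
    {2, 4, 4, 2, 0}};

// For each point, the smallest Manhattan distance to any other point.
std::vector<double> nearestDistances(const std::vector<std::vector<double>>& pts)
{
    std::vector<double> result(pts.size(), std::numeric_limits<double>::infinity());
    for (std::size_t i = 0; i < pts.size(); ++i)
    {
        for (std::size_t j = 0; j < pts.size(); ++j)
        {
            if (i == j)
                continue;
            assert(pts[i].size() == pts[j].size());
            double dist = 0.0;
            for (std::size_t d = 0; d < pts[i].size(); ++d)
                dist += std::fabs(pts[i][d] - pts[j][d]);
            result[i] = std::min(result[i], dist);
        }
    }
    return result;
}
```

Calling `nearestDistances(kData)` gives 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, which matches the distance column printed above. The zeros come from the duplicated rows (points 1 and 2, and points 8 and 9).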