📅  Last modified: 2023-12-03 15:18:13.620000             🧑  Author: Mango
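`pd.cut()` is often called `DataFrame.cut()`, but it is a top-level pandas function, not a `DataFrame` method. It discretizes one-dimensional continuous numeric data (for example, a single DataFrame column) into intervals defined by a set of bins. It is typically used to turn a continuous variable into categorical data, which makes grouping, counting and other analysis easier.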
```python
import pandas as pd
import numpy as np

# 20 random integer ages in [0, 100); the values differ on every run
data = pd.DataFrame({'age': np.random.randint(0, 100, 20)})
print(data)
```
Output:
```
    age
0    64
1    24
2    56
3    33
4    21
5    67
6    30
7     1
8    49
9    81
10   40
11    7
12   27
13    2
14   76
15   79
16   22
17   83
18   16
19   42
```
```python
# Bin edges giving the intervals (0, 20], (20, 40], (40, 60], (60, 80], (80, 100]
bins = [0, 20, 40, 60, 80, 100]
cats = pd.cut(data['age'], bins)
print(cats)
```
Output:
```
0      (60, 80]
1      (20, 40]
2      (40, 60]
3      (20, 40]
4      (20, 40]
5      (60, 80]
6      (20, 40]
7       (0, 20]
8      (40, 60]
9     (80, 100]
10     (20, 40]
11      (0, 20]
12     (20, 40]
13      (0, 20]
14     (60, 80]
15     (60, 80]
16     (20, 40]
17    (80, 100]
18      (0, 20]
19     (40, 60]
Name: age, dtype: category
Categories (5, interval[int64]): [(0, 20] < (20, 40] < (40, 60] < (60, 80] < (80, 100]]
```
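The intervals above are closed on the right, so a value exactly equal to the first edge belongs to no bin. Because `np.random.randint(0, 100, 20)` can return 0, a run that produces 0 would show `NaN` for that row. The sketch below uses a few hand-picked boundary values; this is made-up data, not the random sample above. It compares the default behaviour with the real `pd.cut` options `include_lowest=True` and `right=False`. It prints category codes, where `-1` marks `NaN`:

```python
import pandas as pd

# Hand-picked values that sit exactly on bin edges (illustrative data)
ages = pd.Series([0, 20, 40, 100])
bins = [0, 20, 40, 60, 80, 100]

# Default right=True: intervals are (a, b], so 0 falls outside (0, 20] -> NaN
default_cut = pd.cut(ages, bins)
print(default_cut.cat.codes.tolist())  # [-1, 0, 1, 4]

# include_lowest=True lets the first interval also contain its left edge
lowest_cut = pd.cut(ages, bins, include_lowest=True)
print(lowest_cut.cat.codes.tolist())  # [0, 0, 1, 4]

# right=False switches to [a, b): 20 moves up a bin and 100 is now outside [80, 100)
left_cut = pd.cut(ages, bins, right=False)
print(left_cut.cat.codes.tolist())  # [0, 1, 2, -1]
```

With `include_lowest=True`, pandas shifts the first edge slightly below 0 for display, so the first category is shown as `(-0.001, 20.0]` rather than `(0, 20]`.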
```python
# Two intervals, (0, 40] and (40, 100], named by the labels list
bins = [0, 40, 100]
group_names = ['young', 'old']
cats = pd.cut(data['age'], bins, labels=group_names)
print(cats)
```
Output:
```
0       old
1     young
2       old
3     young
4     young
5       old
6     young
7     young
8       old
9       old
10    young
11    young
12    young
13    young
14      old
15      old
16    young
17      old
18    young
19      old
Name: age, dtype: category
Categories (2, object): ['young' < 'old']
```
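The `bins` parameter defines the grouping intervals. It accepts three forms:

- an integer: the number of equal-width bins spanning the range of the data;
- a sequence of scalars: the bin edges, as in the examples above;
- a `pandas.IntervalIndex`: the exact intervals to use.

By default the intervals are closed on the right, e.g. `(20, 40]`.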
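A short sketch of the three accepted forms of `bins`, on a small Series of made-up ages (illustrative only, not the data above). The printed lists are each value's category code, with `-1` meaning the value falls outside every interval:

```python
import pandas as pd

ages = pd.Series([1, 24, 49, 81, 99])

# 1) Integer: split the observed range (min 1 .. max 99) into 3 equal-width bins
by_count = pd.cut(ages, 3)
print(by_count.cat.codes.tolist())  # [0, 0, 1, 2, 2]

# 2) Sequence of edges: (0, 50] and (50, 100]
by_edges = pd.cut(ages, [0, 50, 100])
print(by_edges.cat.codes.tolist())  # [0, 0, 0, 1, 1]

# 3) IntervalIndex: intervals are used as given; values in the gaps become NaN
intervals = pd.IntervalIndex.from_tuples([(0, 30), (60, 90)])  # closed='right' by default
by_intervals = pd.cut(ages, intervals)
print(by_intervals.cat.codes.tolist())  # [0, 0, -1, 1, -1]
```

With an integer, the edges are derived from the data's minimum and maximum, so the same `bins=3` call produces different intervals on different data. Explicit edges or an `IntervalIndex` keep the bins fixed across datasets.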
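The `labels` parameter assigns a name to each bin. It must contain exactly one label per interval, i.e. one fewer than the number of bin edges. Passing `labels=False` returns each value's integer bin index instead of a categorical.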
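A minimal sketch of the `labels` rules, using a few made-up ages with the same `[0, 40, 100]` edges as the example above:

```python
import pandas as pd

ages = pd.Series([7, 33, 40, 64])
bins = [0, 40, 100]  # 3 edges -> 2 intervals -> exactly 2 labels

# One label per interval
named = pd.cut(ages, bins, labels=["young", "old"])
print(named.tolist())  # ['young', 'young', 'young', 'old']

# labels=False returns the integer bin index instead of a categorical
codes = pd.cut(ages, bins, labels=False)
print(codes.tolist())  # [0, 0, 0, 1]

# A label list whose length does not match the number of intervals is rejected
try:
    pd.cut(ages, bins, labels=["young", "middle", "old"])
    mismatch_rejected = False
except ValueError as exc:
    mismatch_rejected = True
    print("ValueError:", exc)
```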
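When the input is a Series, `pd.cut()` returns a Series of `category` dtype. Each element is not a tuple. Without `labels`, it is a `pandas.Interval` whose `left` and `right` attributes hold the lower and upper bounds of the bin the value falls in. With `labels`, it is the corresponding label.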
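A small sketch for inspecting what `pd.cut()` returns, on three made-up ages with the five bins from the first example:

```python
import pandas as pd

ages = pd.Series([64, 24, 56], name="age")
cats = pd.cut(ages, [0, 20, 40, 60, 80, 100])

# The result is a categorical Series, not a Series of tuples
print(cats.dtype)  # category

# Each element is a pandas Interval with .left, .right and .closed
first = cats.iloc[0]
print(type(first).__name__, first.left, first.right, first.closed)  # Interval 60 80 right
print(64 in first)  # True

# All bins, including empty ones, are kept as the categories (an IntervalIndex)
print(cats.cat.categories)
print(cats.value_counts().sort_index().tolist())  # [0, 1, 1, 1, 0]
```

Because empty bins stay in the categories, `value_counts()` on the result reports a zero count for them. This is convenient when tallying how many values fall into each group.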