Last modified: 2023-12-03 15:20:00.125000 | Author: Mango
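scipy.stats.binned_statistic_dd() computes a binned statistic for a multidimensional dataset. It partitions the sample space into bins along every dimension, assigns each data point to a bin, and then computes a statistic (such as the mean, sum, or count) over the values that fall into each bin. It is a generalization of a multidimensional histogram and is commonly used in statistical analysis and data processing.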
The main parameters of the function are as follows:
scipy.stats.binned_statistic_dd(sample, values, statistic='mean', bins=10, range=None)
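Here, sample is the multidimensional data to be binned, given as N points in D dimensions (an array of shape (N, D)); values is the data on which the statistic is computed, with one entry per point in sample; statistic is the statistic to compute, 'mean' by default; bins is the number of bins or the bin edges for each dimension, 10 by default; and range is the lower and upper bin limits for each dimension, None by default, in which case the limits are taken from the data. The function returns three items: the array of per-bin statistics, the list of bin edges for each dimension, and the bin number assigned to each point.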
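To make the parameters concrete, here is a minimal sketch on a tiny hand-made dataset. The four points, the variable names, and the choice of statistic='sum' are assumptions made for illustration only; they were picked so the per-bin result can be checked by hand.

```python
import numpy as np
from scipy.stats import binned_statistic_dd

# Four points in 2-D (N=4, D=2); values holds one number per point.
sample = np.array([[0.1, 0.1],
                   [0.2, 0.3],
                   [0.8, 0.9],
                   [0.7, 0.6]])
values = np.array([1.0, 2.0, 3.0, 4.0])

# Two bins per dimension over the square [0, 1] x [0, 1].
total, bin_edges, binnumber = binned_statistic_dd(
    sample, values, statistic='sum', bins=2, range=[(0, 1), (0, 1)]
)

# The first two points share one bin (1 + 2), the last two share the
# diagonally opposite bin (3 + 4); the other two bins are empty.
print(total)
print(bin_edges)   # one array of edges per dimension
print(binnumber)   # one bin index per input point
```

With statistic='sum', empty bins come back as 0, whereas statistic='mean' marks them as nan, so the choice of statistic also determines how empty bins appear in the result.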
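Here is an example:
import numpy as np
from scipy.stats import binned_statistic_dd
# Generate random data: 1000 points in 3 dimensions
data = np.random.randn(1000, 3)
# Define the bin edges: 20 bins per dimension over [-5, 5]
bins = [np.linspace(-5, 5, 21), np.linspace(-5, 5, 21), np.linspace(-5, 5, 21)]
# Compute the mean of the first column within each bin
# (the function returns three items, so all three must be unpacked)
result, edges, binnumber = binned_statistic_dd(data, data[:, 0], bins=bins, statistic='mean')
print(result)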
The code above first generates a random dataset with 1000 rows and 3 columns, then defines a three-dimensional grid with 20 bins along each axis, and finally uses binned_statistic_dd() to compute the mean of the first column in each bin. The printed result is an array of shape (20, 20, 20). The exact numbers differ from run to run because the data is random. Since only 1000 points are spread over 8000 bins, most bins are empty, and with statistic='mean' an empty bin is reported as nan, so the output consists largely of nan entries with finite values concentrated near the center of the grid.
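Because most bins are empty in an example like this, it can help to check how many bins actually received data before interpreting the means. The following is a rough sketch under stated assumptions: the fixed seed and the variable names are illustrative choices, not part of the original example.

```python
import numpy as np
from scipy.stats import binned_statistic_dd

# A fixed seed is assumed here only to make the run repeatable.
rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 3))
edges = [np.linspace(-5, 5, 21)] * 3  # 20 bins per dimension

# Per-bin mean of the first column, and per-bin point count.
mean, _, _ = binned_statistic_dd(data, data[:, 0], statistic='mean', bins=edges)
count, _, _ = binned_statistic_dd(data, data[:, 0], statistic='count', bins=edges)

occupied = count > 0
print(mean.shape)
print(int(occupied.sum()), "of", mean.size, "bins contain at least one point")

# Look only at bins that contain data; the rest are nan for 'mean'.
print(mean[occupied][:5])
```

Masking with the count array, as done here, is one simple way to separate real means from empty bins; using fewer bins per dimension is another option when the sample is small.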
In short, scipy.stats.binned_statistic_dd() is a versatile function for computing per-bin statistics, such as the mean, sum, or count, over multidimensional data.