📜  SciPy stats.binned_statistic_dd() function | Python

📅  Last modified: 2023-12-03 15:20:00.125000             🧑  Author: Mango

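scipy.stats.binned_statistic_dd() computes a binned statistic for a multidimensional dataset. It partitions the sample points into bins along each dimension and then computes a statistic (the mean by default; count, sum, median, and others are also available) of the values that fall into each bin. In that sense it generalizes a multidimensional histogram: a histogram only counts the points per bin, whereas this function can summarize any per-point value. It is commonly used in statistical analysis and data processing.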
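To make the idea of "a statistic per bin" concrete, here is a minimal sketch on a hand-made 2-D dataset. The four points, their values, and the variable names are made up for illustration only; they are not part of the original example.

```python
import numpy as np
from scipy.stats import binned_statistic_dd

# Four points in 2-D; each row is one point (x, y). Illustrative data only.
sample = np.array([[0.2, 0.2],
                   [0.3, 0.4],
                   [0.8, 0.1],
                   [0.9, 0.9]])
# One value per point; these are what gets averaged inside each bin.
values = np.array([1.0, 3.0, 10.0, 20.0])

# Two bins per dimension over [0, 1], i.e. edges 0, 0.5, 1 on each axis.
ret = binned_statistic_dd(sample, values, statistic='mean',
                          bins=2, range=[(0, 1), (0, 1)])

mean_per_bin = ret.statistic  # shape (2, 2); axis 0 follows x, axis 1 follows y
print(mean_per_bin)
```

The first two points share the bin with x < 0.5 and y < 0.5, so that entry is their average, 2.0. No point has x < 0.5 and y >= 0.5, so that entry is NaN. The remaining two bins hold one point each and simply report its value.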

The function's syntax is as follows:

scipy.stats.binned_statistic_dd(sample, values, statistic='mean', bins=10, range=None)

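Here, sample is the data to be binned, given as an (N, D) array of N points in D dimensions (or a sequence of D arrays of length N). values is the data on which the statistic is computed; it must have the same length N as sample. statistic is the statistic to compute, 'mean' by default. bins can be a single integer used for every dimension (default 10), a sequence with one bin count per dimension, or a sequence of arrays holding the bin edges for each dimension. range is a sequence of (lower, upper) bounds, one pair per dimension, used when the edges are not given explicitly; when it is None, the bounds are taken from the minimum and maximum of the data. Recent SciPy versions also accept the optional keyword arguments expand_binnumbers and binned_statistic_result. The function returns three items: the array of per-bin statistics, the list of bin edges for each dimension, and the bin number assigned to each sample point.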
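The parameter forms described above can be tried side by side. The following is a small sketch rather than a reference: the uniform test data, the seed, and all variable names are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import binned_statistic_dd

# Illustrative data: 200 points in the unit square, seeded for repeatability.
rng = np.random.default_rng(0)
sample = rng.uniform(0, 1, size=(200, 2))
values = sample[:, 0] + sample[:, 1]
unit_square = [(0, 1), (0, 1)]  # one (lower, upper) pair per dimension

# bins as a single integer: 4 bins along every dimension.
count_a, edges_a, _ = binned_statistic_dd(
    sample, values, statistic='count', bins=4, range=unit_square)

# bins as per-dimension counts: 4 bins along x, 2 along y.
sum_b, edges_b, _ = binned_statistic_dd(
    sample, values, statistic='sum', bins=[4, 2], range=unit_square)

# bins as explicit edge arrays; range is not needed in this form.
explicit_edges = [np.linspace(0, 1, 5), np.linspace(0, 1, 3)]
mean_c, edges_c, _ = binned_statistic_dd(
    sample, values, statistic='mean', bins=explicit_edges)

print(count_a.shape, sum_b.shape, mean_c.shape)
```

With statistic='count' the values argument is not used in the computation and the result matches a plain D-dimensional histogram. The 'sum' statistic adds up values per bin, so its grand total equals the sum of all values that fall inside the range.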

Below is an example:

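import numpy as np
from scipy.stats import binned_statistic_dd

# Generate random data: 1000 points in 3 dimensions
data = np.random.randn(1000, 3)

# Define the bin edges: 20 equal-width bins per dimension over [-5, 5]
bins = [np.linspace(-5, 5, 21), np.linspace(-5, 5, 21), np.linspace(-5, 5, 21)]

# Compute the mean of the first coordinate within each bin.
# The function returns three items: statistic, bin edges, and bin numbers.
result, edges, binnumber = binned_statistic_dd(data, data[:, 0], bins=bins, statistic='mean')

print(result)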

The code above first generates a random dataset with 1000 rows and 3 columns, then defines bin edges along each of the three dimensions, and finally uses binned_statistic_dd() to compute the mean of the values in each bin. The statistic array it produces has shape (20, 20, 20), one entry per bin. Its exact values depend on the random draw, so no output listing is reproduced here. Two things are worth expecting when you print it: with only 1000 points spread over 8000 bins, most bins are empty, and empty bins are reported as NaN for the 'mean' statistic; and because the values being averaged are the first coordinate of each point, every non-NaN entry lies within the edges of its bin along the first axis. NumPy also abbreviates an array of this size with "..." when printing.
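Since the full array is too large to read by eye, it is usually more informative to inspect summary properties of the result. The sketch below repeats the example with a seeded generator (an assumption made here so that runs are repeatable; the original example uses np.random.randn) and passes expand_binnumbers=True to get per-dimension bin indices. Variable names such as n_filled are illustrative.

```python
import numpy as np
from scipy.stats import binned_statistic_dd

# Seeded generator: an assumption for repeatability, not part of the original example.
rng = np.random.default_rng(42)
data = rng.standard_normal((1000, 3))

edges_1d = np.linspace(-5, 5, 21)  # 20 bins of width 0.5 per dimension
bins = [edges_1d, edges_1d, edges_1d]

result, edges, binnumber = binned_statistic_dd(
    data, data[:, 0], bins=bins, statistic='mean', expand_binnumbers=True)

print(result.shape)  # one entry per bin

# Empty bins are NaN for 'mean'; count how many bins actually hold data.
filled = ~np.isnan(result)
n_filled = np.count_nonzero(filled)
print(n_filled, "of", result.size, "bins contain data")

# Each filled entry is a mean of first coordinates, so it must lie between
# the edges of its bin along axis 0.
axis0_index = np.nonzero(filled)[0]
lower = edges[0][axis0_index]
upper = edges[0][axis0_index + 1]
print(np.all((result[filled] >= lower) & (result[filled] <= upper)))

# With expand_binnumbers=True, binnumber has one row of indices per dimension.
print(binnumber.shape)
```

The per-dimension indices in binnumber start at 1 for the first bin; 0 and len(edges) are reserved for points that fall outside the given edges.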

In summary, scipy.stats.binned_statistic_dd() is a general-purpose function for computing per-bin statistics over multidimensional data.