How to Calculate Cramer's V in Python?
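Cramer's V: It is a measure of the strength of association between two nominal variables. A nominal variable is measured on a scale that only sorts observations into categories, with no inherent order. Cramer's V lies between 0 and 1 (inclusive). A value of 0 indicates that the two variables are not associated at all. A value of 1 indicates a perfect association between the two variables, and values close to 1 indicate a strong one. Cramer's V can be calculated using the following formula: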
V = √((X2/N) / min(C-1, R-1))
Here,
- X2: It is the Chi-square statistic
- N: It represents the total sample size
- R: It is equal to the number of rows
- C: It is equal to the number of columns
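To make the formula concrete, here is a minimal sketch that computes each term by hand with NumPy. The helper name `cramers_v` is hypothetical (it is not part of any library), and the sketch assumes that every row and column of the table has a non-zero total. The two small tables at the end are illustrative data chosen to show the extremes of the 0 to 1 range.

```python
import numpy as np


def cramers_v(table):
    """Cramer's V for a 2-D table of counts (hypothetical helper)."""
    observed = np.asarray(table, dtype=float)
    n = observed.sum()                                  # N: total sample size
    row_totals = observed.sum(axis=1, keepdims=True)    # shape (R, 1)
    col_totals = observed.sum(axis=0, keepdims=True)    # shape (1, C)
    # Expected counts under independence: row total * column total / N
    expected = row_totals @ col_totals / n
    # X2: Pearson Chi-square statistic, without continuity correction
    chi2 = ((observed - expected) ** 2 / expected).sum()
    min_dim = min(observed.shape) - 1                   # min(C-1, R-1)
    return float(np.sqrt((chi2 / n) / min_dim))


# Illustrative tables: one perfectly associated, one fully independent
perfect = cramers_v([[10, 0], [0, 10]])
independent = cramers_v([[10, 20], [20, 40]])
print(perfect, independent)
```

In the first table each row category maps to exactly one column category, so V reaches its upper bound; in the second the rows are proportional to each other, so the Chi-square statistic, and therefore V, is zero.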
Example 1:
Let us calculate Cramer's V for a 3 × 3 table.
Python3
# Load necessary packages and functions
import scipy.stats as stats
import numpy as np
# Make a 3 x 3 table
dataset = np.array([[13, 17, 11], [4, 6, 9],
[20, 31, 42]])
# Finding Chi-squared test statistic,
# sample size, and minimum of rows
# and columns
X2 = stats.chi2_contingency(dataset, correction=False)[0]
N = np.sum(dataset)
minimum_dimension = min(dataset.shape)-1
# Calculate Cramer's V
result = np.sqrt((X2/N) / minimum_dimension)
# Print the result
print(result)
Output:
Cramer's V comes out to about 0.12, which indicates a weak association between the two variables in the table.
Example 2:
We will now calculate Cramer's V for a larger table with unequal dimensions.
Python3
# Load necessary packages and functions
import scipy.stats as stats
import numpy as np
# Make a 5 x 4 table
dataset = np.array([[4, 13, 17, 11], [4, 6, 9, 12],
[2, 7, 4, 2], [5, 13, 10, 12],
[5, 6, 14, 12]])
# Finding Chi-squared test statistic,
# sample size, and minimum of rows and
# columns
X2 = stats.chi2_contingency(dataset, correction=False)[0]
N = np.sum(dataset)
minimum_dimension = min(dataset.shape)-1
# Calculate Cramer's V
result = np.sqrt((X2/N) / minimum_dimension)
# Print the result
print(result)
Output:
Cramer's V comes out to about 0.14, which again indicates a weak association between the two variables in the table.
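As a cross-check on the manual calculation, recent SciPy releases also provide a ready-made function. The sketch below assumes SciPy 1.7 or later, where `scipy.stats.contingency.association` is available, and reuses the two tables from the examples above. The variable names are illustrative.

```python
import numpy as np
from scipy.stats.contingency import association

# The 3 x 3 table from Example 1
table_3x3 = np.array([[13, 17, 11], [4, 6, 9],
                      [20, 31, 42]])

# The 5 x 4 table from Example 2
table_5x4 = np.array([[4, 13, 17, 11], [4, 6, 9, 12],
                      [2, 7, 4, 2], [5, 13, 10, 12],
                      [5, 6, 14, 12]])

# method="cramer" selects Cramer's V; no continuity
# correction is applied by default
v_3x3 = association(table_3x3, method="cramer")
v_5x4 = association(table_5x4, method="cramer")

print(round(v_3x3, 3), round(v_5x4, 3))
```

Both values should agree with the results printed by the manual code, since the same Chi-square statistic without continuity correction is used in each case.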