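How to Calculate Conditional Probability in R?
In this article, we discuss how to calculate conditional probability in the R programming language.
The probability that one event occurs given that another event has occurred is called conditional probability. If A and B are two events, the probability of B occurring given that A has occurred is written P(B|A). In other words, it is the conditional probability of event B when event A has already happened.
Similarly, the probability of A occurring given that B has occurred is written P(A|B). It is the conditional probability of event A when event B has already happened.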
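The formula for conditional probability can be written as
P(A|B) = P(A ∩ B) / P(B)
This is valid only when P(B) ≠ 0, i.e. when event B is not an impossible event.
Similarly,
P(B|A) = P(A ∩ B) / P(A)
This is valid only when P(A) ≠ 0, i.e. when event A is not an impossible event.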
The figure below shows the Venn diagram representation.
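The definition can be sketched as a small R function. The helper name `cond_prob` and the numbers in the call are illustrative assumptions, not part of the original article; the function simply divides the joint probability by the probability of the conditioning event and refuses a conditioning event with zero probability.

```r
# Hypothetical helper (name chosen for illustration): computes P(A|B)
# from the joint probability P(A ∩ B) and the marginal probability P(B).
cond_prob <- function(p_joint, p_given) {
  # The definition only applies when the conditioning event is possible.
  if (p_given <= 0) {
    stop("the conditioning event must have probability greater than 0")
  }
  p_joint / p_given
}

# Illustrative numbers: P(A ∩ B) = 0.1 and P(B) = 0.4
cond_prob(0.1, 0.4)
```

Swapping the second argument for P(A) gives P(B|A) from the same joint probability.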
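Example 1: Calculating conditional probability
A card is drawn at random from a pack of 50 Pokémon cards. The pack consists of 5 equal groups of red, blue, green, yellow, and black cards (10 cards per colour). Each group contains 2 water-type Pokémon cards: one high-strength and one medium-strength.
Let A be the event of drawing a high-strength water-type Pokémon card and B the event of drawing a red card. What is the probability of drawing a high-strength water-type Pokémon card given that a red card has been drawn?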
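Solution steps
Step 1: Probability of drawing a red card (event B).
P(B) = 10/50 (since there are 10 red cards in a pack of 50 Pokémon cards)
Step 2: Probability of drawing a high-strength water-type Pokémon card (event A).
P(A) = 5/50 (since there are 5 high-strength water-type Pokémon cards in a pack of 50 cards)
Step 3: P(A ∩ B) = 1/50 (since the pack of 50 cards contains only one red high-strength water-type Pokémon card)
Step 4: Since event B has already occurred, there are now 10 exhaustive cases rather than the original 50. Of these 10 red Pokémon cards, exactly 1 is a high-strength water-type card.
Hence, P(A|B) = P(A ∩ B) / P(B) = (1/50) / (10/50) = 1/10.
This is the conditional probability of A given that B has already occurred.
Similarly,
P(B|A) = P(A ∩ B) / P(A) = (1/50) / (5/50) = 1/5
This is because, of the 5 high-strength water-type Pokémon cards in the pack of 50, only 1 is red.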
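The hand calculation above can be checked with a short base R sketch that builds the deck explicitly. The column names, the labels `water_high`, `water_medium` and `other`, and the position of each card within its colour group are assumptions made for illustration; only the counts come from the example.

```r
# Build the 50-card deck from the example: 5 colours x 10 cards each.
# Assumption: in every colour group, one card is the high-strength water
# type, one is the medium-strength water type, and the other 8 are
# non-water cards.
colours <- c("red", "blue", "green", "yellow", "black")
kinds <- c("water_high", "water_medium", rep("other", 8))
deck <- data.frame(
  colour = rep(colours, each = 10),
  kind = rep(kinds, times = 5)
)

A <- deck$kind == "water_high"  # event A: high-strength water-type card
B <- deck$colour == "red"       # event B: red card

p_A <- mean(A)       # share of cards in A: 5/50
p_B <- mean(B)       # share of cards in B: 10/50
p_AB <- mean(A & B)  # share of cards in both: 1/50

p_A_given_B <- p_AB / p_B  # P(A|B)
p_B_given_A <- p_AB / p_A  # P(B|A)
c(p_A_given_B = p_A_given_B, p_B_given_A = p_B_given_A)
```

Because every card is equally likely to be drawn, the mean of a logical vector is the probability of the corresponding event, so no extra packages are needed here.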
Example 2: Calculating conditional probability
A shopkeeper has a list of 15 customers. He has observed certain patterns in their purchases, shown in the table below.
Customer   Money spent   Frequency
1          High          Less
2          Low           More
3          High          More
4          High          Less
5          Low           Less
6          Low           More
7          High          More
8          Low           Less
9          Low           Less
10         High          More
11         Low           More
12         Low           Less
13         High          Less
14         High          More
15         High          Less
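Based on the table above, he wants to find out:
- What is the probability that a customer spends high, given that they purchase less often?
- What is the probability that a customer spends low, given that they purchase more often?
- What is the probability that a customer spends low, given that they purchase less often?
- What is the probability that a customer spends high, given that they purchase more often?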
Solution steps
1. P(High Spend | Less Frequency)
P(Less Frequency) = 8/15 (from the table, frequency is "Less" for 8 of the 15 customers)
P(High Spend ∩ Less Frequency) = 4/15 (from the table, 4 of the 15 customers combine high spend with less frequency)
P(High Spend | Less Frequency) = P(High Spend ∩ Less Frequency) / P(Less Frequency) = (4/15) / (8/15) = 0.5
2. P(Low Spend | More Frequency)
P(More Frequency) = 7/15 (from the table, frequency is "More" for 7 of the 15 customers)
P(Low Spend ∩ More Frequency) = 3/15 (from the table, 3 of the 15 customers combine low spend with more frequency)
P(Low Spend | More Frequency) = P(Low Spend ∩ More Frequency) / P(More Frequency) = (3/15) / (7/15) = 0.4285714
Similarly,
3. P(Low Spend | Less Frequency) = (4/15) / (8/15) = 0.5
4. P(High Spend | More Frequency) = (4/15) / (7/15) = 0.5714286
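All four values can also be read off a single table in base R. This is a minimal sketch rather than the article's own method: it cross-tabulates the two columns with `table()` and then normalises each column with `prop.table(..., margin = 2)`, so that every column holds the distribution of money spent conditional on that purchase frequency. The object names `counts` and `cond` are illustrative.

```r
# Customer data from the table, in customer order 1 to 15
Money_Spent <- c("High", "Low", "High", "High", "Low", "Low", "High", "Low",
                 "Low", "High", "Low", "Low", "High", "High", "High")
Frequency <- c("Less", "More", "More", "Less", "Less", "More", "More", "Less",
               "Less", "More", "More", "Less", "Less", "More", "Less")

# Two-way table of counts: rows = money spent, columns = frequency
counts <- table(Money_Spent, Frequency)

# Divide each column by its total, giving P(Money_Spent | Frequency)
cond <- prop.table(counts, margin = 2)
cond
```

For example, `cond["High", "Less"]` is P(High Spend | Less Frequency) and `cond["Low", "More"]` is P(Low Spend | More Frequency). Each column of `cond` sums to 1.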
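To do this in R, first install the packages "prob" and "tidyverse" and create a data frame that holds one row per customer. Next, count how often each unique combination occurs; the count appears in the output as "n". Then compute the individual probability of each row, which appears in the output as "probs". Finally, calculate the conditional probabilities that answer the questions at hand.
Below is the R code for the calculation.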
R
# Library for calculation of conditional probability
library(prob)
library(tidyverse)
Money_Spent <- c("High", "Low", "High", "High",
                 "Low", "Low", "High", "Low",
                 "Low", "High", "Low", "Low",
                 "High", "High", "High")
Frequency <- c("Less", "More", "More", "Less",
               "Less", "More", "More", "Less",
               "Less", "More", "More", "Less",
               "Less", "More", "Less")
Customer <- 1:15
# Customer data frame (data.frame() keeps Customer numeric,
# whereas cbind() would coerce every column to character)
Customer_Data <- data.frame(Customer, Money_Spent, Frequency)
# Count each unique combination; the count is reported as column "n"
Customer_Data %>%
  count(Money_Spent, Frequency, sort = TRUE)
# Creating two-way table from data frame
Customer_Data_Table <- addmargins(table("Money_Spent" = Customer_Data$Money_Spent,
                                        "Frequency" = Customer_Data$Frequency))
# view table
Customer_Data_Table
# Turn the data frame into a probability space: adds a "probs" column
# that gives every row the same probability
Customer_Data <- probspace(Customer_Data)
Customer_Data
# Probability of the customer spending high
# given that they are purchasing less often
Prob(Customer_Data, event=Money_Spent == "High", given=Frequency == "Less")
# Probability of the customer spending less
# given that they are purchasing more often
Prob(Customer_Data, event=Money_Spent == "Low", given=Frequency == "More")
# Probability of the customer spending less
# given that they are purchasing less often
Prob(Customer_Data, event=Money_Spent == "Low", given=Frequency == "Less")
# Probability of the customer spending high
# given that they are purchasing more often
Prob(Customer_Data, event=Money_Spent == "High", given=Frequency == "More")
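Output: the four Prob() calls should reproduce the values worked out by hand above, namely 0.5, 0.4285714, 0.5 and 0.5714286.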