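How to Calculate Conditional Probability in R?

In this article, we discuss how to calculate conditional probability in the R programming language.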

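The probability of an event occurring given that another event has occurred is called conditional probability. Put simply, if A and B are two events, the probability of B occurring given that A has already occurred is written P(B|A).

Similarly, the probability of A occurring given that B has already occurred is written P(A|B).

The formula for conditional probability can be written as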

P(A|B) = P(A ∩ B) / P(B)

This is valid only when P(B) ≠ 0, i.e. when event B is not an impossible event.

Similarly,

P(B|A) = P(A ∩ B) / P(A)

This is valid only when P(A) ≠ 0, i.e. when event A is not an impossible event.
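The definition can be sketched in a few lines of base R. The function name cond_prob and its arguments are hypothetical, introduced here only for illustration, and the input values are made up rather than taken from the examples in this article.

```r
# Hypothetical helper (not from any package): P(A|B) = P(A and B) / P(B)
cond_prob <- function(p_joint, p_given) {
  # The ratio is undefined when the conditioning event is impossible
  if (p_given == 0) {
    stop("conditional probability is undefined when P(given event) is 0")
  }
  p_joint / p_given
}

# Illustrative values only: P(A and B) = 0.1, P(B) = 0.4
print(cond_prob(p_joint = 0.1, p_given = 0.4))
```

Swapping which marginal probability is passed as p_given gives P(B|A) instead of P(A|B).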

The figure below shows the Venn diagram representation.

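Example 1: Calculating conditional probability

One card is drawn at random from a pack of 50 Pokémon cards. The 50 cards form 5 equal groups of red, blue, green, yellow, and black cards. Each group contains 2 water-type Pokémon, one of high strength and the other of medium strength.

Let A be the event of drawing a high-strength water-type Pokémon card and B the event of drawing a red card. What is the probability of drawing a high-strength water-type Pokémon card, given that a red card has been drawn?

Solution steps

Step 1: Probability of drawing a red card (event B).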

P(B) = 10/50 (since there are 10 red cards within a pack of 50 Pokémon cards.)

Step 2: Probability of drawing a high-strength water-type Pokémon card (event A).

P(A) = 5/50 (as there are 5 high-strength water-type Pokémon cards within a pack of 50 cards.)

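Step 3: P(A ∩ B) = 1/50 (since only one card in the pack of 50 is a red high-strength water-type Pokémon card).

Step 4: Since event B has already occurred, there are 10 exhaustive cases rather than the original 50. Of these 10 red Pokémon cards, 1 is a high-strength water-type Pokémon card.

Hence, P(A|B) = P(A ∩ B) / P(B) = (1/50) / (10/50) = 1/10.

This is the conditional probability of A given that B has already occurred.

Similarly,

P(B|A) = P(A ∩ B) / P(A) = (1/50) / (5/50) = 1/5

since only 1 of the 5 high-strength water-type Pokémon cards in the pack of 50 is red.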
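As a cross-check, the pack can be enumerated in base R and the two conditional probabilities computed by counting. This is a minimal sketch: the data frame layout, the column names, and the "other" label for the 8 non-water cards in each color group are assumptions made for illustration.

```r
# Build the 50-card pack: 5 colors x 10 cards each.
# Assumption: the 8 remaining cards per color are simply labelled "other".
card_colors <- c("red", "blue", "green", "yellow", "black")
pack <- data.frame(
  color = rep(card_colors, each = 10),
  kind  = rep(c("water_high", "water_medium", rep("other", 8)), times = 5)
)

A <- pack$kind == "water_high"   # event A: high-strength water-type card
B <- pack$color == "red"         # event B: red card

# mean() of a logical vector is the proportion of TRUE values
p_A_given_B <- mean(A & B) / mean(B)   # (1/50) / (10/50)
p_B_given_A <- mean(A & B) / mean(A)   # (1/50) / (5/50)

# Equivalent view: restrict the sample space to the 10 red cards
p_A_given_B_subset <- mean(A[B])

print(c(p_A_given_B, p_B_given_A, p_A_given_B_subset))
```

The last computation restricts the sample space to the red cards before counting, which is the same reasoning as Step 4.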

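Example 2: Calculating conditional probability

A shopkeeper has a list of 15 customers. He observes certain patterns in their purchases, as shown in the table below.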

Customers	Money spent	Frequency
1	High	Less
2	Low	More
3	High	More
4	High	Less
5	Low	Less
6	Low	More
7	High	More
8	Low	Less
9	Low	Less
10	High	More
11	Low	More
12	Low	Less
13	High	Less
14	High	More
15	High	Less

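Based on the table above, he is interested in finding out:

1. What is the probability that a customer spends high, given that they purchase less often?
2. What is the probability that a customer spends low, given that they purchase more often?
3. What is the probability that a customer spends low, given that they purchase less often?
4. What is the probability that a customer spends high, given that they purchase more often?

Solution steps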

1. P(High Spend | Less Frequency)

P(Less Frequency) = 8/15 (from the table, the frequency is "Less" for 8 of the 15 customers)

P(High Spend ∩ Less Frequency) = 4/15 (from the table, 4 of the 15 customers combine high spend with less frequency)

P(High Spend | Less Frequency) = P(High Spend ∩ Less Frequency) / P(Less Frequency) = (4/15) / (8/15) = 0.5

2. P(Low Spend | More Frequency)

P(More Frequency) = 7/15 (from the table, the frequency is "More" for 7 of the 15 customers)

P(Low Spend ∩ More Frequency) = 3/15 (from the table, 3 of the 15 customers combine low spend with more frequency)

P(Low Spend | More Frequency) = P(Low Spend ∩ More Frequency) / P(More Frequency) = (3/15) / (7/15) = 0.4285714

Similarly,

3. P(Low Spend | Less Frequency) = (4/15) / (8/15) = 0.5

4. P(High Spend | More Frequency) = (4/15) / (7/15) = 0.5714286
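The four hand calculations can be cross-checked in base R alone with table() and prop.table(). This is a sketch under the assumption that the two vectors below transcribe the customer table row by row. It is an alternative route to the same numbers and does not use the prob package.

```r
# Customer table transcribed as two parallel vectors (customers 1..15)
Money_Spent <- c("High", "Low", "High", "High", "Low", "Low", "High", "Low",
                 "Low", "High", "Low", "Low", "High", "High", "High")
Frequency   <- c("Less", "More", "More", "Less", "Less", "More", "More", "Less",
                 "Less", "More", "More", "Less", "Less", "More", "Less")

# Joint counts: rows = money spent, columns = purchase frequency
counts <- table(Money_Spent, Frequency)

# Dividing each column by its total conditions on Frequency,
# so entry [i, j] is P(Money_Spent = i | Frequency = j)
cond <- prop.table(counts, margin = 2)
print(cond)
```

Each column of cond sums to 1, because it is the distribution of spending within one frequency group. Using margin = 1 instead would condition on Money_Spent.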

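To carry out the calculation in R, first install the "prob" and "tidyverse" packages and create a data frame that holds each customer's combination in tabular form. Next, count the frequency of each unique combination in the data frame; this appears as "n" in the output. Then compute the individual probability of each row, shown as "probs" in the output. Finally, compute the conditional probabilities the question asks for.

Below is the R code for the calculation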

R

# Library for calculation of conditional probability
library(prob)
library(tidyverse)
  
Money_Spent <- c("High", "Low", "High", "High",
                 "Low", "Low", "High", "Low",
                 "Low", "High", "Low", "Low",
                 "High", "High", "High")
Frequency <- c("Less", "More", "More", "Less",
               "Less", "More", "More", "Less",
               "Less", "More", "More", "Less",
               "Less", "More", "Less")
Customer <- 1:15

# Customer Data Frame
Customer_Data <- data.frame(Customer, Money_Spent, Frequency)

# Count each unique (Money_Spent, Frequency) combination; shown as column "n"
Customer_Data %>%
  count(Money_Spent, Frequency, sort = TRUE)

# Creating two-way table from data frame
Customer_Data_Table <- addmargins(table("Money_Spent" = Customer_Data$Money_Spent,
                                        "Frequency" = Customer_Data$Frequency))
# view table
Customer_Data_Table

# Turn the data frame into a probability space: adds a "probs" column
# that gives every row equal probability
Customer_Data <- probspace(Customer_Data)
Customer_Data
  
# Probability of the customer spending high 
# given that they are purchasing less often
Prob(Customer_Data, event=Money_Spent == "High", given=Frequency == "Less")
  
# Probability of the customer spending less
# given that they are purchasing more often
Prob(Customer_Data, event=Money_Spent == "Low", given=Frequency == "More")
  
# Probability of the customer spending less
# given that they are purchasing less often
Prob(Customer_Data, event=Money_Spent == "Low", given=Frequency == "Less")
  
# Probability of the customer spending high 
# given that they are purchasing more often
Prob(Customer_Data, event=Money_Spent == "High", given=Frequency == "More")

Output: