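How to extract the email column from an Excel file and find the email type using Pandas?
In this article, we will see how to extract the email column from an Excel file and determine the type of each email using Pandas. Suppose our Excel file contains a column named E-mail; we want to record each type of email in its own column of the DataFrame.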
Approach:
- Import the required modules.
- Read the data from the Excel file.
- Create an extra column for each type of email.
- Look up the column positions needed for the search.
- Define the pattern for each email provider.
- Search each email and assign the result to the respective column in the DataFrame.
Let's go through the implementation step by step:
Step 1: Import the required modules and read the data from the Excel file.
Python3
# import the required modules
import pandas as pd
import re

# read the Excel file into a DataFrame
data = pd.read_excel("Email_sample.xlsx")

# show the DataFrame
data
Output:
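The sample workbook is not reproduced here. To follow along without it, a small stand-in DataFrame can be built in memory. This is only a sketch: the Name column and every address below are made up for illustration, and only the E-mail column name comes from the code in this article.

```python
import pandas as pd

# Hypothetical stand-in for Email_sample.xlsx: the 'Name' column and all
# addresses are invented; only the 'E-mail' column name is from the article.
sample = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen"],
    "E-mail": ["asha@gmail.com", "ben@yahoo.com", "chen@example.org"],
})

print(sample)
```

Assigning this frame to data in place of the read_excel call lets the remaining steps run unchanged.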
Step 2: Create an extra column for each type of email.
Python3
data['Google-mail'] = None
data
Output:
Python3
data['Yahoo-mail'] = None
data
Output:
Step 3: Look up the column positions needed for the search.
Python3
# get the integer position of each column
index_set = data.columns.get_loc('E-mail')
index_gmail = data.columns.get_loc('Google-mail')
index_yahoo = data.columns.get_loc('Yahoo-mail')
print(index_set, index_gmail, index_yahoo)
Output:
1 2 3
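To see what these positions mean, here is a minimal sketch on a made-up frame (the Name column and the address are assumptions, not the article's data). get_loc returns the zero-based position of a column label, and iat reads or writes a single cell by integer row and column position.

```python
import pandas as pd

# Made-up frame for illustration only
df = pd.DataFrame({
    "Name": ["Asha"],
    "E-mail": ["asha@gmail.com"],
})
df["Google-mail"] = None  # new columns are appended on the right

# get_loc gives the zero-based position of a column label
email_pos = df.columns.get_loc("E-mail")
gmail_pos = df.columns.get_loc("Google-mail")
print(email_pos, gmail_pos)

# iat addresses one cell by (row position, column position)
print(df.iat[0, email_pos])
df.iat[0, gmail_pos] = "Google-Mail"
print(df)
```

Because iat works on positions rather than labels, the positions have to be looked up again if columns are added, removed, or reordered.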
Step 4: Define the pattern for each email provider.
Python3
# define the pattern for each email provider
# (the dot is escaped so that it matches only a literal '.')
google_pattern = r'gmail\.com'
yahoo_pattern = r'yahoo\.com'
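A note on the dot in these patterns: in a regular expression, an unescaped . matches any single character, whereas \. matches only a literal dot. The sketch below uses made-up addresses to show the difference.

```python
import re

unescaped = r'gmail.com'    # '.' matches any single character
escaped = r'gmail\.com'     # '\.' matches only a literal dot

# Made-up address that is not a Gmail account
lookalike = 'user@gmailxcom.net'

print(bool(re.search(unescaped, lookalike)))        # True: 'x' satisfies '.'
print(bool(re.search(escaped, lookalike)))          # False
print(bool(re.search(escaped, 'user@gmail.com')))   # True
```

Even the escaped pattern matches anywhere in the string. If stricter matching is needed, one option is to anchor it, for example r'@gmail\.com$'.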
Step 5: Search each email and assign the result to the respective column in the DataFrame.
Python3
# search each email and store the result in the matching column
for row in range(len(data)):
    # str() keeps re.search from failing on empty (NaN) cells
    email = str(data.iat[row, index_set])

    if re.search(google_pattern, email) is None:
        data.iat[row, index_gmail] = 'Account does not belong to Google'
    else:
        data.iat[row, index_gmail] = 'Google-Mail'

    if re.search(yahoo_pattern, email) is None:
        data.iat[row, index_yahoo] = 'Account does not belong to Yahoo'
    else:
        data.iat[row, index_yahoo] = 'Yahoo-Mail'

data
Output:
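The row-by-row loop is easy to follow, but pandas can also apply a pattern to a whole column at once. The following is a sketch of that alternative, not the method used in this article: it combines Series.str.contains with numpy.where, and the data and label strings are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration; only the column names follow the article
df = pd.DataFrame({
    "E-mail": ["asha@gmail.com", "ben@yahoo.com", "chen@example.org"],
})

# str.contains applies the regex to every row at once;
# na=False treats missing addresses as non-matching
is_gmail = df["E-mail"].str.contains(r"gmail\.com", regex=True, na=False)
is_yahoo = df["E-mail"].str.contains(r"yahoo\.com", regex=True, na=False)

# np.where picks the first label where the mask is True, else the second
df["Google-mail"] = np.where(is_gmail, "Google-Mail",
                             "Account does not belong to Google")
df["Yahoo-mail"] = np.where(is_yahoo, "Yahoo-Mail",
                            "Account does not belong to Yahoo")

print(df)
```

This version needs no column positions and no explicit loop, which tends to matter more as the number of rows grows.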
Complete code:
Python3
# import the required modules
import pandas as pd
import re

# read the data from the Excel file into a DataFrame
data = pd.read_excel("Email_sample.xlsx")
print("Original DataFrame")
print(data)

# create a column for each type of email
data['Google-mail'] = None
data['Yahoo-mail'] = None

# get the integer position of each column
index_set = data.columns.get_loc('E-mail')
index_gmail = data.columns.get_loc('Google-mail')
index_yahoo = data.columns.get_loc('Yahoo-mail')

# define the pattern for each email provider
# (the dot is escaped so that it matches only a literal '.')
google_pattern = r'gmail\.com'
yahoo_pattern = r'yahoo\.com'

# search each email and store the result in the matching column
for row in range(len(data)):
    # str() keeps re.search from failing on empty (NaN) cells
    email = str(data.iat[row, index_set])

    if re.search(google_pattern, email) is None:
        data.iat[row, index_gmail] = 'Account does not belong to Google'
    else:
        data.iat[row, index_gmail] = 'Google-Mail'

    if re.search(yahoo_pattern, email) is None:
        data.iat[row, index_yahoo] = 'Account does not belong to Yahoo'
    else:
        data.iat[row, index_yahoo] = 'Yahoo-Mail'

print(data)
Output:
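Note: Before running this program, make sure the library that pandas needs for reading Excel files is installed in your Python environment. For .xlsx files such as the one used here, that is openpyxl; xlrd versions 2.0 and later read only the older .xls format.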