使用 Excel 文件创建数据框
让我们看看如何使用Pandas将 excel 文件读取到 Pandas 数据框对象。
代码 #1:使用 pandas 的 read_excel() 方法读取一个 excel 文件。
Python3
# import pandas lib as pd
import pandas as pd
# read by default 1st sheet of an excel file
dataframe1 = pd.read_excel('SampleWork.xlsx')
print(dataframe1)
Python3
# import pandas lib as pd
import pandas as pd
# read 2nd sheet of an excel file
dataframe2 = pd.read_excel('SampleWork.xlsx', sheet_name = 1)
print(dataframe2)
Python3
# import pandas lib as pd
import pandas as pd
require_cols = [0, 3]
# only read specific columns from an excel file
required_df = pd.read_excel('SampleWork.xlsx', usecols = require_cols)
print(required_df)
Python3
# import pandas lib as pd
import pandas as pd
# Handling missing values of 3rd sheet of an excel file.
dataframe = pd.read_excel('SampleWork.xlsx', na_values = "Missing",
sheet_name = 2)
print(dataframe)
Python3
# import pandas lib as pd
import pandas as pd
# read 2nd sheet of an excel file after
# skipping starting two rows
df = pd.read_excel('SampleWork.xlsx', sheet_name = 1, skiprows = 2)
print(df)
Python3
# import pandas lib as pd
import pandas as pd
# setting the 3rd row as header.
df = pd.read_excel('SampleWork.xlsx', sheet_name = 1, header = 2)
print(df)
Python3
# import pandas lib as pd
import pandas as pd
# read both 1st and 2nd sheet.
df = pd.read_excel('SampleWork.xlsx', na_values = "Missing",
sheet_name =[0, 1])
print(df)
Python3
# import pandas lib as pd
import pandas as pd
# read all sheets together.
all_sheets_df = pd.read_excel('SampleWork.xlsx', na_values = "Missing",
sheet_name = None)
print(all_sheets_df)
输出 :
Name Age Stream Percentage
0 Ankit 18 Math 95
1 Rahul 19 Science 90
2 Shaurya 20 Commerce 85
3 Aishwarya 18 Math 80
4 Priyanka 19 Science 75
代码 #2:使用 read_excel() 方法的 'sheet_name' 读取特定表格。
Python3
# import pandas lib as pd
import pandas as pd
# read 2nd sheet of an excel file
dataframe2 = pd.read_excel('SampleWork.xlsx', sheet_name = 1)
print(dataframe2)
输出 :
Name Age Stream Percentage
0 Priya 18 Math 95
1 shivangi 19 Science 90
2 Jeet 20 Commerce 85
3 Ananya 18 Math 80
4 Swapnil 19 Science 75
代码 #3:使用 read_excel() 方法的“usecols”参数读取特定列。
Python3
# import pandas lib as pd
import pandas as pd
require_cols = [0, 3]
# only read specific columns from an excel file
required_df = pd.read_excel('SampleWork.xlsx', usecols = require_cols)
print(required_df)
输出 :
Name Percentage
0 Ankit 95
1 Rahul 90
2 Shaurya 85
3 Aishwarya 80
4 Priyanka 75
代码 #4:使用 read_excel() 方法的 'na_values' 参数处理缺失数据。
Python3
# import pandas lib as pd
import pandas as pd
# Handling missing values of 3rd sheet of an excel file.
dataframe = pd.read_excel('SampleWork.xlsx', na_values = "Missing",
sheet_name = 2)
print(dataframe)
输出 :
Name Age Stream Percentage
0 Priya 18 Math 95
1 shivangi 19 Science 90
2 Jeet 20 NaN 85
3 Ananya 18 Math 80
4 Swapnil 19 Science 75
代码 #5:使用 read_excel() 方法的 'skiprows' 参数读取 Excel 文件时跳过起始行。
Python3
# import pandas lib as pd
import pandas as pd
# read 2nd sheet of an excel file after
# skipping starting two rows
df = pd.read_excel('SampleWork.xlsx', sheet_name = 1, skiprows = 2)
print(df)
输出 :
shivangi 19 Science 90
0 Jeet 20 Commerce 85
1 Ananya 18 Math 80
2 Swapnil 19 Science 75
代码 #6 :使用 read_excel() 方法的 'header' 参数将标题设置为任何行并从该行开始读取。
Python3
# import pandas lib as pd
import pandas as pd
# setting the 3rd row as header.
df = pd.read_excel('SampleWork.xlsx', sheet_name = 1, header = 2)
print(df)
输出 :
shivangi 19 Science 90
0 Jeet 20 Commerce 85
1 Ananya 18 Math 80
2 Swapnil 19 Science 75
代码 #7:使用 read_excel() 方法的“sheet_name”参数读取多个 Excel 表格。
Python3
# import pandas lib as pd
import pandas as pd
# read both 1st and 2nd sheet.
df = pd.read_excel('SampleWork.xlsx', na_values = "Missing",
sheet_name =[0, 1])
print(df)
输出 :
OrderedDict([(0, Name Age Stream Percentage
0 Ankit 18 Math 95
1 Rahul 19 Science 90
2 Shaurya 20 Commerce 85
3 Aishwarya 18 Math 80
4 Priyanka 19 Science 75),
(1, Name Age Stream Percentage
0 Priya 18 Math 95
1 shivangi 19 Science 90
2 Jeet 20 Commerce 85
3 Ananya 18 Math 80
4 Swapnil 19 Science 75)])
代码 #8:使用 read_excel() 方法的“sheet_name”参数一起读取 excel 文件的所有表格。
Python3
# import pandas lib as pd
import pandas as pd
# read all sheets together.
all_sheets_df = pd.read_excel('SampleWork.xlsx', na_values = "Missing",
sheet_name = None)
print(all_sheets_df)
输出 :
OrderedDict([('Sheet1', Name Age Stream Percentage
0 Ankit 18 Math 95
1 Rahul 19 Science 90
2 Shaurya 20 Commerce 85
3 Aishwarya 18 Math 80
4 Priyanka 19 Science 75),
('Sheet2', Name Age Stream Percentage
0 Priya 18 Math 95
1 shivangi 19 Science 90
2 Jeet 20 Commerce 85
3 Ananya 18 Math 80
4 Swapnil 19 Science 75),
('Sheet3', Name Age Stream Percentage
0 Priya 18 Math 95
1 shivangi 19 Science 90
2 Jeet 20 NaN 85
3 Ananya 18 Math 80
4 Swapnil 19 Science 75)])