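How to merge multiple Excel files into a single file with Python?
Anyone who works with Excel files regularly has run into the need to merge several of them into one. The traditional approach is VBA code inside Excel, which gets the job done but is a multi-step process that is not easy to follow. The alternative is to copy the contents of each long Excel file into one by hand, which is time-consuming, tedious, and error-prone.
With the Pandas module, this task takes only a few lines of Python. First, we need to install the module with pip, so let's get the installation out of the way.
Run the following command in the terminal: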
pip install pandas
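pandas does not read or write .xlsx files on its own: pd.read_excel and DataFrame.to_excel hand the work to an engine package, usually openpyxl. A minimal sketch of a pre-flight check, assuming openpyxl is the engine you plan to use (the helper name missing_packages is hypothetical):

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Report anything that still has to be installed before running the examples.
missing = missing_packages(["pandas", "openpyxl"])
if missing:
    print("Install first: pip install " + " ".join(missing))
```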
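Method 1: Using dataframe.append()
The Pandas dataframe.append() function appends the rows of another DataFrame to the end of the given DataFrame and returns a new DataFrame object. Columns that are not in the original DataFrame are added as new columns, and the new cells are filled with NaN. Note that DataFrame.append() was deprecated in pandas 1.4.0 and removed in pandas 2.0.0, so this method only works with older pandas versions; on current versions, use pandas.concat() instead.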
Syntax : DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=False)
Parameters :
- other : DataFrame or Series/dict-like object, or list of these
- ignore_index : If True, do not use the index labels. default False.
- verify_integrity : If True, raise ValueError on creating index with duplicates. default False.
- sort : Sort columns if the columns of self and other are not aligned. default False.
Returns: appended DataFrame
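The column alignment described above can be seen on two small in-memory frames. Because DataFrame.append is not available in pandas 2.0 and later, this sketch uses pd.concat, which aligns columns the same way; the frames and column names are made up for illustration.

```python
import pandas as pd

# Two made-up frames: only `first` has "price", only `second` has "city".
first = pd.DataFrame({"item": ["apple", "pear"], "price": [1.2, 0.8]})
second = pd.DataFrame({"item": ["plum"], "city": ["Boston"]})

# Stack the rows; a column missing from one frame is filled with NaN there.
# ignore_index=True renumbers the rows 0..n-1 instead of keeping 0, 1, 0.
combined = pd.concat([first, second], ignore_index=True)
print(combined)
```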
Example:
Excel files used: FoodSales1-1, FoodSales2-1
Python3
# import the required modules
import glob
import pandas as pd

# folder that contains the Excel files
path = "C:/downloads"

# all .xlsx files in that folder
file_list = glob.glob(path + "/*.xlsx")

# read each Excel file into a pandas DataFrame
excl_list = []
for file in file_list:
    excl_list.append(pd.read_excel(file))

# start from an empty DataFrame and append each file's rows to it
# (DataFrame.append requires pandas < 2.0)
excl_merged = pd.DataFrame()
for excl_file in excl_list:
    excl_merged = excl_merged.append(excl_file, ignore_index=True)

# export the merged DataFrame to an Excel file with the given name
excl_merged.to_excel('total_food_sales.xlsx', index=False)
Output: total_food_sales.xlsx, containing the rows of both input files.
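Method 2: Using pandas.concat()
The pandas.concat() function does all the heavy lifting of concatenating Pandas objects along one axis, while applying optional set logic (union or intersection) to the indexes, if any, on the other axes.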
Syntax: concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
Parameters:
- objs: Series or DataFrame objects
- axis: axis to concatenate along; default = 0 //along rows
- join: way to handle indexes on other axis; default = ‘outer’
- ignore_index: if True, do not use the index values along the concatenation axis; default = False
- keys: sequence to add an identifier to the result indexes; default = None
- levels: specific levels (unique values) to use for constructing a MultiIndex; default = None
- names: names for the levels in the resulting hierarchical index; default = None
- verify_integrity: check whether the new concatenated axis contains duplicates; default = False
- sort: sort non-concatenation axis if it is not already aligned when join is ‘outer’; default = False
- copy: if False, do not copy data unnecessarily; default = True
Returns: a pandas dataframe with concatenated data.
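A short sketch of how join and keys change the result, using two made-up frames (the bank labels and column names are illustrative, not taken from the example files):

```python
import pandas as pd

# Made-up inputs: bank_b lacks the "volume" column.
bank_a = pd.DataFrame({"date": ["2020-01-02"], "close": [10.0], "volume": [500]})
bank_b = pd.DataFrame({"date": ["2020-01-02"], "close": [20.0]})

# join="outer" (the default) keeps the union of columns, filling gaps with NaN.
outer = pd.concat([bank_a, bank_b], ignore_index=True)

# join="inner" keeps only the columns present in every frame.
inner = pd.concat([bank_a, bank_b], join="inner", ignore_index=True)

# keys adds an outer index level recording which input each row came from;
# names labels the two resulting index levels.
labelled = pd.concat([bank_a, bank_b], keys=["BankA", "BankB"], names=["bank", "row"])
```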
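Example:
In the previous example, we worked with only two Excel files containing a few rows each. Let's try merging more files, each with roughly 5000 rows and 7 columns. We have 5 files, BankE, BankD, BankC, BankB, and BankA, holding historical stock data for the respective banks. Let's merge them into a single "Bank_Stocks.xlsx" file. Here we use the pandas.concat() method.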
Python3
# import the required modules
import glob
import pandas as pd

# folder that contains the Excel files
path = "C:/downloads"

# all .xlsx files in that folder
file_list = glob.glob(path + "/*.xlsx")

# read each Excel file into a pandas DataFrame
excl_list = []
for file in file_list:
    excl_list.append(pd.read_excel(file))

# concatenate all DataFrames in the list into a single new DataFrame
excl_merged = pd.concat(excl_list, ignore_index=True)

# export the merged DataFrame to an Excel file with the given name
excl_merged.to_excel('Bank_Stocks.xlsx', index=False)
Output: Bank_Stocks.xlsx, containing the rows of all five input files.
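When many files are merged, it is often useful to record which file each row came from. The following is a minimal sketch under assumptions that are not in the original: the helper name merge_excel_files, the source_file column, and the reader parameter are all hypothetical. reader defaults to pd.read_excel and exists only so the function can be tried without real files on disk.

```python
from pathlib import Path

import pandas as pd


def merge_excel_files(paths, reader=pd.read_excel):
    """Read every path with `reader` and stack the rows into one DataFrame.

    A "source_file" column (hypothetical name) records each row's origin.
    """
    frames = []
    for path in sorted(paths):  # sort so the row order is reproducible
        frame = reader(path)
        # assign() returns a copy with the extra column; the input is untouched
        frames.append(frame.assign(source_file=Path(path).name))
    if not frames:
        raise ValueError("no input files were given")
    return pd.concat(frames, ignore_index=True)
```

For the files in this example, merge_excel_files(glob.glob("C:/downloads/*.xlsx")) would return the combined frame, which can then be written out with to_excel('Bank_Stocks.xlsx', index=False) as before.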