How to merge all Excel files in a folder using Python?

In this article, we will see how to combine all the Excel files in a folder into a single file.

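Modules used:

The Python libraries used are:

Pandas: pandas is a Python library for manipulating and analyzing data. It is widely used in data science and data analytics.
Glob: the glob module finds all path names that match a specified pattern, following the rules used by the Unix shell.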
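The shell-style rules mentioned above can be tried without touching the disk. The sketch below uses fnmatch, the standard-library module that glob relies on for matching names; the file names in the list are made up for illustration.

```python
from fnmatch import fnmatch

# Hypothetical file names, used only to illustrate the pattern rules.
names = ["x1.xlsx", "x2.xlsx", "x10.xlsx", "notes.txt"]

# "*" matches any run of characters (including none).
star = [n for n in names if fnmatch(n, "*.xlsx")]

# "?" matches exactly one character, so x10.xlsx is excluded.
single = [n for n in names if fnmatch(n, "x?.xlsx")]

# "[12]" matches one character from the listed set.
charset = [n for n in names if fnmatch(n, "x[12].xlsx")]

print(star)
print(single)
print(charset)
```

In this article only the "*" wildcard is needed, since every .xlsx file in the folder should be picked up.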

Excel files used:

Three Excel files, x1.xlsx, x2.xlsx and x3.xlsx, are stored in one folder and will be combined into a single Excel file with Python.

Step-by-step approach:

First, import the libraries and modules.

Python3

# importing pandas libraries and 
# glob module
import pandas as pd
import glob

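Set the path of the folder where the files are stored. This line stores the folder name in the variable path; here it is a folder named test, given relative to the current working directory.

Python3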

# path of the folder
path = r'test'

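Use the glob module to list the files in the folder. glob.glob() searches the given path for every file with the .xlsx extension, and print() displays the names that were found.

Python3

# listing all the excel files in the folder
# (a forward slash as separator works on Windows, Linux and macOS)
filenames = glob.glob(path + "/*.xlsx")
print('File names:', filenames)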
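Two details of this step can cause surprises: glob returns matches in arbitrary order, and Excel leaves a "~$name.xlsx" lock file next to any workbook that is currently open. The following is a minimal sketch of a helper that handles both; the name list_excel_files and the placeholder files are assumptions made for illustration, not part of the original program.

```python
import glob
import os
import tempfile


def list_excel_files(folder):
    """Return the .xlsx paths in folder, sorted, without Excel lock files."""
    pattern = os.path.join(folder, "*.xlsx")
    paths = glob.glob(pattern)
    # Excel creates "~$name.xlsx" owner files while a workbook is open.
    paths = [p for p in paths if not os.path.basename(p).startswith("~$")]
    # glob returns matches in arbitrary order, so sort for repeatable results.
    return sorted(paths)


# Demo on a throwaway folder filled with empty placeholder files.
with tempfile.TemporaryDirectory() as folder:
    for name in ("x2.xlsx", "x1.xlsx", "~$x1.xlsx", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    found = [os.path.basename(p) for p in list_excel_files(folder)]

print(found)
```

Sorting matters here because the order of the files decides the order of the rows in the merged sheet.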

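Initialize an empty data frame. A data frame is the tabular data structure that pandas provides for analyzing and manipulating data. Here we create an empty data frame that will hold the combined data from the three files.

Python3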

# Initializing empty data frame
finalexcelsheet = pd.DataFrame()

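Iterate through all the files in the folder one by one with a for loop. pd.read_excel() with sheet_name=None reads every worksheet of a file and returns them as a dictionary of data frames; pd.concat() then joins those worksheets (the third Excel file in this example has several) into one data frame stored in the variable df. A second pd.concat() call appends df to finalexcelsheet; DataFrame.append() is not available in pandas 2.0 and later. With this code you can easily combine the Excel files.

Python3

# to iterate excel file one by one
# inside the folder
for file in filenames:

    # combining all worksheets of the file
    # into a single data frame
    df = pd.concat(pd.read_excel(file, sheet_name=None),
                   ignore_index=True, sort=False)

    # appending the file's data to the combined data frame
    finalexcelsheet = pd.concat([finalexcelsheet, df],
                                ignore_index=True, sort=False)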
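To see what the inner pd.concat() call does with a multi-sheet workbook, the sketch below builds by hand the kind of dictionary that pd.read_excel(file, sheet_name=None) returns. The sheet names and columns are invented for illustration, and no Excel file is read.

```python
import pandas as pd

# Stand-in for pd.read_excel(file, sheet_name=None): a dict that maps
# sheet names to DataFrames (sheet names and columns are made up).
sheets = {
    "Sheet1": pd.DataFrame({"id": [1, 2], "name": ["a", "b"]}),
    "Sheet2": pd.DataFrame({"id": [3], "name": ["c"]}),
}

# Passing the dict to pd.concat stacks its values; ignore_index=True
# discards the per-sheet row labels and renumbers the rows from 0.
df = pd.concat(sheets, ignore_index=True, sort=False)

print(df)
```

Without ignore_index=True the sheet names would become an outer level of the row index, which is rarely wanted in a merged sheet.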

Display the combined data. To display the combined data frame, just write print(finalexcelsheet).

Python3

# to print the combined data
print('Final Sheet:')
print(finalexcelsheet)

Save the combined data to a new Excel file. index=False keeps the row index out of the output file.

Python3

# save combined data
finalexcelsheet.to_excel(r'Final.xlsx', index=False)
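The effect of index=False is easier to see in text form. The sketch below swaps in to_csv(), which takes the same index parameter, purely so that the result can be printed instead of opened in Excel; the small data frame is a hypothetical stand-in for finalexcelsheet.

```python
import pandas as pd

# Hypothetical combined data standing in for finalexcelsheet.
combined = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# Default: the row index is written as an extra, unnamed first column.
with_index = combined.to_csv()

# index=False: only the data columns are written.
without_index = combined.to_csv(index=False)

print(with_index)
print(without_index)
```

Note that writing .xlsx files with to_excel() also requires an Excel engine such as openpyxl to be installed alongside pandas.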

Below is the complete Python program based on the above approach:

Python3

# import modules
import pandas as pd
import glob

# path of the folder
path = r'test'

# listing all the excel files in the folder
# (a forward slash as separator works on Windows, Linux and macOS)
filenames = glob.glob(path + "/*.xlsx")
print('File names:', filenames)

# initializing empty data frame
finalexcelsheet = pd.DataFrame()

# to iterate excel file one by one
# inside the folder
for file in filenames:

    # combining all worksheets of the file
    # into a single data frame
    df = pd.concat(pd.read_excel(file, sheet_name=None),
                   ignore_index=True, sort=False)

    # appending the file's data to the combined data frame
    # (DataFrame.append() is not available in pandas 2.0 and later)
    finalexcelsheet = pd.concat([finalexcelsheet, df],
                                ignore_index=True, sort=False)

# to print the combined data
print('Final Sheet:')
print(finalexcelsheet)

# save combined data
finalexcelsheet.to_excel(r'Final.xlsx', index=False)

Output:

Final Excel file:
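One possible refinement of the complete program, not part of the original approach, is to collect the per-file data frames first and concatenate them once, tagging each row with the file it came from. The sketch below works on in-memory frames so it runs without any Excel files; the function name combine_frames and the column name source_file are assumptions chosen for illustration.

```python
import pandas as pd


def combine_frames(frames_by_file):
    """Stack DataFrames keyed by file name and tag each row with its source."""
    tagged = []
    for file_name, frame in frames_by_file.items():
        # assign() returns a copy, so the caller's frames stay unchanged.
        tagged.append(frame.assign(source_file=file_name))
    if not tagged:
        # Nothing to merge: return an empty frame instead of raising.
        return pd.DataFrame()
    # A single concat avoids re-copying the growing result on every loop pass.
    return pd.concat(tagged, ignore_index=True, sort=False)


# Hypothetical per-file data, as the loop over filenames would produce it.
demo = {
    "x1.xlsx": pd.DataFrame({"id": [1, 2]}),
    "x2.xlsx": pd.DataFrame({"id": [3]}),
}
combined = combine_frames(demo)
print(combined)
```

In the real program the dictionary would be filled inside the for loop, with each file name mapped to the df built from that file.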