📅  Last modified: 2023-12-03 14:45:45.208000             🧑  Author: Mango
PyPDF2 is a Python library for manipulating PDF files and extracting data from them. It can work with a single PDF file or merge multiple PDF files into one.
To install PyPDF2, you can use pip:
pip install PyPDF2
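Some class and method names changed between PyPDF2 releases, so it can help to confirm which version pip actually installed. The following is a minimal sketch that uses only the standard library; installed_version is a hypothetical helper name, not part of PyPDF2.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report whether PyPDF2 is available in the current environment
pypdf2_version = installed_version("PyPDF2")
if pypdf2_version is None:
    print("PyPDF2 is not installed")
else:
    print(f"PyPDF2 {pypdf2_version} is installed")
```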
To extract text from a PDF file with PyPDF2, you can use the PdfReader class (named PdfFileReader before PyPDF2 3.0, where the old name was removed):
import PyPDF2

# Open the PDF in binary mode; it must stay open while pages are read
with open('example.pdf', 'rb') as pdf_file:
    pdf_reader = PyPDF2.PdfReader(pdf_file)
    # Iterate over every page and print its extracted text
    for page_obj in pdf_reader.pages:
        print(page_obj.extract_text())
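If you want the text back as data rather than printed, the page loop can be wrapped in a small function. This is a sketch under assumptions: collect_page_text and read_pdf_text are hypothetical helper names, and the code assumes PyPDF2 3.x, where the reader exposes a pages sequence and each page has an extract_text() method.

```python
from typing import Iterable, List


def collect_page_text(pages: Iterable) -> List[str]:
    """Return one stripped string per page, using '' when a page yields no text."""
    texts = []
    for page in pages:
        # Scanned or image-only pages may yield an empty result
        text = page.extract_text() or ""
        texts.append(text.strip())
    return texts


def read_pdf_text(path: str) -> List[str]:
    """Open the PDF at `path` and return its text, one entry per page."""
    # Imported here so collect_page_text can be used without PyPDF2 installed
    from PyPDF2 import PdfReader

    with open(path, "rb") as pdf_file:
        # Pages are read while the file is still open
        return collect_page_text(PdfReader(pdf_file).pages)
```

Calling read_pdf_text('example.pdf') would then return a list that you can join, search, or save, instead of writing straight to the console.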
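To merge multiple PDFs into one using PyPDF2, you can use the PdfMerger class (named PdfFileMerger before PyPDF2 3.0, where the old name was removed):
import PyPDF2

pdf_merger = PyPDF2.PdfMerger()
# Files are added in call order, so the pages of file1.pdf come first
pdf_merger.append('file1.pdf')
pdf_merger.append('file2.pdf')
# Write the combined document, then release the input files
with open('merged_files.pdf', 'wb') as pdf_output_file:
    pdf_merger.write(pdf_output_file)
pdf_merger.close()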
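Because the merged document follows the order of the append calls, numbered files need sorting with some care: a plain alphabetical sort puts file10.pdf before file2.pdf. The sketch below sorts names "naturally" before merging. natural_key and merge_in_natural_order are hypothetical helper names, and the merge step assumes PyPDF2 3.x.

```python
import re
from typing import Iterable, List, Union


def natural_key(name: str) -> List[Union[int, str]]:
    """Split a name into text and number chunks so that 2 sorts before 10."""
    return [
        int(tok) if tok.isdecimal() else tok.lower()
        for tok in re.split(r"(\d+)", name)
    ]


def merge_in_natural_order(paths: Iterable[str], output_path: str) -> List[str]:
    """Merge the given PDFs in natural name order and return the order used."""
    # Imported here so natural_key can be used without PyPDF2 installed
    from PyPDF2 import PdfMerger

    ordered = sorted(paths, key=natural_key)
    merger = PdfMerger()
    try:
        for path in ordered:
            merger.append(path)
        with open(output_path, "wb") as output_file:
            merger.write(output_file)
    finally:
        # Release the input files even if appending or writing fails
        merger.close()
    return ordered
```

For example, merge_in_natural_order(['file10.pdf', 'file2.pdf', 'file1.pdf'], 'merged_files.pdf') would append file1.pdf, then file2.pdf, then file10.pdf.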
PyPDF2 is a useful library for working with PDF files in Python: it can extract text from PDFs and merge multiple PDFs into one.