Merging PDFs stored on a remote server using Python

Last modified: 2022-05-13 01:55:00.237000             Author: Mango


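Prerequisite: Working with PDF files in Python

There are many libraries for working with PDF files in Python, but they are normally used on PDF files that have already been downloaded to the local machine. If your target PDF files sit on a remote server, you only have their URLs, and you may not want to save them to disk on your machine or compute server. Here we discuss this problem and a solution to it.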
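The core idea can be sketched with the standard library alone: the bytes of a remote file are collected into an `io.BytesIO`, which behaves like an open file that a PDF library can read and seek. In this rough sketch, `buffer_in_memory`, `looks_like_pdf`, and `fake_chunks` are hypothetical names chosen for illustration, and the chunks are hand-written stand-ins for what an HTTP client would yield, not a real PDF.

```python
from io import BytesIO, SEEK_END


def buffer_in_memory(chunks):
    """Collect an iterable of byte chunks into a seekable in-memory file."""
    buffer = BytesIO()
    for chunk in chunks:
        buffer.write(chunk)
    buffer.seek(0)  # rewind so a reader starts at the beginning
    return buffer


def looks_like_pdf(stream, tail_size=1024):
    """Rough check: '%PDF-' header at the start, '%%EOF' marker near the end."""
    stream.seek(0)
    has_header = stream.read(5) == b"%PDF-"
    size = stream.seek(0, SEEK_END)        # jump to the end to learn the size
    stream.seek(max(0, size - tail_size))  # step back to inspect the tail
    has_eof = b"%%EOF" in stream.read()
    stream.seek(0)                         # leave the stream rewound
    return has_header and has_eof


# Assumption: hand-written stand-in chunks, not a real PDF document.
fake_chunks = [b"%PDF-1.4\n", b"(body omitted)\n", b"%%EOF\n"]
pdf_file = buffer_in_memory(fake_chunks)
print(looks_like_pdf(pdf_file))
```

The jump to the end matters because a PDF keeps its cross-reference information at the end of the file, so a reader needs a seekable stream rather than a forward-only one.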

Installation:
Here we use the Python library PyPDF2 to merge the PDFs.

pip install PyPDF2

We will merge the same PDF file twice. Link to the PDF file used here.

The implementation is below.

from io import BytesIO, SEEK_SET, SEEK_END
import PyPDF2 
import requests
  
  
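# File-like wrapper around an iterator of byte chunks.
# Chunks are buffered in a BytesIO and pulled from the
# iterator only when a read or seek needs them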
class ResponseStream(object):
      
    def __init__(self, request_iterator):
        self._bytes = BytesIO()
        self._iterator = request_iterator
   
    def _load_all(self):
        self._bytes.seek(0, SEEK_END)
          
        for chunk in self._iterator:
            self._bytes.write(chunk)
   
    def _load_until(self, goal_position):
        current_position = self._bytes.seek(0, SEEK_END)

        while current_position < goal_position:
            try:
                # write() returns the number of bytes written,
                # so add it to the running position
                current_position += self._bytes.write(next(self._iterator))

            except StopIteration:
                break
   
    def tell(self):
        return self._bytes.tell()
   
    def read(self, size = None):
        left_off_at = self._bytes.tell()
          
        if size is None:
            self._load_all()
        else:
            goal_position = left_off_at + size
            self._load_until(goal_position)
   
        self._bytes.seek(left_off_at)
          
        return self._bytes.read(size)
   
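    def seek(self, position, whence = SEEK_SET):

        if whence == SEEK_END:
            # the end is only known once every chunk is buffered
            self._load_all()

        # apply the requested offset and report the new position
        return self._bytes.seek(position, whence)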
              
              
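# Merge PDFs using URL List
# Placeholder: replace with the URLs of the PDFs to merge
url_list = ["list of URLs"]
target_pdf_path = './Merged.pdf'
# PdfWriter / PdfReader / add_page / pages are the names in newer
# PyPDF2 releases; PyPDF2 3.0 removed the older PdfFileWriter /
# PdfFileReader / addPage / getPage names
pdf_writer = PyPDF2.PdfWriter()

for url in url_list:

    # stream=True lets iter_content() fetch the body chunk by chunk
    response = requests.get(url, stream=True)
    response.raise_for_status()
    reader = PyPDF2.PdfReader(ResponseStream(response.iter_content(64)))

    for page in reader.pages:
        pdf_writer.add_page(page)

# write to output file
with open(target_pdf_path, 'wb') as g:
    pdf_writer.write(g)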

Output:

The merged PDF file that was created is here.
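
One way to check the lazy buffering without any network access is to feed a stream an in-memory list of chunks. The sketch below uses `LazyStream`, a hypothetical compact stand-in written for this illustration (it is not the `ResponseStream` class above, though it follows the same idea), and hand-written chunks rather than a real PDF.

```python
from io import BytesIO, SEEK_END, SEEK_SET


class LazyStream:
    """Hypothetical stand-in: buffers chunks from an iterator on demand."""

    def __init__(self, chunks):
        self._buf = BytesIO()
        self._chunks = iter(chunks)

    def buffered(self):
        # Number of bytes pulled from the iterator so far.
        return len(self._buf.getvalue())

    def _fill(self, goal=None):
        # Append chunks until `goal` bytes are buffered (or all, if None).
        pos = self._buf.tell()
        end = self._buf.seek(0, SEEK_END)
        while goal is None or end < goal:
            chunk = next(self._chunks, None)
            if chunk is None:
                break
            end += self._buf.write(chunk)
        self._buf.seek(pos)  # restore the caller's read position

    def read(self, size=-1):
        if size is None or size < 0:
            self._fill()
        else:
            self._fill(self._buf.tell() + size)
        return self._buf.read(size)

    def seek(self, offset, whence=SEEK_SET):
        if whence == SEEK_END:
            self._fill()  # the end is unknown until every chunk has arrived
        return self._buf.seek(offset, whence)

    def tell(self):
        return self._buf.tell()


# Assumption: stand-in chunks of 4, 4, 4, 4 and 3 bytes, not a real PDF.
chunks = [b"%PDF", b"-1.4", b"\n...", b"\n%%E", b"OF\n"]
stream = LazyStream(chunks)

header = stream.read(5)                    # needs only the first two chunks
buffered_after_header = stream.buffered()

stream.seek(-6, SEEK_END)                  # forces the remaining chunks in
tail = stream.read()

print(header, buffered_after_header, tail, stream.buffered())
```

Reading the header pulls in only as many chunks as it needs, while any seek relative to the end buffers the whole body. Since PDF readers typically start by looking at the end of the file, the full document usually ends up in memory either way; the gain is that nothing is written to disk.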