Python | Working with .docx Module
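Word documents contain formatted text wrapped within three object levels: the lowest level is the Run object, the middle level is the Paragraph object, and the highest level is the Document object.
A .docx file is not plain text (it is a ZIP archive of XML parts), so we cannot work with these documents in a normal text editor. However, we can manipulate Word documents in Python with the python-docx module.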
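To see the three levels directly, recall that a .docx file is a ZIP archive whose main part, word/document.xml, nests runs (w:r) inside paragraphs (w:p) inside the document body. The stdlib-only sketch below packs a hand-written, heavily simplified document.xml into a ZIP and walks the paragraph and run levels. The XML fragment is an assumption for illustration (it is not a complete .docx that Word would open), and build_minimal_docx and walk_levels are hypothetical helper names.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hypothetical, heavily simplified document.xml: one paragraph with two runs,
# the second marked bold through its run properties (w:rPr/w:b).
SAMPLE_XML = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p>
      <w:r><w:t xml:space="preserve">Your paragraph goes here, </w:t></w:r>
      <w:r><w:rPr><w:b/></w:rPr><w:t>hey there, bold here</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""


def build_minimal_docx(xml_text: str) -> bytes:
    """Store the XML at word/document.xml inside an in-memory ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml_text)
    return buf.getvalue()


def walk_levels(docx_bytes: bytes):
    """Return one list per paragraph, each holding (run_text, is_bold) tuples."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))  # Document level
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):                     # Paragraph level
        runs = []
        for r in p.findall("w:r", NS):                      # Run level
            text = "".join(t.text or "" for t in r.findall("w:t", NS))
            # Simplification: ignores w:val="0", which would switch bold off
            is_bold = r.find("w:rPr/w:b", NS) is not None
            runs.append((text, is_bold))
        paragraphs.append(runs)
    return paragraphs


if __name__ == "__main__":
    data = build_minimal_docx(SAMPLE_XML)
    print(data[:2])  # b'PK': the bytes of a ZIP archive, not readable text
    for runs in walk_levels(data):
        print(runs)
```

The formatting (bold) sits on the run, not on the paragraph, which is why python-docx exposes it through Run objects.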
1. The first step is to install the third-party module python-docx. You can use pip (“pip install python-docx”) or download the source tarball from PyPI; the source code is also available in the project's GitHub repository.
2. After installation, import “docx”, NOT “python-docx”.
3. Use the “docx.Document” class to start working with a Word document.
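Steps 2 and 3 often trip people up because the install name and the import name differ. A small sketch using only importlib from the standard library checks for the importable package before creating a document; docx_is_importable is a hypothetical helper name.

```python
import importlib.util


def docx_is_importable() -> bool:
    # The PyPI distribution is called "python-docx", but the package it
    # installs is imported as "docx", so that is the name to look up.
    return importlib.util.find_spec("docx") is not None


if __name__ == "__main__":
    if docx_is_importable():
        import docx
        doc = docx.Document()  # step 3: a new document based on the default template
        print("python-docx is installed; created a", type(doc).__name__)
    else:
        print("Not installed yet; run: pip install python-docx")
```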
Code #1:
# import docx NOT python-docx
import docx
# create a new, empty Word document
doc = docx.Document()
# add a level-0 heading (level 0 applies the "Title" style, the largest)
doc.add_heading('Heading for the document', 0)
# add a paragraph and store
# the object in a variable
doc_para = doc.add_paragraph('Your paragraph goes here, ')
# add runs to the paragraph; each run can carry its own
# character formatting such as bold, italic or underline
doc_para.add_run('hey there, bold here').bold = True
doc_para.add_run(', and ')
doc_para.add_run('these words are italic').italic = True
# add a page break to start a new page
doc.add_page_break()
# add a heading of level 2
doc.add_heading('Heading level 2', 2)
# pictures can also be added to the Word document;
# width is optional, e.g. width=docx.shared.Inches(2)
doc.add_picture('path_to_picture')
# now save the document to a location
doc.save('path_to_document')
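Output:
Note the page break: the level-2 heading starts on the second page.
Code #2: Now, to open an existing Word document, create a Document instance and pass it the path to the document.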
# import the Document class
# from the docx module
from docx import Document
# open an existing Word document by
# passing its path to Document()
doc = Document('path_to_the_document')
# print the list of paragraphs in the document
print('List of paragraph objects:->>>')
print(doc.paragraphs)
# print the list of the runs
# in a specified paragraph
print('\nList of runs objects in 1st paragraph:->>>')
print(doc.paragraphs[0].runs)
# print the text in a paragraph
print('\nText in the 1st paragraph:->>>')
print(doc.paragraphs[0].text)
# for printing the complete document
print('\nThe whole content of the document:->>>\n')
for para in doc.paragraphs:
    print(para.text)
Output:
List of paragraph objects:->>>
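[<docx.text.paragraph.Paragraph object at 0x...>,
<docx.text.paragraph.Paragraph object at 0x...>,
<docx.text.paragraph.Paragraph object at 0x...>,
<docx.text.paragraph.Paragraph object at 0x...>,
<docx.text.paragraph.Paragraph object at 0x...>]
List of runs objects in 1st paragraph:->>>
[<docx.text.run.Run object at 0x...>]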
Text in the 1st paragraph:->>>
Heading for the document
The whole content of the document:->>>
Heading for the document
Your paragraph goes here, hey there, bold here, and these words are italic
Heading level 2
Reference: https://python-docx.readthedocs.io/en/latest/#user-guide