Last modified: 2020-11-06 06:21:20    Author: Mango
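To read a Word document, we use a module named docx. We first install it as shown below, and then write a program that uses the module's features to read the whole file paragraph by paragraph.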
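The module is published on PyPI under the name python-docx, although it is imported as docx. We add it to our environment with the following command.
pip install python-docx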
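In the example below, we read the content of the Word document by appending the text of each paragraph to a list and finally printing all of the paragraph text.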
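import docx

def readtxt(filename):
    # Open the .docx file and collect the text of every paragraph
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        fullText.append(para.text)
    # Join the paragraphs with newlines, one paragraph per line
    return '\n'.join(fullText)

# 'path' is a placeholder for the folder that holds the file
print(readtxt(r'path\Tutorialspoint.docx'))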
When we run the above program, we get the following output:
Tutorials Point originated from the idea that there exists a class of readers who respond
better to online content and prefer to learn new skills at their own pace from the comforts
of their drawing rooms.
The journey commenced with a single tutorial on HTML in 2006 and elated by the response it generated,
we worked our way to adding fresh tutorials to our repository which now proudly flaunts
a wealth of tutorials and allied articles on topics ranging from programming languages
to web designing to academics and much more.
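Under the hood, a .docx file is a ZIP archive, and the paragraphs returned by doc.paragraphs correspond to the <w:p> elements directly inside <w:body> in the archive's word/document.xml part. The sketch below uses only the standard library to make that structure visible. It is a simplified stand-in, not python-docx's own code: it only collects <w:t> text (ignoring tabs, line breaks, tables, headers and footers), and the helper names paragraph_texts and make_sample_docx are hypothetical. The sample archive contains nothing but word/document.xml, which is enough for this parser but is not a file Word itself would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML namespace bound to the w: prefix in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def paragraph_texts(docx_bytes):
    """Return the text of each <w:p> that is a direct child of <w:body>.

    Hypothetical helper: concatenates <w:t> runs only.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    body = root.find("w:body", NS)
    texts = []
    for p in body.findall("w:p", NS):
        # Join every text run inside this paragraph
        texts.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return texts


def make_sample_docx(paragraphs):
    """Build a stripped-down archive holding only word/document.xml.

    Hypothetical helper for the demo; an empty string becomes an empty <w:p/>.
    """
    body = "".join(
        f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>" if text else "<w:p/>"
        for text in paragraphs
    )
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    return buf.getvalue()


sample = make_sample_docx(["First paragraph.", "", "Second paragraph."])
print(paragraph_texts(sample))  # → ['First paragraph.', '', 'Second paragraph.']
```

Note that the empty paragraph shows up as an empty string, just as an empty line in a real document appears as a Paragraph whose text is '' in doc.paragraphs.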
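We can use the paragraphs attribute to read a specific paragraph from a Word document. In the example below, we read only the second block of text from the Word document. Note that doc.paragraphs is indexed from 0 and that every paragraph, including empty ones, takes up an index, so index 2 is the third paragraph object; in this sample file it holds the second block of text.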
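import docx

# Open the file; 'path' is a placeholder for its folder
doc = docx.Document(r'path\Tutorialspoint.docx')
# Total number of paragraphs, empty ones included
print(len(doc.paragraphs))
# Paragraphs are indexed from 0
print(doc.paragraphs[2].text)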
When we run the above program, it first prints the number of paragraphs and then the following text:
The journey commenced with a single tutorial on HTML in 2006 and elated by the response
it generated, we worked our way to adding fresh tutorials to our repository
which now proudly flaunts a wealth of tutorials and allied articles on topics
ranging from programming languages to web designing to academics and much more.
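Because empty paragraphs also occupy an index in doc.paragraphs, a hard-coded index such as 2 depends on how the particular file happens to be laid out. A minimal sketch, assuming you only care about paragraphs that contain visible text: the hypothetical helper nth_text_paragraph below skips blank entries and counts from 1. It takes a plain list of strings so it runs without a .docx file; with python-docx you would pass it [p.text for p in doc.paragraphs].

```python
def nth_text_paragraph(texts, n):
    """Return the n-th (1-based) paragraph that is not blank.

    texts: list of paragraph strings, e.g. [p.text for p in doc.paragraphs].
    Raises ValueError for n < 1 and IndexError if there are too few paragraphs.
    """
    if n < 1:
        raise ValueError("n must be 1 or greater")
    # Drop paragraphs that are empty or contain only whitespace
    non_blank = [t for t in texts if t.strip()]
    if n > len(non_blank):
        raise IndexError(f"only {len(non_blank)} non-blank paragraphs, asked for {n}")
    return non_blank[n - 1]


paragraphs = ["First paragraph.", "", "Second paragraph."]
print(nth_text_paragraph(paragraphs, 2))  # prints: Second paragraph.
```

Filtering first and then indexing keeps the lookup stable if someone adds or removes blank lines in the document, at the cost of building one extra list.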