Python | Count the occurrences of each word in a given text file (using a dictionary)
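It is often necessary to count how many times each word occurs in a text file. A dictionary suits this task: each word is stored as a key and its count as the corresponding value. We iterate over every word in the file; a word seen for the first time is added to the dictionary with a count of 1, and a word that is already present has its count incremented by 1.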
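The counting rule can be tried on an in-memory list before any file is involved. This is a minimal sketch; the list `words` and the name `counts` are illustrative assumptions, not part of the examples that follow.

```python
# Hypothetical sample data; any iterable of strings would work here
words = ["mango", "banana", "apple", "mango"]

counts = {}
for word in words:
    if word in counts:
        # Word already seen: increment its count
        counts[word] += 1
    else:
        # First occurrence: start the count at 1
        counts[word] = 1

print(counts)  # {'mango': 2, 'banana': 1, 'apple': 1}
```

The if/else pair can also be written as `counts[word] = counts.get(word, 0) + 1`, which behaves the same way.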
Example #1:
First, create the text file whose words are to be counted. Call this file sample.txt, with the following contents:
Mango banana apple pear
Banana grapes strawberry
Apple pear mango banana
Kiwi apple mango strawberry
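Note: The script opens sample.txt by a relative path, so keep the text file in the same directory as the Python file and run the script from that directory.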
# Create an empty dictionary
d = dict()

# Open the file in read mode; the with block closes it when done
with open("sample.txt", "r") as text:
    # Loop through each line of the file
    for line in text:
        # Remove leading and trailing whitespace, including the newline
        line = line.strip()

        # Convert the characters in line to
        # lowercase to avoid case mismatch
        line = line.lower()

        # Split the line into words on any run of whitespace;
        # unlike split(" "), this never yields empty strings
        words = line.split()

        # Iterate over each word in line
        for word in words:
            # Check if the word is already in dictionary
            if word in d:
                # Increment count of word by 1
                d[word] = d[word] + 1
            else:
                # Add the word to dictionary with count 1
                d[word] = 1

# Print the contents of dictionary
for key in d:
    print(key, ":", d[key])
Output:
mango : 3
banana : 3
apple : 3
pear : 2
grapes : 1
strawberry : 2
kiwi : 1
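As a point of comparison, the standard library's `collections.Counter` is a dictionary subclass that performs the same bookkeeping. The sketch below swaps it in for the hand-written if/else; it is an alternative, not the method this article uses. To keep it self-contained, an `io.StringIO` object stands in for the opened sample.txt (an assumption for illustration).

```python
from collections import Counter
import io

# Stand-in for open("sample.txt"); contents mirror the sample file
text = io.StringIO(
    "Mango banana apple pear\n"
    "Banana grapes strawberry\n"
    "Apple pear mango banana\n"
    "Kiwi apple mango strawberry\n"
)

counts = Counter()
for line in text:
    # update() adds 1 for every word in the list it receives
    counts.update(line.lower().split())

# most_common() yields (word, count) pairs, highest count first
for word, count in counts.most_common():
    print(word, ":", count)
```

Because `Counter` returns 0 for missing keys, no membership check is needed before incrementing.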
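Example #2:
Now consider a sample.txt whose sentences contain punctuation marks. Without extra handling, "mango!" and "mango" would be counted as different words.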
Mango! banana apple pear.
Banana, grapes strawberry.
Apple- pear mango banana.
Kiwi "apple" mango strawberry.
import string

# Create an empty dictionary
d = dict()

# Open the file in read mode; the with block closes it when done
with open("sample.txt", "r") as text:
    # Loop through each line of the file
    for line in text:
        # Remove leading and trailing whitespace, including the newline
        line = line.strip()

        # Convert the characters in line to
        # lowercase to avoid case mismatch
        line = line.lower()

        # Remove the punctuation marks from the line
        line = line.translate(str.maketrans("", "", string.punctuation))

        # Split the line into words on any run of whitespace;
        # unlike split(" "), this never yields empty strings
        words = line.split()

        # Iterate over each word in line
        for word in words:
            # Check if the word is already in dictionary
            if word in d:
                # Increment count of word by 1
                d[word] = d[word] + 1
            else:
                # Add the word to dictionary with count 1
                d[word] = 1

# Print the contents of dictionary
for key in d:
    print(key, ":", d[key])
Output:
mango : 3
banana : 3
apple : 3
pear : 2
grapes : 1
strawberry : 2
kiwi : 1
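To see what the punctuation-removal step does in isolation, the sketch below applies it to a single line taken from the sample file. The names `table` and `cleaned` are illustrative assumptions.

```python
import string

# Translation table that deletes every ASCII punctuation character
table = str.maketrans("", "", string.punctuation)

line = 'Kiwi "apple" mango strawberry.'
cleaned = line.lower().translate(table)

print(cleaned)          # kiwi apple mango strawberry
print(cleaned.split())  # ['kiwi', 'apple', 'mango', 'strawberry']
```

Note that `string.punctuation` contains only ASCII punctuation, so characters such as curly quotes or full-width Chinese punctuation would pass through unchanged and need separate handling.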