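Python - Get word frequency as a percentage
Given a list of strings, the task is to write a Python program that computes each word's percentage share among all words in the list.
Calculation: (number of occurrences of word X / total number of words) * 100. The sample outputs in this article show the share as a fraction between 0 and 1, that is, before the multiplication by 100.
Examples: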
Input : test_list = ["Gfg is best for geeks", "All love Gfg", "Gfg is best for CS", "For CS geeks Gfg is best"]
Output : {'Gfg': 0.21052631578947367, 'is': 0.15789473684210525, 'best': 0.15789473684210525, 'for': 0.10526315789473684, 'geeks': 0.10526315789473684, 'All': 0.05263157894736842, 'love': 0.05263157894736842, 'CS': 0.10526315789473684, 'For': 0.05263157894736842}
Explanation : The share of each word with respect to all words in the list is computed. Gfg occurs 4 times. Total words = 19, so its share is 4 / 19.
Input : test_list = ["Gfg is best for geeks", "All love Gfg"]
Output : {'Gfg': 0.25, 'is': 0.125, 'best': 0.125, 'for': 0.125, 'geeks': 0.125, 'All': 0.125, 'love': 0.125}
Explanation : The share of each word with respect to all words in the list is computed. Gfg occurs 2 times. Total words = 8.
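The stated calculation multiplies by 100, while the sample outputs are plain fractions. The following is a minimal sketch of the arithmetic on the shorter sample list, showing both forms side by side. The variable names (`words`, `counts`, `gfg_fraction`, `percentages`) are illustrative and not part of the original article.

```python
from collections import Counter

# Shorter sample list from the examples
test_list = ["Gfg is best for geeks", "All love Gfg"]

# Flatten the list of sentences into a single list of words
words = [word for sentence in test_list for word in sentence.split()]
counts = Counter(words)
total = len(words)  # 5 + 3 = 8 words

# Share of "Gfg" as a fraction and as a percentage
gfg_fraction = counts["Gfg"] / total  # 2 / 8
gfg_percent = gfg_fraction * 100

# Percentage for every word (fraction scaled by 100)
percentages = {word: count / total * 100 for word, count in counts.items()}
print(percentages)
```

Whether to keep fractions or scale by 100 is a presentation choice; the methods below keep fractions.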
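Method #1: Using sum() + Counter() + join() + split()
In this approach, the strings are first concatenated with join() and then separated into individual words with split(). Counter() maps each word to its frequency. The total number of words is then computed with sum(), and each word's share is obtained by dividing its frequency stored in the Counter by that total.
Python3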
# Python3 code to demonstrate working of
# Each word frequency percentage
# Using sum() + Counter() + join() + split()
from collections import Counter

# initializing list
test_list = ["Gfg is best for geeks",
             "All love Gfg",
             "Gfg is best for CS",
             "For CS geeks Gfg is best"]

# printing original list
print("The original list is : " + str(test_list))

# concatenating using join
joined = " ".join(test_list)

# mapping each word to its frequency using Counter()
mappd = Counter(joined.split())

# getting total number of words using sum
total_val = sum(mappd.values())

# getting share of each word (a fraction between 0 and 1)
res = {key: val / total_val for key, val in mappd.items()}

# printing result
print("Percentage share of each word : " + str(res))
Output:
The original list is : ['Gfg is best for geeks', 'All love Gfg', 'Gfg is best for CS', 'For CS geeks Gfg is best']
Percentage share of each word : {'Gfg': 0.21052631578947367, 'is': 0.15789473684210525, 'best': 0.15789473684210525, 'for': 0.10526315789473684, 'geeks': 0.10526315789473684, 'All': 0.05263157894736842, 'love': 0.05263157894736842, 'CS': 0.10526315789473684, 'For': 0.05263157894736842}
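Note that the output lists 'for' and 'For' as separate words, because the counting is case-sensitive. If that is not wanted, one possible variation is to lowercase the text before splitting. This is an assumption about what a reader might need, not part of the original method, and the names `counts` and `res_ci` are illustrative.

```python
from collections import Counter

test_list = ["Gfg is best for geeks", "All love Gfg",
             "Gfg is best for CS", "For CS geeks Gfg is best"]

# Lowercase the joined text before splitting so "for" and "For" merge
words = " ".join(test_list).lower().split()
counts = Counter(words)
total = len(words)

# Case-insensitive share of each word (fraction between 0 and 1)
res_ci = {word: count / total for word, count in counts.items()}
print(res_ci["for"])  # 3 occurrences out of 19 words
```

Lowercasing also turns 'Gfg' and 'CS' into 'gfg' and 'cs', so this is only suitable when the original casing does not need to be preserved in the keys.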
Method #2: Using a combined one-liner
Similar to the method above; the individual steps are simply combined to give a more compact solution.
Python3
# Python3 code to demonstrate working of
# Each word frequency percentage
# Using combined one-liner
from collections import Counter

# initializing list
test_list = ["Gfg is best for geeks", "All love Gfg",
             "Gfg is best for CS", "For CS geeks Gfg is best"]

# printing original list
print("The original list is : " + str(test_list))

# joining, splitting and mapping with Counter() in one step
mappd = Counter(" ".join(test_list).split())

# getting share of each word (a fraction between 0 and 1)
res = {key: val / sum(mappd.values()) for key, val in mappd.items()}

# printing result
print("Percentage share of each word : " + str(res))
Output:
The original list is : ['Gfg is best for geeks', 'All love Gfg', 'Gfg is best for CS', 'For CS geeks Gfg is best']
Percentage share of each word : {'Gfg': 0.21052631578947367, 'is': 0.15789473684210525, 'best': 0.15789473684210525, 'for': 0.10526315789473684, 'geeks': 0.10526315789473684, 'All': 0.05263157894736842, 'love': 0.05263157894736842, 'CS': 0.10526315789473684, 'For': 0.05263157894736842}
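The compact version evaluates sum(mappd.values()) once for every key in the comprehension. A small sketch, assuming reuse is wanted, wraps the same steps in a function and computes the total only once. The function name `word_shares` is hypothetical and not taken from the original article.

```python
from collections import Counter


def word_shares(strings):
    """Return each word's share of all words as a fraction (hypothetical helper)."""
    counts = Counter(" ".join(strings).split())
    total = sum(counts.values())  # computed once, not once per key
    # For empty input the Counter is empty, so the result is an empty dict
    return {word: count / total for word, count in counts.items()}


shares = word_shares(["Gfg is best for geeks", "All love Gfg"])
print(shares)
```

For short lists the difference is negligible; hoisting the total mainly matters when the number of distinct words is large.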