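Scraping and Finding Ordered Words in a Dictionary Using Python
What are ordered words?
An ordered word is a word whose letters appear in alphabetical order, for example "abbey" and "dirt". All other words are unordered, for example "geeks".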
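The definition can be checked directly with a minimal sketch like the one below. The helper name `is_ordered` is hypothetical and not part of the article's program, and the sketch assumes lowercase input, since the comparison follows character code order.

```python
def is_ordered(word: str) -> bool:
    """Return True if the letters of `word` never decrease from left to right."""
    # A word is ordered exactly when sorting its letters leaves it unchanged.
    return list(word) == sorted(word)


for sample in ("abbey", "dirt", "geeks"):
    print(sample, is_ordered(sample))
# abbey True
# dirt True
# geeks False
```

Repeated letters, as in "abbey", count as ordered because sorting leaves them in place.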
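The task at hand
This task is taken from Rosetta Code, and it is not as trivial as the definition above makes it sound. To get a large collection of words, we will use the online dictionary available at http://www.puzzlers.org/pub/wordlists/unixdict.txt, which contains about 25,000 words. Since we are using Python, we can scrape the dictionary directly instead of downloading it as a text file and then running file-handling operations on it.
Requirements: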
pip install requests
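Code
The approach is to traverse each word and compare the ASCII values of its characters in adjacent pairs. As soon as one pair is out of order, the word is marked as unordered; if no such pair is found, the word is ordered.
The task therefore splits into two parts:
Scraping
- Using the Python library requests, fetch the data from the given URL
- Take the content of the response, which arrives as raw bytes
- Decode the content, which on the web is usually encoded as UTF-8
- Convert the resulting long string into a list of words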
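The decoding and splitting steps above can be sketched without touching the network. The helper name `parse_words` and the sample bytes are assumptions made for illustration; in the real program the bytes would come from the `content` attribute of the response returned by `requests.get(url)`.

```python
def parse_words(raw: bytes) -> list[str]:
    """Decode UTF-8 bytes and split them into a list of words."""
    # str.split() with no arguments splits on any run of whitespace,
    # so it handles one-word-per-line files and ignores the trailing newline.
    return raw.decode("utf-8").split()


# Hypothetical stand-in for the bytes of a downloaded word list
sample = b"10th\n1st\na\nabbey\ndirt\ngeeks\n"
print(parse_words(sample))
# ['10th', '1st', 'a', 'abbey', 'dirt', 'geeks']
```

Keeping the parsing separate from the download makes this step easy to try on a small sample before fetching the full dictionary.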
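Finding the ordered words
- Traverse the list of words
- Compare the ASCII values of each pair of adjacent characters in every word
- If a pair is out of order, record the word as not ordered
- Otherwise, print the ordered word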
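The pairwise comparison described in these steps can be written compactly, as in the sketch below. The name `has_ordered_letters` and the sample list are hypothetical; the article's own program uses an explicit while loop instead of `zip` and `all`.

```python
def has_ordered_letters(word: str) -> bool:
    """Return True if no adjacent pair of characters is out of order."""
    # zip(word, word[1:]) yields each pair of neighbouring characters;
    # all() stops at the first pair whose code points decrease.
    return all(ord(a) <= ord(b) for a, b in zip(word, word[1:]))


words = ["abbey", "dirt", "geeks", "abort", "zebra"]
# Skip 1- and 2-letter strings, as the full program does
ordered = [w for w in words if len(w) >= 3 and has_ordered_letters(w)]
print(ordered)
# ['abbey', 'dirt', 'abort']
```

Because `all` short-circuits, this stops scanning a word at the first bad pair, which matches the early `break` in the loop-based version.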
# Python program to find ordered words
import requests


# Scrapes the words from the URL below and stores
# them in a list
def getWords():
    # contains about 25,000 words
    url = "http://www.puzzlers.org/pub/wordlists/unixdict.txt"
    fetchData = requests.get(url)

    # extracts the content of the webpage as bytes
    wordList = fetchData.content

    # decodes the UTF-8 encoded text and splits the
    # string to turn it into a list of words
    wordList = wordList.decode("utf-8").split()
    return wordList


# prints every ordered word in the scraped word list
def isOrdered():
    # fetching the wordList
    collection = getWords()

    # since the first few elements of the
    # dictionary are numbers, getting rid of those
    # numbers by slicing off the first 16 elements
    collection = collection[16:]

    for word in collection:
        result = 'Word is ordered'
        i = 0
        l = len(word) - 1

        # skips the 1 and 2 lettered strings
        if len(word) < 3:
            continue

        # traverses through all characters of the word in pairs
        while i < l:
            if ord(word[i]) > ord(word[i+1]):
                result = 'Word is not ordered'
                break
            else:
                i += 1

        # only printing the ordered words
        if result == 'Word is ordered':
            print(f"{word}: {result}")


# execute isOrdered() function
if __name__ == '__main__':
    isOrdered()
Output:
aau: Word is ordered
abbe: Word is ordered
abbey: Word is ordered
abbot: Word is ordered
abbott: Word is ordered
abc: Word is ordered
abe: Word is ordered
abel: Word is ordered
abet: Word is ordered
abo: Word is ordered
abort: Word is ordered
accent: Word is ordered
accept: Word is ordered
...........................
...........................
...........................
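The program above removes the leading numeric entries with a fixed slice, which depends on the exact order of the file. As an alternative sketch (not the article's method), the entries could be filtered by content instead. The helper name `keep_candidate_words` and its `min_length` parameter are hypothetical.

```python
def keep_candidate_words(words: list[str], min_length: int = 3) -> list[str]:
    """Keep entries that are purely alphabetic and at least `min_length` long."""
    # str.isalpha() rejects entries such as "10th" or "a&m" wherever they
    # appear in the list, so no hard-coded index is needed.
    return [w for w in words if len(w) >= min_length and w.isalpha()]


sample = ["10th", "1st", "a", "a&m", "a's", "aaa", "abbey", "dirt"]
print(keep_candidate_words(sample))
# ['aaa', 'abbey', 'dirt']
```

This filter also drops entries containing digits or punctuation later in the file, so its results may differ slightly from the listing above.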
Reference: Rosetta Code