Scraping and Finding Ordered Words in a Dictionary Using Python

Last modified: 2022-05-13 01:54:52.442000 | Author: Mango


What is an ordered word?

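An ordered word is a word whose letters appear in alphabetical order, for example abbey and dirt. All other words are unordered, for example geeks.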
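The definition can be checked directly on the examples just given. The following is only a minimal sketch: the function name is_ordered is chosen here for illustration, and it assumes lower-case input so that character comparison matches alphabetical order.

```python
def is_ordered(word):
    """Return True if every letter is <= the letter that follows it."""
    # zip(word, word[1:]) yields each pair of adjacent characters
    return all(a <= b for a, b in zip(word, word[1:]))


for sample in ("abbey", "dirt", "geeks"):
    print(sample, is_ordered(sample))
```

Repeated letters, such as the double b in abbey, count as ordered because the comparison uses <= rather than <.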

The task at hand

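This task is taken from Rosetta Code, and it is not quite as trivial as the description above suggests. To get a large set of words, we will use the online dictionary available at http://www.puzzlers.org/pub/wordlists/unixdict.txt, which contains roughly 25,000 words. Since we are using Python, we can scrape the dictionary directly instead of downloading it as a text file and then performing file-handling operations on it.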

Requirements:

pip install requests

Code

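The approach traverses each word and compares the ASCII values of adjacent characters pair by pair. As soon as one pair is out of order, the word is unordered; if no such pair is found, the word is ordered.
The task therefore splits into two parts:
Scraping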

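  1. Using the Python library requests, fetch the data from the given URL
  2. Store the content fetched from the URL (returned as bytes)
  3. Decode the content, which is usually encoded as UTF-8 on the web
  4. Convert the long decoded string into a list of words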
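Steps 3 and 4 can be tried without any network access. The sketch below uses a made-up byte string as a stand-in for the downloaded content; raw_content and its sample words are assumptions for illustration, not data from the real word list.

```python
# Stand-in for the bytes a web request would return (made-up sample data).
raw_content = b"10th\n1st\nabbey\ndirt\ngeeks\n"

# Decode the bytes to text, then split on whitespace to get one entry per word.
word_list = raw_content.decode("utf-8").split()

print(word_list)
```

Calling split() with no arguments splits on any run of whitespace, so it handles both Unix and Windows line endings and leaves no empty strings in the list.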

Finding ordered words

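  1. Traverse the list of words
  2. Compare the ASCII values of each pair of adjacent characters in every word
  3. If a pair is out of order, record a negative result
  4. Otherwise, print the ordered word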
# Python program to find ordered words
import requests
  
# Scrapes the words from the URL below and stores 
# them in a list
def getWords():
  
    # unixdict.txt contains roughly 25,000 words
    url = "http://www.puzzlers.org/pub/wordlists/unixdict.txt"
    # a timeout keeps the request from hanging indefinitely
    fetchData = requests.get(url, timeout=10)
  
    # extracts the content of the webpage
    wordList = fetchData.content
  
    # decodes the UTF-8 encoded text and splits the 
    # string to turn it into a list of words
    wordList = wordList.decode("utf-8").split()
  
    return wordList
  
  
# function to determine whether a word is ordered or not
def isOrdered():
  
    # fetching the wordList
    collection = getWords()
  
    # since the first few elements of the dictionary
    # are numbers, get rid of them by slicing off
    # the first 16 elements
    collection = collection[16:]
  
    for word in collection:
        result = 'Word is ordered'
        i = 0
        l = len(word) - 1
  
        if (len(word) < 3): # skips the 1 and 2 lettered strings
            continue
  
        # traverses through all characters of the word in pairs
        while i < l:         
            if (ord(word[i]) > ord(word[i+1])):
                result = 'Word is not ordered'
                break
            else:
                i += 1
  
        # only printing the ordered words
        if (result == 'Word is ordered'):
            print(f"{word}: {result}")
  
  
# execute isOrdered() function
if __name__ == '__main__':
    isOrdered()
Output:
aau: Word is ordered
abbe: Word is ordered
abbey: Word is ordered
abbot: Word is ordered
abbott: Word is ordered
abc: Word is ordered
abe: Word is ordered
abel: Word is ordered
abet: Word is ordered
abo: Word is ordered
abort: Word is ordered
accent: Word is ordered
accept: Word is ordered
...........................
...........................
...........................
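The program above fetches, filters, and prints inside one function, which makes it hard to test without a network connection. One possible variation, sketched here under stated assumptions, keeps the check in a pure function that accepts any list of words. The names is_ordered and ordered_words and the min_length parameter are hypothetical, and the str.isalpha() filter is a swapped-in technique that replaces the hard-coded slice, so its results can differ from the output shown above (for example, it keeps aaa).

```python
def is_ordered(word):
    """Return True if the characters of word never decrease."""
    return all(a <= b for a, b in zip(word, word[1:]))


def ordered_words(words, min_length=3):
    """Return the purely alphabetic, ordered words of at least min_length letters."""
    return [
        w for w in words
        if len(w) >= min_length and w.isalpha() and is_ordered(w)
    ]


# Made-up sample list standing in for the scraped dictionary.
sample = ["10th", "a&m", "aaa", "abbey", "geeks", "abort", "at"]
print(ordered_words(sample))
```

Because ordered_words takes a plain list, the scraped word list can be passed to it unchanged, while small hand-written lists are enough for testing.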

Reference: Rosetta Code