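Detecting Delimiters in Text with detect_delimiter in Python
When processing large amounts of text, we sometimes need to work out which character is acting as the delimiter. A utility that determines this automatically is useful when the data is too large to inspect by hand. This article describes how to solve the problem with the Python library detect_delimiter.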
Installation
To install the module, run the following command in a terminal.
pip install detect_delimiter
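The function first checks whether any whitelisted characters occur in the input text. If some do, it counts how often each one appears and returns the most frequent, ignoring any character that is in the blacklist if one is provided. If no whitelisted character is found, it counts the remaining characters that are not blacklisted and returns the most frequent of them as the delimiter. If that also fails to yield a delimiter, it returns the default value if one was provided, and None otherwise.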
Syntax: detect(text:str, default=None, whitelist=[',', ';', ':', '|', '\t'], blacklist=None)
text : The input string to test for a delimiter.
default : The value to return when no valid delimiter is found.
whitelist : The first set of characters to be checked. If any of them occur in the text, they are treated as delimiter candidates. Useful when the set of possible delimiters is known in advance. Defaults to [',', ';', ':', '|', '\t'].
blacklist : Characters that must never be reported as delimiters. By default, all digits, letters, and the full stop are blacklisted. The comments in Example 2 indicate that a list supplied here overrides that default.
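The selection order described above can be sketched in plain Python. This is only an illustration of the prose description, not the library's actual source: the function name sketch_detect, the two DEFAULT_ constants, and the tie-breaking behaviour are assumptions made for this example.

```python
from collections import Counter
from string import ascii_letters, digits

# Assumed defaults, taken from the description above (not from the library source).
DEFAULT_WHITELIST = [",", ";", ":", "|", "\t"]
DEFAULT_BLACKLIST = set(ascii_letters) | set(digits) | {"."}


def sketch_detect(text, default=None, whitelist=None, blacklist=None):
    """Hypothetical re-implementation of the selection order described above."""
    whitelist = DEFAULT_WHITELIST if whitelist is None else whitelist
    # A supplied blacklist replaces the default one in this sketch.
    blacklist = DEFAULT_BLACKLIST if blacklist is None else set(blacklist)

    counts = Counter(text)

    # Step 1: prefer whitelisted characters that occur in the text
    # and are not blacklisted; return the most frequent of them.
    candidates = [c for c in whitelist if counts[c] > 0 and c not in blacklist]
    if candidates:
        return max(candidates, key=lambda c: counts[c])

    # Step 2: otherwise return the most frequent non-blacklisted character.
    remaining = {c: n for c, n in counts.items() if c not in blacklist}
    if remaining:
        return max(remaining, key=remaining.get)

    # Step 3: nothing suitable was found, so fall back to the default.
    return default


print(sketch_detect("a,b;c,d"))             # ','
print(sketch_detect("a.b.c"))               # None
print(sketch_detect("a.b.c", default="@"))  # '@'
```

Because the sketch only mirrors the written description, its results may differ from the real detect() in edge cases such as ties between equally frequent characters.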
Example 1: Using detect() with default
This example shows some basic delimiter detection and the use of the default value.
Python3
from detect_delimiter import detect
# Basic example: "-" is the only non-blacklisted character
print("The found delimiter [base example] : ")
print(detect("Geeksforgeeks-is-best-for-geeks"))
# No default given and no delimiter present:
# "." is blacklisted by default, so it is not treated as a delimiter
print("The found delimiter [no default] : ")
print(detect("Geeksforgeeks.is.best.for.geeks"))
# Same input with a default:
# no delimiter is found, so the default is returned
print("The found delimiter [with default] : ")
print(detect("Geeksforgeeks.is.best.for.geeks", default='@'))
Output:
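Example 2: Using the blacklist and whitelist parameters
Supplying the whitelist parameter gives priority to specific delimiters, even when they occur less often than a non-whitelisted candidate. The blacklist parameter makes the function ignore the characters listed in it.
Python3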
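To make the frequency argument concrete, the character counts of a sample string can be inspected with the standard library alone. The snippet below is a minimal sketch that does not call detect_delimiter; the variable names are chosen for this illustration only.

```python
from collections import Counter

text = "Geeksforgeeks-is-best-for!geeks"
counts = Counter(text)

# Keep only symbol characters (neither letters nor digits).
symbol_counts = {ch: n for ch, n in counts.items() if not ch.isalnum()}
print(symbol_counts)  # {'-': 3, '!': 1}
```

Here "-" occurs three times and "!" only once, so a purely frequency-based choice would favour "-". Listing "!" in the whitelist is what lets the less frequent character take priority.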
from detect_delimiter import detect
# No whitelist given, so the default whitelist applies:
# [',', ';', ':', '|', '\t'], of which only "," occurs in the text
print("The found delimiter [default whitelist] : ")
print(detect("Geeksforgeeks$is-best,for-geeks"))
# Custom whitelist: "!" is prioritized over the more frequent "-"
print("The found delimiter [provided whitelist] : ")
print(detect("Geeksforgeeks-is-best-for!geeks",
             whitelist=['@', "!"]))
# Custom blacklist: overrides the default blacklist
print("The found delimiter [provided blacklist] : ")
print(detect("Geeksforgeeks-is-best-for!geeks",
             blacklist=['@', "-", 'e']))
Output: