📅  Last modified: 2023-12-03 15:14:14.109000             🧑  Author: Mango
Are you tired of manually matching strings and getting mediocre results? Try FuzzyWuzzy, a fuzzy string matching library for Python that can match strings despite typos and spelling variations.
Fuzzy string matching is the process of finding strings that approximately match a pattern. It compares two strings and scores how similar they are, tolerating differences such as typos, spelling variations, word order, and spacing.
FuzzyWuzzy is an open-source Python package that provides functions for fuzzy string matching. It scores similarity using Levenshtein (edit) distance, which counts the single-character insertions, deletions, and substitutions needed to turn one string into another. It also offers ways to improve matching accuracy, such as tokenization, token sorting, and token-set comparison.
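To make the idea of edit distance concrete, here is a minimal pure-Python sketch of the classic dynamic-programming approach. The function name `levenshtein` is chosen for illustration only. It is not part of the library's API, and the library's own implementation may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # previous[j] is the distance between the prefix of a processed so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A smaller distance means the strings are closer. A similarity score then rescales this kind of measurement so that longer strings are not penalized simply for being long.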
You can install FuzzyWuzzy using Conda, a popular package manager for Python. The package is available from the conda-forge channel under the name fuzzywuzzy:
conda install -c conda-forge fuzzywuzzy
FuzzyWuzzy provides several functions for string matching, each suited to a different kind of variation between strings.
Here are a few examples:
The fuzz.ratio() function uses the Levenshtein distance to calculate the edit distance between two strings and returns a similarity score in the range of 0 to 100, where 100 means the strings are identical. The comparison is case-sensitive.
from fuzzywuzzy import fuzz
str1 = "Hello World"
str2 = "Helo wrld"
ratio = fuzz.ratio(str1, str2)
print(ratio) # Output: 80
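For intuition about where such a score comes from, the sketch below computes a 0 to 100 similarity with the standard library's difflib.SequenceMatcher, which, as far as I know, is what fuzzywuzzy falls back to when the optional python-Levenshtein package is not installed. The score is 2 * M / T, where M is the number of matching characters and T is the combined length of both strings. The helper name `simple_ratio` is hypothetical and not part of the library.

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Similarity from 0 to 100, computed as 2 * M / T and rounded."""
    matcher = SequenceMatcher(None, s1, s2)
    return int(round(100 * matcher.ratio()))


print(simple_ratio("Hello World", "Hello World"))  # 100
print(simple_ratio("abc", "xyz"))                  # 0
```

Because the comparison works on raw characters, "W" and "w" count as different, so lowercasing both inputs first is a common way to get a higher score for strings that differ only in case.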
The fuzz.token_sort_ratio() function sorts the tokens in each string alphabetically before comparing them, providing a better matching score for strings that contain the same words in a different order.
from fuzzywuzzy import fuzz
str1 = "Apple Banana Cherry"
str2 = "Cherry Banana Apple"
ratio = fuzz.token_sort_ratio(str1, str2)
print(ratio) # Output: 100
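The token-sorting step can be sketched in a few lines. This is a simplified illustration rather than the library's code: the helper names `sort_tokens` and `token_sort_sketch` are made up here, and the preprocessing is reduced to lowercasing and splitting on whitespace.

```python
from difflib import SequenceMatcher


def sort_tokens(text: str) -> str:
    """Lowercase the text, split it on whitespace, and rejoin the sorted tokens."""
    return " ".join(sorted(text.lower().split()))


def token_sort_sketch(s1: str, s2: str) -> int:
    """Compare the two strings after sorting their tokens alphabetically."""
    matcher = SequenceMatcher(None, sort_tokens(s1), sort_tokens(s2))
    return int(round(100 * matcher.ratio()))


print(sort_tokens("Cherry Banana Apple"))                   # apple banana cherry
print(token_sort_sketch("New York Mets", "mets new york"))  # 100
```

Once both strings are normalized to the same token order, a plain character-level comparison no longer sees the reordering as a difference.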
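The fuzz.token_set_ratio() function splits each string into a set of tokens, then compares the tokens the two strings share against each string's remaining tokens and keeps the best score. This gives a better matching score for strings that share the same words even when one of them contains extra or repeated words.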
from fuzzywuzzy import fuzz
str1 = "Apple Banana Cherry"
str2 = "Banana Cherry Apple Pear"
ratio = fuzz.token_set_ratio(str1, str2)
print(ratio) # Output: 100
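A rough sketch of the token-set idea follows, assuming simplified preprocessing (lowercasing and whitespace splitting only). The names `_ratio` and `token_set_sketch` are hypothetical. The approach builds three strings from the shared and unshared tokens and keeps the best pairwise score, which is my understanding of how the library approaches it.

```python
from difflib import SequenceMatcher


def _ratio(s1: str, s2: str) -> int:
    """Character-level similarity from 0 to 100."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def token_set_sketch(s1: str, s2: str) -> int:
    tokens1 = set(s1.lower().split())
    tokens2 = set(s2.lower().split())

    # Sorted strings for the shared tokens and for each side's leftovers.
    shared = " ".join(sorted(tokens1 & tokens2))
    only1 = " ".join(sorted(tokens1 - tokens2))
    only2 = " ".join(sorted(tokens2 - tokens1))

    # Shared tokens followed by each string's remaining tokens.
    combined1 = (shared + " " + only1).strip()
    combined2 = (shared + " " + only2).strip()

    # Keep the best of the three pairwise comparisons.
    return max(
        _ratio(shared, combined1),
        _ratio(shared, combined2),
        _ratio(combined1, combined2),
    )


print(token_set_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))  # 100
```

When every token of one string also appears in the other, the shared string equals one of the combined strings, so the best score is 100 regardless of the extra words on the other side.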
FuzzyWuzzy is a useful tool for any programmer who deals with string matching. Its functions cover a range of use cases, from simple typo tolerance to comparisons that ignore word order or extra words. So, if you want to save time and get better string matching results, install it with Conda and give it a try!