Removing Punctuation in Python

Last modified: 2023-12-03 15:19:07.900000 | Author: Mango

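In natural language processing (NLP), we often need to strip punctuation from plain-text data. In Python this can be done with built-in string methods or with regular expressions.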

Removing punctuation with string methods

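Python's built-in string module provides a string constant called punctuation, which contains the 32 ASCII punctuation characters (!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~). We can build a translation table with str.maketrans() and apply it with the translate() method to delete those characters. Non-ASCII punctuation, such as full-width marks or curly quotes, is not part of string.punctuation and is left untouched.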

import string

text = "This is a sentence. It contains punctuation!"
# The third argument lists the characters to delete.
translator = str.maketrans("", "", string.punctuation)
text_without_punct = text.translate(translator)

print(text_without_punct)

This prints:

This is a sentence It contains punctuation
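For text that contains non-ASCII punctuation, one possible approach is to test each character's Unicode category instead of relying on string.punctuation. The sketch below is an illustration rather than part of the original article; the helper name remove_unicode_punct is hypothetical. It assumes that "punctuation" means any character whose Unicode general category starts with "P", which excludes symbols such as $ or + that string.punctuation does include.

```python
import unicodedata


def remove_unicode_punct(text: str) -> str:
    """Drop every character whose Unicode category is a punctuation class (P*)."""
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("P")
    )


print(remove_unicode_punct("Hello, world!"))   # Hello world
print(remove_unicode_punct("你好，世界！"))      # 你好世界
print(remove_unicode_punct("Price: $5"))       # Price $5 ($ is a symbol, not punctuation)
```

This is slower than translate() on large inputs because it inspects characters one at a time in Python, so it is mainly worth considering when the text is not limited to ASCII.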

Removing punctuation with regular expressions

We can also use Python's regular expression module (re) to remove punctuation. Here is an example.

import re

text = "This is a sentence. It contains punctuation!"
# Remove every character that is neither a word character nor whitespace.
text_without_punct = re.sub(r'[^\w\s]', '', text)

print(text_without_punct)

This prints:

This is a sentence It contains punctuation

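In this example, re.sub() replaces every character that is neither a word character (\w) nor whitespace (\s) with an empty string. Note that \w includes the underscore, so underscores are kept. For str patterns \w is also Unicode-aware, so accented letters and CJK characters are kept as well.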
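The two approaches do not always give the same result. The following sketch, using a made-up sample string, shows the difference on underscores, and one way to make the regular expression drop underscores too by adding an alternative to the pattern.

```python
import re
import string

sample = "snake_case, naïve café!"

# translate() deletes every character in string.punctuation, including "_".
via_translate = sample.translate(str.maketrans("", "", string.punctuation))

# [^\w\s] keeps "_" because the underscore counts as a word character.
via_regex = re.sub(r"[^\w\s]", "", sample)

# Adding "|_" to the pattern removes underscores as well.
via_regex_no_underscore = re.sub(r"[^\w\s]|_", "", sample)

print(via_translate)            # snakecase naïve café
print(via_regex)                # snake_case naïve café
print(via_regex_no_underscore)  # snakecase naïve café
```

Which behavior is preferable depends on the data: identifiers and hashtags often rely on underscores, while ordinary prose usually does not.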

Whether you use string methods or regular expressions, removing punctuation from text is straightforward.