
Last modified: 2023-12-03 14:46:03.535000    Author: Mango

Python Regex Email Address without JPG

When it comes to processing text data, one of the most common tasks is extracting email addresses from a block of text. Email addresses follow a defined format, but matching that format exactly with a regular expression is hard (the full specification, RFC 5322, allows forms such as quoted local parts), so practical patterns use a simplified approximation. In this tutorial, we'll learn how to extract email addresses from a string of text while rejecting any match that ends in ".jpg".

Importing the Required Libraries

For this task, we'll use Python's built-in re module. It is part of the standard library, so nothing needs to be installed, and it provides functions such as re.findall() and re.search() for finding patterns in text.

import re

Defining the Regular Expression

To extract email addresses from a string of text, we need a regular expression that matches the pattern of an email address. Here's the expression:

email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b(?<!\.jpg)'

Let's break down the above expression:

  • \b matches a word boundary (the position between a word character and a non-word character, or the start or end of the string).
  • [A-Za-z0-9._%+-]+ matches the local part: one or more letters, digits, or any of the characters ._%+-.
  • @ matches the "@" symbol.
  • [A-Za-z0-9.-]+ matches the domain name: one or more letters, digits, dots, or hyphens.
  • \. matches a literal "." character.
  • [A-Za-z]{2,} matches the top-level domain: two or more uppercase or lowercase letters.
  • \b again matches a word boundary.
  • (?<!\.jpg) is a negative lookbehind assertion: the match succeeds only if the four characters just before the current position are not ".jpg", so any candidate ending in ".jpg" is rejected. The check is case-sensitive, so ".JPG" is not rejected unless the re.IGNORECASE flag is used.
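To see the lookbehind at work, the sketch below runs the pattern with and without (?<!\.jpg) on a short made-up string. The address you@example.com and the image-style file name logo@2x.jpg are hypothetical inputs chosen for illustration; without the lookbehind, the file name is picked up as if it were an address. The last two calls show the case-sensitivity point from the list above.

```python
import re

# The tutorial's pattern. The top-level-domain class is written [A-Za-z];
# note that [A-Z|a-z] would also accept a literal "|" character.
with_lookbehind = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b(?<!\.jpg)'
# The same pattern without the negative lookbehind, for comparison.
without_lookbehind = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

# Hypothetical sample: one address and one image file name that looks like one.
sample = 'Contact you@example.com or see logo@2x.jpg'

print(re.findall(without_lookbehind, sample))  # ['you@example.com', 'logo@2x.jpg']
print(re.findall(with_lookbehind, sample))     # ['you@example.com']

# The lookbehind is case-sensitive: '.JPG' slips through unless IGNORECASE is set.
print(re.findall(with_lookbehind, 'logo@2x.JPG'))                     # ['logo@2x.JPG']
print(re.findall(with_lookbehind, 'logo@2x.JPG', flags=re.IGNORECASE))  # []
```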
Extracting Email Addresses

Once the regular expression is defined, we can pass it to re.findall(), which returns every non-overlapping match in the string as a list. Because the pattern contains no capturing groups, each list item is the full matched address.

Here's an example:

text = 'my email address is me@example.com and my colleague\'s email address is you@example.com. However, we don\'t want to receive emails with .jpg attachment.'

emails = re.findall(email_pattern, text)
print(emails)

The output will be:

['me@example.com', 'you@example.com']

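As you can see, both addresses are returned. Note that the lookbehind only rejects matches that end in ".jpg" (for example, an image file name such as logo@2x.jpg). This sample text contains no such candidate, and ".jpg" appearing elsewhere in an address, as in me.jpg@example.com, is not rejected.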
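One subtlety of the lookbehind: for a string such as me@example.com.jpg, the match is not skipped. The regex engine backtracks, finds the shorter domain example.com, and returns me@example.com. If you would rather drop such a candidate entirely, one alternative (not the approach used above) is to match with the plain pattern and filter the results in Python. The sketch below assumes that any candidate ending in .jpg, in any letter case, should be dropped; extract_emails and its excluded_suffix parameter are hypothetical names.

```python
import re

# The tutorial's pattern, including the negative lookbehind.
LOOKBEHIND_RE = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b(?<!\.jpg)')
# Same pattern without the lookbehind; the .jpg check is done in Python instead.
EMAIL_RE = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')


def extract_emails(text, excluded_suffix='.jpg'):
    """Hypothetical helper: return matches of EMAIL_RE that do not end with
    excluded_suffix. The comparison ignores letter case, so '.JPG' is skipped too."""
    suffix = excluded_suffix.lower()
    return [m for m in EMAIL_RE.findall(text) if not m.lower().endswith(suffix)]


tricky = 'me@example.com.jpg'
print(LOOKBEHIND_RE.findall(tricky))  # ['me@example.com'] (backtracked to a shorter match)
print(extract_emails(tricky))         # [] (whole candidate skipped)

text = 'Write to me@example.com or you@example.com; ignore logo@2x.JPG.'
print(extract_emails(text))  # ['me@example.com', 'you@example.com']
```

Which behavior is preferable depends on the input: the lookbehind keeps everything in a single pattern, while the filter makes the exclusion rule explicit and easy to change.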

Conclusion

In this tutorial, we learned how to extract email addresses from a string of text using regular expressions in Python, and how to use a negative lookbehind to reject matches that end in ".jpg". Regular expressions are a powerful tool for processing text data and can be used for many other text manipulation tasks as well.