Python Regular Expression (RegEX)

Python Regular Expressions, or RegEX, are handy tools for pattern matching and manipulating strings. Python regex offers a fast method for searching, matching, and manipulating text based on predefined patterns. Python regex may significantly improve your programming abilities, whether you're validating user input, parsing data, or extracting information from massive text files. This article will help you become proficient with Python regex by introducing you to its fundamentals, explaining its operation, and offering real-world applications. By the conclusion, you'll have the skills necessary to use regex in various practical applications, improving the effectiveness and efficiency of your coding.


Regex Module in Python

A collection of functions for working with regular expressions can be found in Python's 're' module. It enables you to search, match, and work with text using particular patterns. The following are some of the main ideas and features of the 're' module:

1. Importing the Module

Before using regex functions, you need to import the ‘re’ module:

import re

2. Basic Functions

search()

Searches a string for a match and returns a match object if found.

match = re.search(r'\d+', 'There are 123 apples')

print(match.group())  # Output: 123

match()

Checks if the beginning of a string matches the pattern.

match = re.match(r'Hello', 'Hello, world!')

print(match.group())  # Output: Hello

findall()

Finds all matches of a pattern in a string and returns a list of matches.

matches = re.findall(r'\d+', '123 apples and 456 oranges')

print(matches)  # Output: ['123', '456']

sub()

Replaces matches of a pattern with a specified string.

result = re.sub(r'apples', 'bananas', 'I like apples')

print(result)  # Output: I like bananas


3. Special Characters

  • . (Dot): Matches any character except a newline.
  • ^ (Caret): Matches the start of the string.
  • $ (Dollar Sign): Matches the end of the string.
  • [] (Square Brackets): Matches any one of the characters inside the brackets.
  • \ (Backslash): Escapes special characters or signals a particular sequence.
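
As a rough illustration of the special characters listed above, the sketch below applies each one to a single sample string. The string "cat.txt" and the variable names are made up for this example.

```python
import re

text = "cat.txt"

# '.' is a wildcard: 'c.t' matches 'cat'
wildcard = re.findall(r'c.t', text)

# '^' anchors the match to the start of the string
starts_with_cat = bool(re.search(r'^cat', text))

# '$' anchors the match to the end of the string
ends_with_txt = bool(re.search(r'txt$', text))

# '[]' matches any one character from the set
vowels = re.findall(r'[aeiou]', text)

# '\.' escapes the dot so that it matches only a literal '.'
literal_dots = re.findall(r'\.', text)

print(wildcard, starts_with_cat, ends_with_txt, vowels, literal_dots)
# Output: ['cat'] True True ['a'] ['.']
```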

4. Special Sequences

  • \d: Matches any digit.
  • \D: Matches any non-digit character.
  • \s: Matches any whitespace character.
  • \S: Matches any non-whitespace character.
  • \w: Matches any word character (a letter, a digit, or an underscore).
  • \W: Matches any character that is not a word character.
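
The special sequences above can be tried out on one string. This is a minimal sketch; the sample text and variable names are invented for illustration.

```python
import re

text = "Order_42 shipped on 2024-05-01"

digits = re.findall(r'\d+', text)      # runs of digits
words = re.findall(r'\w+', text)       # runs of letters, digits, and underscores
gaps = re.findall(r'\s', text)         # individual whitespace characters
non_words = re.findall(r'\W', text)    # every character that \w does not match

print(digits)     # Output: ['42', '2024', '05', '01']
print(words)      # Output: ['Order_42', 'shipped', 'on', '2024', '05', '01']
print(gaps)       # Output: [' ', ' ', ' ']
print(non_words)  # Output: [' ', ' ', ' ', '-', '-']
```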

5. Quantifiers

  • *: Matches 0 or more repetitions of the preceding pattern.
  • +: Matches 1 or more repetitions of the preceding pattern.
  • ?: Matches 0 or 1 repetition of the preceding pattern.
  • {n}: Matches exactly n repetitions of the preceding pattern.
  • {n,}: Matches n or more repetitions of the preceding pattern.
  • {n,m}: Matches between n and m repetitions of the preceding pattern.
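
The quantifiers above can be compared side by side. In this sketch the sample strings are made up, and the word-boundary sequence ‘\b’ is added around each pattern so that only whole runs of the letter are counted.

```python
import re

text = "a aa aaa aaaa"

exactly_two = re.findall(r'\ba{2}\b', text)      # runs of exactly two 'a'
two_or_more = re.findall(r'\ba{2,}\b', text)     # runs of two or more 'a'
two_to_three = re.findall(r'\ba{2,3}\b', text)   # runs of two or three 'a'
optional_u = re.findall(r'colou?r', "color colour")  # 'u' may appear 0 or 1 times

print(exactly_two)   # Output: ['aa']
print(two_or_more)   # Output: ['aa', 'aaa', 'aaaa']
print(two_to_three)  # Output: ['aa', 'aaa']
print(optional_u)    # Output: ['color', 'colour']
```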


6. Compiling Patterns

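When a pattern is used many times, compile it once with ‘re.compile()’ and reuse the resulting pattern object. The ‘re’ module also caches recently used patterns internally, so the speed gain is often modest, but a compiled pattern avoids repeated lookups and keeps the code tidy.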

pattern = re.compile(r'\d+')

matches = pattern.findall('123 apples and 456 oranges')

print(matches)  # Output: ['123', '456']

Given its versatility and strength, Python's 're' module is a must-have tool for any programmer working with text processing and pattern matching. Once you grasp these principles and functions, you can handle a wide range of text manipulation tasks effectively.


How to Use RegEx in Python?

To search, match, and edit strings in Python, import the 're' module and use its functions to create regular expressions (RegEx). Instructions and examples for using RegEx in Python are provided below.

1. Importing the ‘re’ Module

import re

2. Using ‘search()’ Function

The ‘search()’ function searches the string for a match and returns a match object if found.

import re

text = "The price is 123 dollars"

match = re.search(r'\d+', text)

if match:

    print("Found a match:", match.group())  # Output: Found a match: 123

3. Using ‘match()’ Function

The ‘match()’ function checks if the beginning of the string matches the pattern.

import re

text = "Hello, world!"

match = re.match(r'Hello', text)

if match:

    print("Found a match:", match.group())  # Output: Found a match: Hello

4. Using ‘findall()’ Function

The ‘findall()’ function finds all matches of a pattern in a string and returns them as a list.

import re

text = "123 apples and 456 oranges"

matches = re.findall(r'\d+', text)

print("All matches:", matches)  # Output: All matches: ['123', '456']


5. Using ‘sub()’ Function

The ‘sub()’ function replaces matches of a pattern with a specified string.

import re

text = "I like apples"

result = re.sub(r'apples', 'bananas', text)

print("Replaced text:", result)  # Output: Replaced text: I like bananas

6. Using ‘compile()’ Function

The ‘compile()’ function compiles a regular expression pattern into a regex object for reuse.

import re

pattern = re.compile(r'\d+')

text = "123 apples and 456 oranges"

matches = pattern.findall(text)

print("All matches:", matches)  # Output: All matches: ['123', '456']

Example

Here's a practical example of using RegEx to extract email addresses from a text.

import re

text = """

Contact us at support@example.com for more information.

You can also reach out to sales@example.com or marketing@example.net.

"""

# Define the regex pattern for email addresses

email_pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

# Use findall() to extract all email addresses

email_addresses = re.findall(email_pattern, text)

# Print the extracted email addresses

print("Extracted email addresses:", email_addresses)

# Output: Extracted email addresses: ['support@example.com', 'sales@example.com', 'marketing@example.net']

By importing the ‘re’ module and using functions like ‘search()’, ‘match()’, ‘findall()’, and ‘sub()’, you can efficiently perform pattern matching and text manipulation in Python. Compiling regex patterns using ‘re.compile()’ can improve performance for repeated use. Understanding and utilizing these functions can significantly enhance your text-processing capabilities.


RegEx Functions

Each entry below lists a RegEx function, an explanation, and an example.

findall()

  • Returns a list of all matches of a pattern in a string.
  • Useful for extracting all occurrences of a pattern from text.

```python
import re

text = "123 apples, 456 oranges, and 789 bananas"
matches = re.findall(r'\d+', text)
print(matches)  # Output: ['123', '456', '789']
```

search()

  • Searches the string for a match to the pattern and returns a match object if found.
  • Useful for checking if a pattern exists within a string.

```python
import re

text = "Hello, world!"
match = re.search(r'world', text)
if match:
    print("Found:", match.group())  # Output: Found: world
```

split()

  • Splits the string by the pattern occurrences and returns a list of substrings.
  • Useful for breaking a string into parts based on a pattern.

```python
import re

text = "one, two, three; four"
parts = re.split(r'[,;]', text)
print(parts)  # Output: ['one', ' two', ' three', ' four']
```

sub()

  • Replaces occurrences of the pattern in the string with a specified replacement string.
  • Useful for modifying parts of a string based on a pattern.

```python
import re

text = "I like apples"
result = re.sub(r'apples', 'bananas', text)
print(result)  # Output: I like bananas
```

compile()

  • Compiles a regex pattern into a regex object for repeated use.
  • Useful for improving performance when using the same pattern multiple times.

```python
import re

pattern = re.compile(r'\d+')
text = "123 apples and 456 oranges"
matches = pattern.findall(text)
print(matches)  # Output: ['123', '456']
```

escape()

  • Escapes the characters in a string that have a special meaning in a regex (Python 3.7 and later; earlier versions escaped every non-alphanumeric character).
  • Useful for treating special characters in a string as literals.

```python
import re

text = "example.com?query=value"
escaped_text = re.escape(text)
print(escaped_text)  # Output: example\.com\?query=value
```
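
To see why escaping matters, the sketch below searches for a term that contains regex metacharacters, first as-is and then after ‘re.escape()’. The search term and sample text are hypothetical.

```python
import re

# Hypothetical user-supplied search term containing '+' and '?'
user_term = "1+1=2?"
text = "Is 1+1=2? Yes, 1+1=2."

# Unescaped, '+' and '?' act as quantifiers, so the literal text is not found
unescaped = re.findall(user_term, text)

# Escaped, every metacharacter in the term is matched literally
escaped = re.findall(re.escape(user_term), text)

print(unescaped)  # Output: []
print(escaped)    # Output: ['1+1=2?']
```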

fullmatch()

  • Checks if the entire string matches the pattern.
  • Useful for validating strings that must conform entirely to a pattern.

```python
import re

pattern = r'Hello, world!'
text = 'Hello, world!'
match = re.fullmatch(pattern, text)
if match:
    print("Exact match!")  # Output: Exact match!
```


MetaCharacters

Each entry below lists a metacharacter, an explanation, and an example.

[]

  • Used to specify a set of characters to match.
  • Matches any one of the characters inside the brackets.

```python
import re

text = "bat, cat, hat"
matches = re.findall(r'[bch]at', text)
print(matches)  # Output: ['bat', 'cat', 'hat']
```

\

  • Escapes special characters, allowing them to be treated as literals.
  • Also used for special sequences (e.g., \d for digits).

```python
import re

text = "This is a test. 123."
matches = re.findall(r'\d+', text)
print(matches)  # Output: ['123']
```

.

  • Matches any character except a newline.
  • Useful for wildcard searches.

```python
import re

text = "cat, cot, cut"
matches = re.findall(r'c.t', text)
print(matches)  # Output: ['cat', 'cot', 'cut']
```

^

  • Matches the start of the string.
  • Ensures the pattern appears at the beginning.

```python
import re

text = "Hello, world!"
match = re.search(r'^Hello', text)
if match:
    print("Found:", match.group())  # Output: Found: Hello
```

$

  • Matches the end of the string.
  • Ensures the pattern appears at the end.

```python
import re

text = "Welcome to Python"
match = re.search(r'Python$', text)
if match:
    print("Found:", match.group())  # Output: Found: Python
```

*

  • Matches 0 or more repetitions of the preceding pattern.
  • Useful for matching optional and repeated characters.

```python
import re

text = "ac, abc, abbc"
matches = re.findall(r'ab*c', text)
print(matches)  # Output: ['ac', 'abc', 'abbc']
```

+

  • Matches 1 or more repetitions of the preceding pattern.
  • Useful for matching at least one occurrence.

```python
import re

text = "ac, abc, abbc"
matches = re.findall(r'ab+c', text)
print(matches)  # Output: ['abc', 'abbc']
```

?

  • Matches 0 or 1 repetition of the preceding pattern.
  • Useful for making the preceding character optional.

```python
import re

text = "color, colour"
matches = re.findall(r'colou?r', text)
print(matches)  # Output: ['color', 'colour']
```

|

  • Acts as a logical OR, matching patterns on either side.
  • Useful for matching multiple patterns.

```python
import re

text = "cat, bat, rat"
matches = re.findall(r'cat|rat', text)
print(matches)  # Output: ['cat', 'rat']
```


Special Sequences

Each entry below lists a special sequence, an explanation, and an example.

\A

  • Matches only at the start of the string.
  • Similar to ‘^’ but stricter: with ‘re.MULTILINE’, ‘^’ also matches at the start of each line, while ‘\A’ never does.

```python
import re

text = "Hello world"
match = re.search(r'\AHello', text)
if match:
    print("Found:", match.group())  # Output: Found: Hello
```

\b

  • Matches the empty string at the beginning or end of a word.
  • Useful for word boundaries.

```python
import re

text = "Hello, world!"
matches = re.findall(r'\bworld\b', text)
print(matches)  # Output: ['world']
```

\B

  • Matches the empty string only where it is not at the beginning or end of a word.
  • Useful for non-word boundaries.

```python
import re

text = "Hello, world!"
# 'world' sits between a space and '!', so both ends are word boundaries
matches = re.findall(r'\Bworld\B', text)
print(matches)  # Output: []
```

\d

  • Matches any decimal digit.
  • Equivalent to ‘[0-9]’ with the ‘re.ASCII’ flag; for Unicode (str) patterns it also matches digits from other scripts.

```python
import re

text = "There are 123 apples"
matches = re.findall(r'\d+', text)
print(matches)  # Output: ['123']
```

\D

  • Matches any non-digit character.
  • Equivalent to ‘[^0-9]’ with the ‘re.ASCII’ flag.

```python
import re

text = "There are 123 apples"
matches = re.findall(r'\D+', text)
print(matches)  # Output: ['There are ', ' apples']
```

\s

  • Matches any whitespace character (spaces, tabs, newlines).
  • Equivalent to ‘[ \t\n\r\f\v]’ with the ‘re.ASCII’ flag; for Unicode (str) patterns it also matches other Unicode whitespace.

```python
import re

text = "Hello world!"
matches = re.findall(r'\s', text)
print(matches)  # Output: [' ']
```

\S

  • Matches any non-whitespace character.
  • Equivalent to ‘[^ \t\n\r\f\v]’ with the ‘re.ASCII’ flag.

```python
import re

text = "Hello world!"
matches = re.findall(r'\S+', text)
print(matches)  # Output: ['Hello', 'world!']
```

\w

  • Matches any word character: letters, digits, and the underscore.
  • Equivalent to ‘[a-zA-Z0-9_]’ with the ‘re.ASCII’ flag; for Unicode (str) patterns it also matches letters and digits from other scripts.

```python
import re

text = "Hello_world 123"
matches = re.findall(r'\w+', text)
print(matches)  # Output: ['Hello_world', '123']
```

\W

  • Matches any character that is not a word character (not a letter, digit, or underscore).
  • Equivalent to ‘[^a-zA-Z0-9_]’ with the ‘re.ASCII’ flag.

```python
import re

text = "Hello world!"
matches = re.findall(r'\W+', text)
print(matches)  # Output: [' ', '!']
```

\Z

  • Matches only at the end of the string.
  • Similar to ‘$’ but stricter: ‘$’ also matches just before a trailing newline (and at the end of each line with ‘re.MULTILINE’), while ‘\Z’ does not.

```python
import re

text = "Hello world"
match = re.search(r'world\Z', text)
if match:
    print("Found:", match.group())  # Output: Found: world
```
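
The difference between ‘^’ and ‘\A’, and between ‘$’ and ‘\Z’, only shows up with multi-line text or a trailing newline. The following sketch uses a made-up two-line string to illustrate it.

```python
import re

text = "first line\nsecond line\n"

# By default, ^ matches only at the very start of the string
default_caret = re.findall(r'^\w+', text)

# With re.MULTILINE, ^ also matches right after each newline
multiline_caret = re.findall(r'^\w+', text, re.MULTILINE)

# \A ignores re.MULTILINE and still matches only at the start of the string
multiline_A = re.findall(r'\A\w+', text, re.MULTILINE)

# $ can match just before a trailing newline; \Z matches only at the very end
dollar = bool(re.search(r'line$', text))
strict_end = bool(re.search(r'line\Z', text))

print(default_caret)       # Output: ['first']
print(multiline_caret)     # Output: ['first', 'second']
print(multiline_A)         # Output: ['first']
print(dollar, strict_end)  # Output: True False
```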


Sets

Each entry below lists a set, an explanation, and an example.

[arn]

  • Matches any one of the characters 'a', 'r', or 'n'.
  • Useful for matching a specific set of characters.

```python
import re

text = "apple, banana, orange"
matches = re.findall(r'[arn]', text)
print(matches)  # Output: ['a', 'a', 'n', 'a', 'n', 'a', 'r', 'a', 'n']
```

[a-n]

  • Matches any character in the range 'a' to 'n'.
  • Useful for matching a range of characters.

```python
import re

text = "apple, banana, orange"
matches = re.findall(r'[a-n]', text)
print(matches)  # Output: ['a', 'l', 'e', 'b', 'a', 'n', 'a', 'n', 'a', 'a', 'n', 'g', 'e']
```

[^arn]

  • Matches any character except 'a', 'r', or 'n'.
  • Useful for excluding specific characters.

```python
import re

text = "apple, banana, orange"
matches = re.findall(r'[^arn]', text)
print(matches)  # Output: ['p', 'p', 'l', 'e', ',', ' ', 'b', ',', ' ', 'o', 'g', 'e']
```

[0123]

  • Matches any one of the digits '0', '1', '2', or '3'.
  • Useful for matching a specific set of digits.

```python
import re

text = "1024, 123, 456"
matches = re.findall(r'[0123]', text)
print(matches)  # Output: ['1', '0', '2', '1', '2', '3']
```

[0-9]

  • Matches any digit from 0 to 9.
  • Equivalent to ‘\d’ in ASCII mode and useful for matching any digit.

```python
import re

text = "1024, 123, 456"
matches = re.findall(r'[0-9]', text)
print(matches)  # Output: ['1', '0', '2', '4', '1', '2', '3', '4', '5', '6']
```

[0-5][0-9]

  • Matches any two-digit number from 00 to 59.
  • Useful for matching ranges like minutes or seconds.

```python
import re

text = "The time is 12:45 and 08:30."
matches = re.findall(r'[0-5][0-9]', text)
print(matches)  # Output: ['12', '45', '08', '30']
```

[a-zA-Z]

  • Matches any ASCII letter, lowercase or uppercase.
  • Useful for matching letters regardless of case.

```python
import re

text = "Hello, World!"
matches = re.findall(r'[a-zA-Z]', text)
print(matches)  # Output: ['H', 'e', 'l', 'l', 'o', 'W', 'o', 'r', 'l', 'd']
```

[+]

  • Matches the literal plus sign '+'.
  • Useful for matching special characters by placing them inside square brackets.

```python
import re

text = "Use + for addition"
matches = re.findall(r'[+]', text)
print(matches)  # Output: ['+']
```
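
Sets are often combined to validate a whole value rather than to pick out single characters. The sketch below builds a 24-hour time check from the ‘[0-5][0-9]’ set shown above; the helper name ‘is_valid_time’ and the hour sub-pattern are assumptions made for this example.

```python
import re

# HH:MM in 24-hour format: hours 00-23, minutes 00-59
time_pattern = re.compile(r'([01][0-9]|2[0-3]):[0-5][0-9]')

def is_valid_time(value):
    """Return True if value is a 24-hour HH:MM time string."""
    # fullmatch() requires the entire string to fit the pattern
    return time_pattern.fullmatch(value) is not None

samples = ["08:30", "23:59", "24:00", "12:60", "7:15"]
results = {t: is_valid_time(t) for t in samples}
print(results)
# Output: {'08:30': True, '23:59': True, '24:00': False, '12:60': False, '7:15': False}
```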


Conclusion

Python Regular Expressions (RegEX) are a powerful and adaptable tool for pattern matching and string manipulation, useful for everything from text processing to data validation. Becoming proficient with the 're' module and its functions, including 'match()', 'sub()', 'findall()', and 'search()', lets you handle intricate text-processing tasks effectively. Python regex offers the versatility needed for tasks like text replacement, information extraction, and pattern recognition. As you put regex into practice in Python, you'll find that it helps you work with textual data more efficiently. Enrolling in Python Training with Simplilearn can further enhance your understanding and mastery of these skills.

FAQs

1. How to Check if a String Matches a Regex in Python?

To check if a string matches a regex pattern in Python, you can use the ‘re’ module's ‘match()’, ‘search()’, or ‘fullmatch()’ functions. The ‘re.match()’ function checks if the beginning of the string matches the regex pattern, while ‘re.search()’ scans the entire string for any part that matches the pattern. The ‘re.fullmatch()’ function ensures that the entire string matches the regex pattern. Each function returns a match object if a match is found or None if there is no match. For example, ‘re.search(r'\d+', 'Hello 123')’ would find a match for the digits in the string, confirming the presence of the pattern.
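
A short sketch of how the three functions differ for the same pattern follows; the variable names are made up for illustration.

```python
import re

pattern = r'\d+'
text = "Hello 123"

at_start = re.match(pattern, text)          # None: the string does not start with digits
anywhere = re.search(pattern, text)         # match object: digits found later in the string
whole = re.fullmatch(pattern, text)         # None: the whole string is not digits
whole_digits = re.fullmatch(pattern, "123") # match object: the whole string is digits

print(at_start)              # Output: None
print(anywhere.group())      # Output: 123
print(whole)                 # Output: None
print(whole_digits.group())  # Output: 123
```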

2. How to Search a Phrase in Regex Python?

To search for a phrase in a string using regex in Python, you can use the ‘re.search()’ function from the ‘re’ module. First, import the ‘re’ module, then define your regex pattern, which can include the exact phrase you want to search for. The ‘re.search()’ function scans the entire string for a match to the pattern and returns a match object if it finds the phrase; otherwise, it returns ‘None’. For instance, to search for the phrase "hello world" in a text, you can use ‘re.search(r'hello world', text)’. If the phrase is found, you can access the matched text using ‘match.group()’. This method is effective for locating specific phrases within larger texts.
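
As a minimal sketch of the phrase search described above, the example below uses a made-up sentence. The second search adds the ‘re.IGNORECASE’ flag, which is not mentioned above, to show how to ignore letter case.

```python
import re

text = "She typed Hello World and then hello world again."

# Case-sensitive search: only the lowercase phrase matches
match = re.search(r'hello world', text)
if match:
    print("Found:", match.group(), "at index", match.start())
    # Output: Found: hello world at index 31

# re.IGNORECASE makes the same pattern match regardless of letter case
first_any_case = re.search(r'hello world', text, re.IGNORECASE)
print(first_any_case.group())  # Output: Hello World
```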

3. How to Replace Something in a Text File With Regex Python?

To replace text in a file using regex in Python, you can use the ‘re’ module's ‘sub()’ function. First, read the content of the file into a string. Then, call ‘re.sub()’ with the pattern to find, the replacement text, and that string. Finally, write the modified content back to the file. For example, to replace all occurrences of "foo" with "bar" in a file, you can open the file, read its content, apply ‘re.sub(r'foo', 'bar', content)’, and then write the updated content back to the file. Note that the pattern matches substrings, so "foo" is also replaced inside longer words such as "food"; use ‘\bfoo\b’ to match whole words only.

Example

import re

# Read the file content

with open('example.txt', 'r') as file:

    content = file.read()

# Replace text using regex

updated_content = re.sub(r'foo', 'bar', content)

# Write the modified content back to the file

with open('example.txt', 'w') as file:

    file.write(updated_content)

4. How to Find Full Name Regex in Python?

To find a full name using regex in Python, you can create a pattern that matches typical name formats. A simple pattern for full names consists of two words, each starting with an uppercase letter followed by lowercase letters. Use the ‘re’ module's ‘findall()’ or ‘search()’ functions to locate names in the text. For example, the pattern ‘r'\b[A-Z][a-z]+\s[A-Z][a-z]+\b'’ can match names like "John Doe". This pattern ensures the first and last names start with capital letters and are separated by a single whitespace character. It does not cover middle names, initials, hyphenated names, or non-ASCII letters, so extend it if your data requires them. To find all full names in a string, you can use ‘re.findall()’ to return a list of matches.

Example

import re

text = "Contact John Doe or Jane Smith for more information."

# Define the regex pattern for a full name

pattern = r'\b[A-Z][a-z]+\s[A-Z][a-z]+\b'

# Find all full names in the text

full_names = re.findall(pattern, text)

print(full_names)  # Output: ['John Doe', 'Jane Smith']

About the Author

Simplilearn

Simplilearn is one of the world’s leading providers of online training for Digital Marketing, Cloud Computing, Project Management, Data Science, IT, Software Development, and many other emerging technologies.
