
Pattern Matching with the re Module in Python

1. Introduction:

The re module in Python's standard library provides functions for working with regular expressions. Pattern matching with re means defining a pattern, often compiled once with re.compile, and then searching for or matching that pattern within a string.

2. Basic Pattern Matching:

The simplest form of pattern matching involves searching for a specific sequence of characters.

import re

pattern = re.compile(r'hello')
result = pattern.search('hello world')
print(result.group())  # Output: hello
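One detail worth keeping in mind: search returns None when the pattern does not occur, so calling group() on the result directly raises an AttributeError. The sketch below guards against that case; the helper name first_match is hypothetical and used only for illustration.

```python
import re

pattern = re.compile(r'hello')

def first_match(text):
    """Return the matched text, or None when the pattern is absent.

    first_match is a hypothetical helper, not part of the re module.
    """
    result = pattern.search(text)
    # search() returns None on failure, so check before calling group()
    return result.group() if result is not None else None

print(first_match('hello world'))    # Output: hello
print(first_match('goodbye world'))  # Output: None
```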

3. Character Classes and Quantifiers:

Character classes such as \d (any digit) and quantifiers such as {2,4} (two to four repetitions) let a pattern match more than a fixed sequence of characters.

pattern = re.compile(r'\d{2,4}')
result = pattern.search('The year is 2022')
print(result.group())  # Output: 2022
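To see how the {2,4} bounds behave on digit runs of different lengths, here is a small sketch using findall on a made-up input string. A single digit is too short to match, and a six-digit run is split into a four-digit match followed by a two-digit match.

```python
import re

pattern = re.compile(r'\d{2,4}')

# The input string is an invented example with runs of 1, 2, 4 and 6 digits
found = pattern.findall('7 42 2022 123456')
print(found)  # Output: ['42', '2022', '1234', '56']
```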

4. Anchors for Start and End:

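The anchors ^ and $ tie a pattern to the start and end of a string, so the pattern below rejects input with extra characters before or after the digits. Note that $ also matches just before a single trailing newline.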

pattern = re.compile(r'^\d{3}-\d{2}-\d{4}$')
result = pattern.match('123-45-6789')
print(result.group())  # Output: 123-45-6789
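The effect of the anchors is easiest to see by comparing an anchored and an unanchored version of the same pattern. The sketch below also uses fullmatch, which requires the whole string to match without writing the anchors explicitly; the variable names and the sample text are illustrative assumptions.

```python
import re

anchored = re.compile(r'^\d{3}-\d{2}-\d{4}$')
unanchored = re.compile(r'\d{3}-\d{2}-\d{4}')

text = '123-45-6789 extra'

# With ^ and $, trailing characters cause the match to fail
anchored_hit = anchored.match(text)
print(anchored_hit)  # Output: None

# Without anchors, match() only requires the pattern at the start
prefix_hit = unanchored.match(text)
print(prefix_hit.group())  # Output: 123-45-6789

# fullmatch() requires the entire string to match the pattern
full_hit = unanchored.fullmatch('123-45-6789')
full_miss = unanchored.fullmatch(text)
print(full_hit is not None, full_miss is None)  # Output: True True
```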

5. Character Classes and Negation:

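A character class in square brackets matches any one of the listed characters or ranges. A leading ^ inside the brackets negates the class, so it matches any character that is not listed.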

# Match every lowercase vowel
pattern = re.compile(r'[aeiou]')
result = pattern.findall('Hello World')
print(result)  # Output: ['e', 'o', 'o']

# Match every character that is NOT a lowercase vowel (including the space)
pattern = re.compile(r'[^aeiou]')
result = pattern.findall('Hello World')
print(result)  # Output: ['H', 'l', 'l', ' ', 'W', 'r', 'l', 'd']
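Ranges inside a class work the same way. As a brief sketch, the first pattern below matches uppercase letters with the range A-Z, and the second uses a negated class to match anything that is not an ASCII letter. The sample string is made up for illustration.

```python
import re

text = 'Hello World 42'

# A range: any single uppercase ASCII letter
uppercase = re.findall(r'[A-Z]', text)
print(uppercase)  # Output: ['H', 'W']

# A negated class with two ranges: anything that is not an ASCII letter
non_letters = re.findall(r'[^a-zA-Z]', text)
print(non_letters)  # Output: [' ', ' ', '4', '2']
```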

6. Groups and Capturing:

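Parentheses create groups that capture the text matched by each part of a pattern. Groups are numbered from 1 in the order of their opening parentheses, and group(1), group(2), and so on retrieve them.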

pattern = re.compile(r'(\d+)-(\d+)')
result = pattern.match('123-456')
print(result.group(1))  # Output: 123
print(result.group(2))  # Output: 456
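Besides numbered access, a match object exposes the whole match as group(0) and all captured groups at once through groups(). A short sketch, converting the captured strings to integers as one possible next step:

```python
import re

pattern = re.compile(r'(\d+)-(\d+)')
result = pattern.match('123-456')

# group(0) is the entire matched text
whole = result.group(0)
print(whole)  # Output: 123-456

# groups() returns every captured group as a tuple of strings
parts = result.groups()
print(parts)  # Output: ('123', '456')

# Captured text is always a string; convert it if numbers are needed
first, second = (int(p) for p in parts)
print(first + second)  # Output: 579
```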

7. Quantifiers and Greedy Matching:

Quantifiers are greedy by default: they match as much text as possible. Appending ? to a quantifier makes it non-greedy, so it matches as little text as possible.

pattern = re.compile(r'\d+')
result = pattern.match('123456789')
print(result.group())  # Output: 123456789
pattern = re.compile(r'\d+?')
result = pattern.match('123456789')
print(result.group())  # Output: 1
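The difference matters most when a pattern has a closing delimiter. In the sketch below, which uses an invented snippet of tag-like text, the greedy version runs to the last > in the string while the non-greedy version stops at the first one.

```python
import re

text = '<b>bold</b> and <i>italic</i>'

# Greedy: .+ consumes as much as possible, then backtracks to the last '>'
greedy = re.search(r'<.+>', text).group()
print(greedy)  # Output: <b>bold</b> and <i>italic</i>

# Non-greedy: .+? stops at the first '>' that completes the match
lazy = re.search(r'<.+?>', text).group()
print(lazy)  # Output: <b>
```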

8. Named Groups:

The (?P<name>...) syntax assigns a name to a group, so captured text can be retrieved by name rather than by position.

pattern = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')
result = pattern.match('2022-01-15')
print(result.group('year'))  # Output: 2022
print(result.group('month'))  # Output: 01
print(result.group('day'))  # Output: 15
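When several named groups are needed at once, groupdict() returns them as a dictionary keyed by group name. A minimal sketch, with the conversion to integers added as an illustrative follow-up step:

```python
import re

pattern = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')
result = pattern.match('2022-01-15')

# groupdict() maps each group name to the text it captured
parts = result.groupdict()
print(parts)  # Output: {'year': '2022', 'month': '01', 'day': '15'}

# Convert the captured strings to integers for further processing
year, month, day = (int(parts[key]) for key in ('year', 'month', 'day'))
print(year, month, day)  # Output: 2022 1 15
```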

9. Search and Findall:

search returns a match object for the first match anywhere in the string, while findall returns a list of all non-overlapping matches as strings.

pattern = re.compile(r'\d+')

# search stops at the first match
result = pattern.search('There are 42 apples and 36 oranges.')
print(result.group())  # Output: 42

# findall reuses the same compiled pattern and collects every match
result = pattern.findall('There are 42 apples and 36 oranges.')
print(result)  # Output: ['42', '36']
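findall returns plain strings, so information such as the position of each match is lost. If that information is needed, one option is finditer, which yields a match object for every match. The sketch below collects each number together with its start index.

```python
import re

pattern = re.compile(r'\d+')
text = 'There are 42 apples and 36 oranges.'

# finditer yields match objects, which expose start() and end() positions
matches = [(m.group(), m.start()) for m in pattern.finditer(text)]
print(matches)  # Output: [('42', 10), ('36', 24)]
```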

10. Conclusion:

Pattern matching with the re module offers a flexible and expressive way to define and search for patterns within strings. Whether you’re validating user input, extracting information, or manipulating text, regular expressions are a valuable part of your Python programming toolkit.

In the next sections, we’ll explore more advanced topics and practical applications of regular expressions in Python.