
Regular Expressions in Python

1. Introduction to Regular Expressions:

Regular expressions, often abbreviated as regex or regexp, provide a powerful and concise way to search, match, and manipulate strings. A regular expression is a sequence of characters that defines a search pattern; the pattern is then applied to text to find, extract, or replace the substrings that match it.

2. The re Module:

In Python, regular expressions are provided by the re module, which is part of the standard library. It offers functions for compiling patterns and for searching, matching, splitting, and substituting text.
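As a minimal sketch of the two usual ways to call into the module, the example below runs the same pattern once through a module-level function and once through a compiled pattern object. The sample string and variable names are illustrative assumptions.

```python
import re

text = "order 66 shipped"  # illustrative sample string

# Module-level function: the pattern is passed as a string on each call.
direct = re.search(r"\d+", text)

# Compiled pattern: compile once, then reuse the pattern object.
number = re.compile(r"\d+")
compiled = number.search(text)

print(direct.group())    # 66
print(compiled.group())  # 66
```

Compiling is mainly a convenience when the same pattern is used many times; both forms find the same match here.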

3. Basic Patterns:

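  • Literal Characters: Match the exact characters.
  import re

  pattern = re.compile(r'hello')
  # match() only tries at the start of the string
  result = pattern.match('hello world')  # matches 'hello'
  • Character Classes: Match any one character from a set.
  pattern = re.compile(r'[aeiou]')
  # search() scans the string for the first match
  result = pattern.search('Hello World')  # matches 'e'
  • Quantifiers: Specify the number of occurrences.
  pattern = re.compile(r'\d{2,4}')
  # {2,4} is greedy: it takes up to four digits
  result = pattern.search('12345')  # matches '1234'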
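The calls above return a match object, or None when nothing matches. The sketch below shows one way to read the result; the input strings are assumptions chosen for illustration.

```python
import re

vowel = re.compile(r"[aeiou]").search("Hello World")
print(vowel.group())  # e
print(vowel.span())   # (1, 2)

digits = re.compile(r"\d{2,4}").search("id 12345")
print(digits.group(), digits.start(), digits.end())  # 1234 3 7

# A single digit cannot satisfy {2,4}, so there is no match object.
missing = re.compile(r"\d{2,4}").search("7")
print(missing)  # None
```

Because a failed match is None, it is usually worth checking the result before calling group() on it.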

4. Special Characters:

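  • . (Dot): Matches any single character except a newline (unless the re.DOTALL flag is set).
  • ^ (Caret): Matches at the start of the string, and at the start of each line when re.MULTILINE is set.
  • $ (Dollar): Matches at the end of the string or just before a trailing newline, and at the end of each line when re.MULTILINE is set.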
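A short sketch of the three special characters in action; the sample strings are assumptions made for this example.

```python
import re

# Dot matches any character except a newline, so 'c\nt' is skipped.
dot_hits = re.findall(r"c.t", "cat cot c\nt")
print(dot_hits)  # ['cat', 'cot']

# Caret anchors the pattern to the start of the string.
starts_hello = re.search(r"^Hello", "Hello World") is not None
starts_world = re.search(r"^World", "Hello World") is not None
print(starts_hello, starts_world)  # True False

# Dollar anchors to the end, and also matches just before a trailing newline.
ends_world = re.search(r"World$", "Hello World") is not None
ends_before_newline = re.search(r"World$", "Hello World\n") is not None
print(ends_world, ends_before_newline)  # True True
```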

5. Flags (Modifiers):

  • re.IGNORECASE or re.I: Ignore case in matching.
  pattern = re.compile(r'hello', re.IGNORECASE)
  result = pattern.match('HeLLo World')  # matches 'HeLLo'
  • re.MULTILINE or re.M: Make ^ and $ match at the start and end of each line, not only of the whole string.
  pattern = re.compile(r'^\d+', re.MULTILINE)
  result = pattern.findall('123\n456\n789')  # ['123', '456', '789']
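To see what a flag changes, it can help to run the same pattern with and without it. The sketch below also combines two flags with the | operator; the second sample string is an assumption for illustration.

```python
import re

text = "123\n456\n789"

# Without re.MULTILINE, ^ matches only at the very start of the string.
without_flag = re.findall(r"^\d+", text)
print(without_flag)  # ['123']

# With re.MULTILINE, ^ also matches right after each newline.
with_flag = re.findall(r"^\d+", text, re.MULTILINE)
print(with_flag)  # ['123', '456', '789']

# Flags can be combined with the bitwise OR operator.
combined = re.findall(r"^hello", "Hello\nHELLO", re.IGNORECASE | re.MULTILINE)
print(combined)  # ['Hello', 'HELLO']
```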

6. Groups and Capturing:

  • Use parentheses to create groups.
  pattern = re.compile(r'(\d+)-(\d+)')
  result = pattern.match('123-456')
  • Capturing groups extract the matched parts; group(0) is the whole match and group(n) is the n-th group.
  print(result.group(1))  # Output: 123
  print(result.group(2))  # Output: 456
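Since match() returns None when the pattern does not fit, calling group() directly can raise an AttributeError. The sketch below wraps the pattern above in a hypothetical helper, parse_range, which is not part of the re module, and uses groups() to get all captures at once.

```python
import re

pattern = re.compile(r"(\d+)-(\d+)")

def parse_range(text):
    """Hypothetical helper: return (start, end) as ints, or None if no match."""
    result = pattern.match(text)
    if result is None:
        return None
    # groups() returns every captured group as a tuple of strings.
    start, end = result.groups()
    return int(start), int(end)

good = parse_range("123-456")
bad = parse_range("abc")
print(good)  # (123, 456)
print(bad)   # None
```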

7. Common Patterns:

  • Email Validation:
  email_pattern = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')
  • Phone Number Validation:
  phone_pattern = re.compile(r'\b\d{3}-\d{3}-\d{4}\b')
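A rough sketch of how these two patterns might be used, both to pull values out of text and to validate a whole string. The sample text and the is_valid_email helper are hypothetical, the email pattern uses the letter class [A-Za-z] for the top-level domain, and both patterns are simplifications that do not cover every valid address or phone format.

```python
import re

# Simplified patterns; real-world addresses and numbers vary more than this.
email_pattern = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
phone_pattern = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

# Hypothetical sample text for illustration.
text = "Contact alice@example.com or call 555-123-4567."

emails = email_pattern.findall(text)
phones = phone_pattern.findall(text)
print(emails)  # ['alice@example.com']
print(phones)  # ['555-123-4567']

def is_valid_email(candidate):
    """Hypothetical helper: True only if the whole string fits the pattern."""
    return email_pattern.fullmatch(candidate) is not None

print(is_valid_email("alice@example.com"))  # True
print(is_valid_email("alice@example"))      # False
```

Using fullmatch() for validation ensures the entire input fits the pattern, whereas search() or findall() would accept a valid-looking substring inside a longer string.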

8. Search and Match Functions:

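  • re.search(pattern, string): Scans the string and returns a match object for the first occurrence, or None if there is none.
  • re.match(pattern, string): Matches only at the beginning of the string; returns a match object or None.
  • re.findall(pattern, string): Returns all non-overlapping occurrences as a list of strings (or of tuples if the pattern has more than one group).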
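The difference between the three functions is easiest to see when they are given the same pattern and string. The sample sentence below is an assumption for illustration.

```python
import re

text = "The cat sat on the mat"  # illustrative sample

# search() finds the first occurrence anywhere in the string.
first = re.search(r"[a-z]at", text)
print(first.group())  # cat

# match() only looks at the start, where the text is "The", so it fails.
at_start = re.match(r"[a-z]at", text)
print(at_start)  # None

# findall() collects every non-overlapping occurrence.
every = re.findall(r"[a-z]at", text)
print(every)  # ['cat', 'sat', 'mat']
```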

9. Substitution and Splitting:

  • re.sub(pattern, replacement, string): Replaces every occurrence of the pattern with the replacement string and returns the new string.
  text = 'Hello, world!'
  result = re.sub(r'world', 'Python', text)  # 'Hello, Python!'
  • re.split(pattern, string): Splits a string wherever the pattern matches and returns a list.
  text = 'apple,orange,banana'
  result = re.split(r',', text)  # ['apple', 'orange', 'banana']
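For fixed strings like the ones above, str.replace and str.split would do the same job. The sketch below shows cases where a pattern is actually needed: mixed separators with optional whitespace, and a limited number of replacements through the count argument. The input strings are assumptions for illustration.

```python
import re

# Split on commas or semicolons, swallowing any surrounding whitespace.
items = re.split(r"\s*[,;]\s*", "apple , orange;banana")
print(items)  # ['apple', 'orange', 'banana']

# Collapse any run of whitespace into a single space.
tidy = re.sub(r"\s+", " ", "too   many    spaces")
print(tidy)  # too many spaces

# count limits how many occurrences are replaced.
once = re.sub(r"a", "A", "banana", count=1)
print(once)  # bAnana
```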

10. Conclusion:

Regular expressions are a powerful tool for string manipulation and pattern matching in Python. They provide a concise and flexible way to describe complex string patterns, and the re module covers the common tasks shown here: searching, matching, capturing groups, substituting, and splitting. Understanding how to use them can significantly enhance your ability to work with textual data.

In the next sections, we’ll explore more advanced topics and practical applications of regular expressions in Python.

