Automation - Regex
Credit link: https://www.datacamp.com/tutorial/python-regular-expression-tutorial
Regular Expression
Regular expressions (regex for short) are used to check whether a pattern exists in a given string. Classic uses of regex include internet search engines and file system finders.
- Regex in Python: run `import re` and you can use it.
- Basic patterns: use `re.match(pattern, sequence)`, which takes two parameters and tries to match the pattern at the start of the sequence.
```python
import re

# ******** group() method usage ************
statement = 'support@datacamp.com'
match = re.search(r'([\w\.-]+)@([\w\.-]+)', statement)
if match:  # search() returns None when nothing matches
    print("Email address:", match.group())  # Email address: support@datacamp.com
    print("Username:", match.group(1))      # Username: support
    print("Host:", match.group(2))          # Host: datacamp.com
```
Wildcard characters
- `.` (the period): matches any character except a newline.
- `^` (the caret): matches the start of the string.
- `$`: matches the end of the string.
- `[abc]`: matches a, b, or c.
- `[a-zA-Z0-9]`: matches any letter or digit.
- `\` (the backslash):
  - escape character: starts a special sequence such as `\w` or `\d`
  - in front of a metacharacter, removes its special meaning so that it is treated as an ordinary character (for example, `\.` matches a literal period)
- `\w`: matches any single letter, digit, or underscore.
- `\W`: matches any character not matched by `\w`.
- `\s`: matches a single whitespace character: space, newline, tab, or carriage return.
- `\S`: matches any character not matched by `\s`.
- `\d`: lowercase d. Matches a decimal digit 0-9.
- `\D`: uppercase d. Matches any character that is not a decimal digit.
- `\t`: lowercase t. Matches a tab.
- `\n`: lowercase n. Matches a newline.
- `\r`: lowercase r. Matches a carriage return.
- `\A`: uppercase a. Matches only at the start of the string, even when the string spans multiple lines.
- `\Z`: uppercase z. Matches only at the end of the string.
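A small sketch of a few of the patterns above, using only the standard `re` module. The sample string is made up for illustration, and the results in the comments hold for this input only.

```python
import re

# Hypothetical sample text containing a tab and a newline.
text = "Order 66 shipped\tto Bob_01\non 2024-05-17."

# \d+ matches runs of decimal digits.
digits = re.findall(r"\d+", text)      # ['66', '01', '2024', '05', '17']

# \w+ matches runs of letters, digits, and underscores.
words = re.findall(r"\w+", text)

# \s matches one whitespace character at a time (space, tab, newline).
whitespace = re.findall(r"\s", text)

# ^ anchors at the start and $ at the end; \. is a literal period.
starts_with_order = re.search(r"^Order", text) is not None
ends_with_period = re.search(r"\.$", text) is not None

# [a-zA-Z0-9] matches one letter or digit; the unescaped . then
# matches any single character except a newline.
first_two = re.match(r"[a-zA-Z0-9].", text).group()   # 'Or'

print(digits, words, whitespace, starts_with_order, ends_with_period, first_two)
```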
Repetitions
- `+`: checks whether the preceding character appears one or more times starting from that position.
- `*`: checks whether the preceding character appears zero or more times starting from that position.
- `?`: checks whether the preceding character appears zero or one time starting from that position.
- `{x}`: repeats exactly x times.
- `{x,}`: repeats at least x times.
- `{x,y}`: repeats at least x times but no more than y times (write it without a space inside the braces).
- `()`: creates a group when performing matches.
- `(?P<name>...)`: creates a named group when performing matches; this is how Python spells the `<>` named-group syntax.
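The quantifiers and groups above can be tried out with a sketch like the following. The input strings and the group names `year` and `month` are invented for illustration.

```python
import re

# + needs at least one 'b'; * also accepts none; ? accepts zero or one 'u'.
plus = re.findall(r"ab+", "a ab abb")                 # ['ab', 'abb']
star = re.findall(r"ab*", "a ab abb")                 # ['a', 'ab', 'abb']
optional = re.findall(r"colou?r", "color colour")     # ['color', 'colour']

# {x}, {x,} and {x,y} bound the number of repetitions.
exactly_three = re.fullmatch(r"\d{3}", "123") is not None
at_least_two = re.findall(r"\d{2,}", "1 22 333")      # ['22', '333']
two_to_three = re.findall(r"\b\d{2,3}\b", "1 22 333 4444")

# Named groups use the (?P<name>...) syntax and are read back by name.
date = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2024-05")
year, month = date.group("year"), date.group("month")

print(plus, star, optional, exactly_three, at_least_two, two_to_three, year, month)
```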
search(), match() and findall()
- match() checks for a match only at the beginning of the string (by default).
- search() checks for a match anywhere in the string.
- findall() finds all non-overlapping matches in the entire sequence and returns them as a list of strings (or a list of tuples when the pattern contains more than one group).
- sub(): the substitute function; replaces each match with a replacement string.
- split(): splits the string at each match and returns a list.
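To compare the five functions side by side, here is a minimal sketch on one made-up string. The comments describe what happens for this input only.

```python
import re

# Hypothetical sample text.
text = "cats, dogs;  birds and 3 fish"

# match() only looks at the start of the string, so this is None.
at_start = re.match(r"dogs", text)

# search() scans the whole string and returns the first match.
anywhere = re.search(r"dogs", text)
position = anywhere.span()                       # (6, 10)

# findall() returns every non-overlapping match as a list of strings...
all_words = re.findall(r"[a-z]+", text)
# ...or a list of tuples when the pattern has several groups.
pairs = re.findall(r"(\w)(\w+)", "ab cde")       # [('a', 'b'), ('c', 'de')]

# sub() replaces each match; split() cuts the string at each match.
replaced = re.sub(r"\d+", "N", text)
parts = re.split(r"[,;]\s*", text)

print(at_start, position, all_words, pairs, replaced, parts)
```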
Compilation flags
- IGNORECASE (I) - Allows case-insensitive matches.
- DOTALL (S) - Allows . to match any character, including newline.
- MULTILINE (M) - Allows the start (^) and end ($) anchors to match at the start and end of each line, not only at the start and end of the whole string.
- VERBOSE (X) - Allows you to write whitespace and comments within a regular expression to make it more readable.
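A short sketch of the four flags in action. The sample text and the toy phone-number pattern are assumptions made up for this example, not taken from the tutorial.

```python
import re

# Hypothetical three-line sample text.
text = "First line\nsecond LINE\nthird line"

# IGNORECASE: 'line' also matches 'LINE'.
case_insensitive = re.findall(r"line", text, re.IGNORECASE)

# DOTALL: without it, . stops at the newline and the search fails.
without_dotall = re.search(r"First.*second", text)
with_dotall = re.search(r"First.*second", text, re.DOTALL)

# MULTILINE: ^ matches at the start of every line, while \A still
# matches only at the very start of the string.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)
only_string_start = re.findall(r"\A\w+", text, re.MULTILINE)

# VERBOSE: whitespace and # comments inside the pattern are ignored.
phone = re.compile(r"""
    (\d{3})   # first three digits
    -         # separator
    (\d{4})   # last four digits
""", re.VERBOSE)
phone_groups = phone.search("call 555-1234").groups()

print(case_insensitive, without_dotall, with_dotall.group(),
      line_starts, only_string_start, phone_groups)
```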
shutil - High-level File Operations
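- copyfile() copies the contents of the source file to the destination file and raises OSError (IOError is an alias for it in Python 3) if it does not have permission to write to the destination file.
- copy() interprets the destination name like the command-line tool `cp`: if the destination is a directory, the file is copied into it under the source file's base name.
- copy2() is similar to copy() but also tries to preserve metadata, including access and modification times.
- copystat() copies metadata (permission bits and timestamps) from one path to another.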
- other shutil functions: copyfileobj(), copymode(), copytree(), disk_usage(), get_archive_formats(), get_unpack_formats(), make_archive(), move(), rmtree(), unpack_archive(), which()
For more usage details, see this reference link: https://pymotw.com/3/shutil/
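A minimal sketch of the copy functions above, run inside a temporary directory so that nothing on the real file system is touched. All file and directory names (`notes.txt`, `backup`, and so on) are made up for illustration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Create a small source file to copy around.
    src = os.path.join(workdir, "notes.txt")
    with open(src, "w") as f:
        f.write("hello shutil\n")

    backup_dir = os.path.join(workdir, "backup")
    os.mkdir(backup_dir)

    # copyfile() needs a full destination file name.
    plain_copy = shutil.copyfile(src, os.path.join(workdir, "notes_copy.txt"))

    # copy() accepts a directory as the destination, like cp.
    into_dir = shutil.copy(src, backup_dir)
    copied_name = os.path.basename(into_dir)

    # copy2() additionally tries to preserve timestamps and other metadata.
    with_meta = shutil.copy2(src, os.path.join(workdir, "notes_meta.txt"))

    # copytree() copies a whole directory, move() renames or moves,
    # and rmtree() deletes a directory tree.
    tree_copy = shutil.copytree(backup_dir, os.path.join(workdir, "backup2"))
    moved = shutil.move(plain_copy, os.path.join(workdir, "renamed.txt"))
    shutil.rmtree(tree_copy)

    with open(into_dir) as f:
        copied_text = f.read()
    tree_removed = not os.path.exists(tree_copy)
    moved_exists = os.path.exists(moved)

print(copied_name, repr(copied_text), tree_removed, moved_exists)
```

Because every path lives under `tempfile.TemporaryDirectory()`, everything is cleaned up automatically when the `with` block ends.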
Things I want to know more about:
- Regex implementation
- Using shutil to process files from local file systems.