Advanced Python Development Course
Chapter: String & Time Modules
Level: Regex Module
Objective
Verify and organize files pertaining to employees and zoning allocations by using regular expressions.
In the neighboring offices there is some documentation that needs to be updated regarding employees starting their work on the new farmland and the zoning for where crops and livestock are to be tended. Managing files can be quite tricky, especially when handling large bodies of text. For this purpose we will be using the re module, whose name is short for Regular Expressions, also abbreviated as Regex. We can access its functions by using import re. For the purposes of this level we’ll be using the following functions:
re.findall(): Returns a list of all occurrences of a pattern in a string. Takes two (2) arguments: the first is the characters you’re looking for, the second is the string to search in.
re.sub(): Replaces every occurrence of a pattern with a certain string. Takes three (3) arguments: the first is the characters you would like to replace, the second is what you would like to replace them with, and the third is the string you would like to search in.
re.search(): Finds the first location of a pattern in a string and returns a match object, or None if nothing is found. Takes two (2) arguments: the characters you’re searching for and the string you’re searching in. You can use other functions with the object it returns, such as span(), which returns a tuple holding the start and end positions of the matched item in the text.
re.split(): Splits a string into a list wherever the pattern occurs. Takes two (2) arguments: the first is what to split the string by, the second is the string you would like to split.
re.match(): Checks if a string contains a certain value at the start of the string. This works like a restricted version of the re.search() function that only checks whether the string you’re searching in has the query at the start.
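The five functions above can be tried side by side in a short sketch. The text variable and its contents are made up for illustration and are not part of the level.

```python
import re

# Hypothetical sample text; the level's constants hold different data.
text = "barn 12, silo 7, barn 30"

# findall: every non-overlapping match, returned as a list of strings.
barns = re.findall("barn", text)

# sub: replace every match with a new string.
renamed = re.sub("silo", "shed", text)

# search: first match anywhere in the string (None if there is no match).
found = re.search("silo", text)
position = found.span()  # (start, end) tuple

# split: cut the string wherever the pattern matches.
parts = re.split(", ", text)

# match: succeeds only if the pattern matches at the very start.
starts_with_barn = re.match("barn", text) is not None
starts_with_silo = re.match("silo", text) is not None

print(barns)             # ['barn', 'barn']
print(renamed)           # barn 12, shed 7, barn 30
print(position)          # (9, 13)
print(parts)             # ['barn 12', 'silo 7', 'barn 30']
print(starts_with_barn)  # True
print(starts_with_silo)  # False
```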
The re module can also use special sequences. These are codes that you can use with the re functions to describe a variety of string properties. There is a long list of special sequences, but for this level we will be using the following:
\B: Matches only at a position that is not the start or end of a word. For example, r"6210\B" finds 6210 only when another letter, digit, or underscore follows it directly.
\D: Matches any character that is not a digit 0-9.
If an r is placed in front of a string containing a special sequence, as in r"\D", Python treats it as a raw string: backslashes are passed to the re module unchanged instead of being read as escape characters.
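A small sketch of how the raw-string prefix and the two special sequences behave. The sample strings are invented for illustration.

```python
import re

# A raw string keeps the backslash as a literal character.
same_pattern = r"\D" == "\\D"   # True: both are a backslash followed by D
raw_length = len(r"\n")         # 2: backslash and n
plain_length = len("\n")        # 1: a single newline character

# \D matches any single character that is not a digit 0-9.
non_digits = re.findall(r"\D", "A1-b22")

# \B matches a position that is NOT a word boundary.
inside = re.search(r"621\B", "zone 6210")    # "621" is followed by "0"
at_end = re.search(r"6210\B", "zone 6210 ")  # "6210" ends the word

print(non_digits)      # ['A', '-', 'b']
print(inside.group())  # 621
print(at_end)          # None
```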
Start off by walking to the gold X mark and facing the table with the memo. Use the read() function to check the memo, which contains a manifest of all employees. Each name contains a # that marks the employee number. The string containing the names is also stored in a constant named manifest.
Create a list named tags and store the value of re.findall(), used to search for every # in the manifest constant, like this: tags = re.findall("#", manifest). Create a variable named number and use len() with the tags list to count how many items are on the list. This tells us how many employees are on the manifest. Use the speak() function with the number variable to announce how many names are on the list.
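The counting step can be tested outside the level with a stand-in manifest. The names and numbers below are hypothetical, and print() stands in for the level's speak() function.

```python
import re

# Hypothetical stand-in for the level's manifest constant.
manifest = "Ann Lee #101, Bo Cruz #102, Cy Dunn #103"

# One "#" per employee, so the number of matches is the head count.
tags = re.findall("#", manifest)
number = len(tags)

print(number)  # 3 (in the level, speak(number) would be used instead)
```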
Next up, walk to the light X mark next to the blue carpet, face the desk, and use the read() function. Here you will find a list of new hires with their assigned posts. Make note of the names in each profession so you can cross-reference them with the people currently assigned. Walk to the X mark over the blue carpet and use read() again to verify the currently assigned jobs.
The current workforce is stored in a constant named assignments, and we must update this document with the information in the new hires list. Replace the names that differ by using the re.sub() function to substitute one part of the string with another. For example, one of the edits is as follows:
assignments = re.sub("Billy Hodgins", "Carol Hopkins", assignments)
The name "Billy Hodgins" is replaced with "Carol Hopkins" in the document. Apply this change, then scan the document and update one other name in the list in order to fully update the document. Use the write() function with assignments to check the results.
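A minimal sketch of the two substitutions, assuming a made-up assignments string. Only the Billy Hodgins to Carol Hopkins edit comes from the level; the second pair of names is hypothetical, and print() stands in for write().

```python
import re

# Hypothetical stand-in for the level's assignments constant.
assignments = "Crops: Billy Hodgins, Dana Reyes\nLivestock: Omar Finch"

# The edit given in the level text.
assignments = re.sub("Billy Hodgins", "Carol Hopkins", assignments)

# A second edit with invented names; the real pair comes from the new hires list.
assignments = re.sub("Omar Finch", "Rita Vale", assignments)

print(assignments)
```

Note that re.sub() replaces every occurrence of the pattern, so a name that appears twice in the document is updated in both places.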
Now that we’ve taken care of the employee rosters, it’s time to move on to zoning for the farm. Walk to the dark X mark over the red carpet and use the read() function. This will show you a detailed review of the zoning. This information is stored in a constant named zones.
Of particular note, it’s important to identify the location of 6210 in the document, as the zone needs to be reevaluated. To do this we use the re.search() function to find that sector number in the zoning. Create a variable named index and store the search object, setting the pattern to r"6210\B". Here the r marks a raw string so the backslash reaches the re module unchanged, the 6210 is the zone sector we’re looking for, and the \B is the special sequence that requires the match not to end at a word boundary. This is performed like this: index = re.search(r"6210\B", zones).
Now that index holds a search object, it’s time to extract data from it using span(). Create a variable named vector and store the start and end positions of the match as a tuple, like this: vector = index.span(). Once set up, use the vector variable with the pre-written write().
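The search-then-span step might look like this outside the level. The zones text is a hypothetical stand-in, written so that 6210 is followed by another word character and the \B condition holds. The None check is an extra precaution, since re.search() returns None when nothing matches.

```python
import re

# Hypothetical stand-in for the level's zones constant.
zones = "Sector 4410: crops. Sector 6210_B: livestock."

index = re.search(r"6210\B", zones)

if index is not None:
    vector = index.span()              # (start, end) of the match
    print(vector)                      # (27, 31)
    print(zones[vector[0]:vector[1]])  # 6210
else:
    vector = None
    print("6210 was not found")
```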
Next we’ll be identifying the zoning items. Walk to the dark X mark over the green carpet. Create a list named sectors and use the re.split() function to gather all the various zoning sectors into a list. In the split function, use the special sequence "\D" to split on every non-digit character, leaving only the numbers from the zones constant. The split also leaves empty strings and short leftover numbers in the list, so use a list comprehension with the len() function to remove items that are too short to be a sector number. Like this:
sectors = re.split(r"\D", zones)
sectors = [x for x in sectors if len(x) > 3]
Use zones with the write() function to chart all the zoning sectors.
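To see why the length filter is needed, here is a sketch with a hypothetical zones string. The intermediate pieces variable is only there to show what re.split() leaves behind before filtering.

```python
import re

# Hypothetical stand-in for the level's zones constant.
zones = "Sector 4410: crops. Sector 6210: livestock. Plot 12 is unassigned."

# Splitting on every non-digit leaves the digit runs plus many empty strings.
pieces = re.split(r"\D", zones)

# Keep only items long enough to be a sector number.
sectors = [x for x in pieces if len(x) > 3]

print(sectors)  # ['4410', '6210']
```

The short number 12 and all the empty strings are dropped because their length is 3 or less.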
Walk to the light X mark next to the purple carpet and face the filing cabinet. Use the read() function to verify the priority sectors in the farmland zoning. Make a note of these sectors, as we need to verify whether they are located in the zoning list.
Walk to the dark X mark over the purple carpet. There are three (3) priority sectors in variables named sector_a, sector_b, and sector_c, pre-written in the code editor. Insert the values you read at the light X mark. These variables use a list comprehension that goes through all the items in the sectors list and filters them through the re.match() function. Insert the sector_a, sector_b, and sector_c variables into the pre-written write() function in order to complete the level.
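The pre-written code is not shown here, so the following is only a guess at its shape: a list comprehension that keeps the items of sectors for which re.match() finds the priority value at the start. The sectors list and the priority values are hypothetical.

```python
import re

# Hypothetical sectors list and priority values; use the ones read in the level.
sectors = ["4410", "6210", "7305", "62100"]

sector_a = [x for x in sectors if re.match("6210", x)]
sector_b = [x for x in sectors if re.match("7305", x)]
sector_c = [x for x in sectors if re.match("9999", x)]

print(sector_a)  # ['6210', '62100']
print(sector_b)  # ['7305']
print(sector_c)  # []
```

Because re.match() only anchors at the start of the string, "62100" also passes the "6210" filter. If exact equality were wanted, re.fullmatch() would be the stricter choice. An empty list means the priority sector is not in the zoning list.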