I just love the way Eric Wastl weaves each puzzle into the story.
This time it’s about passport processing.
Do visit the link to read the story and understand the task better. But at a high level, you need to know that there are two types of documents:
- North Pole Credentials
- Passport
The passport scanner machine is slow and is having trouble detecting which passports have all the required fields. The expected fields are as follows:
- `byr` (Birth Year)
- `iyr` (Issue Year)
- `eyr` (Expiration Year)
- `hgt` (Height)
- `hcl` (Hair Color)
- `ecl` (Eye Color)
- `pid` (Passport ID)
- `cid` (Country ID)
The format of the batch file is as follows:
- Each passport is a sequence of `key:val` pairs separated by spaces or newlines.
- Passports are separated by blank lines.
Below is an example batch file containing four passports:
```
ecl:gry pid:860033327 eyr:2020 hcl:#fffffd byr:1937 iyr:2017 cid:147 hgt:183cm

iyr:2013 ecl:amb cid:350 eyr:2023 pid:028048884 hcl:#cfa07d byr:1929

hcl:#ae17e1 iyr:2013 eyr:2024 ecl:brn pid:760753108 byr:1931 hgt:179cm

hcl:#cfa07d eyr:2025 pid:166559648 iyr:2011 ecl:brn hgt:59in
```
Part 1 – Scan and Verify that the given passport contains all the required fields
While scanning each passport, we have to check that all the fields are present. However, there is one optional field, `cid`. Except for `cid`, all the other fields are mandatory.
So, we have to read the batch file, scan all the passports, and count how many of them are valid (that is, contain all the mandatory fields).
The first thing we have to do is parse the given file and fetch the required information.
This is perhaps the most important step in this problem. So, how can we parse the file in a way that helps us solve the problem quickly?
Here’s an idea.
When you look at the test file, what do you see?
Well, the information for a single passport is spread across multiple lines. But the one thing that separates one passport from the next is the empty line, or, more specifically, the newline character `\n` appearing twice in a row (`\n\n`). I’ll use this to separate out the different passports in the batch file.
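Here is a quick sketch of that idea on a tiny, made-up batch (the `batch` string is illustrative, not the real puzzle input):

```python
# A tiny made-up batch: two passports separated by one blank line.
batch = "ecl:gry pid:860033327\nbyr:1937\n\niyr:2013 ecl:amb\nhcl:#cfa07d\n"

# A blank line is two newline characters in a row, so splitting on "\n\n"
# yields one chunk per passport. strip() drops the trailing newline first.
passports = batch.strip().split("\n\n")

for chunk in passports:
    print(repr(chunk))
```

When the real file is read with `open()` in text mode, Python's default universal-newlines handling translates Windows `\r\n` line endings to `\n`, so the same split works on either platform.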
Once I have each passport’s details in its own chunk, I can simply replace every `\n` with a space `' '`. This puts all of the passport’s information on one line.
Splitting that line on `' '` then gives us the individual `key:val` pairs, and splitting each pair on `:` separates the key from its value.
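The per-passport parsing described above might look like this as a small helper; `parse_passport` is a hypothetical name of my own, not part of the solution below:

```python
def parse_passport(chunk):
    """Turn one passport chunk into a {key: value} dictionary."""
    # Put every key:val pair on a single line, separated by spaces.
    line = chunk.replace("\n", " ")
    details = {}
    for keyval in line.split():
        # partition splits on the first ':' only, so a value that
        # happened to contain ':' would still be kept intact.
        key, _, val = keyval.partition(":")
        details[key] = val
    return details

print(parse_passport("ecl:gry pid:860033327\nhcl:#fffffd byr:1937"))
# → {'ecl': 'gry', 'pid': '860033327', 'hcl': '#fffffd', 'byr': '1937'}
```

A no-argument `split()` splits on any run of whitespace, newlines included, so the `replace` step is kept here only to mirror the description; it isn't strictly required.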
I’ll stop with the explanation now and write some code.
Step – 1
```python
# Read the whole batch and split it into one chunk per passport.
with open("../inputs/day4.txt") as f:
    all_lines = f.read().strip()
lines = all_lines.split("\n\n")
```
Step – 2
Iterate over each passport and parse its keys and values.
```python
for line in lines:
    # Join the passport's lines into one space-separated line.
    line = line.strip().replace("\n", " ")
    details = {}
    for keyval in line.split(" "):
        # "ecl:gry" -> key "ecl", value "gry"
        key, val = keyval.split(":")
        details[key] = val
```
Step – 3
Verify the required fields and increment the counter for each valid passport.
```python
def is_valid(details):
    # cid is optional, so it is not in this list.
    mandatory_fields = ("byr", "iyr", "eyr", "hgt", "hcl", "ecl", "pid")
    for field in mandatory_fields:
        if field not in details:
            return False
    return True

with open("../inputs/day4.txt") as f:
    all_lines = f.read().strip()
lines = all_lines.split("\n\n")

count = 0
for line in lines:
    line = line.strip().replace("\n", " ")
    details = {}
    for keyval in line.split(" "):
        key, val = keyval.split(":")
        details[key] = val
    if is_valid(details):
        count += 1
print(count)
```
Let’s run the above code with our actual input.
Great!!! It works. With this, we have completed part one of the problem. Let’s check out part two.
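Before moving on: the presence check from part 1 can also be expressed with Python sets, since a passport is valid exactly when the required keys form a subset of its keys. This is my own alternative sketch, not the code above; `REQUIRED` and `count_valid` are illustrative names, and `batch` holds the four example passports from earlier:

```python
REQUIRED = {"byr", "iyr", "eyr", "hgt", "hcl", "ecl", "pid"}  # cid is optional

# The four example passports, one per blank-line-separated chunk.
batch = """ecl:gry pid:860033327 eyr:2020 hcl:#fffffd byr:1937 iyr:2017 cid:147 hgt:183cm

iyr:2013 ecl:amb cid:350 eyr:2023 pid:028048884 hcl:#cfa07d byr:1929

hcl:#ae17e1 iyr:2013 eyr:2024 ecl:brn pid:760753108 byr:1931 hgt:179cm

hcl:#cfa07d eyr:2025 pid:166559648 iyr:2011 ecl:brn hgt:59in"""

def count_valid(text):
    count = 0
    for chunk in text.strip().split("\n\n"):
        # Only the keys matter for part 1, so collect them into a set.
        keys = {keyval.split(":")[0] for keyval in chunk.split()}
        # For sets, <= means "is a subset of".
        if REQUIRED <= keys:
            count += 1
    return count

print(count_valid(batch))  # → 2
```

The second passport is missing `hgt` and the fourth is missing `byr`, which is why only two pass.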
Part 2 – Strict Field Validation
In addition to the previous condition, more rules have been introduced: each field now has to be strictly validated. Following are the added rules:
- `byr` (Birth Year) – four digits; at least `1920` and at most `2002`.
- `iyr` (Issue Year) – four digits; at least `2010` and at most `2020`.
- `eyr` (Expiration Year) – four digits; at least `2020` and at most `2030`.
- `hgt` (Height) – a number followed by either `cm` or `in`:
  - If `cm`, the number must be at least `150` and at most `193`.
  - If `in`, the number must be at least `59` and at most `76`.
- `hcl` (Hair Color) – a `#` followed by exactly six characters `0`–`9` or `a`–`f`.
- `ecl` (Eye Color) – exactly one of: `amb` `blu` `brn` `gry` `grn` `hzl` `oth`.
- `pid` (Passport ID) – a nine-digit number, including leading zeroes.
- `cid` (Country ID) – ignored, missing or not.
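As an aside, the `hcl`, `pid` and `hgt` rules map neatly onto regular expressions. The sketch below is an alternative I’m suggesting, not the approach used in this post, and the function names are hypothetical. `re.fullmatch` only succeeds when the entire string matches, and `[0-9]` is used instead of `\d` because in Python 3 `\d` also matches non-ASCII Unicode digits.

```python
import re

def hair_color_ok(val):
    # '#' followed by exactly six characters from 0-9 or a-f.
    return re.fullmatch(r"#[0-9a-f]{6}", val) is not None

def passport_id_ok(val):
    # Exactly nine ASCII digits; leading zeroes are allowed.
    return re.fullmatch(r"[0-9]{9}", val) is not None

def height_ok(val):
    # A number followed by the unit; capture both parts.
    match = re.fullmatch(r"([0-9]+)(cm|in)", val)
    if match is None:
        return False
    number, unit = int(match.group(1)), match.group(2)
    if unit == "cm":
        return 150 <= number <= 193
    return 59 <= number <= 76

print(hair_color_ok("#123abc"), hair_color_ok("#123abz"))      # True False
print(passport_id_ok("000000001"), passport_id_ok("0123456789"))  # True False
print(height_ok("60in"), height_ok("190in"), height_ok("190"))  # True False False
```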
Looking at different types of validation, I can broadly categorize the validators as:
- Range Validator
- Color Validator
- Number Validator
- Height Validator
Just by looking at the problem, I’m leaning towards using lambdas to make the code cleaner to read. This problem could be solved gracefully in an object-oriented style using the Strategy design pattern, but that’s overkill for this problem; also, I’m not in the mood to create classes today. So, lambdas will serve us well in this scenario.
I’ll be using Python for this problem.
I’ll start by creating the different validators, shown below:
```python
def height_validator(val):
    # The unit is the last two characters; the number is everything before it.
    number, unit = val[:-2], val[-2:]
    if not number.isdigit():
        return False
    if unit == "cm":
        return range_validator(number, 150, 193)
    if unit == "in":
        return range_validator(number, 59, 76)
    return False

def range_validator(val, low, high):
    # Reject non-numeric input before calling int(), and always return a bool.
    if not val.isdigit():
        return False
    return low <= int(val) <= high

def color_validator(val):
    # '#' followed by exactly six characters from 0-9 or a-f.
    # Checking the length first keeps val[0] safe on an empty string.
    if len(val) != 7 or val[0] != '#':
        return False
    for char in val[1:]:
        if not ('0' <= char <= '9' or 'a' <= char <= 'f'):
            return False
    return True

def number_validator(val):
    return val.isdigit()
```
Each validator checks exactly what its rule asks for.
Now all we have to do is hook these validators into the `is_valid` function to get the correct output.
This is where lambdas are helpful. Since we perform the same kind of check on every field, instead of writing a separate condition for each field, we can map each field to its validator when the dictionary is initialized.
Then all that’s left to do is look up the lambda for a field and call it on that field’s value. Simple.
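Stripped down to two fields, the mapping idea looks roughly like this (a toy illustration with hypothetical names, not the final code):

```python
# Each field name maps to a function that takes the raw string value
# and returns True or False. Only two fields are shown here.
validators = {
    "byr": lambda val: val.isdigit() and 1920 <= int(val) <= 2002,
    "ecl": lambda val: val in ("amb", "blu", "brn", "gry", "grn", "hzl", "oth"),
}

details = {"byr": "1937", "ecl": "xyz"}

# One generic expression instead of an if/elif chain per field:
# look up the field's validator and call it on the value.
results = {field: validators[field](val) for field, val in details.items()}
print(results)  # → {'byr': True, 'ecl': False}
```

Adding a rule then means adding one dictionary entry rather than another `if` branch.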
Let me show you how it’s done in the code:
```python
def is_valid(details):
    # Map each field to the lambda that validates its value.
    mandatory_fields = {
        "byr": lambda val: range_validator(val, 1920, 2002),
        "iyr": lambda val: range_validator(val, 2010, 2020),
        "eyr": lambda val: range_validator(val, 2020, 2030),
        "hgt": lambda val: height_validator(val),
        "hcl": lambda val: color_validator(val),
        "ecl": lambda val: val in ("amb", "blu", "brn", "gry", "grn", "hzl", "oth"),
        "pid": lambda val: len(val) == 9 and val.isdigit(),
        "cid": lambda val: True
    }
    # First, every field except the optional cid must be present.
    for field in mandatory_fields:
        if field == "cid":
            continue
        if field not in details:
            return False
    # Then, every field that is present must pass its validator.
    for field in details:
        if not mandatory_fields[field](details[field]):
            return False
    return True
```
In the above code, we associate a lambda function with each field when the dictionary is initialized. In fact, it would be more efficient to move that initialization outside of the function, as in the code below. This is better because the dictionary no longer has to be rebuilt every time the `is_valid` function is called.
```python
mandatory_fields = {
    "byr": lambda val: range_validator(val, 1920, 2002),
    "iyr": lambda val: range_validator(val, 2010, 2020),
    "eyr": lambda val: range_validator(val, 2020, 2030),
    "hgt": lambda val: height_validator(val),
    "hcl": lambda val: color_validator(val),
    "ecl": lambda val: val in ("amb", "blu", "brn", "gry", "grn", "hzl", "oth"),
    "pid": lambda val: len(val) == 9 and val.isdigit(),
    "cid": lambda val: True
}

def is_valid(details):
    for field in mandatory_fields:
        if field == "cid":
            continue
        if field not in details:
            return False
    for field in details:
        if not mandatory_fields[field](details[field]):
            return False
    return True
```
Great!!! I think the code is ready to be tested with the actual input.
Awesome!!! We get the correct result for the given problem.
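One possible refinement, sketched under my own assumptions rather than taken from the post: the `is_valid` above looks up `mandatory_fields[field]` for every key in `details`, so a key outside the eight known fields would raise a `KeyError`. Driving the check from the rules instead, with `all()`, avoids that and folds the two loops into one expression. `RULES`, `parse` and `count_valid` are hypothetical names, and `sample` is a small illustrative batch:

```python
# Hypothetical compact version of part 2.
RULES = {
    "byr": lambda v: v.isdigit() and 1920 <= int(v) <= 2002,
    "iyr": lambda v: v.isdigit() and 2010 <= int(v) <= 2020,
    "eyr": lambda v: v.isdigit() and 2020 <= int(v) <= 2030,
    # Height: digits followed by a two-letter unit, each with its own range.
    "hgt": lambda v: v[:-2].isdigit() and (
        (v.endswith("cm") and 150 <= int(v[:-2]) <= 193)
        or (v.endswith("in") and 59 <= int(v[:-2]) <= 76)
    ),
    "hcl": lambda v: (len(v) == 7 and v[0] == "#"
                      and all(c in "0123456789abcdef" for c in v[1:])),
    "ecl": lambda v: v in ("amb", "blu", "brn", "gry", "grn", "hzl", "oth"),
    "pid": lambda v: len(v) == 9 and v.isdigit(),
}
# cid has no rule: it is optional and ignored.

def is_valid(details):
    # Every field in RULES must be present and pass its check.
    # Keys not in RULES (such as cid) are never looked up.
    return all(field in details and check(details[field])
               for field, check in RULES.items())

def parse(chunk):
    # "k1:v1 k2:v2\nk3:v3" -> {"k1": "v1", "k2": "v2", "k3": "v3"}
    return dict(keyval.split(":", 1) for keyval in chunk.split())

def count_valid(text):
    return sum(is_valid(parse(chunk)) for chunk in text.strip().split("\n\n"))

# Sample batch for illustration: the second passport has an out-of-range eyr.
sample = """pid:087499704 hgt:74in ecl:grn iyr:2012 eyr:2030 byr:1980
hcl:#623a2f

eyr:1972 cid:100
hcl:#18171d ecl:amb hgt:170 pid:186cm iyr:2018 byr:1926

hcl:#888785
hgt:164cm byr:2001 iyr:2015 cid:88
pid:545766238 ecl:hzl
eyr:2022"""

print(count_valid(sample))  # → 2
```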
I hope you liked this question. And if you are enjoying it then don’t forget to subscribe to this blog. There’s a lot more to come. And don’t forget to comment your thoughts below.