AOC 2024, Day 3: All About Regex
Another day, another Advent of code problem! Day 3 was easy… if you know your regex.
The below is a guest post by the amazing Shraddha Agrawal.
Part 1
Problem
Given an input string, we had to find all instances of mul(X,Y)
where X and Y could be between 1 and 3 digits long. For example, the following highlighted sub sequences are valid:
The final answer should be the sum of the product of the numbers in each sub-sequence. So in the above case it would look be 161.
Solution
The problem statement was shouting to use regex, short for regular expressions. Let’s first create the expression first!
Our capturing group should start with mul(
, followed by one to three digits, followed by ,
, then another one to three digits and finally should end with )
.
Putting it all together the expression looks like this: (mul\([0-9]{1,3},[\d]{1,3}\))
. Let’s break this down:
mul\(
- matches exactly withmul(
we use the escape character for(
as it is a special character in regex.[0-9]{1,3}
- matches the characters within the range [0,9].{1,3}
matches the previous pattern anywhere between one to three times. Putting it together, this section matches a digit occurring once to maximum thrice.,
- matches exactly with,
.[\d]{1,3}
- is same as[0,9]{1,3}
. Another way to specify a digit is using the escaped character d, ie\d
.\)
- and finally this last section matches exactly with)
. We again use the escape character as)
is also a special character in regex.
You can find more details about this on https://regex101.com/. Its a handy site to build regex while comparing it against a test string and the explanations are clear and concise.
Now that our expression is ready, time to break out the regexp
package. Its a handy collection of commonly used methods for dealing with regex. For our usecase, we want to find all substrings that match the regular express in the input string. FindAllStringSubmatch
(ref) does exactly that.
Putting it all together, the function to get the valid subsequences would look like this:
func findValidMuls(input string) [][]string {
r := regexp.MustCompile(`(mul\()([\d]{1,3})[,]([0-9]{1,3})\)`)
return r.FindAllStringSubmatch(input, -1)
}
Nice and short, gotta love regex! All that’s left to do is now iterate over this and get the sum of the products of the numbers in each valid subsequence. You can find my code for the same here.
Gotchas in Input
In Advent of Code, usually the test input is very small and the actual user input is a many fold scaled version of the same. Sometimes though, there will be corner cases in the actual input that you might not have encountered in the test input. This is also what makes a few days hard because we might not think of a corner case and waste a tonne of time wondering why the solution is not working against the main input.
Comparing the test input and the given input for this problem, I noticed while the test input is only a single line, the actual input spanned many lines. This means we have to be careful to not miss any input data. This will become even more relevant in part 2, lets get started on that!
Part 2
Problem
This had a slight twist. Now instead of just finding mul(X,Y)
, we also have to look for activating do()
and deactivating don’t()
patterns.
As we are iterating over the input string, encountering a do()
means all following mul(X,Y)
are valid. But when we stumble across a don't()
func, all mul(X,Y)
after it should be ignored till the next do()
is encountered. Applying that to the above, the actual valid patterns are:
Solution
Again, regex can help us get all matching subsequences, all mul(X,Y)
, both activated and deactivated, do()
and don't()
.
To the previously created expressions, we just need to add a couple of alternative matching patterns for do()
and don’t()
. This will result in: (mul\([\d]{1,3},[0-9]{1,3}\))|don't\(\)|do\(\)
Let’s break this down:
(mul\([\d]{1,3},[0-9]{1,3}\))
- The first matching group. Same as what we used in part 1.|
- Alternation operator equivalent to an OR gate. This helps us define other matching groups.don't\(\)
- Matching group fordon’t()
with both(
and)
escaped.do\(\)
- Matching group fordo()
with both(
and)
escaped.
The above pattern when used on the input will give us all matching subsequences of the three patterns in order. In order is important as we need to only consider matches of mul(X,Y)
that don’t have a proceeding don’t()
. This can be done using a boolean variable, isActive
, initially set to true. Whenever we encounter a don’t()
we set it to false and encountering a do()
sets it to true. Only those mul(X,Y)
are used for the final sum when isActive
is set to true.
One thing to keep in mind, as our actual input spans many lines, we need to preserve the value of isActive
across lines. If a line ends on “deactivated” mode, we should only start considering mul(X,Y)
in new line when a do()
is encountered. To make this simpler, we can also merge the input lines into a single string.
You can find the final code here.
And with that, Day 3 of Advent of Code is concluded. Again a small and easy problem which required a bit of regex knowledge!
Did you write a custom parser to do the same? I would love to compare notes and checkout your solution. Reach out to me on Twitter, Bluesky or on e-mail at contact[at]shraddhaag[dot]dev.