Let's solve Advent of Code 2024 Day 3 with regex!

AOC 2024, Day 3: All About Regex


Another day, another Advent of code problem! Day 3 was easy… if you know your regex.


The below is a guest post by the amazing Shraddha Agrawal.


Part 1

Problem

Given an input string, we had to find all instances of mul(X,Y) where X and Y could be between 1 and 3 digits long. For example, the following highlighted sub sequences are valid:

day 2 part 1

The final answer should be the sum of the product of the numbers in each sub-sequence. So in the above case it would look be 161.

day 2 part 1 answer

Solution

The problem statement was shouting to use regex, short for regular expressions. Let’s first create the expression first!

Our capturing group should start with mul(, followed by one to three digits, followed by ,, then another one to three digits and finally should end with ).

Putting it all together the expression looks like this: (mul\([0-9]{1,3},[\d]{1,3}\)). Let’s break this down:

  • mul\( - matches exactly with mul( we use the escape character for ( as it is a special character in regex.
  • [0-9]{1,3} - matches the characters within the range [0,9]. {1,3} matches the previous pattern anywhere between one to three times. Putting it together, this section matches a digit occurring once to maximum thrice.
  • , - matches exactly with ,.
  • [\d]{1,3} - is same as [0,9]{1,3}. Another way to specify a digit is using the escaped character d, ie \d.
  • \) - and finally this last section matches exactly with ). We again use the escape character as )is also a special character in regex.

day 2 part 1 regex visualised

You can find more details about this on https://regex101.com/. Its a handy site to build regex while comparing it against a test string and the explanations are clear and concise.

Now that our expression is ready, time to break out the regexp package. Its a handy collection of commonly used methods for dealing with regex. For our usecase, we want to find all substrings that match the regular express in the input string. FindAllStringSubmatch (ref) does exactly that.

Putting it all together, the function to get the valid subsequences would look like this:

func findValidMuls(input string) [][]string {
	r := regexp.MustCompile(`(mul\()([\d]{1,3})[,]([0-9]{1,3})\)`)
	return r.FindAllStringSubmatch(input, -1)
}

Nice and short, gotta love regex! All that’s left to do is now iterate over this and get the sum of the products of the numbers in each valid subsequence. You can find my code for the same here.

Gotchas in Input

In Advent of Code, usually the test input is very small and the actual user input is a many fold scaled version of the same. Sometimes though, there will be corner cases in the actual input that you might not have encountered in the test input. This is also what makes a few days hard because we might not think of a corner case and waste a tonne of time wondering why the solution is not working against the main input.

Comparing the test input and the given input for this problem, I noticed while the test input is only a single line, the actual input spanned many lines. This means we have to be careful to not miss any input data. This will become even more relevant in part 2, lets get started on that!

Part 2

Problem

This had a slight twist. Now instead of just finding mul(X,Y), we also have to look for activating do() and deactivating don’t() patterns.

day 2 part 2

As we are iterating over the input string, encountering a do() means all following mul(X,Y) are valid. But when we stumble across a don't() func, all mul(X,Y) after it should be ignored till the next do() is encountered. Applying that to the above, the actual valid patterns are:

day 2 part 2 answer

Solution

Again, regex can help us get all matching subsequences, all mul(X,Y), both activated and deactivated, do() and don't().

To the previously created expressions, we just need to add a couple of alternative matching patterns for do() and don’t(). This will result in: (mul\([\d]{1,3},[0-9]{1,3}\))|don't\(\)|do\(\)

Let’s break this down:

  • (mul\([\d]{1,3},[0-9]{1,3}\)) - The first matching group. Same as what we used in part 1.
  • | - Alternation operator equivalent to an OR gate. This helps us define other matching groups.
  • don't\(\) - Matching group for don’t() with both ( and ) escaped.
  • do\(\) - Matching group for do() with both ( and ) escaped.

day 2 part 2 regex visualised

The above pattern when used on the input will give us all matching subsequences of the three patterns in order. In order is important as we need to only consider matches of mul(X,Y) that don’t have a proceeding don’t(). This can be done using a boolean variable, isActive, initially set to true. Whenever we encounter a don’t() we set it to false and encountering a do() sets it to true. Only those mul(X,Y) are used for the final sum when isActive is set to true.

One thing to keep in mind, as our actual input spans many lines, we need to preserve the value of isActive across lines. If a line ends on “deactivated” mode, we should only start considering mul(X,Y) in new line when a do() is encountered. To make this simpler, we can also merge the input lines into a single string.

You can find the final code here.

And with that, Day 3 of Advent of Code is concluded. Again a small and easy problem which required a bit of regex knowledge!

Did you write a custom parser to do the same? I would love to compare notes and checkout your solution. Reach out to me on TwitterBluesky or on e-mail at contact[at]shraddhaag[dot]dev.