10 Dec 2017

A fun integer overflow bug investigation

Sometimes we run into weird bugs or buggish situations that are frustrating but are actually fun to solve. I ran into a similar problem recently. A good old NumberFormatException in the open-source analytics dashboard service: Metabase.

We use Metabase to create product metric dashboards. Recently, they allowed caching long running queries and it allows you to set the max entry size as a MAX CACHE ENTRY SIZE setting in kilobytes. This is where the fun starts!

The classic unit conversion mistake

I wanted to set the cache size to 2 Gb because we have a lot of dashboards and some have really large queries that we don’t really need refreshed every time they run. However, instead of writing the value in kb, I absentmindedly entered the value in bytes inviting an inevitable NumberFormatException: Failed to parse setting: For input string: 2147483648 The UI didn’t complain and happily applied the value to the backend resulting in a blank page when you open settings the next time:

The investigation

Equipped with a blank settings page and a cryptic error message: A Jetty 500 response with the body - For input string: "2147483648", I set out to hunt the bug. I had changed the setting a few weeks ago and since the UI didn’t complain, I promptly closed the admin panel and forgot about it till I needed to tweak some settings later so this was pretty mysterious symptom.

The first thing I tell junior engineers when they are faced with debugging code is to not be afraid of reading the stack-trace and just taking a moment to connect the dots. Just taking the time to try and read the stack-trace + logs from code that you have never seen seems like a daunting task initially but with practice it gets easier. So we hunt for the stack-trace next because the http response isn’t super helpful but we have the first clue: the number 2147483648.

Next, let’s look at the logs from Metabase:

12-10 21:52:58 WARN server.HttpChannel :: /api/setting
java.lang.NumberFormatException: For input string: "2147483648"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:587)
	at java.lang.Integer.parseInt(Integer.java:615)
	at metabase.models.setting$get_integer.invokeStatic(setting.clj:162)
	at metabase.models.setting$get_integer.invoke(setting.clj:158)
	... <more logs> ...

Ok, so that makes sense: 2147483648 is exactly 2^31 which is exactly 1 above the max value of a signed 32 bit integer. However, we still don’t know where this is set and how it can be changed because the UI itself isn’t loading. So we head over to their github repo and dig around metabase.models.setting$get_integer. It seems like something is trying to read this value and breaking but how do I get around to un-setting this value?

Since the value was the same every time and coming from the settings module, there must be some representation of this value in a database somewhere so that’s where we head to next. Sure enough, there is a table called Settings and a value in that table is set to 2147483648. After manually updating the value in the database and restarting the server, we were back in action!