Thoughts about time - part one

January 10, 2022

Not all clocks are equal!

Talk about working with “time” with software engineers and you’re pretty much guaranteed someone is going to post a meme. I was recently talking with a co-worker about managing “simulated time” for a project and we ended up discussing why handling time is so complicated. This post attempts to (non-exhaustively) highlight how “time” works for a computer and some fun gotchas software engineers might have to deal with.

“Time” in double quotes

Without going into quantum physics to define time, let’s talk about how to measure time in a way that is useful for computer science.

Ideally, we’d want to embed Atomic Clocks - the gold standard for measuring time accessible to humans - in every computer. But this is not feasible in terms of cost, and I am not familiar enough with the manufacturing challenges. A quick Google search does suggest there are some manufacturers selling something like this.

Without access to embedded Atomic Clocks, we want something that can generate a consistent “tick” signal. What properties would we want in this signal?

  • Always spaced apart by the same duration
  • High resolution (at least more than what you might need)
  • Consistent under changing temperature, voltage and system load
  • Keeps time even when the computer is switched off

This is solved by a quartz crystal embedded in the motherboard circuit - called an RTC (Real Time Clock). The main purpose of the RTC is to keep time via a small battery even when the computer isn’t turned on. Let’s focus on how the OS handles time.

Time keeping in Linux

When the computer boots, it reads the time from the RTC and then the kernel syncs its own “software clock” based on the RTC.

At this point, we have 3 streams of “time” already:

  • The real time, i.e. the human time, a.k.a. wall time - which is itself grounded to some atomic clock somewhere (hopefully), at least on modern smartphones and computers
  • The RTC time - from the quartz crystal
  • The Software clock time - the value returned by system calls

And they can all drift from each other!
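To see two of these streams side by side, here is a rough, Linux-only Go sketch that reads the hardware RTC directly and compares it with the kernel’s software clock. It assumes the golang.org/x/sys/unix package and read access to /dev/rtc0 (which usually requires root) - an illustration, not production code:

```go
// Rough, Linux-only sketch: read the hardware RTC and compare it with the
// kernel's software clock. Assumes golang.org/x/sys/unix and read access to
// /dev/rtc0 (usually root-only).
package main

import (
	"fmt"
	"time"

	"golang.org/x/sys/unix"
)

func main() {
	fd, err := unix.Open("/dev/rtc0", unix.O_RDONLY, 0)
	if err != nil {
		panic(err)
	}
	defer unix.Close(fd)

	rt, err := unix.IoctlGetRTCTime(fd) // issues the RTC_RD_TIME ioctl
	if err != nil {
		panic(err)
	}

	// The RTC reports a broken-down time: Year is years since 1900, Mon is 0-11.
	// The RTC is typically kept in UTC.
	rtcTime := time.Date(int(rt.Year)+1900, time.Month(rt.Mon+1), int(rt.Mday),
		int(rt.Hour), int(rt.Min), int(rt.Sec), 0, time.UTC)

	fmt.Println("RTC time:     ", rtcTime)
	fmt.Println("Software time:", time.Now().UTC())
}
```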

Linux has system calls (such as settimeofday and adjtimex) that can be used to adjust these clocks to compensate for drift. And since the drift can be either forward or backward, things get complicated. More about that in a bit.
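For the curious, here is a minimal sketch of peeking at the kernel’s clock-discipline state via the adjtimex(2) system call. It assumes golang.org/x/sys/unix; with Modes left at zero the call is a read-only query, so it doesn’t change anything:

```go
// Query (not modify) the kernel's clock-discipline state via adjtimex(2).
// Modes == 0 makes this read-only, so no special privileges are needed.
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	var tx unix.Timex // zero Modes => read current values, change nothing
	state, err := unix.Adjtimex(&tx)
	if err != nil {
		panic(err)
	}

	// state reports the clock state (e.g. unix.TIME_OK); the struct carries
	// the offset and frequency correction currently applied by NTP/chrony.
	fmt.Println("clock state:      ", state)
	fmt.Println("offset:           ", tx.Offset)   // current time offset
	fmt.Println("frequency adjust: ", tx.Freq)     // scaled ppm correction
	fmt.Println("estimated error:  ", tx.Esterror) // microseconds
}
```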

How do we solve this? What if we had a “central” time source we all agreed upon and synced everything to it? Well, that exists, and it utilizes the Atomic Clocks we talked about earlier. There is an international consortium of labs that averages the time kept by the 450+ atomic clocks all over the world: https://en.wikipedia.org/wiki/International_Atomic_Time

Most modern, internet-connected systems use NTP - the Network Time Protocol - which is designed to let computers sync time over a network: they query time servers that are ultimately traceable to these atomic clocks and adjust their own clocks accordingly.
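As a quick illustration (using the third-party github.com/beevik/ntp client - just one of many ways to do this), here is a sketch of asking a public NTP pool server for the time and eyeballing the local clock’s offset:

```go
// Minimal sketch: ask an NTP server for the time and compare it with the
// local clock. Uses the third-party github.com/beevik/ntp client.
package main

import (
	"fmt"
	"time"

	"github.com/beevik/ntp"
)

func main() {
	ntpTime, err := ntp.Time("pool.ntp.org") // query a public NTP pool server
	if err != nil {
		panic(err)
	}

	local := time.Now()
	fmt.Println("NTP time:  ", ntpTime)
	fmt.Println("Local time:", local)
	fmt.Println("Offset:    ", ntpTime.Sub(local)) // rough local-clock drift
}
```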

Drift: Backwards/Forwards

This is a fun topic to think about: how should the OS adjust the clocks when it’s told by NTP that its timekeeping has been shoddy? Ideally, you just tell the system what the time should be and the OS simply sets the new time value. But it isn’t that straightforward :)

I had a lot of fun thinking about these timing issues while developing my open-source cancelable job scheduler queue: ChronoMQ

Let’s tackle the backwards drift first…

If the local clock is falling behind, it can be fast-forwarded with relative ease so that it catches up with the “real” time. But what if you have written a program that fires an alarm at 6:00:00pm, and your computer adjusts the clock from 5:59:55pm straight to 6:00:25pm? The clock never reads 6:00:00pm, so the alarm may never fire.

Time keeping systems like NTP have to take care of adjusting the time in small enough increments so that something like this doesn’t happen - NTP slews the clock in sub-second increments when adjusting it forward and, conversely, slows the clock down if the local time is ahead.
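To make the difference concrete, here is a toy Go model (not how the kernel actually does it) contrasting “stepping” a clock, where the reading jumps, with “slewing” it, where the correction is spread over many ticks:

```go
// Toy model: contrast "stepping" a clock (jump straight to the correct time)
// with "slewing" it (spread the correction over many ticks so the reading
// never jumps or runs backwards).
package main

import (
	"fmt"
	"time"
)

func main() {
	offset := 3 * time.Second // our clock is 3s behind the reference

	// Stepping: apply the whole correction at once - the reading visibly jumps.
	stepped := time.Now().Add(offset)
	fmt.Println("stepped clock jumps to:", stepped)

	// Slewing: spread the correction over, say, 10 ticks. Each tick the clock
	// advances slightly more than real time until the offset is paid off.
	const ticks = 10
	perTick := offset / ticks
	slewed := time.Now()
	for i := 0; i < ticks; i++ {
		slewed = slewed.Add(time.Second + perTick) // 1s of real time + a bit extra
		fmt.Printf("tick %2d: slewed clock = %v\n", i+1, slewed)
	}
}
```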

The forwards drift raises some very interesting problems.

What if you record a transaction with timestamp T1, then a backward clock correction happens, and then you record a second transaction with timestamp T2? Even though the second transaction happened later, its timestamp can end up earlier - i.e. T2 < T1. This might break applications like banking, where, for example, a withdrawal must happen after a sufficient deposit.
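Here is a tiny, contrived Go illustration of that ordering problem. The backward step is simulated with a plain Add(-30s), and Round(0) strips Go’s monotonic reading so the comparison really uses the wall-clock values:

```go
// Toy illustration: if the wall clock is stepped backwards between two events,
// comparing their wall-clock timestamps puts them in the wrong order.
package main

import (
	"fmt"
	"time"
)

func main() {
	deposit := time.Now().Round(0) // T1, wall clock only (Round(0) drops the monotonic reading)

	// Pretend NTP stepped the wall clock back by 30 seconds right here.
	clockStep := -30 * time.Second

	// T2: recorded later in real time, but stamped with the corrected (earlier) clock.
	withdrawal := time.Now().Round(0).Add(clockStep)

	fmt.Println("withdrawal after deposit?", withdrawal.After(deposit)) // false - looks out of order!
}
```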

Whacky Olympics - Sync Race

Imagine that you are running a race. Usually the objective is to be first of course, but what if we changed the rules? In our new race, let’s call it a Sync Race, the objective is to cross the finish line exactly at the same time as (technically, the majority of) other racers.

Since it’s the Whacky Olympics, let’s give you a superpower - you control your position by turning a dial on your watch.

Here are the rules for operating your watch:

  • The clock ticks normally, and with every tick you take a step forward
  • You can twist the dial anti-/clockwise to set the time. This makes you take a big step forward/backward proportional to the twist
  • If you keep pressing the dial, the clock stays at the same time and you stay in the same position, practically stuck in molasses

Note: This task of correcting the drift in NTP is called Clock Discipline

If you end up running too fast, you’d have two options:

Reverse Run - Go backwards to reduce the gap

This is a decent solution: twist the dial backwards! You can run in the reverse direction for a bit and, when you catch up with the group, start running with them again. If you keep your cadence in sync with the rest of the group, you have a chance of winning our Sync Race! However, as with the banking example above, there are many distributed-systems applications where you cannot have time run backwards - even if the clock was deliberately set far in the future by an attacker.

This was a situation I encountered while developing my wheels-of-time approach for ChronoMQ. If time goes backwards, a lot of guarantees that expect time to grow linearly fall apart.

Slow Run - Slow down and let the rest of the group catch up

This is a better solution than the naive Reverse Run approach for our distributed systems, and for any application that relies on a monotonic increase in time. Press down on the dial in short bursts and you “slow down”, letting the rest of the group catch up. This has the benefit of never reversing. Now, if you think of the banking example, you can be guaranteed that the withdrawal happens after the deposit.

The Golang stdlib has a built-in monotonic clock, which was really handy with ChronoMQ. An excerpt from the Go docs:

Monotonic Clocks: Operating systems provide both a “wall clock,” which is subject to changes for clock synchronization, and a “monotonic clock,” which is not. The general rule is that the wall clock is for telling time and the monotonic clock is for measuring time. Rather than split the API, in this package the Time returned by time.Now contains both a wall clock reading and a monotonic clock reading; later time-telling operations use the wall clock reading, but later time-measuring operations, specifically comparisons and subtractions, use the monotonic clock reading.
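A small demo of what that dual reading buys you - elapsed-time measurements keep working even if the wall clock is stepped while the program sleeps:

```go
// time.Now() carries both a wall-clock and a monotonic-clock reading.
// Measurements like time.Since use the monotonic part, so they stay correct
// even if NTP steps the wall clock in between.
package main

import (
	"fmt"
	"time"
)

func main() {
	start := time.Now() // wall + monotonic reading

	time.Sleep(100 * time.Millisecond) // even if the wall clock is stepped here...

	fmt.Println("elapsed:", time.Since(start)) // ...this still uses the monotonic reading

	// Round(0) strips the monotonic reading; String() shows the difference
	// (the "m=+..." suffix disappears).
	fmt.Println("with monotonic:   ", start)
	fmt.Println("without monotonic:", start.Round(0))
}
```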

Chrony

I prefer to use Chrony on my servers as it provides a really good implementation of Time Smoothing - which is like our Slow Run, pushing down on the watch dial.

Multi-Server/Distributed time keeping

So far, we have only talked about what happens on a single machine. Imagine what happens when, in the banking example, the deposit and the withdrawal happen on different servers.

For that, we can use logical clocks such as vector clocks, which build on the seminal paper by Leslie Lamport. This is a very cool space to explore on its own. I will cover it in a later, time-part-two post.

Conclusion

Timekeeping in computing is a complex and multifaceted challenge, encompassing everything from hardware constraints to software intricacies. By understanding how time is managed across different systems and the potential issues that arise, we can write better code that ensures reliable and consistent system behavior.
