Time is hard, Part 2

Apr 30th 2018

As promised, I will continue with why time is hard.  In part 1 I discussed how a simple delay() function can be hard to implement.  Here I will discuss tracking time.

This problem came up in my past when trying to build a simple data logger.  The basic idea was to record some data with a time stamp, say something like battery mAh used.  So I write some data to flash memory, and when memory is full I overwrite the oldest data, i.e. a circular buffer.

So I want to know which sample was recorded last.  The simple thing is to set up a timer, like SysTick, to track the number of seconds.  Then I record the time stamp in each data log entry, so the entry with the newest time stamp (largest time number and a good CRC) is the last recorded sample.

Now let's say that the processor reboots, for example from a watchdog reset.  Then at power up I read the data log, find the last time written, set the seconds counter to this time stamp + 1, and continue.
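The scan and restore described above can be sketched roughly as follows.  This is only a minimal sketch: the record layout, the names log_record_t, record_check(), find_newest() and boot_seconds() are all made up for illustration, and record_check() is a simple stand-in so the example is self-contained.  A real logger would use a proper CRC.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical log record layout; field names and sizes are assumptions. */
typedef struct {
    uint32_t seconds;   /* time stamp in seconds */
    uint32_t mah_used;  /* example payload: battery mAh used */
    uint32_t crc;       /* check value over the two fields above */
} log_record_t;

/* Stand-in check value (NOT a real CRC), used only to keep the sketch
   self-contained. Replace with a real CRC-32 in actual firmware. */
static uint32_t record_check(const log_record_t *r)
{
    return (r->seconds * 2654435761u) ^ r->mah_used ^ 0xA5A5A5A5u;
}

/* Return the index of the valid record with the largest time stamp,
   or -1 if no record passes the check. */
static int find_newest(const log_record_t *log, size_t count)
{
    int newest = -1;
    for (size_t i = 0; i < count; i++) {
        if (log[i].crc != record_check(&log[i])) {
            continue;   /* erased or corrupted slot, skip it */
        }
        if (newest < 0 || log[i].seconds > log[(size_t)newest].seconds) {
            newest = (int)i;
        }
    }
    return newest;
}

/* Seconds value to continue from after a reboot:
   last logged time stamp + 1, or 1 if the log holds no valid record. */
static uint32_t boot_seconds(const log_record_t *log, size_t count)
{
    int newest = find_newest(log, count);
    if (newest < 0) {
        return 1u;
    }
    return log[(size_t)newest].seconds + 1u;
}
```

At power up the firmware would call boot_seconds() once on the flash log and load the result into its seconds counter before the timer interrupt is enabled.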

So that was not so hard was it? 

Now, like normal, management comes in and says, "we need to know what calendar time the data was recorded."  OK, that would have been nice to know in the requirements, but no problem, we will allow you to set the time.

Now things start breaking...  We have had a unit running and it is now at second 1000, and then we set the seconds to epoch time.  Our data log just jumped from 1000 to 1525115496.  What if you were using the counter for a delay loop or to schedule tasks and time jumped like that?  Now the system thinks those first samples were taken in 1970.  What about units that were in the field before the firmware with calendar time, how do we update the firmware and keep the data log?

Even worse, what if the user figured out he set the time wrong and changed the time to 15000?  Now we just went back in time. Time is REALLY HARD...

This 'time travel' problem is very common, and I think time problems are where I have spent most of my time fixing bugs. It actually gets worse on the newer processors with an embedded RTC (real time clock or real time counter). Back in the old days you had an RTC chip which was powered from its own battery, so after a power cycle on the processor the RTC would still be correct.  Now many processors reset the RTC when a reset happens (even a watchdog reset).

So to deal with the problem we need to understand a few things about time:

1. Time is (should be) always increasing

2. Time ideally increases steadily, with no gaps or jumps

The first one conflicts with the user setting the time, since they could set the time in the past.  An easy fix to this problem is to keep time based on the first time the board is powered on. So our time counter starts at one, and increases ALWAYS.  For calendar time we store an offset. This offset can change as needed, but internally we should always use the time since first power on to store everything.

For example our data log would be stored as the time since first power on. When we report the data log however we add in the offset such that the user sees the calendar time.  
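A minimal sketch of the counter plus offset idea is shown here.  The names (uptime_seconds, calendar_offset, set_calendar_time() and so on) are hypothetical, and it assumes calendar time is given as epoch seconds.

```c
#include <stdint.h>

/* Seconds since first power on. Starts at 1 and only ever increments. */
static uint32_t uptime_seconds = 1;

/* Calendar (epoch) seconds minus uptime seconds. This is the only value
   that changes when the user sets the clock. */
static int64_t calendar_offset = 0;

/* Called once per second from the timer interrupt (assumed). */
static void tick_one_second(void)
{
    uptime_seconds++;
}

/* Internal time: use this for the data log, delays and scheduling. */
static uint32_t get_uptime(void)
{
    return uptime_seconds;
}

/* The user sets the calendar time: only the offset moves, so internal
   time never jumps forward or backward. */
static void set_calendar_time(int64_t epoch_now)
{
    calendar_offset = epoch_now - (int64_t)uptime_seconds;
}

/* Convert a stored internal time stamp to calendar time for reporting. */
static int64_t to_calendar(uint32_t uptime_stamp)
{
    return (int64_t)uptime_stamp + calendar_offset;
}
```

With this split, setting the clock to 1525115496 and later "correcting" it to 15000 changes what to_calendar() reports, but get_uptime() keeps counting upward, so nothing that depends on the internal counter goes back in time.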

A steady, gap-free increase of time (the second point) is not always possible, since you might not know how long it has been since the watchdog reset happened or how long a battery change took.  The best we can do is figure out the last time we recorded before the reboot and start from there. That is, make sure time is still increasing.

However this could lead to a problem where the data log is incorrect. Maybe the battery was dead for a day, and now our internal time is off by a day.  So assume the user resets our calendar time after the battery change: the new offset is right for everything after the reboot, but everything before the reboot would be reported off by a day...

Time is Hard... 

A solution to this is to record in the data log a time stamp of when the reboot happened and the time offset value at the time of the reset.  Now we use the new time offset for everything after the reboot and the old value for everything before the reboot.  This also allows multiple reboots...
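One way this could look is sketched here, under some assumptions: the log is read oldest entry first (unwrapping the circular buffer is left out), each reboot marker stores the offset that was in effect before that reboot, and all the names (log_entry_t, ENTRY_REBOOT, entry_calendar_time()) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { ENTRY_DATA, ENTRY_REBOOT } entry_kind_t;

/* Hypothetical log entry. For ENTRY_REBOOT, 'offset' holds the calendar
   offset that applied to everything logged before this reboot. */
typedef struct {
    entry_kind_t kind;
    uint32_t     uptime;  /* internal time since first power on */
    int64_t      offset;  /* only meaningful for ENTRY_REBOOT */
    uint32_t     value;   /* only meaningful for ENTRY_DATA */
} log_entry_t;

/* Calendar time of log[index]. Entries before a reboot marker use the
   offset saved in the next marker after them; entries after the last
   marker use the current offset. Assumes index < count. */
static int64_t entry_calendar_time(const log_entry_t *log, size_t count,
                                   size_t index, int64_t current_offset)
{
    int64_t offset = current_offset;
    for (size_t i = index + 1; i < count; i++) {
        if (log[i].kind == ENTRY_REBOOT) {
            offset = log[i].offset;  /* old offset for this segment */
            break;
        }
    }
    return (int64_t)log[index].uptime + offset;
}
```

Because each segment between reboot markers carries its own offset, any number of reboots can be handled by the same lookup.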

What if we have on demand data logging, for example the user presses a button and we record a data sample? Now if we store data logs with time in seconds it is very easy to end up with two data log entries at the same time.  So you might say use millisecond resolution. This might solve your problem today, but as processors get faster two entries might still land on the same millisecond.

Time is Hard... 

A solution to this problem is to realize that time is hard, and you should not rely on time for indexing data.  Instead, store a data log number in each entry, starting at 1 for the first sample saved and incrementing with each entry. Then even if two entries have the same time stamp, you know which came first.
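As a rough sketch of the idea (sample_t, make_sample() and recorded_before() are names invented for this example, and counter wrap-around is ignored):

```c
#include <stdint.h>

/* Hypothetical sample record with a data log number. */
typedef struct {
    uint32_t sequence;  /* 1 for the first sample, +1 for each one after */
    uint32_t seconds;   /* time stamp, may repeat between samples */
    uint32_t value;     /* the logged data */
} sample_t;

/* Next data log number to hand out. After a reboot this would have to
   be restored from the last entry in the log (assumption, not shown). */
static uint32_t next_sequence = 1;

/* Build a sample and give it the next data log number. */
static sample_t make_sample(uint32_t seconds, uint32_t value)
{
    sample_t s = { next_sequence++, seconds, value };
    return s;
}

/* Nonzero if a was recorded before b. Only the data log number is used,
   never the time stamp. */
static int recorded_before(const sample_t *a, const sample_t *b)
{
    return a->sequence < b->sequence;
}
```

Two button presses within the same second get the same seconds value but different data log numbers, so their order is never in doubt.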

This brings up another subject. Try to avoid using zero for time or external indexes.  For example when powering up a processor never set time to zero, start at 1. This way if you see a time of zero you know something is wrong.  As an example imagine you have a function that gets the seconds:

uint32_t getSeconds();

Now if you start your seconds at 1 (or more) then you can use zero as an error flag. For example you might return zero if the timer/counter is not configured yet. 
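A minimal sketch of that convention might look like this.  timerInit() and timerTick() are hypothetical helper names, and the very rare 32-bit wrap back to zero is not handled here.

```c
#include <stdbool.h>
#include <stdint.h>

static bool     timer_configured = false;
static uint32_t seconds_counter  = 0;

/* Hypothetical init: start counting at 1, never at 0. */
static void timerInit(void)
{
    seconds_counter  = 1;
    timer_configured = true;
}

/* Hypothetical once-per-second tick from the timer interrupt. */
static void timerTick(void)
{
    if (timer_configured) {
        seconds_counter++;
    }
}

/* Returns the seconds count, or 0 if the timer is not configured yet. */
uint32_t getSeconds(void)
{
    return timer_configured ? seconds_counter : 0u;
}
```

A caller can then treat a return value of zero as "time is not valid yet" instead of logging a bogus time stamp.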

Time is Hard... 

All the above problems with time have been bugs in firmware I have created at some point in time.  I personally feel like time is the hardest problem I have ever faced with a microcontroller.  I also fully realize that although I have found many problems, I will be finding more bugs with time.  However, I will personally be happy if I can create new bugs and not recreate my old ones.

Usually when someone asks me about creating some embedded firmware I ask them two questions:

1. How do you plan to do field firmware updates?

2. How do you plan to handle time tracking?

You can tell the people who have done embedded products before based on their answers to these two questions.  The ones that have done embedded programming before know that time is hard and firmware updates are a must.