
Align Timestamps Using Python

Very often I have to deal with logs that come with timestamps. Sometimes I collect logs from two different tools at the same time, yet each tool has its own timestamp style, sometimes even in a different time zone. That makes time analysis difficult, because I have to map the packets onto a common timestamp manually, in my head. If I also need to calculate time differences, it becomes a nightmare.

So, is there a way to convert the timestamp formats so that we can easily align them?

Well, the Python libraries datetime and pytz come to the rescue!
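As a minimal illustration of what the two libraries do together, the sketch below treats a naive datetime as UTC with pytz and converts it to US/Pacific. The input is the UTC time printed in LOG1 below; the function name `utc_to_pacific` is hypothetical.

```python
from datetime import datetime

import pytz


def utc_to_pacific(naive_utc: datetime) -> datetime:
    """Treat a naive datetime as UTC and convert it to US/Pacific time."""
    aware_utc = pytz.utc.localize(naive_utc)  # attach the UTC zone
    return aware_utc.astimezone(pytz.timezone("US/Pacific"))


ts = datetime(2016, 2, 4, 3, 40, 56, 293000)
print(utc_to_pacific(ts).strftime("%Y-%m-%d %H:%M:%S.%f %Z"))
# prints 2016-02-03 19:40:56.293000 PST
```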

Below are two example log excerpts over the same period of time:

LOG1 – from the phone

[1809:573] ### Start [Wed Feb  3 19:06:07 2016] ###
[1809:573] Current time is 2016-02-04 03:40:56.293 UTC

LOG2 – from tcpdump

No. Time Timestamp Source Destination Protocol Length Info
15 16.311732 Feb 3, 2016 19:40:56.337615000 PST fd18:58d7:** fd18:58d7:** DNS 99 Standard query 0xc933 A stun.l.google.com
19 16.367842 Feb 3, 2016 19:40:56.393725000 PST 192.168.3.* 74.125.142.* STUN 64 Binding Request
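LOG2 already carries wall-clock timestamps, but two details keep `datetime.strptime` from reading them directly: the fraction has nine digits (nanoseconds) while `%f` accepts at most six, and `%Z` matches only a few names (UTC, GMT and the local machine's zone names) and still returns a naive datetime. A minimal sketch, assuming the `Mon D, YYYY HH:MM:SS.fffffffff TZ` layout shown above, an English locale for `%b`, and a small hand-written abbreviation table (`TZ_OFFSETS`, `CAPTURE_TS` and `parse_capture_ts` are hypothetical names), truncates the fraction to microseconds and attaches a fixed UTC offset:

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical abbreviation table: maps zone names seen in the capture
# export to fixed UTC offsets. Extend it for other zones as needed.
TZ_OFFSETS = {
    "UTC": timedelta(0),
    "PST": timedelta(hours=-8),
    "PDT": timedelta(hours=-7),
}

# Matches e.g. "Feb 3, 2016 19:40:56.337615000 PST"
CAPTURE_TS = re.compile(
    r"([A-Z][a-z]{2} +\d{1,2}, \d{4} \d{2}:\d{2}:\d{2})\.(\d+) ([A-Z]+)"
)


def parse_capture_ts(text: str) -> datetime:
    """Parse a capture timestamp into a timezone-aware datetime."""
    m = CAPTURE_TS.search(text)
    if m is None:
        raise ValueError("no capture timestamp in %r" % text)
    base, frac, zone = m.groups()
    # %f takes at most 6 digits: pad short fractions, drop the nanoseconds
    frac = (frac + "000000")[:6]
    naive = datetime.strptime(base + "." + frac, "%b %d, %Y %H:%M:%S.%f")
    return naive.replace(tzinfo=timezone(TZ_OFFSETS[zone], zone))


row = ("15 16.311732 Feb 3, 2016 19:40:56.337615000 PST fd18:58d7:** "
       "fd18:58d7:** DNS 99 Standard query 0xc933 A stun.l.google.com")
print(parse_capture_ts(row).astimezone(timezone.utc).isoformat())
# prints 2016-02-04T03:40:56.337615+00:00
```

A fixed offset is enough here because the export spells out `PST` on every row; for logs that only give local time without an abbreviation, a real zone database (such as pytz) is needed to resolve daylight saving time.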

Using the following code snippet, we can convert LOG1's relative timestamps into PST wall-clock time, which lines them up with LOG2!

# Filename: ts_remap.py

Remapping the timestamp


import re
from datetime import datetime, timedelta
from pytz import timezone

if __name__ == "__main__":

    file_in = 'rawlog.log'
    file_out = 'tsremap.log'
    lines_out = []

    # Read all lines; the with-block closes the file afterwards
    with open(file_in, 'r') as f:
        lines = f.readlines()

    # Reference pair, both taken from LOG1: the relative time [1809:573]
    # and the absolute UTC clock time printed on that same line
    rel_sec = 1809.573
    clocktime = "2016-02-04 03:40:56.293 UTC"
    time_ref = datetime.strptime(clocktime, '%Y-%m-%d %H:%M:%S.%f %Z')  # naive, in UTC

    for line in lines:
        # Skip blank lines
        if not line.strip():
            continue

        m = re.match(r"^\[(\d+):(\d+)\](.*)", line)
        if m is None:
            # Keep lines without a [sec:ms] prefix unchanged
            lines_out.append(line)
            continue

        sec, ms, content = m.groups()
        cur_sec = float(sec + "." + ms)
        delta_sec = cur_sec - rel_sec

        # Shift the reference time by the elapsed seconds, then convert UTC -> PST
        ts_naive = time_ref + timedelta(seconds=delta_sec)
        ts_utc = timezone('UTC').localize(ts_naive)
        ts_pst = ts_utc.astimezone(timezone('US/Pacific'))
        # %f gives microseconds; drop the last 3 digits to keep milliseconds
        lines_out.append("[%s] %s\n" % (ts_pst.strftime("%H:%M:%S.%f")[:-3], content))

    # 'w' overwrites any previous output instead of appending to it
    with open(file_out, 'w') as f:
        f.writelines(lines_out)
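The script hardcodes the reference pair `rel_sec` and `clocktime`. Since LOG1 prints its relative counter and the absolute UTC time on the same line, the pair could instead be read from the log itself. A minimal sketch, assuming the `Current time is YYYY-MM-DD HH:MM:SS.mmm UTC` wording shown in LOG1 (`REF_LINE` and `find_reference` are hypothetical names):

```python
import re
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Matches e.g. "[1809:573] Current time is 2016-02-04 03:40:56.293 UTC"
REF_LINE = re.compile(
    r"^\[(\d+):(\d+)\].*Current time is "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) UTC"
)


def find_reference(lines: Iterable[str]) -> Optional[Tuple[float, datetime]]:
    """Return (relative seconds, naive UTC datetime) from the first reference line."""
    for line in lines:
        m = REF_LINE.match(line)
        if m:
            sec, ms, clock = m.groups()
            rel_sec = float(sec + "." + ms)  # same sec:ms convention as the script
            time_ref = datetime.strptime(clock, "%Y-%m-%d %H:%M:%S.%f")
            return rel_sec, time_ref
    return None  # no reference line found


log1 = [
    "[1809:573] ### Start [Wed Feb  3 19:06:07 2016] ###\n",
    "[1809:573] Current time is 2016-02-04 03:40:56.293 UTC\n",
]
print(find_reference(log1))
# prints (1809.573, datetime.datetime(2016, 2, 4, 3, 40, 56, 293000))
```

Returning `None` rather than raising lets the caller decide whether a log without a reference line should fall back to hardcoded values.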

LOG1 – timestamp converted

[19:40:56.293]  ### Start [Wed Feb  3 19:06:07 2016] ###
[19:40:56.293]  Current time is 2016-02-04 03:40:56.293 UTC
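Once both logs are on the same clock, the time differences that were painful to work out by hand become plain datetime subtraction. As a sketch using values copied from the excerpts above, here is the gap between LOG1's reference line and the DNS query captured in LOG2; the fixed `PST` offset and the variable names are illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone

PST = timezone(timedelta(hours=-8), "PST")  # fixed offset; PST is UTC-8

# LOG1 reference line: "2016-02-04 03:40:56.293 UTC"
log1_ts = datetime(2016, 2, 4, 3, 40, 56, 293000, tzinfo=timezone.utc)
# LOG2 packet 15 (DNS query): "Feb 3, 2016 19:40:56.337615000 PST"
log2_ts = datetime(2016, 2, 3, 19, 40, 56, 337615, tzinfo=PST)

# Aware datetimes in different zones subtract correctly
gap = log2_ts - log1_ts
print("%.3f ms" % (gap.total_seconds() * 1000))
# prints 44.615 ms
```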

Notice that we also need a small trick with the Python regex module re to split each LOG1 line into its [sec:ms] prefix and the message that follows. Enjoy.

This work by Zengwen Yuan is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.