Align Timestamps Using Python
04 Feb 2016
Very often I have to deal with logs that come with timestamps. In some cases I collect logs from two different tools at the same time, yet each uses its own timestamp style, or is even in a different locale. That makes timing analysis difficult, because I have to map the entries onto a common timestamp (manually) in my head. If time difference calculations are involved, it becomes a total nightmare.
So, is there a way to convert the timestamp formats so that we can easily align them?
Well, the Python packages datetime and pytz come to the rescue!
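As a minimal sketch of how the two packages work together: datetime parses the text into a naive object, and pytz attaches UTC and converts it to another zone. The timestamp string here is only a sample value, and US/Pacific is assumed to be the target zone.

```python
from datetime import datetime
from pytz import timezone, utc

# Parse a UTC clock time (sample value); the result is naive, with no tzinfo yet
naive = datetime.strptime("2016-02-04 03:40:56.293", "%Y-%m-%d %H:%M:%S.%f")

# Attach UTC, then convert to Pacific time
aware_utc = utc.localize(naive)
pacific = aware_utc.astimezone(timezone("US/Pacific"))

# %f prints microseconds; dropping the last three digits leaves milliseconds
print(pacific.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3] + " " + pacific.tzname())
# 2016-02-03 19:40:56.293 PST
```

Localizing first and converting second matters: a naive datetime carries no zone, so astimezone needs to be told what the value means before it can shift it.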
Below are two example log excerpts over the same period of time:
LOG1 – from the phone
[1809:573] ### Start [Wed Feb 3 19:06:07 2016] ###
[1809:573] Current time is 2016-02-04 03:40:56.293 UTC
...
LOG2 – from tcpdump
No. | Time | Timestamp | Source | Destination | Protocol | Length | Info |
---|---|---|---|---|---|---|---|
15 | 16.311732 | Feb 3, 2016 19:40:56.337615000 PST | fd18:58d7:** | fd18:58d7:** | DNS | 99 | Standard query 0xc933 A stun.l.google.com |
19 | 16.367842 | Feb 3, 2016 19:40:56.393725000 PST | 192.168.3.* | 74.125.142.* | STUN | 64 | Binding Request |
Then, using the following code snippet, we can easily convert LOG1’s relative [sec:ms] timestamps to standard wall-clock timestamps in PST!
#!/usr/bin/env python
# Filename: ts_remap.py
'''
Remap relative [sec:ms] log timestamps to wall-clock time in US/Pacific
'''
import math
import re
from datetime import datetime, timedelta
from pytz import timezone

if __name__ == "__main__":
    file_in = 'rawlog.log'
    file_out = 'tsremap.log'
    lines_out = []
    # The with block closes the file once it has been read
    with open(file_in, 'r') as f:
        lines = f.readlines()
    # Reference pair, both taken from the "Current time is ..." line of LOG1
    rel_sec = 1809.573  # relative timestamp [1809:573]
    clocktime = "2016-02-04 03:40:56.293 UTC"  # wall-clock time on that line
    time_ref = datetime.strptime(clocktime, '%Y-%m-%d %H:%M:%S.%f %Z')  # naive datetime
    for line in lines:
        if line == "" or line == "\n":
            continue
        m = re.match(r"^\[(\d+):(\d+)\](.*)", line)
        if m is None:
            # No [sec:ms] prefix: keep the line unchanged
            lines_out.append(line)
            continue
        sec, ms, content = m.groups()
        # The second field is treated as milliseconds
        cur_sec = int(sec) + int(ms) / 1000.0
        # math.modf returns (fractional part, integer part) of the offset
        sec_dec, sec_int = math.modf(cur_sec - rel_sec)
        # round() rather than int() avoids losing 1 ms to float error
        ts_naive = time_ref + timedelta(seconds=sec_int, milliseconds=round(sec_dec * 1000))
        ts_utc = timezone('UTC').localize(ts_naive)
        ts_pst = ts_utc.astimezone(timezone('US/Pacific'))
        # %f prints microseconds; drop the last three digits to keep milliseconds
        lines_out.append("[%s] %s\n" % (ts_pst.strftime("%H:%M:%S.%f")[:-3], content.strip()))
    # 'w' truncates any existing output file before writing
    with open(file_out, 'w') as f:
        f.writelines(lines_out)
LOG1 – timestamp converted
[19:40:56.293] ### Start [Wed Feb 3 19:06:07 2016] ###
[19:40:56.293] Current time is 2016-02-04 03:40:56.293 UTC
...
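With LOG1 on the same clock as LOG2, the time difference mentioned at the start reduces to a subtraction. The sketch below is one hedged way to do it, not part of the original script: parse_capture_time is a hypothetical helper, it assumes both logs are already in Pacific time (so the PST suffix is simply dropped), and it assumes English month abbreviations. The LOG2 timestamp carries nine fractional digits, while strptime's %f accepts at most six, so the string is trimmed first. The date for the LOG1 value is supplied by hand, because the converted log only prints the time of day.

```python
import re
from datetime import datetime

def parse_capture_time(text):
    """Parse e.g. 'Feb 3, 2016 19:40:56.337615000 PST' into a naive datetime."""
    # Keep six fractional digits and drop the remaining digits and the zone suffix
    m = re.match(r"^(\w{3}\s+\d+, \d{4} \d{2}:\d{2}:\d{2}\.\d{6})\d*\s+\w+$", text)
    if m is None:
        raise ValueError("unrecognized timestamp: %r" % text)
    return datetime.strptime(m.group(1), "%b %d, %Y %H:%M:%S.%f")

# Converted LOG1 reference time (19:40:56.293), with the date written out
log1_time = datetime(2016, 2, 3, 19, 40, 56, 293000)
# DNS query, packet No. 15 in LOG2
dns_time = parse_capture_time("Feb 3, 2016 19:40:56.337615000 PST")

delta = dns_time - log1_time
print(delta.total_seconds())  # 0.044615
```

So the DNS query was captured roughly 45 ms after the phone logged its "Current time" line, assuming the two clocks are synchronized.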
Notice that the script also plays a few tricks with the Python regex package re and the math package math. Enjoy.
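For readers who have not used these two helpers, here is a small standalone illustration of how they behave. The input line is copied from LOG1, and the numbers passed to math.modf are arbitrary sample values.

```python
import math
import re

line = "[1809:573] Current time is 2016-02-04 03:40:56.293 UTC"

# Three capture groups: seconds, milliseconds, and the rest of the line
sec, ms, content = re.match(r"^\[(\d+):(\d+)\](.*)", line).groups()
print("%s %s" % (sec, ms))  # 1809 573

# math.modf returns (fractional part, integer part), both as floats
frac, whole = math.modf(2.75)
print("%s %s" % (frac, whole))  # 0.75 2.0

# Both parts keep the sign, which matters for lines logged before the reference
neg_frac, neg_whole = math.modf(-1.25)
print("%s %s" % (neg_frac, neg_whole))  # -0.25 -1.0
```

Note the order of the returned pair: the fractional part comes first, which is why the script unpacks it as sec_dec, sec_int.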