Name conventions for temporal columns

2023-12-08

Data engineering, Data cleansing, Best practices

For goodness' sake, can we please all agree on some naming conventions for temporal (date, time, datetime etc.) columns?

1 The problem

Suppose in some tables, you see this:

DT          WEEKDAY  MTH  X
2024-01-01  1        1    42
2024-01-02  2        1    42
2024-01-03  3        1    42
2024-01-04  4        1    42
2024-01-05  5        1    42

And in other tables, you see this:

DATE        WEEKDAY    MNTH     X
2024-01-01  Monday     January  42
2024-01-02  Tuesday    January  42
2024-01-03  Wednesday  January  42
2024-01-04  Thursday   January  42
2024-01-05  Friday     January  42

Now, you can't really call either of these incorrect. Through the lens of a single table, anyone could argue that 'date', 'weekday' and 'mnth' are acceptable names for a date, a day of the week and a month of the year. And anyone could argue that a weekday can be represented as either a string ('Monday') or an integer (1), so both are acceptable in a column called 'weekday'.

However, it is inconsistent. There are inconsistencies in name and inconsistencies in meaning.
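To make the two kinds of inconsistency concrete, here is a minimal Python sketch that compares the two example tables. It assumes the column names are DT/WEEKDAY/MTH/X and DATE/WEEKDAY/MNTH/X, and it uses the Python type of each value as a rough stand-in for "meaning". The helpers `column_types` and `compare_schemas` are hypothetical names made up for this illustration, not part of any library.

```python
# Hypothetical miniature versions of the two example tables (first two rows only).
table_a = [
    {"DT": "2024-01-01", "WEEKDAY": 1, "MTH": 1, "X": 42},
    {"DT": "2024-01-02", "WEEKDAY": 2, "MTH": 1, "X": 42},
]
table_b = [
    {"DATE": "2024-01-01", "WEEKDAY": "Monday", "MNTH": "January", "X": 42},
    {"DATE": "2024-01-02", "WEEKDAY": "Tuesday", "MNTH": "January", "X": 42},
]


def column_types(rows):
    """Map each column name to the set of Python type names seen in it."""
    types = {}
    for row in rows:
        for name, value in row.items():
            types.setdefault(name, set()).add(type(value).__name__)
    return types


def compare_schemas(rows_a, rows_b):
    """Return (names found in only one table, shared names whose types differ)."""
    types_a, types_b = column_types(rows_a), column_types(rows_b)
    # Inconsistency in name: a column exists under this name in only one table.
    name_mismatches = sorted(set(types_a) ^ set(types_b))
    # Inconsistency in meaning (approximated by type): same name, different types.
    meaning_mismatches = sorted(
        name
        for name in set(types_a) & set(types_b)
        if types_a[name] != types_b[name]
    )
    return name_mismatches, meaning_mismatches


name_mismatches, meaning_mismatches = compare_schemas(table_a, table_b)
print("Inconsistent names:", name_mismatches)
print("Inconsistent meaning:", meaning_mismatches)
```

A type comparison is only a proxy: two integer columns can still disagree on meaning (for example, whether weekday 1 is Monday or Sunday), so a check like this catches the obvious cases rather than all of them.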

Me
About

Hi, I'm Tim. I'm an experienced technical data science leader with a passion for delivering value to businesses using data science, machine learning and artificial intelligence. In my 8+ years of experience as a data scientist, I have acquired exposure across a diverse range of industries. I have worked for two ASX 200 companies in the energy and broadcast media industries, and have acquired international exposure at a top-tier financial technology and consulting firm in the UK and Singapore.
