Your File Name Convention

I am sure everyone has, at least once, opened their computer and been unable to find the "Untitled" file they urgently needed. The cause is a file name that carried no meaning, and the usual excuses are familiar: the content seems obvious as it is, there is no time to think about it, I will rename it later, and so on (pick whichever applies). In everyday life we can still cope with this, but in data processing it is a serious mistake.

What can we do? The answer is a File Name Convention: a set of rules that we apply whenever we name a file. Ideally, the rules are defined and agreed upon by the entire team. This provides clarity and prevents avoidable mistakes, such as discovering at the end of the day that you worked with the wrong set of data. File names should be descriptive and can include any important information, but it is good practice not to exceed 25 characters. To get started, consider using these elements:

  • project name (or acronym)
  • researcher name (or initials)
  • location (if needed)
  • condition
  • date in ISO 8601 basic format (YYYYMMDD), which keeps your files in chronological order when they are sorted by name
  • version number with leading zeros, so that files sort in sequential order: use “v001, v002, v003, ...” instead of “v1, v2, v3, ...”
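
As a rough illustration, the elements above could be assembled into a file name with a few lines of Python. This is only a sketch: the helper `build_file_name`, the constant `MAX_STEM_LENGTH`, the sample values ("SLS", "MD") and the choice of underscore as the separator are all assumptions made for the example, as is the decision to apply the 25-character limit to the name without its extension.

```python
from datetime import date

# Limit suggested in the text; assumed here to exclude the file extension.
MAX_STEM_LENGTH = 25


def build_file_name(project, initials, day, version, extension,
                    location=None, condition=None):
    """Join the naming elements with underscores (hypothetical helper)."""
    parts = [project, initials]
    if location:
        parts.append(location)
    if condition:
        parts.append(condition)
    parts.append(day.strftime("%Y%m%d"))   # ISO 8601 basic format, YYYYMMDD
    parts.append(f"v{version:03d}")        # leading zeros: v001, v002, ...
    stem = "_".join(parts)
    if len(stem) > MAX_STEM_LENGTH:
        raise ValueError(f"{stem!r} is longer than {MAX_STEM_LENGTH} characters")
    return f"{stem}.{extension}"


# Versions 10, 2 and 1 still sort in sequential order thanks to the zero padding.
names = [build_file_name("SLS", "MD", date(2021, 3, 15), v, "csv") for v in (10, 2, 1)]
print(sorted(names))
```

Raising an error when the name is too long is one possible design choice; a team might prefer to shorten the project acronym or drop an optional element instead.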

A final piece of advice: avoid special characters such as $, %, *, and &, and do not separate words with spaces. Instead, it is better to use underscores (sales_data), dashes (sales-data), or camel case (SalesData).
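
If you already have files with awkward names, a small clean-up function can help. The sketch below is one possible approach, not a fixed rule: the function name `clean_file_stem` is hypothetical, and it assumes that only letters, digits, underscores and dashes should be kept, with everything else (spaces included) replaced by a single underscore.

```python
import re


def clean_file_stem(raw):
    """Replace spaces and special characters with underscores (hypothetical helper)."""
    # Any run of characters other than letters, digits, "_" or "-" becomes one "_".
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", raw)
    # Drop underscores left over at the start or end of the name.
    return cleaned.strip("_")


print(clean_file_stem("sales data $2021%"))
print(clean_file_stem("sales-data"))
```

The function works on the name without its extension, so split the extension off first if you want to keep it.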

Maryna Demchenko's website. I use this website to share my experience of becoming a data analyst.

Copyright © 2021