6 Naming folders and files

When you are working in Terminal/Bash, it is important to use good folder and file names to make your life easier.

1. Good names for folders: no spaces or special characters

Recall when you name a variable in SAS or R, the variable name cannot contain spaces or unusual characters. It is best practice to not use spaces or unusual characters in folder or file names, even though spaces are permissible and commonly used by Windows and Mac Users.

You may wonder what the problem is with spaces, anyway? While spaces are human-readable they aren’t machine-friendly. When you refer to a folder or file using Git in Terminal or Bash, a name without spaces is much easier to type (otherwise you have to insert a backslash before the space). Spaces also break the auto-complete function that Git users love. This is frustrating.

Good folder names and files names use dashes and underscores in place of spaces. For example “life-expectancy” is a great folder name for a project estimating life expectancy, while “pollution-ptb” is a great folder name for a project estimating the causal effect of air pollution on preterm birth.



2. Good names for code files: numbered with underscores and dashes

When running a statistical analysis, there is an underlying order across the code files (e.g., starting with importing the data, then data cleaning, and analysis). Good code file names should start with a number (padded by 0) to order the files according to the order they are performed in the analysis. This is followed by a short (human and machine readable) descriptor of what the file does. You can also use underscore “_” to delimit field, and dashes “-” to separate words within field.

For example, here are some of the file names in the “pollution-ptb” folder:

These files are:

  • machine readable
  • human readable
  • play well with default ordering

Flip through Jenny Bryan’s slide set here for a deeper dive into file naming:

Download slides here



3. What happens when there are bad names?

The issue and frustration of includes spaces in file names really hits home when you include spaces in the names and trying interacting with GitHub to add the files and make a commit – if you like pain and frustration – you can try this at home!

For example, suppose I made an R markdown file called “Data Visualization Evaluation Report.rmd”. In the screenshot below, you can see that this file has been modified when I used the git status command. The issue occurs when I need to add that file using git add {file_name}. If I typed git add Data Visualization Evaluation Report.rmd there would be an error because Git doesn’t know that the name continues after the space after the word “Data”! Even worse, the space breaks the auto-complete functionality that occurs when you press “tab” to auto-complete the file name after typing the first few letters.

To really add the file we need to use the escape character (the backslash “\”) before each space in the name, as shown in the git add command highlighted in yellow. This might not sound like a big deal if you haven’t interacted with Terminal/Bash very much, but I can guarantee you that the frustration will build to a crescendo over time… save yourself this pain by using dashes and underscores instead of spaces!