🧼 The 3-Minute Fix for Dirty Column Names

Clean up your headers. Write cleaner code. Save time.

DEEP CHATTERJEE
May 18, 2025

Inconsistent column names slow everyone down.

You’ve seen this mess before.

Different formats. Same meaning. But every time, you have to fix it manually.

Let’s fix that — fast.

# Plain pandas: lowercase every header, then replace spaces with underscores
df.columns = df.columns.str.lower().str.replace(' ', '_')

# pyjanitor (pip install pyjanitor): importing janitor registers clean_names() on DataFrames
import janitor
df = df.clean_names()

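# Regex: collapse each run of non-word characters into one underscore,
# trim stray underscores at the edges, then lowercase
import re
df.columns = [re.sub(r'\W+', '_', col).strip('_').lower() for col in df.columns]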
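If you want one rule you can reuse anywhere, the ideas above fold into a small function. This is only a sketch: the name clean_column_name and the sample headers are made up for illustration, and the exact rules (trim, lowercase, collapse non-word runs, drop edge underscores) are one reasonable choice, not the only one.

```python
import re

def clean_column_name(name: str) -> str:
    """Normalize a single header (hypothetical helper, not part of pandas)."""
    # Trim surrounding whitespace and lowercase first.
    name = str(name).strip().lower()
    # Collapse each run of non-word characters (spaces, dashes, symbols) into one underscore.
    name = re.sub(r"\W+", "_", name)
    # Drop underscores left at the edges by leading or trailing punctuation.
    return name.strip("_")

# Hypothetical messy headers
raw_headers = ["First Name", " last-name ", "Total ($)", "E-mail Address"]
cleaned = [clean_column_name(h) for h in raw_headers]
print(cleaned)  # ['first_name', 'last_name', 'total', 'e_mail_address']
```

To apply it to a DataFrame, assign the result back: df.columns = [clean_column_name(c) for c in df.columns].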

This fix might seem small. But it compounds over time. If you’re cleaning data daily, it can save hours each month.

Use these techniques inside a data loading function. That way, your columns are always clean — automatically.

Example:

import pandas as pd

def load_and_clean(path):
    # Read the CSV, then normalize headers before anything else touches them
    df = pd.read_csv(path)
    df.columns = df.columns.str.lower().str.replace(' ', '_')
    return df
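One thing worth checking in a loader like this: two different raw headers can clean down to the same name (say, "Total Sales" and "total_sales"), which leaves you with duplicate column labels. Here is a minimal sketch of a guard, assuming regex-style cleaning. The helpers normalize and find_collisions and the sample headers are hypothetical.

```python
import re
from collections import defaultdict

def normalize(name: str) -> str:
    # Non-word runs become "_", then lowercase (same idea as the regex approach).
    return re.sub(r"\W+", "_", str(name).strip()).lower()

def find_collisions(headers):
    """Map each cleaned name to its raw headers, keeping only names hit more than once."""
    groups = defaultdict(list)
    for raw in headers:
        groups[normalize(raw)].append(raw)
    return {clean: raws for clean, raws in groups.items() if len(raws) > 1}

# Hypothetical headers: the first two mean the same thing
headers = ["Total Sales", "total_sales", "Region"]
collisions = find_collisions(headers)
print(collisions)  # {'total_sales': ['Total Sales', 'total_sales']}
```

Inside a loader, you could call find_collisions(df.columns) before renaming and raise an error (or log a warning) if the result is not empty.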
