Clean data isn’t just about tidy spreadsheets; it’s what ensures that insights are reliable and decisions are well-informed. This stage has been about simplifying, structuring, and preparing the data to answer key business questions around churn, trial conversion, and revenue leakage.
What I’ve Done So Far
I began by carefully reviewing the dataset and the initial data model. During this process, I identified a few important points that shaped how I approached cleaning:
- Dropped the Location Table: Although location data can be valuable, in this case the Location table didn't have a meaningful connection to the other tables in the model. Rather than keep unnecessary or disconnected data, I chose to remove it. This keeps the model simple and focused.
- Refined the Date Table: Dates play a crucial role in subscription analysis. I rebuilt the Date table from scratch, retaining only the essential Date column and adding new columns that will make time-based analysis easier down the line (e.g., month, quarter, year). This ensures consistency and flexibility when tracking trends over time.
- Reviewed Status Columns: Both the Subscriptions and Customer tables have Status columns, but they aren't identical. In Subscriptions, status values include active, cancelled, and trial, reflecting the state of an individual subscription at a point in time. In Customer, the status represents the overall state of the customer (e.g., active, churned, trial). I noted this mismatch as a potential risk area for analysis. For example, a customer might have one cancelled subscription but still be active on another. Rather than trying to fix this at the cleaning stage, I've documented it clearly so I can address it thoughtfully during analysis.
- Ensured Basic Consistency: I filtered and reviewed key columns like Status, CustomerID, SubscriptionID, and dates to spot any blanks, inconsistencies, or typos. This helps avoid errors when calculating key metrics later.
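For readers who want to reproduce the rebuilt Date table outside a BI tool, here is a minimal Python sketch. It assumes the table should cover every calendar day between a chosen start and end date, and that year, quarter, and month are the derived columns wanted. The function name build_date_table and the MonthName column are illustrative assumptions, not part of the original model.

```python
from datetime import date, timedelta


def build_date_table(start: date, end: date) -> list[dict]:
    """Return one row per calendar day from start to end (inclusive),
    with derived Year, Quarter, Month, and MonthName columns.

    Hypothetical helper: column names are illustrative assumptions."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    rows = []
    current = start
    while current <= end:
        rows.append({
            "Date": current,
            "Year": current.year,
            # Months 1-3 -> Q1, 4-6 -> Q2, 7-9 -> Q3, 10-12 -> Q4
            "Quarter": (current.month - 1) // 3 + 1,
            "Month": current.month,
            "MonthName": current.strftime("%B"),  # locale-dependent name
        })
        current += timedelta(days=1)
    return rows


if __name__ == "__main__":
    table = build_date_table(date(2024, 1, 1), date(2024, 12, 31))
    print(len(table))  # 366 (2024 is a leap year)
```

Generating every day, rather than only the dates that appear in the data, avoids gaps when a month has no subscription events, which matters when plotting trends over time.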
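The Status mismatch between Subscriptions and Customer can be made visible by deriving a customer-level status from each customer's subscriptions and comparing it with the recorded Customer status. The sketch below assumes a hypothetical precedence rule (any active subscription means "active", otherwise any trial means "trial", otherwise all-cancelled means "churned"); the real business rule may differ, and the function names are illustrative.

```python
from collections import defaultdict


def derive_customer_status(sub_statuses: list[str]) -> str:
    """Collapse a customer's subscription statuses into one status.

    Hypothetical precedence rule: active > trial > churned."""
    normalized = {s.strip().lower() for s in sub_statuses}
    if "active" in normalized:
        return "active"
    if "trial" in normalized:
        return "trial"
    # Non-empty and every subscription cancelled -> treat as churned
    if normalized and normalized <= {"cancelled"}:
        return "churned"
    return "unknown"


def find_status_mismatches(subscriptions, customers):
    """subscriptions: iterable of (CustomerID, Status) pairs.
    customers: dict mapping CustomerID -> recorded Customer status.
    Returns {CustomerID: (recorded, derived)} wherever they disagree."""
    by_customer = defaultdict(list)
    for customer_id, status in subscriptions:
        by_customer[customer_id].append(status)
    mismatches = {}
    for customer_id, recorded in customers.items():
        derived = derive_customer_status(by_customer.get(customer_id, []))
        if recorded.strip().lower() != derived:
            mismatches[customer_id] = (recorded, derived)
    return mismatches


subs = [("C1", "cancelled"), ("C1", "active"), ("C2", "cancelled"), ("C3", "trial")]
customers = {"C1": "Active", "C2": "Active", "C3": "trial"}
print(find_status_mismatches(subs, customers))
```

Here C1 is the case described above (one cancelled subscription, still active on another) and is not flagged, while C2 is flagged because the Customer table says active but every subscription is cancelled.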
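The basic consistency checks can also be scripted so they are repeatable. This is a minimal sketch assuming each subscription row is a dict with SubscriptionID, CustomerID, Status, and a hypothetical StartDate column standing in for "dates"; the allowed status set mirrors the active, cancelled, and trial values mentioned above.

```python
from collections import Counter

# Status values observed in the Subscriptions table
ALLOWED_SUBSCRIPTION_STATUSES = {"active", "cancelled", "trial"}
# StartDate is a hypothetical column name used for illustration
REQUIRED_FIELDS = ("SubscriptionID", "CustomerID", "Status", "StartDate")


def check_subscriptions(rows: list[dict]) -> dict:
    """Flag blank fields, unexpected Status values, and duplicate IDs."""
    issues = {"blank_fields": [], "unexpected_status": [], "duplicate_ids": []}
    for i, row in enumerate(rows):
        for field in REQUIRED_FIELDS:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                issues["blank_fields"].append((i, field))
        # Case-insensitive comparison so "Active" and "active" both pass
        status = str(row.get("Status") or "").strip().lower()
        if status and status not in ALLOWED_SUBSCRIPTION_STATUSES:
            issues["unexpected_status"].append((i, row.get("Status")))
    counts = Counter(row.get("SubscriptionID") for row in rows)
    issues["duplicate_ids"] = sorted(
        sid for sid, n in counts.items() if sid and n > 1
    )
    return issues


rows = [
    {"SubscriptionID": "S1", "CustomerID": "C1", "Status": "Active", "StartDate": "2024-01-05"},
    {"SubscriptionID": "S2", "CustomerID": "", "Status": "canceled", "StartDate": "2024-02-10"},
    {"SubscriptionID": "S2", "CustomerID": "C3", "Status": "trial", "StartDate": None},
]
print(check_subscriptions(rows))
```

Note that the US spelling "canceled" is reported as unexpected, which is exactly the kind of typo that would otherwise split one status into two groups when counting.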
Why This Matters
Without clean, well-structured data, analysis can easily lead to the wrong conclusions. In this case, clean data will allow me to compute accurate churn rates, trial conversion rates, and the revenue impact of failed payments, all of which are crucial for understanding where StaffWise's revenue leakage is happening.
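As a reference point for the analysis stage, the three metrics can be expressed as simple functions. These are common definitions, not necessarily the ones this project will settle on: churn is measured against customers active at the start of a period, trial conversion against trials that ended in the period, and failed-payment leakage as the total of failed payments never recovered. The payment fields Status, Amount, and Recovered are hypothetical.

```python
def churn_rate(active_at_start: int, churned_in_period: int) -> float:
    """Share of customers active at the start of a period who churned during it."""
    if active_at_start <= 0:
        raise ValueError("active_at_start must be positive")
    return churned_in_period / active_at_start


def trial_conversion_rate(trials_ended: int, trials_converted: int) -> float:
    """Share of trials ending in the period that converted to a paid subscription."""
    if trials_ended <= 0:
        raise ValueError("trials_ended must be positive")
    return trials_converted / trials_ended


def failed_payment_leakage(payments: list[dict]) -> float:
    """Total amount of payments marked failed and not later recovered.

    Assumes hypothetical 'Status', 'Amount', and optional 'Recovered' fields."""
    return sum(
        p["Amount"]
        for p in payments
        if p["Status"].strip().lower() == "failed" and not p.get("Recovered", False)
    )


print(churn_rate(200, 14))
print(trial_conversion_rate(50, 12))
```

Each of these depends directly on the cleaning steps above: a blank CustomerID or a misspelled status silently changes a numerator or denominator.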

