Documenting the Analysis Process

This entry is part 5 of 6 in the series Investigating Subscription Churn at StaffWise

Investigating Subscription Churn at StaffWise

Problem Statement
Identifying Key Data Points for Analysis
Simulating Realistic Subscription Data for Business Intelligence
Data Cleaning Progress Update
Documenting the Analysis Process
Report: Investigating Churn at Staffwise

This phase of the project focused on executing the actual analysis. While I had already done significant prep work in earlier stages, one key lesson became clear: you’re never really done transforming data. As I dug deeper, I found myself creating new columns, recalculating measures, and rethinking earlier steps.

The process wasn’t without its challenges – frequent laptop crashes meant I had to redo parts of the work multiple times. At some point, I had to restart the entire process, trimming the dataset from five years to just sixteen months and consolidating the tables to reduce strain on my machine.

I also learned that even the most structured plans will shift as you become more familiar with the data. Each new insight led to new questions, and that ongoing curiosity ultimately shaped the final approach.

Data Cleaning & Preparation (Project Restart)

When restarting the project, I consolidated and prepped the data with the following steps:

1. Initial Merges & Column Selection

Merged fact_subscription with fact_sales; retained SubscriptionPlanID and BillingCycleId
Joined dim_customer for CustomerName and SignUpDate
Added BillingCycle info from the billing cycle table (kept Label as BillingCycle, DurationMonths)
Merged with SubscriptionPlan to get PlanName and MonthlyPrice
Copied the final table as values into a new worksheet, cleaned the columns, and renamed them

Final columns:

SubscriptionID, Subscription Plan, Billing Cycle, DurationMonths, CustomerName, SignUpDate, TransactionDate, Payment Status, PaymentMethod, Amount Charged, Monthly Price.

Saved as a new file: Sales.xlsx

2. Data Import & Type Checks

Imported the file into Microsoft Excel as fact_sales
Verified and corrected data types

3. Fixing `SignUpDate`

I noticed that some SignUpDate values were later than the TransactionDate, which isn’t valid.

Fix: I recalculated SignUpDate per customer as the earliest transaction date.

Removed the original SignUpDate
Duplicated fact_sales -> grouped by CustomerName -> min of TransactionDate as the new SignUpDate
Merged this back into fact_sales

4. Created New Columns

Column Name	Formula
ExpectedRevenue	`=[DurationMonths]*[MonthlyPrice]`
Actual Revenue	`=[DurationMonths]*[AmountCharged]`
CoverageEndDate	=D`ate.AddMonths([TransactionDate], [DurationMonths])`
Customer Status	`=let TrialEnd = Date.AddMonths([SignUpDate], 1), GraceEnd = if [CoverageEndDate] <> null then Date.AddDays([CoverageEndDate], 15) else null, EvalDate = [TransactionDate] in if [ActualRevenue] = 0 and [PaymentStatus] = "Success" and EvalDate <= TrialEnd then "Trial" else if [ActualRevenue] > 0 and GraceEnd <> null and EvalDate <= GraceEnd then "Active" else if [ActualRevenue] > 0 and GraceEnd <> null and EvalDate > GraceEnd then "Churned" else "Churned"`

5. Date Table

Imported a Date table and created: Year, Month, Quarter
Related it to fact_sales[TransactionDate]

6. Created Some Measures

Created a blank query called “Measures” for storing measures, then created the following;

Measure	Description	Formula
Failed Payments	Number of failed payments	`=CALCULATE(COUNT(fact_sales[PaymentStatus]), fact_sales[PaymentStatus]="Failed")`
Failed Payments Loss	Revenue lost due to failed payments	`=CALCULATE(SUM(fact_sales[ExpectedRevenue]), fact_sales[PaymentStatus]="Failed")`
Refunds	Number of refunds	`=CALCULATE(COUNT(fact_sales[PaymentStatus]), fact_sales[PaymentStatus]="Refunded")`
Refunds Loss	Revenue lost due to refunds	`=CALCULATE(SUM(fact_sales[ExpectedRevenue]), fact_sales[PaymentStatus]="Refunded")`
Free Trials	Number of free trials	`=CALCULATE(COUNT(fact_sales[ExpectedRevenue]), fact_sales[PaymentStatus]="Success", fact_sales[ActualRevenue]=0)`
Free Trial Loss	Revenue lost due to free trials ($0 value transactions)	`=CALCULATE(SUM(fact_sales[ExpectedRevenue]), fact_sales[PaymentStatus]="Success", fact_sales[ActualRevenue]=0)`
Active Customers	Distinct count of active customers at any point in time	`=CALCULATE(DISTINCTCOUNT(fact_sales[CustomerName]), fact_sales[Customer Status]="Active")`
Trial Customers	Numbers of customers on free trial	`=CALCULATE(DISTINCTCOUNT(fact_sales[CustomerName]), fact_sales[Customer Status]="Trial")`
Churned Customers	Number of churned customers at any point in time	`=CALCULATE(DISTINCTCOUNT(fact_sales[CustomerName]), fact_sales[Customer Status]="Churned")`
Total Customers	Number of customers	`=DISTINCTCOUNT(fact_sales[CustomerName])`
Churn Rate	Churn rate	`=DIVIDE([Churned Customers], [Total Customers], 0)`
Average Revenue per User (ARPU)	Average revenue each customer has paid/contributed	`=DIVIDE([Sum of ActualRevenue], [Active Customers], 0)`
Average Tenure	Average no. of months a customer stayed subscribed before churning (or up to current date if still active)	`=AVERAGEX(VALUES(fact_sales[CustomerName]),DATEDIFF(CALCULATE(MIN(fact_sales[SignUpDate])),CALCULATE(MAX(fact_sales[CoverageEndDate])), MONTH))`
ChurnRevenueLoss	Estimated revenue lost when a customer churns	`=CALCULATE([Sum of ExpectedRevenue]-[Sum of ActualRevenue], fact_sales[Customer Status]="Churned")`

Task 1 Process: Patterns and Plan-Level Differences in Customer Churn

This task focused on exploring churn behavior across different plans, billing cycles, and time periods. The goal was to create dynamic visuals that allow for flexible slicing while identifying top-level churn patterns.

Churn by Subscription Plan

I began by creating a pivot table that displayed churn rates across different subscription plans. To make it easier to identify the most affected plans, I created a second helper table (a copy of the first, linked via =) and added a Highlight column.

This column dynamically pulls in the top three churn rates using a combination of formulas.

I first used the LARGE function to identify the top 3 churn rates from the original pivot table. Then, I applied a nested IFERROR/IF logic to populate the Highlight column only if the churn rate matched one of the top three values. This made it easier to call attention to the highest churn values in a visual format.

I also created a third helper table to extract just the top churned plan and its rate. This made it possible to label key visuals dynamically, depending on the selected time frame (using slicers like Year, Month, and Quarter). These slicers allowed me to explore how churn patterns shift over time.

A clustered column chart was used to visualize churn rate by subscription plan. In addition to churn rate, I also analyzed churn volume—the actual number of customers lost per plan—using similar tables and visuals.

Churn by Billing Cycle

I repeated a similar process to explore churn patterns across different billing cycles (e.g., monthly vs. annual). Again, both churn rate and churn volume were considered, using helper tables to isolate and highlight the top values.

Churn by Month

For monthly churn, I created a separate pivot table and disconnected the Month and Quarter slicers to ensure that selections didn’t conflict.

I used a combo chart for the final visual, where the bar chart displayed churn rate and a line graph showed churned customers over time.

A highlight column identified the top 3 churn months, drawing attention to seasonal or time-based churn spikes.

Churn by Quarter

I used another pivot table to analyze quarterly churn trends. This one was connected to the Year slicer and helped reveal larger, more stable trends compared to monthly data.

Churn Trends by Subscription Plan + Billing Cycle

To see how plan and billing cycle interacted, I built a pivot table that broke down churn rates and volumes across both variables. I then flattened this data into a structured table, where each row contained a subscription plan, billing cycle, churn rate, and churned customer count.

From this table, I created two focused helper tables:

One to rank the top 3 highest churn rates by plan and billing cycle combo
Another to rank the top 3 highest churn volumes using the same combination

These were used to drive visual highlights and callouts in the report.

Task 2 Process: When Do Customers Churn, and How Much Revenue Is Lost?

The goal of this task is to determine how long customers typically stay before they churn and quantify the revenue lost from churned customers.

Step 1: Calculate Average Tenure for Churned Customers

Created a measure to calculate average tenure for churned customers only by filtering on [Customer Status] = "Churned".

This helped isolate how long, on average, a customer stays before canceling.

Step 2: Create Tenure Buckets (via Power Query)

Using DAX for buckets proved inefficient, so I switched to Power Query:

Duplicated the sales table

Grouped by CustomerName and computed:

Min Date = first transaction date
Max Date = most recent transaction date

2. Created TenureDuration column

= (Date.Year([MaxDate]) - Date.Year([MinDate])) * 12 + (Date.Month([MaxDate]) - Date.Month([MinDate]))

Note: The above formula calculates the total number of months between two dates by first computing the difference in years and converting it to months, then adding the difference in months. It ensures accurate month counts without assuming each month has a fixed number of days.

3. Merged the result back with the original sales table using CustomerName

4. Added Conditional Column to bucket the tenure into 3-month intervals:

0-3 Months, 4-6 Months, 7-9 Months,…up to 25+ Months. This resulted in 9 tenure buckets.

Step 3: Calculate Revenue Loss from Churn

Created a DAX measure to quantify churn-related revenue loss. The formula for this available in the Measures table.

Step 4: Analyze Churn by Tenure Bucket

Built a pivot table with Tenure (bucket) as rows and Churned Customers (measure) as values.
Created 2 helper tables to identify the top 3 tenure buckets with the highest churn volume.
Used these tables to drive dynamic visuals (top 3 highlights).

Step 5: Analyze Revenue Lost by Tenure Bucket

Applied a similar approach:

Pivoted on Tenure
Values = ChurnRevenueLoss
Created helper tables to isolate and highlight top tenure buckets by revenue loss

Task 3: How Much Revenue Is Lost Due to Failed or Refunded Payments?

To quantify the impact of failed payments, refunds, and free trials on revenue:

Created a Pivot Table for KPI calculation

Included the following measures from the data model:

ExpectedRevenue
FailedPaymentsLoss
RefundsLoss
FreeTrialLoss
ActualRevenue
RevenueGap

Created a KPI Summary Table from this pivot

For each loss type, I calculated:

Total amount lost
% of RevenueGap
% of ExpectedRevenue

Built a second Pivot Table summarizing `RevenueGap` by month (January to December).

From this second pivot, I;

Extracted the top 3 months with the highest revenue gap using a helper table + LARGE() and IF formulas
Used this to support a dynamic clustered column chart that automatically highlights those top 3 months

Report and Dashboard Build

As I worked through the analysis, I also built the accompanying Excel dashboard and report, shaping them in real time as new insights and challenges emerged. The final interactive version will be available on the site soon.

Series Navigation<< Data Cleaning Progress UpdateReport: Investigating Churn at Staffwise >>