Product Analytics: Growth Accounting Walkthrough

Ben Bregman
9 min readFeb 22, 2019

In this post, I’ll share a walk-through of the definitions, implementation, and output of growth accounting for a sample app. The app we’ll use is Poesie — an app I built last year to help people read and learn more about poetry. You can download it here: https://itunes.apple.com/app/id1332841582.

Let’s get started!

Sneak peak of our main output: a growth accounting chart of active users.

Introduction: Goal and definitions

A great intro to growth accounting is written here: https://medium.com/swlh/diligence-at-social-capital-part-1-accounting-for-user-growth-4a8a449fddfc. I’ll summarize some of the article before moving on to the implementation walk-through.

The purpose of growth accounting is to break apart a basic user growth metric like DAU/WAU/MAU into explanatory and actionable parts. The basic approach is to think about a user-base in terms of (1) new users, coming in for the first time; (2) retained users, who are sticking in the app; (3) churned users, who have used and left the app; and (4) resurrected users, who have used, left, and returned to the app.

With this framework, the current [x]AU of your app at any point can be described as retained + new + resurrected users. And, change in [x]AU over time can be thought of as new + resurrected - churned users in that time period. If you are bringing in more users than leaving, you are growing. The growth rate will depend on how many users you are churning out vs. how many users are coming in.

This framework is helpful for understanding how to prioritize work. For example, a drop in user growth due drop in “new” users vs “retained” users leads to very different action items — the former may encourage you to look at your marketing or invite mechanisms; the latter may encourage you to look at recent changes to features or app performance. Noticing room for bringing back for “resurrected” users may encourage you to improves features like notifications that engage users who have dropped out of the app.

Implementation: Preparing the data sets

In this section, we’ll create the data sets that allow us to visualize and analyze our user base using the growth accounting frameworks. Our goal is to plot growth metrics over dates. So we will need logging, user/date rollups, and then finally date rollups.

1. The first step is to prepare a raw list of relevant user activity. To prepare this, we must define what activity will be counted for active usage. In this example, I define active usage as “viewed a poem”. (Notably, app users who have not created an account do not view a poem. So, my analysis will occur downstream of a successful account creation experience.) The core input dataset for our process is a list of the relevant events with key fields [user_id, event_time].

2. The first intermediary output is a table telling us whether any given user is active on any given day. This will be used to begin rolling up our counts using the definitions from the introduction. In order to create this data set, we need to perform two operations on the raw data set: first, transform each event_time into a date; second, add a TRUE “is_active” flag if any rows exist for a user and date combination. We should end up with a table with rows of the form [user_id, date, is_active].

Code for this looks something like what we have below, written in R. Note that my “is_active” flag is just that you have non-zero poem views. No specific reason to iterate in groups of 7 besides “fetch_single_date” activity is resource intensive.

# Goal: Fetch "activity" logging for dates and create a user/date/is_active table.file_name = "user_date_poem_views.csv"user_date_poem_views <- data.frame(
user=character(),
Date=character(),
Poems=integer()
)
read_rows <- read.csv(file_name, stringsAsFactors=FALSE)[,-1]
user_date_poem_views = rbind(user_date_poem_views, read_rows)
date = max(user_date_poem_views$Date)for (increment in 0:7) { user_activity = fetch_single_date(date) # (<- This is a function that fetches activity logging for a single date.) new_row = aggregate(x = user_activity$distinct_id, by = list(user_activity$distinct_id), FUN = length)
names(new_row) <- c("user", "Poems")
new_row$Date = date
user_date_poem_views = user_date_poem_views[user_date_poem_views$Date != date,]
user_date_poem_views = rbind(user_date_poem_views, new_row)
date = format(as.Date(date) + 1, "%Y-%m-%d")
}write.csv(user_date_poem_views, file = file_name,row.names=TRUE)

3. The second intermediary output is a table assigning every user/date pair a flag based on their [x]AU growth accounting “status”. In order to determine this flag, we must know the user’s creation date and whether they were active X days ago. With that information, the flags are defined as: (1) new => if your creation date is within the past X days, (2) retained => if you were active X days ago AND are active today, (3) churned => if you were active X days ago AND NOT active today, (4) resurrected => if you were churned X days ago AND are active today. For completion we can also add (5) stale => if you were churned X days ago AND NOT active today, and (-1) not created => if you havent existed yet on a given date. After adding these flags, our table looks like [user_id, date, growth_flag].

Code for this looks something like what we have below, written in R. For simplicity, we’re iterating on a per user, per date basis. Though, once we have prepared weekly active flags, there’s no real need to iterate at all — everything could be done simply by inspecting individual elements of the earlier table independently.

# Goal: Take our user/date/is_active table and, by looking across a given timeframe, create a user/date/growth-accounting-flag table.file_name = "user_date_poem_views.csv"user_activity <- read.csv(file_name, stringsAsFactors=FALSE)[,-1]
user_activity$Date = as.Date(user_activity$Date)
creation_dates = aggregate(Date ~ user, user_activity, function(x) min(x))unique_users = unique(user_activity$user)overall_start_date = min(user_activity$Date)
overall_end_date = max(user_activity$Date)
all_dates = seq(as.Date(overall_start_date), as.Date(overall_end_date), by="days")
user_session_matrix <- matrix(NA, nrow = length(unique_users), ncol = length(all_dates))
user_session_matrix[matrix(c(match(user_activity$user, unique_users), match(user_activity$Date, all_dates)), ncol = 2)] = user_activity$sessions
print("Filling non-null dates since creation.")
# TODO: Vectorize.
for (row in 1:nrow(user_session_matrix)) {
user_row = user_session_matrix[row,]
isna <- is.na(user_row)
nonna <- match(FALSE,isna)
id <- which(isna)
user_row[id[id>nonna]] <- 0
user_session_matrix[row,] = user_row
}
print("Creating per-day active flags.")
# TODO: Vectorize.
colnames(user_session_matrix) = as.character(all_dates)
rownames(user_session_matrix) = unique_users
recent_sessions = user_session_matrix
lookback = 1
while (lookback < (timeframe)) {
print(paste("Lookback, row: ", as.character(lookback), " of ", as.character(timeframe)))
lookback_session = user_session_matrix[,1:(ncol(user_session_matrix) - lookback)]
na_col = matrix(NA, nrow = length(unique_users), ncol = lookback)
lookback_session = cbind(na_col, lookback_session)
colnames(lookback_session) = colnames(user_session_matrix)
recent_sessions = abind(recent_sessions, lookback_session , along = 3)
lookback = lookback + 1
}
user_date_active_states = matrix(NA, nrow = length(unique_users), ncol = length(all_dates))
colnames(user_date_active_states) = as.character(all_dates)
rownames(user_date_active_states) = unique_users
rowcount = length(unique_users)
# TODO: Vectorize.
for (row in 1:nrow(user_session_matrix)) {
print(paste("Active states, row: ", as.character(row), " of ", as.character(rowcount)))
for (col in 1:ncol(user_session_matrix)) {
recent_history = recent_sessions[row, col, ]
if (sum(is.na(recent_history)) == length(recent_history)) {
# not active yet
user_date_active_states[row, col] = NA
} else {
# active at some point
recent_history[is.na(recent_history)] <- 0
if (sum(recent_history) > 0) {
# active this timeframe
user_date_active_states[row, col] = 1
} else {
# not active this timeframe
user_date_active_states[row, col] = 0
}
}
}
}
print("Creating daily growth states.")
# TODO: Vectorize.
user_date_growth_states = matrix(NA, nrow = length(unique_users), ncol = length(all_dates))
colnames(user_date_growth_states) = as.character(all_dates)
rownames(user_date_growth_states) = unique_users
compare_sessions = user_date_active_states
lookback_session = user_date_active_states[,1:(ncol(user_date_active_states) - timeframe)]
na_col = matrix(NA, nrow = length(unique_users), ncol = timeframe)
lookback_session = cbind(na_col, lookback_session)
compare_sessions = abind(compare_sessions, lookback_session , along = 3)
rowcount = nrow(user_session_matrix)
new_limit = 1
for (row in 1:nrow(user_session_matrix)) {
print(paste("Growth states, row: ", as.character(row), " of ", as.character(rowcount)))
for (col in 1:ncol(user_session_matrix)) {
compare_history = compare_sessions[row, col, ]
if (is.na(compare_history[1])) {
# not active yet
user_date_growth_states[row, col] = -1
} else if (is.na(compare_history[2])) {
# new user
user_date_growth_states[row, col] = 1
} else {
if (compare_history[1] == 0) {
# inactive
if (compare_history[2] > 0) {
# just churned
user_date_growth_states[row, col] = 2
} else {
# has been churned
user_date_growth_states[row, col] = 5
}
} else {
# active
if (compare_history[2] > 0) {
user_date_growth_states[row, col] = 3
} else {
user_date_growth_states[row, col] = 4
}
}
}
}
}
write.csv(user_date_growth_states, file = output_file,row.names=FALSE)

4. Finally, we can create the rollup we were aiming for. This will be an overall count of the number of users in a given flag on a given date. So, we aggregate over date and count the number of users with any given flag. After performing this operation, our roll-up table will be of the form [event_date, growth_flag, user_count]. We are done!

Code for this looks something like what we have below, written in R. For simplicity, we’re populating the counts on a per date loop. Though, there’s no real need to iterate at all — everything can be done simply by summing columns independently.

# Goal: Take our user/date/growth-accounting-flag table and count the number of users in any given bucket for each date.dates_to_rollup = tail(colnames(user_date_growth_states), -1)
user_date_growth_rollup <- data.frame(matrix(ncol = 6, nrow = 0))
colnames(user_date_growth_rollup) <- c("Date", "New", "Churned", "Retained", "Resurrected", "Stale")
for (date in dates_to_rollup) {
user_states_for_date = user_date_growth_states[,date]
new_users = length(user_states_for_date[user_states_for_date == 1])
churned_users = length(user_states_for_date[user_states_for_date == 2])
retained_users = length(user_states_for_date[user_states_for_date == 3])
resurrected_users = length(user_states_for_date[user_states_for_date == 4])
stale_users = length(user_states_for_date[user_states_for_date == 5])
rollup_row = c(date, new_users, churned_users, retained_users, resurrected_users, stale_users)
names(rollup_row) <- c("Date", "New", "Churned", "Retained", "Resurrected", "Stale")
user_date_growth_rollup = rbind(user_date_growth_rollup, t(rollup_row))
}timeframe = 7
file_name = paste("user_date_", as.character(timeframe), "d_growth_rollup.csv")
write.csv(user_date_growth_rollup, file = file_name,row.names=TRUE)

Exploration: Analyzing the Data Sets

Now, I’ll show the most basic visualizations that come out of this data set along with some hints at deeper analysis.

The most basic visualization of our data set is a plot over time of [x]AU along with new/retained/churned/resurrected users. This immediately provides explanation for the current [x]AU and any deltas over time. Note that if you are summing on a given day, the churned is not actually a part of the [x]AU equation but is just a useful derivative on retained.

Here’s a data quality side-note: Our activity logging didn’t start until May. Luckily, we had a separate table of user creation dates. Importing this table as mock “activity” is why we only have new/churned in the first months. Though we get a “resurrected” spike in the first week of logging, we successfully avoid any incorrect “new user” counts down the line.

Another useful visualization dives in to the deltas in [x]AU. As described above, changes in [x]AU can be explained by changes in incoming/outgoing users. Our “rates” here are defined relative to the week before. As long as resurrection + new is greater than churn, we’ll have a positive week over week WAU growth.

Finally, we can also begin to dive into some interesting analyses. One of the most important indicators of a successful product is your retention rate. In our framework, a higher retention rate is equivalent to lower churn and lead to better growth (assuming you have any new users coming in). Let’s take a look at retention rates, by taking [Users who were active on Day N] and seeing what portion were STILL active on Day N+X.

Note: Here you will see one very interesting breakdown — splitting overall users (orange) into “new” (blue) and “old” (green). Different features can have vastly different effects on first time users vs experienced users. Guiding users from their first use, into return usage, and into retained usage is a huge part of building a successful user journey. For example, some issues with content loading times in November may have had a pretty big negative impact on new users… a sad waste of new user acquisition. Good thing we improved loading times!

Next steps: Putting insight into action

Once we begin understanding our user base in terms of new/retained/churned/resurrected, we can prioritize next features. For example, we may have observed in the last chart how critical fast loading times is for the new user experience. (Note: “may” because there is no A/B testing… just correlative observation). We should continue to improve perf and observe how much impact that has on new user retention.

Next, this language has been a stark reminder how much additional work I could be doing to resurrect stale users. How smart are my notifications in the day, week, month after you drop out? (Answer: not very smart, at the moment). There is plenty to do here — and we can validate our changes by watching lifts in resurrection rates and the impact on WAU.

We should continue doing whatever we can to reduce user churn and increase retention. We’ve observed the (reasonably expected) difference in retention between first time and regular users —so what features can we build to guide new users into a successful regular-use experience? What about the difference between medium-term and dedicated long-term users?

Finally, we can always ask ourselves when it is time to make a push on new user growth. This may include prioritizing invite features or investing in marketing. The main factor here is the retention of new users —complicated by the fact that new users, acquired through different channels, will likely have different retention. Higher new user retention means a better return on our investment into new user acquisition; lower retention means marketing will be wasted.

That’s all for now! Hope you enjoyed the post.

--

--

Ben Bregman

Violin teacher in Santa Cruz, CA. App developer as a hobby.