Data

Lean Analytics – by Alistair Croll & Benjamin Yoskovitz
Date read: 8/11/19. Recommendation: 8/10.

The best book that I’ve read to date on product metrics and using data to your advantage without overanalyzing. The core of the book focuses on how to use data to build a better startup faster. Croll and Yoskovitz walk through a dashboard for every stage of a business, from validating a problem, to identifying customers, to deciding what to build, to positioning yourself. They discuss how to choose strong metrics and the analytics frameworks available for building a successful business (Pirate Metrics, Engines of Growth, Lean Canvas, Growth Pyramid). Croll and Yoskovitz also define their own “Lean Analytics” framework which features five stages – empathy, stickiness, virality, revenue, scale – and explain the metrics you should be tracking and the gates required to move forward at each stage. This is a great resource for founders and product managers if you’re looking to improve your analytics and be more strategic with key metrics.

See my notes below or Amazon for details and reviews.


My Notes:

How to use data to build a better startup faster. Dashboard for every stage of business, from validating a problem, to identifying customers, to deciding what to build, to positioning yourself. 

Find a meaningful metric, then experiment to improve it until it's good enough to move on to the next stage of your business. 

Airbnb photography metric:
Hypothesis: hosts with professionally photographed homes will get more business and sign up for this as a service. Airbnb took the Concierge MVP approach and sent professional photographers to take pictures of hosts’ homes. Initial tests with this MVP showed that listings featuring professional photographs received 2-3X more bookings. The new metric they began to monitor was shoots per month (already knew it resulted in more bookings). 

Good metrics:
Comparative, understandable, a ratio or rate, changes the way you behave (actionable). 

Total signups = a vanity metric. “Total active users” is a bit more insightful, but even better is “percent of users who are active.” Tells you the level of engagement users have with your product. 
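A quick sketch of the difference (numbers are hypothetical):

```python
# Vanity metric vs. actionable ratio.
total_signups = 50_000  # only ever goes up, tells you little
active_users = 6_500    # e.g. users with a session in the last 30 days

percent_active = active_users / total_signups * 100
print(f"Percent of users who are active: {percent_active:.1f}%")  # 13.0%
```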

Be data-informed, not data-driven:
Slippery slope towards overanalyzing: “Using data to optimize one part of your business, without stepping back and looking at the big picture, can be dangerous – even fatal.”

Analytics frameworks for building a successful business:
Pirate metrics: AARRR - acquisition, activation, retention, revenue, referral. 
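A toy sketch of tracking the AARRR funnel (counts are hypothetical); the stage-to-stage conversion rates show where the funnel leaks:

```python
funnel = [
    ("acquisition", 10_000),
    ("activation", 4_000),
    ("retention", 1_800),
    ("revenue", 450),
    ("referral", 120),
]

# Conversion rate from each stage to the next.
for (prev_stage, prev), (stage, count) in zip(funnel, funnel[1:]):
    print(f"{prev_stage} -> {stage}: {count / prev:.0%}")
```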

Lean Startup, engines that drive growth: sticky, viral, paid. 

Sean Ellis’s Growth Pyramid: product market fit, stack the odds (find defensible unfair advantage), scale growth.

Lean analytics: empathy, stickiness, virality, revenue, scale. Each of these has a gate required before you’re able to move forward (page 53). 

Two-sided marketplaces:
Start by focusing on whoever has the money; model the buyer side as your primary focus. It’s harder to find people who want to spend money than people who want to make money. 

When Uber launched in Seattle, they created supply. They overcame the chicken-and-egg problem by buying up town cars, paying drivers $30 an hour to drive passengers, and switching to commission once there was sufficient demand. 

Business models vs. business plans:
“Business plans are for bankers; business models are for founders.”

“By knowing the kind of business you are, and the stage you’re at, you can track and optimize the One Metric That Matters to your startup right now. By repeating this process, you’ll overcome many of the risks inherent in early-stage companies or projects, avoid premature growth, and build atop a solid foundation of true needs, well-defined solutions, and satisfied customers.”

Empathy (stage 1):
Primary job is to get inside someone else’s head. Discovering and validating a real problem. Then find out if your proposed solution is likely to work. Interview at least 15 people at each stage. 

Goal of this stage is to determine whether the problem is painful enough, affects enough people, and whether they’re already trying to solve it. 

“Always know what risk you’re eliminating, and then design the minimum functionality to measure whether you’ve overcome it.”

Stickiness (stage 2):
Focus is squarely on retention and engagement. Are people using the product as expected? Are they getting enough value out of it?

Goal of this stage is to build a core set of features that gets used regularly and successfully. 

One in, one out: If a new feature doesn’t improve the one metric that matters most, remove it. 

Major risk is driving new traffic when you’re unable to convert that attention into engagement.

Revenue (stage 4):
Shift focus from proving the idea is right to proving you can make money in a scalable, self-sustaining way. 

You need to be able to answer these: how big can the business grow, how good can the margins get, and what kind of barriers to entry does it have?

“Users engage with the online world in three postures: creation (often on a computer with a keyboard), interaction (usually with a smartphone), and consumption (with a tablet).”

User groups and feedback:
You can get better answers by asking your customers to choose one alternative from a set of possibilities, rather than asking them to rate something on a scale of 1 to 10. “Would you prefer a delicious, high calorie candy made with artificial ingredients or a bland, low calorie organic candy?”

“Asking customers to trade off variations of combinations, over and over, dramatically improves prediction accuracy.”
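A minimal sketch of tallying forced-choice answers like the candy question above (responses and attribute levels are made up):

```python
from collections import Counter

# Each entry is the combination one respondent chose in a forced-choice question.
answers = [
    {"taste": "delicious", "calories": "high", "ingredients": "artificial"},
    {"taste": "bland", "calories": "low", "ingredients": "organic"},
    {"taste": "delicious", "calories": "high", "ingredients": "artificial"},
]

# Count how often each attribute level wins a trade-off.
tally = Counter()
for choice in answers:
    for attribute, level in choice.items():
        tally[(attribute, level)] += 1

for (attribute, level), wins in tally.most_common():
    print(f"{attribute}={level}: chosen {wins} time(s)")
```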

Confident Data Skills – by Kirill Eremenko
Date read: 3/29/19. Recommendation: 8/10.

Great resource for those wanting to learn the fundamentals of data science. It’s particularly relevant if you’re looking to better leverage data in your existing job (as I am in product management) or explore a new career path in data science (huge opportunities here, in case you’ve been living under a rock). Eremenko does a great job breaking down the data science process for beginners and explaining the essential algorithms. Case studies from Netflix and LinkedIn help bring these concepts to life.

See my notes below or Amazon for details and reviews.

My Notes:

Fundamentals:
Data has always been out there. What’s changed in the past decade is our ability to collect, organize, analyze and visualize it.

Data = quantitative AND qualitative.

“Big data” is a dynamic term given to datasets that are massive in volume (too big), velocity (too rapid), or variety (too many different data attributes). Technology is always being developed to handle these, which is why what we consider “big data” is in constant flux.

Cloud = storage facility with a virtualized infrastructure. 

Netflix:
The Netflix recommendation engine is a great example of the power of data science. Netflix was able to use viewing habits to create niche subcategories (“Exciting horror movies from the 1980s”). They were also able to see overlap in audiences’ viewing patterns – identifying that people who enjoyed political dramas also enjoyed Kevin Spacey films, which led them to remake House of Cards.

Healthcare:
One of the things that makes data science so powerful is the sheer volume it enables us to process. Can help support doctors in diagnosing patients. Doctor might have seen 5,000 patients in their career. Machine has accumulated knowledge of 1,000,000 cases.

Multidisciplinary:
Beneficial to have roots in a different discipline when you enter data science – gives you an advantage and helps you ask the right questions. 

The data science process:

  1. Identify the question

  2. Prepare the data (ETL - extract, transform, load)

  3. Analyze the data

  4. Visualize the insights

  5. Present the insights

Prepare the data:
-Extract the data from its sources – this ensures that you aren’t altering the original source.

-Transform the data into a comprehensible language for access in a relational database. This step is about reformatting, joining, splitting, aggregating, and cleaning the data. 

-Load the data into the end destination (typically a data warehouse).
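A minimal end-to-end sketch of these three steps, assuming a local CSV file as the source and SQLite as a stand-in warehouse (file and column names are hypothetical):

```python
import csv
import sqlite3

# Extract: read from the source without modifying it.
with open("signups_raw.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Transform: reformat and clean into a consistent shape.
cleaned = [
    (r["email"].strip().lower(), r["country"].upper())
    for r in rows
    if r["email"]  # drop records with no email
]

# Load: write into the end destination (a local SQLite "warehouse").
conn = sqlite3.connect("warehouse.db")
conn.execute("CREATE TABLE IF NOT EXISTS signups (email TEXT, country TEXT)")
conn.executemany("INSERT INTO signups VALUES (?, ?)", cleaned)
conn.commit()
conn.close()
```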

Essential algorithms:
Three main groups – classification, clustering, reinforcement learning.

Classification – when you know the categories you want to group, or classify, new data points into (e.g. survey response to a yes/no question)

-Types of classification algorithms: decision trees, random forest, K-nearest neighbors (K-NN), Naive Bayes, logistic regression. (These are compared in the sketch after this list.)

-Decision tree runs tests on individual attributes in your dataset in order to determine the possible outcomes. Questions are the branches, answers are the leaves. Better for smaller datasets.

-Random forest builds on the same principles as a decision tree; it just uses many different trees to make the same prediction and averages the results from the individual trees. Every decision tree casts its vote, and random forest takes the most voted option. Better for larger datasets.

-K-nearest neighbors (K-NN) analyzes likeness by calculating the distance between a new data point and existing data points. Deterministic model. The assumption it makes is that nearby data points will have similar unknown features. 

-Naive Bayes allows new data points to be easily included in the algorithm to dynamically update the probability value. Probabilistic model. Good for non-linear problems where classes cannot be separated with a straight line on the scatter plot and for datasets containing outliers (other algorithms easily biased by outliers). Drawback: naive assumptions made can create bias.

-Logistic regression is good for analyzing the likelihood of a customer’s interest in your product, evaluating the response of customers based on demographic data, and identifying which variables are the most statistically significant.

-Simple linear regression analyzes relationship between one dependent and one independent variable.

-Multiple linear regression analyzes relationships between one dependent and two or more independent variables.
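A minimal sketch (not from the book) comparing the classification algorithms above on a synthetic dataset with scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# Synthetic two-class dataset, split into train and test sets.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "naive Bayes": GaussianNB(),
    "logistic regression": LogisticRegression(max_iter=1000),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name:>20}: {model.score(X_test, y_test):.2f} accuracy")
```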

Clustering – when you don’t know the groups you want an analysis to place your data into (e.g. segmenting survey respondents by age or distance from the company’s closest store). 

-Types of clustering algorithms: K-means, hierarchical (both sketched after this list).

-K-means discovers statistically significant categories or groups in a given dataset. 

-Hierarchical clustering comes in two forms, both recorded in a dendrogram: agglomerative (bottom-up: starts from a single data point and groups it with the nearest data points in incremental steps until all points have been absorbed into a single cluster; this is the most common) and divisive (top-down: begins with a single cluster encompassing all data points and works its way down, splitting the cluster apart in order of distance between data points). 
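A minimal sketch (not from the book) running K-means and agglomerative hierarchical clustering on synthetic data with scikit-learn:

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering

# Synthetic data with four natural groups.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# K-means: you choose the number of clusters up front.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Agglomerative (bottom-up) hierarchical clustering.
agglo = AgglomerativeClustering(n_clusters=4).fit(X)

print("K-means labels:      ", kmeans.labels_[:10])
print("Agglomerative labels:", agglo.labels_[:10])
```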

Reinforcement learning - a form of machine learning that leans on concepts of behaviorism to train AI. 

-Types of algorithms: Upper confidence bound, Thompson sampling.

-Upper confidence bound (UCB) is a dynamic strategy that increases in accuracy as additional information is collected. Deterministic. After each round, the data is used to adjust the bounds of the variant that was tested. Good for finding the most effective ad campaigns or managing finances across multiple projects.
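A minimal UCB1 sketch (not from the book): each round it shows the ad variant with the highest upper confidence bound, then updates that variant’s statistics. Click-through rates are hypothetical.

```python
import math
import random

true_rates = [0.04, 0.05, 0.09]    # hypothetical click-through rates per variant
counts = [0] * len(true_rates)     # times each variant has been shown
rewards = [0.0] * len(true_rates)  # total clicks observed per variant

for t in range(1, 10_001):
    # Score each variant by its upper confidence bound; untried variants
    # get infinity so every variant is shown at least once.
    ucb = [
        rewards[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
        if counts[i] > 0 else float("inf")
        for i in range(len(true_rates))
    ]
    choice = ucb.index(max(ucb))
    counts[choice] += 1
    rewards[choice] += random.random() < true_rates[choice]

print("Times each variant was shown:", counts)  # the best variant dominates
```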