library(tidyverse)
library(modelsummary)
library(marginaleffects)
library(broom)
firms <- read_csv("data/bisnode-firms.csv")
Exercise: Modelling Binary Outcomes
Session 3 · Quantitative Data Analysis
Session business question: Can we predict which firms are likely to exit the market in the next year?
Work through the tasks below. For each task, write your R code in the code chunk provided and your interpretation in the text below it. Aim to write complete sentences — imagine you are explaining your findings to a manager who has not studied statistics.
You have approximately 40 minutes.
1 Setup
Run the setup chunk to load packages and data.
Take two minutes to familiarise yourself with the data:
# Explore the data
glimpse(firms)
Rows: 1,916
Columns: 11
$ comp_id <dbl> 1022796, 1340059, 1627539, 1885666, 2407000, 8516875, 10…
$ exit <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ sales_log <dbl> 10.169549, 10.440306, 11.355318, 9.204397, 10.806355, 9.…
$ employees_log <dbl> -2.48490662, -2.39789524, -2.48490662, -2.48490662, -2.0…
$ profit_margin <dbl> -0.050241269, -0.119844111, -0.252612413, 0.087928463, 0…
$ age <dbl> 11, 2, 1, 3, 2, 5, 12, 4, 2, 6, 2, 1, 23, 21, 1, 6, 3, 9…
$ liquidity <dbl> 0.90785164, 0.35994765, 0.48970007, 0.55894003, 0.848029…
$ foreign <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,…
$ female_ceo <dbl> 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0,…
$ industry <dbl> 56, 56, 56, 56, 56, 56, 29, 47, 56, 56, 56, 56, 56, 56, …
$ urban <dbl> 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,…
Question: What does one row in this dataset represent? What does exit = 1 mean?
Your answer:
2 Task 1: Fit a Logistic Regression Model
Fit a logistic regression model that predicts firm exit using the following predictors:
sales_log (log of sales revenue)
profit_margin (profit as a share of sales)
employees_log (log of number of employees)
age (firm age in years)
Call your model model_base.
Use glm() with family = binomial. The syntax is otherwise the same as lm().
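To illustrate the syntax without giving away the task, here is a minimal sketch on simulated toy data (not the firms data). The data frame `toy` and the variables `churn`, `tenure`, and `spend` are hypothetical names invented for this example.

```r
# Toy data (simulated, NOT the firms data): does a customer churn?
set.seed(42)
n <- 500
toy <- data.frame(
  tenure = runif(n, 0, 10),   # hypothetical predictor
  spend  = rnorm(n, 50, 10)   # hypothetical predictor
)
# Simulate the outcome so that longer tenure lowers the log-odds of churn
toy$churn <- rbinom(n, size = 1, prob = plogis(1 - 0.4 * toy$tenure))

# Same formula interface as lm(); family = binomial uses the logit link by default
toy_model <- glm(churn ~ tenure + spend, data = toy, family = binomial)
summary(toy_model)
```

The coefficients in the summary are on the log-odds scale, so at this stage only their signs and significance can be read off directly.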
# Fit the logistic regression model
# model_base <- ___
Inspect the output:
# summary(model_base)
Question 1a: Which predictors are statistically significant at the 5% level?
Your answer:
Question 1b: Based on the signs of the coefficients, which factors seem to increase the probability of exit, and which seem to decrease it? Does this make economic sense?
Your answer:
3 Task 2: Extend Your Model
Now add at least two more predictors from the dataset to your model. Use your subject-matter intuition: which firm characteristics might predict exit?
Call your extended model model_ext.
# Look at available variables
# names(firms)
# Fit the extended model
# model_ext <- ___
Compare your two models side-by-side using modelsummary():
# YOUR CODE HERE
Question 2a: Did adding the new predictors improve the model fit (use AIC to compare: lower is better)?
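One way to make the AIC comparison concrete, sketched on simulated toy data with base R only: `AIC()` accepts several fitted models and returns one row per model. The names `toy`, `small`, and `large` are hypothetical and stand in for your own `model_base` and `model_ext`.

```r
# Toy data (simulated): y depends on both x1 and x2
set.seed(1)
n <- 400
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * toy$x1 + 0.8 * toy$x2))

# A smaller model and an extended model, both fitted to the same data
small <- glm(y ~ x1,      data = toy, family = binomial)
large <- glm(y ~ x1 + x2, data = toy, family = binomial)

# One row per model: df = number of estimated parameters, AIC = fit criterion
aic_table <- AIC(small, large)
aic_table
```

AIC values are only comparable between models fitted to the same observations, so check that the added predictors do not drop rows through missing values.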
Your answer:
Question 2b: Did any coefficients change notably when you added the new variables? What might that indicate?
Your answer:
4 Task 3: Interpret Coefficients as Odds Ratios
For your extended model, compute and display the odds ratios (with 95% confidence intervals):
# Odds ratios
# exp(coef(model_ext))
# With confidence intervals
# exp(confint(model_ext))
Question 3: Choose one predictor from your model and write a complete, businesslike interpretation of its odds ratio. Use the following template:
“Holding the other predictors constant, a one-unit increase in [variable] is associated with the odds of firm exit being multiplied by [odds ratio]. This means the odds of exit [increase/decrease] by [X]%. This effect is [statistically significant / not statistically significant] at the 5% level.”
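The arithmetic behind the template can be sketched on simulated toy data. The names `toy`, `x`, and `y` are hypothetical. As an assumption for this sketch, the interval uses `confint.default()`, which gives a Wald interval and may differ slightly from the `confint()` interval used in the chunk above.

```r
# Toy data (simulated): one predictor x, binary outcome y
set.seed(7)
n <- 500
toy <- data.frame(x = rnorm(n))
toy$y <- rbinom(n, size = 1, prob = plogis(-1 + 0.5 * toy$x))
toy_model <- glm(y ~ x, data = toy, family = binomial)

# Odds ratio: exponentiate the log-odds coefficient
odds_ratio <- exp(coef(toy_model))["x"]

# Percentage change in the odds per one-unit increase in x
pct_change <- (odds_ratio - 1) * 100

# Wald 95% interval on the log-odds scale, then exponentiated
ci <- exp(confint.default(toy_model))["x", ]

# Significant at the 5% level if the interval for the odds ratio excludes 1
significant <- ci[1] > 1 || ci[2] < 1

round(c(odds_ratio = unname(odds_ratio), pct_change = unname(pct_change), ci), 3)
```

An odds ratio below 1 gives a negative percentage change, which corresponds to the "decrease" branch of the template.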
Your answer:
5 Task 4: Predicted Probabilities for Two Firm Profiles
Define two contrasting firm profiles and compute the predicted probability of exit for each. Choose profiles that tell an interesting story.
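How a predicted probability comes out of a logistic model can be sketched on simulated toy data. The names `toy`, `x1`, `x2`, and `profiles` are hypothetical, and base R data frames are used here instead of tibbles only to keep the sketch dependency-free.

```r
# Toy data (simulated): one continuous and one binary predictor
set.seed(3)
n <- 500
toy <- data.frame(x1 = rnorm(n), x2 = rbinom(n, size = 1, prob = 0.3))
toy$y <- rbinom(n, size = 1, prob = plogis(-1 + 0.7 * toy$x1 - 0.5 * toy$x2))
toy_model <- glm(y ~ x1 + x2, data = toy, family = binomial)

# Two contrasting profiles; column names must match the model's predictors
profiles <- data.frame(
  label = c("Low x1, x2 = 1", "High x1, x2 = 0"),
  x1    = c(-1, 1),
  x2    = c(1, 0)
)

# type = "response" returns probabilities rather than log-odds
profiles$pred_prob <- predict(toy_model, newdata = profiles, type = "response")

# Cross-check: apply the inverse logit to the linear predictor by hand
manual <- plogis(predict(toy_model, newdata = profiles, type = "link"))

profiles
```

The cross-check shows that `type = "response"` is simply the inverse logit of the linear predictor, which is why predicted probabilities always lie between 0 and 1.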
Fill in the scenario values below:
# scenarios <- tibble(
#   label = c("Profile A: ___", "Profile B: ___"),
#   sales_log = c(___, ___), # replace with values
#   employees_log = c(___, ___),
#   profit_margin = c(___, ___),
#   age = c(___, ___),
#   # add your additional predictors here
# )
# scenarios <- scenarios |>
#   mutate(pred_prob = predict(model_ext, newdata = scenarios, type = "response"))
Question 4: Describe your two profiles in plain language. Why did you choose these particular values? What story do the predicted probabilities tell?
Your answer:
6 Task 5: Write a Finding Using Inline R Code
Write a short paragraph (3–5 sentences) summarising one key finding from your analysis. Your paragraph must include at least two numbers embedded as inline R code (not typed manually).
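A common pattern is to compute and round the numbers in a code chunk first, then reference the stored objects in the text. The sketch below uses simulated toy data, and the object names `or_x` and `n_obs` are hypothetical. The commented sentence shows how the inline syntax might look in the source document.

```r
# Toy data (simulated): one predictor x, binary outcome y
set.seed(11)
n <- 300
toy <- data.frame(x = rnorm(n))
toy$y <- rbinom(n, size = 1, prob = plogis(-1 + 0.5 * toy$x))
toy_model <- glm(y ~ x, data = toy, family = binomial)

# Store the numbers you want to report, already rounded for the reader
or_x  <- round(exp(coef(toy_model))["x"], 2)
n_obs <- nobs(toy_model)

# In the text you could then write, for example:
#   The model was fitted to `r n_obs` firms, and a one-unit increase in x
#   multiplies the odds of the outcome by `r or_x`.
```

Because the numbers are computed when the document is rendered, they update automatically if the data or the model changes.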
Your paragraph:
According to the logistic regression model, …
(Continue writing here, embedding inline R code for your numbers)
7 Bonus: Average Marginal Effects
Optional — attempt this if you finish the main tasks early.
Compute the average marginal effects (AMEs) for your extended model:
# avg_slopes(...)
Bonus question: Compare the AME for profit_margin to the LPM coefficient for profit_margin (you may need to fit a quick LPM). Are they similar? What does this tell you about when the LPM might be acceptable?
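To see what an average marginal effect measures, it can help to compute one by hand. This is a sketch on simulated toy data, not a replacement for `avg_slopes()`: it assumes a logit model with a continuous predictor that enters linearly, for which the marginal effect at each observation is the coefficient times p(1 - p). The names `toy`, `logit_fit`, and `lpm_fit` are hypothetical.

```r
# Toy data (simulated): one continuous predictor x, binary outcome y
set.seed(5)
n <- 1000
toy <- data.frame(x = rnorm(n))
toy$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.6 * toy$x))

logit_fit <- glm(y ~ x, data = toy, family = binomial)
lpm_fit   <- lm(y ~ x, data = toy)   # linear probability model

# Marginal effect of x for each observation: beta * p * (1 - p)
p_hat <- fitted(logit_fit)
me_x  <- coef(logit_fit)["x"] * p_hat * (1 - p_hat)

# The AME is the mean of the observation-level marginal effects
ame_x <- mean(me_x)

round(c(AME_logit = ame_x, LPM_coef = unname(coef(lpm_fit)["x"])), 4)
```

In this toy setting the two numbers are close, which is typical when most predicted probabilities are not near 0 or 1.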
# Fit LPM for comparison
# lpm <- lm(exit ~ sales_log + profit_margin + employees_log + age,
#           data = firms)
# Compare coefficients
Your answer:
8 Wrap-up Reflection
Before we debrief, take 2 minutes to note:
What was the most conceptually challenging part of this exercise?
If you had to explain logistic regression to a colleague with no statistics background, what analogy or example would you use?
Your answers: