Session 6: Panel Data and Fixed Effects — Live Demo

Advanced Data Science · Europa-Universität Flensburg

Author

Claudius Gräbner-Radkowitsch

Published

28 05 2026

1 Setup

1.1 Business question

Does rising income make countries happier — and what would it take to interpret that relationship causally?

This question has a deceptively simple answer in cross-sectional data (yes, richer countries are happier) and a much less clear answer when we follow countries over time (the Easterlin paradox). Panel data methods let us be precise about which comparison we are making and what we can credibly claim.

1.2 Packages and data

Code

here::i_am("content/material/session06/lecture_notes.qmd")
library(here)
library(tidyverse)
library(fixest)
library(modelsummary)
library(flextable)
library(patchwork)

dat <- read_csv(
  here("content/material/session06/data/happiness_income.csv"), 
  show_col_types = FALSE)
glimpse(dat)

Rows: 1,869
Columns: 10
$ country      <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan…
$ iso2c        <chr> "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF…
$ iso3c        <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "…
$ year         <dbl> 2011, 2012, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 202…
$ GDP_ppp      <dbl> 80913174278, 91231454558, 98965950295, 100402257641, 1026…
$ population   <dbl> 29347708, 30560034, 32792523, 33831764, 34700612, 3568893…
$ unemployment <dbl> 7.830, 7.875, 7.915, 9.032, 10.116, 11.184, 11.192, 11.18…
$ GDP_pc       <dbl> 2757.053, 2985.319, 3017.943, 2967.692, 2958.785, 2952.99…
$ happiness    <dbl> 4.2580, 4.0400, 3.5750, 3.3600, 3.7940, 3.6320, 3.2030, 2…
$ gini_disp    <dbl> 31.7, 31.8, 31.9, 31.9, 31.9, 31.9, NA, NA, NA, NA, NA, N…

The dataset merges three sources:

Variable	Source	Description
`happiness`	World Happiness Report 2026	Life satisfaction (0–10, 3-year average)
`GDP_pc`	World Bank WDI	GDP per capita, PPP (constant 2017 int’l $)
`gini_disp`	SWIID 9.2	Disposable income Gini coefficient
`unemployment`	World Bank WDI	Unemployment rate (% of labour force)
`population`	World Bank WDI	Total population

2 Exploring the Data

2.1 Panel structure

Code

dat |>
  summarise(
    n_countries = n_distinct(country),
    n_years     = n_distinct(year),
    year_min    = min(year),
    year_max    = max(year),
    n_obs       = n()
  ) |> 
  flextable() |> 
  theme_vanilla() |> 
  autofit()

n_countries	n_years	year_min	year_max	n_obs
146	14	2,011	2,025	1,869

Code

dat |>
  count(country) |>
  count(n, name = "n_countries") |>
  rename(years_observed = n)|> 
  flextable() |> 
  theme_vanilla() |> 
  autofit()

years_observed	n_countries
1	2
2	1
3	2
4	2
5	2
6	3
7	2
9	2
10	4
11	1
12	2
13	6
14	117

The panel is unbalanced — not every country is observed in every year. This is typical for survey-based cross-country data and causes no problems for fixed effects estimation.

2.2 Cross-country variation in happiness

Before running any regression, it helps to see the structure of the data. How much of the variation in happiness is between countries vs. within countries over time?

Code

dat |>
  group_by(country) |>
  summarise(
    happiness = mean(happiness, na.rm = TRUE),
    log_gdp   = mean(log(GDP_pc), na.rm = TRUE),
    .groups   = "drop"
  ) |>
  filter(!is.na(happiness), !is.na(log_gdp)) |>
  ggplot(aes(log_gdp, happiness)) +
  geom_point(colour = "#2C5F8A", alpha = 0.7, size = 2) +
  geom_smooth(method = "lm", colour = "#D04A2F", se = TRUE,
              fill = "#D04A2F", alpha = 0.12, linewidth = 1) +
  labs(
    x = "Log GDP per capita (PPP), country mean",
    y = "Life satisfaction (0–10), country mean"
  ) +
  theme_minimal(base_size = 13)

Figure 1: Average happiness and log GDP per capita by country (full-period means). Richer countries are clearly happier in cross-section.

The cross-sectional correlation is strong and positive. But is this evidence that income causes happiness?

2.3 Within-country trends: the Easterlin challenge

Code

highlight <- c("United States", "Japan", "South Korea", "China", "Germany")

col_hl <- c(
  "United States" = "#D04A2F",
  "Japan"         = "#e67e22",
  "South Korea"   = "#8e44ad",
  "China"         = "#2C5F8A",
  "Germany"       = "#4A9B6F"
)

dat |>
  filter(country %in% highlight, !is.na(happiness)) |>
  ggplot(aes(year, happiness, colour = country)) +
  geom_line(linewidth = 1.1) +
  geom_point(size = 1.8) +
  scale_colour_manual(values = col_hl, name = NULL) +
  scale_x_continuous(breaks = seq(2006, 2024, by = 4)) +
  labs(x = NULL, y = "Life satisfaction (0–10)",
       caption = "Source: World Happiness Report (3-year averages)") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "right")

Figure 2: Happiness over time for selected countries. GDP per capita rose substantially in all of them — but happiness trends differ.

China’s GDP per capita roughly quadrupled over this period. Its happiness score rose modestly. The United States had strong aggregate growth — yet happiness has trended flat or downward. This is the Easterlin paradox in the data.

3 The Model Progression

3.1 Step 0 — Pooled OLS: the naive baseline

Pool all country-year observations and regress happiness on log income, ignoring panel structure entirely.

Code

m0 <- feols(happiness ~ log(GDP_pc), data = dat)

modelsummary(
  list("Pooled OLS" = m0),
  coef_map  = c("log(GDP_pc)" = "Log GDP per capita"),
  statistic = "conf.int", conf_level = 0.95,
  gof_map   = c("nobs", "r.squared"),
  stars     = TRUE,
  output    = "flextable"
) |> autofit()

	Pooled OLS
Log GDP per capita	0.813***
	[0.786, 0.840]
Num.Obs.	1729
R2	0.672
+ p < 0.1, * p < 0.05, p < 0.01, * p < 0.001

The large positive coefficient is not surprising — it reflects the strong cross-sectional pattern we already saw. But it conflates between-country differences (richer country = happier country) with within-country change (rising income → rising happiness). The former is dominated by stable country characteristics that we have not controlled for.

3.2 The composite error — why pooled OLS is biased

The pooled OLS error $\epsilon_{it}$ has two components:

\[\epsilon_{it} = \eta_i + v_{it}\]

$\eta_i$: everything stable about country $i$ that we do not observe — culture, institutions, social trust, history
$v_{it}$: genuine year-to-year noise

If $\eta_i$ correlates with $\log(\text{GDP}_{it})$ — and it clearly does, since rich institutional environments produce both high income and high happiness — pooled OLS is biased. This is OVB with an unobserved variable.

3.3 Step 1 — Country fixed effects: removing stable confounders

Country FE absorbs $\eta_i$ entirely by subtracting each country’s time-mean from every variable:

\[\tilde{y}_{it} = y_{it} - \bar{y}_i \qquad \tilde{x}_{it} = x_{it} - \bar{x}_i\]

Because $\eta_i - \bar{\eta}_i = 0$, the stable country component disappears. The coefficient is now identified from within-country variation only: when a country’s income rises above its own historical average, does its happiness rise too?

Code

m1 <- feols(happiness ~ log(GDP_pc) | country,
            vcov = ~country,
            data = dat)

modelsummary(
  list("Country FE" = m1),
  coef_map  = c("log(GDP_pc)" = "Log GDP per capita"),
  statistic = "conf.int", conf_level = 0.95,
  gof_map   = c("nobs", "r.squared", "r2.within"),
  stars     = TRUE,
  output    = "flextable"
) |> autofit()

	Country FE
Log GDP per capita	1.259***
	[0.829, 1.688]
Num.Obs.	1727
R2	0.928
R2 Within	0.150
+ p < 0.1, * p < 0.05, p < 0.01, * p < 0.001

Contrary to a naive expectation, the country FE coefficient is larger than in pooled OLS — see the full discussion below.

3.3.1 Visualising the within-transformation

To build intuition, let us manually demean one slice of the data and overlay the within-country slopes on the raw scatter.

Code

within_dat <- dat |>
  filter(country %in% highlight) |>
  group_by(country) |>
  mutate(
    happiness_dm = happiness   - mean(happiness,    na.rm = TRUE),
    log_gdp_dm   = log(GDP_pc) - mean(log(GDP_pc),  na.rm = TRUE)
  ) |>
  ungroup()

p_raw <- dat |>
  filter(country %in% highlight) |>
  ggplot(aes(log(GDP_pc), happiness)) +
  geom_point(colour = "grey70", alpha = 0.4, size = 1.5) +
  geom_smooth(method = "lm", colour = "#D04A2F", se = FALSE,
              linewidth = 0.8, linetype = "dashed") +
  geom_smooth(aes(colour = country), method = "lm", se = FALSE, linewidth = 1) +
  scale_colour_manual(values = col_hl, guide = "none") +
  labs(x = "Log GDP per capita", y = "Happiness",
       title = "Raw data", subtitle = "Dashed = pooled OLS") +
  theme_minimal(base_size = 11)

p_demeaned <- within_dat |>
  ggplot(aes(log_gdp_dm, happiness_dm)) +
  geom_point(aes(colour = country), alpha = 0.6, size = 1.5) +
  geom_smooth(method = "lm", colour = "#D04A2F", se = FALSE, linewidth = 1) +
  scale_colour_manual(values = col_hl, name = NULL) +
  labs(x = "Log GDP (demeaned)", y = "Happiness (demeaned)",
       title = "Demeaned data", subtitle = "Within-country variation only") +
  theme_minimal(base_size = 11) +
  theme(legend.position = "bottom")

p_raw + p_demeaned

Figure 3: Raw data vs. demeaned data for selected countries. The grey overall trend is dominated by between-country differences; coloured lines show within-country slopes only.

The between-country slope (dashed, left panel) is steep. The within-country slope (right panel) is flatter — in some countries barely positive. This is the Easterlin paradox captured in a single figure.

3.4 Step 2 — Two-way fixed effects: controlling for global time shocks

Country FE removes stable country differences. But what about shocks that hit all countries in the same year — a global recession, a pandemic, a commodity price surge? If such shocks correlate with income changes, country-only FE is not enough.

Two-way FE adds a separate intercept for each year:

\[y_{it} = \alpha_i + \gamma_t + \beta \log(\text{GDP}_{it}) + v_{it}\]

$\hat\beta$ is now identified from variation that is neither explained by which country nor which year.

Code

m2 <- feols(happiness ~ log(GDP_pc) | country + year,
            vcov = ~country,
            data = dat)

modelsummary(
  list("Two-way FE" = m2),
  coef_map  = c("log(GDP_pc)" = "Log GDP per capita"),
  statistic = "conf.int", conf_level = 0.95,
  gof_map   = c("nobs", "r.squared", "r2.within"),
  stars     = TRUE,
  output    = "flextable"
) |> autofit()

	Two-way FE
Log GDP per capita	1.554***
	[0.943, 2.164]
Num.Obs.	1727
R2	0.930
R2 Within	0.142
+ p < 0.1, * p < 0.05, p < 0.01, * p < 0.001

3.5 The standard progression side by side

Code

modelsummary(
  list("Pooled OLS" = m0, "Country FE" = m1, "Two-way FE" = m2),
  coef_map  = c("log(GDP_pc)" = "Log GDP per capita"),
  statistic = "conf.int", conf_level = 0.95,
  gof_map   = c("nobs", "r.squared", "r2.within"),
  stars     = TRUE,
  notes     = "95% CIs in brackets. Pooled OLS: HC1 SEs. FE models: clustered at country level.",
  output    = "flextable"
) |> autofit()

Table 1: Three models of the income–happiness relationship. 95% confidence intervals in brackets. Within-R² is the relevant fit statistic for FE models.

	Pooled OLS	Country FE	Two-way FE
Log GDP per capita	0.813***	1.259***	1.554***
	[0.786, 0.840]	[0.829, 1.688]	[0.943, 2.164]
Num.Obs.	1729	1727	1727
R2	0.672	0.928	0.930
R2 Within		0.150	0.142
+ p < 0.1, * p < 0.05, p < 0.01, * p < 0.001
95% CIs in brackets. Pooled OLS: HC1 SEs. FE models: clustered at country level.

Reading the table:

Pooled OLS → Country FE: The coefficient increases from 0.81 to 1.26. This is counter-intuitive but real: the within-country income–happiness relationship is stronger than the cross-sectional comparison between countries. Countries going through active income growth (e.g. emerging markets) show larger happiness gains than the gap between a rich and a poor country at one point in time would suggest. The cross-sectional estimate is compressed partly by diminishing returns at high income levels and by stable between-country differences in happiness baselines.
Country FE → Two-way FE: The coefficient increases further to 1.55. Year FE removes global shocks that simultaneously raised incomes and happiness everywhere (good global years); after removing those, the within-country, within-year income–happiness signal is stronger.
Within-R² (0.15–0.14): income explains roughly 15–14% of within-country happiness variation — meaningful but modest. Happiness has many determinants beyond income.

4 Causal Interpretation

4.1 What the FE estimate claims — and does not claim

The two-way FE coefficient answers:

When a country’s income rises above its own historical average in a year that is not unusually good or bad for all countries, does its happiness tend to rise?

Country FE already controls for:

All time-invariant country characteristics: stable institutions, culture, geography, social trust, historical development path
This includes most of what makes Scandinavian countries both rich and happy — it is absorbed into $\alpha_i$

Year FE already controls for:

Global shocks affecting all countries in a given year: financial crises, pandemics, commodity cycles

What is still potentially uncontrolled:

Time-varying factors that change differently across countries and independently affect both income and happiness. Two obvious candidates in our data: income inequality and unemployment.

4.2 Three estimands — precision about what you want to estimate

Before adding more controls, it is worth being precise about which causal quantity you are actually after.

Estimand	What you condition on	What it answers
Total effect of income on happiness	Nothing on the causal path	“Does richer = happier, all channels included?”
Controlled direct effect (CDE)	Mediators held fixed (e.g. healthcare)	“Does income matter beyond health, trust, etc.?”
Indirect / path-specific effect via M	Estimated via mediation analysis	“How much of the income effect runs through health?”

Our main estimate (m2) is an attempt at the total within-country effect — it includes all pathways through which rising income may raise happiness: better healthcare, reduced stress, more autonomy, improved public services, and any direct effect.

4.3 A common temptation: controlling for WHR variables

The WHR reports several variables alongside happiness: social support, healthy life expectancy, freedom to make life choices, perceptions of corruption. The temptation is to add these as controls to “isolate the direct income effect” or “remove confounding.”

This is the bad-control problem again

Most of these WHR variables are mediators, not confounders:

Healthy life expectancy: income → health → happiness
Social support: income enables social networks → happiness
Freedom to choose: affluence enables autonomy → happiness

Conditioning on a mediator does two things simultaneously:

It removes part of the income effect (blocking the mediated path)
If the mediator also has independent determinants (culture, geography), it opens new backdoor paths — introducing bias rather than removing it

The result is a controlled direct effect that answers a different question than the total effect — and does so with additional bias from the newly opened paths. This is precisely the “bad control” problem from Session 5, applied at the macro level.

4.4 Stage 1 — Total effect with time-varying confounders

The appropriate additions are variables that independently affect both income and happiness and change over time within countries, without lying on the causal path from income to happiness.

Income inequality (gini_disp): The US Easterlin story is partly about rising inequality — aggregate GDP grew, but most of the gains went to the top, so median incomes stagnated. Gini is a time-varying confounder (or effect modifier): it captures distributional changes that affect happiness independently of average income.

Unemployment (unemployment): Labour market conditions affect wellbeing through channels (job security, social belonging) that are partly independent of aggregate income.

Code

m3a <- feols(happiness ~ log(GDP_pc) + gini_disp               | country + year,
             vcov = ~country, data = dat)

m3b <- feols(happiness ~ log(GDP_pc) + unemployment            | country + year,
             vcov = ~country, data = dat)

m3  <- feols(happiness ~ log(GDP_pc) + gini_disp + unemployment | country + year,
             vcov = ~country, data = dat)

Code

modelsummary(
  list(
    "Two-way FE"    = m2,
    "+ Gini"        = m3a,
    "+ Unemployment"= m3b,
    "+ Both"        = m3
  ),
  coef_map = c(
    "log(GDP_pc)"  = "Log GDP per capita",
    "gini_disp"    = "Gini (disposable income)",
    "unemployment" = "Unemployment rate"
  ),
  statistic = "conf.int", conf_level = 0.95,
  gof_map   = c("nobs", "r.squared", "r2.within"),
  stars     = TRUE,
  notes     = "Two-way FE (country + year) throughout. SEs clustered at country level.",
  output    = "flextable"
) |> autofit()

Table 2: Adding time-varying confounders to the two-way FE baseline. 95% confidence intervals in brackets. Note: the sample shrinks when Gini is included (missing data).

	Two-way FE	+ Gini	+ Unemployment	+ Both
Log GDP per capita	1.554***	1.740***	1.217***	1.376***
	[0.943, 2.164]	[1.044, 2.436]	[0.602, 1.833]	[0.666, 2.086]
Gini (disposable income)		-0.021		-0.026
		[-0.057, 0.016]		[-0.059, 0.008]
Unemployment rate			-0.042***	-0.036***
			[-0.061, -0.024]	[-0.055, -0.017]
Num.Obs.	1727	1308	1711	1297
R2	0.930	0.942	0.935	0.945
R2 Within	0.142	0.156	0.186	0.194
+ p < 0.1, * p < 0.05, p < 0.01, * p < 0.001
Two-way FE (country + year) throughout. SEs clustered at country level.

Interpreting the results

The income coefficient remains large and significant throughout (1.55 → 1.38 in the full model, 95% CI well above zero). The key findings from each column:

+ Gini only: The Gini coefficient is negative but not statistically significant at conventional levels. The sample shrinks substantially due to missing Gini observations — results should be interpreted cautiously.
+ Unemployment only: Unemployment is negative and highly significant — higher unemployment reduces happiness within countries even after controlling for income and fixed effects. Adding unemployment leaves the income coefficient largely unchanged.
+ Both: The income coefficient falls modestly (11% relative to the baseline). Most of the action comes from unemployment, not Gini. The income–happiness relationship is robust to these time-varying controls.

A note on sample composition: adding Gini reduces the sample by 419 observations (from 1727 to 1308). Part of the income coefficient shift when Gini is included may reflect this sample restriction rather than the Gini control itself. Compare columns (1) and (3) — the unemployment-only model uses nearly the same sample as the baseline — to isolate the cleaner comparison.

4.5 Stage 2 — Why mediation analysis, not more controls

The concern “maybe it is not income but good healthcare that makes people happier” is a question about indirect effects: how much of the total income–happiness relationship runs through healthcare?

That is a legitimate question — but it requires a different estimand than what we have been computing, and a different analytical strategy.

The wrong approach: add healthcare (or WHR healthy life expectancy) to the main regression. As explained above, this gives the controlled direct effect, introduces mediator bias, and does not actually answer the indirect effect question.

The right approach — causal mediation decomposition:

\[\underbrace{\hat\beta_{\text{total}}}_{\text{Total effect}} = \underbrace{\hat\beta_{\text{direct}}}_{\text{Income → Happiness, not via M}} + \underbrace{\hat\beta_{\text{indirect}}}_{\text{Income → M → Happiness}}\]

This requires:

A model for the mediator (e.g., healthy life expectancy ~ income + country FE + year FE + confounders)
A model for happiness including both income and the mediator
Strong identification assumptions — crucially, no unmeasured confounders of the mediator–outcome relationship, which is demanding at the macro level

The practical upshot: if a mediation analysis found a large indirect effect through healthcare (i.e., most of the income effect runs through better health), that would support the view that policies improving health without income growth could also raise happiness — a different policy conclusion than if the income effect were mostly direct.

5 Reporting Panel Results in Quarto

5.1 What to include in every panel table

Code

modelsummary(
  list(
    "(1) Pooled OLS"      = m0,
    "(2) Country FE"      = m1,
    "(3) Two-way FE"      = m2,
    "(4) + Gini & Unemp." = m3
  ),
  coef_map  = c(
    "log(GDP_pc)"  = "Log GDP per capita",
    "gini_disp"    = "Gini coefficient",
    "unemployment" = "Unemployment rate"
  ),
  statistic = "conf.int", conf_level = 0.95,
  gof_map   = c("nobs", "r.squared", "r2.within"),
  stars     = TRUE,
  notes     = c(
    "95% CIs in brackets.",
    "Column (1): HC1 standard errors.",
    "Columns (2)–(4): clustered at country level.",
    "Country and year fixed effects in columns (3) and (4)."
  ),
  output    = "flextable"
) |> autofit()

Table 3: Income and happiness: pooled OLS, country FE, two-way FE, and two-way FE with time-varying controls. 95% confidence intervals in brackets. Preferred specification: column (4).

	(1) Pooled OLS	(2) Country FE	(3) Two-way FE	(4) + Gini & Unemp.
Log GDP per capita	0.813***	1.259***	1.554***	1.376***
	[0.786, 0.840]	[0.829, 1.688]	[0.943, 2.164]	[0.666, 2.086]
Gini coefficient				-0.026
				[-0.059, 0.008]
Unemployment rate				-0.036***
				[-0.055, -0.017]
Num.Obs.	1729	1727	1727	1297
R2	0.672	0.928	0.930	0.945
R2 Within		0.150	0.142	0.194
+ p < 0.1, * p < 0.05, p < 0.01, * p < 0.001
95% CIs in brackets.
Column (1): HC1 standard errors.
Columns (2)–(4): clustered at country level.
Country and year fixed effects in columns (3) and (4).

Five rules for panel tables:

Show the progression — pooled OLS to your preferred specification. The reader needs to see what each layer of controls does.
Report CIs, not SEs — confidence intervals communicate both precision and the range of compatible effect sizes more directly than standard errors.
Report which FEs are included — a footnote is cleaner than coefficient rows: “Country and year fixed effects in columns (3) and (4).”
Report the within-$R^2$ — use gof_map = c("nobs", "r.squared", "r2.within"). The overall $R^2$ is inflated by the fixed effects.
State the clustering — always note clustering level in the table footer.

5.2 Reporting confidence intervals in `modelsummary`

By default modelsummary shows standard errors below point estimates. To show 95% confidence intervals instead:

Code

modelsummary(
  list("Model" = m),
  statistic  = "conf.int",   # [lower, upper] replaces (SE)
  conf_level = 0.95
)

To show the estimate and CI on the same line (common in reports):

Code

modelsummary(
  list("Model" = m),
  estimate   = "{estimate} [{conf.low}, {conf.high}]",
  statistic  = NULL           # suppress the second row entirely
)

For a publication-ready flextable, add output = "flextable" and pipe through autofit():

Code

modelsummary(
  list("Model" = m),
  statistic = "conf.int",
  conf_level = 0.95,
  output    = "flextable"
) |>
  autofit()

5.3 Writing up the result

A clean analytical paragraph based on the actual results:

Table 1 shows the relationship between log GDP per capita and life satisfaction across countries and years. The pooled OLS estimate (column 1) suggests a positive association (β = 0.81, 95% CI [0.79, 0.84]), but this conflates stable between-country differences with within-country income changes. Contrary to a naive reading of the Easterlin paradox, country fixed effects (column 2) reveal a stronger within-country relationship (β = 1.26, 95% CI [0.83, 1.69]): countries experiencing income growth show larger happiness gains than the cross-sectional gap between rich and poor countries would predict. Adding year fixed effects (column 3) increases the coefficient further to β = 1.55 (95% CI [0.94, 2.16]), as global time trends that simultaneously affected incomes and happiness everywhere are removed. Column 4 adds time-varying controls for inequality and unemployment; the income coefficient falls modestly to β = 1.38 (95% CI [0.67, 2.09]), driven mainly by the significant negative effect of unemployment rather than by Gini (which is not significant). The evidence points to a robust, positive within-country income–happiness relationship — with unemployment emerging as an important independent determinant of within-country happiness variation.

6 Summary

What we did:

Established the cross-sectional income–happiness correlation and the Easterlin challenge
Ran the standard FE progression: pooled OLS → country FE → two-way FE
Clarified what each model estimates and what confounders remain
Distinguished three estimands: total effect, controlled direct effect, indirect effect
Added time-varying confounders (Gini, unemployment) as Stage 1 extensions
Explained why conditioning on WHR variables (health, trust) is the wrong way to address mediator concerns — and what the right way (mediation analysis) would look like

Key takeaways:

The within-country income–happiness relationship is stronger than the cross-sectional comparison — the FE coefficient exceeds pooled OLS
Country FE eliminates all stable confounders; year FE removes common time shocks
Unemployment is a significant time-varying confounder; Gini is not significant in this panel (possibly due to limited within-country variation and missing data)
WHR social/health variables are mediators — adding them gives a controlled direct effect, not a cleaner total effect, and may introduce new bias
Mediation analysis is the right framework for decomposing pathways; it requires stronger assumptions and is a separate analytical step

In R: fixest::feols(y ~ x + controls | country + year, vcov = ~country)

7 Further Reading

Békés, G. & Kézdi, G. (2021). Data Analysis for Business, Economics, and Policy. Cambridge UP. Ch. 24 (Panel data).
Huntington-Klein, N. (2022). The Effect. Ch. 16 (Fixed Effects). theeffectbook.net
Cunningham, S. (2021). Causal Inference: The Mixtape. Ch. 8 (Panel data). mixtape.scunning.com
Pearl, J. & Mackenzie, D. (2018). The Book of Why. Ch. 9 (Mediation) — accessible treatment of direct vs. indirect effects
Imai, K., Keele, L. & Tingley, D. (2010). A general approach to causal mediation analysis. Psychological Methods, 15(4), 309–334 — the standard reference for the mediation R package
Killingsworth, M. A., Kahneman, D., & Mellers, B. (2023). Income and emotional well-being: A conflict resolved. Proceedings of the National Academy of Sciences, 120(10), e2208661120. https://doi.org/10.1073/pnas.2208661120
World Happiness Report (annual, worldhappiness.report) — statistical appendices document their approach to decomposing happiness gaps across countries

--- title: "Session 6: Panel Data and Fixed Effects — Live Demo" subtitle: "Advanced Data Science · Europa-Universität Flensburg" author: "Claudius Gräbner-Radkowitsch" date: "2026-05-28" format: html: toc: true toc-depth: 3 number-sections: true code-fold: true code-tools: true self-contained: true execute: echo: true warning: false message: false execute-dir: file --- # Setup ## Business question > *Does rising income make countries happier — and what would it take to interpret that relationship causally?* This question has a deceptively simple answer in cross-sectional data (yes, richer countries are happier) and a much less clear answer when we follow countries over time (the Easterlin paradox). Panel data methods let us be precise about which comparison we are making and what we can credibly claim. ## Packages and data ```{r} #| label: setup here::i_am("content/material/session06/lecture_notes.qmd") library(here) library(tidyverse) library(fixest) library(modelsummary) library(flextable) library(patchwork) dat <- read_csv( here("content/material/session06/data/happiness_income.csv"), show_col_types = FALSE) glimpse(dat) ``` The dataset merges three sources: | Variable | Source | Description | |---|---|---| | `happiness` | World Happiness Report 2026 | Life satisfaction (0–10, 3-year average) | | `GDP_pc` | World Bank WDI | GDP per capita, PPP (constant 2017 int'l $) | | `gini_disp` | SWIID 9.2 | Disposable income Gini coefficient | | `unemployment` | World Bank WDI | Unemployment rate (% of labour force) | | `population` | World Bank WDI | Total population | --- # Exploring the Data ## Panel structure ```{r} #| label: panel-structure dat |> summarise( n_countries = n_distinct(country), n_years = n_distinct(year), year_min = min(year), year_max = max(year), n_obs = n() ) |> flextable() |> theme_vanilla() |> autofit() ``` ```{r} #| label: panel-balance dat |> count(country) |> count(n, name = "n_countries") |> rename(years_observed = n)|> flextable() |> theme_vanilla() |> autofit() ``` The panel is **unbalanced** — not every country is observed in every year. This is typical for survey-based cross-country data and causes no problems for fixed effects estimation. ## Cross-country variation in happiness Before running any regression, it helps to see the structure of the data. How much of the variation in happiness is *between* countries vs. *within* countries over time? ```{r} #| label: fig-country-means #| fig-cap: "Average happiness and log GDP per capita by country (full-period means). Richer countries are clearly happier in cross-section." #| fig-height: 5 dat |> group_by(country) |> summarise( happiness = mean(happiness, na.rm = TRUE), log_gdp = mean(log(GDP_pc), na.rm = TRUE), .groups = "drop" ) |> filter(!is.na(happiness), !is.na(log_gdp)) |> ggplot(aes(log_gdp, happiness)) + geom_point(colour = "#2C5F8A", alpha = 0.7, size = 2) + geom_smooth(method = "lm", colour = "#D04A2F", se = TRUE, fill = "#D04A2F", alpha = 0.12, linewidth = 1) + labs( x = "Log GDP per capita (PPP), country mean", y = "Life satisfaction (0–10), country mean" ) + theme_minimal(base_size = 13) ``` The cross-sectional correlation is strong and positive. But is this evidence that income *causes* happiness? ## Within-country trends: the Easterlin challenge ```{r} #| label: fig-within-trends #| fig-cap: "Happiness over time for selected countries. GDP per capita rose substantially in all of them — but happiness trends differ." #| fig-height: 4 highlight <- c("United States", "Japan", "South Korea", "China", "Germany") col_hl <- c( "United States" = "#D04A2F", "Japan" = "#e67e22", "South Korea" = "#8e44ad", "China" = "#2C5F8A", "Germany" = "#4A9B6F" ) dat |> filter(country %in% highlight, !is.na(happiness)) |> ggplot(aes(year, happiness, colour = country)) + geom_line(linewidth = 1.1) + geom_point(size = 1.8) + scale_colour_manual(values = col_hl, name = NULL) + scale_x_continuous(breaks = seq(2006, 2024, by = 4)) + labs(x = NULL, y = "Life satisfaction (0–10)", caption = "Source: World Happiness Report (3-year averages)") + theme_minimal(base_size = 13) + theme(legend.position = "right") ``` China's GDP per capita roughly quadrupled over this period. Its happiness score rose modestly. The United States had strong aggregate growth — yet happiness has trended flat or downward. This is the Easterlin paradox in the data. --- # The Model Progression ## Step 0 — Pooled OLS: the naive baseline Pool all country-year observations and regress happiness on log income, ignoring panel structure entirely. ```{r} #| label: model-pooled m0 <- feols(happiness ~ log(GDP_pc), data = dat) modelsummary( list("Pooled OLS" = m0), coef_map = c("log(GDP_pc)" = "Log GDP per capita"), statistic = "conf.int", conf_level = 0.95, gof_map = c("nobs", "r.squared"), stars = TRUE, output = "flextable" ) |> autofit() ``` The large positive coefficient is not surprising — it reflects the strong cross-sectional pattern we already saw. But it conflates between-country differences (richer country = happier country) with within-country change (rising income → rising happiness). The former is dominated by stable country characteristics that we have not controlled for. ## The composite error — why pooled OLS is biased The pooled OLS error $\epsilon_{it}$ has two components: $$\epsilon_{it} = \eta_i + v_{it}$$ - $\eta_i$: everything stable about country $i$ that we do not observe — culture, institutions, social trust, history - $v_{it}$: genuine year-to-year noise If $\eta_i$ correlates with $\log(\text{GDP}_{it})$ — and it clearly does, since rich institutional environments produce both high income *and* high happiness — pooled OLS is biased. This is OVB with an unobserved variable. ## Step 1 — Country fixed effects: removing stable confounders Country FE absorbs $\eta_i$ entirely by subtracting each country's time-mean from every variable: $$\tilde{y}_{it} = y_{it} - \bar{y}_i \qquad \tilde{x}_{it} = x_{it} - \bar{x}_i$$ Because $\eta_i - \bar{\eta}_i = 0$, the stable country component disappears. The coefficient is now identified from **within-country variation only**: when a country's income rises above its own historical average, does its happiness rise too? ```{r} #| label: model-country-fe m1 <- feols(happiness ~ log(GDP_pc) | country, vcov = ~country, data = dat) modelsummary( list("Country FE" = m1), coef_map = c("log(GDP_pc)" = "Log GDP per capita"), statistic = "conf.int", conf_level = 0.95, gof_map = c("nobs", "r.squared", "r2.within"), stars = TRUE, output = "flextable" ) |> autofit() ``` Contrary to a naive expectation, the country FE coefficient is *larger* than in pooled OLS — see the full discussion below. ### Visualising the within-transformation To build intuition, let us manually demean one slice of the data and overlay the within-country slopes on the raw scatter. ```{r} #| label: fig-within-slopes #| fig-cap: "Raw data vs. demeaned data for selected countries. The grey overall trend is dominated by between-country differences; coloured lines show within-country slopes only." #| fig-height: 5 within_dat <- dat |> filter(country %in% highlight) |> group_by(country) |> mutate( happiness_dm = happiness - mean(happiness, na.rm = TRUE), log_gdp_dm = log(GDP_pc) - mean(log(GDP_pc), na.rm = TRUE) ) |> ungroup() p_raw <- dat |> filter(country %in% highlight) |> ggplot(aes(log(GDP_pc), happiness)) + geom_point(colour = "grey70", alpha = 0.4, size = 1.5) + geom_smooth(method = "lm", colour = "#D04A2F", se = FALSE, linewidth = 0.8, linetype = "dashed") + geom_smooth(aes(colour = country), method = "lm", se = FALSE, linewidth = 1) + scale_colour_manual(values = col_hl, guide = "none") + labs(x = "Log GDP per capita", y = "Happiness", title = "Raw data", subtitle = "Dashed = pooled OLS") + theme_minimal(base_size = 11) p_demeaned <- within_dat |> ggplot(aes(log_gdp_dm, happiness_dm)) + geom_point(aes(colour = country), alpha = 0.6, size = 1.5) + geom_smooth(method = "lm", colour = "#D04A2F", se = FALSE, linewidth = 1) + scale_colour_manual(values = col_hl, name = NULL) + labs(x = "Log GDP (demeaned)", y = "Happiness (demeaned)", title = "Demeaned data", subtitle = "Within-country variation only") + theme_minimal(base_size = 11) + theme(legend.position = "bottom") p_raw + p_demeaned ``` The between-country slope (dashed, left panel) is steep. The within-country slope (right panel) is flatter — in some countries barely positive. This is the Easterlin paradox captured in a single figure. ## Step 2 — Two-way fixed effects: controlling for global time shocks Country FE removes stable country differences. But what about shocks that hit all countries in the same year — a global recession, a pandemic, a commodity price surge? If such shocks correlate with income changes, country-only FE is not enough. **Two-way FE** adds a separate intercept for each year: $$y_{it} = \alpha_i + \gamma_t + \beta \log(\text{GDP}_{it}) + v_{it}$$ $\hat\beta$ is now identified from variation that is neither explained by *which country* nor *which year*. ```{r} #| label: model-twoway-fe m2 <- feols(happiness ~ log(GDP_pc) | country + year, vcov = ~country, data = dat) modelsummary( list("Two-way FE" = m2), coef_map = c("log(GDP_pc)" = "Log GDP per capita"), statistic = "conf.int", conf_level = 0.95, gof_map = c("nobs", "r.squared", "r2.within"), stars = TRUE, output = "flextable" ) |> autofit() ``` ```{r} #| label: inline-values-1 #| include: true #| echo: false # Point estimates (log GDP_pc coefficient) b_m0 <- round(coef(m0)[["log(GDP_pc)"]], 2) b_m1 <- round(coef(m1)[["log(GDP_pc)"]], 2) b_m2 <- round(coef(m2)[["log(GDP_pc)"]], 2) # 95% confidence intervals ci_m0 <- round(confint(m0)["log(GDP_pc)", ], 2) ci_m1 <- round(confint(m1)["log(GDP_pc)", ], 2) ci_m2 <- round(confint(m2)["log(GDP_pc)", ], 2) lo_m0 <- ci_m0[[1]]; hi_m0 <- ci_m0[[2]] lo_m1 <- ci_m1[[1]]; hi_m1 <- ci_m1[[2]] lo_m2 <- ci_m2[[1]]; hi_m2 <- ci_m2[[2]] # Within-R² r2w_m1 <- round(r2(m1)[["war2"]], 2) r2w_m2 <- round(r2(m2)[["war2"]], 2) # Sample sizes n_m2 <- nobs(m2) ``` ## The standard progression side by side ```{r} #| label: tbl-main-progression #| tbl-cap: "Three models of the income–happiness relationship. 95% confidence intervals in brackets. Within-R² is the relevant fit statistic for FE models." modelsummary( list("Pooled OLS" = m0, "Country FE" = m1, "Two-way FE" = m2), coef_map = c("log(GDP_pc)" = "Log GDP per capita"), statistic = "conf.int", conf_level = 0.95, gof_map = c("nobs", "r.squared", "r2.within"), stars = TRUE, notes = "95% CIs in brackets. Pooled OLS: HC1 SEs. FE models: clustered at country level.", output = "flextable" ) |> autofit() ``` **Reading the table:** - **Pooled OLS → Country FE:** The coefficient *increases* from `r b_m0` to `r b_m1`. This is counter-intuitive but real: the within-country income–happiness relationship is *stronger* than the cross-sectional comparison between countries. Countries going through active income growth (e.g. emerging markets) show larger happiness gains than the gap between a rich and a poor country at one point in time would suggest. The cross-sectional estimate is compressed partly by diminishing returns at high income levels and by stable between-country differences in happiness baselines. - **Country FE → Two-way FE:** The coefficient increases further to `r b_m2`. Year FE removes global shocks that simultaneously raised incomes and happiness everywhere (good global years); after removing those, the within-country, within-year income–happiness signal is stronger. - **Within-R²** (`r r2w_m1`–`r r2w_m2`): income explains roughly `r round(r2w_m1 * 100)`–`r round(r2w_m2 * 100)`% of within-country happiness variation — meaningful but modest. Happiness has many determinants beyond income. --- # Causal Interpretation ## What the FE estimate claims — and does not claim The two-way FE coefficient answers: > When a country's income rises above its own historical average in a year that is not unusually good or bad for all countries, does its happiness tend to rise? **Country FE already controls for:** - All time-invariant country characteristics: stable institutions, culture, geography, social trust, historical development path - This includes most of what makes Scandinavian countries both rich *and* happy — it is absorbed into $\alpha_i$ **Year FE already controls for:** - Global shocks affecting all countries in a given year: financial crises, pandemics, commodity cycles **What is still potentially uncontrolled:** Time-varying factors that change *differently across countries* and independently affect both income and happiness. Two obvious candidates in our data: **income inequality** and **unemployment**. ## Three estimands — precision about what you want to estimate Before adding more controls, it is worth being precise about which causal quantity you are actually after. | Estimand | What you condition on | What it answers | |---|---|---| | **Total effect** of income on happiness | Nothing on the causal path | "Does richer = happier, all channels included?" | | **Controlled direct effect** (CDE) | Mediators held fixed (e.g. healthcare) | "Does income matter *beyond* health, trust, etc.?" | | **Indirect / path-specific effect** via M | Estimated via mediation analysis | "How much of the income effect runs *through* health?" | Our main estimate (`m2`) is an attempt at the **total within-country effect** — it includes all pathways through which rising income may raise happiness: better healthcare, reduced stress, more autonomy, improved public services, and any direct effect. ## A common temptation: controlling for WHR variables The WHR reports several variables alongside happiness: social support, healthy life expectancy, freedom to make life choices, perceptions of corruption. The temptation is to add these as controls to "isolate the direct income effect" or "remove confounding." ::: {.callout-warning} ## This is the bad-control problem again Most of these WHR variables are **mediators**, not confounders: - Healthy life expectancy: income → health → happiness - Social support: income enables social networks → happiness - Freedom to choose: affluence enables autonomy → happiness Conditioning on a mediator does two things simultaneously: 1. It removes part of the income effect (blocking the mediated path) 2. If the mediator also has independent determinants (culture, geography), it opens new backdoor paths — introducing bias rather than removing it The result is a **controlled direct effect** that answers a different question than the total effect — and does so with additional bias from the newly opened paths. This is precisely the "bad control" problem from Session 5, applied at the macro level. ::: ## Stage 1 — Total effect with time-varying confounders The appropriate additions are variables that independently affect both income and happiness *and change over time within countries*, without lying on the causal path from income to happiness. **Income inequality (`gini_disp`):** The US Easterlin story is partly about rising inequality — aggregate GDP grew, but most of the gains went to the top, so median incomes stagnated. Gini is a time-varying confounder (or effect modifier): it captures distributional changes that affect happiness independently of average income. **Unemployment (`unemployment`):** Labour market conditions affect wellbeing through channels (job security, social belonging) that are partly independent of aggregate income. ```{r} #| label: model-with-confounders m3a <- feols(happiness ~ log(GDP_pc) + gini_disp | country + year, vcov = ~country, data = dat) m3b <- feols(happiness ~ log(GDP_pc) + unemployment | country + year, vcov = ~country, data = dat) m3 <- feols(happiness ~ log(GDP_pc) + gini_disp + unemployment | country + year, vcov = ~country, data = dat) ``` ```{r} #| label: inline-values-2 #| include: true #| echo: false # Point estimates (log GDP_pc coefficient) b_m3 <- round(coef(m3)[["log(GDP_pc)"]], 2) # 95% confidence intervals ci_m3 <- round(confint(m3)["log(GDP_pc)", ], 2) lo_m3 <- ci_m3[[1]]; hi_m3 <- ci_m3[[2]] # Within-R² r2w_m3 <- round(r2(m3)[["war2"]], 2) # Sample sizes n_m3a <- nobs(m3a) n_drop <- n_m2 - n_m3a # observations lost when adding Gini # % change in income coefficient m2 → m3 pct_drop <- round(abs(b_m2 - b_m3) / b_m2 * 100) ``` ```{r} #| label: tbl-confounders #| tbl-cap: "Adding time-varying confounders to the two-way FE baseline. 95% confidence intervals in brackets. Note: the sample shrinks when Gini is included (missing data)." modelsummary( list( "Two-way FE" = m2, "+ Gini" = m3a, "+ Unemployment"= m3b, "+ Both" = m3 ), coef_map = c( "log(GDP_pc)" = "Log GDP per capita", "gini_disp" = "Gini (disposable income)", "unemployment" = "Unemployment rate" ), statistic = "conf.int", conf_level = 0.95, gof_map = c("nobs", "r.squared", "r2.within"), stars = TRUE, notes = "Two-way FE (country + year) throughout. SEs clustered at country level.", output = "flextable" ) |> autofit() ``` ::: {.callout-note} ## Interpreting the results The income coefficient remains **large and significant** throughout (`r b_m2` → `r b_m3` in the full model, 95% CI well above zero). The key findings from each column: - **+ Gini only:** The Gini coefficient is negative but not statistically significant at conventional levels. The sample shrinks substantially due to missing Gini observations — results should be interpreted cautiously. - **+ Unemployment only:** Unemployment is negative and highly significant — higher unemployment reduces happiness within countries even after controlling for income and fixed effects. Adding unemployment leaves the income coefficient largely unchanged. - **+ Both:** The income coefficient falls modestly (`r pct_drop`% relative to the baseline). Most of the action comes from unemployment, not Gini. The income–happiness relationship is robust to these time-varying controls. **A note on sample composition:** adding Gini reduces the sample by `r n_drop` observations (from `r n_m2` to `r n_m3a`). Part of the income coefficient shift when Gini is included may reflect this sample restriction rather than the Gini control itself. Compare columns (1) and (3) — the unemployment-only model uses nearly the same sample as the baseline — to isolate the cleaner comparison. ::: ## Stage 2 — Why mediation analysis, not more controls The concern *"maybe it is not income but good healthcare that makes people happier"* is a question about **indirect effects**: how much of the total income–happiness relationship runs through healthcare? That is a legitimate question — but it requires a **different estimand** than what we have been computing, and a different analytical strategy. **The wrong approach:** add healthcare (or WHR healthy life expectancy) to the main regression. As explained above, this gives the controlled direct effect, introduces mediator bias, and does not actually answer the indirect effect question. **The right approach — causal mediation decomposition:** $$\underbrace{\hat\beta_{\text{total}}}_{\text{Total effect}} = \underbrace{\hat\beta_{\text{direct}}}_{\text{Income → Happiness, not via M}} + \underbrace{\hat\beta_{\text{indirect}}}_{\text{Income → M → Happiness}}$$ This requires: 1. A model for the **mediator** (e.g., healthy life expectancy ~ income + country FE + year FE + confounders) 2. A model for **happiness** including both income and the mediator 3. Strong identification assumptions — crucially, no unmeasured confounders of the mediator–outcome relationship, which is demanding at the macro level The practical upshot: if a mediation analysis found a large indirect effect through healthcare (i.e., most of the income effect runs through better health), that would support the view that policies improving health *without* income growth could also raise happiness — a different policy conclusion than if the income effect were mostly direct. ::: {.callout-note} ## Further reading The World Happiness Report investigates mediation pathways in its statistical appendices — see the methods notes in recent editions for their approach to decomposing the cross-country happiness gaps. Note that findings on the relative importance of income vs. social variables differ across WHR editions and decomposition methods, and these are cross-sectional decompositions, not panel FE estimates. ::: **Mediation analysis is beyond the scope of this session**, but if you pursue this line of research, the `mediation` package in R implements the Imai et al. framework for causal mediation under the sequential ignorability assumption. --- # Reporting Panel Results in Quarto ## What to include in every panel table ```{r} #| label: tbl-final-quarto #| tbl-cap: "Income and happiness: pooled OLS, country FE, two-way FE, and two-way FE with time-varying controls. 95% confidence intervals in brackets. Preferred specification: column (4)." modelsummary( list( "(1) Pooled OLS" = m0, "(2) Country FE" = m1, "(3) Two-way FE" = m2, "(4) + Gini & Unemp." = m3 ), coef_map = c( "log(GDP_pc)" = "Log GDP per capita", "gini_disp" = "Gini coefficient", "unemployment" = "Unemployment rate" ), statistic = "conf.int", conf_level = 0.95, gof_map = c("nobs", "r.squared", "r2.within"), stars = TRUE, notes = c( "95% CIs in brackets.", "Column (1): HC1 standard errors.", "Columns (2)–(4): clustered at country level.", "Country and year fixed effects in columns (3) and (4)." ), output = "flextable" ) |> autofit() ``` **Five rules for panel tables:** 1. **Show the progression** — pooled OLS to your preferred specification. The reader needs to see what each layer of controls does. 2. **Report CIs, not SEs** — confidence intervals communicate both precision and the range of compatible effect sizes more directly than standard errors. 3. **Report which FEs are included** — a footnote is cleaner than coefficient rows: *"Country and year fixed effects in columns (3) and (4)."* 4. **Report the within-$R^2$** — use `gof_map = c("nobs", "r.squared", "r2.within")`. The overall $R^2$ is inflated by the fixed effects. 5. **State the clustering** — always note clustering level in the table footer. ## Reporting confidence intervals in `modelsummary` By default `modelsummary` shows standard errors below point estimates. To show 95% confidence intervals instead: ```{r} #| eval: false modelsummary( list("Model" = m), statistic = "conf.int", # [lower, upper] replaces (SE) conf_level = 0.95 ) ``` To show the estimate and CI on the same line (common in reports): ```{r} #| eval: false modelsummary( list("Model" = m), estimate = "{estimate} [{conf.low}, {conf.high}]", statistic = NULL # suppress the second row entirely ) ``` For a **publication-ready flextable**, add `output = "flextable"` and pipe through `autofit()`: ```{r} #| eval: false modelsummary( list("Model" = m), statistic = "conf.int", conf_level = 0.95, output = "flextable" ) |> autofit() ``` ## Writing up the result A clean analytical paragraph based on the actual results: > Table 1 shows the relationship between log GDP per capita and life satisfaction across countries and years. The pooled OLS estimate (column 1) suggests a positive association (β = `r b_m0`, 95% CI [`r lo_m0`, `r hi_m0`]), but this conflates stable between-country differences with within-country income changes. Contrary to a naive reading of the Easterlin paradox, country fixed effects (column 2) reveal a *stronger* within-country relationship (β = `r b_m1`, 95% CI [`r lo_m1`, `r hi_m1`]): countries experiencing income growth show larger happiness gains than the cross-sectional gap between rich and poor countries would predict. Adding year fixed effects (column 3) increases the coefficient further to β = `r b_m2` (95% CI [`r lo_m2`, `r hi_m2`]), as global time trends that simultaneously affected incomes and happiness everywhere are removed. Column 4 adds time-varying controls for inequality and unemployment; the income coefficient falls modestly to β = `r b_m3` (95% CI [`r lo_m3`, `r hi_m3`]), driven mainly by the significant negative effect of unemployment rather than by Gini (which is not significant). The evidence points to a robust, positive within-country income–happiness relationship — with unemployment emerging as an important independent determinant of within-country happiness variation. --- # Summary **What we did:** 1. Established the cross-sectional income–happiness correlation and the Easterlin challenge 2. Ran the standard FE progression: pooled OLS → country FE → two-way FE 3. Clarified what each model estimates and what confounders remain 4. Distinguished three estimands: total effect, controlled direct effect, indirect effect 5. Added time-varying confounders (Gini, unemployment) as Stage 1 extensions 6. Explained why conditioning on WHR variables (health, trust) is the wrong way to address mediator concerns — and what the right way (mediation analysis) would look like **Key takeaways:** - The within-country income–happiness relationship is **stronger** than the cross-sectional comparison — the FE coefficient exceeds pooled OLS - Country FE eliminates all stable confounders; year FE removes common time shocks - Unemployment is a significant time-varying confounder; Gini is not significant in this panel (possibly due to limited within-country variation and missing data) - WHR social/health variables are mediators — adding them gives a controlled direct effect, not a cleaner total effect, and may introduce new bias - Mediation analysis is the right framework for decomposing pathways; it requires stronger assumptions and is a separate analytical step **In R:** `fixest::feols(y ~ x + controls | country + year, vcov = ~country)` --- # Further Reading - Békés, G. & Kézdi, G. (2021). *Data Analysis for Business, Economics, and Policy*. Cambridge UP. Ch. 24 (Panel data). - Huntington-Klein, N. (2022). *The Effect*. Ch. 16 (Fixed Effects). [theeffectbook.net](https://theeffectbook.net) - Cunningham, S. (2021). *Causal Inference: The Mixtape*. Ch. 8 (Panel data). [mixtape.scunning.com](https://mixtape.scunning.com) - Pearl, J. & Mackenzie, D. (2018). *The Book of Why*. Ch. 9 (Mediation) — accessible treatment of direct vs. indirect effects - Imai, K., Keele, L. & Tingley, D. (2010). A general approach to causal mediation analysis. *Psychological Methods*, 15(4), 309–334 — the standard reference for the `mediation` R package - Killingsworth, M. A., Kahneman, D., & Mellers, B. (2023). Income and emotional well-being: A conflict resolved. Proceedings of the National Academy of Sciences, 120(10), e2208661120. https://doi.org/10.1073/pnas.2208661120 - World Happiness Report (annual, [worldhappiness.report](https://www.worldhappiness.report/)) — statistical appendices document their approach to decomposing happiness gaps across countries

1 Setup

1.1 Business question

1.2 Packages and data

2 Exploring the Data

2.1 Panel structure

2.2 Cross-country variation in happiness

2.3 Within-country trends: the Easterlin challenge

3 The Model Progression

3.1 Step 0 — Pooled OLS: the naive baseline

3.2 The composite error — why pooled OLS is biased

3.3 Step 1 — Country fixed effects: removing stable confounders

3.3.1 Visualising the within-transformation

3.4 Step 2 — Two-way fixed effects: controlling for global time shocks

3.5 The standard progression side by side

4 Causal Interpretation

4.1 What the FE estimate claims — and does not claim

4.2 Three estimands — precision about what you want to estimate

4.3 A common temptation: controlling for WHR variables

4.4 Stage 1 — Total effect with time-varying confounders

4.5 Stage 2 — Why mediation analysis, not more controls

5 Reporting Panel Results in Quarto

5.1 What to include in every panel table

5.2 Reporting confidence intervals in modelsummary

5.3 Writing up the result

6 Summary

7 Further Reading

5.2 Reporting confidence intervals in `modelsummary`