Schedule & Overview

General reference

The main textbook for this course is (see footnote 1):

Békés, G., & Kézdi, G. (2021). Data Analysis for Business, Economics, and Policy. Cambridge University Press. https://gabors-data-analysis.com/

Complementary references for R and the tidyverse:

Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science (2nd ed.). O’Reilly. https://r4ds.hadley.nz/

Ismay, C., & Kim, A. Y.-S. (2020). Statistical inference via data science: A ModernDive into R and the tidyverse. CRC Press/Taylor & Francis Group. https://moderndive.com/index.html

For more advanced material on programming in R and on statistical learning, I recommend the following:

Wickham, H. (2019). Advanced R (Second edition). CRC Press/Taylor & Francis Group. https://adv-r.hadley.nz/

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An introduction to statistical learning: With applications in R (Second edition). Springer. https://www.statlearning.com/


Full schedule

| # | Date | Title | Communication skill |
|---|------|-------|---------------------|
| 1 | Thu, 12 Mar | Welcome, recap & tooling upgrade | Document structure, YAML, BibTeX bibliography |
|   | Thu, 19 Mar | Take-home task 1 (part 1) | First full Quarto report submitted independently |
|   | Thu, 26 Mar | Take-home task 1 (part 2) | |
|   | Thu, 02 Apr | Easter break | |
| 2 | Thu, 09 Apr | Multiple regression: going beyond the basics | Regression tables with modelsummary |
|   | Thu, 16 Apr | Take-home task 2 (part 1) | Regression table formatting, model comparison |
|   | Thu, 23 Apr | Take-home task 2 (part 2) | |
| - | Thu, 30 Apr | Canceled; replacement date TBD | |
| 3 | Thu, 07 May | Modelling binary and categorical outcomes | Inline R code for automatic result reporting |
|   | Thu, 14 May | No lecture | |
| 4 | Thu, 21 May | What can go wrong: biases and diagnostics | Diagnostic plots, figure captions, cross-references |
| 5 | Thu, 28 May | Causation vs. correlation | TBD |
| 6 | Thu, 04 Jun | Panel data and fixed effects | Panel model output with fixest |
| 7 | Thu, 11 Jun | Using AI tools: advanced visualizations as an example | Publication-ready figures; AI-assisted Quarto: debugging, improving prose |
| 8 | Thu, 18 Jun | Recap and looking ahead | |

The exam takes place in room MAD 131. It lasts 120 minutes, so make sure you are in the classroom by 15:50 at the latest. It is an open-book exam taken via Moodle. Please make sure you register for the exam on time!

The exam takes place in room HEL 165. It lasts 120 minutes, so make sure you are in the classroom by 15:50 at the latest. It is an open-book exam taken via Moodle. Please make sure you register for the exam on time!


Datasets

Most datasets used in this course come from the open-access repository of Békés & Kézdi (2021), available at https://gabors-data-analysis.com/. Additional data are drawn from publicly available sources, including the World Bank World Development Indicators and Eurostat.

| Session | Dataset | Business question |
|---------|---------|-------------------|
| 2 | hotels-vienna | What features drive hotel prices in Vienna? |
| 3 | bisnode-firms | Which firms are likely to exit the market? |
| 4 | hotels-europe | Do regression assumptions hold across 46 cities? |
| 5 | cps-earnings | What explains the gender wage gap? |
| 6 | wms-management-survey | Does management quality predict firm performance? |
| 7 | working-from-home | Does WFH improve employee performance? |
| 8 | worldbank-lifeexpectancy + world-bank-immunization | How do health outcomes relate to income globally? |
| Task 1 | cps-earnings | What determines weekly earnings for market research analysts? |
| Task 2 | hotels-europe | What drives hotel prices across European cities, and does distance to the city centre matter more in capitals? |

Take-home tasks

Two mandatory tasks are assigned for the periods when the instructor is unavailable. Each is submitted as a rendered Quarto HTML report and graded pass/fail. The tasks use datasets separate from those of the in-person sessions.

| Task | Assigned | Due |
|------|----------|-----|
| Task 1: Earnings, age, and hours worked | End of Session 1 (12 Mar) | 02 Apr, 23:59 |
| Task 2: Hotel prices across European cities | End of Session 2 (09 Apr) | 29 Apr, 23:59 |

Footnotes

  1. There is no need to buy the book; the materials provided here for each session are self-contained. Still, it's a good book that I recommend reading.