Schedule & Overview
General reference
The main textbook for this course is:
TBD
Complementary references for R and the tidyverse:
Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science (2nd ed.). O’Reilly. https://r4ds.hadley.nz/
Ismay, C., & Kim, A. Y.-S. (2020). Statistical inference via data science: A ModernDive into R and the tidyverse. CRC Press/Taylor & Francis Group. https://moderndive.com/index.html
For more advanced material on programming in R and on statistical learning, I recommend the following:
Wickham, H. (2019). Advanced R (2nd ed.). CRC Press/Taylor & Francis Group. https://adv-r.hadley.nz/
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An introduction to statistical learning: With applications in R (2nd ed.). Springer. https://www.statlearning.com/
Full schedule
| # | Date | Title | Communication skill |
|---|---|---|---|
| 1 | Thu, 12 Mar | Welcome, recap & tooling upgrade | Document structure, YAML, BibTeX bibliography |
| — | Thu, 19 Mar | Take-home task 1 (part 1) | First full Quarto report submitted independently |
| — | Thu, 26 Mar | Take-home task 1 (part 2) | |
| — | Thu, 02 Apr | Easter break | |
| 2 | Thu, 09 Apr | Multiple regression: going beyond the basics | Regression tables with modelsummary |
| — | Thu, 16 Apr | Take-home task 2 (part 1) | Regression table formatting, model comparison |
| — | Thu, 23 Apr | Take-home task 2 (part 2) | |
| 3 | Thu, 30 Apr | Modelling binary and categorical outcomes | Inline R code for automatic result reporting |
| 4 | Thu, 07 May | What can go wrong: biases and diagnostics | Diagnostic plots, figure captions, cross-references |
| — | Thu, 14 May | No lecture | |
| 5 | Thu, 21 May | Causation vs. correlation: thinking like an economist | Structuring an analytical narrative |
| 6 | Thu, 28 May | Panel data and fixed effects | Panel model output with fixest |
| 7 | Thu, 04 Jun | Coding smarter: R and AI tools | AI-assisted Quarto: debugging, improving prose |
| 8 | Thu, 11 Jun | Communicating data: advanced visualization | Publication-ready figures, multi-format export, Word output |
| 9 | Thu, 18 Jun | Recap and looking ahead | |
Datasets
Most datasets used in this course come from the open-access repository of Békés & Kézdi (2021), available at gabors-data-analysis.com. Additional data is drawn from publicly available sources including the World Bank World Development Indicators and Eurostat.
| Session | Dataset | Business Question |
|---|---|---|
| 2 | hotels-vienna | What features drive hotel prices in Vienna? |
| 3 | bisnode-firms | Which firms are likely to exit the market? |
| 4 | hotels-europe | Do regression assumptions hold across 46 cities? |
| 5 | cps-earnings | What explains the gender wage gap? |
| 6 | wms-management-survey | Does management quality predict firm performance? |
| 7 | working-from-home | Does WFH improve employee performance? |
| 8 | worldbank-lifeexpectancy + worldbank-immunization | How do health outcomes relate to income globally? |
| Task 1 | cps-earnings | What determines weekly earnings for market research analysts? |
| Task 2 | hotels-europe | What drives hotel prices across European cities, and does distance to the city centre matter more in capitals? |
Take-home tasks
Two mandatory tasks are assigned for the periods when the instructor is unavailable. Each is submitted as a rendered Quarto HTML report and graded pass/fail. The tasks use separate datasets from the in-person sessions.
| Task | Assigned | Due |
|---|---|---|
| Task 1 — Earnings, age, and hours worked | End of Session 1 (12 Mar) | 02 Apr, 23:59 |
| Task 2 — Hotel prices across European cities | End of Session 2 (09 Apr) | 29 Apr, 23:59 |