19 Mehrebenenmodelle

19.1 Folien

19.2 Daten zur heutigen Sitzung

19.3 Code und Ausgaben aus der Vorlesung

Laden der relevanten Pakete

library(lme4) # Mehrebenenmodelle

Loading required package: Matrix

library(lmerTest) # Signifikanztests für Mehrebenenmodelle


Attaching package: 'lmerTest'

The following object is masked from 'package:lme4':

    lmer

The following object is masked from 'package:stats':

    step

library(marginaleffects) # Modellvorhersagen und Vergleiche


Attaching package: 'marginaleffects'

The following object is masked from 'package:lme4':

    refit

library(performance) # Modellbeurteilung (hier vor allem ICC)
library(parameters) # Modellschätzungen extrahieren und darstellen
library(report) # Einfaches Erstellen von statistischen Berichten
library(tidyverse) # Datenmanagement und Visualisierung: https://www.tidyverse.org/

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ tidyr::expand() masks Matrix::expand()
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
✖ tidyr::pack()   masks Matrix::pack()
✖ tidyr::unpack() masks Matrix::unpack()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

theme_set(theme_classic()) # Voreinstellung für Plots

Daten laden

d <- read_rds(here::here("data/Faehnrich_2020.rds"))

Struktur der Stichprobe

d |>
  summarise(n_posts = n(), .by = uni)

# A tibble: 42 × 2
   uni                     n_posts
   <fct>                     <int>
 1 Columbia U                  576
 2 Cornell U                   404
 3 Duke U                      323
 4 Harvard U                   203
 5 Imperial College London     179
 6 Johns Hopkins U             481
 7 MIT                         243
 8 New York U                  407
 9 Northwestern U              141
10 Princeton U                 507
# ℹ 32 more rows

Stichprobe: Unabhängige Variablen (L1)

sample_l1 <- d |>
  select(timeofday, type, year, starts_with("topic"), word_count, day_type) |>
  report_sample()
sample_l1 |>
  _[1:12, ]

# Descriptive Statistics

Variable                 | Summary
----------------------------------
timeofday [morning], %   |    38.3
timeofday [afternoon], % |    46.1
timeofday [evening], %   |    14.2
timeofday [night], %     |     1.4
type [status], %         |     5.4
type [link], %           |    37.4
type [photo], %          |    48.9
type [video], %          |     8.3
year [2015], %           |    26.9
year [2012], %           |     7.6
year [2013], %           |    32.8
year [2014], %           |    32.6

Stichprobe: Unabhängige Variablen (L1) - Fortsetzung

sample_l1 |>
  _[13:nrow(sample_l1), ]

# Descriptive Statistics

Variable                |       Summary
---------------------------------------
topic_research [yes], % |          28.0
topic_teaching [yes], % |          25.6
topic_awards [yes], %   |           5.7
topic_event [yes], %    |          35.1
topic_interact [yes], % |          37.2
topic_self [yes], %     |          16.8
Mean word_count (SD)    | 30.81 (24.71)
day_type [weekend], %   |          13.8

Stichprobe: Abhängige Variablen (L1)

d |>
  select(likes_count, comments_count, shares_count) |>
  report_table()

Variable       | n_Obs |   Mean |      SD | Median |    MAD | Min |    Max
--------------------------------------------------------------------------
likes_count    | 10391 | 425.22 | 1153.04 |    108 | 130.47 |   0 |  24847
comments_count | 10391 |  12.18 |   36.54 |      3 |   4.45 |   0 |   1556
shares_count   | 10391 |  62.60 | 1441.28 |      9 |  11.86 |   0 | 145480

Variable       | Skewness | Kurtosis | percentage_Missing
---------------------------------------------------------
likes_count    |     9.10 |   129.01 |                  0
comments_count |    15.73 |   456.79 |                  0
shares_count   |    98.92 |  9976.61 |                  0

Stichprobe: Unabhängige Variablen (L2)

d |>
  distinct(uni, uni_fans, uni_us) |>
  select(-uni) |>
  report_sample()

# Descriptive Statistics

Variable           |               Summary
------------------------------------------
Mean uni_fans (SD) | 435063.17 (758868.02)
uni_us [US], %     |                  85.7

Stichprobe: log-Transformation - Code

# d <- d |>
#   mutate(
#     word_count_log = log1p(word_count),
#     uni_fans_log = log(uni_fans),
#     likes_count_log = log1p(likes_count),
#     comments_count_log = log1p(comments_count),
#     shares_count_log = log1p(shares_count)
#   )

Stichprobe: log-Transformation - Grafik

d |>
  select(id, uni_fans, ends_with("count"), ends_with("log")) |>
  gather(key, value, -id) |>
  mutate(
    type = if_else(str_detect(key, "log"), "log", "original"),
    key = str_remove_all(key, "_log")
  ) |>
  spread(type, value) |>
  ggplot(aes(original, log)) +
  geom_point() +
  facet_wrap(vars(key), scales = "free") +
  scale_x_continuous(labels = scales::label_comma())

Stichprobe: Datensatz

d |>
  select(comments_count_log, topic_research, uni, uni_fans_log) |>
  slice_head(n = 4, by = uni) |>
  _[1:10, ]

# A tibble: 10 × 4
   comments_count_log topic_research uni        uni_fans_log
                <dbl> <fct>          <fct>             <dbl>
 1              0     no             Columbia U         12.4
 2              0.693 no             Columbia U         12.4
 3              0.693 no             Columbia U         12.4
 4              2.77  no             Columbia U         12.4
 5              0     no             Cornell U          12.5
 6              4.03  no             Cornell U          12.5
 7              0.693 yes            Cornell U          12.5
 8              1.61  no             Cornell U          12.5
 9              1.61  no             Duke U             12.6
10              0     no             Duke U             12.6

Null-Modell schaetzen

m0 <- lmer( # Funktion für lineares Mehrebenenmodell
  comments_count_log ~ # abhängige Variable
    1 + # Gewichteter Gesamtmittelwert
    (1 | uni), # Struktur des Modells
  data = d
) # Daten

Null-Modell Tabelle

m0 |>
  report_table(metrics = "R2", include_effectsize = FALSE)

Parameter        | Coefficient |       95% CI | t(10388) |      p | Effects
---------------------------------------------------------------------------
(Intercept)      |        1.72 | [1.47, 1.96] |    13.73 | < .001 |   fixed
                 |        0.81 |              |          |        |  random
                 |        1.01 |              |          |        |  random
                 |             |              |          |        |        
R2 (conditional) |             |              |          |        |        
R2 (marginal)    |             |              |          |        |        

Parameter        |    Group |  Fit
----------------------------------
(Intercept)      |          |     
                 |      uni |     
                 | Residual |     
                 |          |     
R2 (conditional) |          | 0.39
R2 (marginal)    |          | 0.00

Null-Modell Vorhersage

plot_predictions(m0, condition = "uni") +
  coord_flip() +
  scale_x_discrete(limits = rev)

Warning: For this model type, `marginaleffects` only takes into account the
uncertainty in fixed-effect parameters. This is often appropriate when
`re.form=NA`, but may be surprising to users who set `re.form=NULL` (default)
or to some other value. Call `options(marginaleffects_safe = FALSE)` to silence
this warning.
Warning: For this model type, `marginaleffects` only takes into account the
uncertainty in fixed-effect parameters. This is often appropriate when
`re.form=NA`, but may be surprising to users who set `re.form=NULL` (default)
or to some other value. Call `options(marginaleffects_safe = FALSE)` to silence
this warning.

ICC

icc(m0, by_group = TRUE)

# ICC by Group

Group |   ICC
-------------
uni   | 0.390

Random-Intercept-Modell mit L1-Praediktoren schaetzen

m1 <- lmer( # Funktion für lineares Mehrebenenmodell
  comments_count_log ~ # abhängige Variable
    topic_interact + topic_research + topic_teaching + # Prädiktoren
    (1 | uni), # Struktur des Modells
  data = d
)

Random-Intercept-Modell mit L1-Praediktoren Tabelle

m1 |>
  report_table(metrics = "R2", include_effectsize = FALSE)

Parameter            | Coefficient |         95% CI | t(10385) |      p
-----------------------------------------------------------------------
(Intercept)          |        1.77 | [ 1.52,  2.01] |    14.13 | < .001
topic interact [yes] |        0.08 | [ 0.04,  0.12] |     3.98 | < .001
topic research [yes] |       -0.20 | [-0.24, -0.15] |    -8.68 | < .001
topic teaching [yes] |       -0.09 | [-0.13, -0.04] |    -3.76 | < .001
                     |        0.80 |                |          |       
                     |        1.00 |                |          |       
                     |             |                |          |       
R2 (conditional)     |             |                |          |       
R2 (marginal)        |             |                |          |       

Parameter            | Effects |    Group |      Fit
----------------------------------------------------
(Intercept)          |   fixed |          |         
topic interact [yes] |   fixed |          |         
topic research [yes] |   fixed |          |         
topic teaching [yes] |   fixed |          |         
                     |  random |      uni |         
                     |  random | Residual |         
                     |         |          |         
R2 (conditional)     |         |          |     0.39
R2 (marginal)        |         |          | 6.78e-03

Random-Intercept-Modell mit L1-Praediktoren Plot

m1 |>
  parameters(effects = "fixed") |>
  plot()

Random-Intercept-Modell mit L2-Praediktoren schaetzen

m2 <- lmer(comments_count_log ~ topic_interact + topic_research +
  topic_teaching + uni_fans_log + (1 | uni), data = d)

Random-Intercept-Modell mit L2-Praediktoren Tabelle

m2 |>
  report_table(metrics = "R2", include_effectsize = FALSE)

Parameter            | Coefficient |         95% CI | t(10384) |      p
-----------------------------------------------------------------------
(Intercept)          |       -4.88 | [-6.32, -3.44] |    -6.65 | < .001
topic interact [yes] |        0.09 | [ 0.04,  0.13] |     4.07 | < .001
topic research [yes] |       -0.20 | [-0.24, -0.15] |    -8.67 | < .001
topic teaching [yes] |       -0.09 | [-0.13, -0.04] |    -3.77 | < .001
uni fans log         |        0.54 | [ 0.43,  0.66] |     9.10 | < .001
                     |        0.46 |                |          |       
                     |        1.00 |                |          |       
                     |             |                |          |       
R2 (conditional)     |             |                |          |       
R2 (marginal)        |             |                |          |       

Parameter            | Effects |    Group |  Fit
------------------------------------------------
(Intercept)          |   fixed |          |     
topic interact [yes] |   fixed |          |     
topic research [yes] |   fixed |          |     
topic teaching [yes] |   fixed |          |     
uni fans log         |   fixed |          |     
                     |  random |      uni |     
                     |  random | Residual |     
                     |         |          |     
R2 (conditional)     |         |          | 0.38
R2 (marginal)        |         |          | 0.25

Random-Intercept-Modell mit L2-Praediktoren Plot

m2 |>
  parameters(effects = "fixed") |>
  plot()

Random-Slope-Modell schaetzen

m3 <- lmer(
  comments_count_log ~ topic_interact + topic_research +
    topic_teaching + uni_fans_log +
    (topic_research | uni), # Koeffizient von topic_research darf zwischen Uni-Seiten variieren
  data = d
)

Random-Slope-Modell Modellvergleich

anova(m2, m3, model.names = c("Random Intercept", "Random Slope")) |>
  rownames_to_column(var = "Modell")

refitting model(s) with ML (instead of REML)

Data: d
Models:
Random Intercept: comments_count_log ~ topic_interact + topic_research + topic_teaching + uni_fans_log + (1 | uni)
Random Slope: comments_count_log ~ topic_interact + topic_research + topic_teaching + uni_fans_log + (topic_research | uni)
     Modell npar   AIC   BIC logLik -2*log(L) Chisq Df Pr(>Chisq)    
[1,]      1    7 29745 29796 -14865     29731                        
[2,]      2    9 29729 29794 -14856     29711 19.77  2  5.095e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Random-Slope-Modell Tabelle der Fixed Effects

m3 |>
  report_table(metrics = "R2", include_effectsize = FALSE) |>
  filter(Effects == "fixed") |>
  select(-Effects, -Group, -Fit)

Parameter            | Coefficient |         95% CI | t(10382) |      p
-----------------------------------------------------------------------
(Intercept)          |       -4.87 | [-6.29, -3.44] |    -6.68 | < .001
topic interact [yes] |        0.09 | [ 0.04,  0.13] |     4.06 | < .001
topic research [yes] |       -0.20 | [-0.27, -0.13] |    -5.40 | < .001
topic teaching [yes] |       -0.08 | [-0.13, -0.04] |    -3.61 | < .001
uni fans log         |        0.54 | [ 0.43,  0.66] |     9.15 | < .001

Random-Slope-Modell Tabelle der Random Effects

m3 |>
  parameters(effects = "random", ci_random = FALSE, group_level = FALSE)

# Random Effects

Parameter                              | Coefficient | 95% CI
-------------------------------------------------------------
SD (Intercept: uni)                    |        0.46 |       
SD (topic_researchyes: uni)            |        0.18 |       
Cor (Intercept~topic_researchyes: uni) |       -0.17 |       
SD (Residual)                          |        1.00 |

Random-Slope-Modell Plot der Random Effects

m3 |>
  plot_slopes(variables = "topic_research", condition = "uni") +
  coord_flip() +
  scale_x_discrete(limits = rev) +
  geom_hline(yintercept = 0, linetype = 2)

Warning: For this model type, `marginaleffects` only takes into account the
uncertainty in fixed-effect parameters. This is often appropriate when
`re.form=NA`, but may be surprising to users who set `re.form=NULL` (default)
or to some other value. Call `options(marginaleffects_safe = FALSE)` to silence
this warning.
Warning: For this model type, `marginaleffects` only takes into account the
uncertainty in fixed-effect parameters. This is often appropriate when
`re.form=NA`, but may be surprising to users who set `re.form=NULL` (default)
or to some other value. Call `options(marginaleffects_safe = FALSE)` to silence
this warning.
Warning: For this model type, `marginaleffects` only takes into account the
uncertainty in fixed-effect parameters. This is often appropriate when
`re.form=NA`, but may be surprising to users who set `re.form=NULL` (default)
or to some other value. Call `options(marginaleffects_safe = FALSE)` to silence
this warning.

Random-Slope-Modell mit Cross-Level-Interaktion schaetzen

m4 <- lmer(comments_count_log ~ topic_interact +
  topic_research * uni_fans_log + # Cross-Level-Interaktion
  topic_teaching + (topic_research | uni), data = d)

Random-Slope-Modell mit Cross-Level-Interaktion Tabelle der Fixed Effects

m4 |>
  report_table(metrics = "R2", include_effectsize = FALSE) |>
  filter(Effects == "fixed") |>
  select(-Effects, -Group, -Fit)

Parameter                           | Coefficient |         95% CI | t(10381) |      p
--------------------------------------------------------------------------------------
(Intercept)                         |       -5.11 | [-6.57, -3.65] |    -6.88 | < .001
topic interact [yes]                |        0.09 | [ 0.04,  0.13] |     4.06 | < .001
topic research [yes]                |        0.38 | [-0.31,  1.08] |     1.08 | 0.282 
uni fans log                        |        0.56 | [ 0.44,  0.68] |     9.29 | < .001
topic teaching [yes]                |       -0.08 | [-0.13, -0.04] |    -3.60 | < .001
topic research [yes] × uni fans log |       -0.05 | [-0.10,  0.01] |    -1.64 | 0.100

Random-Slope-Modell mit Cross-Level-Interaktion Plot der Interaktion

plot_slopes(m4, variables = "topic_research", condition = "uni_fans_log") +
  geom_hline(yintercept = 0, linetype = 2)

Warning: For this model type, `marginaleffects` only takes into account the
uncertainty in fixed-effect parameters. This is often appropriate when
`re.form=NA`, but may be surprising to users who set `re.form=NULL` (default)
or to some other value. Call `options(marginaleffects_safe = FALSE)` to silence
this warning.
Warning: For this model type, `marginaleffects` only takes into account the
uncertainty in fixed-effect parameters. This is often appropriate when
`re.form=NA`, but may be surprising to users who set `re.form=NULL` (default)
or to some other value. Call `options(marginaleffects_safe = FALSE)` to silence
this warning.
Warning: For this model type, `marginaleffects` only takes into account the
uncertainty in fixed-effect parameters. This is often appropriate when
`re.form=NA`, but may be surprising to users who set `re.form=NULL` (default)
or to some other value. Call `options(marginaleffects_safe = FALSE)` to silence
this warning.

19.4 Hausaufgabe

Reproduzieren Sie die Mehrebenenmodelle aus der Vorlesung. Interpretieren Sie die Ergebnisse.
Schätzen Sie die umfangreicheren Modelle im Übungsskript. Passen Sie die Modelle nach Ihren Interessen an. Interpretieren Sie die Ergebnisse.

Lösung

Zu 1) Siehe Code und Ausgaben aus der Vorlesung
Zu 2) R Skript | HTML mit Output

19.5 Transkript

Hinweise zum automatisiert erstellten Transkript

Das folgende Transkript wurde auf Basis der Aufzeichnung der Vorlesung erstellt. Die vollständigen Aufzeichnungen inklusive der Bildschirminhalte sind in Blackboard🔒 verfügbar. Die Tonspur wurde zuerst mit Hilfe der Werkzeuge des Oral-History.Digital Projekts wörtlich transkribiert. Die wörtliche Transkription wurde in Kombination mit den Vorlesungsfolien mithilfe von Sprachmodellen (v. a. Claude Sonnet 4.5 und GPT 5.2) zu einem übersichtlichen Transkript zusammengefasst. Im Anschluss wurde das Transkript von einer studentischen Hilfskraft überprüft, geglättet und ggf. angepasst. In diesem Prozess kann es an verschiedenen Stellen zu Fehlern kommen. Im Zweifel gilt das gesprochene Wort, und auch beim Vortrag mache ich Fehler.

Ich stelle das Transkript hier als experimentelles, ergänzendes Material zur Dokumentation der Vorlesung zur Verfügung. Noch bin ich mir unsicher, ob es eine sinnvolle Ergänzung ist und behalte mir vor, es weiter zu bearbeiten oder zu löschen.