Generate a synthetic data set with qualitative outcomes under a regression discontinuity design. The data include a binary treatment indicator and a single covariate (the running variable). The conditional probability mass functions of potential outcomes are continuous in the running variable.

generate_qualitative_data_rd(n, outcome_type)

Arguments

n

Sample size.

outcome_type

Character string controlling the outcome type; must be either "multinomial" or "ordered". It determines how potential outcomes are generated (see Details).

Value

A list storing a data frame with the observed data and the true probabilities of shift at the cutoff (element pshifts_cutoff).

Details

Outcome type

Potential outcomes are generated differently according to outcome_type. If outcome_type == "multinomial", generate_qualitative_data_rd computes linear predictors for each class using the covariates:

$$\eta_{mi} (d) = \beta_{m1}^d X_{i1} + \beta_{m2}^d X_{i2} + \beta_{m3}^d X_{i3}, \quad d = 0, 1,$$

and then transforms \(\eta_{mi} (d)\) into valid probability distributions using the softmax function:

$$P(Y_i(d) = m | X_i) = \frac{\exp(\eta_{mi} (d))}{\sum_{m'} \exp(\eta_{m'i}(d))}.$$

It then generates potential outcomes \(Y_i(1)\) and \(Y_i(0)\) by sampling from {1, 2, 3} using \(P(Y_i(d) = m | X_i), \, d = 0, 1\).
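The multinomial mechanism above can be sketched in R as follows. This is a minimal illustration, not the package's internal code: the coefficient matrices `beta0` and `beta1` are hypothetical random draws, because the values of \(\beta_{mj}^d\) are not stated here.

```r
## Minimal sketch of the multinomial mechanism (not the package's code).
## ASSUMPTION: beta0 and beta1 are hypothetical coefficients drawn at random.
set.seed(1986)
n <- 5
X <- matrix(runif(n * 3), ncol = 3)  # three U(0, 1) covariates, one row per unit

# Row m of each matrix holds (beta_m1^d, beta_m2^d, beta_m3^d) for class m.
beta0 <- matrix(rnorm(9), nrow = 3)
beta1 <- matrix(rnorm(9), nrow = 3)

# Row-wise softmax of an n x 3 matrix of linear predictors.
softmax_rows <- function(eta) {
  eta <- eta - apply(eta, 1, max)  # subtract each row's max for numerical stability
  exp_eta <- exp(eta)
  exp_eta / rowSums(exp_eta)
}

# eta[i, m] = sum_j beta_mj^d * X_ij, i.e. X %*% t(beta_d).
probs0 <- softmax_rows(X %*% t(beta0))
probs1 <- softmax_rows(X %*% t(beta1))

# Sample each unit's potential outcomes from {1, 2, 3} with its own probabilities.
draw_class <- function(p) sample(1:3, size = 1, prob = p)
Y0 <- apply(probs0, 1, draw_class)
Y1 <- apply(probs1, 1, draw_class)
```

Subtracting the row maximum leaves the softmax probabilities unchanged but keeps exp() from overflowing for large linear predictors.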

If instead outcome_type == "ordered", generate_qualitative_data_rd first generates latent potential outcomes:

$$Y_i^* (d) = \tau d + X_{i1} + X_{i2} + X_{i3} + \epsilon_i, \quad \epsilon_i \sim N (0, 1), \quad d = 0, 1,$$

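with \(\tau = 2\). It then constructs \(Y_i (d)\) by discretizing \(Y_i^* (d)\) using the threshold parameters \(\zeta_1 = 2\) and \(\zeta_2 = 4\), with the conventions \(\zeta_0 = -\infty\) and \(\zeta_3 = \infty\), so that \(Y_i(d) = m\) whenever \(\zeta_{m-1} < Y_i^*(d) \leq \zeta_m\). Since the noise is standard normal,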

$$P(Y_i(d) = m \mid X_i) = P(\zeta_{m-1} < Y_i^*(d) \leq \zeta_m \mid X_i) = \Phi (\zeta_m - \sum_j X_{ij} - \tau d) - \Phi (\zeta_{m-1} - \sum_j X_{ij} - \tau d), \quad m = 1, 2, 3, \quad d = 0, 1,$$

where \(\Phi\) is the standard normal cumulative distribution function. This closed form allows the true probabilities of shift at the cutoff to be computed analytically.
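The closed-form class probabilities translate directly into R. The sketch below assumes the latent model with \(\tau = 2\), \(\zeta_1 = 2\) and \(\zeta_2 = 4\) and takes the covariate sum \(\sum_j X_{ij}\) as input; the helper name `ordered_pmf` and the value `x_sum = 1.5` are hypothetical choices for illustration, and the resulting differences are not meant to reproduce the package's pshifts_cutoff output.

```r
## Sketch of the ordered-outcome class probabilities (hypothetical helper).
## ASSUMPTION: tau = 2 and thresholds zeta = (2, 4), as described above.
ordered_pmf <- function(x_sum, d, tau = 2, zeta = c(2, 4)) {
  cuts <- c(-Inf, zeta, Inf)   # zeta_0 = -Inf, zeta_1, zeta_2, zeta_3 = Inf
  index <- x_sum + tau * d     # mean of the latent outcome Y*(d)
  diff(pnorm(cuts - index))    # Phi(zeta_m - index) - Phi(zeta_{m-1} - index)
}

x_sum <- 1.5                   # illustrative value of X_i1 + X_i2 + X_i3
p0 <- ordered_pmf(x_sum, d = 0)
p1 <- ordered_pmf(x_sum, d = 1)

# Difference in class probabilities between treatment and control, m = 1, 2, 3.
shift <- p1 - p0
```

Because each vector of class probabilities sums to one, the three differences always sum to zero.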

Treatment assignment

Treatment is always assigned as \(D_i = 1(X_i \geq 0.5)\).

Other details

The function always generates three independent covariates from \(U(0,1)\). Observed outcomes \(Y_i\) are always constructed using the usual observational rule \(Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)\).
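Treatment assignment and the observational rule can be sketched as follows. This is an illustration only: the potential outcomes `Y0` and `Y1` are hypothetical placeholders drawn uniformly from {1, 2, 3}, not the ones produced by either mechanism in Details, and only the running variable is simulated.

```r
## Sketch of sharp RD assignment and the observational rule.
## ASSUMPTION: Y0 and Y1 are placeholder potential outcomes, not the package's.
set.seed(1986)
n <- 10
X <- runif(n)                          # running variable, U(0, 1)
D <- as.integer(X >= 0.5)              # treated iff the running variable reaches the cutoff

Y0 <- sample(1:3, n, replace = TRUE)   # placeholder Y_i(0)
Y1 <- sample(1:3, n, replace = TRUE)   # placeholder Y_i(1)

# Observational rule: each unit reveals only the potential outcome of its own arm.
Y <- D * Y1 + (1 - D) * Y0

observed <- data.frame(Y = Y, D = D, X = X)
```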

Author

Riccardo Di Francesco

Examples

## Generate synthetic data.
set.seed(1986)

data <- generate_qualitative_data_rd(100,
                                     outcome_type = "ordered")

data$pshifts_cutoff
#> [1] -0.5267177 -0.1136908  0.6404085