generate_qualitative_data_soo.Rd
Generate a synthetic data set with qualitative outcomes under a selection-on-observables design. The data include a binary treatment indicator and a matrix of covariates. Depending on assignment, the treatment is either independent of potential outcomes or conditionally independent of them given the covariates.
generate_qualitative_data_soo(n, assignment, outcome_type)
Sample size.
String controlling treatment assignment. Must be either "randomized" (random assignment) or "observational" (random assignment conditional on the generated covariates).
String controlling the outcome type. Must be either "multinomial" or "ordered". Affects how potential outcomes are generated.
A list storing a data frame with the observed data, the true propensity score, and the true probabilities of shift.
Potential outcomes are generated differently according to outcome_type. If outcome_type == "multinomial", generate_qualitative_data_soo computes linear predictors for each class using the covariates:
$$\eta_{mi} (d) = \beta_{m1}^d X_{i1} + \beta_{m2}^d X_{i2} + \beta_{m3}^d X_{i3}, \quad d = 0, 1,$$
and then transforms \(\eta_{mi} (d)\) into valid probability distributions using the softmax function:
$$P(Y_i(d) = m | X_i) = \frac{\exp(\eta_{mi} (d))}{\sum_{m'} \exp(\eta_{m'i}(d))}, \quad d = 0, 1.$$
It then generates potential outcomes \(Y_i(1)\) and \(Y_i(0)\) by sampling from {1, 2, 3} using \(P(Y_i(d) = m | X_i), \, d = 0, 1\).
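The multinomial steps above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the package's internal code: the helper softmax_rows, the coefficient matrix beta, and the standard normal covariates are all hypothetical choices made for the example, and only one treatment arm is shown.

```r
# Hypothetical helper (not part of the package): row-wise softmax of an
# n x M matrix of linear predictors.
softmax_rows <- function(eta) {
  # Subtract each row's maximum before exponentiating for numerical stability.
  z <- exp(eta - apply(eta, 1, max))
  z / rowSums(z)
}

set.seed(1)
n <- 5

# Assumption: three covariates drawn from a standard normal distribution.
X <- matrix(rnorm(n * 3), nrow = n, ncol = 3)

# Assumption: illustrative coefficients for one treatment arm.
# Rows index classes m = 1, 2, 3; columns index covariates j = 1, 2, 3.
beta <- matrix(c( 1.0, 0.0, -1.0,
                  0.0, 0.5,  0.5,
                 -1.0, 0.0,  1.0),
               nrow = 3, byrow = TRUE)

# eta[i, m] = sum_j beta[m, j] * X[i, j]
eta <- X %*% t(beta)

# Each row of probs is a probability distribution over the three classes.
probs <- softmax_rows(eta)

# Draw one potential outcome per unit from {1, 2, 3} using its own row of probs.
y <- apply(probs, 1, function(p) sample(1:3, size = 1, prob = p))
```

Repeating the same steps with a second coefficient matrix would give the potential outcomes for the other treatment arm.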
If instead outcome_type == "ordered", generate_qualitative_data_soo first generates latent potential outcomes:
$$Y_i^* (d) = \tau d + X_{i1} + X_{i2} + X_{i3} + N (0, 1), \quad d = 0, 1,$$
with \(\tau = 2\). It then constructs \(Y_i (d)\) by discretizing \(Y_i^* (d)\) using threshold parameters \(\zeta_1 = 2\) and \(\zeta_2 = 4\). Setting \(\zeta_0 = -\infty\) and \(\zeta_3 = +\infty\), this gives
$$P(Y_i(d) = m | X_i) = P(\zeta_{m-1} < Y_i^*(d) \leq \zeta_m | X_i) = \Phi (\zeta_m - \sum_j X_{ij} - \tau d) - \Phi (\zeta_{m-1} - \sum_j X_{ij} - \tau d), \quad d = 0, 1,$$
which allows us to analytically compute the probabilities of shift.
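As a rough sketch of this analytic computation, the class probabilities can be evaluated with pnorm. The helper ordered_probs and the standard normal covariates are assumptions made for illustration, and the probabilities of shift are taken here to be the sample average of \(P(Y_i(1) = m | X_i) - P(Y_i(0) = m | X_i)\), which may differ in detail from what the function does internally.

```r
# Hypothetical helper (not part of the package): returns an n x 3 matrix whose
# (i, m) entry is P(Y_i(d) = m | X_i) under the ordered design described above.
ordered_probs <- function(X, d, tau = 2, zeta = c(2, 4)) {
  # Mean of the latent outcome for each unit: sum_j X_ij + tau * d.
  index <- rowSums(X) + tau * d
  # Pad the thresholds with -Inf and +Inf so every class has two bounds.
  cuts <- c(-Inf, zeta, Inf)
  # cdf[i, k] = Phi(cuts[k] - index[i])
  cdf <- pnorm(outer(-index, cuts, "+"))
  # Differences of adjacent columns give the class probabilities.
  cdf[, -1, drop = FALSE] - cdf[, -ncol(cdf), drop = FALSE]
}

set.seed(1986)
n <- 100

# Assumption: three covariates drawn from a standard normal distribution.
X <- matrix(rnorm(n * 3), nrow = n, ncol = 3)

p1 <- ordered_probs(X, d = 1)  # probabilities under treatment
p0 <- ordered_probs(X, d = 0)  # probabilities under control

# Assumed definition: average shift in each class probability due to treatment.
pshifts <- colMeans(p1 - p0)
```

Because \(\tau > 0\) moves the latent outcome upward, the shift for the lowest class is negative and the shift for the highest class is positive, and the three shifts sum to zero.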
## Generate synthetic data.
set.seed(1986)
data <- generate_qualitative_data_soo(100,
                                      assignment = "observational",
                                      outcome_type = "ordered")
data$pshifts
#> [1] -0.577876162 -0.006807437 0.584683599