Estimates conditional choice probabilities for ordered non-numeric outcomes.
multinomial_ml(Y = NULL, X = NULL, learner = "forest", scale = TRUE)
Object of class mml.
Multinomial machine learning expresses conditional choice probabilities as expectations of binary variables:
$$p_m \left( X_i \right) = \mathbb{E} \left[ 1 \left( Y_i = m \right) | X_i \right]$$
Each expectation can then be estimated separately with any regression algorithm, yielding estimates of the conditional probabilities.
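As a minimal illustration of this strategy (a sketch, not the package's implementation), one can fit a separate regression of the binary indicator 1(Y = m) on the covariates for each class m, clip the fitted values to [0, 1], and normalize so the estimated probabilities sum to one. Here base-R linear probability models stand in for the forest or penalized-logit learners; all names below are illustrative:

```r
# Sketch of multinomial machine learning with synthetic data: one
# binary-indicator regression per class (linear probability models
# used only for illustration).
set.seed(1)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
latent <- X$x1 + X$x2 + rnorm(n)
Y <- cut(latent, breaks = c(-Inf, -1, 1, Inf), labels = FALSE)  # classes 1, 2, 3

p_hat <- sapply(sort(unique(Y)), function(m) {
  d <- data.frame(ind = as.numeric(Y == m), X)
  fit <- lm(ind ~ ., data = d)       # estimate E[1(Y = m) | X] by regression
  pmin(pmax(predict(fit), 0), 1)     # clip fitted probabilities to [0, 1]
})
p_hat <- p_hat / rowSums(p_hat)      # normalize so each row sums to one
head(p_hat)
```

Any regression learner can replace lm() in the inner call; this is exactly the slot that the forest and L1 learners fill.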
multinomial_ml combines this strategy with either regression forests or L1-penalized logistic regressions, according to the user-specified parameter learner.
If learner == "l1", the penalty parameters are chosen via 10-fold cross-validation, and model.matrix is used to handle non-numeric covariates. Additionally, if scale == TRUE, the covariates are scaled to have zero mean and unit variance.
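The l1 branch described above can be sketched with the glmnet package, whose cv.glmnet() performs 10-fold cross-validation by default. This is an assumption about how such a learner is typically wired up, not a copy of the package internals; the data and variable names are illustrative:

```r
# Sketch of the "l1" learner: model.matrix expands non-numeric
# covariates into dummies, scale() standardizes them, and one
# L1-penalized logit per class is tuned by cross-validation.
library(glmnet)

set.seed(1986)
n <- 100
df <- data.frame(x1 = rnorm(n), x2 = factor(sample(c("a", "b", "c"), n, TRUE)))
Y <- sample(1:3, n, replace = TRUE)

X_mat <- scale(model.matrix(~ . - 1, data = df))  # dummies, zero mean, unit variance

fits <- lapply(1:3, function(m) {
  # alpha = 1 is the lasso (L1) penalty; cv.glmnet defaults to nfolds = 10
  cv.glmnet(X_mat, as.numeric(Y == m), family = "binomial", alpha = 1)
})
p_hat <- sapply(fits, function(f) {
  as.numeric(predict(f, X_mat, s = "lambda.min", type = "response"))
})
p_hat <- p_hat / rowSums(p_hat)  # normalize across classes
```

Predicted probabilities of type = "response" are already in (0, 1), so only the normalization across classes is needed.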
Di Francesco, R. (2023). Ordered Correlation Forest. arXiv preprint arXiv:2309.08755.
## Generate synthetic data.
set.seed(1986)
data <- generate_ordered_data(100)
sample <- data$sample
Y <- sample$Y
X <- sample[, -1]
## Training-test split.
train_idx <- sample(seq_len(length(Y)), floor(length(Y) * 0.5))
Y_tr <- Y[train_idx]
X_tr <- X[train_idx, ]
Y_test <- Y[-train_idx]
X_test <- X[-train_idx, ]
## Fit multinomial machine learning on training sample using two different learners.
multinomial_forest <- multinomial_ml(Y_tr, X_tr, learner = "forest")
multinomial_l1 <- multinomial_ml(Y_tr, X_tr, learner = "l1")
## Predict out of sample.
predictions_forest <- predict(multinomial_forest, X_test)
predictions_l1 <- predict(multinomial_l1, X_test)
## Compare predictions.
cbind(head(predictions_forest), head(predictions_l1))
#> P(Y=1) P(Y=2) P(Y=3) P(Y=1) P(Y=2) P(Y=3)
#> [1,] 0.3537709 0.4865778 0.15965128 0.37483081 0.4934319 0.13173734
#> [2,] 0.6324324 0.2491552 0.11841243 0.39553675 0.4512993 0.15316400
#> [3,] 0.1416059 0.4282620 0.43013203 0.06737991 0.3712126 0.56140745
#> [4,] 0.6299436 0.2814432 0.08861318 0.59561559 0.3571572 0.04722723
#> [5,] 0.4841232 0.2875252 0.22835164 0.32814211 0.3535764 0.31828152
#> [6,] 0.6875407 0.2334853 0.07897406 0.43330267 0.4505802 0.11611709