As described in the short
tutorial, we can get standard errors for the GATEs by estimating via
OLS appropriate linear models. Then, under an “honesty” condition, we
can use the estimated standard errors to conduct valid inference about
the GATEs as usual, e.g., by constructing conventional confidence
intervals. In this article, we discuss which linear models are estimated
by the `inference_aggtree`

function.

As mentioned above, we require an honesty condition to achieve valid inference about the GATEs.

Honesty is a subsample-splitting technique that requires that different observations are used to form the subgroups and estimate the GATEs.

To this end, `inference_aggtree`

always uses the honest
sample to estimate the linear models below, unless we called the
`build_aggtree`

function without allocating any observation
to the honest sample (e.g., we set `honest_frac = 0`

or we
used a vector of FALSEs for the `is_honest`

argument).

When calling the `build_aggtree`

function, the user can
control the GATE estimator employed by the routine by setting the
`method`

argument. The `inference_aggtree`

function inherits this argument and selects the model to be estimated
accordingly.

If `method`

is set to `"raw"`

, the
`inference_aggtree`

function estimates via OLS the following
linear model:

$\begin{equation} Y_i = \sum_{l = 1}^{|\mathcal{T_{\alpha}}|} L_{i, l} \, \gamma_l + \sum_{l = 1}^{|\mathcal{T}_{\alpha}|} L_{i, l} \, D_i \, \beta_l + \epsilon_i \end{equation}$

with $|\mathcal{T}_{\alpha}|$ the number of leaves of a particular tree $\mathcal{T}_{\alpha}$, and $L_{i, l}$ a dummy variable equal to one if the $i$-th unit falls in the $l$-th leaf of $\mathcal{T}_{\alpha}$.

Exploiting the random assignment to treatment, we can show that each $\beta_l$ identifies the GATE in the $l$-th leaf.

Under honesty, the OLS estimator $\hat{\beta}_l$ of $\beta_l$ is root-$n$ consistent and asymptotically normal.

If `method`

is set to `"aipw"`

, the
`inference_aggtree`

function estimates via OLS the following
linear model:

$\begin{equation} \widehat{\Gamma}_i = \sum_{l = 1}^{|\mathcal{T}_{\alpha}|} L_{i, l} \, \beta_l + \epsilon_i \end{equation}$

where $\Gamma_i$ writes as:

$\begin{equation*} \Gamma_i = \mu \left( 1, X_i \right) - \mu \left( 0, X_i \right) + \frac{D_i \left[ Y_i - \mu \left( 1, X_i \right) \right]}{p \left( X_i \right)} - \frac{ \left( 1 - D_i \right) \left[ Y_i - \mu \left( 0, X_i \right) \right]}{1 - p \left( X_i \right)} \end{equation*}$

with $\mu \left(D_i, X_i \right) = \mathbb{E} \left[ Y_i | D_i, Z_i \right]$ the conditional mean of $Y_i$ and $p \left( X_i \right) = \mathbb{P} \left( D_i = 1 | X_i \right)$ the propensity score.

The doubly-robust scores
$\Gamma_i$
are inherited by the output of the `build_aggtree`

function.

As before, we can show that each $\beta_l$ identifies the GATE in the $l$-th leaf, this time even in observational studies.

Under honesty, the OLS estimator $\hat{\beta}_l$ of $\beta_l$ is root-$n$ consistent and asymptotically normal, provided that the $\Gamma_i$ are cross-fitted in the honest sample and that the product of the convergence rates of the estimators of the nuisance functions $\mu \left( \cdot, \cdot \right)$ and $p \left( \cdot \right)$ is faster than $n^{1/2}$.