As described in the short tutorial, we can get standard errors for the GATEs by estimating via OLS appropriate linear models. Then, under an “honesty” condition, we can use the estimated standard errors to conduct valid inference about the GATEs as usual, e.g., by constructing conventional confidence intervals. In this article, we discuss which linear models are estimated by the inference_aggtree function.

Honesty

As mentioned above, we require an honesty condition to achieve valid inference about the GATEs.

Honesty is a subsample-splitting technique that requires that different observations are used to form the subgroups and estimate the GATEs.

To this end, inference_aggtree always uses the honest sample to estimate the linear models below, unless we called the build_aggtree function without allocating any observation to the honest sample (e.g., we set honest_frac = 0 or we used a vector of FALSEs for the is_honest argument).

Linear Models

When calling the build_aggtree function, the user can control the GATE estimator employed by the routine by setting the method argument. The inference_aggtree function inherits this argument and selects the model to be estimated accordingly.

Difference in Mean Outcomes

If method is set to "raw", the inference_aggtree function estimates via OLS the following linear model:

Yi=l=1|𝒯α|Li,lγl+l=1|𝒯α|Li,lDiβl+ϵi\begin{equation} Y_i = \sum_{l = 1}^{|\mathcal{T_{\alpha}}|} L_{i, l} \, \gamma_l + \sum_{l = 1}^{|\mathcal{T}_{\alpha}|} L_{i, l} \, D_i \, \beta_l + \epsilon_i \end{equation}

with |𝒯α||\mathcal{T}_{\alpha}| the number of leaves of a particular tree 𝒯α\mathcal{T}_{\alpha}, and Li,lL_{i, l} a dummy variable equal to one if the ii-th unit falls in the ll-th leaf of 𝒯α\mathcal{T}_{\alpha}.

Exploiting the random assignment to treatment, we can show that each βl\beta_l identifies the GATE in the ll-th leaf.

Under honesty, the OLS estimator β̂l\hat{\beta}_l of βl\beta_l is root-nn consistent and asymptotically normal.

Doubly-Robust Scores

If method is set to "aipw", the inference_aggtree function estimates via OLS the following linear model:

Γ̂i=l=1|𝒯α|Li,lβl+ϵi\begin{equation} \widehat{\Gamma}_i = \sum_{l = 1}^{|\mathcal{T}_{\alpha}|} L_{i, l} \, \beta_l + \epsilon_i \end{equation}

where Γi\Gamma_i writes as:

Γi=μ(1,Xi)μ(0,Xi)+Di[Yiμ(1,Xi)]p(Xi)(1Di)[Yiμ(0,Xi)]1p(Xi)\begin{equation*} \Gamma_i = \mu \left( 1, X_i \right) - \mu \left( 0, X_i \right) + \frac{D_i \left[ Y_i - \mu \left( 1, X_i \right) \right]}{p \left( X_i \right)} - \frac{ \left( 1 - D_i \right) \left[ Y_i - \mu \left( 0, X_i \right) \right]}{1 - p \left( X_i \right)} \end{equation*}

with μ(Di,Xi)=𝔼[Yi|Di,Zi]\mu \left(D_i, X_i \right) = \mathbb{E} \left[ Y_i | D_i, Z_i \right] the conditional mean of YiY_i and p(Xi)=(Di=1|Xi)p \left( X_i \right) = \mathbb{P} \left( D_i = 1 | X_i \right) the propensity score.

The doubly-robust scores Γi\Gamma_i are inherited by the output of the build_aggtree function.

As before, we can show that each βl\beta_l identifies the GATE in the ll-th leaf, this time even in observational studies.

Under honesty, the OLS estimator β̂l\hat{\beta}_l of βl\beta_l is root-nn consistent and asymptotically normal, provided that the Γi\Gamma_i are cross-fitted in the honest sample and that the product of the convergence rates of the estimators of the nuisance functions μ(,)\mu \left( \cdot, \cdot \right) and p()p \left( \cdot \right) is faster than n1/2n^{1/2}.