As outlined in the short tutorial, we can test specific hypotheses regarding the BLP, GATES, and RATEs to validate the estimated heterogeneity. This article explores these hypotheses and their implications.

The notation is the same as in the short tutorial.

BLP

Consider the heterogeneity parameter $\beta_2 = Cov[\tau(X_i), \hat{\tau}(X_i)] / Var[\hat{\tau}(X_i)]$. Notice that $Cov[\tau(X_i), \hat{\tau}(X_i)] = 0$ in two cases:

  • If $\tau(x) = \tau$ for all $x$ (that is, if the effects are homogeneous);
  • If $\hat{\tau}(\cdot)$ is pure noise uncorrelated with $\tau(\cdot)$ (that is, if our CATE estimates are really bad).

Consequently, $\beta_2 = 0$ if either the effects are homogeneous or our CATE estimates are unreliable. On the other hand, if $\hat{\tau}(x) = \tau(x)$ for all $x$ (that is, if we have “perfect” CATE estimates), then $Cov[\tau(X_i), \hat{\tau}(X_i)] = Var[\hat{\tau}(X_i)]$, so $\beta_2 = 1$. Therefore, $\beta_2 \approx 1$ if our CATE estimates are reliable.
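To make the three cases concrete, the sketch below computes the plug-in ratio $Cov[\tau(X_i), \hat{\tau}(X_i)] / Var[\hat{\tau}(X_i)]$ on simulated data where the true effects $\tau(x)$ are known. The data-generating process ($\tau(x) = 1 + x$ with $x$ uniform on $[-1, 1]$) and the helper name `blp_beta2` are hypothetical, chosen only for illustration. Because it uses the true effects, this is an oracle calculation and not an estimator you could run on real data.

```python
import numpy as np

def blp_beta2(tau, tau_hat):
    """Plug-in version of beta_2 = Cov[tau(X), tau_hat(X)] / Var[tau_hat(X)].

    Uses the true effects tau, so it is an oracle illustration (hypothetical helper).
    """
    cov = np.cov(tau, tau_hat)  # 2x2 sample covariance matrix; ddof cancels in the ratio
    return cov[0, 1] / cov[1, 1]

rng = np.random.default_rng(0)
n = 100_000
x = rng.uniform(-1, 1, size=n)

tau_het = 1.0 + x          # hypothetical heterogeneous CATE
tau_hom = np.full(n, 1.0)  # homogeneous effects: tau(x) = 1 for all x
noise = rng.normal(size=n) # "CATE estimates" that are pure noise

beta_homogeneous = blp_beta2(tau_hom, 1.0 + x)  # homogeneous effects, varying estimates
beta_noise = blp_beta2(tau_het, noise)          # heterogeneous effects, useless estimates
beta_perfect = blp_beta2(tau_het, tau_het)      # perfect estimates

print(round(beta_homogeneous, 3), round(beta_noise, 3), round(beta_perfect, 3))
```

The first two ratios come out at (or very near) zero and the last one at one, matching the argument above.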

We can thus use the hypothesis $\beta_2 = 0$ as a joint test for effect heterogeneity and the reliability of our CATE estimates. If the effects are homogeneous, if our estimates are unreliable, or both, then $\beta_2$ is zero or close to it, and we expect to fail to reject the hypothesis. Conversely, if the effects are heterogeneous and our estimates are accurate, then $\beta_2$ is close to one, and we expect to reject it.

If we estimate the BLP on the validation sample only, using one of the strategies outlined in the short tutorial, then $\hat{\beta}_2$ has well-behaved asymptotic properties conditional on the training sample. This lets us use standard inference tools, such as conventional confidence intervals and $p$-values.
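As an illustration of the test, the sketch below assumes a randomized experiment with a known, constant propensity score $p$. Under that assumption, regressing $Y_i$ on a constant, $(D_i - p)$ and $(D_i - p)(\hat{\tau}(X_i) - \bar{\hat{\tau}})$ gives a coefficient on the last regressor that targets $\beta_2$. It then tests $\beta_2 = 0$ with a heteroskedasticity-robust (HC0) $t$-statistic and a normal $p$-value. This regression is one simple way to estimate the BLP and is not necessarily identical to the strategies in the short tutorial. The helpers `ols_robust` and `blp_test` and the simulated validation sample are hypothetical.

```python
import math
import numpy as np

def ols_robust(y, X):
    """OLS coefficients and HC0 heteroskedasticity-robust standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ (X.T @ y)
    resid = y - X @ coef
    meat = (X * resid[:, None] ** 2).T @ X  # sum_i e_i^2 x_i x_i'
    vcov = XtX_inv @ meat @ XtX_inv
    return coef, np.sqrt(np.diag(vcov))

def blp_test(y, d, tau_hat, p):
    """Estimate the BLP and test H0: beta_2 = 0 (hypothetical helper).

    Assumes a randomized experiment with known, constant propensity score p,
    and that (y, d, tau_hat) come from the validation sample only.
    """
    dm = d - p
    X = np.column_stack([np.ones_like(y), dm, dm * (tau_hat - tau_hat.mean())])
    coef, se = ols_robust(y, X)
    t_stat = coef[2] / se[2]
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))  # two-sided p-value, normal approximation
    return coef[1], coef[2], se[2], p_value  # (beta_1 = ATE, beta_2, se of beta_2, p-value)

# Hypothetical validation sample with tau(x) = 1 + x
rng = np.random.default_rng(1)
n, p = 5_000, 0.5
x = rng.uniform(-1, 1, size=n)
d = rng.binomial(1, p, size=n).astype(float)
tau = 1.0 + x
y = x + d * tau + rng.normal(size=n)

# Informative (noisy) CATE estimates versus pure-noise estimates
ate, beta2, se2, pval = blp_test(y, d, tau + rng.normal(scale=0.5, size=n), p)
_, beta2_noise, _, pval_noise = blp_test(y, d, rng.normal(size=n), p)
print(f"informative: beta2={beta2:.3f} (p={pval:.2g}); noise: beta2={beta2_noise:.3f} (p={pval_noise:.2g})")
```

With informative estimates, $\hat{\beta}_2$ is clearly positive and the hypothesis is rejected. With pure-noise estimates, $\hat{\beta}_2$ hovers around zero.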

GATES

One could compare the estimated GATES directly to check whether the groups respond differently to the treatment. However, differences between the point estimates can arise merely from estimation noise.

A more appropriate way to assess systematic heterogeneity is to test the hypothesis that all GATES are equal, namely $\gamma_1 = \gamma_2 = \dots = \gamma_K$. Alternatively, we can test whether the GATES of the most and least affected groups differ significantly, that is, test the hypothesis $\gamma_K = \gamma_1$.
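Both hypotheses can be sketched with a single regression, again assuming a randomized experiment with known, constant propensity score $p$. Units are grouped by quantiles of $\hat{\tau}(X_i)$, and $Y_i$ is regressed on a constant and $(D_i - p)\,1\{G_i = k\}$ for $k = 1, \dots, K$, whose coefficients target the GATES $\gamma_k$. The equality of all GATES is tested with a Wald statistic on the contrasts $\gamma_k - \gamma_1$, and $\gamma_K = \gamma_1$ with a $t$-test. The grouping, the estimating regression and the helpers `ols_robust_vcov`, `chi2_sf` and `gates_tests` are assumptions for illustration, not the tutorial's implementation. `chi2_sf` uses the closed-form chi-square tail for integer degrees of freedom so that the example needs only NumPy.

```python
import math
import numpy as np

def ols_robust_vcov(y, X):
    """OLS coefficients and HC0 heteroskedasticity-robust covariance matrix."""
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ (X.T @ y)
    resid = y - X @ coef
    meat = (X * resid[:, None] ** 2).T @ X
    return coef, XtX_inv @ meat @ XtX_inv

def chi2_sf(x, df):
    """P(chi2_df > x) for integer df, via the closed-form incomplete gamma recursion."""
    h = x / 2.0
    if df % 2 == 0:
        return math.exp(-h) * sum(h ** j / math.factorial(j) for j in range(df // 2))
    terms = sum(h ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * terms

def gates_tests(y, d, tau_hat, p, n_groups=5):
    """Estimate GATES and test gamma_1 = ... = gamma_K and gamma_K = gamma_1 (hypothetical)."""
    # Group 1 = lowest quantile of tau_hat (least affected), group K = highest
    edges = np.quantile(tau_hat, np.linspace(0, 1, n_groups + 1)[1:-1])
    group = np.searchsorted(edges, tau_hat, side="right")  # labels 0, ..., K-1
    G = (group[:, None] == np.arange(n_groups)).astype(float)
    X = np.column_stack([np.ones_like(y), (d - p)[:, None] * G])
    coef, vcov = ols_robust_vcov(y, X)
    gamma, V = coef[1:], vcov[1:, 1:]

    # Wald test of gamma_k - gamma_1 = 0 for k = 2, ..., K
    R = np.zeros((n_groups - 1, n_groups))
    R[:, 0] = -1.0
    R[np.arange(n_groups - 1), np.arange(1, n_groups)] = 1.0
    diff = R @ gamma
    wald = float(diff @ np.linalg.solve(R @ V @ R.T, diff))
    p_equal = chi2_sf(wald, df=n_groups - 1)

    # t-test of gamma_K - gamma_1 = 0 (most versus least affected group)
    top_bottom = float(gamma[-1] - gamma[0])
    se_tb = math.sqrt(V[-1, -1] + V[0, 0] - 2.0 * V[0, -1])
    p_top_bottom = math.erfc(abs(top_bottom / se_tb) / math.sqrt(2))
    return gamma, p_equal, top_bottom, p_top_bottom

# Hypothetical validation sample with tau(x) = 1 + x and noisy CATE estimates
rng = np.random.default_rng(2)
n, p = 5_000, 0.5
x = rng.uniform(-1, 1, size=n)
d = rng.binomial(1, p, size=n).astype(float)
tau = 1.0 + x
y = x + d * tau + rng.normal(size=n)
tau_hat = tau + rng.normal(scale=0.5, size=n)

gamma, p_equal, top_bottom, p_top_bottom = gates_tests(y, d, tau_hat, p)
print(np.round(gamma, 2), f"p_equal={p_equal:.2g}", f"gamma_K - gamma_1={top_bottom:.2f} (p={p_top_bottom:.2g})")
```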

If we estimate the GATES on the validation sample only, using one of the strategies outlined in the short tutorial, then each $\hat{\gamma}_k$ has well-behaved asymptotic properties conditional on the training sample. This lets us use standard inference tools, such as conventional confidence intervals and $p$-values.

RATE

Notice that $TOC(u; \hat{\tau}) = 0$ for any $u \in (0, 1]$ in two cases:

  • If $\tau(x) = \tau$ for all $x$ (that is, if the effects are homogeneous);
  • If $\hat{\tau}(\cdot)$ is pure noise uncorrelated with $\tau(\cdot)$ (that is, if our CATE estimates are really bad).

Since the RATE $\theta_{\alpha}(\hat{\tau})$ aggregates $TOC(u; \hat{\tau})$ over $u \in (0, 1]$, it follows that $\theta_{\alpha}(\hat{\tau}) = 0$ if either the effects are homogeneous or our CATE estimates are unreliable.
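The same two cases can be checked numerically with an oracle calculation that uses the true effects. The sketch assumes the common definitions $TOC(u; \hat{\tau}) = E[\tau(X_i) \mid F(\hat{\tau}(X_i)) \geq 1 - u] - E[\tau(X_i)]$, with $F$ the CDF of $\hat{\tau}(X_i)$, and $\theta_{\alpha}(\hat{\tau}) = \int_0^1 \alpha(u)\, TOC(u; \hat{\tau})\, du$, with $\alpha(u) = 1$ (AUTOC) or $\alpha(u) = u$ (Qini). If the short tutorial uses different weights, only the `alpha` line would change. The integral is approximated by an average over the grid $u = 1/n, \dots, 1$. The helpers `oracle_toc` and `oracle_rate` and the data-generating process are hypothetical.

```python
import numpy as np

def oracle_toc(tau, tau_hat):
    """TOC(u) on the grid u = 1/n, ..., 1 using the true effects tau (oracle, hypothetical)."""
    order = np.argsort(-tau_hat, kind="stable")  # rank units by estimated CATE, highest first
    top_means = np.cumsum(tau[order]) / np.arange(1, len(tau) + 1)
    return top_means - tau.mean()

def oracle_rate(tau, tau_hat, weight="autoc"):
    """Riemann approximation of theta_alpha = integral of alpha(u) * TOC(u) over (0, 1]."""
    toc = oracle_toc(tau, tau_hat)
    u = np.arange(1, len(tau) + 1) / len(tau)
    alpha = np.ones_like(u) if weight == "autoc" else u  # alpha(u) = 1 (AUTOC) or u (Qini)
    return float(np.mean(alpha * toc))

rng = np.random.default_rng(3)
n = 100_000
x = rng.uniform(-1, 1, size=n)
tau_het = 1.0 + x        # hypothetical heterogeneous CATE
tau_hom = np.ones(n)     # homogeneous effects
noise = rng.normal(size=n)

autoc_hom = oracle_rate(tau_hom, 1.0 + x)      # homogeneous effects: TOC is identically zero
autoc_noise = oracle_rate(tau_het, noise)      # pure-noise estimates: TOC is near zero
autoc_perfect = oracle_rate(tau_het, tau_het)  # perfect estimates: TOC(u) is close to 1 - u here
qini_perfect = oracle_rate(tau_het, tau_het, weight="qini")
print(autoc_hom, round(autoc_noise, 4), round(autoc_perfect, 3), round(qini_perfect, 3))
```

For this particular $\tau(x)$, perfect estimates give $TOC(u) \approx 1 - u$, so the AUTOC is close to $1/2$ and the Qini coefficient close to $1/6$.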

We can thus use the hypothesis $\theta_{\alpha}(\hat{\tau}) = 0$ as a joint test for effect heterogeneity and the reliability of our CATE estimates. If the effects are homogeneous, if our estimates are unreliable, or both, then $\theta_{\alpha}(\hat{\tau})$ is zero or close to it, and we expect to fail to reject the hypothesis. Conversely, if the effects are heterogeneous and our estimates are accurate, then $\theta_{\alpha}(\hat{\tau})$ is positive and large enough that we expect to reject it.

If we estimate the RATE on the validation sample only, using the strategy outlined in the short tutorial, then $\hat{\theta}_{\alpha}(\hat{\tau})$ has well-behaved asymptotic properties conditional on the training sample. This lets us use standard inference tools, such as conventional confidence intervals and $p$-values.
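A minimal feasible version of the RATE test replaces the unknown $\tau(X_i)$ of the oracle calculation with unbiased scores computed on the validation sample. The sketch assumes a randomized experiment with known, constant propensity score $p$ and uses inverse-probability-weighted scores $\Gamma_i = Y_i \left( D_i / p - (1 - D_i)/(1 - p) \right)$, whose conditional mean is $\tau(X_i)$. The standard error comes from a plain nonparametric bootstrap. Both choices are simplifications: more careful implementations typically use doubly robust scores and a different bootstrap scheme, so this is not necessarily the tutorial's strategy. The helpers `rate_estimate` and `rate_test` and the simulated data are hypothetical.

```python
import math
import numpy as np

def rate_estimate(y, d, tau_hat, p, weight="autoc"):
    """RATE estimate from IPW scores (assumes randomized assignment with known p)."""
    scores = y * (d / p - (1 - d) / (1 - p))    # E[score | X] = tau(X) under randomization
    order = np.argsort(-tau_hat, kind="stable")  # rank by estimated CATE, highest first
    k = np.arange(1, len(y) + 1)
    toc = np.cumsum(scores[order]) / k - scores.mean()  # estimated TOC at u = k / n
    u = k / len(y)
    alpha = np.ones_like(u) if weight == "autoc" else u  # AUTOC or Qini weights
    return float(np.mean(alpha * toc))

def rate_test(y, d, tau_hat, p, weight="autoc", n_boot=200, seed=0):
    """Test H0: theta_alpha = 0 with a nonparametric-bootstrap standard error (simplification)."""
    rng = np.random.default_rng(seed)
    theta = rate_estimate(y, d, tau_hat, p, weight)
    n = len(y)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample validation units with replacement
        boot[b] = rate_estimate(y[idx], d[idx], tau_hat[idx], p, weight)
    se = float(boot.std(ddof=1))
    p_value = math.erfc(abs(theta / se) / math.sqrt(2))  # two-sided normal p-value
    return theta, se, p_value

# Hypothetical validation sample with tau(x) = 1 + x and noisy CATE estimates
rng = np.random.default_rng(4)
n, p = 5_000, 0.5
x = rng.uniform(-1, 1, size=n)
d = rng.binomial(1, p, size=n).astype(float)
tau = 1.0 + x
y = x + d * tau + rng.normal(size=n)
tau_hat = tau + rng.normal(scale=0.5, size=n)

theta, se, pval = rate_test(y, d, tau_hat, p)
print(f"AUTOC={theta:.3f}, se={se:.3f}, p={pval:.2g}")
```

Because $\hat{\tau}$ is held fixed from the training sample, only the ranking and the scores enter the estimate. This is what makes conditional inference on the validation sample straightforward.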