As outlined in the short tutorial, we can test specific hypotheses regarding the BLP, GATES, and RATEs to validate the estimated heterogeneity. This article explores these hypotheses and their implications.
The notation is the same as in the short tutorial.
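Consider the heterogeneity parameter $\beta_2$. Notice that $\beta_2 = 0$ in two cases: when the effects are homogeneous, so that $\tau(X_i)$ does not vary across units, and when the CATE estimates $\hat{\tau}(X_i)$ are uncorrelated with the true CATEs $\tau(X_i)$.
Consequently, $\beta_2 = 0$ if either the effects are homogeneous or our CATE estimates are unreliable. On the other hand, $\beta_2 = 1$ if $\hat{\tau}(x) = \tau(x)$ for all $x$ (that is, if we have “perfect” CATE estimates). Therefore, $\beta_2 \approx 1$ if our CATE estimates are reliable.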
We can thus consider the hypothesis $H_0: \beta_2 = 0$ as a joint test for effect heterogeneity and the reliability of our CATE estimates. If the effects are homogeneous, or if our estimates are unreliable (or if both conditions hold), then $\beta_2$ is close to zero, and we should fail to reject the hypothesis. Conversely, if the effects are heterogeneous and our estimates are accurate, then $\beta_2$ is close to one, and we should reject the hypothesis.
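The benchmark cases above can be checked in a small simulation. The following is a minimal sketch, not the estimation strategy from the short tutorial: it assumes the heterogeneity parameter is the usual BLP slope, that is, the covariance between true and estimated CATEs divided by the variance of the estimated CATEs, and it uses true effects that are known only because the data are simulated. The names `blp_slope`, `tau`, `noise`, and `const` are hypothetical.

```python
import random
import statistics

def blp_slope(tau, tau_hat):
    """Slope of the best linear predictor of tau given tau_hat:
    Cov(tau, tau_hat) / Var(tau_hat). Assumed definition, see lead-in."""
    m_tau = statistics.fmean(tau)
    m_hat = statistics.fmean(tau_hat)
    cov = sum((t - m_tau) * (h - m_hat) for t, h in zip(tau, tau_hat))
    var = sum((h - m_hat) * (h - m_hat) for h in tau_hat)
    return cov / var

rng = random.Random(0)
n = 5000
tau = [rng.gauss(0.0, 1.0) for _ in range(n)]    # heterogeneous true effects
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]  # "estimates" unrelated to tau
const = [1.0] * n                                # homogeneous true effects

slope_homogeneous = blp_slope(const, noise)  # no heterogeneity to detect
slope_unreliable = blp_slope(tau, noise)     # heterogeneity, useless estimates
slope_perfect = blp_slope(tau, tau)          # estimates equal the truth

print(f"homogeneous: {slope_homogeneous:.3f}")
print(f"unreliable:  {slope_unreliable:.3f}")
print(f"perfect:     {slope_perfect:.3f}")
```

In practice the true effects are unobserved, so the slope has to be estimated from the validation sample; the sketch only illustrates which population values the test is designed to distinguish.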
If we estimate the BLP by one of the strategies outlined in the short tutorial and using only the validation sample, then the resulting estimator exhibits well-behaved asymptotic properties conditional on the training sample. This enables us to employ standard tools for inference, such as conventional confidence intervals and $p$-values.
One could compare the estimated GATES to validate how various groups respond differently to the treatment. However, disparities in the point estimates can emerge merely due to estimation noise.
A more appropriate method for assessing the presence of systematic heterogeneity is to test the hypothesis that all GATES are identical, namely $H_0: \gamma_1 = \gamma_2 = \dots = \gamma_K$. Alternatively, we can test whether the difference in the GATES for the most and least affected groups is statistically significant, that is, $H_0: \gamma_K = \gamma_1$.
If we estimate the GATES by one of the strategies outlined in the short tutorial and using only the validation sample, then the resulting estimator exhibits well-behaved asymptotic properties conditional on the training sample. This enables us to employ standard tools for inference, such as conventional confidence intervals and $p$-values.
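As an illustration of the test comparing the most and least affected groups, the sketch below simulates a randomized experiment and runs a simple two-sided $z$-test. It is a simplified stand-in, not one of the strategies from the short tutorial: it assumes random treatment assignment with probability one half, forms groups from quantiles of a fixed CATE proxy, and estimates each group effect by a difference in means. The function names, the number of groups `k`, and the data-generating process are all hypothetical.

```python
import math
import random
import statistics

def group_effect(y, d):
    """Difference in means (treated minus control) and its standard error."""
    treated = [yi for yi, di in zip(y, d) if di == 1]
    control = [yi for yi, di in zip(y, d) if di == 0]
    est = statistics.fmean(treated) - statistics.fmean(control)
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))
    return est, se

def top_vs_bottom_test(y, d, tau_hat, k=4):
    """z-test of H0: effect in the top group equals effect in the bottom group."""
    order = sorted(range(len(y)), key=lambda i: tau_hat[i])
    size = len(y) // k
    bottom, top = order[:size], order[-size:]
    g1, se1 = group_effect([y[i] for i in bottom], [d[i] for i in bottom])
    gk, sek = group_effect([y[i] for i in top], [d[i] for i in top])
    diff = gk - g1
    se = math.sqrt(se1 ** 2 + sek ** 2)  # the two groups share no units
    z = diff / se
    p_value = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))
    return diff, se, p_value

# Simulated validation sample with heterogeneous effects tau(x) = 2x.
rng = random.Random(1)
n = 4000
x = [rng.random() for _ in range(n)]
tau = [2.0 * xi for xi in x]
tau_hat = [t + rng.gauss(0.0, 0.5) for t in tau]  # noisy but informative proxy
d = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
y = [xi + di * ti + rng.gauss(0.0, 1.0) for xi, di, ti in zip(x, d, tau)]

diff, se, p_value = top_vs_bottom_test(y, d, tau_hat)
print(f"difference = {diff:.3f}, se = {se:.3f}, p-value = {p_value:.4f}")
```

Treating the two group estimates as independent is reasonable here because the groups are disjoint and the proxy is held fixed, which mirrors conditioning on the training sample.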
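Notice that the RATE equals zero for any choice of weight function in two cases: when the effects are homogeneous, and when the ranking induced by our CATE estimates is unrelated to the true CATEs.
Consequently, the RATE equals zero if either the effects are homogeneous or our CATE estimates are unreliable.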
We can thus consider the hypothesis that the RATE equals zero as a joint test for effect heterogeneity and the reliability of our CATE estimates. If the effects are homogeneous, or if our estimates are unreliable (or if both conditions hold), then the RATE is close to zero, and we should fail to reject the hypothesis. Conversely, if the effects are heterogeneous and our estimates are accurate, then the RATE is large enough that we should reject the hypothesis.
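The same benchmark cases can be illustrated for one particular RATE. The sketch below assumes uniform weights, which gives the area under the targeting operator characteristic curve (often called AUTOC), and it computes the quantity from true effects that are available only in simulation. The function name `oracle_autoc` and the simulated inputs are hypothetical, and this is not the estimation strategy from the short tutorial.

```python
import random
import statistics

def oracle_autoc(tau, tau_hat):
    """Average over u of TOC(u), where TOC(u) is the mean true effect among
    the top-u fraction of units ranked by tau_hat, minus the overall mean."""
    n = len(tau)
    order = sorted(range(n), key=lambda i: tau_hat[i], reverse=True)
    ate = statistics.fmean(tau)
    running, total = 0.0, 0.0
    for rank, i in enumerate(order, start=1):
        running += tau[i]              # sum of effects among the top `rank` units
        total += running / rank - ate  # TOC evaluated at u = rank / n
    return total / n

rng = random.Random(2)
n = 5000
tau = [rng.gauss(0.0, 1.0) for _ in range(n)]    # heterogeneous true effects
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]  # ranking unrelated to tau
const = [1.0] * n                                # homogeneous true effects

rate_homogeneous = oracle_autoc(const, noise)
rate_unreliable = oracle_autoc(tau, noise)
rate_accurate = oracle_autoc(tau, tau)

print(f"homogeneous: {rate_homogeneous:.3f}")
print(f"unreliable:  {rate_unreliable:.3f}")
print(f"accurate:    {rate_accurate:.3f}")
```

Only the accurate ranking of heterogeneous effects yields a value clearly above zero, which is the situation in which the hypothesis should be rejected.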
If we estimate the RATE by the strategy outlined in the short tutorial and using only the validation sample, then the resulting estimator exhibits well-behaved asymptotic properties conditional on the training sample. This enables us to employ standard tools for inference, such as conventional confidence intervals and $p$-values.