Alihan Hüyük, Qiyao Wei, Alicia Curth, Mihaela van der Schaar
University of Cambridge

Abstract: Decision-makers are often experts of their domain and take actions based on their domain knowledge. Doctors, for instance, may prescribe treatments by predicting the likely outcome of each available treatment. Actions of an expert thus naturally encode part of their domain knowledge, and can help make inferences within the same domain: Knowing doctors try to prescribe the best treatment for their patients, we can tell treatments prescribed more frequently are likely to be more effective. Yet in machine learning, the fact that most decision-makers are experts is often overlooked, and "expertise" is seldom leveraged as an inductive bias. This is especially true for the literature on treatment effect estimation, where often the only assumption made about actions is that of overlap. In this paper, we argue that expertise, particularly the type of expertise the decision-makers of a domain are likely to have, can be informative in designing and selecting methods for treatment effect estimation. We formally define two types of expertise, predictive and prognostic, and demonstrate empirically that: (i) the prominent type of expertise in a domain significantly influences the performance of different methods in treatment effect estimation, and (ii) it is possible to predict the type of expertise present in a dataset, which can provide a quantitative basis for model selection.
Action-predictive representations: Finally, in stark contrast with balancing representations, we consider a final strategy that learns a function $\phi$ so that the representations $z_i = \phi(x_i)$ are actually predictive of actions $a_i$. This strategy encodes predictive expertise as an inductive bias in the sense that it assumes that policies and outcome predictors can be represented in a joint space $\mathcal{R}$ from which it is easier to learn the function $f$ than from the original space. Such a strategy is implemented as DragonNet in Shi et al. (2019) (Action-predictive), albeit motivated from a different angle.³ The loss function for DragonNet takes the form $\mathcal{L} = \sum_i \big( \ell(y_i, f(z_i, a_i)) - \log g(z_i)[a_i] \big)$, where the function $g : \mathcal{R} \to \Delta(\{0,1\})$ is trained jointly with the functions $\phi, f$ to predict action distributions.
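As a concrete illustration of this joint objective, the following is a minimal numpy sketch, not the authors' or Shi et al.'s implementation: `W_phi`, `W_f`, and `W_g` are hypothetical linear parameters standing in for the networks $\phi$, $f$, and $g$, and squared error is assumed for the outcome loss $\ell$. It computes the combined outcome loss plus the negative log-likelihood of the observed actions under the propensity head $g$.

```python
import numpy as np

def dragonnet_style_loss(X, a, y, W_phi, W_f, W_g):
    """Sketch of a DragonNet-style joint loss:
    sum_i [ (y_i - f(z_i, a_i))^2 - log g(z_i)[a_i] ]
    with linear stand-ins: z = X @ W_phi, per-action outcome
    heads W_f[a], and a softmax propensity head W_g."""
    Z = X @ W_phi                                # shared representation z_i = phi(x_i)
    y_hat = np.einsum("nd,nd->n", Z, W_f[a])     # outcome prediction f(z_i, a_i)
    logits = Z @ W_g                             # propensity logits, shape (n, 2)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)    # g(z_i), a distribution over actions
    outcome_loss = (y - y_hat) ** 2              # squared-error stand-in for ell
    propensity_nll = -np.log(probs[np.arange(len(a)), a])  # -log g(z_i)[a_i]
    return float(np.sum(outcome_loss + propensity_nll))
```

Because $\phi$ feeds both heads, minimizing this sum pushes the representation toward features that predict the actions as well as the outcomes, which is exactly how predictive expertise enters as an inductive bias.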
Performance metric. As our main metric of performance, we consider the precision in estimation of heterogeneous effects (PEHE), that is, $\mathbb{E}_X[(\tau(X) - \hat{\tau}(X))^2]^{1/2}$ for an estimator $\hat{\tau}(x)$. In the main paper, we focus on a single simulation environment; additional experiments can be found in Appendix D.

³Shi et al. (2019) consider average treatment effect estimation, that is, the estimation of $\mathbb{E}[Y_1 - Y_0]$. In this context, the policy $\pi$ plays a different, special role: it is sufficient for adjustment (Rosenbaum & Rubin, 1983), which is what Shi et al. (2019) propose to exploit. Note that $\pi$ is not sufficient for adjustment in CATE estimation unless $\mu_a(x) \doteq \mathbb{E}[Y_a \mid X = x]$ is a function of $\pi(x)$ alone.

Published as a conference paper at ICLR 2024
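The PEHE metric defined above is straightforward to compute when the true effects are known, as in simulation. Below is a minimal sketch, assuming access to the ground-truth CATE values $\tau(x_i)$ (available only in simulated environments) alongside the estimates $\hat{\tau}(x_i)$; the Monte Carlo average over samples stands in for the expectation $\mathbb{E}_X$.

```python
import numpy as np

def pehe(tau_true, tau_hat):
    """Precision in Estimation of Heterogeneous Effects:
    E_X[(tau(X) - tau_hat(X))^2]^{1/2}, estimated as the
    root-mean-squared error over the sampled covariates."""
    tau_true = np.asarray(tau_true, dtype=float)
    tau_hat = np.asarray(tau_hat, dtype=float)
    return float(np.sqrt(np.mean((tau_true - tau_hat) ** 2)))
```

Lower is better; a perfect estimator attains PEHE of zero.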
4.1 PERFORMANCE UNDER DIFFERENT EXPERTISE SCENARIOS

When varying the policy $\pi$ as in Figure 2, as we move away from the best-case scenario in Figure 1 to the case with high expertise and eventually to the worst-case scenario (Best → Expert → Worst), the treatment effect estimation problem generally gets harder; see Figure 3. In order to compensate for this inherent change in the difficulty of estimating treatment effects, in this subsection, we measure the performance of all methods relative to Baseline.

[Figure 3: plot of the PEHE of Baseline on a log scale ($10^0$ to $10^1$); legend: Best → Expert (soft), Worst → Expert (mis).]
Figure 3: As Best → Expert (i.e., away from the best-case scenario), treatment effect estimation gets generally harder and the performance of Baseline degrades. Similarly, as Worst → Expert (i.e., away from the worst-case scenario), the performance of Baseline improves instead.

Figures 4a and 4b show the PEHE improvement of different methods over Baseline for the setting with predictive expertise. Three observations stand out. First, Action-predictive achieves better and better performance over Baseline as the decision-makers' expertise increases. This is because Action-predictive learns variables that are predictive of actions, and when the expertise is high, these variables happen to be good predictors of the treatment effect as well. Second, the performance of Balancing degrades relative to Baseline as the expertise increases. This is because more relevant features becoming predictive of the actions forces Balancing to exclude more of those features from its representation space.⁴ Third