Abstract
Model evaluation for long-term climate predictions must rely on quantities other than the prediction itself, and comprehensive uncertainty quantification is impossible. An ad hoc alternative is provided by coordinated model intercomparisons, which typically use a “one model one vote” approach. The problem with such an approach is that it treats all models as independent and equally plausible. Weighting the models of an ensemble by performance and dependence seems an obvious way to improve on model democracy, yet open questions remain about what constitutes a “good” model, how to define dependence, how to interpret robustness, and how to incorporate background knowledge. Understanding those issues has the potential to increase confidence in model predictions in modeling efforts outside of climate science where similar challenges exist.
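As a concrete illustration of weighting an ensemble by performance and dependence, the sketch below follows the general form of performance–independence weighting (e.g., as in Knutti et al., 2017); the abstract itself does not commit to a specific formula, so the distance arrays D (model-to-observation) and S (model-to-model) and the shape parameters sigma_D and sigma_S are hypothetical inputs chosen for illustration.

```python
import numpy as np

def ensemble_weights(D, S, sigma_D, sigma_S):
    """Sketch of performance-and-independence weighting (assumed scheme).

    Each model is weighted up for skill (small distance D to observations)
    and down for dependence (many close neighbours in the S matrix).
    """
    performance = np.exp(-(D / sigma_D) ** 2)
    # Independence penalty: 1 + summed similarity to the *other* models.
    similarity = np.exp(-(S / sigma_S) ** 2)
    np.fill_diagonal(similarity, 0.0)
    independence = 1.0 + similarity.sum(axis=1)
    w = performance / independence
    return w / w.sum()  # normalise so the weights sum to one

# Toy example with three models: model 1 is a near-duplicate of model 0,
# so the pair shares weight; model 2 is independent but less skilful.
D = np.array([0.5, 0.6, 1.5])
S = np.array([[0.0, 0.1, 1.2],
              [0.1, 0.0, 1.1],
              [1.2, 1.1, 0.0]])
print(ensemble_weights(D, S, sigma_D=1.0, sigma_S=0.5))
```

With these toy numbers the two near-duplicate models split a weight that model democracy would have given them twice over, which is the basic failure mode of “one model one vote” that the abstract points to.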