Abstract
Research on “improper” linear models has shown that predetermined weighting schemes for the linear model, such as equally weighting all predictors, can be surprisingly accurate on cross-validation. We review recent advances that can characterize the optimal choice of an improper linear model. We extend this research to the understanding of fast and frugal heuristics, particularly to the ecologically rational goal of understanding in which task environments given heuristics are optimal. We demonstrate how to test this model using the Recognition Heuristic and Take the Best heuristic, show how the model reconciles with the ecological rationality program, and discuss how our prescriptive, computational approach could be approximated by simpler mental rules that might be more descriptive. Echoing the arguments of van Rooij et al. (Synthese 187:471–487, 2012), we stress the virtue of having a computationally tractable model of strategy selection, even if one proposes that cognizers use a simpler heuristic process to approximate it.
Notes
The random error term, \(\epsilon\), is assumed to be a random variable with mean zero and constant variance, \(\sigma^{2}\). Note that we do not assume any particular distributional form for \(\epsilon\), e.g., normality.
This formulation was designed for binary cues. It is more difficult to construct weights that represent a lexicographic ordering over continuous cues. For example, with truly continuous cues, the second cue value could be arbitrarily larger than the first, with an almost infinite ratio, so no weight placed on the first cue would “drown out” the importance of the second cue. Some researchers investigating TTB with continuous cues convert them into binary cues, for example by performing a midpoint split: recoding to 1 all values above the midpoint and to 0 all values below it.
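For concreteness, the midpoint split described above can be sketched as follows (a minimal illustration; `midpoint_split` is our name, and handling of values exactly at the midpoint may differ across studies):

```python
import numpy as np

def midpoint_split(cue):
    """Recode a continuous cue to binary: 1 above the range midpoint, 0 otherwise."""
    cue = np.asarray(cue, dtype=float)
    midpoint = (cue.min() + cue.max()) / 2.0
    return (cue > midpoint).astype(int)

print(midpoint_split([1.0, 2.0, 8.0, 10.0]))  # midpoint is 5.5 → [0 0 1 1]
```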
References
Brandstätter, E., Gigerenzer, G., & Hertwig, R. (2006). The priority heuristic: Making choices without trade-offs. Psychological Review, 113, 409–432.
Czerlinski, J., Gigerenzer, G., & Goldstein, D. G. (1999). How good are simple heuristics? In G. Gigerenzer, P. M. Todd, & The ABC Research Group (Eds.), Simple heuristics that make us smart (pp. 97-118). New York: Oxford University Press.
Dana, J. (2008). What makes improper linear models tick? In J. Krueger (Ed.), Rationality and social responsibility: Essays in honor of Robyn Mason Dawes. Mahwah, NJ: Lawrence Erlbaum Associates.
Dana, J., & Dawes, R. M. (2004). The superiority of simple alternatives to regression for social science predictions. Journal of Educational and Behavioral Statistics, 29, 317–331.
Davis-Stober, C. P. (2011). A geometric analysis of when fixed weighting schemes will outperform ordinary least squares. Psychometrika, 76, 650–669.
Davis-Stober, C. P., Dana, J., & Budescu, D. (2010a). A constrained linear estimator for multiple regression. Psychometrika, 75, 521–541.
Davis-Stober, C. P., Dana, J., & Budescu, D. (2010b). Why recognition is rational: Optimality results on single-variable decision rules. Judgment and Decision Making, 5, 216–229.
Dawes, R. M. (1979). The robust beauty of improper linear models. The American Psychologist, 34, 571–582.
Dawes, R. M., & Corrigan, B. (1974). Linear models in decision making. Psychological Bulletin, 81, 95–106.
Einhorn, H. J., & Hogarth, R. M. (1975). Unit weighting schemes for decision making. Organizational Behavior and Human Performance, 13, 171–192.
Fasolo, B., McClelland, G. H., & Todd, P. M. (2007). Escaping the tyranny of choice: When fewer attributes make choice easier. Marketing Theory, 7, 13–26.
Flury, B., & Riedwyl, H. (1985). T2 tests, the linear two-group discriminant function, and their computation by linear regression. The American Statistician, 39, 20–25.
Gigerenzer, G. (1991). From tools to theories: A heuristic of discovery in cognitive psychology. Psychological Review, 98, 254–267.
Gigerenzer, G. (2008). Why heuristics work. Perspectives on Psychological Science, 3, 20–29.
Gigerenzer, G., & Brighton, H. (2009). Homo heuristicus: Why biased minds make better inferences. Topics in Cognitive Science, 1, 107–143.
Gigerenzer, G., & Goldstein, D. G. (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103, 650–669.
Gigerenzer, G., Todd, P. M., & the ABC Research Group. (1999). Simple heuristics that make us smart. New York: Oxford University Press.
Goldstein, D. G. (1997). Models of bounded rationality for inference. Doctoral thesis, The University of Chicago. Dissertation Abstracts International, 58(01), 435B. (University Microfilms No. AAT 9720040).
Goldstein, D. G., & Gigerenzer, G. (2002). Models of ecological rationality: The recognition heuristic. Psychological Review, 109, 75–90.
Goldstein, D. G., & Gigerenzer, G. (2009). Fast and frugal forecasting. International Journal of Forecasting, 25, 760–772.
Hertwig, R., Davis, J. N., & Sulloway, F. J. (2002). Parental investment: How an equity motive can produce inequality. Psychological Bulletin, 128, 728–745.
Hogarth, R. M., & Karelaia, N. (2005). Ignoring information in binary choice with continuous variables: When is less more? Journal of Mathematical Psychology, 49, 115–124.
Hogarth, R. M., & Karelaia, N. (2006). “Take-The-Best” and other simple strategies: Why and when they work “well” with binary cues. Theory and Decision, 61, 205–249.
Kahneman, D., Slovic, P., & Tversky, A. (Eds.). (1982). Judgment under uncertainty: Heuristics and biases. Cambridge: Cambridge University Press.
Katsikopoulos, K. V. (2011). Psychological heuristics for making inferences: Definition, performance, and the emerging theory and practice. Decision Analysis, 8, 10–29.
Katsikopoulos, K. V., Schooler, L. J., & Hertwig, R. (2010). The robust beauty of ordinary information. Psychological Review, 117, 1259–1266.
Lehmann, E. L., & Casella, G. (1998). Theory of point estimation (2nd ed.). New York: Springer.
Marden, J. I. (2013). Multivariate statistics: Old school. Department of Statistics, The University of Illinois at Urbana-Champaign.
Martignon, L., & Hoffrage, U. (2002). Fast, frugal, and fit: Simple heuristics for paired comparison. Theory and Decision, 52, 29–71.
Schmidt, F. L. (1971). The relative efficiency of regression and simple unit predictor weights in applied differential psychology. Educational and Psychological Measurement, 31, 699–714.
Shanteau, J., & Thomas, R. P. (2000). Fast and frugal heuristics: What about unfriendly environments? Behavioral and Brain Sciences, 23, 762–763.
van Rooij, I., Wright, C. D., & Wareham, T. (2012). Intractability and the use of heuristics in psychological explanations. Synthese, 187, 471–487.
von Winterfeldt, D., & Edwards, W. (1973). Costs and payoffs in perceptual research. Technical Report, No. 011313-1-T, Engineering Psychology Laboratory, University of Michigan.
Wainer, H. (1976). Estimating coefficients in linear models: It don’t make no nevermind. Psychological Bulletin, 83, 213–217.
Wilks, S. S. (1938). Weighting systems for linear functions of correlated variables when there is no dependent variable. Psychometrika, 3, 23–40.
Acknowledgments
We thank Mirjam Jenny and Jean Whitmore for helpful comments.
Appendices
Appendix 1
Prior work has shown that simple heuristics, such as TTB, can outperform standard statistical methods, such as regression, in many task environments (Katsikopoulos et al. 2010). There is a parallel literature comparing improper linear models to standard regression techniques (Dana and Dawes 2004; Davis-Stober 2011; Dawes 1979; Wainer 1976), demonstrating that improper linear models often perform quite favorably in task environments where information is limited, i.e., where sample and effect sizes are small to modest. There is converging evidence that simple decision heuristics and improper linear models work well for similar reasons. For example, the loss function we have used, mean squared error, can be decomposed into the sum of an estimator’s squared bias and its variance. Said simply, one can improve overall accuracy (mean squared error) by systematically accepting a small amount of bias while greatly reducing the variability of the estimates. As described by Dana (2008) and Davis-Stober et al. (2010a), this is one interpretation of the strong performance of improper linear models relative to standard estimation methods: the improper models are biased because their weighting policies are pre-determined, yet their variance is greatly reduced for the same reason; standard methods, such as regression, are unbiased in their estimates but more variable from sample to sample.
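The bias–variance intuition above can be illustrated with a small simulation (our own sketch, not from the paper; the sample size, true weights, and error scale are illustrative assumptions). When the true weights happen to be equal, a fixed equal-weighting direction sacrifices little bias while estimating only one scaling parameter instead of three, so its sampling variance is far lower than that of OLS:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 15, 3, 1.0
beta = np.array([1.0, 1.0, 1.0])   # true weights happen to be equal (assumption)
reps = 2000

mse_ols, mse_unit = 0.0, 0.0
for _ in range(reps):
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(scale=sigma, size=n)
    # OLS: unbiased but variable in small samples
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    # Improper model: fixed equal-weight direction, only the scale k is estimated
    a = np.ones(p) / np.sqrt(p)
    k = (a @ X.T @ y) / (a @ X.T @ X @ a)
    b_unit = k * a
    mse_ols += np.sum((b_ols - beta) ** 2) / reps
    mse_unit += np.sum((b_unit - beta) ** 2) / reps

print(mse_unit < mse_ols)  # equal weights win here: similar bias, much lower variance
```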
In this appendix, we summarize the main results from Davis-Stober et al. (2010a) that underlie our approach; we refer readers to the original papers for detailed proofs. Throughout, we consider the standard regression model \(\textit{Y} = \varvec{X\beta } + \epsilon\), where \(\varvec{X}\) is a known \(n \times p\) design matrix, \(\epsilon \sim (0, \; \sigma ^{2}\varvec{I}_{n \times n})\), \(\varvec{\beta }\) is a vector of unknown population weights, and the matrix \(\varvec{X'X}\) is full rank and positive-definite. We now formally define our estimator of \(\varvec{\beta }\) that conforms to the weighting policy of a suitably chosen improper linear model. We term this estimator the constrained estimator.
Definition
(Davis-Stober et al. 2010a). Define \(\varvec{a}\) to be an exogenously chosen \(p \times 1\) weighting vector, with \(\Vert \varvec{a}\Vert ^{2} > 0\). Assume \(\textit{Y} \sim (\varvec{X\beta }, \; \sigma ^{2}I_{n \times n})\) with i.i.d. sampling. The constrained estimator \({\hat{\varvec{\beta }}}_{\varvec{a}}\) is an estimator of \(\varvec{\beta }\) and is defined as follows:
$$\hat{\varvec{\beta }}_{\varvec{a}} = k\varvec{a},$$
where \(k=(\varvec{a'X'Xa})^{-1}\varvec{a'X'y}\).
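The definition translates directly into code; a minimal sketch (the function name is ours, the math follows the definition above):

```python
import numpy as np

def constrained_estimator(X, y, a):
    """Constrained estimator: beta_hat_a = k * a, with k = (a'X'Xa)^(-1) a'X'y."""
    a = np.asarray(a, dtype=float)
    k = (a @ X.T @ y) / (a @ X.T @ X @ a)
    return k * a

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
# The estimate always lies along the chosen direction a; only its scale is fit.
print(constrained_estimator(X, y, np.array([1.0, 0.0])))  # → [2. 0.]
```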
This formulation allows us to compare the mean squared error (a.k.a. loss) of the constrained estimator \({\hat{\varvec{\beta}}}_{\varvec{a}}\) to more traditional estimators such as the standard regression estimator, Ordinary Least Squares (OLS). We can also calculate the loss of \({\hat{\varvec{\beta}}}_{\varvec{a}}\) under various choices of exogenously chosen weights, \(\varvec{a}\). In other words, given different predictor cue structures, \(\varvec{X}\), we can determine which choices of \(\varvec{a}\) result in a constrained estimator that incurs less loss than others.
A potential stumbling block is that the true value of the population weights, \(\varvec{\beta }\), is unknown. Because the parameter \(k\) is a scalar and cannot change the pre-determined relationships among the estimates in \({\hat{\varvec{\beta}}}_{\varvec{a}}\), which are fixed by \(\varvec{a}\), the constrained estimator is both biased and inconsistent. Further, we do not know, a priori, how biased the constrained estimator will be for any particular choice of \(\varvec{a}\). If our exogenously chosen weights happen, perhaps by luck or by ecological rationality principles, to be very similar to the population weights, then the resulting constrained estimator will not be very biased, yielding a relatively small mean squared error. On the other hand, a choice of exogenously chosen weights could be quite different from \(\varvec{\beta }\), resulting in a larger bias, hence larger mean squared error, all else equal.
We address this problem by solving for the maximum possible mean squared error. The following theorem provides a tight upper bound for the mean squared error of \({\hat{\varvec{\beta}}}_{\varvec{a}}\); this bound can be expressed as a function of the exogenously chosen weights, \(\varvec{a}\), and the matrix \(\varvec{X}\).
Theorem 1
(Davis-Stober et al. 2010a). Assume \(\varvec{a}\) is chosen exogenously and, without loss of generality, let \(\Vert \varvec{a}\Vert = 1\). Assume \(\Vert \varvec{\beta }\Vert ^{2} < \infty\) and \(\textit{Y} \sim (\varvec{X\beta }, \sigma ^{2}\varvec{I}_{n \times n})\) with i.i.d. sampling. The mean squared error (MSE) of \({\hat{\varvec{\beta}}}_{\varvec{a}}\) is bounded above by the following:
$$MSE_{\hat{\varvec{\beta }}_{\varvec{a}}} \le \Vert \varvec{\beta }\Vert ^{2} \left( \frac{\varvec{a}'(\varvec{X}'\varvec{X})^{2} \varvec{a}}{(\varvec{a}'\varvec{X}'\varvec{Xa})^{2}} \right) + \frac{\sigma ^{2}}{\varvec{a'X'Xa}}. \quad (8)$$
This maximal MSE is attained when the population weights, \(\varvec{\beta }\), are a scale multiple of the vector
$$\varvec{\beta }^{\varvec{*}} = \varvec{X'Xa} - (\varvec{a'X'Xa})\varvec{a},$$
the component of \(\varvec{X'Xa}\) that is orthogonal to \(\varvec{a}\).
Theorem 1 provides two important pieces of information. First, for any choice of weights \(\varvec{a}\) and matrix \(\varvec{X'X}\), it explicitly describes the maximum loss that the constrained estimator can incur. Second, it provides the population weights, \(\varvec{\beta }^{\varvec{*}}\), under which this maximal loss occurs. Not surprisingly, this worst case set of population weights is geometrically orthogonal to \(\varvec{a}\) (Davis-Stober et al. 2010a). Under this “worst case” condition, the improper linear model being deployed, \(\varvec{a}\), is maximally unfit given the true weighting relationship among the dependent variable and predictors, \(\varvec{\beta }^{\varvec{*}}\).
Theorem 1 also gives an explicit description of maximal loss in terms of the well-known bias–variance trade-off: mean squared error can be written as the sum of an estimator’s squared bias and its variance. More formally, for any estimator \({\hat{\varvec{\beta}}}\),
$$MSE_{\hat{\varvec{\beta }}} = \Vert E[{\hat{\varvec{\beta}}}] - \varvec{\beta }\Vert ^{2} + \text {tr}(\varvec{\Psi }_{\hat{\varvec{\beta }}}),$$
where \(\varvec{\Psi }_{\hat{\varvec{\beta }}}\) is the covariance matrix of \(\hat{\varvec{\beta }}\), and “tr” denotes the trace operator. Returning to Theorem 1, the maximal value of mean squared error for the constrained estimator is written as the sum of its squared bias, \(\Vert \varvec{\beta }\Vert ^{2} \left( \frac{\varvec{a}'(\varvec{X}'\varvec{X})^{2} \varvec{a}}{(\varvec{a}'\varvec{X}'\varvec{Xa})^{2}} \right)\), and its variance, \(\frac{\sigma ^{2}}{\varvec{a'X'Xa}}\); see Davis-Stober et al. (2010a) for the complete derivation and proof. The value \(\Vert \varvec{\beta }\Vert ^{2}\) in the bias term can be equated with overall effect size and does not depend on the direction of the population weights, only on the sum of their squared values. The remaining values in the bias and variance terms are functions only of \(\varvec{a}\) and \(\varvec{X'X}\). From this interpretation, the relationship between a choice of \(\varvec{a}\) and the predictor variables is clear: minimizing the maximal loss that one could incur depends upon optimizing the relationship between \(\varvec{a}\) and \(\varvec{X'X}\), as the product \(\varvec{X'Xa}\) features prominently in both the maximal bias and variance terms in Eq. (8). This is quite intuitive, since for mean-centered predictor variables, \(\varvec{X'X} = (n-1)\varvec{C}\), where \(\varvec{C}\) is the covariance matrix among the predictor cues. In other words, to minimize maximum loss (the mini–max criterion), we need only find the best choice of \(\varvec{a}\) for a particular predictor cue covariance structure. The following theorem gives precisely this result.
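As a sketch, the two terms of the maximal-MSE expression can be computed directly from \(\varvec{a}\) and \(\varvec{X'X}\) (function name and interface are ours):

```python
import numpy as np

def max_mse(a, XtX, beta_norm_sq, sigma_sq):
    """Theorem 1 bound: worst-case squared bias plus variance of the
    constrained estimator for weighting direction a."""
    a = np.asarray(a, dtype=float)
    a = a / np.linalg.norm(a)            # theorem assumes ||a|| = 1
    quad = a @ XtX @ a                   # a'X'Xa
    bias_sq = beta_norm_sq * (a @ XtX @ XtX @ a) / quad**2
    variance = sigma_sq / quad
    return bias_sq + variance

# With X'X = I every direction is equally good: bound = ||beta||^2 + sigma^2.
print(max_mse(np.array([1.0, 0.0]), np.eye(2), 4.0, 1.0))  # → 5.0
```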
Theorem 2
(Davis-Stober et al. 2010a). Assume \(\varvec{X}\) is given and, without loss of generality, let \(\Vert \varvec{a}\Vert ^{2} = 1\). Define \(\lambda _{\max }\) as the largest eigenvalue of the matrix \(\varvec{X'X}\), and assume \(\Vert \varvec{\beta }\Vert ^{2} < \infty\). The weight vector \(\varvec{a}\) that is mini–max with respect to all exogenously chosen \(\varvec{a} \in \mathbb {R}^{p}\) is the eigenvector corresponding to \(\lambda _{\max }\). The maximal value of \(MSE_{\hat{\varvec{\beta }}_{\varvec{a}}}\) is bounded by the following inequality:
$$MSE_{\hat{\varvec{\beta }}_{\varvec{a}}} \le \Vert \varvec{\beta }\Vert ^{2} + \frac{\sigma ^{2}}{\lambda _{\max }}.$$
Putting everything together, Theorems 1 and 2 provide necessary and sufficient conditions for finding an optimal choice of improper weighting policy, according to the mini–max criterion, given a predictor cue covariance structure \(\varvec{C}\). By Theorem 2, this optimal weighting policy will be mini–max with respect to all other exogenously chosen weights that could be considered.
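A minimal numerical sketch of this procedure (names are ours): the mini–max weights come from an eigendecomposition of \(\varvec{X'X}\), and evaluating the Theorem 1 bias and variance terms at this eigenvector, for which \(\varvec{X'Xa} = \lambda_{\max}\varvec{a}\), collapses the bound to \(\Vert\varvec{\beta}\Vert^{2} + \sigma^{2}/\lambda_{\max}\).

```python
import numpy as np

def minimax_weights(XtX, beta_norm_sq, sigma_sq):
    """Mini-max improper weights (Theorem 2) and the associated worst-case MSE."""
    eigvals, eigvecs = np.linalg.eigh(XtX)     # eigenvalues in ascending order
    lam_max = eigvals[-1]
    a_star = eigvecs[:, -1]                    # eigenvector for lam_max
    bound = beta_norm_sq + sigma_sq / lam_max  # Theorem 1 bound at this a
    return a_star, bound

XtX = np.array([[2.0, 1.0], [1.0, 2.0]])       # toy cue covariance structure
a_star, bound = minimax_weights(XtX, beta_norm_sq=1.0, sigma_sq=1.0)
print(np.round(np.abs(a_star), 4), round(bound, 4))  # equal weights, bound 1 + 1/3
```

For this positively correlated two-cue structure, the mini–max direction is the equal-weighting vector \((1,1)/\sqrt{2}\), echoing the classic unit-weighting results cited in the text.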
Appendix 2
Proof of Result 1
The proof follows by direct calculation. With \(\varvec{a}_{1} = (2^{p-1}, 2^{p-2}, \ldots , 2, 1)'\) and \(\varvec{Cov}_{\varvec{cascade}} = \varvec{I} + c\,\varvec{a}_{1}\varvec{a}_{1}'\), we have
$$\varvec{Cov}_{\varvec{cascade}}\,\varvec{a}_{1} = (\varvec{I} + c\,\varvec{a}_{1}\varvec{a}_{1}')\,\varvec{a}_{1},$$
which is equal to
$$\varvec{a}_{1} + c\,(\varvec{a}_{1}'\varvec{a}_{1})\,\varvec{a}_{1},$$
which simplifies to
$$\left( 1 + c\sum _{i=0}^{p-1}2^{2i}\right) \varvec{a}_{1}.$$
Thus, \(\varvec{a}_{1}\) is an eigenvector of \(\varvec{Cov}_{\varvec{cascade}}\) with eigenvalue \(\lambda _{max}=1 + c\sum _{i=0}^{p-1}2^{2i}\). It is routine to show that all other eigenvectors of \(\varvec{Cov}_{\varvec{cascade}}\) have eigenvalue 1; thus \(\lambda _{max}\) is maximal for any positive value of \(c\). \(\square\)
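This calculation can be checked numerically, assuming (as in the proof) the rank-one cascade form \(\varvec{Cov}_{\varvec{cascade}} = \varvec{I} + c\,\varvec{a}_{1}\varvec{a}_{1}'\) with \(\varvec{a}_{1} = (2^{p-1}, \ldots , 2, 1)'\):

```python
import numpy as np

p, c = 3, 0.5
a1 = np.array([2.0 ** i for i in range(p - 1, -1, -1)])   # (4, 2, 1)
cov = np.eye(p) + c * np.outer(a1, a1)                    # cascade covariance

lam_max = 1 + c * sum(2 ** (2 * i) for i in range(p))     # 1 + c * (16 + 4 + 1)
print(np.allclose(cov @ a1, lam_max * a1))                # a1 is an eigenvector
eigvals = np.linalg.eigvalsh(cov)                         # ascending order
print(np.allclose(eigvals[:-1], 1.0), np.isclose(eigvals[-1], lam_max))
```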
Dana, J., Davis-Stober, C.P. Rational Foundations of Fast and Frugal Heuristics: The Ecological Rationality of Strategy Selection via Improper Linear Models. Minds & Machines 26, 61–86 (2016). https://doi.org/10.1007/s11023-015-9372-z