Abstract

Weights of evidence (WofE) and logistic regression (LR) are two loglinear methods for mineral potential mapping. Both models are limited in application by their respective basic assumptions. Ideally, WofE indicator patterns have the property of conditional independence (CI) with respect to the point pattern of mineral deposits to be predicted; in LR, there supposedly are no interactions between the point pattern and two or more of the indicator patterns. If the CI assumption is satisfied, estimated LR coefficients become approximately equal to WofE contrasts and the two methods produce similar results; additionally, bias then is avoided in that the sum of all estimated posterior probabilities becomes approximately equal to the number of observed discrete events. WofE allows construction of input layers that have missing data as a separate category in addition to known presence-absence type input, while logistic regression as such is not capable of handling missing data. As an improved WofE model based on LR, modified weights of evidence (MWofE) inherits the advantages of both LR and WofE; i.e., it eliminates bias due to lack of CI and can handle missing data as well. Pixel or unit area input for MWofE consists of positive and negative weights for presence and absence of a pattern plus zeros for missing data. MWofE first is illustrated by application to simple examples. Next, it is applied to a study area with 20 known gold occurrences in southwestern Nova Scotia in relation to four input layers based on geological and lake geochemical data. Assuming that geochemical data were missing for the northern part of the study area, MWofE, like WofE but unlike LR, provides posterior probabilities for the entire area.

1. Introduction

Introductions to basic principles of weights of evidence (WofE) can be found in Agterberg [1], Bonham-Carter et al. [2], and Bonham-Carter [3]. The method has been applied widely in different fields. The reader is referred to Lindsay et al. [4], Chen et al. [5], and Qin and Liu [6] for recent examples of application. In addition, as a powerful tool for dealing with missing data, WofE has been used to develop a spatially weighted logistic regression model for mineral prospectivity mapping [7]. A prerequisite for WofE modeling is that there is approximate conditional independence (CI) between the evidential layers with respect to the target layer [8]. It can be difficult to fully satisfy the CI hypothesis in practice. Various methods have been developed (1) to test for CI and (2) to overcome the effects of lack of CI where it exists. Pairwise tests for CI were proposed by Bonham-Carter et al. [2]. Later, the omnibus test [3], new omnibus test [9, 10], and Kolmogorov-Smirnov test [11] were introduced for testing the CI hypothesis. In general, lack of CI can be reduced by combining interdependent explanatory variables with one another, or the conditional dependence (CD) problem can be circumvented by using a different statistical model. Agterberg [12] introduced the use of logistic regression (LR); Journel [13] proposed the Tau model which was refined by Krishnan et al. [14], Caumon et al. [15], and Polyakova and Journel [16]. These methods have been discussed in the comprehensive review paper by Schaeben [17]. Additionally, several weighted WofE and stepwise WofE models have been developed by Zhang et al. [18], Deng [19], Agterberg [20, 21], and Cheng [22–24]. It should be kept in mind that the purpose of WofE and LR is prediction of probable locations of point events that have not been observed to exist. In general, the true total number of events in a study area remains unknown. 
If unknown events occur in the same geological environments as known events, only the prior probability in WofE is affected by this lack of information. In general, WofE predictive maps are only valid in a relative sense, because true event occurrence patterns are either overestimated or underestimated. They are used to delineate prospective target areas with no or few known events within a study area. This approach of using data from the entire study area differs from the one in which the well-explored part of a study area is taken as training area, for prediction in a target area (cf. [25]).

Weights of evidence and logistic regression both belong to the family of loglinear models (e.g., [26–31]). They are based on different assumptions: (1) ideally, WofE indicator patterns are conditionally independent with respect to the point pattern, and (2) in LR there are no interactions between the point pattern and two or more of the indicator patterns [12]. Compared to WofE, LR can be applied to avoid or weaken the effect of CD although it can remain affected by interactions between groups of three or more variables; LR necessarily results in a total number of estimated events (= sum of posterior probabilities for all predicted event occurrences) that is exactly equal to the number of known events [21, 32]. Because of these advantages, Agterberg [20, 21] and Deng [19] independently proposed to use LR to improve upon WofE results with lack of CI. Deng’s model does not result in unbiased estimates but it significantly reduces bias; in an example used for illustration by Agterberg [21], total bias in comparison with WofE results was reduced from 92% to 31% of the estimated total number of events when Deng’s method was used. Later, the theoretical error in Deng’s original assumption of unbiasedness was pointed out and described in detail by Schaeben and van den Boogaart [33]. Deng’s model diminishes lack of CI bias in WofE results but better results generally can be obtained by other methods such as boosting [22–24] or by modified weights of evidence (MWofE) in which WofE and LR are combined with one another [21, 34]. MWofE was first proposed by Agterberg [21] to overcome the lack of CI in WofE modeling. It uses positive and negative weights instead of 1 and 0 to estimate the coefficients in LR and then uses these coefficients as correction factors for the weights of evidential layers in WofE. Therefore, it belongs to the weighted WofE models [34]. 
When the explanatory variables are of presence-absence type and quantified from maps, weighted logistic regression (WLR) generally is to be preferred over logistic regression because numerous “observations” with equal values for the explanatory variables must be combined with one another to form “unique conditions,” which are weighted according to their map areas [12]. The term “unique condition” describes the set of unit areas or pixels with identical values for all explanatory variables. The map area for a unique condition either can be measured directly or approximated by counting pixels with the same values for all map patterns. The advantage of WLR is that the number of unique conditions generally is much smaller than the number of pixels or unit areas. For example, in a square study area the number of pixels can be as large as several millions, which is much too large for practical applications. The number of unique conditions, on the other hand, is determined by the number of different input patterns and generally is far smaller than the number of pixels. This facilitates estimation of logistic regression coefficients in applications using an iterative process that involves successive matrix inversions, because the matrices involved are sized by the number of unique conditions instead of the number of pixels, assuming that all variables have been corrected for their arithmetic means before matrix inversions are applied. For example, for 7 binary map patterns digitized for a square array with 1000 × 1000 pixels, the (256 × 256) WLR input matrix for the explanatory variables has 65·10⁶ fewer elements than the corresponding LR input matrix with 10¹² elements. Some software packages (e.g., recent versions of IBM SPSS) allow weighting of observations in a manner that is equivalent to WLR. However, in applications to very large data sets, it can be advantageous to perform a final convergence check by comparing the sum of all posterior probabilities with the number of known events. These two quantities should be equal to one another. If there is a significant discrepancy, iterations should be continued until full convergence is reached [35]. WLR should give the same results as LR apart from rounding error.
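
The relation between LR on pixels and WLR on unique conditions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function `fit_logistic`, the three binary patterns, and all areas and event counts are hypothetical, and a plain Newton-Raphson scoring loop stands in for whatever software is actually used.

```python
import numpy as np

def fit_logistic(X, y, w=None, n_iter=50, tol=1e-10):
    """Maximum likelihood logistic regression by Newton-Raphson scoring.

    X: (n, p) explanatory variables; y: (n,) fraction of events per row;
    w: (n,) row weights, e.g. the number of pixels per unique condition.
    Returns the coefficient vector with the intercept first.
    """
    X = np.column_stack([np.ones(len(X)), X])
    y = np.asarray(y, dtype=float)
    w = np.ones(len(y)) if w is None else np.asarray(w, dtype=float)
    rate = np.sum(w * y) / np.sum(w)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(rate / (1.0 - rate))      # start at the prior logit
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical unique-condition table for three binary map patterns.
conditions = np.array(
    [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)], dtype=float
)
areas = np.array([4000, 1500, 1200, 600, 900, 400, 300, 100])   # pixels per condition
events = np.array([4, 5, 4, 6, 3, 4, 3, 3])                     # events per condition

# Weighted fit: one row per unique condition (8 rows).
beta_uc = fit_logistic(conditions, events / areas, w=areas)

# Pixel-level fit: one row per unit area (9000 rows).
X_pix = np.repeat(conditions, areas, axis=0)
y_pix = np.concatenate(
    [np.r_[np.ones(d), np.zeros(n - d)] for n, d in zip(areas, events)]
)
beta_pix = fit_logistic(X_pix, y_pix)

# Convergence check: posterior probabilities should sum to the number of events.
p_uc = 1.0 / (1.0 + np.exp(-(beta_uc[0] + conditions @ beta_uc[1:])))
expected_events = float(areas @ p_uc)
```

In this sketch `beta_uc` and `beta_pix` agree up to rounding error, and `expected_events` equals the total number of events once the iterations have converged, which is the final check described above.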

WofE and LR produce the same coefficients when a point pattern is related to a single map layer [36, 37] but standard deviations of these coefficients as obtained by the two methods are different. When there are relatively few point events, LR is likely to produce smaller variances of the coefficients than WofE contrasts provided that the map patterns are nearly conditionally independent. This is because WofE variances of weights and contrasts are based on an assumption of asymptotic normality of maximum likelihood estimators [38], a condition that is unlikely to be satisfied when the number of observed point events is relatively small. It is noted here that the WofE contrast measures the strength of spatial correlation between a point pattern and an indicator pattern. If a WofE contrast is standardized to the [−1, 1] interval, it becomes Yule’s original coefficient of association for binary variables [38].
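
As a small illustration of the last point, the following Python sketch computes the weights and contrast for one binary pattern from a 2 × 2 table of unit-area counts and converts the contrast to Yule’s coefficient of association. The function names and the counts are hypothetical; the formulas are the standard definitions of the positive weight, the negative weight, and the contrast as the logarithm of the odds ratio.

```python
import math

def wofe_weights(n_bd, n_b, n_d, n_total):
    """Return (W+, W-, contrast) for one binary pattern B and point events D.

    n_bd: unit areas with the pattern present and an event
    n_b:  unit areas with the pattern present
    n_d:  unit areas with an event
    n_total: all unit areas in the study area
    """
    a = n_bd                  # pattern present, event
    b = n_b - n_bd            # pattern present, no event
    c = n_d - n_bd            # pattern absent, event
    d = n_total - n_b - c     # pattern absent, no event
    w_plus = math.log((a / (a + c)) / (b / (b + d)))
    w_minus = math.log((c / (a + c)) / (d / (b + d)))
    return w_plus, w_minus, w_plus - w_minus

def yule_q(contrast):
    """Yule's Q = (OR - 1) / (OR + 1) with OR = exp(contrast), i.e. tanh(contrast / 2)."""
    return math.tanh(contrast / 2.0)

# Hypothetical counts: 1000 unit areas, 20 events, pattern present in 200
# unit areas of which 15 contain an event.
w_plus, w_minus, contrast = wofe_weights(15, 200, 20, 1000)
q = yule_q(contrast)
```

Because the hyperbolic tangent maps the real line onto the open interval (−1, 1), a contrast of zero (no association) gives Q = 0, and large positive or negative contrasts approach +1 or −1.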

In general, differences in results obtained by two different loglinear models can be tested for statistical significance (cf. [12]). The test statistic for the difference in results obtained by WofE and LR is approximately distributed as chi-square, with a number of degrees of freedom that depends on the number of map layers. When the number of map layers is large, the number of degrees of freedom rapidly becomes very large, and other strategies for model comparison may be adopted. For example, Agterberg et al. [11] related a gold deposit pattern in Nova Scotia to seven map patterns that were approximately conditionally independent of the point pattern. In this situation, the resulting logistic regression coefficients were close to the WofE contrasts ([11], Table 1).

Logistic regression prevents lack of CI bias in situations where WofE does not. However, if the CI condition is not satisfied, individual regression coefficients normally acquire large variances because of near singularities in the matrices to be inverted when the iterative scoring (or any other) method is used for LR. Contrary to WofE weights, LR coefficients then cannot be used individually for interpretation or prediction purposes. This in itself can be a good reason to prefer usage of WofE. There are many different strategies for obtaining approximate CI. For example, patterns based on geochemical elements measured across a whole study area generally have strong mutual interactions with respect to occurrences of mineral deposits. Approximate CI then usually can be obtained by combining all elements into a single index either by multiple regression analysis (e.g., [39]) or by using scores for the first principal component of the correlation matrix (e.g., [40]; also see Section 5 in this paper). Even if CD exists, the new AdaBoost WofE method still can give good approximations provided that the number of patterns is not very large [24].
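
The effect of lack of CI on WofE posterior probabilities can be made concrete with an extreme, purely hypothetical case: the same binary layer is entered twice, so the two "layers" are completely dependent. The counts below are invented for illustration, and the helper names are ours.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical counts: 1000 unit areas and 20 events; the pattern is present
# in 200 unit areas, 15 of which contain an event.
n_total, n_d, n_b, n_bd = 1000, 20, 200, 15

w_plus = math.log((n_bd / n_d) / ((n_b - n_bd) / (n_total - n_d)))
w_minus = math.log(
    ((n_d - n_bd) / n_d) / ((n_total - n_b - (n_d - n_bd)) / (n_total - n_d))
)
prior_logit = logit(n_d / n_total)

def expected_events(n_copies):
    """Sum of WofE posterior probabilities when the layer is entered n_copies times."""
    p_present = expit(prior_logit + n_copies * w_plus)
    p_absent = expit(prior_logit + n_copies * w_minus)
    return n_b * p_present + (n_total - n_b) * p_absent

single = expected_events(1)    # one layer: WofE reproduces the 20 known events
doubled = expected_events(2)   # duplicated layer: CI is violated
```

With a single layer the posterior probabilities sum to the 20 known events. With the duplicated layer the sum rises to about 50, which is the kind of overestimation that LR avoids because its posterior probabilities sum to the number of known events.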

In the following section, it will be discussed that MWofE is equivalent to LR. A major advantage of MWofE is that, like WofE, it can handle missing data better than LR. This subject will be discussed in the remainder of this paper.

2. Why Modified Weights of Evidence Is Equivalent to General Logistic Regression

Equivalence of LR and MWofE will be reviewed and illustrated by two examples of application. In Agterberg’s study [20], it was pointed out that MWofE is based on a method originally introduced into the GLADYS medical expert system [41]. Weights of evidence in GLADYS usually resulted in biased posterior probabilities. For this reason, Spiegelhalter and Knill-Jones used logistic regression of their presence-absence data with explanatory variables for which the presence/absence data were replaced by positive and negative weights in order to eliminate this bias.
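
One way to see why replacing presence-absence values by weights leaves the fitted probabilities unchanged is the following short derivation. It assumes complete binary data and nonzero contrasts; the notation is introduced here for illustration and is not taken from the cited papers.

```latex
% Notation (assumed for this sketch): x_j \in \{0,1\} is the presence-absence
% value of layer j, W_j^+ and W_j^- are its WofE weights, and
% C_j = W_j^+ - W_j^- is its contrast.
\begin{align}
w_j &= W_j^- + C_j\,x_j, \\
\operatorname{logit} P(D \mid x_1,\dots,x_p)
  &= \beta_0 + \sum_{j=1}^{p} \beta_j\,w_j
   = \Bigl(\beta_0 + \sum_{j=1}^{p} \beta_j W_j^-\Bigr)
     + \sum_{j=1}^{p} \bigl(\beta_j C_j\bigr)\,x_j .
\end{align}
```

The right-hand side is an ordinary LR model in the presence-absence variables with intercept \(b_0 = \beta_0 + \sum_j \beta_j W_j^-\) and coefficients \(b_j = \beta_j C_j\). Because this mapping is one-to-one when every \(C_j \neq 0\), maximum likelihood estimation yields the same posterior probabilities in both parameterizations. Setting \(\beta_j = 1\) for all layers and \(\beta_0\) equal to the prior logit recovers WofE itself, so the estimated \(\beta_j\) can be read as correction factors for the weights.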

Schaeben and van den Boogaart [33] illustrated the fact that Deng’s [19] adjusted WofE model does not produce unbiased results by using a “fabricated” data set in which there are three explanatory variables and a binary dependent variable. Table 2 shows the values for these variables in the first four columns. Probabilities obtained by WofE are shown in the next column. They differ from the probabilities in the last column, which were obtained by LR or MWofE; these two methods produce identical results. In Section 4, it will be illustrated that similar results are obtained for a “practical example with fabricated training data set results” used by Schaeben [42].

Equivalence of LR and MWofE already had been illustrated in Agterberg [20, 21] for a relatively small rectangular study area of about 4.0 km² on the East Pacific Rise with 13 volcanic vents that were related to depth below sea level, fissures, relative age, and composition of volcanic rocks. Input for this seafloor example is shown in the first five columns of Table 3. The rows in this table are for the 2⁵ = 32 unique conditions of which the total areas (in units of 10 m × 10 m) are given in the sixth column. The dependent variable in the seafloor example of Table 3 is the presence or absence of volcanic vents. The number of volcanic vents per unique condition is listed in a separate column. In total, there were 39,851 unit areas in this example, and WLR is to be preferred over LR for the reasons given in Section 1 when the scoring method is used to estimate the logistic regression coefficients. As in Table 2, the last two columns of Table 3 show WofE and MWofE probabilities for (10 m × 10 m) unit areas with the same unique conditions. The latter are equal to posterior probabilities obtained by WLR. In this example, there is strong conditional dependence of the five map layers [11]. This can be seen from the fact that the sum of all 39,851 WofE posterior probabilities (24.8) in Table 3 significantly exceeds the corresponding sum of MWofE probabilities (13.0). Lack of CI can be assessed by estimating the variance of the sum of all WofE posterior probabilities followed by application of a simple statistical test to determine whether or not the difference between the sum of all posterior probabilities and total number of known events is statistically significant [9].

3. Lack of Constancy of Statistical Parameters in Large Areas

One of the drawbacks of LR is that estimated regression coefficients may have large variances unless there is approximate CI. There are several other potentially serious disadvantages as well, especially if the method is applied naively to continuous explanatory variables for prediction purposes. Agterberg and Bonham-Carter [43] have systematically compared WofE and WLR predictions with one another in a number of experiments on the relationship between gold occurrences with geological, geochemical, and geophysical map data in the Gowganda area on the Canadian Shield in east-central Ontario. The results of these experiments clearly showed that discretization (reducing map patterns to binary or ternary form, before using them as explanatory variables) and integration of training and testing areas in mineral potential evaluation studies result in better predictions when logistic regression is used. The reason for this is that a primary requirement for validity of application of a mathematical model in nature is constancy of statistical parameters within the entire study area. Extrapolation from a training area to a testing area that is geographically distinct from the training area is possible but special arrangements have to be made to ensure approximate equality of statistical parameters for the explanatory variables (cf. [25]).

Because of the widespread uniqueness of geological phenomena that can rapidly change within the same study area, a single target population with constant parameters such as means, variances, and covariances often does not exist. The parameters of interest can change systematically from place to place but methods are available to reduce or eliminate the effects of systematic changes in the values of the parameters of interest. Some geological properties such as lithological composition or anomalies versus background are intrinsically binary. Discretization by Heaviside (0 or 1) transformation of nonbinary geochemical or geophysical variables can stabilize statistical parameters within a region. Another example of stabilization of statistical parameters is to replace observed data by residuals after regional trend elimination.
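
The two stabilization devices mentioned above, discretization and trend removal, can be combined in a short Python sketch. The profile of concentration values, the linear trend, and the thresholds are all hypothetical and chosen only to show the mechanics; in practice the threshold would be selected from the data (for example, by maximizing the contrast).

```python
import numpy as np

def heaviside_binary(values, threshold):
    """Map a continuous variable to 0/1: 1 where value >= threshold, else 0."""
    return (np.asarray(values, dtype=float) >= threshold).astype(int)

# Hypothetical concentrations along a profile: a linear regional trend plus
# two local anomalies at positions 2 and 6.
x = np.arange(10, dtype=float)
anomaly = np.array([0, 0, 3, 0, 0, 0, 4, 0, 0, 0], dtype=float)
conc = 2.0 + 0.5 * x + anomaly

# Residuals after removing a least-squares linear trend.
slope, intercept = np.polyfit(x, conc, 1)
residual = conc - (slope * x + intercept)

binary_raw = heaviside_binary(conc, np.median(conc))   # thresholding raw values
binary_residual = heaviside_binary(residual, 1.0)      # thresholding residuals
```

Thresholding the raw values flags the high end of the regional trend together with the anomalies, whereas thresholding the residuals isolates the two local anomalies only.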

4. Map Patterns with Missing Data

Weights of evidence as well as logistic regression were originally developed by Good [44], Spiegelhalter and Knill-Jones [41], and many others for independent identically distributed observations. Mosaic-type map data that are quantities for vanishingly small unit areas or pixels constitute observations of a very different type. Nevertheless, WofE and WLR, including Spiegelhalter’s [45] refinement in GLADYS to handle missing data can be applied successfully to mosaic data. Geoscience patterns often have gaps for different reasons: geochemical lake or stream sediment data may be available for parts of a study area only (obviously, they are missing in subareas where there are no lakes or rivers). Biogeochemical data are restricted to areas where specific plants or trees occur. It may be that only a limited part of an area was mapped to obtain a particular kind of data; e.g., a geochemical survey may have been restricted to parts of an area. Also, exposure of bedrock can be uneven. Locally, it may not be known whether or not a rock type is present. In situations of this type, the binary presence-absence pattern of one or more explanatory variables has gaps that can be quantified separately so that the binary pattern becomes a ternary pattern with separate states for presence, absence, and unknown. WofE and MWofE can cope with missing or unknown data and will first be illustrated in the following simple example.

Tables 1 and 4 show results for an artificial example with a square study area ([42], Figure 1) that contains (10 × 10) regularly spaced data points located at the centers of square cells for which presence or absence is assumed to be known for two rock types and for the point event of interest. These data are shown in Table 4 together with probabilities of occurrence estimated by WofE and MWofE. As in Tables 2 and 3, MWofE results are identical to those obtained by LR. These probabilities are the same as those previously reported by Schaeben [42].

As a variant on this artificial example, we assumed that information for one of the two rock types was incomplete in that it was not available for the 30% of the study area on the left side of Schaeben’s Figure 1. This means that the binary pattern for this rock type becomes a ternary pattern with three states (yes, no, and missing). Both WofE and modified WofE were applied to this new artificial data set with the results shown in Table 1. In this application, Spiegelhalter’s [45] method for dealing with missing data was applied twice: initially to obtain the WofE weights and again when MWofE was applied.

The example of Table 1 illustrates how missing data problems can be solved by recognizing “missing” as a separate state that requires special consideration. Other possible methods to apply LR when there are missing data for a predictor variable were discussed in detail by Agterberg and Bonham-Carter [46]. In their paper, four methods are applied to the previously mentioned example of gold deposits in Nova Scotia: (1) the variable with missing data is deleted; (2) the missing data are replaced by zeros for absence; (3) all polygons or subareas with missing data are deleted; and (4) the missing data are replaced by a mean value for the same variable estimated in the part of the study area where it is known. It was concluded that patterns of posterior probabilities resulting from methods (1) and (3) were relatively poor in comparison with patterns resulting from methods (2) and (4). A motivation for the development of MWofE was to provide an even better method that can remedy missing data problems in the same way as WofE [21].
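
For readers who want to reproduce the four LR workarounds listed above on their own data, the following Python sketch shows how each one changes the input table. The small pixel table is hypothetical; missing values are marked by NaN in the second predictor.

```python
import numpy as np

# Hypothetical pixel table with two binary predictors; NaN marks missing
# values in the second predictor.
X = np.array(
    [[1, 1], [0, 0], [1, np.nan], [0, 1], [1, np.nan], [0, 0]], dtype=float
)
missing = np.isnan(X[:, 1])

# (1) Delete the variable that has missing data.
X_drop_variable = X[:, [0]]

# (2) Replace missing data by zeros (absence).
X_zero_fill = np.where(np.isnan(X), 0.0, X)

# (3) Delete all unit areas with missing data.
X_drop_rows = X[~missing]

# (4) Replace missing data by the mean of the variable where it is known.
X_mean_fill = X.copy()
X_mean_fill[missing, 1] = np.nanmean(X[:, 1])
```

Methods (1) and (3) discard information, which is consistent with the relatively poor patterns reported for them, while methods (2) and (4) keep every unit area at the cost of imputing a value.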

5. Case History Study of Gold Deposits in Southwestern Nova Scotia, Canada

This larger-scale example builds on a study of weights-of-evidence modeling originally published by Cheng [8]. Geology, occurrences of gold deposits, and locations of lake sediment samples are given in Figure 1 (after [8], Figure 1). The geochemical lake sediment data consisted of 671 samples with concentration values for 16 elements (Cu, Pb, Zn, Ag, F, Li, Nb, Rb, Sn, Zr, Ti, Au, Sb, As, Th, and W) originally published by Rogers et al. (1987). Sampling density was about 1 sample per 5 km². Results of principal component analysis and application of high-pass and low-pass filtering to principal component scores are described in detail by Cheng [8].

Originally, four binary maps (cf. [8], Figure 2; Table 2) were used as input for WofE. The same input data were used by us for the current study, although there were slight differences in choice of binary maps and resolution. Here, we used: (A) proximity to traces of anticlines; (B) proximity to contact between Goldenville and Halifax Formations; (C) high-pass filter applied to scores of first principal component; and (D) low-pass filter applied to scores of first principal component. Figure 3 shows WofE results in comparison with LR results for the same four input layers. These results are for experiment 1 in a series of five experiments; the other four experiments will be described later. Our Figure 3(a) differs slightly (but not significantly) from the earlier WofE map of Cheng [8]. The two maps in Figure 3 for the 20 gold deposits are similar in appearance. However, from the legends, it can be seen that estimated posterior probabilities of gold deposits are much larger in Figure 3(a) than in Figure 3(b). On average, the WofE probabilities are more than twice as large as the LR probabilities. The sums of all posterior probabilities were 46.9 (WofE) and 20 (LR), respectively. LR results are unbiased in the sense that this sum is exactly equal to the number of gold deposits used. The WofE results are systematically too large due to violations of the conditional independence (CI) assumption. It is noted that, even if the WofE results were corrected for bias due to lack of CI, both totals would remain too low because of undiscovered gold deposits. Such bias is likely because intensity of exploration varied greatly across the area. Also, undiscovered gold deposits may not only occur near the surface of bedrock in relatively unexplored parts of the area, but probably everywhere in favorable environments at greater depths.

Student’s t-test (cf. [38]) can be used to measure the strength of spatial correlation between known deposits and prognostic contours as shown in Figure 4. Every t-value in Figure 4 is for a comparison of the contrast in two subareas as follows. One of these subareas consists of all points within a given distance of one or more gold deposits. Its area is shown along the horizontal axis in Figure 4. The other subarea is simply the complement representing the remainder of the study area. Due to spatial correlation effects, the number of degrees of freedom for t-tests of this type is not known exactly. However, for a one-tailed t-test with significance level set at 0.05, values greater than approximately 1.645 would indicate positive spatial correlation between the point pattern for deposits and the patterns of posterior probabilities for WofE or LR. Obviously, there is a relatively strong spatial correlation between the pattern of gold deposits and either one of the posterior probability maps shown in Figure 3.
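
The exact computation behind Figure 4 is not spelled out in this section. As a hedged sketch, the Python code below assumes the usual WofE studentized contrast, that is, the contrast divided by its asymptotic standard deviation, and compares it with the large-sample one-tailed critical value at the 0.05 level. The function name and the 2 × 2 counts are hypothetical.

```python
import math
from statistics import NormalDist

def studentized_contrast(a, b, c, d):
    """Contrast C = ln(ad / bc) divided by its asymptotic standard deviation.

    a, b: unit areas inside the subarea with / without an event
    c, d: unit areas outside the subarea with / without an event
    """
    contrast = math.log((a * d) / (b * c))
    s_contrast = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return contrast / s_contrast

# One-tailed critical value at the 0.05 level in the large-sample (normal) limit.
critical = NormalDist().inv_cdf(0.95)

# Hypothetical counts: 15 of 200 unit areas inside the subarea contain an
# event, against 5 of 800 unit areas outside it.
t_value = studentized_contrast(15, 185, 5, 795)
significant = t_value > critical
```

The critical value evaluates to approximately 1.645, the threshold quoted above; with few events the asymptotic standard deviation should be treated with caution.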

The results shown in Figure 3 are for experiment 1 from a series of five experiments conducted to compare WofE, LR, and modified weights-of-evidence (MWofE) for the 20 gold deposits in this study area. In the other four experiments, the study area was divided into two parts according to the east-west line in Figure 5 showing gold deposits and pairs of binary layers as were used in the four experiments. Experiments 2 to 4 used only deposits in the southern part of the area for prediction in the northern part, and all deposits were used in experiment 5.

Results for experiments 2 and 3 are shown in Figures 2 and 6. The difference between these two experiments is that the four binary layers were used as input from the entire study area for Figure 6 and from a single subarea only for Figure 2. Both WofE and LR could be used for experiments 2 and 3. LR could not be used in experiments 4 and 5 because geochemical information in the northern half of the study area was assumed to be missing. Contrary to LR, MWofE could be used in experiments 4 and 5 because, like WofE, MWofE can cope with missing data. The geochemical input layers in these two experiments were assumed to be ternary instead of binary. The difference in modeling then is that WofE immediately produces a result for the two binary and the two ternary input layers, whereas for MWofE the ones and zeros used in the WofE input layers are replaced by weights for all four independent variables using as much information as was available for each of them. As in WofE, zeros were used as evidential weights in MWofE for the northern subarea, where the two geochemical input layers were assumed to be missing. The difference between experiments 4 and 5 is that both subareas were used as training area in experiment 5 whereas only deposits in the southern subarea were used in experiment 4. Results for these two experiments are shown in Figures 7 and 8, respectively.
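
The MWofE procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used for Figures 7 and 8: the unique-condition table is hypothetical, WofE weights are estimated from the unit areas where a layer is known, the whole table is used as training area, and the helper names (`wofe_weights`, `weight_code`, `fit_logistic`) are ours.

```python
import numpy as np

def wofe_weights(x, area, events):
    """W+ and W- for a binary pattern x, ignoring unit areas where x is NaN."""
    known = ~np.isnan(x)
    x, area, events = x[known], area[known], events[known]
    d_total, nd_total = events.sum(), (area - events).sum()
    d_in = events[x == 1].sum()
    nd_in = (area - events)[x == 1].sum()
    w_plus = np.log((d_in / d_total) / (nd_in / nd_total))
    w_minus = np.log(((d_total - d_in) / d_total) / ((nd_total - nd_in) / nd_total))
    return w_plus, w_minus

def weight_code(x, w_plus, w_minus):
    """Replace 1 by W+, 0 by W-, and missing (NaN) by 0."""
    return np.where(np.isnan(x), 0.0, np.where(x == 1, w_plus, w_minus))

def fit_logistic(X, y, w, n_iter=50, tol=1e-10):
    """Weighted logistic regression by Newton-Raphson; intercept first."""
    X = np.column_stack([np.ones(len(X)), X])
    rate = np.sum(w * y) / np.sum(w)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(rate / (1.0 - rate))
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        step = np.linalg.solve((X * (w * p * (1 - p))[:, None]).T @ X,
                               X.T @ (w * (y - p)))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical unique conditions: layer A is complete, layer G (geochemical)
# is missing (NaN) in part of the area.
A = np.array([1, 1, 0, 0, 1, 0], dtype=float)
G = np.array([1, 0, 1, 0, np.nan, np.nan])
area = np.array([300, 500, 400, 1800, 400, 1600], dtype=float)
events = np.array([9, 4, 3, 2, 5, 2], dtype=float)

# Weight-coded input: W+ / W- where known, 0 where missing.
Z = np.column_stack([weight_code(A, *wofe_weights(A, area, events)),
                     weight_code(G, *wofe_weights(G, area, events))])

# LR on the weights; the slopes act as correction factors for the WofE weights.
beta = fit_logistic(Z, events / area, area)
posterior = 1.0 / (1.0 + np.exp(-(beta[0] + Z @ beta[1:])))
expected_events = float(area @ posterior)
```

Every unique condition, including those with missing geochemical data, receives a posterior probability, and `expected_events` matches the number of events in the training area because the model contains an intercept.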

Table 1 shows the expected number of deposits (EDN) for all models in the five experiments. If there are no missing data, the sum of posterior probabilities for LR is exactly equal to the number of deposits, whereas the WofE sum is larger due to violation of the conditional independence assumption. On the other hand, for MWofE (experiments 4 and 5), the expected number of deposits in the training area is equal to the number of deposits in that area that were assumed to be known.

Tables 5 and 6 show the weights and contrasts with standard deviations for WofE results as shown in Figures 7(a) and 8(a) followed by MWofE regression coefficients with their standard deviations. As in experiments 1–3, the WofE weights in Tables 5 and 6 are too large. The corresponding contrasts with standard deviations also are too large because of the lack of conditional independence. Moreover, the standard deviations of the contrasts were derived by using the standard asymptotic formula which is not necessarily satisfied. The MWofE coefficients as well as their standard deviations are probably more realistic.

6. Concluding Remarks

Equivalence of the modified WofE and logistic regression was discussed in Section 2. However, the question can be asked whether or not there are advantages in using modified WofE given that this method and logistic regression produced the same results for the three examples (Tables 2–4). There are three potential advantages:

(1) Most existing LR software does not offer satisfactory remedies for missing data, because it requires input values for the explanatory variables at all data points and does not allow for data gaps. The problem is that if data are missing for one or more patterns at a given point, other data may have to be deleted as well before LR can be applied. This is because the unit areas or pixels used for quantification of map patterns are exceedingly small. They are subject to strong spatial correlation simply because they can be made arbitrarily small so that neighboring pixels for any pixel probably belong to the same unique condition. An example of missing data with predictions for the entire study area without deletion of data blocks was given in Figures 7 and 8 and Tables 1, 5–7.

(2) Variances of logistic regression coefficients and posterior probabilities are likely to become more precise than variances of WofE weights, contrasts, and posterior probabilities, because the latter are based on an assumption of normality for maximum likelihood estimators that can provide poor results, especially if there are relatively few discrete events for the dependent variable. However, a requirement for this advantage would be that the input map patterns are approximately conditionally independent with respect to the discrete event pattern. Otherwise, the variances of individual logistic regression coefficients can become very large, although the corresponding variances of posterior probabilities would not suffer from this drawback. This potential advantage as well as the previous one will have to be tested further in other experiments and applications.

(3) WofE is easy to understand and MWofE retains this feature. Although the coefficients in LR obtained through maximum likelihood estimation are used in MWofE to adjust the positive and negative weights for all evidential layers [34], MWofE largely maintains the framework of WofE, which conforms to geologists’ intuitive understanding in mineral prospectivity mapping.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study benefited from a joint financial support by the Programs of the National Natural Science Foundation of China (nos. 41602336 and 71503200), Postdoctoral Science Foundation of China (nos. 2016M592840 and 2017T100773), Natural Science Foundation of Shaanxi Province (no. 2017JQ7010), and the Fundamental Research Funds for the Central Universities (no. 2017RWYB08). The first author thanks former supervisor Dr. Qiuming Cheng for the discussions about spatial weights and for providing constructive suggestions.