
Validation of a Bayesian belief network representation for posterior probability calculations on National Crime Victimization Survey


Abstract

This paper presents an effort to induce a Bayesian belief network (BBN) from crime data, namely the National Crime Victimization Survey (NCVS). This BBN defines a joint probability distribution over the set of variables employed to record a collection of crime incidents, with particular focus on characteristics of the victim. The goals are to generate a BBN that captures how characteristics of crime incidents relate to one another, and to make this information available to domain specialists. The novelty of the study reported in this paper lies in the use of a Bayesian network to represent a complex data set to non-experts in a way that facilitates automated analysis. Validation of the BBN's ability to approximate the joint probability distribution over the set of variables in the NCVS data set is accomplished through a variety of sources, including mathematical techniques and human experts, for appropriate triangulation. Validation results indicate that the BBN induced from the NCVS data set is a good joint probability model for the set of attributes in the domain and can accordingly serve as an effective query tool.


Notes

  1. The incident-based file includes the following property crimes: attempted/completed purse snatching and pocket picking; burglary; attempted forcible entry; and attempted/completed theft.

  2. For proper processing by WEKA, each discrete value was converted into a nominal value by introducing an “x” as a prefix to the represented value.

  3. The Explorer GUI is appropriate for users who are not familiar with the command-line interface; however, it becomes quite costly in terms of memory usage on the computing platform.

  4. The class attribute "Victimization" is variable V4529, which represents a list of 16 personal (violent) and property crimes as defined by the NCVS. V4529 was selected as the class attribute because of the general interest in the end crime committed and the inherent crime-classification nature of the variable itself.

  5. The complexity of the model should be kept as low as possible. However, since a BBN by definition approximates a joint probability distribution, the quality of the approximation should be good; this is achieved by a slightly more complex model through a greater value for the "number of parents" parameter.

  6. Building a single Bayes net and computing inferences for all types of queries through this one structure may not be optimal. It might be beneficial to learn a query-dependent Bayes net structure on the fly using a fast structure-learning method, assuming that the high computational cost can be managed effectively.

  7. The selection of the V4529 attribute as the class attribute was driven by the authors' and experts' interest in classifying the data based upon the "end crime" committed. It is understood that other class attributes might yield interesting models and final results; however, with over 200 attributes and the extensive processing time for training and building, a single fixed class attribute provided a constant basis for comparison in developing the final model.

  8. NCVS MSA values for V4529 define x60-x71 as ‘Personal Crimes’. All but x71 are defined specifically as ‘Violent Crimes’. For the purpose of distinguishing from ‘Property Crimes’, values x60-x71 will be classified as ‘Violent Crimes’.

  9. “The public at large, including experts in all walks of life, are known to be very bad at even simple probabilistic reasoning. Hence, we feel strongly that the public at large (and crucially, this includes influential judges) will never really understand complex Bayesian arguments presented from first principles. But then the public at large would not be expected to understand the relevance of forensic evidence if it was presented from first principles either. Hence, we feel that the key to a broader acceptance of Bayesian reasoning is to be able to show all of the implications and results of a complex Bayesian argument without requiring any understanding of the underlying theory or mathematics, and without having to perform any of the calculations.” (Fenton and Neil 2000).

  10. (See http://www.spss.com/press/template_view.cfm?PR_ID=743).

  11. V2116 HOUSEHOLD WEIGHT, V2117 PSEUDOSTRATUM CODE, V2118 SECUCODE: HALF SAMPLE CODE.

  12. V3025 MONTH INTERVIEW COMPLETED, V3026 DAY INTERVIEW COMPLETED, V3027 YEAR INTERVIEW COMPLETED, V3080 PERSON WEIGHT, V3010 PERSON LINE NUMBER.

References

  • Bray T, Paoli J, Sperberg-McQueen CM, Maler E, Yergeau F (2006) Extensible markup language (XML) 1.0 (4th edn): origin and goals. World Wide Web Consortium, September 2006. Retrieved October 29

  • Bouckaert R (2005) Bayesian network classifiers in WEKA. Technical Report, Department of Computer Science, Waikato University, Hamilton, NZ

  • de Campos LM, Fernández-Luna JM, Puerta JM (2003) An iterated local search algorithm for learning Bayesian networks with restarts based on conditional independence tests. Int J Intell Syst 18:221–235


  • Catalano SM (2004) Crime victimization 2004. September 2005, NCJ 210674. http://www.ojp.usdoj.gov/bjs/cvict_v.htm#page

  • Catalano SM (2005) Crime victimization 2005. September 2006, NCJ 214644. http://www.ojp.usdoj.gov/bjs/abstract/cv05.htm (See also http://www.paralumun.com/issuesrapestats.htm)

  • Chickering DM, Geiger D, Heckerman D (1995) Learning Bayesian networks: search methods and experimental results. In: Preliminary papers of the fifth international workshop on artificial intelligence and statistics, pp 112–128

  • Cooper GF, Herskovits E (1992) A Bayesian method for the induction of probabilistic networks from data. Mach Learn 9:309–348


  • Cozman FG (2007) JavaBayes—Bayesian networks in Java. Online access at http://www.cs.cmu.edu/~javabayes/index.html. Cited May 2007

  • Fenton N, Neil M (2000) The “Jury Observation Fallacy” and the use of Bayesian networks to present probabilistic legal arguments. Mathematics Today (Bulletin of the Institute of Mathematics and its Applications)

  • Hart TC, Rennison C (2003) Reporting crime to the police, 1992–2000. Bureau of Justice Statistics Special Report, March 2003, NCJ 195710. http://www.ojp.usdoj.gov/bjs/abstract/rcp00.htm

  • Heckerman D (1997) A tutorial on learning with Bayesian networks. In: Jordan M (ed) Learning in graphical models, MIT Press, Cambridge, MA, 1999. Also appears as Technical Report MSR-TR-95-06, Microsoft Research, March 1995. An earlier version appears as Bayesian networks for data mining, Data Min Knowl Disc 1:79–119

  • Heckerman D, Geiger D, Chickering DM (1995) Learning Bayesian networks: the combination of knowledge and statistical data. Mach Learn 20:197–243


  • Madden M (2003) The performance of Bayesian network classifiers constructed using different techniques. In: Working notes of the ECML/PKDD-03 workshop on probabilistic graphical models for classification, pp 59–70

  • Mitchell TM (2006) The discipline of machine learning. Technical Report CMU-ML-06-108, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, July 2006

  • SAS Institute Inc. (2004) The original data for this paper were generated using SAS software. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc., Cary, NC, USA

  • U.S. Dept. of Justice, Bureau of Justice Statistics (2005) 2005 Summary findings. http://www.ojp.usdoj.gov/bjs/cvict_v.htm#page

  • U.S. Dept. of Justice, Bureau of Justice Statistics (2007) National crime victimization survey: MSA data, 1979–2004 (computer file). ICPSR04576-v1. Conducted by U.S. Dept. of Commerce, Bureau of the Census. Distributed by the Inter-university Consortium for Political and Social Research (ICPSR), Ann Arbor, MI. http://www.icpsr.umich.edu/cocoon/NACJD/STUDY/04576.xml

  • Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco



Acknowledgements

The authors wish to express their gratitude to Gabriella Davis, J.D., and Lois A. Ventura, PhD, for their contributions as domain experts in the validation of the BBN model of the NCVS data. The anonymous referees contributed greatly to the improvement of this manuscript, which is gratefully acknowledged.

Author information

Correspondence to Gursel Serpen.

Appendix

1.1 National Crime Victimization Survey

The NCVS has 259 attributes and uses a labeling system composed of letters and numbers to represent those variables. Variables are partitioned into so-called variable groups, labeled VG followed by the number of the group. All variable group labels are followed by F2, which denotes the use of the second data file, namely the Incident file.

1.1.1 VG1F2 identification variables

There are eight identification variables used by the NCVS to track and label the data, as shown in Table 15. For the purposes of this study, seven of the eight NCVS identification variables held little or no interest, and those variables were removed from the data set. Only the MSA identification variable, MSACC, was retained, as it labels the 40 largest MSAs (geographical areas). Although V2003 initially showed some promise, the time-versus-crime trend was more properly captured by the later incident variables V4014 and V4015. Therefore, V2003 was removed as an identification variable, and V4014 and V4015 remained to represent the month and year of the incident, respectively. With the removal of these seven variables, the data set has a total of 252 variables.

Table 15 NCVS identification variables
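To make the variable removal above concrete, here is a minimal sketch, assuming a hypothetical CSV export of the incident file (the file name is a placeholder, and the drop list stands in for the seven identification variables listed in Table 15; only V2003 is named in the text):

```python
import pandas as pd

# Hypothetical sketch: remove the identification variables dropped in the
# text, retaining only MSACC. "ncvs_incident.csv" is a placeholder name and
# the drop list is a stand-in for the seven variables shown in Table 15.
df = pd.read_csv("ncvs_incident.csv")
drop_list = ["V2003"]  # plus the six other identification variables
df = df.drop(columns=drop_list)
# After all seven removals: 259 - 7 = 252 variables remain, as stated above.
```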

1.1.2 VG2F2 household variables

The household variables represent information specific to the households of victims reported in the NCVS. As shown in Table 16, there are 11 household variables, three of which were not considered for this study.Footnote 11 The three variables that were removed deal with NCVS coding and weighting factors for the household variables. Since our findings do not depend on the NCVS coding system, these variables are not relevant here. The removal of these three household variables leaves the data set with 249 variables.

Table 16 NCVS household variables

1.1.3 VG3F2 person variables

The person variables represent information specific to the persons who were victims of crimes reported in the NCVS. A total of 15 variables are reported in the NCVS as person variables, as listed in Table 17. Five of these were found not to be relevant for this study.Footnote 12 One was simply an identification variable; three others recorded the day, month and year of the interview; and the fifth was used by the NCVS as a weighting factor for the person variables. With the removal of these five variables, the revised list contains a total of 244 variables.

Table 17 NCVS person variables

1.1.4 VG4F2 incident variables

The final set encompasses the largest group of variables and deals exclusively with information regarding the specific incident of victimization as reported in the NCVS. A total of 19 VG4F2 incident variables were removed from the final list due to redundancy.

All the variables used in this study were converted into nominal values based upon the discrete data or data ranges provided by the NCVS data file. The nominal value representation was required by certain Bayesian network inductive learning algorithms embedded in WEKA. For example, the NCVS data includes numeric values for representing survey answers, such as "1" meaning "Yes" or "3" denoting a certain age range. Accordingly, the values for each variable were converted to nominal by prepending the character "x", which, for instance, changes the value "1" to "x1." The integrity of the data was not affected; however, the limitations of certain WEKA algorithms were addressed by using all-nominal values.
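As an illustration of this conversion step, the following is a minimal sketch that assumes the data sit in a hypothetical CSV file (the file names are placeholders; the paper does not state which tool performed the conversion):

```python
import csv

# Minimal sketch: prepend "x" to every value so that WEKA's Bayesian network
# learners treat each attribute as nominal, e.g. "1" becomes "x1".
# Both file names below are hypothetical placeholders.
def to_nominal(in_path, out_path):
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader, writer = csv.reader(src), csv.writer(dst)
        writer.writerow(next(reader))      # keep the attribute names unchanged
        for row in reader:
            writer.writerow(["x" + value for value in row])

to_nominal("ncvs_raw.csv", "ncvs_nominal.csv")
```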

1.2 An introduction to Bayesian belief networks

Bayesian belief networks can be expressed as directed graphs in which each variable is represented by a node, and causal or conditional dependence relationships are denoted by arrows, called edges (Heckerman et al. 1995); conditional independence is denoted by the absence of an edge. A node represents a variable or attribute from the domain being modeled and is often drawn as a labeled oval. An edge or arc between two nodes represents a causal or conditional dependence relationship between the corresponding variables; it is drawn as an arrow whose direction indicates the direction of causality (or conditional dependence), pointing from parent nodes (causes) to child nodes (effects). Every node also has a conditional probability table (CPT) associated with it. Conditional probabilities represent posteriors based on prior information or past experience. Beliefs are the probabilities that a variable is in a certain state given the evidence available in the current situation.

The main steps in constructing a Bayesian belief network (BBN) are as follows (Heckerman 1997). Each variable in a data set is assigned a node. Using expert opinion, prior knowledge or domain-specific data, causal or conditional dependence links between parent and child nodes are defined; where conditional independence exists, no link is placed between the independent nodes. Once the links are defined, the CPT for each node may be computed, noting that the conditional independence relationships determine the complexity of the CPTs; therein lies the distinct advantage of approximating the full joint distribution with a BBN. Once the CPTs are defined for each node, queries may be posed against the network. If more evidence (i.e., data) arrives, the process continues, and the links and CPTs are updated to accommodate the new information. The common notation for the posterior probability of a hypothesis H given the collection of evidence \( E_{1} ,E_{2} , \ldots ,E_{n} \) is:

$$ p(H|E_{1} ,E_{2} , \ldots ,E_{n} ). $$

In many of the examples illustrated in this paper, V4529 serves as the hypothesis, while the attributes and their values are varied to update the evidence provided, and thereby to calculate a new posterior probability for the hypothesis V4529.
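To make the notation concrete, here is a minimal sketch of such a posterior computation by brute-force enumeration over a toy three-node network; the structure, names and numbers are invented for illustration and are not the induced NCVS model, and a query against the full 200-plus-variable model would be posed through an inference engine such as the JavaBayes tool cited in the references rather than by enumeration:

```python
from itertools import product

# Toy network, invented for illustration: A -> B and A -> C.
# Each node maps to (parent names, CPT); CPT keys are (value, *parent_values).
net = {
    "A": ((), {("t",): 0.3, ("f",): 0.7}),
    "B": (("A",), {("t", "t"): 0.8, ("f", "t"): 0.2,
                   ("t", "f"): 0.1, ("f", "f"): 0.9}),
    "C": (("A",), {("t", "t"): 0.5, ("f", "t"): 0.5,
                   ("t", "f"): 0.2, ("f", "f"): 0.8}),
}

def joint(assignment):
    """Chain rule: p(x1, ..., xn) = product over i of p(x_i | parents(x_i))."""
    prob = 1.0
    for var, (parents, cpt) in net.items():
        key = (assignment[var],) + tuple(assignment[q] for q in parents)
        prob *= cpt[key]
    return prob

def posterior(hyp_var, hyp_val, evidence):
    """p(H = h | E1, ..., En), summing the joint over unobserved variables."""
    free = [v for v in net if v != hyp_var and v not in evidence]
    num = den = 0.0
    for h in ("t", "f"):
        for combo in product(("t", "f"), repeat=len(free)):
            a = {**evidence, hyp_var: h, **dict(zip(free, combo))}
            p = joint(a)
            den += p
            if h == hyp_val:
                num += p
    return num / den

# Analogous to querying the class attribute V4529 given observed attributes:
print(posterior("A", "t", {"B": "t"}))   # 0.24 / 0.31, approximately 0.774
```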

Next, an example illustrates the creation of a BBN. Hypothetically, we can model a situation where reporting a crime is causally related to gender: given the gender of a victim, what is the probability that the individual will report a violent crime? Assume for this example that gender and reporting of a crime are not independent events with isolated probabilities: if an individual is male, it is more likely that he will report a crime. Solving such a problem involves determining the chance that a victim of a violent crime is of a specific gender (e.g., male), and then determining the chance that the same individual will report the crime given that gender. These are known as "joint probabilities." Suppose that p(gender is male) = 0.20 and p(reporting crime given gender is male) = 0.70. Assuming that the random variable \( x_{1} \) represents "gender" and the random variable \( x_{2} \) represents "reporting crime", the probability of the joint event is determined by

$$ p(x_{1} ,x_{2} ) = p(x_{1} ) \times p(x_{2}|x_{1} ), $$

where \( p(x_{2} |x_{1} ) \) means the probability of \( x_{2} \) given a specific value assignment for the random variable \( x_{1} \). Working out the joint probabilities for all eventualities, the results can be expressed in tabular format as in Table 18.

Table 18 Hypothetical joint probabilities for reporting of violent crime

From Table 18, it is evident that the joint probability of a male victim reporting a crime is 0.14. The same scenario can be expressed using a tree diagram, as shown in Fig. 1. One attraction of a BBN is its efficiency: only one branch of the tree needs to be traversed, since we are really only concerned with \( p(x_{1} ) \), \( p(x_{2} |x_{1} ) \) and \( p(x_{1} ,x_{2} ) \). We can also use the graph to determine which parameters are independent of each other. Instead of calculating four joint probabilities, we can use the independence of the parameters to limit our calculations to two: if we seek the probability of a male victim reporting a crime, the result is independent of whether a female would or would not report. This independence relationship is illustrated in Fig. 2. When such independence relationships hold, the computational cost of posing a conditional query is greatly reduced. In larger data sets, such as the NCVS data set, the difference in computational cost between query processing with the full joint distribution and with a BBN that exploits conditional independence relationships is staggering.

Fig. 1 The tree diagram for reporting of violent crime

Fig. 2 Demonstrating independence relationships in the tree diagram
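The arithmetic behind Table 18 can be checked in a few lines; the male-branch figures come from the text, while the female reporting probability of 0.50 is an assumed value used purely for illustration, since the text does not supply it:

```python
# Joint probabilities for the gender/reporting example via the chain rule.
# p(male) = 0.20 and p(report | male) = 0.70 are from the text;
# p(report | female) = 0.50 is an assumed placeholder value.
p_gender = {"male": 0.20, "female": 0.80}
p_report_given = {"male": 0.70, "female": 0.50}

for gender, pg in p_gender.items():
    for report, pr in (("yes", p_report_given[gender]),
                       ("no", 1.0 - p_report_given[gender])):
        print(f"p({gender}, report={report}) = {pg * pr:.2f}")
# p(male, report=yes) = 0.20 * 0.70 = 0.14, matching Table 18.
```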

Cite this article

Riesen, M., Serpen, G. Validation of a Bayesian belief network representation for posterior probability calculations on National Crime Victimization Survey. Artif Intell Law 16, 245–276 (2008). https://doi.org/10.1007/s10506-008-9064-6
