Knowledge Bases and Neural Network Synthesis

Todd Davies
Artificial Intelligence Center
SRI International
333 Ravenswood Avenue
Menlo Park, California 94025 USA
DAVIES@AI.SRI.COM

[Figure 1: Iterative Compiling and Decompiling. Diagram labels: problem-specific information, KB, compile, NN, decompile.]

Abstract

We describe and try to motivate our project to build systems using both a knowledge-based and a neural network approach. These two approaches are used at different stages in the solution of a problem, instead of using knowledge bases exclusively on some problems, and neural nets exclusively on others. The knowledge base (KB) is defined first in a declarative, symbolic language that is easy to use. It is then compiled into an efficient neural network (NN) representation, run, and the results from run time and (eventually) from learning are decompiled to a symbolic description of the knowledge contained in the network. After inspecting this recovered knowledge, a designer would be able to modify the KB and go through the whole cycle of compiling, running, and decompiling again, as illustrated in Figure 1. The central question with which this project is concerned is, therefore, How do we go from a KB to an NN, and back again? We are investigating this question by building tools consisting of a repertoire of language/translation/network types, and trying them on problems in a variety of domains.

1 Features of Neural Networks and Knowledge-Based Systems

Attempts to build intelligent machines have historically divided into two broad types. One has emphasized the use of recursive programming languages like Lisp and Prolog, symbolic data structures, and the declarative representation of knowledge (Webber and Nilsson, 1981). The other (anthologized in Anderson and Rosenfeld, 1988) has focused on bounded state machines, mathematical models of neural circuits, and statistical learning algorithms.
These two schools of thought are sometimes called "symbolism" and "connectionism", respectively, or the "knowledge-based" (KB) approach and the "neural networks" (NN) approach.

Some problems appear to be more appropriate applications of one approach or the other. For instance, very high level reasoning of the kind often performed by experts with lots of book knowledge (e.g., mathematicians, medical doctors) seems at present more suited to a KB approach (e.g., Buchanan and Shortliffe, 1984; Bundy, 1983), while low-level vision and signal-processing tasks appear better for the NN approach (e.g., Lapedes and Farber, 1987; Mead and Mahowald, 1988). But there is a large array of problems in between, such as the more knowledge-intensive perceptual tasks, natural language tasks, and common sense reasoning. We refer to problems in this general class as "knowledge-intensive problems" or "KIPs." KIPs have been treated as a battleground, with both KB and NN researchers claiming that their approach is the best for these tasks.

Some of the enthusiasm for the NN school that has been rekindled in this decade can be understood as a response to disadvantages of the KB approach for which NNs seem to provide a corresponding advantage. The main advantages of NNs are the following.

1. They are fast classifiers. Most fixed weight connectionist networks compute an answer for a given set of inputs very quickly, partly through parallelism (trading space for time), but mostly through a closed world assumption that maps every input to an output within tight temporal bounds.
2. They learn. Neural networks can change their weights and/or connections, which makes them self-modifying programs. This can save labor when it is easier to present a set of training examples than to program the network or write a set of rules for solving a problem.
3. They are probabilistic. Neural networks learn and classify using statistical optimization criteria, so they can deal well with uncertainty.
Specifically, they cope well with conflicting and incomplete information.
4. They are global. Neural networks' high degree of connectivity makes them well suited to problems involving a great deal of impinging evidence, for example when all of the data about a situation must be taken into account. Naturally occurring problems are often of this type, particularly KIPs.
5. They are fault-tolerant. Part of the extra space and connectivity used by neural networks is informationally redundant in a way that permits graceful degradation when parts of the network malfunction or when there is noise in the inputs.

These advantages of neural networks all address specific disadvantages often attributed to the KB approach, such as the slowness of inference, the need for too much labor to enter all the knowledge, the rigidity of binary distinctions, and the often unrealistic requirements that problems be neatly decomposable and that local information maintain perfect integrity.

But neural networks themselves have several problems associated with them. Specifically,

1. Slow learning. Neural network learning algorithms such as back-propagation can take quite long to converge, and although efforts to develop faster algorithms are under way, there appear to be important limits to these speedups. The convergence rate depends on specifics of the problem being worked on (Minsky and Papert, 1988).
2. Spatial combinatorics. NN architectures differ in their space requirements, but nearly all of them grow quite rapidly with the size of the problem if we try to solve for the general case, since they usually trade time complexity for space complexity.
3. Local minima. Neural nets can get trapped in suboptimal states in the space of activation or of weights.
Much of the problem can be eliminated if we choose a judicious input-output representation, but as the problem gets larger and more complicated this can be increasingly hard to do, and it becomes apparent that neural networks fail to eliminate the need for careful thinking about the problem.
4. Poor generalization. The network can fail to correctly classify new input patterns even when it has learned all of the correct classifications for the training set. This is an important shortcoming because for problems of reasonable size, it is impossible to present all of the input patterns.
5. Output underdetermination. It is usually not possible to solve a real problem using input data from just one sensor or other input source. This underdetermination can arise for a number of reasons. For instance, the context can be important, or the input source may be a limited perceptual window on the problem, or background information whose acquisition is separated in time from the current input can be crucial to an interpretation.

The technical efforts to improve neural networks have so far focused mostly on the first few problems noted above, like learning speed and network size. Yet these do not appear to be the most serious barriers to the widespread use of neural networks. Moreover, the approaches that have been taken in trying to solve them (as well as the few efforts undertaken to address local minima and generalization) have almost all been based on general properties of networks, e.g., learning algorithms that are faster in general, simulated annealing algorithms for escaping local minima, or size considerations that apply to all problems. It looks as though a great deal of effort aimed at solving problems in this way could provide us with algorithms and architectures as generally well-tuned as possible but still prone to serious errors to which people will be very sensitive.
These errors would come from those parts of the problems stated above (including part of the speed problem, a good deal of the size, local minima, and generalization problems, and possibly all of the underdetermination problem) which are intrinsically unsolvable using general approaches.

Because the difficulties arising with neural networks depend greatly on the particular problems to which they are being applied (some problems mapping easily onto an efficient architecture, and others not), the real solutions to the general problems with neural networks must vary with the specific application, involving principles for taking advantage of what we know about particular tasks to which the networks will be applied. This approach requires us to use knowledge that we have about a problem in selecting network structures and initial values for parameters. In other words, it requires that we do some initial programming of the network.

[Running header: PRICAI '90]

This leads us to three additional problems with neural networks that affect our ability to program and understand (or verify) them:

6. Lack of locality. The representation of knowledge in neural networks is global, and this creates problems for building knowledge into them. In general, we cannot simply build links between nodes incrementally without worrying about how such links fit into the entire problem representation.
7. Restrictive syntax. To ensure the nice computational properties mentioned earlier and to ensure convergence in dynamic networks, network architectures have restrictions on the types of connections allowed. Sometimes our knowledge of evidential relationships simply does not obey such restrictions, although once expressed it can usually be recast in a form that does obey them.
8. Semantic obscurity.
When we talk about problems in ordinary language (or even in a formal symbolic language), we do not use terms like weights, energy, and the like, so it is not obvious how to map knowledge so expressed onto a network representation. Hopes for verification procedures also must rest partially on semantic understanding, since empirical tests on a limited set of examples can be risky in real situations.

Consideration of the general problems 1-5 led us to conclude that we need to make substantial use of our knowledge about a particular task and domain, and build it into the network's structure and initial state. Problems 6-8 suggest that it may be too difficult to do this directly. Instead it appears that we should make use of the KB approach in some way, since it is geared toward solving the programming and verification problems we have with neural networks. Specifically, building knowledge bases in a sentential language gives us the following advantages:

(a) It gives us a convenient way to enter what we know about the task and domain at whatever level of detail we seem to have in mind. For instance, we can say simply that proposition A supports proposition C as we would in a production system of heuristics or a truth maintenance system, without specifying some numerical probability, or we can specify exact probabilities if we want.
(b) It eases the nonlocality problem (number 6) by giving us a way of stating axioms or constraints somewhat independently, with the usual concerns about consistency of the knowledge base.
(c) It gives us a much less restrictive, more natural syntax than the ones required for neural networks, with which we can set forth the facts of the problem.
(d) It removes us from the semantic obscurity of neural networks by giving us a language (chosen from the repertoire of AI knowledge representation formalisms) with a well-understood semantics.
So the KB approach to AI has a lot to offer as a solution to the problems that plague neural networks, and, as argued earlier, neural networks nicely complement the KB approach. In particular, networks that overcome the general difficulties with the NN approach must embody, in their initial configuration, knowledge about the particular tasks they are to perform, and defining that knowledge in a design phase is what the KB approach is geared for. At the same time, the learning capabilities of neural nets lessen the amount of knowledge that must be defined, and the other features of neural networks, namely their soft, holistic inferencing and run-time speed, help to break down the traditional barriers to using large knowledge bases for solving real problems.

An obvious way to combine the approaches would be to define our knowledge in a KB first, possibly using the AI tools that have been built over the years such as theorem provers and other inference engines, and tools for entering knowledge. Then, when we felt that we had a good theory of the task and its domain, we could convert the KB into an appropriate neural network which would embody the knowledge contained in the KB. After network learning, we could try to verify the network for correctness by looking at what knowledge it has learned, and the easiest way to do this would be to construct a KB from the NN itself. The difficulty, then, and the main problem to which this project will be addressed, is How do we go from a KB to an NN, and back again? In other words, how can a network be made to embody, or be interpreted as, declarative knowledge?

David Marr and many others in the cognitive sciences have noted the usefulness of viewing problems in AI at two levels of analysis, called by him the "computational level" and the "algorithmic level" (Marr, 1982) [footnote 1]. The basic idea is that solutions at the computational level must specify only the constraints involved in a problem (e.g.,
enough facts about it to uniquely determine a solution) without meeting the constraints of resource availability for carrying out the computation. At this computational level a theory for solving a problem is like (and may be literally) a logical theory, i.e., a set of axioms and its consequential closure. Solutions at the algorithmic level, on the other hand, must cope with resource constraints as well. Thus, at this level we must specify how conclusions are to be drawn from, for example, a set of axioms, and by necessity some conclusions in the consequential closure will be left out.

The resources required for computation at the algorithmic level can be divided into four kinds: time, space, labor, and data. In a Turing machine, these correspond to the number of moves, the amount of tape, the complexity of the finite control, and the length of the inputs [footnote 2]. Presently in computing we have, as a rule, less time and labor than we need, and more space and data than we can use, with a few exceptions. For the resource requirements of KIPs, the NN approach seems well-suited because it is designed to be fast and to learn autonomously rather than to be programmed (minimizing time and labor), while consuming lots of processor power and lots of data for training (sometimes too much; see the next section). But at the computational level, KIPs require lots of empirically derived constraints or knowledge to specify a solution, so a knowledge-based approach, as its name suggests, seems appropriate at this level.

The idea behind our work is that designing systems to solve KIPs can be decomposed in just this way, with the knowledge-based approach and the neural nets approach operating at different levels and complementing one another. The resulting requirement is for a system that relates the two approaches in an appropriate way.

[Footnote 1] This distinction is really identical to those of Noam Chomsky (competence vs. performance), John McCarthy and Patrick Hayes (epistemological vs. heuristic adequacy), Herbert Simon (substantive vs. procedural rationality), Daniel Dennett (intentional stance vs. subpersonal stance), Allen Newell (knowledge level vs. symbol level), and Hector Levesque (content vs. form).

[Footnote 2] The four resources divide into two natural clusters, with the minimal requirements for them in a given computation being the subjects of complexity and information theory. Space and time (the subjects of computational complexity theory) can be traded off one for the other, as can program and data (measured by Kolmogorov complexity and Shannon's entropy, respectively, in information theory). In addition, the computational requirements (space-time) trade off against the informational requirements (program-data) in ways that are just beginning to be studied theoretically.

2 Our Approach: Knowledge Compilation and Recovery

2.1 Describing the Approach

We use declarative languages that are both convenient and expressive enough to define at least partially the evidential relationships of a problem. This is the computational level. At the algorithmic level, we use parallel networks, possibly incorporating learning, so that the theory defined at the higher level can be computed as efficiently as possible at run-time under the requirements for correctness. Learning will ease the labor burden that would ordinarily fall at the computational level. And relating the two levels we use procedures of compilation and decompilation from the language to the networks and back again.
The compile and decompile procedures need to be automated because it is often far from transparent what knowledge is embedded in the networks, even those (like the Pearl networks, see detail below) whose structure is closely tied to probabilities, and also because the translation in each direction is generally tedious. This last fact is caused by the global character of the translation.

We have explored several combinations of language, network architecture, and translation. Likewise, in the development tools we are beginning to build, the person attempting to solve a problem will be given a choice of several higher level languages for defining what he or she knows about the domain, matched by translation procedures to different network architectures. This diversity is necessary because different problems and computing situations require different levels of convenience and expressiveness, and different cost priorities for soundness, completeness, time, space, labor, and data. For instance, the method we have chosen below to illustrate this approach on the Yale Shooting Problem uses a variant of first order logic with default rules. The compilation procedure then translates statements in this language individually into constraints on a probability model, finds a particular probability model by maximizing entropy, and embeds this model in what Judea Pearl calls a "causal polytree" for use at run-time (Pearl, 1988).

Our approach to solving the Yale Shooting Problem (which differs from Pearl's own solution in Pearl, 1988) does not require learning, but many more complicated problems would. In fact, one method for solving a problem or building an efficient knowledge base that would be well served by this approach is one involving iterative compilation and decompilation.
We define what we know about a problem as best we can in a language that seems appropriate, compile this representation into a network, let it do the best it can while constantly receiving new data to modify itself, and then decompile to a representation of what it has learned. At this stage, we could inspect the principles it is applying and see for ourselves whether they are sufficiently general to apply to novel inputs, or obviously taking advantage of regularities that happened to hold during training but will not apply later on. If this latter is the case, then we need to modify the knowledge by hand at the declarative level, recompile, and try again. Over time, we should achieve a better system by this type of refinement, and moreover we can be confident that our solutions are sufficiently general because we can recover explanations from the system.

The development of this methodology for networks that learn is part of the work in which we are currently engaged, but we should emphasize that this is an experimental approach. While we feel after careful analysis that this approach can make neural networks work, we really cannot tell how easy it will be to overcome the inherent problems with the networks that we listed previously. Our rationale is that networks cannot work well on large problems unless we provide initial structuring that reflects what we know about the problem, and knowledge bases large enough to encompass everything we know about natural language, perception, and common sense reasoning problems are impossible to build entirely by hand, let alone to run an inference engine over. So the right method must lie somewhere in the middle, but it will require much experimentation before we can say just where. It is possible, for instance, that learning can solve most of the problem for a surprising number of applications, and only some crude structuring is necessary.
Alternatively, it may prove unmanageable to use learning until most of the knowledge that must be defined about a problem is built into a network. These possibilities are what we would like to test, and the answers will almost certainly depend on what types of problems we are trying to solve, even within the class we have called knowledge-intensive.

2.2 An Example of Compilation

To illustrate the approach we are taking, we will present a step-by-step description of how we might translate the Yale Shooting Problem, described in a declarative language by a designer, into a network that will answer questions about the shooting situation based on the information in the description. To do this, we must choose a specific description language, target architecture, and translation algorithm. All of this should make it clearer what kinds of tools we will be building, although we will stop short of actually solving the problem.

The language we will use for the high level description is a variant of the functional predicate calculus that has a default implication operator ("~") which means that the left hand side provides strong evidence for the right hand side. The objects in the domain are all time points. The statements in this language would be divided into background knowledge, which applies more generally across situations, and situational knowledge specific to the case at hand. The background knowledge would be entered as follows.

\[
\begin{aligned}
& \forall t\;\; \mathrm{Loaded}(t) \,\&\, \mathrm{Shoot}(t) \Rightarrow \mathrm{Dead}(t+1) \\
& \forall t\;\; \mathrm{Loaded}(t) \sim \mathrm{Loaded}(t+1) \\
& \forall t\;\; \mathrm{Alive}(t) \sim \mathrm{Alive}(t+1) \\
& \forall t\;\; \mathrm{Dead}(t) \Leftrightarrow \neg\mathrm{Alive}(t) \\
& T_1 = T_0 + 1 \\
& T_2 = T_1 + 1
\end{aligned}
\]

The last two statements express background knowledge about specific times, like our knowledge that the 18th Century came before the 19th, and that both have passed. We might think of the above as already forming part of the knowledge base before we describe the current situation, which reads as follows.

\[
\mathrm{Loaded}(T_0) \qquad \mathrm{Shoot}(T_1) \qquad \mathrm{Alive}(T_1)
\]

Finally, we would enter a query.
Since we want to know whether Fred is dead at time T_2, we would enter the corresponding query for Dead(T_2).

The target architecture we will use is Judea Pearl's causal polytrees (Pearl, 1988). Polytrees have the advantages of a well-understood semantics (for easy decompilation) and quick settling. The disadvantages are that the translation (compilation) is computationally difficult and that the highly structured nature of the network does not lend itself to flexible learning. For this problem, we do not require learning and the number of variables is small enough that we can handle a small combinatorial explosion in the translation algorithm, so polytrees will do fine for us this time. For other problems, we would make another choice depending on the cost priorities specific to the problem.

The translation algorithm, which is just one of several we could choose for going from logic with defaults to causal polytrees, consists of five steps. First we generate a set of equational constraints on the probability model. These are obtained by translating each statement (except for equalities) in the logical description of background knowledge into a probability equation, and substituting known object constants for object variables to make all of the probability statements unquantified. We can also make use of known situational facts (like Shoot(T_1)) and simple definitions (like Dead(T_2) \Leftrightarrow \neg Alive(T_2)) to simplify the probability constraints, yielding a set that includes the following.

\[
\begin{aligned}
& \Pr\{\neg\mathrm{Loaded}(T_1) \vee \neg\mathrm{Alive}(T_2)\} = 1 \\
& \Pr\{\mathrm{Alive}(T_2) \mid \mathrm{Alive}(T_1)\} = v \\
& \Pr\{\mathrm{Loaded}(T_1) \mid \mathrm{Loaded}(T_0)\} = v
\end{aligned}
\]

The global parameter v is a number inside the unit interval which is determined by the designer. In this case, we will set it to be 0.9.

The key idea in most of these translations is to construct a model from the constraints.
A model specifies what happens for every combination of events expressible in the language, in this case probabilistically, and therefore goes beyond the information in the description. AI researchers have been attracted to models as a way of doing tractable reasoning partly for computational reasons (Levesque, 1986) and partly because they appear from introspection to be how we reason about change ourselves without falling victim to the "Frame Problem," or the problem of enumerating all that becomes or remains true or false after an event or action. We understand what happens when a ball is kicked into a window not because we run down a list of sentences in our head about the effects of kicking balls at windows, but because we set up from the description some internal picture or model of what happens.

To get a probability model for the Yale Shooting Problem, we need to infer probabilities that cannot be calculated from the given information merely using the probability calculus. Instead we find a model that maximizes the probabilistic entropy subject to our constraints. To set this up as a nonlinear optimization problem, we need to divide the space of possibilities into disjoint events, set up an objective function for maximizing the entropy of these probabilities, and add a few constraints that come from probability theory. When this is done, the resulting nonlinear program is the following (see notational definitions below).

\[
\begin{aligned}
\text{maximize}\quad & -\sum_{i,j,k,l} I\big(\Pr\{L_0 = i \,\&\, A_1 = j \,\&\, L_1 = k \,\&\, A_2 = l\}\big) \\
\text{subject to}\quad & \Pr\{L_1 = 0\} + \Pr\{A_2 = 0\} - \Pr\{L_1 = 0 \,\&\, A_2 = 0\} = 1, \\
& \Pr\{A_1 = 1 \,\&\, A_2 = 1\} \,/\, \Pr\{A_1 = 1\} = v, \\
& \Pr\{L_0 = 1 \,\&\, L_1 = 1\} \,/\, \Pr\{L_0 = 1\} = v, \\
& \sum_{i,j,k,l} \Pr\{L_0 = i \,\&\, A_1 = j \,\&\, L_1 = k \,\&\, A_2 = l\} = 1, \\
& \forall i,j,k,l:\;\; \Pr\{L_0 = i \,\&\, A_1 = j \,\&\, L_1 = k \,\&\, A_2 = l\} \ge 0,
\end{aligned}
\]

where I(x) = x \log x, event variables A_1 and L_0 denote the propositions Alive(T_1) and Loaded(T_0), and so forth, and i, j, k, and l range over {0, 1} and denote the truth (1) or falsity (0) of the event variable.
The last two constraints come from probability theory.

The solution to the nonlinear program is a probability model which can then be used to generate a causal polytree for answering queries. We first decompose the probability distribution P(L_0, A_1, L_1, A_2) into a product of probabilities using the chain rule for conjunction. When all conditional independencies are taken into account, with the event variable order we have been using, the distribution can be rewritten as the product

\[
P(L_0)\, P(A_1)\, P(L_1 \mid L_0)\, P(A_2 \mid L_1, A_1).
\]

Applying Pearl's algorithm for constructing Bayesian networks, this generates the tree shown in Figure 2.

[Figure 2: Causal Polytree for the Yale Shooting Problem.]

Each node has associated with it two parameters, \pi (below the node) and \lambda (above the node). If we solve the nonlinear program given above (using, say, the SQP method of optimization employed in the NPSOL program (Gill et al., 1986)) and provide the evidence that Loaded(T_0) and Alive(T_1), then the network concludes that \Pr\{Alive(T_2) \mid Loaded(T_0) \,\&\, Alive(T_1)\} = 0.5. In other words, we have not provided sufficient information to say that it is more likely that the person died than that the gun was unloaded. At this point, we can change the knowledge base of statements to make this fact clear, and iterate through the compilation again.

The details of the network updating algorithm and the parameter assignments are too complicated to present in this paper, but are given in section 4.3 of Judea Pearl's book (Pearl, 1988). The parameters can only be fully determined once we have a complete probability model, thus making necessary the solution of the maximum entropy problem. However, because the maximum entropy problem grows exponentially in the number of atomic propositional variables, it is infeasible to solve exactly for problems other than small ones like the Yale Shooting Problem.
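
For a problem this small, the maximum entropy program can be worked through directly. The following is a minimal sketch, not the NPSOL/SQP procedure mentioned above: it swaps in plain gradient descent on the convex dual of the entropy problem so that it needs only the Python standard library. It assumes the three constraints listed earlier with v = 0.9, enforces the hard constraint Pr{L1 = 1 & A2 = 1} = 0 by dropping those events from the support, and uses function names (features, solve_max_entropy, conditional) that are purely illustrative.

```python
import itertools
import math

V = 0.9  # strength v of the default rules, as set in the text

# Joint events (L0, A1, L1, A2). The hard constraint Pr{L1=1 & A2=1} = 0
# is imposed by removing those events from the support (0 log 0 = 0).
EVENTS = [e for e in itertools.product((0, 1), repeat=4)
          if not (e[2] == 1 and e[3] == 1)]


def features(event):
    """Constraint functions: requiring E[f] = 0 for both components encodes
    Pr{A2=1 | A1=1} = V and Pr{L1=1 | L0=1} = V."""
    l0, a1, l1, a2 = event
    return (a1 * (a2 - V), l0 * (l1 - V))


FEATS = [features(e) for e in EVENTS]


def distribution(lam):
    """Maximum entropy solutions have the form p(e) proportional to exp(lam . f(e))."""
    weights = [math.exp(lam[0] * f[0] + lam[1] * f[1]) for f in FEATS]
    total = sum(weights)
    return [w / total for w in weights]


def solve_max_entropy(steps=5000, rate=1.0):
    """Gradient descent on the dual log Z(lam), whose gradient is E[f]."""
    lam = [0.0, 0.0]
    for _ in range(steps):
        p = distribution(lam)
        grad = [sum(pi * f[k] for pi, f in zip(p, FEATS)) for k in range(2)]
        lam = [lam[k] - rate * grad[k] for k in range(2)]
    return distribution(lam)


def prob(p, event_test):
    """Probability of the set of joint events satisfying event_test."""
    return sum(pi for pi, e in zip(p, EVENTS) if event_test(e))


def conditional(p, target, given):
    """Pr{target | given} under the joint model p."""
    return prob(p, lambda e: target(e) and given(e)) / prob(p, given)


model = solve_max_entropy()
query = conditional(model,
                    target=lambda e: e[3] == 1,                # Alive(T2)
                    given=lambda e: e[0] == 1 and e[1] == 1)   # Loaded(T0) & Alive(T1)
print(round(query, 3))
```

Under these assumptions the printed value comes out close to, and slightly below, the 0.5 reported in the text, and by the symmetry of the constraints the same value is obtained for Loaded(T_1) given the same evidence. The sketch enumerates every joint event, so the support doubles with each additional propositional variable, which is exactly the growth noted above.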
To get around this problem, we must use heuristic methods of optimization geared toward finding the maximum entropy solution. These methods would need to be developed as part of the project. Our initial plan is to add intelligence to the process of decomposing the space into disjoint possibilities, instead of using the straightforward method given here.

Additional complications that can arise in these problems include the dependence of the constructed network on the ordering of the event variables, the need to eliminate cycles that make networks fail to be singly-connected (and hence not polytrees), and the problem of large matrices of parameters at nodes which depend on many interacting causal variables, since they grow in size exponentially. All of these problems have solutions (see Pearl, 1988), but the solutions are tedious and so automation is crucial if they are to be applied to problems of any size. This complication is a point we want to emphasize generally. We have made this example simple enough to present briefly and in a way that gives a flavor of the work we are doing, but in the process we have stripped away most of the complicating factors (including learning and decompilation) that necessitate tools for constructing these networks automatically from a friendlier declarative description.

2.3 Approaches to Decompilation

Our approach to the development of decompilation algorithms involves analysis of correlations in a unit's activation with conditions in its input set to determine objective meaning for a hidden node, followed by translation of the set of weights into meaningful probability statements. The first of these stages is similar in spirit to the work done by Terry Sejnowski and Charles Rosenberg on analyzing NETtalk (Rosenberg, 1987), and the second stage approach is described in a paper by Davies given at the 1988 INNS Conference in Boston (Davies, 1988).
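
As a rough illustration of what the second stage can look like, the sketch below uses the standard textbook reading of a logistic unit, in which the bias is the log odds of the unit being on when all inputs are off and each weight is the change in log odds when its input turns on. This is an assumption made for illustration and is not taken from Davies (1988); the function names and the single-input unit for Alive(T2) are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def decompile_unit(bias, weights, names):
    """Read one logistic unit as conditional-probability statements.

    Returns (label, probability) pairs: the probability that the unit is on
    with every input off, then with each named input switched on alone.
    """
    statements = [("all inputs off", sigmoid(bias))]
    for name, weight in zip(names, weights):
        statements.append((name, sigmoid(bias + weight)))
    return statements


def odds_ratio(weight):
    """A weight w multiplies the odds of the unit being on by exp(w)."""
    return math.exp(weight)


# Hypothetical unit for Alive(T2) with one input, Alive(T1).
# The bias of 0 (probability 0.5 with the input off) is chosen only for illustration.
v = 0.9
bias = 0.0
weight = math.log(v / (1.0 - v)) - bias  # chosen so that Pr{on | Alive(T1)} = v

for label, p in decompile_unit(bias, [weight], ["Alive(T1)"]):
    print(f"Pr{{Alive(T2) | {label}}} = {p:.2f}")
```

Read this way, a weight is recovered as a statement about how strongly one proposition supports another, which is the kind of declarative output a designer could inspect and edit before recompiling.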
Since this work has appeared elsewhere we will not repeat the mathematical details here.

3 Related Work by Others: How We Differ

A number of researchers have expressed the opinion that NNs and KBs apply best to different problems (e.g., Hecht-Nielsen, 1986). Hybrid KB/NN systems thus sometimes use the KB and the NN to work on different aspects of a problem, rather than using them at different stages in the solution of a single application problem as in this project. The principal research efforts that have involved encoding or recovering knowledge in neural networks, in a manner comparable to that proposed here, have been the following.

1. Work by Steven Gallant that is being further developed by Hecht-Nielsen Neurocomputers, on so-called "connectionist expert systems" (Gallant, 1988). This involves an induction of rules from statistics collected from examples rather than a translation of the parameters obtained during adaptation.
2. Work by Dana Ballard on implementing predicate logic theorem proving in neural networks (Ballard, 1986). Of all the work that has been done by others, this is the most like what we are proposing. Ballard's algorithm is used for proof by refutation, and so can only answer Yes/No queries for specific propositions, rather than forward chaining to a set of conclusions as in the algorithm we have developed for translating first order logic constraints (Davies, 1989).
3. Work by a number of researchers (Derthick, 1988; Weber, 1989; Thagard, 1988; Pearl, 1988; Shastri, 1989; Jones & Story, 1989) on using neural networks for nonmonotonic and evidential reasoning. Each of these researchers has constructed networks by hand to solve inference problems, including examples like the Yale Shooting Problem detailed earlier.
But the compilation and recovery of declarative knowledge in their systems are, if they are mentioned at all, accomplished by rather simple, local mappings of a constraint onto a network link, rather than by global fitting. This approach limits either the expressiveness of the network or its efficiency rather severely for general problem solving. It limits the expressiveness because sets of constraints whose local translations violate the network's formation rules (syntax) cannot be represented. Thus, for example, Lokendra Shastri, Mark Jones and Guy Story, who do provide interpretations of network structures as embodying declarative theories, restrict themselves to inheritance networks that are relatively inexpressive for local translation algorithms. When the syntax is relaxed to allow more flexibility, for example when cycles or combinatorial rules are permitted, then the network will by necessity run more slowly, if it settles at all. The global translation approach we use, which requires automation, is designed to get around this tradeoff by paying the price during compilation rather than during design or at run time.

To summarize, then, the distinctive features of our approach are (1) the use of global rather than local translation, (2) the idea of an iterative compile/decompile cycle with some place for learning and some place for hand-crafting, and (3) the fact that the KB and NN approaches are both used, but at different stages in the solution of a problem rather than to solve different problems.

References

[1] Anderson, J. A. & Rosenfeld, E. Neurocomputing: Foundations of Research. Cambridge, MA: The MIT Press, 1988.
[2] Ballard, D. H. Parallel Logical Inference and Energy Minimization. Proceedings of the Fifth National Conference on Artificial Intelligence (AAAI-86), Philadelphia, PA, August 11-15, 1986, pp. 203-208.
[3] Buchanan, B. G. & Shortliffe, E. H.
Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project. Reading, MA: Addison-Wesley, 1984.
[4] Bundy, A. The Computer Modeling of Mathematical Reasoning. New York: Academic Press, 1983.
[5] Davies, T. R. Some Notes on the Probabilistic Semantics of Logistic Function Parameters in Neural Networks. Neural Networks, 1, Supplement 1, Abstracts of the First Annual INNS Meeting, Boston, September 6-10, 1988, p. 88.
[6] Davies, T. R. Neural Networks and Artificial Intelligence. Independent Research and Development Data Sheet, Project WS, SRI International, Menlo Park, California, March 15, 1989.
[7] Derthick, M. Mundane Reasoning by Parallel Constraint Satisfaction. Technical Report CMU-CS-88-182, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, September 1988.
[8] Gallant, S. I. Connectionist Expert Systems. Communications of the ACM, 31(2), February 1988, pp. 152-169.
[9] Gill, P. E., Murray, W., Saunders, M. A., Wright, M. H. User's Guide for NPSOL (Version 4.0): A Fortran Package for Nonlinear Programming. Technical Report SOL 86-2, Systems Optimization Laboratory, Department of Operations Research, Stanford University, Stanford, CA, January 1986.
[10] Hanks, S. & McDermott, D. Default Reasoning, Nonmonotonic Logics, and the Frame Problem. Proceedings of the Fifth National Conference on Artificial Intelligence (AAAI-86), Philadelphia, PA, August 11-15, 1986.
[11] Hecht-Nielsen, R. Performance Limits of Optical, Electro-optical, and Electronic Neurocomputers. In Szu, H. (Ed.) Hybrid and Optical Computing. Bellingham, WA: Society of Photo-Optical Instrumentation Engineers, 1986, pp. 277-306.
[12] Jones, M. A. & Story, G. A. Inheritance Reasoning in Connectionist Networks. First International Joint Conference on Neural Networks, Washington, D.C., June 1989.
[13] Lapedes, A. & Farber, R. Nonlinear Signal Processing Using Neural Networks: Prediction and System Modeling.
Technical Report LA-UR-87-2662, Los Alamos National Laboratory, Los Alamos, NM, 1987.
[14] Levesque, H. J. Making Believers Out of Computers. Artificial Intelligence, 30, 1986, pp. 81-108.
[15] Marr, D. Vision. New York: W. H. Freeman and Co., 1982.
[16] Mead, C. A. & Mahowald, M. A. A Silicon Model of Early Visual Processing. Neural Networks, 1:1, 1988, pp. 91-97.
[17] Minsky, M. L. & Papert, S. A. Perceptrons (Expanded Edition). Cambridge, MA: The MIT Press, 1988.
[18] Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Los Altos, CA: Morgan Kaufmann, 1988.
[19] Rosenberg, C. R. Revealing the Structure of NETtalk's Internal Representations. The Ninth Annual Conference of the Cognitive Science Society, Seattle, Washington, 16-18 July, 1987, pp. 537-554.
[20] Shastri, L. Default Reasoning in Semantic Networks: A Formalization of Recognition and Inheritance. Artificial Intelligence, July, 1989, pp. 283-356.
[21] Thagard, P. Explanatory Coherence. CSL Report 16, Cognitive Science Laboratory, Princeton University, Princeton, NJ, March 1988.
[22] Webber, B. L. & Nilsson, N. J. Readings in Artificial Intelligence. Los Altos, CA: Morgan Kaufmann, 1981.
[23] Weber, J. A Statistical Approach to the Qualification Problem. Talk Presented at Artificial Intelligence Center, SRI International, Menlo Park, CA, February 23, 1989.

Artificial Intelligence in the Pacific Rim: Proceedings of the Pacific Rim International Conference on Artificial Intelligence, Nagoya 1990. Copyright © 1991 by Japanese Society for Artificial Intelligence. The exclusive publication rights to this post-conference proceedings are granted to Ohmsha, Ltd. by the copyright owner. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, recording or otherwise, without the prior written permission of the copyright owner.
ISBN 4-274-07636-9 (Ohmsha). ISBN 90-5199-053-7 (IOS Press). Published and distributed in Japan by: Ohmsha, Ltd., 3-1 Kanda Nishiki-cho, Chiyoda-ku, Tokyo 101, Japan. Distributed in North America by: IOS Press, Inc., Postal Drawer 10558, Burke, VA 22009-0558, U.S.A. Distributed in Europe and the rest of the world by: IOS Press, Van Diemenstraat 94, 1013 CN Amsterdam, The Netherlands. Printed in Japan. Edited by Hozumi Tanaka, Tokyo Institute of Technology.