Artificial neural networks are well suited for prediction, forecasting, generalising decisions based on previously processed data, classifying data, and building meta-models based on samples. To my knowledge, ANNs return outputs according to the inputs they are given without explaining how the decision was made. Contrary to machines, humans are able to explain their decisions.
Do you have any comments on this issue?
It is possible to extract rules from neural networks - in particular, logical formulas, provided that the neurons act as classical Boolean gates. To achieve that goal, all inputs and outputs of the neurons (or units), even in the 'hidden' layers, should be trained to assume values that are either 0.0 or 1.0 (or -1.0 and 1.0), but not values in between. This can be done with an evolutionary learning mechanism. Maybe a combination of evolution and backpropagation is possible (first backpropagation, later evolution). In the end, the logical function of each neuron can be determined by applying all possible combinations of inputs.
Regards,
Joachim
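To make that last step concrete, here is a minimal sketch (with hypothetical, hand-picked weights rather than a trained network) of reading off the Boolean function of a single thresholded unit by enumerating all 0/1 input combinations:

```python
# A minimal sketch of recovering the Boolean function of one unit whose inputs
# and outputs have been driven towards 0/1. Weights and bias are hypothetical.
import itertools

import numpy as np

def step(x):
    """Hard threshold, standing in for a saturated sigmoid."""
    return (x >= 0.0).astype(int)

weights = np.array([0.9, 0.8])   # hypothetical trained weights
bias = -1.2                      # hypothetical trained bias

# Enumerate every 0/1 input combination and read off the unit's truth table.
for inputs in itertools.product([0, 1], repeat=len(weights)):
    x = np.array(inputs, dtype=float)
    y = step(np.dot(weights, x) + bias)
    print(inputs, "->", int(y))
# The output is 1 only for (1, 1): this unit behaves as a logical AND.
```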
Nabil,
I would break your question into two separate threads:
1) Biological debate
Can the neural networks of humans explain their output or is it a rationalization based on other neurons analyzing the inputs and outputs of that network?
I think answering this question first might prove useful to answer your question based on your comparison.
2) Machine learning debate
I think that while decision trees do provide white-box style algorithms, they do not have the flexibility of NNs (I am biased towards NNs). In this sense all algorithms have their pros and cons, depending on the problem to be solved and on whether explanations are necessary for a positive evaluation of the algorithm.
Artificial Neural Networks (ANNs) are simplifications of the vertebrate brain, reduced to a complex equation (the role of training is to determine the parameters of the equation that best fit the data).
Can a mathematical function explain its output? The answer is no. Then why do you have other expectations from an ANN?
When ANNs become more like the systems that inspired them, maybe this answer will change.
You can use some type of reasoning for some types of ANN systems.
For example, for an LVQ-type ANN the decision for the "suggested" output class is based on the proximity of the vector under examination to the codebook vectors. To put it simply, the output class of an unknown vector is the class of the nearest codebook vector.
In a second step it is easy to find the training-set vectors that are close to this codebook vector, or even more simply, close to the vector under investigation.
Thus a reasoning for the ANN decision can be based on the similarity between the vector under investigation and the training-set vectors belonging to the same class.
Therefore some type of reasoning for the ANN outcome is: "the vector under investigation belongs to class X because there are Y similar vectors belonging to the same class".
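A hedged sketch of this reasoning, using synthetic data and hand-placed codebook vectors rather than a trained LVQ model: classify by the nearest codebook vector, then justify the decision with the most similar training vectors of that class.

```python
# Sketch of the LVQ-style reasoning described above (illustrative data, not a
# trained LVQ model): decide by the nearest codebook vector, then explain with
# the most similar training vectors of the chosen class.
import numpy as np

rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
y_train = np.array([0] * 20 + [1] * 20)
codebooks = np.array([[0.0, 0.0], [4.0, 4.0]])   # one prototype per class
codebook_classes = np.array([0, 1])

def explain(x, k=3):
    # 1) decision: class of the nearest codebook vector
    d_cb = np.linalg.norm(codebooks - x, axis=1)
    cls = codebook_classes[np.argmin(d_cb)]
    # 2) explanation: the k most similar training vectors of that class
    mask = y_train == cls
    d_tr = np.linalg.norm(X_train[mask] - x, axis=1)
    neighbours = X_train[mask][np.argsort(d_tr)[:k]]
    return cls, neighbours

cls, neighbours = explain(np.array([3.5, 3.8]))
print(f"class {cls}, because these training vectors of class {cls} are similar:")
print(neighbours)
```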
"Contrarily to machines, humans are able to explain their decisions." - Not necessarily! Humans also often (re)act based on experience - explicitly or implicitly. General humans (not scientist but these also) do not always do a regression on their experiense and draw explicit conclusion and rules as base for their following decisions. Often they only decide intuitively. Very often these are also the better decisions...
If you are using feedforward networks you can use methods for fuzzy-rule extraction from neural networks. If you are looking for human readability, I don't recommend this, since although you obtain a set of rules, you don't know what they mean.
If you need to put expert knowledge into the network, or want a correct and easier way to interpret the network's behaviour, I suggest using ANFIS-like networks.
If you use first- or second-order recurrent neural networks (Elman type, etc.), you can use a method to extract regular automata or stack automata.
Laurent Bougrain proposed some models to extract knowledge from a trained ANN. The paper entitled "A pruned higher-order network for knowledge extraction", in the International Joint Conference on Neural Networks - IJCNN'02, May 2002, is one of his attempts.
I have been working with ANNs as a user for some years now. Our work deals with the applications of ANNs in the pharmaceutical and plant science fields. In my experience ANNs can be useful to explain processes and make decisions when you combine ANNs with fuzzy logic. I don't agree with Manuel when he says "you don't know what the rules mean". On the basis of the experience (a data set including many inputs and outputs), neurofuzzy logic is able to generate understandable rules useful for making decisions and generating knowledge (please see the reference European Journal of Pharmaceutical Sciences 38 (2009) 325–331). We were able to explain in words the effects of the variables on the performance of products. That is the main goal of pharmaceutical development, and I think this technology is a great help.
As mentioned by several commenters, there have been numerous efforts to extract understandable rules from trained neural networks. I'm partial to the work done by Thuan Huynh (e.g. "Guiding Hidden Layer Representations for Improved Rule Extraction From Neural Networks," IEEE Trans. Neural Nets. 2011.)
I can't put my finger on a reference right this second, but I know people have analyzed the distribution of weights after training to infer explanations. Kamimura's "Information theoretic neural computation" book may be some help here.
There are also various cognitive modeling-inspired approaches that seek to learn interpretable representations. I'd look to Randall O'Reilly and his colleagues to get started down this path, as well as Chris Eliasmith et al.
PS A small bit of pedantry: artificial neural networks are not based on *vertebrate* neural systems. The McCulloch-Pitts model was inspired (IIRC) by the squid giant axon, the stomatogastric ganglion of the lobster has been heavily studied, and the entire neural system of C. elegans is being (has been?) mapped out and analyzed.
We're also getting away from CS/neuroscience and into epistemology here, but I would be more reluctant to make some of the claims that I've read here about the inability of biological systems to explain what they've done.
Until now it was thought that an MLP cannot explain its decisions. An article about the Border Pairs Method (BPM) describes how the decision is made in the individual layers of the Multi-Layer Perceptron (MLP).
Conference Paper Border Pairs Method – Constructive MLP Learning Classificati...
Nabil,
"Contrary to machines, humans are able to explain their decisions."
Humans are not able to explain their decisions. We think we can, and we are able to provide some rationales for our decisions, but these rationales do not provide the reasons for our decisions. Pascal said: "The heart has its reasons, which reason does not know." We make a lot of sacrifices for our children, for the people we love, or for the social causes we believe in. But what does it mean "to love"? We do not know in a rational way. We eat when we are hungry; we do not understand or know in an intellectual way why we are hungry; we simply feel hungry. So in this case, as in so many other cases, we act according to what we feel like doing, and why we feel the way we do, we do not know. We do not like to see ourselves like the other animals. We know a lot of things, but we should not fool ourselves into thinking that we know why we act the way we act. As infants we learn to walk and to move our arms to do what we want to do; but how do we move our legs and our arms? We do not intellectually know that; we know it the way any animal knows it, which is not at all intellectual knowing. But living in society we have to communicate with each other and explain why we behave the way we do. So we are very good at creating rational stories where all decisions are taken for this and that reason. In these stories we act as purely rational beings. We become so good at creating rational-being stories that we even believe these stories are true! As scientists we should know better.
Dear Nabil! As already proposed by some other repliers, your question should be split into two or more questions, with adequate answers for each. I am also convinced that your question is a rhetorical question and not a real dilemma. Everything you ask is well documented in the AI and ML literature. Just to trigger the discussion a bit: can you write down concise instructions for how to ride a bicycle? Yet a human can learn this on their own, so not everything is reducible to simple rules (conceptual models, regression trees, CART in AI); some things can only be learnt through heuristic and trial-and-error experimentation (the ANN approach). SIC! Regards, Boris
This looks like a problem of self-reference. In formal systems, self-references are usually sources of paradoxes; I have never heard about "paradoxes" in neural networks. Attempts to explain a system's own outputs I know from expert systems based on symbolic knowledge representation (such as production rules), which use "canned" information attached to the rules.
Dear Mohammad!
Have you checked this source?
http://www.cs.waikato.ac.nz/ml/weka/
It is all inclusive ...
Regards, Boris
Dear Mohammad!
It was not my intention to give you the source code, but rather the manual that describes the background behind the WEKA software package. If you are searching for working software within Matlab, there is a Matlab toolbox for ANNs; besides, there is also a lot of freeware on the internet, just google it! E.g.: http://www.mathworks.com/products/neural-network/
In the last 15 years there has been a lot of work on rule extraction from neural networks. I worked on the extraction of rules from ensembles of neural networks. If you are interested in reading more details, just go to my ResearchGate page, where you can download several papers.
ANNs return outputs according to the inputs they are given. The machines make decisions using different methods of searching the parameter space for an optimum. It is a picture of human decision making.
Neural models are usually not used for hypothesis testing, and so there is less of a need to explain how the weights are determined or what they mean. Most NN applications are for predictive purposes, and most programs develop many models heuristically before they settle on the one that gives the best overall fit. It is only in situations where the model and the theory coincide that the structure of the NN is readily explained. There are some examples of this in most research arenas. For example, if a sequence of variables is time-ordered, then the neural model would have to include the sequence as a way of justifying its fit.
In line with Louis Brassard and William Jackson, I'd like to address the following point:
"Contrarily to machines, humans are able to explain their decisions."
In humans, this seems to happen a posteriori: an output of the neural network (processing, decisions, etc) is given and only at that point an interpretation is made up. There are neurological syndromes in which patients lose memory and when asked "why are you in this place?" their neural network gives them false memories. They then use these false memories to make up an explanation of the reason why they're in that place, explanations that result in something absurd (confabulations).
Please note that the difference with healthy people is that healthy people's neural networks generally don't give them false memories, but the process is the same.
To return to your point, I think you might more accurately ask whether an ANN can output an explanation of its output in addition to the output itself (I see a possible problem of self-reference too, as pointed out by Rey Segundo Guerrero-Proenza), or give an a posteriori interpretation of a previous output, or even whether an ANN can be transparent enough for its internal processes to be understood.
ANN methods are black boxes: they cannot describe their decisions and they are not interpretable, unlike fuzzy logic and expert systems. But in some work they are combined with fuzzy logic to become an interpretable method for solving problems.
Perhaps the greatest advantage of ANNs is their ability to be used as an arbitrary function approximation mechanism that 'learns' from observed data. However, using them is not so straightforward, and a relatively good understanding of the underlying theory is essential.
Choice of model: This will depend on the data representation and the application. Overly complex models tend to lead to problems with learning.
Learning algorithm: There are numerous trade-offs between learning algorithms. Almost any algorithm will work well with the correct hyperparameters for training on a particular fixed data set. However selecting and tuning an algorithm for training on unseen data requires a significant amount of experimentation.
Robustness: If the model, cost function and learning algorithm are selected appropriately the resulting ANN can be extremely robust.
With the correct implementation, ANNs can be used naturally in online learning and large data set applications. Their simple implementation and the existence of mostly local dependencies exhibited in the structure allows for fast, parallel implementations in hardware.
First of all, I think that we cannot compare the best of our work in neural networks or other learning machines with even the most insignificant insect.
A neural network is a learning machine, and there is no intelligence inside. However, for some applications we would sometimes like to know why a neural network has classified an input as a member of class 1 rather than class 2. If you are interested in this problem, I have proposed a method to explain neural network classification:
https://www.researchgate.net/publication/11348549_A_methodology_to_explain_neural_network_classification
I would like to thank you all for your interesting comments.
So, most of the answers suggest that ANNs are not originally designed to explain their decisions. Rather, they are black-box programs that are able to generalize some acquired knowledge. It was also mentioned that there is some research that adapts ANNs so that they are able to explain some decisions.
I guess there will be much more work in this field in the future, especially since, in some cases, explaining a decision is as interesting as the decision itself. It is okay that even humans cannot explain everything. But in the situations where they can, it is not guaranteed that "intelligent" machines can too.
(sorry for my bad English)...
From a machine learning perspective: the output of models, including ANNs, can be explained in various human-readable forms. An example of a general explanation methodology and its application to neural nets is available in the paper Marko Robnik-Sikonja, Igor Kononenko, Erik Štrumbelj: Quality of Classification Explanations with PRBF. Neurocomputing, 96:37-46, 2012; see http://lkm.fri.uni-lj.si/rmarko/papers/RobnikSikonja-Neurocomputing%202012.pdf
A standard ANN is a linear combination of parameterized nonlinear functions (the "neurons"). A polynomial is a linear combination of nonlinear functions (the monomials). Would you complain that a polynomial does not explain its decision? No. The term "neural network" is misleading: a neural network has nothing to do with the brain. The term “training” is misleading: the training of neural nets is just parameter estimation. A neural net is a parameterized function that has useful properties, which make it very good at classification and nonlinear regression. If you want to do something useful with neural nets, think in terms of mathematics, statistics, signal processing, automatic control, not neurobiology. Of course neural nets and other machine learning techniques are very useful to neurobiologists too, especially for the analysis of brain signals, be they provided by fMRI, EEG, even single electrode recordings; they are also useful for brain-computer interfaces. But neural nets are useful because of their mathematical and statistical properties, not because they are "neural".
The story of neural nets is interesting in that respect. The early researchers in the field believed that they were designing machines that mimicked the brain. But it took only a few years to realize that nobody could mimic the brain because nobody knew how the brain works. However, it turned out that, in a serendipitous fashion, these "neural nets" had useful mathematical properties, which had nothing to do with their "biological" inspiration. In general, I would say that trying to mimic biology is not a good idea, unless you know and understand exactly the process you are mimicking. Airplanes do not have feather-covered fuselages and flapping wings, do they?
To summarize, do not expect a neural net to explain its decision. It is your job to understand what it is doing, and to make sure that you do everything right (feature selection, cost function optimization, model selection, performance estimation, statistical analysis of the significance of the results, estimation of confidence intervals…). In addition, neural nets can go much beyond black-box modeling: they can make efficient use of prior knowledge (this is known as semi-physical modeling or gray-box modeling). In other words, you can help them make a decision that you will understand, not the other way around.
Neural nets demand more than just using a toolbox, but they can be very rewarding.
It is true that humans can help the world of neural networks to make decisions; neural networks cannot do it alone. But stream theory is something which could make news in optimal space searching, maybe as a new dimension in the world of neural nets (with its own parameter space, its own dimensions, etc.), maybe like the human brain...
Maybe the better English term is "string theory".
I do not understand the above comment except the first sentence, with which I cannot agree. Artificial neural networks are definitely not a model of biological learning. There is not the remotest experimental evidence that the brain performs parameter estimation by minimizing a cost function; I have never seen any report on the discovery of a "backpropagation brain area" or of a "Levenberg-Marquardt brain area" by any experimental brain investigation technique. Neural network training is parameter estimation, just like least squares fitting for linear regression, but less straightforward because the network output is nonlinear with respect to the parameters of the neurons of the hidden layer. It is a technical, not very glamorous statement, but machine learning does not draw any benefit from unsupported claims. Completely implausible references to neurobiology, as were frequently made in the 1980s, hampered the development of machine learning for several years.
Neuro-fuzzy networks may give you information about their training parameters that can be useful for interpreting the results in some cases.
An artificial neural network (ANN) could be inspired by and developed from a neurobiological one (but not necessarily), and after its training it is ready to meet its objectives. This does not imply giving significance or meaning to the task at hand, at least at the present moment. The ANN's dynamics can take place at the level of the transfer functions, the synaptic weights, or the connectivity pattern, and it is still far from showing the complexity of dynamics present in biological networks. The assignment of meaning, and thus the possibility of explaining its output, requires the complexity of our own biological neural networks, which exhibit computational power at the molecular level, in their dendritic tree formations, and at several larger domains well before reaching the level of synaptic circuitry, usually the source of inspiration for ANNs. There is also an important fact to take into consideration: biological neural networks are embedded in living systems and are the target of multiple interactions with many internal and external subsystems, and we would have to model those interactions in order to have a complete picture of everything. We are not even close to modelling and simulating this sort of complexity. So far, then, the answer to the question is no.
I think that this discussion is strange. There is a confusion between the explanation of an output and biologically inspired neural networks:
Can a mosquito explain why it chooses to turn to the right?
Are the neurons of a mosquito not biologically inspired?
There is no correlation between the explanation ability and the biological plausibility.
There exist some methods to interpret the classifications of an ANN. For example: an MLP projects the inputs into a representation space, the hidden layer, where the data are linearly separable. You can cluster the data in the representation space, and then explain each cluster by identifying the input variables that are relevant to it.
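A rough sketch of that hidden-layer clustering idea, assuming a single-hidden-layer scikit-learn MLP (the hidden activations are recomputed from coefs_ and intercepts_, since scikit-learn does not expose them directly):

```python
# Hedged sketch: project inputs into the hidden layer of a trained MLP,
# cluster the data there, then describe each cluster by the input features
# whose means deviate most from the global means.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(X, y)

# Hidden-layer representation (single hidden layer, ReLU is the sklearn default).
H = np.maximum(0, X @ mlp.coefs_[0] + mlp.intercepts_[0])

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(H)

# "Explain" each cluster by how far its feature means sit from the global means.
global_mean = X.mean(axis=0)
for c in np.unique(clusters):
    deviation = X[clusters == c].mean(axis=0) - global_mean
    top = np.argsort(-np.abs(deviation))[:2]
    print(f"cluster {c}: most distinctive input variables -> {top}, deviation {deviation[top]}")
```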
Let me give you an example:
Suppose you train an ANN to assess the risk of customers' credit applications. If the ANN indicates that a given credit application should not be approved, then an explanation of that decision is required and cannot be ignored.
I think it can be explained as follows: you enter inputs, then the ANN calculates the outputs using the weights it has learned (adjusted during training); these weights are what you should check to see how it arrived at such a conclusion, rather than expecting the outputs to tell you. Cheers.
I believe the original question was actually about the potential ability of an ANN-based problem solver to provide an explanation of its solution in terms of the application domain, rather than in terms of its own internal functionality. In the example given by Nabil in his last reply, it might sound, say, like this: "I recommend not to approve this application because this customer has a poor credit history and is only C-rated by institution XYZ". IMO, such an ability has actually nothing to do with functional principles at such a low level as those presented by ANNs. This ability clearly assumes a human-like manner of thinking, no matter on what low-level functional elements it is implemented. That might be an ANN (or a different apparatus), if corresponding higher-level functionality can be built on top of it. Judging from the similarity with the human brain, this seems quite possible in principle, though I personally am not sure that it would be optimal (efficient) in a technical sense.
We can explain the output of an Artificial Neural Network. The results of a decision by an ANN can be explained in terms of a set of rules, and the full network can be characterized as a set of rules. There is no real difference between a supervised ANN trained by backpropagation, an unsupervised ANN such as a SOM, a decision tree trained by C4.5 or J48, or an expert system handcrafted by a programmer.
To explain this, let us focus on one result, recognizing a tumour say. Here a simple dichotomous system might say yes this person has a tumour, or an expert might say, I'm sorry but I believe you have a tumour. The expert might point to and/or circle something in an image. The system might point to and/or circle something in the image. So here we have a first aspect of explanation, pointing to the specific evidence, focussing attention there and away from irrelevant aspects. The expert or expert system could make specific comments about the colour or texture, the growth versus the previous image, etc. For the knowledge-engineered expert system, these intermediate concepts or features are provided by a human expert in human understandable language. A learned system may be trained first to recognize these features, and then in a subsequent stage to learn to make the diagnosis based on these features.
An example that is at the same time harder and simpler is recognizing dogs - whether from seeing them, hearing them, feeling them, or understanding the word or a spoken description. What is simpler about it is that even a two-year-old can do all of these variants. What is harder about it is that not even an adult, or a linguist or scientist, can explain at present exactly how we do any of those things, and computers are not very good at doing any of them either. But when a computer does make a decision we can say exactly why it made that decision, although there may be a problem that we don't have English words or concepts for some of the intermediate features that were invented by the computer, any more than a child does for concepts like convexity or association that are presumed to be important to achieving the goal, but are preconscious features without attached linguistic labels - except to the extent they have been proposed by scientists or linguists who are studying these processes.
Now let's consider the decision tree or neural network. In both cases we have a network of nodes where each node has a certain set of inputs (attributes or features) and a single output (yes/no, or some kind of strength or credibility), and the inputs for certain nodes are the outputs of other nodes. For simplicity we will assume exactly two inputs per node (any other network can be boiled down to such a network).
Let us assume we have an OR-type node or a perceptron-like neuron with a low threshold. If either of the inputs fires above threshold, it will fire or assert, equivalent to an 'F if A or B' rule. On the other hand, with an AND-type node or a perceptron-like neuron with a high threshold, we require both inputs to be high in order for our output neuron to fire or be asserted, equivalent to an 'F if A and B' rule. Combined with the ability to invert inputs or outputs, or have NOT or inhibitory inputs, these nodes are also equivalent to logic circuits, and any logic function can be built: not only feedforward combinational logic, but also recurrent circuits with memory/state like flip-flops.
So given there is a flow from inputs to outputs through a number of layers of rules (and possibly memory inputs as well as current inputs), we have a sequence of rules from inputs to decision and can generate an explanation specifying each condition that led to a neuron firing or node activating. Of course we may not like the explanation, and we may find it hard to follow... And if it is self-organized, like both SOMs and MLP hidden layers, and the intermediate goals/nodes of decision trees/networks, then we won't have nice English semantic labels to apply to all of those hidden nodes unless we take the time to look at what they mean, interpret them, and add such labels.
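A minimal illustration of this point, with hand-chosen (not learned) weights and thresholds: threshold units over 0/1 inputs behave as logic gates, so a small network of them can be read directly as a chain of if-then rules.

```python
# Threshold units over 0/1 inputs act as logic gates, so a network of them can
# be read as a chain of if-then rules. Weights/thresholds are chosen by hand.
def unit(inputs, weights, threshold):
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

def OR(a, b):   return unit([a, b], [1, 1], 1)    # fires if either input fires
def AND(a, b):  return unit([a, b], [1, 1], 2)    # fires only if both fire
def NOT(a):     return unit([a], [-1], 0)         # inhibitory input

# A two-layer "network": F = (A and B) or (not C), readable as two rules.
for A in (0, 1):
    for B in (0, 1):
        for C in (0, 1):
            F = OR(AND(A, B), NOT(C))
            print(f"A={A} B={B} C={C} -> F={F}")
```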
In some instances you may be able to visualise the weights of the intermediate layers of the network, which may give insight into the types of features that are being learnt by the network. This has been done in the field of vision to show that the network is learning basic features such as edge detectors. However this of course still requires a human evaluation of the visualisation, and may not be possible at all depending on how the network was constructed.
To truly achieve what you are talking about I think what would be required would be some kind of meta-learner, which is able to inspect the network and reason about the features. I would expect that this meta-learner would have to be something other than a neural network itself - more of a logic machine acting on a separate explanation ontology.
For what it's worth, I think humans actually do something like this - we don't really have access to the internals of our own neural networks, but we do have access to sets of concepts about decision making. When we are asked to explain a decision, what we are really doing is picking the best concept from our internal knowledge about decisions that matches the evidence about the decision. This is a meta-cognitive process.
That depends on the area of knowledge and on what the neural network is used for. For example, in the identification and control of dynamic systems, neural networks can describe the temporal behavior of a system. If support vector machines and neural networks are considered, these make it possible to explain in what context, and in what region of the space, a decision was made, which sheds a lot of light to support decision making. Multidimensional neural classifiers can recognize different types of patterns according to their input features. In the case of Business Intelligence, neural models capture complex relationships between the variables in a database through adaptation and learning, which is an important tool for decision making. It is clear, however, that neural networks establish such relationships without the why, and the why matters when you do not know very well what the data mean or in what context they were obtained.
The "why" of an answer is typically given as an explanation of the steps that have led to that answer. This is valid for both humans and machines. So, in my opinion, if one can provide insight in the steps that lead to a certain output of an artificial neural network, one has an explanation of the network's decision.
You can search in the literature for rule extraction methods, of which there exist many, depending on neural network architecture and application domain.
In my Ph.D. thesis (http://purl.org/utwente/41465) I introduced a generic neural network analysis method that utilizes domain-specific basic functions that are easy to interpret by the user. In general, the analysis consists in describing the internal functionality of a trained neural network in terms of domain-specific basic functions, functions that can be considered basic in the application domain of the neural network. This helps users gain insight in the way the neural network solves their problem. The challenge for the neural network "interpreter" in a given application domain is to find suitable basic functions for that application domain.
There is a dubious argument connecting the performance of an ANN with the explanation capability of its human counterpart, the human expert. An ANN can point to some elements present in radiological images, for instance, and declare the existence of a tumor. This can be achieved using an MLP or a SOM. To get an explanation from this ANN, the system only has its rules and architecture as a prior reference for offering the required explanation, and this is quite different from what a human expert would say when faced with the same request. The human expert can get into the causal aspects beyond the occurrence of the critical elements present in the images and offer a contextual and complete causal description that we accept as an explanation. He or she can do that because of the complexity of his or her brain's neural networks, much more complex than those of a mosquito, which cannot explain why it turns right while flying. So nature and reality show us that evolution has placed the capacity for causal explanation in sufficiently complex brains. Can we achieve that level of complexity in ANNs? Hard to say. What I can say right now is that giving an explanation of a result is different from describing how you arrived at that result. Procedural knowledge is not "nuclear" or semantic knowledge.
Berend,
We have to remember that an explanation can take place at different levels of abstraction. If I try to understand how my mechanical clock provides me with the time, I can be given a thousand different ways to describe the steps, all describing the mechanism of the clock but from different perspectives. The steps of the neural net can be those at its lowest level of functionality, but they can also be described more abstractly, at higher levels of description, with different steps in those descriptions.
I commented already yesterday but it seems that the answer was not uploaded... so, to summarize: explaining the ANN in such a way that each and every neuron is presented as an if-then sentence is too bulky and hardly transparent to a human. In fact, it looks like overfitting. Humans expect an explanation in a more generalized form, with only a few (if-then) statements that can be grasped in a single view, or in relatively few steps. This approach may be compared to CART tree pruning.
A clarification: for a number of rule extraction techniques a single neuron does not necessarily represent an if-then rule. For instance, several neurons could contribute to a given rule. However, if an oblique hyperplane separates two classes it is better to extract an oblique rule, as approximating it would otherwise require a very large number of axis-parallel rules (going to infinity).
Sorry to get down to practical rather than theoretical explanations. In my papers I describe three ways to make the trained ANN model more than a "black box".
The first is the "causal index", described back in 1990 by three Japanese researchers (1), which is calculated quite simply from the connection weights and gives the qualitative influence of each feature on any of the ANN outputs (a positive CI means that an increase in this feature value will increase the output value, and vice versa).
The second is the identification of features that are not relevant to the ANN output and can be discarded, so the ANN can be retrained with the reduced feature set (2). The eliminated features are those that do not contribute significant variance to any of the hidden neurons' outputs in the trained ANN (either they do not change much in the data, or the connection weights between them and all the hidden neurons are very small).
The third is based on the hidden neurons' outputs, which tend to be "binary" in a well-trained ANN (3). All examples that generate the same binary pattern of hidden-neuron outputs belong to the same class, as they contain the same essential information.
More can be found in my publications available in my Research Gate page.
1. Baba, K., Enbutu, I., Yoda, M.: Explicit Representation of Knowledge Acquired from Plant Historical Data using Neural Network. Proc. of the Intl. Joint Conference on Neural Networks 3 (1990) 155-160
2. Boger, Z., Guterman, H.: Knowledge Extraction from Artificial Neural Networks Models. Proc. IEEE Intl. Conf. Systems Man and Cybernetics (1997) 3030-3035
3. Z. Boger, Finding patient cluster attributes using auto-associative ANN modeling. Proceedings of the International Joint Conference on Neural Networks, 2003, pt. 4, pp. 2643-8.
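As a hedged illustration of the causal-index idea (my reading of the description above, summing products of input-to-hidden and hidden-to-output weights; not necessarily the exact formula of Baba et al. 1990):

```python
# Sketch of a causal-index style computation from the connection weights of a
# trained MLP: sum over hidden units of (input->hidden weight) * (hidden->output
# weight). Positive values suggest the feature pushes the output up.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)
net = MLPRegressor(hidden_layer_sizes=(5,), max_iter=5000, random_state=0).fit(X, y)

W_ih = net.coefs_[0]          # (n_inputs, n_hidden)
W_ho = net.coefs_[1]          # (n_hidden, n_outputs)
causal_index = W_ih @ W_ho    # (n_inputs, n_outputs)

for i, ci in enumerate(causal_index.ravel()):
    direction = "increases" if ci > 0 else "decreases"
    print(f"feature {i}: causal index {ci:+.2f} (raising it {direction} the output)")
```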
I'm coming from a completely lay background, so pardon my ignorance. One question that I have is whether one could really have an ANN if one does not know all the components/variables of the biological neural network it is based on. I specifically ask this because there is a lot of neurology and cognitive psychology that we still need to learn. Or is it perhaps possible to reverse-engineer the process?
Dear Hilda! You may have already read some of the previous answers, where someone said that an ANN is just a construct that we think remotely mimics the functioning of our brain. There is no intention (so far) to replace the human brain. Instead, ANNs have proven to be a good tool for constructing working models of real-world processes.
It depends on the kind of explanation you want. Though ANNs do not provide a human-readable description the way decision trees do, they explain their decision-making process in terms of weights describing separability. However, this explanation hardly makes sense to a human, especially when dealing with non-linear data sets.
One way to explain is to use analogy. Here is a simple mathematical correspondence.
Suppose we have some data points (xi, yi) and fit them with the simplest linear regression, finding m and n in y = mx + n. If you solve the same problem with a 1-hidden-layer ANN with two hidden units, you get a similar equation but with 7 free parameters to handle: 3 biases a, b and c, and 4 weights d, e, f and g. If you use tanh in the hidden layer and a linear function in the output layer, you get a long equation. Now the trick: approximate tanh(x) as x and substitute it into the equation; you get y = f(dx+b) + g(ex+c) + a, which upon rearrangement becomes y = (df+eg)x + (a+bf+cg). So the slope m corresponds to the weight combination (df+eg), and the intercept n corresponds to (a+bf+cg), a combination of the biases and weights.
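A quick numerical check of this approximation, with arbitrary parameter values and small inputs where tanh(x) ≈ x holds:

```python
# For small inputs, a 1-input, 2-hidden-unit tanh network behaves like a line
# with slope d*f + e*g and intercept a + b*f + c*g. Parameters are arbitrary.
import numpy as np

a, b, c = 0.1, 0.05, -0.05        # biases (output, hidden 1, hidden 2)
d, e = 0.4, -0.5                  # input-to-hidden weights
f, g = 0.6, 0.7                   # hidden-to-output weights

x = np.linspace(-0.2, 0.2, 5)     # small x, where tanh(x) ~ x
y_net = f * np.tanh(d * x + b) + g * np.tanh(e * x + c) + a
y_lin = (d * f + e * g) * x + (a + b * f + c * g)

print(np.max(np.abs(y_net - y_lin)))   # small gap, confirming the linearisation
```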
ANNs are just a mathematical tool that are designed to take inputs and produce outputs ... They sound cool, since they are biologically inspired ! Also, there is the concept of TRAINING, which allows the ANNs to adjust the internal weights based on training data ...
It is the PROBLEM MODELing that you use that puts ANNs to good use. With that model, you can also train the ANN to produce outputs that can EXPLAIN the results ... In the end, it is YOU that models the inputs, and it is YOU that models what the output values mean. The machine only calculates the necessary additions, and multiplications, and the sigmoid functions ...
One example is FACE DETECTION. In this process, there is MATCH PROBABILITY and CONFIDENCE.
MATCH PROBABILITY is a number that tells you, 93% (i.e., the QUANTITY OF THE RESULT) ... But, the confidence tells you how much you can TRUST your match prediction (i.e., the QUALITY OF THE RESULT) ... In this case, you modeled your ANN to tell you "well, I am just a machine. This mathematical model in the ANN is not always perfect in predicting. I am only 80% confident " ... You can add arbitrarily more output values to EXPLAIN the output, but, eventually you are limited by the mathematical model behind the ANNs ...
Nabil writes
"Contrarily to machines, humans are able to explain their decisions."
Actually humans can't and don't explain their decisions...
But they may rationalise them!
A. Why prefer a particular colour, or mate, or route?
B. How do you recognize a face, an emotion/expression, a word or gesture?
C. What's wrong with this machine/system/person?
Generally we make decisions at lower than conscious levels, and some of these are tuned by our social/cultural/linguistic/emotional/educational background, but for the vast majority of our decisions, these are made before or without reaching conscious attention.
In terms of A above, we really have no idea, and our computer models and AIs aren't even on the page yet.
In terms of B above, the processes actually seem to be different for things you learn in infancy versus later, and computational intelligence has given us systems that do reasonably but not particularly well, and not particularly like a human. Some of these systems are based on neural networks, or similar abstractions like SVMs and Adaboost, and we can always work out exactly why a system made a decision if we choose to dig that deep.
In terms of C above, experts internalise their formal instruction, their reading of manuals, their experience of many different cases or situations, and make an intuitive decision, and the job of the knowledge engineer is to start with their basic ideas, which explain maybe the 80% of simple decisions, and case by case tease out what they are actually doing that's different and get quickly up to 90% and maybe up to 99%. This is a formalisation of the process of rationalisation. Learning systems with hidden states, or hidden layers of neurons, may be able to recover a similar process, but this is unlikely. Different experts do things different ways and it is hard to reconcile models based on a team of experts; different classifiers do things differently too; and there is a whole subfield on fusion, boosting, ensembles, etc.
In fact, whether we are talking about human committees or classifier ensembles, we don't want a committee of yes-men (or even yes-women) who think/decide the same way, and figuring out what is different in how they make their decisions is important in developing the best classifier. In fact, diversity is just as important as accuracy:
a. we'd like members that get different questions/cases wrong;
b. we'd like members that get different wrong answers when not correct.
There is also an old and perhaps forgotten form of learning called "Explanation-Based Learning" which makes use of cases and explanations to learn more like the way we expect, and so directly incorporates explanation.
Turning our ANNs or Decision Trees, or Ensembles or Forests of them, into rules is trivial, and rules can be used to give how and why explanations (see any text on knowledge engineering and any decent textbook on data mining). These rules and explanations can be used by metalevel learners for intelligent fusion (rather than dumb stacking), along with more basic information about diversity (typically measured with kappa, which may also be used to measure chance-corrected accuracy, and to ensure fusion/boosting doesn't overtrain to the noise, biases and prevalences of the data, or the humans who labelled the data).
So your question is a very deep one, and not a mere academic question, since the availability of explanation is a major area of research across the cognitive sciences as well as in artificial intelligence.
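For the decision-tree half of that claim, the extraction really is mechanical: every root-to-leaf path is an if-then rule. A minimal scikit-learn sketch (the ANN case is harder, as the next comment points out):

```python
# Print a trained decision tree as nested if-then conditions on the features.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# export_text renders each root-to-leaf path as a readable rule.
print(export_text(tree, feature_names=list(load_iris().feature_names)))
```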
Just a comment: turning ANNs into rules is not trivial! It is an NP-hard problem!
Golea, M.: On the complexity of rule extraction from neural networks and network querying. Rules and Networks (1996).
Nabil writes
"Contrarily to machines, humans are able to explain their decisions."
Actually my student and me wrote a paper in which we have shown that ANN modeling can explain people decisions better than them.
Different persons in Ben-Gurion University in Be'er Sheva, Israel were asked to provide us with e-mails they received, grade their importance to them, and give weights for the importance of each (non-trivial) word in the e-mail.
We were able to train an ANN model to correctly predict the importance of e-mails not included in the training set. The Causal Index algorithm I use, which relates the importance of each feature in the data to the ANN outcome (see my earlier contribution), did not agree with the persons' assessment of the terms' importance.
However, we were able to show that the terms people gave zero importance, and which we found to be significant, were present significantly more often in "important" e-mails than in "non-important" e-mails - the receivers of the e-mails just did not realize it.
Z. Boger, T. Kuflik, P. Shoval and B. Shapira, Automatic keyword identification by artificial neural networks compared to manual identification by users of filtering systems. Information Processing & Management, vol. 37(2), 2001, pp. 187-198.
In my opinion, the only superiority of neural networks compared to other computer programs is that they are able to learn entirely from data. As for what you do with what has been learned, humans still need to give their input. This still needs the input of the human heart to make decisions. The computer therefore does most of the brain work and you only add your rational decisions.
If the ANN is a multilayer perceptron, many studies have been done on extracting decision rules, although the process is not simple. But if you use Radial Basis Functions (with the same computational power as MLPs), rules are easy to extract, as RBF is a subset of fuzzy systems. In unsupervised ANNs, clustering and data mining are applied, as in k-means clustering.
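A sketch of why each RBF unit reads like a fuzzy rule, using a hand-rolled RBF fit (k-means centres, a fixed width and a ridge readout), not any particular toolbox:

```python
# Each RBF hidden unit is a Gaussian membership function around a centre, and
# the readout weight is the rule's consequent. Hand-rolled fit for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

centres = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X).cluster_centers_
sigma = 1.0                                            # shared width, chosen by hand
Phi = np.exp(-np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) ** 2
             / (2 * sigma ** 2))
readout = Ridge(alpha=1e-3).fit(Phi, y)

for j, (c, w) in enumerate(zip(centres, readout.coef_)):
    print(f"rule {j}: IF x is near {c[0]:+.2f} (width {sigma}) THEN output += {w:+.2f}")
```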
As we know, an ANN is a black-box learner whose output depends heavily on the input fed to it during the learning process. Therefore, applications of ANNs to prediction, classification and forecasting have been hybridized with fuzzy logic, rough sets, etc., to improve the quality of the output and to make the decisions more understandable.
@Julio: Crisp rules and fuzzy rules should be distinguished. Extracting crisp rules from an RBF network is still NP-hard. Perhaps extracting fuzzy rules is easier, but I am not so sure.
@Guido, Julio
Explaining the outputs of an ANN does not mean extracting rules. An ANN does not work like a decision tree. An MLP projects the inputs onto a manifold where the data are linearly separable. Extracting rules from a system that is not rule-based is not a piece of cake. A better way is to cluster the data on the manifold and then explain the clusters, exactly as we do for linear analysis.
Actually, the ANN is a general model; I mean that any model, such as regression, can be expressed in an ANN architecture. So such comments cannot be generalized to all ANN types. But a multilayer perceptron has a more complicated structure, and it is very difficult to determine the contribution of the explanatory variables. In a regression method, some interpretation is possible because of the linear structure, but linearity is a very strict assumption, so how much can we trust that kind of interpretation? If we want to make the MLP a white-box model, we need to obtain confidence intervals for predictions and hypothesis tests for parameters to determine the contribution of the input variables. These goals can be achieved using some statistical techniques; for example, confidence intervals can be obtained using bootstrap methods. Some papers have been published on this topic.
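A hedged sketch of that bootstrap idea: refit the network on resampled training sets and take percentiles of the resulting predictions as a rough confidence band (illustrative settings, not a published recipe):

```python
# Bootstrap prediction band for an MLP: retrain on resampled data and take
# percentiles of the predictions at a query point.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (150, 1))
y = np.sin(X[:, 0]) + 0.2 * rng.normal(size=150)
x_query = np.array([[1.0]])

preds = []
for b in range(30):                       # 30 bootstrap refits (kept small here)
    idx = rng.integers(0, len(X), len(X))
    net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=3000, random_state=b)
    net.fit(X[idx], y[idx])
    preds.append(net.predict(x_query)[0])

low, high = np.percentile(preds, [2.5, 97.5])
print(f"prediction at x=1.0: {np.mean(preds):.2f}, ~95% bootstrap interval [{low:.2f}, {high:.2f}]")
```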
Some thoughts are in my work: Y. U. Osipov, "The Recurrent Neural Network with Two Signal System", Information and Control Systems (Informatsionno-upravliaiushchie sistemy), 4(65)/2013 (in Russian).
An ANN is black-box learning... but the ANN can be hybridized with fuzzy logic to make its outputs and decisions interpretable by human experts.
I'm new to this post. So please pardon me if I happen to echo an earlier post.
I suspect one could architect an ANN to make it interpretable, but I fear a loss of generality. Such an approach might devolve into fitting coefficients to an equation imposed by the architect.
I have the idea that incorporating fuzzy may be the most tractable approach to get interpretation while avoiding the worst aspects of simple curve fitting. I've considered working with a fuzzy reformulation by Hemanta Baruah that respects excluded middle and non-contradiction. Such an approach might look more like probabilistic networks and avoid some of the heroics that we sometimes see with classical fuzzy when we don't want membership to sum above one at any point. Unfortunately, I've too many other projects to pursue my thoughts right now -- so I might even be writing gibberish.
ANN systems can be pictured as a kind of brain. For example: clouds and winter give rise to rain. Here the inputs lead to the output, but the brain does not explain it. This means that ANNs can reflect this in the digital domain...
Hi Belgasmi,
Please check the attached publication, where I discuss an interesting approach to open black-box models, including neural networks:
P. Cortez and M.J. Embrechts. Using Sensitivity Analysis and Visualization Techniques to Open Black Box Data Mining Models. In Information Sciences, Elsevier, 225:1-17, March 2013, ISSN 0020-0255.
http://dx.doi.org/10.1016/j.ins.2012.10.039
http://www3.dsi.uminho.pt/pcortez/nsensitivity2.pdf (pre-press pdf)
Regards,
Paulo Cortez
Using an analysis of the ANN weights you can analyze the sensitivity of the outputs to the inputs,
and also with the help of a fuzzy system integrated with an ANN, such as ANFIS.
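A simple one-at-a-time sensitivity sketch in the spirit of these comments (not the exact method of the Cortez & Embrechts paper above): perturb each input in turn and see how much the trained network's output moves.

```python
# One-at-a-time input sensitivity: shift each standardized feature by one
# standard deviation and measure the mean absolute change in the output.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)
net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0).fit(X, y)

baseline = net.predict(X)
for i in range(X.shape[1]):
    X_pert = X.copy()
    X_pert[:, i] += 1.0                    # shift feature i by one standard deviation
    effect = np.mean(np.abs(net.predict(X_pert) - baseline))
    print(f"feature {i}: mean absolute effect on output {effect:.2f}")
```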
You can also try selective attention artificial neural networks (e.g. Sigma-if), which automatically select the most important features for classification, both during training and during operation. Sigma-if will give you a feature subset for each training vector, and you can always see which inputs were used to produce the result. The computation time on a sequential machine is smaller in comparison with ANNs without selective attention, as Sigma-if automatically uses (or skips) the connections associated with features that are important (or unimportant) for the given data vector. Selective attention - as in humans, for example - reduces computation cost and time while increasing classification accuracy.
As with most aspects of human reasoning under uncertainty, the answer is both yes and no, to varying degrees. It is an established technique in some applications of fault isolation, identification and classification to combine fuzzy logic and neural network methodologies to create what is referred to as a neurofuzzy system, combining a data-driven approach with the numerical representation of subject experts so as to produce explanations as outcomes of the system. A case in point was RSL Ltd.'s health usage and monitoring systems for gas turbines (Ellen Applbaum).
In that paper the fuzzy classifier prototype was created. Eventually my recommendation to go the way of a neurofuzzy classifier was adopted in the implementation of a product. United Technologies used a neural network for this approach, which clearly indicated to me how the two methodologies could be combined.
Right now, I see this in the scope of medical imaging applications -- in ultrasound. I will try to put on my project site the work our team has done in the ECE510 course this semester on estimation of plaque motion, which involves both a statistical and a subjective evaluation of results.
One of the most important things in working with neural networks is their internal structure, which in the case of support vector machines is known as the kernel. For this simple reason, neural networks that work in infinite-dimensional spaces do not give any further explanation of the solution of the problem. In the case of radial basis neural networks, however, the situation is different, because this type of network works in finite spaces, which in many cases gives an explanation of the behavior of the variables, while the output weights show you the level of significance the network assigns to each generated subspace. However, neural networks are not an expert system, and the knowledge they shed must go hand in hand with the context of the problem that the user wants to solve.
Restricted Boltzmann Machines (RBMs) are a type of neural network where the receptive field of each hidden unit can be plotted. This allows for a certain interpretability. Here you can find a plot as an example: http://scikit-learn.org/stable/auto_examples/plot_rbm_logistic_classification.html
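A simplified sketch along the lines of that scikit-learn example: fit a BernoulliRBM on scaled digit images and look at each row of components_ as a receptive field (the parameter values here are illustrative).

```python
# Plot each hidden unit's receptive field of an RBM trained on 8x8 digit images.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM

X = load_digits().data
X = (X - X.min()) / (X.max() - X.min())          # scale to [0, 1] for the RBM

rbm = BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20, random_state=0)
rbm.fit(X)

fig, axes = plt.subplots(4, 4, figsize=(4, 4))
for component, ax in zip(rbm.components_, axes.ravel()):
    ax.imshow(component.reshape(8, 8), cmap="gray")  # receptive field of one hidden unit
    ax.set_xticks(())
    ax.set_yticks(())
plt.show()
```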
Bekir,
Apparently this subject is very interesting to the ANN community, as this discussion has been active since September, and thus it is hard to see all of it.
I am retyping my answer from some weeks ago. Although it was meant to show that a supervised ANN is not a black box, my third method also applies to auto-associative ANNs, as the automatic partitioning of the examples based on the "binary" hidden-neuron outputs can identify the samples which have the same essential information and thus belong to the same cluster. Once those clusters are identified, dividing the mean of each feature in a cluster by the mean of that feature in the full data reveals that the features with ratios different from unity are those responsible for it being a separate cluster.
However, this requires using a small number of hidden neurons (I found that 5 is the most useful number).
I saw your paper on ANNs and diabetes, and will be grateful if you can e-mail me (to [email protected]) your data, as I would like to train it and find the relationships between each feature and the outcome using the Causal Index described below.
Sorry to get down to practical rather than theoretical explanations. In my papers I describe three ways to make the trained ANN model more than a "black box".
The first is the "causal index", described back in 1990 by three Japanese researchers (1), which is calculated quite simply from the connection weights and gives the qualitative influence of each feature on any of the ANN outputs (a positive CI means that an increase in this feature value will increase the output value, and vice versa).
The second is the identification of features that are not relevant to the ANN output and can be discarded, so the ANN can be retrained with the reduced feature set (2). The eliminated features are those that do not contribute significant variance to any of the hidden neurons' outputs in the trained ANN (either they do not change much in the data, or the connection weights between them and all the hidden neurons are very small).
The third is based on the hidden neurons' outputs, which tend to be "binary" in a well-trained ANN (3). All examples that generate the same binary pattern of hidden-neuron outputs belong to the same class, as they contain the same essential information.
More can be found in my publications available in my Research Gate page.
1. Baba, K., Enbutu, I., Yoda, M.: Explicit Representation of Knowledge Acquired from Plant Historical Data using Neural Network. Proc. of the Intl. Joint Conference on Neural Networks 3 (1990) 155-160
2. Boger, Z., Guterman, H.: Knowledge Extraction from Artificial Neural Networks Models. Proc. IEEE Intl. Conf. Systems Man and Cybernetics (1997) 3030-3035
3. Z. Boger, Finding patient cluster attributes using auto-associative ANN modeling. Proceedings of the International Joint Conference on Neural Networks, 2003, pt. 4, pp. 2643-8.