Can you explain what you are trying to do? What does your FAM do? Is FAM a very popular method/algorithm/mathematical law in your engineering field? Could you accept something other than FAM that does exactly, or nearly, the same thing? This is the 3rd time you have posted about FAM. We want to help you meaningfully, not irresponsibly hand you a pile of links and ask you to search blindly through material you don't yet understand.
Links no longer solve your problem at this point. Engineering knowledge may solve your problem.
Yew-Chung Chak Hi, there are multiple input vectors, and for each input vector there is a corresponding output vector (each input vector together with its corresponding output vector represents a case). I want to use FAM for pattern recognition. I know there are other methods for pattern recognition, but I want to examine FAM.
Thanks, but I cannot gather sufficient info from your answers to the questions I specifically asked. I'm trying to "interpolate" your answers as follows:
Are you implying there are N Single-Input Single-Output (SISO) "systems"? If so, then you should be able to produce N single-output Sugeno fuzzy inference systems (FIS) using neural nets to train the input/output data.
If each FIS satisfactorily delivers the best fit to a series of input/output data points, can you store the N FIS pattern pairs in the FAM?
If two functions are given, exp(-x^2) and tanh(x), can you use FAM to recognize the sigmoidal pattern? I want to find out whether or not you can solve this simple problem.
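For concreteness, here is a minimal sketch of how that test could be set up: sample both functions on a grid and rescale them into [0,1], so that each curve becomes a finite fuzzy set that a FAM could store. The grid and the min-max rescaling are my own assumptions, not part of the question.

```python
# A minimal sketch (not the poster's method): sample exp(-x^2) and tanh(x)
# on a grid and rescale each to [0, 1], so the two curves can be treated as
# finite fuzzy sets, i.e. candidate stimulus vectors for a FAM experiment.
# The grid [-3, 3] and the min-max rescaling are assumptions made here.
import numpy as np

x = np.linspace(-3.0, 3.0, 61)           # assumed sampling grid
gaussian = np.exp(-x**2)                  # bell-shaped pattern
sigmoidal = np.tanh(x)                    # sigmoidal pattern

def to_fuzzy(v):
    """Min-max rescale a sampled curve into [0, 1]."""
    return (v - v.min()) / (v.max() - v.min())

patterns = {"gaussian": to_fuzzy(gaussian), "sigmoidal": to_fuzzy(sigmoidal)}
# Each entry of `patterns` is now a vector in [0, 1]^61 that a FAM could store.
```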
This link will provide you with access to a thirty-five page scientific research paper on " . . . Fuzzy Morphological Memories . . . .":
https://arxiv.org/pdf/1902.04144.pdf
"Abstract
"MaxC and minD projection autoassociative fuzzy morphological memories (maxC and minD PAFMMs) are two layer feedforward fuzzy morphological neural networks able to implement an associative memory designed for the storage and retrieval of finite fuzzy sets or vectors on a hypercube. In this paper we address the main features of these auto-associative memories, which include unlimited absolute storage capacity, fast retrieval of stored items, few spurious memories, and an excellent tolerance to either dilative noise or erosive noise. Particular attention is given to the so-called PAFMM of Zadeh which, besides performing no floating-point operations, exhibit the largest noise tolerance among max C and min D PAFMMs. Computational experiments reveal that Zadeh’s max C PFAMM, combined with a noise masking strategy, yields a fast and robust classifier with strong potential for face recognition."
Here is an excerpt which focuses on input and output:
"The user can estimate FAM rules without counting the quantization vectors in each FAM cell. There may be too many FAM cells to search at each estimation iteration. The user never need examine FAM cells. Instead the user checks the synaptic vector components rnij. The user defines in advance fuzzy-set intervals, such as [1NC, UNL] for NL. If bvL < rnij < UNL, then the FAM-antecedent reads "IF X is NL." Suppose the input and output spaces X and Y are the same, the real interval [-35, 35]. Suppose we partition X and Y into the same seven disjoint fuzzy sets: NL = [-35,-25] NM = [-25,-151 NS = [-15,-5] ZE = [-5, 5] PS = [5, 15] PM = [15, 25] PL = [25, 35] Then the observed synaptic vector mj = [9, -10] increases the count of FAM cell PS × NS and increases the weight of FAM rule "IF X is PS, THEN Y is NS." This amounts to nearest-neighbor classification of synaptic quantization vectors. We assign quantization vector mk to FAM cell Fij iff mk is closer to the centroid of Fij than to all other FAM-cell centrolds. We break ties a ...."
Fuzzy associative memories: A design through fuzzy clustering
C. Zhong, W. Pedrycz, +2 authors, Lina Li
Neurocomputing, published 15 January 2016
In this study, we discuss a design of fuzzy associative structures (memories) realized within the framework of fuzzy clustering. Associative memories are inherently direction-free structures (and the recall of objects can be realized for any variable or a subset of variables). Fuzzy clustering, being direction-free, comes here as a sound design alternative. Two recall proposals are studied: one involves prototypes (being the key descriptors of the structure of the data) and their activation in…
7 Citations
Fuzzy associative memories with autoencoding mechanisms
Lina Li, W. Pedrycz, T. Qu, Zhiwu Li
Knowledge-Based Systems, 2020
A logic-driven model of two-level fuzzy associative memories augmented by autoencoding processing is developed, composed of two functional modules that help achieve storing and completing the recall realized by a logic-oriented associative memory.
Development of associative memories with transformed data
Lina Li, W. Pedrycz, Zhiwu Li
Applied Soft Computing, 2017
This study is concerned with enhancing the performance of associative memories through nonlinear transformations of the spaces of data to be associated, so as to improve the recall abilities of the memories.
Bidirectional and multidirectional associative memories as models in linkage analysis in data analytics: Conceptual and algorithmic developments
A. Pedrycz
Knowledge-Based Systems, 2018
This study revisits and augments the concept of associative memories by offering new conceptual design insights, where the corresponding mappings are realized on the basis of a related collection of landmarks (prototypes) over which an associative mapping becomes spanned.
Pattern classification using smallest normalized difference associative memory
Rogelio Ramírez-Rubio, M. Aldape-Pérez, C. Yáñez-Márquez, I. López-Yáñez, O. C. Nieto
Pattern Recognition Letters, 2017
The proposed algorithm overcomes the limitations of the original Alpha-Beta associative memory while maintaining the fundamental set recalling capacity, and achieves the best classification accuracy averaged over all the datasets addressed in that work.
Design of granular interval-valued information granules with the use of the principle of justifiable granularity and their applications to system modeling of higher type
D. Wang, W. Pedrycz, Zhiwu Li
Soft Computing, 2016
This study engages the principle of justifiable granularity as a way of forming type-1 and type-2 information granules (granular interval-valued information granules), whose descriptors are intervals themselves rather than numeric entities.
Traceability of Information Routing Based on Fuzzy Associative Memory Modelling in Fisheries Supply Chain
T. Djatna, Aditia Ginantaka
International Journal of Fuzzy Systems, 2020
Modelling of routing and handling-time prediction using a fuzzy associative memory (FAM) method is presented; the FAM formulation yields 27 rules, and the computational experiment shows that the total handling time for this case is 66 h with low error rates.
Moisture prediction of sweet potato-quinoa-kiwicha flakes dried by rotary drum dryer using artificial intelligence
V. Vasquez-Villalobos, Orlando Hernández-Bracamonte, Julio Rojas-Naccha, Viviano Ninaquispe-Zare, C. Rojas-Padilla, Julia Vásquez-Angulo
2018
In this research the use of fuzzy logic among the variables enabled the best prediction adjustment of the experimental values, and it is recommended to integrate RSM and GA into optimization studies.
References
Showing 1-10 of 47 references:
A general framework for fuzzy morphological associative memories
M. E. Valle, P. Sussner
Fuzzy Sets and Systems, 2008
It is shown that many well-known FAM models fit within this framework and can therefore be classified as FMAMs, and certain concepts of duality defined in the general theory of mathematical morphology are employed to derive a large class of strategies for learning and recall in FMAMs.
Implicative Fuzzy Associative Memories
P. Sussner, M. E. Valle
IEEE Transactions on Fuzzy Systems, 2006
This paper introduces implicative fuzzy associative memories (IFAMs), a class of associative neural memories based on fuzzy set theory, and presents a series of results for autoassociative models, including one-pass convergence, unlimited storage capacity, and tolerance with respect to eroded patterns.
Storage and recall capabilities of fuzzy morphological associative memories with adjunction-based learning
M. E. Valle, P. Sussner
Neural Networks, 2011
The recall phase of fuzzy morphological associative memories is characterized, and several theorems concerning the storage capacity, noise tolerance, fixed points, and convergence of auto-associative FMAMs are proved.
The Kosko Subsethood Fuzzy Associative Memory (KS-FAM): Mathematical Background and Applications in Computer Vision
P. Sussner, E. Esmi, I. Villaverde, M. Graña
Journal of Mathematical Imaging and Vision, 2011
This paper proves several theorems concerning the conditions of perfect recall, the absolute storage capacity, and the output patterns produced by the KS-FAM, and proposes a normalization strategy for the training and recall phases of the KS-FAM.
Fuzzy Associative Memories and Their Relationship to Mathematical Morphology
P. Sussner, M. E. Valle
2008
Kosko's FAM suffers from an extremely low storage capacity of one rule per FAM matrix, which limits its applications to problems such as backing up a truck and trailer, target tracking, and voice cell control in ATM networks.
On fuzzy associative memory with multiple-rule storage capacity
The inherent property for storing multiple rules in a FAM matrix is identified, and a theorem for perfect recall of all the stored rules is established, based upon which the hardware and computation requirements of the FAM model can be reduced significantly.
Correlation Matrix Memories
T. Kohonen
IEEE Transactions on Computers, 1972
A new model for associative memory, based on a correlation matrix, is suggested, in which any part of the memorized information can be used as a key and the memories are selective with respect to accumulated data.
Interval-valued fuzzy associative memories based on representable conjunctions with applications in prediction
P. Sussner, T. Schuster
2013 Joint IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS), 2013
Interval-valued FMAMs (IV-FMAMs) are defined by means of representable conjunctions of interval-valued fuzzy set theory and applied to a problem of monthly streamflow prediction for a large Brazilian hydro-electric plant.
Observations on morphological associative memories and the kernel method
P. Sussner
Neurocomputing, 2000
This paper establishes proofs for all claims made about the choice of kernel vectors and perfect recall in kernel-method applications, and provides arguments for the success of both approaches beyond the experimental results presented up to this point.
Fuzzy Associative Conjuncted Maps Network
Hanlin Goh, Joo-Hwee Lim, Hiok Chai Quek
IEEE Transactions on Neural Networks, 2009
The fuzzy associative conjuncted maps (FASCOM) network is a fuzzy neural network that associates data of nonlinearly related inputs and outputs; the paper demonstrates its suitability for analysis and prediction in real-world applications such as traffic density prediction.
Related Papers
Reinforced rule-based fuzzy models: Design and analysis
Eun-Hu Kim, Sung-Kwun Oh, W. Pedrycz
Knowledge-Based Systems, 2017
The objective of this study is to develop a new design methodology of constructing incremental fuzzy rules formed through fuzzy clustering with the aid of context-based Fuzzy C-Means (C-FCM) clustering.
Fuzzy clustering with nonlinearly transformed data
Xiubin Zhu, W. Pedrycz, Zhiwu Li
Applied Soft Computing, 2017
Particle Swarm Optimization is used to determine the optimal transformation realized on the basis of a certain performance index, and the proposed fuzzy clustering method achieves better performance in comparison with the outcomes produced by the generic version of the FCM algorithm.
Fuzzy model generation using Subtractive and Fuzzy C-Means clustering
L. Goyal, Mamta Mittal, Jasleen K. Sethi
CSI Transactions on ICT, 2016
Experiments have been performed on real datasets to compare Subtractive and FCM clustering, and the effect of an increase in the radius size is analyzed for Subtractive clustering.
Deep Networks as Associative Nets
Tomaso Poggio
Abstract
About fifty years ago, holography was proposed as a model of associative memory. Associative memories with similar properties were soon after implemented as simple networks of threshold neurons by Willshaw and Longuet-Higgins. In these pages I will show that today's deep nets are an incremental improvement of the original associative networks. Thinking about deep learning in terms of associative networks provides a more realistic and sober perspective on the promises of deep learning and on its role in eventually understanding human intelligence. As a bonus, this discussion also uncovers connections with several interesting topics in applied math: random features, random projections, neural ensembles, randomized kernels, memory and generalization, vector quantization and hierarchical vector quantization, random vectors and orthogonal basis, NTK and radial kernels. This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
1 Introduction
The plan of this brief note is to show that today's deep nets can be regarded as refurbishing the old networks proposed fifty years ago as associative memories, with properties similar to holography. After this first part, I will discuss a number of intriguing relations between random features, random projections, neural ensembles, randomized kernels, memory and generalization, vector quantization and hierarchical vector quantization, random vectors and orthogonal basis, NTK and radial kernels. The third and final part of this note discusses briefly the role that associative, recurrent, and deep networks may play in our attempts to understand human intelligence.
2 From associative nets to deep nets
2.1 Willshaw Nets
Holograms store information in the form of an optical interference pattern recorded in a photosensitive optical material. Light from a single laser beam illuminates a noise-like reference image (originally produced from ground glass) as well as the pattern to be stored, producing an interference pattern stored in the hologram. Many thousands of such pairs of associations can be recorded on a single hologram. Each stored datum can then be read out from the hologram by using its associated reference pattern as input.
[Figure 1: An original figure from Willshaw et al. [1] showing an associative memory network. The matrix of connections corresponds to the matrix W of weights in a shallow network; see text.]
The basic associative memory A_{X,Y} can be modeled as a one-layer "shallow" network [1] storing the correlation matrix between input and output. Figure 1 shows the training phase of the network. In the read-out phase, the output y can be retrieved by inputting the associated x to the network, that is, by computing A_{X,Y} ∘ x (Willshaw computed R ∘ A_{X,Y} ∘ x, where R represents a set of thresholds on the outputs to improve retrieval accuracy in an otherwise linear network).
The basic idea is as follows. Suppose that we want to associate each pattern y_n, n = 1, ..., N, with a noise-like key vector x_n, where x, y ∈ R^D and N < D. The noise-like assumption on the x_n is equivalent to assuming that X X^T ≈ I, where X is the matrix of all the inputs (the x_n are the columns of X). The optimal least-squares solution of the equation A X = Y is A = Y X^T (X X^T)^{-1}, which under this assumption reduces to A = Y X^T.
Thus, if I want to retrieve y_i, I input the key x_i to the network and get A x_i ≈ Σ_j y_j δ_{i,j} = y_i. Of course, in dealing with binary vectors, this linear associative network can be improved by using thresholds to clean up the output, as Willshaw did.
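A small numerical sketch of this storage-and-recall scheme (my own illustration, not from the quoted note; random Gaussian keys stand in for the noise-like reference vectors, and the dimensions are arbitrary):

```python
# Minimal sketch of the linear associative memory described above: N patterns
# y_n are stored against quasi-orthogonal noise-like keys x_n via A = Y X^T,
# and recalled by a single matrix multiplication A x_i ~ y_i.
import numpy as np

rng = np.random.default_rng(0)
D, N = 2048, 10                                # key dimension and number of pairs (N < D)
X = rng.normal(size=(D, N))
X /= np.linalg.norm(X, axis=0)                 # unit-norm keys, so x_i . x_j ~ delta_ij
Y = rng.normal(size=(50, N))                   # arbitrary patterns to store (as columns)

A = Y @ X.T                                    # storage: correlation (outer-product) matrix

recalled = A @ X[:, 3]                         # read-out with key x_3
err = np.linalg.norm(recalled - Y[:, 3]) / np.linalg.norm(Y[:, 3])
print(f"relative recall error: {err:.3f}")     # small because the keys are quasi-orthogonal (D >> N)
```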
In any case, this is still a one-layer network, quite different from modern multilayer networks. It turns out that Willshaw experimented "de facto" with multilayer networks when he found that a recurrent version of his one-layer network performed quite well.
As he reported, "...it was found by computer simulation...that the initial response to a given cue could be improved by feeding the output back into the associative net and continuing until the sequence of outputs so generated converged onto a single pattern...". Furthermore, "The same 'cleaning-up behavior' was seen when patterns were stored in sequence. Pattern A was associated with B, B with C, C with D, and so on, the last pattern being stored with A.
When a fragment of A was used as a cue and then the output used as the next input, after a few passes the sequence of retrieved patterns converged onto the stored sequence, even when the initial cue was a very poor representation of one of the stored patterns.
Simulation experiments were performed to see what cycle of outputs would result from any arbitrarily selected cue. (Because each input determines the next output and there is only a finite number of possible outputs, the sequence of outputs must eventually lead into a cycle.)..." It is quite easy to see what is going on.
Using the algorithm above, one can associate x_1 to x_2 and x_2 to x_3 and so on, performing the kind of cyclic retrieval described in the second paragraph above.
Of course, a recurrent network is just a multilayer network with shared weights across different layers [2]. Thus, fifty years ago we already had the idea, and the implementation, of single-layer as well as recurrent associative networks!
2.2 Shallow, deep and recurrent networks
Is there something more that we can say about associative nets? The following is a simple additional observation about depth. As we already mentioned, the optimal least-squares solution of A X = Y is A = Y X† = Y X^T (X X^T)^{-1}. This suggests (among other possibilities) a 2-layer network with
W_1 = (X X^T)^{-1} and a read-out layer W_2 = Y X^T.   (1)
[Figure 2: Setting K = I − X X^T allows the recurrent network, as well as its unrolled deep-network counterpart, to compute (X X^T)^{-1}.]
Interestingly, the computation of W_1 = (X X^T)^{-1} can be performed by a recurrent network. Assume that the weight matrices of the recurrent network are set to
W_i = I − X X^T for all i = 1, ..., L − 1,   (2)
with the last read-out layer set to W_L = Y X^T. Since division by an operator can be approximated by its power expansion, that is, (I − K)^{-1} = I + K + K^2 + ..., a recurrent network as shown in Figure 2 computes (I − K)^{-1}. If K = I − X X^T, the recurrent network computes (X X^T)^{-1}.
Alternatively, the recurrent network can be replaced by a deep residual network (ResNet) of L − 1 layers with the same K (see Figure 2 and [3, 2]). Convergence requires the condition ||X X^T − I|| < 1, which is usually satisfied if the weight matrices are normalized (for instance by batch normalization). Estimates of retrieval errors in such associative memories, and ways to reduce them by using thresholds, are given in [1, 4]. Thus training a recurrent network under the square loss on a training set (X, Y), by unrolling it into L layers and imposing shared weights for the first L − 1 layers, should converge to the quasi-optimal solution suggested by Equations 1 and 2.
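A quick numerical sketch of the power-series claim above (my own illustration; to make X X^T invertible and ||I − X X^T|| < 1, this sketch uses more samples than dimensions and rescales X, which is a choice made here rather than something taken from the quoted note):

```python
# Sketch: summing the power series I + K + K^2 + ..., with K = I - X X^T,
# approximates (X X^T)^{-1}, which is what each iteration of the unrolled
# recurrent/residual network accumulates. Sizes and scaling are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
D, N = 8, 64                                   # N > D so that X X^T is invertible
X = rng.normal(size=(D, N)) / np.sqrt(N)       # scaled so the spectral radius of K is < 1
K = np.eye(D) - X @ X.T

approx = np.zeros((D, D))
term = np.eye(D)
for _ in range(200):                           # plays the role of the L-1 shared layers
    approx += term                             # accumulates I + K + K^2 + ...
    term = term @ K

exact = np.linalg.inv(X @ X.T)
print(np.max(np.abs(approx - exact)))          # tiny, since ||K|| < 1 here
```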
So far I have described linear networks.
The RELU nonlinearity after unit summation can be added as follows. Let us assume a deep network written as
f(x) = V_L σ(V_{L−1} · · · σ(V_1 x)),   (3)
where σ(x) = σ'(x) x, which captures the homogeneity property of the RELU activation. The equation can be rewritten for each training example as
f(x_j) = V_L D_{L−1}(x_j) V_{L−1} · · · V_{k+1} D_k(x_j) V_k · · · D_1(x_j) V_1 x_j,   (4)
where D_k(x_j) is a diagonal matrix with 0 and 1 entries depending on whether the corresponding RELU is active or not for the specific input x_j, that is, D_{k−1}(x_j) = diag[σ'(N_k(x_j))] with N_k(x_j) the input to layer k. The presence of the D(x) matrices makes the networks much more powerful in terms of approximating any continuous function instead of just linear functions. It also affects the linear analysis described earlier.
Remarks
• The convergence of a recurrent network for L → ∞, where L is the number of iterations, is guaranteed by Brouwer's fixed point theorem if the operator Tz = Wz is non-expansive, that is, if ||Tx − Ty|| ≤ ||x − y||. The fact that the operator corresponding to the transformation of each layer of the network is non-expanding follows from ||Wz|| ≤ ||W|| ||z||, assuming that ||W|| = 1 because of batch normalization (BN) (see [5] for the importance of BN). Notice that this holds for linear networks but also for networks with RELU nonlinearities. If the inputs x satisfy ||x|| ≤ 1, the set of fixed points of T contains a unique minimum-norm element (see [6]).
• Deep networks with L − 1 layers of identical input and output dimensionality and shared weights across layers are equivalent to a one-layer recurrent network run for L − 1 iterations. Empirically it seems [2] that non-shared weights give only a small advantage despite the much larger number of parameters with respect to equivalent shared-weight networks. From this perspective, multiple layers may be required only to exploit the blessing of compositionality [7, 8]. In other words, depth's main purpose may be to allow pooling at certain stages (even just by subsampling).
• Consider, instead of W_{i,j} = (X X^T)_{i,j}, the choice
W_{i,j} = K(x_i, x_j) = Σ_ℓ λ_ℓ φ_ℓ(x_i) φ_ℓ(x_j) = Φ(x_i) Φ^T(x_j),   (5)
where the (infinite) column vector Φ(x) has components λ_ℓ^{1/2} φ_ℓ(x) and the λ_ℓ are the eigenvalues of the integral operator associated with K. A shift-invariant kernel such as the Gaussian kernel has φ_ℓ(x) which are orthonormal Fourier eigenfunctions. It can be approximated by random Fourier features e^{−iωx} with ω drawn from a Gaussian distribution [9].
• The "holographic" scheme of using a "noise-like" key vector associated with a signal is almost exactly the algorithm used in the spread-spectrum CDMA techniques used to encode and decode cell-phone communications.
3 Discussion
We have described how deep and recurrent networks can be regarded as stacked associative one-layer networks of the Willshaw type. This perspective is interesting for two main reasons. First, it connects deep networks with several classical ideas such as random quasi-orthogonal bases, kernels, randomized RKHS features, the key role of normalization, and compositionality. Second, if deep networks are "just" associative memories, what is their role in explaining intelligence? In other words: is associative memory a key part of human intelligence?
3.1 Connections between deep learning and signal processing
• The old associative networks assumed noise-like inputs that are approximately orthogonal (as in the original concept of holography implementing an associative memory), that is, x_i^T x_j ≈ δ_{i,j}. A recent analysis [5] of deep networks trained under the square loss identifies a bias towards orthogonality induced by normalization techniques such as batch normalization. Quasi-orthogonality makes it easy to invert a deep network, as is required in an autoencoder. Notions related to random projections and the Johnson-Lindenstrauss lemma may also be relevant.
• I did not say much about convolutional networks.
The architecture of convolutional networks reflects a specific type of Directed Acyclic Graph (DAG). It turns out that all functions of several variables can be decomposed according to one or more DAGs as compositional functions, that is, functions of functions [8]. Often such decompositions satisfy a hierarchical locality condition: even if the dimension of the overall function is arbitrarily high, the constituent functions are of small, bounded dimensionality. For these functions and these decompositions, approximation theory proves [8] that deep networks reflecting the underlying compositional DAG can avoid the curse of dimensionality, whereas shallow networks cannot. Convolutional networks are an example of this (locality of the kernel, rather than weight sharing, is the key property in avoiding exponential complexity). Not accidentally, convolutional networks represent one of the main success stories of deep learning. Thus the main reason for deep networks, as opposed to shallow, recurrent networks, may in fact be to escape the curse of dimensionality by exploiting compositionality: this requires what we called earlier "pooling", that is, stages at which the outputs of constituent functions undergo aggregation, as in Figure 3.
• Compositional architectures can be regarded as reflecting iterated functional relations of the kind "compose parts", as in f(x1, x2, x3) = f1(f2(x1, x2), f3(x3)), where f1 reflects the composition of f2 and f3, and f2 composes x1 and x2. A deep associative network of this type is then closely related to what is called "hierarchical vector quantization (VQ)" [10].
The similarity is especially strong if we assume weight matrices that are derived from RBF kernels. This corresponds to memorizing, at the lowest level, the association of basic features and then the association of their associations (think of hierarchical JPEG encoding)¹.
[Figure 3: The graph of a function of eight variables with constituent functions of dimension two.]
• The claim that deep networks are quite similar to "linear" RBF networks is supported by recent results [11] on the Neural Tangent Kernel (NTK).
It turns out that under certain training conditions (e.g., starting with "largish" norms for the weight matrices) a deep network converges to a set of weight matrices that corresponds to a standard kernel machine with the NTK kernel. Furthermore, classification performance is quite good, though not the best possible, and the NTK itself is equivalent [12] to a classical RBF kernel, the Laplacian.
• An alternative to deep networks as models of the brain is neural assemblies. The idea has received new life from some recent, very interesting work [13].
The obvious question is about connections between neural assemblies and associative memories.
3.2 Is human intelligence “just” associative memory?
Thirty years ago I wrote a paper [14] proposing "that much information processing in the brain is performed through modules that are similar to enhanced look-up tables". I had in mind associative memories and implementations such as RBF networks (see Equation 5)²: for instance, for a Gaussian kernel, increasing σ changes the network from a look-up-table kind of memory, which recognizes only the training data, into a "learning" system that generalizes beyond the training data. Willshaw only looked at his network as a memory. It was better than a pure look-up table, since it could work well with noisy or partial inputs, but its function was to memorize and retrieve.
¹ Starting from a small number of primitive features, there is a hierarchy of more complex features, each one being an association of simple features. If the simple features are stored, then only some of the more complex ones (only the ones which are used) need to be stored as associations. This is similar to a dictionary storing only some of the infinite number of words that may be created from a finite alphabet of letters.
² RBF networks are usually thought of as Gaussian units computing e^{−(x−x_i)²/σ²}, where x_i is the "center" of unit i; in Equation 5 the network reflects the dual form of an RBF network in terms of the Fourier features of the Gaussian.
The machine learning and neural network community has looked only at generalization beyond the training data. In fact, the boundary between associative networks (shallow or deep) and learning networks is very thin, since the underlying machinery is very much the same and the difference is just in parameter values.
This was the reason I wrote that the idea of intelligence grounded on associative memory "suggests some possibly interesting ideas about the evolution of intelligence... There is a duality between computation and memory. ... Given that the brain probably has a prodigious amount of memory ... is it possible that part of intelligence may be built from a set of interpolating look-up tables? One advantage of this point of view is that it makes it perhaps easier to understand how intelligence may have evolved from simple associative reflexes...".
Clearly human intelligence is not just associative memory. Because of the previous discussion, this also means that intelligence is not just deep learning. It is possible, however, that the intelligence of a dog may be explainable in terms of associative memory modules or, equivalently, deep or shallow networks. It is also very likely that human intelligence evolved from associative memories and that associative networks are still an important part of how we think, from visual and speech recognition to Kahneman's System One, which is fast, intuitive, and emotional, whereas System Two is slower, more deliberative, and more logical. The question then is: how did logic- and language-based thinking evolve from associative memories? What are the differences in the circuits underlying them with respect to associative networks? I regard this as the core question in our quest to understand human intelligence and replicate it in machines.
Acknowledgments
I am grateful to Andrzej Banbuski for finding a bug in Section 2.2 and to Arturo Deza for useful comments.
This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216, and in part by C-BRIC, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.
References
[1] D. J. Willshaw, O. P. Buneman, and H. C. Longuet-Higgins. Non-holographic associative memory. Nature, 222(5197):960–962, 1969.
[2] Q. Liao and T. Poggio. Bridging the gap between residual learning, recurrent neural networks and visual cortex. Center for Brains, Minds and Machines (CBMM) Memo No. 47, also in arXiv, 2016.
[3] T. Poggio and W. Reichardt. On the representation of multi-input systems: Computational properties of polynomial algorithms. Biological Cybernetics, 37(3):167–186, 1980.
[4] G Palm. On associative memory. Biological Cybernetics, 36:19–31, 1980.
[5] T. Poggio and Q. Liao. Generalization in deep network classifiers trained with the square loss. CBMM Memo No. 112, 2019.
[6] Paulo Jorge S. G. Ferreira. The existence and uniqueness of the minimum norm solution to certain linear and nonlinear problems. Signal Processing, 55:137–139, 1996.
[7] T. Poggio, H. Mhaskar, L. Rosasco, B. Miranda, and Q. Liao. Theory I: Why and when can deep - but not shallow - networks avoid the curse of dimensionality. Technical report, CBMM Memo No. 058, MIT Center for Brains, Minds and Machines, 2016.
[8] H.N. Mhaskar and T. Poggio. Deep vs. shallow networks: An approximation theory perspective. Analysis and Applications, pages 829–848, 2016.
[9] A. Rahimi and B. Recht. Random features for large-scale kernel machines. NIPS, pages 1177–1184, 2007.
[10] T. Poggio, F. Anselmi, and L. Rosasco. I-theory on depth vs width: hierarchical function composition. CBMM memo 041, 2015.
[11] Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, and Ruosong Wang. On Exact Computation with an Infinitely Wide Neural Net. arXiv e-prints, page arXiv:1904.11955, April 2019.
[12] Ronen Basri, Meirav Galun, Amnon Geifman, David Jacobs, Yoni Kasten, and Shira Kritchman. Frequency Bias in Neural Networks for Input of Non-Uniform Density. arXiv e-prints, page arXiv:2003.04560, March 2020.
[13] Christos H. Papadimitriou, Santosh S. Vempala, Daniel Mitropolsky, Michael Collins, and Wolfgang Maass. Brain computation by assemblies of neurons. Proceedings of the National Academy of Sciences, 117(25):14464–14472, 2020.
[14] T. Poggio. A theory of how the brain might work. In Cold Spring Harbor Symposia on Quantitative Biology, pages 899–910. Cold Spring Harbor Laboratory Press, 1990.
[End of Quoted Matter]
Associative memories are usually trained to learn association pairs (x_1,y_1), (x_2,y_2), ..., (x_K,y_K), where the x_i are the stimuli and the y_i are the responses. The nature of the stimuli and the responses depends on the associative memory model and its application. In the case of a fuzzy associative memory, the stimuli and the responses are fuzzy sets. More precisely, they are usually finite fuzzy sets and thus can be represented as vectors in [0,1]^N and [0,1]^M.
During the learning phase, also called the storage phase, the associative memory learns (memorizes) the association pairs. Once the learning phase is concluded, you can use the associative memory to recall a stored item (or yield the desired response) from a noisy or corrupted version of a stored stimulus.
Answering your question: to use a fuzzy associative memory, you should define the association pairs, that is, the stimuli and their corresponding responses.
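For illustration only, here is a minimal sketch of what defining, storing, and recalling such pairs can look like with a generic max-min (Kosko-style) fuzzy associative memory; this is not necessarily the model you will end up using, and the membership values below are made up:

```python
# Minimal sketch of a max-min fuzzy associative memory (Kosko-style), for
# illustration only. Each stimulus x and response y is a finite fuzzy set,
# i.e. a vector in [0,1]^N or [0,1]^M. Several pairs are superimposed with a
# pointwise maximum, which is known to cause crosstalk between rules.
import numpy as np

def store(pairs):
    """Encode pairs (x, y) into one weight matrix W with w_ij = max_k min(x_i, y_j)."""
    N, M = len(pairs[0][0]), len(pairs[0][1])
    W = np.zeros((N, M))
    for x, y in pairs:
        W = np.maximum(W, np.minimum.outer(np.asarray(x), np.asarray(y)))
    return W

def recall(W, x):
    """Max-min composition: y_j = max_i min(x_i, w_ij)."""
    return np.minimum(np.asarray(x)[:, None], W).max(axis=0)

pairs = [([1.0, 0.6, 0.2], [0.2, 1.0, 0.4, 0.1]),
         ([0.1, 0.4, 1.0], [0.9, 0.3, 0.2, 1.0])]
W = store(pairs)
noisy = [0.9, 0.5, 0.3]                    # corrupted version of the first stimulus
print(recall(W, noisy))                    # resembles the first response, distorted by crosstalk
```

Other models, such as the implicative or morphological memories cited above, replace the max-min encoding and recall with different fuzzy operations and offer much better storage capacity.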
I would like to finish by pointing out that there are several associative memory models in the literature and, each model has its own learning rule. Many interesting associative memory models can be found in the papers cited above.