Information and Coding Theory vs. Machine Learning


I have been pondering the relationship between these two important topics of our data-driven world for a while. I have bits and pieces, but I have been hoping to find a neat and systematic set of connections that would somehow (surprisingly) bind them and fill the empty spots I have had in my mind for the last few years.

Not so long ago, while I was dealing with a multi-class classification problem, I came to realize that combining multiple binary classifications is a viable way to address it by using error-correcting output codes (ECOC), a well-known coding technique in the literature whose construction requirements are a bit different from those of classical block or convolutional codes. I would like to remind you that grouping multiple classes into two superclasses (a.k.a. class binarization) can be done in various ways. You can group them totally at random, which does not depend on the problem at hand, or based on a set of problem-dependent constraints derived from the training data. The way I like the most sits at the intersection of information theory and machine learning. To be more precise, class groupings can be chosen based on the resulting mutual information in order to maximise class separation, so that your binary classifiers are exposed to less noisy data and hopefully perform better.

On the other hand, the ECOC framework calls for coding theory and for efficient encoder/decoder architectures that can handle the classification problem. The nature of the problem is not something we usually come across in communication theory and classical coding applications, though. Binarization of classes introduces noise and defect structures into the so-called "channel model" that are not common in classical communication scenarios. In other words, the solution itself changes the nature of the problem at hand. The way we choose the classifiers (margin-based, etc.) will also affect the characterization of the noise that impacts the detection (classification) performance. I do not know if it is possible to say, but what is the capacity of such a channel? What is the best code structure that addresses these requirements?
Even more interestingly, can the recurrent issues of classification (such as overfitting) be solved with coding? Maybe we can maintain a trade-off between training and generalization errors with an appropriate coding strategy?
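
To make the ECOC idea concrete, here is a minimal sketch of the decoding side only. The code matrix, the helper names (`hamming`, `min_distance`, `decode`) and the "received" bit vector are all illustrative assumptions, not something taken from a particular paper or library. Each row is the codeword of one class, each column defines one class binarization (one binary classifier), and a prediction is decoded to the class whose codeword is nearest in Hamming distance.

```python
from itertools import combinations

# Hypothetical code matrix for 4 classes and 7 binary classifiers.
# Row i is the codeword of class i; column j says which superclass
# (0 or 1) each class belongs to for binary classifier j.
CODE_MATRIX = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1, 0],
]


def hamming(a, b):
    """Number of positions in which two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(code):
    """Minimum pairwise Hamming distance between codewords (rows)."""
    return min(hamming(a, b) for a, b in combinations(code, 2))


def decode(bits, code=CODE_MATRIX):
    """Return the index of the class whose codeword is closest to `bits`."""
    distances = [hamming(bits, row) for row in code]
    return distances.index(min(distances))


d_min = min_distance(CODE_MATRIX)
correctable = (d_min - 1) // 2  # binary classifier errors that are always corrected

# Assumed outputs of the 7 binary classifiers for a sample of class 2,
# where the first classifier made a mistake (bit 0 is flipped).
received = [1, 0, 1, 1, 0, 0, 1]
predicted = decode(received)

print(d_min, correctable, predicted)
```

In this toy setting the "channel noise" is simply the set of binary classifiers that got the sample wrong. Unlike a classical memoryless channel, these errors are generally correlated across columns, because the classifiers are trained on regroupings of the same data.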

Similar trends can be observed in the realm of estimation theory. Parameter estimation, or equivalently "regression" (including model fitting, linear programming, density estimation, etc.), can be thought of as the problem of finding the "best parameters" or the "best fit", which are the ultimate targets to be reached. The errors due to the methods used, the collected data, etc. are problem specific and usually dependent. For instance, density estimation is a hard problem in itself, and kernel density estimation is one approach to estimating probability density functions. Various kernels and data transformation techniques (such as Box-Cox) are used to normalize data, and new estimation methods are proposed to meet today's performance requirements. To measure how well we do, or how different two distributions are, we again resort to information theory tools (such as the Kullback–Leibler (KL) divergence and the Jensen-Shannon divergence) and use the concepts and techniques therein (including entropy etc.) from a machine learning perspective. Such an observation separates the typical problems posed in the communication theory arena from those in the machine learning arena, which require a distinct and careful treatment.
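
As a small illustration of the two measures mentioned above, the sketch below computes the KL divergence and the Jensen-Shannon divergence for discrete distributions given as plain lists of probabilities. The function names and the example distributions `p` and `q` are assumptions made for illustration; logarithms are taken base 2, so the results are in bits.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: average KL to the mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Two made-up distributions over a binary alphabet.
p = [0.5, 0.5]
q = [0.9, 0.1]

kl_pq = kl_divergence(p, q)  # not symmetric: differs from kl_qp
kl_qp = kl_divergence(q, p)
js = js_divergence(p, q)     # symmetric, and at most 1 bit

print(kl_pq, kl_qp, js)
```

The KL divergence is not symmetric and becomes infinite when `q` assigns zero probability to an outcome that `p` can produce. The Jensen-Shannon divergence avoids both issues by comparing each distribution with their mixture, which is one reason it is often preferred when comparing an estimated density with a reference.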

Last but not least, I think there is a deep-rooted relationship between deep learning methods (and many machine learning methods per se) and the core concepts of information and coding theory. Since the hype around deep learning began, I have seen many studies apply deep learning methods (autoencoders etc.) to decode specific codes (polar, turbo, LDPC, etc.), claiming efficiency, robustness, etc. thanks to the parallel implementation and model-deficit nature of neural networks. However, I am wondering about the other direction. I wonder if, say, back-propagation can be replaced with more reasonable and efficient techniques that are already well known in the information theory world. Perhaps rate-distortion theory has something to say about the optimal number of layers we ought to use in deep neural networks. Belief propagation, turbo equalization, list decoding, and many other known algorithms and models may be quite applicable to known machine learning problems and may promise better and more efficient results in some cases. I know a few folks have already begun searching for neural-network-based encoder and decoder designs for feedback channels. In my opinion, there are many open problems concerning the explicit design of encoders and the use of the network without feedback. A few recent works have considered application areas such as molecular communications and coded computation, where deep learning can be applied to secure performance that cannot otherwise be achieved using classical methods.

In the end, I just wanted to toss a few short notes here to instigate further discussion and thought. This interface will attract more attention as we see the connections more clearly and bring out new applications down the road...

Ramesh Challa

Information theory and coding is a mathematical approach to the study of the coding of information, along with its quantification, storage, and communication.

Machine learning is the process of training the machine to identify the objects of interest.

The machine is trained on objects that are similar in nature to, but not the same as, the objects to be identified.
