How can the inherent challenges posed by hidden objects be adequately addressed and eventually overcome?

Challenges in predicting relevant features: are they a matter of logic, or are they obscure? - If logic, are scientists already aware of it? - If they were, they would let a computer select the features in a much less biased and much more systematic way than people are capable of. - Since I have not seen anybody doing that, I believe we could drastically accelerate our discovery process by making researchers aware of the advantages of handing feature selection over from observation-biased people to much less biased and much more systematic artificial intelligence (AI).

Long Title:

Is proper, correct and exhaustive feature selection for training machine learning algorithms already possible even before all imperatively hidden objects/factors/dimensions, which are required for correctly conceptualizing aging and many other complex phenomena, are fully discovered?

Short Title:

Is proper feature selection possible before all imperatively hidden objects, which are required for conceptualizing aging adequately, are fully discovered?

Topic:

About imperatively hidden objects and the need for new concept discoveries to select all the necessary features required for fully understanding aging, immigration and other phenomena.

Beginning of Writing:

Humans are very biased in how they choose to conduct experimental measurements or make observations, without being aware of it. What percentage of the entire electromagnetic spectrum can we perceive? Certainly no more than 5%. But the changes of which we must become aware before we can understand aging are most likely much more distinct outside our narrow sensory window, because our sensory limitations did not affect the evolution of aging in any way.

For example, humans can only hear part of the sounds an elephant makes because humans cannot hear frequencies as low as the elephant can. This tends to prevent a full understanding of the elephant’s communication options: humans cannot distinguish such low sound frequencies from the background environment, i.e. they cannot perceive them as being different from background noise. But without considering those imperatively hidden factors we cannot fully understand elephant communication. In the same way, humans tend to miss cellular processes that can only be distinguished from background noise outside the electromagnetic wavelength interval for which humans have evolved sensory organs, i.e. eyes, ears and skin. The tongue and nose operate in an entirely different dimension, because they cannot sense any wavelength.

For example, before magnets were discovered, they remained an imperatively hidden object for us because we could not even suspect them in any way. But the fact that we lack any senses for perceiving magnetism does not stop it from affecting our lives. Only after we discovered the consequences of the forces that a magnetic field exerts on some metals were we prompted to search outside the limited window within which we can sense differences in wavelength. Magnetic fields could affect life in many positive ways; for example, they are used to treat major depressive disorder and can cause involuntary muscle contractions. But has anybody even thought of measuring the magnetic field of a cell or brain, which I expect to be strong enough for us to measure with sensitive devices? Since any electric current causes a perpendicular, radiating magnetic field, it can be hypothesized that this weak magnetic field is pulse-like and depends on the temporal pattern by which neurons fire action potentials. The changes in the magnetic field of a cell are expected to be enriched at the cellular component membrane, because membranes have proton pumps and maintain an electric gradient to produce ATP. But what if changes in this magnetic field are causing us to age? Then we could stop the aging process with any intervention that sets our cellular magnetic field pattern back to its youthful benchmark.

I suspect that our only rudimentary understanding of the aging process is caused by us missing exactly this kind of imperatively hidden object, which is required for making the essential key observations without which aging cannot be fully explained. I view a magnetic field as a concept that exists regardless of whether we are aware of it. There may be many other hidden concepts that we must develop correctly before we can reverse aging.

Analogies to aid in the understanding of the concept of Imperatively Hidden Objects (IHO)

Let’s say that an immortal, highly intelligent, interstellar alien critter has landed on Earth. Let’s imagine that it can only perceive the world through magnetic fields. Then we humans would not even notice this interstellar visitor, because he/she remains an imperatively hidden object (IHO) that we cannot even suspect. Let’s say this interstellar species has not evolved a body or anything else to which our senses are sensitive; its life can be fully defined by irregularities within the magnetic field. But this interstellar critter can perceive us humans, because our magnetic field disrupts the homogeneity of the background environment and must therefore be something other than background noise. Let’s say that this immortal interstellar critter can perceive and process all the magnetic fields on Earth. Could it develop the concept of siblings or parents on its own? Is the magnetic field of relatives more similar than expected by chance? It is very likely, because humans vary a lot in their neuronal wiring architecture. Hence, each human could be defined by the pattern of his/her action potentials. This inevitably causes a very weak, unique, perpendicularly acting electromagnetic field that cannot be detected by our instruments. Therefore, instead of humans, we should use the giant squid as a model organism for understanding the relationships between life, aging and changes in the magnetic field, because it has the thickest neurons. They must therefore fire stronger action potentials than our human neurons, which will inevitably cause a stronger perpendicularly acting electromagnetic field that may be strong enough to be detected by our instruments.

Let’s say that this interstellar critter wants to use machine learning to predict the risk that any particular university student in the USA will have to return home after graduation because they lost their immigration status and could not find a job that would have made them eligible for one year of OPT (Optional Practical Training). Let’s say that this interstellar critter has no concept of aging and that its most important goal is to develop a classifier, i.e. a new machine learning algorithm, that can predict in advance the risk any particular student faces of no longer being allowed to reside in the United States. Let’s say that accomplishing this objective has the same meaning and importance to this critter as curing aging and eliminating death has for us.

What should it do? It cannot talk. No human even suspects it. It could start with supervised machine learning by observing thousands of students to find out what the students who are forced to leave have in common, or what they lack compared to citizens, who are always welcome to stay.

I hypothesize that no matter how clever and sensitive this critter is to irregular interruptions of the homogeneous electromagnetic field (the only dimension in which it can sense the presence of humans and any other form of life), it has no chance of understanding the risk factors for being forced to leave America after graduation, because those factors are imperatively hidden concepts (IHC) to it: it cannot even suspect them in any way. Without developing the right concepts in advance, this critter can never discover the risk factors for having to leave the USA after graduation.

The same applies to aging. We are still missing essential concepts without which we cannot fully understand it. But even if somebody by chance could detect the magnetic irregularities caused by this foreign interstellar critter, he/she could never suspect that it is highly intelligent.

This means that even if we measured a cell across the entire wavelength spectrum and could clearly detect its presence, we would never suspect it to have any kind of intelligence, because we would consider the anomalies in its magnetic field to be background noise. Our visiting interstellar critter has a similar problem. It cannot develop the essential concepts without which it could never build a machine learning algorithm that predicts the correct risk factors impairing somebody’s chances of being allowed to keep residing in the US while not enrolled full time. As long as this critter has no concept of a “country”, e.g. the USA, it has absolutely no chance of discovering nationalities, because even if it could figure out everyone’s nationality, nationality would make no sense to it. Words like “American”, “German”, “French” or “Indian” cannot make any sense to this critter as long as the concept of “country” remains an imperatively hidden object for it. How can somebody be considered “German” or “American” as long as the concepts of Germany or the USA are still lacking? One can only be German if Germany exists. Without at least suspecting the concept of a country, e.g. Germany, there is absolutely no way to discover the required concept of citizenship. But without determining the feature “citizenship”, no machine learning algorithm could learn to make correct predictions (see the sketch below). The same applies to aging: we are still lacking so many essential concepts without which aging can never be understood.
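As a minimal sketch of this point (synthetic data, hypothetical feature names, not a claim about real immigration policy), the following toy example trains the same classifier twice: once without and once with the decisive feature "citizenship". Without it, no amount of additional training data lifts the model much above always guessing the majority class; with it, the prediction becomes almost trivial.

```python
# Sketch: a classifier cannot recover a risk factor it was never given as a feature.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 5000
gpa = rng.uniform(2.0, 4.0, n)              # observable but irrelevant features
field_of_study = rng.integers(0, 10, n)
citizenship = rng.integers(0, 2, n)         # 1 = citizen, 0 = international (the decisive feature)
must_leave = (citizenship == 0) & (rng.random(n) < 0.8)   # outcome driven by the decisive feature

X_without = np.column_stack([gpa, field_of_study])
X_with = np.column_stack([gpa, field_of_study, citizenship])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print("accuracy without citizenship:", cross_val_score(clf, X_without, must_leave, cv=5).mean())
print("accuracy with citizenship:   ", cross_val_score(clf, X_with, must_leave, cv=5).mean())
```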

For example, as long as the concept of a ribosome is lacking, we have no way of understanding the changes in the relative abundance ratio of mRNA and proteins. We may have some initial success with building a model to predict protein abundance and concentration, because the proteome is about 70% similar to the transcriptome. However, according to Janssens et al. (2015) [1], this similarity declines with age and is a driver of replicative aging in yeast.

But no matter how many training samples we use to train our predictor, it must fail unless we have developed a mental concept of a ribosome. I believe we face a similar predicament in understanding the causes and regulation of epigenetic changes over time with advancing age, despite being able to measure them so clearly that we can use them to determine biological age. Unfortunately, as long as we lack any concept by which epigenetic changes could be connected to other cellular processes, we cannot understand how epigenetic changes are regulated.

Before we correctly conceptualized the role and scope of the ribosome, we had no way to explain the mechanisms by which mRNA and protein abundance are linked. But even after we conceptualized the role of the ribosome correctly, any machine learning algorithm for predicting protein concentration would inevitably fail as long as we lacked the correct concept of the poly(A) tail. Similarly, there are still lots of imperatively hidden concepts, factors, dimensions or objects, which we cannot suspect because we cannot perceive them, that prevent us from fully understanding aging. However, the fact that our current observations fail to fully explain aging indicates the presence of imperatively hidden factors whose consequences we can see without being able to detect the factors themselves. Since every consequence must have a cause, any unexplained consequence indicates the presence of imperatively hidden, imperceptible factors (IHF) without which we cannot improve our feature selection.
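The mRNA-versus-protein example can be illustrated with a small sketch (synthetic data; "translational efficiency" stands in here as a hypothetical hidden factor for ribosome- and poly(A)-tail-related effects): a model built from mRNA levels alone leaves a large unexplained residual, and that residual is the footprint of the hidden factor.

```python
# Sketch: how much variance a model misses when one of the true drivers stays hidden.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n_genes = 2000
mrna = rng.lognormal(mean=2.0, sigma=1.0, size=n_genes)
translational_efficiency = rng.lognormal(mean=0.0, sigma=0.8, size=n_genes)  # hidden factor
protein = mrna * translational_efficiency                                    # toy generative rule

X_visible = np.log(mrna).reshape(-1, 1)                   # what we can currently "see"
X_full = np.column_stack([np.log(mrna), np.log(translational_efficiency)])
y = np.log(protein)

print("R^2 from mRNA alone:       ", LinearRegression().fit(X_visible, y).score(X_visible, y))
print("R^2 with the hidden factor:", LinearRegression().fit(X_full, y).score(X_full, y))
```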

As explained in my immigration example, only when the correct feature, e.g. citizenship, has been selected can the risk of being asked to leave America by the federal government be fully understood and, hence, predicted much better. Could I convince anybody of the high likelihood that imperatively hidden factors are present, which we cannot yet perceive as being distinctly different from their environment?

Conclusions and proposed responses/adaptations of our study design

What is the rate-limiting bottleneck that limits our research progress, and why?

The current bottleneck in defeating aging is not addressed by further improving our machine learning algorithms and increasing the number of training samples; instead, we must focus on improving feature selection first. My main contribution towards defeating aging is to predict features, measurement types and intervals between measurements that could show the actions of aging much more clearly than the features we have currently selected to stop aging and defeat death. Now it is up to wet-lab scientists to test my hypotheses. But even if all of them can be ruled out, the number of possibilities by which the mechanism of aging could function would be reduced, leaving us with fewer hypotheses to test. Since the options we have for fully understanding the aging process are large, but finite, any crazy-seeming hypothesis, no matter how unlikely, that can be ruled out brings us a tiny step closer to immortality.

The reason I claim that correct feature selection, and not the gradually improving performance of our machine learning algorithms, is the current bottleneck holding us back from improving our understanding of the aging process is simple: our machine learning algorithms have been improving gradually over time, but our feature selection methods have not.
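One way to make feature selection more systematic and less dependent on human intuition is to let an algorithm rank candidate features by how much predictive power they actually carry. The following sketch uses permutation importance from scikit-learn on synthetic data; the feature names are hypothetical placeholders, not a claim about which features actually matter for aging.

```python
# Sketch: algorithmic ranking of candidate features instead of hand-picking them.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(2)
n = 1000
candidates = {
    "telomere_length": rng.normal(size=n),
    "methylation_drift": rng.normal(size=n),
    "oscillation_period": rng.normal(size=n),
    "membrane_potential": rng.normal(size=n),
}
# Synthetic "biological age" driven mostly by a feature a human might not have picked.
age = (3.0 * candidates["oscillation_period"] + 0.5 * candidates["methylation_drift"]
       + rng.normal(scale=0.5, size=n))

X = np.column_stack(list(candidates.values()))
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, age)
imp = permutation_importance(model, X, age, n_repeats=10, random_state=0)
for name, score in sorted(zip(candidates, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{name:20s} importance: {score:.3f}")
```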

The fact that I cannot find any dataset measuring the yeast transcriptome at five-minute intervals for more than 3 of the roughly 25 replications that make up the average wild-type (WT) yeast replicative lifespan indicates that nobody has seriously suspected that we could at least observe the effects of the aging mechanism by selecting new periodic features, such as period length, temporal phase shift or amplitude. These features only make sense if we replace our linear concept of life with a periodic one. It requires us to change our concept of life from one driven by linearly acting trends to one driven by cyclical, periodically acting trends, which expands our feature selection options to periodic quantities, such as period length, temporal phase shift, amplitude or oscillation pattern, that would have been impossible to imagine while holding on to the old linear concept. In this case, although we could clearly measure the period length, we could not detect it as a feature affected by aging until we explicitly define, select and measure this new feature, e.g. the period length, temporal phase shift, amplitude or oscillation pattern.
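As a small sketch of what selecting such periodic features could look like in practice, the following code extracts period length, amplitude and phase from a single (here synthetic) expression time series sampled at five-minute intervals, using a plain discrete Fourier transform; real transcriptome measurements would replace the synthetic signal.

```python
# Sketch: turning a sampled expression curve into the periodic features named above.
import numpy as np

dt_minutes = 5.0
t = np.arange(0, 300, dt_minutes)                      # 300 minutes sampled every 5 minutes
expression = (2.0 + 1.5 * np.sin(2 * np.pi * t / 50.0 + 0.7)
              + np.random.default_rng(3).normal(scale=0.2, size=t.size))

spectrum = np.fft.rfft(expression - expression.mean())  # DFT of the mean-subtracted signal
freqs = np.fft.rfftfreq(t.size, d=dt_minutes)           # in cycles per minute
peak = np.argmax(np.abs(spectrum[1:])) + 1              # skip the zero-frequency bin

period_minutes = 1.0 / freqs[peak]                      # period length
amplitude = 2.0 * np.abs(spectrum[peak]) / t.size       # oscillation amplitude
phase = np.angle(spectrum[peak])                        # temporal phase shift

print(f"period ~ {period_minutes:.1f} min, amplitude ~ {amplitude:.2f}, phase ~ {phase:.2f} rad")
```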

Please let me know if this writing makes sense to you, because so far almost nobody except me seems to worry about this problem. Thanks a lot for taking the time to read and think this through. I welcome your feedback: my conclusions seem logical to me, yet they surprise me, because nobody else appears to be aware of them and our study designs do not yet reflect this insight.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Here is a positive response to my posting at ResearchGate.net, which gave me the confidence to share this.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Hello Thomas

In general, we describe a machine learning system as a sequence of three fundamental stages: preprocessing, processing and post-processing. Preprocessing has concentrated, fundamentally, on the selection of attributes (editing), on the selection of objects (condensation), or on a mixture of both, but always starting from an existing database. Processing has followed strategies guided by symbolic learning, regression, connectionism, evolutionary-genetic algorithms, probabilities or analogy, and more recently by classifier combination schemes and deep learning. Post-processing has focused on improving the quality of the prediction and/or trying to explain it. However, despite the many efforts made in each of these stages over more than 60 years, there exists no master algorithm capable of solving all learning problems. This means that machine learning systems have improved over time, but our selection of features has not. So where is the error?

Hence, I believe that yes, your writing can make a lot of sense!!!

The imperatively hidden features could be the key to making the essential observations that allow us to understand processes or phenomena that we have not yet been able to explain. If we start from the fact that the success of our experimental design is subject not only to the objectives we pursue, but also to the nature of the data we have and to their capacity to explain (model) the phenomenon or process itself, then our lack of capacity to understand a phenomenon becomes the limiting factor when it comes to explaining it, or, what amounts to the same thing, it prevents us from describing it in terms of its features and, therefore, from modeling it. Hence the importance of knowing those essential concepts that allow us to understand what is happening, so that we can subsequently make an adequate selection of features that leads to the development of an algorithm capable of modeling the process. This means that, even when we have a large number of characteristics of a process or phenomenon, and combinations thereof, if we do not have those features that truly describe it and they still remain imperatively hidden, then it will not be possible to understand it.

I work in the field of Computational Biology, specifically developing algorithms for the prediction of protein structures. And after hundreds and hundreds of algorithms and approximations described in the literature, the prediction does not exceed 30% accuracy. This could be due to our inability to adequately model the protein folding process, i.e. to our inability to discover the concepts, factors or sub-processes that we cannot perceive and that prevent us from fully understanding the process from a holistic point of view. It is true that if we assume that every consequence must have a cause, then any unexplained consequence indicates the presence of imperatively hidden factors without which we cannot improve our selection of features. This warns us that it does not matter how much we focus on improving our machine learning algorithms and increasing the training samples if we do not also focus on an appropriate selection of features.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

I worry that there could be many still-hidden dimensions, very similar to the magnetic field, that we cannot yet anticipate. We must first associate information from these kinds of magnetic-field-like, still imperatively hidden dimensions with aging before we can understand aging.

Since we humans have observational tunnel vision, mostly limited to the dimensions of our sensations, we must use artificial intelligence, because to an AI all the different dimensions, and the features that define them, are more nearly equal. Only if we can make people understand this will we have a chance to collectively survive. I need help getting this published, because only then will experts take it seriously. For that, I must provide proof of principle that we still very naive and observationally biased humans would have missed important, relevant information if we had not let artificial intelligence (AI) define possibly aging-relevant features for us in a much more systematic and less biased manner. For us biased humans to create a much less biased AI, we must be able to look at life from many different ridiculous-seeming perspectives, because that is what we expect our aging-feature-selecting AI to accomplish for us. I am really good at that, but the problem is that nobody seems to have time to listen to me. And if I write it down, almost nobody has time to read my writings either. We need to create AI that systematically searches for relations between our observational measurements which we humans cannot suspect.

Here is another writing of mine, in which I described a partial solution even before I had defined the problem. We must reword it in such a way that people can understand it much more easily. We must show one example in which it has worked, as proof of principle that it will work in similar ways if we succeed in expanding its scope. The most important thing is that I don't feel alone, because otherwise I may start believing that I must be wrong, since nobody else seems to be thinking my way. Below is my partial remedy.

#############################################################################

Continuously Ongoing Emergency Random Evolution mimicking procedure "Unpredictable Survival"

#############################################################################

A major threat is that we are aging much faster than we can reverse it. We are still so far away from inferring which information is most likely relevant for reversing aging that we MUST take an undirected approach to counteract this problem, because we don't have any better alternative. Every day, lots of new pairs of information are added to the web. Anything that defines at least two indivisible pieces of information as a value pair indicating a specific instance can be evaluated by vsboost. Therefore, we should start developing an independently working piece of software that keeps crawling the internet for any instance defined by at least two informational units as input data. Then, even though this software cannot infer the meaning of any of the event-defining information pairs, it can use their values to predict pretty much any other combination of paired information and try to predict any pair with any other pair. This would allow identifying even weak correlations and dependencies much sooner than when exclusively selecting features manually in our traditional way, based on logical reasoning. Although logical reasoning and highly directed, targeted manipulations are good to have, it takes us far too much time until our understanding and concepts of new correlations have developed far enough to contribute to logically driven feature selection and data manipulation. This continuously web-crawling software keeps adding anything that could serve as either an input or an output value for any kind of supervised machine learning process. When this software can predict any random feature by whatever means it can possibly think of, it will let us know, so we can check whether this could possibly make sense (a sketch of this pair-prediction loop follows below).

We need to improve the NLP (Natural Language Processing) and semantic recognition ability of this randomly feature-adding software so that it can combine the same informational components into a single unit feature. Nevertheless, just as evolution makes random mistakes in grouping the same informational components into a single indivisible feature, variations in the groupings of informational components, which must be predicted all at once, could turn out to be a good thing. For example, grouping all Transcription Factor Binding Site (TFBS)-associated information into a single informational group may allow for the most accurate prediction rate, but only when our random model contains all the input features needed to define every informational dimension that could belong to the TFBS dimension. If our feature-hungry crawler has not yet discovered that TFBS binding is a cooperative, rate-like rather than a Boolean process, it would fail in that respect. But if it could learn to predict time series plots based only on the Boolean value indicating whether a particular Transcription Factor (TF) could possibly bind to a promoter, while disregarding the number and order of the TFBS for the same TF in the promoter of one gene, it could still predict time series plots well enough to raise its prediction power far above the current threshold.
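A minimal sketch of the core pair-prediction loop, under heavy simplifying assumptions (a small in-memory table of numeric columns instead of crawled web data, hypothetical column names, generic gradient boosting as the predictor): try to predict every column from every other column and flag the pairs that show predictive signal, without knowing what any column means.

```python
# Sketch: blindly testing which pieces of paired information predict which others.
import numpy as np
from itertools import permutations
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n = 500
table = {
    "col_a": rng.normal(size=n),
    "col_b": rng.normal(size=n),
}
table["col_c"] = 2.0 * table["col_a"] + rng.normal(scale=0.3, size=n)  # hidden dependency

for x_name, y_name in permutations(table, 2):
    X = table[x_name].reshape(-1, 1)
    y = table[y_name]
    score = cross_val_score(GradientBoostingRegressor(), X, y, cv=5).mean()
    if score > 0.2:                      # arbitrary threshold for "worth a closer look"
        print(f"{x_name} predicts {y_name} with cross-validated R^2 ~ {score:.2f}")
```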

Although this old model is still imperfect, it is valuable to obtain it as soon as possible instead of waiting until our crawler has found all the input parameters (features) needed to assign a value to every possible dimension of the TFBS domain. This actually speaks in favor of allowing our prediction crawler to randomly vary any specific dimension of any domain suited for training supervised machine learning, because the fewer dimensions make up a domain, the fewer and smaller information input domains are required for building a model based on randomly considered and randomly grouped information domains. Currently, most of us are not aware of the artificial, imperative limitations that result from letting humans keep the monopoly on deciding which dimensions can be grouped together to form a meaningful instance for input or output when training a supervised model. It is likely that smaller domains consisting of fewer dimensions, or larger domains combining more dimensions, could be more suitable. But, although there are so many humans on this planet, our thinking, understanding, conceptualizing and imagining, and our intuitive preference for including very specific dimensions in an indivisible input or output instance without even worrying about possible alternatives, are still too similar. The way in which our senses, perceptions, imaginations, concepts and partial understanding of any phenomenon intuitively assign dimensions to a larger domain, which most of us would never even consider predicting in parts or as a very small dimension of a much larger super-domain, is only one out of very many possible options for combining any number of specific dimensions into a domain from which any number of input or output instances can be formed. One could imagine a domain as a row, like a gene, which can have any number of columns, i.e. its dimensions, which must be considered as a single instance in their combination, because we lack the option of considering only a few of its columns or of combining some of them with columns from an entirely different and unrelated table.

A good example are time series plots. Humans tend to be biased and prefer to define gene expression time series curves by the mRNA measured at each time point. This sounds so obvious, but is it the best way of conceptualizing the temporal expression signature of each gene? I felt that my colorful time series plots had much more meaning and could carry much more informational value, as well as a more meaningful concept for imagining, comparing and analyzing gene-specific temporal signatures. But although they look very pretty and are a good way to get a first impression of the similarities between two curves, they are not well suited to finding out whether the plots for the genes that belong to the same GO term are indeed more correlated with each other than with the rest. Since I felt that a vector can never be the same as a curve, I tried many ways to account for the slopes connecting each time point. But because I could think of so many different ways to achieve this and could not decide on any one of them as the best possible option, I am still not sure how to convert time series plots into numerical dimensions, which have the very obvious advantage of allowing easy comparing, ranking and quantifying. I am also not sure how to account for differences between plots along the Y axis. Maybe we should add another dimension to our concept of a time series curve.

If we also added the total area under the curve to the time points of each plot, maybe we could quantify them in a much better and more intuitive way. But how much numerical weight should we give each time point and the area under the curve? I have been stuck with this problem ever since I tried to quantify time series plots. But imagine how many more options you would have if you were not a human, because then you would not limit the dimensions defining your domain to only those you can easily imagine. A computer can randomly extract and try out any combination, subset or superset of dimensions without being limited to those dimensions that can easily be conceptualized as a picture. Extreme gradient boosting (xgboost), which never gets tired of randomly defining an indivisible domain by any combination of dimensions, might have much more luck (a small sketch follows below).
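As a small sketch of this idea (synthetic curves, a toy target, and my own arbitrary choice of derived dimensions), the following code flattens each time series into raw time-point values, point-to-point slopes and the total area under the curve, and feeds the resulting table to an xgboost regressor; which derived dimensions are actually worth keeping is exactly the open question discussed above.

```python
# Sketch: one arbitrary way of turning time series curves into a flat feature table.
import numpy as np
from xgboost import XGBRegressor          # sklearn's GradientBoostingRegressor would work the same way
from sklearn.metrics import r2_score

rng = np.random.default_rng(5)
n_genes, n_timepoints, dt = 300, 12, 5.0  # 12 time points, 5 minutes apart

curves = rng.normal(size=(n_genes, n_timepoints)).cumsum(axis=1)      # synthetic "expression" curves
slopes = np.diff(curves, axis=1) / dt                                  # slope between adjacent time points
auc = ((curves[:, :-1] + curves[:, 1:]) / 2.0 * dt).sum(axis=1)        # trapezoidal area under each curve

X = np.hstack([curves, slopes, auc.reshape(-1, 1)])  # one row per gene, one column per derived dimension
y = 0.5 * auc + rng.normal(scale=1.0, size=n_genes)  # toy target: some property correlated with the AUC

model = XGBRegressor(n_estimators=200, max_depth=3).fit(X, y)
print("training R^2:", r2_score(y, model.predict(X)))
```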

