Potential impact of imperatively hidden objects on machine learning, predictions and aging
1.) Introduction to machine learning.
Imagine we want to predict the mode of transportation between UALR and UAMS, which are about 3.1 miles apart.
We start with 100 training samples for each class. The machine learning algorithm is trained only on two time points, i.e. the starting time and the finishing time. The ride takes 23 minutes by bike, and getting to the classroom takes another 5 minutes. For our machine learning algorithm, a trip must take at least 23 minutes to be of the class "bike". So we can expect a normal distribution peaking at 28 minutes (+/- 5 min), i.e. a range of 23-33 min.
Traveling by car takes only 9 min plus 5 min for parking. To be of the class "car", the travel time must be at least 14 min; the distribution peaks around 19 min and has an upper extreme of 24 min.
Our model may predict the wrong class only if the interval falls between 23 and 24 min, where the two ranges overlap.
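To make the two-class case concrete, here is a minimal sketch that models "bike" and "car" as normal distributions and picks the class with the higher density. It assumes that the stated +/- 5 min corresponds to about three standard deviations; that choice, and the names SIGMA and predict, are mine for illustration only.

```python
from statistics import NormalDist

# Assumption: the stated +/- 5 min is treated as roughly three standard deviations.
SIGMA = 5 / 3

bike = NormalDist(mu=28, sigma=SIGMA)  # peak at 28 min, range 23-33 min
car = NormalDist(mu=19, sigma=SIGMA)   # peak at 19 min, range 14-24 min


def predict(minutes: float) -> str:
    """Return the class whose density is higher at the given travel time."""
    return "bike" if bike.pdf(minutes) > car.pdf(minutes) else "car"


for minutes in (16, 23.2, 23.8, 30):
    print(minutes, predict(minutes))

# Shared area under both curves: a rough measure of how often the classes can be confused.
print("overlap:", bike.overlap(car))
```

With equal spreads the decision flips halfway between the two peaks, i.e. at 23.5 min, which lies inside the 23-24 min overlap described above.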
2.) Automatic prediction of new classes/clusters using machine learning.
After training it on 100 samples per class, I give it the rule that anything outside the ranges of the classes "bike" and "car" must not be assigned to either class but must be treated as a new class instead.
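The rule can be sketched as a simple range check. This is only an illustration under the ranges given in section 1; the names KNOWN_RANGES and assign are hypothetical.

```python
from typing import Dict, List, Tuple

# Ranges in minutes taken from section 1.
KNOWN_RANGES: Dict[str, Tuple[float, float]] = {
    "car": (14, 24),
    "bike": (23, 33),
}


def assign(minutes: float, ranges: Dict[str, Tuple[float, float]] = KNOWN_RANGES) -> List[str]:
    """Return every known class whose range contains the interval.

    An empty list means the interval fits no known class and must be
    treated as evidence of a new class.
    """
    return [name for name, (low, high) in ranges.items() if low <= minutes <= high]


for minutes in (19, 23.5, 65):
    matches = assign(minutes)
    print(minutes, matches if matches else "new class")
```

Returning a list rather than a single label keeps the ambiguous 23-24 min region visible instead of hiding it behind an arbitrary choice.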
2.1.) Examples of training that allows independent class discoveries using machine learning.
Walking takes 65 min +/- 10 min, i.e. range: 55-75 min. The machine learning algorithm cannot assign these intervals to a known class but must conclude that it has discovered a new class of transportation, i.e. walking. The distribution peak should be at 65 min. We must tell our machine learning algorithm that it has just discovered a new class, i.e. a mode of transportation other than "car" and "bike". Thus, if a similar situation arises again, our algorithm can discover new classes on its own.
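One way to picture the discovery step: collect the intervals that fit no known range and describe them as a candidate class. The observations below are made up for illustration, and using the mean as the peak is an assumption that only holds for a roughly symmetric distribution.

```python
from statistics import mean


def unassigned(intervals, ranges):
    """Keep only the intervals that fall outside every known range."""
    return [
        t for t in intervals
        if not any(low <= t <= high for low, high in ranges.values())
    ]


def summarize(samples):
    """Describe a candidate class by its lowest, average and highest interval."""
    return {"low": min(samples), "peak": mean(samples), "high": max(samples)}


known = {"car": (14, 24), "bike": (23, 33)}
# Hypothetical observations: one car trip, one bike trip and five walking trips.
observed = [18, 29, 55, 60, 65, 70, 75]

leftover = unassigned(observed, known)
candidate = summarize(leftover)
print(candidate)
```

The algorithm itself only sees an unnamed candidate with a range of 55-75 min; it is our feedback that confirms it as a new class.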
2.2.) Introducing a fourth class, i.e. bussing
Taking the bus takes at least 56 min. The range is 56 min to 116 min, peaking at 86 min and having two local maxima at 66 and 106 minutes if the bus runs every 20 minutes. Based on this distribution the machine learning algorithm should be able to distinguish between walking and bussing, given a series of intervals that all resulted from the same mode of transportation.
Can the machine learning algorithm conclude that "bussing" is influenced by a periodic feature whose period equals the time difference between neighboring maxima, which in our case is caused by the busses running every 20 min? This periodicity should separate bussing from the other transportation options more than they differ from one another.
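The periodicity question can be illustrated by reading the spacing between neighboring maxima off a density curve. The sketch below does not learn from data; it assumes a mixture of three normal distributions centered at 66, 86 and 106 min, with weights and a 4 min spread that I chose purely for illustration.

```python
from statistics import NormalDist

# Assumed mixture for the bus class: three peaks 20 min apart.
# The weights and the 4 min spread are illustrative assumptions, not measurements.
COMPONENTS = [
    (0.25, NormalDist(66, 4)),
    (0.50, NormalDist(86, 4)),
    (0.25, NormalDist(106, 4)),
]


def density(minutes):
    """Weighted sum of the component densities at the given travel time."""
    return sum(weight * dist.pdf(minutes) for weight, dist in COMPONENTS)


def local_maxima(grid):
    """Return the grid points whose density exceeds that of both neighbors."""
    values = [density(t) for t in grid]
    return [
        grid[i]
        for i in range(1, len(grid) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    ]


grid = list(range(56, 117))  # 1 min steps across the 56-116 min range
peaks = local_maxima(grid)
spacing = [b - a for a, b in zip(peaks, peaks[1:])]
print(peaks, spacing)
```

Under these assumptions the maxima sit at 66, 86 and 106 min, so the spacing is 20 min, the same as the bus schedule. With real samples the curve would first have to be estimated, e.g. from a smoothed histogram.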
2.3.) Recap of the 4 classes based on the time interval between starting and arriving only.
Our machine learning model can now distinguish between the following four classes.
For our machine learning algorithm the world looks as described below. It has no concept of bussing, driving, walking or biking.
2.3.1.) to be of the class car: range 14-24 min with normal distribution around 19 min
2.3.2.) to be of the class bike: range 23-33 min with normal distribution around 28 min
2.3.3.) to be of the class walking: range 55-75 min with normal distribution around 65 min
2.3.4.) to be of the class bus: range 56-116 min with trimodal distribution with global maximum at 86 min and two local maxima at 66 and 106 minutes given that busses come every 20 min.
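Since the algorithm only knows ranges, it helps to check where the four classes can be confused. The following sketch computes the pairwise overlaps, using bike 23-33 min as in section 1; the names CLASSES and overlap are hypothetical.

```python
from itertools import combinations

# Ranges in minutes for the four classes (bike as in section 1).
CLASSES = {
    "car": (14, 24),
    "bike": (23, 33),
    "walking": (55, 75),
    "bus": (56, 116),
}


def overlap(a, b):
    """Return the shared part of two ranges, or None if they do not touch."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None


overlaps = {}
for (name_a, range_a), (name_b, range_b) in combinations(CLASSES.items(), 2):
    shared = overlap(range_a, range_b)
    if shared is not None:
        overlaps[(name_a, name_b)] = shared

print(overlaps)
```

Only two pairs overlap: car and bike share 23-24 min, and walking and bus share 56-75 min. The second overlap is much wider, which is why the shape of the bus distribution matters for telling those two apart.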
2.4.) Learning a new transportation class on its own, i.e. taxi
If I take a taxi I save 5 minutes because no time needs to be spent on parking. This would result in an interval of 9-19 min, with a normal distribution around 14 min. The machine learning algorithm should define it as a new mode of transportation because its distribution does not match any of the other four transportation classes for which it has training samples: intervals below 14 min fall outside every known range, although 14-19 min overlaps with the class "car". Based on this training our machine learning algorithm should now be able to correctly decide whether a series of travel intervals fits any of the 5 classes or whether it is a new transportation mode discovery.
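Deciding on a whole series rather than on a single interval can be sketched as follows: a class explains a series only if every interval lies inside its range. This is a simplified stand-in for a real model, and the example series are made up.

```python
FOUR_CLASSES = {
    "car": (14, 24),
    "bike": (23, 33),
    "walking": (55, 75),
    "bus": (56, 116),
}
# After the taxi class has been learned.
FIVE_CLASSES = {**FOUR_CLASSES, "taxi": (9, 19)}


def explain(series, ranges):
    """Return the classes whose range contains every interval of the series."""
    return [
        name
        for name, (low, high) in ranges.items()
        if all(low <= t <= high for t in series)
    ]


taxi_trips = [10, 12, 14, 16, 18]   # hypothetical taxi intervals
car_trips = [15, 19, 23]            # hypothetical car intervals
unknown_trips = [40, 42, 45]        # fits none of the five classes

print(explain(taxi_trips, FOUR_CLASSES))     # no match: a new discovery
print(explain(taxi_trips, FIVE_CLASSES))
print(explain(car_trips, FIVE_CLASSES))
print(explain(unknown_trips, FIVE_CLASSES))  # no match: another new discovery
```

A single interval of 16 min would fit both "car" and "taxi", but a series usually contains at least one interval that only one of the two classes can explain.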
I want to draw the distributions in R or Python to help people understand machine learning.
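As a first step toward such drawings, here is a text-only sketch in Python that needs no plotting library. It assumes that each stated +/- range equals three standard deviations and draws one bar per minute; the function name draw and the bar width are my own choices.

```python
from statistics import NormalDist

# Assumption: each stated +/- range is treated as three standard deviations.
CLASSES = {
    "car": NormalDist(19, 5 / 3),
    "bike": NormalDist(28, 5 / 3),
    "walking": NormalDist(65, 10 / 3),
}


def draw(dist, start, stop, width=40):
    """Return one text row per minute; bar length is proportional to density."""
    peak = dist.pdf(dist.mean)
    rows = []
    for t in range(start, stop + 1):
        bar = "#" * round(width * dist.pdf(t) / peak)
        rows.append(f"{t:>3} min | {bar}")
    return rows


for name, dist in CLASSES.items():
    half = round(3 * dist.stdev)       # half-width of the stated range
    center = round(dist.mean)
    print(name)
    print("\n".join(draw(dist, center - half, center + half)))
```

The same density values could be passed to a plotting library such as matplotlib in Python or ggplot2 in R to get proper curves.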
The aim is to use machine learning for new discoveries of classes and concepts.