Decision trees have the property that they provide both a prediction and a probability for that prediction (scikit-learn's predict_proba method; Section 3.4 of Data Mining with Decision Trees, 2nd edition), which is basically the proportion of the predicted class among the training samples at the corresponding leaf. This probability is more an indication of the tree's (un)certainty about its prediction than a true probability (as density estimation techniques would provide), and it is that uncertainty indication that I am using.
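To make the setup concrete, here is a minimal sketch of what I mean (made-up data; nothing hinges on the particular numbers):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy data and a small tree, just to have something to look at.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

proba = tree.predict_proba(X)   # one row per sample: class proportions at the leaf it falls into
p_hat = proba.max(axis=1)       # probability attached to the predicted class
print(np.round(p_hat[:5], 3))
```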
Do you know of any theoretical bounds on this quantity? (The minimum value, and the probability distribution between that minimum and 1, would be very interesting to me.) More formally, denoting by $\hat{p}$ the posterior probability of the sample belonging to the predicted class (a random quantity), $\hat{p} = P(\hat{y} \mid x)$, what would be the probability distribution of $\hat{p}$ (or a way to compute $P(\hat{p} \le T)$ for varying $T$)? I guess there must be some approximation involved (there is one value per leaf, i.e. $\hat{p}$ has a discrete distribution over a small subset of the rationals). Something like a generalisation bound based on the VC dimension, for instance (such a bound is available for the predictions, of course, but I do not see how to generalise it to the uncertainty).
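The discreteness is easy to see by looking directly at the leaves of the tree fitted in the snippet above (tree_.value holds per-leaf class counts, or class fractions in recent scikit-learn versions, hence the normalisation):

```python
# Continuing from the snippet above: p_hat can only take one value per leaf,
# so its distribution is discrete over a small set of rationals.
is_leaf = tree.tree_.children_left == -1
leaf_counts = tree.tree_.value[is_leaf, 0, :]                     # per-leaf class counts (or fractions)
leaf_prop = leaf_counts / leaf_counts.sum(axis=1, keepdims=True)  # normalise so either convention works
leaf_p_hat = leaf_prop.max(axis=1)                                # the p_hat attached to each leaf
print(np.unique(np.round(leaf_p_hat, 3)))                         # the few values p_hat can take
```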
I have done quite a lot of research on the topic, but have found nothing of use in my case. Some people advise using things like Platt scaling or isotonic regression, but these seem more suited to density estimation based on the algorithm's output.
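For reference, this is the kind of recalibration people suggest (scikit-learn's CalibratedClassifierCV, with "sigmoid" for Platt scaling or "isotonic" for isotonic regression), continuing with the same toy data as above; it rescales the scores into calibrated probabilities but gives no bound of the kind I am after:

```python
# What is usually suggested: recalibrate the tree's scores with Platt scaling ("sigmoid")
# or isotonic regression, fitted by cross-validation on the training data.
from sklearn.calibration import CalibratedClassifierCV

calibrated = CalibratedClassifierCV(
    DecisionTreeClassifier(max_depth=4, random_state=0), method="isotonic", cv=5
).fit(X, y)
print(calibrated.predict_proba(X)[:5].max(axis=1))
```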
A very basic lower bound would be $1/K$ for $K$ classes, reached when all classes have the same number of samples at the leaf (neglecting the samples that went to other leaves); the upper bound would be 1 for leaves with no impurity. However, this gives no indication whatsoever of the distribution of values between the two extremes. (I could compute that distribution on one tree, but it is hard to write a proof based on experimental results.)
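To be explicit, this is the experimental version of the quantity, computed on the tree fitted in the first snippet; it tells me nothing about what to expect in general:

```python
# Continuing from the first snippet: an empirical estimate of P(p_hat <= T)
# over the training samples, for a grid of thresholds T.
for T in np.linspace(0.5, 1.0, 6):
    print(f"P(p_hat <= {T:.1f}) ~= {(p_hat <= T).mean():.2f}")
```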
(If you know about any similar result for other machine-learning algorithms, I'm also very interested!)
Thanks a lot for your time!