Random forest applied to time series: the data-generated prediction model only sometimes shows influence from a variable. What are potential causes?

Naveen Kumar Boiroju

Try to see whether there is any possibility of path analysis. It may give you some clues in the data.

Sven F. Crone

Hi Jonas. Have you looked at the underlying functional relationship of the data you are trying to model (i.e., plotted the data for bivariate analysis)? Most decision trees partition feature space in an orthogonal way. That means that if you are looking for correlations, decision trees are by definition not well equipped to model them and give arbitrary and often poor results (ensembles of decision trees might arbitrarily capture these decision boundaries of correlations, but not in a parsimonious form). Note also that decision trees are very rarely used in time series prediction, with or without explanatory variables. Try neural nets if you are keen on machine learning, or good old regression, and compare to a single decision tree to check general suitability. Hope this helps, Sven
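To make the point about orthogonal partitioning concrete, here is a minimal sketch in plain Python. It is not taken from the thread: the data are synthetic, and `best_stump_accuracy` is a hypothetical helper that tries every single axis-aligned split. The class boundary is assumed to be the diagonal x1 > x2, which a single oblique rule separates exactly but a single orthogonal split cannot.

```python
import random

def best_stump_accuracy(points, labels):
    """Best accuracy of one axis-aligned split: one threshold on one feature."""
    best = 0.0
    n = len(points)
    for feature in (0, 1):
        for threshold in sorted({p[feature] for p in points}):
            predicted = [p[feature] > threshold for p in points]
            correct = sum(a == b for a, b in zip(predicted, labels))
            # Allow either side of the split to be the positive class.
            best = max(best, correct / n, 1 - correct / n)
    return best

# Synthetic data (assumption): uniform points with a diagonal class boundary.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(500)]
labels = [x1 > x2 for x1, x2 in points]

stump_acc = best_stump_accuracy(points, labels)

# A single oblique rule on x1 - x2 matches the boundary by construction.
oblique_acc = sum((x1 - x2 > 0) == y for (x1, x2), y in zip(points, labels)) / len(points)

print(f"best axis-aligned split: {stump_acc:.2f}, oblique rule: {oblique_acc:.2f}")
```

A deeper tree would approximate the diagonal with a staircase of many orthogonal splits, which is the non-parsimonious form referred to above.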

@Paul: orthogonal partitioning of feature space is the conventional way to describe how decision boundaries of decision trees (in classification or regression tasks) are iteratively generated - this is unfortunately standard in machine learning jargon. (Is this a pure stats question?) So I was indeed trying to bridge a gap here ... and I am sorry you did not understand my response as a result.

Ok, thx.

Note that in machine learning the term "Decision Tree" is used for a family of machine learning algorithms such as ID3, CART, C4, and CHAID (see also Quinlan, J. R. (1986). Induction of Decision Trees. Machine Learning 1: 81-106, Kluwer Academic Publishers), which are the algorithms used to constitute random forests (see above).

Jonas Mellin

Thanks for the answers/discussion elucidating this issue.

Thanks again, I will look this up too.

@Paul, yes, we treat each ship individually. No, we have not tried to standardize it (neither ipsatively nor normatively). It is hard to get data for varying conditions (weather, position, speed, plans, time of year, etc.), so we have a subset of environmental conditions. For this subset, we do have a lot of data. For any kind of standardization, we need more data. I personally believe that particle filtering with Monte Carlo simulation may be an interesting approach, e.g., (Osgood & Liu, 2014); in a similar way, we have processes where the distributions are non-Gaussian and the relationships are not necessarily linear.

Concerning my problem, the variation due to the optimization is so small compared to other variations that, even though random forests are good at handling small variations, the random partitioning sometimes ends up with training data containing no variation. This leads to prediction models that show no influence from the optimization. I ran different configurations of random forests (e.g., different moving-average window periods, different numbers of classification trees, different seeds for partitioning the data) and found no particular configuration that worked better than another w.r.t. obtaining a prediction model that shows influence from the optimization. Instead, in about 2/3 of all possible configurations we found influences, and in >95% of predictions based on the subset of conditions experienced during the test, the prediction models indicated that fuel was saved. In the rest, we found no influence whatsoever.
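The effect of random partitioning can be illustrated with a small simulation. This is only a sketch under loud assumptions, not the actual setup: the variable of interest is modelled as a hypothetical binary flag that differs from its default in only 3 of 200 rows, and each training subset is 50 rows drawn without replacement. All of these numbers and the helper `fraction_without_variation` are made up for illustration.

```python
import random

def fraction_without_variation(n_rows=200, n_active=3, sample_size=50, n_seeds=1000):
    """Share of random training subsets in which the flag is constant."""
    # Hypothetical data: the flag is 1 in only a handful of rows.
    flag = [1] * n_active + [0] * (n_rows - n_active)
    constant = 0
    for seed in range(n_seeds):
        rng = random.Random(seed)
        subset = rng.sample(flag, sample_size)
        # A constant column offers no split point, so a tree trained on
        # this subset cannot show any influence from the flag.
        if len(set(subset)) == 1:
            constant += 1
    return constant / n_seeds

share = fraction_without_variation()
print(f"subsets with no variation in the flag: {share:.0%}")
```

Under these assumed numbers, a sizeable share of seeds yields a training subset in which the flag never changes, so whether a model shows influence depends on the seed rather than on the other configuration choices. A stratified split on the variable of interest would be one way to check this.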

Given time and resources (depends on the company), we may investigate this further. Thanks for your answers and the discussion, it gave me a lot of hints for further investigation.

Osgood, ND & Liu, J 2014, ‘Towards Closed Loop Modeling: Evaluating the Prospects for Creating Recurrently Regrounded Aggregate Simulation Models Using Particle Filtering’, Proceedings of the 2014 Winter Simulation Conference, Savannah, Georgia, US.