What to do when a model does not predict the label and instead predicts one of the features?

Hazar Altınbaş

What is the prediction list? Is it your model's speed predictions for the test data?

Masoud Masoumi Moghadam

Yes, that's right. For each feature row there is a prediction.

Eduardo R. Cunha

I would use a Generalized Linear Mixed Model (GLMM) and model Speed as a function of Density, Time-date, Day of Week and Hour, considering labels as a random factor. Using label as a random factor allows you to estimate a particular slope and intercept for each car. Of course you will need several measures for each car in order to fit the model.

Speed ~ Density + Time-date + Day of Week + Hour | 1+Label

If only two Speed values were collected for each car, I would calculate the difference (Speed at t2 - Speed at t1) and model the speed change.

DeltaSpeed ~ Density + Time-date + Day of Week + Hour
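As a rough illustration of the DeltaSpeed idea, here is a minimal sketch that computes the speed change per car from two measurements. The record layout, the car identifiers and all values are made up for the example; only the "Speed at t2 - Speed at t1" calculation comes from the answer above.

```python
# Hypothetical records: (car label, time index, speed). All values are invented.
records = [
    ("car_a", 1, 42.0),
    ("car_a", 2, 38.5),
    ("car_b", 1, 55.0),
    ("car_b", 2, 57.0),
]


def delta_speed_per_car(rows):
    """Return {car: speed at the last time - speed at the first time}."""
    by_car = {}
    for car, t, speed in rows:
        by_car.setdefault(car, []).append((t, speed))

    deltas = {}
    for car, observations in by_car.items():
        observations.sort()  # order each car's measurements by time
        if len(observations) >= 2:  # a difference needs at least two measurements
            deltas[car] = observations[-1][1] - observations[0][1]
    return deltas


deltas = delta_speed_per_car(records)
print(deltas)
```

The resulting DeltaSpeed values would then be the response variable in the second formula.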

Simon Marillet

It could very well be that speed is by far the most predictive feature of your dataset, which isn't too far-fetched as it is the same measure as your label, only shifted in time. If that is the case, other features will be "ignored" by your model (low weights, unused for splitting, etc.).

Since speed is not a perfect predictor of label, you get a scattered plot of prediction versus label. Obviously, it is a good predictor of itself, hence your second plot.

Various approaches can be used to investigate this hypothesis.

I would personally start with a simple strategy: fit a linear model (ordinary least-squares regression) with speed as the only feature and see how it compares with your results so far (using cross-validation or a test set, depending on the amount of data available).
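A minimal sketch of that check, assuming a plain train/test split and mean absolute error as the comparison metric (both are my choices for the illustration, and the numbers are invented):

```python
def fit_simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x for a single feature."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_abs_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: the label is the speed observed one step later.
speed_train = [30.0, 40.0, 50.0, 60.0]
label_train = [32.0, 41.0, 52.0, 61.0]
speed_test = [35.0, 55.0]
label_test = [36.0, 57.0]

slope, intercept = fit_simple_ols(speed_train, label_train)
predictions = [intercept + slope * s for s in speed_test]
speed_only_mae = mean_abs_error(label_test, predictions)
print("speed-only test MAE:", round(speed_only_mae, 3))
```

If the full model's test error is about the same as `speed_only_mae`, the other features are probably contributing very little.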

Other methods that can be relevant to check whether speed is the only feature to be consistently selected:

Use L1-penalized (lasso) regression, which can be seen as OLS regression with built-in feature selection. Note that your variables should be standardized.
With a random forest, compute feature importance ("Gini importance" or "mean decrease impurity").
Try to run various feature selection algorithms with various models.
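To make the lasso item concrete, here is a small self-contained sketch using coordinate descent on standardized features. It minimizes (1/(2n)) * ||y - Xb||^2 + alpha * ||b||_1; the data, the value of `alpha` and the function names are assumptions for illustration. In practice you would more likely use a library implementation such as scikit-learn's `Lasso`.

```python
def standardize(values):
    """Rescale a column to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; small values become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(columns, target, alpha, n_iter=50):
    """Lasso coefficients for standardized feature columns (intercept handled by centering)."""
    n = len(target)
    target_mean = sum(target) / n
    centered = [t - target_mean for t in target]
    coefs = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Residual of the target when every feature except j is used.
            partial = [
                centered[i]
                - sum(coefs[k] * columns[k][i] for k in range(len(columns)) if k != j)
                for i in range(n)
            ]
            rho = sum(col[i] * partial[i] for i in range(n)) / n
            scale = sum(v * v for v in col) / n
            coefs[j] = soft_threshold(rho, alpha) / scale
    return coefs


# Invented toy data: the label is roughly the current speed.
speed = [30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
density = [80.0, 60.0, 70.0, 30.0, 40.0, 20.0]
hour = [8.0, 18.0, 12.0, 10.0, 16.0, 14.0]
label = [31.0, 39.0, 51.0, 59.0, 71.0, 79.0]

columns = [standardize(speed), standardize(density), standardize(hour)]
coefs = lasso_coordinate_descent(columns, label, alpha=1.0)
for name, coef in zip(["Speed", "Density", "Hour"], coefs):
    print(name, round(coef, 3))
```

On this toy data only Speed keeps a non-zero coefficient, which is the pattern you would be looking for in your own dataset.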

If it turns out I was right, it means that, as they are, most of your variables are not very useful for your regression problem. You could therefore investigate how they are distributed conditionally on your label in order to find a pattern that was not caught by your models. Then you can try to transform or combine your features so that your model can use them.
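Two possible illustrations of transforming and combining features, offered only as examples and not as something known to help on this dataset: a cyclic encoding of the hour, so that 23:00 and 00:00 end up close together, and an interaction between density and a rush-hour flag. The function names, the Monday = 0 convention and the 17:00 to 19:00 weekday window are assumptions.

```python
import math


def encode_hour(hour):
    """Map an hour in [0, 24) onto a circle so late evening and early morning are neighbours."""
    angle = 2.0 * math.pi * hour / 24.0
    return math.sin(angle), math.cos(angle)


def rush_hour_density(density, hour, day_of_week):
    """Combine features: density counts only during weekday evening rush hours.

    Assumes day_of_week uses Monday = 0 ... Sunday = 6 and a 17:00-19:00 window.
    """
    is_rush = 1.0 if (day_of_week < 5 and 17 <= hour < 19) else 0.0
    return density * is_rush


print(encode_hour(6))
print(rush_hour_density(80.0, 17, 0))
print(rush_hour_density(80.0, 10, 0))
```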

Can you give an example of a method for combining features?

I'm a newbie in machine learning and don't know much!

Thinking about it, the issue is not the correlation between Speed and Label, which is a good thing because it means knowing Speed helps you to predict Label - precisely what you want - but rather the correlation between Speed and the other variables. For instance, you expect the Speed to be lower for a higher density, and at specific times on specific days (e.g. 5 to 7pm from Monday to Friday). So in the end, once you know Speed, the other variables may be of little help, hence what I mentioned in my previous answer.
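A quick way to look at this is to compute the correlation between Speed and each of the other variables. Below is a minimal sketch with a hand-written Pearson correlation and invented values; note that Pearson only captures linear relationships, so a low value does not rule out a dependence.

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


# Invented toy data.
speed = [30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
density = [80.0, 60.0, 70.0, 30.0, 40.0, 20.0]
hour = [8.0, 18.0, 12.0, 10.0, 16.0, 14.0]

corr_density = pearson(speed, density)
corr_hour = pearson(speed, hour)
print("Speed vs Density:", round(corr_density, 3))
print("Speed vs Hour:", round(corr_hour, 3))
```

A strong correlation such as the one between Speed and Density here would suggest that Density adds little once Speed is known.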

Did you try to check whether speed was the only variable used by your model as I suggested? If this is the case, a strategy is to get more variables relevant to your problem, which are not related to speed.

Also, check Eduardo R. Cunha's suggestion to model speed change. I would do the following: use Speed, Density, Day Of Week and Hour to predict DeltaSpeed.

Thus, your label predictions can be obtained in two steps: first predict DeltaSpeed, then predict Label as Speed + DeltaSpeed.
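Here is a minimal sketch of the two-step prediction. To keep it short, DeltaSpeed is regressed on Density alone with a one-feature least-squares fit; that simplification, the function names and the data are assumptions, and a real model would use all of the features listed above.

```python
def fit_simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x for a single feature."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented training data: label is the speed at the next time step.
speed = [30.0, 40.0, 50.0, 60.0]
density = [80.0, 60.0, 40.0, 20.0]
label = [28.0, 40.0, 52.0, 64.0]

# Step 0: build the new target, DeltaSpeed = Label - Speed.
delta_speed = [l - s for l, s in zip(label, speed)]

# Step 1: fit a model for DeltaSpeed (here using Density only).
slope, intercept = fit_simple_ols(density, delta_speed)


def predict_label(current_speed, current_density):
    """Step 2: predicted Label = Speed + predicted DeltaSpeed."""
    predicted_delta = intercept + slope * current_density
    return current_speed + predicted_delta


predicted = predict_label(45.0, 50.0)
print(predicted)
```

Because Speed is added back at the end, the model is forced to explain only the change, so it cannot score well by simply copying Speed.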

Then, check the value of Time-date for outliers (specific times outside the cyclic week - hour routine - e.g. holidays, events).

Roman Liessner

Compare your prediction with a naive predictor.

It is possible that the algorithms simply learn to return the last value of a feature because they end in a local optimum.
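A minimal sketch of such a comparison, assuming the naive predictor is "next speed = current speed" and using mean absolute error (the model predictions below are invented placeholders for your own model's output):

```python
def mean_abs_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented test data.
speed_now = [30.0, 40.0, 50.0, 60.0]    # last observed speed (a feature)
speed_next = [33.0, 38.0, 54.0, 57.0]   # true label
model_pred = [30.5, 39.5, 50.5, 59.5]   # hypothetical model output

# Naive (persistence) predictor: just return the last observed speed.
naive_mae = mean_abs_error(speed_next, speed_now)
model_mae = mean_abs_error(speed_next, model_pred)

# Skill relative to the naive predictor: 0 means no better, 1 means perfect.
skill = 1.0 - model_mae / naive_mae
print("naive MAE:", naive_mae)
print("model MAE:", model_mae)
print("skill vs naive:", round(skill, 3))
```

A skill close to zero suggests the model is doing little more than returning the last value.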

I can recommend this article in this regard: https://towardsdatascience.com/how-not-to-use-machine-learning-for-time-series-forecasting-avoiding-the-pitfalls-19f9d7adf424

As a solution you could try to predict not the next speed but the change of speed.