Hello!

For a project, I have to benchmark different algorithms that fill in missing values in time series.

I want to stress that this is imputation, not forecasting.

In my case, I have access to 15 years of complete temperature data from 20 stations.

I have several algorithms that, given the positions of the missing values, try to complete the data.

However, these algorithms have parameters that need to be set, so I want to use a classical method such as k-fold cross-validation.
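To make this concrete, here is a minimal sketch of the standard procedure I had in mind (Python; `impute` and its `params` are placeholders for one of my algorithms, not a real API): hide a random fold of observed entries, impute them, and score against the ground truth.

```python
import numpy as np
from sklearn.model_selection import KFold

def kfold_imputation_score(data, impute, params, n_splits=5, seed=0):
    """data: complete (n_timesteps, n_stations) temperature array.
    `impute` is a hypothetical callable standing in for one of my algorithms."""
    idx = np.argwhere(np.isfinite(data))            # all observed positions
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for _, test in kf.split(idx):
        mask = np.zeros(data.shape, dtype=bool)
        mask[tuple(idx[test].T)] = True             # entries hidden in this fold
        masked = data.copy()
        masked[mask] = np.nan                       # hide the fold
        completed = impute(masked, **params)        # run the algorithm
        rmse = np.sqrt(np.mean((completed[mask] - data[mask]) ** 2))
        scores.append(rmse)
    return np.mean(scores)                          # average RMSE over folds
```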

Moreover, these algorithms have to be tested in a typical time series completion setting. Here is a sketch of what such a setting looks like:

http://prntscr.com/106qpc2

Each line is a time series. (Note that some series are, and must remain, complete for the benchmark.)

The red segments are the missing data to be completed, and the green segments are the unknown data. In practice, these algorithms exploit spatio-temporal structure, i.e. correlations across stations and across time, to complete the missing data.
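To give a rough idea of what I mean by spatio-temporal, here is a toy illustration (not one of my actual algorithms, and assuming all the other stations are fully observed): a gap at one station is filled by regressing that station on the others over the timesteps where it is observed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def impute_station(data, s):
    """Toy spatio-temporal fill-in: data is (n_timesteps, n_stations)
    with NaNs only in column s; the other stations are assumed complete."""
    miss = np.isnan(data[:, s])
    others = np.delete(data, s, axis=1)              # predictor stations
    model = LinearRegression().fit(others[~miss], data[~miss, s])
    out = data.copy()
    out[miss, s] = model.predict(others[miss])       # fill the gap
    return out
```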

However, I am facing a problem: k-fold cross-validation selects the held-out data purely at random, and it is exactly this randomness that bothers me. As the drawing shows, some algorithms may be tuned to perform well when the hidden data are scattered at random, whereas in reality they will only ever be evaluated on a template like the one in the image. In fact, I know in advance that some of my algorithms work very well when blocks of several years of known data are present, and not at all when the known data are selected purely at random.
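To illustrate the kind of layout I mean, here is a rough sketch of how such block-structured masks could be generated (the station counts and gap lengths of one to three years are assumptions for the example, not my real setup): a few stations get one long contiguous gap and the rest stay fully observed.

```python
import numpy as np

rng = np.random.default_rng(0)

def block_mask(n_steps, n_stations, n_missing_stations=8,
               min_gap=365, max_gap=3 * 365):
    """Boolean mask, True where data should be hidden; mimics the
    template in the picture rather than random holes."""
    mask = np.zeros((n_steps, n_stations), dtype=bool)
    hit = rng.choice(n_stations, size=n_missing_stations, replace=False)
    for s in hit:
        gap = rng.integers(min_gap, max_gap + 1)     # gap length in days
        start = rng.integers(0, n_steps - gap + 1)   # where the gap starts
        mask[start:start + gap, s] = True            # one contiguous block
    return mask
```

Drawing several such masks with different seeds would give me validation splits that respect the block structure, but I am not sure whether tuning on those is a sound replacement for standard k-fold cross-validation.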

Do you have an idea of how to tune my algorithms' parameters so that they are optimal on a missing-data layout like the one in the image?

Thanks in advance

(Feel free to ask me questions if I haven't been clear enough)
