There are some missing data in my sample. I tried entering NaN for them, but my MATLAB code did not produce any results. Then I tried replacing all the NaNs with zero. Is there a better way to deal with missing values in MATLAB than replacing them with zero?
I'm not sure what you mean by "did not produce any results", but there are functions such as 'nanmean' and 'nanstd' which perform calculations across datasets containing NaNs.
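To illustrate the point about NaN-aware statistics, here is a minimal sketch with a made-up vector. It uses the 'omitnan' flag of the base functions mean and std, which should behave like nanmean and nanstd without needing the Statistics Toolbox (the flag is available from R2015a, as far as I know). The data are hypothetical.

```matlab
% Hypothetical sample with one missing value
x = [2 4 NaN 6 8];

m_plain = mean(x);               % NaN: a single NaN poisons the plain mean
m_omit  = mean(x, 'omitnan');    % mean of the four observed values
s_omit  = std(x, 0, 'omitnan');  % sample standard deviation of the observed values

fprintf('plain mean = %g, omitnan mean = %g, omitnan std = %g\n', ...
        m_plain, m_omit, s_omit);
```

Replacing the NaN with zero instead would pull the mean down to 4, which is the kind of distortion zeros can introduce.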
Hi Katherine, thank you for your answer. Where data are missing I put "NaN" and run OLS, and the resulting coefficients are all NaN. I was not sure how to deal with the missing data, so I decided to replace the NaNs with zero. Then the code runs. However, I am not sure whether there are better ways to deal with missing data. Thanks.
I am not sure if this is of any help now, but you (or anyone who stumbles upon a similar problem) could use the MATLAB File Exchange submission inpaint_nans, which offers multiple approaches for interpolation.
There's no right or wrong answer here. Whatever you proceed with (interpolation, imputation through regression/learning, averages etc.), I would say the right thing to do is to always show the results both with and without the treatment you applied to your data. This ensures complete transparency: don't forget that every method for dealing with missing data is an estimation, and as such a hypothesis about what your data could look like. The most important thing is to start with why the values are missing in the first place (is it at random or not?).
In any case, MathWorks offers a lot of self-paced courses on data handling (including NaNs) through its core training programs. So feel free to take some of these to see how NaNs could be handled in the easiest way.
Personally, I would first run my results excluding the NaNs (you can use functions like "isnan" as filters to exclude them from your estimations - do note that zeros can distort the results, depending on what you end up doing with the data). I would then use interpolation and a second method (probably imputation through regressions or kNN) to see how my results change under the different treatments.
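For the OLS case asked about above, excluding the NaNs with isnan could look roughly like this. It is only a sketch on made-up data: the variable names and the exact linear relationship are my assumptions, and it uses the backslash operator rather than whatever OLS routine the original code calls.

```matlab
% Hypothetical data: y = 1 + 2*x exactly, with one missing x and one missing y
x = (1:8)';
y = 1 + 2*x;
x(3) = NaN;
y(6) = NaN;

X = [ones(size(x)) x];        % design matrix with an intercept column

% Keep only the rows where the regressors and the response are all observed
ok = ~any(isnan([X y]), 2);

% Ordinary least squares on the complete cases only
b = X(ok, :) \ y(ok);

fprintf('rows used: %d of %d, intercept = %.4f, slope = %.4f\n', ...
        sum(ok), numel(y), b(1), b(2));
```

If the Statistics and Machine Learning Toolbox is available, regress and fitlm are documented to drop rows with NaN themselves, so the manual filter may not be needed there. Either way, this is complete-case analysis, which is only safe if the values are missing at random.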
In addition to what Ouahab Kadri has written, the last line of his code,
result=interp1(xi,yi,x,'linear'), can be replaced by result=interp1(xi,yi,x,'spline'), which may give a better result.
NB: This works once y contains at least two data points. If y is empty, you would have to guess two data points yourself, since MATLAB won't execute interp1 with fewer than two.
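Since the code being referred to is not shown here, this is a rough sketch of how such an interp1 fill might look, comparing 'linear' with 'spline'. The series is made up (y = x.^2 with two gaps), and the names known, y_lin and y_spl are mine, not from the original code.

```matlab
% Hypothetical series following y = x.^2, with two missing entries
x = 1:6;
y = [1 4 NaN 16 NaN 36];

known = ~isnan(y);            % logical mask of observed points
xi = x(known);                % sample locations actually observed
yi = y(known);                % observed values

% Fill only the gaps, once with each interpolation method
y_lin = y;
y_lin(~known) = interp1(xi, yi, x(~known), 'linear');

y_spl = y;
y_spl(~known) = interp1(xi, yi, x(~known), 'spline');

disp([x; y_lin; y_spl]);
```

On this smooth curve the spline recovers the missing values more closely than the straight-line fill, but on noisy data a spline can overshoot, so it is worth plotting both before choosing.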