Analysing longitudinal data with statistical models, e.g., multivariate regression, has recently become a popular topic among computer science researchers; see, e.g., https://cdn1.sph.harvard.edu/wp-content/uploads/sites/343/2013/03/abc.pdf
It is important to clarify what is meant by longitudinal data. Two closely related kinds of data are cross-sectional data and time-series data. If the data values have a meaningful order in time, we are dealing with a time series. If the order is not important, the data can be said to have a cross-sectional dimension. A kind of data that has its own set of problems is cross-sectional time-series data, also called longitudinal or panel data. Longitudinal data are data containing time-series observations for several subjects [1]. Each observation involves at least two dimensions: a cross-sectional one and a time-series one. In other words, longitudinal data record the evolution of individual units, such as people or engineering equipment, at different time points.
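To make the distinction concrete, here is a minimal sketch in Python. The subjects, years, and values are made up purely for illustration, and the helper names `time_series` and `cross_section` are my own, not from any library.

```python
# Hypothetical panel: each key is (subject, year), each value is one measurement.
panel = {
    ("A", 2019): 120, ("A", 2020): 125, ("A", 2021): 130,
    ("B", 2019): 110, ("B", 2020): 112, ("B", 2021): 115,
}

def time_series(panel, subject):
    """All measurements of one subject, ordered by year (time-series dimension)."""
    return [value for (s, _year), value in sorted(panel.items()) if s == subject]

def cross_section(panel, year):
    """Measurements of all subjects in one year (cross-sectional dimension)."""
    return {s: value for (s, y), value in panel.items() if y == year}

series_a = time_series(panel, "A")      # one subject followed over time
slice_2020 = cross_section(panel, 2020)  # all subjects at one time point
```

Fixing the subject gives a time series, fixing the time point gives a cross-section, and the full table with both indices is the panel.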
To deal with time-series data (and longitudinal data as well), recurrent neural networks (RNNs) are typically appropriate. These networks were designed to deal explicitly with sequential or temporal relationships in data. They have shown promise in diverse fields, including natural language processing, speech, music, video, and handwriting. There are several types of RNNs, such as Echo State Networks, Long Short-Term Memory (LSTM) networks, and Gated Recurrent Units (GRUs), among others.
The work of Ilya Sutskever is of note in the field of RNNs.
[1] - Hsiao, Cheng. Analysis of Panel Data. No. 54. Cambridge University Press, 2014.
Thanks for the insight. Quite helpful. I am dealing with time-series data that were collected at different time points and have covered multiple domains (determinants of health). For that matter, I reckon 'recurrent neural networks'
Could you please clarify what you mean by "covered multiple domains"? I guess you mean data characterizing the health of different people over time. If that is so, you are dealing with panel data (or longitudinal data). Another name for this type of data is cross-sectional time series.
Recurrent neural networks can deal with this kind of data quite naturally, but there are some concerns, namely how to prepare the data for the RNN. For practical examples, please see [1]. In your case, the relevant topic is "How to reshape multiple parallel series data for an LSTM model and define the input layer".
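As a rough illustration of the reshaping step, here is a minimal sketch using NumPy. It assumes a balanced panel (every subject observed at every time point) stored as rows of the form (subject, time, feature values); the function name `panel_to_sequences` and the toy numbers are hypothetical. LSTM layers in common frameworks generally expect a 3-D array of shape (samples, timesteps, features), which is what this produces.

```python
import numpy as np

def panel_to_sequences(rows, n_features):
    """Turn long-format rows (subject, time, f1, ..., fk) into a 3-D array
    of shape (n_subjects, n_timesteps, n_features).

    Assumption: the panel is balanced, i.e. each subject has exactly one
    row for every time point. Unbalanced panels need padding or masking,
    which this sketch does not attempt.
    """
    rows = sorted(rows, key=lambda r: (r[0], r[1]))  # by subject, then time
    subjects = sorted({r[0] for r in rows})
    times = sorted({r[1] for r in rows})
    pairs = {(r[0], r[1]) for r in rows}
    if len(pairs) != len(rows) or len(rows) != len(subjects) * len(times):
        raise ValueError("panel is not balanced or has duplicate rows")
    values = np.array([r[2:2 + n_features] for r in rows], dtype=float)
    return values.reshape(len(subjects), len(times), n_features)

# Toy data: 2 subjects, 2 years, 2 features per observation (made up).
rows = [
    (1, 2001, 0.5, 10.0),
    (0, 2000, 0.1, 5.0),
    (0, 2001, 0.2, 6.0),
    (1, 2000, 0.4, 9.0),
]
X = panel_to_sequences(rows, n_features=2)
```

Here `X[i]` is the full multivariate sequence for subject `i`, so each person becomes one sample for the network. The input layer would then be defined with the last two dimensions, (timesteps, features).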
Thanks, Márcia Baptista, for the insight. Yes, my data are intended to reflect the health of a group of individuals over a span of 12 years, with multiple factors affecting health taken into consideration.
You are dealing with longitudinal (panel, or cross-sectional time-series) data, as you correctly identified. Econometrics has a long history of dealing with this kind of data using regression techniques such as the ones Muhammad Ali has referred to. Again, if you want more advanced AI techniques, I guess RNNs may be a good choice.
Some studies have used Bayesian hierarchical models to make these predictions together with their associated uncertainty, predicting either future measurements or the time taken to reach some threshold value. The particular issues this method addresses, some of which involve novel components, are: handling curvature in individuals' trends over time, making predictions for both underlying and measured levels, making predictions from a single baseline measurement, making predictions from a series of measurements, allowing flexibility in the error and random-effects distributions, and including covariates.
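To give a feel for what such a model can look like, here is one possible form, written as a sketch rather than the specification used in any particular study. The symbols are my own assumptions: $y_{ij}$ is the $j$-th measurement of individual $i$ at time $t_{ij}$, $x_i$ is a vector of covariates, and the normal distributions are only placeholders, since the approach described above allows other error and random-effects distributions.

```latex
\begin{align*}
y_{ij} &= \beta_{0i} + \beta_{1i}\, t_{ij} + \beta_{2i}\, t_{ij}^{2}
          + \gamma^{\top} x_i + \varepsilon_{ij}, \\
(\beta_{0i}, \beta_{1i}, \beta_{2i})^{\top} &\sim \mathcal{N}(\mu, \Sigma), \\
\varepsilon_{ij} &\sim \mathcal{N}(0, \sigma^{2}).
\end{align*}
```

The quadratic term lets each individual's trend curve over time, the individual-specific coefficients are the random effects, and the part of the right-hand side without $\varepsilon_{ij}$ is the underlying level as opposed to the measured one. In a Bayesian treatment, priors are placed on $\mu$, $\Sigma$, $\gamma$, and $\sigma^{2}$, and predictions with their uncertainty come from the posterior predictive distribution.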