Synthetic data are commonly generated in order to validate mathematical models, by comparing the behavior of the real data with that of the data generated through the model.
Practically, imagine you want to generate a synthetic time series in MATLAB for a certain Gaussian process, with a certain length. The first step is to find the parameters of the Normal distribution that fit your process (in MATLAB: s = fitdist(x,'Normal'), where x contains your real data). Second step: with the two fitted parameters (mean and standard deviation), form the inverse of the Normal cumulative distribution function. You can then feed this inverse CDF with uniform random numbers between 0 and 1 (in MATLAB: u = rand(n,1), where n is the length of your synthetic time series; note that rand(n) would return an n-by-n matrix, not a vector). The output, syn = icdf(s,u), is a synthetic time series of length n that follows the same probability distribution as the real process you chose.
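The same inverse-transform recipe can be sketched in Python (a hedged equivalent of the MATLAB steps above, using scipy's norm in place of fitdist/icdf; the "real" data here are simulated as a stand-in):

```python
# Inverse-transform sampling: fit a Normal to real data, then push uniform
# random numbers through the inverse CDF (percent-point function).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Stand-in for the real data x (assumed roughly Gaussian).
x = rng.normal(loc=5.0, scale=2.0, size=1000)

# Step 1: fit the Normal distribution (analogue of MATLAB's fitdist).
mu, sigma = stats.norm.fit(x)

# Step 2: uniform random numbers in (0, 1), one per synthetic sample.
n = 500
u = rng.random(n)

# Step 3: the inverse CDF maps uniforms to Normal(mu, sigma) samples.
syn = stats.norm.ppf(u, loc=mu, scale=sigma)

print(len(syn), round(float(np.mean(syn)), 1))
```

Any continuous distribution with an invertible CDF can be plugged into the same three steps; only the fit and ppf calls change.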
Suppose you have a biological or physical system for which you know the (range of) parameters governing its dynamics. By varying the inputs or perturbations to such a system, you can simulate the expected output; this output is what is often quantified as "measured variables". By varying the inputs and parameters and simulating the behavior of the system under various situations, you generate various outputs (measurements). These measurements are, in principle, what is referred to as synthetic data. I hope this helps, for a start :).
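As a toy illustration of that idea (the dynamics and numbers below are assumptions, not from the answer): take a first-order system dx/dt = -k*x + u with known rate k, vary the input u, simulate, and add sensor noise to obtain synthetic "measurements":

```python
# Generate synthetic measurements by simulating a known system under
# varying inputs, then corrupting the output with measurement noise.
import numpy as np

rng = np.random.default_rng(1)

def simulate(k, u, x0=0.0, dt=0.01, steps=1000, noise_std=0.05):
    """Euler-integrate dx/dt = -k*x + u and return noisy measurements."""
    x = x0
    measurements = []
    for _ in range(steps):
        x += dt * (-k * x + u)                               # system dynamics
        measurements.append(x + rng.normal(0.0, noise_std))  # sensor noise
    return np.array(measurements)

# Vary the input (perturbation) to generate several synthetic datasets.
synthetic_runs = {u: simulate(k=2.0, u=u) for u in (0.5, 1.0, 2.0)}

# The noiseless steady state is u/k; the synthetic measurements hover there.
for u, y in synthetic_runs.items():
    print(u, round(float(y[-100:].mean()), 2))
```

Each run plays the role of one "experiment"; sweeping k and u over their known ranges yields a whole family of synthetic datasets.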
This topic is related to digital signal processing, where real signals are simulated as a pure signal plus noise. In general, the approach is to reproduce realistic conditions in order to produce "simulated" or "synthetic" phenomenon behavior, and then study the real phenomenon, treating the initial conditions as variables first in the time domain and later in the frequency domain.
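A minimal sketch of "pure signal + noise" in both domains (the sampling rate, frequency, and noise level are arbitrary choices for illustration): synthesize a sinusoid, add Gaussian noise, and check that the dominant frequency still stands out in the spectrum:

```python
# Synthetic signal = pure sinusoid + additive Gaussian noise,
# inspected in the time domain and the frequency domain (FFT).
import numpy as np

rng = np.random.default_rng(2)

fs = 1000                     # sampling rate, Hz
t = np.arange(0, 1, 1 / fs)   # 1 second of samples
f0 = 50                       # "true" signal frequency, Hz

pure = np.sin(2 * np.pi * f0 * t)          # pure signal
noise = rng.normal(0.0, 0.5, size=t.size)  # additive noise
synthetic = pure + noise                   # simulated measurement

# Frequency domain: the peak of the magnitude spectrum recovers f0.
spectrum = np.abs(np.fft.rfft(synthetic))
freqs = np.fft.rfftfreq(t.size, 1 / fs)
peak = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin
print(peak)
```

The noise spreads its energy across all frequency bins while the sinusoid concentrates its energy in one, which is why the peak survives even at a fairly low signal-to-noise ratio.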
Can you please send a link to explore this further? How can we apply it in the time domain and frequency domain? I had a signal and transformed it to the frequency domain as well. I have attributes that affect the signal, and some of these attributes are impractical to determine in real time. Is there anything that can let me generate synthetic data in such a case?
There is a book whose title is something like Statistical Analysis with Missing Data, in which they generate synthetic data as part of the algorithms. The E-M method is also a good source for this style of thinking ("if you don't have enough data, make some up, but be careful about its properties, and study the sensitivity of the result to changes in those properties").
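A toy sketch of that E-M style of "making data up carefully" (all numbers are illustrative assumptions): estimate the mean and variance of a Gaussian when a quarter of the entries are missing, by alternately filling in expected values (E-step) and re-estimating the parameters (M-step):

```python
# EM for a univariate Gaussian with values missing at random: missing
# entries are imputed by their expected value and expected spread under
# the current parameters, then the parameters are re-estimated.
import numpy as np

rng = np.random.default_rng(3)

data = rng.normal(10.0, 3.0, size=200)
mask = rng.random(200) < 0.25          # ~25% of entries missing at random
observed = data[~mask]
n, n_miss = data.size, int(mask.sum())

mu, var = 0.0, 1.0                     # deliberately bad starting guesses
for _ in range(100):
    # E-step "makes up" each missing value as mu, with variance var around
    # it; M-step re-estimates from observed + imputed sufficient statistics.
    mu = (observed.sum() + n_miss * mu) / n
    var = (((observed - mu) ** 2).sum() + n_miss * var) / n

print(round(mu, 1), round(var, 1))
```

For data missing completely at random, the iteration converges to the observed-data maximum-likelihood estimates, which is the "study the sensitivity" sanity check: the made-up values end up contributing no spurious information.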
There is a difference between synthetic and simulated data, at least in the survey-related literature; I do not know whether this terminological distinction is common in all fields of statistics. Imagine that you have a confidential dataset and want to make a synthetic version public. You fit a model to the data, estimate its parameters, and re-generate the dataset from those parameters. A more elaborate technique consists in generating many such datasets. This technique, as mentioned by Christopher Landouer, was introduced by D. Rubin and was seen as an application of multiple imputation to tackle the issue of confidentiality.
Simulated data, for experts in synthetic data, simply designates data that was generated without paying attention to the real data.
A recent reference with a relevant bibliography: Jörg Drechsler, Synthetic Datasets for Statistical Disclosure Control, Springer (http://www.springer.com/la/book/9781461403258).