Traditionally "mechanistic models" are those that are based on mathematical description of mechanical, chemical, biological etc. phenomenon or process. A good example can the propagation of sound in random media (atmosphere, for instance) that can be described by stochastic differential equations. Stochasticity here is the essential part of the mathematical model. On top of this one can add observational (stochastic/random) error. PK/PD models is another great example of mechanistic modeling: a researcher uses system of differential equations that describes absorption/elimination processes of chemical compounds in the subject body. These equations may contain random parameters, plus random observational errors and in the most sophisticated cases random variability in time of processes in the subject body. Models that are used in population genomics provide another example. Again, stochasticity can be a crucial component of a mechanistic model. Statistics plays a crucial role when we have to estimate unknown parameters that are included in "mechanistic" models using observed information. Often the term "solving inverse problem" is used to describe that stage of research. Of course there are plentiful area of deterministic mechanistic models and the Newton mechanics is a classical example.
The term "empirical models" is mainly used for models that are based on relatively simple approximations like the first and second order polynomials in regression analysis, often motivated by "we do not know anything better" statements. Various "smoothing" models provide another example.
The boundary between "mechanistic" and "empirical" models is rather fuzzy, and there is a wealth of intermediate types of models.
In simple terms, (i) in deterministic models, all input variables (predictors) and output variables (responses) are non-stochastic; an example is modeling the flow of a contaminant through porous media via differential equations; and (ii) in stochastic modeling, at least the response variable is a random variable, and the predictors may or may not be random (a simple example: a time series regression model for forecasting revenue from monthly revenue data).
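A hedged sketch of that stochastic example: a simple regression of monthly revenue on a time trend and its own lag, where the response is treated as a random variable. The data and coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
months = np.arange(36)
revenue = 100 + 2.0 * months + rng.normal(0, 5, months.size)        # noisy upward trend (illustrative)

y = revenue[1:]                                                      # response: current month's revenue
X = np.column_stack([np.ones(y.size), months[1:], revenue[:-1]])    # intercept, trend, lagged revenue
beta, *_ = np.linalg.lstsq(X, y, rcond=None)                         # least-squares fit

next_month = np.array([1.0, months[-1] + 1, revenue[-1]]) @ beta     # one-step-ahead forecast
print("coefficients:", beta, "forecast:", next_month)
```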
I like Singh's answer, but is "mechanistic" synonymous with "deterministic"? A further distinction could be that some statistical models involve mere pattern recognition (e.g., neural networks and many multivariate techniques such as PCA and NMDS), whereas mechanistic models always include proposed relationships with other variables. Mechanistic models may or may not involve stochasticity in the response variable. Unlike all mechanistic models, the particular statistical models I mentioned typically do not attempt to explicitly connect the response variable to a specific hypothesis about the process that created the data.
Most statistical techniques involve mathematical statements about the data-generating process and are therefore also mechanistic (e.g., multiple regression, likelihood ratios for nested model comparison, or model comparison via AIC). Ultimately, I don't think these are useful terms without more context.
Another distinguishing feature that I would add to the list is that mechanistic models are frequently used when the dependent variable is unobservable.
This happens commonly when, for example, one is interested in estimating contaminant concentrations in a species that is not currently present (or cannot be disturbed) on a contaminated site. In such a situation, "mechanistic models" use first principles to construct a series of equations linking contaminant source to receptor through fate and transport equations, food web models, and the physics of equilibrium partitioning to "predict" the concentration in the receptor. In these situations inputs are frequently random variables, but no statistical means is available to optimize the model parameters so that predicted and actual concentrations match. It is the physical and chemical mechanisms that provide the basis for prediction.
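A deliberately simplified sketch of that kind of first-principles chain: a receptor concentration predicted by propagating a source concentration through partitioning and food-web transfer factors, with the random inputs handled by Monte Carlo draws. The factor names (BSAF, TMF) and every numerical value are illustrative assumptions, not site data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000                                                          # Monte Carlo draws for the random inputs

c_sediment = rng.lognormal(mean=np.log(5.0), sigma=0.4, size=n)    # source concentration, mg/kg (assumed)
bsaf = rng.lognormal(mean=np.log(0.8), sigma=0.3, size=n)          # biota-sediment accumulation factor (assumed)
tmf = rng.lognormal(mean=np.log(2.0), sigma=0.2, size=n)           # trophic magnification factor (assumed)

c_receptor = c_sediment * bsaf * tmf                                # predicted concentration in the unobservable receptor
print("median:", np.median(c_receptor), "95th percentile:", np.percentile(c_receptor, 95))
```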
Global circulation models are in this class of mechanistic models.
I agree with Matt that there is a great deal of confusion on this issue, and the term "model" is frequently used to describe both mechanistic and statistical versions. Generally, the potential to use statistical techniques to infer parameters of mechanistic models is underappreciated.
I agree with what Dr Falcy and Kern said; the issue sounds a little bit semantic. "Mechanistic" assumes that you know the process generating the response, via, e.g., differential equations as in physics. "Stochastic" assumes that there is something random behind the scenes. In practice, most of the time we are in between and have to combine both approaches: see, for instance, stochastic differential equations.
As we've discussed here, it is necessary to be clear about what is meant by "mechanistic." In general, I think a mechanistic model is easily the best way to do a forecast. A forecast requires that observations can be connected to an anticipated response. If the forecaster does not have a clear, mechanistic understanding of the process that gives rise to the response, then he/she is on very weak ground to make a forecast. Beware of "overfitting" if a mechanism is not available. Ideally, uncertainty in the processes generating the data will be cast in alternative models, perhaps with different complexity, and then information theory (AIC, BIC) will be used to weight or evaluate the parsimony of the alternative (mechanistic and stochastic) models.
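A small sketch of that information-theoretic weighting: Akaike weights computed from the AIC values of competing models. The AIC values here are placeholders; in practice they would come from the fitted alternative models.

```python
import numpy as np

aic = np.array([210.3, 212.1, 218.7])   # AIC of candidate models (placeholder values)
delta = aic - aic.min()                 # differences from the best (lowest-AIC) model
weights = np.exp(-0.5 * delta)
weights /= weights.sum()                # Akaike weights sum to 1
print("model weights:", weights)
```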
I would agree that this idea of forecasting based on causative relationships is one part of the argument, although not necessarily a defining characteristic. With recent developments in big data, the underlying mechanisms are completely ignored, and yet Google accurately "forecasts" our next click and which advertisements are most likely to be of interest to us. (The ski boots I looked at yesterday seem to be showing up everywhere I look today.)
These methods are based on predictive models such as CART, neural networks, random forests, etc., and the whole notion of overfitting is dealt with through one form or another of data splitting into training and true validation sets, or generalized cross-validation, as opposed to our small-data tools, AIC and BIC (believe me, I use these regularly).
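A hedged sketch of that workflow: a random forest evaluated by hold-out data splitting rather than by AIC/BIC. The synthetic data stand in for whatever large observational dataset happens to be available; no mechanism is assumed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(5000, 20))                                     # many predictors, no mechanism assumed
y = X[:, 0] * 2 + np.sin(X[:, 1] * 3) + rng.normal(0, 0.5, 5000)    # synthetic response (illustrative)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("hold-out R^2:", model.score(X_test, y_test))                 # validation on unseen cases, not AIC/BIC
```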
So really the need for mechanisms and mechanistic models is relative to the sample size and the number of variables available. As these dimensions grow, the need for the mechanistic model is reduced, and predictive power increases with the number of unique cases that can be observed and used to train the mechanism-free predictor.
I'm not disagreeing with you, but making the observation that pinning down the defining characteristics of a mechanistic model that differentiate it from a traditional statistical model is a bit like stepping on Jell-O... and hence the continued confusion, particularly when trying to discuss these types of issues across disciplines.
Good points, John. I completely agree with you. I've used information theory and "big data" techniques in forecasting arenas. As a Biologist, I see little utility in the big data techniques beyond the forecast itself since these techniques don't clearly advance science. I would never advise someone to begin their forecasting career with "big data" techniques. However, your point about the size of the data cannot be ignored.
I think we are at an interesting technological juncture that is forcing people to really step back and think about issues of causation, observational studies, modeling of the numerical/mechanistic type and modeling of the empirical statistical type and how they interrelate.
My observation is that the numerical modeling camps and the big data camps are growing by leaps and bounds with leadership from computer scientists and engineers and that the more empirical/traditional statistical camps are struggling to maintain relevance. In the process, appreciation for key statistical issues of experimental design, pseudo-replication, and reliable inference is fading away.
Traditionally "mechanistic models" are those that are based on mathematical description of mechanical, chemical, biological etc. phenomenon or process. A good example can the propagation of sound in random media (atmosphere, for instance) that can be described by stochastic differential equations. Stochasticity here is the essential part of the mathematical model. On top of this one can add observational (stochastic/random) error. PK/PD models is another great example of mechanistic modeling: a researcher uses system of differential equations that describes absorption/elimination processes of chemical compounds in the subject body. These equations may contain random parameters, plus random observational errors and in the most sophisticated cases random variability in time of processes in the subject body. Models that are used in population genomics provide another example. Again, stochasticity can be a crucial component of a mechanistic model. Statistics plays a crucial role when we have to estimate unknown parameters that are included in "mechanistic" models using observed information. Often the term "solving inverse problem" is used to describe that stage of research. Of course there are plentiful area of deterministic mechanistic models and the Newton mechanics is a classical example.
The term "empirical models" is mainly used for models that are based on relatively simple approximations like the first and second order polynomials in regression analysis, often motivated by "we do not know anything better" statements. Various "smoothing" models provide another example.
Boundary between "mechanistic" and "empirical" models is rather fuzzy and there exists a wealth of interim types of models.
Statistical models try to predict the future by projecting current trends. They also react to changes in the variables that impact the outcome. What is the good of a forecast model if it changes all the time? Well, it reflects the complexity of the problem these models are trying to solve. For example, weather forecasters use atmospheric models to predict the weather, and as they gather more data on temperature, humidity, and barometric pressure, their forecasts become more accurate and thus often change. The emerging modeling approaches have great scope for accurate prediction, as they are trained and tested with large volumes of data. The mechanistic models assume all processes to be linear and are thus far from real-world situations.
The differences between mechanistic and statistical modeling are mostly related to our ability or inability to describe the system under consideration by some law at the lowest level.
Simply put, mechanistic models are based on the application of a well-known physical, chemical, or biological law describing the behavior of the constituent parts of the modeled system.
One of the best-known examples of this type of model is the application of Newton's laws to the motion of physical bodies. You know the laws governing the motion of each body involved, and you simply follow them exactly in the model.
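A minimal deterministic sketch of that idea, assuming nothing more than Newton's second law and a body falling from an assumed initial height, integrated with a simple Euler step; every quantity is a fixed, non-random input.

```python
mass, g, dt = 1.0, 9.81, 0.01             # kg, m/s^2, time step in s (illustrative values)
position, velocity, t = 100.0, 0.0, 0.0   # initial height (m), velocity (m/s), elapsed time (s)

while position > 0.0:
    force = -mass * g                     # Newton's second law: F = m * a (gravity only)
    velocity += (force / mass) * dt       # update velocity from the acceleration
    position += velocity * dt             # update position from the velocity
    t += dt

print(f"impact after ~{t:.2f} s at {velocity:.1f} m/s, determined entirely by the initial conditions")
```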
The problems arise when the number of parts in the system grows to the point where you still know the laws governing the motion of each part, but the number of parts is so huge that you cannot track them all!
Such a situation arose in the past in the modeling of gases and resulted in the development of statistical physics (Ludwig Boltzmann's seminal work), which, simply put, describes enormous ensembles of interacting parts using statistical descriptions only.
The third basic situation arises when you have no law governing the behavior of the system's parts but you do have some kind of aggregate values or measurements. In such a case, the model is designed from the beginning as a statistical one.
Actually, this last approach can be used to describe complex systems that we are unable to observe for various reasons, and to model and predict the evolution of some integral signal capturing their state.
If you are interested in this last approach, you can read my recent paper on the prediction of arrhythmias from ECG recordings.
Complex systems are often intractable by standard approaches; their statistical description, which often includes entropy measurements in combination with AI and machine learning techniques, can become a game-changer. :-)
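As a generic illustration of an entropy measurement used as a statistical descriptor of a signal: a plain Shannon entropy of a binned amplitude distribution. This is not the specific method of the paper mentioned above, and the synthetic signal is an assumption.

```python
import numpy as np

rng = np.random.default_rng(7)
signal = np.sin(np.linspace(0, 20 * np.pi, 2000)) + rng.normal(0, 0.2, 2000)   # stand-in for a recorded signal

counts, _ = np.histogram(signal, bins=30)     # empirical amplitude distribution
p = counts / counts.sum()
p = p[p > 0]                                  # drop empty bins before taking logarithms
shannon_entropy = -np.sum(p * np.log2(p))     # entropy in bits
print(f"Shannon entropy of the amplitude distribution: {shannon_entropy:.2f} bits")
```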