1) To explain it to a non-specialist: if you have a sum 2 + x + y = 10 and you give x the value 3, then y is forced to be 5, so the sum has lost one df (one degree of freedom).
That is why in a contingency table the df of, say, a 4x5 table is (4-1)(5-1), and generalizing, a table with i rows and j columns has df = (i-1)(j-1).
Also, you lose one degree of freedom each time you have to estimate a parameter.
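As a quick illustration, here is a minimal Python sketch (the counts are made up) showing that a chi-square test on a 4x5 table reports (4-1)(5-1) = 12 df:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Made-up 4x5 table of counts, just to illustrate the df rule
table = np.array([
    [10, 12,  9, 14, 11],
    [ 8, 15, 10,  9, 12],
    [13,  7, 11, 10,  9],
    [ 9, 11, 14,  8, 13],
])

chi2, p, dof, expected = chi2_contingency(table)
print(dof)                # 12
print((4 - 1) * (5 - 1))  # (i-1)*(j-1) = 12, the same value
```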
2) We can make a connection between df and the sample size.
Fewer df correspond to a decrease in sample size. By how much?
Suppose we take the ratio df = v / (n1 + n2 - 2), where n1 + n2 - 2 is the df of the t-test for two independent samples when the variances are equal (a comparison of means with unknown variances), and v is the df actually used (for example, the Welch approximation when the variances are unequal). Then a lower value of this ratio corresponds to a relative decrease in the size of the sample of 1 - df = 1 - v / (n1 + n2 - 2).
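To make this concrete, here is a minimal Python sketch (with made-up samples) that computes the Welch-Satterthwaite df v and the relative reduction 1 - v / (n1 + n2 - 2):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 10)   # sample 1, n1 = 10
y = rng.normal(0.0, 3.0, 15)   # sample 2, n2 = 15, larger variance

n1, n2 = len(x), len(y)
v1, v2 = x.var(ddof=1) / n1, y.var(ddof=1) / n2

# Welch-Satterthwaite df (variances unknown, possibly unequal)
v = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

equal_var_df = n1 + n2 - 2
print(f"Welch df v = {v:.2f} vs equal-variance df = {equal_var_df}")
print(f"relative loss 1 - v/(n1+n2-2) = {1 - v / equal_var_df:.2%}")
```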
In a mechanical system, the degrees of freedom are clear: for example, they are related to the number of independent directions in which movement is available.
In a mathematical model, I think this idea must be transferred to the flexibility of the model. This is possible through different actions; one is to increase the number of model parameters (as in SPICE models for electronic devices).
In simple terms, a df can be forgone, or not known, in a sample because, if it is part of a variable, it can be determined from a known parameter.
For example, suppose the three values 1, 2 and X have a mean of 2 (the parameter). This leads us to conclude that the third value (X) must be 3, since the sum would then be 6 and the mean of the three values 6/3 = 2.
This leads us to the notion of n-1 degrees of freedom.
In this example, where there are three values for the variable, we say there are n - 1 = 3 - 1 = 2 degrees of freedom.
Where there is more than one parameter, a df is needed for each one.
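Returning to the 1 + 2 + X example, a tiny Python check (a sketch, nothing more) that the known mean pins down the last value:

```python
import numpy as np

values = [1, 2]   # the n - 1 freely chosen values
mean = 2          # the known parameter
n = 3

# The last value is forced once the mean is fixed, so it is not free
x = n * mean - sum(values)
print(x)                      # 3
print(np.mean(values + [x]))  # 2.0, as required
```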
Helena points out, above, some issues surrounding the number of df.
I think that a useful explanation for a "lay person" is not possible. If you do it anyway, you will simply tell a lie, and the only benefit is to soothe the person (but not to improve any understanding). There is nothing much wrong when a lay person just knows that there are some "buttons to be pushed" (in a statistical analysis) and that it is clear which buttons are to be pushed in which cases. It is not really required that the lay person understands the wiring and the mechanisms behind the scenes.

This changes, however, when the lay person wants or needs to become an "expert", as I would expect or hope for any empirical scientist. And here the major problem starts already much earlier, at the time the concept of "probability" is introduced. I would bet my shirt that most students and empirical scientists are convinced that they know what probability means, and how it is defined - but that almost all are essentially wrong about this. If a person starts investigating and understanding such an allegedly simple thing as the "meaning of probability", this person will become more of an expert in statistics (not necessarily in mathematical statistics, which is a completely different topic! - I am talking about the understanding of basic concepts, regarding things like "information", "entropy", "knowledge", "uncertainty", and the like!) and will then have the foundation for really understanding the concept of "degrees of freedom".
In point 1) I gave an example to explain the loss of df we have each time we need to estimate a parameter (a real statistical situation, not a lay one); and in 2) I would like you to explain to me what is incorrect in the following, because it is a real case of a t-test for the comparison of two independent sample means, not a lay one.
But of course we are always learning from each other, so if you could kindly explain to me what is wrong in those statistical explanations, I would be very grateful.
If you read the example given by Prof. Huber (I gave the link above) it should be clear. He gives a concrete example, and I surely cannot explain it any better.
That an independent-samples t-test uses a t-distribution with two degrees of freedom (df) fewer than the total sample size (N = n1 + n2) is more of a coincidence with the fact that two means could be estimated. Actually, the test is about one single statistic, the "mean difference", and according to the explanations given above this should lower the df by only 1 (and not by 2). The test does not care about the two individual means. Using the two individual means to explain why the df are just N - 2 is thus a spurious argument. One can go further and ask how fractional degrees of freedom would then arise in the Welch-Satterthwaite approximation for unequal variances...
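One can at least check by simulation that N - 2 is the number that works (a minimal sketch, not a derivation): under the null hypothesis, the pooled-variance t statistic is calibrated by the t-distribution with n1 + n2 - 2 df, and not by its neighbours.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n1, n2 = 4, 4          # small samples make the choice of df visible
reps = 100_000

x = rng.normal(0, 1, (reps, n1))
y = rng.normal(0, 1, (reps, n2))   # H0 true: equal means and variances

sp2 = ((n1 - 1) * x.var(axis=1, ddof=1) +
       (n2 - 1) * y.var(axis=1, ddof=1)) / (n1 + n2 - 2)
t = (x.mean(axis=1) - y.mean(axis=1)) / np.sqrt(sp2 * (1/n1 + 1/n2))

# Only df = n1 + n2 - 2 reproduces the nominal 5% rejection rate
for df in (n1 + n2 - 1, n1 + n2 - 2, n1 + n2 - 3):
    crit = stats.t.ppf(0.975, df)
    print(df, round(np.mean(np.abs(t) > crit), 4))
```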
The original question seems to relate to degrees of freedom in a model? For an explanation to students, I suggest the following when considering regression models that are fitted to data.
1. Models have degrees of freedom (df). Higher df imply that a better fit to the data is possible, because more freedom is allowed in the model structure. So the fit to the data will usually be better.
2. The more relevant question is whether we obtain better generalizability of our fit. This depends on the amount of data available. If a small data set is analyzed, using more df will imply more overfitting to the data: specific patterns are captured which are not seen again in future, independent data. A more robust model with fewer df may generalize better. This relates to the bias-variance tradeoff: more df lead to less bias, but higher variance.
3. Examples in a prediction context: http://www.ncbi.nlm.nih.gov/pubmed/25777297. Here we found that splines with 4 df generalized worse than splines with 2 df, despite providing a somewhat better fit to the data used to develop the model.
Or: http://www.ncbi.nlm.nih.gov/pubmed/10790680, where a prediction model with 17 predictors (17 df) performed worse on independent data when originally fitted on a small data set, compared to a model with 8 predictors (8 df).
In sum, higher df usually lead to a better fit at model development, but not necessarily to better performance on independent data.
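A toy illustration of this tradeoff (a minimal Python sketch on simulated data, not the models from the papers above): a more flexible fit with more df typically wins on the development data but loses on independent data.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-2, 2, n)
    y = np.sin(x) + rng.normal(0, 0.4, n)   # true curve plus noise
    return x, y

x_dev, y_dev = make_data(30)          # small development set
x_new, y_new = make_data(10_000)      # "future, independent data"

for degree in (2, 9):                 # few df vs many df
    coefs = np.polyfit(x_dev, y_dev, degree)
    mse_dev = np.mean((np.polyval(coefs, x_dev) - y_dev) ** 2)
    mse_new = np.mean((np.polyval(coefs, x_new) - y_new) ** 2)
    print(f"degree {degree}: development MSE {mse_dev:.3f}, "
          f"independent MSE {mse_new:.3f}")
```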