Consider the following ordinary differential equation (ODE):
dy/dx = f(x).
You discretize the variable x into the set {x_0, x_1, x_2, ..., x_N} with a uniform step h = x_{i+1} - x_i. Forward and backward differences differ in how they approximate the derivative. The derivative is a mathematical concept that gives the instantaneous rate of change of a function, but in the real world you can only measure an average change over a finite interval. Thus the derivative dy/dx is approximated by the difference quotient Delta y / Delta x. More precisely:
Forward scheme: y_{i+1} - y_i = h*f(x_i) (you use the next value of y);
Backward scheme: y_i - y_{i-1} = h*f(x_i) (you use the previous value of y).
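As a minimal sketch, the two schemes can be marched along the mesh and compared with an exact solution. The helper names forward_scheme and backward_scheme and the test problem f(x) = cos(x), y(0) = 0 (exact solution y = sin(x)) are illustrative assumptions, not part of the original answer:

```python
import math

def forward_scheme(f, x0, y0, h, n):
    """y_{i+1} = y_i + h*f(x_i): the slope is taken at the current point."""
    xs = [x0 + i * h for i in range(n + 1)]
    ys = [y0]
    for i in range(n):
        ys.append(ys[i] + h * f(xs[i]))
    return xs, ys

def backward_scheme(f, x0, y0, h, n):
    """y_i = y_{i-1} + h*f(x_i): the slope is taken at the new point."""
    xs = [x0 + i * h for i in range(n + 1)]
    ys = [y0]
    for i in range(1, n + 1):
        ys.append(ys[i - 1] + h * f(xs[i]))
    return xs, ys

# Test problem (assumed for illustration): dy/dx = cos(x), y(0) = 0, exact y = sin(x).
h, n = 0.01, 100
_, yf = forward_scheme(math.cos, 0.0, 0.0, h, n)
_, yb = backward_scheme(math.cos, 0.0, 0.0, h, n)
exact = math.sin(n * h)
print(f"forward error:  {abs(yf[-1] - exact):.2e}")
print(f"backward error: {abs(yb[-1] - exact):.2e}")
```

Because cos is decreasing on [0, 1], the forward scheme (slope from the left end of each step) overshoots and the backward scheme (slope from the right end) undershoots; both errors shrink roughly in proportion to h.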
Consider a simple computational mesh of the x axis with step dx and nodes indexed by i = 1, ..., IMAX. To compute the derivative at node i, the forward difference is
df(i)/dx+ = (f(i+1) - f(i))/dx
and the backward difference is
df(i)/dx- = (f(i) - f(i-1))/dx.
Here f is any function defined on the mesh, with f(i) = f(x(i)). The big difference is that the forward difference reaches forward on the mesh (to x(i+1)) to compute the derivative at x(i), while the backward difference reaches backward (to x(i-1)) to compute the derivative at x(i). Note that if you average these two, you get
(df(i)/dx+ + df(i)/dx-)/2 = (f(i+1) - f(i-1))/(2 dx)
This is the so-called centered difference formula. These formulas are the simplest cases of Newton's divided differences, named after Sir Isaac Newton. OK, that's all well and good, but the real question you may be asking is: why would we use forward and backward formulas at all? In this context, forward and backward differences are also referred to as "upwind" formulas, because they respect a preferred direction of information flow. Without getting too general, if a physical problem has no preferred direction of information flow, then a centered difference is likely to be appropriate. Heat conduction and other diffusion problems are examples of this kind.
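A quick numerical check of the forward, backward and centered difference quotients, assuming the test function f = sin at x = 1 (so the exact derivative is cos(1)); the function names are illustrative only:

```python
import math

def forward_diff(f, x, dx):
    # Reaches forward on the mesh: (f(x+dx) - f(x)) / dx
    return (f(x + dx) - f(x)) / dx

def backward_diff(f, x, dx):
    # Reaches backward on the mesh: (f(x) - f(x-dx)) / dx
    return (f(x) - f(x - dx)) / dx

def centered_diff(f, x, dx):
    # Average of the two one-sided quotients: (f(x+dx) - f(x-dx)) / (2 dx)
    return (f(x + dx) - f(x - dx)) / (2 * dx)

x = 1.0
exact = math.cos(x)  # d/dx sin(x) at x = 1
for dx in (1e-1, 1e-2, 1e-3):
    ef = abs(forward_diff(math.sin, x, dx) - exact)
    eb = abs(backward_diff(math.sin, x, dx) - exact)
    ec = abs(centered_diff(math.sin, x, dx) - exact)
    print(f"dx={dx:.0e}  forward={ef:.1e}  backward={eb:.1e}  centered={ec:.1e}")
```

Shrinking dx by a factor of 10 should cut the forward and backward errors by about 10 (first order) and the centered error by about 100 (second order), until round-off starts to dominate at very small dx.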
Now consider a different problem: a linear wave traveling along the x axis from left to right. In its simplest form, this problem is modeled by the first-order 1D wave (linear advection) equation u_t + c u_x = 0 with c > 0. Information propagates with the wave from left to right, so the preferred direction of information flow is +x. It is almost a causality argument: the wave properties at x(i) must depend on the properties at x(i-1), so a backward difference is needed to respect the physical direction of information travel. Conversely, if the wave were moving from right to left, the forward difference would be needed.
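The upwind argument can be sketched on u_t + c u_x = 0 with c > 0, using forward Euler in time on a periodic grid. The Gaussian pulse, the grid size, the Courant number c*dt/dx = 0.8 and the helper name advect are illustrative assumptions:

```python
import numpy as np

def advect(u0, c, dx, dt, steps, scheme="upwind"):
    """March u_t + c u_x = 0 with forward Euler in time on a periodic grid.

    "upwind":   u_x ~ (u_i - u_{i-1}) / dx  (backward difference, correct side for c > 0)
    "downwind": u_x ~ (u_{i+1} - u_i) / dx  (forward difference, wrong side for c > 0)
    """
    u = u0.copy()
    nu = c * dt / dx  # Courant number
    for _ in range(steps):
        if scheme == "upwind":
            u = u - nu * (u - np.roll(u, 1))   # np.roll(u, 1)[i] is u[i-1]
        elif scheme == "downwind":
            u = u - nu * (np.roll(u, -1) - u)  # np.roll(u, -1)[i] is u[i+1]
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return u

n = 200
x = np.linspace(0.0, 1.0, n, endpoint=False)
dx = x[1] - x[0]
u0 = np.exp(-200.0 * (x - 0.3) ** 2)  # Gaussian pulse centred at x = 0.3
c, dt = 1.0, 0.8 * dx                  # wave moves left to right; Courant number 0.8
steps = 100                            # pulse should travel c*steps*dt = 0.4

up = advect(u0, c, dx, dt, steps, "upwind")
down = advect(u0, c, dx, dt, steps, "downwind")
print("upwind   peak at x =", x[np.argmax(up)], " max |u| =", np.abs(up).max())
print("downwind max |u| =", np.abs(down).max())
```

With the backward (upwind) difference, each update is a weighted average of u_i and u_{i-1} when 0 < c*dt/dx <= 1, so the pulse moves right to about x = 0.7 and stays bounded (smeared by numerical diffusion). The forward (downwind) difference takes information from the wrong side, and the solution grows without bound.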
These are simple examples, but the real power of these differences lies in algorithms for fluid dynamics. In high-speed fluid flows there is usually a preferred direction of information flow, defined relative to the propagation of sound waves. Proper upwind difference schemes are crucial for capturing shock waves in both transonic and supersonic flow fields.
The backward and forward difference methods are numerical methods for solving ordinary differential equations (ODEs). They can be derived in several ways, for example by applying the rectangle rule of numerical integration to the (equivalent) integral form of the ODE, or by using a Taylor series expansion. The backward and forward methods are also called the implicit and explicit Euler methods, respectively.
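Integrating y' = f(x, y) from x_k to x_{k+1} gives y_{k+1} = y_k + (integral of f over the step); the left rectangle rule yields forward (explicit) Euler and the right rectangle rule yields backward (implicit) Euler. The sketch below assumes a scalar ODE, solves the implicit equation with Newton's method (one common choice, not the only one), and uses the illustrative stiff test problem y' = -50 y, y(0) = 1:

```python
import math

def forward_euler(f, x0, y0, h, n):
    """Explicit: y_{k+1} = y_k + h*f(x_k, y_k)  (left rectangle rule)."""
    x, y = x0, y0
    for _ in range(n):
        y = y + h * f(x, y)
        x = x + h
    return y

def backward_euler(f, dfdy, x0, y0, h, n, newton_iters=20, tol=1e-12):
    """Implicit: solve y_{k+1} = y_k + h*f(x_{k+1}, y_{k+1})  (right rectangle rule).

    The implicit equation is solved with Newton's method; dfdy is the partial
    derivative of f with respect to y.
    """
    x, y = x0, y0
    for _ in range(n):
        x_new = x + h
        z = y  # initial guess: previous value
        for _ in range(newton_iters):
            g = z - y - h * f(x_new, z)            # residual of the implicit equation
            dz = g / (1.0 - h * dfdy(x_new, z))    # Newton correction
            z -= dz
            if abs(dz) < tol:
                break
        x, y = x_new, z
    return y

LAM = -50.0  # stiff decay rate (assumed for illustration)

def f(x, y):
    return LAM * y

def dfdy(x, y):
    return LAM

h, n = 0.05, 20  # integrate from x = 0 to x = 1
y_fwd = forward_euler(f, 0.0, 1.0, h, n)
y_bwd = backward_euler(f, dfdy, 0.0, 1.0, h, n)
print("forward :", y_fwd)
print("backward:", y_bwd)
print("exact   :", math.exp(LAM * 1.0))
```

With h = 0.05 the explicit amplification factor is 1 + h*LAM = -1.5, so forward Euler oscillates and grows, while the implicit factor 1/(1 - h*LAM) = 1/3.5 decays like the exact solution.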
When you consider an evolutionary (time-dependent) partial differential equation (PDE), you can apply these methods to solve the system of ODEs obtained after discretizing in space with finite differences, finite volumes, the finite element method, or another technique.
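A method-of-lines sketch, assuming the heat equation u_t = u_xx on [0, 1] with u = 0 at both ends and u(x, 0) = sin(pi x), whose exact solution is exp(-pi^2 t) sin(pi x). Central differences in space turn the PDE into the ODE system du/dt = A u, which is then advanced with forward and backward Euler; the grid size and time step are illustrative choices:

```python
import numpy as np

def laplacian_matrix(m, dx):
    """Central-difference approximation of u_xx on m interior points,
    with u = 0 at both ends (homogeneous Dirichlet boundaries)."""
    A = (np.diag(-2.0 * np.ones(m))
         + np.diag(np.ones(m - 1), 1)
         + np.diag(np.ones(m - 1), -1))
    return A / dx**2

m = 49                        # interior grid points
dx = 1.0 / (m + 1)
x = np.linspace(dx, 1.0 - dx, m)
A = laplacian_matrix(m, dx)   # semi-discrete system: du/dt = A u

dt, t_end = 1e-3, 0.1         # dt is 2.5 times the explicit limit dx^2/2 = 2e-4
steps = round(t_end / dt)
u_fe = np.sin(np.pi * x)      # forward (explicit) Euler solution
u_be = u_fe.copy()            # backward (implicit) Euler solution
M = np.eye(m) - dt * A        # backward Euler: (I - dt*A) u^{n+1} = u^n
for _ in range(steps):
    u_fe = u_fe + dt * (A @ u_fe)
    u_be = np.linalg.solve(M, u_be)

exact = np.exp(-np.pi**2 * t_end) * np.sin(np.pi * x)
print("forward  max error:", np.abs(u_fe - exact).max())
print("backward max error:", np.abs(u_be - exact).max())
```

Because the time step exceeds the explicit stability limit, forward Euler amplifies round-off in the high-frequency modes and blows up, while backward Euler stays stable and close to the exact solution. A production code would factor I - dt*A once, or use a tridiagonal solver, instead of calling np.linalg.solve at every step.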
Some introductory references are:
1. Numerical Analysis, by R. Burden, D. Faires and A. Burden, 10th ed., chap. 5.
2. Numerical Methods for Engineers, by S. Chapra and R. Canale, chap. 25.
A more advanced reference is:
3. Numerical Solution of Ordinary Differential Equations, by K. Atkinson, W. Han and D. Stewart.
The specific use of these methods for PDEs can be found in Applied Partial Differential Equations by J. David Logan, as well as in chapter 11 of reference 1 and chapters 29 to 32 of reference 2.
First-order backward and forward differences have the same order of accuracy (first order in the mesh step). The difference between them shows up in the boundary conditions and the initial condition: at a left boundary or at the initial time there is no point to the left, so the difference to the right (the forward difference) is the more suitable choice. On unstructured meshes the difference between the two can be significant, depending on the geometry of the domain.
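A sketch of how one-sided differences are used at the ends of a grid, assuming a uniform 1D mesh and the test function exp(x). The helper grid_derivative is hypothetical; NumPy's np.gradient with edge_order=1 is used only as a cross-check, since it applies the same one-sided rule at the ends:

```python
import numpy as np

def grid_derivative(f, dx):
    """Approximate f' on a uniform grid.

    Interior points: centered difference (f[i+1] - f[i-1]) / (2 dx).
    Left end:  forward difference (f[1] - f[0]) / dx, since no point exists to the left.
    Right end: backward difference (f[-1] - f[-2]) / dx, since no point exists to the right.
    """
    d = np.empty_like(f)
    d[1:-1] = (f[2:] - f[:-2]) / (2.0 * dx)
    d[0] = (f[1] - f[0]) / dx
    d[-1] = (f[-1] - f[-2]) / dx
    return d

x = np.linspace(0.0, 1.0, 101)
dx = x[1] - x[0]
f = np.exp(x)
d = grid_derivative(f, dx)
err = np.abs(d - np.exp(x))  # exact derivative of exp(x) is exp(x)
print("left end (forward) error:  ", err[0])
print("right end (backward) error:", err[-1])
print("max interior error:        ", err[1:-1].max())
print("matches np.gradient:", np.allclose(d, np.gradient(f, dx, edge_order=1)))
```

Both end errors are first order in dx (they differ only because f'' = exp(x) is larger at the right end), while the centered interior values are second order.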