Before issuing a real forecast, you can run the model over a period that has already been observed (a hindcast) and compare the forecast values with the observed ones. Several absolute and relative error measures can then be calculated (e.g. RMSE). Cross-validation can also be done.
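As a rough illustration of the hindcast and cross-validation idea, here is a minimal sketch. It assumes the forecast is a simple linear regression of the observed quantity on one predictor, which is only a stand-in for whatever model you actually use; the function name `loo_hindcast_rmse` and the sample numbers are made up for this example.

```python
import numpy as np

def loo_hindcast_rmse(predictor, observed):
    """Leave-one-out cross-validated hindcast and its RMSE.

    Assumption: the 'forecast model' is a straight-line fit of
    observed on predictor. Replace the fit with your own model.
    """
    predictor = np.asarray(predictor, dtype=float)
    observed = np.asarray(observed, dtype=float)
    n = observed.size
    hindcast = np.empty(n)
    for i in range(n):
        # Hold out sample i and fit on the remaining samples only.
        train = np.arange(n) != i
        slope, intercept = np.polyfit(predictor[train], observed[train], 1)
        # Forecast the held-out sample from its predictor value.
        hindcast[i] = slope * predictor[i] + intercept
    rmse = np.sqrt(np.mean((hindcast - observed) ** 2))
    return hindcast, rmse

# Hypothetical data, for illustration only.
predictor = np.array([0.2, 0.7, 1.1, 1.6, 2.0, 2.4, 3.1])
observed = np.array([1.5, 2.3, 3.4, 4.0, 5.2, 5.6, 7.3])
hindcast, rmse = loo_hindcast_rmse(predictor, observed)
print("cross-validated RMSE:", rmse)
```

Because each held-out value is predicted by a model that never saw it, the resulting RMSE is a fairer estimate of real forecast error than the error on the fitting period.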
To my knowledge, the OAGCM should realistically represent several aspects of the climate system. For example, using an OAGCM for prediction in connection with regional and global climate change is justified if the model successfully simulates the well-known El Niño–Southern Oscillation (ENSO) and Asian monsoon teleconnections.
No forecast is 100% accurate, and it can't be sold as a service with any guarantee of such a high level of accuracy. Before going operational, the best thing you can do is evaluate the forecast with a few different statistics.
Statistical scores such as mean absolute error, root mean square error and correlation can be calculated for your forecast model against observation (station/satellite) data. Based on these scores you can decide whether to use the forecast directly for the operational service or to apply bias correction to it first.
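The scores mentioned above can be computed in a few lines. The sketch below assumes the forecast and the observations are already matched in time and location as two equal-length arrays. The function names are hypothetical, and the bias correction shown is the simplest possible one (subtracting the mean error from a training period), not a recommendation over more elaborate methods.

```python
import numpy as np

def verification_scores(forecast, observed):
    """Basic verification statistics for paired forecast/observation values."""
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    error = forecast - observed
    return {
        "bias": float(np.mean(error)),                 # mean error
        "mae": float(np.mean(np.abs(error))),          # mean absolute error
        "rmse": float(np.sqrt(np.mean(error ** 2))),   # root mean square error
        "corr": float(np.corrcoef(forecast, observed)[0, 1]),  # Pearson correlation
    }

def remove_mean_bias(forecast, training_forecast, training_observed):
    """Subtract the mean error estimated on a training period (assumed method)."""
    training_error = (np.asarray(training_forecast, dtype=float)
                      - np.asarray(training_observed, dtype=float))
    return np.asarray(forecast, dtype=float) - np.mean(training_error)

# Hypothetical values, for illustration only.
observed = np.array([12.1, 14.3, 15.0, 13.2, 11.8])
forecast = np.array([13.0, 15.1, 16.4, 14.0, 12.9])
print(verification_scores(forecast, observed))
```

A large bias together with a high correlation usually suggests that a simple correction may help, while a low correlation points to a problem that bias correction alone will not fix.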
Follow this link for forecast verification statistics,
This question is very complex and there is no clear answer. The set of possible combinations of model types (hydrostatic, non-hydrostatic), resolutions, physical parameterizations and desired prediction ranges (from minutes to days) is very large, and each element of that set requires a specific methodology. For further details please consult:
https://www.cawcr.gov.au/projects/verification/
The above site should give you some suggestions on how to design an ultimate set of tests for the forecast model.
It depends on the forecast model. There are forecast models based on statistical or empirical techniques, numerical models, data assimilation, or a combination of two or more of these. Validation, tuning and bias correction are often applied before the forecast is deemed operational. Ensemble forecasts are often preferred to single-model forecasts because every model has its own strengths and weaknesses in representing physical and chemical processes across a wide range of scales. Therefore, ensemble averaging and forecasts with probabilities, ranges and error analysis are required before going operational.
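To make the ensemble point concrete, here is a minimal sketch of how an ensemble mean, a range and an event probability could be derived. It assumes the member forecasts are stored as a 2-D array with one row per member and one column per forecast time, and that members are equally weighted; the function name, the threshold and the numbers are hypothetical.

```python
import numpy as np

def ensemble_summary(members, threshold):
    """Summarize an ensemble stored as an array of shape (n_members, n_times)."""
    members = np.asarray(members, dtype=float)
    return {
        "mean": members.mean(axis=0),                # ensemble average
        "spread": members.std(axis=0, ddof=1),       # sample std. dev. across members
        "p10": np.percentile(members, 10, axis=0),   # lower end of the range
        "p90": np.percentile(members, 90, axis=0),   # upper end of the range
        # Fraction of members above the threshold, read as an event probability
        # under the assumption of equally likely members.
        "prob_exceed": (members > threshold).mean(axis=0),
    }

# Hypothetical 4-member rainfall forecast (mm) for 3 lead times.
members = np.array([
    [2.0, 10.5, 0.0],
    [3.5, 12.0, 0.4],
    [1.0,  8.0, 0.0],
    [4.2, 15.3, 1.1],
])
summary = ensemble_summary(members, threshold=5.0)
print(summary["mean"], summary["prob_exceed"])
```

The spread and the percentile range give a first indication of forecast uncertainty, which can then be checked against the observed errors during validation.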