It depends on what you want to calculate. The spectrum of a mixed valency state is not even in principle describable by a Kohn-Sham (KS) spectrum since the KS solution is an effective single particle description and the mixed valency state is a "true" many-body state which needs to be described as a sum of several single particle states and not just a renormalized single particle state. Or to put it in plain quantum chemistry: The mixed valency means that the ground state is a sum of several determinants, and the KS solution is a single-determinant solution. That means that even if we possessed the True, Divine Functional the KS spectrum probably wouldn't be anything like the experimental one.
There are possible extensions to DFT that could be used to get the spectrum, in the case you describe I would probably use dynamical mean-field theory (LDA+DMFT) using something like the Hubbard I approximation for the local problem, that should be able to give the spectrum. But it is an art to apply these methods, so I would probably instead try to ask a question that does not involve accurate calculation of a spectrum.
Apart from that, of course total energies and (average) on-site occupancies should in principle come out right in DFT. Without knowing much about the particular case of these complexes, I would think that you might need to apply something like a hybrid functional or possibly "+U" correction to remove some of the LDA/GGA error in the dependency of the total energy on partial occupation numbers. This is commonly referred to as the self-interaction error and this is what conventionally gets the blame for the delocalization problems with LDA/GGA. The problem, especially with +U, is that you have a theory with adjustable parameters that you can use to tweak for example the on-site occupancy to whatever you want, so you need to make sure that you don't fool yourself.
But on the other hand, you say that these things are well characterized experimentally according to the degree of localization, so it should be straightforward to just calculate some of these experimentally well-known properties that define the classes for a couple of functionals and see how they do.
Sorry about repeating some stuff already said, but I wrote the answer and then forgot to submit it...
As a rule of thumb, hybrid functionals tend to give a better description of electron delocalization in cases such as yours, although it should be noted that there isn't necessarily a sound theoretical basis for this. Another option is the DFT+U approach.
Try wb97xd. It is hybrid functional, it considers dispersion interaction, and it also features range separation, (22% HF when short range, 100% HF in long range). This kind of functional is said to give better results for charge transfer excitations. I think it might also be appropriate for your case.
It depends on what you want to calculate. The spectrum of a mixed valency state is not even in principle describable by a Kohn-Sham (KS) spectrum since the KS solution is an effective single particle description and the mixed valency state is a "true" many-body state which needs to be described as a sum of several single particle states and not just a renormalized single particle state. Or to put it in plain quantum chemistry: The mixed valency means that the ground state is a sum of several determinants, and the KS solution is a single-determinant solution. That means that even if we possessed the True, Divine Functional the KS spectrum probably wouldn't be anything like the experimental one.
There are possible extensions to DFT that could be used to get the spectrum, in the case you describe I would probably use dynamical mean-field theory (LDA+DMFT) using something like the Hubbard I approximation for the local problem, that should be able to give the spectrum. But it is an art to apply these methods, so I would probably instead try to ask a question that does not involve accurate calculation of a spectrum.
Apart from that, of course total energies and (average) on-site occupancies should in principle come out right in DFT. Without knowing much about the particular case of these complexes, I would think that you might need to apply something like a hybrid functional or possibly "+U" correction to remove some of the LDA/GGA error in the dependency of the total energy on partial occupation numbers. This is commonly referred to as the self-interaction error and this is what conventionally gets the blame for the delocalization problems with LDA/GGA. The problem, especially with +U, is that you have a theory with adjustable parameters that you can use to tweak for example the on-site occupancy to whatever you want, so you need to make sure that you don't fool yourself.
But on the other hand, you say that these things are well characterized experimentally according to the degree of localization, so it should be straightforward to just calculate some of these experimentally well-known properties that define the classes for a couple of functionals and see how they do.
Sorry about repeating some stuff already said, but I wrote the answer and then forgot to submit it...
Neglecting the possible multi-reference character of your systems a priori and going straightfoward to KS-DFT, I agree to some of the previous suggestions: GGAs have been shown to overestimate delocalization in some mixed valence complexes and reduced transition metal oxides[e.g., 1] (due to the Coulomb self-interaction error). A prominent case, I have been working with, are partially reduced ceria gas-phase clusters.[2]
For the clusters, we found B3-LYP and TPSSh did their job to determine the structures. However, it seems that neither B3-LYP nor TPSSh have been able to predict always the right structures among all investigated compositions. (We used IR spectra for structural assignment.)
The DFT+U approach is mostly available in plane wave/solid state quantum chemistry packages, as it serves as a kind of replacement for the much more costly hybrid functional under PBC. If you work under PBC you may consider also the screened hybrid functional HSE. These approaches also tend to give reasonable approximate descriptions for 'excess' electrons in the material.
Which density functional APPROXIMATION will work for your system and the properties you want to investigate is, however, 'experimental'. I strongly suggest to compare with experimental reference properties in each case.
[1] Z. Yang et al., J. Chem. Phys. 120, 7741 (2004)
[2] A. M. Burow, Phys. Chem. Chem. Phys., 2011, 13, 19393–19400
I have obtained good results for different systems by using the traditional B3LYP functional . So, I believe that B3LYP is a good initial guess for your study.