Generally, the number of basis functions (NBasis) is identical to the number of finally generated MOs. However, when diffuse functions are employed, the number of actually used basis functions (NBsUse) may be less than NBasis, because diffuse functions often lead to a severe linear dependence problem, and Gaussian automatically eliminates linearly dependent basis functions to avoid it. Since the number of yielded MOs is always identical to NBsUse, it can be less than NBasis.
As you have said, after removing the diffuse functions from my basis set (going from 6-31+G(d,p) to 6-311G), NBsUse and NBasis are now the same. I am using some code to calculate the transfer integral directly from the Gaussian output, and it requires NBasis and the number of MOs to be equal. So in this case, is there any way to use diffuse functions while avoiding the linear dependence problem? Or can we make NBsUse and NBasis equal manually?
I have noticed that for some large molecules (mine is also large, with 37 atoms) NBsUse and NBasis are the same after using 6-311G(d,p)! What actually determines these values, since they are not the same in my case?
And why should NBsUse and the number of MOs always be equal?
NBsUse is the number of actually used basis functions. The relationship between the used basis functions and the MOs is a linear transformation of the same dimension, so their numbers must be exactly identical.
If you simply want NBsUse to always equal NBasis (even when diffuse functions are used), you can add the IOp(3/32=2) keyword to prevent Gaussian from automatically eliminating linearly dependent basis functions.
Actually, I was trying to do the calculations with the IOp(3/32=2) keyword as you mentioned, but no calculation converges with this keyword. A sample error looks like this:
Rare condition: small coef for last iteration: 0.934D-05
EnCoef did 100 forward-backward iterations
Restarting incremental Fock formation.
EnCoef did 100 forward-backward iterations
EnCoef did 3 forward-backward iterations
EnCoef did 100 forward-backward iterations
EnCoef did 7 forward-backward iterations
EnCoef did 100 forward-backward iterations
EnCoef did 5 forward-backward iterations
EnCoef did 4 forward-backward iterations
EnCoef did 100 forward-backward iterations
>>>>>>>>>> Convergence criterion not met.
SCF Done: E(RB3LYP) = -2656935.54400 A.U. after 129 cycles
NFock=128 Conv=0.51D+01 -V/T=*******
Convergence failure -- run terminated.
Error termination via Lnk1e in /snufs/apps/Gaussian/g09/l502.exe at Sat Jun 30 12:50:28 2018.
Job cpu time: 8 days 1 hours 54 minutes 41.6 seconds.
The linear dependence problem of basis functions makes SCF convergence much more difficult; this is why Gaussian always attempts to eliminate linearly dependent basis functions. There are several solutions that you can try:
(1) Simply remove the diffuse functions if they are not really necessary.
(2) First run the calculation with 6-311G**, then use the guess=read keyword to load the converged wavefunction from the chk file as the initial guess for the 6-311+G** calculation (a rough input sketch is given after this list).
(3) Add some keywords to facilitate SCF convergence, for example: scf(novaracc,noincfock,xqc) int(acc2e=12,ultrafine)
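As a rough illustration of option (2), the two steps could be chained in a single input file along the following lines. This is only a sketch: the checkpoint file name, method, title lines, and geometry are placeholders, and whether you also keep IOp(3/32=2) and the convergence keywords from option (3) in the second step depends on what you are trying to achieve.

```
%chk=mol.chk
#p B3LYP/6-311G**

step 1: converge the wavefunction without diffuse functions

0 1
[geometry goes here]

--Link1--
%chk=mol.chk
#p B3LYP/6-311+G** guess=read geom=checkpoint IOp(3/32=2) scf(novaracc,noincfock,xqc) int(acc2e=12,ultrafine)

step 2: reuse the converged wavefunction as the initial guess

0 1

```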
You should fix the transfer integral code. The number of linearly independent basis functions always matches the number of molecular orbitals, but this might be different from the number of (linearly dependent) atomic orbitals i.e. the full molecular basis set. If you don't remove the linear dependencies, convergence will suffer a great deal and you might be converging to a wrong solution. You'll also be limited to very small basis sets, which means the quality of your answers will suffer. Many properties may require triple or quadruple zeta basis sets - possibly even augmented with many diffuse functions - to get the converged value, and this means you'll pretty much always have to deal with singular basis sets i.e. ones where the overlap matrix has near-zero eigenvalues.
I will add my input here as well, since the code Dwaipayan has mentioned is one I wrote. Tian has correctly reasoned that the number of basis functions must be equal to the number of molecular orbitals. This is necessary as a result of the symmetric orthonormalization process (Löwdin orthonormalization) that is used in my [transfer integral code](https://github.com/JoshuaSBrown/QC_Tools), and I believe is also used during the SCF process in quantum chemistry packages... though I'm not positive on that one. Symmetric orthonormalization requires the use of a square matrix (rows - MOs, cols - AO basis set functions), hence the dimensional constraint.
If this is what is going on, the extra molecular orbitals from Gaussian should appear in the LUMO levels; they are probably simply not being printed. However, Gaussian may also be doing some fancy linear algebra and cutting corners to save on computational time. That said, the transfer integral code could simply be updated, as Susi has pointed out. By using Gram-Schmidt orthonormalization instead of Löwdin, one does not require equal numbers of basis functions and MOs.
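For concreteness, here is a minimal numpy sketch of the symmetric (Löwdin) orthogonalization step described above. It is not the actual QC_Tools implementation; the overlap matrix S is assumed to have been read from the Gaussian output already, and the function and variable names are my own.

```python
import numpy as np

def lowdin_orthogonalizer(S):
    """Return S^(-1/2) for a (non-singular) AO overlap matrix S.

    Because S^(-1/2) is square (n_AO x n_AO), the orthogonalized basis
    has exactly as many functions as the original AO basis, which is
    why the number of MOs must equal the number of AOs in this scheme.
    """
    eigval, eigvec = np.linalg.eigh(S)           # S = U L U^T
    return eigvec @ np.diag(eigval**-0.5) @ eigvec.T

# Tiny illustrative 2x2 overlap matrix, not real data:
S = np.array([[1.0, 0.4],
              [0.4, 1.0]])
X = lowdin_orthogonalizer(S)
print(np.allclose(X.T @ S @ X, np.eye(2)))       # X^T S X = 1
```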
What Gaussian and every other quantum chemistry program does is use canonical orthonormalization in case the overlap matrix is singular. That is, once you have run the eigendecomposition of the overlap matrix S as S = U L U^T, where U is now a matrix holding the eigenvectors in the columns and L is a diagonal matrix of the eigenvalues, instead of forming the half-inverse overlap matrix as
Sinvh = U 1/sqrt(L) U^T
what you do is you form a non-square matrix with columns
Sinvh(i) = U(i)/sqrt(L(i))
where U(i) is the i-th column (eigenvector) and L(i) the corresponding eigenvalue. You do this for all eigenvectors with eigenvalues L(i) >= tol, where typically tol = 1e-5, but you might be able to go down to tol = 1e-7 without affecting the numerical stability of the results.
Even if Sinvh is non-square, everything works the same way as before: your orthogonal Fock matrix is
F_orth = Sinvh^T F_AO Sinvh
which you can diagonalize; the eigenvectors C_orth of F_orth then give the MO coefficients in the original AO basis as C_AO = Sinvh C_orth.
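Putting the recipe above together, a minimal numpy sketch of canonical orthonormalization could look like the following. F_AO and S are assumed to have been obtained elsewhere, and the function name and default tol are mine, not Gaussian's internals.

```python
import numpy as np

def canonical_orthonormalization(S, F_AO, tol=1e-5):
    """Diagonalize F_AO in a possibly linearly dependent AO basis.

    Eigenvectors of the overlap matrix S with eigenvalues below tol are
    dropped, so Sinvh is n_AO x n_kept with n_kept <= n_AO.
    """
    L, U = np.linalg.eigh(S)                 # S = U L U^T
    keep = L >= tol                          # discard near-singular directions
    Sinvh = U[:, keep] / np.sqrt(L[keep])    # columns U(i)/sqrt(L(i))

    F_orth = Sinvh.T @ F_AO @ Sinvh          # Fock matrix in the orthogonal basis
    eps, C_orth = np.linalg.eigh(F_orth)     # orbital energies and eigenvectors
    C_AO = Sinvh @ C_orth                    # MO coefficients in the AO basis
    return eps, C_AO

# The number of MOs returned equals the number of kept eigenvalues,
# i.e. NBsUse, not the full NBasis.
```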
So, the website was buggy when I posted my answer - it didn't show up but when I refreshed there were a dozen copies, so I removed all but one. But it turns out that it was missing the final note.
Namely, you don't need to recalculate the orthonormalization of the basis set if you have the orbitals from Gaussian (or any other QM program), because they are already orthonormal with respect to the basis set, i.e. C^T S C = 1. You can use the MOs as your basis.
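As a quick sanity check of that statement, assuming you have already parsed the MO coefficient matrix C (one MO per column) and the AO overlap matrix S from the output, a numpy helper along these lines should confirm the orthonormality:

```python
import numpy as np

def mos_are_orthonormal(C, S, tol=1e-8):
    """Check that C^T S C = 1, i.e. the MOs form an orthonormal set in the AO basis."""
    return np.allclose(C.T @ S @ C, np.eye(C.shape[1]), atol=tol)
```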
Also, it appears Gaussian is using 10^-6 as its default cutoff, see http://gaussian.com/overlay3/#iop_(3/59)