The dash in the expression "4 - approx SQRT(number of cores)" is not a minus sign; it denotes a range, as in "4 to approx SQRT ...". I usually make sure that the number of cores on a given node is a multiple of NPAR, and then jobs seem to run well. So on a 10-core node I would set NPAR = 2 or 5 (but not 4 or 8). On a 64-core node I might set it to 8 or 16. The VASP manual says one must experiment to see what works well.
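To make the "multiple of NPAR" rule of thumb concrete, here is a minimal Python sketch that lists the values dividing the per-node core count evenly. The function name npar_candidates is made up for illustration and is not part of VASP; the list is only a starting point for benchmarking, not a recommendation.

```python
def npar_candidates(cores_per_node):
    """List the non-trivial divisors of the per-node core count.

    Hypothetical helper (not part of VASP): it only encodes the rule of
    thumb that cores_per_node should be a multiple of NPAR.
    """
    return [n for n in range(2, cores_per_node) if cores_per_node % n == 0]


print(npar_candidates(10))  # [2, 5]
print(npar_candidates(64))  # [2, 4, 8, 16, 32]
```

From the 64-core list I would then benchmark 8 and 16 first, since they sit closest to the "4 to approx SQRT" range.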
That's it. General rules are in these manuals; however, there is no universal approach. Everything depends on the machine, architecture, etc. Are you using 4 nodes with 32 cores each, or 32 cores in total?
Personally, I've found that a good value on machines like those at TACC (Texas Advanced Computing Center), such as Stampede and Lonestar, is to set the NPAR tag equal to the number of nodes. For example, if I'm using 96 cores on Lonestar, that's 8 nodes, so NPAR = 8.
Since v5.2.13, VASP has provided a new tag, NCORE (http://cms.mpi.univie.ac.at/wiki/index.php/NCORE). Either NPAR or NCORE can be specified; however, NCORE is much easier to use. Basically, you set NCORE equal to the number of cores per node, and that's it. For instance, on the Vienna Scientific Cluster (VSC-3), optimal performance is achieved by setting NCORE=16 (16 cores per node).
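The "NPAR = number of nodes" and "NCORE = cores per node" suggestions describe the same layout. As I understand the VASP documentation, the two tags are related by NPAR = (total cores) / NCORE. A small Python sketch, assuming that relation and using a made-up helper name (parallel_tags), shows the arithmetic:

```python
def parallel_tags(nodes, cores_per_node):
    """Derive NCORE and the equivalent NPAR for a job.

    Hypothetical helper (not part of VASP). Assumes the relation
    NPAR = total_cores / NCORE and the advice to set NCORE equal to
    the number of cores per node.
    """
    total_cores = nodes * cores_per_node
    ncore = cores_per_node
    npar = total_cores // ncore  # works out to the number of nodes
    return {"NCORE": ncore, "NPAR": npar}


# 96 cores spread over 8 nodes, i.e. 12 cores per node
print(parallel_tags(8, 12))  # {'NCORE': 12, 'NPAR': 8}
```

You would put only one of the two tags in the INCAR file, e.g. NCORE = 12 for this case.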