This depends entirely on what your jobs actually do. The usual rule of thumb is that the amount of data consumed by a mapper should be no more than 10 HDFS blocks, but at the same time each mapper should consume most of a block, which is typically 64 MB or 128 MB on most installations (it's configurable). Job start-up time can be a minute or two, so you want your tasks to run for at least 15-20 minutes to amortize that overhead. On the other hand, long-running tasks are at greater risk of failure (the longer a task runs, the higher the risk), and they also make downstream tasks sit around waiting for completion, which drives utilization down.
So based on these constraints you can come up with a model. But as I said, it totally depends on what you are doing in those tasks.
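As a rough illustration of turning those constraints into settings, here is a minimal sketch using the `org.apache.hadoop.mapreduce` API. The block size, split multiples, reducer count, and paths are all made-up example values, not recommendations; tune them against your own profiling.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SplitTuningExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "split-tuning-example");
        job.setJarByClass(SplitTuningExample.class);
        // (Mapper/Reducer classes omitted for brevity; the defaults pass records through.)

        // Let each mapper consume a few HDFS blocks instead of one, so a task
        // runs long enough to amortize the per-task start-up overhead.
        long blockSize = 128L * 1024 * 1024;                 // e.g. 128 MB blocks
        FileInputFormat.setMinInputSplitSize(job, 2 * blockSize);
        FileInputFormat.setMaxInputSplitSize(job, 4 * blockSize);

        // Few enough reducers to keep them busy, enough that one failure is cheap.
        job.setNumReduceTasks(10);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Larger splits mean fewer, longer tasks (less overhead, slower failure recovery); smaller splits mean the opposite, so the right trade-off comes out of the workload itself.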
In addition to Vlad's answer and the timings listed therein, you might also want to consider what you would see as an acceptable time for the whole process to execute (for example, a few hours? a few months?).
To get a useful estimate of that, you might want to profile your code to separate the time spent in unavoidably serial operations from the time spent in parallelizable ones. This matters because throwing more CPUs at a problem does not necessarily reduce the total computation time: the serial portion does not shrink. For more information, please see the attached link.
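To make that concrete, here is a small illustration of Amdahl's law, which formalizes this limit. The 10% serial fraction below is an assumed example value, standing in for whatever your profiling actually shows.

```java
public class AmdahlExample {
    // Amdahl's law: speedup(n) = 1 / (s + (1 - s) / n),
    // where s is the serial fraction and n is the number of workers.
    static double speedup(double serialFraction, int workers) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / workers);
    }

    public static void main(String[] args) {
        double s = 0.10; // assume profiling showed ~10% of the work is serial
        for (int n : new int[] {1, 4, 16, 64, 256}) {
            System.out.printf("%4d workers -> %.2fx speedup%n", n, speedup(s, n));
        }
        // With s = 0.10 the speedup can never exceed 10x,
        // no matter how many CPUs the cluster has.
    }
}
```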
It depends on your MapReduce job and on your cluster. There is some research that aims to determine the optimal/best MapReduce parameters (number of Map/Reduce functions, chunk size, ...), but it is generally designed for a specific application. For example, this paper deals with MapReduce-based pattern mining approaches: