Is the purpose of the test to differentiate passing from non-passing examinees (i.e., to establish a single cut-score), or to give each individual a score that can be differentiated reliably throughout the scale? Your answer will affect how you use pilot testing and item ratings from subject-matter experts (two of the methods you will likely use).
The distribution of item difficulties will depend on the purpose. If the goal is to get fairly reliable scores for everyone and you are not using an adaptive format (e.g., CAT), then you would want item difficulties (probably estimated via IRT) spread throughout the ability range. If, however, you have particular cut-scores, you would want more items near those points. The information curves from typical IRT packages make this clear. For example, if you have a single cut-score and expect most people to pass, you would not use items that only differentiate those at the top end of the ability scale. So, I am not sure what types of tests Nishad and Subhash are referring to (or how they define their three groups). It would be useful if they could provide more details.