I think you need to work from whole to part rather than from part to whole. In ERDAS Imagine, mark all the GCPs and enter their coordinates; then, while rectifying, select a DEM as the elevation source (either the Cartosat DEM or another DEM).
"though may be accurate are not directly usable because the hill tops and valleys may not accurately register with the GCPs available." I don't think it makes much difference whether you use a 90 m, 30 m, or 10 m DEM, since the smallest measurable unit in remote sensing is the pixel. If you have a GCP with a precision of ±5 cm, it can still fall anywhere within a 10 m pixel.
In such scenario, do we need to go for LIDAR based elevation values which is a costly affair? Or is it suggested to use stereo pair which is also a costly affair?
As you said, the above methods are costly, and they do not improve accuracy very much in terms of orthorectification. The 'economy of accuracy' is crucial here.
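To put a rough number on the 'economy of accuracy': for near-flat geometry, an elevation error in the DEM shifts a point horizontally in the orthoimage by roughly the elevation error times the tangent of the viewing angle. Below is a minimal Python sketch of that rule of thumb. The function name, the DEM error values, the 10 m pixel and the 5 degree off-nadir angle are all illustrative assumptions, not figures from this thread.

```python
import math

def horizontal_error(dem_error_m, view_angle_deg):
    """Approximate horizontal shift (m) in an orthoimage caused by an
    elevation error, using shift = dh * tan(view angle).
    Flat-terrain approximation; view angle is measured from nadir."""
    return dem_error_m * math.tan(math.radians(view_angle_deg))

if __name__ == "__main__":
    pixel_size_m = 10.0      # assumed image pixel size
    view_angle_deg = 5.0     # assumed near-nadir acquisition
    # Assumed vertical errors for a fine, medium and coarse DEM.
    for dem_error_m in (1.0, 5.0, 15.0):
        shift = horizontal_error(dem_error_m, view_angle_deg)
        print(f"DEM error {dem_error_m:5.1f} m -> shift {shift:5.2f} m "
              f"({shift / pixel_size_m:.2f} pixel)")
```

If the shift stays well below one pixel for the viewing angle of your scene, a more expensive elevation source is unlikely to change the orthorectified result visibly. For strongly off-nadir scenes the same calculation will show when a better DEM starts to matter.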
A LIDAR-derived DEM is costly and also requires clearance from the ministry. A stereo pair is cheaper than LIDAR, and you can generate both the orthorectified image and the DEM from it. Consider the base-to-height ratio when choosing a stereo pair. Leica Photogrammetry Suite (LPS) and AutoSync are available in ERDAS Imagine.
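On why the base-to-height ratio matters: a common first-order estimate is that the height precision of a stereo-derived DEM is about the parallax (image matching) precision on the ground divided by B/H, so a weak B/H gives noisier heights. A small Python sketch under that assumption follows. The function name and the sample numbers are hypothetical, and real results also depend on matching quality and terrain.

```python
def height_precision(parallax_precision_m, base_to_height):
    """First-order height precision (m) of a stereo pair:
    sigma_h = sigma_parallax / (B/H)."""
    if base_to_height <= 0:
        raise ValueError("base-to-height ratio must be positive")
    return parallax_precision_m / base_to_height

if __name__ == "__main__":
    parallax_precision_m = 2.5   # assumed matching precision on the ground
    for bh in (0.3, 0.6, 1.0):   # assumed candidate B/H ratios
        sigma_h = height_precision(parallax_precision_m, bh)
        print(f"B/H = {bh:.1f} -> height precision about {sigma_h:.2f} m")
```

A larger B/H improves height precision but makes image matching harder in steep terrain because of occlusion, so the choice is a trade-off and not simply "bigger is better".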