There are many parallel algorithms for the sorting problem. I would like to know how, starting from a given parallel algorithm, a work-optimal randomized algorithm can be developed.
If what you want is a wall-time-optimal parallel solution: subdivide the input equally among the P processors, sort the P sub-problems locally, then merge the already sorted lists on adjacent pairs of processors and keep merging as you walk back up a binary tree until one processor holds the fully sorted result.
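A minimal sketch of that scheme in Python, using a multiprocessing.Pool to stand in for the P processors (parallel_merge_sort, merge, and the chunking are my own illustration, not part of the answer):

```python
import heapq
from multiprocessing import Pool


def merge(a, b):
    # Merge two already sorted lists into one sorted list.
    return list(heapq.merge(a, b))


def parallel_merge_sort(data, p=4):
    # Subdivide the input into p roughly equal chunks.
    n = len(data)
    chunk = (n + p - 1) // p
    parts = [data[i:i + chunk] for i in range(0, n, chunk)]

    with Pool(p) as pool:
        # Each "processor" sorts its own sub-problem.
        parts = pool.map(sorted, parts)

        # Walk back up a binary tree: merge adjacent pairs, level by level.
        while len(parts) > 1:
            pairs = [(parts[i], parts[i + 1]) for i in range(0, len(parts) - 1, 2)]
            merged = pool.starmap(merge, pairs)
            if len(parts) % 2:  # odd list out is carried up to the next level
                merged.append(parts[-1])
            parts = merged

    return parts[0] if parts else []


if __name__ == "__main__":
    import random
    xs = [random.randint(0, 1000) for _ in range(10_000)]
    assert parallel_merge_sort(xs, p=4) == sorted(xs)
```

Note that with multiprocessing the chunks are pickled and shipped between processes, so this only illustrates the structure of the algorithm, not a tuned implementation.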
How about looking at the algorithm at the Intel link below?
How about a distribution/bucket sort? The expected average complexity is O(n), but the worst case is O(n²). The idea is to first put the elements into different buckets (which, in my opinion, can already be done in parallel) and then assign one bucket to each processor. Each processor then sorts the elements of its bucket with any well-known algorithm, e.g. quicksort. Finally, the buckets just need to be concatenated in the correct order.
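A minimal sketch of that idea, assuming roughly uniformly distributed numeric keys and equal-width buckets (bucket_sort, sort_bucket, and the parameter p are my names, not from the answer):

```python
from multiprocessing import Pool


def sort_bucket(bucket):
    # Each processor sorts its own bucket with any well-known algorithm
    # (Python's built-in sort here, standing in for quicksort).
    return sorted(bucket)


def bucket_sort(data, p=4):
    if not data:
        return []
    lo, hi = min(data), max(data)
    width = (hi - lo) / p or 1

    # Distribution phase: put each element into one of p buckets.
    buckets = [[] for _ in range(p)]
    for x in data:
        i = min(int((x - lo) / width), p - 1)
        buckets[i].append(x)

    # One bucket per processor; the buckets are sorted independently.
    with Pool(p) as pool:
        sorted_buckets = pool.map(sort_bucket, buckets)

    # Concatenating the buckets in order yields the sorted result -- no merge step.
    return [x for b in sorted_buckets for x in b]


if __name__ == "__main__":
    import random
    xs = [random.random() for _ in range(10_000)]
    assert bucket_sort(xs, p=4) == sorted(xs)
```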
The catch is that, ideally, all buckets should contain approximately the same number of elements, which means the bucket boundaries have to be chosen accordingly. Determining them in a reasonable amount of time requires either knowledge of the distribution of the values or a randomized approach (typically you draw representative values from the whole input at random), as sketched below.
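One common form of that randomized approach is to take a random sample and use its evenly spaced order statistics as the bucket boundaries ("sample splitters"). A rough sketch, assuming the input is much larger than p times the oversampling factor (choose_splitters, assign_bucket, and oversample are hypothetical names):

```python
import bisect
import random


def choose_splitters(data, p, oversample=32):
    # Draw a random sample and take evenly spaced elements of the sorted
    # sample as boundaries, so the buckets end up roughly equally full.
    sample = sorted(random.sample(data, min(len(data), p * oversample)))
    step = len(sample) // p
    return [sample[i * step] for i in range(1, p)]


def assign_bucket(x, splitters):
    # Binary search for the bucket whose range contains x.
    return bisect.bisect_right(splitters, x)


# Example: 8 buckets whose sizes stay close to len(data) / 8
# even for a skewed input distribution.
data = [random.expovariate(1.0) for _ in range(100_000)]
splitters = choose_splitters(data, p=8)
counts = [0] * 8
for x in data:
    counts[assign_bucket(x, splitters)] += 1
print(counts)
```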
The advantage of this strategy over merge sort is that there is no merge step. Merging values into an existing array (i.e. inserting them in place) usually involves shifting many elements, which makes such array-based implementations very inefficient. Bucket sort does not have this problem.