I am writing this note to call out to researchers in combinatorics and probability theory. I invite researchers to think about the problem I will describe below.
The MergeSort algorithm recursively divides the input array into sub-arrays until an element is left in each sub-array. This division is an unconditional division, unlike the Quicksort algorithm. Then, MergeSort recursively merges the two sorted sub-arrays into a sorted array. In all cases, the algorithm's running time is Ө(nlgn), meaning that even if the input array is sorted, MergeSort needs Ө(nlgn) running time to sort it.
Recently, we suggested that the division phase of the MergeSort algorithm can be carried out a little more intelligently rather than unconditionally. We tried 3 ways to do this.
Way 1: We scan the array from beginning to end, divide it into increasing subarrays of consecutive terms, and then merge these subarrays into an increasing array using the tournament method.
Way 2: We scan the array from beginning to end and divide it into increasing or decreasing subarrays of consecutive terms and then merge these subarrays into an increasing array using the tournament method.
Way 3: Some arrays are combinations of 2 increasing and decreasing subarrays (the terms of subarrays need not be consecutive terms of the input array), but of course, the vast majority of arrays do not meet this condition. We divide the input array into two increasing and decreasing subarrays up to the possible term. Then, we merge these two subarrays into an increasing subarray. Then, we repeat these two operations starting from the term we left until we reach the end of the input array. Finally, we merge the sorted subarrays we found into an increasing array using the tournament method.
In all three ways, the algorithm we propose sorts at Ө(n) in the best case and at Ө(nlgn) in the worst case.
For more details, you can access our study from the link below:
https://www.researchgate.net/publication/379560542_A_new_approach_to_Mergesort_algorithm_Divide_smart_and_conquer
Research question: What is the average processing time of our proposed algorithm for each of the 3 paths separately? We can experimentally observe that it is Ө(ngn), but we cannot prove it mathematically.
For this, it is necessary to calculate the expected value of the number of subarrays obtained after division under the assumption of a distribution.