We know that binary search is the fastest approach to find an item in a group of items. Do you believe that we can develop a more efficient algorithm than binary search?
The original question is not formulated very precisely. I think it is important to know (i) whether the group of items satisfies certain properties and (ii) whether you want to look up an item once or a huge number of times. Let's consider a number of cases.
(1) You need to look up items in your group of items very often. In that case, it may be worth paying the cost of building an index. E.g., with a hash table you can look up any item in (expected) constant time.
(2) You don't need to look up items very often, so building an index as preprocessing is too expensive.
(2a) your items are numbered. Then, you can put them in a vector/array and look them up by number in unit time.
(2b) your items are sorted (according to some total order). Then, you can perform binary search in time logarithmic in the size of your set of items.
(2c) there is no order or structure in your group of items. Then, you have no other option than to search linearly through all items (since any of them could be the one you are looking for).
For case (2b), H.E. Lehtihet wonders about the case where the items are stored in an ordered search tree (e.g. a binary tree where the items in the left branch below a node are all smaller than the items in the right branch). This is indeed an option, especially if you want to add/remove items and still have them sorted. There are specific data structures to keep such trees balanced, e.g. http://en.wikipedia.org/wiki/Self-balancing_binary_search_tree
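To make case (2b) concrete, here is a minimal sketch of binary search on a sorted list; the use of Python, the function name and the example values are my own choices for illustration, not anything prescribed in this thread.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # probe the middle element
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            lo = mid + 1              # discard the lower half
        else:
            hi = mid - 1              # discard the upper half
    return -1

# Each iteration halves the remaining range, so at most about
# log2(n) probes are needed for n items.
print(binary_search([2, 3, 5, 7, 11, 13, 17], 11))  # -> 4
```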
If 'we know that binary search is the fastest approach to find an item in a group of items', how can we then believe that 'we can develop a more efficient algorithm than binary search'?
I consider it intuitively evident that, for random sequences of data, binary search cannot be improved. It is, however, an interesting question how the human brain can form such internally convincing judgements on such logically complicated matters.
Of course, most interesting: is there a proof or a disproof of this 'evident' conjecture? Not being an expert on ordering algorithms I don't know, but I would be interested to learn.
I agree with Vitaly. Not only does BS apply to an ordered set of items, but this ordering is also crucial. Indeed, BS is a general-purpose algorithm that does not require the items to have any specific additional properties other than precisely their order.
Concerning the problem of improving BS, it seems to me that the real issue is not so much the efficiency of the algorithm per se, but rather the flexibility of the structure (tree) in which the items are placed. For example, a poorly balanced tree may significantly degrade the performance of BS. Therefore, my guess is that the most interesting improvements concern how to dynamically re-balance the tree whenever an item is added to or removed from the list. A well-known example of such flexible structures is the so-called AVL tree.
For quantum computers there is also Grover's algorithm, which can be used to search through an unsorted collection of N items in O(sqrt(N)) steps, rather than O(N) steps. This algorithm is one of the major reasons why quantum computing is considered "interesting".
As I understand, quantum computing algorithms are typically probabilistic, often requiring additional steps to arrive at a definitive solution. Also, potential benefits may be severely limited for databases that require access to external storage devices, as the peripheral database might have to be accessed to reinitialize a quantum device for each new comparison...
Considerations not yet mentioned include data volatility and storage space requirements, which can also affect search times...
Conversely, an infrequent sequential search of a few items stored in memory or in buffers may yield satisfactory results, possibly requiring fewer processing cycles for resolution than even a binary search, especially if items can be ordered such that most search values will be found in the first entries...
Binary search is a basic, introductory search, and the simplest one. But this search is not efficient. To get to the root of my point: a simple IF construct is used in binary search, but using nested IFs or a CASE/SELECT construct can improve the efficiency of a search program.
Binary search works efficiently on an ordered list. If you are looking for other options, Donald Knuth wrote an entire volume dedicated to sorting and searching (Volume 3): http://www.amazon.com/Art-Computer-Programming-Volume-Searching/dp/0201896850
If we are taking on the assumptions of binary search, then no, we cannot if we are using the RAM model. There is a lower bound on searching with these premises, and it is met by binary search. It is also the same reason why binary search is commonly employed in linear programming techniques. Asymptotically it cannot be beaten with inputs of a sorted array. It is one of the classic examples of an algorithm meeting a lower bound. This one in particular is for a class of search problems.
I am sorry, I could not participate here for some reasons. Anyway, yes, binary search is the most efficient, but that does not mean we can't introduce some better options. If the dataset is ordered (sorted), and we make groups according to our need and the system capacity, then our search capacity will increase due to the reduced domain (group) of target data. We can predict our item, and at every iteration we will reach our target item N times faster, where N is the number of groups.
Please advise me whether my approach is right or wrong.
Let me see if I understand what you're saying. Are you proposing that the ordered dataset be subdivided into N groups, each of which will be ordered, and that you then want to run a search on each of the groups in parallel? Each of the groups would be eliminated in much the same way as a portion of the original dataset would be. But I don't see an efficiency gain, as each of the subgroups will have at least one search query done on it regardless of whether it is within the range of the number being searched for. That query will only eliminate the group.
If I've not understood your proposed solution, could you give a more concrete example of what you are trying to explain?
@Qaim. I don't think this would work if you want more efficiency, unless it is suited to a computational model where that works effectively (I imagine something like that would be a decent strategy for employing binary search in parallel, if you assume the data is partitioned such that the key won't be in more than one subarray). It seems like with your approach you will lose instances if you assume you can group them into "groups". What are the groups? This may or may not work depending on what you assume a group is. Some data structures use this idea in how they are constructed, so it isn't necessarily a bad idea if you structure the data (it's just the overhead that's the problem). The searching problem binary search is used for relates to an array and an element to search for, both provided as input.
Let me explain a little more carefully. We can solve this problem with varying complexities, but binary search meets the lower bound on the amount of computation required in the RAM model. You can see this by forming a tree and asking questions about it (it actually has a pretty interesting relationship with sorting and the lower bound for comparison-based sorting). You can prove it with little more than a decision-tree model and a couple of commonly known closed forms for trees.
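For completeness, here is a brief sketch of that decision-tree argument in my own wording (not part of the answer above, and under the usual comparison-based assumptions): any comparison-based search of a sorted array of n items is a binary decision tree whose leaves must distinguish at least the n possible positions plus the "not present" outcome, and a binary tree of height h has at most 2^h leaves, so

```latex
\[
  2^{h} \;\ge\; n + 1
  \quad\Longrightarrow\quad
  h \;\ge\; \lceil \log_2 (n+1) \rceil ,
\]
```

which matches the worst-case number of comparisons binary search performs.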
Now, if you want to improve the time complexity (we can't really beat the lower bound, but imagine you relaxed the problem a bit so that you want to do LOTS of searches), you often need to use more space and do some preprocessing. An example: create a hash table, then use the key as a lookup value. But if you only need to find the element once, this is no better than linear search: building the table over n elements already costs THETA(n), so for a single query even looking at all the elements is not a good idea.
Now, this isn't the end, of course. Imagine you wanted to use such a data structure (the hash table, or even better, for fun, say a cuckoo hash table; let's overlook the fact that one needs to rebuild the table after a while). You can get some wickedly fast search queries, as they are all essentially lookups. But notice this isn't any better for a single search, because just finding the element required us to look at all of them! Many crafty algorithmic techniques like this can be used to exploit data and make for efficient queries on large sets of data over time, though. That's a great benefit of data structures. But this is more information than a direct answer.
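As a hedged illustration of that trade-off (the function names and example values are mine, not part of the answer above): building a dictionary costs a full pass over the data, so it only pays off when it is reused for many queries.

```python
def build_index(items):
    """One Theta(n) pass: map each value to its position."""
    return {value: pos for pos, value in enumerate(items)}

def lookup(index, target):
    """Expected O(1) per query once the index exists."""
    return index.get(target, -1)

data = [2, 3, 5, 7, 11, 13, 17]
index = build_index(data)     # paid once; this already scans all n items
print(lookup(index, 11))      # -> 4
print(lookup(index, 4))       # -> -1
# For a single search this is no better than a linear scan;
# over many searches the build cost is amortized away.
```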
Keep in mind the following:
Binary search will take a logarithmic number of steps in the worst case. You have n items in the array, and it throws away "half" of the subinstance each time! In your method of breaking it into smaller groups, you may need more than a logarithmic number of steps, since you would have more than one group to deal with (I could imagine it being closer to linear time). A way to check its complexity is the following (in the worst case):
-How many groups are made (in the worst case)?
-How many basic operations are performed on each group (in the worst case)?
With respect to the input for any instance, combine these two and you're likely to get the time-complexity of interest under the RAM model.
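A small worked check of those two questions, assuming (my assumption, for illustration) that the n sorted items are split into N equal groups and every group is binary-searched:

```python
from math import ceil, log2

def comparisons_grouped(n, groups):
    """Worst case if each of the `groups` groups of size n/groups is binary-searched."""
    group_size = ceil(n / groups)
    return groups * ceil(log2(group_size + 1))

n = 1_000_000
print(ceil(log2(n + 1)))            # plain binary search: 20 comparisons
print(comparisons_grouped(n, 10))   # 10 groups:   10 * 17 = 170 comparisons
print(comparisons_grouped(n, 1000)) # 1000 groups: 1000 * 10 = 10000 comparisons
```

Unless the right group can be located without touching the others, the group count multiplies rather than divides the work, which is the point being made here.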
Aside: If you want to put this into a parallel environment, you would look at how many operations each group has when the data is partitioned, then you need to consider the synchronization related to dishing out the data and getting the answer back before terminating. For example, each time one core needs something, assign it some variable length of time, then use this as a unit.
The lowest cost for establishing a value match is achieved by avoiding generalized search processing completely. If it can be reliably determined that, of all color values, for example, most will be red and many will be blue, simply using If "red" Do... Else If "blue" Do... Else Search table would be most efficient. This may require program encoding or parameterization of related process variables, but it may not present program-maintenance issues unless those values are highly volatile.
Alternatively, if table entries can be reliably ordered probabilistically (most likely values first), a simple sequential search may result in lower total execution times than any other search mechanism.
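A minimal sketch of the fast-path idea in the two paragraphs above; "red" and "blue" are hypothetical stand-ins for whatever values are known from the data to dominate the queries, and the function name is mine.

```python
from bisect import bisect_left

def find_color(value, sorted_colors):
    """Fast path for the two most frequent values, then a general binary search."""
    if value == "red":                      # assumed most frequent value
        return "handled red directly"
    if value == "blue":                     # assumed second most frequent
        return "handled blue directly"
    i = bisect_left(sorted_colors, value)   # standard-library binary search
    found = i < len(sorted_colors) and sorted_colors[i] == value
    return f"found at index {i}" if found else "not found"

print(find_color("red", ["blue", "green", "red", "yellow"]))    # fast path
print(find_color("green", ["blue", "green", "red", "yellow"]))  # falls back to binary search
```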
In my experience, programmers (i.e., myself) may become infatuated with programming aesthetics when specific knowledge of the data can be leveraged to produce (albeit less pleasing) code that delivers optimal performance... Program-maintenance issues and evolving conditions must also be carefully considered, however.
The problem with the "yes" answers here is that they are actually distorting the premises. binary search is the theoretical lowest bound for searching in sorted lists. "If we divide the dataset into N subsets ...". well then you will be running a preliminary binary search for finding the subset, then a secondary binary search within the subset for finding the element. It just changes data organization, not the actual premise. Also lower bound is not effected by the implementation details. That is just giving the answer based on "one implementation of binary search". As long as data has nothing to offer to the programmer other than the search field and a sorted list there can be no faster algorthm than binary search. So for the specific question the answer should be definitely No.
If the dataset has no additional property or specific storage structure, it can be mathematically proved that binary search is the most efficient, taking log n time. CASEs and IFs etc. just add programming cosmetics but do not improve computing time.
Of course, this assumes the present computing paradigm; we are not speaking of quantum computing or neural-network searches, etc.
It is actually a mathematical lower bound and does not depend on hardware or technology. Even if you use qubits, you will have to make log n comparisons whatever your computation method or technology is. For example, the computers we are using have an ALU that is capable of comparing 2 numbers at a time; given that, the lower bound is log n (log base 2). If the ALU were capable of comparing 5 numbers in a single clock cycle, it would still be log n (this time the base of the log would be 5, however). In the case of quantum computing, if we had a quantum computer our results would be stored in vector form, which means we can trace back our choices. This would change the results if the question were something like "find the nth closest number"; however, for an exact match, quantum computing does not change anything. Given a sorted list of numbers, one has to check at least log n numbers to be assured of a hit or miss on a search. That is a mathematical fact; no hardware or technology can change that.
An example with an unsorted list of numbers will clarify it faster, I think. If the list is unsorted, the best algorithm has to check n numbers in the worst case; that is theta(n) worst-case performance and a mathematical lower bound. No technology can change that. If one claims that a computer can search with fewer comparisons, there would have to be some kind of clairvoyance involved, since one cannot say a number is not found without checking every number. Similarly, when checking a sorted list, it is possible to check the middle point of the list and eliminate half of it. One has to do log n such elimination rounds to be assured that the search fails or hits. Thus the lower bound is log n: in the worst case you have to check the values of log n numbers if the list is sorted.
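To put numbers on the "base 2 vs. base 5" remark above (a worked example of my own, with an arbitrary n): if one round of comparisons keeps 1/k of the list, the number of rounds changes only by the constant factor log2(k).

```python
from math import ceil, log

def rounds(n, k):
    """Elimination rounds needed if each round keeps 1/k of n sorted items."""
    return ceil(log(n, k))

n = 1_000_000
print(rounds(n, 2))   # 20 rounds with pairwise comparisons
print(rounds(n, 5))   # 9 rounds with a 5-way comparator
print(log(5, 2))      # ~2.32, the constant factor between the two
# Still Theta(log n) either way: the hardware changes the base, not the bound.
```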
If someone does not agree with my thought, he/she should make his/her point with facts and figures. Just down-voting is not the solution; it only demoralises someone from going further with an absolutely new thought. I am working hard on this topic, and the results of my preliminary phase were very motivating. I will come back with a complete analysis and comparative results in the first quarter of the new year.
Here we can discuss and rectify the problem, or guide the approach in a proper way. The outcome of the discussion should be positive and fruitful. I think this is the best platform where you can discuss any innovative idea with great scholars.
@Tayfun Kucukyilmaz. Your reasoning is mathematically correct, but it ignores several practical aspects. Taking into account these aspects may lead to significant improvements.
First, the unsorted case. You say that no technology can change the theta(n) worst-case performance. Well, that might be true. However, when we talk about quantum algorithms, we often quietly assume that a small probability of error is acceptable. This changes the game. As I mentioned before, there is Grover's algorithm, which can find an item in an unsorted database in O(sqrt(n)) steps with only O(1/n) error.
Second, the sorted case. Surely faster algorithms than standard binary search can be designed if we can assume that the data is distributed in a specific way. So if there is a technology that allows us to create a mapping from the initial distribution of the data to, e.g., a Pareto distribution, search can be made faster. The mapping can be done by, e.g., simply preprocessing the data. Or some kind of "magic" technology could be involved (similarly to oracles in computability theory) that serves as a black box and redistributes the data for you (a DNA computer perhaps could do this?).
In the real world there are also a number of issues, like caching and the need for parallelization, that may have more impact on performance than asymptotic bounds. But I'm not sure the OP wants to go into these.
Since it seems that people may impose their interpretations at will ... Why not use hashing? Without collisions, and enough RAM, time does not vary with the table size ... i.e., faster than binary search.
@Rense, refer to my answer where I go into detail on why this won't do better than binary search. It is still linear, not logarithmic: you would need to look at all the elements to construct the hash table... As stated previously, if you built your hash table before considering the problem, you can do it very fast, but that's not this type of search problem. This one asks: given an array that is sorted, find an element in it! The hash construction is in fact redundant, as you would have already passed the yes/no answer and the location while building the hash table.
I must be confused - when you say "... but that's not this type of search problem," and "given an array that is sorted, find an element in it" where is the particular problem statement that you're referring to specified?
If what's being considered is a sequential file that is sorted and read into memory to initialize the binary search array, for example, why is the alternative creation of a hash table precluded as the file is loaded into memory?
@James The question relates to algorithms. An algorithm takes an input instance, and produces a solution. This is the topic.
"Is it possible to develop more efficient searching algorithm than Binary Search?
We know that the binary search is the fastest approach to find an item in the group of items. Do you believe that we can develop more efficient algorithm than binary search?"
This isn't asking about a program (programs relate to computer systems, as algorithms relate to computational models), so we cannot make any assumptions of that nature as it doesn't even need to relate to a computer necessarily. Since binary search is what is assumed here, binary search takes the following as an input instance:
1) An array A of n elements; this array is sorted (such that a total order is placed on the elements, from smallest to largest or vice versa).
2) A key X to be found in A. Keep in mind X may or may not be in A.
In the worst case this algorithm performs in BigTheta(log_2(n)). The question now is: can we do better than this? I say no, because it is a well-known lower bound on searching when the inputs are given in this fashion (i.e., you can prove that instances solved by this algorithm cannot be solved faster than BigOmega(log_2(n))). This means that binary search meets the lower bound.
I will mention something about the technique you suggested. It is an informal variant of pre-computation. The problem assumes the array is given such that the elements are sorted; you can't make any assumptions, for an arbitrary input instance, about how the data is internally stored. I mean, one can't simply store an array indexed between the min and the max and go at it "that way": that won't necessarily be Theta(n) input storage. E.g., say there are n elements, but the smallest is 1 and the largest is n^5 in size. This can't work in terms of time complexity, because the input size will be wrong for the lookup to count as logarithmic. Even the suggestion that "oh, it comes into memory beforehand, so we can stop it there" won't work, because the worst case is then identical to linear search, BigTheta(n).
That being said, a linear scan isn't how to go about this problem unless the array is entirely unsorted (a linear scan is one of the main ways to deal with that type of searching problem).
If the question is constrained to algorithms applied to sorted lists, then I agree that no more efficient search method can be found. But is this really the intent of the posted question? If we are not interested in finding the most efficient solution to the problem of generalized look-up methods, IMO this question is an uninteresting academic exercise.
The O(log n) lower bound can be beaten by an expected-case O(log log n) upper bound using interpolation search; this requires, however, that the data be drawn from a uniformly distributed source generating points in an interval. Alternative probability distributions may be transformed and scaled to fit this pattern. Interpolation search has a worst-case complexity of O(n) in case the uniformity of the input set is severely violated.
This is a classic result, proved during the 1970s.
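A minimal sketch of interpolation search, assuming roughly uniformly distributed numeric keys (the expected O(log log n) case mentioned above; with badly skewed data it degrades toward O(n)). The function name and example values are mine.

```python
def interpolation_search(a, x):
    """Return the index of x in the sorted numeric list a, or -1 if absent."""
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= x <= a[hi]:
        if a[lo] == a[hi]:                  # degenerate range: avoid division by zero
            return lo if a[lo] == x else -1
        # Guess the position by linear interpolation between the endpoints.
        pos = lo + (hi - lo) * (x - a[lo]) // (a[hi] - a[lo])
        if a[pos] == x:
            return pos
        elif a[pos] < x:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1

print(interpolation_search(list(range(0, 1000, 10)), 730))  # -> 73
```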
Binary search presupposes order. If you 'know' other things about the data set, you can take shortcuts.
A simple expression of this: If I know that the value I am looking for is the lowest value, I can find it in constant time. If I know that the value is at least 3 STD off the mean I can also improve the search.
Again, a binary search presupposes only a sorted list. The ability to apply domain knowledge (shape of the data set) to the search can drastically reduce the time taken to find the specific item you are looking for.
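A tiny sketch of that kind of shortcut, under the assumption (mine, for illustration) that domain knowledge tells us the target can only lie in the upper tail of the sorted data; the function name and numbers are hypothetical.

```python
from bisect import bisect_left

def search_tail(a, x, tail_start):
    """Search only a[tail_start:], given domain knowledge that the target
    (if present) can only occur in that upper region of the sorted list a."""
    i = bisect_left(a, x, tail_start)       # binary search restricted to the tail
    return i if i < len(a) and a[i] == x else -1

a = list(range(0, 1_000_000))
# If we know the sought value is always among the top ~0.1% of the data,
# only ~1000 items need to be considered instead of 1,000,000.
print(search_tail(a, 999_500, 999_000))     # -> 999500
print(search_tail(a, 123, 999_000))         # -> -1 (outside the known region)
```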