I am lost among Hadoop, Solr, Cloudera, Carrot2, Hesse and Lucene.
If you could help me narrow this set down for my particular scenario, that would be great. I am not sure which of the above is a good fit for retrieving information in a cloud computing setting.
If you use Hadoop, keep in mind that the most common way to use it is through MapReduce, which works on key-value pairs, so there is no concept of an index there.
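To make the key-value point concrete, here is a rough sketch in plain Python (not real Hadoop code) of what a MapReduce-style job looks like when you want something index-like. The function names and the two sample documents are made up for illustration. The idea is that with MapReduce you would have to build an inverted index yourself from key-value pairs, whereas Lucene (and Solr, which is built on Lucene) maintains one for you.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Emit (word, doc_id) pairs, the way a mapper emits key-value pairs."""
    for word in text.lower().split():
        yield word, doc_id

def shuffle(pairs):
    """Group values by key, imitating the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(word, doc_ids):
    """Collapse all values for one key into a sorted list of unique ids."""
    return word, sorted(set(doc_ids))

def build_inverted_index(docs):
    """Run map, shuffle and reduce over a dict of {doc_id: text}."""
    pairs = [pair for doc_id, text in docs.items()
             for pair in map_phase(doc_id, text)]
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())

# Hypothetical sample documents, only for illustration.
docs = {1: "hadoop stores data", 2: "solr indexes data"}
index = build_inverted_index(docs)
print(index["data"])  # [1, 2]
```

Every lookup structure here is something the job had to produce itself; nothing in the map or reduce steps knows about searching.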
The first step is to work out what kind of data structure you are working with. For example, if you use Redshift, which is column-oriented, the method will be different than with, say, Hadoop. Then ask yourself: what do I want to do? If you want to do machine learning, you will iterate over your data many times, so keeping it in memory will be needed to go faster.
I hope these few lines help clarify your question a little.
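As a small illustration of why machine learning means many iterations over the same data, here is a minimal sketch of gradient descent fitting a single slope. The function name, the data points and the step settings are all assumptions chosen for the example. The data is loaded once into a list in memory, and every step reads all of it again.

```python
def fit_slope(points, steps=200, learning_rate=0.01):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0
    n = len(points)
    for _ in range(steps):
        # Each step makes a full pass over the whole dataset.
        grad = sum(2 * (w * x - y) * x for x, y in points) / n
        w -= learning_rate * grad
    return w

# Hypothetical data lying exactly on y = 2 * x.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = fit_slope(points)
print(round(w, 3))
```

With 200 steps the data is scanned 200 times. If each scan had to go back to disk, as a chain of separate MapReduce jobs typically would, the cost of reading would dominate, which is why in-memory processing matters for this kind of workload.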