Thanks! Actually, I would like to know how recommendations and search are done in cloud computing. My objective is to design a recommendation system for cloud computing services.
For an overview of research issues, check out the cloud computing report by the Commission: http://cordis.europa.eu/fp7/ict/ssai/docs/cloud-report-final.pdf
As for your recommendation system, do you mean: recommending cloud services or building a recommendation system for (any) service on top of a cloud?
Well, then let's ask the other way round: which features of the cloud do you want to exploit for this? Clouds do not per se differ from any other internet-provided service, so for enabling search you either need to crawl through all exposed sites / services, or - more commonly in this context - host a registry or, respectively, search existing registries. Does this help?
But crawling implies some form of centralization, i.e., crawl the hosts or registries and then index them for search purposes.
My objective is to design a decentralized RS over cloud computing that works on the fly, i.e., each agent or node indexes some portion of the whole set of data or services, and when a node issues a query, the query is forwarded to the related nodes, i.e., those that have results for the query.
For example, suppose that I am editing a paper in Google Docs related to social RS and would like to cooperate with someone who is writing on something similar to mine. The proposed RS then suggests to me a set of individuals who are working on social RS close to my own work, and afterwards I communicate with them to set up the cooperation. Moreover, the recommendations should come from different cloud providers, i.e., not just from Google but maybe also from Microsoft, etc.
Other examples include scientific workflows, etc.
Thus, we should have a distributed infrastructure that includes all the cloud computing providers, or exploit users' resources to build a cloud computing infrastructure that can provide cloud services in the manner of p2p file sharing.
I appreciate your feedback and your opinion on the feasibility of designing such an infrastructure.
Hmm, you're touching on issues more complex than clouds or any IoS technology, come to that. The principle of the problem is always the same: if no information is exposed, you cannot access it (without some mean security breaches ;). In order to enable web-based service provisioning, the interfaces need to be published in one way or another. Obviously, web sites and web services are only announced "locally", whereby "local" means on the head node(s) of the respective data center.
There are only two main approaches to deal with this: access each provider in turn and list their services (more or less crawling), or ask them to push the information to a more or less centralised point (a registry). Note that crawling does not necessarily imply indexing, unless you want to offer fast searches, as Google does. With crawling triggered on every query you waste a lot of time, but you do not centralise the data as such (just the search entry point).
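To make the trade-off concrete, here is a minimal sketch of the two approaches (in Python; the list_services interface and the matches predicate are made-up placeholders, not any existing API):

    # Sketch: crawl-per-query vs. registry lookup (hypothetical interfaces).

    # Approach 1: crawl at query time -- no central data store, but every
    # query pays the cost of contacting each provider in turn.
    def crawl_search(providers, matches):
        results = []
        for provider in providers:                    # one round trip each
            for service in provider.list_services():  # assumed exposed API
                if matches(service):
                    results.append(service)
        return results

    # Approach 2: registry -- providers push their service descriptions to a
    # (more or less) centralised point; queries then hit only the registry.
    class Registry:
        def __init__(self):
            self._services = []

        def register(self, service):    # called by each provider (push)
            self._services.append(service)

        def search(self, matches):      # fast: purely local lookup
            return [s for s in self._services if matches(s)]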
Now you can complicate the approach any way you want, in particular if you compare this to p2p file sharing entities. The - very basic - principle in these cases consists in forwarding the search from each individual peer, i.e. you start with the peers you immediately find, return their results, and continue collecting by asking them to forward your query (thus your results improve over time after the query) - until a predefined cut-off point is reached. More sensible and / or practical approaches use some form of registers to at least maintain the "nearest" or "best" peers, so that you either query a central instance first for all known / registered / found peers, or maintain a list at each peer and update it slowly over time. You can also introduce "super peers", which essentially means adding a hierarchy to the system.
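As a very rough sketch of that basic flooding principle, assuming each peer keeps its own partial index plus a list of neighbour peers (all names here are invented for illustration):

    # Sketch of basic p2p query flooding with a cut-off (TTL), as described
    # above; each peer holds a local partial index and a list of neighbours.

    class Peer:
        def __init__(self, name, local_services):
            self.name = name
            self.local_services = local_services  # this peer's share of the index
            self.neighbours = []                  # peers found / registered so far

        def query(self, matches, ttl=3, seen=None):
            seen = seen if seen is not None else set()
            if self.name in seen or ttl == 0:     # cut-off point reached
                return []
            seen.add(self.name)
            # return own matches first, then keep collecting by forwarding
            results = [s for s in self.local_services if matches(s)]
            for peer in self.neighbours:
                results += peer.query(matches, ttl - 1, seen)
            return results

    # usage: three peers in a line; a query from p1 reaches p3 only if ttl >= 3
    p1, p2, p3 = Peer("p1", ["doc-rs"]), Peer("p2", []), Peer("p3", ["social-rs"])
    p1.neighbours, p2.neighbours = [p2], [p3]
    print(p1.query(lambda s: "rs" in s))          # ['doc-rs', 'social-rs']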
Note that the principles are similar to distributed databases, so you may want to check that out.
In all cases though, each peer has to provide the functionality to understand (and respond to) the query, and in the p2p case even handle forwarding the query. Current web service providers only expose the information via e.g. the Apache server, but not much more - success stories such as Google are based on extending this information to serve specific search needs (which, if you use them properly, helps a lot). Implicitly, if the standard way of exposing this information is not enough for your purposes, you need additional software on each providing peer to deliver the according type of data and / or to serve query forwarding.
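For illustration, this "additional software" could be as small as one extra HTTP endpoint per peer, along these lines (a sketch using Python's standard library; the /search path and the response shape are assumptions, not any existing API):

    # Minimal sketch of the extra per-peer software: an HTTP endpoint that
    # answers search queries, next to whatever the web server already serves.
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import urlparse, parse_qs

    LOCAL_SERVICES = ["social-rs-editor", "workflow-engine"]  # assumed local index

    class SearchHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            url = urlparse(self.path)
            if url.path != "/search":                 # hypothetical path
                self.send_error(404)
                return
            term = parse_qs(url.query).get("q", [""])[0]
            hits = [s for s in LOCAL_SERVICES if term in s]
            body = json.dumps({"peer": "example-peer", "results": hits}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8080), SearchHandler).serve_forever()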
None of this is specific to the cloud, by the way: cloud services are no better or worse exposed than in any other case. In addition, a cloud node is not more accessible than any other server (if anything, less so in general).
So, the cloud does not help you with the complexity of the task.
Now here comes what you can use the cloud for, though: executing the search from a centralised (single) point will quickly overload that node and create long delays. So you could scale your search by splitting the query (or rather the lookup destinations) into multiple domains and instantiating partial searches on additional nodes. This allows you to distribute the load of the search itself and thus lower the query response time - this is common practice for most search engines ;)
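A minimal sketch of that scatter-gather idea (the partition and search_domain helpers are placeholders for the partial-search instances you would actually spin up on cloud nodes):

    # Sketch: split the lookup destinations into domains and run the partial
    # searches in parallel, e.g. each on its own cloud node.
    from concurrent.futures import ThreadPoolExecutor

    def partition(destinations, n):
        # split the lookup destinations into n roughly equal domains
        return [destinations[i::n] for i in range(n)]

    def search_domain(query, domain):
        # placeholder: in practice this would call the partial-search
        # instance responsible for this subset of destinations
        return [d for d in domain if query in d]

    def scaled_search(query, destinations, n_workers=4):
        domains = partition(destinations, n_workers)
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            partials = list(pool.map(lambda d: search_domain(query, d), domains))
        return [hit for partial in partials for hit in partial]

    print(scaled_search("rs", ["doc-rs", "mail", "social-rs", "calendar"]))
    # -> ['doc-rs', 'social-rs']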
I believe that the Cloud is a platform for providing centralized services for distributed data and applications, leaving all the complex data and application management to the cloud administrators. The infrastructure used to make this cloud function is the Grid. To make a Cloud scalable, a Grid Computing infrastructure is required.
Thus, some of the Cloud challenges have to be resolved in the grid infrastructure.
The obstacles mentioned:
1. Job Queuing -- Grid
2. Security -- Grid
3. Process Broker (the entity that forwards a task to the best resource) -- Cloud and/or Grid
4. The UI -- Cloud
5. Application Services -- Cloud
I would recommend further reading on popular Grid Computing infrastructures and tools, such as Condor, the Globus Toolkit, and GridGain.
Well, to be fair, calling the cloud a grid leaves out the essential core feature of the cloud, namely elasticity. Grids (and in fact Web Services) offer the essential functionalities for hosting and communication (alternatively, virtualisation technologies for IaaS), but they cannot cater for scale-out. Now, the effort to realise this depends on the use case, but it is still a major difference. In addition, it's worth mentioning things like multi-tenancy, isolation, etc., which have a different impact on clouds than on grids, so...
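To make "elasticity" a bit more tangible: the cloud-specific part is essentially a control loop that acquires and releases instances at runtime based on observed load, which a grid does not give you out of the box. A toy sketch (all thresholds and the provision/release calls are hypothetical):

    # Toy sketch of an elasticity control loop: scale the pool of worker
    # instances out / in with the observed load (thresholds are made up).
    def elasticity_step(workers, loads, provision, release,
                        scale_out_at=0.8, scale_in_at=0.3):
        avg = sum(loads) / len(loads)          # current average utilisation
        if avg > scale_out_at:
            workers.append(provision())        # acquire a new instance
        elif avg < scale_in_at and len(workers) > 1:
            release(workers.pop())             # give an idle instance back
        return workers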
I suggest having a look at the issues around Quality of Service support for virtualized applications, including performance isolation of virtual machines in a cloud environment. We investigated some of these topics in the European Project IRMOS (http://www.irmosproject.eu), and specifically within the ISONI component (http://www.irmosproject.eu/Isoni.aspx). In case you'd like an overview, there are a few YouTube videos in the irmosproject channel (http://www.youtube.com/user/irmosproject).
An example "overarching" paper about these issues might be considered this one:
Virtualised e-Learning with Real-Time Guarantees on the IRMOS Platform," in Proceedings of the IEEE International Conference on Service-Oriented Computing and Applications (SOCA 2010), Perth, Australia, December 2010. (Best Paper Award)
along with the "ISONI Whitepaper" that you can easily find on ISONI webpage above.
As Lutz points out, proper support for scale-out capabilities is one of the key functionalities, and it becomes particularly challenging in the presence of heterogeneous hardware within an IaaS provider.
Security in cloud computing is also quite a challenging issue nowadays, if clouds are not to remain restricted to those application domains in which data security (and particularly confidentiality) is not an issue at all.