It's not clear from your question whether you want to create a dataset or analyze an existing one from the cloud.
If you are looking for a cloud-based data set to apply ID techniques to, then use this CIDS data set: http://www.di.unipi.it/~hkholidy/projects/cidd/
If you are looking to create a dataset, then you need a cloud setup in which you perform certain events, including attacks. You may also look at my paper on intrusion detection and response systems (full text available on my profile) to get an idea of the former.
Thank you for the answer. Indeed, we need to create a simulated intrusion data set for cloud infrastructure, and we will focus on particular attacks in the cloud. On the other hand, the CIDS data set is not available. So my question is: can we use ns2 as a simulator for the cloud?
Honestly, what you are asking about is one of the most challenging things to do in a cloud environment right now.
Here is some essential information that could be useful for generating an intrusion dataset. First, you need a simulator for the attacks and another for normal traffic, or you can collect the normal traffic from a cyber range lab or a real environment. For attack simulation you can use Metasploit, Ixia PerfectStorm, or launch your own attack scripts. Second, you need to understand the labeling process, as this is the risky task: it determines the fidelity of your data, and it is very difficult to control between end-to-end applications because of time delays. Next, the cloud setup requires a well-designed architecture covering the SaaS, IaaS, and PaaS models, so that you can claim the data represents cloud resources. Finally, you need a sniffer such as tcpdump or tshark to capture your traffic, keeping in mind that a reliable ground truth is what makes accurate labeling possible.
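To make the labeling step more concrete, here is a minimal sketch in Python, assuming you keep a log of when each attack script ran and which hosts were involved. The names AttackWindow, Flow, label_flow and the slack parameter are hypothetical, chosen only for illustration; the slack value is a rough way to absorb the time delay between launching an attack and seeing its packets in the capture.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AttackWindow:
    """One entry of the attack log (the ground truth). Hypothetical structure."""
    start: float        # epoch seconds when the attack script was launched
    end: float          # epoch seconds when it finished
    attacker_ip: str
    victim_ip: str
    name: str           # label to assign, e.g. "port_scan"


@dataclass
class Flow:
    """One captured record, e.g. parsed from tcpdump or tshark output."""
    timestamp: float
    src_ip: str
    dst_ip: str


def label_flow(flow: Flow, windows: List[AttackWindow], slack: float = 2.0) -> str:
    """Return the attack name if the flow matches a logged attack, else "normal".

    A flow matches when its timestamp falls inside the attack window
    (widened by `slack` seconds on both sides) and it travels between the
    attacker and the victim in either direction.
    """
    for w in windows:
        in_time = (w.start - slack) <= flow.timestamp <= (w.end + slack)
        same_pair = {flow.src_ip, flow.dst_ip} == {w.attacker_ip, w.victim_ip}
        if in_time and same_pair:
            return w.name
    return "normal"


# Example attack log with a single (made-up) entry.
windows = [AttackWindow(100.0, 160.0, "10.0.0.5", "10.0.0.20", "port_scan")]
flows = [
    Flow(101.0, "10.0.0.5", "10.0.0.20"),   # attacker to victim, inside window
    Flow(161.5, "10.0.0.20", "10.0.0.5"),   # late reply, still within slack
    Flow(170.0, "10.0.0.5", "10.0.0.20"),   # after window plus slack
    Flow(120.0, "10.0.0.7", "10.0.0.20"),   # a different client
]
labels = [label_flow(f, windows) for f in flows]
```

If the slack is too small, delayed attack packets get labeled as normal; if it is too large, normal traffic between the same two hosts gets labeled as an attack. That trade-off is why the labeling needs care.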
I would also point out that the attacks in KDD99 and NSL-KDD are very different from the current attack philosophy, so these datasets are not suitable as datasets for cloud computing.