I don't know how to collect data or simulate these things. How do I simulate a server, a router, etc.? I am new to this area, so please help me. Could you give me any links I can read to learn how to simulate these things?
There are a great many simulation tools you can use for this purpose. However, if you really want to find out what is going on, why not set up a honeypot system? It will let you gather REAL attack data, often in real time as it happens. There is a fairly recent paper comparing different honeypot software, which would give you a good start in learning about this. You can download it from here:
https://arxiv.org/pdf/1608.06249.pdf
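To give a feel for what a honeypot does, here is a toy low-interaction sketch using only Python's standard library. It is not one of the tools compared in the paper, and the fake banner, the function name `start_honeypot` and the log format are all made up for illustration. It listens on a port, sends a banner pretending to be a real service, and records whatever the connecting client sends first. It binds to localhost so it can be tried safely; the packages compared in the paper emulate services far more convincingly.

```python
import socket
import threading
from datetime import datetime, timezone


def start_honeypot(banner: bytes, log: list, max_conns: int = 1,
                   host: str = "127.0.0.1", port: int = 0):
    """Toy honeypot (hypothetical helper): send a fake banner, log what the client sends."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))            # port 0 lets the OS pick a free port
    srv.listen()
    bound_port = srv.getsockname()[1]

    def serve():
        with srv:
            for _ in range(max_conns):
                conn, (peer_ip, peer_port) = srv.accept()
                with conn:
                    conn.settimeout(5.0)       # don't hang on silent scanners
                    conn.sendall(banner)        # pretend to be a real service
                    try:
                        data = conn.recv(4096)  # first bytes the "attacker" sends
                    except socket.timeout:
                        data = b""
                # One record per connection: this is the "real attack data"
                log.append({
                    "time": datetime.now(timezone.utc).isoformat(),
                    "peer_ip": peer_ip,
                    "peer_port": peer_port,
                    "data": data,
                })

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return bound_port, thread
```

In a real deployment, the records in `log` would be written to disk and later used as training or test data for the IDS.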
The biggest advantage of honeypot software is that you collect REAL attack data, rather than the simulated attack data provided by simulation programmes. Simulations will never give you a true understanding of what is really going on, whereas honeypots are attacked by real attackers, who often try real, up-to-date viruses and attack methods, so you will learn far more this way. Remember that an IDS that relies on pattern matching against known attacks will only ever be able to do just that: match KNOWN attacks. To be successful, an IDS must be able to recognise new attacks adaptively and respond to them.
If you check the references in this paper, and also the papers that have cited it in turn, you should be able to dig up a lot of research to help you with this.
I agree with Bob Duncan. Unless you are interested in log parsing and "a posteriori" discovery of attacks, an effective IDS must operate in real time.
There is another aspect that must be taken into account: the economic cost of security. An IDS cannot be more expensive than the system it is meant to protect; its cost must be a fraction of the cost of the entire system. The real-time requirement could make the cost of an IDS too high.
Anyway, if you choose an IDS that identifies attacks in real time, you cannot use one based on "pattern matching": it is too slow and too unstructured to manage attacks in real time.
It is slow because the computational cost of matching a regular expression is not always linear: it can be proportional to the length of the input being checked multiplied by the length of the regular expression. Furthermore, a good IDS should check the input against a set of possible attacks. With one regular expression per attack, this adds another factor (the number of regular expressions to be checked) to the overall computational cost of the IDS, making real-time operation infeasible unless you spend a lot of money on fast parallel hardware.
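A minimal sketch of the naive signature-matching loop described above follows. The three signatures are hypothetical and are not taken from any real IDS rule set. Every payload is tested against every pattern, so the number of regex evaluations is (number of payloads) × (number of signatures), and each evaluation costs at least time proportional to the payload length. Python's `re` is a backtracking engine, so some patterns can be considerably worse than linear.

```python
import re

# Hypothetical signatures, for illustration only (one regex per known attack).
SIGNATURES = {
    "sql_injection": re.compile(r"(?i)\bunion\b.+\bselect\b"),
    "path_traversal": re.compile(r"\.\./"),
    "xss": re.compile(r"(?i)<script\b"),
}


def scan(payloads, signatures):
    """Check every payload against every signature; return (alerts, number of regex checks)."""
    alerts = []
    checks = 0
    for i, payload in enumerate(payloads):
        for name, pattern in signatures.items():
            checks += 1                  # work grows with len(payloads) * len(signatures)
            if pattern.search(payload):  # each search scans (at least) the whole payload
                alerts.append((i, name))
    return alerts, checks
```

With three payloads and three signatures this already needs nine regex searches. A sensor inspecting thousands of packets per second against thousands of rules multiplies the same way, which is the cost argument made above.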
Secondly, regular expressions are unable to match structured attacks sequenced over multiple network packets and transactions. A real structured attack could be spread over many transactions and many clients, and a real IDS should be able to detect these kinds of attacks.
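To illustrate the difference, here is a rough sketch of a stateful correlator. It assumes that lower-level sensors have already turned traffic into labelled events. The event kinds, the four-step attack pattern and the 600-second window are all hypothetical. Events are grouped by target host rather than by client, so steps performed from different source addresses still add up to one attack. None of the individual events would look suspicious to a per-packet regex.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: float  # seconds
    src: str     # client address
    dst: str     # target host
    kind: str    # label produced by a lower-level sensor (hypothetical)


# Hypothetical multi-step attack pattern, not a real rule.
ATTACK_STEPS = ("port_scan", "login_failure", "login_success", "file_download")


def correlate(events, steps=ATTACK_STEPS, window=600.0):
    """Greedy in-order matching of `steps` per target host, across any clients."""
    progress = defaultdict(int)   # dst -> index of the next expected step
    started = {}                  # dst -> time of the first matched step
    sources = defaultdict(set)    # dst -> clients that contributed steps
    alerts = []
    for ev in sorted(events, key=lambda e: e.time):
        idx = progress[ev.dst]
        if idx and ev.time - started[ev.dst] > window:
            idx = 0                       # sequence took too long: start over
            sources[ev.dst] = set()
        if ev.kind == steps[idx]:
            if idx == 0:
                started[ev.dst] = ev.time
            sources[ev.dst].add(ev.src)
            idx += 1
            if idx == len(steps):         # full sequence seen: raise an alert
                alerts.append((ev.dst, sorted(sources[ev.dst])))
                idx = 0
                sources[ev.dst] = set()
        progress[ev.dst] = idx
    return alerts
```

This greedy matcher tracks only one partial match per host. Real correlation engines handle interleaved and overlapping sequences, which is where more structured techniques come in.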
I would suggest reading about process mining to get a more precise understanding of these aspects.
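As a taste of the idea, process mining starts from an event log (sequences of activities, one trace per session) and builds a model of the process. A common first step is the directly-follows graph, which counts how often activity `a` is immediately followed by activity `b`. The sketch below is a simplification, not a full discovery algorithm, and the activity names are hypothetical. It learns that graph from normal sessions and then reports transitions in a new session that never occurred in normal behaviour.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


def unseen_transitions(trace, dfg):
    """Transitions in `trace` that never appeared in the learned graph."""
    return [(a, b) for a, b in zip(trace, trace[1:]) if (a, b) not in dfg]
```

For example, a graph learned from sessions like `login, browse, logout` flags every step of `login, admin_panel, dump_db, logout`. Unlike a regex, this works at the level of the whole sequence of transactions.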
For intrusion detection systems, sample data collections (model data sets) are available on the net. We can download them and use them for research purposes, e.g. lists of suspicious URLs, other data, etc. You can then match patterns from the model data sets against your original data to find intrusions.
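A minimal sketch of that matching step for the suspicious-URL case follows. It assumes a downloaded list with one URL or hostname per line and `#` for comments. Real data sets use many different formats, so the loader is hypothetical. Hostnames are normalised with `urllib.parse.urlsplit` (which lowercases them), and each requested URL is checked with a set lookup.

```python
from urllib.parse import urlsplit


def load_blocklist(lines):
    """Parse a hypothetical one-entry-per-line list of suspicious URLs/hosts."""
    hosts = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Bare hostnames get a dummy scheme so urlsplit puts them in .hostname
        host = urlsplit(line if "://" in line else "http://" + line).hostname
        if host:
            hosts.add(host)
    return hosts


def flag_requests(urls, blocked_hosts):
    """Return the requested URLs whose host appears in the blocklist."""
    return [u for u in urls if urlsplit(u).hostname in blocked_hosts]
```

Usage would be along the lines of `with open("suspicious_urls.txt") as f: hosts = load_blocklist(f)` (the file name is made up), followed by `flag_requests(urls_from_your_logs, hosts)`. As discussed above, this only catches what is already in the list.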
To add to Bob's and Franco's contributions, with which I agree: you can still use existing log data (Bruce's reference is interesting, though a bit old) to train your system, on top of what you can get from the honeypot.
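One simple way to "train" on existing logs is a statistical baseline. This is a swapped-in illustration of anomaly detection, not a method anyone in this thread prescribes. It assumes the historical logs are mostly attack-free and have already been reduced to a per-interval number such as requests per minute. A new interval is flagged when it is more than `threshold` standard deviations from the historical mean, so traffic patterns that match no known signature can still be caught.

```python
from statistics import mean, pstdev


def fit_baseline(counts):
    """Learn mean and standard deviation of a metric from historical (assumed benign) logs."""
    return mean(counts), pstdev(counts)


def is_anomalous(count, baseline, threshold=3.0):
    """Flag a new observation that lies more than `threshold` std devs from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return count != mu  # no variation in history: any change is unusual
    return abs(count - mu) / sigma > threshold
```

For instance, with a history of 100, 110, 90, 105 and 95 requests per minute, a minute with 120 requests stays under the 3-sigma threshold while one with 400 is flagged. Honeypot data is useful here as a source of known-bad samples to check that such a detector actually fires.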