I have a terabyte of pcap files and would like to find intrusions in them. How can I eliminate or reduce unwanted fields or packets in those pcap files?
Noise or unrelated fields can be analysed using techniques based on information theory and entropy. Quantitative information flow analysis is one technique that can be useful for coarse analysis of what is noise and what is more useful information. I used this approach to define an information leakage metric for IDS alarms here: http://arxiv.org/abs/1308.5421
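As a rough illustration of the entropy idea, here is a minimal sketch that ranks header fields by the plain Shannon entropy of their observed values. It is not the quantitative information flow method from the linked paper, only a coarse first pass. The `packets` records and their field names are hypothetical stand-ins for whatever your pcap parser produces.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1/p) over all observed values.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical, hand-made records standing in for parsed packet headers.
packets = [
    {"ip_version": 4, "dst_port": 80, "ttl": 64},
    {"ip_version": 4, "dst_port": 443, "ttl": 64},
    {"ip_version": 4, "dst_port": 22, "ttl": 128},
    {"ip_version": 4, "dst_port": 8080, "ttl": 64},
]

# Entropy per field: a field that never changes scores 0 bits.
field_entropy = {
    field: shannon_entropy(p[field] for p in packets)
    for field in packets[0]
}

for field, bits in sorted(field_entropy.items(), key=lambda kv: kv[1]):
    print(f"{field}: {bits:.3f} bits")
```

A field with zero entropy (here `ip_version`) is constant in the sample and cannot help distinguish packets, so it is a candidate for dropping. High entropy alone does not prove a field is useful, since purely random fields also score high, so treat the ranking as a hint and not as a decision.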
Tcpdump acts as a sniffer and packet analyzer. Since I would like to find intrusions, I thought tcpdump would not be suitable for this. An IDS such as Snort, with its rulesets, seems more suitable. Am I wrong?
Snort is fine, given that you know what you are looking for. However, Snort is not suitable for finding unknown attacks (zero-day attacks). Snort also has no knowledge in itself of which fields of a packet are needed and which are not. For those cases, some kind of anomaly-based approach based on statistics and/or a learning system is more appropriate. The entropy-based approach I mentioned is one feature that could be considered in such an anomaly-based IDS.
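To make the anomaly-based idea concrete, here is a minimal sketch that uses entropy as a single feature. It computes the entropy of destination ports per time window, learns a mean and standard deviation from the first few windows (assumed to be attack-free), and flags later windows that deviate strongly. The window contents, the `baseline_count` parameter and the threshold of 3 standard deviations are all illustrative assumptions, not tuned values.

```python
import math
import statistics
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def flag_anomalous_windows(windows, baseline_count, threshold=3.0):
    """Return indices of windows whose entropy is more than `threshold`
    baseline standard deviations away from the baseline mean."""
    entropies = [shannon_entropy(w) for w in windows]
    baseline = entropies[:baseline_count]
    mean = statistics.mean(baseline)
    # Guard against a perfectly flat baseline (standard deviation of zero).
    stdev = statistics.stdev(baseline) or 1e-9
    flagged = []
    for i in range(baseline_count, len(entropies)):
        z_score = (entropies[i] - mean) / stdev
        if abs(z_score) > threshold:
            flagged.append(i)
    return flagged


# Hypothetical destination ports seen in five consecutive time windows.
windows = [
    [80, 443, 80, 443, 80, 80, 443, 22],   # baseline
    [80, 80, 443, 443, 80, 22, 80, 443],   # baseline
    [443, 80, 80, 80, 443, 80, 22, 22],    # baseline
    [80, 443, 80, 80, 443, 443, 80, 22],   # ordinary traffic
    [1, 2, 3, 4, 5, 6, 7, 8],              # every port different, scan-like
]

suspicious = flag_anomalous_windows(windows, baseline_count=3)
print(suspicious)
```

With this toy data only the last window is flagged, because a port scan spreads traffic over many ports and pushes the entropy far above the baseline. A real detector would need much longer baselines and more than one feature.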
I agree with you that Snort is not suitable for finding zero-day attacks. Since I would like to detect intrusions that can serve as evidence of crimes in the network, I thought a signature-based IDS would be the most suitable. Or could a hybrid of signature-based and anomaly-based IDS produce better results?
A hybrid approach gives the best coverage. Most commercial intrusion detection systems use a hybrid of signature-based and anomaly-based detection in order to cover both known and unknown attacks. Anomaly-based IDS excels at detecting suspicious attack behaviour patterns, for example an unknown worm spreading, misuse of privileges or a Denial of Service attack.
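A toy sketch of what "hybrid" means in practice: run a signature check and an anomaly check over the same traffic and report both kinds of alert. The two signatures, the packet record layout and the `max_distinct_ports` heuristic are hypothetical and far simpler than real Snort rules or a real anomaly model.

```python
# Hypothetical signatures: alert name -> predicate over one packet record.
SIGNATURES = {
    "telnet-attempt": lambda pkt: pkt["dst_port"] == 23,
    "suspicious-payload": lambda pkt: b"/etc/passwd" in pkt["payload"],
}


def signature_alerts(packets):
    """Known-pattern detection: (packet index, signature name) pairs."""
    return [
        (i, name)
        for i, pkt in enumerate(packets)
        for name, matches in SIGNATURES.items()
        if matches(pkt)
    ]


def anomaly_alerts(packets, max_distinct_ports=3):
    """Behaviour-based detection: sources contacting unusually many ports."""
    ports_by_src = {}
    for pkt in packets:
        ports_by_src.setdefault(pkt["src"], set()).add(pkt["dst_port"])
    return sorted(
        src for src, ports in ports_by_src.items()
        if len(ports) > max_distinct_ports
    )


def hybrid_alerts(packets):
    """Combine both detectors so each covers what the other misses."""
    return {
        "signature": signature_alerts(packets),
        "anomaly": anomaly_alerts(packets),
    }


# Hand-made sample traffic (hypothetical addresses and payloads).
packets = [
    {"src": "10.0.0.5", "dst_port": 80, "payload": b"GET /index.html"},
    {"src": "10.0.0.5", "dst_port": 443, "payload": b""},
    {"src": "10.0.0.7", "dst_port": 23, "payload": b""},
    {"src": "10.0.0.9", "dst_port": 80, "payload": b"GET /../../etc/passwd"},
] + [
    {"src": "10.0.0.66", "dst_port": port, "payload": b""}
    for port in (21, 22, 25, 110, 8080)
]

alerts = hybrid_alerts(packets)
print(alerts)
```

Here the signature detector catches the two known patterns, while the scan from `10.0.0.66` matches no signature and is only caught by the anomaly check.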
This is still ongoing research. There are many IDS approaches that do reasonably well on the KDD-Cup data sets. However, these data sets are very outdated (from 1998/99) and not very realistic, which means that there are no very good methods for benchmarking intrusion detection systems. Also, which approach works best may depend on your problem domain, that is, on which type of attack or pattern you aim to detect.
I heard that KDDcup99 is obsolete. Therefore I have captured terabytes of data in my university network to help with my research. The problem is, with these millions of network traffic records (I'm using tcpdump to capture the pcap files), what is the best method to find intrusions in them? How would I decide which attacks to detect?
PacketPig, which is based on Apache Hadoop and Snort, is probably your best starting point for now: http://hortonworks.com/blog/big-data-security-part-one-introducing-packetpig/