It depends: if you know the normal events or procedures for the network scenario you are training on, you can apply various training models to produce an initial training profile. However, in some dynamic network scenarios you do not have that luxury, and you must then assume that the data on which you train is normal.
Yes, strictly by terminology, anomaly detection is based on learning patterns from normal data/behavior and looking for any deviation to trigger an alarm. However, we need to make sure that the normal data is representative of the "entire normal behavior" to avoid false positives (which is, in practice, hard to obtain!).
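As a minimal sketch of that idea (the single traffic feature and the 3-sigma threshold below are illustrative assumptions, not a standard):

```python
import numpy as np

# Train: estimate a profile (mean, std) from traffic assumed to be normal.
# The feature (e.g., requests per minute) and the 3-sigma cut-off are
# illustrative choices only.
normal_traffic = np.array([98, 102, 97, 105, 101, 99, 103, 100])
mu, sigma = normal_traffic.mean(), normal_traffic.std()

def is_anomalous(observation, k=3.0):
    """Flag any observation more than k standard deviations from the learned mean."""
    return abs(observation - mu) > k * sigma

print(is_anomalous(101))   # False: within the learned normal range
print(is_anomalous(250))   # True: deviates strongly from the profile
```

If the training data misses part of the real normal behavior, that missing part will later be flagged, which is exactly the false-positive problem mentioned above.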
It can be noted that anomaly detection systems can also be built as "expert systems", in which there is no training phase but rather hardcoded rules that define abnormal behavior.
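A toy sketch of such a rule-based detector might look like this; the rules and event fields are invented purely for illustration:

```python
# "Expert system" style detector: no training phase, only hand-written rules
# encoding what the analyst considers abnormal. The rule set and the event
# fields (port, payload_size, failed_logins) are made-up examples.
RULES = [
    ("suspicious port",   lambda e: e["port"] in {23, 2323}),      # telnet
    ("oversized payload", lambda e: e["payload_size"] > 10_000),
    ("brute force",       lambda e: e["failed_logins"] >= 5),
]

def check_event(event):
    """Return the names of all rules the event violates."""
    return [name for name, rule in RULES if rule(event)]

event = {"port": 23, "payload_size": 512, "failed_logins": 7}
print(check_event(event))  # ['suspicious port', 'brute force']
```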
I think that is right. But to make detection more effective, both normal and attack training data should be used. Since today's machine learning algorithms are scalable, it is better to use a large amount of data during the learning process.
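For instance, a minimal supervised sketch along these lines, assuming a labeled dataset with both normal and attack examples (the features and values below are fabricated):

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical labeled dataset: each row is a feature vector for a connection,
# label 0 = normal, 1 = attack. Real feature extraction is out of scope here.
X = [[0.1, 200], [0.2, 180], [0.9, 5000], [0.8, 4800], [0.15, 210], [0.95, 5100]]
y = [0, 0, 1, 1, 0, 1]

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([[0.12, 190], [0.92, 4900]]))  # expected: [0 1]
```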
A common approach to behavior-based intrusion detection (here, anomaly detection) is to build a model of the normal behavior of a network observed during a training phase, and later to flag an anomaly when the measured state of the network differs from the expected state (calculated from the model) by more than a defined threshold.
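A minimal sketch of this expected-vs-measured scheme, assuming a moving average as the model of normal behavior (window size and threshold are illustrative choices):

```python
from collections import deque

class BaselineDetector:
    """The 'model' is a moving average of recent measurements; an alarm fires
    when measured and expected state differ by more than a fixed threshold."""

    def __init__(self, window=5, threshold=50.0):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, measured):
        if len(self.history) == self.history.maxlen:
            expected = sum(self.history) / len(self.history)
            if abs(measured - expected) > self.threshold:
                print(f"ALARM: measured={measured}, expected={expected:.1f}")
        self.history.append(measured)

det = BaselineDetector()
for value in [100, 104, 98, 101, 103, 102, 300, 99]:
    det.observe(value)   # only the spike to 300 triggers an alarm
```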
However, some challenges arise when using this approach:
1. The problem of high false-alarm rates caused by benign data that was not seen during the learning phase, and
2. the problem of ground truth: since the typical learning process has to be executed in the real-world environment/network with real traffic, the data packets are not marked as benign or malicious. Here, the distinction between supervised and unsupervised learning techniques is important:
Supervised training (used, for example, in back-propagation algorithms) requires labeled data; therefore, when working with unlabeled real-world data, all data would have to be benign during the training phase - but this typically cannot be guaranteed (e.g., the Cisco 2014 Annual Security Report stated that 100 percent of the analyzed business networks had traffic going to websites that host malware). Because of that, (unrecognized) malicious traffic can be learned as benign during the training phase and cannot be detected afterwards (false negatives / type II errors); but remember, most false alarms during operation are typically caused by previously unseen benign traffic (false positives / type I errors).
Unsupervised training algorithms, on the other hand, are able to build, e.g., clusters from the unlabeled data and, in the best case, can thereby differentiate between benign and malicious traffic on their own; leader clustering is an example of such an unsupervised algorithm (a minimal sketch follows).
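A minimal sketch of leader clustering on made-up 2-D feature vectors (the distance threshold is an illustrative assumption):

```python
import numpy as np

def leader_cluster(points, threshold=1.0):
    """Leader clustering: the first point becomes a leader; each new point
    joins the nearest leader within the threshold, otherwise it founds a new
    cluster. Tiny clusters can then be treated as suspicious."""
    leaders, members = [], []
    for p in points:
        dists = [np.linalg.norm(p - l) for l in leaders]
        if dists and min(dists) <= threshold:
            members[int(np.argmin(dists))].append(p)
        else:
            leaders.append(p)
            members.append([p])
    return leaders, members

points = np.array([[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [8.0, 8.0]])
leaders, members = leader_cluster(points)
print(len(leaders), "clusters; sizes:", [len(m) for m in members])
# 2 clusters; sizes: [3, 1]  -> the singleton may be flagged as anomalous
```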
To summarize: on the one hand, ensuring that only normal or completely labeled data is present during the training phase is a big challenge in real-world networks; on the other hand, previously unseen benign traffic can later result in high false-positive rates.
I think that in a real network it is difficult to ensure that normal data is labeled efficiently. So we can modify this concept by building a profile from normal or abnormal traffic without labels; detection in this case will depend on unsupervised learning, such as clustering, or any classifier that does not use labels for the training set.
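One way to sketch such a label-free detector is with a one-class classifier, e.g. scikit-learn's OneClassSVM; the features and the nu parameter below are illustrative assumptions:

```python
from sklearn.svm import OneClassSVM

# One-class classifier trained without labels: it fits a boundary around the
# bulk of the (mostly normal) training data and marks outliers as -1.
X_train = [[100, 0.1], [102, 0.2], [98, 0.15], [101, 0.12], [99, 0.18]]
model = OneClassSVM(nu=0.1, kernel="rbf", gamma="scale").fit(X_train)

# The first point resembles the training data; the second should be flagged.
print(model.predict([[100, 0.14], [300, 0.9]]))  # e.g. [ 1 -1]
```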
Most approaches available today are knowledge-based, i.e. using signatures or learned traffic models, which restricts the detector to known anomalies. To overcome this limitation, the anomaly detection process should be devised such that it can detect intrusions from fresh data by learning from it, rather than building a fixed model to detect intrusions in future traffic.
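As one possible sketch of such an adaptive scheme (not the paper's method), a detector can keep learning from the fresh stream itself via an online mean/variance update; the warm-up length and 4-sigma cut-off are illustrative assumptions:

```python
class StreamingDetector:
    """Maintains a running mean/variance (Welford's online update), so the
    model of 'normal' keeps evolving instead of being fixed at training time."""

    def __init__(self, k=4.0):
        self.n, self.mean, self.m2, self.k = 0, 0.0, 0.0, k

    def update(self, x):
        anomalous = False
        if self.n >= 10:  # wait for a minimal history before judging
            std = (self.m2 / self.n) ** 0.5
            anomalous = abs(x - self.mean) > self.k * std
        # Welford's update: the model keeps learning from every new value
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

det = StreamingDetector()
stream = [100, 101, 99, 102, 98, 100, 101, 99, 100, 102, 101, 400]
print([det.update(x) for x in stream])  # only the final spike is flagged
```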
Have a look at this paper:
Asif Iqbal Hajamydeen, Nur Izura Udzir, Ramlan Mahmod, Abdul Azim Abdul Ghani, "An unsupervised heterogeneous log-based framework for anomaly detection", Turkish Journal of Electrical Engineering & Computer Sciences, 24 (2016): 1117-1134.