It depends: if you know the normal events or procedures for the network scenario you are training on, you can apply various training models to produce an initial training profile. However, in some dynamic network scenarios you do not have that luxury, and you must then assume that the data on which you train is normal.
Yes, strictly by terminology, anomaly detection is based on learning patterns from normal data/behavior and looking for any deviation to trigger an alarm. However, we need to make sure that the normal data is representative of the "entire normal behavior" to avoid false positives (which is, in practice, hard to obtain!).
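As a minimal sketch of that idea (the single traffic feature and the 3-sigma threshold below are illustrative assumptions, not a standard):

```python
import numpy as np

# Train: estimate a profile (mean, std) from traffic assumed to be normal.
# The feature (e.g., requests per minute) and the 3-sigma cut-off are
# illustrative choices only.
normal_traffic = np.array([98, 102, 97, 105, 101, 99, 103, 100])
mu, sigma = normal_traffic.mean(), normal_traffic.std()

def is_anomalous(observation, k=3.0):
    """Flag any observation more than k standard deviations from the learned mean."""
    return abs(observation - mu) > k * sigma

print(is_anomalous(101))   # False: within the learned normal range
print(is_anomalous(250))   # True: deviates strongly from the profile
```

If the training data misses part of the real normal behavior, that missing part will later be flagged, which is exactly the false-positive problem mentioned above.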
It can be noted that anomaly detection systems can also be built as "expert systems", in which there is no training phase but rather hardcoded rules that define abnormal behavior.
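A toy sketch of such a rule-based detector might look like this; the rules and event fields are invented purely for illustration:

```python
# "Expert system" style detector: no training phase, only hand-written rules
# encoding what the analyst considers abnormal. The rule set and the event
# fields (port, payload_size, failed_logins) are made-up examples.
RULES = [
    ("suspicious port",   lambda e: e["port"] in {23, 2323}),      # telnet
    ("oversized payload", lambda e: e["payload_size"] > 10_000),
    ("brute force",       lambda e: e["failed_logins"] >= 5),
]

def check_event(event):
    """Return the names of all rules the event violates."""
    return [name for name, rule in RULES if rule(event)]

event = {"port": 23, "payload_size": 512, "failed_logins": 7}
print(check_event(event))  # ['suspicious port', 'brute force']
```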
I think that is right. But to make detection more effective, both normal and attack training data should be used. Since today's machine learning algorithms are scalable, it is better to use a large amount of data during the learning process.
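For instance, a minimal supervised sketch along these lines, assuming a labeled dataset with both normal and attack examples (the features and values below are fabricated):

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical labeled dataset: each row is a feature vector for a connection,
# label 0 = normal, 1 = attack. Real feature extraction is out of scope here.
X = [[0.1, 200], [0.2, 180], [0.9, 5000], [0.8, 4800], [0.15, 210], [0.95, 5100]]
y = [0, 0, 1, 1, 0, 1]

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([[0.12, 190], [0.92, 4900]]))  # expected: [0 1]
```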
A common approach to behavior-based intrusion detection (here, anomaly detection) is to build a model of the normal behavior of a network observed during a training phase, and later to flag an anomaly when the measured state of the network differs from the expected state (calculated from the model) by more than a defined threshold.
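A minimal sketch of this expected-vs-measured scheme, assuming a moving average as the model of normal behavior (window size and threshold are illustrative choices):

```python
from collections import deque

class BaselineDetector:
    """The 'model' is a moving average of recent measurements; an alarm fires
    when measured and expected state differ by more than a fixed threshold."""

    def __init__(self, window=5, threshold=50.0):
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, measured):
        if len(self.history) == self.history.maxlen:
            expected = sum(self.history) / len(self.history)
            if abs(measured - expected) > self.threshold:
                print(f"ALARM: measured={measured}, expected={expected:.1f}")
        self.history.append(measured)

det = BaselineDetector()
for value in [100, 104, 98, 101, 103, 102, 300, 99]:
    det.observe(value)   # only the spike to 300 triggers an alarm
```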
However, some challenges arise when using this approach:
1. The problem of high false-alarm rates caused by benign data that was not seen during the learning phase, and
2. the problem of ground truth: since the typical learning process has to be executed in the real-world environment/network with real traffic, the data packets are not marked as benign or malicious. Here, the distinction between supervised and unsupervised learning techniques is important:
Supervised training (used, for example, in back-propagation algorithms) requires labeled data; therefore, when working with unlabeled real-world data, all data would have to be benign during the training phase - but this typically cannot be guaranteed (e.g., the Cisco 2014 Annual Security Report stated that 100 percent of the analyzed business networks had traffic going to websites that host malware). Because of that, (unrecognized) malicious traffic can be learned as benign during the training phase and cannot be detected afterwards (false negatives / type II errors); but remember, most false alarms during operation are typically caused by previously unseen benign traffic (false positives / type I errors).
Unsupervised training algorithms, on the other hand, are able to build, e.g., clusters from the unlabeled data and, in the best case, can thereby differentiate between benign and malicious traffic on their own; leader clustering is an example of such an unsupervised algorithm (a minimal sketch follows).
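A minimal sketch of leader clustering on made-up 2-D feature vectors (the distance threshold is an illustrative assumption):

```python
import numpy as np

def leader_cluster(points, threshold=1.0):
    """Leader clustering: the first point becomes a leader; each new point
    joins the nearest leader within the threshold, otherwise it founds a new
    cluster. Tiny clusters can then be treated as suspicious."""
    leaders, members = [], []
    for p in points:
        dists = [np.linalg.norm(p - l) for l in leaders]
        if dists and min(dists) <= threshold:
            members[int(np.argmin(dists))].append(p)
        else:
            leaders.append(p)
            members.append([p])
    return leaders, members

points = np.array([[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [8.0, 8.0]])
leaders, members = leader_cluster(points)
print(len(leaders), "clusters; sizes:", [len(m) for m in members])
# 2 clusters; sizes: [3, 1]  -> the singleton may be flagged as anomalous
```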
To summarize: on the one hand, ensuring that only normal or completely labeled data is present during the training phase is a big challenge in real-world networks; on the other hand, previously unseen benign traffic can later result in high false-positive rates.
I think that in a real network it is difficult to ensure that normal data is labeled efficiently. So we can modify this concept by building a profile from normal or abnormal traffic without labels; detection in this case will depend on unsupervised learning, such as clustering, or any classifier that does not use labels for the training set.
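One way to sketch such a label-free detector is with a one-class classifier, e.g. scikit-learn's OneClassSVM; the features and the nu parameter below are illustrative assumptions:

```python
from sklearn.svm import OneClassSVM

# One-class classifier trained without labels: it fits a boundary around the
# bulk of the (mostly normal) training data and marks outliers as -1.
X_train = [[100, 0.1], [102, 0.2], [98, 0.15], [101, 0.12], [99, 0.18]]
model = OneClassSVM(nu=0.1, kernel="rbf", gamma="scale").fit(X_train)

# The first point resembles the training data; the second should be flagged.
print(model.predict([[100, 0.14], [300, 0.9]]))  # e.g. [ 1 -1]
```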
Most approaches available today are knowledge-based, i.e. using signatures or learned traffic models, which restricts the detector to known anomalies. To overcome this limitation, the anomaly detection process should be devised such that it can detect intrusions from fresh data by learning from it, rather than building a fixed model to detect intrusions in future traffic.
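As one possible sketch of such an adaptive scheme (not the paper's method), a detector can keep learning from the fresh stream itself via an online mean/variance update; the warm-up length and 4-sigma cut-off are illustrative assumptions:

```python
class StreamingDetector:
    """Maintains a running mean/variance (Welford's online update), so the
    model of 'normal' keeps evolving instead of being fixed at training time."""

    def __init__(self, k=4.0):
        self.n, self.mean, self.m2, self.k = 0, 0.0, 0.0, k

    def update(self, x):
        anomalous = False
        if self.n >= 10:  # wait for a minimal history before judging
            std = (self.m2 / self.n) ** 0.5
            anomalous = abs(x - self.mean) > self.k * std
        # Welford's update: the model keeps learning from every new value
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

det = StreamingDetector()
stream = [100, 101, 99, 102, 98, 100, 101, 99, 100, 102, 101, 400]
print([det.update(x) for x in stream])  # only the final spike is flagged
```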
Have a look at this paper:
Asif Iqbal Hajamydeen, Nur Izura Udzir, Ramlan Mahmod, Abdul Azim Abdul Ghani, "An unsupervised heterogeneous log-based framework for anomaly detection", Turkish Journal of Electrical Engineering & Computer Sciences, 24 (2016): 1117-1134.