Bias in Security Datasets?

Suraj Kapoor

To detect and mitigate bias in cybersecurity datasets (for example, for intrusion detection or phishing classification), systematically check for data imbalances using statistical tools and domain-discrepancy methods such as distribution comparison or domain discrimination. Apply mitigation techniques such as oversampling minority classes, data reweighting, and fairness-aware learning, and keep updating the datasets to reflect emerging threats and reduce sampling or feature bias. Diverse, representative, and regularly audited data, paired with transparent, explainable models, helps reduce false positives and improves real-world robustness.
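As a rough illustration of the "distribution comparison" idea, here is a minimal sketch that computes the two-sample Kolmogorov-Smirnov statistic for a single feature across two captures. The function name `ks_statistic` and the packet-size values are made up for illustration; in practice you would probably reach for a statistics library and also look at a p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples (0.0 = identical distributions, 1.0 = fully separated).
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    max_gap = 0.0
    # The gap can only change at observed values, so check each one.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical packet sizes (bytes): a lab capture vs. a production capture.
lab_packet_sizes = [60, 60, 64, 64, 64, 128, 128, 1500]
prod_packet_sizes = [60, 512, 512, 1024, 1500, 1500, 1500, 1500]

print(ks_statistic(lab_packet_sizes, prod_packet_sizes))  # 0.75
```

A large statistic on a feature suggests the two sources differ on that feature, which is one hint that a model trained on one capture may not transfer to the other.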

Patrik Goldschmidt

Hello, of course, it depends on the type of data, the task, and what you mean by "bias". General recommendations for cybersecurity research were discussed in "Dos and Don'ts of Machine Learning in Computer Security" by D. Arp et al. at USENIX Security 2022. It is really a must-read paper for anyone working in cybersecurity with ML.

If you are more interested in intrusion detection, particularly network intrusion detection, I'd gladly invite you to take a look at our paper "Network Intrusion Datasets: A Survey, Limitations, and Recommendations", which we recently published in Computers & Security, with a preprint available through my profile.

Cheers.

Lynn Obadha

Practitioners perform exploratory data analysis (EDA) to examine feature distributions, class imbalances, and source diversity.

To detect systematic bias, use statistical parity checks, confusion matrix analysis across subgroups, and outlier detection.
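A minimal sketch of confusion-matrix analysis across subgroups might look like this. It compares the false positive rate per subgroup, assuming label 1 means "malicious" and 0 means "benign"; the subgroup names ("office", "iot") and all values are hypothetical.

```python
from collections import defaultdict


def false_positive_rate_by_group(y_true, y_pred, groups):
    """Return {group: FPR}, where FPR = FP / (FP + TN).

    Only benign samples (true label 0) contribute; groups without any
    benign samples are left out of the result.
    """
    fp = defaultdict(int)
    tn = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 0:
            if pred == 1:
                fp[group] += 1  # benign flagged as malicious
            else:
                tn[group] += 1  # benign correctly passed
    return {g: fp[g] / (fp[g] + tn[g]) for g in set(fp) | set(tn)}


# Hypothetical labels and predictions for two network segments.
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1]
groups = ["office"] * 5 + ["iot"] * 5

rates = false_positive_rate_by_group(y_true, y_pred, groups)
print(sorted(rates.items()))  # [('iot', 0.5), ('office', 0.25)]
```

A clear gap between subgroups, as in this toy data, would be a reason to look at how each subgroup is represented in the training set.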

Mitigation measures:

- Data augmentation to balance classes.

- Re-sampling techniques: oversampling minority classes and undersampling dominant classes.

- Fairness-aware algorithms that optimize for both accuracy and equity.
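As a sketch of the re-sampling option, the following does plain random oversampling: it duplicates samples from the smaller classes until every class matches the largest one. The function name `random_oversample` and the example records are hypothetical, and this should only be applied to the training split, otherwise duplicated samples can leak into the test set.

```python
import random
from collections import Counter


def random_oversample(records, labels, seed=0):
    """Duplicate minority-class records (with replacement) until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_records = list(records)
    out_labels = list(labels)
    for label, count in counts.items():
        pool = [r for r, l in zip(records, labels) if l == label]
        extra = rng.choices(pool, k=target - count)  # empty if already at target
        out_records.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_records, out_labels


# Hypothetical training split: six benign flows and two attack flows.
train_records = ["b1", "b2", "b3", "b4", "b5", "b6", "a1", "a2"]
train_labels = ["benign"] * 6 + ["attack"] * 2

balanced_records, balanced_labels = random_oversample(train_records, train_labels)
print(Counter(balanced_labels))
```

Oversampling only repeats what is already in the data, so it does not help if the minority class itself is unrepresentative of real attacks.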

To reduce bias and ensure robust, fair, and generalized threat detection systems:

- Incorporate diverse data sources.

- Continuously validate models against real-world traffic.

- Establish human-in-the-loop review processes.