Can anyone help me select a good clustering algorithm that deals with clusters of different shapes and sizes?

Filippo Biscarini

It's not easy to give a good answer without knowing the details of the problem at hand. Clustering refers to a broad set of statistical techniques that try to identify homogeneous subgroups among the available data/observations.

Two popular clustering methods are K-means clustering, which seeks to partition the data into a pre-specified number k of subgroups by minimizing the within-group variation, and hierarchical clustering, which does not require the number of clusters to be specified in advance and builds subgroups by iteratively grouping similar data points.

These may be starting points for your work (there are, however, several other clustering methods).
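To make the two starting points concrete, here is a minimal sketch in plain Python (standard library only). The function names `kmeans` and `single_linkage`, the toy points, the choice of the first k points as initial centroids, and the `max_gap` stopping threshold are all assumptions made for illustration; a real analysis would more likely rely on an established library.

```python
import math

def kmeans(points, k, iters=50):
    """Lloyd's algorithm; the first k points seed the centroids (an arbitrary choice)."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

def single_linkage(points, max_gap):
    """Bottom-up clustering: merge the two closest clusters until their gap exceeds max_gap."""
    clusters = [[i] for i in range(len(points))]

    def gap(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: gap(clusters[ij[0]], clusters[ij[1]]))
        if gap(clusters[i], clusters[j]) > max_gap:
            break
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels

# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
km_labels, km_centroids = kmeans(points, k=2)
hc_labels = single_linkage(points, max_gap=2.0)
print(km_labels, hc_labels)
```

Note that `kmeans` needs k up front, while `single_linkage` only needs a distance threshold, which mirrors the difference described above.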

Mehdi Ebady Manna

Dear Filippo Biscarini;

Thank you for your answer. Actually, I'm looking for a good hierarchical algorithm that separates normal traffic from attack traffic. A point-assignment algorithm such as k-means doesn't deal with clusters of different shapes and sizes. As you know, an attacker sends a flood of data at specific times, which leads to a large attack cluster that differs from a normal cluster at those times. This is what I am looking for. Please let me have your comments.

Well, I have never dealt with such a problem before, but once the clusters have been identified (say through bottom-up hierarchical clustering) you could do some post-processing of the results. I would start off with some simple descriptive statistics and plots: for instance, you could plot the number of data points within the clusters in order to have an idea of the size of the clusters. Or you could look at the distance between clusters. This might give you an idea of the characteristics of specific clusters (attack/no attack).
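As a rough sketch of that post-processing step, the snippet below counts the points in each cluster and measures the distance between cluster centroids. The function name `cluster_summary`, the two features, and the toy labels are hypothetical and invented purely for illustration; the labels would in practice come from whatever clustering method was run.

```python
import math
from collections import Counter
from itertools import combinations

def cluster_summary(points, labels):
    """Return cluster sizes, centroids, and pairwise centroid distances."""
    sizes = Counter(labels)
    centroids = {}
    for label in sizes:
        members = [p for p, lab in zip(points, labels) if lab == label]
        centroids[label] = tuple(sum(x) / len(members) for x in zip(*members))
    distances = {
        (a, b): math.dist(centroids[a], centroids[b])
        for a, b in combinations(sorted(centroids), 2)
    }
    return sizes, centroids, distances

# Hypothetical observations with two made-up features per point.
points = [(1.0, 1.0), (1.0, 3.0), (3.0, 1.0), (3.0, 3.0), (20.0, 2.0), (22.0, 2.0)]
labels = [0, 0, 0, 0, 1, 1]  # assumed output of an earlier clustering step

sizes, centroids, distances = cluster_summary(points, labels)
for label in sorted(sizes):
    print(f"cluster {label}: {sizes[label]} points, centroid {centroids[label]}")
for pair, d in distances.items():
    print(f"distance between clusters {pair}: {d:.2f}")
```

The sizes and distances could then be plotted to see whether some clusters stand out as unusually large or unusually far from the rest.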

Hope this helps!

Sourabh Bharti

You can try using Support Vector Machines

Dear Sourabh Bharti;

My problem concerns clustering algorithms, and SVM is a supervised algorithm. Specifically, I don't have the number of clusters or classes.

Dear Dania Abed Aljawad;

Thanks for your answer. I have a large data set of attack/normal packets. This means that there is a large distance (no similarity) between normal and attack data points, so I can't use spectral clustering, because I can't use the similarity matrix to reduce the dimensionality of the data. In general, the attacker sends flooding packets at some times and pauses at others. This means that the attack clusters have different sizes and shapes, if we assume that there are two clusters (one for attack and one for normal). I have read a lot of papers looking for a good algorithm that deals with different shapes and sizes, but I have not found one so far.

Please let me have your comments.

Thanks in advance,

Mehdi

I have sent you a message; please check it.

Best Regards

Anjana Kakoti Mahanta

I think DBSCAN will be a good algorithm in this case. It can identify clusters with different shapes and sizes.
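To illustrate why a density-based method copes with different shapes and sizes, here is a minimal sketch of the DBSCAN idea in plain Python. The function name `dbscan`, the toy points, and the values of `eps` and `min_pts` are assumptions chosen for this example; it is not tuned for real traffic data, where an established implementation would be the more sensible choice.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, with -1 marking noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # All points within eps of point i (including i itself).
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbours(j)
            if len(nearby) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nearby)
    return labels

# Toy data: a small compact blob, a long thin chain, and one isolated point.
blob = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (0.5, 0.5)]
chain = [(float(x), 0.0) for x in range(5, 13)]
outlier = [(30.0, 30.0)]
points = blob + chain + outlier

labels = dbscan(points, eps=1.1, min_pts=3)
print(labels)
```

The blob and the chain differ in both size and shape, yet each is recovered as one cluster because points are linked through dense neighbourhoods rather than assigned to a centre; the isolated point is reported as noise.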

Shalender Singh

For this you will need to represent your data in two dimensions: 1. the output of a shape classifier; 2. the overall size, either as the perimeter or as the area of the contour.

Overall size is easy to calculate, but shape classification is a non-trivial job. One way to do it in one dimension is to take the circle as the '0' shape and then find the distance of each contour from the circular shape with this algorithm: http://docs.opencv.org/trunk/modules/shape/doc/shape.html .

For multi-dimensional shape classification, you can build up a starting bin of pre-assigned basic shapes and represent each shape by its distance from each basic shape. So if the number of basic shapes is N, you have the vector (s1, ..., sN). Add the size to it, which makes the vector (Area, s1, ..., sN). Then you should apply recursive K-means clustering. If you need more sophistication, the new shape centers of the clusters can then be used to rebuild the basic shape data.
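A minimal sketch of the one-dimensional case, where the only basic shape is the circle, might look like this. It does not use the OpenCV routine linked above; instead it swaps in the simple circularity measure 4πA/P² (equal to 1 for a circle) and takes one minus that value as the distance from the circular shape. The function names and the two example contours are hypothetical.

```python
import math

def polygon_area(contour):
    """Shoelace formula for a closed polygon given as (x, y) vertices."""
    n = len(contour)
    twice_area = sum(
        contour[i][0] * contour[(i + 1) % n][1] - contour[(i + 1) % n][0] * contour[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0

def polygon_perimeter(contour):
    """Sum of edge lengths, closing the polygon back to its first vertex."""
    n = len(contour)
    return sum(math.dist(contour[i], contour[(i + 1) % n]) for i in range(n))

def circle_deviation(contour):
    """0 for a perfect circle; grows towards 1 for elongated or ragged shapes."""
    area = polygon_area(contour)
    perimeter = polygon_perimeter(contour)
    return 1.0 - 4.0 * math.pi * area / perimeter ** 2

def feature_vector(contour):
    """The (Area, s1) vector, with s1 the distance from the circular shape."""
    return (polygon_area(contour), circle_deviation(contour))

# Two hypothetical contours with the same area but different shapes.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
strip = [(0, 0), (8, 0), (8, 0.5), (0, 0.5)]

features = [feature_vector(c) for c in (square, strip)]
print(features)
```

The two contours have identical area, so only the shape component separates them; the resulting vectors are what a K-means step would then be run on.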

Malay K. Pakhira

There is no truly satisfying clustering algorithm for every sort of criterion. However, I found one text related to variable shapes for you. You may want to check the following:

K-Means for Spherical Clusters with Large Variance in Sizes by

A. M. Fahim, G. Saake, A. M. Salem, F. A. Torkey, and M. A. Ramadan

Brian C. Franczak

I agree with Malay that there is no single method that will always give good performance. However, if you have some facility with R, I suggest the functions:

gpcm(...) - package(mixture)

pgmmEM(...) - package(pgmm)

Mclust(...) - package(mclust) (this contains a subset of the models available in mixture)

Given the dimensionality problems, pgmmEM(...) will probably work best, as it is designed to handle high-dimensional data (see McNicholas and Murphy (2008), Parsimonious Gaussian Mixture Models, and the subsequent McNicholas and Murphy (2010), Clustering Gene Expression Microarray Data).

Thanks to all for your replies and comments.