What do you mean by binary-valued data? How many features (coefficients) describe these objects? Is it a set of attributes, each of which can take only two values (e.g. true or false)?
You can try:
1) Hierarchical clustering
2) A two-step clustering method (for example: Zhang, T., Ramakrishnan, R., & Livny, M. (1996, June).
BIRCH: An efficient data clustering method for very large databases.
In ACM SIGMOD Record (Vol. 25, No. 2, pp. 103-114). ACM.)
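If it helps, here is a minimal sketch of both suggestions in Python with scikit-learn, assuming a small hypothetical 0/1 feature matrix X (illustrative only); note that both methods work on Euclidean geometry by default, which is relevant to the replies below.

import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch

# Hypothetical toy data: 6 objects described by 4 true/false attributes.
X = np.array([[1, 0, 1, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 1, 1]])

# 1) Hierarchical (agglomerative) clustering, Ward linkage on Euclidean distance by default.
hier_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)

# 2) BIRCH (Zhang, Ramakrishnan & Livny, 1996), also Euclidean-based by default.
birch_labels = Birch(n_clusters=2).fit_predict(X)

print("hierarchical:", hier_labels)
print("BIRCH:       ", birch_labels)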
Thanks for the suggestion. Yes, my dataset's attributes take values of true or false only.
I am searching for an algorithm that can produce clusters with an equal number of elements.
I have used hierarchical clustering (both agglomerative and DIANA) with different dissimilarity matrices, but the clusters I am getting are very uneven in size (not balanced); that is my major problem.
When binary values are provided instead of numerical ones, Euclidean distance gives a poor measure of similarity. You could use the Jaccard distance or the cosine distance to measure similarity instead.
You could then use any of the clustering algorithms suggested by Dr. Jopek: the distance measure itself is only a parameter of these methods, so it does not have to be Euclidean (even though that is usually the default).
I would suggest hierarchical clustering with Jaccard distance as the dissimilarity measure; a minimal sketch is given below. The choice between single-link and complete-link is up to you and the nature of the data set.
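For example, here is a minimal sketch with SciPy, assuming a hypothetical 0/1 matrix X of objects-by-attributes: pdist with the 'jaccard' metric builds the dissimilarity matrix and linkage then clusters on it.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Hypothetical binary data: rows are objects, columns are true/false attributes.
X = np.array([[1, 0, 1, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 1, 0, 1]])

# Condensed matrix of pairwise Jaccard distances between the binary rows.
D = pdist(X, metric='jaccard')

# Complete-link hierarchical clustering on the precomputed distances
# (swap in method='single' for single-link, depending on your data).
Z = linkage(D, method='complete')

# Cut the dendrogram into 2 flat clusters.
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)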
Yes, I agree with Lukasz Jopek: Hamming distance and the simple matching coefficient are not a good idea here; you can use the Jaccard index for clustering binary data.
Yes, that's true, and I agree with all of you. But my major problem remains that I want my clusters to be balanced, and with the algorithms discussed above I am getting unbalanced clusters.
1) Using cosine similarity or the Jaccard coefficient is preferable in this case, the prime reason being the inability of measures such as Euclidean distance to capture dissimilarity between binary vectors.
2) Clustering is used to understand distribution properties inherently present in the data. If you are getting results of the same quality after using several different measures, the reason may be that they are all capturing the same underlying distribution. In that case, I would suggest the following remedies:
a) One of the important principles in data mining is to let the data present its own properties and only then draw conclusions about it. Clustering, as a basic data-understanding mechanism, helps reveal those properties. If every clustering method you tried gives the same result, then your data simply has that structure, and you should not force a different structure just to reach a desired conclusion.
b) However, if your ground-truth knowledge tells you that the clusters do not resemble your understanding of the data, I would suggest multi-clustering (http://dme.rwth-aachen.de/sites/default/files/public_files/dmcs-icml2013.pdf) to see the multiple facets of the data that can be observed, or subspace clustering (http://cis.jhu.edu/~rvidal/publications/SPM-Tutorial-Final.pdf) to see the impact of different dimensions on the clustering (in case some dimensions are disrupting the ground-truth structure of your clusters).
3) Try spectral clustering, which generates two clusters, or CLUTO, whose fraction parameter can be used to tune the cluster sizes; a sketch of the spectral approach is given below.
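As a rough illustration of the spectral option, here is a minimal sketch using scikit-learn's SpectralClustering on a precomputed Jaccard similarity matrix (hypothetical toy data X; this only stands in for a proper CLUTO run, whose fraction parameter has no direct equivalent here).

import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import SpectralClustering

# Hypothetical binary data: rows are objects, columns are true/false attributes.
X = np.array([[1, 0, 1, 0],
              [1, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 1, 1]])

# Affinity matrix = Jaccard similarity (1 - Jaccard distance) between the rows.
S = 1.0 - squareform(pdist(X, metric='jaccard'))

# Bipartition the data into two clusters from the precomputed affinities.
labels = SpectralClustering(n_clusters=2, affinity='precomputed',
                            random_state=0).fit_predict(S)
print(labels)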