I am a bit confused about the "number of clusters" and the "number of seeds" in the k-means clustering algorithm. Could you provide an example to clarify the difference? What is the effect of changing either one?
Deciding the best number of clusters is a different problem from deciding how to set the values of the seeds.
The first problem is how to choose the "value of k" in k-means (k = number of clusters). Each additional cluster improves the quality of the clustering, but at a decreasing rate, and having too many clusters may be useless for decision makers, data comprehension, data explanation, etc.
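For example, a common rough check is the "elbow" heuristic: run k-means for several values of k and watch how the within-cluster error keeps dropping but flattens out. The sketch below only illustrates that idea with scikit-learn on synthetic data (not Weka); the dataset and the range of k are placeholders, not a recommendation.

```python
# Sketch: within-cluster error shrinks with diminishing returns as k grows.
# scikit-learn on toy data, purely for illustration (not Weka).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)  # toy data with 4 true clusters

for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ = sum of squared distances to the nearest centroid;
    # it always decreases as k grows, but the drop flattens ("elbow") near the true k.
    print(k, round(km.inertia_, 1))
```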
The number of initial seeds (initial cluster centers) is the same as the number of clusters (at least in the original k-means). The problem of choosing the VALUES of the seeds is different from the problem of choosing the number of clusters. Normally you would use random cluster centers, but some research points to better ways of choosing them; with better seeds, k-means converges faster and the quality of the clusters is better.
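One well-known example of such "better" seeding is k-means++. As a rough illustration (again scikit-learn on toy data, not Weka, with arbitrary parameters), the sketch below contrasts purely random initial centers with k-means++ initialization:

```python
# Sketch: same k, different ways of choosing the initial seeds.
# 'random' picks k points uniformly at random; 'k-means++' spreads the
# initial centers out, which usually means fewer iterations and lower error.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=5, random_state=0)

for init in ("random", "k-means++"):
    km = KMeans(n_clusters=5, init=init, n_init=1, random_state=0).fit(X)
    print(init, "iterations:", km.n_iter_, "inertia:", round(km.inertia_, 1))
```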
I remember that there are variations of k-means mixed with hierarchical methods. In those, you use more than k seeds and later collapse (unify) some clusters, as in hierarchical clustering, until the number of clusters is reduced to k. In that kind of method, the final number of clusters is not equal to the initial number of seeds.
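I don't remember the exact variant, but the general idea of starting with more seeds than final clusters and then merging can be sketched roughly like this (scikit-learn again, purely illustrative; the numbers of seeds and clusters are arbitrary):

```python
# Rough sketch of the "more seeds than clusters" idea:
# run k-means with many seeds, then merge the resulting centroids
# hierarchically until only the desired number of clusters remains.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

over = KMeans(n_clusters=12, n_init=10, random_state=0).fit(X)       # 12 seeds
merge = AgglomerativeClustering(n_clusters=3).fit(over.cluster_centers_)

# map each point's fine-grained centroid to a merged (final) cluster
final_labels = merge.labels_[over.labels_]
print(np.bincount(final_labels))  # sizes of the 3 final clusters
```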
@Alexey, yes, I am using Weka on a traffic dataset to extract some interesting results and identify future trends. Thanks for your time.
The seed number (any integer) controls the randomization of your initial k points, while k represents the number of clusters. Because k-means is sensitive to the initial points, you will have to experiment with different seeds to check the stability of your clusters. K itself is user-defined and can be guided by domain knowledge and other practical factors.
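As a quick illustration of that kind of experiment (done here with scikit-learn rather than Weka, and with arbitrary seed values), you can refit with several seeds and measure how well the resulting partitions agree:

```python
# Sketch: re-run k-means with different seeds and measure how stable
# the resulting partitions are (adjusted Rand index of 1.0 = identical labelings).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

seeds = (1, 7, 42, 100)
labels = [KMeans(n_clusters=4, n_init=1, random_state=s).fit_predict(X)
          for s in seeds]

reference = labels[0]
for s, lab in zip(seeds, labels):
    print("seed", s, "agreement with first run:",
          round(adjusted_rand_score(reference, lab), 3))
```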
What does the seed mean in Weka? For example, if I configure seed = 100 and use the split-dataset test option in Weka, what will happen to my dataset?