Clustering algorithms can indeed struggle with high-dimensional datasets. However, there are several ways this can be addressed.
Have you tried feature selection, feature extraction, or dimensionality reduction techniques? Principal component analysis (PCA) or t-distributed Stochastic Neighbor Embedding (t-SNE) can be a good start.
High-dimensional data are complex in nature, so it is difficult to form clusters. Dimensionality reduction techniques such as PCA can project the data into a lower-dimensional space and remove unwanted features from the high-dimensional data. Correlation analysis can be used to check whether two features are strongly correlated and therefore redundant.
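As a minimal sketch of the two ideas above, assuming scikit-learn and NumPy are available and using a synthetic dataset (the sizes and the 90% variance threshold are illustrative, not prescriptive):

```python
# Sketch: check feature correlations to spot redundancy, then project the
# data into a lower-dimensional space with PCA before clustering.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                    # 200 samples, 50 features
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=200)   # make feature 1 redundant

# The correlation matrix reveals redundant (highly correlated) feature pairs
corr = np.corrcoef(X, rowvar=False)
print(corr[0, 1])                                 # close to 1.0: redundant pair

# Keep enough principal components to retain 90% of the variance
pca = PCA(n_components=0.90)
X_low = pca.fit_transform(X)
print(X.shape, "->", X_low.shape)                 # fewer columns after PCA
```

The reduced matrix `X_low` can then be fed to any standard clustering algorithm in place of the raw features.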
How clustering struggles with high-dimensional datasets
Clustering algorithms struggle with high-dimensional datasets due to the "curse of dimensionality." As the number of dimensions increases, the data becomes increasingly sparse, and the distance between points becomes less meaningful.
In high-dimensional spaces, the volume of the space increases exponentially with the number of dimensions, making it difficult to identify meaningful patterns or clusters. This means that the clustering algorithm may not be able to accurately identify clusters, or it may identify spurious clusters due to the noise or random fluctuations in the data.
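This loss of meaningful distances can be demonstrated numerically. The sketch below (NumPy only, synthetic uniform data, sample sizes chosen for illustration) computes the "relative contrast" between the farthest and nearest point from a query: as the dimension grows, the contrast shrinks, so distance-based clustering loses its ability to discriminate.

```python
# Sketch of distance concentration: in high dimensions, all points end up
# roughly equidistant from a query point, so nearest vs. farthest neighbour
# becomes nearly meaningless for distance-based clustering.
import numpy as np

rng = np.random.default_rng(42)
for d in (2, 10, 100, 1000):
    X = rng.random((500, d))                 # 500 uniform points in d dims
    q = rng.random(d)                        # a random query point
    dists = np.linalg.norm(X - q, axis=1)
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  relative contrast = {contrast:.3f}")
```

The printed contrast drops sharply as `d` increases, which is exactly why raw Euclidean distances become unreliable for clustering in high dimensions.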
In addition, high-dimensional data often suffer from the problem of feature redundancy, where some of the features may be highly correlated or redundant, and provide little or no additional information. This can lead to bias in the clustering results and a loss of interpretability.
To overcome these challenges, various techniques have been developed, including feature selection and dimensionality reduction techniques like principal component analysis (PCA), t-SNE, or autoencoders. These techniques can help reduce the dimensionality of the data and remove irrelevant or redundant features, thereby improving the accuracy and interpretability of clustering algorithms.
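A small end-to-end sketch of this reduce-then-cluster workflow, assuming scikit-learn and using synthetic blob data (the number of components, clusters, and samples are illustrative assumptions):

```python
# Sketch: PCA to a handful of components, then k-means on the reduced data,
# scored against the known generating labels with the adjusted Rand index.
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# 3 well-separated clusters hidden in 100 noisy dimensions
X, y = make_blobs(n_samples=300, n_features=100, centers=3, random_state=0)

X_low = PCA(n_components=10).fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_low)

# 1.0 means the recovered clusters match the ground truth exactly
print(adjusted_rand_score(y, labels))
```

On real data the ground-truth labels are of course unavailable, so internal measures such as the silhouette score would be used instead.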
Clustering algorithms may face challenges on high-dimensional datasets because of noise, unreliable distance metrics, and difficulties in visualizing and interpreting the results. To overcome this, dimensionality reduction techniques, such as feature selection or feature extraction, can be applied to reduce the number of features and improve the performance of clustering algorithms. Additionally, carefully selecting clustering algorithms that are robust to high dimensionality can mitigate these struggles.
Dear Fatemeh, thanks. The big question is how clustering struggles, and yes, PCA can reduce those problems. Why does clustering struggle? I think feature redundancy and random fluctuations are the major causes, and PCA helps overcome the sparsity that high dimensionality creates.
Clustering often struggles for many possible reasons, and investigating how those problems arise is itself a research question. Rather than focusing only on remedies, it would be better to ask how the struggle begins from the moment we start clustering.