Can anyone recommend a tool for clustering big data that is fully open and free to use? Where can I download it, or how can I use it online? I need it to handle at least 400 variables (preferably up to 2000) and at least 8000 observations.
I have been using the Weka machine learning software for many years. As the authors state on their website, https://www.cs.waikato.ac.nz/ml/weka/, “it contains tools for clustering”.
I recommend their book, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, by Ian H. Witten and Eibe Frank (Morgan Kaufmann). I also recommend The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Trevor Hastie, Robert Tibshirani and Jerome Friedman, Second Edition (Springer Series in Statistics).
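There are a bunch of clustering approaches that you can use out of the box from scikit-learn (sklearn, a free Python library), for example KMeans, MiniBatchKMeans, AgglomerativeClustering and DBSCAN. KMeans is a good one to start with. t-SNE (sklearn.manifold.TSNE) is often mentioned alongside these, but it is a dimensionality-reduction method for visualization rather than a clustering algorithm. Use it to inspect the clusters you find, and make sure you tune its hyperparameters (e.g. perplexity).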
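Here is a minimal sketch of what this could look like in sklearn at the size mentioned in the question. It makes a few assumptions. The data are synthetic (make_blobs, 8000 observations x 400 variables). The number of clusters (5) and the PCA step down to 50 components are illustrative choices, not tuned values. The PCA step is my addition, a common way to make clustering cheaper and more stable with many variables.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for real data (assumption): 8000 observations x 400 variables
# drawn around 5 hypothetical cluster centres.
X, y_true = make_blobs(n_samples=8000, n_features=400, centers=5, random_state=0)

# Standardise the variables, project onto 50 principal components, then run k-means.
# n_components=50 and n_clusters=5 are illustrative assumptions; tune them for real data.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=50, random_state=0),
    KMeans(n_clusters=5, n_init=10, random_state=0),
)
labels = pipeline.fit_predict(X)

# Evaluate in the reduced space. The silhouette score is computed on a random
# subsample of 2000 points to avoid building a full 8000 x 8000 distance matrix.
X_reduced = pipeline[:-1].transform(X)
score = silhouette_score(X_reduced, labels, sample_size=2000, random_state=0)

# Only possible here because the synthetic data come with true labels.
ari = adjusted_rand_score(y_true, labels)
print(f"silhouette={score:.3f}  adjusted Rand index={ari:.3f}")
```

For the upper end of the range (around 2000 variables), swapping KMeans for MiniBatchKMeans from the same module is a common way to cut memory and run time. Real data rarely come with true labels, so in practice you would compare silhouette scores across several values of n_clusters instead of computing the adjusted Rand index.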