Principal component analysis (PCA), independent component analysis (ICA), minimum noise fraction (MNF), locally linear embedding (LLE), and linear discriminant analysis (LDA) are a few examples of dimensionality reduction techniques. Try these techniques depending on the type of your data.
There are actually many different dimensionality reduction techniques. The most common is PCA, which is a linear method. You can also try other methods: Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Independent Component Analysis (ICA), and Kernel Principal Component Analysis (KPCA). You can also try non-linear methods such as t-distributed Stochastic Neighbor Embedding (t-SNE) or Isomap.
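As one illustration, here is a minimal sketch of Kernel PCA, one of the methods listed above, using scikit-learn. The kernel choice, the `gamma` value, and the random placeholder data are purely illustrative, not a recommendation:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# Placeholder data; replace with your own feature matrix of shape (n_samples, n_features).
X = np.random.default_rng(0).normal(size=(200, 10))

# Non-linear projection onto 2 components via an RBF kernel.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.1)
X_reduced = kpca.fit_transform(X)   # shape (200, 2)
```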
There are many dimensionality reduction algorithms available, but no single algorithm gives a perfect solution for every problem (a point I took from a data mining book). Before choosing one, analyze your data manually: identify the key data and try to eliminate the noisy, uninformative parts.
When we can do this ourselves, the result is usually the best dataset, since no one understands our data better than we do.
Your approach to dimensionality reduction emphasizes the importance of domain knowledge and manual data analysis, which is indeed a crucial aspect of effective data management and preprocessing.
While automated dimensionality reduction algorithms are powerful and necessary tools, especially for handling large and complex datasets, they are most effective when used in conjunction with thorough manual data analysis and a solid understanding of the domain. This hybrid approach allows for the most informed and contextually relevant decisions in data preprocessing and analysis.
@ Zzz Ch
Reducing the dimensions of high-dimensional data is a common task in data analysis, machine learning, and statistics. This process, known as dimensionality reduction, can be crucial for visualizing data, speeding up computation, and improving the performance of machine learning algorithms. Here are several popular techniques for dimensionality reduction:
1. Principal Component Analysis (PCA):
- PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
- The first principal component has the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding component has the highest variance possible under the constraint that it is orthogonal to the preceding components.
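A minimal scikit-learn sketch of the idea; the random matrix `X` is only a placeholder for your own data:

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder data: 100 samples with 20 (possibly correlated) features.
X = np.random.default_rng(0).normal(size=(100, 20))

# Keep the two orthogonal directions of largest variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)         # shape (100, 2)

# Fraction of the total variance captured by each retained component.
print(pca.explained_variance_ratio_)
```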
2. t-Distributed Stochastic Neighbor Embedding (t-SNE):
- t-SNE is a non-linear technique particularly well suited for the visualization of high-dimensional datasets.
- It converts similarities between data points to joint probabilities and tries to minimize the Kullback–Leibler divergence between the joint probabilities of the low-dimensional embedding and the high-dimensional data.
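A minimal sketch with scikit-learn's `TSNE`; the perplexity value and the placeholder data are illustrative, and t-SNE is typically used only to produce a 2-D or 3-D embedding for plotting:

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder high-dimensional data.
X = np.random.default_rng(0).normal(size=(200, 50))

# Non-linear 2-D embedding that tries to preserve local neighborhoods.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
X_embedded = tsne.fit_transform(X)       # shape (200, 2), suitable for a scatter plot
```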
3. Linear Discriminant Analysis (LDA):
- LDA is used as a dimensionality reduction technique in the pre-processing step for pattern classification and machine learning applications.
- The goal is to project a dataset onto a lower-dimensional space with good class-separability in order to avoid overfitting (“curse of dimensionality”) and also reduce computational costs.
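A minimal scikit-learn sketch; note that LDA is supervised, so it needs class labels `y`, and it can produce at most (number of classes − 1) output dimensions. The data and labels below are placeholders:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 10))           # placeholder features
y = rng.integers(0, 3, size=150)         # three illustrative class labels

# With 3 classes, at most 2 discriminant directions are available.
lda = LinearDiscriminantAnalysis(n_components=2)
X_projected = lda.fit_transform(X, y)    # supervised projection, shape (150, 2)
```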
4. Autoencoders (in Deep Learning):
- An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data.
- The network is trained to use a small layer in the middle to represent the input, which forces it to identify the most important features in the input data and learn a compressed representation.
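A minimal sketch of an undercomplete autoencoder in Keras; the layer sizes, epoch count, and placeholder data are illustrative, not a recommended architecture:

```python
import numpy as np
from tensorflow import keras

# Placeholder data scaled to [0, 1]; replace with your own feature matrix.
X = np.random.default_rng(0).uniform(size=(1000, 64)).astype("float32")

inputs = keras.Input(shape=(64,))
h = keras.layers.Dense(32, activation="relu")(inputs)
codes = keras.layers.Dense(8, activation="relu")(h)          # small bottleneck layer
h = keras.layers.Dense(32, activation="relu")(codes)
outputs = keras.layers.Dense(64, activation="sigmoid")(h)

autoencoder = keras.Model(inputs, outputs)   # full reconstruction network
encoder = keras.Model(inputs, codes)         # encoder half: the learned reduction

autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=10, batch_size=32, verbose=0)   # target is the input itself

X_reduced = encoder.predict(X)               # compressed representation, shape (1000, 8)
```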
5. Uniform Manifold Approximation and Projection (UMAP):
- UMAP is a relatively new technique that is particularly good at preserving both local and global structure in the data, unlike t-SNE which mainly preserves local structure.
- It is often used for visualization purposes but can also be used for general non-linear dimension reduction.
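A minimal sketch using the `umap-learn` package (installed with `pip install umap-learn`); the parameter values shown are just common illustrative defaults:

```python
import numpy as np
import umap

# Placeholder high-dimensional data.
X = np.random.default_rng(0).normal(size=(500, 30))

# n_neighbors controls the local/global trade-off; min_dist controls cluster tightness.
reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1, random_state=42)
X_embedded = reducer.fit_transform(X)    # shape (500, 2)
```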
6. Feature Selection Techniques:
- Instead of creating new combinations of features, feature selection techniques select a subset of the original features.
- Methods include filter methods (based on statistical tests), wrapper methods (use a predictive model to evaluate a combination of features), and embedded methods (which perform feature selection as part of the model construction process).
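A minimal sketch of a filter-style method with scikit-learn's `SelectKBest`, which keeps the k original features with the highest univariate test scores; the Iris dataset is used only as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Score each feature against the labels with an ANOVA F-test and keep the top 2.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)      # subset of the original columns

print(selector.get_support(indices=True))      # indices of the selected features
```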
The choice of method depends on the specific needs of your task, such as the size and nature of your data, whether you prioritize preserving local or global structures, and whether you need a linear or non-linear reduction.