Dimensionality reduction - autoencoders

More Aleksandra Szymura's questions

HPLC baseline problem. Baseline problem?

Hello, I noticed a problem with my HPLC analysis and wanted to ask for help. During my analysis I've noticed repetitive problems with the baseline: it is dropping and then (usually) going back...

20 May 2024 6,501 3

What are the adaptations of microorganisms to living at high sugar concentrations (osmophilic microorganisms)?

Hello, I am looking for information about osmophilic microorganisms and specifically their enzymes. If there are any industrially important enzymes from microorganisms living in environments where...

19 May 2024 136 1

How to do root sampling for transcriptome analysis?

I am preparing an experiment in which I would like to sample roots of sunflowers grown in pots (substrate is soil) in later stages of the plant development. Can anyone tell me, what is the best...

29 April 2024 8,775 1

What's the specificity/tolerance of TaqMan probes for differences in sequence?

Hi, does my TaqMan probe recognize the gene only if the sequence has 100% coverage with the probe, or will I also get a signal if there is a SNP or even a few bp difference? And if a few bp are ok...

09 April 2024 6,856 3

Why do ions affect enzyme activity differently in different buffers?

I have tested enzyme activity (β-galactosidase) with cations in different buffers and the results differ dramatically. For example, when I used MES buffer, calcium acted as an activator, but when...

11 February 2024 3,135 2

Cell lines from Accegen - quality?

Hello, does anyone have any experience with ordering cell lines from Accegen? I'm wondering whether they are trustworthy and about the quality of the cell lines they provide.

26 November 2023 3,757 1

Do you know of any tourism journals that have a mentoring programme for beginning reviewers?

I think that mentoring programmes help reviewers meet publishers' expectations.

05 October 2023 2,584 1

How do you get information about the number of chromosomes or DNA content?

I need to get information about the number of chromosomes or DNA content. I have seedlings from in vitro that were frozen at -20 degrees after liquidation of the culture. They are seedlings from...

28 September 2023 6,373 2

Detection of phosphorylated transmembrane proteins with WB?

I need to detect phosphorylated IGF1R in cell lysate and brain lysate with Western Blot. Are there any life hacks for detection of such proteins?

05 September 2023 7,310 0

Would you like to attend our Summer School on Energy and Society?

https://webmagazine.unitn.it/en/evento/sociologia/115480/esa-rn12-environment-and-society-summer-school

10 March 2023 3,806 3

How to learn more about SPSS and its applications?

I would like to learn more about SPSS and its applications, especially with regard to data analysis. Please suggest how I can learn more about it. Thank you so much.

11 August 2024 9,101 4

Can I use reverse DNA sequences to perform alignment, translation to amino acids, and GenBank submission?

I have reverse sequences (AB1 format). Can I use reverse DNA sequences to perform nucleotide alignment, translate the nucleotides into amino acids, and deposit the sequence in the GenBank database?

11 August 2024 5,138 1

Baseline drift in HPLC? What causes this?

Hello, why do I see this baseline drift when I compare my blank (black) to the sample (blue)? Any suggestions as to why this happened? Thank you!

11 August 2024 3,770 4

Text-Communication from the M1 Hand Area using BCI—and then there is Elon Musk?

Willett, Shenoy et al. (2021) have developed a brain-computer interface (BCI) that used neural signals collected from the hand area of the motor cortex (area M1) of a paralyzed patient. The...

10 August 2024 7,180 0

Has anyone applied Python in the field of textile engineering for data analysis, automation, or smart textiles?

I'm currently exploring the application of Python in textile engineering, specifically in areas like data analysis, process automation, and the development of smart textiles. I'm interested in...

10 August 2024 7,429 2

How can I use the CIF data obtained from Rietveld refinement (extracted via GSAS-II) for microstructural analysis using ETEX software?

09 August 2024 7,718 0

How are iso-frequency contours plotted?

Let's say we have a standard, regular hexagonal honeycomb with a 3-arm primitive unit cell (something like the figure attached; the figure is only representative and not drawn to scale). The...

07 August 2024 1,937 1

How to prepare a nanoparticle-treated fungal sample for environmental SEM analysis?

A fungal strain was treated with nanoparticles. We want to do an environmental SEM analysis. Could anyone share their views on how to prepare the sample? Thank you.

07 August 2024 5,307 1

How to normalize and take the significance of the MTT OD values with 3 replicates for the same cell-line?

Hi, I have a question about normalizing MTT OD values for statistical analysis. If we have 3 different plates and call them 3 different replicates, first we would...

07 August 2024 8,106 4

Why does my protein refold into beta sheets during thermal denaturation analysis?

Hi! I attempted to understand how a novel protein behaves when heated by analyzing its secondary structure changes. I subjected the protein to a thermal denaturation analysis using...

06 August 2024 1,989 3

Zimam Ahamed

Dear Aleksandra Szymura,

In my experience, datasets that contain meaningful outliers and extreme values require a careful approach to normalization, especially when autoencoders are used for dimensionality reduction. To maintain consistency and avoid distortion, normalize the entire dataset using its global minimum and maximum values rather than relying solely on the parameters from the training set. This scales all features uniformly and prevents test-set values that fall outside the training range from being clipped, which would alter critical data points.
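A minimal sketch of the difference, in plain Python with toy numbers (the data values and the helper names `minmax_params` and `minmax_scale` are illustrative assumptions, not part of the question): scaling with training-set bounds plus clipping makes an extreme value indistinguishable from the training maximum, whereas global bounds keep it distinct. In practice scikit-learn's `MinMaxScaler` does this job; its `clip` option is off by default, in which case out-of-range values simply land above 1.

```python
def minmax_params(rows):
    """Per-feature minimum and maximum of a list of equal-length rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def minmax_scale(rows, mins, maxs, clip=False):
    """Scale each feature to [0, 1] with the given bounds; optionally clip to [0, 1]."""
    scaled = []
    for row in rows:
        new_row = []
        for x, lo, hi in zip(row, mins, maxs):
            v = (x - lo) / (hi - lo) if hi != lo else 0.0  # constant feature -> 0
            if clip:
                v = min(max(v, 0.0), 1.0)
            new_row.append(v)
        scaled.append(new_row)
    return scaled


# Toy data: the last row holds an extreme value in the first feature.
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [100.0, 25.0]]
train = data[:3]  # a split that happens to leave the outlier out of "training"

# Option A: bounds from the training rows only, then clip.
a_mins, a_maxs = minmax_params(train)
train_scaled = minmax_scale(data, a_mins, a_maxs, clip=True)
print(train_scaled[2][0], train_scaled[3][0])  # 1.0 1.0 -> outlier indistinguishable

# Option B: global bounds from the whole dataset.
g_mins, g_maxs = minmax_params(data)
global_scaled = minmax_scale(data, g_mins, g_maxs)
print(global_scaled[2][0], global_scaled[3][0])  # ~0.0202 1.0 -> outlier kept distinct
```

The same numbers also show the price of global scaling: because the outlier (100) sets the upper bound of the first feature, the ordinary values 1, 2 and 3 are squeezed into roughly [0, 0.02].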

After normalization, applying the trained autoencoder to the full dataset is appropriate as long as the overall data distribution matches that of the training data. A low reconstruction Mean Absolute Error (MAE) on data the model was not trained on suggests that it generalizes well; a low MAE on the training data alone shows only that the model fits those data. Visual comparisons between the original and reconstructed data can additionally confirm that no significant biases or errors are present.
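A minimal sketch of that reconstruction check in plain Python, assuming the original and reconstructed data are available as equal-shaped lists of rows (the function names and toy values are hypothetical). Reporting MAE per feature as well as overall shows whether one feature, for example the one carrying the extreme values, is reconstructed poorly while the overall average still looks acceptable.

```python
def mae_per_feature(original, reconstructed):
    """Mean absolute reconstruction error for each feature (column)."""
    n, d = len(original), len(original[0])
    return [sum(abs(o[j] - r[j]) for o, r in zip(original, reconstructed)) / n
            for j in range(d)]


def overall_mae(original, reconstructed):
    """Mean absolute error over all cells; equals the mean of the per-feature MAEs."""
    per_feature = mae_per_feature(original, reconstructed)
    return sum(per_feature) / len(per_feature)


# Toy example with hypothetical autoencoder output: feature 1 is reconstructed worse.
original = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
reconstructed = [[0.1, 0.7], [0.5, 0.3], [0.9, 0.3]]
print(mae_per_feature(original, reconstructed))
print(overall_mae(original, reconstructed))
```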

For clustering with an algorithm such as DBSCAN, normalizing the entire dataset before dimensionality reduction ensures that the feature scales reflect the complete data distribution, which matters for detecting clusters accurately, including those shaped by outliers. This approach balances effective dimensionality reduction against preserving the characteristics of the dataset that clustering depends on.
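To make the clustering step concrete, here is a compact from-scratch DBSCAN in plain Python, written only to show how `eps` and `min_samples` act on the reduced data; it is an illustrative sketch, and in practice `sklearn.cluster.DBSCAN(eps=..., min_samples=...)` would be the usual choice. As in scikit-learn, a point counts toward its own neighbourhood, and points that end up in no cluster get the label -1 (noise). The toy 2-D "codes" are made up.

```python
import math


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None = not visited yet

    def neighbours(i):
        # Includes the point itself, matching scikit-learn's convention.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue  # already assigned (border points are not expanded)
            labels[j] = cluster
            j_seeds = neighbours(j)
            if len(j_seeds) >= min_samples:  # core point: keep expanding
                queue.extend(j_seeds)
    return labels


# Toy 2-D codes: two tight groups and one isolated point.
codes = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
         (10.0, 0.0)]
print(dbscan(codes, eps=0.5, min_samples=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Points labelled -1 are the ones to inspect when the dataset contains meaningful outliers: depending on `eps`, an extreme observation may be reported as noise or absorbed into a nearby cluster.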

Aleksandra Szymura

Zimam Ahamed, thanks for your answer! But I would like to perform dimensionality reduction on one specific dataset. My model does not have to generalize to new, unseen data. It is not a predictive model; it is only supposed to work for this one dataset. It is not a typical ML approach...

Maybe I should train and test the model on the same dataset (my full dataset)?

Samer Sarsam

Since your model is designed to work specifically with a single dataset and doesn't need to generalize to new data, it's perfectly fine to train and test on the full dataset. To maintain consistency and prevent distortion, you should normalize the entire dataset using its global minimum and maximum values, instead of using the parameters from the training set alone. This ensures that all features are scaled uniformly, avoiding issues like clipping that could distort critical data points.

After normalizing the dataset, you can apply dimensionality reduction with an autoencoder. There's no need to split the data into training and test sets in this case. Once dimensionality reduction is complete, you can proceed with clustering, such as DBSCAN, directly on the full dataset. Because every row, including the outliers and extreme values, is used for both scaling and training, none of them is left out of the learned representation, and you avoid problems a train/test split could cause, such as test values falling outside the training range.
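A minimal sketch of this no-split workflow, assuming the data are already min-max scaled: a tiny autoencoder (d inputs, a k-unit tanh bottleneck, linear output) is trained by full-batch gradient descent on every row, and the bottleneck values are then taken as the reduced representation. It is written in plain Python purely so the mechanics are visible; the layer sizes, learning rate, epoch count and toy data are arbitrary assumptions, and a real analysis would use a framework such as Keras or PyTorch.

```python
import math
import random


def train_autoencoder(rows, k, epochs=2000, lr=0.05, seed=0):
    """Full-batch gradient descent on a d -> k (tanh) -> d autoencoder with MSE loss.

    Every row is used for training: there is deliberately no train/test split.
    Returns the learned parameters and the mean loss recorded at each epoch.
    """
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(k)]  # encoder (k x d)
    b1 = [0.0] * k
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(d)]  # decoder (d x k)
    b2 = [0.0] * d
    history = []
    for _ in range(epochs):
        gW1 = [[0.0] * d for _ in range(k)]
        gb1 = [0.0] * k
        gW2 = [[0.0] * k for _ in range(d)]
        gb2 = [0.0] * d
        loss = 0.0
        for x in rows:
            # Forward pass: encode to k values, decode back to d values.
            h = [math.tanh(sum(W1[i][j] * x[j] for j in range(d)) + b1[i]) for i in range(k)]
            y = [sum(W2[j][i] * h[i] for i in range(k)) + b2[j] for j in range(d)]
            err = [y[j] - x[j] for j in range(d)]
            loss += sum(e * e for e in err) / d
            # Backward pass: gradients of this row's mean squared error.
            dy = [2.0 * e / d for e in err]
            for j in range(d):
                gb2[j] += dy[j]
                for i in range(k):
                    gW2[j][i] += dy[j] * h[i]
            dz = [sum(dy[j] * W2[j][i] for j in range(d)) * (1.0 - h[i] ** 2) for i in range(k)]
            for i in range(k):
                gb1[i] += dz[i]
                for j in range(d):
                    gW1[i][j] += dz[i] * x[j]
        history.append(loss / n)
        # Gradient step, averaging the accumulated gradients over all rows.
        for i in range(k):
            b1[i] -= lr * gb1[i] / n
            for j in range(d):
                W1[i][j] -= lr * gW1[i][j] / n
        for j in range(d):
            b2[j] -= lr * gb2[j] / n
            for i in range(k):
                W2[j][i] -= lr * gW2[j][i] / n
    return (W1, b1, W2, b2), history


def encode(params, rows):
    """Map each row to its k-dimensional code (the reduced representation)."""
    W1, b1, _, _ = params
    return [[math.tanh(sum(w * v for w, v in zip(W1[i], x)) + b1[i]) for i in range(len(W1))]
            for x in rows]


# Toy data, already min-max scaled to [0, 1]; the last row is an extreme value.
data = [[0.10, 0.20, 0.15], [0.20, 0.30, 0.25], [0.15, 0.25, 0.20],
        [0.80, 0.70, 0.75], [0.85, 0.75, 0.80], [1.00, 1.00, 1.00]]
params, history = train_autoencoder(data, k=1)
codes = encode(params, data)  # one value per row, computed for the full dataset
print(history[0], history[-1])
```

`codes` holds one k-dimensional vector per row of the full dataset and is what would be passed on to DBSCAN; `history` lets you confirm that the reconstruction loss actually decreased.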

I hope that helps.

Cheers,

Dr. Samer Sarsam

Since your goal is to perform dimensionality reduction on a single dataset without requiring the model to generalize to new data, it is entirely valid to train and test the autoencoder on the same dataset. This lets the model learn the inherent structure of the data, capturing both typical patterns and outliers, because the focus is on representation rather than prediction.

To ensure consistency, normalize the dataset using its global minimum and maximum values before training the autoencoder. This maps every feature to the same [0, 1] range and avoids clipping. Keep in mind, however, that with min-max scaling the extreme values themselves define that range, so a single large outlier can compress the remaining values of a feature into a narrow interval. Once normalized, the autoencoder can be trained on and applied to the same dataset to produce the reduced-dimensional representation.

Evaluating the autoencoder's performance involves assessing the reconstruction quality, particularly for outliers and extreme values, to confirm that critical information has been preserved. While a low Mean Absolute Error (MAE) is a positive indicator, visual comparisons between the original and reconstructed features can provide further assurance of the model's effectiveness.
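One way to make that check specific to outliers is to compare the per-row reconstruction error of the rows you know to be extreme with that of the remaining rows. A sketch in plain Python, where `outlier_idx` (the indices of the extreme rows) is a hypothetical input you would supply from your own knowledge of the data, and the toy values are made up:

```python
def row_mae(original, reconstructed):
    """Mean absolute reconstruction error of each row."""
    return [sum(abs(a - b) for a, b in zip(o, r)) / len(o)
            for o, r in zip(original, reconstructed)]


def compare_groups(original, reconstructed, outlier_idx):
    """Average row error for the flagged rows versus all other rows."""
    errors = row_mae(original, reconstructed)
    flagged = set(outlier_idx)
    out = [e for i, e in enumerate(errors) if i in flagged]
    rest = [e for i, e in enumerate(errors) if i not in flagged]
    return sum(out) / len(out), sum(rest) / len(rest)


# Toy data: row 2 is the extreme observation.
original = [[0.1, 0.2], [0.2, 0.1], [1.0, 0.9]]
reconstructed = [[0.1, 0.2], [0.2, 0.2], [0.6, 0.7]]
outlier_err, typical_err = compare_groups(original, reconstructed, outlier_idx=[2])
print(outlier_err, typical_err)
```

Here the overall MAE is only about 0.12, yet the extreme row is reconstructed roughly twelve times worse than the typical rows, which is the kind of information loss an overall average can hide.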

When proceeding to clustering with DBSCAN, check that the reduced representation suits a density-based method: DBSCAN groups points using a fixed distance threshold, so distances in the reduced space need to be meaningful. Parameters such as `eps` and `min_samples` will likely need fine-tuning to the characteristics of the reduced data to get good clustering results. This structured approach helps ensure that the dimensionality reduction process meets your specific objectives while retaining the dataset's integrity for further analysis.
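A common heuristic for choosing `eps` is the k-distance plot: compute each point's distance to its k-th nearest neighbour, sort these values, and pick `eps` near the "knee" where they start rising sharply. The sketch below automates a crude version of that; the helper names, the largest-jump rule for locating the knee, and the choice k = `min_samples` - 1 (scikit-learn counts the point itself toward `min_samples`) are assumptions, and plotting the curve and judging the knee by eye is usually more reliable.

```python
import math


def k_distances(points, k):
    """Sorted distances from each point to its k-th nearest *other* point."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


def suggest_eps(points, min_samples):
    """Crude knee pick: the k-distance just before the largest jump in the sorted curve.

    Needs at least min_samples + 1 points. A heuristic starting value, not a rule.
    """
    kd = k_distances(points, max(min_samples - 1, 1))
    jumps = [kd[i + 1] - kd[i] for i in range(len(kd) - 1)]
    knee = max(range(len(jumps)), key=jumps.__getitem__)
    return kd[knee]


# Toy 1-D codes: two tight groups and one far-away point.
codes = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,), (20.0,)]
print(k_distances(codes, 2))
print(suggest_eps(codes, min_samples=3))
```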