How is it possible to choose k in the KNN algorithm for an unsupervised dataset?

Masoum Mohammadi Gharagoz

The decision boundary becomes smoother as the value of K increases. The training error rate and the validation error rate are the two quantities we need in order to assess different values of K.

To get the optimal value of K, split the initial dataset into a training set and a validation set. Then plot the validation error against K and pick the K with the lowest validation error. This value of K should be used for all predictions.
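The validation-curve idea above can be sketched as follows. This is a minimal illustration, not a reference implementation: the two-blob synthetic data, the 70/30 split, the candidate range of K, and the helper names `knn_predict` and `error_rate` are all assumptions made for the example. It needs labelled data, since the error rate is computed against known labels.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def error_rate(train, evaluation, k):
    """Fraction of evaluation points that KNN with this k misclassifies."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in evaluation)
    return wrong / len(evaluation)


# Synthetic two-class data (an assumption for illustration): two Gaussian blobs.
rng = random.Random(0)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(100)]
data += [((rng.gauss(2, 1), rng.gauss(2, 1)), 1) for _ in range(100)]
rng.shuffle(data)

# Hold out 30% of the labelled data for validation.
split = int(0.7 * len(data))
train, validation = data[:split], data[split:]

# Odd candidate values of K avoid tied votes when there are two classes.
candidates = range(1, 32, 2)
train_errors = {k: error_rate(train, train, k) for k in candidates}
val_errors = {k: error_rate(train, validation, k) for k in candidates}
best_k = min(val_errors, key=val_errors.get)

for k in candidates:
    print(f"K={k:2d}  train error={train_errors[k]:.3f}  validation error={val_errors[k]:.3f}")
print("K with the lowest validation error:", best_k)
```

The training error at K = 1 is zero because every training point is its own nearest neighbour, which is why the validation error, not the training error, should drive the choice.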

A common rule of thumb is to start with K near the square root of N, where N is the total number of samples. This is only a heuristic, so use an error plot or accuracy plot to find the most favorable K value. KNN performs well on multi-class problems, but it is sensitive to outliers.
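As a small worked example of the square-root heuristic, the sketch below computes a starting guess for K. The function name `sqrt_rule_k` and the bump to an odd value (to avoid tied votes with two classes) are assumptions added for illustration; the result is only a starting point for the error-plot search.

```python
import math


def sqrt_rule_k(n_samples):
    """Round sqrt(n_samples) to the nearest integer (at least 1), then bump even values up by one."""
    k = max(1, round(math.sqrt(n_samples)))
    # Prefer an odd K so that a two-class vote cannot tie.
    return k if k % 2 == 1 else k + 1


for n in (50, 400, 10_000):
    print(f"N={n}: start the search around K={sqrt_rule_k(n)}")
```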

https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/

Muhammad Ali

Dear Vincenzo Guida,

There is a huge literature on this topic; see, e.g., https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/ and https://realpython.com/knn-python/ for basic concepts.

Good Luck

Kamran Shaukat Dar

It may vary with your dataset and problem domain. You can use a grid-search-like technique to find the optimal value of k.
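A grid search over k can be sketched with plain k-fold cross-validation, as below. The synthetic data, the grid of candidate values, the 5 folds, and the helper names `knn_predict` and `cv_accuracy` are assumptions for the example. Libraries such as scikit-learn provide the same idea ready-made (for instance `GridSearchCV` with a `KNeighborsClassifier`), but this version uses only the standard library.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cv_accuracy(data, k, n_folds=5):
    """Mean accuracy of KNN with this k over n_folds cross-validation folds."""
    scores = []
    for fold in range(n_folds):
        # Every n_folds-th item (starting at index fold) is held out; the rest is used for training.
        held_out = data[fold::n_folds]
        rest = [item for i, item in enumerate(data) if i % n_folds != fold]
        correct = sum(knn_predict(rest, x, k) == y for x, y in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / n_folds


# Synthetic labelled data (an assumption for illustration): two Gaussian blobs.
rng = random.Random(1)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "a") for _ in range(60)]
data += [((rng.gauss(2.5, 1), rng.gauss(2.5, 1)), "b") for _ in range(60)]
rng.shuffle(data)

grid = [1, 3, 5, 7, 9, 11, 15, 21]  # hypothetical candidate values of k
results = {k: cv_accuracy(data, k) for k in grid}
best_k = max(results, key=results.get)

for k in grid:
    print(f"k={k:2d}  mean CV accuracy={results[k]:.3f}")
print("best k on this grid:", best_k)
```

Averaging over folds makes the choice less dependent on one particular train/validation split than a single hold-out set.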

Ashifur Rahman

Hi, Vincenzo Guida

There are various methods to choose the best k in KNN. I am listing a few below:

Divide your data into training and tuning (validation) sets. Do not use the test set for this purpose. Use the validation set to tune k and pick the value that works best for your problem.

Another method is to use the Schwarz Criterion.

The Schwarz Criterion picks k by minimizing: distortion + λ D k log N.

D = dimension of the problem, k = number of clusters, N = number of data points, λ = parameter to be specified.

Note that when λ tends to 0, we are not penalizing a large number of cluster centers: a trivial clustering achieves zero distortion by putting a cluster center at every data point. When λ tends to infinity, the penalty for one extra cluster dominates the distortion, and we have to make do with the smallest possible number of clusters (k = 1).
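The criterion can be sketched in code as follows. Since k here counts clusters, the sketch assumes a k-means-style clustering to obtain the distortion; the answer does not name a clustering algorithm, so that choice, the farthest-first initialization, the synthetic three-blob data, and the value of λ (`lam`) are all assumptions for illustration.

```python
import math
import random


def farthest_first(points, k):
    """Deterministic initialization: repeatedly pick the point farthest from the chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, n_iter=50):
    """Plain Lloyd iterations; returns the final cluster centers."""
    centers = farthest_first(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # New center = cluster mean; an empty cluster keeps its old center.
        centers = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centers[j]
            for j, cluster in enumerate(clusters)
        ]
    return centers


def distortion(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


def schwarz_score(points, k, lam):
    """distortion + lam * D * k * log(N) for a k-means clustering with k centers."""
    n, dim = len(points), len(points[0])
    return distortion(points, kmeans(points, k)) + lam * dim * k * math.log(n)


# Synthetic data (an assumption): three well-separated 2-D blobs of 40 points each.
rng = random.Random(0)
blob_centers = [(0, 0), (10, 0), (5, 10)]
points = [(rng.gauss(cx, 1), rng.gauss(cy, 1)) for cx, cy in blob_centers for _ in range(40)]

lam = 10.0  # hypothetical penalty weight; it has to be chosen for the problem at hand
scores = {k: schwarz_score(points, k, lam) for k in range(1, 8)}
best_k = min(scores, key=scores.get)

for k, score in scores.items():
    print(f"k={k}  criterion={score:.1f}")
print("k minimizing the criterion:", best_k)
```

With `lam = 0` the penalty vanishes and the score can only be driven down by adding centers, which matches the zero-distortion limit of one center per data point described above.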

The elbow method is used to find the value of k in the k-means algorithm. Do not confuse KNN with k-means.