Neural networks and SVMs are "model-free" methods, in the sense that they work without any assumption about the data-generating process. Many clustering techniques (such as k-means and fuzzy c-means, FCM) can be customized with different distance functions, so as to adapt their behavior to non-normal data.
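As a minimal sketch of that last point (my own illustration, on synthetic skewed data), here is a clustering run where the distance function is swapped out for a non-Euclidean one, using SciPy's hierarchical clustering:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=(100, 4))    # skewed, non-normal toy data

# Pairwise distances with a metric of your choice ('cityblock', 'cosine', ...)
D = pdist(X, metric="cityblock")

# Average-linkage hierarchical clustering on that distance matrix
Z = linkage(D, method="average")
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree into 3 clusters
print(labels[:10])
```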
If your data has labels (meaning you have examples where you know which group/class they belong to), then you should focus on classification models to learn the relationship between your features and labels. As Corrado mentioned above, random forest, softmax regression and SVC are good places to start. I'd leave neural networks alone until you're convinced your problem requires all of their bells and whistles (unless your problem is image, speech or natural language classification, in which case they're the clear option).
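A hedged sketch of those three starting points, assuming a labelled dataset (the data here is synthetic, and the parameters are just illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Stand-in for your labelled data (X = features, y = class labels)
X, y = make_classification(n_samples=500, n_features=10, n_classes=3,
                           n_informative=5, random_state=0)

models = {
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # multinomial (softmax) behaviour for multi-class with the default lbfgs solver
    "softmax regression": LogisticRegression(max_iter=1000),
    "SVC": SVC(kernel="rbf"),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```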
If you don't have examples with labels, I'd recommend k-means or hierarchical clustering as a first stab at the problem; as Corrado pointed out, these need appropriate distance functions in order to define which data points are closely grouped and which are not. If you're not satisfied, from there you may also consider GMMs or DBSCAN, which excel at certain classes of problems but require more delicate handling (tuning their hyperparameters can prove tricky).
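A minimal, hypothetical comparison of those unlabelled options (two well-separated synthetic blobs; the DBSCAN eps/min_samples values are just placeholders you would normally tune):

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
gmm_labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)
db_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)   # -1 marks noise points

print(len(set(km_labels)), len(set(gmm_labels)), len(set(db_labels)))
```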
In general, I would recommend learning about the assumptions behind each model and thinking through whether they make sense for your data. Wikipedia and YouTube are great resources for learning about all the things I've listed.
One other thing you might want to consider is transforming your (supposedly skewed?) data prior to analysis. You could use e.g. a log- or Box-Cox transformation, and afterwards a simple histogram or a chi-squared test to see whether that helps you achieve normality (see the sketch below).
Once your data is approximately normally distributed, most analyses become much easier.
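A small sketch of that suggestion, on a synthetic right-skewed sample (strictly positive, as Box-Cox requires); SciPy's `normaltest` is used here as a stand-in for the chi-squared check, since its D'Agostino-Pearson statistic is chi-squared based:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=0.8, size=500)   # right-skewed toy data

x_log = np.log(x)                                  # log transform
x_bc, lam = stats.boxcox(x)                        # Box-Cox, lambda fitted from data

for name, sample in [("raw", x), ("log", x_log), ("box-cox", x_bc)]:
    stat, p = stats.normaltest(sample)
    print(f"{name}: p-value = {p:.3f}")            # larger p = less evidence against normality
```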
As Corrado said, NNs and SVMs are good models for classification problems. But for better accuracy and recognition of the outputs, we need to preprocess the data (e.g. data transformation, data normalization, etc.). Besides, the activation function that we choose is important too.
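For example (a rough sketch of my own, on synthetic data with arbitrary layer sizes), scaling can be chained in front of a small neural network, and the activation function is exposed as a parameter:

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=8, random_state=1)

clf = make_pipeline(
    StandardScaler(),                                # data normalization step
    MLPClassifier(hidden_layer_sizes=(32,),
                  activation="relu",                 # try "tanh" or "logistic" too
                  max_iter=2000, random_state=1),
)
print(cross_val_score(clf, X, y, cv=5).mean())
```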
There is a spectrum of non-parametric classification algorithms, such as SVM, NN, k-nearest-neighbor and Parzen windows, that can deal with non-normal data. Choosing the best classifier from this spectrum then depends on many factors, such as the amount of data, the dimension of the feature space, the available computational power and time, and the difficulty of the task (the separability of the classes in the feature space).
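As a rough illustration (my own construction, not part of the original answer), here are two points on that spectrum side by side: k-nearest-neighbor, and a simple Parzen-window classifier built from one kernel-density estimate per class (the bandwidth is an arbitrary choice and class priors are assumed equal):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier, KernelDensity

X, y = make_classification(n_samples=300, n_features=5, random_state=2)

# k-NN: a single main hyperparameter, the number of neighbours
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# Parzen windows: fit one KDE per class, predict the class with the highest density
kdes = {c: KernelDensity(bandwidth=0.5).fit(X[y == c]) for c in np.unique(y)}
classes = np.array(sorted(kdes))
scores = np.column_stack([kdes[c].score_samples(X) for c in classes])
parzen_pred = classes[np.argmax(scores, axis=1)]

print((knn.predict(X) == y).mean(), (parzen_pred == y).mean())
```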
Abhishek Verma First you need to decide whether to use a classification or a clustering method, because these two are different from each other.
If you are in doubt or cannot decide, and for handling non-normally distributed data, check the link below; it should be helpful: https://www.researchgate.net/post/How_to_deal_with_non_normal_data