SVM is designed for two-class classification problems. If the data is not linearly separable, a kernel function is used. I want to know whether there exists any method that will indicate whether the data is linearly separable or not.
Visualizing the data is useful in such cases. Another option is training a linear classifier and checking whether you can get zero training errors: if so, your dataset is linearly separable; otherwise, it's nonlinearly separable. A minimal version of this check is sketched below.
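As a rough sketch of that check (assuming scikit-learn; the data here are synthetic, and a very large C stands in for a hard margin):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# two well-separated synthetic blobs
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# a large C penalises training errors heavily, approximating a hard margin
clf = LinearSVC(C=1e5, max_iter=100_000).fit(X, y)
print("zero training errors:", clf.score(X, y) == 1.0)
```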
There is no simple answer to this question :-) The modelling choices that you make depend on the application you're looking at, the amount of data, and a bunch of other things. I'll walk you through a few considerations.
Firstly, it sounds like you are trying to decide which kernel type to use based on the result of a test: that is, you want to say "if the data is linearly separable, use a linear kernel; otherwise, use an RBF". This line of reasoning is inadvisable, since such a test ignores how the boundary affects predictive performance on new data, and generalisation is really at the core of machine learning, so I definitely wouldn't take this route. If all you want is 100% accuracy on training data, just use kNN with k=1 (as the sketch below illustrates), but this is unlikely to generalise well to new data.
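Here's a minimal illustration of that point, assuming scikit-learn; the overlapping Gaussian classes are synthetic stand-ins for noisy real data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# two heavily overlapping classes: no boundary can separate them cleanly
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(1, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# k=1 memorises the training set, so training accuracy is 1.0 by construction
knn = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
print("train accuracy:", knn.score(X_tr, y_tr))
print("test accuracy: ", knn.score(X_te, y_te))  # substantially lower
```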
Also, it is worth bearing in mind that you won't always want to select the boundary that separates the data perfectly. Consider a simple example in one dimension where data from the negative class lies between -2 and -1 on the real line, and data from the positive class lies between 1 and 2. Given this data, it would seem reasonable to choose a decision boundary at 0, right?
Now, suppose there is one outlier in the negative class at the point 0.9. The data is still separable, but every decision boundary that perfectly separates the two classes now lies between 0.9 and 1. In this situation a boundary at 0 is probably still very reasonable; however, by insisting on a boundary that separates the data perfectly, you are forced to pick one between 0.9 and 1, which feels much too far to the right.
So, often you will want to trade away some training accuracy for a larger classification margin; the sketch below shows this on the example above.
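A quick sketch of the 1-d example, assuming scikit-learn (in one dimension the decision boundary is at x = -b/w):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.concatenate([rng.uniform(-2, -1, 20), [0.9], rng.uniform(1, 2, 20)])
y = np.array([0] * 21 + [1] * 20)  # the negative-class outlier sits at 0.9
X = X.reshape(-1, 1)

for C in (1e6, 0.1):  # near-hard margin vs soft margin
    svm = SVC(kernel="linear", C=C).fit(X, y)
    boundary = -svm.intercept_[0] / svm.coef_[0, 0]
    print(f"C={C}: boundary at x = {boundary:.2f}")
# the near-hard margin lands between 0.9 and 1; the soft margin stays near 0
```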
By the way, kernel selection also depends on other properties of the dataset. If there are a great many features relative to the dataset size, you should prefer a linear kernel even if the data isn't linearly separable, since this reduces the risk of overfitting.
So, what I advise is that you read up on model selection and cross-validation; there are lots of important references at the following link: https://en.wikipedia.org/wiki/Cross-validation_(statistics)
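For instance, a minimal cross-validated comparison of kernels (assuming scikit-learn; the data are synthetic) looks like this:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)  # noisy linear concept

# compare kernels on held-out folds, not on training accuracy
for kernel in ("linear", "rbf"):
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    print(f"{kernel}: {scores.mean():.3f} +/- {scores.std():.3f}")
```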
A few comments to expand on a couple of points from Hasan's good answers.
1) You can visualise multidimensional data using something like a pair plot. This technique considers all pairs of features and creates a matrix of scatter plots, one per pair of axes. See the link below for an example, and the first sketch after this list. If the data are separable along one or two dimensions, you will see the separation in this visualisation. However, if the separating hyperplane spans more than two dimensions, the pair plot won't necessarily reveal that the data are separable. Additionally, pair plots don't scale well to large numbers of dimensions.
2) One issue with the perceptron is that it can only confirm that data are linearly separable: if it converges, the data are separable, but if the algorithm doesn't converge within your iteration budget, you can't conclude that the data are not separable (see the second sketch after this list).
3) In my experience of experimenting with clustering on different cluster shapes and sizes, you don't tend to get the desired separation when the shapes and sizes of the 'real' clusters are dissimilar. Additionally, you will need to put thought into which distance metric to use for the application, and that is nontrivial in general.
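For point 1), here is a minimal pair-plot sketch, assuming seaborn and pandas (the DataFrame and its "label" column are illustrative):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (100, 3)), rng.normal(1, 1, (100, 3))])
df = pd.DataFrame(X, columns=["f1", "f2", "f3"])
df["label"] = [0] * 100 + [1] * 100

# one scatter plot per pair of features, coloured by class
sns.pairplot(df, hue="label")
plt.show()
```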
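And for point 2), a perceptron-based check, assuming scikit-learn; note the asymmetry in what the outcome can tell you:

```python
import numpy as np
from sklearn.linear_model import Perceptron

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = Perceptron(max_iter=1000, tol=None).fit(X, y)
if clf.score(X, y) == 1.0:
    print("linearly separable: a separating hyperplane was found")
else:
    # NOT proof of non-separability -- it may simply need more iterations
    print("inconclusive within this iteration budget")
```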
The test I would recommend is relatively simple. Learn an SVM model with a linear kernel, setting the C parameter to infinity (or a very large number). Since the optimisation is a convex quadratic program, the solver finds the optimal solution in finite time. Once it converges, make predictions on the training data: your data is separable if you get 100% accuracy. A sketch follows.
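A sketch of this test, assuming scikit-learn (a very large C stands in for infinity):

```python
import numpy as np
from sklearn.svm import SVC

def looks_linearly_separable(X, y, big_C=1e10):
    """Fit a (near) hard-margin linear SVM and check training accuracy."""
    svm = SVC(kernel="linear", C=big_C).fit(X, y)
    return svm.score(X, y) == 1.0

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(looks_linearly_separable(X, y))  # True for these well-separated blobs
```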
But please note that this may not be a good solution if you want to make predictions on future data. See my first answer for more.