In multi-class classification, mapping unbounded real-valued scores (such as logits) to a probability distribution over the classes is crucial for making decisions. Apart from the widely used softmax function, there are several other methods to achieve this. Here are some of the alternatives:
**1. Sigmoid Activation Function:**
- While typically used in binary classification, the sigmoid function can be extended to multi-class classification by applying it independently to each class. However, this does not guarantee that the resulting probabilities will sum to 1.
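A minimal NumPy sketch of this behavior (the logits below are made up):

```python
import numpy as np

def independent_sigmoid(logits):
    """Apply the sigmoid to each class score independently."""
    return 1.0 / (1.0 + np.exp(-logits))

logits = np.array([2.0, -1.0, 0.5])   # hypothetical per-class scores
scores = independent_sigmoid(logits)
print(scores, scores.sum())           # the sum is generally not 1
print(scores / scores.sum())          # renormalize if a proper distribution is needed
```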
**2. Platt Scaling:**
- Platt Scaling is a method that applies a sigmoid function to the output of a classifier to map real-valued outputs to probabilities. It is often used for binary classifiers, but can be extended to multi-class classifiers through one-vs-rest (OvR) schemes.
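A rough sketch with scikit-learn, whose `CalibratedClassifierCV(method="sigmoid")` implements Platt scaling and handles the one-vs-rest extension for multi-class problems internally (the dataset here is synthetic):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_classes=3, n_informative=6,
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# LinearSVC produces uncalibrated decision scores; method="sigmoid" fits
# a Platt-style sigmoid per class (one-vs-rest) on cross-validation folds.
calibrated = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=3)
calibrated.fit(X_train, y_train)
probs = calibrated.predict_proba(X_val)  # rows sum to 1
```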
**3. Isotonic Regression:**
- This is a non-parametric method to map predicted values to probabilities by fitting a piecewise constant non-decreasing function. It can be used for calibration in both binary and multi-class settings.
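For a single class in a one-vs-rest setup, the mapping can be fit directly with scikit-learn's `IsotonicRegression`; a minimal sketch with made-up scores and labels:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Hypothetical validation data for one class in a one-vs-rest setup:
# raw classifier scores and binary "is this class?" labels.
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.7])
labels = np.array([0,   0,   1,    1,   1,   0])

iso = IsotonicRegression(out_of_bounds="clip")  # piecewise-constant, non-decreasing fit
iso.fit(scores, labels)
calibrated = iso.predict(np.array([0.2, 0.5, 0.85]))
# For K classes, fit one such model per class and renormalize the outputs.
```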
**4. Temperature Scaling:**
- This technique involves dividing the logits (outputs of the neural network before applying softmax) by a temperature parameter before applying the softmax function. This parameter is usually optimized on a validation set and can improve the calibration of the probabilities.
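The mechanism itself is a one-liner; in practice the temperature T would be fit by minimizing negative log-likelihood on a validation set. A sketch with hypothetical logits:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([3.0, 1.0, 0.2])  # hypothetical network outputs
for T in (0.5, 1.0, 2.0):
    print(T, softmax(logits / T))   # T > 1 softens, T < 1 sharpens the distribution
```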
**5. Rank-based Methods:**
- These methods assign probabilities based on the rank order of the scores rather than their absolute values. For instance, each score can be replaced by its rank, and the ranks rescaled so that they sum to 1.
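A minimal sketch using SciPy's `rankdata` (the scores are made up):

```python
import numpy as np
from scipy.stats import rankdata

scores = np.array([0.2, 5.0, 1.3, 1.3])  # hypothetical class scores
ranks = rankdata(scores)                  # average ranks are assigned to ties
probs = ranks / ranks.sum()               # rescale ranks to sum to 1
print(probs)
```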
**6. Bayesian Methods:**
- Bayesian approaches estimate probabilities by placing prior distributions on the quantities of interest and updating them with observed data, yielding posterior distributions over the class probabilities rather than point estimates.
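The simplest concrete instance is a conjugate Dirichlet-multinomial update over class frequencies; a sketch with hypothetical counts (full Bayesian classifiers apply the same idea to model parameters):

```python
import numpy as np

# Dirichlet prior over 3 class frequencies (uniform, pseudo-count 1 per class)
alpha_prior = np.ones(3)
counts = np.array([12, 30, 8])        # hypothetical observed class counts
alpha_post = alpha_prior + counts      # conjugate posterior update
print(alpha_post / alpha_post.sum())   # posterior mean class probabilities
```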
**7. Ensemble Methods:**
- Techniques like bagging and boosting can produce probabilistic outputs by combining the predictions of multiple classifiers. Methods like Random Forests and Gradient Boosting Machines (GBM) inherently provide probability estimates.
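For example, scikit-learn's `RandomForestClassifier` exposes probability estimates averaged across its trees:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
# Each row is a class distribution averaged over the trees in the forest
print(clf.predict_proba(X[:3]))  # rows sum to 1
```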
**8. Logistic Regression on Output Scores:**
- Fitting a logistic regression model to the scores output by the base classifier can help convert those scores into probabilities. This is especially useful in calibration tasks.
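A rough sketch, assuming held-out raw scores from some base model (the scores and labels below are synthetic stand-ins):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical held-out data: raw multi-class scores from a base model
# (n_samples x n_classes) and true labels (synthetic here).
rng = np.random.default_rng(0)
raw_scores = rng.normal(size=(200, 3))
labels = raw_scores.argmax(axis=1)  # stand-in labels for the sketch

calib = LogisticRegression(max_iter=1000).fit(raw_scores, labels)
probs = calib.predict_proba(raw_scores)  # calibrated probabilities, rows sum to 1
```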
**9. Dirichlet Calibration:**
- A method specifically designed for multi-class calibration. It models the class-conditional distributions of the predicted probability vectors as Dirichlet distributions, generalizing Platt-style calibration to multiple classes.
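As a sketch: per the original Dirichlet calibration paper (Kull et al., 2019), the calibration map can be fit as a multinomial logistic regression on log-transformed probability vectors (the inputs below are synthetic):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical uncalibrated probability vectors from a base model
rng = np.random.default_rng(0)
probs_uncal = rng.dirichlet(alpha=[2, 2, 2], size=200)
labels = probs_uncal.argmax(axis=1)  # stand-in labels for the sketch

# Dirichlet calibration = multinomial logistic regression on log-probabilities
log_probs = np.log(np.clip(probs_uncal, 1e-12, None))
dirichlet_cal = LogisticRegression(max_iter=1000).fit(log_probs, labels)
probs_cal = dirichlet_cal.predict_proba(log_probs)
```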
**10. Quantile Transformation:**
- This technique transforms the scores into a uniform distribution before converting them into probabilities. This can be useful when the original score distribution is unknown or irregular.
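A sketch using scikit-learn's `QuantileTransformer` followed by a simple renormalization (the scores are synthetic, and renormalizing by the row sum is just one of several possible final steps):

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)
scores = rng.exponential(size=(500, 3))  # hypothetical skewed class scores

qt = QuantileTransformer(output_distribution="uniform", n_quantiles=100)
uniform_scores = qt.fit_transform(scores)  # rank-based map, per column
# Small epsilon guards against an all-zero row before renormalizing
probs = uniform_scores / (uniform_scores.sum(axis=1, keepdims=True) + 1e-12)
```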
Each of these methods has its own advantages and use cases. The choice of method often depends on the specific requirements of the task, the nature of the data, and the performance of the baseline classifier.