In machine learning, and more specifically in neural networks, we use the sigmoid function to squash the output into the range 0 to 1, e.g. for binary outcomes. What other functions, if any, can we use for the same purpose, and why? If there are none, please explain the reason.
IMHO... the trigger in the neuron model is a unit step function, which has the Dirac delta as its derivative. The Fourier transform of the delta contains all frequencies with equal weight, so it seems the most appropriate. The weights and bias inputs give a semblance of control. Of course, in particular cases, replacing the delta derivative with some other function could be useful.
There are a number of activation functions used for different purposes, and the list keeps evolving: Sigmoid, Tanh, ReLU, LeakyReLU, etc. We need to select one depending on the requirements.
Example, Sigmoid: use it if the objective is to map a real value into the range 0 to 1.
Issue with Sigmoid: the vanishing-gradient problem.
Tanh: use it if the objective is to map a real value into the range -1 to 1.
Issue: like sigmoid, it saturates for large inputs (so gradients can still vanish), and it also converges slowly.
ReLU: use it if the objective is to map a real value into the range 0 to infinity.
Pros: converges faster than Sigmoid and Tanh.
Cons: units can quickly die (get stuck at zero output) if the learning rate is high.
Leaky ReLU: fixes the cons of ReLU; for negative inputs it produces a small slope instead of zero, so there is no vanishing or dying issue. (A small code sketch of all four activations follows below.)
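For concreteness, here is a minimal NumPy sketch of the four activations listed above. The formulas are the standard ones; the 0.01 leak slope is a common default, not something specified above.

```python
import numpy as np

def sigmoid(x):
    # Maps any real value into (0, 1); saturates for large |x|.
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Maps any real value into (-1, 1); zero-centered but still saturates.
    return np.tanh(x)

def relu(x):
    # Maps negative inputs to 0, passes positive inputs through unchanged.
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # Like ReLU, but keeps a small slope for negative inputs so the
    # gradient never becomes exactly zero (the "dying ReLU" fix).
    return np.where(x > 0, x, slope * x)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, f(x))
```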
I would suggest choosing the activation function based on your weights, data, and processing budget. I have tried them all; for me, Leaky ReLU performs best.
As the previous answers look OK, I'd just like to remark that the main goal of the activation function is to introduce a non-linearity into the model. If you don't use a non-linearity, you end up with a model equivalent to a single-layer perceptron, no matter how many layers you stack.
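To make that collapse concrete, here is a small NumPy sketch (my own illustration, with arbitrary layer sizes) showing that two stacked linear layers without an activation are exactly one linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))          # an arbitrary input vector
W1 = rng.normal(size=(5, 4))       # first "layer" weights
W2 = rng.normal(size=(3, 5))       # second "layer" weights

# Two linear layers applied in sequence (no activation in between)...
two_layers = W2 @ (W1 @ x)

# ...are identical to a single linear layer with the merged weight matrix.
one_layer = (W2 @ W1) @ x

print(np.allclose(two_layers, one_layer))  # True
```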
There are plenty of activation functions; you can dig into a framework like PyTorch to discover some newer ones: https://pytorch.org/docs/stable/nn.html (check the "Non-linear activations" section in the sidebar).
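For example, a quick sketch using a few of the non-linear activation modules from torch.nn (the particular ones chosen here are just examples):

```python
import torch
import torch.nn as nn

x = torch.linspace(-3, 3, steps=7)

# A few of the "Non-linear activations" modules listed in the PyTorch docs:
for act in (nn.ReLU(), nn.LeakyReLU(0.01), nn.GELU(), nn.SiLU(), nn.Softplus()):
    print(act.__class__.__name__, act(x))
```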
The choice of activation function for a neural network depends on the objective and on the layer in which it is being used. In most recent research using neural networks or convolutional neural networks, ReLU or LeakyReLU activation functions are used for the hidden (intermediate) layers. The choice of activation for the final (output) layer depends on the output requirement. For example, we can use the sigmoid activation for binary classification problems (output is either 0 or 1) or for multi-label classification problems. For multi-class classification, the softmax activation function is used. We can go for a linear activation for regression problems.
Hence, consider your objective before choosing an activation function for the output layer of a neural network. Look at the labels or targets (for supervised learning).
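A small PyTorch sketch of those output-layer conventions (my own illustration; the layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

features = torch.randn(8, 16)   # a batch of 8 examples, 16 features each

# Binary (or multi-label) classification: one sigmoid unit per label.
binary_head = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid())

# Multi-class classification: softmax over the class scores.
multiclass_head = nn.Sequential(nn.Linear(16, 5), nn.Softmax(dim=-1))

# Regression: a plain linear ("identity" activation) output.
regression_head = nn.Linear(16, 1)

print(binary_head(features).shape)            # probabilities in (0, 1)
print(multiclass_head(features).sum(dim=-1))  # each row sums to 1
print(regression_head(features).shape)        # unbounded real values
```

Note that in practice the softmax (or sigmoid) is often left out of the model and folded into the loss (e.g. nn.CrossEntropyLoss or nn.BCEWithLogitsLoss) for numerical stability; the sketch above only illustrates which output range each task calls for.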
If one properly understood the relationship between neural networks and optimal decision theory (in my experience, no one does), then one would find that the appropriate non-linear operator at the heart of an AI neuron is closely related to the appropriate log-likelihood-ratio (LLR) transformation.
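One concrete instance of that connection: for two classes with equal priors, the Bayes posterior probability is exactly the logistic sigmoid applied to the LLR. A minimal sketch with two Gaussian class-conditional densities (my own example, with arbitrary means and variance):

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    # Univariate Gaussian density.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Two equally likely classes with Gaussian class-conditional densities.
mu0, mu1, sigma = -1.0, 1.0, 1.0
x = 0.3

p0 = gauss_pdf(x, mu0, sigma)
p1 = gauss_pdf(x, mu1, sigma)

# Posterior probability of class 1 from Bayes' rule (equal priors)...
posterior = p1 / (p0 + p1)

# ...equals the logistic sigmoid of the log-likelihood ratio.
llr = np.log(p1 / p0)
print(posterior, 1.0 / (1.0 + np.exp(-llr)))  # the two numbers match
```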