I am training a one-layer feed-forward neural network for classification. Is it possible to use Leaky ReLU as the hidden-layer activation function in MATLAB, or should that activation only be used in networks with multiple layers?
As far as I know, it can be applied to a neural network, but I am not sure the result will be satisfactory. A single layer works for simple cases; for more complex problems you typically need to add something more, such as additional layers.
Using a single-layer feed-forward neural network is equivalent to taking a linear combination of the input data x through the neuron weights W, then passing the resulting value Wx through a nonlinear activation function f(Wx).
If you use a sigmoid as the activation function, you are essentially training a logistic regression; if you use a ReLU, you are clipping every result below 0 to 0; and with a Leaky ReLU you allow small (scaled) values for negative results.
With this in mind, a single-layer NN does not really exploit what multi-layer NNs offer in terms of nonlinear combinations of the input data, but, to answer your question, you can still use a Leaky ReLU as the activation function.
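For example, using the Deep Learning Toolbox layer API (not the older feedforwardnet functions), a one-hidden-layer classifier with a Leaky ReLU could look roughly like the sketch below; the variable names, the 0.01 scale, and the training options are only assumptions for illustration:

```matlab
% Minimal sketch, assuming the Deep Learning Toolbox layer API is available.
% X: N-by-numFeatures predictor matrix, Y: categorical labels (hypothetical data).
numFeatures = 10;      % assumed number of input features
hiddenUnits = 20;      % assumed size of the single hidden layer

layers = [
    featureInputLayer(numFeatures)
    fullyConnectedLayer(hiddenUnits)   % the single hidden layer
    leakyReluLayer(0.01)               % Leaky ReLU: negative inputs scaled by 0.01
    fullyConnectedLayer(2)             % two output neurons for binary classification
    softmaxLayer
    classificationLayer];

options = trainingOptions('adam', 'MaxEpochs', 50, 'Verbose', false);
net = trainNetwork(X, Y, layers, options);   % X, Y assumed to exist in the workspace
```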
In biologically inspired neural networks, the activation function is usually an abstraction representing the rate of action potential firing in the cell. In its simplest form, this function is binary, that is, either the neuron is firing or it is not. The function looks like φ(v_i) = U(v_i), where U is the Heaviside step function. In this case many neurons must be used in computation beyond linear separation of categories.
A line of positive slope may be used to reflect the increase in firing rate that occurs as input current increases. Such a function would be of the form φ(v_i) = μ v_i, where μ is the slope. This activation function is linear, and therefore has the same problems as the binary function. In addition, networks constructed using this model have unstable convergence because neuron inputs along favored paths tend to increase without bound, as this function is not normalizable.
All problems mentioned above can be handled by using a normalizable sigmoid activation function. One realistic model stays at zero until input current is received, at which point the firing frequency increases quickly at first, but gradually approaches an asymptote at 100% firing rate. Mathematically, this looks like φ(v_i) = U(v_i) tanh(v_i), where the hyperbolic tangent function can be replaced by any sigmoid function. This behavior is realistically reflected in the neuron, as neurons cannot physically fire faster than a certain rate. This model runs into problems, however, in computational networks, as it is not differentiable, which is a requirement for calculating backpropagation.
The final model, then, that is used in multilayer perceptrons is a sigmoidal activation function in the form of a hyperbolic tangent. Two forms of this function are commonly used: φ(v_i) = tanh(v_i), whose range is normalized from -1 to 1, and φ(v_i) = (1 + exp(-v_i))^(-1), which is vertically translated to normalize from 0 to 1. The latter model is often considered more biologically realistic, but it runs into theoretical and experimental difficulties with certain types of computational problems.
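For reference, the activation functions discussed above can be written directly in MATLAB as anonymous functions; this is only an illustrative sketch, and the slope used for the Leaky ReLU is an arbitrary assumption:

```matlab
% Plain-MATLAB sketch of the activation functions discussed above (no toolbox needed).
step_fn   = @(v) double(v >= 0);            % Heaviside step: binary firing
linear_fn = @(v, mu) mu .* v;               % linear activation with slope mu
tanh_fn   = @(v) tanh(v);                   % sigmoid, range (-1, 1)
logsig_fn = @(v) 1 ./ (1 + exp(-v));        % logistic sigmoid, range (0, 1)
relu_fn   = @(v) max(0, v);                 % ReLU: clips negative inputs to 0
lrelu_fn  = @(v, a) max(a .* v, v);         % Leaky ReLU: negative inputs scaled by a

v = linspace(-3, 3, 7);
disp(lrelu_fn(v, 0.01))   % negative values are scaled by 0.01 instead of clipped to 0
```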
Thanks for your answer. I used a ReLU for the hidden layer and logsig for the last layer. If I want a probability as the output, is the output of the logsig the probability? What should I do to get a probability for binary classification?
Yes, in the case of a binary classification problem, you can interpret the output of the logsig neuron as a probability.
In case your output layer has 2 neurons, each providing the score of the input data belonging to one of your two classes, you should normalize the scores so that they sum to 1 (for example with a softmax), as sketched below.
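A minimal sketch of that normalization, using made-up scores purely for illustration:

```matlab
% Hypothetical example: converting two raw output-neuron scores into
% probabilities with a softmax, so they are positive and sum to 1.
scores = [2.3; -0.7];                     % assumed raw outputs of the two neurons
p = exp(scores) ./ sum(exp(scores));      % softmax normalization
disp(p)   % the two entries sum to 1 and can be read as class probabilities
```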
I appreciate your comments. In the output layer I have just one neuron, and I get outputs between 0.5 and 1. Is it correct to get outputs in [0.5, 1] for the whole test set? In that case, is the output a probability?