What is a Cost Function?
In simple terms, a cost function (or loss function) measures how "wrong" our model's predictions are compared to the actual correct answers. The goal of training a neural network is to find the set of parameters (weights and biases) that minimizes this cost function. The lower the cost, the better our model is performing.
Consider a training set of $m$ examples, where each example is a pair of an input $x$ and an output $y$, written $(x, y)$. Suppose the neural network has $L$ layers, and let $s_l$ denote the number of units (not counting the bias unit) in layer $l$.
There are 2 types of output:
Binary classification: $y$ is either 0 or 1, and the network has a single output unit, so $s_L = 1$ and $K = 1$.
Multi-class classification with $K$ classes: $y$ is a vector of length $K$ (for example, one-hot encoded), and the network has $K$ output units, so $s_L = K$.
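To make this notation concrete, here is a small sketch in Python. The layer sizes, variable names, and random data are made up purely for illustration, not taken from any particular dataset:

```python
import numpy as np

# Hypothetical layer sizes, chosen just to make the notation concrete.
# Layers are numbered 1..L; s_l is the number of units in layer l
# (not counting the bias unit).
layer_sizes = [3, 5, 5, 4]   # s_1 = 3 inputs, two hidden layers, s_L = 4 outputs
L = len(layer_sizes)         # total number of layers, L = 4
K = layer_sizes[-1]          # number of output units (multi-class: K = 4)

# A training set of m examples: each row of X is an input x^(i),
# each row of Y is the matching target y^(i) as a one-hot vector of length K.
m = 10
X = np.random.rand(m, layer_sizes[0])
Y = np.eye(K)[np.random.randint(K, size=m)]
```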
The Cost Function for Logistic Regression
Before we jump into neural networks, let's quickly review the regularized cost function for logistic regression, which is a single-layer neural network. The formula looks like this:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
Let's break this down:
$J(\theta)$ is the cost function, which depends on our model's parameters, $\theta$.
$m$ is the number of training examples.
$y^{(i)}$ is the actual output for the i-th training example.
$h_\theta(x^{(i)})$ is the predicted output from our model for the i-th training example.
The first part, the sum of the log terms, is the primary cost. It penalizes the model when its predictions are wrong.
The second part, $\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$, is the regularization term. It helps prevent our model from overfitting the training data by penalizing large parameter values. $\lambda$ is the regularization parameter.
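As a concrete reference, here is one way this cost could be computed in Python with NumPy. The function and variable names are my own, and it assumes the design matrix already has a leading column of ones for the bias term:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Regularized logistic regression cost J(theta).

    X     : (m, n+1) design matrix whose first column is all ones (bias)
    y     : (m,) vector of 0/1 labels
    theta : (n+1,) parameter vector
    lam   : regularization parameter lambda
    """
    m = len(y)
    h = sigmoid(X @ theta)  # predictions h_theta(x^(i)) for every example

    # Primary cost: average cross-entropy over the m examples
    cost = -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

    # Regularization: penalize every parameter except the bias term theta_0
    reg = (lam / (2.0 * m)) * np.sum(theta[1:] ** 2)

    return cost + reg
```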
Extending to Neural Networks
A neural network is essentially a collection of interconnected logistic regression units. The cost function for a neural network is a generalization of the logistic regression cost function. The key difference is that a neural network can have multiple output neurons, especially for multi-class classification problems.
For a neural network with K output units, the cost function is:

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[y_k^{(i)}\log\left(\left(h_\Theta(x^{(i)})\right)_k\right) + \left(1 - y_k^{(i)}\right)\log\left(1 - \left(h_\Theta(x^{(i)})\right)_k\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{ji}^{(l)}\right)^2$$
This looks a bit more intimidating, but it's built on the same principles. Let's break it down piece by piece:
The Main Cost Term:
The double summation $\sum_{i=1}^{m}\sum_{k=1}^{K}$ means we are summing the cost over all training examples ($m$) and all output neurons ($K$).
$y_k^{(i)}$ is the actual value for the k-th output neuron for the i-th example.
$\left(h_\Theta(x^{(i)})\right)_k$ is the predicted value from the k-th output neuron for the i-th example.
The logic within the brackets is identical to the logistic regression cost. We're just applying it to each output neuron individually and summing the results.
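Assuming we already have the network's outputs for every example (for instance, from a forward pass), the main term could be sketched like this. The matrix shapes and the function name are assumptions for illustration only:

```python
import numpy as np

def nn_cost_unregularized(H, Y):
    """Main cost term of the neural network cost function.

    H : (m, K) matrix of network outputs, H[i, k] = (h_Theta(x^(i)))_k
    Y : (m, K) matrix of targets,         Y[i, k] = y_k^(i)
    """
    m = Y.shape[0]
    # Double sum over examples i and output units k of the logistic cost
    return -(1.0 / m) * np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H))
```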
The Regularization Term:
This is where things get a little more complex because a neural network has many layers and many weights.
$\sum_{l=1}^{L-1}$ sums over all the layers of weights in the network. $L$ is the total number of layers; since each weight matrix $\Theta^{(l)}$ maps layer $l$ to layer $l+1$, there are only $L-1$ of them.
$\sum_{i=1}^{s_l}$ sums over all the neurons in layer $l$. $s_l$ is the number of units in layer $l$.
$\sum_{j=1}^{s_{l+1}}$ sums over all the neurons in layer $l+1$. $s_{l+1}$ is the number of units in layer $l+1$.
$\left(\Theta_{ji}^{(l)}\right)^2$ is the square of a single weight. Specifically, the weight connecting the i-th unit of layer $l$ to the j-th unit of layer $l+1$.
The triple summation simply means we're summing the squares of all the weights in the entire network. We do not regularize the bias terms ($\Theta_{j0}^{(l)}$), since they multiply the constant bias unit rather than an input feature; that's why the sum over $i$ starts at 1 instead of 0.
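Putting the triple summation into code, the regularization term could be written as below. This sketch assumes each weight matrix stores the bias weights in its first column, which is a common convention but an assumption here:

```python
import numpy as np

def nn_regularization(Thetas, lam, m):
    """Regularization term: sum of squared weights over every layer.

    Thetas : list of weight matrices; Thetas[l] has shape (s_{l+1}, s_l + 1),
             where column 0 holds the bias weights Theta_{j0}^{(l)}
    lam    : regularization parameter lambda
    m      : number of training examples
    """
    total = 0.0
    for Theta in Thetas:
        # Skip the first column (bias weights) when summing the squares
        total += np.sum(Theta[:, 1:] ** 2)
    return (lam / (2.0 * m)) * total
```

The full cost is then simply the sum of the main term and this regularization term.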
In essence, the neural network cost function is a powerful tool that combines the principles of logistic regression with a comprehensive regularization scheme to handle the complexity of multi-layered networks. By minimizing this function, we're able to find the optimal weights that allow our network to make accurate predictions.