"Cost Function": Why We Care About Error in Machine Learning
In our last post, we talked about how a Machine Learning model learns a "hypothesis" (h(x)) to predict an output (y) based on an input (x). But how do we know if our hypothesis is any good? How do we know if it's making accurate predictions? This is where the Cost Function comes into play.
Imagine you're trying to draw a straight line that best fits a bunch of scattered data points on a graph. You might draw one line, then another, and another, trying to find the one that seems "closest" to all the points. The Cost Function is essentially the mathematical way for our computer to figure out how "close" its current hypothesis line is to the actual data points.
Our Simple Hypothesis: A Straight Line
For many basic Machine Learning problems, especially when we're trying to predict a continuous value (like house prices), our hypothesis often starts simple. For example, in linear regression, our hypothesis h(x) looks like this:
h(x) = θ0 + θ1x
If you've ever dealt with equations of a straight line, you'll recognize this! It's just like y = mx + c, where:
θ0 (theta-naught) is like your 'c' or the y-intercept (where the line crosses the vertical axis).
θ1 (theta-one) is like your 'm' or the slope (how steep the line is).
Our goal is to choose the best values for θ0 and θ1 so that our predicted h(x) is as close as possible to the actual output y for every single data point (x, y) in our training set.
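To make this concrete, here's a minimal Python sketch of that straight-line hypothesis. The function name and the parameter values (an intercept of 50,000 and a slope of 100) are just made-up illustrations, not numbers from any real dataset:

```python
def hypothesis(x, theta0, theta1):
    """Straight-line hypothesis: h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Illustrative parameters only: intercept of 50,000 and a slope of 100
# ("price goes up by 100 for every extra square foot").
print(hypothesis(1500, theta0=50_000, theta1=100))  # 200000
```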
Why Do We Care About "Error"?
This is the crucial part! We need to know how "wrong" our current hypothesis is. This "wrongness" is what we call error.
Think of it this way:
Predicted Value: This is h(x), what our model thinks the output should be.
Actual Value: This is y, the true output from our training data.
The difference between these two is the error for a single data point: Error = Predicted Value - Actual Value, or in our notation, h(x) - y.
If our model predicts a house price of $200,000 when the actual price was $205,000, the error is $200,000 - $205,000 = -$5,000. If it predicts $210,000, the error is $210,000 - $205,000 = +$5,000.
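In code, the per-example error is just that subtraction (reusing the made-up prices from the example above):

```python
actual_price = 205_000

error_too_low = 200_000 - actual_price   # -5000: the prediction was too low
error_too_high = 210_000 - actual_price  # +5000: the prediction was too high
```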
The Mean Squared Error (MSE) Cost Function
Just looking at the error for one point isn't enough. We need to measure the total "wrongness" across all our training data points. The most common way to do this for regression problems is using the Mean Squared Error (MSE), which serves as our Cost Function.
The Cost Function, denoted as J(θ0, θ1), looks like this:
J(θ0, θ1) = (1 / 2m) × ∑ᵢ₌₁ᵐ ( h(x⁽ⁱ⁾) - y⁽ⁱ⁾ )²
Let's break this down piece by piece:
( h(x⁽ⁱ⁾) - y⁽ⁱ⁾ )²: This is the "squared error" for a single training example i. We square the error for two main reasons:
To make all errors positive: Whether our prediction is too high or too low, squaring it makes the error positive. This prevents positive and negative errors from canceling each other out when we sum them up.
To penalize larger errors more heavily: Squaring makes larger errors much more significant than smaller ones. This encourages the model to avoid big mistakes.
∑ᵢ₌₁ᵐ: This is the summation symbol. It means we add up all the squared errors for every single training example, from the first one (i = 1) all the way to the last one (i = m). So, if you have 100 houses in your training set, you'll calculate the squared error for each house and add all 100 values together.
1 / 2m: This is the averaging part (mostly). We divide by m (the total number of training examples) to get the "mean" or average of all the squared errors. The extra 2 in the denominator is there for mathematical convenience; it simplifies calculations later on when we optimize the function.
minimize over θ0, θ1 of J(θ0, θ1): This is our ultimate goal! We want to find the values of θ0 and θ1 that make the Cost Function J(θ0, θ1) as small as possible (the code sketch below puts all of these pieces together).
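Putting those pieces together, here is a small Python sketch of this MSE cost function. The function and variable names, and the tiny toy dataset, are my own illustrations:

```python
def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = 1/(2m) * sum over i of (h(x_i) - y_i)^2."""
    m = len(xs)
    squared_errors = [(theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)]
    return sum(squared_errors) / (2 * m)

# Toy training set: house sizes in square feet (x) and prices (y)
xs = [1000, 1500, 2000]
ys = [150_000, 205_000, 260_000]

print(cost(50_000, 100, xs, ys))  # one number summarizing the total "wrongness"
```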
Why We Use Error to Improve the Model
Think back to our line-drawing analogy. If your first line is far away from many points (high cost), you know you need to adjust its slope (θ1) and intercept (θ0). You'll nudge θ0 and θ1 a bit, redraw the line, and then calculate the cost again. If the cost went down, you're moving in the right direction! If it went up, you know you need to adjust in the opposite direction.
The Cost Function gives us a single number that tells us "how well" our current hypothesis fits the training data. Our entire aim in training a Machine Learning model is to minimize this Cost Function. By continuously calculating the error and adjusting our model's parameters (θ0 and θ1 in this case) to reduce that error, we guide the model towards learning the best possible relationship between inputs and outputs.
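As a toy illustration of that "nudge, redraw, re-check" idea (this is not a real training algorithm like gradient descent, just the cost comparison described above, and it assumes the cost() sketch and toy data from the previous block):

```python
theta0, theta1 = 50_000, 100   # current guesses for intercept and slope
step = 1.0                     # how much to nudge the slope by

current_cost = cost(theta0, theta1, xs, ys)
nudged_cost = cost(theta0, theta1 + step, xs, ys)

if nudged_cost < current_cost:
    theta1 += step   # cost went down: keep moving the slope in this direction
else:
    theta1 -= step   # cost went up: adjust in the opposite direction
```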
Next up - https://kavanamlstuff.blogspot.com/2025/08/cost-function.html