20. Index basics

What is an index? 

Imagine a giant book with 10 million pages.
You want the page that talks about “Spark ETL”.

Without an index, you would:

  • Flip page 1

  • Flip page 2

  • Flip page 3...
    …until you find it.

That’s slow.
Now imagine doing that on a database table with a million rows.
Same pain.

An index is a shortcut : like the index at the back of a book.

In books:

  • “Spark ETL → Page 567”

In databases:

  • WHERE name = "Kavana"
    → The index jumps directly to the row instead of scanning everything.


Why do we need an index?

Without an index:
Database checks EVERY row. (called full table scan)

With an index:
Database jumps to the exact row instantly.

Speed difference:

  • Without index: O(n) → slow as the table grows

  • With index: O(log n) → lightning fast


Small SQL example 

Table: users

Columns: idnameemail

If we search:

    SELECT *
    FROM users
    WHERE email = "kav@example.com";

Without index on email:

DB checks all 5 million rows.

With index on email

DB jumps directly to the row.

Creating the index:

CREATE INDEX idx_email ON users(email);
 CREATE INDEX idx_email ON Users(email);
 SELECT *
    FROM users
    WHERE email = "kav@example.com";
Boom. Search is 100x faster.

What indexes actually store 

An index is like a mini sorted phonebook that maps:

email → row location namerow location id → row location

Think:

  • index = “cheat sheet”

  • table = “full data”


Downside 

Indexes speed up searching, but slow down inserts/updates, because the index must stay sorted.

So:

  • Good for searching (WHERE, JOIN)

  • Too many indexes = slower writes

Comments

Popular Posts