Machine Learning

From Basic to Advanced: The DFS & BFS of Machine Learning This blog is my journey through the entire landscape of Machine Learning - from the simplest building blocks to the most advanced concepts. Think of it as exploring ML in two ways: BFS (Breadth-First Search): covering a wide range of topics - algorithms, math, coding, and applications. DFS (Depth-First Search): diving deep into the core principles behind models, optimization, and real-world problem solving.

Tuesday, November 18, 2025

Data Ingestion

Data ingestion, in essence, involves transferring data from a source to a designated target.

Its primary aim is to usher data into an environment! ronment primed for staging, processing, analysis, and artificial intel! ligence/machine learning (AI/ML). While massive organizations may focus on moving data internally (among teams), for most of us, data ingestion emphasizes pulling data from external sources and directing it to in-house targets.

ETL steps

Data ingestion: now vs then

Old world: traditional ETL
- Extract → Transform → Load
- You heavily clean/transform before loading into a warehouse.
New world: mostly ELT
- Extract → Load everything into cheap cloud storage → Transform later
- Storage is cheap, compute is flexible, so people prefer “store first, think later”.
Big trends that changed ingestion:
- Cloud + warehouses + lakehouses
- Streaming/real-time data, not just nightly batches

So: we still do “ETL”, but the order + tools changed.

Monday, November 17, 2025

20. Index basics

What is an index?

Imagine a giant book with 10 million pages.
You want the page that talks about “Spark ETL”.

Without an index, you would:

Flip page 1
Flip page 2
Flip page 3...
…until you find it.

That’s slow.
Now imagine doing that on a database table with a million rows.
Same pain.

An index is a shortcut : like the index at the back of a book.

In books:

“Spark ETL → Page 567”

In databases:

WHERE name = "Kavana"
→ The index jumps directly to the row instead of scanning everything.

Why do we need an index?

Without an index:
Database checks EVERY row. (called full table scan)

With an index:
Database jumps to the exact row instantly.

Speed difference:

Without index: O(n) → slow as the table grows
With index: O(log n) → lightning fast

Small SQL example

Table: `users`

Columns: id, name, email

If we search:

    SELECT *
    FROM users
    WHERE email = "kav@example.com";

Without index on email:

DB checks all 5 million rows.

With index on email

DB jumps directly to the row.

Creating the index:


CREATE INDEX idx_email ON users(email);
 CREATE INDEX idx_email ON Users(email);
 SELECT *
    FROM users
    WHERE email = "kav@example.com";
Boom. Search is 100x faster.

What indexes actually store

An index is like a mini sorted phonebook that maps:


email → row location
name → row location
id → row location

Think:

index = “cheat sheet”
table = “full data”

Downside

Indexes speed up searching, but slow down inserts/updates, because the index must stay sorted.

So:

Good for searching (WHERE, JOIN)
Too many indexes = slower writes

Wednesday, November 5, 2025

15. Window Functions

Powering Advanced Analytics in SQL

Window Functions perform calculations across a set of table rows that are somehow related to the current row, but crucially, they do not collapse the rows. This allows for sophisticated analytical tasks like ranking, calculating running totals, or comparing a row's value to a previous or next row, all within the same result set.

What is a Trigger?

A trigger in SQL is a special kind of stored program that automatically executes when certain events occur in a table — such as an INSERT, UPDATE, or DELETE.

Think of it like a “background script” that runs whenever your data changes — no manual call needed!

Trigger Syntax



CREATE TRIGGER TriggerName(
{ BEFORE | AFTER } { INSERT | UPDATE | DELETE }
ON Table_Name
FOR EACH ROW
BEGIN 
--
END;

Example: Automatically Update Inventory After a Sale
Trigger Code 
CREATE TRIGGER Update_Inventory_Trigger(
AFTER INSERT 
ON Sales
FOR EACH ROW
BEGIN 
UPDATE Inventory
SET quantity = quantity - :NEW.qantity
WHERE product_id = :NEW.product_id
END;

Explanation

:NEW

→ Refers to the newly inserted row in Sales

12. Stored Procedure

A Stored Procedure is a set of SQL statements that performs an operation or sequence of operations. Think of it as a complete program stored on the database server.

Example Use Case: Automated Order Processing

A procedure might be used to finalize an order

Update the Inventory table (UPDATE).
Insert a new record into the Sales table (INSERT).
Commit the transaction.

Oracle PL/SQL Stored Procedure

`CREATE OR REPLACE PROCEDURE Finalize_Order ( ... )`

Purpose: Defines a stored procedure in Oracle.
CREATE OR REPLACE: Creates a new procedure or replaces an existing one with the same name.
Finalize_Order: The name of the stored procedure.
(...): The parameter list—used to pass input values into the procedure.


CREATE OR REPLACE PROCEDURE Finalize_Order(

p_product_id   IN NUMBER,
p_quantity     IN NUMBER,
p_customer_id  IN NUMBER

Each line defines a parameter:
- p_product_id: product being sold.
- p_quantity: how many units are sold.
- p_customer_id: the buyer’s ID.
IN → means data flows into the procedure (input-only parameter).

NUMBER → defines the datatype.

`IS`

IS

Marks the start of the procedure body in PL/SQL.
Similar to saying “beginning of declaration and code section.”

Use SET SERVEROUTPUT ON; in SQL Developer or SQL*Plus to see the output.

`BEGIN`

BEGIN

Marks the start of executable statements.
All SQL logic (update, insert, etc.) goes between BEGIN and END.

Use SET SERVEROUTPUT ON; in SQL Developer or SQL*Plus to see the output.

Update Inventory

UDATE Inventory

SET quantity = quantity - p_quantity

WHERE product_id = p_product_id;

Reduces stock quantity by the amount sold.
Inventory → table name.
SET quantity = quantity - p_quantity → subtracts ordered quantity from available stock.
WHERE product_id = p_product_id → ensures only that product’s row is updated.

Insert Sale Record

INSERT INTO Sales(sales_id, product_id, customer_id,

quantity, sale_date)

VALUES (sales_id.NEXTVAL, p_product_id,

p_customer_id, p_quantity, SYSDATE);

Adds a new row into the Sales table for this transaction.
sales_seq.NEXTVAL → uses a sequence named sales_seq to generate a unique sale ID automatically.
SYSDATE → system date/time (when the sale happens).

Commit Transaction


COMMIT;

Makes the changes permanent in the database (update + insert).
Without COMMIT, changes wouldn’t persist after the procedure ends.

COMMIT;

Success Message


DBMS_OUTPUT.PUT_LINE('Order finalized successfully.');

Prints a message to the SQL console (for debugging/confirmation).
DBMS_OUTPUT is an Oracle package for displaying messages.
Use SET SERVEROUTPUT ON; in SQL Developer or SQL*Plus to see the output.

DBMS_OUTPUT.PUT_LINE('Order finalized successfully.');

Full code:


CREATE OR REPLACE PROCEDURE Finalize_Order(

p_product_id   IN NUMBER,
p_quantity     IN NUMBER,
p_customer_id  IN NUMBER

IS
BEGIN
  -- Step 1 : Update Inventory
  UDATE Inventory  SET quantity = quantity - p_quantity
  WHERE product_id = p_product_id;
  -- Step 2 : Insert Sales Values 
  INSERT INTO Sales(sales_id, product_id,
 customer_id, quantity, sale_date)
  VALUES (sales_id.NEXTVAL, p_product_id,
 p_customer_id, p_quantity, SYSDATE);
COMMIT;
DBMS_OUTPUT.PUT_LINE('Order finalized successfully.');
END Finalize_Order;

To Run it:

SET SERVEROUTPUT ON;
EXEC Finalize_Order(101, 2, 501);

Fl Key Characteristics

Purpose: To execute a sequence of DML (INSERT, UPDATE, DELETE) or DDL (CREATE, ALTER) commands, or complex retrieval tasks.
Return Value: Can return zero, one, or multiple result sets (tables of data). It can also return a single numeric value (an exit code or status code) to indicate success or failure.
Execution: Executed using the EXEC or EXECUTE command.
Transactional Control: Procedures can manage transactions (using BEGIN TRAN, COMMIT, ROLLBACK), making them ideal for ensuring multi-step processes either fully succeed or completely fail.

11. Views and Materialized Views

In SQL, Views and Materialized Views are essential tools for both simplifying complex queries and significantly improving reporting performance. They don't store data themselves (in the case of a standard View) but rather act as permanent, pre-defined windows into your underlying data.

1. Views: The Virtual Table

A View is a saved SQL query that is stored in the database as a virtual table. When you query a View, the database executes the underlying SQL query in real-time.

Purpose and Usage

Simplification: Views hide the complexity of joining multiple tables. Instead of writing a complex 5-table JOIN query every time, you simply SELECT * FROM SalesReportView.
Security: You can restrict a user's access to only certain columns and rows by granting permission to the View, while denying access to the sensitive base tables.
Logic Abstraction: If the definition of a calculation (like "Active Users") changes, you update the View once, and all reports and applications relying on it are automatically updated.

Creation Example

This View combines employee names with their department names, hiding the complexity of the JOIN.

CREATE VIEW EmployeeDepartmentView AS
SELECT
    E.Name,
    D.DeptName,
    E.Salary
FROM
    Employees E
INNER JOIN
    Departments D ON E.DeptID = D.DeptID
WHERE
    E.IsActive = TRUE;

Key Feature: Real-Time Data

Views do not store data; they are just stored SQL code. Every time the View is queried, it pulls the very latest data from the base tables.

2. Materialized Views: The Performance Snapshot

A Materialized View (MV) is similar to a standard View, but it is different in a critical way: it stores the results of the query in a dedicated physical table.

Purpose and Usage

Performance: MVs are designed for speed. They pre-calculate and store the results of complex aggregations (like SUM,AVG,GROUP BY) or slow, multi-table joins.
Reporting: Ideal for dashboards and reports that query the same, large data set repeatedly, as the database simply reads from a static table rather than re-running the complex query.

Creation Example

This MV calculates total sales per city and stores the result.

CREATE MATERIALIZED VIEW CitySalesSummary AS
SELECT
    C.City,
    SUM(S.Quantity * P.Price) AS TotalRevenue
FROM
    Sales S
JOIN
    Customer C ON S.CustomerID = C.CustomerID
JOIN
    Product P ON S.ProductID = P.ProductID
GROUP BY
    C.City;

Key Feature: Refreshment

Since the data is static, MVs must be periodically updated or "refreshed."

Manual Refresh: The MV is only updated when you explicitly run the refresh command (e.g., REFRESH MATERIALIZED VIEW CitySalesSummary;).
Scheduled Refresh: In production environments, MVs are often refreshed automatically on a schedule (e.g., every hour, or once nightly) to balance data freshness against performance gains.

Tuesday, November 18, 2025

Data ingestion: now vs then

Monday, November 17, 2025

What is an index?

An index is a shortcut : like the index at the back of a book.

Why do we need an index?

Speed difference:

Small SQL example

Table: users

Without index on email:

With index on email

What indexes actually store

Downside

Wednesday, November 5, 2025

Powering Advanced Analytics in SQL

What is a Trigger?

Trigger Syntax

Example: Automatically Update Inventory After a Sale

Trigger Code

Explanation

Example Use Case: Automated Order Processing

Oracle PL/SQL Stored Procedure

CREATE OR REPLACE PROCEDURE Finalize_Order ( ... )

IS

BEGIN

Update Inventory

Insert Sale Record

Commit Transaction

Success Message

Full code:

To Run it:

Fl Key Characteristics

1. Views: The Virtual Table

Purpose and Usage

Creation Example

Key Feature: Real-Time Data

2. Materialized Views: The Performance Snapshot

Purpose and Usage

Creation Example

Key Feature: Refreshment

Table: `users`

`CREATE OR REPLACE PROCEDURE Finalize_Order ( ... )`

`IS`

`BEGIN`