4. Combining Data: Understanding SQL JOINs
SQL JOINs are fundamental for any data analysis, as they allow you to combine columns from two or more tables based on related column values. Without joins, the information scattered across separate tables remains siloed.
We'll use two simple tables, Employees and Departments, to illustrate the core join types.
Sample Data
| Employees Table | |||
| EmployeeID | Name | DeptID | Salary |
| 101 | Alice | D1 | 60000 |
| 102 | Bob | D2 | 80000 |
| 103 | Charlie | D1 | 75000 |
| 104 | David | D4 | 55000 |
| Departments Table | ||
| DeptID | DeptName | Location |
| D1 | Sales | East |
| D2 | Marketing | West |
| D3 | HR | Central |
1. INNER JOIN (The Intersection)
The INNER JOIN returns only the rows that have matching values in both tables. Any row without a match in the other table is excluded.
Concept: What employees are successfully matched to a department? (David, with DeptID D4, and the HR department, D3, have no match and are excluded.)
ID = D.DeptID;
| Name | DeptName |
| Alice | Sales |
| Bob | Marketing |
| Charlie | Sales |
2. LEFT JOIN (Keep All Left) 👈
The LEFT JOIN (or LEFT OUTER JOIN) returns all rows from the left table (the first table listed) and the matching rows from the right table. If there is no match in the right table, NULL values are returned for the right table's columns.
Concept: Show all employees, and if they have a department, show it. (We keep David, but his DeptName is NULL.)
| Name | DeptName |
| Alice | Sales |
| Bob | Marketing |
| Charlie | Sales |
| David | NULL |
3. RIGHT JOIN (Keep All Right)
The RIGHT JOIN (or RIGHT OUTER JOIN) returns all rows from the right table and the matching rows from the left table. If there is no match in the left table, NULL values are returned for the left table's columns.
Concept: Show all departments, and if they have employees, show them. (We include HR, but its employee name is NULL.)
| Name | DeptName |
| Alice | Sales |
| Bob | Marketing |
| Charlie | Sales |
| NULL | HR |
4. FULL JOIN (Keep All)
The FULL JOIN (or FULL OUTER JOIN) returns all rows when there is a match in either the left or the right table. If there is no match, it fills in the missing side with NULLs.
Concept: Show everything! (Includes both David and the HR department.)
SELECT E.Name, D.Name FROM Employee E FULL OUTER JOIN Department DON E.DeptID = D.DeptID;
Other Important Joins
CROSS JOIN: Returns the Cartesian Product—every row from the first table is combined with every row from the second table. This is rarely used in analysis unless you intentionally need to generate every possible combination.
SELF JOIN: This is not a unique type of join, but a technique where you join a table to itself. It's used when a table contains a relationship within its own data (e.g., finding all employees who report to the same manager, using two aliases for the same
Employeestable).
Comments
Post a Comment