SQL Notes

Introduction to SQL

What is Database?

Database is a collection of interrelated data.

What is DBMS?

DBMS (Database Management System) is software used to create, manage, and organize databases.

What is RDBMS?

RDBMS (Relational Database Management System) - is a DBMS based on the concept of tables (also called relations).
Data is organized into tables (also known as relations) with rows (records) and columns (attributes).
Examples: MySQL, PostgreSQL, Oracle etc.

What is SQL?

SQL is Structured Query Language - used to store, manipulate and retrieve data from RDBMS.

(It is not a database, it is a language used to interact with database)

We use SQL for CRUD Operations:

CREATE - To create databases, tables, insert tuples in tables etc
READ - To read data present in the database.
UPDATE - Modify already inserted data.
DELETE - Delete database, table or specific data point/tuple/row or multiple rows.

Note: SQL keywords are NOT case sensitive. Eg: select is the same as SELECT in SQL.

SQL v/s MySQL

SQL is a language used to perform CRUD operations in Relational DB, while MySQL is a RDBMS that uses SQL.

SQL Data Types

In SQL, data types define the kind of data that can be stored in a column or variable.

DATATYPE	DESCRIPTION	USAGE
CHAR	string(0-255), can store characters of fixed length	CHAR(50)
VARCHAR	string(0-255), can store characters up to given length	VARCHAR(50)
BLOB	string(0-65535), can store binary large object	BLOB(1000)
INT	integer(-2,147,483,648 to 2,147,483,647)	INT
TINYINT	integer(-128 to 127)	TINYINT
BIGINT	integer(-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)	BIGINT
BIT	can store x-bit values. x can range from 1 to 64	BIT(2)
FLOAT	Decimal number - with precision to 23 digits	FLOAT
DOUBLE	Decimal number - with 24 to 53 digits	DOUBLE
BOOLEAN	Boolean values 0 or 1	BOOLEAN
DATE	date in format of YYYY-MM-DD ranging from 1000-01-01 to 9999-12-31	DATE
TIME	HH:MM:SS	TIME
YEAR	year in 4 digits format ranging from 1901 to 2155	YEAR

Note: CHAR is for fixed length & VARCHAR is for variable length strings. Generally, VARCHAR is better as it only occupies necessary memory & works more efficiently.

We can also use UNSIGNED with datatypes when we only have positive values to add. Eg - UNSIGNED INT

Types of SQL Commands

DQL (Data Query Language): Used to retrieve data from databases. (SELECT)
DDL (Data Definition Language): Used to create, alter, and delete database objects like tables, indexes, etc. (CREATE, DROP, ALTER, RENAME, TRUNCATE)
DML (Data Manipulation Language): Used to modify the database. (INSERT, UPDATE, DELETE)
DCL (Data Control Language): Used to grant & revoke permissions. (GRANT, REVOKE)
TCL (Transaction Control Language): Used to manage transactions. (COMMIT, ROLLBACK, START TRANSACTIONS, SAVEPOINT)

Data Definition Language (DDL)

Data Definition Language (DDL) is a subset of SQL (Structured Query Language) responsible for defining and managing the structure of databases and their objects.

DDL commands enable you to create, modify, and delete database objects like tables, indexes, constraints, and more.

CREATE TABLE

Used to create a new table in the database.
Specifies the table name, column names, data types, constraints, and more.

CREATE TABLE employees (
    id INT PRIMARY KEY, 
    name VARCHAR(50), 
    salary DECIMAL(10, 2)
);

ALTER TABLE

Used to modify the structure of an existing table.
You can add, modify, or drop columns, constraints, and more.

ALTER TABLE employees ADD COLUMN email VARCHAR(100);

DROP TABLE

Used to delete an existing table along with its data and structure.

DROP TABLE employees;

CREATE INDEX

Used to create an index on one or more columns in a table.
Improves query performance by enabling faster data retrieval.

CREATE INDEX idx_employee_name ON employees (name);

DROP INDEX

Used to remove an existing index from a table.

DROP INDEX idx_employee_name;

CREATE CONSTRAINT

Used to define constraints that ensure data integrity.
Constraints include PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL, and CHECK.

ALTER TABLE orders ADD CONSTRAINT fk_customer 
FOREIGN KEY (customer_id) REFERENCES customers(id);

DROP CONSTRAINT

Used to remove an existing constraint from a table.

ALTER TABLE orders DROP CONSTRAINT fk_customer;

TRUNCATE TABLE

Used to delete the data inside a table, but not the table itself.

TRUNCATE TABLE table_name

Data Query/Retrieval Language (DQL or DRL)

DQL (Data Query Language) is a subset of SQL focused on retrieving data from databases.

The SELECT statement is the foundation of DQL and allows us to extract specific columns from a table.

SELECT

The SELECT statement is used to select data from a database.

-- Select specific columns
SELECT column1, column2, ... FROM table_name;

-- Select all columns
SELECT * FROM table_name;

-- Example
SELECT CustomerName, City FROM Customers;

WHERE

The WHERE clause is used to filter records.

SELECT column1, column2, ... FROM table_name WHERE condition;

-- Example
SELECT * FROM Customers WHERE Country='Mexico';

Operators used in WHERE:

= : Equal
> : Greater than
< : Less than
>= : Greater than or equal
<= : Less than or equal
<> : Not equal (In some versions of SQL this operator may be written as !=)

AND, OR and NOT

The WHERE clause can be combined with AND, OR, and NOT operators.
The AND and OR operators are used to filter records based on more than one condition:
- The AND operator displays a record if all the conditions separated by AND are TRUE.
- The OR operator displays a record if any of the conditions separated by OR is TRUE.
The NOT operator displays a record if the condition(s) is NOT TRUE.

-- AND syntax
SELECT column1, column2, ... 
FROM table_name 
WHERE condition1 AND condition2 AND condition3 ...;

-- OR syntax
SELECT column1, column2, ... 
FROM table_name 
WHERE condition1 OR condition2 OR condition3 ...;

-- NOT syntax
SELECT column1, column2, ... 
FROM table_name 
WHERE NOT condition;

-- Examples
SELECT * FROM Customers WHERE Country='India' AND City='Japan';
SELECT * FROM Customers WHERE Country='America' AND (City='India' OR City='Korea');

DISTINCT

Removes duplicate rows from query results.

SELECT DISTINCT column1, column2 FROM table_name;

LIKE

The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.

There are two wildcards often used in conjunction with the LIKE operator:

The percent sign (%) represents zero, one, or multiple characters
The underscore sign (_) represents one, single character

-- Examples
SELECT * FROM employees WHERE first_name LIKE 'J%';

WHERE CustomerName LIKE 'a%'      -- Finds values that start with "a"
WHERE CustomerName LIKE '%a'      -- Finds values that end with "a"
WHERE CustomerName LIKE '%or%'    -- Finds values that have "or" in any position
WHERE CustomerName LIKE '_r%'     -- Finds values that have "r" in the second position
WHERE CustomerName LIKE 'a_%'     -- Finds values that start with "a" and are at least 2 chars
WHERE CustomerName LIKE 'a__%'    -- Finds values that start with "a" and are at least 3 chars
WHERE ContactName LIKE 'a%o'      -- Finds values that start with "a" and ends with "o"

IN

Filters results based on a list of values in the WHERE clause.

SELECT * FROM products WHERE category_id IN (1, 2, 3);

BETWEEN

Filters results within a specified range in the WHERE clause.

SELECT * FROM orders WHERE order_date BETWEEN '2023-01-01' AND '2023-06-30';

IS NULL

Checks for NULL values in the WHERE clause.

SELECT * FROM customers WHERE email IS NULL;

AS

Renames columns or expressions in query results.

SELECT first_name AS "First Name", last_name AS "Last Name" FROM employees;

ORDER BY

The ORDER BY clause allows you to sort the result set of a query based on one or more columns.

-- Basic syntax
SELECT column1, column2 FROM table_name ORDER BY column1 [ASC|DESC];

-- Example (descending order)
SELECT product_name, price FROM products ORDER BY price DESC;

-- Sorting by multiple columns
SELECT first_name, last_name FROM employees ORDER BY last_name, first_name;

-- Sorting by expressions
SELECT product_name, price, price * 1.1 AS discounted_price 
FROM products ORDER BY discounted_price;

-- Sorting NULL values
SELECT column_name FROM table_name ORDER BY column_name NULLS LAST;

-- Sorting by position
SELECT product_name, price FROM products ORDER BY 2 DESC, 1 ASC;

GROUP BY

The GROUP BY clause in SQL is used to group rows from a table based on one or more columns.

-- Basic syntax
SELECT column1, aggregate_function(column2) 
FROM table_name 
GROUP BY column1;

-- Example with aggregation
SELECT department, AVG(salary) FROM employees GROUP BY department;

-- Grouping by multiple columns
SELECT department, gender, AVG(salary) 
FROM employees 
GROUP BY department, gender;

-- HAVING clause
SELECT department, AVG(salary) 
FROM employees 
GROUP BY department 
HAVING AVG(salary) > 50000;

-- Combining GROUP BY and ORDER BY
SELECT department, COUNT(*) 
FROM employees 
GROUP BY department 
ORDER BY COUNT(*) DESC;

AGGREGATE FUNCTIONS

These are used to perform calculations on groups of rows or entire result sets. They provide insights into data by summarising and processing information.

Common Aggregate Functions:

COUNT(): Counts the number of rows in a group or result set.
SUM(): Calculates the sum of numeric values in a group or result set.
AVG(): Computes the average of numeric values in a group or result set.
MAX(): Finds the maximum value in a group or result set.
MIN(): Retrieves the minimum value in a group or result set.

Data Manipulation Language (DML)

Data Manipulation Language (DML) in SQL encompasses commands that manipulate data within a database. DML allows you to insert, update, and delete records, ensuring the accuracy and currency of your data.

INSERT

The INSERT statement adds new records to a table.

-- Syntax
INSERT INTO table_name (column1, column2, ...) 
VALUES (value1, value2, ...);

-- Example
INSERT INTO employees (first_name, last_name, salary) 
VALUES ('John', 'Doe', 50000);

UPDATE

The UPDATE statement modifies existing records in a table.

-- Syntax
UPDATE table_name 
SET column1 = value1, column2 = value2, ... 
WHERE condition;

-- Example
UPDATE employees SET salary = 55000 WHERE first_name = 'John';

DELETE

The DELETE statement removes records from a table.

-- Syntax
DELETE FROM table_name WHERE condition;

-- Example
DELETE FROM employees WHERE last_name = 'Doe';

Data Control Language (DCL)

Data Control Language focuses on the management of access rights, permissions, and security-related aspects of a database system.

DCL commands are used to control who can access the data, modify the data, or perform administrative tasks within a database.

DCL is an important aspect of database security, ensuring that data remains protected and only authorised users have the necessary privileges.

There are two main DCL commands in SQL: GRANT and REVOKE.

GRANT

The GRANT command is used to provide specific privileges or permissions to users or roles. Privileges can include the ability to perform various actions on tables, views, procedures, and other database objects.

-- Syntax
GRANT privilege_type
ON object_name
TO user_or_role;

-- Example: Granting SELECT privilege
GRANT SELECT ON Employees TO Analyst;

REVOKE

The REVOKE command is used to remove or revoke specific privileges or permissions that have been previously granted to users or roles.

-- Syntax
REVOKE privilege_type
ON object_name
FROM user_or_role;

-- Example: Revoking SELECT privilege
REVOKE SELECT ON Employees FROM Analyst;

DCL and Database Security

DCL plays a crucial role in ensuring the security and integrity of a database system.

By controlling access and permissions, DCL helps prevent unauthorised users from tampering with or accessing sensitive data. Proper use of GRANT and REVOKE commands ensures that only users who require specific privileges can perform certain actions on database objects.

Transaction Control Language (TCL)

Transaction Control Language (TCL) deals with the management of transactions within a database.

TCL commands are used to control the initiation, execution, and termination of transactions, which are sequences of one or more SQL statements that are executed as a single unit of work.

Transactions ensure data consistency, integrity, and reliability in a database by grouping related operations together and either committing or rolling back changes based on the success or failure of those operations.

There are three main TCL commands in SQL: COMMIT, ROLLBACK, and SAVEPOINT.

COMMIT

The COMMIT command is used to permanently save the changes made during a transaction.

It makes all the changes applied to the database since the last COMMIT or ROLLBACK command permanent.

Once a COMMIT is executed, the transaction is considered successful, and the changes are made permanent.

-- Example: Committing changes
UPDATE Employees
SET Salary = Salary * 1.10
WHERE Department = 'Sales';

COMMIT;

ROLLBACK

The ROLLBACK command is used to undo changes made during a transaction.

It reverts all the changes applied to the database since the transaction began.

ROLLBACK is typically used when an error occurs during the execution of a transaction, ensuring that the database remains in a consistent state.

-- Example: Rolling back changes
BEGIN;

UPDATE Inventory
SET Quantity = Quantity - 10
WHERE ProductID = 101;

-- An error occurs here

ROLLBACK;

SAVEPOINT

The SAVEPOINT command creates a named point within a transaction, allowing you to set a point to which you can later ROLLBACK if needed.

SAVEPOINTs are useful when you want to undo part of a transaction while preserving other changes.

-- Syntax
SAVEPOINT savepoint_name;

-- Example: Using SAVEPOINT
BEGIN;

UPDATE Accounts
SET Balance = Balance - 100
WHERE AccountID = 123;

SAVEPOINT before_withdrawal;

UPDATE Accounts
SET Balance = Balance + 100
WHERE AccountID = 456;

-- An error occurs here

ROLLBACK TO before_withdrawal;

-- The first update is still applied

COMMIT;

TCL and Transaction Management

Transaction Control Language (TCL) commands are vital for managing the integrity and consistency of a database's data.

They allow you to group related changes into transactions, and in the event of errors, either commit those changes or roll them back to maintain data integrity.

TCL commands are used in combination with Data Manipulation Language (DML) and other SQL commands to ensure that the database remains in a reliable state despite unforeseen errors or issues.

JOINS

In a DBMS, a join is an operation that combines rows from two or more tables based on a related column between them.

Joins are used to retrieve data from multiple tables by linking them together using a common key or column.

Types of Joins:

Inner Join
Outer Join
Cross Join
Self Join

1) Inner Join

An inner join combines data from two or more tables based on a specified condition, known as the join condition.

The result of an inner join includes only the rows where the join condition is met in all participating tables.

It essentially filters out non-matching rows and returns only the rows that have matching values in both tables.

-- Syntax
SELECT columns
FROM table1
INNER JOIN table2
ON table1.Column = table2.Column;

Example:

Consider two tables: Customers and Orders.

Customers Table:
CustomerID	CustomerName
1	Alice
2	Bob
3	Carol

Orders Table:
OrderID	CustomerID	Product
101	1	Laptop
102	3	Smartphone
103	2	Headphones

-- Inner Join Query
SELECT Customers.CustomerName, Orders.Product
FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

Result:
CustomerName	Product
Alice	Laptop
Bob	Headphones
Carol	Smartphone

2) Outer Join

Outer joins combine data from two or more tables based on a specified condition, just like inner joins. However, unlike inner joins, outer joins also include rows that do not have matching values in both tables.

Outer joins are particularly useful when you want to include data from one table even if there is no corresponding match in the other table.

Types:

There are three types of outer joins: left outer join, right outer join, and full outer join.

1. Left Outer Join (Left Join):

A left outer join returns all the rows from the left table and the matching rows from the right table.

If there is no match in the right table, the result will still include the left table's row with NULL values in the right table's columns.

-- Example
SELECT Customers.CustomerName, Orders.Product
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

2. Right Outer Join (Right Join):

A right outer join is similar to a left outer join, but it returns all rows from the right table and the matching rows from the left table.

If there is no match in the left table, the result will still include the right table's row with NULL values in the left table's columns.

-- Example
SELECT Customers.CustomerName, Orders.Product
FROM Customers
RIGHT JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

3. Full Outer Join (Full Join):

A full outer join returns all rows from both the left and right tables, including matches and non-matches.

If there's no match, NULL values appear in columns from the table where there's no corresponding value.

-- Example
SELECT Customers.CustomerName, Orders.Product
FROM Customers
FULL OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

3) Cross Join

A cross join, also known as a Cartesian product, is a type of join operation in a Database Management System (DBMS) that combines every row from one table with every row from another table.

Unlike other join types, a cross join does not require a specific condition to match rows between the tables. Instead, it generates a result set that contains all possible combinations of rows from both tables.

Cross joins can lead to a large result set, especially when the participating tables have many rows.

-- Syntax
SELECT columns
FROM table1
CROSS JOIN table2;

Example:

Consider two tables: Students and Courses.

Students Table:
StudentID	StudentName
1	Alice
2	Bob

Courses Table:
CourseID	CourseName
101	Maths
102	Science

-- Cross Join Query
SELECT Students.StudentName, Courses.CourseName  
FROM Students  
CROSS JOIN Courses;

Result:
StudentName	CourseName
Alice	Maths
Alice	Science
Bob	Maths
Bob	Science

4) Self Join

A self join involves joining a table with itself.

This technique is useful when a table contains hierarchical or related data and you need to compare or analyse rows within the same table.

Self joins are commonly used to find relationships, hierarchies, or patterns within a single table.

In a self join, you treat the table as if it were two separate tables, referring to them with different aliases.

-- Syntax
SELECT columns
FROM table1 AS alias1
JOIN table1 AS alias2 ON alias1.column = alias2.column;

Example:

Consider an Employees table that contains information about employees and their managers.

Employees Table:
EmployeeID	EmployeeName	ManagerID
1	Alice	3
2	Bob	3
3	Carol	NULL
4	David	1

-- Self Join Query
SELECT e1.EmployeeName AS Employee, e2.EmployeeName AS Manager
FROM Employees AS e1
JOIN Employees AS e2 ON e1.ManagerID = e2.EmployeeID;

Result:
Employee	Manager
Alice	Carol
Bob	Carol
David	Alice

SET OPERATIONS

Set operations in SQL are used to combine or manipulate the result sets of multiple SELECT queries.

They allow you to perform operations similar to those in set theory, such as union, intersection, and difference, on the data retrieved from different tables or queries.

Set operations provide powerful tools for managing and manipulating data, enabling you to analyse and combine information in various ways.

There are four primary set operations in SQL:

UNION
INTERSECT
EXCEPT (or MINUS)
UNION ALL

1. UNION:

The UNION operator combines the result sets of two or more SELECT queries into a single result set.

It removes duplicates by default, meaning that if there are identical rows in the result sets, only one instance of each row will appear in the final result.

Example:

Assume we have two tables: Customers and Suppliers.

Customers Table:
CustomerID	CustomerName
1	Alice
2	Bob

Suppliers Table:
SupplierID	SupplierName
101	SupplierA
102	SupplierB

-- UNION Query
SELECT CustomerName FROM Customers  
UNION  
SELECT SupplierName FROM Suppliers;

Result:
CustomerName
Alice
Bob
SupplierA
SupplierB

2. INTERSECT:

The INTERSECT operator returns the common rows that exist in the result sets of two or more SELECT queries.

It only returns distinct rows that appear in all result sets.

-- Example
SELECT CustomerName FROM Customers
INTERSECT
SELECT SupplierName FROM Suppliers;

In this example, there are no common names between customers and suppliers, so the result is an empty set.

3. EXCEPT (or MINUS):

The EXCEPT operator (also known as MINUS in some databases) returns the distinct rows that are present in the result set of the first SELECT query but not in the result set of the second SELECT query.

-- Example
SELECT CustomerName FROM Customers
EXCEPT
SELECT SupplierName FROM Suppliers;

In this example, the names "Alice" and "Bob" are customers but not suppliers, so they appear in the result set.

4. UNION ALL:

The UNION ALL operator performs the same function as the UNION operator but does not remove duplicates from the result set. It simply concatenates all rows from the different result sets.

-- Example
SELECT CustomerName FROM Customers
UNION ALL
SELECT SupplierName FROM Suppliers;

Difference between Set Operations and Joins

Aspect	Set Operations	Joins
Purpose	Manipulate result sets based on set theory principles.	Combine data from related tables based on specified conditions.
Data Source	Result sets of SELECT queries.	Tables that are related by common columns.
Combining Rows	Combine rows from different result sets. May remove duplicates.	Combine rows from different tables based on specified conditions.
Output Columns	Require the SELECT queries to have the same number of output columns and compatible data types.	Can combine columns from different tables, regardless of data types or column numbers.
Common Operations	UNION, INTERSECT, EXCEPT (MINUS).	INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN.
Conditional Requirements	No specific join conditions are required.	Require specified join conditions for combining data.
Handling Duplicates	UNION removes duplicates by default.	Joins do not inherently handle duplicates; it depends on the join type and data.
Usage Scenarios	Useful for combining and analysing related data from different queries or tables.	Used to retrieve and relate data from different tables based on their relationships.
Result Set Structure	Result sets may have different column names, but data types and counts must match.	Result sets can have different column names, data types, and counts.
Performance Considerations	Generally faster and less complex than joins.	Joins can be more complex and resource-intensive, especially for larger datasets.

SUB QUERIES

Subqueries, also known as nested queries or inner queries, allow you to use the result of one query (the inner query) as the input for another query (the outer query).

Subqueries are often used to retrieve data that will be used for filtering, comparison, or calculation within the context of a larger query.

They are a way to break down complex tasks into smaller, manageable steps.

-- Syntax
SELECT columns
FROM table
WHERE column OPERATOR (SELECT column FROM table WHERE condition);

Example:

Consider two tables: Products and Orders.

Products Table:
ProductID	ProductName	Price
1	Laptop	1000
2	Smartphone	500
3	Headphones	50

Orders Table:
OrderID	ProductID	Quantity
101	1	2
102	3	1

-- Example: Retrieve the product names and quantities for orders with a total cost greater than the average price of all products.
SELECT ProductName, Quantity
FROM Products
WHERE Price * Quantity > (SELECT AVG(Price) FROM Products);

Result:
ProductName	Quantity
Laptop	2

Differences Between Subqueries and Joins

Aspect	Subqueries	Joins
Purpose	Retrieve data for filtering, comparison, or calculation within the context of a larger query.	Combine data from related tables based on specified conditions.
Data Source	Result of one query used as input for another query.	Data from multiple related tables.
Combining Rows	Not used for combining rows; used to filter or evaluate data.	Combines rows from different tables based on specified join conditions.
Result Set Structure	Subqueries return scalar values, single-column results, or small result sets.	Joins return multi-column result sets.
Performance Considerations	Subqueries can be slower and less efficient, especially when dealing with large datasets.	Joins can be more efficient for combining data from multiple tables.
Complexity	Subqueries can be easier to understand for simple tasks or smaller datasets.	Joins can become complex, but are more suited for handling large-scale data retrieval and combination tasks.
Versatility	Subqueries can be used in various clauses: WHERE, FROM, HAVING, etc.	Joins are primarily used in the FROM clause for combining tables.

Window Functions

Window functions perform calculations across a set of table rows that are somehow related to the current row. Unlike regular aggregate functions, window functions do not cause rows to become grouped into a single output row — the rows retain their separate identities.

Key Characteristics:

Perform calculations on a set of rows related to the current row
Don't collapse rows like GROUP BY does
Can access rows before and after the current row
Include an OVER() clause that defines the window frame

Basic Syntax

function_name([arguments]) 
OVER (
    [PARTITION BY partition_expression, ... ]
    [ORDER BY sort_expression [ASC | DESC], ... ]
    [frame_clause]
)

Common Window Functions

1. Aggregate Functions as Window Functions

-- Running total
SELECT order_id, order_date, amount,
       SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;

-- Average by department
SELECT employee_id, name, department, salary,
       AVG(salary) OVER (PARTITION BY department) AS avg_department_salary
FROM employees;

2. Ranking Functions

-- ROW_NUMBER(): Unique sequential integers
SELECT product_id, product_name, price,
       ROW_NUMBER() OVER (ORDER BY price DESC) AS price_rank
FROM products;

-- RANK(): Rank with gaps for ties
SELECT student_id, name, score,
       RANK() OVER (ORDER BY score DESC) AS class_rank
FROM students;

-- DENSE_RANK(): Rank without gaps for ties
SELECT employee_id, name, salary,
       DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees;

3. Value Functions

-- LAG(): Access previous row
SELECT date, revenue,
       LAG(revenue, 1) OVER (ORDER BY date) AS prev_day_revenue,
       revenue - LAG(revenue, 1) OVER (ORDER BY date) AS daily_change
FROM daily_sales;

-- LEAD(): Access next row
SELECT employee_id, name, hire_date,
       LEAD(hire_date, 1) OVER (ORDER BY hire_date) AS next_hire_date
FROM employees;

-- FIRST_VALUE(): First value in window
SELECT product_id, month, sales,
       FIRST_VALUE(sales) OVER (PARTITION BY product_id ORDER BY month) AS first_month_sales
FROM product_sales;

Window Frame Specification

The window frame defines which rows are included in the window relative to the current row. Common frame specifications:

-- Rows between current row and 2 following
SUM(amount) OVER (ORDER BY date ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING)

-- Range of 3 days before and after
AVG(temperature) OVER (ORDER BY date RANGE BETWEEN INTERVAL '3' DAY PRECEDING 
                      AND INTERVAL '3' DAY FOLLOWING)

-- All rows before current row
SUM(sales) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

-- Default frame (for ORDER BY): RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
-- Default frame (no ORDER BY): RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

Practical Examples

1. Moving Averages

SELECT date, sales,
       AVG(sales) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS weekly_moving_avg
FROM daily_sales;

2. Percent of Total

SELECT department, expenses,
       expenses / SUM(expenses) OVER () * 100 AS percent_of_total
FROM department_budgets;

3. Difference from Average

SELECT student_id, test_score,
       test_score - AVG(test_score) OVER () AS difference_from_mean
FROM test_results;

4. Top N per Group

WITH ranked_products AS (
    SELECT product_id, category, sales,
           ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) AS rank
    FROM products
)
SELECT product_id, category, sales
FROM ranked_products
WHERE rank <= 3;

Window Functions vs. GROUP BY

Aspect	Window Functions	GROUP BY
Row Count	Preserves original row count	Reduces row count (one per group)
Aggregation	Can see detail rows and aggregates	Only shows aggregated results
Performance	Generally more expensive	Generally more efficient
Use Case	When you need both detail and aggregate data	When you only need aggregated results