Database and AI Blog: SQL (Structured Query Language)

SQL (Structured Query Language) is a specialized programming language designed for managing and interacting with relational databases. It is very important for data analysis also.

Here's a breakdown:

Relational Databases: These databases organize information in a structured way, using tables with rows and columns. Each row represents a record, and each column represents a specific attribute of that record.

SQL's Role:

Data Definition Language (DDL): Creates and modifies the database structure.

Examples: CREATE TABLE, ALTER TABLE, DROP TABLE

Data Manipulation Language (DML): Manipulates data within the database.

Examples: SELECT, INSERT, UPDATE, DELETE

Data Control Language (DCL): Controls access to the database.

Examples: GRANT, REVOKE

Key SQL Commands:

SELECT: Retrieves data from one or more tables.

FROM: Specifies the table(s) to retrieve data from.

WHERE: Filters the data based on specific conditions.

ORDER BY: Sorts the results based on one or more columns.

GROUP BY: Groups rows that have the same values in specified columns.

JOIN: Combines data from two or more tables based on a related column.

INSERT: Adds new rows to a table.

UPDATE: Modifies existing data in a table.

DELETE: Removes rows from a table.

Example:

SELECT *

FROM Customers

WHERE Country = 'USA';

This query retrieves all information (*) from the "Customers" table where the "Country" column equals "USA".

Why is SQL Important?

Foundation of Data Management:

It's the standard language for interacting with relational databases, a core technology in many industries.

Data Analysis and Decision Making: SQL enables powerful data analysis and reporting, crucial for businesses to make informed decisions.

Widely Used: Found in various applications, from e-commerce websites to financial systems and scientific research.

Here's why SQL is considered a language:

Syntax and Grammar: SQL has its own specific syntax and grammar rules. It uses keywords, operators, and clauses to form statements that instruct the database on what to do.
Data Manipulation: SQL commands allow you to manipulate data within the database, such as:

Retrieving data: Using SELECT to extract specific information from tables.
Inserting data: Adding new records to tables using INSERT.
Updating data: Modifying existing data using UPDATE.
Deleting data: Removing records from tables using DELETE.

Data Definition: SQL commands also define the structure of the database itself:

Creating tables: Using CREATE TABLE to define the structure of tables (columns, data types).
Modifying table structures: Using ALTER TABLE to add, modify, or delete columns.
Creating databases: Defining and managing databases.

Data Control: SQL provides commands for managing access to the database, such as:

Granting and revoking user permissions: Controlling who can access and modify data.

Key Characteristics of SQL as a Language:

Declarative: SQL focuses on describing what data you want to retrieve or manipulate, rather than how to retrieve it. This makes it easier to write and understand SQL queries.
Domain-Specific: SQL is specifically designed for interacting with relational databases. It's not a general-purpose programming language like Python or Java.
Standard-Based: While there are variations between different database systems, SQL adheres to a set of standards, ensuring a degree of portability across different platforms.

In essence, SQL possesses the fundamental characteristics of a programming language: it has its own syntax, allows for the execution of instructions, and enables the creation of complex logic for interacting with a specific type of system (relational databases).

SQL Joins

In SQL, joins are used to combine data from two or more tables based on a related column. This allows you to retrieve information that is spread across multiple tables within your database. Here's a breakdown of common join types:

1. INNER JOIN

Purpose: Returns rows where there is a match in both tables based on the join condition.
Syntax:

SQL

SELECT column1, column2, ...

FROM table1

INNER JOIN table2

ON table1.column = table2.column;

2. LEFT JOIN

Purpose: Returns all rows from the "left" table (the one specified before LEFT JOIN) and the matching rows from the "right" table. If there's no match in the right table, it returns NULL values for the columns from the right table.
Syntax:

SQL

SELECT column1, column2, ...

FROM table1

LEFT JOIN table2

ON table1.column = table2.column;

3. RIGHT JOIN

Purpose: Returns all rows from the "right" table and the matching rows from the "left" table. If there's no match in the left table, it returns NULL values for the columns from the left table.
Syntax:

SQL

SELECT column1, column2, ...

FROM table1

RIGHT JOIN table2

ON table1.column = table2.column;

4. FULL OUTER JOIN

Purpose: Returns all rows from both tables, including rows with no match in the other table. For unmatched rows, it returns NULL values for the columns from the other table.
Syntax:

SQL

SELECT column1, column2, ...

FROM table1

FULL OUTER JOIN table2

ON table1.column = table2.column;

Example:

Let's say you have two tables:

Customers: CustomerID, CustomerName
Orders: OrderID, CustomerID, OrderDate

To get a list of all customers and their orders (including customers with no orders):

SQL

SELECT Customers.CustomerName, Orders.OrderID, Orders.OrderDate

FROM Customers

LEFT JOIN Orders

ON Customers.CustomerID = Orders.CustomerID;

This query uses a LEFT JOIN to include all customers, even if they haven't placed any orders.

Key Considerations:

Choose the appropriate join type based on the specific requirements of your query.
Use clear and meaningful aliases for tables to improve readability.
Understand the relationships between the tables you are joining.

SQL Complex Joins

Complex joins involve combining data from multiple tables using a series of JOIN operations. They are essential for retrieving intricate relationships within your database. Here's a breakdown of some common scenarios and how to approach them:

1. Multiple Joins:

Scenario: You need to join data across three or more tables.
Approach: Chain multiple JOIN clauses together.

SQL

SELECT

c.CustomerName,

o.OrderID,

p.ProductName

FROM

Customers c

JOIN

Orders o ON c.CustomerID = o.CustomerID

JOIN

Products p ON o.ProductID = p.ProductID;

This query joins the Customers, Orders, and Products tables to retrieve customer names, order IDs, and product names for each order.

2. Self-Join:

Scenario: Joining a table with itself to find relationships within the same table.
Approach: Create an alias for the table and join it with itself using different column names.

SQL

SELECT

e1.EmployeeID,

e1.FirstName,

e2.FirstName AS ManagerName

FROM

Employees e1

JOIN

Employees e2 ON e1.ManagerID = e2.EmployeeID;

This query finds the names of employees and their respective managers.

3. Non-Equijoins:

Scenario: Joining tables based on conditions other than equality (e.g., greater than, less than).
Approach: Use comparison operators in the ON clause.

SQL

SELECT

c.CustomerID,

c.City,

o.OrderID

FROM

Customers c

JOIN

Orders o ON c.CustomerID = o.CustomerID AND o.OrderDate > '2023-01-01';

This query retrieves customer IDs and cities for customers who placed orders after January 1, 2023.

4. Cross Join:

Scenario: Combining every row from one table with every row from another table (Cartesian product).
Approach: Use the CROSS JOIN keyword or omit the ON clause in a regular join.

SQL

SELECT

c.CustomerID,

p.ProductID

FROM

Customers c

CROSS JOIN

Products p;

This query would generate a result set with every customer paired with every product.

Important Considerations:

Performance: Complex joins can impact query performance. Use indexes on the join columns to improve efficiency.
Readability: Use meaningful aliases for tables and columns to make your queries easier to understand.
Testing: Thoroughly test your queries to ensure they return the expected results.

By mastering these complex join techniques, you can effectively retrieve and analyze intricate relationships within your relational databases.

Role of Index in SQL :

Indexes in SQL play a crucial role in optimizing the performance of database queries, particularly for large datasets. Here's why they are important:

Faster Data Retrieval:

Reduced Search Time: Indexes act like an index in a book, allowing the database system to quickly locate specific rows without having to scan the entire table. This significantly speeds up queries that involve searching for data based on specific column values.
Improved Query Performance: Indexes are particularly beneficial for queries that use WHERE clauses, ORDER BY clauses, and JOIN operations.

Enhanced Database Performance:

Faster Queries: Faster query execution leads to improved overall database performance, resulting in quicker response times for applications that rely on the database.
Improved Scalability: Indexes can help databases handle larger volumes of data more efficiently, as queries can be executed more quickly even with a growing dataset.

Improved Data Integrity:

Unique Indexes: Unique indexes ensure that no duplicate values exist in a specific column, helping to maintain data integrity and consistency.

Analogy:

Imagine a library. Without an index, you would have to search through every book to find the information you need. With an index, you can quickly locate the relevant pages by looking up the topic in the index. Indexes in SQL work similarly, providing a faster way to locate and retrieve specific data within a table.

Key Considerations:

Index Creation: Creating indexes has a cost. It takes time and resources to create and maintain indexes.
Index Maintenance: Updates, inserts, and deletes on indexed columns can impact index performance. The database system needs to update the index to reflect the changes, which can add overhead.
Over-indexing: Creating too many indexes can actually slow down database performance, especially during data modification operations.

In summary:

Indexes are a vital component of efficient database management. By carefully selecting the columns to index, you can significantly improve query performance and overall database efficiency.

Subqueries

A subquery is a query within another SQL query. A subquery is also called inner query or inner select and the statement containing a subquery is called an outer query or outer select.

Note:

1. Subqueries must be enclosed within parenthesis,

2. An ORDER BY cannot be used in a subquery.

3. The image type such as BLOB, array, text datatypes are not allowed in subqueries.

Subqueries can be used with select, insert, update and delete statement within where, from, select clause along with IN, comparison operators, etc.

We have a table named ITCompanyInNepal on which we will perform queries to show subqueries examples:

Examples: SubQueries With Select Statement

with In operator and where clause:

SELECT * FROM ITCompanyInNepal

WHERE Headquarter IN (SELECT Headquarter FROM ITCompanyInNepal WHERE Headquarter = 'USA');

with comparison operator and where clause

SELECT * FROM ITCompanyInNepal WHERE NumberOfEmployee < (SELECT AVG(NumberOfEmployee) FROM ITCompanyInNepal )

with select clause

SELECT CompanyName, CompanyAddress, Headquarter, (Select SUM(NumberOfEmployee) FROM ITCompanyInNepal Where Headquarter = 'USA') AS TotalEmployeeHiredByUSAInKathmandu FROM ITCompanyInNepal WHERE CompanyAddress = 'Kathmandu' AND Headquarter = 'USA'

Subqueries with insert statement

We have to insert data from IndianCompany table to ITCompanyInNepal. The table for IndianCompany is shown below:

INSERT INTO ITCompanyInNepal SELECT * FROM IndianCompany

Subqueries with update statement

Suppose all the companies whose headquarter is USA decided to fire 50 employees from all US based companies of Nepal due to some change in policy of USA companies.

UPDATE ITCompanyInNepal SET NumberOfEmployee = NumberOfEmployee - 50 WHERE Headquarter IN (SELECT Headquarter FROM ITCompanyInNepal WHERE Headquarter = 'USA')

Subqueries with Delete Statement

Suppose all the companies whose headquarter is Denmark decided to shutdown their companies from Nepal.

DELETE FROM ITCompanyInNepal WHERE Headquarter IN (SELECT Headquarter FROM ITCompanyInNepal WHERE Headquarter = 'Denmark'

Union and Union all

Union operation combines the results of two or more queries into a single result set that includes all the rows that belong to all queries in the union and will ignore any duplicates that exist. Union all also does the same thing but include even the duplicate values. The concept of union operation will be clear from the example below. Few things to consider while using union are:

1.The number and the order of the columns must be the same in all queries.

2.The data types must be compatible.

Example:

We have three tables : Marksheet1, Marksheet2 and Marksheet3. Marksheet3 is the duplicate table of Marksheet2 which contains same values as that of Marksheet2.

Table1: Marksheet1

Table2: Marksheet2

Table3: Marksheet3

Union on tables Marksheet1 and Marksheet2

SELECT SubjectCode, SubjectName, MarksObtained FROM Marksheet1

UNION

SELECT CourseCode, CourseName, MarksObtained FROM Marksheet2

Note: The output for union of the three tables will also be same as union on Marksheet1 and Marksheet2 because union operation does not take duplicate values.

SELECT SubjectCode, SubjectName, MarksObtained FROM Marksheet1

UNION

SELECT CourseCode, CourseName, MarksObtained FROM Marksheet2

UNION

SELECT SubjectCode, SubjectName, MarksObtained FROM Marksheet3

OUTPUT

Union All

SELECT SubjectCode, SubjectName, MarksObtained FROM Marksheet1

UNION ALL

SELECT CourseCode, CourseName, MarksObtained FROM Marksheet2

UNION ALL

SELECT SubjectCode, SubjectName, MarksObtained FROM Marksheet3

You will notice here that the duplicate values from Marksheet3 are also displayed using union all.

SQL Aggregate Functions

Aggregate functions in SQL Server run calculations on sets of values, returning a single value.

SUM()

Returns sum of numeric values in a given column.

We have table as shown in figure that will be used to perform different aggregate functions. The table name is Marksheet.

Select SUM(MarksObtained) From Marksheet

The sum function doesn't consider rows with NULL value in the field used as parameter In the above example

if we have another row like this:

106 Italian NULL

This row will not be considered in sum calculation

AVG()

Returns average of numeric values in a given column.

We have table as shown in figure that will be used to perform different aggregate functions. The table name is Marksheet.

Select AVG(MarksObtained) From Marksheet

The average function doesn't consider rows with NULL value in the field used as parameter.

In the above example if we have another row like this:

106 Italian NULL

This row will not be considered in average calculation

MAX()

Returns the largest value in a given column.

We have table as shown in figure that will be used to perform different aggregate functions. The table name is Marksheet.

Select MAX(MarksObtained) From Marksheet

MIN()

Returns the smallest value in a given column.

We have table as shown in figure that will be used to perform different aggregate functions. The table name is Marksheet.

Select MIN(MarksObtained) From Marksheet

COUNT()

Returns the total number of values in a given column. We have table as shown in figure that will be used to perform different aggregate functions.

The table name is Marksheet.

Select COUNT(MarksObtained) From Marksheet

The count function doesn't consider rows with NULL value in the field used as parameter. Usually the count parameter is * (all fields) so only if all fields of row are NULLs this row will not be considered.

In the above example if we have another row like this:

106 Italian NULL

This row will not be considered in count calculation

NOTE The function COUNT(*) returns the number of rows in a table. This value can also be obtained by using a constant non-null expression that contains no column references, such as COUNT(1).

Example

Select COUNT(1) From Marksheet

COUNT(Column_Name) with GROUP BY Column_Name

Most of the time we like to get the total number of occurrence of a column value in a table for example:

TABLE NAME : REPORTS

ReportName ReportPrice

Test 10.00 $

Test 2 11.00 $

Test 10.00 $

Test 3 14.00 $

Test 4 100.00 $

SELECT ReportName AS REPORT NAME, COUNT(ReportName) AS COUNT FROM REPORTS GROUP BY ReportName

REPORT NAME COUNT

Test 4

Test2 1

Test3 2

Test4 1

Ranking Functions

Arguments	Details
partition_by_clause	Divides the result set produced by the FROM clause into partitions to which the DENSE_RANK function is applied. For the PARTITION BY syntax, see OVER Clause (Transact-SQL).
order_by_clause	Determines the order in which the DENSE_RANK function is applied to the rows in a partition
OVER ( [ partition_by_clause ] order_by_clause)	partition_by_clause divides the result set produced by the FROM clause into partitions to which the function is applied. If not specified, the function treats all rows of the query result set as a single group. order_by_clause determines the order of the data before the function is applied. The order_by_clause is required. The of the OVER clause cannot be specified for the RANK function. For more information, see OVER Clause (Transact-SQL).

DENSE_RANK ()

Same as that of RANK(). It returns rank without any gaps:

Select Studentid, Name,Subject,Marks, DENSE_RANK() over(partition by name order by Marks desc)Rank From Exam order by name

Studentid Name Subject Marks Rank

101 Ivan Science 80 1

101 Ivan Maths 70 2

101 Ivan Social 60 3

102 Ryan Social 70 1

102 Ryan Maths 60 2

102 Ryan Science 50 3

103 T Tanvi Maths 90 1

103 Tanvi Science 90 1

103 Tanvi Social 80 2

RANK()

A RANK() Returns the rank of each row in the result set of partitioned column.

Eg :

Select Studentid,Name,Subject,Marks, RANK() over(partition by name order by Marks desc)Rank From Exam order by name,subject

Studentid Name Subject Marks Rank

101 Ivan Maths 70 2

101 Ivan Science 80 1

101 Ivan Social 60 3

102 Ryan Maths 60 2

102 Ryan Science 50 3

102 Ryan Social 70 1

103 Tanvi Maths 90 1

103 Tanvi Science 90 1

103 Tanvi Social 80 3

Labels: Avg(), Cross Join, Data Analysis using SQL, DCL, DDL, DML, Equi Join, Max(), Min(), Rank(). Dense_Rank(), Self Join, SQL (Structured Query Language), SQL Joins, Subqueries, Sum(), Union and Union All

Database and AI Blog

Thursday, January 2, 2025

SQL (Structured Query Language)

0 Comments:

Post a Comment

About Me

Previous Posts