Its obvious, isn't it? Any person who is working on data SHOULD know about data extraction or retrieval.

Hence, regardless of the data science role, having strong SQL skills is a major asset for anyone in the data field. It demonstrates your ability to work with data effectively, a crucial skill in today's data-driven world.

SQL

Company size and industry: Larger companies and those in data-driven industries like finance or healthcare will likely place a higher value on SQL skills.

Specific requirements of the job: Always pay attention to the job description. If it mentions SQL prominently, be prepared for a SQL-heavy interview.

{% module_block module "widget_58fc53b9-5ac7-433a-a1f8-c348bae63ee6" %}{% module_attribute "alignment" is_json="true" %}{% raw %}"text-center"{% endraw %}{% end_module_attribute %}{% module_attribute "button_style" is_json="true" %}{% raw %}"bg-transparent text-decoration-none d-inline-block border border-2 border-black font-secondary fs-6 px-4 py-3 rounded-3 text-black"{% endraw %}{% end_module_attribute %}{% module_attribute "button_text" is_json="true" %}{% raw %}"An extensive resource library to support you on your upskilling and professional development journey ahead."{% endraw %}{% end_module_attribute %}{% module_attribute "child_css" is_json="true" %}{% raw %}null{% endraw %}{% end_module_attribute %}{% module_attribute "css" is_json="true" %}{% raw %}null{% endraw %}{% end_module_attribute %}{% module_attribute "label" is_json="true" %}{% raw %}null{% endraw %}{% end_module_attribute %}{% module_attribute "link" is_json="true" %}{% raw %}"https://www.odinschool.com/learning-hub"{% endraw %}{% end_module_attribute %}{% module_attribute "module_id" is_json="true" %}{% raw %}113008019537{% endraw %}{% end_module_attribute %}{% module_attribute "schema_version" is_json="true" %}{% raw %}2{% endraw %}{% end_module_attribute %}{% module_attribute "tag" is_json="true" %}{% raw %}"module"{% endraw %}{% end_module_attribute %}{% end_module_block %}

Basic SQL interview questions

Following are basic SQL interview questions. A recruiter can consider them as SQL interview questions for freshers.

Q1: What is the difference between SELECT, INSERT, UPDATE, and DELETE statements?

SELECT: retrieves data from a table.
NSERT: Adds new records to a table.
UPDATE: Modifies existing records in a table.
DELETE: Removes records from a table.

Q2: How do you sort the results of a query using the ORDER BY clause?

The ORDER BY clause sorts the results of a query based on a specified column, either ascending (ASC) or descending (DESC).

SELECT * FROM Customers ORDER BY City ASC;

Q3: Write a SQL query to find the top 3 countries with the highest average purchase amount per customer, considering only countries where the average purchase amount exceeds $100 and there are at least 50 customers.

SELECT Country, AVG(TotalAmount) AS AvgPurchaseAmount
FROM (
SELECT c.Country, o.CustomerID, SUM(o.ProductPrice * o.Quantity * (1 - o.Discount)) AS TotalAmount
FROM Orders o
JOIN Customers c ON o.CustomerID = c.CustomerID
GROUP BY c.Country, o.CustomerID
) AS CustomerPurchaseAmounts
GROUP BY Country
HAVING AVG(TotalAmount) > 100 AND COUNT(DISTINCT CustomerID) >= 50
ORDER BY AvgPurchaseAmount DESC
LIMIT 3;

Q4: Write a SQL query to find the top 5 products that have been purchased the most frequently in consecutive months by the same customer, considering only those products that have been purchased for at least 6 consecutive months.

WITH MonthlyPurchaseCounts AS (
SELECT CustomerID, ProductID, EXTRACT(YEAR_MONTH FROM OrderDate) AS YearMonth, COUNT(*) AS PurchaseCount
FROM Orders
GROUP BY CustomerID, ProductID, EXTRACT(YEAR_MONTH FROM OrderDate)
),
ConsecutiveMonthGroups AS (
SELECT CustomerID, ProductID, YearMonth,
ROW_NUMBER() OVER (PARTITION BY CustomerID, ProductID ORDER BY YearMonth) -
ROW_NUMBER() OVER (PARTITION BY CustomerID, ProductID, PurchaseCount ORDER BY YearMonth) AS MonthDiff
FROM MonthlyPurchaseCounts
),
ConsecutiveMonths AS (
SELECT CustomerID, ProductID, MIN(YearMonth) AS StartMonth, MAX(YearMonth) AS EndMonth, COUNT(*) AS ConsecutiveMonths
FROM ConsecutiveMonthGroups
GROUP BY CustomerID, ProductID, MonthDiff
HAVING COUNT(*) >= 6
)
SELECT ProductID, SUM(PurchaseCount) AS TotalPurchaseCount
FROM ConsecutiveMonths
GROUP BY ProductID
ORDER BY TotalPurchaseCount DESC
LIMIT 5;

Q5: What is the difference between a GROUP BY clause and a HAVING clause?

GROUP BY: Groups rows based on shared values in one or more columns. HAVING: Filters groups created by GROUP BY based on a condition applied to aggregate functions.

SELECT City, COUNT(*) AS CustomerCount
FROM Customers
GROUP BY City
HAVING CustomerCount > 100;

Q6: Write a SQL query to find the customers who have made purchases in all product categories available in the store.

SELECT CustomerID
FROM (
SELECT CustomerID, COUNT(DISTINCT ProductCategoryID) AS CategoryCount
FROM Orders
GROUP BY CustomerID
) AS CustomerCategoryCounts
WHERE CategoryCount = (SELECT COUNT(DISTINCT ProductCategoryID) FROM ProductCategories);

Q7: What is a subquery, and how can it be used in SQL?

A subquery is a nested query embedded within another query. It allows you to perform more complex data retrieval based on conditions involving multiple tables.

Q8: Explain the concept of NULL values in SQL and how they differ from empty strings.

NULL represents the absence of a value, while an empty string is a character data type with no characters. NULL indicates "unknown" or "not applicable," while an empty string is a specific value (though sometimes unintended).

Q9: How can you combine results from multiple SELECT statements into a single result set? Explain the concept of UNION and UNION ALL.

UNION: Returns distinct rows (duplicates are removed). UNION ALL: Returns all rows, including duplicates.

SELECT CustomerName FROM Customers WHERE City = 'New York'
UNION
SELECT CustomerName FROM Customers WHERE City = 'Los Angeles';

Q10: Describe the concept of data integrity and how constraints like PRIMARY KEY and FOREIGN KEY help maintain it.

Data integrity ensures data accuracy and consistency within a database.

PRIMARY KEY: Enforces a unique identifier for each row in a table, preventing duplicate records.
FOREIGN KEY: Creates a link between two tables, referencing the PRIMARY KEY of another table, ensuring referential integrity (valid relationships between data points).

Q11: Explain the concept of transactions in SQL and the ACID properties (Atomicity, Consistency, Isolation, Durability).

A transaction is a unit of work involving multiple database operations. ACID properties ensure data integrity during transactions:

Atomicity: All operations succeed or fail as a whole.
Consistency: The database transitions from one valid state to another.
Isolation: Concurrent transactions do not interfere with each other.
Durability: Committed transactions persist even in the event of system failures.

Q12: Write a query to find the top N customers based on their total order amount.

ORDER BY: Order customers by total order amount (DESC for descending).

LIMIT: Specify the number of top N results to retrieve.

SELECT CustomerID, CustomerName, SUM(OrderAmount) AS TotalAmount
FROM Orders O
INNER JOIN Customers C ON O.CustomerID = C.CustomerID
GROUP BY CustomerID, CustomerName
ORDER BY TotalAmount DESC
LIMIT 10;

Q13: Describe the difference between a correlated subquery and a non-correlated subquery.

Non-correlated subquery: Evaluated independently for each row in the outer query. Doesn't reference any column from the outer query. Example (Non-correlated): Find orders with an amount exceeding the average order amount:

SELECT * FROM Orders
WHERE OrderAmount > (SELECT AVG(OrderAmount) FROM Orders);

Correlated subquery: References a column from the outer query in its WHERE clause. Evaluated once for each row in the outer query. Example (Correlated): Find employees who earn more than their manager:

SELECT e.EmployeeName
FROM Employees e
WHERE e.Salary > (SELECT ManagerSalary FROM Employees m WHERE m.EmployeeID = e.ManagerID);

Q14: Explain the concept of window functions with a use case.

Window functions operate on a set of rows within a query partition (defined by a window clause). They allow for calculations like ranking, moving averages, or cumulative sums within the result set.

Example: Rank employees within each department by salary:

SELECT EmployeeName, Department, RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS Rank
FROM Employees;

Q15: Explain the concept of temporary tables and their advantages/disadvantages compared to permanent tables.

Temporary tables: Exist only for the duration of a session or transaction. Useful for intermediate calculations within complex queries without modifying existing data. Advantages: Improve query performance for complex operations, avoid modifying existing data unintentionally. Disadvantages: Not persistent, data is lost after the session ends.

Q16: Describe how to handle errors and exceptions in SQL.

Most databases offer mechanisms for handling errors within SQL statements. Techniques include:

TRY...CATCH block: Similar to programming languages, allows defining code to execute if an error occurs (CATCH block).
RAISE statement: Explicitly raise a custom error message to handle specific situations.

Q17: Describe the concept of triggers and their use cases in SQL.

Triggers are stored procedures automatically executed based on specific events (e.g., INSERT, UPDATE, DELETE) on a table. They allow for automated data validation, maintaining referential integrity, or logging changes.

Example: Trigger to enforce a minimum order amount:

CREATE TRIGGER validate_order_amount
BEFORE INSERT ON Orders
FOR EACH ROW
BEGIN
IF NEW.OrderAmount < 100 THEN
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Minimum order amount is $100';
END IF;
END;

Q18: Describe some basic techniques for backing up and restoring a database.

Database management system (DBMS) tools: Most DBMS offer built-in utilities for creating backups (full or incremental) at scheduled intervals.

SQL commands: Some databases allow using commands like EXPORT and IMPORT to export data to a file and restore it later.

Q19: Describe some best practices for writing efficient and maintainable SQL queries.

Use clear and descriptive naming conventions for tables, columns, and aliases.
Proper indentation and formatting to enhance readability.
Optimize WHERE clause conditions to avoid full table scans.
Utilize appropriate data types for columns based on the expected data.
Use comments to explain complex logic within the query.

Q20: Explain the concept of ETL (Extract, Transform, Load) and its role in data integration.

ETL: A process for extracting raw data from various sources, transforming it into a suitable format for storage and analysis, and loading the transformed data into a target system.
Role in data integration: Enables seamless data transfer and transformation between different data sources, ensuring data consistency and completeness for analysis.

By understanding these advanced concepts and best practices, you'll showcase your potential for tackling more intricate SQL challenges in the workplace.

Intermediate SQL interview questions

Following are SQL interview questions for less experienced candidates. This listing also includes SQL query interview questions (you should be able to come up with your own hypothetical scenario in such cases if needed).

Q1: Write a query to find all orders with an order amount exceeding the average order amount for that specific customer.

SELECT o.order_id, o.customer_id, o.amount
FROM orders o
WHERE o.amount > (
SELECT AVG(amount)
FROM orders o2
WHERE o2.customer_id = o.customer_id
);

Q2: Write a query to find the top 5 most popular products (based on total quantity ordered) across all orders.

SELECT product_id, SUM(quantity) AS total_quantity
FROM order_products
GROUP BY product_id
ORDER BY total_quantity DESC
LIMIT 5;

Q3: Write a query to transform the data, showing total sales quantity for each product across all dates and stores in separate columns.

SELECT sd.store_id,
p.product_id,
SUM(CASE WHEN op.product_id = p.product_id THEN op.quantity ELSE 0 END) AS product_quantity
FROM sales_data sd
LATERAL JOIN (SELECT DISTINCT product_id FROM sales_data) p ON TRUE -- Virtual table with all products
LEFT JOIN sales_data op ON sd.store_id = op.store_id AND sd.date = op.date
GROUP BY sd.store_id, p.product_id;

Q4: Write a query to find the top 3 actions performed by each user within the last day, ranked by their timestamps (earliest being 1).

SELECT user_id, action_type,
DENSE_RANK() OVER (PARTITION BY user_id ORDER BY timestamp) AS action_rank
FROM user_activity
WHERE timestamp >= DATE_SUB(CURDATE(), INTERVAL 1 DAY)
ORDER BY user_id, action_rank;

Q5: You have two tables: customers (customer_id, name, country) and orders (order_id, customer_id, order_date, amount). Write a query to find the total number of orders placed by customers from each country in the last quarter, excluding orders with an amount exceeding $1000.

SELECT p.product_id, p.name, p.category
FROM products p
WHERE p.average_rating > (
SELECT AVG(rating)
FROM ratings
);

WITH ProductAverage (product_id, average_rating) AS (
SELECT product_id, AVG(rating) AS average_rating
FROM ratings
GROUP BY product_id
)
UPDATE products p
SET p.average_rating = pa.average_rating
FROM ProductAverage pa
WHERE p.product_id = pa.product_id;

Q6: You have an employees table with employee_id, manager_id, department_id, and salary columns. Write a query to find the total payroll cost for each department, considering all employees within the department hierarchy (including managers and their subordinates).

WITH DepartmentPayroll (department_id, employee_id, salary, level) AS (
SELECT e.department_id, e.employee_id, e.salary, 1
FROM employees e
WHERE e.manager_id IS NULL -- Starting point: Department heads
UNION ALL
SELECT dp.department_id, e.employee_id, e.salary, dp.level + 1
FROM DepartmentPayroll dp
INNER JOIN employees e ON dp.department_id = e.department_id
)
SELECT dp.department_id, SUM(salary) AS total_payroll
FROM DepartmentPayroll dp
GROUP BY dp.department_id;

Q7: Write a query to find the 7-day moving average of sales for each store.

SELECT sd.date, sd.store_id,
AVG(sales_amount) OVER (PARTITION BY sd.store_id ORDER BY sd.date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_average
FROM sales_data sd;

Q8: You have three tables: customers (customer_id, name, country), orders (order_id, customer_id, order_date, amount), and products (product_id, name, category). Write a query to find the total number of orders placed and the total revenue generated for each country, separating domestic and international orders (assuming a reference country).

SELECT c.country,
COUNT(*) AS total_orders,
SUM(CASE WHEN o.customer_id = c.customer_id THEN o.amount ELSE 0 END) AS domestic_revenue,
SUM(CASE WHEN o.customer_id != c.customer_id THEN o.amount ELSE 0 END) AS international_revenue
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN products p ON o.product_id = p.product_id
WHERE c.country = 'Your Reference Country' -- Replace with your reference country
OR c.country != 'Your Reference Country'
GROUP BY c.country;

Q9: Write a query to find all users who have performed a specific action ("login") at least once but haven't performed any other actions ("logout") within the last hour.

SELECT ua.user_id
FROM user_activity ua
WHERE ua.action_type = 'login'
AND NOT EXISTS (
SELECT 1
FROM user_activity ua2
WHERE ua2.user_id = ua.user_id
AND ua2.action_type = 'logout'
AND ua2.timestamp >= DATE_SUB(CURDATE(), INTERVAL 1 HOUR)
);

Q10: Write a SQL query to find the top 5 departments with the highest average salary, but only consider departments where the number of employees is greater than 10, and order the result by the department name alphabetically.

SELECT Department, AVG(Salary) AS AvgSalary
FROM Employees
GROUP BY Department
HAVING COUNT(*) > 10
ORDER BY Department
LIMIT 5;

Q11: Write a SQL query to find the cumulative sum of salaries for each department, but only consider departments where the cumulative sum is greater than the average cumulative sum of all departments.

WITH DeptSalary AS (
SELECT Department, Salary,
SUM(Salary) OVER (PARTITION BY Department ORDER BY Salary) AS CumulativeSalary
FROM Employees
),
AvgCumulative AS (
SELECT AVG(CumulativeSalary) AS AvgCumulativeSalary
FROM DeptSalary
)
SELECT Department, CumulativeSalary
FROM DeptSalary
INNER JOIN AvgCumulative ON DeptSalary.CumulativeSalary > AvgCumulative.AvgCumulativeSalary;]

Q12: Write a SQL query to find the top 3 most common categories that occur together in the "ProductCategories" table

SELECT Category1, Category2, COUNT(*) AS CombinationCount
FROM ProductCategories pc1
INNER JOIN ProductCategories pc2 ON pc1.ProductID = pc2.ProductID AND pc1.CategoryID < pc2.CategoryID
GROUP BY Category1, Category2
ORDER BY CombinationCount DESC
LIMIT 3;

Q13: Write a SQL query to find the total sales amount for each month of the year, including months with zero sales, from the "Sales" table.<