Introduction

Interviews for data analysts and data scientists typically include SQL questions. That's something that hasn't changed, even as new tools like Pandas, Julia, Spark and NoSQL attempt to dethrone and replace it. Typically, SQL data science interviews ask a range of questions, all the way from basic definition-based SQL questions, to advanced querying exercises and questions.

To help you prep for your SQL interview, we've curated some of the most common SQL practice problems, exercises and questions (with answers) in a range of categories.

Want to improve your SQL skills before your interview? Build your competency with our SQL course.

What Types of SQL Interview Questions Get Asked?

SQL is one of the most common topics in data science interviews. That's true across the board. According to Interview Query analysis, they're asked 70% of the time in Facebook data science interviews, and 94% of the time in Amazon's business intelligence interviews.

Ultimately, the most common questions for SQL interviews fall into seven key categories:

  • Definition based SQL questions
  • Basic SQL questions
  • Reporting and metrics SQL questions
  • Analytics SQL questions
  • ETL SQL questions
  • Database design questions
  • Logic based SQL questions

In other words, as you prepare for a SQL interview, test your knowledge in these areas. Practice questions, SQL exercises and courses can help you quickly increase your core competencies in each of these categories.

Definition Based SQL Questions

Definition-based SQL questions are commonly asked in interviews, and essentially, it's a way for interviewers to quickly gauge your SQL knowledge. To prepare, all you have to do is study a list of definitions of SQL terms and applications. These questions will include understanding the differences between SQL joins, what kinds of aggregations exist, and knowing the basic functions like CASE WHEN or HAVING. Useful definition-based questions for SQL interviews include:

Q1. What is a join in SQL?

A JOIN is a keyword used to merge together two or more tables on the same key.

Q2. Which SQL command is used to add rows to a table?

The INSERT command.

Q3. What is OLAP and OLTP? And when do you denormalize data?

OLTP are databases intended for online transaction processing and OLAP are databases intended for online analytical processing. Denormalize when its OLAP operations and normalize when OLTP.

Q4. What's the difference between WHERE and HAVING?

The main difference is that a WHERE clause is used to filter rows before grouping and HAVING is used to exclude records after grouping.

Q5. When do you use the CASE WHEN function?

CASE WHEN lets you write complex conditional statements on the SELECT clause and also allows you to pivot data from wide to long formats.

Q6. What's the difference between a LEFT JOIN and an INNER JOIN?

Essentially, a join is used to combine data from multiple data sources. A LEFT JOIN combines returns all rows from the left table, even if there are no matches in the right table. An INNER JOIN, on the other hand, only returns rows if there is a match in both of the tables.

Q7. When would you use UNION vs UNION ALL? What if there were no duplicates?

UNION is used to join multiple data sets. UNION is thusly used to combine two queries into a single result, based on the parameters of the query. Conversely, UNION ALL extracts all rows from a set of two tables, including duplicates.

Q8. What's the difference between COUNT and COUNT DISTINCT?

The COUNT function is used to count the number of rows specified by the query. It returns all the rows specified by a WHERE condition. COUNT DISTINCT eliminates duplicate values before the count is applied, or it returns the number of unique rows in a table. Additionally, it does not count rows with NULL values.

Q9. When would you use a HAVING clause versus a WHERE clause?

You would use WHERE and HAVING if you're using them together with the GROUP BY clause. For example, WHERE would be used to filter rows before the GROUP BY clause, while HAVING would filter rows after the clause.

Basic SQL Query Interview Questions

Basic SQL questions are generally easy and focus on assessing if you know the basics. For example, basic querying questions will require you to perform entry-level queries, like getting the COUNT of a table, knowing what the HAVING clause does, what a unique key is, and figuring out how to utilize LEFT JOIN versus INNER JOIN to give you the values that you need.

Q1. We're given two tables, a users table with demographic information and the neighborhood they live in and a neighborhoods table. Write a query that returns all of the neighborhoods that have 0 users.

users table:

columns type
id int
name varchar
neighborhood_id int
created_at datetime

neighborhoods table:

columns type
id int
name varchar
city_id int

Hint: Our aim to find all the neighborhoods without users. In a sense we need all the neighborhoods that do not have a singular user living in them. This means we have to introduce a concept of existence of a column in one table, but not in the other.
Empty Neighborhoods — Interview Query sql problem
users table columnstype idint namevarchar neighborhood_idint created_atdatetime neighborhoods table columnstype idint namevarchar city_idint We're
Try this question in our interactive SQL editor.

Q2. Write a query to return pairs of projects where the end date of one project matches the start date of another project.

projects table:

column type
id integer
title string
start_date datetime
end_date datetime
budget float

Output:

column type
project_title_end string
project_title_start string
date datetime

Here, we're trying to use values in a single table to generate comparisons. Specifically, we're looking for pairs of projects such that one project starts on the same day that another project ends.

This will require us to perform a self-join.

Hint: Remember, when performing a self-join, the order of your values is very important. You'll want to be particular about aliasing to make sure you keep your tables straight.
Project Pairs — Interview Query sql problem
projects table columntype idinteger titlestring start_datedatetime end_datedatetime budgetfloat Write a query to return pairs of projects
Try this question in our interactive SQL editor.

Q3. You're given a table that represents search results from searches on Facebook. The query column is the search term, position column represents each position the search result came in, and the rating column represents the human rating of the search result from 1 to 5 where 5 is high relevance and 1 is low relevance.

1. Write a query to compute a metric to measure the quality of the search results for each query.

2. You want to be able to compute a metric that measures the precision of the ranking system based on position. For example, if the results for dog and cat are....

query result_id position rating notes
dog 1000 1 2 picture of hotdog
dog 998 2 4 dog walking
dog 342 3 1 zebra
cat 123 1 4 picture of cat
cat 435 2 2 cat memes
cat 545 3 1 pizza shops

...we would rank 'cat' as having a better search result ranking precision than 'dog' based on the correct sorting by rating.

Write a query to create a metric that can validate and rank the queries by their search result precision. Round the metric (avg_rating column) to 2 decimal places.

search_results table:

column type
query varchar
result_id integer
position integer
rating integer

The ratings of each search result (1 to 5 where 5 is high relevance and 1 is low relevance) could be used to compute a metric that measures the quality of the search results for each query.

For example, if the search query for 'tiger' has 5s for each result, then that would average to be a perfect result. Therefore, taking the average of the ratings seems like a good way to measure the quality of the search results.

Search Ratings — Interview Query sql problem
`search_results` table columntype queryvarchar result_idinteger positioninteger ratinginteger You're given a table that represents
Try this question in our interactive SQL editor.

Q4. We're given a table called employers that consists of a user_id, year, and employer EIN label. Users can have multiple employers dictated by the different EIN labels.

Write a query to add a flag to each user if they've added a new employer in the last year in the table.

employers table:

column type
user_id integer
year string
employer_ein string

Example:

# employer
#
# user_id year employer_ein
# 34323      2018    A
# 34323      2018    B
# 34323      2018    C
# 34323      2017    F
# 34323      2017    A
# 34323      2017    B
#
# 86323      2018    A
# 86323      2018    B
# 86323      2018    C
# 86323      2017    B
#
# 98787      2018    A
# 98787      2018    B
# 98787      2018    F
# 98787      2017    F
# 98787      2017    B
# 98787      2017    A
#
# 55559      2018    A
# 55559      2018    B
# 55559      2018    C
# Output
# user_id year new_ein_flag
# 34323      2018      1
# 86323      2018      1
# 98787      2018      0
Employer EINs — Interview Query sql problem
employers table columntype user_idinteger yearstring employer_einstring We're given a table called employers that consists of a user_id,
Try this question in our interactive SQL editor.

Q5. Write a SQL query to select the 2nd highest salary in the engineering department. If more than one person shares the highest salary, the query should select the next highest salary.

employees
+---------------+---------+
| id            | int     |
| first_name    | varchar |
| last_name     | varchar |
| salary        | int     |
| department_id | int |--+
+---------------+---------+  |
|
departments                  |
+---------------+---------+  |
| id            | int     |<-+
| name          | varchar |
+---------------+---------+

Output:

column type
salary int

First, we need the name of the department to be associated with each employee in the employees table, to understand which department each employee is a part of.

The “department_id” field in the employees table is associated with the “id” field in the departments table. We call the “department_id” a foreign key because it is a column that references the primary key of another table, which in this case is the “id” field in the departments table.

Based on this common field, we can join both tables, using INNER JOIN, to associate the name of the department name to the employees that are a part of those departments.

SELECT salary
FROM employees
INNER JOIN departments
ON employees.department_id = departments.id

2nd Highest Salary — Interview Query sql problem
employees +---------------+---------+ | id | int || first_name | varchar | | last_name | varchar | | salary | int | | department_id | int |--+ +---------------+---------+
Try this questions in our interactive SQL editor.

Q6. Given the revenue transactions table above, write a query that finds the third purchase of every user.

`transactions` table:

column type
id integer
user_id integer
created_at datetime
product_id integer
quantity integer

Output:

column type
user_id integer
created_at datetime
product_id integer
quantity integer

We need an indicator of which purchase was the third value. Whenever we think of ranking our dataset, it's helpful to then immediately think of a specific window function we can use.

We need to apply the RANK function to the transactions table. The RANK function is a window function that assigns a rank to each row in the partition of the result set.

Third Purchase — Interview Query sql problem
`transactions` table columntype idinteger user_idinteger created_atdatetime product_idinteger quantityinteger Given the revenue
Try this question in our interactive SQL editor.
Want to start with the basics? Check out our Basic SQL Interview Questions guide or Three SQL Concept You Must Know.

Reporting and Metrics SQL Interview Questions

Example of reporting SQL queries to Dashboard

Reporting and metrics SQL questions are probably the most common type of SQL question to show up in interviews. Reporting SQL interview questions replicate the work that many business and reporting analysts do on a day-to-day basis. This means writing queries that end up in dashboards and key metrics.

These questions are generally some of the most common problems that analysts and data scientists will run into when pulling metrics for their day-to-day job. The output of what they need to pull is very clearly defined and the queries themselves require complex joins, sub-queries, self-joins, window functions and more.

Interested in doing this sort of work in the field of business intelligence? Check out our article on Business Intelligence Interview Questions on Interview Query.

Metrics-Based SQL: Easy Questions

Typically, easy SQL reporting questions tend to require basic knowledge of SQL functions. Some basic questions include:  

Q1. What's the total distance traveled for all riders on Lyft in the month of March?

Q2. Given three tables, representing customer transactions and customer attributes: Write a query to get the average order value by gender.

transactions table:

column type
id integer
user_id integer
created_at datetime
product_id integer
quantity integer

users table:

column type
id integer
name varchar
sex varchar

products table:

column type
id integer
name string
price float
Average Order Value — Interview Query sql problem
`transactions` table columntype idinteger user_idinteger created_atdatetime product_idinteger quantityinteger `users` table columntype idinteger namevarchar sexvarchar `products`

Q3. Given three tables: user_dimension, account_dimension, and download_facts, find the average number of downloads for free vs paying customers broken out by day.

Note: The account_dimension table maps users to multiple accounts where they could be a paying customer or not. Also, round average_downloads to 2 decimal places.

user_dimension table:

column type
user_id int
account_id int

account_dimension table:

column type
account_id int
paying_customer boolean

download_facts table:

column type
date date
user_id int
downloads int

Output:

column type
date date
paying_customer boolean
average_downloads float

Let's first break it down. What values in which tables can we join together to get the data that we need?

Ideally the data should be broken down such that we could easily graph the values to visualize two line plots of free vs paying users. The x-axis would represent the date and the y-axis would represent the average number of downloads.

The user_dimension table represents the mapping between account ids and user ids while the account_dimension table holds the value if the customer is paying or not. Lastly the download_facts table has the date and number of downloads per user.

SELECT *
FROM user_dimension AS ud
INNER JOIN account_dimension AS ad
    ON ud.account_id = ad.account_id
LEFT JOIN download_facts AS df 
    ON ud.user_id = df.user_id
Download Facts — Interview Query sql problem
user_dimension table columntype user_idint account_idint account_dimension table columntype account_idint paying_customerboolean download_facts table columntype datedate user_idint downloadsint Given
Try this question in our interactive SQL editor.

Reporting with SQL: Advanced Questions

Advanced SQL reporting questions for business analysts typically require some advanced SQL skills, as well as problem-solving ability. Some example exercises include:

Q4. Write a query to get the month-over-month change of new users in January.

Q5. Given a table of job postings, write a query to breakdown the number of users that have posted their jobs once versus the number of users that have posted at least one job multiple times.

job_postings table:

column type
id integer
job_id integer
user_id integer
date_posted datetime

This question is kind of complicated so it's helpful to break it into multiple steps. First let's visualize what the output would look like.

Output:

column type
posted_jobs_once int
posted_at_least_one_job_multiple_times int

We want the value of two different metrics, the number of users that have posted their jobs once versus the number of users that have posted at least one job multiple times. What does that mean exactly?

Well if a user has 5 jobs but only posted them once, then they are part of the first statement. But if they have a 5 jobs and posted a total of 7 times, that means that they had to at least posted one job multiple times.

To get to that point, we need a table with the count of user-job pairings and the number of times each job gets posted. We can pretty easily get to that point with just a simple GROUP BY on two variables, user_id and job_id.

Now we just need a way to differentiate the users that posted each job once from users that posted multiple times. Let's go back to our example. We can deduce that if we take the sum of the number of rows for each user and the sum of the number of times each job is posted, they must be equal for the user to have posted each job only once, since we grouped by the user and the job id.

Repeat Job Postings — Interview Query sql problem
`job_postings` table columntype idinteger job_idinteger user_idinteger date_posteddatetime Given a table of job postings, write a query
Try this question in our interactive SQL editor.

Q6. Write a query to get a histogram of the number of posts per user in 2020.

Q7. We're given a table of product purchases. Each row in the table represents an individual user product purchase.  Write a query to get the number of customers that were upsold by purchasing additional products.

Note: If the customer purchased two things on the same day that does not count as an upsell as they were purchased within a similar timeframe.

transactions table

column type
id integer
user_id integer
created_at datetime
product_id integer
quantity integer

Output:

column type
num_of_upsold_customers integer
Hint: An upsell is determined by multiple days by the same user. Therefore we have to group by both the date field and the user_id to get each transaction broken out by day and user.
Upsell Transactions — Interview Query sql problem
`transactions` table columntype idinteger user_idinteger created_atdatetime product_idinteger quantityinteger We&#39;re given
Try this question in our interactive SQL editor.

Q8. Given a table of students and their SAT test scores, write a query to return the two students with the closest test scores with the score difference.

If there are multiple students with the same minimum score difference, select the student name combination that is higher in the alphabet.

scores table:

column type
id integer
student varchar
score integer

Input:

id student score
1 Jack 1700
2 Alice 2010
3 Miles 2200
4 Scott 2100

Hint: Given the problem statement is referencing one table with only two columns, we have to self-reference different creations of the same table. It's helpful to think about this problem in the form of two different tables with the same values.
Closest SAT Scores — Interview Query sql problem
scores table columntype idinteger studentvarchar scoreinteger Given a table of students and their SAT test scores, write a query to return the two

Analytics SQL Interview Questions

Analytics SQL interview questions are some of the trickiest interview questions that you will face. This is because they test two concepts:

  1. Understanding what metrics we need to answer the question.
  2. Writing the correct SQL query that will output these metrics.

Analytics SQL interview questions are designed to test how you would think about solving a problem, and are purposely left more ambiguous than other types of problems. The tough part is that you not only have to think critically about what the SQL output has to look like, you also need to understand EXACTLY the right data points to pull.

For example, an interviewer might ask you to write a SQL query (given a few tables) to understand which AB test variant won. But there might not even be any understanding of what winning actually means.

Q1. We ran an A/B test on two different sign up funnels. Write a query to see which variant "won."

Q2. The schema below is for a retail online shopping company consisting of two tables, attribution and user_sessions.

  • The attribution table logs a session visit for each row.
  • If conversion is true, then the user converted to buying on that session.
  • The channel column represents which advertising platform the user was attributed to for that specific session.
  • Lastly the `user_sessions` table maps many to one session visits back to one user.

First touch attribution is defined as the channel to which the converted user was associated with when they first discovered the website.

Calculate the first touch attribution for each user_id that converted.

attribution table:

column type
session_id integer
channel string
conversion boolean

user_sessions table:

column type
session_id integer
created_at datetime
user_id integer

Output:

user_id channel
123 facebook
145 google
153 facebook
172 organic
173 email

Imagine the full path of a user converting to a purchase after multiple visits.

  • 1st Session: User sees Facebook ad -> Clicks to order -> Leaves
  • 2nd Session: User sees Google ad -> Leaves
  • 3rd Session: User types in website -> Clicks to order -> Purchases

How do we figure out the beginning path of the Facebook ad and connect it to the end purchasing user?

We need to do two actions: 1) subset all of the users that converted to customers and 2) figure out their first session visit to attribute the actual channel. We can do that by creating a sub-query that only gets the distinct users that have actually converted.

First Touch Attribution — Interview Query sql problem
`attribution` table columntype session_idinteger channelstring conversionboolean `user_sessions` table columntype session_idinteger created_atdatetime user_idinteger The
Try this question in our interactive SQL editor.

Q3. We're looking to understand the effect of a new Uber driver incentive promotion released in the past month on driver behavior. Write a query to figure out if the incentive worked as indicated.

Q4. Let's say we want to build a naive recommender. We're given two tables, one table called `friends` with a user_id and friend_id columns representing each user's friends, and another table called `page_likes` with a user_id and a page_id representing the page each user liked.

Write an SQL query to create a metric to recommend pages for each user based on recommendations from their friends liked pages.

Note: It shouldn't recommend pages that the user already likes.

friends table:

column type
user_id integer
friend_id integer

page_likes table:

column type
user_id integer
page_id integer

Output:

column type
user_id integer
page_id integer
num_friend_likes integer

Let's solve this problem by visualizing what kind of output we want from the query. Given that we have to create a metric for each user to recommend pages, we know we want something with a user_id and a page_id along with some sort of recommendation score.

Let's try to think of an easy way to represent the scores of each user_id and page_id combo. One naïve method would be to create a score by summing up the total likes by friends on each page that the user hasn't currently liked. Then the max value on our metric will be the most recommendable page.

The first thing we have to do is then to write a query to associate users to their friends liked pages. We can do that easily with an initial join between the two tables.

Liked Pages — Interview Query sql problem
`friends` table columntype user_idinteger friend_idinteger `page_likes` table columntype user_idinteger page_idinteger Let&#39;s

Q5. Amazon released a new recommendation widget on their landing page. Write a query to determine the impact the recommendation widget made on user behavior for one metric.

Q6. Write a query to show the number of users, number of transactions placed, and total order amount per month in the year 2020. Assume that we are only interested in the monthly reports for a single year (January-December).

transactions table:

column type
id integer
user_id integer
created_at datetime
product_id integer
quantity integer

products table:

column type
id integer
user_id integer
created_at datetime
product_id integer
quantity integer

users table:

column type
id integer
name varchar
sex varchar

Output:

month num_customers num_orders order_amt
1 300 550 12000
2 315 700 9000
3 290 900 20000
Hint: We've decided to COUNT the DISTINCT users in our transactions table to generate our number of monthly customers. How can we take advantage of the relationship between our transactions and products tables to calculate the total purchase amount per month of any given customer?
Monthly Customer Report — Interview Query sql problem
`transactions` table columntype idinteger user_idinteger created_atdatetime product_idinteger quantityinteger `products` table columntype idinteger namestring pricefloat `users`
Try this question in our interactive SQL editor.

Q7. Given a table of transactions and products, write a query to return the product id and average product price for that id. Only return the products where the average product price is greater than the average price of all transactions.

Product Average — Interview Query sql problem
`transactions` table columntype idinteger user_idinteger created_atdatetime product_idinteger quantityinteger `products` table columntype idinteger namestring pricefloat Given
Try this question in our interactive SQL editor.

Q8. Given a table of product subscriptions with a subscription start date and end date for each user, write a query that returns true or false whether or not each user has a subscription date range that overlaps with any other user.

subscriptions table:

column type
user_id int
start_date date
end_date date
Subscription Overlap — Interview Query sql problem
subscriptions table columntype user_idint start_datedate end_datedate Given a table of product subscriptions with a subscription start date and
Try this question in our interactive SQL editor.

Q9. In the table below, column action represents either ('post_enter', 'post_submit', 'post_canceled') for when a user starts a post (enter), ends up canceling it (cancel), or ends up posting it (submit).

Write a query to get the post success rate for each day in the month of January 2020.

events table:

column type
id integer
user_id integer
created_at datetime
action string
url string
platform string

Hint: Let's see if we can clearly define the metrics we want to calculate before just jumping into the problem. We want post success rate for each day over the past week.

To get that metric let's assume post success rate can be defined as:

(total posts created) / (total posts entered)

Additionally since the success rate must be broken down by day, we must make sure that a post that is entered must be completed on the same day.

Post Success — Interview Query sql problem
events table columntype idinteger user_idinteger created_atdatetime actionstring urlstring platformstring In the table above,
Try this question in our interactive SQL editor.

ETL SQL Interview Questions

ETL stands for "Extract, Transfer, Load" and describes the process for which data flows between different data warehousing systems.

Extract does the process of reading data from a database. Transform converts data into a format that could be appropriate for reporting, analysis, machine learning, etc., and Load writes the transformed data into another database, table, or any other data storage service that can be then used by another data scientist or engineer for reporting.

In the interview, ETL tools and concepts are important to know for virtually all roles. The more difficult interview questions, however, will likely be focused and asked in data engineering, business intelligence, and related interviews.

Q1. What's the difference between TRUNCATE and DROP?

Q2. What is a PRIMARY KEY in SQL syntax?

Q3. We have a table called `song_plays` that tracks each time a user plays a song. Let's say we want to create an aggregate table called `lifetime_plays` that records the song count by date for each user. Write a SQL query that could make this ETL each day.

column type
id integer
created_at datetime
user_id integer
song_id integer

Hint: For this problem, we use the INSERT INTO keywords to add rows into the lifetime_plays table. If we set this query to run daily, it becomes a daily extract, transform, and load (ETL) process.
Lifetime Plays — Interview Query database design problem
`song_plays` table columntype idinteger created_atdatetime user_idinteger song_idinteger We have a table called `song_plays` that tracks

Q4. What is a FOREIGN KEY?

Q5. Let's say you have a table with a billion rows. How would you add a column inserting data from the original source without affecting the user experience?

Hint: Before jumping into the question we should remember to clarify a few details that we can potentially get out of the interviewer. It helps to ask questions to understand and show that you can think holistically about the problem. Rushing too fast into a solution is a red flag for many interviewers.
Modifying a Billion Rows — Interview Query database design problem
Let&#39;s say you have a table with a billion rows.How would you add a column inserting data from the original source without affecting the user experience?

Q6. List an example of when you would add an INDEX to a table?

Q7. What's the difference between a PARTITION and an INDEX?

Q8. Let’s say we have a table representing a company payroll schema. Due to an ETL error, the employees table instead of updating the salaries every year when doing compensation adjustments, did an insert instead. The head of HR still needs the current salary of each employee.

Write a query to get the current salary for each employee.

employees table:

column type
id integer
first_name string
last_name string
salary integer
department_id integer
Note: Assume no duplicate combination of first and last names (e.g. two John Smiths)
Employee Salaries (ETL Error) — Interview Query sql problem
employees table columntype idinteger first_namestring last_namestring salaryinteger department_idinteger Let’s say we have a table

Q9. Does creating a view require storage in a database?

Q10. Let's say that we have two ETL jobs that feed into a single production table each day. Can you think of any problems this might bring up?

Database Design SQL Interview Questions

Database design SQL questions test your knowledge of data architecture and design. Most importantly, it tests whether you know how to design a database from scratch, given a business idea, application, or any other software that needs to interact with a database.

Many times, when databases need to scale for performance or breadth size of tables, we need to realize how to modify our database to fit the new requirements. Starting from a solid initial design is always important when building out databases.

Q1. Let's say we're working at Spotify. In the users table we have columns such as name and location but also columns like total_liked_songs and favorite_artist. Should we do something about this table?

Q2. Design a database to represent a Tinder style dating app. What does the schema look like and what are some optimizations that you think we might need?

Let’s first approach this problem by understanding the scope of the dating app and what functionality we must design around.

If we were to list the key Tinder app capabilities, it would be something like:

  • Onboarding - User opens up, adds preferences, add pictures, and starts swiping on users.
  • Matching - If the user matches with another user, we notify them and create a messaging system.
  • Messaging - Users can message between each other only if both have been matched. Users can also leave conversations at any time.
Swiping App Design — Interview Query database design problem
Design a database to represent a Tinder style dating app. What does the schema look like and what are some optimizations that you think we might need?
Try this question in our interactive SQL editor.

Q3. How would you design a database for a fast food restaurant?

Q4. Let's say we want to run some data collection on the Golden Gate bridge.

1. What would the table schema look like if we wanted to track how long each car took coming into San Francisco to enter and exit the bridge? Let's say we want to track additional descriptives like the car model and license plate.

2. Write a query on the given tables to get the time of the fastest car on the current day.

3. Write a query on the given tables to get the car model with the average fastest times for the current day. (Example: Let's say three Ferraris crossed the bridge in an average of one minute).

This question functions slightly more like data engineering or architecture program. Given a certain use case or application, we have to model how we want to store the information.

In this case we're given that we have to track time entered and exited leaving the bridge, but also the car make and model along with license plate information. We know that the car model to license plate information will be one to many, given that each license plate represents a single car, and a car model can be replicated many times.

Crossing Bridges — Interview Query database design problem
Let&#39;s say we want to run some data collection on the Golden Gate bridge.1. What would the table schema look like if we wanted to track how long each car took coming into San Francisco
Try this question in our interactive SQL editor.

Q5. Let's say we are productionizing a machine learning model. How would you set up the data architecture to ensure that the model serves all of our users in real time? How would you ensure that the model can be re-trained each day with new data coming in?

Q6. How would you create a schema to represent client click data on the web?

Hint: What exactly does click data on the web mean? Any form of button clicks, scrolls, or action at all is an interaction with the client interface–in this case desktop–and would somehow be represented into a schema form for the end user to query. This does not include client views. A simple but effective design schema would be to first represent each action with a specific label. In this case, assigning each click event a name or label describing its specific action.
Click Data Schema — Interview Query database design problem
How would you create a schema to represent client click data on the web?
Try this question in our interactive SQL editor.

Q7. Let's say we have a table representing vacation bookings. How would you make an aggregate table represented below called listing_bookings with values grouped by the listing_id and columns that represented the total number of bookings in the last 90 days, 365 days, and all time?

bookings table:

column type
id integer
guest_id integer
listing_id integer
date_check_in date
date_check_out date
ds_book (partition_key) string

listing bookings table:

column type
listing_id integer
num_bookings_last90d integer
num_bookings_last365d integer
num_bookings_total integer
ds (partition_key) string
Listing Bookings Aggregation — Interview Query database design problem
bookings table columntype idinteger guest_idinteger listing_idinteger date_check_indate date_check_outdate ds_book (partition_key)string Let&#39;s
Try this question in our interactive SQL editor.

Logic based SQL Interview Questions

Logic based SQL interview questions are very tricky. They aren't really based on real life examples so much as putting the trickiness of algorithms and data structure interviews into SQL questions. This is exemplified on sites such as LeetCode, where you'll see a lot of interview questions that aren't very practical for real life scenarios.

Q1. Write a query to create a new table, named flight routes, that displays unique pairs of two locations.

Example: Duplicate pairs from the flights table, such as Dallas to Seattle and Seattle to Dallas, should have one entry in the flight routes table.

flights table:

column type
id integer
source_location string
destination_location string

Flight Records — Interview Query sql problem
`flights` table columntype idinteger source_locationstring destination_locationstring Write a query to create a new table, named flight routes,
Try this question in our interactive SQL editor.

Q2. Let's say we have a table with an id and name field. The table holds over 100 million rows and we want to sample a random row in the table without throttling the database.

Write a query to randomly sample a row from this table.

Hint: We know that the RAND() function actually returns a floating-point between 0 and 1. So if we were to run this function, SELECT RAND(), we would get a random decimal point to some Nth degree of precision. RAND() essentially allows us to seed a random value. How can we use this to select a random row quickly?
Random SQL Sample — Interview Query sql problem
`big_table` columntype idint namevarchar Let&#39;s say we have a table with an id and name field. The table holds over 100 million rows and we want to sample
Try this question in our interactive SQL editor.

Q3. Given a users table, write a query to get the cumulative number of new users added by day, with the total reset every month.

users table:

columns type
id int
name varchar
created_at datetime
Hint: This question first seems like it could be solved by just running a COUNT(*) and grouping by date. Or maybe it's just a regular cumulative distribution function? But we have to notice that we are actually grouping by a specific interval of month and date. And that when the next month comes around, we want to the reset the count of the number of users.
Cumulative Reset — Interview Query sql problem
users table columnstype idint namevarchar created_atdatetime Given a users table, write a query to get the cumulative number of new users added by
Try this question in our interactive SQL editor.

Strategies for the live SQL interview

Let's go over the common strategies when tackling SQL interview questions.

1.Repeat the problem statement

When presented with a SQL question, listen carefully to the problem description and repeat back what you think the crux of the problem is. The interviewer can then help verify if your understanding is correct.

2. Understand the edge cases

If time permits, write out a base case and an edge case to show that you understand the problem. For example: if the interviewer asks you to pull the average number of events per user per day, write out an example scenario where you're verifying this metric.

Do duplicate events matter? Are we looking at distinct users? These are questions we need to clarify.

3. Try working backwards if the problem is tricky

Sketching out what the output of the SQL question will look like is a great strategy towards solving the problem. Usually, if I know what the end output table is supposed to look like, I can work backwards from there on what functions need to be applied before.

For example, if the output looks like this:

date        | average events per user
------------+-----------------------
2021-12-01  |  3.5
2021-12-02  |  4.0

I know that the table before this aggregation would have to look something like this.

date       | event | user_id
-----------+-------+--------
2021-12-01 | click | 1
2021-12-01 | view  | 1
......

And then, I can figure out what functions I should use to get to my desired output!

4. Pattern match to different functions

As you practice more and more SQL exercises, what you'll find is that many SQL problems follow similar patterns. There are techniques we can use in SQL, like utilizing HAVING on aggregations, self-joins and cross-joins, and applying window functions. But, additionally, we'll see problems that run in a similar vein.

For example, writing a query to get the second highest salary or writing a query to isolate every fifth purchase by a user utilizes the same RANK function in SQL.

Understanding the commonalities between questions will help you understand the first step to solving SQL questions faster because you can re-use similar code and stitch together techniques on top of each other.

5. Start writing SQL

Finally, it's important to just start writing SQL. It's better to start writing an imperfect solution vs trying to perfectly understand the problem or trying to perfect the solution on the first try.

Verbalize your assumptions and what you're doing as you write SQL and your interviewer can then be put on the same page as you.

SQL Study plan for your next interview

SQL interview questions on Interview Query are bucketed into easy, medium, and hard and go over every SQL concept that we went through in this blog post.

Use Interview Query to level yourself at how good you are at SQL. Here’s how you should approach each problem.

  1. Try the SQL question without running any SQL code in the engine. Give yourself 15 to 20 minutes to work on each problem by just reading the problem statement and using the editor.
  2. Work out all the bugs and make sure that when you click run or submit, you feel like this is the definitive solution that will work without syntax errors.
  3. If you get stuck, try the problem hint. The hints on certain questions will push you into learning how to think about solving the problem at hand.
  4. Try submitting your code to check if it passes our test cases. If it misses a few test cases, figure out why by debugging your code.
  5. If you need to debug your code, you can hit the RUN SQL button to try it on our test database. Run your code and check for syntax errors. If you get any, re-read your query to fix the necessary errors.
  6. Check your solution against ours. Take note of whether there are any optimizations you can use to write a more efficient query.

Thanks for Reading!

Start studying for your next SQL interview with resources from Interview Query. Try our SQL course or check out these company guides: Amazon SQL Questions and Google SQL Questions.