Table of Contents

Introduction

SQL is the good old friend that has always worked. It’s something you always come back to, even as Pandas, Julia, Spark, Hadoop, and NoSql attempt to dethrone and replace SQL as the new de-facto data tool.

Eventually though, they all fail in the face of the consistently reliable SQL. And that's why SQL continues to get asked in interviews.

A note before we start...
This guide should be used for anyone who is preparing for an interview in which they know SQL will show up. This guide is not a search engine optimized listacle (top 50 sql questions for 2021...really?).
Rather, this is real advice and REAL interview questions and exercises gathered from hundreds of data scientists, engineers, and analysts. We sprinkle exercises throughout this post after learning concepts. Be sure to try attempting the questions first before we walk through solving them.  
Lastly, if you enjoy this article, please give us a share and check out our SQL course that goes a little deeper with more exercises and problems.

1. Why does SQL show up on the interview?

SQL allows data scientists and engineers to do a couple of important things.

One is to effectively store and retrieve information at scale for analytics. Even though Google Sheets allows users to easily manipulate and visualize data, it cannot store and scale like a SQL database can. Other popular programs –namely Hadoop and Spark– can scale much further than SQL, but still don’t have a clean and easy-to-use language like SQL to retrieve data efficiently.

Another great thing about SQL is that understanding the fundamentals bridges the gap between engineering and data science. Knowing SQL well gives you a competitive edge over any other candidate, whether you're competing for a position as a product manager, software engineer, or even as a business analyst. Having the skillset to write and pull your own queries is like being a magician that can come up with analyses out of thin air.

And at the end of the day, you could just be really good at SQL if you wanted to and make tons of money creating ETL jobs or pulling dashboards with efficiency. That's how valued SQL is.

How often does SQL show up in interviews?

One prevailing question is how often SQL shows up in interviews.

At Interview Query, we analyzed a dataset of Glassdoor data science interview experiences and responses submitted by our users. The analysis came back that SQL was asked:

Skills tested in Facebook's Data Science Interview as of 2021

Due to its nature in being able to get and manipulate your own data, this is by far the most important skill now towards nabbing a data science position. And while pandas and other languages are useful, note that SQL will always be the one that matters.

2. Strategies for the live SQL interview

Let's go over the common strategies when tackling SQL interview questions.

1.Repeat the problem statement

When presented with a SQL question, listen carefully to the problem description and repeat back what you think the crux of the problem is. The interviewer can then help verify if your understanding is correct.

2. Understand the edge cases

If time permits, write out a base case and an edge case to show that you understand the problem. For example: if the interviewer asks you to pull the average number of events per user per day, write out an example scenario where you're verifying this metric.

Do duplicate events matter? Are we looking at distinct users? These are questions we need to clarify.

3. Try working backwards if the problem is tricky

Sketching out what the output of the SQL question will look like is a great strategy towards solving the problem. Usually, if I know what the end output table is supposed to look like, I can work backwards from there on what functions need to be applied before.

For example, if the output looks like this:

date        | average events per user
------------+-----------------------
2021-12-01  |  3.5
2021-12-02  |  4.0

I know that the table before this aggregation would have to look something like this.

date       | event | user_id
-----------+-------+--------
2021-12-01 | click | 1
2021-12-01 | view  | 1
......

And then, I can figure out what functions I should use to get to my desired output!

4. Pattern match to different functions

As you practice more and more SQL exercises, what you'll find is that many SQL problems follow similar patterns. There are techniques we can use in SQL, like utilizing HAVING on aggregations, self-joins and cross-joins, and applying window functions. But, additionally, we'll see problems that run in a similar vein.

For example, writing a query to get the second highest salary or writing a query to isolate every fifth purchase by a user utilizes the same RANK function in SQL.

Understanding the commonalities between questions will help you understand the first step to solving SQL questions faster because you can re-use similar code and stitch together techniques on top of each other.

5. Start writing SQL

Finally, it's important to just start writing SQL. It's better to start writing an imperfect solution vs trying to perfectly understand the problem or trying to perfect the solution on the first try.

Verbalize your assumptions and what you're doing as you write SQL and your interviewer can then be put on the same page as you.

3. The 7 different SQL interview questions

SQL questions asked during interviews can vary widely across companies, but even more so across positions. You won't see data scientists asked the same SQL questions as software engineers, and that's because data scientists have to write different types of queries compared to software engineers.

Generally, each SQL interview question can be bucketed into these categories:

  • Definition based SQL questions
  • Basic SQL questions
  • Reporting and metrics SQL questions
  • Analytics SQL questions
  • ETL SQL questions
  • Database design questions
  • Logic based SQL questions

In this next section, we'll go over which types of SQL questions are expected for different roles and what those different kinds of SQL questions are in detail.

4. SQL questions for data scientists and analysts

SQL interview questions for data scientists and data analysts will likely show up in three parts of the interview process: the technical round, the take-home challenge, and the onsite interview.

The technical round and take-home challenge will usually consist of SQL questions designed to filter out candidates. Since SQL is commonly used as a filter mechanism for data scientists, it's important to perform well on this part of the interview in order to demonstrate competence.

Depending on what type of data science role you're interviewing for, you'll find that most SQL questions will be split into these three types:

  • Basic SQL Interview Questions
  • Reporting and Metrics SQL Interview Questions
  • Analytics SQL Interview Questions

Basic SQL Interview Questions

Basic SQL questions are what they sound like. These questions will be generally easy and focus on assessing if you know the basics.

Definition based SQL questions are grouped into this category because they're super easy to learn. All you have to do is study a list of definitions of SQL terms and applications. These questions will include understanding the differences between joins, what kinds of aggregations exist, and knowing the basic functions like CASE WHEN or HAVING.

Basic SQL interview questions that involve a user actually writing a query are slightly different. These will involve getting the COUNT of a table, knowing what the HAVING clause does, and figuring out how to utilize LEFT JOIN versus INNER JOIN to give you the values that you need.

Read more on the the basic concepts you need to know to pass your data science interview here.
Three SQL Concepts for your Data Scientist Interview
I’ve interviewed a lot of data scientist candidates and have found there are a a lot of SQL interview questions for data science that eventually boil down to three generalized types of conceptual understandings.

Basic SQL Concepts to Review

  • What's the difference between a LEFT JOIN and an INNER JOIN?
  • When would you use UNION vs UNION ALL? What if there were no duplicates?
  • What's the difference between COUNTand COUNT DISTINCT?
  • When would you use a HAVING clause versus a WHERE clause?

Basic SQL Question Example:

We're given two tables, a users table with demographic information and the neighborhood they live in and a neighborhoods table.
Write a query that returns all of the neighborhoods that have 0 users.

Try answering this question with our interactive SQL editor.

Here's a hint:

Our predicament is to find all the neighborhoods without users that live in them. This means we have to introduce a concept of existence of a field in one table, while not existing in another.

For example, let's say we generate some fake data of user's and the neighborhoods they live in. We would expect it to look something like this.

neighborhoods.name  | users.id
____________________|__________
castro              | 1
castro              | 2
cole valley         | null
castro heights      | 3
sunset heights      | 4

We see each user from one to four is appropriately placed in their respective neighborhood except for the neighborhood of Cole Valley. That's the neighborhood we're targeting for returning in our query.

Strategies: whenever the question asks about finding values with 0 something (users, employees, posts, etc..), immediately think of the concept of LEFT JOIN! An inner join finds any values that are in both tables, a left join keeps only the values in the left table.

Our predicament is to find all the neighborhoods without users. To do this, we must do a left join from the neighborhoods table to the users table.

If we then add in a where condition of WHERE users.id IS NULL, we will get every single neighborhood without a singular user as shown above.

SELECT n.name   
FROM neighborhoods AS n 
LEFT JOIN users AS u
    ON n.id = u.neighborhood_id
WHERE u.id IS NULL
Empty Neighborhoods — Interview Query sql problem
users table columnstype idint namevarchar neighborhood_idint created_atdatetime neighborhoods table columnstype idint namevarchar city_idint We're
Check out the full problem and different solutions we can apply

Reporting and Metrics SQL Interview Questions

Example of Reporting SQL queries to Dashboard

Reporting and metrics SQL questions are probably the most common type of SQL question to show up in interviews.

Reporting SQL interview questions replicate the work that many business and reporting analysts do on a day-to-day basis. This means writing queries that end up in dashboards and key metrics.

For example, a typical B2C software company would likely want to understand how many new users signed up. Better yet, they'd want to understand the daily, weekly, and monthly active users that are on their platform. What about the number of week-over-week new users that signed up as well?

These questions are generally some of the most common problems that analysts and data scientists will run into when pulling metrics for their day-to-day job. The output of what they need to pull is very clearly defined and the queries themselves require complex joins, sub-queries, self-joins, window functions, and more.

Easy Metrics-Based SQL Questions

  • What's the total distance traveled for all riders on Lyft in the month of March?
  • How many bookings did Airbnb get in the month of December?
  • How many existing users booked at least one place in Airbnb in the month of December?

Advanced Metrics-Based SQL Questions:

  • Write a query to get the month-over-month change of new users in January.
  • Write a query to calculate the monthly retention of subscribers.
  • Write a query to get a histogram of the number of posts per user in 2020.

Let's take a look at tackling an example problem posed by LinkedIn in their data science interview.

Repeat Job Postings
Given a table of job postings, write a query to breakdown the number of users that have posted their jobs once versus the number of users that have posted at least one job multiple times.
Repeat Job Postings — Interview Query sql problem
`job_postings` table columntype idinteger job_idinteger user_idinteger date_posteddatetime Given a table of job postings, write a query
Try the problem on Interview Query

Here's a hint on tackling the problem.

First, let's visualize what the output would look like and clarify the question.

We want the value of two different metrics, the number of users that have posted their jobs once versus the number of users that have posted at least one job multiple times. What does that mean exactly?

Well, if a user has 5 jobs but only posted them once, then they are part of the first statement. But if they have a 5 jobs and posted a total of 7 times, that means that they had to at least posted one job multiple times.

We can visualize it the following way with an example. Let's say this is our end output:

Users posted once   | Posted multiple times
-----------------------------------------
        1           |       1

To get to that point, we need a table with the count of user-job pairings and the number of times each job gets posted.

user_id | job_id | number of times posted
--------+--------+-----------------------
  1     |    1   |          2
  1     |    2   |          1
  2     |    3   |          1

We can pretty easily get to that point with just a simple GROUP BY on two variables, user_id and job_id.

WITH user_job AS (
    SELECT user_id, job_id, COUNT(DISTINCT date_posted) AS num_posted
    FROM job_postings
    GROUP BY user_id, job_id
)

SELECT * FROM user_job
Watch how I solve the rest of the solution below.

Check out this video where I tackle a harder reporting SQL question featuring multiple self-joins and year over year metrics.

Analytics SQL Interview Questions

Analytics SQL interview questions are some of the trickiest interview questions that you will face. This is because they test two concepts.

  1. Understanding what metrics we need to answer the question.
  2. Writing the correct SQL query that will output these metrics.

Analytics SQL interview questions are designed to test how you would think about solving a problem, and are purposely left more ambiguous than other types of problems. The tough part is that you not only have to think critically about what the SQL output has to look like, you also need to understand EXACTLY the right data points to pull.

For example, an interviewer might ask you to write a SQL query (given a few tables) to understand which AB test variant won. But there might not even be any understanding of what winning actually means.

Here's another example:

Let's say that we're given a table of users and sessions. A product manager wants to know if the new sign-up flow launched last week improved the sign up conversion rate at all.
Write a query to pull a metric to determine if this is the case.

We can see that there isn't a clearly defined output like in the previous question. The interviewer isn't defining a metric that the candidate needs to write a query to find, such as the number of new users signed up last week or the conversion rate.

Rather, the interviewer is asking the candidate to first define a metric to solve a problem and then write a query to get that metric. The reason why these problems are so difficult is because getting the first part right is critical to getting the second part as well.

It also makes for a great way to test product analysts and data scientists that are especially analytically focused. The SQL question is testing if a candidate solve an ambiguous question and prove their technical chops.

Example of UI analytics from CXL

Analytics SQL Question Concepts

  • We ran an A/B test on two different sign up funnels. Write a query to see which variant "won".
  • We're looking to understand the effect of a new Uber driver incentive promotion released in the past month on driver behavior. Write a query to figure out if the incentive worked as indicated.
  • Amazon released a new recommendation widget on their landing page. Write a query to determine the impact the recommendation widget made on user behavior for one metric.

Analytics SQL Question Example:

Liked Pages
Let's say we want to build a naive recommender.
We're given two tables: one table called friends with a user_id and friend_id columns representing each user's friends, and another table called page_likes with a user_id and a page_id representing the page each user liked.
Write an SQL query to create a metric to recommend pages for each user based on recommendations from their friends' liked pages.
Note: It shouldn't recommend pages that the user already likes.
Liked Pages — Interview Query sql problem
`friends` table columntype user_idinteger friend_idinteger `page_likes` table columntype user_idinteger page_idinteger Let's
Try the question in the SQL editor

Here's a hint.

Let's solve this problem by visualizing what kind of output we want from the query.

Given that we have to create a metric for each user to recommend pages, we know we want something with a user_id and a page_id along with some sort of recommendation score.

Let's try to think of an easy way to represent the scores of each user_id and page_id combo. One naive method would be to create a score by summing up the total likes by friends on each page that the user hasn't currently liked. The max value of total likes will be the most recommendable page.

Then, the first thing we have to do is write a query to associate users with their friends' liked pages. We can easily do that with an initial join between the two tables.

WITH t1 AS (
    SELECT 
        f.user_id 
        , f.friend_id
        , pl.page_id 
    FROM friends AS f
    INNER JOIN page_likes AS pl
        ON f.friend_id = pl.user_id
)

Now, we have every single user_id associated with the friends' liked pages. Can't we just do a GROUP BY on user_id and page_id fields and get the DISTINCT COUNT of the friend_id field?

Not exactly. We still have to filter out all of the pages that the original users also liked. How do we do that?

We can do that by joining the original page_likes table back to the t1 CTE. We can filter out all the pages that the original users liked by doing a LEFT JOIN on page_likes and then selecting all the rows where the JOIN on user_id and page_id are NULL.

LEFT JOIN page_likes AS pl
    ON t1.page_id = pl.page_id 
        AND t1.user_id = pl.user_id
WHERE pl.user_id IS NULL # filter out existing user likes

More study practice? Check out three tricky analytics interview questions that I solved with my friend Andrew.

SQL Interview Questions for Analysts

Whether you work as a data analyst, product analyst, or business intelligence analyst, you'll likely be asked a SQL interview question if it's required for the job. Generally, the SQL questions asked will be similar as those asked in a data science interview– the only difference being that they may be more focused towards reporting based SQL questions rather than analytics.

However, product analysts and data analysts can expect more analytics based SQL interview questions, while business analysts and reporting analysts will likely see more reporting SQL interview questions.

Specifically for business intelligence roles, there is likely to be ETL SQL interview questions, given the importance of making sure the data is secure from the data infrastructure side to the reporting dashboards.

ETL SQL Interview Questions

ETL stands for "Extract, Transfer, Load" and describes the process for which data flows between different data warehousing systems.

Extract does the process of reading data from a database. Transform converts data into a format that could be appropriate for reporting, analysis, machine learning, etc., and Load writes the transformed data into another database, table, or any other data storage service that can be then used by another data scientist or engineer for reporting.

Many times, ETLs are stacked on top of each other, creating a system of complex data-flows that eventually need to be managed by scheduling systems, such as Airflow or MLflow.

In the interview, ETL concepts are important to know for virtually all roles. The more difficult interview questions, however, will likely be focused and asked on data engineering, business intelligence, and related interviews.

Basic ETL SQL Concepts

  • What's the difference between TRUNCATE and DROP?
  • What is a PRIMARY KEY in SQL syntax?
  • What is a FOREIGN KEY?

Advanced ETL SQL Concepts

  • List an example of when you would add an INDEX to a table?
  • What's the difference between a PARTITION and an INDEX?
  • Does creating a view require storage in a database?
  • Let's say that we have two ETL jobs that feed into a single production table each day. Can you think of any problems this might bring up?

Example ETL Interview Questions

Let's say you have a table with a billion rows.
How would you add a column inserting data from the original source without affecting the user experience?
Modifying a billion rows — Interview Query system design problem
Let's say you have a table with a billion rows.How would you add a column inserting data from the original source without affecting the user experience?

Here's a hint.

In a general database, writing a column would lock up the whole table. However, we can potentially do the update in steps.

One strategy is taking an exact replica of the existing table and updating the results offline. So, we could create a new table by copying the old table, update the new column, and then drop the old existing table and renaming it to the new table.

However, this does produce a problem in that we may have a potential mismatch of data. From the time that we take a copy of the new table to then switching the tables over, we may have lost data.

View the solution in the mock interview with my friend Scott who works as a data and machine learning engineer.

Employee Salaries - ETL Error

Table for Interview Query problem

Let’s say we have a table representing a company payroll schema.
Due to an ETL error, the employees table, instead of updating the salaries every year when doing compensation adjustments, did an insert instead. The head of HR still needs the current salary of each employee.
Write a query to get the current salary for each employee.
Assume no duplicate combination of first and last names. (I.E. No two John Smiths)

Here's a hint.

The first step would be to remove duplicates and retain the current salary for each user.

Given that we know there aren't any duplicate first and last name combinations, we can remove duplicates from the employees table by running a GROUP BY on two fields, the first and last name. This allows us to then get a unique combinational value between the two fields.

Run the SQL query below in the editor. What does this max_id value get us?

SELECT first_name, last_name, MAX(id) AS max_id
FROM employees
GROUP BY 1,2
Employee Salaries (ETL Error) — Interview Query sql problem
employees table columntype idinteger first_namestring last_namestring salaryinteger department_idinteger Let’s say we have a table

5. SQL questions for engineers

A little while ago, I talked to a hiring manager at a prominent tech company in Silicon Valley, and he mentioned that he always tested SQL first for all members on his team - even the engineers.

I asked him to elaborate and he emphasized the importance of SQL.

“No matter what, the candidate needs to know SQL. I don’t care if they’re the best machine learning expert in the world– if you can’t pull your own data, you can’t work on my team. No one is going to pull your data for you.”

If you were the world’s best machine learning researcher in the world, how would it look if you had to go to an analyst and say:

"Hey, can you get me this dataset for me, clean it, and then also feature engineer it a bit? Thx.

This doesn’t exactly make for great teamwork. It also wouldn’t make sense that you would be the best machine learning expert in the world and still not know how to master something as simple as SQL.

Each engineering position that works with SQL tests it differently, but let's start with data engineering, because data engineers definitely have to be masters at SQL.

Data Engineering SQL Interview Questions

Example of Data Architecture

Data engineers specialize in designing large scale data systems. This work can span from building out production and backup databases and data warehouses to writing code to transfer the data between different states.

One key understanding is that data engineers always require SQL knowledge. It's the fundamental building block to their position. Most of the required interview questions we have already gone over before, except for database design.

  • Reporting SQL Interview Questions
  • ETL SQL Interview Questions
  • Database Design SQL Interview Questions

Database Design SQL Interview Questions

Database design SQL questions test your knowledge of data architecture and design. Most importantly, it tests whether you know how to design a database from scratch, given a business idea, application, or any other software that needs to interact with a database.

Many times, when databases need to scale for performance or breadth size of tables, we need to realize how to modify our database to fit the new requirements. Starting from a solid initial design is always important when building out databases.

Example Database Design Questions

  • Let's say we're working at Spotify. In the users table we have columns such as name and location but also columns like total_liked_songs and favorite_artist. Should we do something about this table?
  • How would you design a database for a fast food restaurant?
  • Let's say we are productionizing a machine learning model. How would you set up the data architecture to ensure that the model serves all of our users in real time? How would you ensure that the model can be re-trained each day with new data coming in?

Example Database Design Interview Question

How would you create a schema to represent client click data on the web?
Click Data Schema — Interview Query system design problem
How would you create a schema to represent client click data on the web?

Here's a hint.

What exactly does click data on the web mean?

Any form of button clicks, scrolls, or action at all is an interaction with the client interface–in this case desktop–and would somehow be represented into a schema form for the end user to query. This does not include client views.

A simple but effective design schema would be to first represent each action with a specific label. In this case, assigning each click event a name or label describing its specific action.

For example, let's say the product is Dropbox on web and we want to track each folder click on the UI of an individual person's Dropbox. We can label the clicking on a folder as an action name called folder_click. When the user clicks on the side-panel to login or logout and we need to specify the action, we can call it login_click and logout_click.

What other fields do we need?

Check out the rest of the solution here

Machine Learning Engineering SQL Interview Questions

Machine learning engineers need to pull data in a similar fashion to data scientists. Depending on the role, however, machine learning engineers might not be asked SQL interview questions. At more established companies like Facebook and Google, ML engineers have processes to get their data without using SQL.

The most common types of SQL interview questions that will show up are:

  • Reporting and Metrics SQL Interview Questions
  • ETL SQL Interview Questions

Software Engineering SQL Interview Questions

Software engineers will be asked SQL questions mostly a precursor to test that they know the very basics. Software engineers interact with data when building APIs and endpoints and have to query the database for data.

The syntax that software engineers usually use for SQL is generally different from data scientists and analysts. This is because of the rise in interfacing with a SQL relational mapper and toolkit, such as SQLAlchemy for Python. The software engineer isn't exactly writing SQL syntax, but rather writing Python, Go, Java, etc. that wraps around SQL.

The syntax can also be more like this:

SELECT * From Emp, Dept

Or like this:

SELECT * 
FROM Emp, Dept
WHERE Emp.dept_id = Dep.id

The most common SQL interview questions that software engineers would receive would be:

  • Basic SQL Interview Questions
  • ETL SQL Interview Questions
  • Database Design Interview Questions
  • Logic Based SQL Interview Questions

Logic based SQL Interview Questions

Logic based SQL interview questions are very tricky. They aren't really based on real life examples so much as putting the trickiness of algorithms and data structure interviews into SQL questions. This is exemplified on sites such as LeetCode, where you'll see a lot of interview questions that aren't very practical for real life scenarios.

Here's an example of a logic based SQL interview question.

Write a query to create a new table, named flight routes, that displays unique pairs of two locations.
Duplicate pairs from the flights table, such as Dallas to Seattle and Seattle to Dallas, should have one entry in the flight routes table.
Flight Records — Interview Query sql problem
`flights` table columntype idinteger source_locationstring destination_locationstring Write a query to create a new table, named flight routes,

6. Quick SQL concepts & review

Here, I want to go over an extensive review list of all of the concepts we just went through. In the second part, I've linked actual real SQL exercises to try.

SQL Concepts and Questions

What is a join?

A JOIN is a keyword used to merge together two or more tables on the same key.

Which SQL command is used to add rows to a table?

The INSERT command.

When might you denormalize your data?

Denormalize when its OLAP operations and normalize when OLTP.

What is OLAP and OLTP?

OLTP are databases intended for online transaction processing and OLAP are databases intended for online analytical processing.

What's the difference between WHERE and HAVING?

The main difference is that a WHERE clause is used to filter rows before grouping and HAVING is used to exclude records after grouping.

When do you use the CASE WHEN function?

CASE WHEN lets you write complex conditional statements on the SELECT clause and also allows you to pivot data from wide to long formats.

When would you use a SELF-JOIN?

A self-join joins a table with itself. It's most commonly used when needing to perform aggregation functions when data is stored in one large table rather than smaller ones.

7. SQL interview questions and exercises

Try more SQL interview questions with our interactive editor!

Employee Salaries — Interview Query sql problem
employees table columnstypes idint first_namevarchar last_namevarchar salaryint department_idint departments table columnstypes idint namevarchar Given
Comments Histogram — Interview Query sql problem
users table columnstype idinteger namestring created_atdatetime neighborhood_idinteger mailstring comments table columnstype user_idinteger bodytext created_atdatetime Write
Upsell Transactions — Interview Query sql problem
`transactions` table columntype idinteger user_idinteger created_atdatetime product_idinteger quantityinteger We're given
Random SQL Sample — Interview Query sql problem
`big_table` columntype idint namevarchar Let's say we have a table with an id and name field. The table holds over 100 million rows and we want to sample

8. SQL Interview Questions asked by Facebook, Amazon, and Google (FAANG)

What's the difference between SQL interview questions asked by the big tech companies like Facebook, Amazon, Microsoft, etc. vs other types of companies? Generally, FAANG companies ask questions that are more product and analytically facing that are conceptualized with a case study.

For example, Facebook will likely ask questions surrounding practical queries they have to run on a day-to-day with their platform. That means pulling different ad bids, looking at daily active users, and understanding product performance with dashboards.

Generally, it's helpful to tailor your SQL interview practice according to industry of the company that you'll be interviewing for. If you're interviewing for a bio-tech company, think of the type of data that an data analyst or data scientist will be querying on a day-to-day basis, and practice questions related to that.

Check out a SQL mock interview that I did with my friend Ben replicating the Facebook data science interview.

9. Last tips and notes

There’s a couple of important details we need to consider in preparing for SQL technical interviews.

SQL interviews are almost always “white-boarding”.

SQL interviews are rarely conducted in a live console where you can run queries in a playground database. Even CoderPad, which is the de-facto interviewing tool for engineers and most tech companies, has only around four example tables you can use. And I’ve never heard of any interviewers using questions from their tables for interview problems.

Why is this the case?

First and foremost, it is difficult to set up a shared environment to test SQL with real data. I also suspect that many companies don’t think it’s a good test if a candidate does have a live SQL playground to work in. For example, Facebook chooses not to create a SQL playground for interviewees to use.

The main reason, however, is that testing SQL by white-boarding SQL syntax accomplishes the task of assessing real mastery of the language that is also practical for speaking towards your efficiency as an individual contributor.

The worst-case scenario is where you write a wrong query, put it into an ETL job, and then be comfortably oblivious as your company ingests wrong metrics for who knows how long. Learning how to write correct SQL is very necessary and the expectation is that you can whiteboard it without too much failure.

SQL can be quickly learned.

One can improve their SQL ability by simply practicing. The steps towards mastering SQL just include repeatedly working on problems. SQL interview questions are very similar to the tasks that you do on the job as a data scientist, which, on the plus side, means that mastering SQL can sometimes consist of you doing your actual day job.

However, the downside is that if your role is specialized and you’re restricted to only using SQL for something like time-series forecasting, or you don’t use SQL on a day-to-day basis, you can’t practice it effectively across the board for interviews.

Therefore, the unique problem sets that we surface in Interview Query allow you to hack the learning curve of SQL experience and master the querying language.

Our job within this course is to expose you to the most common types of SQL questions that data scientists have to work on. Examples range from practicing applied analytics to dash-boarding metrics to queries for model and dataset generation.

Optimize the Performance of your SQL Query

Always try optimizing the performance of your SQL queries during the interview. If you end up writing a SQL query that is horribly inefficient, you will likely be rejected from the interview.

If you’re working at a company with large swaths of data, your queries are likely to be slow. My co-workers and I on the data science team would write a query, go to the snack bar across the office and walk back, and it still would not be done. There was just so much data we had to query.

Running queries generally takes a long time, which is why getting it right on the first try is a good demonstration of general competence.

Imagine running an expensive join and waiting an hour until it times out just from non-optimized code. You likely just got blocked on a task for a whole hour during the workday. Then, imagine again an hour-long SQL query that comes back with wrong data because you messed up a simple join.

Take this interview question for example:

Let's say we have a table with an id and name field. The table holds over 100 million rows and we want to sample a random row in the table without throttling the database.
Write a query to randomly sample a row from this table.

The most obvious solution for this question would be to just order the rows by RAND() and select one row.

SELECT id, name 
FROM big_table
ORDER BY RAND()
LIMIT 1

But it's noted that the table has 100 million rows. Ordering by RAND() would never finish the query! So even though it's technically a correct solution, it's not practical for the question being asked.

Random SQL Sample — Interview Query sql problem
`big_table` columntype idint namevarchar Let's say we have a table with an id and name field. The table holds over 100 million rows and we want to sample
Try the rest of the problem here

Use Correct Formatting for your SQL Queries.

I won't tell you how many times I have marked people down because their formatting is so atrocious I can't tell if the actual solution is correct or not.

Always make sure that you're adding line breaks and spacing to make your SQL code easy to read and digestible. Otherwise, you will find yourself writing SQL code that your interviewer cannot even verify as being correct, ruining the entire point of the interview itself.

Does it matter which SQL Engine I use for my interview?

No, not really. The only thing to watch out for is if you use a very uncommon SQL engine that goes against common SQL technique and syntax.

If you start using window functions such as PARTITION BY or RANK and your interviewer doesn't use that kind of SQL engine or doesn't understand window functions, you'll likely have to re-write your query in another way that is intelligible.

10. SQL Study plan for your next interview

SQL interview questions on Interview Query are bucketed into easy, medium, and hard and go over every SQL concept that we went through in this blog post.

Use Interview Query to level yourself at how good you are at SQL. Here’s how you should approach each problem.

  1. Try the SQL question without running any SQL code in the engine. Give yourself 15 to 20 minutes to work on each problem by just reading the problem statement and using the editor.
  2. Work out all the bugs and make sure that when you click run or submit, you feel like this is the definitive solution that will work without syntax errors.
  3. If you get stuck, try the problem hint. The hints on certain questions will push you into learning how to think about solving the problem at hand.
  4. Try submitting your code to check if it passes our test cases. If it misses a few test cases, figure out why by debugging your code.
  5. If you need to debug your code, you can hit the RUN SQL button to try it on our test database. Run your code and check for syntax errors. If you get any, re-read your query to fix the necessary errors.
  6. Check your solution against ours. Take note of whether there are any optimizations you can use to write a more efficient query.