Data science case study questions are often the most difficult part of the interview process. Designed to simulate a company's current and past projects, case study problems rigorously examine how candidates approach prompts, communicate their findings, and work through roadblocks.

Practice is key for acing case study interviews in data science. By practicing test problems, you'll learn how to approach case study questions, ask the right questions, and formulate answers quickly.

Looking for more case study resources? Check out the Product Sense and Modeling/Machine Learning sections of our Data Science Course.

What Questions Get Asked in Case Study Interviews?

There are three main types of data science case studies: product questions, modeling and machine learning questions, and business case questions.

  • Product Case Studies - This type of case study tackles a specific product or feature, often tied to the interviewing company. Interviewers are generally looking for a sense of business intuition revolving around product metrics.
  • Prediction/Modeling Case Studies - Modeling case studies are more varied and focus on assessing your intuition for building models around business problems.
  • Business Case Questions - Similar to product questions, business case problems tackle a problem specific to the business. Common topics include assessing the best option among several business plans and formulating a process for solving a specific problem.

Ultimately, because data science case studies tend to be product- and company-focused, it is extremely beneficial to research current projects and developments across different divisions, as these initiatives might end up as the case study topic!

Why Are Case Study Questions Asked?

To plan your approach to case study questions, it helps to understand what interviewers are looking for. Often, at this point in the interview process, you'll have demonstrated sufficient technical understanding and skills for the position.

In other words, you aren't being assessed on whether or not you can perform key job duties.

Instead, case studies assess your thought process: your ability to think on your feet and work through real-world problems that don't have a single right or wrong answer. Because real-world problems are rarely black-and-white, it's important to demonstrate decisiveness in your investigation as well as the capacity to consider impacts from a variety of angles.

Perhaps most importantly, case interviews assess your ability to effectively communicate your conclusions. On the job, data scientists exchange information across teams and divisions, so a significant part of the interviewer's focus will be on how you process and explain your answer.  

Product Case Study Questions

With product case questions, the interviewer wants to get an idea of your product sense and business intuition, specifically your ability to identify which metrics best help you understand a product.

Check out our guide on tackling the product data science case interview.

Q1. Suppose you’re working as a data scientist at Facebook. How would you measure the success of private stories on Instagram, where only certain chosen friends can see the story?

Hint: How would you calculate the metric average comments per user?
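
One way to make this hint concrete is a minimal Python sketch that computes average comments per user from a hypothetical list of `(user_id, story_id)` comment events. The event schema and function names are illustrative assumptions, not a real Facebook or Instagram API. The main point is that the denominator is a choice worth stating out loud: dividing by all active users and dividing by only the users who commented tell different stories.

```python
def average_comments_per_user(comments, active_users):
    """Comments per active user; users who never commented still count in the denominator.

    comments: list of (user_id, story_id) pairs, one per comment (hypothetical schema).
    active_users: set of user_ids active in the same period.
    """
    if not active_users:
        return 0.0
    n_comments = sum(1 for user, _ in comments if user in active_users)
    return n_comments / len(active_users)


def average_comments_per_commenter(comments):
    """Comments per user who commented at least once."""
    commenters = {user for user, _ in comments}
    return len(comments) / len(commenters) if commenters else 0.0


comments = [("a", 1), ("a", 2), ("b", 1)]
print(average_comments_per_user(comments, {"a", "b", "c"}))  # 1.0
print(average_comments_per_commenter(comments))              # 1.5
```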

Q2. At Netflix, we offer a subscription where customers can enroll for a 30-day free trial. After 30 days, customers will be automatically charged based on the package selected. We want to measure the success of acquiring new users through the free trial.

How can we measure acquisition success and what metrics can we use to measure the success of the free trial?

One way to frame this specific problem is to think about controllable inputs, external drivers, and the observable output. Start with Netflix's major goals:

  • Acquiring new users to their subscription plan.
  • Decreasing churn and increasing retention.

How does that affect how Netflix might acquire new users?
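
To connect those goals to observable outputs, two natural starting metrics are the trial-to-paid conversion rate and retention after the first charge. The sketch below assumes a hypothetical per-user record; the `Trial` fields are illustrative, not Netflix's actual schema.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    user_id: str
    converted: bool   # charged at the end of the 30-day trial
    months_paid: int  # paid months completed after converting (hypothetical field)


def trial_conversion_rate(trials):
    """Share of trial starters who became paying subscribers."""
    return sum(t.converted for t in trials) / len(trials) if trials else 0.0


def retained_after(trials, months):
    """Share of converted users who paid for at least `months` months."""
    converted = [t for t in trials if t.converted]
    if not converted:
        return 0.0
    return sum(t.months_paid >= months for t in converted) / len(converted)


trials = [Trial("a", True, 3), Trial("b", True, 1), Trial("c", False, 0), Trial("d", False, 0)]
print(trial_conversion_rate(trials))  # 0.5
print(retained_after(trials, 2))      # 0.5
```

A high conversion rate paired with weak retention after the first charge may point to users who simply forgot to cancel, which is one reason to track the two together.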

Q3. How would you measure the success of Facebook Groups?

Start by considering the key function of Facebook Groups. You could say that Groups are a way for users to connect with other users through a shared interest or real-life relationship.

Hint: With this in mind, how could we use the goals of Facebook Groups to measure success?

Q4. We're working on a new feature for LinkedIn chat, and we want to implement a green dot to show an “active user”. Given engineering constraints, we can't A/B test it before release.

How would you analyze the effectiveness of this new feature?

When you approach case study questions, remember to always clarify any vague terms. In this case, "effectiveness" is very vague. To help define that term, first consider what the goal of adding a green dot to LinkedIn chat is.

Q5. Let's say that you're a data scientist on the engagement team. A product manager comes up to you and says that the weekly active users metric is up 5% but email notification open rates are down 2%.

What would you investigate to diagnose what's happening?

With a case question like this, you'd first want to answer one question: what assumptions can you make about the relationship between weekly active users and email open rates?

Hint: Open rate can decrease when its numerator decreases (fewer people open emails) or its denominator increases (more emails are sent). Taking these two factors into account, what are some hypotheses we can make about our decrease in open rate compared to our increase in weekly active users?
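
A quick numerical illustration of the denominator effect, using made-up numbers: if the number of emails sent grows faster than the number of opens, the open rate falls even though more people are opening emails.

```python
def open_rate(opens: int, sent: int) -> float:
    """Email open rate = emails opened / emails sent."""
    return opens / sent if sent else 0.0


def relative_change(old: float, new: float) -> float:
    return (new - old) / old


# Hypothetical week-over-week numbers: more emails sent, slightly more opens.
before = open_rate(opens=20_000, sent=100_000)
after = open_rate(opens=21_000, sent=110_000)
print(f"open rate {before:.3f} -> {after:.3f} ({relative_change(before, after):+.1%})")
```

Here opens rise by 5% while sends rise by 10%, so the rate drops. That pattern is consistent with, for example, new notification types or more active users receiving more emails.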

Modeling and Machine Learning Case Questions

Machine learning case questions assess your ability to build models to solve business problems. These questions can range from applying machine learning to a specific case scenario to assessing the validity of a hypothetical existing model. The modeling case study requires a candidate to evaluate and explain any part of the model-building process.

See our Machine Learning Interview Questions guide for more sample case questions.

Q1. Describe how you would build a model to predict Uber ETAs after a rider requests a ride.

Common case study problems like this are designed to test how you would build a model. Often the question is scoped down to a specific part of the model-building process. For example, the question above could be broken up into:

How would you evaluate the predictions of an Uber ETA model?

OR

What features would you use to predict the Uber ETA for ride requests?

Our recommended framework breaks a modeling and machine learning case study down into individual steps so you can tackle each one thoroughly. In each full modeling case study, you'll want to go over:

  • Data Processing
  • Feature Selection
  • Model Selection
  • Cross Validation
  • Evaluation Metrics
  • Testing and Roll Out
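
For the evaluation step of the Uber ETA example, here is a minimal sketch of three common regression metrics. The sample values are invented. MAE reads directly in minutes, RMSE penalizes large misses more heavily, and the share of predictions within a tolerance (two minutes here, an arbitrary choice) is easy to explain to non-technical stakeholders.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, in the same units as the target (minutes)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; large misses weigh more than in MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def share_within(actual, predicted, tolerance):
    """Fraction of predictions within `tolerance` minutes of the actual time."""
    return sum(abs(a - p) <= tolerance for a, p in zip(actual, predicted)) / len(actual)


actual = [5.0, 8.0, 12.0, 3.0]     # observed arrival times in minutes (made-up)
predicted = [6.0, 8.0, 9.0, 3.5]   # model ETAs (made-up)
print(mae(actual, predicted))               # 1.125
print(share_within(actual, predicted, 2))   # 0.75
print(rmse(actual, predicted))
```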

Q2. You work at a bank that wants to build a model to detect fraud on the platform.

The bank wants to implement a text messaging service that will text customers when the model detects a fraudulent transaction in order for the customer to approve or deny the transaction with a text response.

How would we build this model?

Let's start by understanding what kind of model needs to be built. Since we're working with fraud, each transaction is either fraudulent or it isn't.

Hint: This problem is then a binary classification problem. Now given the problem scenario, what considerations do we have to think about when first building this model? What would the bank fraud data look like?
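
One consideration worth raising is that fraud is usually rare, so the data tends to be heavily imbalanced and raw accuracy can be misleading. The sketch below uses invented labels (10 fraudulent transactions out of 1,000) to compare a model that never flags anything with one that flags 15 transactions. In the texting scenario, false positives become unnecessary texts to customers and false negatives become missed fraud, so precision and recall map fairly directly onto business costs.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of texts sent that were real fraud
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of fraud that triggered a text
    return accuracy, precision, recall


# Invented data: the first 10 of 1,000 transactions are fraudulent.
y_true = [1] * 10 + [0] * 990
never_flag = [0] * 1000
flags_15 = [1] * 8 + [0] * 2 + [1] * 7 + [0] * 983  # catches 8 frauds, 7 false alarms

print(summarize(y_true, never_flag))  # accuracy 0.99, precision 0.0, recall 0.0
print(summarize(y_true, flags_15))
```

The "never flag" model scores 99% accuracy while catching no fraud at all, which is why precision and recall are usually the better starting point here.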

Q3. You're working on building a model to detect potential bombs at a border crossing.

How would you design the model's inputs and outputs, measure its accuracy, and test your model?
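
For measuring accuracy, one reasonable framing is that a missed bomb (false negative) is far costlier than an extra inspection (false positive). So you might pick the score threshold that hits a target recall, then report how many harmless vehicles that threshold flags. Here is a minimal sketch, assuming the model outputs a risk score per vehicle; the scores and labels are invented.

```python
import math


def threshold_for_recall(scores, labels, target_recall):
    """Highest threshold such that flagging score >= threshold reaches target_recall."""
    positives = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    if not positives:
        raise ValueError("need at least one positive example")
    k = max(1, math.ceil(target_recall * len(positives)))  # positives that must be caught
    return positives[k - 1]


def recall_and_fpr(scores, labels, threshold):
    """Recall and false positive rate when flagging every score >= threshold."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y)
    fp = sum(1 for f, y in zip(flagged, labels) if f and not y)
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    return tp / n_pos, fp / n_neg


scores = [0.95, 0.90, 0.70, 0.40, 0.80, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = contained a bomb (invented)
t = threshold_for_recall(scores, labels, target_recall=1.0)
print(t, recall_and_fpr(scores, labels, t))  # 0.4 (1.0, 0.25)
```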

Q4. We want to build a model to predict booking prices on Airbnb.

Between linear regression and random forest regression, which model would perform better and why?

Hint: What are the main differences between linear regression and random forest?

Let's see how each model is applicable to Airbnb's bookings. One thing we need to do in the interview is to understand more context around the problem of predicting bookings. To do so, we need to understand which features are present in our dataset.

Q5. Suppose we have a binary classification model that classifies whether or not an applicant should be qualified to get a loan. Because we are a financial company, we have to provide each rejected applicant with a reason why.

Given that we don't have access to the feature weights, how would we give each rejected applicant a reason why they got rejected?

Hint: How would the problem change if we had 10, 1000, or 10K applicants that had gone through the loan qualification program?
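
One possible approach when only the model's outputs are available is to treat it as a black box. For each feature, swap the rejected applicant's value for a reference value (here, the median among approved applicants) and see how much the approval score rises; the feature with the biggest gain becomes the stated reason. This sketch rests on several assumptions: numeric features, a callable that returns an approval score, and enough approved applicants for the median to be meaningful, which is where the hint about 10 versus 10K applicants comes in. `toy_model` and the feature names are hypothetical stand-ins.

```python
from statistics import median


def rejection_reasons(model, applicant, approved, top_n=1):
    """Rank features by how much replacing them with the approved-applicant median raises the score."""
    base = model(applicant)
    gains = {}
    for feature in applicant:
        reference = median(a[feature] for a in approved)
        counterfactual = {**applicant, feature: reference}
        gains[feature] = model(counterfactual) - base
    return sorted(gains, key=gains.get, reverse=True)[:top_n]


def toy_model(x):
    """Hypothetical black-box scorer: higher means more likely to be approved."""
    return (0.5 * x["credit_score"] / 850
            + 0.3 * min(x["income"] / 100_000, 1.0)
            - 0.2 * x["debt_ratio"])


approved = [
    {"credit_score": 720, "income": 90_000, "debt_ratio": 0.2},
    {"credit_score": 760, "income": 110_000, "debt_ratio": 0.3},
    {"credit_score": 700, "income": 80_000, "debt_ratio": 0.1},
]
applicant = {"credit_score": 580, "income": 85_000, "debt_ratio": 0.25}
print(rejection_reasons(toy_model, applicant, approved))  # ['credit_score']
```

A trade-off to mention: changing one feature at a time ignores interactions between features, so the stated reason is an approximation.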

Business Case Questions

In data science interviews, business case study questions task you with addressing problems as they relate to the business. You might be asked about topics like estimation and calculation, as well as applying problem-solving to a larger case. One tip: Be sure to read up on the company's products and ventures before your interview to expose yourself to possible topics.

Q1. Let's say that you work for a software-as-a-service (SaaS) company that has existed for just over a year. The chief revenue officer wants to know the average lifetime value.

We know that the product costs 100 dollars per month, averages 10% in monthly churn, and the average customer stays for about 3.5 months.

Calculate the formula for the average lifetime value.

Hint: Lifetime value is the predicted net revenue attributed to the entire future relationship with a customer, averaged across all customers.
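
Here are two quick ways to turn the given numbers into a formula, sketched in Python. If churn is a constant 10% per month, the expected customer lifetime is 1 / 0.10 = 10 months, so LTV = price / churn. Using the observed 3.5-month average instead gives LTV = price × lifetime. The gap between $1,000 and $350 is worth raising with the interviewer: because the company is only just over a year old, the observed average may understate true lifetimes for customers who are still subscribed.

```python
def ltv_from_churn(monthly_price, monthly_churn):
    """LTV assuming a constant monthly churn rate (expected lifetime = 1 / churn months)."""
    return monthly_price / monthly_churn


def ltv_from_lifetime(monthly_price, avg_lifetime_months):
    """LTV using an observed average customer lifetime."""
    return monthly_price * avg_lifetime_months


def ltv_by_summation(monthly_price, monthly_churn, months=1_000):
    """Same idea as ltv_from_churn, summed month by month: month t is paid with probability (1 - churn)^t."""
    return sum(monthly_price * (1 - monthly_churn) ** t for t in range(months))


print(ltv_from_churn(100, 0.10))              # about 1000
print(ltv_from_lifetime(100, 3.5))            # 350.0
print(round(ltv_by_summation(100, 0.10), 2))  # about 1000, matching the closed form
```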

Q2. Say you're running an e-commerce website. You want to get rid of duplicate products in a very large database that may be listed under different sellers, names, etc.

For example: iPhone X and Apple iPhone 10

How do you go about doing this?

See a solution for this business case study question.
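
Here is a rough sketch of one common pipeline. It normalizes names into token sets, maps known variants to a canonical form, and only compares products that share a token (a simple form of blocking, so you avoid comparing every pair in a very large database). It then flags pairs whose Jaccard similarity clears a threshold. The alias table and the 0.6 threshold are hypothetical choices for illustration. A real system would also need to skip very common tokens during blocking and would likely use richer signals such as price, seller, or product attributes.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical alias table mapping name variants to a canonical token.
ALIASES = {"x": "10"}


def tokens(name):
    """Lowercase, split on non-alphanumerics, and canonicalize known aliases."""
    raw = re.findall(r"[a-z0-9]+", name.lower())
    return frozenset(ALIASES.get(t, t) for t in raw)


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_duplicates(names, threshold=0.6):
    """Return pairs of (unique) product names whose token sets are similar enough."""
    toks = {n: tokens(n) for n in names}
    index = defaultdict(set)  # blocking: token -> names containing that token
    for name, ts in toks.items():
        for t in ts:
            index[t].add(name)
    pairs = set()
    for bucket in index.values():
        pairs.update(combinations(sorted(bucket), 2))
    return sorted(p for p in pairs if jaccard(toks[p[0]], toks[p[1]]) >= threshold)


names = ["iPhone X", "Apple iPhone 10", "Apple iPhone 11", "Samsung Galaxy S10"]
print(candidate_duplicates(names))  # [('Apple iPhone 10', 'iPhone X')]
```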

Q3. You work as a data scientist for a ride-sharing company.

An executive asks how you would evaluate whether a 50% rider discount promotion is a good or bad idea. How would you implement it? What metrics would you track?

Hint: Be sure you ask for clarification on a question like this. What does good or bad mean to a ride-sharing business? What metrics will help you measure the pros and cons of the promotion?
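
One way to make "good or bad" concrete is a break-even check on gross bookings. This simplified sketch ignores driver pay, costs, and long-term retention effects, and it assumes you have a baseline ride count from a comparable period or control group. All figures are made up. At a 50% discount, rides would need to double just to keep bookings flat, which suggests judging the promotion mainly as an acquisition or retention investment.

```python
def breakeven_ride_multiplier(discount):
    """Factor by which rides must grow to keep gross bookings flat at a given discount."""
    return 1 / (1 - discount)


def incremental_bookings(avg_fare, baseline_rides, promo_rides, discount):
    """Bookings during the promotion minus baseline bookings (simplified)."""
    return avg_fare * (1 - discount) * promo_rides - avg_fare * baseline_rides


print(breakeven_ride_multiplier(0.5))                 # 2.0
print(incremental_bookings(20.0, 1_000, 1_500, 0.5))  # -5000.0
```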

Q4. Say Chase Bank is looking into creating a new partner card (think Starbucks Chase credit card or Whole Foods Chase credit card). You have access to all of its customer spending data.

How would you determine what the next partner card should be?

Hint: Chase creates partnerships with different merchants to increase acquisitions of new customers and retain existing customers. How can you find metrics to measure that? Will you need to look outside the current dataset?
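
Within the existing spending data, a simple first pass is to rank non-partner merchants by how many customers shop there and how much they spend. The `(customer_id, merchant, amount)` schema and sample rows are hypothetical. As the hint notes, this only captures existing customers; estimating acquisition potential would likely require data from outside the current dataset.

```python
from collections import defaultdict


def rank_merchants(transactions, existing_partners):
    """Rank non-partner merchants by distinct customers, then total spend.

    transactions: iterable of (customer_id, merchant, amount) rows (hypothetical schema).
    """
    spend = defaultdict(float)
    customers = defaultdict(set)
    for customer, merchant, amount in transactions:
        if merchant in existing_partners:
            continue
        spend[merchant] += amount
        customers[merchant].add(customer)
    rows = [(m, len(customers[m]), spend[m]) for m in spend]
    return sorted(rows, key=lambda r: (r[1], r[2]), reverse=True)


txns = [
    ("c1", "Starbucks", 5.0), ("c1", "Target", 50.0), ("c2", "Target", 30.0),
    ("c3", "Chipotle", 12.0), ("c2", "Chipotle", 10.0), ("c3", "Target", 20.0),
]
print(rank_merchants(txns, existing_partners={"Starbucks"}))
# [('Target', 3, 100.0), ('Chipotle', 2, 22.0)]
```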

Q5. Let's say that you're working at Netflix. The company executives are working to renew a deal with another TV network that grants Netflix exclusive licensing to stream their hit TV series (think something like Friends or The Office). One of the executives wants to know how to approach this deal. We know that the TV show has been on Netflix for a year already.

How would you approach valuing the benefit of keeping this show on Netflix?

Here's how to approach a question like this: start by trying to understand why Netflix would want to renew the show. Netflix mainly has three goals for what its content should help achieve:

  • Acquisition: To increase the number of subscribers.
  • Retention: To increase the number of active subscribers and retain them as paying members.
  • Revenue: To increase overall revenue.

With this in mind, how would you go about calculating the loss of subscribers caused by not renewing the show?
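
A back-of-the-envelope version of that calculation estimates three things: how many subscribers watch the show, what fraction of them would cancel because it left (the incremental churn, which you might estimate by comparing the show's viewers with similar non-viewers), and how long you would have kept them. Every input below is a made-up placeholder, and the sketch ignores the acquisition side, which would need its own estimate.

```python
def revenue_at_risk(viewers, incremental_churn, monthly_price, months):
    """Subscription revenue lost if `incremental_churn` of the show's viewers cancel."""
    lost_subscribers = viewers * incremental_churn
    return lost_subscribers * monthly_price * months


def worth_renewing(viewers, incremental_churn, monthly_price, months, license_cost):
    """Simplified rule: renew if retained revenue exceeds the licensing cost."""
    return revenue_at_risk(viewers, incremental_churn, monthly_price, months) > license_cost


# Placeholder inputs, not real Netflix figures.
risk = revenue_at_risk(viewers=1_000_000, incremental_churn=0.02, monthly_price=15.0, months=12)
print(round(risk))  # 3600000
print(worth_renewing(1_000_000, 0.02, 15.0, 12, license_cost=5_000_000))  # False
```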

How to Approach Data Science Case Study Questions

Data science case study framework from TowardsDataScience

There are four main steps to tackling every data science case study problem, regardless of the type: clarify, make assumptions, gather context, and provide data points and analysis.

Here are some helpful tips for approaching data science case study questions:

Clarify

The first step is to gather more information. More often than not, these case studies are designed to be confusing and vague! The prompt may include extraneous details or intentionally leave out key information, so it is the candidate's job in this step to close those gaps. Interviewers will observe how candidates ask questions and how they carry that information through to a solution.

For example, with a product question, you might take into consideration:

  • What is the product?
  • How does the product work?
  • How does the product align with the business itself?

Make Assumptions

The next step is where the thought process really starts to take shape. With all the data provided, it's important to start investigating and discarding possible hypotheses. The insights you develop here build on the information you gathered and refined in the previous step, and that understanding is essential to forming a successful hypothesis. For simplicity's sake, let's continue with the product line of questioning.

In this step, some important questions to evaluate and draw conclusions from include:

  • Who uses the product? Why?
  • What are the goals of the product?

The goal is to reduce the scope of the problem and ask the interviewer questions upfront, so you can tackle the meat of the problem instead of focusing on random edge cases.

Tip: Don't be afraid to think out loud. The interviewer wants to assess your thought process, so as you work through the problem, be sure to talk the interviewer through your assumptions.

Hypothesize and Propose a Solution

Now that a hypothesis is formed, gathering context is the next step towards fleshing out an answer. This is where the problem should be reframed given the new information gathered in the last two steps.

Remember that there isn't an expected singular solution, and as such, there is a certain freedom here to determine the exact path for investigation. Consider how to define different metrics in the context of the problem.

Provide Data Points and Analysis

Finally, providing data points and analysis involves choosing and prioritizing a main metric. As with all prior steps, this one must be tied back to the hypothesis and the main goal of the problem. From there, trace through and analyze different examples using the main metric in order to validate the hypothesis.

Consider Potential Pitfalls

Every case question tends to have multiple solutions, so consider and clearly communicate the trade-offs of your chosen method, including its pros and cons.

The Case Interview Format

The last topic to touch on is the general format of these case studies. Unfortunately, this is company-specific: some companies prefer live settings, where candidates work through a prompt right after receiving it, while others give candidates some period of time (say, a week) before a presentation of the findings.

Note: In some cases, solutions will also be assessed on the ability to convey information in layman's terms. Regardless of the structure, applicants should always be prepared to work through the framework outlined above to answer the prompt.

Interviewers who run the data science case study round have discussed it in multiple articles and forums, and they all attribute success in this stage to one main factor: effective communication.

All the analysis in the world isn't going to help if interviewees cannot verbally work through and highlight their thought process within the case study. Again, what this part of the hiring process highlights most is well-developed soft skills and problem-solving ability. Demonstrating those traits is key to succeeding in this round.

To this end, the best advice possible would be to practice actively going through example case studies, such as those available in the Interview Query questions bank. Exploring different topics with a friend in an interview-like setting with cold recall (no Googling in between!) will be uncomfortable and awkward, but it’ll also help reveal weaknesses in fleshing out the investigation.

Don't worry if the first few times are terrible! As you develop a rhythm, you'll gain confidence and get more out of each practice session.

As always, feel free to check us out at Interview Query for more tips and practice!