The data science case study is often the most difficult part of the hiring process. After an applicant has sent in a resume and passed the recruiter screening and the initial interview, this final stage often makes or breaks their hiring potential.
Designed to simulate a company’s current and past projects, case study problems rigorously examine how a candidate approaches prompts, communicates their findings, and works through roadblocks.
Why do case studies get asked?
In order to understand how to pass the case study section, it’s important to first understand what interviewers are looking for when applicants work through these prompts. Often, at this point in the process, prospects have already demonstrated sufficient technical understanding and skills for the position, so this is no longer a question of whether or not they can perform job duties.
Instead, case studies look to understand the interviewee’s thought process: the ability to think on their feet through problems that don’t have a single solution. Real-life cases aren’t binary; there is no black-and-white, yes-or-no answer. Because of this ambiguity, candidates will need to demonstrate decisiveness in their investigations, as well as a capacity to consider impacts and topics from a variety of angles.
Perhaps even more importantly, data science case study problems put heavy weight on the ability to communicate conclusions effectively. Real working conditions require a great deal of information exchange across teams and divisions, so part of the interviewer’s focus will be on how a candidate structures and explains their answer, and on which details fall through the cracks along the way.
Types of Data Science Case Studies
There are three main types of data science case studies: product questions, modeling and machine learning questions, and business case questions.
Product Case Study Questions
This type of case study tackles a specific product or feature, often tied to the interviewing company. As such, it is extremely beneficial to research the company’s current projects and developments across different divisions, as one of them might end up as the case study topic!
In this type of data science case study, interviewers are generally looking for a sense of business intuition revolving around product mechanics. The most important part is to identify which metrics should be proposed to understand a product.
Check out our guide on tackling the product data science case interview.
Here's an example product data science case study question:
Suppose you’re working as a data scientist at Facebook. How would you measure the success of private stories on Instagram, where only certain chosen friends can see the story?
Try solving a product case question on why comments would be decreasing on a social media platform.
Modeling and Machine Learning Case Questions
Modeling case studies are more varied and are designed to probe how a candidate builds models around business problems. These questions can range from applying machine learning to a specific scenario to assessing the validity of a hypothetical existing model. A modeling case study may require the candidate to evaluate and explain any part of the model-building process.
Read more on machine learning interview questions
A common case study problem would be for a candidate to explain how they would build a model for a product that exists at the company or another company.
Describe how you would build a model to predict Uber ETAs after a rider requests a ride.
Many times this can be scoped down to a specific portion of the model-building process. Taking the example above, we could break it up into:
How would you evaluate the predictions of an Uber ETA model?
What features would you use to predict the Uber ETA for ride requests?
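As a minimal sketch of the first scoped-down question, the snippet below computes a few common error metrics for ETA predictions. The function name `eta_error_report`, the choice of metrics, and the sample trips are all illustrative assumptions, not a description of how any real ETA system is evaluated.

```python
import math

def eta_error_report(actual_min, predicted_min):
    """Summarize ETA prediction errors; all values are in minutes."""
    if len(actual_min) != len(predicted_min) or not actual_min:
        raise ValueError("inputs must be non-empty and the same length")
    # Positive error: predicted later than actual. Negative: car arrived late.
    errors = [p - a for a, p in zip(actual_min, predicted_min)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Share of trips where the car arrived later than promised.
    late_share = sum(1 for e in errors if e < 0) / n
    return {"mae": mae, "rmse": rmse, "late_share": late_share}

# Hypothetical trips: actual vs. predicted arrival time in minutes.
actual = [5.0, 8.0, 12.0, 4.0]
predicted = [6.0, 7.0, 9.0, 4.0]
report = eta_error_report(actual, predicted)
print(report)
```

Reporting the share of late arrivals alongside the average error is one way to show an interviewer that under-predicting and over-predicting an ETA may not cost the business the same amount.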
Our recommended framework is to break a modeling and machine learning case study down to individual steps and tackle each one thoroughly.
In each full modeling case study, you'll want to go over each part of:
- Data Processing
- Feature Selection
- Model Selection
- Cross Validation
- Evaluation Metrics
- Testing and Roll Out
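To make the cross-validation and evaluation steps in the list above concrete, here is a small sketch that scores a mean-predictor baseline with k-fold cross-validation. The helper names, the contiguous (unshuffled) fold layout, and the sample targets are assumptions made for illustration; a real case study would typically use a library implementation and a real model.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def cross_validated_mae(targets, k):
    """Average mean absolute error of a mean-predictor baseline across k folds."""
    fold_maes = []
    for train_idx, test_idx in k_fold_indices(len(targets), k):
        # Baseline "model": always predict the mean of the training fold.
        prediction = sum(targets[i] for i in train_idx) / len(train_idx)
        fold_mae = sum(abs(targets[i] - prediction) for i in test_idx) / len(test_idx)
        fold_maes.append(fold_mae)
    return sum(fold_maes) / len(fold_maes)

# Hypothetical target values, e.g. trip durations in minutes.
targets = [4.0, 6.0, 5.0, 7.0, 8.0, 6.0]
baseline_mae = cross_validated_mae(targets, k=3)
print(f"baseline MAE: {baseline_mae:.3f}")
```

A baseline like this gives a reference point: any model proposed in the interview should beat it on the same folds before it is worth discussing testing and roll out.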
Let's say that you work at a bank that wants to build a model to detect fraud on the platform.
The bank also wants to implement a text messaging service that texts customers when the model detects a potentially fraudulent transaction, so that the customer can approve or deny the transaction with a text response.
How would we build this model?
Try answering the interview question with our interactive guide
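One angle worth raising for the fraud question is the trade-off between precision and recall, since every flagged transaction triggers a text to a customer. The sketch below compares two score thresholds; the function name, labels, scores, and thresholds are made-up values for illustration only.

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall when scores >= threshold are flagged as fraud."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for y, f in zip(labels, flagged) if y == 1 and f)
    fp = sum(1 for y, f in zip(labels, flagged) if y == 0 and f)
    fn = sum(1 for y, f in zip(labels, flagged) if y == 1 and not f)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical data: 1 = fraud, 0 = legitimate, with model scores in [0, 1].
labels = [0, 0, 1, 0, 1, 0, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.90, 0.05, 0.20, 0.70]

for threshold in (0.3, 0.6):
    p, r = precision_recall(labels, scores, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

In this toy data the lower threshold catches every fraudulent transaction but texts more legitimate customers, while the higher threshold sends fewer false alarms and misses one fraud case. Because the customer can approve or deny by text, a false alarm here may be cheaper than an outright declined transaction, which is the kind of reasoning the interviewer is likely to probe.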
Business Case Questions
Similar to product questions, business case problems tackle an issue specific to the business. Common topics include assessing the best option among competing business plans and formulating a process for solving a specific problem. Other examples could include estimation and calculation, as well as applying problem solving to a larger case.
As with the product variant, it is helpful to read up on the interviewing company’s products and ventures beforehand to have some exposure to possible topics.
Example business case question:
You work as a data scientist for a ride-sharing company.
An executive asks how you would evaluate whether a 50% rider discount promotion is a good or bad idea. How would you implement it? What metrics would you track?
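One possible way to approach the implementation part is an A/B test: show the discount to a random treatment group and compare its conversion rate with a control group. The sketch below runs a two-sided, two-proportion z-test using only the standard library; the function name and the rider counts are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function, doubled for a two-sided test.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: riders who booked a trip out of riders in each group.
control_bookings, control_riders = 200, 1000
promo_bookings, promo_riders = 260, 1000

z, p_value = two_proportion_z_test(
    control_bookings, control_riders, promo_bookings, promo_riders
)
print(f"z={z:.2f}, p-value={p_value:.4f}")
```

A significant lift in bookings does not by itself make the promotion a good idea. A 50% discount also cuts revenue per ride, so metrics such as net revenue per rider and retention after the promotion ends would need to be tracked alongside conversion.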
For an in-depth example of a case study question, try this case question asked by Amazon on de-duplicating products on an ecommerce website.
Framework for Data Science Case Studies
There are four main steps to tackling every data science case study problem, regardless of the type: clarify, make assumptions, gather context, and provide data points and analysis.
Clarify
The first step is used to gather more information. More often than not, these case studies are designed to be confusing! The data provided will often be unorganized, with extraneous information added or key details omitted on purpose, so it is the candidate’s job in this step to close that gap. Interviewers will observe how candidates ask questions and then carry the answers through to their solution.
For example, with a product question, you might take into consideration:
- What is the product?
- How does the product work?
- How does the product align with the business itself?
Make Assumptions
The next step is where the thought process really starts to be outlined. With all the data provided, it’s important to start investigating and discarding possible hypotheses. The insights developed here build on the information gleaned in the previous step, and that understanding is paramount to forming a successful hypothesis. For simplicity’s sake, let’s continue with the product line of questioning.
In this step, some important questions to evaluate and draw conclusions from include:
- Who uses the product? Why?
- What are the goals of the product?
The goal of this step is to reduce the scope of the problem at hand and to ask the interviewer questions upfront, so that you can tackle the meat of the problem instead of focusing on random edge cases.
Hypothesize and Propose a Solution
Now that a hypothesis is formed, gathering context is the next step towards fleshing out an answer. This is where the problem should be reframed given the new information gathered in the last two steps.
Remember that there isn't an expected singular solution, and as such, there is a certain freedom here to determine the exact path for investigation. Consider how to define different metrics in the context of the problem.
Provide Data Points and Analysis
Finally, providing data points and analysis involves choosing and prioritizing a main metric. As with all prior factors, this step must be tied back to the hypothesis and the main goal of the problem. From there, it’s important to trace through and analyze different examples, starting from the main metric, in order to validate the hypothesis.
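As a rough illustration of tracing a main metric through examples, the sketch below breaks comments per active user down by period and segment, in the spirit of the earlier question about decreasing comments. The function name, the field names, and the numbers are invented for the example.

```python
from collections import defaultdict

def metric_by_segment(events):
    """Average comments per active user for each (period, segment) pair."""
    totals = defaultdict(lambda: [0, 0])  # key -> [comments, active_users]
    for row in events:
        key = (row["period"], row["segment"])
        totals[key][0] += row["comments"]
        totals[key][1] += row["active_users"]
    # Skip any pair with no active users to avoid dividing by zero.
    return {key: c / u for key, (c, u) in totals.items() if u}

# Hypothetical aggregates before and after a suspected change.
events = [
    {"period": "before", "segment": "mobile", "comments": 900, "active_users": 300},
    {"period": "before", "segment": "web", "comments": 400, "active_users": 200},
    {"period": "after", "segment": "mobile", "comments": 600, "active_users": 300},
    {"period": "after", "segment": "web", "comments": 410, "active_users": 200},
]
rates = metric_by_segment(events)
for key in sorted(rates):
    print(key, round(rates[key], 2))
```

In this made-up data the drop is concentrated in the mobile segment, which would support a hypothesis about a mobile-specific change and rule out a platform-wide decline.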
Final Breakdown + Tips
The last topic to touch upon would be the general format of these case studies. Unfortunately, this is company-specific: some prefer live settings, where candidates actively work through a prompt after receiving it, while others offer some period of time (say, a week) before settling in for a presentation of the findings.
Note: in some special cases, solutions will also be assessed on the ability to convey information in layman's terms. Regardless of the structure, applicants should always be prepared to work through the framework outlined above in order to answer the prompt.
Multiple articles and discussions by interviewers have covered the data science case study portion, and they all attribute success in this stage to one main factor: effective communication.
All the analysis in the world isn’t going to help if interviewees cannot verbally work through and highlight their thought process within the case study. Again, the main focus in this section of the hiring process is well-developed soft skills and problem-solving capabilities. Demonstrating those traits is key to succeeding in this round.
To this end, the best advice possible would be to practice actively going through example case studies, such as those available in the Interview Query question bank. Exploring different topics with a friend in an interview-like setting with cold recall (no Googling in between!) will be uncomfortable and awkward, but it’ll also help reveal weaknesses in fleshing out the investigation.
Don’t worry if the first few times are terrible! Developing a rhythm will help with gaining confidence in assessing and learning through these sessions.
As always, feel free to check us out at Interview Query for more tips and practice!