If you’re interviewing for data science and analytics roles at FAANG and other big tech companies, chances are that you’ll encounter some probability interview questions along the way.

Probability theory forms the core skill set data scientists and machine learning engineers use to perform statistical analysis of their data. Probability is also remarkably unintuitive, so testing for probability skills is a good proxy metric for companies to assess analytical reasoning and intelligence.

Whether it’s coin tosses, picking random numbers, or calculating the chance that patients test positive for a disease, probability theory is everywhere. If you’re a data scientist, understanding probability can be the difference between landing your dream job and having to go back to the drawing board.

Conceptual Probability Questions

These probability questions are designed to test your conceptual knowledge of probability theory. You could be quizzed on the types of distributions, asked to explain the Central Limit Theorem, or asked to describe an application of Bayes’ Theorem. The key to this type of question is to demonstrate not only knowledge of formal probability theory but also the ability to communicate that knowledge to a layperson.

1. What is the difference between the Bernoulli and binomial distributions?

The Bernoulli distribution models a single trial of an experiment with only two outcomes (success or failure), while the binomial distribution models the number of successes across n independent trials of that same experiment.
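As a rough illustration (a sketch using only Python's standard library; the function name binomial_pmf and the value p = 0.3 are our own choices), the binomial probability mass function reduces to the Bernoulli case when n = 1:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p = 0.3  # arbitrary success probability, chosen for illustration

# With n = 1 the binomial is a Bernoulli distribution: P(0) = 1 - p, P(1) = p.
bernoulli = [binomial_pmf(k, 1, p) for k in (0, 1)]

# With n = 10 we get a distribution over 0..10 successes.
binomial = [binomial_pmf(k, 10, p) for k in range(11)]
```

In both cases the probabilities sum to 1; the only thing that changes is how many trials are being counted.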

2. Explain how a probability distribution could be not normal and give an example scenario.

A probability distribution is not normal if its observations do not cluster symmetrically around the mean in the shape of the bell curve. An example of a non-normal probability distribution is a uniform distribution, in which all values within a given range are equally likely to occur.

3. What is Bayes’ Rule?
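For reference, the standard statement of Bayes' rule, for two events A and B with P(B) > 0, is:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```

In words, it updates a prior belief P(A) into a posterior belief P(A | B) after observing that B occurred.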

4. What is the difference between covariance and correlation? Provide an example.

Covariance can take on any real value and depends on the units of the two variables, while correlation is covariance rescaled by both standard deviations, so it can only take on values between -1 (perfect inverse linear relationship) and 1 (perfect direct linear relationship). Therefore, the relationship between two variables can have a covariance that seems high, but only a middling correlation value.
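Since the question asks for an example, here is a small sketch with made-up height and weight data (the numbers and helper names are illustrative only). It measures the same heights in metres and in centimetres to show that covariance depends on units while correlation does not:

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

heights_m = [1.60, 1.70, 1.75, 1.80, 1.90]    # made-up data
weights_kg = [55.0, 72.0, 60.0, 80.0, 78.0]   # made-up data
heights_cm = [h * 100 for h in heights_m]     # same heights, different unit

cov_m = covariance(heights_m, weights_kg)
cov_cm = covariance(heights_cm, weights_kg)
corr_m = correlation(heights_m, weights_kg)
corr_cm = correlation(heights_cm, weights_kg)
```

Switching from metres to centimetres multiplies the covariance by 100, while the correlation stays the same.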

5. What is the difference between the Central Limit Theorem and the Law of Large Numbers?

The Law of Large Numbers says that the sample mean converges to the population mean as the sample size grows, while the Central Limit Theorem states that, as the sample size n becomes large, the distribution of the sample mean can be approximated by the normal distribution, regardless of the shape of the underlying distribution (provided it has finite variance).
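A quick simulation can make the distinction concrete. The sketch below uses fair die rolls as the underlying (non-normal) distribution; the sample sizes and the fixed seed are arbitrary choices for illustration:

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the sketch is reproducible

def roll() -> int:
    return random.randint(1, 6)  # fair die: mean 3.5, std about 1.708

# Law of Large Numbers: the mean of one long sample settles near 3.5.
big_sample_mean = mean(roll() for _ in range(100_000))

# Central Limit Theorem: take many small samples and look at their means.
n = 50
sample_means = [mean(roll() for _ in range(n)) for _ in range(2_000)]
spread_of_means = pstdev(sample_means)  # theory: 1.708 / sqrt(50), about 0.24
```

The first result is about where the sample mean ends up; the second is about how sample means are distributed around that value (plotting a histogram of sample_means would show the bell shape).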

6. What is an unbiased estimator? Give an example for a layperson.

An unbiased estimator is a statistic whose expected value equals the population parameter it is used to approximate. An example would be taking a random sample of 1000 voters in a political poll to estimate the share of the total voting population that supports a candidate. In practice, issues such as non-random sampling mean that real-world estimates are rarely perfectly unbiased.

Review more probability concepts in the first chapter of our probability course.

Probability Case Study Questions

In this type of probability question, you’ll be given an example scenario and asked to use the given information to calculate a probability. One example might be:

7. You are playing a game with a friend to see who can roll a six on a six-sided die first. You roll first. What’s the probability that you win the game?

This question is testing your ability to apply formal probability knowledge to real-world scenarios. While it’s sometimes possible to brute-force questions like these by modeling all of the different possible outcomes of the scenario, most interviewers won’t be satisfied with that response. They’re looking for you to identify the underlying patterns within the problem and match the “correct,” or most elegant, probability concept to the problem in order to solve it.
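For question 7, one way to spot the pattern (a sketch, not necessarily the only route an interviewer would accept) is to notice that if both players miss, the game restarts in exactly the same position. That gives a single equation for the first player's winning probability p:

```python
from fractions import Fraction

p_six = Fraction(1, 6)   # chance of rolling a six
p_miss = 1 - p_six       # chance of not rolling a six

# You win on your first roll, or both players miss and the game starts over:
#   p = p_six + p_miss**2 * p   =>   p = p_six / (1 - p_miss**2)
p_first_player_wins = p_six / (1 - p_miss**2)

# Cross-check by summing the first 200 terms of the geometric series
# (win on roll 1, or on roll 3, or on roll 5, ...).
partial_sum = sum(p_six * p_miss ** (2 * k) for k in range(200))
```

Under these assumptions the first player wins with probability 6/11, slightly better than even because they roll first.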

8. Let’s say the probability that a specific item X is at location A is 0.6 and the probability that it is at location B is 0.8. What is the probability that item X would be found in locations A or B?

Let's define our probabilities:

P(Item at location A) = P(A) = 0.6

P(Item at location B) = P(B) = 0.8

We want the probability that item X is found at location A or location B. Given that our events are not mutually exclusive (their probabilities sum to more than 1), we can represent this probability with the inclusion-exclusion formula: P(A or B) = P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
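To get a number out of this, we need P(A ∩ B), which the question does not give. A minimal sketch, assuming the two events are independent (an assumption on our part, not something stated in the question):

```python
from fractions import Fraction

p_a = Fraction(6, 10)   # P(item at location A)
p_b = Fraction(8, 10)   # P(item at location B)

# ASSUMPTION (not stated in the question): A and B are independent,
# so P(A and B) = P(A) * P(B).
p_a_and_b = p_a * p_b

# Inclusion-exclusion for the union of two events.
p_a_or_b = p_a + p_b - p_a_and_b
```

With independence this comes to 23/25, or 0.92. Without that assumption, P(A ∩ B) can be anywhere from 0.4 to 0.6, so the answer can only be bounded between 0.8 and 1.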

9. Imagine a deck of 500 cards numbered from 1 to 500. If all the cards are shuffled randomly and you are asked to pick three cards, one at a time, what's the probability of each subsequent card being larger than the previously drawn card?

Imagine this as a sample space problem, ignoring all other distracting details. If someone randomly picks three cards without replacement, all three numbers are distinct, so we can assume that there will be a lowest card, a middle card, and a highest card.

Let's make this easy and assume we drew the numbers 1, 2, and 3. In our scenario, if we drew (1,2,3) in that exact order, then that would be the winning scenario.

But what's the full range of outcomes we could draw?
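Because the shuffle is random, every ordering of the three drawn cards is equally likely, so we only need to count orderings. A short enumeration (using 1, 2, 3 as stand-ins for the lowest, middle, and highest card) might look like this:

```python
from fractions import Fraction
from itertools import permutations

# All 3! = 6 orders in which the low, middle, and high card can be drawn.
orderings = list(permutations([1, 2, 3]))

# Only the strictly increasing order counts as a win.
increasing = [o for o in orderings if o[0] < o[1] < o[2]]

p_increasing = Fraction(len(increasing), len(orderings))
```

Exactly one of the six orderings is increasing, so the probability is 1/6, and the size of the deck (500) never enters the calculation.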

10. Let's say you have a function that outputs a random integer between a minimum value, N, and maximum value, M. Now let's say we take the output from that function and make it the max value of another random number generator with the same min value N. What would the distribution of the samples look like? What would the expected value of the second function be?

Let X be the result of the first run and Y the result of the second run. Since the integer output is “random” and no additional information is given, we can assume all integers between and including N and M have an equal shot at being selected. Thus, X is a discrete uniform random variable on N through M, and, given the value of X, Y is a discrete uniform random variable on N through X.
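From this setup, the expected value follows from averaging the conditional mean E[Y | X] = (N + X) / 2 over all values of X. The sketch below computes both the distribution and the expected value exactly; the function names and the example bounds N = 1, M = 6 are our own:

```python
from fractions import Fraction

def expected_second_draw(n_min: int, m_max: int) -> Fraction:
    """E[Y] where X ~ Uniform{n_min..m_max} and Y | X ~ Uniform{n_min..X}."""
    xs = range(n_min, m_max + 1)
    # E[Y | X = x] is the midpoint (n_min + x) / 2; average it over all x.
    return sum(Fraction(n_min + x, 2) for x in xs) / len(xs)

def second_draw_pmf(n_min: int, m_max: int) -> dict:
    """P(Y = y): sum over x >= y of P(X = x) * P(Y = y | X = x)."""
    count = m_max - n_min + 1
    return {
        y: sum(Fraction(1, count) * Fraction(1, x - n_min + 1)
               for x in range(y, m_max + 1))
        for y in range(n_min, m_max + 1)
    }

ev = expected_second_draw(1, 6)   # example bounds, chosen for illustration
pmf = second_draw_pmf(1, 6)
```

The resulting distribution is not uniform: it is skewed toward N, because small values of Y are reachable from every X while large values need a large X. In general the expected value works out to (3N + M) / 4, which is 9/4 for the example bounds.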

11. Three zebras are sitting at the corners of an equilateral triangle, one per corner. Each zebra randomly picks a direction and runs along the outline of the triangle toward one of the two neighboring corners. What is the probability that none of the zebras collide?

Let's imagine all of the zebras on an equilateral triangle. Each has two options for which direction to run along the outline. Since each choice is random and independent, let's count the possibilities in which they fail to collide.

There are only really two possibilities. The zebras will either all choose to run in a clockwise direction or a counter-clockwise direction.

Let's calculate the probabilities of each. The probability that every zebra will choose to go clockwise is the product of the probabilities of each zebra choosing the clockwise direction. Given there are two equally likely choices (counter-clockwise or clockwise), that would be 1/2 * 1/2 * 1/2 = 1/8

The probability of every zebra going counter-clockwise is the same at 1/8. Therefore, if we sum up the probabilities, we get the correct probability of 1/4 or 25%.
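The same result can be checked by brute force. This sketch enumerates all eight direction choices and treats two neighboring zebras as colliding when they run toward each other along the same edge (the encoding of directions as +1 and -1 is our own convention):

```python
from fractions import Fraction
from itertools import product

def collides(directions) -> bool:
    """directions[i] is +1 (clockwise) or -1 (counter-clockwise) for zebra i.

    Zebra i and its clockwise neighbour meet head-on when zebra i runs
    clockwise and that neighbour runs counter-clockwise.
    """
    return any(
        directions[i] == 1 and directions[(i + 1) % 3] == -1
        for i in range(3)
    )

outcomes = list(product([1, -1], repeat=3))   # 2**3 = 8 equally likely outcomes
safe = [d for d in outcomes if not collides(d)]
p_no_collision = Fraction(len(safe), len(outcomes))
```

Only the all-clockwise and all-counter-clockwise outcomes survive, giving 2/8 = 1/4.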

12. You call 3 random friends of yours who live in Seattle and ask each independently if it's raining. Each of your friends has a 2/3 chance of telling you the truth and a 1/3 chance of messing with you by lying. All 3 friends tell you that "Yes" it is raining. What is the probability that it's actually raining in Seattle?

Interpreting this with the frequentist approach: each friend lies with probability 1/3, so all three lie with probability 1/3 * 1/3 * 1/3 = 1/27. In other words, if you repeated the calls with your friends 27 times, you would expect one trial in which all three of your friends lied.

However, since your friends all gave the same answer, you’re not actually interested in all 27 of those trials, as that would include events where your friends had differing answers. Only the trials in which all three agree are relevant.
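Restricting attention to the cases where all three friends agree is Bayes' rule in disguise. The sketch below makes the calculation explicit. Note that it needs a prior probability of rain, which the question does not provide; the 50% used here is purely an assumption:

```python
from fractions import Fraction

def p_rain_given_three_yes(prior_rain: Fraction) -> Fraction:
    """Posterior probability of rain after three independent "yes" answers."""
    p_truth, p_lie = Fraction(2, 3), Fraction(1, 3)
    # Three "yes" answers happen if it rains and all three tell the truth,
    # or if it is dry and all three lie.
    yes_and_rain = prior_rain * p_truth**3
    yes_and_dry = (1 - prior_rain) * p_lie**3
    return yes_and_rain / (yes_and_rain + yes_and_dry)

# ASSUMPTION: a 50% prior chance of rain; the question does not give one.
posterior = p_rain_given_three_yes(Fraction(1, 2))
```

With an even prior, the 8 all-truthful trials out of 27 are weighed against the 1 all-lying trial, giving 8/9. A different prior changes the answer, which is worth pointing out to the interviewer.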

13. You flip a fair coin 576 times. Without using a calculator, calculate the probability of flipping at least 312 heads.

This question requires some memorization. At first glance, we can infer that it's a binomial distribution problem, given that we have to count the number of heads out of a fixed number of trials. Therefore, we’ll use a binomial distribution with n = 576 trials and probability of success p = 0.5 on each trial.

The expected number of heads for a binomial distribution is the probability of a success (a fair coin has a 0.5 probability of landing heads) multiplied by the total number of trials (576). So 288 is the expected number of times that our coin flips will turn up heads.

Then, you would have to remember that the standard deviation of the binomial distribution is sqrt(n*p*(1-p)).
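Plugging in the numbers, the standard deviation is sqrt(576 * 0.5 * 0.5) = sqrt(144) = 12, so 312 heads is exactly two standard deviations above the mean of 288. Under the normal approximation, roughly 95% of outcomes fall within two standard deviations, leaving about 2.5% in the upper tail. The sketch below (which does use a computer, purely as a sanity check on the mental arithmetic) compares that estimate with the exact binomial tail:

```python
from math import comb, erfc, sqrt

n, p, threshold = 576, 0.5, 312

mean = n * p                    # 288 expected heads
std = sqrt(n * p * (1 - p))     # sqrt(144) = 12
z = (threshold - mean) / std    # (312 - 288) / 12 = 2

# Normal upper tail beyond z standard deviations (no continuity correction).
normal_tail = 0.5 * erfc(z / sqrt(2))

# Exact binomial tail P(X >= 312) for comparison, using integer arithmetic.
exact_tail = sum(comb(n, k) for k in range(threshold, n + 1)) / 2**n
```

Both values land in the neighborhood of 2% to 2.5%, so the no-calculator answer of about 2.5% is a reasonable one to give.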

14. You're given a fair coin. You flip the coin until either Heads Heads Tails (HHT) or Heads Tails Tails (HTT) appears. Is one more likely to appear first? If so, which one and with what probability?

Okay, given the two scenarios, we can assess that both sequences need H first, so nothing is decided until the first H appears. Once H appears, the next flip is H with probability 1/2, and from that point on HHT is guaranteed to appear before HTT.

Why is this the case? Because after HH, all you need for HHT is one T. The coin does not reset, as we are flipping the coin continuously in sequence until we see the string HHT or HTT in a row: any further H leaves us at HH, and the first T completes HHT. HTT, by contrast, can be interrupted, since after HT an H sends us back to waiting with a single H. This asymmetry increases the chances of HHT occurring first versus HTT.
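Carrying the state-by-state reasoning through to a number, we can write one equation per state and solve. The sketch below does that exactly and then cross-checks with a simulation (the state labels, trial count, and seed are our own choices):

```python
import random
from fractions import Fraction

# Exact answer from the state equations (p_x = P(HHT first | recent flips x)):
#   p_HH = 1                      (only a T can end the run, completing HHT)
#   p_HT = 1/2 * 0 + 1/2 * p_H    (T completes HTT, H takes us back to "H")
#   p_H  = 1/2 * p_HH + 1/2 * p_HT
# Substituting gives p_H = 1/2 + p_H / 4.
p_hht_first = Fraction(1, 2) / (1 - Fraction(1, 4))

def hht_wins(rng: random.Random) -> bool:
    """Flip until HHT or HTT appears; return True if HHT came first."""
    last3 = ""
    while True:
        last3 = (last3 + rng.choice("HT"))[-3:]
        if last3 == "HHT":
            return True
        if last3 == "HTT":
            return False

rng = random.Random(0)   # fixed seed so the sketch is reproducible
trials = 100_000
simulated = sum(hht_wins(rng) for _ in range(trials)) / trials
```

The exact calculation gives 2/3 for HHT appearing first (and so 1/3 for HTT), and the simulated frequency should sit close to that value.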

Get a probability question every week with an in-depth solution on Interview Query.

Probability Theory Concept Review

Probability theory is the branch of mathematics that deals with uncertainty, underpinning all of statistics and machine learning. It has applications in specialized fields of data analysis, such as physics, meteorology, web searching, and econometrics. This means that any good data scientist should have at least an intermediate understanding of probability theory.

The main object of study in probability is events. An event is simply an outcome, or a set of outcomes, of some experiment, such as getting heads when flipping a coin.

Experiments are defined by the fact that they are non-deterministic, in that we may do the same experiment twice and get two different outcomes (such as getting heads or tails).

You likely encounter probability in your everyday life. It might be a weatherman telling you there is a 20% chance it will rain today, or your boss mentioning that profits are likely to increase by 5% this year. Humans have a basic understanding that all things have a chance to happen (even if that chance may be zero).

However, numerous psychological studies have shown human beings are terrible at probability. For example, when a person randomly picks a number between 1 and 10, they are statistically more likely to pick seven than any other number. For this reason, we need a precise way to talk about probability in a way that is independent of human psychology. This language is called probability theory.

How much probability do I need to know?

The amount of probability you’ll need to pass your data science interview varies from role to role. We’ve analyzed Interview Query’s bank of real interview questions, sorted by position, to find out which roles are most likely to encounter probability in their interview.

We discovered that data scientists, research scientists, and machine learning engineers encounter probability questions the most, so the likelihood that you’ll face them is especially high if you’re interviewing for one of those roles.


This is just a small sample of the types of probability interview questions you might be asked in your data science interview. If you’re looking for a more in-depth treatment of probability, check out Interview Query’s probability course.

The course has five sections:

  • Basic Probability
  • Discrete Distributions
  • Continuous Distributions
  • Multivariate Distributions
  • Sampling Theorems

It’s designed to take you from the basics of probability all the way to a functioning understanding of conducting probability analysis in the real world.

Finally, many of the questions outlined in this article came from Interview Query’s bank of real interview questions from companies like Google, Facebook, Amazon, and more. If you’re interviewing for data science roles, be sure to take advantage of the industry insights and practice offered through Interview Query, where you can take the next step in your data science career.