The SAP data science interview is a multi-step process. The first step is a live technical coding screen over Skype. The interviewer, a data scientist, will ask about your background and past experience. Prepare a good five-minute elevator pitch covering your past projects, your interest in data science, and why you want to work at SAP.

After going over your background and past projects, the interviewer will move into a live coding exercise. These coding questions will likely be in Python and done over CoderPad.

Here are some example coding questions asked in past SAP data science interviews:

  • Given a string, return the first recurring character in it, or ‘None’ if there is no recurring character.
  • Let’s say you’re given a huge 100 GB file that cannot be read into memory all at once. Write code in Python to count the total number of lines in the file.
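
As a rough illustration of what answers to these two questions could look like, here is a minimal sketch. The function names, the choice to return Python's None rather than the string 'None', and the 1 MB chunk size are all assumptions made for this example, not something SAP specifies.

```python
def first_recurring_char(text):
    """Return the first character seen a second time, or None if none repeats."""
    seen = set()  # characters encountered so far
    for ch in text:
        if ch in seen:
            return ch
        seen.add(ch)
    return None


def count_lines(path, chunk_size=1024 * 1024):
    """Count lines by streaming fixed-size binary chunks (assumed 1 MB each).

    Only one chunk is held in memory at a time, so the file size does not
    matter. A final line without a trailing newline is counted as a line.
    """
    total = 0
    last_byte = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # Non-empty file that does not end in a newline: count the last line too.
    if last_byte and last_byte != b"\n":
        total += 1
    return total
```

The set lookup keeps the first function at O(n) time. For the second, iterating over the file object line by line would also avoid loading the whole file, but reading fixed-size chunks keeps memory bounded even if the file contains one extremely long line.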

Onsite Interview

The second phase of the SAP data science interview is an onsite. It consists of one behavioral interview and three technical interviews. The technical interviews cover whiteboard coding questions, SQL and data analysis and processing questions, and conceptual machine learning questions. Make sure to review general machine learning concepts and practice coding exercises.

An example SAP technical interview in machine learning would go deep into two projects that an SAP data scientist is working on. The interviewer would describe two of their current projects and brainstorm with you which techniques could improve model performance. Engage in a thoughtful discussion, as this is how they gauge how well they would work with you.

Let's look at some more example SAP data scientist interview questions.


  • How would you design a recommendation system for Amazon customers? Take into consideration that a single customer could use many devices to log on to a single account.
  • What is the Big O notation for dimensionality reduction using recursion vs dynamic programming?
  • How do you deal with an unbalanced dataset?
  • Given a regression model, what metrics would you use to evaluate the model?
  • What’s the difference between a Gaussian and an RBF kernel?
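
For the regression metrics question above, it can help to know how the common metrics are computed. The sketch below is one possible illustration in plain Python; the function name and the choice of MAE, RMSE, and R² are assumptions for this example, and an interviewer may expect other metrics as well.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, and R^2 for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n      # mean absolute error
    ss_res = sum(e * e for e in errors)        # residual sum of squares
    rmse = math.sqrt(ss_res / n)               # root mean squared error

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R^2 is undefined when the targets are constant (ss_tot == 0).
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mae": mae, "rmse": rmse, "r2": r2}
```

MAE is less sensitive to outliers than RMSE, which penalizes large errors more heavily, while R² reports the share of variance in the target that the model explains. Being able to explain that trade-off is usually worth more than reciting the formulas.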