Facebook is arguably the world's most popular social media network, with over 2 billion active users worldwide. It is no secret that Facebook collects and stores massive amounts of user data, which makes it a treasure trove for anyone looking to build a career in data science. Whether you are a data scientist, data analyst, or data engineer, Facebook offers a scale that only a few companies can match.
As a data engineer at Facebook, you will not only work with some of the most advanced tools and platforms a data engineer could hope for, but you will also see a direct link between your work, company growth, and user satisfaction.
The Data Engineer Role at Facebook
Data engineer roles on an enterprise data analytics team range from managing, optimizing, and overseeing data retrieval systems to building complex, robust data pipelines and algorithms. In more technical terms, the job involves finding trends in datasets, developing algorithms for better data collection, compiling database systems, and writing complex queries to refine datasets.
“Data engineers at Facebook are part of a tightly integrated team of core technical functions that support every product team at Facebook. They help product decisions alongside software engineering, design, product management, data science, research, and others”.
At Facebook, data engineers lay the groundwork for data analysis by building and managing scalable data pipelines and frameworks, designing data warehouses for internal business use, and leveraging big data technologies to transform raw and complex data into actionable insights for better business decision making.
The data engineer role at Facebook requires proven skills and extensive industry experience. As a result, Facebook hires only highly qualified applicants with at least 4 years of industry experience in the data warehouse space.
Other Minimum Qualifications Include:
- BS/BA in Computer Science, Mathematics, Physics, or other technical fields.
- 4+ years of experience writing complex SQL, working with DataFrame APIs, and developing, implementing, and maintaining custom ETL.
- Extensive industry experience with either a MapReduce or an MPP system.
- Deep understanding of data architecture, machine learning methods, schema design, and dimensional data modelling.
- Hands-on experience with object-oriented programming languages (Java, Python, C++, Scala, Perl, etc.)
- Experience analyzing large datasets to identify deliverables, gaps, and inconsistencies.
Types of Data Engineer Teams at Facebook
Facebook is a very large product-based company with many departments, teams, and sub-division levels. As a data-driven company, Facebook relies heavily on data to make sound business decisions.
Data engineers are responsible for data collection and data integrity, and they work cross-functionally with internal teams to help turn data into sound decisions. Because data engineers at Facebook are embedded within teams, their specific responsibilities vary somewhat from team to team.
Depending on the team, data engineer roles at Facebook may include:
Facebook App Monetization (FAM) Team: Roles include designing and building a strong data foundation, infrastructure, and architecture that help analytics, product, engineering, and FAM leadership make better decisions. Data engineers on this team also work closely with data infrastructure teams to suggest improvements and modifications to existing data and ETL pipelines, and they communicate strategies and processes to cross-functional groups and leadership.
Data Warehouse Team: Roles within the team include designing, building, and launching new ETL processes and data models in production; managing data warehouse plans; partnering with engineers, product managers, and product analysts to understand data needs; and collaborating with the data infrastructure team to triage infrastructure issues and drive them to resolution.
Novi Blockchain Data Engineering Team: Data engineers in this team design and implement scalable data repositories to integrate qualitative and quantitative research data, build and launch new ETL processes in production, and identify, collect and transform user interaction data and server events data into scalable schema models. They also work closely with Product Managers, Data Scientists, Software Engineers, Economic Researchers, Compliance, and Risk Management to build unique and intuitive products to tackle challenging problems.
Facebook Video Distribution: Roles include developing optimal data processing architecture and systems for new data and ETL pipelines, and recommending improvements and modifications to existing ones. Data engineers on this team also collaborate with internal Facebook teams to understand their needs and translate those needs into data engineering solutions.
Partnerships Central Systems, Data and Tools Team: Responsibilities include building and maintaining efficient, reliable data pipelines to move and transform data, building models that provide intuitive analytics, and collaborating cross-functionally to frame problems, gather data, and provide business-impact recommendations.
Family Ecosystems: General roles include working with data infrastructure, product software engineering, and product management teams to develop and validate architecture-driven, end-to-end analytics products, tools, and infrastructure stacks. Other roles include building an optimal data processing framework for new data and ETL pipelines/applications, building visualizations for data and metrics insights, and communicating strategy effectively within teams and across leadership levels.
The Interview Process
The Facebook data engineer interview follows the same standard process as other Facebook technical roles. It starts with an initial recruiter phone call in which the role and the interview process are explained. Next comes a one-hour technical phone screen covering SQL and Python/Java coding. Candidates who pass the technical screen are scheduled for an onsite interview consisting of 3 to 4 back-to-back rounds.
The initial recruiter call is a 30-minute phone conversation with a recruiter or HR representative. During the call, the recruiter explains the role in more detail and what to expect in the rest of the interview process.
The technical phone screen is a one-hour interview involving SQL and Python or Java coding (depending on your language preference) in CoderPad. There are usually around 8 to 10 questions, divided equally between SQL and Python (for example, 5 SQL and 5 Python), with an algorithm question in both the SQL and the Python portions.
Note: You will always be limited by time (one hour maximum). It helps to communicate your thought process clearly to the interviewer while solving problems, especially in the coding section.
The last stage of the Facebook data engineer interview process is an onsite interview made up of three full-stack technical interviews (two ETL rounds and one data modelling round), one behavioral round, and a lunch break in between.
Except for the behavioral interview, every round includes a product-sense element that tests your knowledge of key operational metrics. You can expect questions like “What metrics would be good to capture for scenario X?” and “Describe a situation where you disagreed with stakeholders. How did you handle it?” ETL and modelling questions are case-based and may require some coding.
A breakdown of the onsite interview process is as follows:
- ETL Round: This round involves writing SQL and Python/Java code that resembles standard Facebook ETL code.
- Modelling Round: This round mixes SQL and Python, with data modelling questions based on a business scenario.
- Behavioral: This interview assesses a candidate's communication skills and how well they can convey their thoughts and ideas. Work on preparing your own stories, for example, a story on how you achieved success on a project, or about a time you dealt with a major failure, or on how you overcame a particular challenge on a project.
Note: Before the pandemic, this interview was held onsite at Facebook's offices; because of the pandemic, every interview is now conducted virtually.
Notes and Tips
The Facebook data engineer interview process aims to assess candidates' abilities to utilize big data to provide actionable business insights for growth. Facebook uses standardized questions to test the candidate’s in-depth knowledge of data architecture and frameworks as well as key operational metrics for all Facebook products.
Because the questions are standardized, especially in the coding interviews, explain your thought process as you answer: tell the interviewer clearly how and why you chose the methods you used.
The Facebook data engineer interview covers the full breadth of data science domains, including modelling, visualization, system design, and end-to-end solutions from a data engineering perspective. Questions can span:
- Data structures and algorithms
- Writing SQL queries to solve a real-world problem
- DB performance tuning
- Data pipeline design
- Metric and visualization solution design for a business case
- Statistics and modelling
- Previous project experience
- Big data solutions like Spark, EMR
- Reporting tools like Tableau, Excel
- Building data platforms or architecture for a hypothetical or existing Facebook product.
Practice plenty of SQL, Python/Java, modelling, and algorithm questions, covering lists, arrays (strings and substrings), dot products, JOINs, subqueries, aggregate functions, and GROUP BY. Try coding on a whiteboard to get familiar with the onsite interview experience.
Facebook Data Engineer Interview Questions
- Given an array of integers, determine whether the array is monotonic (entirely non-decreasing or entirely non-increasing). Examples:
  1 2 5 5 8 -> true
  9 4 4 2 2 -> true
  1 4 6 3 -> false
  1 1 1 1 1 1 -> true
- Design a dashboard that highlights a particular aspect of user behaviour.
- Does a database view occupy disk space?
- What do you call a loop that runs forever?
- Which SQL keyword selects only non-duplicate values?
- Find the maximum number in an array without using the max function.
- Find the minimum absolute difference between any two elements of an array.
- Write the DDL (tables and foreign keys) for several tables in a provided ERD. The ERD contains at least one many-to-many relationship.
- Recursively parse a string for a pattern that can be either 1 or 2 characters long.
- Perform a merge-sort with SQL only.
- Given full authority to "make it work", import a large data set with duplicates into a warehouse while meeting the requirements of a business intelligence designer for query speed.
- Query a many-to-many relationship without violating the grain of a fact table.
- Given a number and an array, determine whether any two numbers in the array sum to the given number.
- Design an experiment to test whether a feature spurs conversation.
- Describe your projects.
- Given a raw data table, how would you write the SQL to perform the ETL to get data into the desired format?
- How would you rate the popularity of a video posted online?
- Given an IP address as an input string, validate it and return True or False.
- Count the neighbors of each node in a graph. The input graph is a multidimensional list.
- Given a list of (start, end) tuples of times a viewer watched a movie, find how many unique minutes of the movie they watched. For example, for [(0,15),(10,25)], the viewer watched 25 minutes of the movie.
- How do you remove duplicates from a list?
- Given a multi-step product feature, write SQL to measure how well the feature is doing (loading times, step completion %). Then use Python to continuously update the average step time as new values stream in, given that there are too many values to store in memory.
- How do you join two tables while keeping all the rows from the left table?
- Which join would you use to combine two tables, keeping every row from the left table and only the matching rows from the right one?
- If you don't specify a direction, does ORDER BY in SQL sort in ascending or descending order?
- What command do you use to add or drop a column in a database table?
- You have a 2-D array of friendships like [[A,B],[A,C],[B,D],[B,C],[R,M],[S],[P],[A]]. Write a function that creates a dictionary of how many friends each person has. People can have zero to many friends. There won't be repeated relationships like [A,B] and [B,A], and no relationship will include more than two people.
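For the array questions in the list above (monotonic check, maximum without max(), minimum absolute difference, pair sum, and duplicate removal), here is a minimal Python sketch of one common approach to each. These are not official answers, and the function names are hypothetical. The pair-sum function assumes the goal is to return one matching pair (or None) rather than every pair.

```python
from typing import List, Optional, Tuple


def is_monotonic(nums: List[int]) -> bool:
    """True if nums is entirely non-decreasing or entirely non-increasing."""
    pairs = list(zip(nums, nums[1:]))
    non_decreasing = all(a <= b for a, b in pairs)
    non_increasing = all(a >= b for a, b in pairs)
    return non_decreasing or non_increasing


def find_max(nums: List[int]) -> int:
    """Largest element via one linear scan, without calling max()."""
    if not nums:
        raise ValueError("find_max() needs a non-empty list")
    largest = nums[0]
    for n in nums[1:]:
        if n > largest:
            largest = n
    return largest


def min_abs_difference(nums: List[int]) -> int:
    """Smallest |a - b| over all pairs of elements.

    After sorting, the closest pair is always adjacent, so only
    neighboring values need to be compared.
    """
    if len(nums) < 2:
        raise ValueError("need at least two elements")
    ordered = sorted(nums)
    return min(b - a for a, b in zip(ordered, ordered[1:]))


def two_sum(nums: List[int], target: int) -> Optional[Tuple[int, int]]:
    """Return one pair (a, b) of elements with a + b == target, or None."""
    seen = set()
    for n in nums:
        complement = target - n
        if complement in seen:
            return (complement, n)
        seen.add(n)
    return None


def dedupe(items: list) -> list:
    """Remove duplicates, keeping first-seen order (items must be hashable)."""
    return list(dict.fromkeys(items))


print(is_monotonic([1, 2, 5, 5, 8]), is_monotonic([1, 4, 6, 3]))  # True False
```

Sorting makes the minimum-difference check O(n log n), since only adjacent values in sorted order can form the closest pair. The set in `two_sum` keeps the pair search to a single O(n) pass.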
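The watched-minutes and streaming-average questions in the list above both reduce to keeping a small amount of state. The sketch below merges sorted intervals, assuming each (start, end) tuple is half-open so that (0, 15) counts as 15 minutes. It also keeps a running mean in O(1) memory. `unique_minutes_watched` and `RunningMean` are hypothetical names. For per-step averages, one option is to keep one `RunningMean` per step, for example in a dict keyed by step name.

```python
from typing import List, Optional, Tuple


def unique_minutes_watched(intervals: List[Tuple[int, int]]) -> int:
    """Length of the union of (start, end) intervals, treated as half-open."""
    total = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # No overlap with the current run: bank it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


class RunningMean:
    """Average of a stream of values in O(1) memory."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        # Move the mean toward the new value by 1/count of the difference;
        # this equals sum(values) / count without storing the values.
        self.mean += (value - self.mean) / self.count
        return self.mean


print(unique_minutes_watched([(0, 15), (10, 25)]))  # 25
```

The incremental update avoids both storing values and keeping a huge running sum. That matters when, as the question says, there are too many values to hold in memory.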
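For the IP-validation, graph-neighbor, and friend-count questions in the list above, here is a sketch that rests on a few explicit assumptions:

- The IP address is an IPv4 dotted-decimal string, and leading zeros in any octet make it invalid.
- The "multidimensional list" is the adjacency matrix of an undirected graph.
- A single-name entry such as [S] is a person with no friendship in that entry.

All function names are hypothetical. In production Python, the standard-library `ipaddress` module is an alternative to hand-written IP validation.

```python
from typing import Dict, List


def is_valid_ipv4(address: str) -> bool:
    """Validate a dotted-decimal IPv4 string such as "192.168.0.1"."""
    parts = address.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Only plain ASCII digits: rejects "", "+1", " 1", and Unicode digits.
        if not (part.isascii() and part.isdigit()):
            return False
        # Assumption: leading zeros such as "01" are treated as invalid.
        if len(part) > 1 and part[0] == "0":
            return False
        if int(part) > 255:
            return False
    return True


def neighbor_counts(adjacency: List[List[int]]) -> List[int]:
    """Neighbors per node, assuming adjacency[i][j] is truthy when i and j are linked."""
    return [
        sum(1 for j, linked in enumerate(row) if linked and j != i)  # ignore self-loops
        for i, row in enumerate(adjacency)
    ]


def friend_counts(pairs: List[List[str]]) -> Dict[str, int]:
    """Map each person to their number of friends."""
    counts: Dict[str, int] = {}
    for group in pairs:
        if len(group) == 1:
            # A lone name still gets an entry, with 0 friends if never paired.
            counts.setdefault(group[0], 0)
        elif len(group) == 2:
            a, b = group
            counts[a] = counts.get(a, 0) + 1
            counts[b] = counts.get(b, 0) + 1
    return counts


friends = [["A", "B"], ["A", "C"], ["B", "D"], ["B", "C"],
           ["R", "M"], ["S"], ["P"], ["A"]]
print(friend_counts(friends))
# {'A': 2, 'B': 3, 'C': 2, 'D': 1, 'R': 1, 'M': 1, 'S': 0, 'P': 0}
```

Because each relationship appears only once and involves exactly two people, incrementing both names per pair counts every friendship from both sides without double counting.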
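Several of the SQL questions in the list above can be tried out quickly with Python's built-in sqlite3 module: DISTINCT, LEFT JOIN, the default ORDER BY direction, ALTER TABLE, and DDL for a many-to-many relationship. The students/courses/enrollments schema below is a hypothetical example, and SQLite syntax can differ slightly from the warehouse engines used in practice. SQLite only supports DROP COLUMN from version 3.35, so the sketch checks the library version first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on (set outside a transaction).
conn.execute("PRAGMA foreign_keys = ON")

# DDL for a many-to-many relationship: students <-> courses via a junction table.
conn.executescript("""
CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE courses (
    course_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL
);
CREATE TABLE enrollments (
    student_id INTEGER NOT NULL REFERENCES students (student_id),
    course_id  INTEGER NOT NULL REFERENCES courses (course_id),
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO students VALUES (1, 'Ana'), (2, 'Bo'), (3, 'Cy');
INSERT INTO courses VALUES (10, 'SQL'), (20, 'Python');
INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
""")

# DISTINCT returns each course_id once even though course 10 appears twice.
distinct_courses = conn.execute(
    "SELECT DISTINCT course_id FROM enrollments ORDER BY course_id"
).fetchall()

# LEFT JOIN keeps every student; Cy has no enrollment, so course_id is NULL (None).
left_joined = conn.execute("""
    SELECT s.name, e.course_id
    FROM students AS s
    LEFT JOIN enrollments AS e ON e.student_id = s.student_id
    ORDER BY s.student_id, e.course_id
""").fetchall()

# ORDER BY without ASC/DESC sorts ascending.
titles = [row[0] for row in conn.execute("SELECT title FROM courses ORDER BY title")]

# ALTER TABLE adds (and, in SQLite 3.35+, drops) columns.
conn.execute("ALTER TABLE students ADD COLUMN email TEXT")
columns_after_add = [row[1] for row in conn.execute("PRAGMA table_info(students)")]
if sqlite3.sqlite_version_info >= (3, 35, 0):
    conn.execute("ALTER TABLE students DROP COLUMN email")

# The foreign key rejects an enrollment for a student who does not exist.
try:
    conn.execute("INSERT INTO enrollments VALUES (99, 10)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print(distinct_courses, titles)  # [(10,), (20,)] ['Python', 'SQL']
```

The enrollments junction table resolves the many-to-many relationship into two one-to-many relationships. Swapping LEFT JOIN for INNER JOIN in this sketch would drop Cy, which is the practical difference the two join questions are probing.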