Written by Austin Gorsuch, Ivan Liao, and Zachary Botkin.
The explosive growth of Big Tech and data over the past decade has yielded a constantly developing field and ever-specializing careers. Within this shift, data science and data engineering have emerged as two roles that seem to occupy similar niches, but actually have very different responsibilities. In a nutshell, data engineers make the systems that data scientists use to make decisions.
Data engineering involves the development, testing, and maintenance of data pipelines.
Solid data engineering makes an efficient data science workflow possible. Data engineers work behind the scenes to create and optimize databases so that queries and analysis are as easy as possible for the data scientists they work with.
Data engineers are also responsible for facilitating Extract, Transform, and Load (ETL) procedures on large amounts of data. As a result, data engineering is a discipline steeped in fundamentals: it requires a strong background in programming and software development, with an emphasis on writing solid, reliable code.
Data science is more concerned with organizing and interpreting the data that comes out of a pipeline.
A good data scientist takes advantage of solid data engineering to ask, and answer, important questions about large amounts of data. Data scientists are responsible for presenting analyses to people who may not have a programming background, so the role requires a high degree of knowledge specific to the field of employment, along with strong communication skills to share findings with a diverse audience of interested parties.
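To make the ETL idea concrete, here is a minimal sketch of the three steps using only Python's standard library. The `orders` table, the column names, and the sample rows are all hypothetical, and a production pipeline would usually rely on dedicated tooling rather than a single script.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the columns and values are made up for illustration.
RAW_CSV = """order_id,customer,amount
1, alice ,19.99
2,BOB,5.00
3,carol,not_a_number
"""


def extract(text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names, cast types, and skip rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            clean.append((
                int(row["order_id"]),
                row["customer"].strip().lower(),
                float(row["amount"]),
            ))
        except ValueError:
            # A real pipeline would likely log or quarantine the bad row.
            continue
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, assumed for the demo
load(transform(extract(RAW_CSV)), conn)
result = conn.execute(
    "SELECT customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(result)
```

Once the data is loaded, a data scientist can query the `orders` table directly without worrying about the stray whitespace or the malformed row in the raw export.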
A data scientist working in finance without any background knowledge of the field will have a difficult time communicating why the data looks the way it does or why that should matter to anyone.
Therefore, while data science requires a strong background in mathematics and statistics under any circumstance, for many jobs that is only the foundation of the data scientist’s toolkit and further specialization may be required. If data science interests you, you will want to develop a strong basis in Python, SQL, and R, as well as a body of domain-specific knowledge for the field in which you’d like to work.
Running a keyword counter on various articles, we created a heatmap of key skills for both data science and data engineering, displaying the concrete skills that each profession uses most frequently. As mentioned above, math and analysis skills seem to be more important for data scientists than for data engineers, while R and Python are fundamental skills regardless of position.
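A keyword count of this kind can be sketched in a few lines of Python. This is an illustration only, not the script used for the heatmap: the skill list and the article snippets below are invented, and the resulting role-by-skill counts are simply the kind of matrix a heatmap would be drawn from.

```python
import re
from collections import Counter

# Hypothetical skill list and article snippets, invented for this example.
SKILLS = ["python", "sql", "r", "statistics", "etl"]

ARTICLES = {
    "data science": [
        "Data scientists lean on Python, R, and statistics.",
        "Statistics and SQL appear in most postings.",
    ],
    "data engineering": [
        "Data engineers build ETL pipelines in Python and SQL.",
        "ETL jobs often load data with SQL.",
    ],
}


def count_skills(texts, skills):
    """Count whole-word occurrences of each skill across a list of texts."""
    wanted = set(skills)
    counts = Counter()
    for text in texts:
        # Tokenize into lowercase words so that "r" only matches the
        # standalone token, not the letter inside other words.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(token for token in tokens if token in wanted)
    return counts


# One row of counts per role: the raw numbers behind a heatmap.
matrix = {role: count_skills(texts, SKILLS) for role, texts in ARTICLES.items()}

for role, counts in matrix.items():
    print(role, [counts[skill] for skill in SKILLS])
```

Matching on whole tokens rather than substrings matters for short skill names such as R, which would otherwise match almost every word.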
Depending on where you work, the process of data cleaning can fall into the domain of either data engineering or data science. In a particularly robust data engineering division, a data scientist may query a database to find all of their data already neatly ordered and ready for analysis. In a smaller company, or an operation with a less developed data engineering division, the responsibility of cleaning data may fall to the data scientist instead.
Therefore, whether you’re interested in data engineering or data science, some familiarity with the process of cleaning or wrangling data may be useful. Even if the responsibility doesn’t fall to you (whatever your role may be), understanding the commonly applied methods of data cleaning will help you understand your data set (as a data scientist) or the needs of your data science team (as a data engineer), and make you that much more effective at your job.
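As a rough illustration of what cleaning can involve, the sketch below trims whitespace, normalizes capitalization, converts unparseable or missing values to `None`, and drops duplicate records. The records and field names are hypothetical, and real projects often use a library such as pandas for the same steps.

```python
# Hypothetical raw records, invented for this example.
RAW_RECORDS = [
    {"name": " Alice ", "city": "boston", "age": "34"},
    {"name": "alice", "city": "Boston", "age": "34"},  # duplicate once normalized
    {"name": "Bob", "city": "", "age": "n/a"},         # missing city, bad age
]


def clean_record(record):
    """Normalize one record: trim, fix capitalization, and parse the age."""
    name = record["name"].strip().title()
    city = record["city"].strip().title() or None  # empty string becomes None
    try:
        age = int(record["age"])
    except ValueError:
        age = None  # keep the row, but mark the value as missing
    return {"name": name, "city": city, "age": age}


def clean(records):
    """Clean every record and drop exact duplicates, keeping first occurrences."""
    seen = set()
    cleaned = []
    for record in records:
        row = clean_record(record)
        key = (row["name"], row["city"], row["age"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


cleaned = clean(RAW_RECORDS)
print(cleaned)
```

Whether values like the unparseable age should become `None`, be dropped, or be imputed is a judgment call that depends on the analysis, which is one reason it helps for both roles to understand the process.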
While it's important to distinguish between the disciplines of data science and data engineering for the purposes of choosing a career, it’s even more important to recognize the degree to which these two disciplines are complementary components of any organization.
On the one hand, data science simply isn’t possible without the data pipeline that data engineering enables behind the scenes. You can’t perform a query on a database that isn’t there.
On the other hand, the work data scientists do on a daily basis determines the sort of work performed by the data engineers at the same company.
How a given database is shaped, developed, and maintained depends on how it will be used, which is something that a data scientist decides. Therefore, no matter which of the two careers you find appealing, it's worth becoming familiar with the other: chances are you'll work closely with someone from the other discipline to fulfill your long-term responsibilities at any company.
The definitions of data science and data engineering are constantly shifting. It has been eight years since Harvard Business Review hailed data science as the “sexiest job of the 21st century.” Many people predicted that a data science bubble would burst, but the field has evolved instead.
Both data science and data engineering are common positions among employers (and only growing more common), and focusing on either will bring you into contact with many opportunities.