In defense of the DIY data scientist

Some people start their data science journey with a formal education. They learn the essential math, programming, and statistics at a university. And they enter the data science world equipped with a slick degree.

But many others do not. Instead, they piece together knowledge on their own. They scoop up what they need when they need it. They’re self-taught; DIY.

While everyone’s journey is a bit different, chances are that if you’re a DIY data scientist, yours includes one or more of the following: becoming an expert user of Excel, being compelled to learn some programming, tackling data-related problems at work, being introduced to machine learning by a colleague or friend, learning what some algorithms do, building models using scikit-learn, and participating in Kaggle competitions.

Many of these activities fall under the umbrella of “applied data science” — using code that has already been packaged together neatly to enable the quick application of complex ideas.

Making these resources so readily available has its downsides. People with limited knowledge make mistakes. They apply things incorrectly. They fool themselves.

There are plenty of warnings out there about problems that result from letting the wrong people, or the wrong situation, run wild with data and data science tools.

But there’s also a need to balance skepticism around democratization with optimism and perspective.

In fact, I’d argue that “applied” is the way of the future, just as it has shaped the past.

As a society, we learn something, we package it together, and we make it easier to use and build upon. All builders are also users — taking advantage of tools and structures spanning across a long bridge of time and human progress.

Nearly everyone can use a computer, but only a select few can give a detailed explanation of the physics of transistors and the code that brings internet browsing down to 1s and 0s.

Of course, those who possess this knowledge may be some of the “best” computer-users in the world. But can the average person not accomplish many impressive things with a computer?

Many DIY data scientists’ journeys will take place exclusively in this world of application. They will equip themselves with the tools and knowledge necessary to solve a growing list of problems. They should, of course, be taught to be careful, but it would be foolish to doubt their ability to create value — especially as tools get better. Just as data problems exist on a spectrum, so does the skillset required to solve them.

Some DIY data scientists’ journeys will not stop at basic application. They will go on to get undergraduate and master’s degrees to supplement their knowledge. Or they will find a niche and learn it deeply. Or they will go so deep in self-teaching that their knowledge meets or exceeds that of those who are formally educated.

Others, particularly those with a diverse set of skills, will go on to build and lead companies — leveraging their data literacy to create value in an increasingly data-driven world.

Finally, let’s face it. There is something admirable about people who teach themselves.

People who battle their way through online courses. People who spend their nights and weekends debugging code and learning new tools. People who are so motivated and passionate that they overcome countless challenges to solve problems. At the very least, this level of tenacity says something about work ethic and internal motivation.

Teaching yourself how to solve problems is cool.

Article written by: Skylar Dale

April 13, 2020