A recent theme in the blogosphere centers on how newcomers can get into the field of data science and statistical analysis. What are the necessary qualifications? And how can you go about getting those skills?
Unfortunately, the answers to these questions seem to present a quandary that was eloquently summed up by a comment I read on another blog I seem to have forgotten (perhaps it was Chandoo):
You need experience to get a job as an analyst. But the only way to get experience is to work in a job as an analyst.
Employers today are asking for more from all of their employees. And data analysts are no exception. In fact, the pressure to produce more with less is pushing many employers to merge business functions across smaller workforces.
For the data scientist, and more importantly for the aspiring marketing researcher or business intelligence analyst, there is a three-headed monster to contend with. Each head represents a different role that you will need to fulfill in your career.
This post will help flesh out what this three-headed monster looks like and how each head behaves. Most importantly, you’ll learn what you need to know in order to slay this monster one head at a time, and become a rock star data analyst!
The Three-Headed Monster
Data scientists in the business world need a set of skills not unlike those found in other professional research fields. This triumvirate consists of the following types of knowledge:
- Substantive
- Data Analysis
- Programming
Each of these types of knowledge represents one facet of information that the commercial data analyst will rely on regularly to perform at their best. So, let’s look at each in more detail.
There are some who will argue that any competent analyst should be able to use data to answer questions regardless of the substantive topic area.
To a degree, this is true. For example, I can study crime rates just as effectively as I can study corporate customer satisfaction scores.
But I have the advantage of having backgrounds in both business and criminology.
Substantive knowledge about the field of study helps place your results in context, and allows a frame of reference for what is normal and what is unexpected.
Whether you are working on a six sigma project, consumer loyalty and satisfaction metrics, or optimizing your company sales funnel, knowing the relevant parameters and constraints on the process is important.
To become a rock star analyst, you don’t need deep substantive training. A solid background in the fundamentals will get you going. After that, you’ll simply get better as you learn more.
Data Analysis Knowledge
It almost goes without saying that a rock star data analyst should have solid skills in data analysis. But just which skills are necessary?
In today’s digital world, strong quantitative statistical skills seem most important. These skills fall under the various headings of descriptive and inferential statistics, econometrics, and frequentist and Bayesian analytical perspectives.
But what many outside the realm of data science don’t understand is that good analysts should also have knowledge of research methodologies such as experimental and quasi-experimental design, measurement principles, survey design, and secondary data analysis capabilities.
I also argue that strong candidates for data scientist roles should have at least a fundamental understanding of qualitative data analysis. Observing a social context directly, interviewing relevant stakeholders, and running focus groups provide a richness of information that cannot be captured in any database or survey.
The rock star analyst should know how to delve into that information and make sense of the patterns. Ultimately, this ability will inform quantitative data analysis efforts, and vice versa.
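To make the quantitative side a little more concrete, here is a minimal Python sketch of one descriptive summary and one simple inferential step. The satisfaction scores are made up purely for illustration, and the interval relies on a normal approximation, which is crude for a sample this small. Treat it as a sketch of the idea rather than a recommended procedure.

```python
import math
import statistics

# Hypothetical customer satisfaction scores (made-up data for illustration).
scores = [6, 7, 8, 9, 10]

# Descriptive statistics: summarize the sample itself.
mean_score = statistics.mean(scores)
std_dev = statistics.stdev(scores)  # sample standard deviation

# Inferential statistics: estimate a range for the population mean.
# Assumption: a normal approximation, which is rough for only five values.
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
margin = z * std_dev / math.sqrt(len(scores))
low, high = mean_score - margin, mean_score + margin

print(f"mean = {mean_score}, standard deviation = {std_dev:.2f}")
print(f"approximate 95% interval for the mean: {low:.2f} to {high:.2f}")
```

The first half only describes the five scores in hand. The second half makes a claim about the wider population those scores came from, which is where the methodological knowledge discussed above starts to matter.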
There is no denying that the growing world of e-commerce and digital content relies on a bedrock of programming code.
But not all code is created equal.
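There are interpreted languages, such as HTML, CSS, and PHP: the bedrock on which most web content is created (strictly speaking, HTML and CSS are markup and style-sheet languages rather than programming languages, but the idea is the same). These languages are designed to be interpreted by other programs (e.g. web browsers for viewing HTML), and do not need to be converted to machine language before running.
In contrast, there are compiled languages such as C++ and Visual Basic. These languages allow greater flexibility in creating complex processes, making them ideal for writing everyday programs (e.g. that browser you’re using is probably written in C++). Python is often grouped with them as a general-purpose language, although it is usually run through an interpreter rather than compiled to machine code ahead of time.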
Then there are statistical programming languages that fall somewhere in between.
On the interpreted language side, there are proprietary languages used for major packages like SAS, SPSS, and Stata. On the other hand, there are programs like Excel that can compile and execute VB code to perform a multitude of analytics tasks.
Then there are pure statistical computing languages like S and R that behave much more like compiled languages (although like Python, they aren’t strictly compile-only languages).
Now don’t get nervous…the rock star analyst doesn’t need to have a degree in computer science. However, learning something about basic programming structures will be incredibly useful for efficient data management and analysis. Regardless of your platform of choice, you will at least need to learn the code for that program (a topic I’ll be exploring extensively in this blog).
I also recommend that you take the time to learn another language besides the one used by your preferred statistical package. Python is probably the most widely useful language today. But it can be a little difficult for the novice programmer. If that’s you, maybe start with something more fun such as basic HTML and CSS for web programming.
Slaying the Three-Headed Monster
All of this might sound a little overwhelming if you’re new to the data analysis profession. So, let me break it down into a summary of next steps.
First, begin learning as much as you can about statistics and research methodologies. There are a number of great websites that offer such training for free (e.g. Coursera and Khan Academy are good places to begin your search).
Next, choose a data analysis platform to learn on. R is more difficult to learn, but is open-source (i.e. free) and gaining widespread popularity (see www.r-project.org for the latest version). It would be a good idea to talk to your employer, or to others in the data analysis field, to see what they recommend. I primarily use Excel, Stata, and R these days.
Finally, begin learning the basics of computer programming. It doesn’t really matter which language you use, since virtually every language makes use of the same basic tools, such as if-else statements, loops, arrays, and variables (it’s okay if you don’t know these terms right now…you’ll learn about them quickly once you start). I recommend finding a good book or web-based course on Python, since it can be used for everything from video-game programming to scientific computing.
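If those terms are new to you, here is a small Python sketch that uses all four of them at once: variables, an array (Python calls it a list), a loop, and an if-else statement. The scores, the cutoffs, and the label_score name are all invented for this example, so don’t read them as any kind of standard.

```python
# A variable holding an array (a Python list) of hypothetical survey scores.
scores = [7, 9, 4, 8, 10, 3]

def label_score(score):
    """Assign a score to a group using if-else logic (cutoffs are made up)."""
    if score >= 9:
        return "promoter"
    elif score >= 7:
        return "passive"
    else:
        return "detractor"

# A loop that visits each score and keeps a running tally per group.
counts = {}
for score in scores:
    label = label_score(score)
    counts[label] = counts.get(label, 0) + 1

print(counts)
```

Almost every language you might pick up has its own version of these same pieces, so whatever you learn here will carry over.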
One step at a time…you’ll get there. But the most important thing is to start by taking the first step. Remember, the fastest path between two points can still only be travelled one step at a time.