An increasing number of companies, both large and small, now analyze enormous amounts of data to gain information that helps them run their business and serve their customers. The scale of the data confronting companies is difficult to imagine. Consider this: one study estimated that by 2024, the world’s enterprise servers will annually process the digital equivalent of a stack of books extending more than 4.37 light-years to Alpha Centauri, the star system closest to our own in the Milky Way Galaxy. That’s a lot of data to gather and analyze!
Business and data analysts have been around for a while. Over the last decade, however, the spotlight has turned to data scientists. An article in the October 2012 issue of Harvard Business Review called the data scientist’s job the sexiest job of the 21st century!
First, what does a data scientist do?
A data scientist represents an evolution from the business or data analyst role. The formal training is similar, with courses in computer science and applications, modelling, statistics, analytics and math. However, it must be supplemented with strong business acumen and the ability to communicate findings to business leaders in a way that can influence how an organization approaches a business challenge. It is said that good data scientists do not just address problems; they pick the problems that have the most value to the organization.
While a data analyst may look only at data from a single source, a data scientist will explore and examine data from multiple disparate sources. The data scientist will sift through all incoming data with the goal of discovering a previously hidden insight, which in turn can provide a competitive advantage or address a pressing business problem. A data scientist does not simply collect and report on data, but also looks at it from many angles, determines what it means, then recommends ways to apply the data.
Data scientists are inquisitive: exploring, asking questions, doing “what if” analyses, and questioning existing assumptions and processes. Armed with data and analytical results, a top-tier data scientist will then communicate informed conclusions and recommendations across an organization’s leadership structure.
A data scientist requires mastery of a number of fields. Core competencies include:
Basic tools: A statistical programming language, like R or Python, and a database querying language like SQL.
Basic statistics: Familiarity with statistical tests, distributions, maximum likelihood estimators, etc.
Machine learning: Familiarity with machine learning methods like k-nearest neighbours, random forests and ensemble methods.
Multivariable calculus and linear algebra: A firm grasp of the basics of both, enough to work through standard problems.
Data munging: Knowing how to deal with imperfections in data, which matters especially at small companies or for early hires.
Data visualization and communication: Familiarity with data visualization tools like ggplot and d3 and communication techniques for both technical and non-technical audiences.
Software engineering: A strong software engineering background is often necessary.
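To make a few of these competencies concrete, here is a minimal Python sketch that touches data munging, basic statistics and machine learning in a handful of lines. Everything in it is an assumption made for illustration: the toy records, the field layout and the function names (clean, normal_mle, knn_predict) are hypothetical, and it uses only the standard library rather than the tools a working data scientist would normally reach for.

```python
import math
from collections import Counter

# Hypothetical toy records: (height_cm, weight_kg, label).
# None marks a missing value, the kind of imperfection munging deals with.
raw = [
    (150.0, 50.0, "small"),
    (152.0, None, "small"),
    (155.0, 54.0, "small"),
    (180.0, 82.0, "large"),
    (183.0, 85.0, "large"),
    (None, 88.0, "large"),
]


def clean(records):
    """Data munging: drop any record that has a missing field."""
    return [r for r in records if None not in r]


def normal_mle(values):
    """Basic statistics: maximum likelihood estimates of the mean and
    variance of a normal distribution fitted to the values."""
    n = len(values)
    mean = sum(values) / n
    # The MLE of the variance divides by n, not by n - 1.
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance


def knn_predict(train, point, k=3):
    """Machine learning: k-nearest neighbours with Euclidean distance
    and a majority vote among the k closest records."""
    by_distance = sorted(train, key=lambda r: math.dist(r[:2], point))
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


rows = clean(raw)
mean_height, var_height = normal_mle([r[0] for r in rows])
prediction = knn_predict(rows, (178.0, 80.0))

print(f"{len(rows)} clean rows, mean height {mean_height}, variance {var_height}")
print(f"predicted label: {prediction}")
```

In practice the same steps would more likely be done with libraries built for the purpose; the point here is only to show how small the core ideas are once the data is in hand.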
Data science is still nascent and ill-defined as a field. Getting a job is as much about finding a company whose needs match your skills as it is about developing those skills. Data scientists today are akin to the Wall Street “quants” of the 1980s and 1990s, when people with backgrounds in physics and math were favourites of investment banks and hedge funds.
In India, an average data scientist earns a starting salary of over Rs 600,000 per year. Experience strongly influences income for this job.