6 answers

What should I learn to become a Data Scientist?

I already know Python libraries such as pandas and NumPy, and some linear algebra. #science #python #datascience




Su’s Answer

Hi Vladislav,


First of all, I'll put all the materials I found useful here:


Coursera (ML / Statistics / Big Data / Data Visualization)


- Machine Learning, by Stanford

- Deep Learning Specialization (5 courses), by deeplearning.ai

- Advanced Machine Learning Specialization (3/7 courses), by National Research University Higher School of Economics

- Bayesian Statistics, by University of California, Santa Cruz (check 1point3acres for more)

- Data Visualization and Communication with Tableau, by Duke

- Big Data Integration and Processing, by University of California, San Diego


Books (ML / Statistics)


- Hands-On Machine Learning with Scikit-Learn and TensorFlow

- Python Machine Learning

- Pattern Recognition and Machine Learning (PRML)

- The Elements of Statistical Learning (ESL)

- An Introduction to Statistical Learning (ISL)

- Machine Learning: A Probabilistic Perspective

- Interpretable Machine Learning


Secondly, the Data Scientist role in the tech industry covers several different jobs:

  • Data Analytics: working with the data warehouse to discover insights; requires SQL skills
  • Machine Learning Engineer: maintaining ML models and solving business needs; close to a backend software engineer role
  • Machine Learning Scientist: also related to ML models, but less involved in large-scale problems

So I'd suggest picking a particular role to start with and focusing on it.


For example, in Data Analytics it is a must-have skill to run sophisticated SQL queries and to be familiar with modern data warehouses such as Hive and SparkSQL. A great book to start with is: https://www.manning.com/books/big-data-warehousing-cx

As a Machine Learning Engineer, I'd recommend starting with machine learning knowledge as well as general software engineering skills.

Most companies hiring specifically for Machine Learning Scientist roles require a PhD or more than 5 years of experience.

Lastly, this is a fast-changing industry and the requirements can be dramatically different in 3 to 5 years. So I'd suggest interviewing with real companies every year.

Thanks.

Shaowei

Su recommends the following next steps:

Find a particular field of data science to start with
Go through the list of books/courses I shared above (and more)
Big data skills (Hadoop, Spark, etc.) are a good plus
Interview with real companies every year

Sachin’s Answer

Hi Vladislav,

Thanks for the question. Here is a webpage that lists the steps and details the skills, knowledge and training you need to become a data scientist:

https://www.superdatascience.com/blogs/how-to-become-data-scientist-from-scratch

Hope this helps and good luck!


karthik’s Answer

To become a data scientist, you could earn a bachelor's degree in computer science, social sciences, physical sciences, or statistics. The most common fields of study are mathematics and statistics (32%), followed by computer science (19%) and engineering (16%).

Robert’s Answer

You'll hear a lot about languages, tools, and technologies. However, the foundation of all data science is statistics. Learn how to do basic exploratory data analysis. Understand statistical distributions and how they apply to the real world. You'll need to understand things like inferential analysis, probabilities, linear predictions, how to build a statistical hypothesis, and how to create simulations to test your hypothesis.
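
To make "create simulations to test your hypothesis" a bit more concrete, here is a minimal sketch of one common approach, a permutation test. It is only an illustration, not necessarily the method meant above: the function name `permutation_test` and the sample values are made up, and it uses nothing beyond the Python standard library.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for a difference in means by simulation.

    Null hypothesis: both groups come from the same distribution, so the
    group labels are interchangeable.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))

    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0

    for _ in range(n_permutations):
        # Simulate the null hypothesis by shuffling the labels.
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1

    return at_least_as_extreme / n_permutations


# Hypothetical measurements for two groups (illustrative values only).
a = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
b = [12.9, 13.1, 12.7, 13.4, 12.8, 13.0]

p_value = permutation_test(a, b)
print(f"Estimated p-value: {p_value:.4f}")
```

A small p-value means that a difference this large rarely shows up when the labels are shuffled at random, which is evidence against the "no difference" hypothesis. Knowing why that reasoning holds is the statistical foundation the tools cannot supply for you.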

Languages like Python and libraries such as matplotlib, NumPy, pandas, scikit-learn, and so on are the tools you can use, but it's very important to understand the mathematical concepts underlying the tools. Without that foundation, it's difficult to know whether the tool or method you've chosen actually produces accurate results for your problem.

So, take the math courses first, or at least take them at the same time as you work on your programming.

Aroquiamarie Kavitha’s Answer

The tasks and responsibilities of a data scientist vary between companies and sometimes between verticals within a company. I assume the question is general and the goal is to get an entry-level job with data science duties. To start, an aspirant needs to become familiar with the following:
1. Basic math for data science (linear algebra, elementary calculus and statistics)
2. Writing code with basic programming constructs (in either Python or R)
3. Data wrangling skills (using database technologies like SQL to handle larger data that doesn't fit in a spreadsheet)
4. A hands-on mindset for playing around with different data tools and software on Linux-based systems
5. Understanding how the data is created and the business function it serves
6. Storytelling with data (talking to different stakeholders in the business about your findings and observations about the data)
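
As a small illustration of item 3, here is a minimal sketch of summarizing data with SQL from Python. It assumes SQLite through the standard-library `sqlite3` module purely for convenience; the `orders` table and its rows are made up, and a real job would more likely run similar queries against a data warehouse.

```python
import sqlite3

# In-memory database so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, region TEXT, amount REAL)")

# Hypothetical rows, for illustration only.
rows = [
    ("alice", "north", 120.0),
    ("bob", "north", 80.0),
    ("carol", "south", 200.0),
    ("dave", "south", 50.0),
    ("erin", "south", 50.0),
]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# Aggregate per region: the database does the grouping, so the same query
# shape works for tables far too large for a spreadsheet.
query = """
    SELECT region,
           COUNT(*)    AS n_orders,
           SUM(amount) AS total,
           AVG(amount) AS average
    FROM orders
    GROUP BY region
    ORDER BY total DESC
"""
summary = conn.execute(query).fetchall()

for region, n_orders, total, average in summary:
    print(f"{region}: {n_orders} orders, total {total:.2f}, average {average:.2f}")

conn.close()
```

The `?` placeholders let the database driver handle quoting of the inserted values, which is a good habit to carry over to any SQL work.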

Good to have:

1. Basic knowledge of handling data in cloud systems like AWS, Google Cloud, or Azure.

Basic mini courses:
Kaggle courses (they have a curriculum from beginner to intermediate level):
https://www.kaggle.com/learn
If you have programming experience and are looking for experiential learning with more hands-on programming activities, try fast.ai:
https://course18.fast.ai/ml.html
If you have a good foundation in high school math and prefer a traditional learning approach, Stanford CS229 Machine Learning is a good place to start:
https://www.youtube.com/playlist?list=PLoROMvodv4rMiGQp3WXShtMGgzqpfVfbU

Once you are done, you can start working on portfolio projects that interest you and showcase them in your resume, as suggested in the recommended courses. Try to solve real-world problems often by taking part in Kaggle competitions.

karthik’s Answer

Python Coding.
Hadoop Platform.
SQL Database/Coding.
Apache Spark.
Machine Learning and AI.
Data Visualization.
Unstructured data.