What should I learn to become a Data Scientist?
I already know Python libraries like pandas and numpy, and some linear algebra. #science #python #datascience
6 answers
Su’s Answer
Hi Vladislav,
First of all, I'll put all the materials I found useful here:
Coursera (ML / Statistics / Big Data / Data Visualization)
- Machine Learning, by Stanford
- Deep Learning Specialization (5 courses), by deeplearning.ai
- Advanced Machine Learning Specialization (3 of 7 courses), by National Research University Higher School of Economics
- Bayesian Statistics, by University of California, Santa Cruz. Check 1point3acres for more.
- Data Visualization and Communication with Tableau, by Duke
- Big Data Integration and Processing, by University of California, San Diego
Books (ML / Statistics)
- Hands-On Machine Learning with Scikit-Learn and TensorFlow
- Python Machine Learning
- Pattern Recognition and Machine Learning (PRML)
- The Elements of Statistical Learning (ESL)
- An Introduction to Statistical Learning (ISL)
- Machine Learning: A Probabilistic Perspective
- Interpretable Machine Learning
Secondly, the Data Scientist role in the tech industry covers several different sets of duties:
- Data Analytics: interacting with the data warehouse and discovering insights; requires SQL skills
- Machine Learning Engineer: maintaining ML models and solving business needs; close to a backend software engineer role
- Machine Learning Scientist: also works on ML models, but is less involved in large-scale problems
So I'd suggest picking a particular role to start with and focusing on it.
For example, in a Data Analytics role, being able to run sophisticated SQL queries and being familiar with modern data warehouse tools such as Hive and SparkSQL is a must-have skill set. A great book to start with is: https://www.manning.com/books/big-data-warehousing-cx
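To give a rough idea of the kind of query this involves, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a real warehouse such as Hive or SparkSQL, and the `orders` table and its rows are made up purely for illustration. The query uses a common table expression to total the orders per customer, then keeps only the customers whose total is above the average.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a real data warehouse.
conn = sqlite3.connect(":memory:")

# Hypothetical table and data, for illustration only.
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("alice", 30.0),
        ("alice", 70.0),
        ("bob", 20.0),
        ("carol", 50.0),
        ("carol", 10.0),
    ],
)

# CTE: total spend and order count per customer,
# then keep customers whose total is above the average total.
query = """
WITH totals AS (
    SELECT customer, SUM(amount) AS total, COUNT(*) AS n_orders
    FROM orders
    GROUP BY customer
)
SELECT customer, total, n_orders
FROM totals
WHERE total > (SELECT AVG(total) FROM totals)
ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

The same ideas (grouping, aggregation, subqueries) carry over to Hive and SparkSQL, although the exact syntax and available functions differ between engines.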
As a Machine Learning Engineer, I'd recommend starting with machine learning knowledge as well as general software engineering skills.
Most companies that hire specifically for Machine Learning Scientist roles require a PhD or more than 5 years of experience.
Lastly, this is a fast-changing industry, and the requirements can be dramatically different in 3 to 5 years. So I'd suggest interviewing with real companies every year.
Thanks.
Shaowei
Sachin’s Answer
Hi Vladislav,
Thanks for the question. Here is a webpage that lists the steps and details the skills, knowledge and training you need to become a data scientist:
https://www.superdatascience.com/blogs/how-to-become-data-scientist-from-scratch
Hope this helps and good luck!
Robert’s Answer
Languages like Python and libraries such as matplotlib, numpy, pandas, scikit-learn, and so on are the tools you can use, but it's very important to understand the mathematical concepts underlying the tools. Without that foundation, it's difficult to know if the tool or method you've chosen actually produces accurate results for your problem.
So, take the math courses first, or at least take them at the same time as you work on your programming.
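As one small illustration of why the math matters, here is a sketch (my own example, with made-up data) that computes the Pearson correlation by hand in plain Python. Pearson correlation only measures linear association, so a perfectly predictable but non-linear relationship can still score zero. Someone who runs the tool without knowing that could wrongly conclude the two variables are unrelated.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up data: x values symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # y depends linearly on x
quadratic = [x * x for x in xs]    # y depends on x, but not linearly

r_linear = pearson(xs, linear)
r_quadratic = pearson(xs, quadratic)

# The linear case gives a correlation of 1; the quadratic case gives 0,
# even though y is completely determined by x in both cases.
print(round(r_linear, 3), round(r_quadratic, 3))
```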
Aroquiamarie Kavitha’s Answer
1. Basic math for data science (linear algebra, elementary calculus and statistics)
2. Writing code with basic programming constructs (in either Python or R)
3. Data wrangling skills (using database technologies like SQL to handle larger data that doesn't fit in a spreadsheet)
4. A hands-on mindset for playing around with different data tools and software on Linux-based systems
5. Understanding how the data is created and the business function it serves
6. Storytelling with data (talking to different stakeholders of the business about the findings and observations from the data)
Good to have:
1. Basic knowledge of handling data in cloud systems like AWS, Google Cloud and Azure.
Basic mini courses
Kaggle courses (they have a curriculum from beginner to intermediate level)
https://www.kaggle.com/learn
If you have programming experience and are looking for experiential learning with more hands-on programming activities, try fastai
https://course18.fast.ai/ml.html
If you have a good foundation in high school math and prefer a traditional learning approach, Stanford CS229 Machine Learning is a good place to start
https://www.youtube.com/playlist?list=PLoROMvodv4rMiGQp3WXShtMGgzqpfVfbU
Once done, you can start working on portfolio projects that interest you and showcase them in your resume, as suggested in the recommended courses. Try to solve real-world problems often by taking part in Kaggle competitions.
karthik’s Answer
Hadoop Platform.
SQL Database/Coding.
Apache Spark.
Machine Learning and AI.
Data Visualization.
Unstructured data.