8 answers
Yubing’s Answer
N’s Answer
If you want to become a data scientist, you can start by taking some free online courses to learn more about data science in general. Get familiar with the concepts of machine learning, since tools are emerging that will take over a lot of what we do as data scientists today. In parallel, start coding a little: try to get familiar with SQL (there are many resources online for learning it). In addition to SQL, Python is a good language to add to your skills. Data manipulation tools such as Alteryx are a great addition and have free training available online. Data visualization is also very important, so make sure you get familiar with tools such as Tableau or Power BI. Hope this helps!
Ilaria’s Answer
As a data scientist, I think it's important to have a very good understanding of:
- computer science: you need to be able to code. As a minimum requirement, you would need to know SQL and either Python or R very well. These languages are not very complex to learn, and they are used in several fields.
- statistics: machine learning and statistics go hand in hand. It is very important to understand basic statistics concepts like probability, distributions, hypothesis testing just to mention a few.
- machine learning: ML is key for certain data science positions. You will often find yourself deciding which ML model to run and why. It is very important to understand which models exist, what problems they solve, and when they can be applied.
- visualisation: being able to describe your results with a graph, or to analyse data using visualisation techniques, is also very important. You might have to create a dashboard to summarize your results. Tableau is a very good tool that is widely used and not very complex to learn.
- presenting: as a data scientist you will find yourself involved in discussions to decide what to do and why, and also to present your findings and explain clearly (without going into too many details) what you achieved. Having good presentation and communication skills is important, and not just for data science.
- logical thinking: another important skill is an analytical mindset that can adapt to solve different problems. This is something that you acquire working on different projects and with experience.
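To make the statistics point in the list above a bit more concrete, here is a minimal sketch of one kind of hypothesis test, a permutation test for a difference in means, written with only the Python standard library. The function name `permutation_test`, the sample numbers, and the default settings are all made up for illustration; in practice you would more likely use an established statistics library.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Idea: if the group labels do not matter (the null hypothesis),
    shuffling the labels should often produce a difference at least
    as large as the one actually observed.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Hypothetical measurements for two versions of a web page (seconds).
old_design = [10, 11, 12, 10, 11, 12, 10, 11]
new_design = [20, 21, 22, 20, 21, 22, 20, 21]

p_value = permutation_test(old_design, new_design)
print(f"estimated p-value: {p_value}")
```

A small p-value suggests the observed difference would be unusual if the two groups really came from the same distribution.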
There are plenty of online courses that can help you acquire those skills (see attached links).
I hope to have given you a good overview of some of the most important skills to have to become a data scientist! You might feel it is a lot, but don't feel discouraged! You will always learn new things and skills during your career!
Regards,
Ilaria
Paula’s Answer
Sheila’s Answer
Hi Waweru:
Data scientists are a relatively new type of computer scientist who focus heavily on solving problems by using their skills in statistics, computer programming, and machine learning to analyze very large data sets.
Data scientists could potentially have a number of different educational and professional backgrounds, as long as they have the necessary skills to be successful. However, pursuing a bachelor's degree directly in data science or in a related field such as computer science would likely help you acquire these skills. Some individuals may be interested in also obtaining a master's degree to further specialize and hone their skillsets. Below are the skills required in a data science career:
REQUIRED SKILLS:
Individuals who want to work as data scientists will need to be highly proficient in a number of programming languages, such as R, Python, and MATLAB. You will use these languages extensively as a data scientist to analyze and visualize data sets, and they are also used for machine learning. Mathematical skills are also important in this field, especially in the areas of statistics, linear algebra, and probability. In addition to these skills, data scientists often work with large amounts of data, which requires an ability to stay organized.
TECHNICAL SKILLS (Computer Science):
- Python Coding - Python is the most common coding language I typically see required in data science roles, along with Java, Perl, or C/C++.
- Hadoop Platform - Although this isn’t always a requirement, it is heavily preferred in many cases. Having experience with Hive or Pig is also a strong selling point. Familiarity with cloud tools such as Amazon S3 can also be beneficial.
- SQL Database/Coding - Even though NoSQL and Hadoop have become a large component of data science, it is still expected that a candidate will be able to write and execute complex queries in SQL.
- Apache Spark - Apache Spark is becoming one of the most popular big data technologies worldwide. Like Hadoop, it is a big data computation framework. A key difference is speed: Hadoop MapReduce reads and writes intermediate results to disk, which makes it slower, whereas Spark can cache its computations in memory.
- Machine Learning and AI - If you want to stand out from other data scientists, you need to know machine learning techniques such as supervised learning, decision trees, and logistic regression.
- Data Visualization - As a data scientist, you must be able to visualize data with tools such as ggplot, D3.js, Matplotlib, and Tableau.
- Unstructured Data - Unstructured data is content that has no predefined structure and does not fit neatly into database tables. Examples include blog posts, customer reviews, social media posts, video feeds, and audio.
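As a rough illustration of the SQL skill in the list above, here is a small sketch that joins two tables and aggregates the result. It uses Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables and their contents are invented for this example, and a real employer's database will use its own schema and possibly a different SQL dialect.

```python
import sqlite3

# In-memory database with two tiny, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'East'), (2, 'West'), (3, 'East');
    INSERT INTO orders VALUES
        (1, 1, 50.0), (2, 1, 25.0), (3, 2, 40.0), (4, 3, 10.0);
    """
)

# Join orders to customers, then count orders and total revenue per region.
query = """
    SELECT c.region,
           COUNT(o.id)   AS n_orders,
           SUM(o.amount) AS revenue
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.region
    ORDER BY revenue DESC;
"""
rows = conn.execute(query).fetchall()

for region, n_orders, revenue in rows:
    print(region, n_orders, revenue)

conn.close()
```

The same pattern (join, group, aggregate) covers a large share of everyday analytical queries.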
Visit kdnuggets.com for further info: https://www.kdnuggets.com/2018/05/simplilearn-9-must-have-skills-data-scientist.html
This is new territory for data scientists. There is so much data and information out there in the world that it is practically endless. I wish you the best and much success on your journey.
~ Sheila
Yi’s Answer
Linear algebra (essential to understanding most ML/AI approaches)
Basic differential calculus (with a bit of multi-variable calculus)
Coordinate transformation and non-linear transformations (key ideas in ML/AI)
Linear and higher-order Regression (make predictions based on existing data)
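Two of the topics above, non-linear transformations and regression, fit together in a way a short sketch can show. The data below is synthetic (generated from y = 2x² + 1), and the helper name `fit_line` is just for illustration. A straight line fits the raw x values poorly, but after transforming the coordinate to u = x², ordinary least-squares linear regression recovers the relationship.

```python
from statistics import mean


def fit_line(us, ys):
    """Least-squares fit of y = slope * u + intercept (closed form)."""
    u_bar, y_bar = mean(us), mean(ys)
    covariance = sum((u - u_bar) * (y - y_bar) for u, y in zip(us, ys))
    variance = sum((u - u_bar) ** 2 for u in us)
    slope = covariance / variance
    intercept = y_bar - slope * u_bar
    return slope, intercept


# Synthetic data following y = 2 * x^2 + 1 (no noise, for clarity).
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [2.0 * x**2 + 1.0 for x in xs]

# A line in the raw coordinate x misses the curve: the slope comes out
# as zero because the data is symmetric around x = 0.
slope_raw, intercept_raw = fit_line(xs, ys)

# Non-linear transformation u = x^2, then plain linear regression in u.
us = [x**2 for x in xs]
a, b = fit_line(us, ys)

# Predict for a new point using the transformed model y = a * x^2 + b.
prediction_at_4 = a * 4.0**2 + b

print(f"raw fit:         y = {slope_raw:.2f} * x   + {intercept_raw:.2f}")
print(f"transformed fit: y = {a:.2f} * x^2 + {b:.2f}")
```

The model stays linear in its parameters, which is why the simple closed-form solution still applies after the transformation.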
Himanshu’s Answer
1. Gathering Data: This can be as simple as getting an Excel file or as complicated as writing SQL scripts to query a database. So, depending on how the data is available, you may want to start by learning some SQL (Structured Query Language). There are many SQL dialects, and which one you use depends on the type of database you are querying. So make sure you ask your customer which database they are using.
2. Understanding raw data: If there is not much data, you can use simple tools like Microsoft Excel. Try to understand what the data is about, what each row represents, what the columns are, and how they are related from a business point of view. You can pivot the data and make some charts to understand frequencies (histograms) and trends of quantitative measures broken down by qualitative measures.
Once you are familiar with Microsoft Excel, you can move on to more advanced tools like Power BI and Tableau for understanding data as well, though these tools are mostly used to create beautiful reports.
3. Data Visualization: As I said before, you can use Power BI and Tableau to create beautiful reports containing visualizations like trend charts, bar charts, funnel charts, etc.
4. Data Analysis: Data analysis can be as simple as pivoting a table in Microsoft Excel or as complicated as writing a thousand-line script in Python, so it depends on the type of data and the objective of your project. From an industry perspective, I would not recommend R for data analysis, even though most colleges use R for their research and teaching. Python is now the preferred choice in most companies that I have heard about, so if you are a beginner, you might want to spend your time on Python.
5. Data Reporting: I would argue this point with most people: I believe that to be a good data person you need not just technical skills but also good presentation skills. After all the intensive work of making the reports, if you are not able to get your message across, then it does not matter what type of graph you prepared. You have to learn to speak the language of the company and the terms they use when they talk about the data. One trick I find useful is to put yourself in the place of the person you are presenting to and think about all the questions they might ask. All in all, you have to learn to strike a balance: speak in terms of data, but keep it relevant for the target user.
6. Machine Learning and AI: These are the more complex parts of data science, and they require a good understanding of statistical concepts along with expertise in scripting languages like Python or R. If you are beginning to learn, do not just learn to implement the models; question what the model really is and why it is the best fit for the data. Look at the math behind the model and how it translates to code.
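As one possible illustration of the last point, looking at the math behind a model and how it translates to code, here is a sketch of logistic regression with a single feature, trained by plain gradient descent using only the Python standard library. The data (hours studied vs. pass/fail), the function names, and the learning-rate and step-count settings are all assumptions made for this example. In real work you would normally reach for an established ML library.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, n_steps=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent.

    The math: for the average log loss, the gradient with respect
    to w is mean((p - y) * x) and with respect to b is mean(p - y),
    where p is the current predicted probability.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # p - y
            grad_w += error * x
            grad_b += error
        # Step against the gradient to reduce the loss.
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


# Hypothetical data: hours studied and whether the exam was passed.
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(hours, passed)
p_low = sigmoid(w * 1.0 + b)   # predicted pass probability after 1 hour
p_high = sigmoid(w * 8.0 + b)  # predicted pass probability after 8 hours

print(f"w = {w:.3f}, b = {b:.3f}")
print(f"P(pass | 1 hour)  = {p_low:.3f}")
print(f"P(pass | 8 hours) = {p_high:.3f}")
```

Writing the update rule by hand once makes it easier to reason about what a library call is doing and when the model is a sensible fit for the data.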
I hope this helps
-H