5 answers
Asked
943 views
Dislikes about Data Science?
What do you dislike about Data Science? Is there a task that annoys you, or you just don't like doing?
Updated
Manesh’s Answer
Hello, I notice that you're brimming with questions about the field of Data Science. It's wonderful to witness your keen interest and determination to delve deeper into this area, aiming to comprehend how you can thrive in it. In an attempt to address your multitude of queries, I'll consolidate my responses into a single, comprehensive answer. Please bear with me as this might turn out to be quite lengthy.
While I must admit that I'm not a Data Scientist by profession, I hold a degree in Statistics and have a solid background in Monte Carlo Simulation and Bayesian Analysis. I also work in close collaboration with our Data Science Team, so I'm more than willing to offer my perspective on this subject.
In response to your initial question about the characteristics needed to become a Data Scientist, a robust understanding of Mathematics is crucial. A deep comprehension of Statistics is essential as it forms the backbone of data interpretation and results analysis. If you have a passion for Statistical Math, you're off to a great start. Another vital trait is curiosity. You should be the kind of person who loves to ask questions and seeks evidence. Moreover, you should be willing to challenge your own hypothesis. It's often easy to justify a hypothesis or viewpoint using data, but striving to disprove it is a unique skill.
Other skills that will help you significantly include the ability to query data using SQL. Despite the many NoSQL databases available, a fundamental understanding of joins, filters, relationships, and how to navigate data in a database is indispensable. You will also need some programming skills. You don't necessarily need to master a specific language or platform like Java, Python, or Node.js (although that would be beneficial), but a mindset that grasps programming logic, iteration, parsing, and programmatic operations is critical.
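As a rough illustration of the joins and filters mentioned above, here is a minimal sketch using Python's built-in sqlite3 module. The tables (customers, orders), their columns, and the rows are all invented for the example; a real database will look different.

```python
import sqlite3

# Hypothetical in-memory database; tables and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada', 'EU'), (2, 'Grace', 'US'), (3, 'Alan', 'EU');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 30.0), (3, 2, 75.0);
""")

# Join the two tables on their relationship (orders.customer_id -> customers.id),
# filter to one region, and aggregate per customer.
# LEFT JOIN keeps customers who have no orders at all.
rows = conn.execute(
    """
    SELECT c.name, COUNT(o.id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    WHERE c.region = ?
    GROUP BY c.id, c.name
    ORDER BY c.name
    """,
    ("EU",),
).fetchall()
conn.close()

for name, n_orders, total in rows:
    print(name, n_orders, total)
```

The same ideas (which tables relate to which, what to keep, what to filter out) carry over to most data stores, even when the query language is not SQL.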
One common frustration among Data Scientists is the lack of control over certain aspects. These include:
a) The data source - initially, you have little control over what data is collected, the collection method, and frequency.
b) The data's accuracy and completeness - issues like incomplete or inaccurate data collection can arise.
c) The systems used for data mining - the suitability of the data storage for your analysis type and the budget for acquiring better tools.
d) Time estimation - it can be challenging to predict how long it will take to obtain specific answers, which can be stressful when under pressure as businesses increasingly rely on data science results for crucial decisions.
However, these challenges are balanced by the rewarding outcomes of your work. The impact you can make on a business or on research output can be exhilarating, and the contributions you make to a company can be deeply satisfying.
Updated
Danielle’s Answer
I second the data cleaning answer as being something I dislike about it. It would be so nice to have access to clean, reliable data so we can start the analyses, but that is never the case. Examining data can be tedious and difficult to do thoroughly with very large datasets. Another thing is simply finding the data to answer questions. We have so much data in many different databases and tables, so sometimes it's not clear where to get data. The good thing is we have teams and lots of people we work with who we can ask where to find whatever data we're looking for.
Updated
Vivienne’s Answer
The aspect of data science that I find less appealing is the process of data cleaning. In the realm of data science, the principle of 'garbage in, garbage out' is widely accepted. It's crucial to ensure that the data you're using for your analysis or model is of high quality and that any potential issues are fixed before you use it. In practice, this means that nearly 70% of your time is spent scrutinizing your data for anomalies, missing or invalid values, problematic dates, and so on, and liaising with the data owners as necessary. For instance, what is the best approach to manage missing values? Should they be omitted, or should they be replaced? Are they missing at random, or is there a pattern? Are there specific values assigned to represent missing values? In conclusion, while data cleaning might not be the most enjoyable task, it's undeniably crucial. On a positive note, data cleaning provides a great opportunity to gain in-depth knowledge about the data.
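To make the missing-value questions above a bit more concrete, here is a minimal sketch in plain Python. The records, the column names, and the sentinel value -999 are all assumptions made up for the example, and median imputation is just one possible replacement strategy, not a recommendation.

```python
from statistics import median

# Hypothetical records; both None and the sentinel -999 stand for "missing".
SENTINEL = -999
records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 29, "income": SENTINEL},
    {"age": 41, "income": 48000},
]
columns = ["age", "income"]


def is_missing(value):
    """Treat None and the agreed sentinel as missing."""
    return value is None or value == SENTINEL


# Step 1: see how much is missing in each column before deciding anything.
missing_counts = {c: sum(is_missing(r[c]) for r in records) for c in columns}

# Option A: omit every row that has at least one missing value.
complete_rows = [r for r in records if not any(is_missing(r[c]) for c in columns)]


# Option B: replace missing values with the median of the observed values.
def impute_median(rows, column):
    observed = [r[column] for r in rows if not is_missing(r[column])]
    fill = median(observed)
    return [{**r, column: fill if is_missing(r[column]) else r[column]} for r in rows]


imputed = impute_median(impute_median(records, "age"), "income")

print(missing_counts)
print(len(complete_rows), "complete rows out of", len(records))
print(imputed)
```

Whether to omit or replace depends on why the values are missing, which is exactly the kind of question worth raising with the data owners.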
Updated
Reid’s Answer
Data science and analytics have been very rewarding for me. I completed an undergraduate degree in Materials Engineering, which gave me a good foundation in math and problem solving that translated very effectively to data science when I pivoted in my career.
One area that consistently poses challenges and causes delays is data availability and connections/pipelines. Creating data pipelines borders on data engineering; however, in my role I have to work on this step before I can build anything with the data. The most challenging cases arise from projects that involve data sources from multiple systems, especially when those systems are external to the company. Pipelines need to be created to bring the data into a data warehouse. This is typically a tedious process involving access requests and approvals, along with finding the best way to create the pipeline (direct connections, APIs, manual exports, etc.).
This is a very important aspect of a project because the sustainability of the solution depends on it, and automating the pipeline removes the need for human intervention.
Updated
Patrick’s Answer
Arvaiya, thank you for reaching out and asking about the potential dislikes in the field of data science. I hope the information below gives you some insight and help.
Data science is a rewarding and dynamic field, but it does have its challenges and less enjoyable aspects.
A common hurdle in data science is the large amount of time spent on data cleaning and preprocessing. Raw data is seldom ready for analysis, and a good chunk of a data scientist's time is spent on making this data usable. This can be a tedious and long process, requiring careful attention to detail and patience.
Another potential issue is the uncertainty that comes with real-world data. Unlike neatly arranged academic datasets, real-world data can be messy, incomplete, or inconsistent. Handling this uncertainty can be frustrating and it calls for a mix of creativity and problem-solving skills to overcome the challenges it poses.
Also, deploying a model and transitioning from a successful prototype to a system ready for production can be a source of stress. This stage involves working with IT teams and often demands skills beyond traditional data science, like software engineering knowledge and understanding of deployment environments. This can be a tough learning curve for those mainly focused on analytics.
From my personal experience, one challenging aspect is the need to stay up-to-date with the rapid progress in tools, techniques, and algorithms in the data science field. While it's thrilling to be in a constantly evolving field, the fast pace can be demanding and staying current with the latest developments can sometimes feel overwhelming.
However, it's important to remember that these potential downsides are part of the nature of the field, and many data scientists find the work extremely satisfying. The challenges are often balanced by the pleasure of solving complex problems, deriving meaningful insights from data, and contributing to data-driven decision-making.
It's vital for those entering the field to be aware of these challenges, build resilience, and keep a passion for continuous learning. Like any profession, finding the aspects that align with your interests and strengths can help lessen potential dislikes and create a more enjoyable and rewarding data science career.