Data Science: A Concise Tutorial

Data Science - Scientists

A data scientist is a trained professional who analyzes and interprets data. They apply their knowledge of data science to help businesses make better decisions and operate more effectively. Most data scientists have a strong background in math, statistics, and computer science, which they use to examine large data sets and find trends or patterns. Data scientists may also devise new ways to collect and store data.

How to become a Data Scientist?

There is strong demand for people who know how to use data analysis to give their companies a competitive edge. As a data scientist, you will build data-driven business solutions and analytics.

There are many paths to becoming a Data Scientist, but because it is usually a senior role, most Data Scientists hold degrees in math, statistics, computer science, or a related field.

Below are some steps to become a data scientist −

Step 1 − Right Data Skills

You can become a Data Scientist even if you have no data-related work experience, but you will need to build the necessary foundation first.

Data Scientist is a senior role; before reaching that level, you should build a thorough grounding in a related subject. This might be mathematics, engineering, statistics, data analysis, programming, or information technology; some Data Scientists began their careers in banking or baseball scouting.

Whatever field you start from, begin with Python, SQL, and Excel. These skills are essential for processing and organizing raw data. It also helps to be familiar with Tableau, a tool you will often use to build visualizations.
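
As a minimal sketch of how Python and SQL can work together on raw data, the example below cleans a few made-up records in Python and then aggregates them with SQL. The records, the `sales` table, and the `clean` helper are all hypothetical and exist only for illustration; SQLite (bundled with Python) stands in for whatever database you would actually use.

```python
import sqlite3

# Hypothetical raw records: (customer, region, amount). Some are messy.
raw_rows = [
    ("alice", "north", "120.50"),
    ("Bob ", "North", "80"),
    ("carol", "south", ""),        # missing amount
    ("dave", "SOUTH", "45.25"),
]


def clean(row):
    """Normalize the text fields and parse the amount; return None if unusable."""
    name, region, amount = row
    if not amount.strip():
        return None
    return (name.strip().title(), region.strip().lower(), float(amount))


# Processing step in Python: drop rows that cannot be repaired.
cleaned = [r for r in map(clean, raw_rows) if r is not None]

# Organizing step in SQL: load the cleaned rows and total them per region.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", cleaned)
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(totals)
```

The split shown here (row-level cleanup in Python, set-level aggregation in SQL) is one common division of labor, not the only one.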

Step 2 − Learn Data Science Fundamentals

A data science boot camp can be a good way to learn or strengthen the principles of data science. You can refer to the Data Science BootCamp, which covers each topic in detail.

Learn data science fundamentals such as how to gather and store data, analyze and model data, and display and present data using the tools in the data science toolkit, such as Tableau and PowerBI.

By the end of your training, you should be able to use Python and R to build models that assess behavior and forecast unknowns, and to repackage data in user-friendly formats.

Many Data Science job listings state an advanced degree as a prerequisite. Sometimes this is non-negotiable, but as demand outstrips supply, it is increasingly clear that proof of the necessary skills often outweighs credentials alone.

Hiring managers care most about how well you can demonstrate your knowledge of the subject, and it is increasingly accepted that this does not have to be done in traditional ways.

Data Science Fundamentals

  1. Collect and store data.

  2. Analyze and model the data.

  3. Build a model that can make predictions using the given data.

  4. Visualize and present data in user-friendly forms.
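
The four fundamentals above can be sketched end to end in a few lines of Python. This is an illustrative toy under stated assumptions, not a real workflow: the CSV data (advertising spend versus units sold) is invented, and a hand-written least-squares line stands in for whatever model a real project would need.

```python
import csv
import io
from statistics import mean

# 1. Collect and store: a made-up CSV, parsed into a list of records.
RAW_CSV = """spend,units
1,3
2,5
3,7
4,9
"""
rows = [
    {"spend": float(r["spend"]), "units": float(r["units"])}
    for r in csv.DictReader(io.StringIO(RAW_CSV))
]

# 2. Analyze: simple descriptive statistics.
xs = [r["spend"] for r in rows]
ys = [r["units"] for r in rows]
x_bar, y_bar = mean(xs), mean(ys)

# 3. Model: ordinary least squares for the line units = slope * spend + intercept.
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar


def predict(spend):
    """Predict units sold for a given spend using the fitted line."""
    return slope * spend + intercept


# 4. Present: a small plain-text report.
report = (
    f"mean spend={x_bar:.2f}, mean units={y_bar:.2f}\n"
    f"model: units = {slope:.2f} * spend + {intercept:.2f}\n"
    f"predicted units at spend=5: {predict(5):.2f}"
)
print(report)
```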

Step 3 − Learn Key Programming Languages for Data Science

Data Scientists use a variety of tools and programs built specifically for cleaning, analyzing, and modeling data. They need to know more than just Excel: a programming language suited to statistics, such as Python or R, and a query language such as SQL or HiveQL (the SQL-like language of Apache Hive).

RStudio Server, which provides a development environment for working with R on a server, is one of the most important tools for a Data Scientist. Another popular tool is the open-source Jupyter Notebook, which can be used for statistical modeling, data visualization, machine learning, and more.

Machine learning is widely used in data science. It refers to techniques, part of artificial intelligence, that let systems learn and improve from data without being explicitly programmed to do so.
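
To make "learning without being explicitly programmed" concrete, here is a minimal sketch of a perceptron, one of the simplest learning algorithms. The program is never told the rule for logical OR; it only sees four labeled examples and adjusts its weights whenever it answers wrongly. The learning rate and number of passes are arbitrary choices made for this illustration.

```python
# Training examples for logical OR: (inputs, expected output).
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 1.0  # arbitrary choice for this sketch


def predict(inputs):
    """Return 1 if the weighted sum of the inputs plus the bias is positive."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


# Perceptron learning rule: after each mistake, move the weights and the
# bias in the direction that would have reduced the error.
for _ in range(10):
    for inputs, target in examples:
        error = target - predict(inputs)
        for i, x in enumerate(inputs):
            weights[i] += learning_rate * error * x
        bias += learning_rate * error

predictions = [predict(inputs) for inputs, _ in examples]
print(predictions)
```

After training, the predictions match the expected outputs for all four examples, even though no OR rule appears anywhere in the code.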

Step 4 − Learn how to do visualizations and practice them

Practice building your own visualizations from scratch with tools such as Tableau, PowerBI, Bokeh, Plotly, or Infogram. Look for the clearest way to let the data speak for itself.
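
If none of those tools is installed yet, the core idea of a chart (mapping values to visual lengths) can still be practiced in plain Python. The sketch below is a text-only stand-in, not a substitute for Tableau, Bokeh, or Plotly; the `text_bar_chart` helper and the regional figures are hypothetical.

```python
def text_bar_chart(data, width=20):
    """Render a dict of label -> value as horizontal bars scaled to `width`."""
    peak = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # Scale each bar relative to the largest value.
        bar = "#" * round(value / peak * width)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return "\n".join(lines)


# Made-up sales totals per region.
chart = text_bar_chart({"north": 200, "south": 50, "east": 100})
print(chart)
```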

Excel is commonly used in this step. Although the basic idea behind spreadsheets is simple (performing calculations or drawing graphs from the information in related cells), Excel is still very useful after more than 30 years, and it is hard to do data science without it.

But making attractive charts is just the start. As a Data Scientist, you will also need to use these visualizations to present your findings to a live audience. You may already have these communication skills, but if not, do not worry: anyone can improve with practice. If you need to, start small by presenting to one friend or even your pet before moving on to a group.

Step 5 − Work on some Data Science projects that will help develop your practical data skills

Once you know the basics of the programming languages and digital tools that Data Scientists use, you can start using them to practice and improve your new skills. Take on projects that draw on a wide range of skills: use Excel and SQL to manage and query databases, and use Python and R to analyze data with statistical methods, build models that analyze behavior and yield new insights, and predict unknowns with statistical analysis.

As you practice, try to cover different parts of the process. Start by researching a company or market sector, then define and collect the right data for the task at hand. Then clean and test that data to make it as useful as possible.
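
The cleaning and testing step might look like the following minimal sketch. The survey records, field names, and the plausible age range are all invented for illustration; real projects will have their own rules for what counts as valid data.

```python
# Hypothetical raw survey records; the field names are made up.
raw = [
    {"id": 1, "age": "34", "city": " Boston "},
    {"id": 2, "age": "", "city": "Chicago"},        # missing age
    {"id": 1, "age": "34", "city": " Boston "},     # duplicate id
    {"id": 3, "age": "210", "city": "New York"},    # implausible age
    {"id": 4, "age": "27", "city": "chicago"},
]


def clean_records(records, min_age=0, max_age=120):
    """Drop duplicates and invalid ages, and normalize city names."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        if rec["id"] in seen_ids:
            continue  # duplicate record
        if not rec["age"].strip().isdigit():
            continue  # missing or non-numeric age
        age = int(rec["age"])
        if not min_age <= age <= max_age:
            continue  # outside the assumed plausible range
        seen_ids.add(rec["id"])
        cleaned.append(
            {"id": rec["id"], "age": age, "city": rec["city"].strip().title()}
        )
    return cleaned


cleaned = clean_records(raw)

# Test the cleaned data: simple checks that encode what "clean" means here.
ids = [rec["id"] for rec in cleaned]
assert len(ids) == len(set(ids)), "ids must be unique"
assert all(0 <= rec["age"] <= 120 for rec in cleaned), "ages must be plausible"
print(cleaned)
```

Writing the checks as assertions means a later change to the cleaning rules fails loudly instead of silently letting bad rows through.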

Lastly, you can build and apply your own algorithms to analyze and model the data. You can then put the results into easy-to-read visuals or dashboards that let users interact with your data and ask questions about it. You could even present your findings to other people to improve your communication skills.

You should also get used to working with different kinds of data, such as text, structured data, images, audio, and even video. Every industry has its own types of data that help leaders make better, more informed decisions.

As a working Data Scientist, you will probably be an expert in just one or two of these data types, but as a beginner building your skill set, you should learn the basics of as many as possible.

Taking on more complex projects will show you how data can be used in different ways. Once you know how to use descriptive analytics to look for patterns in data, you will be better prepared to try more advanced methods such as data mining, predictive modelling, and machine learning to predict future events or even make recommendations.
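
The step from describing data to predicting with it can be illustrated with a small sketch. The monthly sales figures are invented, and a naive moving-average forecast is used here only as the simplest possible stand-in for predictive modelling; it is not a recommendation for real forecasting work.

```python
from statistics import mean, median, pstdev

# Made-up monthly sales figures.
monthly_sales = [100, 110, 120, 130, 140, 150]

# Descriptive analytics: summarize what has already happened.
summary = {
    "mean": mean(monthly_sales),
    "median": median(monthly_sales),
    "spread": pstdev(monthly_sales),  # population standard deviation
}


def moving_average_forecast(series, window=3):
    """Predict the next value as the average of the last `window` values."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    return sum(series[-window:]) / window


# Predictive step: estimate next month's sales from the recent past.
forecast = moving_average_forecast(monthly_sales)
print(summary, forecast)
```

Note that on steadily rising data like this, a moving average lags behind the trend, which is one reason more capable models are worth learning.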

Step 6 − Make a Portfolio that shows your Data Science Skills

Once you have done your preliminary research, completed your training, and practiced your new skills on a range of projects, the next step is to showcase those skills in a polished portfolio that can help you land your dream job.

In fact, your portfolio may be the most important asset in your job search. If you want to be a Data Scientist, consider showing your work on GitHub instead of (or in addition to) your own website. GitHub makes it easy to show your work, process, and results while also raising your profile in a public network. Do not stop there, though.

Pair your data with a compelling story and show the problems you are trying to solve, so that an employer can see your ability. On GitHub you can present your code in its wider context rather than in isolation, which makes your contributions easier to understand.

When applying for a specific job, do not list all of your work. Highlight just a few pieces that are most relevant to the role and that best show your range of skills across the whole data science process: starting with a basic data set, defining a problem, cleaning the data, building a model, and finding a solution.

Your portfolio is your chance to show that you can not only crunch numbers but also communicate well.

Step 7 − Demonstrate Your Abilities

A well-executed project that you complete on your own can be a great way to show off your skills and impress hiring managers.

Choose something that really interests you, ask a question about it, and try to answer that question with data.

Document your journey and show off your technical skills and creativity by presenting your findings attractively and explaining how you reached them. Accompany your data with a compelling narrative that shows the problems you have solved and highlights your process and the creative steps you took, so that an employer can see your worth.

Joining an online data science community such as Kaggle is another great way to show that you are involved in the community, demonstrate your skills as an aspiring Data Scientist, and keep growing both your expertise and your reach.

Step 8 − Start Applying to Data Science Jobs

There are many jobs in the field of data science. After learning the basics, people often go on to specialize in roles such as Data Engineer, Data Analyst, or Machine Learning Engineer, among many others.

Find out what a company values and what it is working on, and make sure that fits your skills, goals, and future plans. And do not look only in Silicon Valley. Cities such as Boston, Chicago, and New York are having trouble finding technical talent, so there are plenty of opportunities.