Data Science: A Concise Tutorial
Data Science - Getting Started
Data Science is the process of extracting and analysing useful information from data to solve problems that are difficult to solve analytically. For example, when you visit an e-commerce site and browse a few categories and products before making a purchase, you generate data that analysts can use to understand how you make purchasing decisions.
It draws on several disciplines, including mathematical and statistical modelling, extracting data from its source, and applying data visualization techniques. It also involves working with big data technologies to gather both structured and unstructured data.
It helps you find patterns hidden in raw data. The meaning of the term "Data Science" has evolved as mathematical statistics, data analysis, and "big data" have changed over time.
Data Science is an interdisciplinary field that lets you learn from both structured and unstructured data. With data science, you can turn a business problem into a research project and then turn the findings into a real-world solution.
History of Data Science
John Tukey used the term "data analysis" in 1962 to describe a field that resembles modern data science. In a 1985 lecture to the Chinese Academy of Sciences in Beijing, C. F. Jeff Wu used the phrase "Data Science" for the first time as an alternative name for statistics. Subsequently, participants at a statistics conference held at the University of Montpellier II in 1992 recognised the birth of a new field centred on data from many sources and in many forms, combining established ideas and principles of statistics and data analysis with computing.
Peter Naur suggested the phrase "Data Science" as an alternative name for computer science in 1974. In 1996, the International Federation of Classification Societies held the first conference to feature Data Science as a dedicated topic. Even so, the definition was still in flux. Following his 1985 lecture at the Chinese Academy of Sciences in Beijing, C. F. Jeff Wu again advocated renaming statistics to Data Science in 1997. He reasoned that a new name would help statistics shed inaccurate stereotypes and perceptions, such as being associated with accounting or confined to describing data. In 1998, Hayashi Chikio proposed Data Science as a new, multidisciplinary concept with three components: data design, data collection, and data analysis.
In the 1990s, "knowledge discovery" and "data mining" were popular terms for the process of identifying patterns in ever-larger datasets.
In 2012, Thomas H. Davenport and DJ Patil proclaimed "Data Scientist: The Sexiest Job of the 21st Century," a catchphrase that was taken up by major metropolitan publications such as the New York Times and the Boston Globe. They repeated it a decade later, adding that "the position is in more demand than ever."
William S. Cleveland is frequently associated with the present understanding of Data Science as a separate field. In a 2001 paper, he argued for expanding statistics into technical areas; because this would fundamentally change the field, it warranted a new name. In the following years, the term "Data Science" became increasingly common. In 2002, the Committee on Data for Science and Technology (CODATA) launched the Data Science Journal. Columbia University established The Journal of Data Science in 2003. In 2014, the American Statistical Association's Section on Statistical Learning and Data Mining changed its name to the Section on Statistical Learning and Data Science, reflecting the growing popularity of Data Science.
The professional title "data scientist" has been attributed to DJ Patil and Jeff Hammerbacher in 2008. Although the National Science Board used the term in its 2005 report "Long-Lived Digital Data Collections: Supporting Research and Teaching in the 21st Century," there it referred broadly to any significant role in managing a digital data collection.
There is still no consensus on the meaning of Data Science, and some consider it a buzzword. Big data is a related marketing term. Data scientists are responsible for transforming massive amounts of data into useful information and for developing software and algorithms that help businesses and organisations determine how to operate optimally.
Why Data Science?
According to IDC, worldwide data will reach 175 zettabytes by 2025. Data Science helps businesses make sense of vast amounts of data from different sources, extract useful insights, and make better data-driven decisions. Data Science is used extensively in many sectors, such as marketing, healthcare, finance, banking, and policy work.
Here are some significant advantages of using data analytics technology:
- Data is the oil of the modern age. With the proper tools, technologies, and algorithms, we can leverage data to create a unique competitive edge.
- Data Science can help detect fraud using sophisticated machine learning techniques.
- It helps you avoid severe financial losses.
- It enables the development of intelligent machines.
- You can use sentiment analysis to gauge the brand loyalty of your customers, which helps you make better and quicker decisions.
- It enables you to recommend the right product to the right customer in order to grow your business.
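To make the sentiment analysis point above a little more concrete, here is a minimal sketch of lexicon-based sentiment scoring in Python. The word lists, the sample reviews, and the function names `sentiment_score` and `label` are all hypothetical and chosen only for illustration. Production systems typically rely on much larger lexicons or trained models.

```python
import re

# Hypothetical, tiny word lists for illustration only.
POSITIVE = {"love", "great", "excellent", "happy", "recommend"}
NEGATIVE = {"hate", "bad", "terrible", "disappointed", "refund"}


def sentiment_score(text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def label(score: int) -> str:
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up customer reviews, not real data.
reviews = [
    "I love this brand and would recommend it",
    "Terrible service, I want a refund",
    "The package arrived on Tuesday",
]

labels = [label(sentiment_score(review)) for review in reviews]
for review, sentiment in zip(reviews, labels):
    print(f"{sentiment:>8}: {review}")
```

Aggregating such labels per customer over time is one simple way to approximate a brand-loyalty signal, though word counting of this kind misses negation, sarcasm, and context.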
Need for Data Science
How much data we have and how much we generate
According to Forbes, the total quantity of data generated, copied, recorded, and consumed worldwide surged by about 5,000% between 2010 and 2020, from 1.2 trillion gigabytes to 59 trillion gigabytes.
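The growth figure above can be checked with a short calculation. The sketch below is only a sanity check on the two quoted volumes. The helper name `percent_increase` is an assumption introduced for illustration. It shows that the exact increase is a little under the rounded "about 5,000%".

```python
def percent_increase(start: float, end: float) -> float:
    """Return the percentage increase from start to end."""
    return (end - start) / start * 100.0


# Figures quoted above, in trillions of gigabytes
# (1 trillion gigabytes equals 1 zettabyte).
data_2010 = 1.2
data_2020 = 59.0

growth = percent_increase(data_2010, data_2020)
growth_factor = data_2020 / data_2010

print(f"Increase: {growth:.0f}%")              # Increase: 4817%
print(f"Growth factor: {growth_factor:.1f}x")  # Growth factor: 49.2x
```

In other words, the 2020 volume is roughly 49 times the 2010 volume, which corresponds to an increase of roughly 4,800%.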
How have companies benefited from Data Science?
- Many businesses are undergoing data transformation (converting their IT architecture to one that supports Data Science), and data boot camps are becoming common. The explanation is straightforward: Data Science provides valuable insights.
- Companies are being outcompeted by firms that make decisions based on data. For example, Ford posted a loss of $12.6 billion in 2006. Following that loss, the company hired a senior data scientist to manage its data and undertook a three-year turnaround. This ultimately resulted in the sale of almost 2,300,000 automobiles and a profit for 2009 as a whole.
Demand and Average Salary of a Data Scientist
- According to India Today, India is the second-biggest centre for Data Science in the world because of the fast digitalization of companies and services. Analysts anticipate that the nation will have more than 11 million job openings by 2026. In fact, hiring in the Data Science field has surged by 46% since 2019.
- Bank of America was one of the first financial institutions to offer mobile banking to its customers a decade ago. More recently, Bank of America introduced Erica, its first virtual financial assistant. It has been described as the best financial innovation in the world. Erica now serves as a customer adviser for more than 45 million consumers worldwide. Erica uses voice recognition to receive customer input, which represents a technical advance in Data Science.
- The learning curves for Data Science and Machine Learning are steep. Although India sees a massive influx of data scientists each year, relatively few have the required skill set and specialization. As a consequence, people with specialised data skills are in great demand.
Impact of Data Science
Data Science has had a significant influence on many aspects of modern society, and its importance to organisations keeps increasing. According to one study, the worldwide market for Data Science was projected to reach $115 billion by 2023.
The healthcare industry has benefited from the rise of Data Science. In 2008, Google employees realised that they could monitor influenza strains in real time. Previous technologies could only provide weekly updates on cases. By using Data Science, Google was able to build one of the first systems for monitoring the spread of diseases.
The sports sector has similarly benefited from data science. In 2019, a data scientist found ways to measure and calculate how goal attempts increase a soccer team's odds of winning. In fact, data science is used to compute statistics easily in many sports.
Government agencies also use data science on a daily basis. Governments around the world use databases to track information about social security, taxes, and other data relating to their residents. Governments' use of emerging technologies continues to evolve.
Since the Internet has become the primary medium of human communication, the popularity of e-commerce has also grown. With data science, online businesses can monitor the entire customer experience, including marketing campaigns, purchases, and consumer trends. Advertising is perhaps one of the clearest examples of e-commerce businesses using data science. Have you ever searched for something online or visited an e-commerce product page, only to be bombarded by advertisements for that product on social networking sites and blogs?
Ad pixels are integral to gathering and analysing user information online. Companies use online consumer behaviour to retarget prospective customers across the internet. This use of customer information extends beyond e-commerce. Apps such as Tinder and Facebook use algorithms to help users find exactly what they are looking for. The Internet is a growing treasure trove of data, and the gathering and analysis of this data will continue to expand.