Machine Learning: A Concise Tutorial
Machine Learning - Introduction
Introduction to Machine Learning
We are living in the 'age of data', enriched by greater computational power and more storage resources. The volume of data grows every day, but the real challenge is making sense of it all. Businesses and organizations are trying to meet this challenge by building intelligent systems using concepts and methodologies from Data Science, Data Mining, and Machine Learning. Among these, machine learning is one of the most exciting fields of computer science. It would not be wrong to call machine learning the application and science of algorithms that make sense of data.
What is Machine Learning?
Machine Learning (ML) is the field of computer science that enables computer systems to make sense of data in much the same way as human beings do.
In simple words, ML is a type of artificial intelligence that extracts patterns from raw data by using an algorithm or method.
How does Machine Learning work?
The way a machine learning model learns from data can be divided into three main components −
- Decision Process − Based on the input data and, where available, the output labels provided to the model, the algorithm produces a prediction from the pattern it has identified.
- Cost Function − A measure of the error between the expected value and the predicted value. It is used to evaluate the performance of the model.
- Optimization Process − The cost function is minimized by adjusting the model's weights during training. The algorithm repeats the cycle of evaluation and optimization until the error stops decreasing.
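The three components can be seen together in a very small example. The sketch below is only an illustration, not a reference implementation: it assumes a linear model `y = w * x + b`, mean squared error as the cost function, and plain gradient descent as the optimizer. The toy data, learning rate, and step count are made-up values.

```python
# Toy data generated from y = 2x + 1 (hypothetical values for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def predict(w, b, x):
    """Decision process: map an input to a predicted output."""
    return w * x + b


def cost(w, b):
    """Cost function: mean squared error between expected and predicted values."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(learning_rate=0.05, steps=2000):
    """Optimization process: repeatedly adjust the weights to reduce the cost."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train()
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

With this data the learned weights approach the values used to generate it, and the cost approaches zero.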
Need for Machine Learning
Human beings are, at this moment, the most intelligent and advanced species on earth because they can think, evaluate, and solve complex problems. AI, on the other hand, is still at an early stage and has not surpassed human intelligence in many respects.
So why do we need to make machines learn? The most compelling reason is "to make decisions, based on data, with efficiency and scale".
Lately, organizations have been investing heavily in newer technologies such as Artificial Intelligence, Machine Learning, and Deep Learning to extract key information from data, perform real-world tasks, and solve problems. We can call these data-driven decisions taken by machines, made chiefly to automate a process.
These data-driven decisions can replace hand-written program logic in problems that are inherently hard to program explicitly. We cannot do without human intelligence, but we also need to solve real-world problems efficiently and at a huge scale. That is why the need for machine learning arises.
History of Machine Learning
The history of machine learning dates back to 1959, when Arthur Samuel developed a program that calculated the winning probability for each side in checkers.
The evolution of machine learning over the decades grew out of the question "Can machines think?", posed by Alan Turing in 1950. Neural networks rose to prominence between 1960 and 1970, and machine learning continued to advance through statistical methods such as Bayesian networks and decision tree learning.
The Deep Learning revolution began in the 2010s with advances in natural language processing, convolutional neural networks, and speech recognition. Today, machine learning is a transformative technology that has become part of many fields, ranging from healthcare to finance and transportation.
Machine Learning Methods
Machine learning models can be categorized mainly into the following four types −
- Supervised Machine Learning
- Unsupervised Machine Learning
- Semi-supervised Machine Learning
- Reinforcement Machine Learning
Let's explore each of these types of machine learning in detail.
Supervised Machine Learning
In supervised machine learning, the algorithm is trained on labeled data, meaning that the correct answer or output is provided for each input. The algorithm then uses what it has learned from this labeled data to make predictions about new, unseen data.
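As a rough illustration of learning from labeled data, the sketch below uses a 1-nearest-neighbor classifier, which is just one of many supervised algorithms. The features (`hours_studied`, `hours_slept`), the values, and the labels are all invented for this example.

```python
# Labeled training data: (hours_studied, hours_slept) -> "pass" or "fail".
# Features, values, and labels are hypothetical.
training_data = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 8.0), "pass"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def predict(features):
    """Return the label of the closest labeled training example."""
    _, label = min(
        training_data,
        key=lambda item: squared_distance(item[0], features),
    )
    return label


print(predict((7.0, 8.0)))  # pass
print(predict((1.5, 4.0)))  # fail
```

The two calls at the end stand in for new, unseen inputs: neither point appears in the training data, yet each receives the label of its nearest labeled neighbor.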
Unsupervised Machine Learning
In unsupervised machine learning, the algorithm is trained on unlabeled data, meaning that no correct output or answer is provided for any input. Instead, the algorithm must identify patterns and structures in the data on its own.
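Clustering is a common example of finding structure without labels. The following is a minimal sketch of k-means on one-dimensional data, assuming two clusters and hand-picked starting centroids; the data points are invented for illustration.

```python
def k_means_1d(points, centroids, iterations=10):
    """Group 1-D points around centroids without using any labels."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]  # hypothetical unlabeled data
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids)  # [1.5, 9.5]
```

No label is ever supplied; the two groups emerge only from the distances between the points.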
Semi-supervised Machine Learning
Semi-supervised machine learning combines supervised and unsupervised learning: it trains an algorithm on a small portion of labeled data together with a much larger portion of unlabeled data, typically for classification and regression tasks.
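One simple way to combine the two kinds of data is self-training (pseudo-labeling), sketched below. This is only one possible semi-supervised approach, and the nearest-class-mean classifier and the data are assumptions made for the example. Practical versions usually keep only the pseudo-labels the model is confident about.

```python
labeled = [(1.0, "low"), (10.0, "high")]     # small labeled portion (hypothetical)
unlabeled = [1.5, 2.0, 2.5, 8.5, 9.0, 9.5]   # larger unlabeled portion (hypothetical)


def class_means(examples):
    """Fit a tiny classifier: remember the mean value of each class."""
    groups = {}
    for value, label in examples:
        groups.setdefault(label, []).append(value)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}


def classify(means, value):
    """Predict the class whose mean is closest to the value."""
    return min(means, key=lambda label: abs(value - means[label]))


# Step 1: train on the labeled examples only.
means = class_means(labeled)
# Step 2: use that model to assign pseudo-labels to the unlabeled data.
pseudo_labeled = [(value, classify(means, value)) for value in unlabeled]
# Step 3: retrain on the labeled and pseudo-labeled data together.
means = class_means(labeled + pseudo_labeled)

print(means)  # {'low': 1.75, 'high': 9.25}
```

After retraining, the class means reflect all eight points even though only two of them came with labels.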
Reinforcement Machine Learning
In reinforcement machine learning, the algorithm learns by receiving feedback, in the form of rewards or punishments, based on its actions. The algorithm then uses this feedback to adjust its behavior and improve its performance.
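The feedback loop can be sketched with an epsilon-greedy agent choosing between two actions. Everything here is an assumption for illustration: the environment (`REWARDS`) simply rewards action 1 and punishes action 0, and the exploration rate and step count are arbitrary.

```python
import random

# Hypothetical environment: action 1 earns a reward, action 0 a punishment.
REWARDS = {0: -1.0, 1: 1.0}


def run(steps=200, epsilon=0.1, seed=0):
    """Learn action values from reward feedback using epsilon-greedy choices."""
    rng = random.Random(seed)
    values = [0.0, 0.0]  # estimated value of each action
    counts = [0, 0]      # how often each action was taken
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)  # explore: try a random action
        else:
            action = max(range(2), key=lambda a: values[a])  # exploit the best estimate
        reward = REWARDS[action]
        counts[action] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


values, counts = run()
best_action = max(range(2), key=lambda a: values[a])
print(best_action)  # 1
```

The agent is never told which action is correct. It only observes rewards and punishments and shifts its behavior toward the action with the higher estimated value.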
Machine Learning Use Cases
Machine learning has become a significant part of our lives. It is used across many industries, especially those that deal with large volumes of data. Some of the use cases of machine learning are:
Recommendation System
Recommendation systems are software engines designed to suggest items to users based on their likes and dislikes, previous engagement with the application, and similar signals. This enhances the user experience, which in turn can increase a business's sales.
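To make the idea concrete, here is a minimal sketch of user-based collaborative filtering, one of several ways a recommender can be built. The users, items, and ratings are invented, and cosine similarity over commonly rated items is an assumed choice of similarity measure.

```python
import math

# Hypothetical ratings on a 1-5 scale; each user has rated only some items.
ratings = {
    "alice": {"book": 5, "laptop": 1},
    "bob": {"book": 4, "laptop": 1, "camera": 5},
    "carol": {"book": 1, "laptop": 5, "headphones": 4},
}


def similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user):
    """Suggest items the most similar user rated that this user has not rated."""
    others = [u for u in ratings if u != user]
    nearest = max(others, key=lambda u: similarity(ratings[user], ratings[u]))
    unseen = [item for item in ratings[nearest] if item not in ratings[user]]
    return sorted(unseen, key=lambda item: ratings[nearest][item], reverse=True)


print(recommend("alice"))  # ['camera']
```

Alice's ratings resemble Bob's far more than Carol's, so the item Bob liked that Alice has not rated is suggested to her.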
Voice Assistants
A voice assistant is a digital assistant that uses speech recognition, language processing algorithms, and voice synthesis to listen for specific voice commands and respond with the information the user asked for.
Fraud Detection
Fraud detection is the process of identifying unusual activity within a system or organization. It is used mostly in the financial sector to identify fraudulent transactions. An algorithm is trained to monitor transactions, behaviors, and patterns and to flag suspicious activity so that it can be reported and investigated further.
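A very simple form of this idea is statistical outlier detection. The sketch below assumes that a customer's past transaction amounts describe normal behavior and flags new amounts that lie more than three standard deviations from the historical mean. The amounts and the threshold are hypothetical, and real fraud systems use far richer features and models.

```python
import statistics


def find_suspicious(history, new_amounts, threshold=3.0):
    """Flag amounts that deviate strongly from the historical pattern."""
    mean = statistics.mean(history)
    std = statistics.pstdev(history)
    if std == 0:
        return []  # no variation in history, so no z-score can be computed
    return [a for a in new_amounts if abs(a - mean) / std > threshold]


# Hypothetical past transaction amounts for one customer.
history = [20.0, 25.0, 22.0, 30.0, 27.0, 24.0, 26.0, 23.0]

print(find_suspicious(history, [21.0, 28.0, 950.0]))  # [950.0]
```

Flagged transactions would then be passed on for review rather than blocked automatically.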
Health Care
Machine learning is widely used in the health sector to diagnose diseases, improve the accuracy of medical imaging, and personalize patient treatment.
Robotic Process Automation (RPA)
Also known as software robotics, RPA uses intelligent automation technologies to perform repetitive manual tasks.
Driverless Cars
The idea of a car that drives itself took technology to another level. Although the algorithms and technology stack behind such vehicles are advanced, machine learning is at their core. A commonly cited example is Tesla's cars.
Advantages of Machine Learning
- Automation − With machine learning, many tasks, especially repetitive ones, can be carried out seamlessly, saving people time and energy. For example, the deployment of chatbots has improved customer experience and reduced waiting times, while human agents can focus on creative work and complex problems.
- Enhancing user experience and decision making − Machine learning models can analyze large datasets and extract insights that support decision making. Machine learning also allows products and services to be personalized: an algorithm analyzes customer preferences and past behavior to recommend products, which benefits both retail sales and the user experience.
- Wide Applicability − This technology has a wide range of applications. From health care and finance to business and marketing, machine learning is applied in almost every sector to improve productivity.
- Continuous Improvement − Machine learning algorithms are designed to keep learning so that accuracy and efficiency improve. Each time the model is retrained on new data, its decisions can improve.
Disadvantages of Machine Learning
- Data acquisition − The most crucial and most difficult task in machine learning is collecting data. Every machine learning algorithm requires data that is relevant, unbiased, and of good quality. Better data results in better performance of the machine learning model.
- Inaccurate Results − Another major challenge in machine learning is the credibility of the results an algorithm generates and of how they are interpreted.
- Chances of Error − Machine learning depends on two things: data and algorithms. Any error or bias in either can lead to inaccurate outcomes. For example, if the training dataset is small, the algorithm cannot fully learn the underlying patterns, which results in biased and irrelevant predictions.
- Maintenance − Machine learning models have to be continuously maintained and monitored to ensure that they remain effective and accurate over time.
Challenges in Machine Learning
Despite the progress of machine learning, a few challenges and limitations still have to be addressed.
- Data Privacy − Machine learning models depend heavily on data, which sometimes includes personal details. With privacy and security concerns in mind, the data collected should be limited to what the model actually requires. The use of sensitive data must also be balanced against the protection of an individual's privacy. Key tasks include effective anonymization, data protection, and data security.
- Impact on Jobs − Machine learning takes over roles and tasks that can be automated, such as jobs in data entry and customer service. At the same time, it creates job opportunities related to data preparation and algorithm development, such as data scientist and machine learning engineer. Machine learning shifts human resources toward data-driven decision making and creative work.
- Bias and Discrimination − Sensitive attributes such as race and gender must be protected from inappropriate use so that models do not discriminate.
- Ethical Consideration − Ethics helps to assess how machine learning algorithms affect individuals, society, and various sectors. Its goal is to establish guidelines that maintain transparency, accountability, and social responsibility.
Machine Learning Algorithms Vs. Traditional Programming
The difference between machine learning algorithms and traditional programming lies in how each is made to handle a task. Some comparisons based on different criteria are tabulated below:
| Criteria | Machine learning algorithms | Traditional programming |
| --- | --- | --- |
| Problem solving approach | The computer learns by training a model on large datasets. | Explicit rules are given to the computer in the form of manually written code. |
| Data | They rely heavily on data, which determines the performance of the model. | They rely less on data, as the output depends on the encoded logic. |
| Complexity of problem | Best suited for complex problems such as image segmentation or natural language processing, which require identifying patterns and relationships in the data. | Best suited for problems with a well-defined outcome and logic. |
| Flexibility | Highly flexible and adaptable to different scenarios, because the model can be retrained with new data. | Limited flexibility, as changes have to be made manually. |
| Outcome | The outcome is harder to predict, as it depends on the training data, the model, and many other factors. | The outcome can be predicted accurately if the problem and logic are known. |
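The contrast in problem-solving approach can be sketched in a few lines. In this hypothetical example a message is treated as spam based on the number of links it contains: the traditional version hard-codes the cut-off, while the learned version picks the cut-off that best fits labeled examples. The feature, the data, and both function names are assumptions made for illustration.

```python
def is_spam_rule(link_count):
    """Traditional programming: the programmer fixes the rule by hand."""
    return link_count > 3


def learn_threshold(examples):
    """Machine learning: choose the cut-off that classifies the examples best."""
    candidates = sorted({count for count, _ in examples})

    def correct(threshold):
        # Number of examples the rule `count > threshold` labels correctly.
        return sum((count > threshold) == label for count, label in examples)

    return max(candidates, key=correct)


# Hypothetical labeled examples: (number of links, is_spam).
examples = [(0, False), (1, False), (2, False), (5, True), (7, True), (8, True)]
threshold = learn_threshold(examples)


def is_spam_learned(link_count):
    """Apply the rule whose cut-off was derived from the data."""
    return link_count > threshold


print(threshold)  # 2
```

If the data changes, the learned cut-off changes with it after retraining, whereas the hand-written rule stays the same until someone edits the code.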
Machine Learning Vs. Deep Learning
Deep learning is a sub-field of machine learning. The practical difference between the two lies in the way the algorithm learns.
In machine learning, computers use algorithms to learn from large datasets in order to perform tasks such as prediction and recommendation. Deep learning, in contrast, uses a complex, layered structure of algorithms loosely modeled on the human brain.
Deep learning models are generally more effective than traditional machine learning models on complex problems. For example, autonomous vehicles are usually developed using deep learning, which can identify a U-turn signboard directly through image segmentation. With a traditional machine learning model, the features of the signboard would first have to be selected and then passed to a classifier algorithm for identification.
Machine Learning Vs. Generative AI
Generative AI is built on machine learning, but the two terms usually refer to different applications. Machine learning in the conventional sense is used for predictive analysis and decision-making, whereas generative AI focuses on creating content, such as realistic images and videos, that follows existing patterns.
Future of Machine Learning
Machine learning is widely expected to remain a game changer in technology. Automated machine learning and synthetic data generation are recent developments that make machine learning more accessible and efficient.
One major technology being combined with machine learning is quantum computing. It uses quantum-mechanical phenomena to create a system that can exist in multiple states at the same time, and quantum algorithms may allow certain data-processing tasks to run at much higher speed. AutoML is another technology that combines automation and machine learning. It can cover every stage from raw data to a model that is ready for deployment.
Multi-modal AI refers to AI systems that interpret and analyze multi-sensory inputs, including text, speech, images, and sensor data. Generative AI is another emerging application of machine learning that focuses on creating new content that mimics existing patterns. Other emerging technologies that influence machine learning include edge computing and robotics.
How to Learn Machine Learning?
Getting started with machine learning can seem intimidating, but with the right resources and guidance, it can be a rewarding experience. The process can be broken down into the following five steps −
Step 1 − Learn the Fundamentals of Machine Learning
Before diving into machine learning, it's important to have a solid understanding of the fundamentals. This includes learning about data types, statistics, algorithms, and programming languages like Python. There are many online courses, books, and tutorials available that can help you get started.
Step 2 − Choose a Machine Learning Framework
Once you have a basic understanding of machine learning, it's time to choose a framework. There are many popular machine learning frameworks available, including TensorFlow, PyTorch, and Scikit-Learn. Each framework has its own strengths and weaknesses, so it's important to choose one that aligns with your goals and expertise.
Step 3 − Practice with Real Data
One of the best ways to learn machine learning is by practicing with real data. You can find publicly available datasets on websites like Kaggle or the UCI Machine Learning Repository. Practicing with real data will help you understand how to clean, preprocess, and analyze data, as well as how to choose appropriate algorithms for different types of problems.
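As a small taste of what cleaning and preprocessing can involve, the sketch below drops rows with missing values and rescales a numeric column to the range 0 to 1. The inline CSV text stands in for a downloaded dataset. Its columns and values are invented, and dropping incomplete rows is only one of several ways to handle missing data.

```python
import csv
import io

# Hypothetical raw data with two incomplete rows.
RAW_CSV = """age,income
25,40000
,52000
40,
35,60000
45,80000
"""


def load_clean_rows(text):
    """Cleaning: keep only rows in which every field has a value."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        if all(value and value.strip() for value in row.values()):
            rows.append({key: float(value) for key, value in row.items()})
    return rows


def min_max_scale(rows, column):
    """Preprocessing: rescale one column so its values lie between 0 and 1."""
    values = [row[column] for row in rows]
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


rows = load_clean_rows(RAW_CSV)
print(len(rows))                   # 3
print(min_max_scale(rows, "age"))  # [0.0, 0.5, 1.0]
```

With a real dataset the same two steps apply; only the loading code and the choice of columns change.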
Step 4 − Build Your Own Projects
As you gain more experience with machine learning, it's important to start building your own projects. This will help you apply what you've learned and develop your skills further. You can start with simple projects, like building a recommendation system or a sentiment analysis tool, and then move on to more complex projects as you become more comfortable with the process.
Step 5 − Participate in Machine Learning Communities
Joining machine learning communities, such as online forums or meetups, can be a great way to connect with other people who are interested in the same field. You can learn from others, share your own experiences, and get feedback on your projects. This can help you stay motivated and engaged as you continue to learn and grow.