Data Mining: A Concise Tutorial
Data Mining - Applications & Trends
Data mining is widely used in diverse areas. A number of commercial data mining systems are available today, yet the field still faces many challenges. In this tutorial, we will discuss the applications and trends of data mining.
Data Mining Applications
The following are areas where data mining is widely used −
-
Financial Data Analysis
-
Retail Industry
-
Telecommunication Industry
-
Biological Data Analysis
-
Other Scientific Applications
-
Intrusion Detection
Financial Data Analysis
Financial data in the banking and financial industry is generally reliable and of high quality, which facilitates systematic data analysis and data mining. Some typical cases are as follows −
-
Design and construction of data warehouses for multidimensional data analysis and data mining.
-
Loan payment prediction and customer credit policy analysis.
-
Classification and clustering of customers for targeted marketing.
-
Detection of money laundering and other financial crimes.
Retail Industry
Data mining has great application in the retail industry because the industry collects large amounts of data on sales, customer purchasing history, goods transportation, consumption, and services. The quantity of data collected will naturally continue to expand rapidly because of the increasing ease, availability, and popularity of the web.
Data mining in the retail industry helps in identifying customer buying patterns and trends, which leads to improved quality of customer service and better customer retention and satisfaction. Here is a list of examples of data mining in the retail industry −
-
Design and Construction of data warehouses based on the benefits of data mining.
-
Multidimensional analysis of sales, customers, products, time and region.
-
Analysis of effectiveness of sales campaigns.
-
Customer Retention.
-
Product recommendation and cross-referencing of items.
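Product recommendation and cross-referencing of items are commonly approached with association analysis. The following is a minimal sketch of that idea, not a production method: the basket data and the function name `pair_rules` are made up for illustration. It counts how often two items appear in the same transaction and derives the support and confidence of each rule "a -> b".

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each set is one customer's basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def pair_rules(transactions):
    """Return {(a, b): (support, confidence)} for every rule a -> b."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        item_counts.update(basket)
        # Sort so that each unordered pair is always counted under one key.
        pair_counts.update(combinations(sorted(basket), 2))
    rules = {}
    for (a, b), together in pair_counts.items():
        support = together / n  # share of baskets containing both items
        rules[(a, b)] = (support, together / item_counts[a])
        rules[(b, a)] = (support, together / item_counts[b])
    return rules

rules = pair_rules(baskets)
support, confidence = rules[("bread", "milk")]
print(f"bread -> milk: support={support:.2f}, confidence={confidence:.2f}")
```

Rules with high support and confidence are candidates for "customers who bought X also bought Y" suggestions. Real retail data sets would typically need an algorithm that scales beyond pairwise counting.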
Telecommunication Industry
Today the telecommunication industry is one of the fastest-emerging industries, providing various services such as fax, pager, cellular phone, internet messenger, images, e-mail, web data transmission, etc. Due to the development of new computer and communication technologies, the telecommunication industry is rapidly expanding. This is why data mining has become very important in helping to understand the business.
Data mining in the telecommunication industry helps in identifying telecommunication patterns, catching fraudulent activities, making better use of resources, and improving quality of service. Here is a list of examples in which data mining improves telecommunication services −
-
Multidimensional Analysis of Telecommunication data.
-
Fraudulent pattern analysis.
-
Identification of unusual patterns.
-
Multidimensional association and sequential patterns analysis.
-
Mobile Telecommunication services.
-
Use of visualization tools in telecommunication data analysis.
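As one illustration of identifying unusual patterns, the sketch below flags values that lie far from the mean of a series. It is only a simple z-score heuristic chosen for this example, not a method prescribed by the tutorial; the data, the function name `unusual_indices`, and the threshold of 2.0 are all assumptions.

```python
from statistics import mean, pstdev

def unusual_indices(values, threshold=2.0):
    """Return positions whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical number of calls placed by one subscriber on each day of a week.
daily_calls = [12, 15, 11, 14, 13, 95, 12]
print(unusual_indices(daily_calls))  # prints [5]
```

A flagged day is only a candidate for closer inspection. Fraud analysis in practice would combine many attributes, such as call destination, duration, and time of day.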
Biological Data Analysis
In recent times, we have seen tremendous growth in fields of biology such as genomics, proteomics, functional genomics, and biomedical research. Biological data mining is a very important part of bioinformatics. The following are aspects in which data mining contributes to biological data analysis −
-
Semantic integration of heterogeneous, distributed genomic and proteomic databases.
-
Alignment, indexing, similarity search, and comparative analysis of multiple nucleotide sequences.
-
Discovery of structural patterns and analysis of genetic networks and protein pathways.
-
Association and path analysis.
-
Visualization tools in genetic data analysis.
Other Scientific Applications
The applications discussed above tend to handle relatively small and homogeneous data sets for which statistical techniques are appropriate. Huge amounts of data have been collected from scientific domains such as geosciences, astronomy, etc. Large numbers of data sets are also being generated by fast numerical simulations in various fields such as climate and ecosystem modeling, chemical engineering, fluid dynamics, etc. The following are applications of data mining in scientific fields −
-
Data Warehouses and data preprocessing.
-
Graph-based mining.
-
Visualization and domain specific knowledge.
Intrusion Detection
Intrusion refers to any kind of action that threatens the integrity, confidentiality, or availability of network resources. In this world of connectivity, security has become a major issue. The increased usage of the internet and the availability of tools and tricks for intruding on and attacking networks have made intrusion detection a critical component of network administration. Here is a list of areas in which data mining technology may be applied for intrusion detection −
-
Development of data mining algorithms for intrusion detection.
-
Association and correlation analysis, aggregation to help select and build discriminating attributes.
-
Analysis of Stream data.
-
Distributed data mining.
-
Visualization and query tools.
Data Mining System Products
There are many data mining system products and domain-specific data mining applications. New data mining systems and applications are being added to the existing ones. Efforts are also being made to standardize data mining languages.
Choosing a Data Mining System
The selection of a data mining system depends on the following features −
-
Data Types − The data mining system may handle formatted text, record-based data, and relational data. The data could also be in ASCII text, relational database data or data warehouse data. Therefore, we should check what exact format the data mining system can handle.
-
System Issues − We must consider the compatibility of a data mining system with different operating systems. One data mining system may run on only one operating system or on several. There are also data mining systems that provide web-based user interfaces and allow XML data as input.
-
Data Sources − Data sources refer to the data formats on which the data mining system will operate. Some data mining systems may work only on ASCII text files, while others work on multiple relational sources. A data mining system should also support ODBC connections or OLE DB for ODBC connections.
-
Data Mining functions and methodologies − Some data mining systems provide only one data mining function, such as classification, while others provide multiple data mining functions such as concept description, discovery-driven OLAP analysis, association mining, linkage analysis, statistical analysis, classification, prediction, clustering, outlier analysis, similarity search, etc.
-
Coupling data mining with databases or data warehouse systems − Data mining systems need to be coupled with a database or a data warehouse system. The coupled components are integrated into a uniform information processing environment. The types of coupling are: no coupling, loose coupling, semi-tight coupling, and tight coupling.
-
Scalability − There are two scalability issues in data mining. Row (Database Size) Scalability − A data mining system is considered row scalable if, when the number of rows is enlarged 10 times, it takes no more than 10 times as long to execute the same query. Column (Dimension) Scalability − A data mining system is considered column scalable if the mining query execution time increases linearly with the number of columns.
-
Visualization Tools − Visualization in data mining can be categorized as follows: data visualization, mining results visualization, mining process visualization, and visual data mining.
-
Data Mining query language and graphical user interface − An easy-to-use graphical user interface is important to promote user-guided, interactive data mining. Unlike relational database systems, data mining systems do not share a common underlying data mining query language.
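The two scalability criteria in the list above can be checked against measured query times when evaluating a candidate system. The sketch below is one possible reading of those definitions under stated assumptions: the function names, the sample timings, and the 20% tolerance used to decide whether growth is "linear" are all hypothetical.

```python
def is_row_scalable(base_seconds, scaled_seconds, row_factor=10):
    """Row scalable: with row_factor times more rows, the same query
    takes no more than row_factor times as long."""
    return scaled_seconds <= row_factor * base_seconds

def is_column_scalable(measurements, tolerance=0.2):
    """Column scalable: execution time grows roughly linearly with columns.

    measurements is a list of (num_columns, seconds) pairs sorted by
    num_columns, with at least three entries. Growth is treated as linear
    when the slope between consecutive measurements stays within
    `tolerance` (an assumed 20%) of the average slope.
    """
    slopes = [
        (t2 - t1) / (c2 - c1)
        for (c1, t1), (c2, t2) in zip(measurements, measurements[1:])
    ]
    average = sum(slopes) / len(slopes)
    return all(abs(s - average) <= tolerance * abs(average) for s in slopes)

# Hypothetical timings: 2 s on the base table, 18 s with 10 times the rows.
print(is_row_scalable(2.0, 18.0))
# Hypothetical timings for 10, 20, and 40 columns.
print(is_column_scalable([(10, 1.0), (20, 2.1), (40, 4.0)]))
print(is_column_scalable([(10, 1.0), (20, 4.0), (40, 16.0)]))
```

In the sample data, the first two checks pass and the last one fails because its timings grow quadratically with the number of columns. In practice, timings should be averaged over several runs before drawing conclusions.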
Trends in Data Mining
Data mining concepts are still evolving. Here are the latest trends in this field −
-
Application Exploration.
-
Scalable and interactive data mining methods.
-
Integration of data mining with database systems, data warehouse systems and web database systems.
-
Standardization of data mining query language.
-
Visual data mining.
-
New methods for mining complex types of data.
-
Biological data mining.
-
Data mining and software engineering.
-
Web mining.
-
Distributed data mining.
-
Real time data mining.
-
Multi database data mining.
-
Privacy protection and information security in data mining.