Big Data Analytics: A Concise Tutorial

Big Data Analytics - Characteristics

Big Data refers to extremely large data sets that can be analyzed to reveal patterns, trends, and associations, especially those relating to human behaviour and interactions.

Big Data Characteristics

The characteristics of Big Data are often summarized as the "Five V's":

Volume

As its name implies, volume refers to the sheer amount of data generated and stored every second by IoT devices, social media, videos, financial transactions, and customer logs. The data generated by these devices and sources can range from terabytes to petabytes and beyond. Managing such large quantities of data requires robust storage solutions and advanced data processing techniques. Frameworks such as Hadoop are used to store, access, and process big data.

Facebook generates 4 petabytes of data per day. One petabyte is a million gigabytes, so that is about four million gigabytes daily. All that data is stored in what is known as the Hive, which contains about 300 petabytes of data [1].
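To make the unit arithmetic behind these figures explicit, here is a minimal sketch. It assumes decimal (SI) units, where each step up is a factor of 1000. The `UNITS` table and the `convert` helper are illustrative names, not part of any library.

```python
# Decimal (SI) storage units, expressed in bytes.
UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15}

def convert(value, from_unit, to_unit):
    """Convert a data size between the decimal units listed in UNITS."""
    return value * UNITS[from_unit] / UNITS[to_unit]

daily_pb = 4     # daily volume quoted in the text
stored_pb = 300  # warehouse size quoted in the text

daily_gb = convert(daily_pb, "PB", "GB")
stored_tb = convert(stored_pb, "PB", "TB")

print(f"{daily_pb} PB per day = {daily_gb:,.0f} GB per day")
print(f"{stored_pb} PB stored = {stored_tb:,.0f} TB stored")
```

With binary units (1 PiB = 1024 TiB) the numbers would differ slightly; the decimal convention is an assumption made here for simplicity.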

Fig: Minutes spent per day on social apps (Image source: Recode)

Fig: Engagement per user on leading social media apps in India (Image source: www.statista.com) [2]

The figures above indicate how much time users devote to different channels and to exchanging data on them. As this usage grows, data volume increases day by day.

Velocity

Velocity is the speed at which data is generated, processed, and analysed. With the growing use of IoT devices and real-time data streams, the velocity of data has increased tremendously, demanding systems that can process data in near real time to derive meaningful insights.

Fig: Some high-velocity data applications
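One common way to cope with high-velocity data is to process events as they arrive and keep only a recent window in memory. The sketch below illustrates that idea with a sliding-window event counter. The `SlidingWindowCounter` class and the sample timestamps are hypothetical, and a production system would typically use a dedicated stream-processing framework instead.

```python
from collections import deque

class SlidingWindowCounter:
    """Counts the events seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def add(self, timestamp):
        """Record one event and return the number of events in the window."""
        self.timestamps.append(timestamp)
        # Evict events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps)

# Hypothetical sensor events, as timestamps in seconds, arriving in order.
events = [0.0, 0.4, 0.9, 1.2, 3.5, 3.6]
counter = SlidingWindowCounter(window_seconds=1.0)
counts = [counter.add(t) for t in events]
print(counts)  # [1, 2, 3, 3, 1, 2]
```

Because old timestamps are evicted as new ones arrive, memory use depends on the event rate within the window rather than on the total number of events seen.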

Variety

Big Data includes different types of data: structured data (such as tables in databases), unstructured data (such as text, images, and videos), and semi-structured data (such as JSON and XML). This diversity requires advanced tools for data integration, storage, and analysis.
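As a small illustration of what integrating varied data can involve, the sketch below normalizes the same kind of record from CSV (structured), JSON, and XML (both semi-structured) into one common shape using only the Python standard library. The sample data, field names, and helper functions are assumptions made for this example.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical samples of the same kind of record in three formats.
csv_text = "user,minutes\nalice,42\n"
json_text = '{"user": "bob", "minutes": 17, "tags": ["video"]}'
xml_text = "<session><user>carol</user><minutes>8</minutes></session>"

def from_csv(text):
    # Structured: fixed columns, one row per record.
    return [{"user": row["user"], "minutes": int(row["minutes"])}
            for row in csv.DictReader(io.StringIO(text))]

def from_json(text):
    # Semi-structured: self-describing keys, optional extra fields.
    doc = json.loads(text)
    return [{"user": doc["user"], "minutes": int(doc["minutes"])}]

def from_xml(text):
    # Semi-structured: nested, tagged elements.
    root = ET.fromstring(text)
    return [{"user": root.findtext("user"),
             "minutes": int(root.findtext("minutes"))}]

records = from_csv(csv_text) + from_json(json_text) + from_xml(xml_text)
total_minutes = sum(r["minutes"] for r in records)
print(records)
print(total_minutes)  # 67
```

Unstructured data such as free text, images, or video has no such field layout and usually needs a separate extraction step before it can be combined with records like these.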

Fig: Challenges of managing variety in Big Data

Fig: Variety in Big Data applications

Veracity

Veracity refers to the accuracy and trustworthiness of the data. Ensuring data quality, resolving data discrepancies, and dealing with data ambiguity are all major challenges in Big Data analytics.
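Basic veracity checks can be automated. The following minimal sketch counts three simple data-quality problems (duplicate identifiers, missing values, and implausible values) in a list of records. The records, field names, and the 0 to 120 age range are hypothetical choices for illustration.

```python
# Hypothetical customer-log records; the field names are illustrative.
records = [
    {"id": 1, "age": 34, "country": "IN"},
    {"id": 2, "age": None, "country": "US"},  # missing value
    {"id": 3, "age": 212, "country": "IN"},   # implausible value
    {"id": 1, "age": 34, "country": "IN"},    # duplicate id
]

def quality_report(rows, min_age=0, max_age=120):
    """Count simple data-quality problems in a list of dict records."""
    seen_ids = set()
    report = {"missing": 0, "out_of_range": 0, "duplicates": 0}
    for row in rows:
        if row["id"] in seen_ids:
            # A repeated id is counted once and not checked further.
            report["duplicates"] += 1
            continue
        seen_ids.add(row["id"])
        age = row["age"]
        if age is None:
            report["missing"] += 1
        elif not min_age <= age <= max_age:
            report["out_of_range"] += 1
    return report

report = quality_report(records)
print(report)  # {'missing': 1, 'out_of_range': 1, 'duplicates': 1}
```

Real pipelines usually apply many more rules than this, but the pattern of validating each record against explicit expectations is the same.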

Value

Value is the ability to convert large volumes of data into useful insights. The ultimate goal of Big Data is to extract meaningful, actionable insights that lead to better decision-making, new products, enhanced consumer experiences, and competitive advantages.

These qualities characterise the nature of Big Data and highlight the importance of modern tools and technologies for effective data management, processing, and analysis.