Python Pandas 简明教程

Introduction to Data Structures

熊猫处理以下三个数据结构 −

Pandas deals with the following three data structures −

  1. Series

  2. DataFrame

  3. Panel

这些数据结构建立在 Numpy 阵列的基础上,也就是说它们很快。

These data structures are built on top of Numpy array, which means they are fast.

Dimension & Description

思考这些数据结构的最佳方式是,高维数据结构是其低维数据结构的一个容器。例如,数据框是序列的容器,面板是数据框的容器。

The best way to think of these data structures is that the higher dimensional data structure is a container of its lower dimensional data structure. For example, DataFrame is a container of Series, Panel is a container of DataFrame.

Data Structure

Dimensions

Description

Series

1

1D labeled homogeneous array, sizeimmutable.

Data Frames

2

General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.

Panel

3

General 3D labeled, size-mutable array.

构建和处理两个或多个维度阵列是一项繁琐的任务,在编写函数时考虑数据集的方向负担由用户承担。但是,使用 Pandas 数据结构,用户的脑力负担会减轻。

Building and handling two or more dimensional arrays is a tedious task, burden is placed on the user to consider the orientation of the data set when writing functions. But using Pandas data structures, the mental effort of the user is reduced.

例如,对于表格数据(DataFrame),考虑 index (行)和 columns 比考虑轴 0 和轴 1 在语义上更有帮助。

For example, with tabular data (DataFrame) it is more semantically helpful to think of the index (the rows) and the columns rather than axis 0 and axis 1.

Mutability

所有 Pandas 数据结构都是值可变的(可以更改),但除了 Series 之外,其余的大小都是可变的。Series 的大小不可变。

All Pandas data structures are value mutable (can be changed) and except Series all are size mutable. Series is size immutable.

Note − DataFrame 用途广泛,是最重要的数据结构之一。Panel 用途少得多。

Note − DataFrame is widely used and one of the most important data structures. Panel is used much less.

Series

Series 是具有齐次数据的一维类似数组的结构。例如,以下系列收集了整数 10、23、56、…

Series is a one-dimensional array like structure with homogeneous data. For example, the following series is a collection of integers 10, 23, 56, …

10

23

56

17

52

61

73

90

26

72

Key Points

  1. Homogeneous data

  2. Size Immutable

  3. Values of Data Mutable

DataFrame

DataFrame 是一个具有异构数据的二维数组。例如,

DataFrame is a two-dimensional array with heterogeneous data. For example,

Name

Age

Gender

Rating

Steve

32

Male

3.45

Lia

28

Female

4.6

Vin

45

Male

3.9

Katie

38

Female

2.78

该表表示一个组织的销售团队及其总体绩效评级的表示,数据以行和列表示。每一列表示一个属性,每一行表示一个人。

The table represents the data of a sales team of an organization with their overall performance rating. The data is represented in rows and columns. Each column represents an attribute and each row represents a person.

Data Type of Columns

四列的数据类型如下 −

The data types of the four columns are as follows −

Column

Type

Name

String

Age

Integer

Gender

String

Rating

Float

Key Points

  1. Heterogeneous data

  2. Size Mutable

  3. Data Mutable

Panel

Panel 是一个具有异类数据的三个维度数据结构。很难以图形方式表示 Panel。但可以将 Panel 阐释为 DataFrame 的容器。

Panel is a three-dimensional data structure with heterogeneous data. It is hard to represent the panel in graphical representation. But a panel can be illustrated as a container of DataFrame.

Key Points

  1. Heterogeneous data

  2. Size Mutable

  3. Data Mutable