Python Data Science: A Concise Tutorial
Python - Data Operations
Python handles data in various formats mainly through two libraries, Pandas and NumPy. We covered the important features of both libraries in the previous chapters. In this chapter we look at some basic examples from each library that show how to operate on data.
Data Operations in NumPy
The most important object defined in NumPy is an N-dimensional array type called ndarray. It describes a collection of items of the same type. Items in the collection can be accessed using a zero-based index. Instances of the ndarray class can be constructed with the different array creation routines described later in the tutorial. A basic ndarray is created with the array function in NumPy as follows −
numpy.array
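As a quick illustration of the points above (one element type per array, zero-based indexing), here is a minimal sketch. The array contents are arbitrary values chosen for illustration, and it assumes a reasonably recent NumPy.

```python
import numpy as np

# A 1-D ndarray built from a Python list
a = np.array([10, 20, 30, 40])

# A nested list produces a 2-D ndarray with 2 rows and 3 columns
b = np.array([[1, 2, 3], [4, 5, 6]])

# Mixed ints and a float: every item is stored with one common type (float64)
c = np.array([1, 2.5, 3])

# Items are accessed with zero-based indices
first = a[0]       # 10
last = a[-1]       # 40 (negative indices count back from the end)
corner = b[1, 2]   # row 1, column 2 -> 6

print(a.ndim, a.shape)     # 1 (4,)
print(b.ndim, b.shape)     # 2 (2, 3)
print(first, last, corner) # 10 40 6
print(c.dtype)             # float64
```

Note how `c` shows the "same type" rule: NumPy upcasts the integers so that the whole array shares a single dtype.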
The following are some examples of data handling in NumPy.
Example
# The dtype parameter sets the element type: the integers are stored as complex numbers
import numpy as np
a = np.array([1, 2, 3], dtype=complex)
print(a)
The output is as follows −
[1.+0.j 2.+0.j 3.+0.j]
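Beyond choosing a dtype, most everyday NumPy data operations are vectorized arithmetic, slicing, and aggregation. The sketch below shows a few of them on a small made-up array; the variable names are illustrative only.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])

# Arithmetic is applied element by element, without an explicit Python loop
doubled = x * 2          # [2, 4, 6, 8, 10, 12]
shifted = x + 10         # [11, 12, 13, 14, 15, 16]

# Slicing uses start:stop with zero-based, stop-exclusive bounds
middle = x[1:4]          # [2, 3, 4]

# A boolean mask keeps only the items where the condition is True
evens = x[x % 2 == 0]    # [2, 4, 6]

# reshape views the same 6 items as a 2 x 3 array: [[1, 2, 3], [4, 5, 6]]
grid = x.reshape(2, 3)

# Aggregations run over the whole array or along one axis
total = x.sum()                # 21
col_sums = grid.sum(axis=0)    # down each column: [5, 7, 9]
row_means = grid.mean(axis=1)  # across each row: [2.0, 5.0]
```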
Data Operations in Pandas
Pandas handles data through Series, DataFrame, and Panel. We will see some examples of each.
Pandas Series
A Series is a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, Python objects, etc.). The axis labels are collectively called the index. A pandas Series can be created using the following constructor −
pandas.Series(data, index, dtype, copy)
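The sketch below builds Series in a few common ways using the constructor parameters listed above. The fruit labels and numbers are hypothetical sample data. It also shows label-based lookup and the fact that arithmetic between two Series aligns on index labels rather than on positions.

```python
import pandas as pd

# From a list: pandas assigns a default integer index 0, 1, 2, ...
s = pd.Series([4.5, 7.0, 1.25])

# From a list with explicit labels passed through the index parameter
prices = pd.Series([120, 95, 310], index=['apple', 'banana', 'cherry'])

# From a dict: the keys become the index labels
stock = pd.Series({'apple': 12, 'banana': 0, 'cherry': 7})

# The dtype parameter forces the storage type
floats = pd.Series([1, 2, 3], dtype='float64')

# Values can be looked up by label or by position
print(prices['banana'])   # 95
print(prices.iloc[0])     # 120

# Multiplication pairs values that share a label: apple 1440, banana 0, cherry 2170
value = prices * stock
print(value)
```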
Pandas DataFrame
A DataFrame is a two-dimensional data structure; that is, data is aligned in a tabular fashion in rows and columns. A pandas DataFrame can be created using the following constructor −
pandas.DataFrame(data, index, columns, dtype, copy)
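Besides dicts, the constructor also accepts a 2-D NumPy array, which makes the index, columns, and dtype parameters easy to see in isolation. A minimal sketch with arbitrary values and hypothetical row and column labels:

```python
import numpy as np
import pandas as pd

# A 3 x 2 NumPy array supplies the data: [[0, 1], [2, 3], [4, 5]]
values = np.arange(6).reshape(3, 2)

# index labels the rows, columns labels the columns, dtype casts every value
df = pd.DataFrame(values, index=['r0', 'r1', 'r2'], columns=['x', 'y'], dtype=float)
print(df)
```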
Let us now create an indexed DataFrame from a dictionary of lists.
import pandas as pd
# Each dict key becomes a column; the index argument supplies the row labels
data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
print(df)
Its output is as follows −
        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4  Ricky   42
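Once a DataFrame exists, typical operations are selecting columns, selecting rows by label or by position, filtering, sorting, and deriving new columns. The sketch below rebuilds the same DataFrame and applies each of these; the column name AgeIn5Years is a hypothetical example.

```python
import pandas as pd

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])

# Selecting one column returns a Series that shares the DataFrame's index
ages = df['Age']

# .loc selects rows by label, .iloc by zero-based position
row = df.loc['rank2']      # Name Jack, Age 34
first_row = df.iloc[0]     # the rank1 row (Tom, 28)

# Boolean filtering keeps only the rows where the condition holds
over_30 = df[df['Age'] > 30]   # rank2 (Jack) and rank4 (Ricky)

# A new column computed element by element from an existing one
df['AgeIn5Years'] = df['Age'] + 5

# Rows sorted by the values of one column, youngest first
youngest_first = df.sort_values('Age')
print(youngest_first)
```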
Pandas Panel
A Panel is a 3D container of data. The term panel data is derived from econometrics and is partially responsible for the name pandas − pan(el)-da(ta)-s. Note that Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so the Panel examples in this section run only on older pandas versions.
A Panel can be created using the following constructor −
pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)
In the example below we create a Panel from a dict of DataFrame objects.
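# Create a Panel from a dict of DataFrames (requires pandas < 0.25, where Panel still exists)
import pandas as pd
import numpy as np
# Item1 is a 4 x 3 frame, Item2 a 4 x 2 frame, both filled with random numbers
data = {'Item1': pd.DataFrame(np.random.randn(4, 3)),
        'Item2': pd.DataFrame(np.random.randn(4, 2))}
p = pd.Panel(data)
print(p)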
Its output is as follows −
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 3 (minor_axis)
Items axis: Item1 to Item2
Major_axis axis: 0 to 3
Minor_axis axis: 0 to 2
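pd.Panel is not available in current pandas releases, and the pandas documentation at the time of its deprecation pointed users toward a DataFrame with a MultiIndex or toward the xarray package. The following is one possible sketch of the MultiIndex route for the same two items, assuming pandas 1.x or 2.x; it gives an equivalent layout, not a drop-in replacement for the Panel API.

```python
import numpy as np
import pandas as pd

# The same two items as in the Panel example: a 4 x 3 and a 4 x 2 frame
data = {'Item1': pd.DataFrame(np.random.randn(4, 3)),
        'Item2': pd.DataFrame(np.random.randn(4, 2))}

# pd.concat with a dict uses the keys as the outer level of a row MultiIndex,
# so (item, major_axis) become the row labels and the union of the column
# labels plays the role of minor_axis; Item2 has no column 2, so it gets NaN
stacked = pd.concat(data)

print(stacked.shape)           # (8, 3)
print(stacked.index.nlevels)   # 2

# Selecting on the outer level recovers one item as an ordinary DataFrame
item2 = stacked.loc['Item2']
```

Selecting by the outer index level plays the part that indexing a Panel by item used to play.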