Python Pandas: A Concise Tutorial

Python Pandas - IO Tools

The Pandas I/O API is a set of top-level reader functions, called as pd.read_csv() for example, that generally return a Pandas object.

The two workhorse functions for reading text files (also called flat files) are read_csv() and read_table(). Both use the same parsing code to convert tabular data into a DataFrame object; they differ mainly in the default separator, which is a comma for read_csv() and a tab for read_table():

pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer',
names=None, index_col=None, usecols=None, ...)

pandas.read_table(filepath_or_buffer, sep='\t', delimiter=None, header='infer',
names=None, index_col=None, usecols=None, ...)
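The shared parser can be checked with a small sketch. The sample text below is a hypothetical in-memory stand-in for a file, and io.StringIO is used only so the snippet runs without touching the disk:

```python
import io
import pandas as pd

# Hypothetical sample data; any comma-separated text would do.
csv_text = "S.No,Name,Age\n1,Tom,28\n2,Lee,32\n"
tsv_text = csv_text.replace(",", "\t")

# read_csv defaults to a comma separator, read_table to a tab.
from_csv = pd.read_csv(io.StringIO(csv_text))
from_table = pd.read_table(io.StringIO(tsv_text))

# read_table can also read comma-separated text if sep is given explicitly.
from_table_with_sep = pd.read_table(io.StringIO(csv_text), sep=",")

print(from_csv.equals(from_table))           # True
print(from_csv.equals(from_table_with_sep))  # True
```

In practice, the choice between the two functions mostly comes down to which default separator matches the file.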

Here is what the CSV file data looks like:

S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900

Save this data as temp.csv; the operations below are performed on it.
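If you prefer to create the file from Python rather than a text editor, a minimal sketch such as the following would work. It assumes the current working directory is writable and that overwriting an existing temp.csv is acceptable:

```python
from pathlib import Path

# The same sample rows shown above, written to temp.csv
# in the current working directory.
sample = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

path = Path("temp.csv")
path.write_text(sample, encoding="utf-8")

# Read the file back to confirm what was written.
lines = path.read_text(encoding="utf-8").splitlines()
print(len(lines))  # 5 (one header line plus four data rows)
```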

read_csv

read_csv() reads data from a CSV file and creates a DataFrame object.

import pandas as pd

# Read temp.csv from the current working directory into a DataFrame
df = pd.read_csv("temp.csv")
print(df)

Its output is as follows:

   S.No     Name   Age       City   Salary
0     1      Tom    28    Toronto    20000
1     2      Lee    32   HongKong     3000
2     3   Steven    43   Bay Area     8300
3     4      Ram    38  Hyderabad     3900

custom index

Use index_col to specify which column of the CSV file should become the index of the DataFrame.

import pandas as pd

# Use the S.No column as the row index instead of the default 0..n-1 index
df = pd.read_csv("temp.csv", index_col=['S.No'])
print(df)

Its output is as follows:

S.No   Name   Age       City   Salary
1       Tom    28    Toronto    20000
2       Lee    32   HongKong     3000
3    Steven    43   Bay Area     8300
4       Ram    38  Hyderabad     3900
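One practical effect of a custom index is that rows can be looked up by their S.No label. The sketch below illustrates this; it reads the sample rows from an in-memory string (an assumption made so the snippet is self-contained) rather than from temp.csv:

```python
import io
import pandas as pd

sample = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

df = pd.read_csv(io.StringIO(sample), index_col="S.No")

# S.No is now the index, not a regular column.
print(df.index.name)           # S.No
print("S.No" in df.columns)    # False

# Label-based lookup uses the S.No values directly.
print(df.loc[3, "Name"])       # Steven
```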

dtype

The dtype of individual columns can be passed as a dict that maps column names to types.

import numpy as np
import pandas as pd

# Force the Salary column to be parsed as 64-bit floats
df = pd.read_csv("temp.csv", dtype={'Salary': np.float64})
print(df.dtypes)

Its output is as follows:

S.No       int64
Name      object
Age        int64
City      object
Salary   float64
dtype: object

By default, the Salary column is read as an integer dtype, but the result shows float64 because we explicitly requested that type.

As a result, the Salary values are displayed as floats:

  S.No   Name   Age      City    Salary
0   1     Tom   28    Toronto   20000.0
1   2     Lee   32   HongKong    3000.0
2   3  Steven   43   Bay Area    8300.0
3   4     Ram   38  Hyderabad    3900.0
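Where dtype only sets the target type, the converters argument of read_csv() takes a dict of functions that are applied to the raw string in each cell of a column. The following is a minimal sketch; the two conversions (upper-casing City, expressing Salary in thousands) are hypothetical choices made for illustration, and the data is read from an in-memory string:

```python
import io
import pandas as pd

sample = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

df = pd.read_csv(
    io.StringIO(sample),
    converters={
        "City": str.upper,                      # each cell arrives as a str
        "Salary": lambda s: float(s) / 1000.0,  # hypothetical unit change
    },
)

print(df["City"].tolist())    # ['TORONTO', 'HONGKONG', 'BAY AREA', 'HYDERABAD']
print(df["Salary"].tolist())  # [20.0, 3.0, 8.3, 3.9]
```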

header_names

Specify the column names with the names argument.

import pandas as pd

# Supply custom column names; the file's own header line is not yet skipped
df = pd.read_csv("temp.csv", names=['a', 'b', 'c', 'd', 'e'])
print(df)

Its output is as follows:

       a        b    c           d        e
0   S.No     Name   Age       City   Salary
1      1      Tom   28     Toronto    20000
2      2      Lee   32    HongKong     3000
3      3   Steven   43    Bay Area     8300
4      4      Ram   38   Hyderabad     3900

Observe that the custom names are used as the column labels, but the header line from the file has not been eliminated: it now appears as the first data row. To remove it, pass header=0, which tells pandas that the first line of the file is a header row to be replaced by names.

If the header is in a row other than the first, pass its zero-based row number to header. The rows before it are skipped.

import pandas as pd

# header=0 marks the first line as a header row, which names then replaces
df = pd.read_csv("temp.csv", names=['a', 'b', 'c', 'd', 'e'], header=0)
print(df)

Its output is as follows:

   a       b   c          d      e
0  1     Tom  28    Toronto  20000
1  2     Lee  32   HongKong   3000
2  3  Steven  43   Bay Area   8300
3  4     Ram  38  Hyderabad   3900
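The difference between passing names alone and passing names together with header=0 can be verified with a short sketch. The data is read from an in-memory string here (an assumption made for self-containment) instead of temp.csv:

```python
import io
import pandas as pd

sample = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""
new_names = ["a", "b", "c", "d", "e"]

# names only: the file's header line is kept as an ordinary data row.
kept = pd.read_csv(io.StringIO(sample), names=new_names)

# names plus header=0: the file's header line is replaced by new_names.
replaced = pd.read_csv(io.StringIO(sample), names=new_names, header=0)

print(kept.shape)               # (5, 5)
print(kept.iloc[0].tolist())    # ['S.No', 'Name', 'Age', 'City', 'Salary']
print(replaced.shape)           # (4, 5)
print(replaced["a"].tolist())   # [1, 2, 3, 4]
```

Note that when the header line is kept as data, every column holds at least one string, so numeric columns such as Age are no longer parsed as integers.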

skiprows

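skiprows skips the specified number of lines at the start of the file. In the example below, the header line and the first data row are skipped, so the next line (the row for Lee) is treated as the header.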

import pandas as pd

# Skip the first two lines of the file (the header and the row for Tom)
df = pd.read_csv("temp.csv", skiprows=2)
print(df)

Its output is as follows:

    2      Lee   32    HongKong   3000
0   3   Steven   43    Bay Area   8300
1   4      Ram   38   Hyderabad   3900
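skiprows also accepts a list of zero-based line numbers, which makes it possible to drop specific rows while keeping the header line. The sketch below skips lines 1 and 2 (the rows for Tom and Lee); the in-memory string is a stand-in for temp.csv:

```python
import io
import pandas as pd

sample = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

# Line 0 is the header; lines 1 and 2 are the first two data rows.
df = pd.read_csv(io.StringIO(sample), skiprows=[1, 2])

print(list(df.columns))     # ['S.No', 'Name', 'Age', 'City', 'Salary']
print(df["Name"].tolist())  # ['Steven', 'Ram']
```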