Python Pandas: A Concise Tutorial
Python Pandas - IO Tools
The Pandas I/O API is a set of top-level reader functions, accessed like pd.read_csv(), that generally return a Pandas object.
The two workhorse functions for reading text files (also called flat files) are read_csv() and read_table(). Both use the same parsing code to convert tabular data into a DataFrame object; they differ only in the default separator (a comma for read_csv(), a tab for read_table()):
pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer',
names=None, index_col=None, usecols=None, ...)
pandas.read_table(filepath_or_buffer, sep='\t', delimiter=None, header='infer',
names=None, index_col=None, usecols=None, ...)
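As a minimal sketch of the shared parsing behaviour, the snippet below feeds the same rows to both functions, once comma-separated and once tab-separated. The sample text and variable names are made up for illustration, and io.StringIO stands in for a file on disk.

```python
import io
import pandas as pd

# Illustrative sample rows: the same data with two different separators.
csv_text = "Name,Age\nTom,28\nLee,32\n"
tsv_text = csv_text.replace(",", "\t")

# read_csv splits on ',' by default; read_table splits on '\t' by default.
from_csv = pd.read_csv(io.StringIO(csv_text))
from_table = pd.read_table(io.StringIO(tsv_text))

# Both calls should yield DataFrames with identical contents.
print(from_csv.equals(from_table))
```

Passing sep explicitly to either function has the same effect, so the choice between the two is mostly a matter of convenience.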
Here is what the CSV file data looks like:
S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
Save this data as temp.csv and conduct operations on it.
read_csv
read_csv reads data from a CSV file and creates a DataFrame object.
import pandas as pd
# Read the whole file; the first line is used as the column header
df = pd.read_csv("temp.csv")
print(df)
Its output is as follows:
   S.No    Name  Age       City  Salary
0     1     Tom   28    Toronto   20000
1     2     Lee   32   HongKong    3000
2     3  Steven   43   Bay Area    8300
3     4     Ram   38  Hyderabad    3900
custom index
The index_col argument specifies which column of the CSV file to use as the index of the DataFrame.
import pandas as pd
# Use the S.No column as the row index instead of the default 0..n-1
df = pd.read_csv("temp.csv", index_col=['S.No'])
print(df)
Its output is as follows:
        Name  Age       City  Salary
S.No
1        Tom   28    Toronto   20000
2        Lee   32   HongKong    3000
3     Steven   43   Bay Area    8300
4        Ram   38  Hyderabad    3900
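Once a column has been promoted to the index, rows can be looked up by its values. The sketch below assumes the same sample rows, held in an in-memory string (data is an illustrative name) so that it runs without temp.csv.

```python
import io
import pandas as pd

# The tutorial's sample rows, kept in memory instead of temp.csv.
data = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

df = pd.read_csv(io.StringIO(data), index_col="S.No")

# S.No is now the index label, not a regular column.
print(df.index.name)
print(list(df.columns))

# Label-based lookup: the row whose S.No is 3.
print(df.loc[3, "Name"])
```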
Converters
The dtype of individual columns can be passed as a dict through the dtype argument.
import numpy as np
import pandas as pd
# Force the Salary column to be parsed as 64-bit floats
df = pd.read_csv("temp.csv", dtype={'Salary': np.float64})
print(df.dtypes)
Its output is as follows:
S.No        int64
Name       object
Age         int64
City       object
Salary    float64
dtype: object
By default, the Salary column is read as an integer dtype, but the result shows float64 because the dtype argument explicitly forces that type.
As a result, the values print as floats:
   S.No    Name  Age       City   Salary
0     1     Tom   28    Toronto  20000.0
1     2     Lee   32   HongKong   3000.0
2     3  Steven   43   Bay Area   8300.0
3     4     Ram   38  Hyderabad   3900.0
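The dtype argument only changes the type of a column. When the values themselves need transforming while they are read, read_csv also accepts a converters dict that maps a column to a function applied to each raw string. The sketch below is illustrative only: the helper salary_in_thousands and the in-memory data string are assumptions, not part of the original example.

```python
import io
import pandas as pd

# The tutorial's sample rows, kept in memory instead of temp.csv.
data = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

def salary_in_thousands(raw):
    # Hypothetical helper: converters receive each cell as a string.
    return float(raw) / 1000

df = pd.read_csv(io.StringIO(data), converters={"Salary": salary_in_thousands})

# Salary now holds the converted values rather than the raw integers.
print(df["Salary"].tolist())
```

A column should normally appear in either dtype or converters, not both, since the converter already decides what the parsed values look like.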
header_names
Specify the column header names using the names argument.
import pandas as pd
# Supply custom column names; the file's own header line is then read as data
df = pd.read_csv("temp.csv", names=['a', 'b', 'c', 'd', 'e'])
print(df)
Its output is as follows:
      a       b    c          d       e
0  S.No    Name  Age       City  Salary
1     1     Tom   28    Toronto   20000
2     2     Lee   32   HongKong    3000
3     3  Steven   43   Bay Area    8300
4     4     Ram   38  Hyderabad    3900
Observe that the custom names are used as the column labels, but the header line from the file has not been eliminated: it is read as the first data row. To remove it, pass header=0 together with names, which tells the parser that row 0 is a header and should be replaced by the custom names.
If the header is in a row other than the first, pass that row number to header. The rows before it are skipped.
import pandas as pd
# header=0 marks the first line as a header row, which names then replaces
df = pd.read_csv("temp.csv", names=['a', 'b', 'c', 'd', 'e'], header=0)
print(df)
Its output is as follows:
   a       b   c          d      e
0  1     Tom  28    Toronto  20000
1  2     Lee  32   HongKong   3000
2  3  Steven  43   Bay Area   8300
3  4     Ram  38  Hyderabad   3900
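The interaction between names and header can be checked by counting rows. The following sketch assumes the same sample rows in an in-memory string, plus a made-up variant (with_preamble) whose header sits on the second line.

```python
import io
import pandas as pd

# The tutorial's sample rows, kept in memory instead of temp.csv.
data = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""
names = ["a", "b", "c", "d", "e"]

# names alone: the file's header line is kept as an ordinary data row.
kept = pd.read_csv(io.StringIO(data), names=names)

# names with header=0: the file's header line is replaced by the custom names.
replaced = pd.read_csv(io.StringIO(data), names=names, header=0)

print(len(kept), len(replaced))

# Illustrative variant: one extra line precedes the real header, so header=1.
with_preamble = "exported from a hypothetical tool\n" + data
shifted = pd.read_csv(io.StringIO(with_preamble), header=1)
print(list(shifted.columns))
```

In the first call every column is read as text, because the old header strings are mixed in with the numbers; in the second the numeric columns keep their integer types.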