Python Pandas 简明教程
Python Pandas - Comparison with SQL
由于许多潜在的 Pandas 用户对 SQL 有些熟悉,本页旨在提供有关如何在 pandas 中执行各种 SQL 操作的一些示例。
Since many potential Pandas users have some familiarity with SQL, this page is meant to provide some examples of how various SQL operations can be performed using pandas.
import pandas as pd
url = 'https://raw.github.com/pandasdev/
pandas/master/pandas/tests/data/tips.csv'
tips=pd.read_csv(url)
print tips.head()
它的 output 如下所示 −
Its output is as follows −
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
SELECT
在 SQL 中,选择是使用一个逗号分隔的列表完成的,列出你选择的列(或一个 * 来选择所有列) −
In SQL, selection is done using a comma-separated list of columns that you select (or a * to select all columns) −
SELECT total_bill, tip, smoker, time
FROM tips
LIMIT 5;
使用 Pandas,列选择是通过将列名列表传递给 DataFrame 完成的 −
With Pandas, column selection is done by passing a list of column names to your DataFrame −
tips[['total_bill', 'tip', 'smoker', 'time']].head(5)
让我们检查完整程序 −
Let’s check the full program −
import pandas as pd
url = 'https://raw.github.com/pandasdev/
pandas/master/pandas/tests/data/tips.csv'
tips=pd.read_csv(url)
print tips[['total_bill', 'tip', 'smoker', 'time']].head(5)
它的 output 如下所示 −
Its output is as follows −
total_bill tip smoker time
0 16.99 1.01 No Dinner
1 10.34 1.66 No Dinner
2 21.01 3.50 No Dinner
3 23.68 3.31 No Dinner
4 24.59 3.61 No Dinner
不带列名列表地调用 DataFrame 将显示所有列(类似于 SQL 的 *)。
Calling the DataFrame without the list of column names will display all columns (akin to SQL’s *).
WHERE
在 SQL 中,筛选是通过 WHERE 子句完成的。
Filtering in SQL is done via a WHERE clause.
SELECT * FROM tips WHERE time = 'Dinner' LIMIT 5;
DataFrame 可以通过多种方式筛选;其中最直观的是使用布尔索引。
DataFrames can be filtered in multiple ways; the most intuitive of which is using Boolean indexing.
tips[tips['time'] == 'Dinner'].head(5)
让我们检查完整程序 −
Let’s check the full program −
import pandas as pd
url = 'https://raw.github.com/pandasdev/
pandas/master/pandas/tests/data/tips.csv'
tips=pd.read_csv(url)
print tips[tips['time'] == 'Dinner'].head(5)
它的 output 如下所示 −
Its output is as follows −
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
以上语句将一个 True/False 对象的 Series 传递给 DataFrame,返回所有带有 True 的行。
The above statement passes a Series of True/False objects to the DataFrame, returning all rows with True.
GroupBy
此操作获取整个数据集中每个组中的记录计数。例如,一个查询获取我们按性别留下的提示数量 −
This operation fetches the count of records in each group throughout a dataset. For instance, a query fetching us the number of tips left by sex −
SELECT sex, count(*)
FROM tips
GROUP BY sex;
Pandas 等效项是 −
The Pandas equivalent would be −
tips.groupby('sex').size()
让我们检查完整程序 −
Let’s check the full program −
import pandas as pd
url = 'https://raw.github.com/pandasdev/
pandas/master/pandas/tests/data/tips.csv'
tips=pd.read_csv(url)
print tips.groupby('sex').size()
它的 output 如下所示 −
Its output is as follows −
sex
Female 87
Male 157
dtype: int64
Top N rows
SQL 返回 top n rows ,使用 LIMIT −
SQL returns the top n rows using LIMIT −
SELECT * FROM tips
LIMIT 5 ;
Pandas 等效项是 −
The Pandas equivalent would be −
tips.head(5)
我们来检查完整示例 −
Let’s check the full example −
import pandas as pd
url = 'https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/tips.csv'
tips=pd.read_csv(url)
tips = tips[['smoker', 'day', 'time']].head(5)
print tips
它的 output 如下所示 −
Its output is as follows −
smoker day time
0 No Sun Dinner
1 No Sun Dinner
2 No Sun Dinner
3 No Sun Dinner
4 No Sun Dinner
这些是我们比较的一些基本操作,这即是我们之前在 Pandas 库的章节所了解的内容。
These are the few basic operations we compared are, which we learnt, in the previous chapters of the Pandas Library.