Python Pandas 简明教程
Python Pandas - Sorting
Pandas 提供两种排序方式。它们是-
There are two kinds of sorting available in Pandas. They are −
-
By label
-
By Actual Value
我们考虑一个有输出的示例。
Let us consider an example with an output.
import pandas as pd
import numpy as np
unsorted_df=pd.DataFrame(np.random.randn(10,2),index=[1,4,6,2,3,5,9,8,0,7],colu
mns=['col2','col1'])
print unsorted_df
它的 output 如下所示 −
Its output is as follows −
col2 col1
1 -2.063177 0.537527
4 0.142932 -0.684884
6 0.012667 -0.389340
2 -0.548797 1.848743
3 -1.044160 0.837381
5 0.385605 1.300185
9 1.031425 -1.002967
8 -0.407374 -0.435142
0 2.237453 -1.067139
7 -1.445831 -1.701035
在 unsorted_df 中, labels 和 values 未排序。让我们看看如何对它们进行排序。
In unsorted_df, the labels and the values are unsorted. Let us see how these can be sorted.
By Label
使用 sort_index() 方法,通过传递axis参数和排序顺序,可以对DataFrame进行排序。默认情况下,按行标签升序排序。
Using the sort_index() method, by passing the axis arguments and the order of sorting, DataFrame can be sorted. By default, sorting is done on row labels in ascending order.
import pandas as pd
import numpy as np
unsorted_df = pd.DataFrame(np.random.randn(10,2),index=[1,4,6,2,3,5,9,8,0,7],colu
mns = ['col2','col1'])
sorted_df=unsorted_df.sort_index()
print sorted_df
它的 output 如下所示 −
Its output is as follows −
col2 col1
0 0.208464 0.627037
1 0.641004 0.331352
2 -0.038067 -0.464730
3 -0.638456 -0.021466
4 0.014646 -0.737438
5 -0.290761 -1.669827
6 -0.797303 -0.018737
7 0.525753 1.628921
8 -0.567031 0.775951
9 0.060724 -0.322425
Order of Sorting
通过将布尔值传递给ascending参数,可以控制排序顺序。我们考虑以下示例来理解它。
By passing the Boolean value to ascending parameter, the order of the sorting can be controlled. Let us consider the following example to understand the same.
import pandas as pd
import numpy as np
unsorted_df = pd.DataFrame(np.random.randn(10,2),index=[1,4,6,2,3,5,9,8,0,7],colu
mns = ['col2','col1'])
sorted_df = unsorted_df.sort_index(ascending=False)
print sorted_df
它的 output 如下所示 −
Its output is as follows −
col2 col1
9 0.825697 0.374463
8 -1.699509 0.510373
7 -0.581378 0.622958
6 -0.202951 0.954300
5 -1.289321 -1.551250
4 1.302561 0.851385
3 -0.157915 -0.388659
2 -1.222295 0.166609
1 0.584890 -0.291048
0 0.668444 -0.061294
Sort the Columns
通过将axis参数传递一个值0或1,可以在列标签上进行排序。默认情况下,axis=0,按行排序。我们考虑以下示例来理解它。
By passing the axis argument with a value 0 or 1, the sorting can be done on the column labels. By default, axis=0, sort by row. Let us consider the following example to understand the same.
import pandas as pd
import numpy as np
unsorted_df = pd.DataFrame(np.random.randn(10,2),index=[1,4,6,2,3,5,9,8,0,7],colu
mns = ['col2','col1'])
sorted_df=unsorted_df.sort_index(axis=1)
print sorted_df
它的 output 如下所示 −
Its output is as follows −
col1 col2
1 -0.291048 0.584890
4 0.851385 1.302561
6 0.954300 -0.202951
2 0.166609 -1.222295
3 -0.388659 -0.157915
5 -1.551250 -1.289321
9 0.374463 0.825697
8 0.510373 -1.699509
0 -0.061294 0.668444
7 0.622958 -0.581378
By Value
与索引排序一样, sort_values() 是按值排序的方法。它接受一个“by”参数,它将使用要对其进行值排序的DataFrame的列名。
Like index sorting, sort_values() is the method for sorting by values. It accepts a 'by' argument which will use the column name of the DataFrame with which the values are to be sorted.
import pandas as pd
import numpy as np
unsorted_df = pd.DataFrame({'col1':[2,1,1,1],'col2':[1,3,2,4]})
sorted_df = unsorted_df.sort_values(by='col1')
print sorted_df
它的 output 如下所示 −
Its output is as follows −
col1 col2
1 1 3
2 1 2
3 1 4
0 2 1
注意,col1值已排序,并且相应的col2值和行索引将与col1一起改变。因此,它们看起来是未排序的。
Observe, col1 values are sorted and the respective col2 value and row index will alter along with col1. Thus, they look unsorted.
'by' 参数采用列值列表。
'by' argument takes a list of column values.
import pandas as pd
import numpy as np
unsorted_df = pd.DataFrame({'col1':[2,1,1,1],'col2':[1,3,2,4]})
sorted_df = unsorted_df.sort_values(by=['col1','col2'])
print sorted_df
它的 output 如下所示 −
Its output is as follows −
col1 col2
2 1 2
1 1 3
3 1 4
0 2 1
Sorting Algorithm
sort_values() 提供了一个从mergesort、heapsort和quicksort中选择算法的条款。Mergesort是唯一稳定的算法。
sort_values() provides a provision to choose the algorithm from mergesort, heapsort and quicksort. Mergesort is the only stable algorithm.
import pandas as pd
import numpy as np
unsorted_df = pd.DataFrame({'col1':[2,1,1,1],'col2':[1,3,2,4]})
sorted_df = unsorted_df.sort_values(by='col1' ,kind='mergesort')
print sorted_df
它的 output 如下所示 −
Its output is as follows −
col1 col2
1 1 3
2 1 2
3 1 4
0 2 1