
pandas series sorting

In this post, we look at different ways of sorting a pandas series. Start the Python interpreter by running python in a terminal, then import pandas, the Python library that provides the series object.

$ python

Python 2.7.18 (default, Mar 8 2021, 13:02:45)

[GCC 9.3.0] on linux2

Type "help", "copyright", "credits" or "license" for more information.

>>> import pandas as pd

A pandas series is a one-dimensional array with axis labels (the index). The labels need not be unique, but they must be hashable. A series stores a collection of values of a single data type: integers, floats, objects, and so on. A series is created with pd.Series and can be sorted in several ways by passing different parameters to sort_values. By default, pandas sorts the values in ascending order.

>>> s = pd.Series([6, 3, 8, 2, 9])

>>> s.sort_values()

3 2

1 3

0 6

2 8

4 9

dtype: int64
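
As a small sketch of what the call above does, the script below sorts the same data and checks that the result is a new series while the original keeps its order. The variable name sorted_s is only illustrative.

```python
import pandas as pd

s = pd.Series([6, 3, 8, 2, 9])
sorted_s = s.sort_values()  # ascending is the default

# Each value carries its original index label into the sorted result.
print(sorted_s.tolist())
print(sorted_s.index.tolist())

# The original series is not modified by the call.
print(s.tolist())
```

Because the labels travel with the values, a sorted value can still be traced back to its position in the original data.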

To sort the values in descending order, set the ascending parameter to False.

>>> s.sort_values(ascending=False)

4 9

2 8

0 6

1 3

3 2

dtype: int64

By default, sort_values returns a new, sorted series and leaves the original unchanged. To sort the existing object instead, set the inplace parameter to True: the series is then modified in place and the call returns None. Note that pandas may still allocate temporary memory internally while sorting, so for large datasets inplace=True is mainly a convenience rather than a guaranteed memory saving.

>>> s.sort_values(ascending=False, inplace=True)

>>> s

4 9

2 8

0 6

1 3

3 2

dtype: int64

Note that the sort_values call above prints nothing: with inplace=True it returns None rather than a new series.
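
The following sketch makes the return value of an in-place sort explicit. The name result is only illustrative.

```python
import pandas as pd

s = pd.Series([6, 3, 8, 2, 9])
result = s.sort_values(ascending=False, inplace=True)

print(result)      # the in-place call returns None
print(s.tolist())  # the original object now holds the sorted values
```

A common mistake is to write s = s.sort_values(inplace=True), which replaces the series with None.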

pandas lets the user choose the sorting algorithm through the kind parameter. It accepts quicksort, mergesort, or heapsort (recent pandas versions also accept stable) and defaults to quicksort. Of these, only mergesort and stable are stable sorts, meaning that equal values keep their original relative order.

>>> s.sort_values(kind='quicksort')

3 2

1 3

0 6

2 8

4 9

dtype: int64
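
The choice of algorithm only becomes visible when the data contains ties. The sketch below uses made-up data with string labels to show that mergesort, being a stable sort, keeps tied values in their original relative order.

```python
import pandas as pd

# Two pairs of tied values; the labels record the original order.
s = pd.Series([2, 1, 2, 1], index=['a', 'b', 'c', 'd'])

# With a stable sort, 'b' stays ahead of 'd' and 'a' stays ahead of 'c'.
stable = s.sort_values(kind='mergesort')
print(stable.index.tolist())
```

With quicksort or heapsort the order of tied values is not guaranteed, so a stable kind is the safer choice when that order matters.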

Sometimes a series contains NA values, that is, missing values. When sorting, the NA values can be placed either at the beginning or at the end of the result. The na_position parameter controls this: it takes 'first' or 'last', and defaults to 'last'.

>>> import numpy as np

>>> s = pd.Series([6, 3, 8, np.nan, 2, 9])

>>> s.sort_values(na_position='last')

4 2.0

1 3.0

0 6.0

2 8.0

5 9.0

3 NaN

dtype: float64
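
As a quick sketch, the script below compares both settings of na_position and also checks that the sort direction does not move the missing value. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([6, 3, 8, np.nan, 2, 9])

first = s.sort_values(na_position='first')
last = s.sort_values(na_position='last')  # the default

# Label 3 is the NaN entry.
print(first.index.tolist())
print(last.index.tolist())

# na_position is independent of ascending: NaN is still placed last here.
descending = s.sort_values(ascending=False)
print(descending.index.tolist())
```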

The NA values can also be dropped before sorting, using the dropna method.

>>> s = pd.Series([6, 3, 8, np.nan, 2, 9])

>>> s.dropna().sort_values(na_position='last')

4 2.0

1 3.0

0 6.0

2 8.0

5 9.0

dtype: float64
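
Note that dropna itself returns a new series, so the original still contains the missing value. A minimal sketch, with the illustrative name cleaned:

```python
import numpy as np
import pandas as pd

s = pd.Series([6, 3, 8, np.nan, 2, 9])
cleaned = s.dropna().sort_values()

print(len(s), len(cleaned))  # the cleaned result is one element shorter
print(int(s.isna().sum()))   # the original still has its missing value
```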

In the examples so far, every value keeps its original index label after sorting. To discard those labels, set the ignore_index parameter to True (available in pandas 1.0.0 and later); the result is then labeled 0, 1, ..., n - 1. The default is False.

>>> s.sort_values(ignore_index=True, na_position='first')

0 NaN

1 2.0

2 3.0

3 6.0

4 8.0

5 9.0

dtype: float64
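
One way to think about ignore_index=True is as a sort followed by a fresh default index. The sketch below compares it with reset_index(drop=True), a separate pandas method used here only for illustration.

```python
import pandas as pd

s = pd.Series([6, 3, 8, 2, 9])

a = s.sort_values(ignore_index=True)
b = s.sort_values().reset_index(drop=True)

print(a.index.tolist())  # labels restart from 0
print(a.equals(b))       # both approaches give the same series
```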

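Sometimes it is convenient to sort by a value derived from the data rather than by the data itself. In such cases, pass a function through the key parameter (available in pandas 1.1.0 and later). The key function receives the whole series and must return an array-like of the same length; the returned values decide the order, while the output still shows the original values. Consider the example below, which does not use the key parameter: the uppercase letters sort ahead of the lowercase ones.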

>>> s = pd.Series(data=['a', 'B', 'c', 'D'])

>>> s.sort_values()

1 B

3 D

0 a

2 c

dtype: object

The same series can be sorted case-insensitively by passing a key function that lowercases the values.

>>> s.sort_values(key=lambda x: x.str.lower())

0 a

1 B

2 c

3 D

dtype: object
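
The key function is called once with the whole series, not once per element, so it should use vectorized operations. As a further sketch with made-up data, the script below sorts strings by their length.

```python
import pandas as pd

s = pd.Series(['pear', 'fig', 'banana'])

# The key receives the series and returns one sort key per element.
by_length = s.sort_values(key=lambda x: x.str.len())

print(by_length.tolist())        # original strings, ordered by length
print(by_length.index.tolist())  # original labels are preserved
```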

Instead of a lambda function, a NumPy function can also be used as the key. In the example below, np.sin computes the sine of each value, and the series is ordered by those sines.

>>> import numpy as np

>>> s = pd.Series([1, 2, 3, 4, 5])

>>> s.sort_values(key=np.sin)

4 5

3 4

2 3

0 1

1 2

dtype: int64
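
To see why the values come out in this order, it helps to look at the keys themselves. The sketch below computes the sines separately and checks that sorting by them gives the same order as the key parameter; the names keys and by_sine are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

keys = np.sin(s)  # one sort key per element, same index as s
print(keys.round(3).tolist())

by_sine = s.sort_values(key=np.sin)

# Sorting the keys directly yields the same label order.
print(by_sine.index.tolist() == keys.sort_values().index.tolist())
```

The sines of 5 and 4 are negative, so those values come first, followed by 3, 1, and 2.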

A series can also be sorted by its index labels instead of its values. To do so, call sort_index, as in the example below.

>>> s = pd.Series(data=[1, 2, 3, 4], index=['d', 'b', 'c', 'a'])

>>> s.sort_index()

a 4

b 2

c 3

d 1

dtype: int64

Sorting by index works much like sorting by values: sort_index accepts the same ascending, inplace, kind, na_position, ignore_index, and key parameters, but applies them to the index labels rather than the data.
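
As a short sketch of those shared parameters, the script below applies ascending and key to the index of a series with mixed-case labels (made-up data). The key parameter assumes pandas 1.1.0 or later.

```python
import pandas as pd

s = pd.Series(data=[1, 2, 3, 4], index=['d', 'B', 'c', 'a'])

default_order = s.sort_index()             # uppercase labels sort first
reverse_order = s.sort_index(ascending=False)

# For sort_index, the key function receives the index, not the values.
case_insensitive = s.sort_index(key=lambda idx: idx.str.lower())

print(default_order.index.tolist())
print(reverse_order.index.tolist())
print(case_insensitive.index.tolist())
```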

About the author

Arun Palaniappan