The pandas stack() method reshapes a DataFrame by moving a level of column labels into the row index. For single-level columns, the result is a Series with a two-level index; for multi-level columns, it is a new DataFrame whose index has one extra level. This article shows how to use the pandas stack() function.
Syntax
pandas.DataFrame.stack(level=-1, dropna=True)
Parameters
- level – An integer, a level name, or a list of these that specifies which column level(s) to stack. For example, level=0 stacks the outermost level and level=1 the next one. The default, -1, stacks the innermost level.
- dropna – Whether to drop rows in the stacked result that contain only NaN values. The default is True.
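To illustrate how "level" chooses which column level moves into the index, here is a minimal sketch. The score table, the subject names, and the variable names (scores, by_exam, by_subject) are made up for illustration and are not part of the scenarios in this article.

```python
import pandas

# Hypothetical data: two subjects, each with an external and an internal score.
columns = pandas.MultiIndex.from_product(
    [["Math", "Science"], ["External", "Internal"]]
)
scores = pandas.DataFrame(
    [[55, 40, 60, 35], [45, 30, 50, 20]],
    index=["Ram", "Sravan"],
    columns=columns,
)

# Default (level=-1): the innermost column level moves into the row index.
by_exam = scores.stack()

# level=0: the outermost column level moves into the row index instead.
by_subject = scores.stack(level=0)

# The same cell is reachable in both layouts, with index and column swapped.
print(by_exam.loc[("Ram", "External"), "Math"])      # 55
print(by_subject.loc[("Ram", "Math"), "External"])   # 55
```

Because every subject has both exam types, no combination is missing and neither result contains NaN.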
Scenario 1: Single-Level Column
Create a DataFrame with 3 columns and 5 rows. The column names are ["Exam type", "Marks", "Result"].
Now, we will stack the DataFrame:
import pandas

# Create a DataFrame with 3 columns and 5 rows
results = pandas.DataFrame([["Internal", 98, "pass"],
                            ["Internal", 45, "fail"],
                            ["External", 89, "pass"],
                            ["External", 67, "pass"],
                            ["External", 18, "fail"]],
                           columns=["Exam type", "Marks", "Result"],
                           index=['Ram', 'Sravan', 'Govind', 'Anup', 'Jab'])
print(results, "\n")

# Apply stack() on single level column
print(results.stack())
Output
       Exam type Marks Result
Ram     Internal    98   pass
Sravan  Internal    45   fail
Govind  External    89   pass
Anup    External    67   pass
Jab     External    18   fail

Ram     Exam type    Internal
        Marks              98
        Result           pass
Sravan  Exam type    Internal
        Marks              45
        Result           fail
Govind  Exam type    External
        Marks              89
        Result           pass
Anup    Exam type    External
        Marks              67
        Result           pass
Jab     Exam type    External
        Marks              18
        Result           fail
dtype: object
Explanation
The stacked result is displayed after the original DataFrame. Let’s look at one row in detail.
Ram – Exam type is ‘Internal’, Ram – Marks is 98, and Ram – Result is ‘pass’. The values of all the remaining rows are stacked in the same way.
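As a small sketch of how the stacked values can be read back, the snippet below uses a two-row subset of the data above (the subset and the variable name stacked are chosen here for illustration). Each value is addressed by a (row label, column name) pair.

```python
import pandas

results = pandas.DataFrame(
    [["Internal", 98, "pass"], ["Internal", 45, "fail"]],
    columns=["Exam type", "Marks", "Result"],
    index=["Ram", "Sravan"],
)
stacked = results.stack()

# The result is a Series whose index has two levels: (row label, column name).
print(stacked.index.nlevels)            # 2

# Look up one value with a (row label, column name) tuple.
print(stacked.loc[("Ram", "Marks")])    # 98

# Select everything that belongs to one row label.
print(stacked.loc["Sravan"])
```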
Scenario 2: Multi-Level Column
One way to create a MultiIndex in pandas is the MultiIndex.from_tuples() method. It takes a list of tuples as a parameter, where each tuple holds the labels of one column, one label per level. We then pass the resulting MultiIndex to the “columns” parameter of the pandas DataFrame.
Syntax
pandas.MultiIndex.from_tuples(tuples, sortorder=None, names=None)
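A minimal sketch of the construction on its own, using the same column tuples as the examples in this scenario (the variable name columns is chosen here for illustration), shows how the tuples map to the two levels:

```python
import pandas

# Each tuple is (outer label, inner label) for one column.
columns = pandas.MultiIndex.from_tuples(
    [("Exams", "Exam Type"), ("Marks Secured", "Total"), ("Status", "Result")]
)

print(columns.nlevels)                      # 2
print(list(columns.get_level_values(0)))    # ['Exams', 'Marks Secured', 'Status']
print(list(columns.get_level_values(1)))    # ['Exam Type', 'Total', 'Result']
```

Level 0 holds the outer labels and level 1 the inner labels, which is what the "level" argument of stack() refers to.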
Example 1
Create a DataFrame with multi-level (MultiIndex) columns. Stack the DataFrame with level 0.
import pandas

# Create a DataFrame whose columns are a two-level MultiIndex
results = pandas.DataFrame([["Internal", 98, "pass"],
                            ["Internal", 45, "fail"],
                            ["External", 89, "pass"],
                            ["External", 89, "pass"],
                            ["External", 45, "fail"]],
                           index=['Ram', 'Sravan', 'Govind', 'Anup', 'Jab'],
                           columns=pandas.MultiIndex.from_tuples(
                               [('Exams', 'Exam Type'),
                                ('Marks Secured', 'Total'),
                                ('Status', 'Result')]))
print(results, "\n")

# Apply stack() with level-0 on multi level column
print(results.stack(level=0))
Output
           Exams Marks Secured Status
       Exam Type         Total Result
Ram     Internal            98   pass
Sravan  Internal            45   fail
Govind  External            89   pass
Anup    External            89   pass
Jab     External            45   fail

                      Exam Type Result  Total
Ram    Exams           Internal    NaN    NaN
       Marks Secured        NaN    NaN   98.0
       Status               NaN   pass    NaN
Sravan Exams           Internal    NaN    NaN
       Marks Secured        NaN    NaN   45.0
       Status               NaN   fail    NaN
Govind Exams           External    NaN    NaN
       Marks Secured        NaN    NaN   89.0
       Status               NaN   pass    NaN
Anup   Exams           External    NaN    NaN
       Marks Secured        NaN    NaN   89.0
       Status               NaN   pass    NaN
Jab    Exams           External    NaN    NaN
       Marks Secured        NaN    NaN   45.0
       Status               NaN   fail    NaN
Explanation
The stacked result has a two-level row index: the original row label and the outer column label. The inner column labels remain as columns. For the row ‘Ram’:
- Ram – For index ‘Exams’ and column ‘Exam Type’ – the value is ‘Internal’.
- Ram – For index ‘Exams’ and column ‘Result’ – the value is NaN (Not a Number).
- Ram – For index ‘Exams’ and column ‘Total’ – the value is NaN.
- Ram – For index ‘Marks Secured’ and column ‘Exam Type’ – the value is NaN.
- Ram – For index ‘Marks Secured’ and column ‘Result’ – the value is NaN.
- Ram – For index ‘Marks Secured’ and column ‘Total’ – the value is 98.0.
- Ram – For index ‘Status’ and column ‘Exam Type’ – the value is NaN.
- Ram – For index ‘Status’ and column ‘Result’ – the value is ‘pass’.
- Ram – For index ‘Status’ and column ‘Total’ – the value is NaN.
All other rows are stacked in the same way. Each outer label has only one inner label in this DataFrame, so every other combination has no data and is filled with NaN.
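To make the NaN pattern concrete, the following sketch counts filled and empty cells on a two-row subset of the same data (the subset and the names stacked and flat are chosen here for illustration). It also shows, as one possible alternative, stacking both column levels at once so that only the real values remain.

```python
import pandas

results = pandas.DataFrame(
    [["Internal", 98, "pass"], ["External", 89, "pass"]],
    index=["Ram", "Govind"],
    columns=pandas.MultiIndex.from_tuples(
        [("Exams", "Exam Type"), ("Marks Secured", "Total"), ("Status", "Result")]
    ),
)

stacked = results.stack(level=0)

# 2 rows x 3 outer labels = 6 stacked rows, each with 3 inner-label columns.
print(stacked.shape)                      # (6, 3)

# Only one cell per stacked row comes from the original data.
print(int(stacked.notna().sum().sum()))   # 6
print(int(stacked.isna().sum().sum()))    # 12

# Stacking both levels at once gives a Series that holds only the real values.
flat = results.stack(level=[0, 1])
print(len(flat))                          # 6
```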
Example 2
Create a DataFrame with multi-level (MultiIndex) columns. Stack the DataFrame with level 1.
import pandas

# Create a DataFrame whose columns are a two-level MultiIndex
results = pandas.DataFrame([["Internal", 98, "pass"],
                            ["Internal", 45, "fail"],
                            ["External", 89, "pass"],
                            ["External", 67, "pass"],
                            ["External", 18, "fail"]],
                           index=['Ram', 'Sravan', 'Govind', 'Anup', 'Jab'],
                           columns=pandas.MultiIndex.from_tuples(
                               [('Exams', 'Exam Type'),
                                ('Marks Secured', 'Total'),
                                ('Status', 'Result')]))

# Apply stack() with level-1 on multi level column
print(results.stack(level=1))
Output
                     Exams  Marks Secured Status
Ram    Exam Type  Internal            NaN    NaN
       Result          NaN            NaN   pass
       Total           NaN           98.0    NaN
Sravan Exam Type  Internal            NaN    NaN
       Result          NaN            NaN   fail
       Total           NaN           45.0    NaN
Govind Exam Type  External            NaN    NaN
       Result          NaN            NaN   pass
       Total           NaN           89.0    NaN
Anup   Exam Type  External            NaN    NaN
       Result          NaN            NaN   pass
       Total           NaN           67.0    NaN
Jab    Exam Type  External            NaN    NaN
       Result          NaN            NaN   fail
       Total           NaN           18.0    NaN
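Here the inner labels (‘Exam Type’, ‘Result’, ‘Total’) move into the index, and the outer labels remain as columns. A minimal sketch on a two-row subset of the same data (the subset and the variable name inner are chosen here for illustration) checks this. It also checks that, for two-level columns, level=1 refers to the same level as the default level=-1.

```python
import pandas

results = pandas.DataFrame(
    [["Internal", 98, "pass"], ["External", 18, "fail"]],
    index=["Ram", "Jab"],
    columns=pandas.MultiIndex.from_tuples(
        [("Exams", "Exam Type"), ("Marks Secured", "Total"), ("Status", "Result")]
    ),
)

inner = results.stack(level=1)

# The remaining columns are the outer labels.
print(sorted(inner.columns))            # ['Exams', 'Marks Secured', 'Status']

# With two column levels, level=1 and the default level=-1 name the same level.
print(inner.equals(results.stack()))    # True

# A mark is now addressed by (row label, inner label) and the outer label.
print(inner.loc[("Jab", "Total"), "Marks Secured"])
```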
Conclusion
The pandas stack() method moves column levels into the row index. It is useful when the data is laid out in columns but needs to be processed row by row, because a single method call does the reshaping. We covered two situations: stacking a DataFrame with single-level columns, and stacking a DataFrame with multi-level columns at level 0 and at level 1. Which level to stack depends on the layout you want in the resulting DataFrame.