
Pandas Standard Deviation

Pandas is a popular Python library for data analysis, supported by a large ecosystem of data-centric Python packages that make importing and analyzing data easier. The standard deviation is the “typical” deviation of the values from their mean. It is widely used because it is expressed in the same units as the data itself. Pandas provides the std() function to compute it, and the values can come from a row or a column of a dataframe. We will work through the common ways in which the pandas standard deviation is used. For the code, we will use the “Spyder” tool, as it provides a Python-friendly environment.

Syntax

 

df.std()

 
The syntax above calculates the standard deviation of a dataframe. Here, “df” is short for “dataframe”. What does the standard deviation do? It measures how spread out the data is: the farther the values lie from the mean, the higher the standard deviation.
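
To make the idea of spread concrete, here is a minimal sketch. The column names “tight” and “wide” and their values are made up for illustration; both columns have the same mean but different spread. By default, pandas uses the sample formula (ddof=1), which divides by n - 1.

```python
import pandas as pd

# Hypothetical data: both columns have a mean of 50, but "wide" is more spread out.
df = pd.DataFrame({
    "tight": [48, 49, 50, 51, 52],
    "wide": [10, 30, 50, 70, 90],
})

# DataFrame.std() computes one standard deviation per column.
# The default is ddof=1, i.e. the sample standard deviation.
spread = df.std()
print(spread)
```

The “wide” column should come out with a much larger standard deviation than the “tight” column.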

Return

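When called on a dataframe, std() returns a Series holding one standard deviation per column (or per row, depending on the axis). When called on a single column, it returns a single number. Older pandas versions also accepted a “level” argument for multi-level indexes; it was removed in pandas 2.0.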
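
As a rough illustration of what comes back from std(), the sketch below uses a small made-up dataframe (the values are assumptions, not data from this article) and checks the type of each result.

```python
import pandas as pd

# Hypothetical values, chosen only to show the return types.
df = pd.DataFrame({"points": [10, 20, 30], "assists": [1, 2, 6]})

per_column = df.std()        # called on a dataframe: a Series indexed by column name
single = df["points"].std()  # called on one column: a single floating-point number

print(isinstance(per_column, pd.Series))
print(isinstance(single, float))
```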

Note that the function “std()” ignores “NaN” values in the “df” by default when calculating the standard deviation. “NaN” stands for “not a number”, which means that no value is assigned to a particular cell.
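
The NaN handling can be sketched as follows. The series values are made up for illustration; “skipna” is the std() parameter that controls this behavior, and it defaults to True.

```python
import numpy as np
import pandas as pd

# Hypothetical series with one missing value.
s = pd.Series([4.0, np.nan, 8.0, 12.0])

default_std = s.std()             # skipna=True by default: the NaN entry is ignored
strict_std = s.std(skipna=False)  # with skipna=False the NaN propagates to the result

print(default_std)
print(strict_std)
```

With the default setting, the result is computed from the three remaining values only.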

Following are the methods which will be executed with examples of the pandas standard deviation:

    • Pandas standard deviation calculation in a single column.
    • Pandas standard deviation calculation in multiple columns.
    • Pandas standard deviation calculation of all numeric columns.
    • Pandas standard deviation using axis = 0.
    • Pandas standard deviation using axis = 1.

Creating the Dataframe for the Calculation of Standard Deviation in Pandas

First, open the “Spyder” software and import the pandas library as pd. We will create a dataframe that represents a scoreboard with the teams “x”, “y”, and “z” and their points as “22”, “10”, “11”, “16”, “12”, “45”, “36”, and “40”. The assists values are “8”, “9”, “13”, “7”, “22”, “24”, “4”, and “6”, and the rebounds values are “17”, “14”, “3”, “5”, “9”, “8”, “7”, and “4”.
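
A minimal sketch of this setup is given here. The numeric values are the ones listed above; the text does not say which row belongs to which team, so the assignment of team labels to rows is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    # Assumption: the row-to-team assignment is not given in the text.
    "team": ["x", "x", "x", "y", "y", "y", "z", "z"],
    "points": [22, 10, 11, 16, 12, 45, 36, 40],
    "assists": [8, 9, 13, 7, 22, 24, 4, 6],
    "rebounds": [17, 14, 3, 5, 9, 8, 7, 4],
})

# Display the dataframe to confirm it was built as intended.
print(df)
```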


The output shows the created dataframe with the values assigned in the code:

Example # 01: Pandas Standard Deviation Calculation in a Single Column

In this example, we will calculate the standard deviation of a single column in a pandas dataframe. The dataframe has the teams “u”, “v”, and “b” with their points as “44”, “33”, “22”, “44”, “45”, “88”, “96”, and “78”. The assists values are “7”, “8”, “9”, “10”, “11”, “14”, “18”, and “17”, and the rebounds values are “11”, “9”, “8”, “7”, “6”, “5”, “4”, and “3”. The “points” column is selected from the dataframe to calculate the single-column standard deviation.
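
The single-column calculation might look like the sketch below. The numbers come from the description above; the assignment of team labels to rows is an assumption, since it is not stated.

```python
import pandas as pd

df = pd.DataFrame({
    # Assumption: the row-to-team assignment is not given in the text.
    "team": ["u", "u", "u", "v", "v", "v", "b", "b"],
    "points": [44, 33, 22, 44, 45, 88, 96, 78],
    "assists": [7, 8, 9, 10, 11, 14, 18, 17],
    "rebounds": [11, 9, 8, 7, 6, 5, 4, 3],
})

# Select one column (a Series) and compute its sample standard deviation.
points_std = df["points"].std()
print(points_std)
```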


The output shows the calculated standard deviation of the “points” column:

Example # 02: Pandas Standard Deviation Calculation in Multiple Columns

In this example, we will calculate the pandas standard deviation for multiple columns. The dataframe again holds a sports scoreboard, with the teams “n”, “w”, and “t” and the points as “33”, “22”, “66”, “55”, “44”, “88”, “99”, and “77”. The assists are “9”, “7”, “8”, “11”, “16”, “14”, “12”, and “13”, and the rebounds are “5”, “8”, “1”, “2”, “3”, “4”, “6”, and “7”. Here we will calculate the standard deviation of the two columns “points” and “rebounds” by selecting them from the dataframe and applying the std() function.
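
One way to write this is sketched below. The numbers are taken from the description above; the assignment of team labels to rows is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    # Assumption: the row-to-team assignment is not given in the text.
    "team": ["n", "n", "n", "w", "w", "w", "t", "t"],
    "points": [33, 22, 66, 55, 44, 88, 99, 77],
    "assists": [9, 7, 8, 11, 16, 14, 12, 13],
    "rebounds": [5, 8, 1, 2, 3, 4, 6, 7],
})

# Passing a list of column names selects a sub-dataframe;
# std() then returns one value per selected column.
cols_std = df[["points", "rebounds"]].std()
print(cols_std)
```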


As we can see, the output shows a standard deviation of 26.944387 for the “points” column and 2.449490 for the “rebounds” column.

Example # 03: Pandas Standard Deviation Calculation of All Numeric Columns

We have now learned how to calculate the standard deviation of single and multiple columns. What if we do not want to specify the column names and instead want to calculate the standard deviation for the whole dataframe? This is possible with a single call to std() on the dataframe itself. The dataframe here consists of the teams “l”, “m”, and “o” with the points “33”, “36”, “79”, “78”, “58”, “55”, and two entries with the same score of “25”. The assists are “1”, “2”, “3”, “4”, “6”, “9”, “5”, and “7”, and the rebounds are “14”, “10”, “2”, “5”, “8”, “3”, “6”, and “9”. We can calculate the standard deviation of every numeric column in the dataframe using the pandas “std()” function. In pandas 2.0 and later, pass numeric_only=True so that non-numeric columns such as “team” are skipped instead of raising an error.
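
A sketch of the whole-dataframe calculation follows. The numbers come from the description above; the row-to-team assignment and the position of the two “25” scores are assumptions (the standard deviation does not depend on row order). The numeric_only=True argument is included because pandas 2.0 and later raise an error when std() meets a text column.

```python
import pandas as pd

df = pd.DataFrame({
    # Assumption: the row-to-team assignment is not given in the text.
    "team": ["l", "l", "l", "m", "m", "m", "o", "o"],
    # Assumption: the two scores of 25 are placed last.
    "points": [33, 36, 79, 78, 58, 55, 25, 25],
    "assists": [1, 2, 3, 4, 6, 9, 5, 7],
    "rebounds": [14, 10, 2, 5, 8, 3, 6, 9],
})

# numeric_only=True restricts the calculation to numeric columns,
# so the text column "team" is left out of the result.
all_std = df.std(numeric_only=True)
print(all_std)
```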


The output below shows the calculated standard deviation for the whole “df”. Notice that pandas has not calculated a standard deviation for the first column, “team”, because it is not a numeric column.

Example # 04: Pandas Standard Deviation Using the Axis = 0

In this example, the dataframe has the teams “g”, “h”, and “k” along with their points, assists, and rebounds. Here, we will calculate the standard deviation with the axis parameter of std() set to “0”. With this argument, the standard deviation is calculated column-wise, that is, one value for each column of the dataframe.
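
The column-wise call can be sketched as below. The data values for this example are not listed in the text, so every number in this dataframe is a placeholder, and the results will differ from the figures quoted for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["g", "g", "h", "h", "k", "k"],
    # Placeholder values: the article does not list the data for this example.
    "points": [10, 20, 30, 40, 50, 60],
    "assists": [2, 4, 6, 8, 10, 12],
    "rebounds": [3, 3, 5, 5, 7, 7],
})

# axis=0 walks down the rows and yields one standard deviation per column.
# numeric_only=True skips the text column "team".
col_std = df.std(axis=0, numeric_only=True)
print(col_std)
```

Since axis=0 is the default, df.std(numeric_only=True) gives the same result.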


The following output displays the standard deviation calculated for each column. The “points” column has a standard deviation of “24.0313062”, the “assists” column has a standard deviation of “2.669270”, and the “rebounds” column has a standard deviation of “3.943802”.

Example # 05: Pandas Standard Deviation Using the Axis = 1

Here we will set the axis parameter to “1” to compute the standard deviation in pandas. What difference does axis “1” make? With axis “1”, std() calculates the standard deviation row-wise across the numeric values in the dataframe. The dataframe has the three teams “s”, “d”, and “e”, along with columns for each team’s points, assists, and rebounds, all assigned different values. This parameter is useful when each row is the unit of interest and we want the spread across its columns rather than the spread within each column.
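
A row-wise sketch is shown below. The data for this example is not listed in the text, so the values are placeholders chosen to keep the arithmetic easy to follow.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["s", "d", "e"],
    # Placeholder values: the article does not list the data for this example.
    "points": [10, 20, 30],
    "assists": [10, 22, 36],
    "rebounds": [10, 24, 30],
})

# axis=1 walks across the columns and yields one standard deviation per row.
# numeric_only=True keeps the text column "team" out of the calculation.
row_std = df.std(axis=1, numeric_only=True)
print(row_std)
```

The result is indexed by row label rather than by column name, so it has one entry per team row.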


The following output displays the standard deviation calculated for each row of the dataframe:

Conclusion

The pandas std() function is a useful tool that computes the standard deviation of the data held in pandas dataframes. In this article, we have studied several ways of calculating the standard deviation in pandas: for a single column, for multiple columns, for all numeric columns of a dataframe at once, and along either axis. All of these approaches work well as long as they are applied consistently and match the result you need.

About the author

Aqsa Yasin

I am a self-motivated information technology professional with a passion for writing. I am a technical writer and love to write for all Linux flavors and Windows.