
Pandas JSON Normalize

JSON stands for JavaScript Object Notation.

Pandas is the most popular data processing library in Python, and json_normalize() is one of its built-in functions. It flattens semi-structured JSON data, such as nested dictionaries, into a flat table (a DataFrame). It is one of the simplest ways to tabulate JSON data, for example the data returned by the Python requests module.
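JSON usually arrives as text, so the typical workflow is to parse it into Python objects first and then pass those objects to json_normalize(). The sketch below assumes the JSON text is already in a string (the sample data is made up for illustration); with the requests module, response.json() would return the same kind of Python objects.

```python
import json

import pandas

# Hypothetical JSON text, e.g. the body of an HTTP response
raw_text = '[{"state": "AP", "code": "APH456"}, {"state": "TS", "code": "SCVH456"}]'

# Parse the JSON text into a list of Python dictionaries
records = json.loads(raw_text)

# Flatten the parsed records into a DataFrame
df = pandas.json_normalize(records)
print(df)
```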

In this article, we will see different levels of normalization.

Syntax

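pandas.json_normalize(data, record_path=None, record_prefix=None, max_level=None)

json_normalize() accepts a few more parameters (such as meta, sep and errors); only the ones used in this article are listed here.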

Here:

  1. data can be a dictionary or a list of dictionaries.
  2. record_path is the key (or list of keys) that leads to a list of records inside data. If it is not given, data itself is treated as the records. By default, it is None.
  3. record_prefix is a string added in front of the column labels of the records after normalization. It takes effect together with record_path. By default, it is None.
  4. max_level specifies the maximum number of nesting levels to normalize. It takes an integer; if it is None (the default), all levels are normalized.
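To make the meaning of a "level" concrete, here is a small sketch with a made-up record nested three dictionaries deep. Flattened column names are joined with a dot, which is the default separator.

```python
import pandas

# A single hypothetical record nested three levels deep
record = {"a": {"b": {"c": 1}}}

# max_level=None (the default) flattens every level
full = pandas.json_normalize(record)
print(list(full.columns))   # ['a.b.c']

# max_level=1 flattens one level; the innermost dict stays intact
one = pandas.json_normalize(record, max_level=1)
print(list(one.columns))    # ['a.b']

# max_level=0 leaves the top-level value as a dict
zero = pandas.json_normalize(record, max_level=0)
print(list(zero.columns))   # ['a']
```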

Example 1: With Data as Parameter

Here, we pass only the JSON data, so all levels are normalized. Let’s create a list of five dictionaries and normalize it.

import pandas

# Consider the JSON data
actual_json_data = [
    {"state": "AP", "code": "APH456"},
    {"state": "TS", "code": "SCVH456"},
    {"state": "MUM", "code": "TYH4543"},
    {"state": "PUN", "code": "AYU78BN6"},
    {"state": "BNG", "code": "RE456"},
]
print(actual_json_data)

# normalize the above data
normalized = pandas.json_normalize(actual_json_data)

# Display the normalized data
print(normalized)

Output

[{'state': 'AP', 'code': 'APH456'}, {'state': 'TS', 'code': 'SCVH456'}, {'state': 'MUM', 'code': 'TYH4543'}, {'state': 'PUN', 'code': 'AYU78BN6'}, {'state': 'BNG', 'code': 'RE456'}]

  state      code
0    AP    APH456
1    TS   SCVH456
2   MUM   TYH4543
3   PUN  AYU78BN6
4   BNG     RE456

Explanation

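The list of dictionaries is converted into a pandas DataFrame with one row per dictionary and one column per key. Because max_level is not given, all levels are normalized; this data has no nesting, so each key simply becomes a column.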
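For flat records like these, json_normalize() ends up with the same table as the plain DataFrame constructor. The sketch below checks that on two of the records from this example; the difference between the two only shows up once values are nested dictionaries.

```python
import pandas

# Flat records: no value is itself a dictionary
flat_records = [
    {"state": "AP", "code": "APH456"},
    {"state": "TS", "code": "SCVH456"},
]

normalized = pandas.json_normalize(flat_records)
constructed = pandas.DataFrame(flat_records)

# Raises an AssertionError if the two frames differ
pandas.testing.assert_frame_equal(normalized, constructed)
print("identical")
```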

Example 2

Let’s create a list of five dictionaries in which some keys are missing and normalize it.

import pandas

# Consider the JSON data
actual_json_data = [
    {"state": "AP", "code": "APH456", "length": 100},
    {"state": "TS", "code": "SCVH456"},
    {"state": "MUM", "length": 200},
    {"state": "PUN", "code": "AYU78BN6"},
    {"state": "BNG", "length": 300},
]
print(actual_json_data)

# normalize the above data
normalized = pandas.json_normalize(actual_json_data)

# Display the normalized data
print(normalized)

Output

[{'state': 'AP', 'code': 'APH456', 'length': 100}, {'state': 'TS', 'code': 'SCVH456'}, {'state': 'MUM', 'length': 200}, {'state': 'PUN', 'code': 'AYU78BN6'}, {'state': 'BNG', 'length': 300}]

  state      code length
0    AP    APH456  100.0
1    TS   SCVH456    NaN
2   MUM       NaN  200.0
3   PUN  AYU78BN6    NaN
4   BNG       NaN  300.0

Explanation

As before, all levels are normalized. When a key is missing from a dictionary, NaN is filled in at that position. Because of these NaN values, the length column is stored as floating point, which is why 100 is shown as 100.0.
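The missing cells can be inspected directly. The sketch below reuses the data from this example, counts the NaN values per column, and shows one optional way to get whole-number lengths back: the nullable "Int64" dtype. Whether that conversion suits your data depends on what you do with the column next.

```python
import pandas

actual_json_data = [
    {"state": "AP", "code": "APH456", "length": 100},
    {"state": "TS", "code": "SCVH456"},
    {"state": "MUM", "length": 200},
    {"state": "PUN", "code": "AYU78BN6"},
    {"state": "BNG", "length": 300},
]

normalized = pandas.json_normalize(actual_json_data)

# Count the missing (NaN) cells in each column
missing = normalized.isna().sum()
print(missing)

# NaN forces the integer column to floating point
original_dtype = normalized["length"].dtype
print(original_dtype)   # float64

# Optional: a nullable integer dtype keeps whole numbers and shows <NA>
normalized["length"] = normalized["length"].astype("Int64")
print(normalized)
```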

Example 3: With the max_level Parameter

Let’s create a list of five dictionaries in which two of the state values are nested dictionaries, and normalize it only up to level 0.

import pandas

# Consider the JSON data
actual_json_data = [
    {"state": {"state 1": "AP", "state 2": "Ind", "state 3": "Cal"}, "code": "APH456", "length": 100},
    {"state": "TS", "code": "SCVH456", "length": 160},
    {"state": "MUM", "code": "SAVH4MO6", "length": 200},
    {"state": "PUN", "code": "AYU78BN6", "length": 200},
    {"state": {"state 1": "BNG", "state 2": "TLN"}, "code": "AYU78BN6", "length": 300},
]

# normalize the above data up to level 0
normalized = pandas.json_normalize(actual_json_data, max_level=0)

# Display the normalized data
print(normalized)

Output

                                               state      code length
0  {'state 1': 'AP', 'state 2': 'Ind', 'state 3':...    APH456    100
1                                                 TS   SCVH456    160
2                                                MUM  SAVH4MO6    200
3                                                PUN  AYU78BN6    200
4               {'state 1': 'BNG', 'state 2': 'TLN'}  AYU78BN6    300

Explanation

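With max_level=0, only the top level is normalized: the nested dictionaries stay as dict objects in the state column (the first one is truncated in the display). The data in the state column can still be normalized further.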
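One way to flatten the state column after the fact is to run json_normalize() again on just that column. The sketch below is one possible approach, not the only one: it assumes plain-string states (such as "TS") should be left in the original state column, so it swaps them for empty dictionaries before expanding; those rows get NaN in the new columns.

```python
import pandas

actual_json_data = [
    {"state": {"state 1": "AP", "state 2": "Ind", "state 3": "Cal"}, "code": "APH456", "length": 100},
    {"state": "TS", "code": "SCVH456", "length": 160},
    {"state": "MUM", "code": "SAVH4MO6", "length": 200},
    {"state": "PUN", "code": "AYU78BN6", "length": 200},
    {"state": {"state 1": "BNG", "state 2": "TLN"}, "code": "AYU78BN6", "length": 300},
]

normalized = pandas.json_normalize(actual_json_data, max_level=0)

# Replace non-dict values (plain strings such as "TS") with empty dicts
nested = [v if isinstance(v, dict) else {} for v in normalized["state"]]

# Flatten the nested dicts; rows that held a string become all-NaN
expanded = pandas.json_normalize(nested).add_prefix("state.")

# Attach the new columns next to the original ones, aligned on the row index
result = normalized.join(expanded)
print(result)
```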

Let’s normalize up to level 1 in the next example.

Example 4

Let’s normalize the same list of five dictionaries up to level 1.

import pandas

# Consider the JSON data
actual_json_data = [
    {"state": {"state 1": "AP", "state 2": "Ind", "state 3": "Cal"}, "code": "APH456", "length": 100},
    {"state": "TS", "code": "SCVH456", "length": 160},
    {"state": "MUM", "code": "SAVH4MO6", "length": 200},
    {"state": "PUN", "code": "AYU78BN6", "length": 200},
    {"state": {"state 1": "BNG", "state 2": "TLN"}, "code": "AYU78BN6", "length": 300},
]

# normalize the above data up to level 1
normalized = pandas.json_normalize(actual_json_data, max_level=1)

# Display the normalized data
print(normalized)

Output

       code length state.state 1 state.state 2 state.state 3 state
0    APH456    100            AP           Ind           Cal   NaN
1   SCVH456    160           NaN           NaN           NaN    TS
2  SAVH4MO6    200           NaN           NaN           NaN   MUM
3  AYU78BN6    200           NaN           NaN           NaN   PUN
4  AYU78BN6    300           BNG           TLN           NaN   NaN

Explanation

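The nested dictionaries in the state column are now split into the columns state.state 1, state.state 2 and state.state 3. Because this data is nested only one level deep, max_level=1 already gives full normalization. Rows whose state was a plain string keep it in the state column, and every other missing cell is filled with NaN.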
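A quick way to confirm that level 1 is already the maximum for this data is to compare it with the default max_level=None. The sketch below does that; check_like=True is passed so that the comparison ignores column order.

```python
import pandas

actual_json_data = [
    {"state": {"state 1": "AP", "state 2": "Ind", "state 3": "Cal"}, "code": "APH456", "length": 100},
    {"state": "TS", "code": "SCVH456", "length": 160},
    {"state": "MUM", "code": "SAVH4MO6", "length": 200},
    {"state": "PUN", "code": "AYU78BN6", "length": 200},
    {"state": {"state 1": "BNG", "state 2": "TLN"}, "code": "AYU78BN6", "length": 300},
]

level_one = pandas.json_normalize(actual_json_data, max_level=1)
all_levels = pandas.json_normalize(actual_json_data)

# Raises an AssertionError if the frames differ (column order is ignored)
pandas.testing.assert_frame_equal(level_one, all_levels, check_like=True)
print("same result")
```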

Example 5: With record_prefix as Parameter

Create a dictionary whose “state” key holds a list of three states, pass “state” as record_path, and add the prefix “I-” to the resulting column label with record_prefix.

import pandas

# Consider the JSON data
actual_json_data = {"state": ["AP", "TS", "PNU"]}

# normalize the above data by passing the record_prefix parameter
normalized = pandas.json_normalize(actual_json_data, record_path="state", record_prefix="I-")

# Display the normalized data
print(normalized)

Output

   I-0
0   AP
1   TS
2  PNU

Explanation

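The values in the state list are plain strings rather than dictionaries, so after normalization pandas places them in a single column with the default label 0. record_prefix adds “I-” in front of that label, which gives I-0.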
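When the records under record_path are dictionaries, record_prefix is added to every field name instead of to a single 0 label. The sketch below uses made-up data to show this. Note that keys outside the record path, such as country here, are not included unless they are requested through the meta parameter.

```python
import pandas

# Hypothetical data: each record under "states" is itself a dictionary
data = {
    "country": "IN",
    "states": [
        {"name": "AP", "code": "APH456"},
        {"name": "TS", "code": "SCVH456"},
    ],
}

# record_path selects the list of records; record_prefix is prepended to each field name
normalized = pandas.json_normalize(data, record_path="states", record_prefix="I-")
print(normalized)
```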

Conclusion

Pandas JSON normalization is an effective and convenient way to convert semi-structured JSON data into a flat DataFrame. In this article, we used json_normalize() with only the data parameter, with max_level set to 0 and 1 to control how deeply nested dictionaries are flattened, and with record_path and record_prefix to extract a list of records and prefix the resulting column label. Missing keys are filled with NaN, so the records do not need to share exactly the same fields.

About the author

Gottumukkala Sravan Kumar

B.Tech (Hons) in Information Technology; known programming languages: Python, R, PHP, MySQL; published 500+ articles in the computer science domain