JSON stands for “JavaScript Object Notation”.
Pandas, one of the most popular data processing libraries in Python, has a built-in function called json_normalize(). It flattens semi-structured JSON data, such as nested dictionaries, into a flat DataFrame.
In this article, we will see how json_normalize() behaves at different levels of normalization.
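Syntax
pandas.json_normalize(data, max_level=None, record_prefix=None)
Here (only the parameters used in this article are shown):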
- data can be a dictionary or a list of dictionaries.
- max_level specifies the maximum number of nesting levels to normalize. It takes an integer; the default is None, which normalizes all levels.
- record_prefix adds a prefix to the column labels of the records that are extracted through a record path. By default, it is None.
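The effect of max_level described above can be sketched as follows. The sample record and its keys are hypothetical and chosen only to show two levels of nesting; the calls themselves use the documented data and max_level parameters.

```python
import pandas as pd

# Hypothetical record with two levels of nesting (illustration only)
sample = {"id": 1, "info": {"name": "A", "geo": {"lat": 1.5}}}

# Default (max_level=None): every nested level is flattened
full = pd.json_normalize(sample)

# max_level=1: only the first nested level is flattened,
# so the inner "geo" dictionary is kept as a single value
shallow = pd.json_normalize(sample, max_level=1)

print(list(full.columns))
print(list(shallow.columns))
```

Nested keys are joined with a dot by default, so the fully flattened frame has a column named info.geo.lat, while the shallow one keeps a dictionary in info.geo.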
Example 1: With Data as Parameter
Here, we pass only the JSON data, so all levels are normalized. Let’s create a list of five dictionaries and normalize it.
import pandas

# Consider the JSON data
actual_json_data = [
{"state": "AP", "code": "APH456"},
{"state": "TS", "code": "SCVH456"},
{"state": "MUM", "code": "TYH4543"},
{"state": "PUN", "code": "AYU78BN6"},
{"state": "BNG", "code": "RE456"},
]
print(actual_json_data)
# normalize the above data
normalized = pandas.json_normalize(actual_json_data)
# Display the normalized data
print(normalized)
Output
state code
0 AP APH456
1 TS SCVH456
2 MUM TYH4543
3 PUN AYU78BN6
4 BNG RE456
Explanation
The JSON data is converted into a Pandas DataFrame with all levels normalized. Because the dictionaries are flat, each key simply becomes a column.
Example 2
Let’s create a list of five dictionaries in which some keys are missing and normalize it.
import pandas

# Consider the JSON data
actual_json_data = [
{"state": "AP", "code": "APH456","length":100},
{"state": "TS", "code": "SCVH456"},
{"state": "MUM", "length":200},
{"state": "PUN", "code": "AYU78BN6"},
{"state": "BNG","length":300},
]
print(actual_json_data)
# normalize the above data
normalized = pandas.json_normalize(actual_json_data)
# Display the normalized data
print(normalized)
Output
state code length
0 AP APH456 100.0
1 TS SCVH456 NaN
2 MUM NaN 200.0
3 PUN AYU78BN6 NaN
4 BNG NaN 300.0
Explanation
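The JSON data is converted into a Pandas DataFrame with all levels normalized. Where a dictionary has no value for a key, NaN is placed at that position. Because NaN is a floating-point value, the length column is displayed as floats (100.0 instead of 100).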
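As a small sketch of how the missing positions could be inspected and filled afterwards, the snippet below uses a shortened version of the data from this example. The replacement values "UNKNOWN" and 0 are arbitrary choices made for illustration, not something json_normalize() does by itself.

```python
import pandas as pd

# Shortened version of the data above: some keys are missing
rows = [
    {"state": "AP", "code": "APH456", "length": 100},
    {"state": "TS", "code": "SCVH456"},
    {"state": "MUM", "length": 200},
]

df = pd.json_normalize(rows)

# Count the NaN entries in each column
missing = df.isna().sum()
print(missing)

# Fill the gaps with per-column placeholder values (arbitrary choices)
filled = df.fillna({"code": "UNKNOWN", "length": 0})
print(filled)
```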
Example 3: With the max_level Parameter
Let’s create a list of five dictionaries, two of which hold a nested dictionary under the state key, and normalize it up to level 0.
import pandas

# Consider the JSON data
actual_json_data = [
{"state": {"state 1": "AP","state 2":"Ind","state 3":"Cal"}, "code": "APH456","length":100},
{"state": "TS", "code": "SCVH456","length":160},
{"state": "MUM", "code": "SAVH4MO6","length":200},
{"state": "PUN", "code": "AYU78BN6","length":200},
{"state": {"state 1":"BNG","state 2":"TLN"},"code": "AYU78BN6","length":300},
]
# normalize the above data up to level 0
normalized = pandas.json_normalize(actual_json_data,max_level=0)
# Display the normalized data
print(normalized)
Output
state code length
0 {'state 1': 'AP', 'state 2': 'Ind', 'state 3':... APH456 100
1 TS SCVH456 160
2 MUM SAVH4MO6 200
3 PUN AYU78BN6 200
4 {'state 1': 'BNG', 'state 2': 'TLN'} AYU78BN6 300
Explanation
Normalization is done only up to level 0, so the nested dictionaries in the state column are left as they are. That column can still be normalized further.
Let’s normalize up to level 1 in the next example.
Example 4
Let’s use the same list of five dictionaries and normalize it up to level 1.
import pandas

# Consider the JSON data
actual_json_data = [
{"state": {"state 1": "AP","state 2":"Ind","state 3":"Cal"}, "code": "APH456","length":100},
{"state": "TS", "code": "SCVH456","length":160},
{"state": "MUM", "code": "SAVH4MO6","length":200},
{"state": "PUN", "code": "AYU78BN6","length":200},
{"state": {"state 1":"BNG","state 2":"TLN"},"code": "AYU78BN6","length":300},
]
# normalize the above data up to level 1
normalized = pandas.json_normalize(actual_json_data,max_level=1)
# Display the normalized data
print(normalized)
Output
code length state.state 1 state.state 2 state.state 3 state
0 APH456 100 AP Ind Cal NaN
1 SCVH456 160 NaN NaN NaN TS
2 SAVH4MO6 200 NaN NaN NaN MUM
3 AYU78BN6 200 NaN NaN NaN PUN
4 AYU78BN6 300 BNG TLN NaN NaN
Explanation
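Now the nested dictionaries are flattened: each nested key gets its own column, and rows without that key hold NaN. Rows in which state is a plain string keep their value in the state column. Since the data is nested only one level deep, this is the maximum normalization.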
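To make the resulting column labels visible, the following sketch repeats the level-1 normalization on a three-row subset of the data and lists the labels. It assumes the default separator, a dot, which joins the parent key and the nested key.

```python
import pandas as pd

# Three-row subset of the data used in this example
data = [
    {"state": {"state 1": "AP", "state 2": "Ind", "state 3": "Cal"},
     "code": "APH456", "length": 100},
    {"state": "TS", "code": "SCVH456", "length": 160},
    {"state": {"state 1": "BNG", "state 2": "TLN"},
     "code": "AYU78BN6", "length": 300},
]

normalized = pd.json_normalize(data, max_level=1)

# Nested keys appear as "state.<nested key>"
print(normalized.columns.tolist())

# A flattened column is selected by its dotted label
first_states = normalized["state.state 1"].dropna().tolist()
print(first_states)
```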
Example 5: With record_prefix as Parameter
Create a dictionary whose state key holds a list of three states, pass “state” as the record path, and add the prefix “I-” to the resulting column label.
import pandas

# Consider the JSON data
actual_json_data = { "state": ["AP","TS", "PNU"]}
# normalize the above data by passing the record_prefix parameter
normalized = pandas.json_normalize(actual_json_data,"state",record_prefix="I-")
# Display the normalized data
print(normalized)
Output
I-0
0 AP
1 TS
2 PNU
Explanation
Because the records under state are plain strings rather than dictionaries, Pandas gives the resulting column the default label 0. With record_prefix set to “I-”, that label becomes I-0.
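The prefix is easier to see when the records are dictionaries. The sketch below uses hypothetical data (a country entry holding a list of state records) and additionally passes the meta parameter of json_normalize(), which is not covered in the examples above, to carry the parent field into each row.

```python
import pandas as pd

# Hypothetical data: one parent record holding a list of state records
data = {
    "country": "IN",
    "states": [
        {"name": "AP", "code": "APH456"},
        {"name": "TS", "code": "SCVH456"},
    ],
}

df = pd.json_normalize(
    data,
    record_path="states",   # the list to expand into rows
    meta=["country"],       # parent field repeated on every row
    record_prefix="I-",     # prefix applied to the record columns only
)

print(df.columns.tolist())
print(df)
```

Only the columns that come from the records receive the prefix; the country column taken from the parent keeps its original label.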
Conclusion
Pandas JSON normalization is a convenient way to convert semi-structured JSON data into a flat DataFrame. In this article, we normalized a flat list of dictionaries, saw that missing keys produce NaN, compared max_level “0” with max_level “1” on nested data, and used record_prefix to add a prefix to the column label of the extracted records.