Common Use Cases for JSON¶
Here are some of the common use cases for JSON.
- Read data from JSON files.
- Write data to JSON files.
  - We can use either the `json` or the `pandas` module to read data from or write data to JSON files. One needs to be familiar with both approaches, as each has different use cases and capabilities.
- Read JSON-based response payloads from REST API calls.
  - We use the `requests` module to process REST API response payloads.
  - Once the payload is returned, we can use appropriate modules to process the data further.
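For REST API payloads, the typical pattern is to parse the response body into a `dict`. Below is a minimal sketch using `json.loads` on a sample payload string; the `requests` call is shown only as a comment, since it needs a live endpoint, and the payload here is a made-up fragment in the shape of the YouTube response used later in this notebook.

```python
import json

# With requests, we would typically do:
#   import requests
#   payload = requests.get(url).json()  # url is a hypothetical endpoint
# Here we simulate the raw response body as a JSON string instead.
raw_payload = '{"kind": "youtube#playlistItemListResponse", "items": [{"id": "abc"}]}'

payload = json.loads(raw_payload)  # str -> dict
print(type(payload))   # <class 'dict'>
print(payload['items'])
```

`response.json()` in `requests` performs the same `str` to `dict` conversion internally.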
Read Data from JSON files¶
Here are the steps involved in reading data from JSON files.
- Using the `json` module
  - Create a file object using `open` in read-only mode.
  - Pass the file object to `json.load`.
  - `json.load` will return a `dict`.
  - We can process the data further using appropriate modules.
- Using the `pandas` module
  - Use the path of the file to invoke `read_json`.
  - A Pandas Data Frame will be created.
  - We can process the data further using the rich APIs available in the `pandas` module.
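The `json.load` steps above can be sketched with a context manager, which closes the file automatically (the cells below open the file directly; `with` is the safer idiom). To keep the sketch self-contained, it first writes a tiny sample file; the file name and contents are illustrative.

```python
import json

# Create a tiny sample file so the sketch is self-contained
sample = {'kind': 'demo', 'items': [{'videoId': 'abc123'}]}
with open('sample_items.json', 'w') as f:
    json.dump(sample, f)

# Read it back: open in read mode, pass the file object to json.load
with open('sample_items.json') as f:
    data = json.load(f)   # returns a dict

print(type(data))         # <class 'dict'>
print(data['items'][0]['videoId'])
```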
In [1]:
# Using json module
import json
yt_file = open('youtube_playlist_items.json')
yt_items = json.load(yt_file)
type(yt_items)
Out[1]:
dict
In [2]:
yt_items.keys()
Out[2]:
dict_keys(['kind', 'etag', 'nextPageToken', 'items', 'pageInfo'])
In [3]:
yt_items['items']
Out[3]:
[{'kind': 'youtube#playlistItem', 'etag': 'SGHDydc4dLsY2RjfXTPneb_zc_s', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5EQkE3RTJCQTJEQkFBQTcz', 'contentDetails': {'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': '5EFUNhJBvcwXPxO416VYQsXGzMo', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4yQzk4QTA5QjkzMTFFOEI1', 'contentDetails': {'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': 'TiKqB2aeYxJjMGKQ0yLMJY0vpQE', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy45NDlDQUFFOThDMTAxQjUw', 'contentDetails': {'videoId': 'qfUbPLsLQcQ', 'videoPublishedAt': '2020-11-28T16:30:33Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': 'vQrJOpYdXmGJuV32kjj2xqvSByc', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4xN0Y2QjVBOEI2MzQ5OUM5', 'contentDetails': {'videoId': 'rLTbhSaXhSM', 'videoPublishedAt': '2020-11-28T16:30:52Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': '2CzGUToIgqywXAr4wuPswj9MuFg', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5FQUY2Qzk4RUFDN0ZFRkZF', 'contentDetails': {'videoId': 'wP7BhXrJKR8', 'videoPublishedAt': '2020-11-28T16:31:14Z'}, 'status': {'privacyStatus': 'public'}}]
In [4]:
# Further data processing (get video id and published time)
list(map(lambda rec: rec['contentDetails'], yt_items['items']))
Out[4]:
[{'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}, {'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'}, {'videoId': 'qfUbPLsLQcQ', 'videoPublishedAt': '2020-11-28T16:30:33Z'}, {'videoId': 'rLTbhSaXhSM', 'videoPublishedAt': '2020-11-28T16:30:52Z'}, {'videoId': 'wP7BhXrJKR8', 'videoPublishedAt': '2020-11-28T16:31:14Z'}]
In [5]:
# Using Pandas Module
# As youtube items are part of nested json, we need to use both json and pandas
import json
import pandas as pd
yt_file = open('youtube_playlist_items.json')
yt_items = json.load(yt_file)
In [6]:
yt_items.keys()
Out[6]:
dict_keys(['kind', 'etag', 'nextPageToken', 'items', 'pageInfo'])
In [7]:
yt_items
Out[7]:
{'kind': 'youtube#playlistItemListResponse', 'etag': 'lfs_qWNaczIydJ2Dlp1gmX9UTAc', 'nextPageToken': 'CAUQAA', 'items': [{'kind': 'youtube#playlistItem', 'etag': 'SGHDydc4dLsY2RjfXTPneb_zc_s', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5EQkE3RTJCQTJEQkFBQTcz', 'contentDetails': {'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': '5EFUNhJBvcwXPxO416VYQsXGzMo', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4yQzk4QTA5QjkzMTFFOEI1', 'contentDetails': {'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': 'TiKqB2aeYxJjMGKQ0yLMJY0vpQE', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy45NDlDQUFFOThDMTAxQjUw', 'contentDetails': {'videoId': 'qfUbPLsLQcQ', 'videoPublishedAt': '2020-11-28T16:30:33Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': 'vQrJOpYdXmGJuV32kjj2xqvSByc', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4xN0Y2QjVBOEI2MzQ5OUM5', 'contentDetails': {'videoId': 'rLTbhSaXhSM', 'videoPublishedAt': '2020-11-28T16:30:52Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': '2CzGUToIgqywXAr4wuPswj9MuFg', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5FQUY2Qzk4RUFDN0ZFRkZF', 'contentDetails': {'videoId': 'wP7BhXrJKR8', 'videoPublishedAt': '2020-11-28T16:31:14Z'}, 'status': {'privacyStatus': 'public'}}], 'pageInfo': {'totalResults': 127, 'resultsPerPage': 5}}
In [8]:
pd.json_normalize?
Signature: `pd.json_normalize(data, record_path=None, meta=None, meta_prefix=None, record_prefix=None, errors='raise', sep='.', max_level=None) -> DataFrame`

Normalize semi-structured JSON data into a flat table.

Parameters (abridged from the docstring):
- `data` : dict or list of dicts. Unserialized JSON objects.
- `record_path` : str or list of str, default None. Path in each object to the list of records. If not passed, data is assumed to be an array of records.
- `meta` : list of paths (str or list of str), default None. Fields to use as metadata for each record in the resulting table.
- `meta_prefix`, `record_prefix` : str, default None. Prefix for the generated column names.
- `errors` : {'raise', 'ignore'}, default 'raise'. Whether to raise `KeyError` if keys listed in `meta` are not always present.
- `sep` : str, default '.'. Nested records generate names separated by `sep`, e.g. for `sep='.'`, `{'foo': {'bar': 0}}` -> `foo.bar`.
- `max_level` : int, default None. Max number of levels (depth of dict) to normalize. If None, normalizes all levels.

Example from the docstring:

>>> data = [
...     {
...         "state": "Florida",
...         "shortname": "FL",
...         "info": {"governor": "Rick Scott"},
...         "counties": [
...             {"name": "Dade", "population": 12345},
...             {"name": "Broward", "population": 40000},
...             {"name": "Palm Beach", "population": 60000},
...         ],
...     },
...     {
...         "state": "Ohio",
...         "shortname": "OH",
...         "info": {"governor": "John Kasich"},
...         "counties": [
...             {"name": "Summit", "population": 1234},
...             {"name": "Cuyahoga", "population": 1337},
...         ],
...     },
... ]
>>> result = pd.json_normalize(
...     data, "counties", ["state", "shortname", ["info", "governor"]]
... )
>>> result
         name  population    state shortname info.governor
0        Dade       12345  Florida        FL    Rick Scott
1     Broward       40000  Florida        FL    Rick Scott
2  Palm Beach       60000  Florida        FL    Rick Scott
3      Summit        1234     Ohio        OH   John Kasich
4    Cuyahoga        1337     Ohio        OH   John Kasich
In [9]:
yt_df = pd.json_normalize(yt_items, 'items')
In [10]:
yt_df
Out[10]:
| | kind | etag | id | contentDetails.videoId | contentDetails.videoPublishedAt | status.privacyStatus |
|---|---|---|---|---|---|---|
| 0 | youtube#playlistItem | SGHDydc4dLsY2RjfXTPneb_zc_s | UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy… | ETZJln4jtAo | 2020-11-28T16:29:47Z | public |
| 1 | youtube#playlistItem | 5EFUNhJBvcwXPxO416VYQsXGzMo | UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy… | 1OVHjHTkP3M | 2020-11-28T16:30:12Z | public |
| 2 | youtube#playlistItem | TiKqB2aeYxJjMGKQ0yLMJY0vpQE | UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy… | qfUbPLsLQcQ | 2020-11-28T16:30:33Z | public |
| 3 | youtube#playlistItem | vQrJOpYdXmGJuV32kjj2xqvSByc | UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy… | rLTbhSaXhSM | 2020-11-28T16:30:52Z | public |
| 4 | youtube#playlistItem | 2CzGUToIgqywXAr4wuPswj9MuFg | UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy… | wP7BhXrJKR8 | 2020-11-28T16:31:14Z | public |
In [11]:
yt_df[['contentDetails.videoId', 'contentDetails.videoPublishedAt']]
Out[11]:
| | contentDetails.videoId | contentDetails.videoPublishedAt |
|---|---|---|
| 0 | ETZJln4jtAo | 2020-11-28T16:29:47Z |
| 1 | 1OVHjHTkP3M | 2020-11-28T16:30:12Z |
| 2 | qfUbPLsLQcQ | 2020-11-28T16:30:33Z |
| 3 | rLTbhSaXhSM | 2020-11-28T16:30:52Z |
| 4 | wP7BhXrJKR8 | 2020-11-28T16:31:14Z |
In [12]:
# Using json to process customers data
# We have one customer per line
# We need to read the data as strings, then use json.loads to convert each string to a dict.
import json
customers_file = open('customers.json')
customers_list = customers_file.read().splitlines()
# Converting the records in the file into list of dicts
# We are processing each element in customers_list
customers = list(map(json.loads, customers_list))
In [13]:
customers
Out[13]:
[{'id': 1, 'first_name': 'Frasco', 'last_name': 'Necolds', 'email': 'fnecolds0@vk.com', 'gender': 'Male', 'ip_address': '243.67.63.34'}, {'id': 2, 'first_name': 'Dulce', 'last_name': 'Santos', 'email': 'dsantos1@mashable.com', 'gender': 'Female', 'ip_address': '60.30.246.227'}, {'id': 3, 'first_name': 'Prissie', 'last_name': 'Tebbett', 'email': 'ptebbett2@infoseek.co.jp', 'gender': 'Genderfluid', 'ip_address': '22.21.162.56'}, {'id': 4, 'first_name': 'Schuyler', 'last_name': 'Coppledike', 'email': 'scoppledike3@gnu.org', 'gender': 'Agender', 'ip_address': '120.35.186.161'}, {'id': 5, 'first_name': 'Leopold', 'last_name': 'Jarred', 'email': 'ljarred4@wp.com', 'gender': 'Agender', 'ip_address': '30.119.34.4'}, {'id': 6, 'first_name': 'Joanna', 'last_name': 'Teager', 'email': 'jteager5@apache.org', 'gender': 'Bigender', 'ip_address': '245.221.176.34'}, {'id': 7, 'first_name': 'Lion', 'last_name': 'Beere', 'email': 'lbeere6@bloomberg.com', 'gender': 'Polygender', 'ip_address': '105.54.139.46'}, {'id': 8, 'first_name': 'Marabel', 'last_name': 'Wornum', 'email': 'mwornum7@posterous.com', 'gender': 'Polygender', 'ip_address': '247.229.14.25'}, {'id': 9, 'first_name': 'Helenka', 'last_name': 'Mullender', 'email': 'hmullender8@cloudflare.com', 'gender': 'Non-binary', 'ip_address': '133.216.118.88'}, {'id': 10, 'first_name': 'Christine', 'last_name': 'Swane', 'email': 'cswane9@shop-pro.jp', 'gender': 'Polygender', 'ip_address': '86.16.210.164'}]
In [14]:
# Using Pandas only to process customers data
# For customers where we have one json per line, we can use Pandas directly
import pandas as pd
customers = pd.read_json('customers.json', lines=True)
In [15]:
customers
Out[15]:
| | id | first_name | last_name | email | gender | ip_address |
|---|---|---|---|---|---|---|
| 0 | 1 | Frasco | Necolds | fnecolds0@vk.com | Male | 243.67.63.34 |
| 1 | 2 | Dulce | Santos | dsantos1@mashable.com | Female | 60.30.246.227 |
| 2 | 3 | Prissie | Tebbett | ptebbett2@infoseek.co.jp | Genderfluid | 22.21.162.56 |
| 3 | 4 | Schuyler | Coppledike | scoppledike3@gnu.org | Agender | 120.35.186.161 |
| 4 | 5 | Leopold | Jarred | ljarred4@wp.com | Agender | 30.119.34.4 |
| 5 | 6 | Joanna | Teager | jteager5@apache.org | Bigender | 245.221.176.34 |
| 6 | 7 | Lion | Beere | lbeere6@bloomberg.com | Polygender | 105.54.139.46 |
| 7 | 8 | Marabel | Wornum | mwornum7@posterous.com | Polygender | 247.229.14.25 |
| 8 | 9 | Helenka | Mullender | hmullender8@cloudflare.com | Non-binary | 133.216.118.88 |
| 9 | 10 | Christine | Swane | cswane9@shop-pro.jp | Polygender | 86.16.210.164 |
Write Data to JSON files¶
Here are the steps involved in writing data to JSON files.
- Using the `json` module
  - Make sure the `dict` object is ready with processed data as per the requirements before writing to the file.
  - Create a file object using `open` in write mode.
  - Pass the file object to `json.dump`.
  - The `dict` will be dumped in the form of JSON into the file.
- Using the `pandas` module
  - Make sure the Data Frame is ready with processed data as per the requirements before writing to the file.
  - Use the path of the file to invoke `to_json`. It is invoked on the Data Frame object which holds the processed data.
  - The Pandas Data Frame will be written in the form of JSON to the file.
  - We can leverage additional keyword arguments to control the behavior. For example, `orient='records'` along with `lines=True` can be used to write the data frame in the form of one JSON document per line.
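The write steps above can be sketched for both modules. This is a minimal, self-contained sketch; the output file names and the two-record `customers` sample are illustrative, echoing the customers data used earlier.

```python
import json
import pandas as pd

customers = [
    {'id': 1, 'first_name': 'Frasco', 'last_name': 'Necolds'},
    {'id': 2, 'first_name': 'Dulce', 'last_name': 'Santos'},
]

# Using the json module: open in write mode, then dump the object
with open('customers_json.json', 'w') as f:
    json.dump(customers, f)

# Using pandas: build a Data Frame and invoke to_json on it.
# orient='records' with lines=True writes one JSON document per line,
# matching the customers.json layout read earlier with read_json(..., lines=True).
df = pd.DataFrame(customers)
df.to_json('customers_pandas.json', orient='records', lines=True)

print(open('customers_pandas.json').read())
```

Round-tripping the pandas output with `pd.read_json('customers_pandas.json', lines=True)` reproduces the original Data Frame.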