Common Use Cases for JSON¶
Here are some of the common use cases for JSON.
- Read data from JSON files.
- Write data to JSON files.
  - We can use either the `json` or the `pandas` module to read data from or write data to JSON files. One needs to be familiar with both approaches, as each has different use cases and capabilities.
- Read JSON-based response payloads from REST API calls.
  - We use the `requests` module to process REST API response payloads.
  - Once the payload is returned, we can use appropriate modules to process the data further.
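For REST API payloads, the typical pattern is to parse the response body into a `dict`. Below is a minimal sketch using `json.loads` on a sample payload string; the `requests` call is shown only as a comment, since it needs a live endpoint, and the payload here is a made-up fragment in the shape of the YouTube response used later in this notebook.

```python
import json

# With requests, we would typically do:
#   import requests
#   payload = requests.get(url).json()  # url is a hypothetical endpoint
# Here we simulate the raw response body as a JSON string instead.
raw_payload = '{"kind": "youtube#playlistItemListResponse", "items": [{"id": "abc"}]}'

payload = json.loads(raw_payload)  # str -> dict
print(type(payload))   # <class 'dict'>
print(payload['items'])
```

`response.json()` in `requests` performs the same `str` to `dict` conversion internally.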
Read Data from JSON files¶
Here are the steps involved in reading data from JSON files.
- Using the `json` module
  - Create a file object using `open` in read-only mode.
  - Pass the file object to `json.load`.
  - `json.load` will return a `dict`.
  - We can process the data further using appropriate modules.
- Using the `pandas` module
  - Use the path of the file to invoke `read_json`.
  - A Pandas Data Frame will be created.
  - We can process the data further using the rich APIs available in the `pandas` module.
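The `json.load` steps above can be sketched with a context manager, which closes the file automatically (the cells below open the file directly; `with` is the safer idiom). To keep the sketch self-contained, it first writes a tiny sample file; the file name and contents are illustrative.

```python
import json

# Create a tiny sample file so the sketch is self-contained
sample = {'kind': 'demo', 'items': [{'videoId': 'abc123'}]}
with open('sample_items.json', 'w') as f:
    json.dump(sample, f)

# Read it back: open in read mode, pass the file object to json.load
with open('sample_items.json') as f:
    data = json.load(f)   # returns a dict

print(type(data))         # <class 'dict'>
print(data['items'][0]['videoId'])
```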
In [1]:
# Using json module
import json
yt_file = open('youtube_playlist_items.json')
yt_items = json.load(yt_file)
type(yt_items)
Out[1]:
dict
In [2]:
yt_items.keys()
Out[2]:
dict_keys(['kind', 'etag', 'nextPageToken', 'items', 'pageInfo'])
In [3]:
yt_items['items']
Out[3]:
[{'kind': 'youtube#playlistItem', 'etag': 'SGHDydc4dLsY2RjfXTPneb_zc_s', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5EQkE3RTJCQTJEQkFBQTcz', 'contentDetails': {'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': '5EFUNhJBvcwXPxO416VYQsXGzMo', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4yQzk4QTA5QjkzMTFFOEI1', 'contentDetails': {'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': 'TiKqB2aeYxJjMGKQ0yLMJY0vpQE', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy45NDlDQUFFOThDMTAxQjUw', 'contentDetails': {'videoId': 'qfUbPLsLQcQ', 'videoPublishedAt': '2020-11-28T16:30:33Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': 'vQrJOpYdXmGJuV32kjj2xqvSByc', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4xN0Y2QjVBOEI2MzQ5OUM5', 'contentDetails': {'videoId': 'rLTbhSaXhSM', 'videoPublishedAt': '2020-11-28T16:30:52Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': '2CzGUToIgqywXAr4wuPswj9MuFg', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5FQUY2Qzk4RUFDN0ZFRkZF', 'contentDetails': {'videoId': 'wP7BhXrJKR8', 'videoPublishedAt': '2020-11-28T16:31:14Z'}, 'status': {'privacyStatus': 'public'}}]
In [4]:
# Further data processing (get video id and published time)
list(map(lambda rec: rec['contentDetails'], yt_items['items']))
Out[4]:
[{'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}, {'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'}, {'videoId': 'qfUbPLsLQcQ', 'videoPublishedAt': '2020-11-28T16:30:33Z'}, {'videoId': 'rLTbhSaXhSM', 'videoPublishedAt': '2020-11-28T16:30:52Z'}, {'videoId': 'wP7BhXrJKR8', 'videoPublishedAt': '2020-11-28T16:31:14Z'}]
In [5]:
# Using Pandas Module
# As youtube items are part of nested json, we need to use both json and pandas
import json
import pandas as pd
yt_file = open('youtube_playlist_items.json')
yt_items = json.load(yt_file)
In [6]:
yt_items.keys()
Out[6]:
dict_keys(['kind', 'etag', 'nextPageToken', 'items', 'pageInfo'])
In [7]:
yt_items
Out[7]:
{'kind': 'youtube#playlistItemListResponse', 'etag': 'lfs_qWNaczIydJ2Dlp1gmX9UTAc', 'nextPageToken': 'CAUQAA', 'items': [{'kind': 'youtube#playlistItem', 'etag': 'SGHDydc4dLsY2RjfXTPneb_zc_s', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5EQkE3RTJCQTJEQkFBQTcz', 'contentDetails': {'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': '5EFUNhJBvcwXPxO416VYQsXGzMo', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4yQzk4QTA5QjkzMTFFOEI1', 'contentDetails': {'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': 'TiKqB2aeYxJjMGKQ0yLMJY0vpQE', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy45NDlDQUFFOThDMTAxQjUw', 'contentDetails': {'videoId': 'qfUbPLsLQcQ', 'videoPublishedAt': '2020-11-28T16:30:33Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': 'vQrJOpYdXmGJuV32kjj2xqvSByc', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4xN0Y2QjVBOEI2MzQ5OUM5', 'contentDetails': {'videoId': 'rLTbhSaXhSM', 'videoPublishedAt': '2020-11-28T16:30:52Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': '2CzGUToIgqywXAr4wuPswj9MuFg', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5FQUY2Qzk4RUFDN0ZFRkZF', 'contentDetails': {'videoId': 'wP7BhXrJKR8', 'videoPublishedAt': '2020-11-28T16:31:14Z'}, 'status': {'privacyStatus': 'public'}}], 'pageInfo': {'totalResults': 127, 'resultsPerPage': 5}}
In [8]:
pd.json_normalize?
Signature: `pd.json_normalize(data, record_path=None, meta=None, meta_prefix=None, record_prefix=None, errors='raise', sep='.', max_level=None) -> DataFrame`

Normalize semi-structured JSON data into a flat table.

Parameters (abridged from the docstring):
- `data` : dict or list of dicts. Unserialized JSON objects.
- `record_path` : str or list of str, default None. Path in each object to the list of records. If not passed, data is assumed to be an array of records.
- `meta` : list of paths (str or list of str), default None. Fields to use as metadata for each record in the resulting table.
- `meta_prefix`, `record_prefix` : str, default None. Prefix for the generated column names.
- `errors` : {'raise', 'ignore'}, default 'raise'. Whether to raise `KeyError` if keys listed in `meta` are not always present.
- `sep` : str, default '.'. Nested records generate names separated by `sep`, e.g. for `sep='.'`, `{'foo': {'bar': 0}}` -> `foo.bar`.
- `max_level` : int, default None. Max number of levels (depth of dict) to normalize. If None, normalizes all levels.

Example from the docstring:

>>> data = [
...     {
...         "state": "Florida",
...         "shortname": "FL",
...         "info": {"governor": "Rick Scott"},
...         "counties": [
...             {"name": "Dade", "population": 12345},
...             {"name": "Broward", "population": 40000},
...             {"name": "Palm Beach", "population": 60000},
...         ],
...     },
...     {
...         "state": "Ohio",
...         "shortname": "OH",
...         "info": {"governor": "John Kasich"},
...         "counties": [
...             {"name": "Summit", "population": 1234},
...             {"name": "Cuyahoga", "population": 1337},
...         ],
...     },
... ]
>>> result = pd.json_normalize(
...     data, "counties", ["state", "shortname", ["info", "governor"]]
... )
>>> result
         name  population    state shortname info.governor
0        Dade       12345  Florida        FL    Rick Scott
1     Broward       40000  Florida        FL    Rick Scott
2  Palm Beach       60000  Florida        FL    Rick Scott
3      Summit        1234     Ohio        OH   John Kasich
4    Cuyahoga        1337     Ohio        OH   John Kasich
In [9]:
yt_df = pd.json_normalize(yt_items, 'items')
In [10]:
yt_df
Out[10]:
| | kind | etag | id | contentDetails.videoId | contentDetails.videoPublishedAt | status.privacyStatus |
|---|---|---|---|---|---|---|
| 0 | youtube#playlistItem | SGHDydc4dLsY2RjfXTPneb_zc_s | UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy… | ETZJln4jtAo | 2020-11-28T16:29:47Z | public |
| 1 | youtube#playlistItem | 5EFUNhJBvcwXPxO416VYQsXGzMo | UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy… | 1OVHjHTkP3M | 2020-11-28T16:30:12Z | public |
| 2 | youtube#playlistItem | TiKqB2aeYxJjMGKQ0yLMJY0vpQE | UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy… | qfUbPLsLQcQ | 2020-11-28T16:30:33Z | public |
| 3 | youtube#playlistItem | vQrJOpYdXmGJuV32kjj2xqvSByc | UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy… | rLTbhSaXhSM | 2020-11-28T16:30:52Z | public |
| 4 | youtube#playlistItem | 2CzGUToIgqywXAr4wuPswj9MuFg | UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy… | wP7BhXrJKR8 | 2020-11-28T16:31:14Z | public |
In [11]:
yt_df[['contentDetails.videoId', 'contentDetails.videoPublishedAt']]
Out[11]:
| | contentDetails.videoId | contentDetails.videoPublishedAt |
|---|---|---|
| 0 | ETZJln4jtAo | 2020-11-28T16:29:47Z |
| 1 | 1OVHjHTkP3M | 2020-11-28T16:30:12Z |
| 2 | qfUbPLsLQcQ | 2020-11-28T16:30:33Z |
| 3 | rLTbhSaXhSM | 2020-11-28T16:30:52Z |
| 4 | wP7BhXrJKR8 | 2020-11-28T16:31:14Z |
In [12]:
# Using json to process customers data
# We have one customer per line
# We need to read the data as strings, then use json.loads to convert each string to a dict.
import json
customers_file = open('customers.json')
customers_list = customers_file.read().splitlines()
# Converting the records in the file into list of dicts
# We are processing each element in customers_list
customers = list(map(json.loads, customers_list))
In [13]:
customers
Out[13]:
[{'id': 1, 'first_name': 'Frasco', 'last_name': 'Necolds', 'email': 'fnecolds0@vk.com', 'gender': 'Male', 'ip_address': '243.67.63.34'}, {'id': 2, 'first_name': 'Dulce', 'last_name': 'Santos', 'email': 'dsantos1@mashable.com', 'gender': 'Female', 'ip_address': '60.30.246.227'}, {'id': 3, 'first_name': 'Prissie', 'last_name': 'Tebbett', 'email': 'ptebbett2@infoseek.co.jp', 'gender': 'Genderfluid', 'ip_address': '22.21.162.56'}, {'id': 4, 'first_name': 'Schuyler', 'last_name': 'Coppledike', 'email': 'scoppledike3@gnu.org', 'gender': 'Agender', 'ip_address': '120.35.186.161'}, {'id': 5, 'first_name': 'Leopold', 'last_name': 'Jarred', 'email': 'ljarred4@wp.com', 'gender': 'Agender', 'ip_address': '30.119.34.4'}, {'id': 6, 'first_name': 'Joanna', 'last_name': 'Teager', 'email': 'jteager5@apache.org', 'gender': 'Bigender', 'ip_address': '245.221.176.34'}, {'id': 7, 'first_name': 'Lion', 'last_name': 'Beere', 'email': 'lbeere6@bloomberg.com', 'gender': 'Polygender', 'ip_address': '105.54.139.46'}, {'id': 8, 'first_name': 'Marabel', 'last_name': 'Wornum', 'email': 'mwornum7@posterous.com', 'gender': 'Polygender', 'ip_address': '247.229.14.25'}, {'id': 9, 'first_name': 'Helenka', 'last_name': 'Mullender', 'email': 'hmullender8@cloudflare.com', 'gender': 'Non-binary', 'ip_address': '133.216.118.88'}, {'id': 10, 'first_name': 'Christine', 'last_name': 'Swane', 'email': 'cswane9@shop-pro.jp', 'gender': 'Polygender', 'ip_address': '86.16.210.164'}]
In [14]:
# Using Pandas only to process customers data
# For customers where we have one json per line, we can use Pandas directly
import pandas as pd
customers = pd.read_json('customers.json', lines=True)
In [15]:
customers
Out[15]:
| | id | first_name | last_name | email | gender | ip_address |
|---|---|---|---|---|---|---|
| 0 | 1 | Frasco | Necolds | fnecolds0@vk.com | Male | 243.67.63.34 |
| 1 | 2 | Dulce | Santos | dsantos1@mashable.com | Female | 60.30.246.227 |
| 2 | 3 | Prissie | Tebbett | ptebbett2@infoseek.co.jp | Genderfluid | 22.21.162.56 |
| 3 | 4 | Schuyler | Coppledike | scoppledike3@gnu.org | Agender | 120.35.186.161 |
| 4 | 5 | Leopold | Jarred | ljarred4@wp.com | Agender | 30.119.34.4 |
| 5 | 6 | Joanna | Teager | jteager5@apache.org | Bigender | 245.221.176.34 |
| 6 | 7 | Lion | Beere | lbeere6@bloomberg.com | Polygender | 105.54.139.46 |
| 7 | 8 | Marabel | Wornum | mwornum7@posterous.com | Polygender | 247.229.14.25 |
| 8 | 9 | Helenka | Mullender | hmullender8@cloudflare.com | Non-binary | 133.216.118.88 |
| 9 | 10 | Christine | Swane | cswane9@shop-pro.jp | Polygender | 86.16.210.164 |
Write Data to JSON files¶
Here are the steps involved in writing data to JSON files.
- Using the `json` module
  - Make sure the `dict` object is ready with processed data as per the requirements before writing to the file.
  - Create a file object using `open` in write mode.
  - Pass the file object to `json.dump`.
  - The `dict` will be dumped in the form of JSON into the file.
- Using the `pandas` module
  - Make sure the Data Frame is ready with processed data as per the requirements before writing to the file.
  - Use the path of the file to invoke `to_json`. It is invoked on the Data Frame object which holds the processed data.
  - The Pandas Data Frame will be written in the form of JSON to the file.
  - We can leverage additional keyword arguments to control the behavior. For example, `orient='records'` along with `lines=True` can be used to write the data frame in the form of one JSON document per line.
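The write steps above can be sketched for both modules. This is a minimal, self-contained sketch; the output file names and the two-record `customers` sample are illustrative, echoing the customers data used earlier.

```python
import json
import pandas as pd

customers = [
    {'id': 1, 'first_name': 'Frasco', 'last_name': 'Necolds'},
    {'id': 2, 'first_name': 'Dulce', 'last_name': 'Santos'},
]

# Using the json module: open in write mode, then dump the object
with open('customers_json.json', 'w') as f:
    json.dump(customers, f)

# Using pandas: build a Data Frame and invoke to_json on it.
# orient='records' with lines=True writes one JSON document per line,
# matching the customers.json layout read earlier with read_json(..., lines=True).
df = pd.DataFrame(customers)
df.to_json('customers_pandas.json', orient='records', lines=True)

print(open('customers_pandas.json').read())
```

Round-tripping the pandas output with `pd.read_json('customers_pandas.json', lines=True)` reproduces the original Data Frame.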