Sai

Single JSON Document in Files



Let us understand how to process a single JSON document stored in a file. We can use either the json or the pandas module for this. For now, we will focus on the json module.

Here are the files used for the demo.
- single_document.json
- youtube_playlist_items.json – This is an example of a REST API response that returns its results as a list. The list is the value of one of the attributes in the response JSON.
Here are the steps to review these documents in a Jupyter environment.
- Go to the sidebar and select the file.
- Right-click on the file and click on Open With -> Editor.
- The JSON file opens as plain (raw) text.
Each of these files contains a single JSON document.

Here are the steps to process a file which contains a single JSON document. You need to call json.load and pass it a file object (_io.TextIOWrapper).

- Pass the path of the file to open to create a file object.
- Invoke json.load by passing the file object as the argument.
- It returns a dict when the top-level JSON value is an object, as it is in both of these files.
- We can then use dict operations to process the data further.
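The steps above can be sketched as follows. This is a minimal sketch, not the notebook's own code: it assumes a small stand-in file written to a temporary directory (the record is a shortened copy of the demo data) so that it runs anywhere, and it uses a with block so that the file is closed after reading.

```python
import json
import os
import tempfile

# Assumption: a stand-in for single_document.json, written to a temporary
# directory so that the sketch does not depend on the demo files.
sample = {"id": 1, "first_name": "Frasco", "last_name": "Necolds"}
file_path = os.path.join(tempfile.mkdtemp(), "single_document.json")
with open(file_path, "w") as fp:
    json.dump(sample, fp)

# Create the file object and pass it to json.load.
# The with block closes the file once the document has been read.
with open(file_path) as fp:
    document = json.load(fp)

# The result is a dict, so regular dict operations apply.
print(type(document))
print(document["first_name"])
```

The cells that follow pass open(...) directly to json.load, which is fine for a quick demo. The with form is a common choice in scripts because it does not leave the file open.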

In [1]:

import json

In [2]:

!ls -ltr single_document.json

-rw-rw-r-- 1 itversity itversity 154 Mar  8 02:04 single_document.json

In [3]:

type('single_document.json')

Out[3]:

str

In [4]:

json.load?

Signature:
json.load(
    fp,
    *,
    cls=None,
    object_hook=None,
    parse_float=None,
    parse_int=None,
    parse_constant=None,
    object_pairs_hook=None,
    **kw,
)
Docstring:
Deserialize ``fp`` (a ``.read()``-supporting file-like object containing
a JSON document) to a Python object.

``object_hook`` is an optional function that will be called with the
result of any object literal decode (a ``dict``). The return value of
``object_hook`` will be used instead of the ``dict``. This feature
can be used to implement custom decoders (e.g. JSON-RPC class hinting).

``object_pairs_hook`` is an optional function that will be called with the
result of any object literal decoded with an ordered list of pairs.  The
return value of ``object_pairs_hook`` will be used instead of the ``dict``.
This feature can be used to implement custom decoders.  If ``object_hook``
is also defined, the ``object_pairs_hook`` takes priority.

To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
kwarg; otherwise ``JSONDecoder`` is used.
File:      /usr/local/lib/python3.8/json/__init__.py
Type:      function
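The object_hook parameter described in the docstring can be illustrated with a small sketch. It assumes an in-memory file-like object (io.StringIO) in place of a file on disk, and the choice of SimpleNamespace as the replacement for dict is only for illustration.

```python
import io
import json
from types import SimpleNamespace

# Assumption: io.StringIO stands in for a file opened from disk.
fp = io.StringIO('{"id": 1, "contentDetails": {"videoId": "ETZJln4jtAo"}}')

# object_hook is called with every decoded JSON object (a dict), innermost
# objects first. Its return value replaces that dict in the result.
record = json.load(fp, object_hook=lambda d: SimpleNamespace(**d))

# Nested objects were converted as well, so attribute access works throughout.
print(record.id)
print(record.contentDetails.videoId)
```

This only works when every key is a valid Python identifier. For arbitrary keys, staying with plain dicts is usually simpler.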

In [5]:

type(open('single_document.json'))

Out[5]:

_io.TextIOWrapper
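The two type checks above show that the file name is a str, while open returns a file object. The sketch below shows why the difference matters: json.load calls read() on its argument, so a plain string fails. The JSON text here is a shortened copy of the demo record and is only an illustration.

```python
import json

json_text = '{"id": 1, "first_name": "Frasco"}'

# json.load expects an object with a read() method, so a plain str fails.
try:
    json.load(json_text)
except AttributeError as error:
    load_error = str(error)
    print(load_error)

# json.loads (note the trailing "s") parses JSON that is already in a string.
document = json.loads(json_text)
print(document)
```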

In [6]:

single_json = json.load(open('single_document.json'))

In [7]:

single_json

Out[7]:

{'id': 1,
 'first_name': 'Frasco',
 'last_name': 'Necolds',
 'email': 'fnecolds0@vk.com',
 'gender': 'Male',
 'ip_address': '243.67.63.34'}

In [8]:

type(single_json)

Out[8]:

dict

In [9]:

single_json.keys()

Out[9]:

dict_keys(['id', 'first_name', 'last_name', 'email', 'gender', 'ip_address'])

In [10]:

single_json.values()

Out[10]:

dict_values([1, 'Frasco', 'Necolds', 'fnecolds0@vk.com', 'Male', '243.67.63.34'])

In [11]:

single_json.items()

Out[11]:

dict_items([('id', 1), ('first_name', 'Frasco'), ('last_name', 'Necolds'), ('email', 'fnecolds0@vk.com'), ('gender', 'Male'), ('ip_address', '243.67.63.34')])

In [12]:

single_json['first_name']

Out[12]:

'Frasco'
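Indexing with square brackets raises a KeyError when the key is missing. As a minimal sketch of the other dict operations that are often useful here, the example below uses get with a default and loops over items. The dict is copied from the output above, and the key phone is hypothetical: it is not part of the document.

```python
single_json = {
    'id': 1,
    'first_name': 'Frasco',
    'last_name': 'Necolds',
    'email': 'fnecolds0@vk.com',
    'gender': 'Male',
    'ip_address': '243.67.63.34',
}

# Hypothetical key: 'phone' does not exist in the document.
phone = single_json.get('phone')                          # None, no exception
phone_or_default = single_json.get('phone', 'not available')
first_name = single_json.get('first_name')

# Loop over all key/value pairs of the document.
for key, value in single_json.items():
    print(f'{key}: {value}')
```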

The file youtube_playlist_items.json is an example of a YouTube Data API response. It contains a more complex JSON structure.

First, let us understand what a YouTube playlist is.
- A YouTube playlist is a series of videos.
- A playlist also has a name, a URL, and a description.
- Each video has a video id and its own attributes.
- The result for YouTube Playlist Items contains both playlist-level details and the details of the videos that are part of the playlist.
- The details of the videos are available in an attribute called items. The value of items is a JSON array.
You can follow the same steps as above to read the JSON in the file youtube_playlist_items.json into a dict.
However, the dict has a nested structure. The value of items is of type list.

In [13]:

results_json = json.load(open('youtube_playlist_items.json'))

In [14]:

results_json

Out[14]:

{'kind': 'youtube#playlistItemListResponse',
 'etag': 'lfs_qWNaczIydJ2Dlp1gmX9UTAc',
 'nextPageToken': 'CAUQAA',
 'items': [{'kind': 'youtube#playlistItem',
   'etag': 'SGHDydc4dLsY2RjfXTPneb_zc_s',
   'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5EQkE3RTJCQTJEQkFBQTcz',
   'contentDetails': {'videoId': 'ETZJln4jtAo',
    'videoPublishedAt': '2020-11-28T16:29:47Z'},
   'status': {'privacyStatus': 'public'}},
  {'kind': 'youtube#playlistItem',
   'etag': '5EFUNhJBvcwXPxO416VYQsXGzMo',
   'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4yQzk4QTA5QjkzMTFFOEI1',
   'contentDetails': {'videoId': '1OVHjHTkP3M',
    'videoPublishedAt': '2020-11-28T16:30:12Z'},
   'status': {'privacyStatus': 'public'}},
  {'kind': 'youtube#playlistItem',
   'etag': 'TiKqB2aeYxJjMGKQ0yLMJY0vpQE',
   'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy45NDlDQUFFOThDMTAxQjUw',
   'contentDetails': {'videoId': 'qfUbPLsLQcQ',
    'videoPublishedAt': '2020-11-28T16:30:33Z'},
   'status': {'privacyStatus': 'public'}},
  {'kind': 'youtube#playlistItem',
   'etag': 'vQrJOpYdXmGJuV32kjj2xqvSByc',
   'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4xN0Y2QjVBOEI2MzQ5OUM5',
   'contentDetails': {'videoId': 'rLTbhSaXhSM',
    'videoPublishedAt': '2020-11-28T16:30:52Z'},
   'status': {'privacyStatus': 'public'}},
  {'kind': 'youtube#playlistItem',
   'etag': '2CzGUToIgqywXAr4wuPswj9MuFg',
   'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5FQUY2Qzk4RUFDN0ZFRkZF',
   'contentDetails': {'videoId': 'wP7BhXrJKR8',
    'videoPublishedAt': '2020-11-28T16:31:14Z'},
   'status': {'privacyStatus': 'public'}}],
 'pageInfo': {'totalResults': 127, 'resultsPerPage': 5}}
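Besides items, the response carries top-level attributes such as pageInfo and nextPageToken. The sketch below reads them from a trimmed copy of the output above (only the videoId of each item is kept). Treating the presence of nextPageToken as a sign that more pages exist is an assumption based on how paginated responses like this one typically work.

```python
# Trimmed copy of the response shown above; only videoId is kept per item.
results_json = {
    'kind': 'youtube#playlistItemListResponse',
    'nextPageToken': 'CAUQAA',
    'items': [
        {'contentDetails': {'videoId': 'ETZJln4jtAo'}},
        {'contentDetails': {'videoId': '1OVHjHTkP3M'}},
        {'contentDetails': {'videoId': 'qfUbPLsLQcQ'}},
        {'contentDetails': {'videoId': 'rLTbhSaXhSM'}},
        {'contentDetails': {'videoId': 'wP7BhXrJKR8'}},
    ],
    'pageInfo': {'totalResults': 127, 'resultsPerPage': 5},
}

# Top-level attributes of the response.
top_level_keys = list(results_json.keys())
print(top_level_keys)

# pageInfo is a nested dict, items is a list.
page_info = results_json['pageInfo']
item_count = len(results_json['items'])

# Assumption: a nextPageToken means this is not the last page of results.
has_more_pages = 'nextPageToken' in results_json

print(page_info['totalResults'], page_info['resultsPerPage'], item_count, has_more_pages)
```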

In [15]:

# Reading items. It contains the details of the videos in the playlist.
results_json['items']

Out[15]:

[{'kind': 'youtube#playlistItem',
  'etag': 'SGHDydc4dLsY2RjfXTPneb_zc_s',
  'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5EQkE3RTJCQTJEQkFBQTcz',
  'contentDetails': {'videoId': 'ETZJln4jtAo',
   'videoPublishedAt': '2020-11-28T16:29:47Z'},
  'status': {'privacyStatus': 'public'}},
 {'kind': 'youtube#playlistItem',
  'etag': '5EFUNhJBvcwXPxO416VYQsXGzMo',
  'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4yQzk4QTA5QjkzMTFFOEI1',
  'contentDetails': {'videoId': '1OVHjHTkP3M',
   'videoPublishedAt': '2020-11-28T16:30:12Z'},
  'status': {'privacyStatus': 'public'}},
 {'kind': 'youtube#playlistItem',
  'etag': 'TiKqB2aeYxJjMGKQ0yLMJY0vpQE',
  'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy45NDlDQUFFOThDMTAxQjUw',
  'contentDetails': {'videoId': 'qfUbPLsLQcQ',
   'videoPublishedAt': '2020-11-28T16:30:33Z'},
  'status': {'privacyStatus': 'public'}},
 {'kind': 'youtube#playlistItem',
  'etag': 'vQrJOpYdXmGJuV32kjj2xqvSByc',
  'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4xN0Y2QjVBOEI2MzQ5OUM5',
  'contentDetails': {'videoId': 'rLTbhSaXhSM',
   'videoPublishedAt': '2020-11-28T16:30:52Z'},
  'status': {'privacyStatus': 'public'}},
 {'kind': 'youtube#playlistItem',
  'etag': '2CzGUToIgqywXAr4wuPswj9MuFg',
  'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5FQUY2Qzk4RUFDN0ZFRkZF',
  'contentDetails': {'videoId': 'wP7BhXrJKR8',
   'videoPublishedAt': '2020-11-28T16:31:14Z'},
  'status': {'privacyStatus': 'public'}}]

In [16]:

type(results_json['items'])

Out[16]:

list

In [17]:

results_json['items'][0]

Out[17]:

{'kind': 'youtube#playlistItem',
 'etag': 'SGHDydc4dLsY2RjfXTPneb_zc_s',
 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5EQkE3RTJCQTJEQkFBQTcz',
 'contentDetails': {'videoId': 'ETZJln4jtAo',
  'videoPublishedAt': '2020-11-28T16:29:47Z'},
 'status': {'privacyStatus': 'public'}}

In [18]:

results_json['items'][0]['contentDetails']

Out[18]:

{'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}

In [19]:

# Here is an example of printing item details.
for playlist_item in results_json['items']:
    print(playlist_item)

{'kind': 'youtube#playlistItem', 'etag': 'SGHDydc4dLsY2RjfXTPneb_zc_s', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5EQkE3RTJCQTJEQkFBQTcz', 'contentDetails': {'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}, 'status': {'privacyStatus': 'public'}}
{'kind': 'youtube#playlistItem', 'etag': '5EFUNhJBvcwXPxO416VYQsXGzMo', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4yQzk4QTA5QjkzMTFFOEI1', 'contentDetails': {'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'}, 'status': {'privacyStatus': 'public'}}
{'kind': 'youtube#playlistItem', 'etag': 'TiKqB2aeYxJjMGKQ0yLMJY0vpQE', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy45NDlDQUFFOThDMTAxQjUw', 'contentDetails': {'videoId': 'qfUbPLsLQcQ', 'videoPublishedAt': '2020-11-28T16:30:33Z'}, 'status': {'privacyStatus': 'public'}}
{'kind': 'youtube#playlistItem', 'etag': 'vQrJOpYdXmGJuV32kjj2xqvSByc', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4xN0Y2QjVBOEI2MzQ5OUM5', 'contentDetails': {'videoId': 'rLTbhSaXhSM', 'videoPublishedAt': '2020-11-28T16:30:52Z'}, 'status': {'privacyStatus': 'public'}}
{'kind': 'youtube#playlistItem', 'etag': '2CzGUToIgqywXAr4wuPswj9MuFg', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5FQUY2Qzk4RUFDN0ZFRkZF', 'contentDetails': {'videoId': 'wP7BhXrJKR8', 'videoPublishedAt': '2020-11-28T16:31:14Z'}, 'status': {'privacyStatus': 'public'}}

In [20]:

# Here is an example of getting only contentDetails for each item.
for playlist_item in results_json['items']:
    print(playlist_item['contentDetails'])

{'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}
{'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'}
{'videoId': 'qfUbPLsLQcQ', 'videoPublishedAt': '2020-11-28T16:30:33Z'}
{'videoId': 'rLTbhSaXhSM', 'videoPublishedAt': '2020-11-28T16:30:52Z'}
{'videoId': 'wP7BhXrJKR8', 'videoPublishedAt': '2020-11-28T16:31:14Z'}

In [21]:

# Here is how you can get video ids (using map function)
list(
    map(
        lambda playlist_item: playlist_item['contentDetails']['videoId'],
        results_json['items']
    )
)

Out[21]:

['ETZJln4jtAo', '1OVHjHTkP3M', 'qfUbPLsLQcQ', 'rLTbhSaXhSM', 'wP7BhXrJKR8']
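The same list of video ids can also be built with a list comprehension, which many Python developers find easier to read than map with a lambda. This is an alternative sketch, not a replacement for the cell above. It uses a trimmed copy of the items so that it runs on its own.

```python
# Trimmed copy of results_json['items']; only contentDetails is kept.
items = [
    {'contentDetails': {'videoId': 'ETZJln4jtAo'}},
    {'contentDetails': {'videoId': '1OVHjHTkP3M'}},
    {'contentDetails': {'videoId': 'qfUbPLsLQcQ'}},
    {'contentDetails': {'videoId': 'rLTbhSaXhSM'}},
    {'contentDetails': {'videoId': 'wP7BhXrJKR8'}},
]

# Equivalent to list(map(lambda ...)) in the cell above.
video_ids = [
    playlist_item['contentDetails']['videoId']
    for playlist_item in items
]
print(video_ids)
```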

In [22]:

# Here is how you can get contentDetails for each item (using map function)
list(
    map(
        lambda playlist_item: playlist_item['contentDetails'],
        results_json['items']
    )
)

Out[22]:

[{'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'},
 {'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'},
 {'videoId': 'qfUbPLsLQcQ', 'videoPublishedAt': '2020-11-28T16:30:33Z'},
 {'videoId': 'rLTbhSaXhSM', 'videoPublishedAt': '2020-11-28T16:30:52Z'},
 {'videoId': 'wP7BhXrJKR8', 'videoPublishedAt': '2020-11-28T16:31:14Z'}]
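The videoPublishedAt values are plain strings after loading. As a minimal sketch of one possible next step, the example below converts them to datetime objects and picks the most recently published video. The format string is an assumption that matches the timestamps shown above, and the trailing Z is treated as UTC.

```python
from datetime import datetime, timezone

# Copied from the output above.
content_details = [
    {'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'},
    {'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'},
    {'videoId': 'qfUbPLsLQcQ', 'videoPublishedAt': '2020-11-28T16:30:33Z'},
    {'videoId': 'rLTbhSaXhSM', 'videoPublishedAt': '2020-11-28T16:30:52Z'},
    {'videoId': 'wP7BhXrJKR8', 'videoPublishedAt': '2020-11-28T16:31:14Z'},
]


def parse_published_at(value):
    # Assumption: timestamps always look like 2020-11-28T16:29:47Z (UTC).
    parsed = datetime.strptime(value, '%Y-%m-%dT%H:%M:%SZ')
    return parsed.replace(tzinfo=timezone.utc)


# Pair each video id with its parsed timestamp.
published = [
    (detail['videoId'], parse_published_at(detail['videoPublishedAt']))
    for detail in content_details
]

# Pick the pair with the latest timestamp.
latest_video_id, latest_published_at = max(published, key=lambda pair: pair[1])
print(latest_video_id, latest_published_at.isoformat())
```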

