[
Single JSON Document in Files
Let us understand how to process a single JSON document stored in a file. We can leverage either the json or the pandas module for this. For now, we will focus on the json module.
- Here are the files used for the demo.
  - single_document.json
  - youtube_playlist_items.json: an example of a REST API call that returns its results as a list. The list is the value of one of the attributes in the response JSON.
- Here are the steps to review these documents in the Jupyter environment.
  - Go to the sidebar and select the file.
  - Right-click on the file and click on Open With -> Editor.
  - It will open the JSON file as plain (raw) text.
- Both documents contain their data as a single JSON document.
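Because a JSON file is just text, one way to see this in code is to read the file as a plain string and then parse that string with json.loads. The sketch below first writes the record displayed for single_document.json in this section to a temporary file, so it runs on its own; the temporary location is an assumption for the demo.

```python
import json
import os
import tempfile

# Same fields as the record displayed for single_document.json (kept inline so the sketch is self-contained)
sample = ('{"id": 1, "first_name": "Frasco", "last_name": "Necolds", '
          '"email": "fnecolds0@vk.com", "gender": "Male", "ip_address": "243.67.63.34"}')

# Write the sample to a temporary file (hypothetical location, not the course file)
path = os.path.join(tempfile.mkdtemp(), 'single_document.json')
with open(path, 'w') as fp:
    fp.write(sample)

# Reading the file gives plain text, just like opening it with the editor
with open(path) as fp:
    raw_text = fp.read()
print(type(raw_text))  # <class 'str'>

# json.loads parses a string; json.load parses a file object
record = json.loads(raw_text)
print(record['first_name'])  # Frasco
```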
Here are the steps to process a file that contains a single JSON document. You need to use json.load, passing it a file object (of type _io.TextIOWrapper).
- Create a file object by passing the path of the file to open.
- Invoke json.load, passing the file object as the argument.
- It will return a dict, because the document is a single JSON object.
- We can leverage dict operations to process the data further.
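The steps above can be sketched with a with statement, which closes the file as soon as json.load returns (the cells below call open(...) inline and leave closing the file to Python). The temporary file and its trimmed contents are assumptions so the sketch runs on its own.

```python
import json
import os
import tempfile

# Hypothetical setup: create a small JSON file so the sketch runs anywhere
path = os.path.join(tempfile.mkdtemp(), 'single_document.json')
with open(path, 'w') as fp:
    json.dump({'id': 1, 'first_name': 'Frasco', 'last_name': 'Necolds'}, fp)

# Step 1: create a file object from the path ("with" closes it automatically)
with open(path) as fp:
    # Step 2: pass the file object to json.load
    single_json = json.load(fp)

# Step 3: a top-level JSON object comes back as a dict
print(type(single_json))  # <class 'dict'>

# Step 4: regular dict operations apply
print(single_json['first_name'])  # Frasco
```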
In [1]:
import json
In [2]:
!ls -ltr single_document.json
-rw-rw-r-- 1 itversity itversity 154 Mar 8 02:04 single_document.json
In [3]:
type('single_document.json')
Out[3]:
str
In [4]:
json.load?
Signature:
json.load(
    fp,
    *,
    cls=None,
    object_hook=None,
    parse_float=None,
    parse_int=None,
    parse_constant=None,
    object_pairs_hook=None,
    **kw,
)
Docstring:
Deserialize ``fp`` (a ``.read()``-supporting file-like object containing
a JSON document) to a Python object.

``object_hook`` is an optional function that will be called with the
result of any object literal decode (a ``dict``). The return value of
``object_hook`` will be used instead of the ``dict``. This feature
can be used to implement custom decoders (e.g. JSON-RPC class hinting).

``object_pairs_hook`` is an optional function that will be called with the
result of any object literal decoded with an ordered list of pairs. The
return value of ``object_pairs_hook`` will be used instead of the ``dict``.
This feature can be used to implement custom decoders. If ``object_hook``
is also defined, the ``object_pairs_hook`` takes priority.

To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
kwarg; otherwise ``JSONDecoder`` is used.
File:      /usr/local/lib/python3.8/json/__init__.py
Type:      function
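The docstring above lists optional hooks. The sketch below exercises three of them through json.loads, the string counterpart of json.load that accepts the same keyword arguments. The "score" field is hypothetical and only there to show parse_float.

```python
import json
from collections import OrderedDict
from decimal import Decimal
from types import SimpleNamespace

# Hypothetical record; "score" is added only to demonstrate parse_float
text = '{"id": 1, "first_name": "Frasco", "score": 9.75}'

# object_hook receives each decoded dict; here it becomes an object with attributes
person = json.loads(text, object_hook=lambda d: SimpleNamespace(**d))
print(person.first_name)  # Frasco

# object_pairs_hook receives the ordered list of (key, value) pairs
ordered = json.loads(text, object_pairs_hook=OrderedDict)
print(type(ordered))  # <class 'collections.OrderedDict'>

# When both hooks are given, object_pairs_hook takes priority
both = json.loads(text, object_hook=lambda d: SimpleNamespace(**d),
                  object_pairs_hook=OrderedDict)
print(type(both))  # <class 'collections.OrderedDict'>

# parse_float is called with the string form of every JSON float
precise = json.loads(text, parse_float=Decimal)
print(precise['score'])  # 9.75 (a Decimal, not a float)
```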
In [5]:
type(open('single_document.json'))
Out[5]:
_io.TextIOWrapper
In [6]:
single_json = json.load(open('single_document.json'))
In [7]:
single_json
Out[7]:
{'id': 1, 'first_name': 'Frasco', 'last_name': 'Necolds', 'email': 'fnecolds0@vk.com', 'gender': 'Male', 'ip_address': '243.67.63.34'}
In [8]:
type(single_json)
Out[8]:
dict
In [9]:
single_json.keys()
Out[9]:
dict_keys(['id', 'first_name', 'last_name', 'email', 'gender', 'ip_address'])
In [10]:
single_json.values()
Out[10]:
dict_values([1, 'Frasco', 'Necolds', 'fnecolds0@vk.com', 'Male', '243.67.63.34'])
In [11]:
single_json.items()
Out[11]:
dict_items([('id', 1), ('first_name', 'Frasco'), ('last_name', 'Necolds'), ('email', 'fnecolds0@vk.com'), ('gender', 'Male'), ('ip_address', '243.67.63.34')])
In [12]:
single_json['first_name']
Out[12]:
'Frasco'
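A few more everyday dict operations apply directly to the loaded document. This sketch re-creates single_json from the output above so it runs on its own; the missing key 'phone' is hypothetical and is used only to show a default value.

```python
# Re-created from the output of single_json above
single_json = {'id': 1, 'first_name': 'Frasco', 'last_name': 'Necolds',
               'email': 'fnecolds0@vk.com', 'gender': 'Male', 'ip_address': '243.67.63.34'}

# Iterate over key/value pairs
for key, value in single_json.items():
    print(f'{key}: {value}')

# Check whether a key exists before reading it
print('email' in single_json)  # True

# get returns a default instead of raising KeyError for a missing key
print(single_json.get('phone', 'N/A'))  # N/A

# Combine several attributes into a derived value
full_name = f"{single_json['first_name']} {single_json['last_name']}"
print(full_name)  # Frasco Necolds
```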
The file youtube_playlist_items.json is an example of a YouTube Data API response. It contains a complex, nested JSON structure.
- First, let us understand what a YouTube playlist is.
- A YouTube playlist is nothing but a series of videos.
- A playlist also has a name, a URL and a description.
- Each video has a video id and other attributes.
- The response for YouTube playlist items contains both top-level details (such as kind, nextPageToken and pageInfo) and the details of the videos that are part of the playlist.
- The details of the videos are available in the attribute called items, whose value is a JSON array.
- You can follow the same steps as above to read the JSON in the file youtube_playlist_items.json into a dict.
- However, the dict will have a nested structure; for example, the value of items is of type list.
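To see how JSON types map to Python types in a nested document, the sketch below parses a trimmed copy of youtube_playlist_items.json (only two items and a few attributes kept, an assumption so it runs without the file) and prints the type of each top-level attribute.

```python
import json

# Trimmed copy of youtube_playlist_items.json so the sketch runs on its own
text = '''
{"kind": "youtube#playlistItemListResponse",
 "nextPageToken": "CAUQAA",
 "items": [
   {"kind": "youtube#playlistItem",
    "contentDetails": {"videoId": "ETZJln4jtAo", "videoPublishedAt": "2020-11-28T16:29:47Z"}},
   {"kind": "youtube#playlistItem",
    "contentDetails": {"videoId": "1OVHjHTkP3M", "videoPublishedAt": "2020-11-28T16:30:12Z"}}
 ],
 "pageInfo": {"totalResults": 127, "resultsPerPage": 5}}
'''
results_json = json.loads(text)

# JSON arrays become lists and nested JSON objects become dicts
for key, value in results_json.items():
    print(key, type(value).__name__)
# kind str
# nextPageToken str
# items list
# pageInfo dict

print(len(results_json['items']))                # 2
print(results_json['pageInfo']['totalResults'])  # 127
```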
In [13]:
results_json = json.load(open('youtube_playlist_items.json'))
In [14]:
results_json
Out[14]:
{'kind': 'youtube#playlistItemListResponse', 'etag': 'lfs_qWNaczIydJ2Dlp1gmX9UTAc', 'nextPageToken': 'CAUQAA', 'items': [{'kind': 'youtube#playlistItem', 'etag': 'SGHDydc4dLsY2RjfXTPneb_zc_s', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5EQkE3RTJCQTJEQkFBQTcz', 'contentDetails': {'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': '5EFUNhJBvcwXPxO416VYQsXGzMo', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4yQzk4QTA5QjkzMTFFOEI1', 'contentDetails': {'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': 'TiKqB2aeYxJjMGKQ0yLMJY0vpQE', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy45NDlDQUFFOThDMTAxQjUw', 'contentDetails': {'videoId': 'qfUbPLsLQcQ', 'videoPublishedAt': '2020-11-28T16:30:33Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': 'vQrJOpYdXmGJuV32kjj2xqvSByc', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4xN0Y2QjVBOEI2MzQ5OUM5', 'contentDetails': {'videoId': 'rLTbhSaXhSM', 'videoPublishedAt': '2020-11-28T16:30:52Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': '2CzGUToIgqywXAr4wuPswj9MuFg', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5FQUY2Qzk4RUFDN0ZFRkZF', 'contentDetails': {'videoId': 'wP7BhXrJKR8', 'videoPublishedAt': '2020-11-28T16:31:14Z'}, 'status': {'privacyStatus': 'public'}}], 'pageInfo': {'totalResults': 127, 'resultsPerPage': 5}}
In [15]:
# Reading items. It contains the details of the videos in the playlist.
results_json['items']
Out[15]:
[{'kind': 'youtube#playlistItem', 'etag': 'SGHDydc4dLsY2RjfXTPneb_zc_s', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5EQkE3RTJCQTJEQkFBQTcz', 'contentDetails': {'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': '5EFUNhJBvcwXPxO416VYQsXGzMo', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4yQzk4QTA5QjkzMTFFOEI1', 'contentDetails': {'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': 'TiKqB2aeYxJjMGKQ0yLMJY0vpQE', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy45NDlDQUFFOThDMTAxQjUw', 'contentDetails': {'videoId': 'qfUbPLsLQcQ', 'videoPublishedAt': '2020-11-28T16:30:33Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': 'vQrJOpYdXmGJuV32kjj2xqvSByc', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4xN0Y2QjVBOEI2MzQ5OUM5', 'contentDetails': {'videoId': 'rLTbhSaXhSM', 'videoPublishedAt': '2020-11-28T16:30:52Z'}, 'status': {'privacyStatus': 'public'}}, {'kind': 'youtube#playlistItem', 'etag': '2CzGUToIgqywXAr4wuPswj9MuFg', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5FQUY2Qzk4RUFDN0ZFRkZF', 'contentDetails': {'videoId': 'wP7BhXrJKR8', 'videoPublishedAt': '2020-11-28T16:31:14Z'}, 'status': {'privacyStatus': 'public'}}]
In [16]:
type(results_json['items'])
Out[16]:
list
In [17]:
results_json['items'][0]
Out[17]:
{'kind': 'youtube#playlistItem', 'etag': 'SGHDydc4dLsY2RjfXTPneb_zc_s', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5EQkE3RTJCQTJEQkFBQTcz', 'contentDetails': {'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}, 'status': {'privacyStatus': 'public'}}
In [18]:
results_json['items'][0]['contentDetails']
Out[18]:
{'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}
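Indexing with [] raises KeyError if any level of the path is missing. When a response might omit an attribute, chained get calls with an empty-dict default are one defensive option. The second item below, which has no contentDetails, is hypothetical and does not come from the file.

```python
# First item is shaped like the real response; the second is a hypothetical incomplete item
items = [
    {'kind': 'youtube#playlistItem',
     'contentDetails': {'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}},
    {'kind': 'youtube#playlistItem'},
]

for item in items:
    # get('contentDetails', {}) yields an empty dict when the key is missing,
    # so the second get returns None instead of raising KeyError
    video_id = item.get('contentDetails', {}).get('videoId')
    print(video_id)
# ETZJln4jtAo
# None
```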
In [19]:
# Here is an example of printing item details.
for playlist_item in results_json['items']:
    print(playlist_item)
{'kind': 'youtube#playlistItem', 'etag': 'SGHDydc4dLsY2RjfXTPneb_zc_s', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5EQkE3RTJCQTJEQkFBQTcz', 'contentDetails': {'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}, 'status': {'privacyStatus': 'public'}}
{'kind': 'youtube#playlistItem', 'etag': '5EFUNhJBvcwXPxO416VYQsXGzMo', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4yQzk4QTA5QjkzMTFFOEI1', 'contentDetails': {'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'}, 'status': {'privacyStatus': 'public'}}
{'kind': 'youtube#playlistItem', 'etag': 'TiKqB2aeYxJjMGKQ0yLMJY0vpQE', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy45NDlDQUFFOThDMTAxQjUw', 'contentDetails': {'videoId': 'qfUbPLsLQcQ', 'videoPublishedAt': '2020-11-28T16:30:33Z'}, 'status': {'privacyStatus': 'public'}}
{'kind': 'youtube#playlistItem', 'etag': 'vQrJOpYdXmGJuV32kjj2xqvSByc', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy4xN0Y2QjVBOEI2MzQ5OUM5', 'contentDetails': {'videoId': 'rLTbhSaXhSM', 'videoPublishedAt': '2020-11-28T16:30:52Z'}, 'status': {'privacyStatus': 'public'}}
{'kind': 'youtube#playlistItem', 'etag': '2CzGUToIgqywXAr4wuPswj9MuFg', 'id': 'UExmMHN3VEZoVEk4cmtINHlJZm95VEFoZUVHaldJUnRQRy5FQUY2Qzk4RUFDN0ZFRkZF', 'contentDetails': {'videoId': 'wP7BhXrJKR8', 'videoPublishedAt': '2020-11-28T16:31:14Z'}, 'status': {'privacyStatus': 'public'}}
In [20]:
# Here is an example of getting only contentDetails for each item.
for playlist_item in results_json['items']:
    print(playlist_item['contentDetails'])
{'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}
{'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'}
{'videoId': 'qfUbPLsLQcQ', 'videoPublishedAt': '2020-11-28T16:30:33Z'}
{'videoId': 'rLTbhSaXhSM', 'videoPublishedAt': '2020-11-28T16:30:52Z'}
{'videoId': 'wP7BhXrJKR8', 'videoPublishedAt': '2020-11-28T16:31:14Z'}
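Instead of only printing contentDetails, the same loop can collect the values for further processing. This sketch (items trimmed to two entries, as an assumption) pairs each video id with its publish time parsed into a timezone-aware datetime. The trailing 'Z' is replaced with '+00:00' because datetime.fromisoformat does not accept 'Z' before Python 3.11.

```python
from datetime import datetime

# Trimmed stand-in for results_json['items']
items = [
    {'contentDetails': {'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}},
    {'contentDetails': {'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'}},
]

videos = []
for playlist_item in items:
    details = playlist_item['contentDetails']
    # Swap 'Z' for an explicit UTC offset so fromisoformat works on older Pythons too
    published_at = datetime.fromisoformat(details['videoPublishedAt'].replace('Z', '+00:00'))
    videos.append((details['videoId'], published_at))

for video_id, published_at in videos:
    print(video_id, published_at.isoformat())
# ETZJln4jtAo 2020-11-28T16:29:47+00:00
# 1OVHjHTkP3M 2020-11-28T16:30:12+00:00
```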
In [21]:
# Here is how you can get video ids (using the map function)
list(
    map(
        lambda playlist_item: playlist_item['contentDetails']['videoId'],
        results_json['items']
    )
)
Out[21]:
['ETZJln4jtAo', '1OVHjHTkP3M', 'qfUbPLsLQcQ', 'rLTbhSaXhSM', 'wP7BhXrJKR8']
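The map call above can also be written as a list comprehension, which many Python developers find easier to read because no lambda is needed. The items list here is a trimmed stand-in for results_json['items'].

```python
# Trimmed stand-in for results_json['items']
items = [
    {'contentDetails': {'videoId': 'ETZJln4jtAo'}},
    {'contentDetails': {'videoId': '1OVHjHTkP3M'}},
]

# Equivalent to list(map(lambda playlist_item: ..., items))
video_ids = [playlist_item['contentDetails']['videoId'] for playlist_item in items]
print(video_ids)  # ['ETZJln4jtAo', '1OVHjHTkP3M']
```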
In [22]:
list(
    map(
        lambda playlist_item: playlist_item['contentDetails'],
        results_json['items']
    )
)
Out[22]:
[{'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'}, {'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'}, {'videoId': 'qfUbPLsLQcQ', 'videoPublishedAt': '2020-11-28T16:30:33Z'}, {'videoId': 'rLTbhSaXhSM', 'videoPublishedAt': '2020-11-28T16:30:52Z'}, {'videoId': 'wP7BhXrJKR8', 'videoPublishedAt': '2020-11-28T16:31:14Z'}]
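Once the contentDetails dicts are extracted, a dict comprehension can index them by video id for quick lookups. This is an illustrative extension: the name published_by_video is hypothetical and the list is trimmed to two entries from the output above.

```python
# Two entries taken from the contentDetails output above
content_details = [
    {'videoId': 'ETZJln4jtAo', 'videoPublishedAt': '2020-11-28T16:29:47Z'},
    {'videoId': '1OVHjHTkP3M', 'videoPublishedAt': '2020-11-28T16:30:12Z'},
]

# Map each video id to its publish timestamp
published_by_video = {d['videoId']: d['videoPublishedAt'] for d in content_details}
print(published_by_video['1OVHjHTkP3M'])  # 2020-11-28T16:30:12Z
```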
]