[
Process JSON String
Let us understand how to process JSON strings using Python. Later we will see different ways of storing JSON data in files.
We will see the following examples of processing JSON strings.
- Single JSON document.
- Multiple JSON documents, with one JSON per line.
- Multiple JSON documents as an array under one attribute. Most REST APIs that return multiple elements follow this approach.
- We can process JSON strings using either the json module or pandas.
- When developing backends for web or mobile applications, we use json or some higher-level wrappers. For bulk data processing, we typically fall back on modules such as pandas.
- You should be familiar with both. For now, we will focus on json.
- We first import the json module to process JSON strings with it.
- The module has a function called loads, which takes a JSON document as a string and returns the corresponding Python object. For a JSON object, that is a dict.
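As a quick illustration of what loads returns, here is a minimal sketch. The sample strings and variable names are made up for this example. The Python type you get back depends on the top-level JSON value: a JSON object becomes a dict, a JSON array becomes a list, and so on.

```python
import json

# Hypothetical sample strings, chosen only to show the type mapping.
samples = {
    "object": '{"id": 1, "active": true, "nickname": null}',
    "array": '[1, 2, 3]',
    "string": '"hello"',
    "number": '3.14',
}

# Parse every sample and keep the results by name.
parsed = {name: json.loads(text) for name, text in samples.items()}

for name, value in parsed.items():
    # Show the Python type produced for each kind of JSON value.
    print(name, type(value).__name__, value)
```

Note that JSON true and null are converted to Python True and None.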
Single JSON document
Let us go through the details of processing a single JSON document.
- Import the json module.
- Create the JSON string.
- Pass the string to json.loads. It will return a dict.
- Assign it to a variable and use it further.
In [1]:
import json
In [2]:
person = '{"id":1,"first_name":"Frasco","last_name":"Necolds","email":"fnecolds0@vk.com","gender":"Male","ip_address":"243.67.63.34"}'
In [3]:
type(person)
Out[3]:
str
In [4]:
json.loads?
Signature:
json.loads(
    s,
    *,
    cls=None,
    object_hook=None,
    parse_float=None,
    parse_int=None,
    parse_constant=None,
    object_pairs_hook=None,
    **kw,
)
Docstring:
Deserialize ``s`` (a ``str``, ``bytes`` or ``bytearray`` instance
containing a JSON document) to a Python object.

``object_hook`` is an optional function that will be called with the
result of any object literal decode (a ``dict``). The return value of
``object_hook`` will be used instead of the ``dict``. This feature
can be used to implement custom decoders (e.g. JSON-RPC class hinting).

``object_pairs_hook`` is an optional function that will be called with the
result of any object literal decoded with an ordered list of pairs. The
return value of ``object_pairs_hook`` will be used instead of the ``dict``.
This feature can be used to implement custom decoders. If ``object_hook``
is also defined, the ``object_pairs_hook`` takes priority.

``parse_float``, if specified, will be called with the string
of every JSON float to be decoded. By default this is equivalent to
float(num_str). This can be used to use another datatype or parser
for JSON floats (e.g. decimal.Decimal).

``parse_int``, if specified, will be called with the string
of every JSON int to be decoded. By default this is equivalent to
int(num_str). This can be used to use another datatype or parser
for JSON integers (e.g. float).

``parse_constant``, if specified, will be called with one of the
following strings: -Infinity, Infinity, NaN.
This can be used to raise an exception if invalid JSON numbers
are encountered.

To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
kwarg; otherwise ``JSONDecoder`` is used.

The ``encoding`` argument is ignored and deprecated since Python 3.1.
File:      /usr/local/lib/python3.8/json/__init__.py
Type:      function
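The optional arguments described in the docstring can be sketched as follows. The order document and the upper_keys helper are hypothetical and exist only for illustration. The sketch shows parse_float receiving the text of each JSON float, and object_hook receiving each decoded JSON object.

```python
import json
from decimal import Decimal

# Hypothetical document with a float value, used only for illustration.
order = '{"order_id": 101, "amount": 19.99}'

# Default behaviour: JSON floats become Python float.
default_result = json.loads(order)

# parse_float is called with the string of each JSON float,
# so Decimal receives "19.99" and keeps the exact decimal value.
decimal_result = json.loads(order, parse_float=Decimal)

# object_hook is called with every decoded JSON object (a dict);
# its return value replaces that dict in the result.
def upper_keys(decoded):
    return {key.upper(): value for key, value in decoded.items()}

hooked_result = json.loads(order, object_hook=upper_keys)

print(type(default_result["amount"]).__name__)
print(decimal_result["amount"])
print(hooked_result)
```

Using Decimal this way is a common choice for monetary values, where binary floating point rounding is undesirable.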
In [5]:
person_dict = json.loads(person)
In [6]:
type(person_dict)
Out[6]:
dict
In [7]:
print(person_dict)
{'id': 1, 'first_name': 'Frasco', 'last_name': 'Necolds', 'email': 'fnecolds0@vk.com', 'gender': 'Male', 'ip_address': '243.67.63.34'}
In [8]:
person_dict['id']
Out[8]:
1
In [9]:
person_dict['first_name']
Out[9]:
'Frasco'
In [10]:
person_dict.keys()
Out[10]:
dict_keys(['id', 'first_name', 'last_name', 'email', 'gender', 'ip_address'])
In [11]:
person_dict.items()
Out[11]:
dict_items([('id', 1), ('first_name', 'Frasco'), ('last_name', 'Necolds'), ('email', 'fnecolds0@vk.com'), ('gender', 'Male'), ('ip_address', '243.67.63.34')])
- Here is an example of a single JSON document as a string that spans multiple lines.
In [12]:
import json
In [13]:
person = '''{
"id":1,
"first_name":"Frasco",
"last_name":"Necolds",
"email":"fnecolds0@vk.com",
"gender":"Male",
"ip_address":"243.67.63.34"
}'''
In [14]:
type(person)
Out[14]:
str
In [15]:
person_dict = json.loads(person)
In [16]:
type(person_dict)
Out[16]:
dict
In [17]:
print(person_dict)
{'id': 1, 'first_name': 'Frasco', 'last_name': 'Necolds', 'email': 'fnecolds0@vk.com', 'gender': 'Male', 'ip_address': '243.67.63.34'}
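The examples above assume the string is valid JSON. When it is not, json.loads raises json.JSONDecodeError. Here is a minimal sketch of handling that case; the parse_or_none helper is a hypothetical name introduced for this example, not part of the json module.

```python
import json

def parse_or_none(text):
    """Return the parsed object, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # err.msg, err.lineno and err.colno describe where parsing failed.
        print(f"Invalid JSON: {err.msg} (line {err.lineno}, column {err.colno})")
        return None

valid = parse_or_none('{"id": 1, "first_name": "Frasco"}')

# Single quotes are not valid string delimiters in JSON,
# so this Python-style literal is rejected.
invalid = parse_or_none("{'id': 1, 'first_name': 'Frasco'}")

print(valid)
print(invalid)
```

One design caveat: the JSON document null also parses to None, so a caller that must distinguish the two cases should let the exception propagate instead.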
Multiple JSON Documents – One per line
Let us go through the steps involved in processing a string which contains one JSON document per line.
- We should convert the string into a list of JSON strings and then use json.loads to process each one.
- Import the json module.
- Split the string into multiple strings using the newline character (\n) as delimiter. Strings have a method called splitlines that we can leverage.
- Use a for loop or the map function to convert the list of JSON strings into a list of dicts. We should use json.loads to convert each JSON string to a dict.
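The steps above can be sketched as follows. The raw string is a hypothetical input that also contains a blank line and a trailing newline, which is common in real files. Since json.loads fails on an empty string, the sketch skips such lines.

```python
import json

# Hypothetical input with a blank line and a trailing newline.
raw = '''{"id":1,"first_name":"Frasco"}

{"id":2,"first_name":"Dulce"}
'''

records = [
    json.loads(line)
    for line in raw.splitlines()
    if line.strip()  # skip empty or whitespace-only lines
]

print(records)
```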
In [18]:
persons = '''{"id":1,"first_name":"Frasco","last_name":"Necolds","email":"fnecolds0@vk.com","gender":"Male","ip_address":"243.67.63.34"}
{"id":2,"first_name":"Dulce","last_name":"Santos","email":"dsantos1@mashable.com","gender":"Female","ip_address":"60.30.246.227"}
{"id":3,"first_name":"Prissie","last_name":"Tebbett","email":"ptebbett2@infoseek.co.jp","gender":"Genderfluid","ip_address":"22.21.162.56"}
{"id":4,"first_name":"Schuyler","last_name":"Coppledike","email":"scoppledike3@gnu.org","gender":"Agender","ip_address":"120.35.186.161"}
{"id":5,"first_name":"Leopold","last_name":"Jarred","email":"ljarred4@wp.com","gender":"Agender","ip_address":"30.119.34.4"}
{"id":6,"first_name":"Joanna","last_name":"Teager","email":"jteager5@apache.org","gender":"Bigender","ip_address":"245.221.176.34"}
{"id":7,"first_name":"Lion","last_name":"Beere","email":"lbeere6@bloomberg.com","gender":"Polygender","ip_address":"105.54.139.46"}
{"id":8,"first_name":"Marabel","last_name":"Wornum","email":"mwornum7@posterous.com","gender":"Polygender","ip_address":"247.229.14.25"}
{"id":9,"first_name":"Helenka","last_name":"Mullender","email":"hmullender8@cloudflare.com","gender":"Non-binary","ip_address":"133.216.118.88"}
{"id":10,"first_name":"Christine","last_name":"Swane","email":"cswane9@shop-pro.jp","gender":"Polygender","ip_address":"86.16.210.164"}'''
In [19]:
type(persons)
Out[19]:
str
In [20]:
persons.splitlines?
Signature: persons.splitlines(keepends=False)
Docstring:
Return a list of the lines in the string, breaking at line boundaries.

Line breaks are not included in the resulting list unless keepends is given and
true.
Type:      builtin_function_or_method
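To see why splitlines is a convenient choice here, the sketch below compares it with split('\n') on two made-up strings, one with Unix line endings and one with Windows line endings.

```python
# Hypothetical strings with Unix and Windows line endings.
text_unix = 'a\nb\n'
text_windows = 'a\r\nb\r\n'

# splitlines recognises both kinds of line boundary and
# does not produce a trailing empty string.
unix_lines = text_unix.splitlines()
windows_lines = text_windows.splitlines()

# split('\n') leaves the carriage returns in place and
# adds an empty string after the final newline.
split_result = text_windows.split('\n')

# keepends=True keeps the line break on each element.
kept = text_unix.splitlines(keepends=True)

print(unix_lines, windows_lines, split_result, kept)
```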
In [21]:
# Using for loop
import json
In [22]:
persons_list = persons.splitlines()
In [23]:
type(persons_list)
Out[23]:
list
In [24]:
type(persons_list[0])
Out[24]:
str
In [25]:
persons_list[1]
Out[25]:
'{"id":2,"first_name":"Dulce","last_name":"Santos","email":"dsantos1@mashable.com","gender":"Female","ip_address":"60.30.246.227"}'
In [26]:
json.loads(persons_list[0])
Out[26]:
{'id': 1, 'first_name': 'Frasco', 'last_name': 'Necolds', 'email': 'fnecolds0@vk.com', 'gender': 'Male', 'ip_address': '243.67.63.34'}
- Converting a list of strings to a list of dicts using a conventional loop.
In [27]:
persons_dict_list = []
for person in persons_list:
    persons_dict_list.append(json.loads(person))
In [28]:
type(persons_dict_list)
Out[28]:
list
In [29]:
type(persons_dict_list[0])
Out[29]:
dict
In [30]:
persons_dict_list[0]
Out[30]:
{'id': 1, 'first_name': 'Frasco', 'last_name': 'Necolds', 'email': 'fnecolds0@vk.com', 'gender': 'Male', 'ip_address': '243.67.63.34'}
In [31]:
persons_dict_list[0]['first_name']
Out[31]:
'Frasco'
- Converting a list of strings to a list of dicts using a list comprehension.
In [32]:
persons_dict_list = [json.loads(person) for person in persons_list]
In [33]:
type(persons_dict_list)
Out[33]:
list
In [34]:
type(persons_dict_list[0])
Out[34]:
dict
In [35]:
persons_dict_list[0]
Out[35]:
{'id': 1, 'first_name': 'Frasco', 'last_name': 'Necolds', 'email': 'fnecolds0@vk.com', 'gender': 'Male', 'ip_address': '243.67.63.34'}
- Converting a list of strings to a list of dicts using the map function.
In [36]:
persons_dict_list = list(map(json.loads, persons_list))
In [37]:
type(persons_dict_list)
Out[37]:
list
In [38]:
type(persons_dict_list[0])
Out[38]:
dict
In [39]:
persons_dict_list[0]
Out[39]:
{'id': 1, 'first_name': 'Frasco', 'last_name': 'Necolds', 'email': 'fnecolds0@vk.com', 'gender': 'Male', 'ip_address': '243.67.63.34'}
In [40]:
list(map(lambda person: person['first_name'], persons_dict_list))
Out[40]:
['Frasco', 'Dulce', 'Prissie', 'Schuyler', 'Leopold', 'Joanna', 'Lion', 'Marabel', 'Helenka', 'Christine']
In [41]:
list(filter(lambda person: person['gender'] == 'Female', persons_dict_list))
Out[41]:
[{'id': 2, 'first_name': 'Dulce', 'last_name': 'Santos', 'email': 'dsantos1@mashable.com', 'gender': 'Female', 'ip_address': '60.30.246.227'}]
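The map and filter calls above have direct list comprehension equivalents. Here is a small sketch on a shortened, made-up sample (the variable names sample and people are hypothetical); which form to use is largely a matter of style.

```python
import json

# Shortened, hypothetical sample with one JSON document per line.
sample = '''{"id":1,"first_name":"Frasco","gender":"Male"}
{"id":2,"first_name":"Dulce","gender":"Female"}'''
people = [json.loads(line) for line in sample.splitlines()]

# Projection: map with a lambda versus a list comprehension.
names_map = list(map(lambda p: p['first_name'], people))
names_comp = [p['first_name'] for p in people]

# Selection: filter with a lambda versus a comprehension with a condition.
females_filter = list(filter(lambda p: p['gender'] == 'Female', people))
females_comp = [p for p in people if p['gender'] == 'Female']

print(names_map == names_comp, females_filter == females_comp)
```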
Multiple JSON Documents – Array
Let us go through the details of processing multiple JSON documents as an array.
- We should be able to use json.loads. For the string below, it will return a Python list.
- The steps are the same as for processing a single JSON document.
- Import the json module.
- Use json.loads to convert the string to a Python list.
- Process the list using Python capabilities.
In [42]:
persons = '''[{"id":1,"first_name":"Frasco","last_name":"Necolds","email":"fnecolds0@vk.com","gender":"Male","ip_address":"243.67.63.34"},
{"id":2,"first_name":"Dulce","last_name":"Santos","email":"dsantos1@mashable.com","gender":"Female","ip_address":"60.30.246.227"},
{"id":3,"first_name":"Prissie","last_name":"Tebbett","email":"ptebbett2@infoseek.co.jp","gender":"Genderfluid","ip_address":"22.21.162.56"},
{"id":4,"first_name":"Schuyler","last_name":"Coppledike","email":"scoppledike3@gnu.org","gender":"Agender","ip_address":"120.35.186.161"},
{"id":5,"first_name":"Leopold","last_name":"Jarred","email":"ljarred4@wp.com","gender":"Agender","ip_address":"30.119.34.4"},
{"id":6,"first_name":"Joanna","last_name":"Teager","email":"jteager5@apache.org","gender":"Bigender","ip_address":"245.221.176.34"},
{"id":7,"first_name":"Lion","last_name":"Beere","email":"lbeere6@bloomberg.com","gender":"Polygender","ip_address":"105.54.139.46"},
{"id":8,"first_name":"Marabel","last_name":"Wornum","email":"mwornum7@posterous.com","gender":"Polygender","ip_address":"247.229.14.25"},
{"id":9,"first_name":"Helenka","last_name":"Mullender","email":"hmullender8@cloudflare.com","gender":"Non-binary","ip_address":"133.216.118.88"},
{"id":10,"first_name":"Christine","last_name":"Swane","email":"cswane9@shop-pro.jp","gender":"Polygender","ip_address":"86.16.210.164"}]'''
In [43]:
persons_dict_list = json.loads(persons)
In [44]:
type(persons_dict_list)
Out[44]:
list
In [45]:
type(persons_dict_list[0])
Out[45]:
dict
In [46]:
persons_dict_list[0]
Out[46]:
{'id': 1, 'first_name': 'Frasco', 'last_name': 'Necolds', 'email': 'fnecolds0@vk.com', 'gender': 'Male', 'ip_address': '243.67.63.34'}
- When we use json.loads on the string below, it will create a dict in which the value of the results attribute is a list.
In [47]:
persons = '''{
"results": [
{
"id": 1,
"first_name": "Frasco",
"last_name": "Necolds",
"email": "fnecolds0@vk.com",
"gender": "Male",
"ip_address": "243.67.63.34"
},
{
"id": 2,
"first_name": "Dulce",
"last_name": "Santos",
"email": "dsantos1@mashable.com",
"gender": "Female",
"ip_address": "60.30.246.227"
},
{
"id": 3,
"first_name": "Prissie",
"last_name": "Tebbett",
"email": "ptebbett2@infoseek.co.jp",
"gender": "Genderfluid",
"ip_address": "22.21.162.56"
},
{
"id": 4,
"first_name": "Schuyler",
"last_name": "Coppledike",
"email": "scoppledike3@gnu.org",
"gender": "Agender",
"ip_address": "120.35.186.161"
},
{
"id": 5,
"first_name": "Leopold",
"last_name": "Jarred",
"email": "ljarred4@wp.com",
"gender": "Agender",
"ip_address": "30.119.34.4"
},
{
"id": 6,
"first_name": "Joanna",
"last_name": "Teager",
"email": "jteager5@apache.org",
"gender": "Bigender",
"ip_address": "245.221.176.34"
},
{
"id": 7,
"first_name": "Lion",
"last_name": "Beere",
"email": "lbeere6@bloomberg.com",
"gender": "Polygender",
"ip_address": "105.54.139.46"
},
{
"id": 8,
"first_name": "Marabel",
"last_name": "Wornum",
"email": "mwornum7@posterous.com",
"gender": "Polygender",
"ip_address": "247.229.14.25"
},
{
"id": 9,
"first_name": "Helenka",
"last_name": "Mullender",
"email": "hmullender8@cloudflare.com",
"gender": "Non-binary",
"ip_address": "133.216.118.88"
},
{
"id": 10,
"first_name": "Christine",
"last_name": "Swane",
"email": "cswane9@shop-pro.jp",
"gender": "Polygender",
"ip_address": "86.16.210.164"
}
]
}'''
In [48]:
import json
In [49]:
person_results = json.loads(persons)
In [50]:
type(person_results)
Out[50]:
dict
In [51]:
person_results.keys()
Out[51]:
dict_keys(['results'])
In [52]:
type(person_results['results'])
Out[52]:
list
In [53]:
type(person_results['results'][0])
Out[53]:
dict
In [54]:
person_results['results'][0]
Out[54]:
{'id': 1, 'first_name': 'Frasco', 'last_name': 'Necolds', 'email': 'fnecolds0@vk.com', 'gender': 'Male', 'ip_address': '243.67.63.34'}
In [55]:
results = person_results['results']
In [56]:
type(results)
Out[56]:
list
In [57]:
results[0]
Out[57]:
{'id': 1, 'first_name': 'Frasco', 'last_name': 'Necolds', 'email': 'fnecolds0@vk.com', 'gender': 'Male', 'ip_address': '243.67.63.34'}
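When the document comes from an external source, the attribute holding the array may be missing or have an unexpected type. Here is a minimal sketch of a defensive lookup. The extract_results helper and the payload string are hypothetical and shown only for illustration; the default key name results simply mirrors the example above.

```python
import json

# Hypothetical response body shaped like the example above.
payload = '{"results": [{"id": 1, "first_name": "Frasco"}, {"id": 2, "first_name": "Dulce"}]}'

def extract_results(text, key="results"):
    """Parse text and return the list stored under key, or [] if absent."""
    document = json.loads(text)
    if not isinstance(document, dict):
        raise TypeError("expected a JSON object at the top level")
    # dict.get avoids a KeyError when the attribute is missing.
    results = document.get(key, [])
    if not isinstance(results, list):
        raise TypeError(f"expected {key!r} to hold a JSON array")
    return results

results = extract_results(payload)
first_names = [person["first_name"] for person in results]

print(first_names)
```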
Exercise on processing collections
Take person_results and get a list of dicts where each dict contains id, first_name and email. We would like to send an offer to all the persons by email.
- You should use the map function for this.
- Do not use loops.
]