list and set – Usage
Let us see some real world usage of list and set while building Python based applications.
- list is used more often than set.
  - Reading data from file into a list
  - Reading data from a table into a list
- We can convert a list to set to perform these operations.
  - Get unique elements from the list
  - Perform set operations between 2 lists such as union, intersection, difference etc.
- We can convert a set to list to perform these operations.
  - Reverse the collection
  - Append multiple collections to create new collections while retaining duplicates
- You will see some of these in action as we get into other related topics down the line
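The list to set conversion described above can be sketched as follows. This is a minimal illustration rather than part of the original notebook: the two lists hold made-up date strings, and the names `dates_a` and `dates_b` are hypothetical.

```python
# Hypothetical sample data: two lists of dates, each containing duplicates
dates_a = ['2013-07-25', '2013-07-25', '2013-07-26', '2014-01-25']
dates_b = ['2013-08-25', '2013-07-26', '2014-01-25', '2014-01-25']

# Convert each list to a set so that set operations become available
set_a = set(dates_a)
set_b = set(dates_b)

common = set_a.intersection(set_b)   # elements present in both lists
only_in_a = set_a.difference(set_b)  # elements in dates_a but not in dates_b
combined = set_a.union(set_b)        # all distinct elements from both lists

# sorted() returns a list, which gives a predictable order for display
print(sorted(common))
print(sorted(only_in_a))
print(sorted(combined))
```

Sets do not keep insertion order, so the results are passed through `sorted()` before printing.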
In [ ]:
%%sh
ls -ltr /data/retail_db/orders/part-00000
In [1]:
# Reading data from file into a list
path = '/data/retail_db/orders/part-00000'
# On Windows, a path would look like 'C:\\users\\itversity\\Research'
orders_file = open(path)
In [2]:
orders_raw = orders_file.read()
orders_file.close() # release the file handle once the contents are in memory
In [3]:
orders = orders_raw.splitlines()
In [4]:
orders[:10]
Out[4]:
['1,2013-07-25 00:00:00.0,11599,CLOSED', '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT', '3,2013-07-25 00:00:00.0,12111,COMPLETE', '4,2013-07-25 00:00:00.0,8827,CLOSED', '5,2013-07-25 00:00:00.0,11318,COMPLETE', '6,2013-07-25 00:00:00.0,7130,COMPLETE', '7,2013-07-25 00:00:00.0,4530,COMPLETE', '8,2013-07-25 00:00:00.0,2911,PROCESSING', '9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT', '10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']
In [5]:
len(orders) # same as number of records in the file
Out[5]:
68883
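The same file to list steps can also be written with a `with` block, which closes the file automatically. The sketch below is an assumption-laden illustration: it writes two of the records shown above to a temporary file (a stand-in for the real orders file, which may not exist on your machine) and reads them back. The names `sample_path` and `sample_orders` are hypothetical.

```python
import os
import tempfile

# Stand-in for the orders file: two records copied from the output above
sample_records = (
    '1,2013-07-25 00:00:00.0,11599,CLOSED\n'
    '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT\n'
)

# Write the sample to a temporary file so the sketch runs anywhere
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample_records)
    sample_path = tmp.name

# The with block closes the file when it ends, even if read() raises an error
with open(sample_path) as sample_file:
    sample_orders = sample_file.read().splitlines()

os.remove(sample_path)  # clean up the temporary file

print(len(sample_orders))
print(sample_orders[0])
```

Each element of `sample_orders` is one line of the file without the trailing newline, just like `orders` above.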
In [6]:
# Get unique dates
dates = ['2013-07-25 00:00:00.0', '2013-07-25 00:00:00.0', '2013-07-26 00:00:00.0', '2014-01-25 00:00:00.0']
In [7]:
dates
Out[7]:
['2013-07-25 00:00:00.0', '2013-07-25 00:00:00.0', '2013-07-26 00:00:00.0', '2014-01-25 00:00:00.0']
In [8]:
len(dates)
Out[8]:
4
In [9]:
set(dates)
Out[9]:
{'2013-07-25 00:00:00.0', '2013-07-26 00:00:00.0', '2014-01-25 00:00:00.0'}
In [10]:
len(dates) # still 4: set(dates) returns a new set and leaves the list unchanged
Out[10]:
4
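To connect this with the records read earlier, here is a minimal sketch of pulling the date out of each order line and then de-duplicating. It assumes the date is the second comma-separated field, as in the output of `orders[:10]`. The first three records are copied from that output; the last one is hypothetical and added only so that more than one date appears.

```python
order_records = [
    '1,2013-07-25 00:00:00.0,11599,CLOSED',
    '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT',
    '3,2013-07-25 00:00:00.0,12111,COMPLETE',
    '0,2013-07-26 00:00:00.0,0,CLOSED',  # hypothetical record, not from the file
]

# The order date is the second comma-separated field of each record
order_dates = [record.split(',')[1] for record in order_records]

# set() drops the duplicates; order_dates itself is left untouched
unique_dates = set(order_dates)

print(len(order_dates))
print(len(unique_dates))
print(sorted(unique_dates))
```

Comparing `len(order_dates)` with `len(unique_dates)` is a quick way to check whether a list contains duplicates.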
In [11]:
# Creating new collection retaining duplicates using 2 sets
# union drops duplicates, so the sets are converted to lists and concatenated further below
s1 = {'2013-07-25 00:00:00.0', '2013-07-26 00:00:00.0', '2014-01-25 00:00:00.0'}
In [12]:
s2 = {'2013-08-25 00:00:00.0', '2013-08-26 00:00:00.0', '2014-01-25 00:00:00.0'}
In [13]:
s1.union(s2)
Out[13]:
{'2013-07-25 00:00:00.0', '2013-07-26 00:00:00.0', '2013-08-25 00:00:00.0', '2013-08-26 00:00:00.0', '2014-01-25 00:00:00.0'}
In [14]:
len(s1.union(s2))
Out[14]:
5
In [15]:
s = list(s1) + list(s2)
In [16]:
s
Out[16]:
['2013-07-25 00:00:00.0', '2013-07-26 00:00:00.0', '2014-01-25 00:00:00.0', '2013-08-25 00:00:00.0', '2013-08-26 00:00:00.0', '2014-01-25 00:00:00.0']
In [17]:
len(s)
Out[17]:
6
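Reversing is the other reason mentioned earlier for converting a set to a list. A set has no defined order, so the element order in the output above is not guaranteed and the set itself cannot be reversed. The sketch below reuses the values of `s1` under the hypothetical name `unique_dates` and sorts them first so the result is predictable.

```python
unique_dates = {'2013-07-25 00:00:00.0', '2013-07-26 00:00:00.0', '2014-01-25 00:00:00.0'}

# sorted() accepts any iterable and always returns a new list
ascending = sorted(unique_dates)

# Slicing with a step of -1 builds a new list in reverse order
descending = ascending[::-1]

# list.reverse() reverses in place and returns None
in_place = list(ascending)
in_place.reverse()

print(descending)
print(in_place == descending)
```

These date strings sort chronologically because they use a fixed year-month-day format, so plain string comparison is enough here.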