Let us understand the basics of collections such as list, set and dict, as well as tuples. A tuple is an unnamed object with multiple attributes, and a collection is often a group of tuples.
Read data from files into collection
Collections – list, set and dict – group of homogeneous elements
Basic Operations on Collections
Tuples – group of heterogeneous elements
Develop data processing applications (using loops over collections)
File I/O
To understand collections in detail, it is better to read real-world data than to use hypothetical examples. Let us assume that we have data in files and see how we can create collections from that data.
Python has simple yet rich APIs to perform file I/O
We can create a file object with open in different modes (read-only mode by default)
To read the contents of the file into memory, the file object provides APIs such as read
read creates one large string from the contents of the file
If the data has multiple records delimited by the new line character, we can apply splitlines to the output of read
splitlines converts the string into a list, splitting at each new line character
Here is sample code that reads data from a file into a collection (a list in this case)
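The steps above can be sketched as follows. This is a minimal illustration, not the original sample code: the helper name read_lines, the temporary file and the two comma-separated records are assumptions made so that the example runs on its own.

```python
import os
import tempfile


def read_lines(path):
    """Read a text file and return its records as a list of strings."""
    # open() uses read-only text mode ('r') by default
    with open(path) as f:
        contents = f.read()        # the whole file as one large string
    return contents.splitlines()   # one list element per line


# Hypothetical sample records, written to a temporary file for the demo
sample = (
    "1,2013-07-25 00:00:00.0,11599,CLOSED\n"
    "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

orders = read_lines(sample_path)
os.remove(sample_path)  # clean up the temporary file

print(type(orders).__name__, len(orders))
print(orders[0])
```

Using with closes the file automatically once the block ends, even if read raises an error.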
Collections
Let us see the definition and characteristics of different collection types that are supported by Python.
list
Group of elements with index and length
Elements can be added/inserted at a particular position
We can access elements in list by using index in []
There can be duplicates in a list
APIs are available to add elements to the list, delete elements from the list and sort the list
We will see some basic list operations by using simple examples
Adding elements into list (append, insert)
Deleting elements from list (pop, clear)
Checking how many times an element is repeated in list (count)
Get the position of element (index)
Sorting elements in the list (sort sorts the list in place, sorted returns a new sorted list and leaves the original unchanged)
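The list operations listed above could look like this in practice. The status values are made-up sample data, used only for illustration.

```python
# Hypothetical sample data: a list of order statuses
statuses = ["CLOSED", "PENDING", "COMPLETE"]

# Adding elements
statuses.append("CLOSED")      # add at the end (duplicates are allowed)
statuses.insert(1, "ON_HOLD")  # insert at index 1
after_adds = list(statuses)    # copy of the list at this point

# Accessing and inspecting elements
first = statuses[0]                           # access by index
closed_count = statuses.count("CLOSED")       # how many times a value occurs
pending_position = statuses.index("PENDING")  # position of first match

# Sorting
sorted_copy = sorted(statuses)  # new list, original is unchanged
statuses.sort(reverse=True)     # sorts the list itself, in place
in_place = list(statuses)

# Deleting elements
removed = statuses.pop()  # removes and returns the last element
statuses.clear()          # removes everything

print(after_adds)
print(closed_count, pending_position)
print(sorted_copy)
print(in_place)
```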
set
Group of unique elements with no index (elements cannot be accessed by position, but len still returns the number of elements)
Elements can be added/inserted but not at a particular position
We can check whether the element exists using in operator
There can be no duplicates in a set
APIs are available to add elements to the set, delete elements from the set and perform set operations such as union, intersection etc
To sort the data we need to convert the set to a list, or use the sorted function, which returns a new list. A set has no API of its own for sorting.
We will see some basic set operations by using simple examples
Adding elements into set (add)
Deleting elements from set (pop/remove, clear)
Checking whether element is present in a set (in operator; a set does not support [])
Set operations (union, intersection, difference etc)
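A short sketch of the set operations above, using made-up sample values. The variable names are illustrative only.

```python
# Hypothetical sample data: a set of order statuses
statuses = {"CLOSED", "COMPLETE"}

# Adding elements
statuses.add("PENDING")
statuses.add("CLOSED")  # already present, so the set does not change
size = len(statuses)

# Membership check uses the in operator (sets do not support [])
has_closed = "CLOSED" in statuses
has_on_hold = "ON_HOLD" in statuses

# Deleting elements
statuses.remove("PENDING")  # raises KeyError if the element is missing

# Set operations
a = {1, 2, 3}
b = {3, 4, 5}
union = a | b         # same as a.union(b)
intersection = a & b  # same as a.intersection(b)
difference = a - b    # same as a.difference(b)

# A set cannot be sorted itself; sorted returns a new list
sorted_statuses = sorted(statuses)

statuses.clear()  # removes all elements

print(size, has_closed, has_on_hold)
print(union, intersection, difference)
print(sorted_statuses)
```

pop is not shown on the data above because it removes and returns an arbitrary element, so its result cannot be predicted.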
dict
Group of key value pairs
Keys are unique
Values need not be unique
We can access values using keys
APIs are available to add new key value pairs to a dict, update values based on keys, extract the keys (keys returns a set-like view), extract the values (values returns a view that can be converted to a list), check whether a key exists in the dict etc
We will see some basic dict operations by using simple examples
Adding elements to dict
Removing elements from dict (clear, pop, popitem)
Get all keys (keys)
Get all key value pairs (items)
Get only values (values)
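The dict operations above can be sketched as follows. The order ids and revenue figures are made-up sample values.

```python
# Hypothetical sample data: revenue keyed by order id
revenue = {}

# Adding key value pairs
revenue[1] = 299.98
revenue[2] = 579.98
revenue[4] = 129.99

# Updating the value for an existing key
revenue[1] = 300.0

# Checking whether a key exists
has_order_2 = 2 in revenue
has_order_3 = 3 in revenue

# keys, values and items return views; wrap them in list() to get lists
order_ids = list(revenue.keys())
amounts = list(revenue.values())
pairs = list(revenue.items())  # list of (key, value) tuples

# Removing elements
removed_value = revenue.pop(1)    # removes key 1 and returns its value
last_pair = revenue.popitem()     # removes and returns the last inserted pair
remaining = dict(revenue)
revenue.clear()                   # removes everything

print(order_ids, amounts)
print(pairs)
print(removed_value, last_pair, remaining)
```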
Tuple
Now let us understand the definition and characteristics of a tuple.
A tuple is like an object with unnamed attributes
Values of attributes can be accessed only using positional notation
It represents an individual row in a table or spreadsheet with multiple attributes
We use () to represent tuples
Tuples are immutable
Very limited operations are available – e.g.: count, index
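A small sketch of the tuple characteristics above. The field layout (order id, order date, customer id, status) is an assumption chosen for illustration.

```python
# Hypothetical order record: (order_id, order_date, customer_id, status)
order = (1, "2013-07-25", 11599, "CLOSED")

# Attributes are accessed by position
order_id = order[0]
status = order[3]

# Unpacking gives each position a name in one step
oid, order_date, customer_id, order_status = order

# The few methods available on a tuple
closed_count = order.count("CLOSED")  # occurrences of a value
customer_position = order.index(11599)  # position of first match

# Tuples are immutable: assigning to a position raises TypeError
try:
    order[3] = "COMPLETE"
    was_modified = True
except TypeError:
    was_modified = False

print(order_id, status, closed_count, customer_position, was_modified)
```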
Develop applications (using loops)
Now that we understand how to read data from files and manipulate collections, we will see how to process data read from files into collections, using traditional loops together with collection and tuple data structures.
Here is a sample program to develop applications using loops.
In this video you will see several examples. This will help you get a better understanding of how to develop applications using loops.
Task 1: Get all order statuses from orders data (loop through list, get order status and add it to the set)
Task 2: Create a function to get revenue for a given order_id (from order_items)
Task 3: Create a function to get revenue for each order_id (from order_items)
Task 4: Create a function to get daily revenue using orders which are completed or closed and order items
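One possible way to approach the four tasks is sketched below. It is not the original sample program. The record layouts are assumptions: orders are taken to be comma-separated as order_id, order_date, customer_id, order_status, and order items as order_item_id, order_id, product_id, quantity, subtotal, product_price. The function names, the sample records and the status values COMPLETE and CLOSED are also hypothetical.

```python
# Hypothetical sample data in the assumed comma-separated layouts
orders = [
    "1,2013-07-25 00:00:00.0,11599,CLOSED",
    "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
    "3,2013-07-26 00:00:00.0,12111,COMPLETE",
]
order_items = [
    "1,1,957,1,299.98,299.98",
    "2,2,1073,1,199.99,199.99",
    "3,2,502,5,250.0,50.0",
    "4,3,403,1,129.99,129.99",
]


def get_order_statuses(orders):
    """Task 1: collect the distinct order statuses in a set."""
    statuses = set()
    for order in orders:
        statuses.add(order.split(",")[3])
    return statuses


def get_order_revenue(order_items, order_id):
    """Task 2: add up the subtotals for one order_id."""
    revenue = 0.0
    for item in order_items:
        fields = item.split(",")
        if int(fields[1]) == order_id:
            revenue += float(fields[4])
    return revenue


def get_revenue_per_order(order_items):
    """Task 3: build a dict of order_id -> revenue."""
    revenue = {}
    for item in order_items:
        fields = item.split(",")
        order_id = int(fields[1])
        revenue[order_id] = revenue.get(order_id, 0.0) + float(fields[4])
    return revenue


def get_daily_revenue(orders, order_items):
    """Task 4: revenue per order_date for COMPLETE or CLOSED orders."""
    # First pass: order_id -> order_date for the orders we care about
    order_dates = {}
    for order in orders:
        fields = order.split(",")
        if fields[3] in ("COMPLETE", "CLOSED"):
            order_dates[int(fields[0])] = fields[1]
    # Second pass: add each item's subtotal to its order's date
    daily = {}
    for item in order_items:
        fields = item.split(",")
        order_id = int(fields[1])
        if order_id in order_dates:
            order_date = order_dates[order_id]
            daily[order_date] = daily.get(order_date, 0.0) + float(fields[4])
    return daily


print(get_order_statuses(orders))
print(get_order_revenue(order_items, 2))
print(get_revenue_per_order(order_items))
print(get_daily_revenue(orders, order_items))
```

Building the order_id to order_date dict first avoids looping over all orders again for every order item.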
Exercises
Get number of records by status (using orders data)
Get number of orders per month (using orders data)
Get those order items where order item subtotal is not equal to order item quantity multiplied by order item product price
Get all those order details from orders where there are no corresponding order items
Get all those products whose daily revenue is more than $1000 – we need order_date, product_id and product_revenue