Write Delimited Strings into files
Let us understand how to write delimited strings into files. We will start with a list of tuples and see how to convert each tuple to a delimited string before writing it to a file.
Here are the steps involved in writing a list of tuples into a file as delimited strings.
- Convert the list of tuples into a list of delimited strings.
- Open the file in write mode using `w` (overwrite) or `a` (append).
- Add the data into the file.
- Ensure that the data in the file is validated.
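The steps above can be sketched end to end as follows. This is a minimal illustration rather than the exact code used later in this section: the two sample rows, the temporary directory and the variable names `file_path` and `lines_read` are assumptions made so the example runs on its own.

```python
import os
import tempfile

# A small sample in the same shape as the orders used in this section.
orders = [
    (1, '2013-07-25 00:00:00.0', 11599, 'CLOSED'),
    (2, '2013-07-25 00:00:00.0', 256, 'PENDING_PAYMENT'),
]

# Step 1: convert each tuple into a comma-delimited string.
orders_csv = [','.join(str(item) for item in order) for order in orders]

# Steps 2 and 3: open the file in write mode and add the data.
# The temporary directory and file name are assumptions for this sketch.
target_dir = tempfile.mkdtemp()
file_path = os.path.join(target_dir, 'part-00000')
with open(file_path, 'w') as orders_file:
    for line in orders_csv:
        orders_file.write(f'{line}\n')

# Step 4: validate by reading the file back and comparing with the input.
with open(file_path) as orders_file:
    lines_read = orders_file.read().splitlines()

print(lines_read == orders_csv)
```

Reading the file back and comparing it with the strings that were written is one simple form of validation; other checks, such as row counts, may suit larger files better.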
In [1]:
orders = [(1, '2013-07-25 00:00:00.0', 11599, 'CLOSED'),
(2, '2013-07-25 00:00:00.0', 256, 'PENDING_PAYMENT'),
(3, '2013-07-25 00:00:00.0', 12111, 'COMPLETE'),
(4, '2013-07-25 00:00:00.0', 8827, 'CLOSED'),
(5, '2013-07-25 00:00:00.0', 11318, 'COMPLETE'),
(6, '2013-07-25 00:00:00.0', 7130, 'COMPLETE'),
(7, '2013-07-25 00:00:00.0', 4530, 'COMPLETE'),
(8, '2013-07-25 00:00:00.0', 2911, 'PROCESSING'),
(9, '2013-07-25 00:00:00.0', 5657, 'PENDING_PAYMENT'),
(10, '2013-07-25 00:00:00.0', 5648, 'PENDING_PAYMENT')]
In [2]:
type(orders)
Out[2]:
list
In [3]:
orders[0]
Out[3]:
(1, '2013-07-25 00:00:00.0', 11599, 'CLOSED')
In [4]:
type(orders[0])
Out[4]:
tuple
In [5]:
order = orders[0]
In [6]:
str.join?
Signature: str.join(self, iterable, /)
Docstring:
Concatenate any number of strings.

The string whose method is called is inserted in between each given string.
The result is returned as a new string.

Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'
Type: method_descriptor
In [7]:
'hello'.join
Out[7]:
<function str.join(iterable, /)>
In [8]:
','.join(order) # throws error as first and third elements are of type int
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [8], in <cell line: 1>()
----> 1 ','.join(order)

TypeError: sequence item 0: expected str instance, int found
In [9]:
[str(item) for item in order]
Out[9]:
['1', '2013-07-25 00:00:00.0', '11599', 'CLOSED']
In [10]:
# Converting all the items in the tuple to strings using a list comprehension
','.join([str(item) for item in order])
Out[10]:
'1,2013-07-25 00:00:00.0,11599,CLOSED'
In [11]:
list(map(lambda item: str(item), order))
Out[11]:
['1', '2013-07-25 00:00:00.0', '11599', 'CLOSED']
In [12]:
# Converting all the items in the tuple to strings using the map function
','.join(map(lambda item: str(item), order))
Out[12]:
'1,2013-07-25 00:00:00.0,11599,CLOSED'
In [13]:
orders
Out[13]:
[(1, '2013-07-25 00:00:00.0', 11599, 'CLOSED'), (2, '2013-07-25 00:00:00.0', 256, 'PENDING_PAYMENT'), (3, '2013-07-25 00:00:00.0', 12111, 'COMPLETE'), (4, '2013-07-25 00:00:00.0', 8827, 'CLOSED'), (5, '2013-07-25 00:00:00.0', 11318, 'COMPLETE'), (6, '2013-07-25 00:00:00.0', 7130, 'COMPLETE'), (7, '2013-07-25 00:00:00.0', 4530, 'COMPLETE'), (8, '2013-07-25 00:00:00.0', 2911, 'PROCESSING'), (9, '2013-07-25 00:00:00.0', 5657, 'PENDING_PAYMENT'), (10, '2013-07-25 00:00:00.0', 5648, 'PENDING_PAYMENT')]
In [14]:
orders_csv = map(lambda order: ','.join(map(lambda item: str(item), order)), orders)
In [15]:
list(orders_csv)
Out[15]:
['1,2013-07-25 00:00:00.0,11599,CLOSED', '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT', '3,2013-07-25 00:00:00.0,12111,COMPLETE', '4,2013-07-25 00:00:00.0,8827,CLOSED', '5,2013-07-25 00:00:00.0,11318,COMPLETE', '6,2013-07-25 00:00:00.0,7130,COMPLETE', '7,2013-07-25 00:00:00.0,4530,COMPLETE', '8,2013-07-25 00:00:00.0,2911,PROCESSING', '9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT', '10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT']
In [16]:
orders_csv = map(lambda order: ','.join(map(lambda item: str(item), order)), orders)
order = list(orders_csv)[0]
order
Out[16]:
'1,2013-07-25 00:00:00.0,11599,CLOSED'
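The cells above rebuild `orders_csv` before each use. That is because `map` returns a one-shot iterator: once `list` has consumed it, nothing is left for a second pass. The sketch below illustrates this with two sample rows; the names `first_pass`, `second_pass` and `orders_csv_list` are introduced only for this example.

```python
orders = [
    (1, '2013-07-25 00:00:00.0', 11599, 'CLOSED'),
    (2, '2013-07-25 00:00:00.0', 256, 'PENDING_PAYMENT'),
]

orders_csv = map(lambda order: ','.join(map(str, order)), orders)

first_pass = list(orders_csv)   # consumes the iterator
second_pass = list(orders_csv)  # nothing is left to consume

print(len(first_pass))
print(len(second_pass))

# Materializing the result as a list once allows it to be reused.
orders_csv_list = [','.join(map(str, order)) for order in orders]
print(orders_csv_list == first_pass)
```

If the delimited strings are needed more than once, keeping them in a list avoids repeating the `map` expression.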
- Writing CSV strings one at a time to the file.
In [17]:
!rm -rf data/retail_db/orders
In [18]:
!mkdir -p data/retail_db/orders
In [19]:
orders_file = open('data/retail_db/orders/part-00000', 'w')
In [20]:
orders_csv = map(lambda order: ','.join(map(lambda item: str(item), order)), orders)
In [21]:
for order in orders_csv:
    orders_file.write(f'{order}\n')
In [22]:
orders_file.close()
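As an alternative to calling `close` explicitly, the same loop can be wrapped in a `with` block, which closes the file when the block exits, even if a write raises an exception. The following is a sketch under assumptions: it uses two sample rows and a temporary directory instead of `data/retail_db/orders`, and `file_closed` is a name introduced only for illustration.

```python
import os
import tempfile

orders = [
    (1, '2013-07-25 00:00:00.0', 11599, 'CLOSED'),
    (2, '2013-07-25 00:00:00.0', 256, 'PENDING_PAYMENT'),
]
orders_csv = map(lambda order: ','.join(map(str, order)), orders)

# Temporary location used only so that the sketch is self-contained.
file_path = os.path.join(tempfile.mkdtemp(), 'part-00000')

# The with block closes the file automatically when it exits.
with open(file_path, 'w') as orders_file:
    for order in orders_csv:
        orders_file.write(f'{order}\n')

file_closed = orders_file.closed
print(file_closed)
```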
- Writing as one big string. As we are opening the file using `w`, the file will be truncated. It means the contents of the file will be overwritten with the string we are trying to write to the file.
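The difference between truncating with `w` and appending with `a` can be checked with a small sketch. The file name `demo.txt`, the temporary directory and the sample strings are assumptions used only for this illustration.

```python
import os
import tempfile

file_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(file_path, 'w') as f:
    f.write('first\n')

# Reopening with 'w' truncates the file before anything is written.
with open(file_path, 'w') as f:
    f.write('second\n')

with open(file_path) as f:
    after_overwrite = f.read()

# Reopening with 'a' keeps the existing content and writes at the end.
with open(file_path, 'a') as f:
    f.write('third\n')

with open(file_path) as f:
    after_append = f.read()

print(repr(after_overwrite))
print(repr(after_append))
```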
In [23]:
orders_csv = map(lambda order: ','.join(map(lambda item: str(item), order)), orders)
In [24]:
orders_string = '\n'.join(orders_csv)
In [25]:
orders_string
Out[25]:
'1,2013-07-25 00:00:00.0,11599,CLOSED\n2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT\n3,2013-07-25 00:00:00.0,12111,COMPLETE\n4,2013-07-25 00:00:00.0,8827,CLOSED\n5,2013-07-25 00:00:00.0,11318,COMPLETE\n6,2013-07-25 00:00:00.0,7130,COMPLETE\n7,2013-07-25 00:00:00.0,4530,COMPLETE\n8,2013-07-25 00:00:00.0,2911,PROCESSING\n9,2013-07-25 00:00:00.0,5657,PENDING_PAYMENT\n10,2013-07-25 00:00:00.0,5648,PENDING_PAYMENT'
In [26]:
orders_file = open('data/retail_db/orders/part-00000', 'w')
In [27]:
orders_file.write(orders_string)
Out[27]:
401
In [28]:
orders_file.close()
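The last step listed at the start of this section, validating the data in the file, could look like the sketch below. It assumes three sample rows, a temporary file and four fields per row; the names `chars_written`, `row_count_matches`, `field_count_matches` and `round_trip_matches` are introduced only for this example. The value returned by `write` is the number of characters written, which is what the `401` shown above represents.

```python
import os
import tempfile

orders = [
    (1, '2013-07-25 00:00:00.0', 11599, 'CLOSED'),
    (2, '2013-07-25 00:00:00.0', 256, 'PENDING_PAYMENT'),
    (3, '2013-07-25 00:00:00.0', 12111, 'COMPLETE'),
]
orders_string = '\n'.join(','.join(map(str, order)) for order in orders)

# Temporary location used only so that the sketch is self-contained.
file_path = os.path.join(tempfile.mkdtemp(), 'part-00000')
with open(file_path, 'w') as orders_file:
    chars_written = orders_file.write(orders_string)

# Read the file back and run a few basic checks.
with open(file_path) as orders_file:
    lines = orders_file.read().splitlines()

row_count_matches = len(lines) == len(orders)
field_count_matches = all(len(line.split(',')) == 4 for line in lines)
round_trip_matches = lines == orders_string.split('\n')

print(chars_written == len(orders_string))
print(row_count_matches, field_count_matches, round_trip_matches)
```

Note that a string built with `'\n'.join` has no newline after the last record, whereas writing one record at a time with `f'{order}\n'` ends the file with a newline.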