[
Overview of Series
Let us quickly go through one of the core Pandas data structures, the Series.
- Pandas Series is a one-dimensional labeled array capable of holding any data type.
- It is similar to a single column in an Excel spreadsheet or a database table.
- We can create a Series from a dict; the dict keys become the index labels.
In [1]:
d = {"JAN": 10, "FEB": 15, "MAR": 12, "APR": 16}
In [2]:
type(d)
Out[2]:
dict
In [3]:
d
Out[3]:
{'JAN': 10, 'FEB': 15, 'MAR': 12, 'APR': 16}
In [4]:
import pandas as pd
s = pd.Series(d)
In [5]:
s
Out[5]:
JAN    10
FEB    15
MAR    12
APR    16
dtype: int64
In [6]:
s = pd.Series(d, name='val')
In [7]:
s
Out[7]:
JAN    10
FEB    15
MAR    12
APR    16
Name: val, dtype: int64
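The pieces that make up a Series (its labels, values, name and data type) can be inspected individually. The following is a minimal sketch that reuses the same dict as above; the variable names `labels` and `values` are only illustrative.

```python
import pandas as pd

# Same illustrative data as in the cells above
d = {"JAN": 10, "FEB": 15, "MAR": 12, "APR": 16}
s = pd.Series(d, name="val")

labels = list(s.index)   # index labels come from the dict keys, in insertion order
values = s.to_numpy()    # the underlying values as a NumPy array

print(labels)
print(values)
print(s.name, s.dtype, s.shape)
```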
In [8]:
s['FEB']
Out[8]:
15
In [9]:
s.iloc[0]  # positional access; plain s[0] is deprecated when the index holds labels
Out[9]:
10
In [10]:
s[1:3]
Out[10]:
FEB    15
MAR    12
Name: val, dtype: int64
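The lookups above mix label-based and position-based access. A hedged sketch of the explicit accessors, `.loc` for labels and `.iloc` for positions, may make the difference clearer; note that label slices include the end label while positional slices exclude the end position.

```python
import pandas as pd

s = pd.Series({"JAN": 10, "FEB": 15, "MAR": 12, "APR": 16}, name="val")

by_label = s.loc["FEB"]            # label-based lookup
by_position = s.iloc[0]            # position-based lookup
label_slice = s.loc["FEB":"MAR"]   # label slices include the end label
pos_slice = s.iloc[1:3]            # positional slices exclude the end position

print(by_label, by_position)
print(label_slice.equals(pos_slice))
```

Using `.loc` and `.iloc` explicitly avoids any ambiguity about whether an integer key means a label or a position.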
In [11]:
type(s)
Out[11]:
pandas.core.series.Series
In [12]:
s.sum()
Out[12]:
53
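`sum` is one of several built-in aggregations on a Series. As a small sketch on the same illustrative data, a few others can be tried in the same way; the variable names here are hypothetical.

```python
import pandas as pd

s = pd.Series({"JAN": 10, "FEB": 15, "MAR": 12, "APR": 16}, name="val")

total = s.sum()          # 10 + 15 + 12 + 16
lowest = s.min()         # smallest value
highest = s.max()        # largest value
average = s.mean()       # arithmetic mean of the values
peak_month = s.idxmax()  # index label of the largest value

print(total, lowest, highest, average, peak_month)
```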
In [13]:
l = [10, 15, 12, 16]
In [14]:
l_s = pd.Series(l)
In [15]:
l_s
Out[15]:
0    10
1    15
2    12
3    16
dtype: int64
In [16]:
l_s[0]
Out[16]:
10
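A Series built from a list gets a default integer index (0, 1, 2, ...). If labels are wanted, they can be supplied through the `index` argument. The sketch below assumes a hypothetical `months` list that is not part of the original notebook.

```python
import pandas as pd

l = [10, 15, 12, 16]
months = ["JAN", "FEB", "MAR", "APR"]  # hypothetical labels for illustration

l_s = pd.Series(l)                     # default integer index: 0, 1, 2, 3
labeled = pd.Series(l, index=months)   # same values, explicit labels

print(list(l_s.index))
print(labeled["MAR"])
```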
- When we fetch a single column from a Pandas DataFrame, it is returned as a Series.
Note: Don’t worry too much about creating DataFrames yet; for now we only want to understand how a DataFrame and a Series are related.
In [17]:
orders_path = "/data/retail_db/orders/part-00000"
In [18]:
orders_schema = [
    "order_id",
    "order_date",
    "order_customer_id",
    "order_status"
]
In [19]:
orders = pd.read_csv(
    orders_path,
    header=None,
    names=orders_schema
)
In [20]:
orders
Out[20]:
| | order_id | order_date | order_customer_id | order_status |
|---|---|---|---|---|
| 0 | 1 | 2013-07-25 00:00:00.0 | 11599 | CLOSED |
| 1 | 2 | 2013-07-25 00:00:00.0 | 256 | PENDING_PAYMENT |
| 2 | 3 | 2013-07-25 00:00:00.0 | 12111 | COMPLETE |
| 3 | 4 | 2013-07-25 00:00:00.0 | 8827 | CLOSED |
| 4 | 5 | 2013-07-25 00:00:00.0 | 11318 | COMPLETE |
| … | … | … | … | … |
| 68878 | 68879 | 2014-07-09 00:00:00.0 | 778 | COMPLETE |
| 68879 | 68880 | 2014-07-13 00:00:00.0 | 1117 | COMPLETE |
| 68880 | 68881 | 2014-07-19 00:00:00.0 | 2518 | PENDING_PAYMENT |
| 68881 | 68882 | 2014-07-22 00:00:00.0 | 10000 | ON_HOLD |
| 68882 | 68883 | 2014-07-23 00:00:00.0 | 5533 | COMPLETE |
68883 rows × 4 columns
In [21]:
type(orders)
Out[21]:
pandas.core.frame.DataFrame
In [22]:
orders.order_date
Out[22]:
0        2013-07-25 00:00:00.0
1        2013-07-25 00:00:00.0
2        2013-07-25 00:00:00.0
3        2013-07-25 00:00:00.0
4        2013-07-25 00:00:00.0
                 ...
68878    2014-07-09 00:00:00.0
68879    2014-07-13 00:00:00.0
68880    2014-07-19 00:00:00.0
68881    2014-07-22 00:00:00.0
68882    2014-07-23 00:00:00.0
Name: order_date, Length: 68883, dtype: object
In [23]:
type(orders.order_date)
Out[23]:
pandas.core.series.Series
In [24]:
order_dates = orders.order_date
order_dates
Out[24]:
0        2013-07-25 00:00:00.0
1        2013-07-25 00:00:00.0
2        2013-07-25 00:00:00.0
3        2013-07-25 00:00:00.0
4        2013-07-25 00:00:00.0
                 ...
68878    2014-07-09 00:00:00.0
68879    2014-07-13 00:00:00.0
68880    2014-07-19 00:00:00.0
68881    2014-07-22 00:00:00.0
68882    2014-07-23 00:00:00.0
Name: order_date, Length: 68883, dtype: object
In [25]:
type(order_dates)
Out[25]:
pandas.core.series.Series
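The relationship between a DataFrame and a Series can also be explored without the data file. The sketch below builds a tiny in-memory stand-in for `orders` from the first three rows shown earlier (an assumption made only so the example is self-contained) and compares attribute access, bracket access, and selection with a list of column names.

```python
import pandas as pd

# A tiny in-memory stand-in for the orders data (first three rows shown above)
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "order_date": ["2013-07-25 00:00:00.0"] * 3,
    "order_customer_id": [11599, 256, 12111],
    "order_status": ["CLOSED", "PENDING_PAYMENT", "COMPLETE"],
})

by_attribute = orders.order_date        # attribute access returns a Series
by_bracket = orders["order_date"]       # bracket access returns the same Series
one_col_frame = orders[["order_date"]]  # a list of column names returns a DataFrame

print(type(by_bracket).__name__, type(one_col_frame).__name__)
print(by_bracket.name)
```

Bracket access is usually the safer habit, since attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute.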
]