Sai

Creating Data Frames from Lists

[

Creating Data Frames from lists

Let us go through the details of creating Data Frames from Python collections.

A Pandas Data Frame is a two-dimensional labeled data structure whose columns can each hold a different data type.
It is similar to a multi-column Excel spreadsheet or a database table.
We can create a Data Frame from a list of tuples or a list of dicts.
We can also create Data Frames using data from files. We will look at that later.

In [1]:

import pandas as pd

Note: Creating a Pandas Data Frame using a list of tuples.

In [2]:

sals_ld = [(1, 1500.0), (2, 2000.0, 10.0), (3, 2200.00)]

In [3]:

sals_df = pd.DataFrame(sals_ld)

In [4]:

sals_df

Out[4]:

	0	1	2
0	1	1500.0	NaN
1	2	2000.0	10.0
2	3	2200.0	NaN

In [5]:

sals_df = pd.DataFrame(sals_ld, columns=['id', 'sal', 'comm'])

In [6]:

sals_df

Out[6]:

	id	sal	comm
0	1	1500.0	NaN
1	2	2000.0	10.0
2	3	2200.0	NaN
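The output above shows that tuples shorter than the widest one are padded with NaN. The following is a minimal sketch of how those padded cells could be detected and filled. The variable names and the default of 0.0 for a missing commission are assumptions made for illustration, not part of the original notebook.

```python
import pandas as pd

# Same ragged tuples as in the cell above: only the second row has a commission.
sals = [(1, 1500.0), (2, 2000.0, 10.0), (3, 2200.0)]
sals_df = pd.DataFrame(sals, columns=['id', 'sal', 'comm'])

# isna() flags the cells that pandas padded with NaN.
comm_missing = sals_df['comm'].isna().tolist()

# fillna() with a dict replaces NaN only in the named column.
# 0.0 is an assumed default; fillna returns a new Data Frame.
sals_filled = sals_df.fillna({'comm': 0.0})

print(comm_missing)
print(sals_filled)
```

Because fillna returns a new object here, sals_df itself still contains the NaN values.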

In [7]:

sals_df['id']

Out[7]:

0    1
1    2
2    3
Name: id, dtype: int64

In [8]:

sals_df[['id', 'sal']]

Out[8]:

	id	sal
0	1	1500.0
1	2	2000.0
2	3	2200.0
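The two selections above return different types: a single column name gives a Series, while a list of names gives a Data Frame. A small sketch to confirm this, using a shortened two-row version of the data (the variable names are illustrative):

```python
import pandas as pd

sals_df = pd.DataFrame([(1, 1500.0), (2, 2000.0)], columns=['id', 'sal'])

# A single label returns a one-dimensional Series.
id_series = sals_df['id']

# A list of labels returns a Data Frame, even when the list has one element.
id_frame = sals_df[['id']]

print(type(id_series).__name__)
print(type(id_frame).__name__)
```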

Note: Creating a Pandas Data Frame using a list of dicts.

In [9]:

sals_ld = [
    {'id': 1, 'sal': 1500.0},
    {'id': 2, 'sal': 2000.0},
    {'id': 3, 'sal': 2200.0}
]

Note: Column names are taken automatically from the keys of the dicts.

In [10]:

sals_df = pd.DataFrame(sals_ld)

In [11]:

sals_df

Out[11]:

	id	sal
0	1	1500.0
1	2	2000.0
2	3	2200.0

In [12]:

sals_df['id']

Out[12]:

0    1
1    2
2    3
Name: id, dtype: int64
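When the data already carries column labels, as a list of dicts does, the columns argument selects and orders columns rather than naming them. A brief sketch of that behavior, assuming we want sal before id:

```python
import pandas as pd

sals_ld = [
    {'id': 1, 'sal': 1500.0},
    {'id': 2, 'sal': 2000.0},
]

# The dict keys already provide labels, so columns= picks and orders them.
sals_df = pd.DataFrame(sals_ld, columns=['sal', 'id'])

column_order = list(sals_df.columns)
print(column_order)
```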

In [13]:

sals_ld = [
    {'id': 1, 'sal': 1500.0},
    {'id': 2, 'sal': 2000.0, 'comm': 10},
    {'id': 3, 'sal': 2200.0}
]

In [14]:

pd.DataFrame?

Init signature:
pd.DataFrame(
    data=None,
    index: 'Axes | None' = None,
    columns: 'Axes | None' = None,
    dtype: 'Dtype | None' = None,
    copy: 'bool | None' = None,
)
Docstring:     
Two-dimensional, size-mutable, potentially heterogeneous tabular data.

Data structure also contains labeled axes (rows and columns).
Arithmetic operations align on both row and column labels. Can be
thought of as a dict-like container for Series objects. The primary
pandas data structure.

Parameters
----------
data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame
    Dict can contain Series, arrays, constants, dataclass or list-like objects. If
    data is a dict, column order follows insertion-order.

    .. versionchanged:: 0.25.0
       If data is a list of dicts, column order follows insertion-order.

index : Index or array-like
    Index to use for resulting frame. Will default to RangeIndex if
    no indexing information part of input data and no index provided.
columns : Index or array-like
    Column labels to use for resulting frame when data does not have them,
    defaulting to RangeIndex(0, 1, 2, ..., n). If data contains column labels,
    will perform column selection instead.
dtype : dtype, default None
    Data type to force. Only a single dtype is allowed. If None, infer.
copy : bool or None, default None
    Copy data from inputs.
    For dict data, the default of None behaves like ``copy=True``.  For DataFrame
    or 2d ndarray input, the default of None behaves like ``copy=False``.

    .. versionchanged:: 1.3.0

See Also
--------
DataFrame.from_records : Constructor from tuples, also record arrays.
DataFrame.from_dict : From dicts of Series, arrays, or dicts.
read_csv : Read a comma-separated values (csv) file into DataFrame.
read_table : Read general delimited file into DataFrame.
read_clipboard : Read text from clipboard into DataFrame.

Examples
--------
Constructing DataFrame from a dictionary.

>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
   col1  col2
0     1     3
1     2     4

Notice that the inferred dtype is int64.

>>> df.dtypes
col1    int64
col2    int64
dtype: object

To enforce a single dtype:

>>> df = pd.DataFrame(data=d, dtype=np.int8)
>>> df.dtypes
col1    int8
col2    int8
dtype: object

Constructing DataFrame from numpy ndarray:

>>> df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
...                    columns=['a', 'b', 'c'])
>>> df2
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

Constructing DataFrame from a numpy ndarray that has labeled columns:

>>> data = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)],
...                 dtype=[("a", "i4"), ("b", "i4"), ("c", "i4")])
>>> df3 = pd.DataFrame(data, columns=['c', 'a'])
...
>>> df3
   c  a
0  3  1
1  6  4
2  9  7

Constructing DataFrame from dataclass:

>>> from dataclasses import make_dataclass
>>> Point = make_dataclass("Point", [("x", int), ("y", int)])
>>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])
   x  y
0  0  0
1  0  3
2  2  3
File:           ~/.local/lib/python3.8/site-packages/pandas/core/frame.py
Type:           type
Subclasses:     SubclassedDataFrame

In [15]:

sals_ld

Out[15]:

[{'id': 1, 'sal': 1500.0},
 {'id': 2, 'sal': 2000.0, 'comm': 10},
 {'id': 3, 'sal': 2200.0}]

In [16]:

sals_df = pd.DataFrame(sals_ld)

In [17]:

sals_df

Out[17]:

	id	sal	comm
0	1	1500.0	NaN
1	2	2000.0	10.0
2	3	2200.0	NaN
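In the output above, the dicts without a comm key get NaN in that column, and the integer 10 is displayed as 10.0. This is because NaN is a floating point value, so pandas stores the whole column as float64. A minimal sketch that checks the resulting column order and dtype:

```python
import pandas as pd

sals_ld = [
    {'id': 1, 'sal': 1500.0},
    {'id': 2, 'sal': 2000.0, 'comm': 10},
    {'id': 3, 'sal': 2200.0},
]
sals_df = pd.DataFrame(sals_ld)

# Columns are the union of all keys, in order of first appearance.
columns = list(sals_df.columns)

# The missing values force the comm column to float64.
comm_dtype = str(sals_df['comm'].dtype)

print(columns)
print(comm_dtype)
```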

]
