Basics

pandas is a Python library for working with tabular data (data that has rows and columns). It offers much of the functionality of SQL queries or spreadsheet programs like Excel, combined with the full power of Python.

To use the package, we usually import it at the top of a Python file under the alias pd.

import pandas as pd

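In pandas, the two major data structures are Series and DataFrame. A DataFrame is a two-dimensional table with labeled rows and columns, and each of its columns is a Series. The example below builds a DataFrame and pulls one column out of it.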

df = pd.DataFrame([
  ['January', 100, 100, 23, 100],
  ['February', 51, 45, 145, 45],
  ['March', 81, 96, 65, 96],
  ['April', 80, 80, 54, 180],
  ['May', 51, 54, 54, 154],
  ['June', 112, 109, 79, 129]],
  columns=['month', 'clinic_east',
           'clinic_north', 'clinic_south',
           'clinic_west']
)

clinic_north = df.clinic_north

print(type(clinic_north))
# <class 'pandas.core.series.Series'>

print(type(df))
# <class 'pandas.core.frame.DataFrame'>
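The attribute-style access used above (df.clinic_north) only works when the column name is a valid Python identifier and does not clash with a DataFrame method. The following is a minimal sketch of the bracket-based alternative, using a shortened, illustrative version of the clinic table (the variable names north and north_frame are assumptions made for this example):

```python
import pandas as pd

# A smaller version of the clinic table, built from a dict of columns.
df = pd.DataFrame(
    {
        "month": ["January", "February", "March"],
        "clinic_east": [100, 51, 81],
        "clinic_north": [100, 45, 96],
    }
)

# Single brackets with one column name return a Series.
north = df["clinic_north"]
print(type(north).__name__)        # Series

# A list of column names returns a DataFrame, even for one column.
north_frame = df[["clinic_north"]]
print(type(north_frame).__name__)  # DataFrame

# The Series shares the DataFrame's row index and is named after the column.
print(north.name)                    # clinic_north
print(north.index.equals(df.index))  # True
```

Bracket access works for any column name, including names with spaces, so it is usually the safer default.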

Series

A Series is a one-dimensional array-like object containing an array of data (of any NumPy data type) and an associated array of data labels, called its index.

Another way to think about a Series is as a fixed-length, ordered dict, as it is a mapping of index values to data values.

# index generated automatically
In [4]: obj = pd.Series([4, 7, -5, 3])
In [5]: obj
Out[5]:
0    4
1    7
2   -5
3    3
dtype: int64

# get the values
In [6]: obj.values
Out[6]: array([ 4,  7, -5,  3])

# get the index
In [7]: obj.index
Out[7]: RangeIndex(start=0, stop=4, step=1)

# specify the index
In [8]: obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
In [9]: obj2
Out[9]:
d    4
b    7
a   -5
c    3
dtype: int64

# indexing and slicing using index
In [11]: obj2['a']
Out[11]: -5

In [12]: obj2['d'] = 6
In [13]: obj2[['c', 'a', 'd']]
Out[13]:
c    3
a   -5
d    6
dtype: int64

# convert from dictionary
In [20]: sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
In [21]: obj3 = pd.Series(sdata)
In [22]: obj3
Out[22]:
Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64
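
The dict-like view of a Series described earlier can be made concrete. The following is a minimal sketch, reusing the sdata dictionary from the session above, of a few operations a Series shares with a dict and one it does not (the variable name large is an assumption made for this example):

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}
obj3 = pd.Series(sdata)

# Membership tests check the index labels, like keys in a dict.
print("Texas" in obj3)       # True
print("California" in obj3)  # False

# .get returns a default instead of raising KeyError for a missing label.
print(obj3.get("California", 0))  # 0

# Unlike a dict, a Series supports vectorized comparisons and boolean filtering.
large = obj3[obj3 > 20000]
print(list(large.index))     # ['Ohio', 'Texas']

# Converting back gives a dict equal to the original.
print(obj3.to_dict() == sdata)  # True
```

Note that in checks labels rather than values, which matches dict behavior but differs from a Python list.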
