Fundamentals 20 min read

Advanced Pandas Tutorial: Data Access, Indexing, Merging, Grouping, and Time Handling

This article extends a basic pandas tutorial by covering advanced data access methods, index types, concatenation, merging, joining, grouping, aggregation, time series handling, and basic plotting, providing clear code examples and explanations for each feature.

Python Programming Learning Circle

Apr 6, 2021

Advanced Pandas Tutorial: Data Access, Indexing, Merging, Grouping, and Time Handling

The article continues from a previous introductory pandas tutorial and assumes the reader has completed the basics. It provides a GitHub repository (pandas_tutorial) for test data and source code.

Data Access – Two fundamental ways to access pandas data are demonstrated using the

# select_data.py
import pandas as pd
import numpy as np

series1 = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["C", "D", "E", "F", "G", "A", "B"])
print("series1['E'] = {} 
".format(series1['E']))
print("series1.E = {} 
".format(series1.E))

which outputs the same value via bracket and dot notation. Notes explain that dot notation works only when the index label is a valid Python identifier.

loc and iloc – Access by label and by integer position is shown:

# select_data.py
import pandas as pd
import numpy as np

df1 = pd.DataFrame({"note": ["C", "D", "E", "F", "G", "A", "B"],
                    "weekday": ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]},
                    index=['1','2','3','4','5','6','7'])
print("df1.loc['2']:
{}
".format(df1.loc['2']))

print("series1.loc['E':'A']=
{}
".format(series1.loc['E':'A']))
print("df1.iloc[2:4]=
{}
".format(df1.iloc[2:4]))

The output demonstrates selecting a single row, a label slice, and an integer‑position slice.

at and iat – For scalar value access:

# select_data.py
print("series1.at['E']={}
".format(series1.at['E']))
print("df1.iloc[4,1]={}
".format(df1.iloc[4,1]))

Outputs the scalar values 3 and "Fri" respectively.

Index Objects – Creating an Index with a name:

# index.py
index = pd.Index(["C","D","E","F","G","A","B"], name='note')

Set operations are illustrated:

# index.py
a = pd.Index([1,2,3,4,5])
b = pd.Index([3,4,5,6,7])
print("a|b = {}
".format(a|b))
print("a&b = {}
".format(a&b))
print("a.difference(b) = {}
".format(a.difference(b)))

Resulting unions, intersections, and differences are shown.

MultiIndex – A hierarchical index is built:

# multiindex.py
import pandas as pd
import numpy as np

multiIndex = pd.MultiIndex.from_arrays([
    ["Geagle","Geagle","Geagle","Geagle","Epple","Epple","Epple","Epple","Macrosoft","Macrosoft","Macrosoft","Macrosoft"],
    ["S1","S2","S3","S4","S1","S2","S3","S4","S1","S2","S3","S4"]],
    names=("Company","Turnover"))

df = pd.DataFrame(data=np.random.randint(0,1000,36).reshape(-1,12), index=[2016,2017,2018], columns=multiIndex)
print("df = 
{}
".format(df))

The example shows how to create a DataFrame with a MultiIndex and retrieve specific quarters for each company.

Data Concatenation – Using pd.concat and DataFrame.append to stack DataFrames vertically and horizontally:

# concat_append.py
import pandas as pd
import numpy as np

df1 = pd.DataFrame({"Note": ["C","D"], "Weekday": ["Mon","Tue"]}, index=[1,2])
df2 = pd.DataFrame({"Note": ["E","F"], "Weekday": ["Wed","Thu"]}, index=[3,4])
df3 = pd.DataFrame({"Note": ["G","A","B"], "Weekday": ["Fri","Sat","Sun"]}, index=[5,6,7])

df_concat = pd.concat([df1, df2, df3], keys=['df1','df2','df3'])
print("df_concat=
{}
".format(df_concat))

df_concat_column = pd.concat([df1, df2, df3], axis=1)
print("df_concat_column=
{}
".format(df_concat_column))

df_append = df1.append([df2, df3])
print("df_append=
{}
".format(df_append))

The outputs illustrate vertical concatenation (with a MultiIndex) and horizontal concatenation.

Merge and Join – Demonstrating SQL‑like merges:

# merge_join.py
import pandas as pd
import numpy as np

df1 = pd.DataFrame({"key": ["K1","K2","K3","K4"], "A": ["A1","A2","A3","A8"], "B": ["B1","B2","B3","B8"]})
df2 = pd.DataFrame({"key": ["K3","K4","K5","K6"], "A": ["A3","A4","A5","A6"], "B": ["B3","B4","B5","B6"]})

merge_df = pd.merge(df1, df2)
merge_inner = pd.merge(df1, df2, how='inner', on=['key'])
merge_left = pd.merge(df1, df2, how='left')
merge_left_on_key = pd.merge(df1, df2, how='left', on=['key'])
merge_right_on_key = pd.merge(df1, df2, how='right', on=['key'])
merge_outer = pd.merge(df1, df2, how='outer', on=['key'])
print("merge_df=
{}
".format(merge_df))
print("merge_inner=
{}
".format(merge_inner))
print("merge_left=
{}
".format(merge_left))
print("merge_left_on_key=
{}
".format(merge_left_on_key))
print("merge_right_on_key=
{}
".format(merge_right_on_key))
print("merge_outer=
{}
".format(merge_outer))

The results show inner, left, right, and outer joins. A further example uses DataFrame.join with suffixes:

# merge_join.py

df3 = pd.DataFrame({"key": ["K1","K2","K3","K4"], "A": ["A1","A2","A3","A8"], "B": ["B1","B2","B3","B8"]}, index=[0,1,2,3])
df4 = pd.DataFrame({"key": ["K3","K4","K5","K6"], "C": ["A3","A4","A5","A6"], "D": ["B3","B4","B5","B6"]}, index=[1,2,3,4])

join_df = df3.join(df4, lsuffix='_self', rsuffix='_other')
join_left = df3.join(df4, how='left', lsuffix='_self', rsuffix='_other')
print("join_df=
{}
".format(join_df))
print("join_left=
{}
".format(join_left))

Grouping and Aggregation – Using groupby, agg, and apply:

# groupby.py
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['A','A','A','B','B','B','C','C','C'],
    'Data': np.random.randint(0,100,9)})
print('df=
{}
'.format(df))

grouped = df.groupby('Name')
for name, group in grouped:
    print('Name: {}
Group:
{}
'.format(name, group))

print('Sum:')
print(grouped.sum())
print('Agg Sum:')
print(grouped.agg(['sum']))
print('Agg Map:')
print(grouped.agg([('Total','sum'),('Min','min')]))

def sort(df):
    return df.sort_values(by='Data', ascending=False)
print('Sort Group: 
{}
'.format(grouped.apply(sort)))

The script shows grouping by a column, computing sums, custom aggregations, and applying a sorting function to each group.

Time Series – Working with timestamps, periods, and intervals using Python's datetime module and pandas' date_range:

# time.py
import datetime as dt
import numpy as np
import pandas as pd

now = dt.datetime.now()
print('Now is {}'.format(now))
yesterday = now - dt.timedelta(1)
print('Yesterday is {}'.format(yesterday.strftime('%Y-%m-%d')))

this_year = pd.date_range(dt.datetime(2018,1,1), dt.datetime(2018,12,31), freq='5D')
print('Selected days in 2018: 
{}
'.format(this_year))

df = pd.DataFrame(np.random.randint(0,100,this_year.size), index=this_year)
print('Jan: 
{}
'.format(df['2018-01']))

The example prints the current date, generates a 5‑day frequency date range for 2018, creates a DataFrame indexed by that range, and selects all rows for January.

Plotting – A brief illustration of using matplotlib to plot a histogram of a CSV file:

# plot.py
import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv('data/housing.csv')
data.hist(bins=50, figsize=(15,12))
plt.show()

The code reads a housing dataset and displays a histogram.

The article concludes by noting that while it covers many advanced pandas topics, there is still much more to explore, and readers are encouraged to consult the official documentation for deeper learning.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

pandas data manipulation Merging grouping time-series

Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.