
Restaurant Data Analysis Case Study: Selecting Competitive Categories and Optimal Locations with Python and QGIS

This case study demonstrates how to clean, explore, and model a large restaurant dataset using Python and QGIS, derive a price‑performance metric, rank food categories, and identify the most suitable locations for opening a dessert shop through visualisation and scoring formulas.

Python Programming Learning Circle

Project background – The exercise uses a restaurant dataset (96,398 records, 10 features) to practice data‑driven decision making with Python and QGIS.

Problem definition – (1) Identify the most competitive food category; (2) Compute a comprehensive score to select the best address for that category.

Data understanding – After the CSV is loaded, data.info() shows 10 columns of mixed types (float64, int64, object) and a small number of missing values.

Data cleaning – data.isnull().values.sum() reports 283 missing cells, about 0.29% of the record count. Because the proportion is negligible, every row that contains a null is dropped with data.dropna(), which removes 143 rows and leaves 96,255 clean records.
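The loading and cleaning steps can be sketched as follows. The article does not give the CSV file name or its full column list, so the tiny inline CSV below is hypothetical. It also shows why the null count and the number of dropped rows differ: isnull().values.sum() counts missing cells, while dropna() removes whole rows, and one row can hold several nulls.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV (file name and other columns unknown)
CSV_TEXT = """类别,口味,环境,服务,人均消费
甜品,8.1,7.9,8.0,35
火锅,,,8.4,98
快餐,6.5,6.0,,22
甜品,7.7,8.3,8.1,41
"""

data = pd.read_csv(io.StringIO(CSV_TEXT))
data.info()  # dtypes and non-null counts per column

# Total missing *cells*, as reported by data.isnull().values.sum()
null_cells = int(data.isnull().values.sum())
# Number of *rows* that contain at least one missing cell
rows_with_nulls = int(data.isnull().any(axis=1).sum())

# Drop every row that has any NaN
clean = data.dropna()
print(null_cells, rows_with_nulls, len(clean))
```

Here three missing cells sit in two rows, so dropna() removes two rows, mirroring how 283 null cells in the real data account for only 143 dropped records.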

Feature selection – For the competitiveness analysis, five columns are kept: ['类别', '口味', '环境', '服务', '人均消费'] (category, taste, environment, service, average spend per person). Records with a zero rating or zero average spend are filtered out.
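A minimal sketch of the column selection and the zero filter, assuming the ratings and spend are numeric columns. The sample rows and the extra 店名 (shop name) column are hypothetical; they only stand in for the other columns of the real 10-column dataset.

```python
import pandas as pd

# Hypothetical sample; 店名 represents a column that is not kept
data = pd.DataFrame({
    "类别": ["甜品", "火锅", "快餐", "甜品"],
    "口味": [8.1, 0.0, 6.5, 7.7],
    "环境": [7.9, 8.2, 6.0, 8.3],
    "服务": [8.0, 8.4, 5.9, 8.1],
    "人均消费": [35, 98, 0, 41],
    "店名": ["A", "B", "C", "D"],
})

cols = ["类别", "口味", "环境", "服务", "人均消费"]
df = data[cols]

# Keep only rows where every rating and the average spend are non-zero
numeric = ["口味", "环境", "服务", "人均消费"]
df = df[(df[numeric] != 0).all(axis=1)].copy()
print(df)
```

The hot-pot row (zero taste score) and the fast-food row (zero spend) are dropped; .copy() avoids a SettingWithCopyWarning when a new column is added later.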

Feature engineering – A new column 性价比 (price‑performance) is created by summing the three rating columns (口味, 环境, 服务) and dividing by 人均消费 (average spend), which gives rating points per yuan spent.
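The price-performance column is a single vectorised expression. The two sample rows below are hypothetical and chosen so the arithmetic is easy to check by hand.

```python
import pandas as pd

# Hypothetical rows that already passed the zero filter
df = pd.DataFrame({
    "类别": ["甜品", "火锅"],
    "口味": [8.0, 8.5],
    "环境": [7.5, 8.0],
    "服务": [8.5, 8.5],
    "人均消费": [40.0, 100.0],
})

# 性价比: total rating points per yuan of average spend
df["性价比"] = (df["口味"] + df["环境"] + df["服务"]) / df["人均消费"]
print(df[["类别", "性价比"]])
```

The dessert row scores 24 / 40 = 0.6 points per yuan and the hot-pot row 25 / 100 = 0.25. Because zero spend was filtered out in the previous step, the division cannot produce infinities.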

Model building – grouping – The cleaned DataFrame df (54,886 rows after the zero-value filter) is grouped by 类别 (category), and the mean of each metric is calculated per category.
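The grouping step maps directly onto DataFrame.groupby followed by mean(). The sample values are hypothetical; only the column names come from the article.

```python
import pandas as pd

# Hypothetical cleaned rows with the engineered 性价比 column
df = pd.DataFrame({
    "类别": ["甜品", "甜品", "火锅", "火锅"],
    "口味": [8.0, 7.0, 8.5, 7.5],
    "人均消费": [40.0, 30.0, 100.0, 120.0],
    "性价比": [0.6, 0.7, 0.25, 0.2],
})

# One row per category, each metric averaged over that category's restaurants
category_means = df.groupby("类别")[["口味", "人均消费", "性价比"]].mean()
print(category_means)
```

The result is indexed by 类别, which makes it easy to merge with other per-category tables in the scoring step.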

Outlier handling – Box‑plot visualisation is used to detect outliers; an outlier‑removal function deletes extreme values before scoring.
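The article does not show its outlier-removal function. A common choice that matches what a box plot displays is the 1.5 × IQR whisker rule (the default whisker length in Matplotlib's boxplot), so the sketch below assumes that rule. The helper name remove_outliers and the sample data are hypothetical.

```python
import pandas as pd


def remove_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose `column` lies inside the box-plot
    whiskers, i.e. within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return df[df[column].between(lower, upper)]


spend = pd.DataFrame({"人均消费": [30, 35, 38, 40, 42, 45, 50, 900]})
trimmed = remove_outliers(spend, "人均消费")
print(trimmed["人均消费"].tolist())
```

With these values Q1 = 37.25 and Q3 = 46.25, so anything outside [23.75, 59.75] is dropped, which removes only the 900-yuan row. A larger k keeps more of the tail.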

Scoring formula – A weighted formula (e.g., 口味 (taste) : 人均消费 (average spend) : 性价比 (price‑performance) = 2:5:3) combines the three indicators into a final score. The three intermediate DataFrames are merged into one table with pd.merge, and the final score is computed from it.
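A minimal sketch of the merge-and-score step, assuming one per-category DataFrame per indicator, a merge on 类别, and min-max rescaling of each indicator to [0, 1] before weighting. The article does not state its normalisation or whether any indicator is inverted, so those choices, the "score" column name, and all sample values are assumptions.

```python
from functools import reduce

import pandas as pd

# Hypothetical per-category means, one DataFrame per indicator
taste = pd.DataFrame({"类别": ["甜品", "火锅", "快餐"], "口味": [8.2, 8.0, 6.8]})
spend = pd.DataFrame({"类别": ["甜品", "火锅", "快餐"], "人均消费": [60.0, 55.0, 25.0]})
value = pd.DataFrame({"类别": ["甜品", "火锅", "快餐"], "性价比": [0.5, 0.3, 0.7]})

# Join the three tables on the category column
merged = reduce(lambda left, right: pd.merge(left, right, on="类别"), [taste, spend, value])

weights = {"口味": 2, "人均消费": 5, "性价比": 3}  # 2:5:3 as in the article


def min_max(s: pd.Series) -> pd.Series:
    # Assumption: rescale each indicator to [0, 1] so the weights are comparable
    return (s - s.min()) / (s.max() - s.min())


# Weighted average of the rescaled indicators
merged["score"] = sum(w * min_max(merged[c]) for c, w in weights.items()) / sum(weights.values())
print(merged.sort_values("score", ascending=False))
```

Dividing by the weight total keeps the score in [0, 1]; for these made-up numbers 甜品 scores 0.85, 火锅 0.6 and 快餐 0.3.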

Result – The 甜品 (dessert) category obtains the highest score, making it the most competitive category.

Location analysis – In QGIS, layers are prepared for population density, road density, restaurant density and competitor density, together with the longitude and latitude of each location. Missing values are filled with zero (dealdata.fillna(0, inplace=True)) and the features are standardised. A composite index is then calculated by weighting the four density metrics 4:3:2:1.
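The composite index can be sketched with pandas once the QGIS layers are exported to a table. Several details are assumptions rather than the article's method: the column names, z-score standardisation, mapping the 4:3:2:1 weights onto the densities in the order the article lists them, and a positive sign for every weight. The grid cells below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical grid cells exported from QGIS; column names are illustrative
dealdata = pd.DataFrame({
    "lon": [121.40, 121.45, 121.50, 121.55],
    "lat": [31.20, 31.25, 31.30, 31.35],
    "pop_density": [1200.0, 1500.0, np.nan, 300.0],
    "road_density": [4.0, 3.5, 5.0, 1.0],
    "restaurant_density": [20.0, 18.0, 25.0, np.nan],
    "competitor_density": [2.0, 1.0, 3.0, 0.0],
})

dealdata.fillna(0, inplace=True)  # as in the article: missing values become 0

# Assumed order: 4:3:2:1 follows the order the article lists the densities
weights = {
    "pop_density": 4,
    "road_density": 3,
    "restaurant_density": 2,
    "competitor_density": 1,
}
features = list(weights)

# Assumption: z-score standardisation (mean 0, sample std 1) per density column
z = (dealdata[features] - dealdata[features].mean()) / dealdata[features].std()

# Weighted average of the standardised densities
dealdata["composite"] = sum(w * z[c] for c, w in weights.items()) / sum(weights.values())

# Candidate sites: the highest-scoring cells
top_sites = dealdata.nlargest(3, "composite")[["lon", "lat", "composite"]]
print(top_sites)
```

Because every standardised column has mean zero, the composite index also averages to zero, so positive values mark cells that are above average on the weighted criteria.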

Visualization – Matplotlib scatter plots and Bokeh interactive maps show the spatial distribution of the composite index and highlight the optimal sites: (121.472°E, 31.301°N), (121.473°E, 31.274°N) and (121.493°E, 31.244°N).

Conclusion – Under this scoring model, opening a dessert shop at the identified coordinates maximises the combined score of market demand and competitive advantage.

Written by Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
