Restaurant Data Analysis Case Study: Selecting Competitive Categories and Optimal Locations with Python and QGIS

This case study demonstrates how to clean, explore, and model a large restaurant dataset using Python and QGIS, derive a price‑performance metric, rank food categories, and identify the most suitable locations for opening a dessert shop through visualisation and scoring formulas.

Python Programming Learning Circle

May 5, 2020

Project background – The exercise uses a restaurant dataset (96,398 records, 10 features) to practice data‑driven decision making with Python and QGIS.

Problem definition – (1) Identify the most competitive food category; (2) Compute a comprehensive score to select the best address for that category.

Data understanding – After loading the CSV, data.info() shows 10 columns of mixed types (float64, int64, object) and a small number of missing values.

Data cleaning – data.isnull().values.sum() reports 283 missing cells, about 0.29% relative to the 96,398 records. Because the proportion is negligible, every row containing a null is dropped with data.dropna(), which removes 143 rows and leaves 96,255 clean records.
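The difference between counting missing cells and counting affected rows can be sketched on a toy frame. The values below are invented for illustration and are not taken from the restaurant dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the restaurant CSV (invented values).
data = pd.DataFrame({
    "类别": ["甜品", "火锅", None, "烧烤"],
    "口味": [8.1, np.nan, 7.4, np.nan],
    "人均消费": [35, 120, 60, np.nan],
})

# isnull().values.sum() counts missing cells, not rows.
null_cells = int(data.isnull().values.sum())
# A row with two missing cells is still only one dropped row.
rows_with_null = int(data.isnull().any(axis=1).sum())
# dropna() removes every row that has at least one missing value.
clean = data.dropna()

print(null_cells, rows_with_null, len(clean))
```

Because one row can hold several missing cells, the number of rows removed by dropna() can be smaller than the null count.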

Feature selection – For the competitiveness analysis, five columns are kept: ['类别', '口味', '环境', '服务', '人均消费'] (category, taste, environment, service, average spend per person). Records with a zero rating or zero average spend are filtered out.
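A minimal sketch of this selection and filtering step, assuming that "zero" means any of the four numeric columns is not strictly positive. The sample rows and the extra column 城市 are hypothetical.

```python
import pandas as pd

# Invented sample rows; "城市" is a hypothetical extra column to be dropped.
data = pd.DataFrame({
    "类别": ["甜品", "火锅", "烧烤", "甜品", "火锅"],
    "口味": [8.1, 0.0, 7.4, 7.9, 7.0],
    "环境": [7.8, 7.0, 7.1, 8.2, 7.3],
    "服务": [7.9, 7.2, 0.0, 8.0, 7.1],
    "人均消费": [35, 120, 60, 42, 0],
    "城市": ["上海市"] * 5,
})

cols = ["类别", "口味", "环境", "服务", "人均消费"]
numeric = ["口味", "环境", "服务", "人均消费"]

# Keep the five analysis columns, then keep rows where every numeric value is > 0.
df = data[cols]
df = df[(df[numeric] > 0).all(axis=1)].reset_index(drop=True)

print(df)
```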

Feature engineering – A new column 性价比 (price‑performance) is created by summing the three rating columns and dividing by 人均消费 (average spend), yielding a score per yuan.
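The price‑performance column described above can be sketched as follows, using invented ratings and prices.

```python
import pandas as pd

# Invented ratings and prices for two restaurants.
df = pd.DataFrame({
    "类别": ["甜品", "火锅"],
    "口味": [8.0, 7.5],
    "环境": [8.0, 7.5],
    "服务": [8.0, 9.0],
    "人均消费": [40, 120],
})

# Sum of the three ratings divided by average spend: rating points per yuan.
df["性价比"] = (df["口味"] + df["环境"] + df["服务"]) / df["人均消费"]

print(df[["类别", "性价比"]])
```

Both rows have the same rating total of 24, so the cheaper restaurant gets a price‑performance value three times higher (0.6 versus 0.2).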

Model building – grouping – The cleaned DataFrame df (54,886 rows) is grouped by 类别 and the mean of each metric is calculated.
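As an illustration of the grouping step, the sketch below averages three metrics per category on a four-row toy frame. Which metrics the original notebook averaged is an assumption here; the values are invented.

```python
import pandas as pd

# Invented per-restaurant metrics.
df = pd.DataFrame({
    "类别": ["甜品", "甜品", "火锅", "火锅"],
    "口味": [8.0, 7.0, 7.5, 6.5],
    "人均消费": [40, 60, 120, 100],
    "性价比": [0.6, 0.4, 0.2, 0.2],
})

# One row per category, holding the mean of each metric.
by_cat = df.groupby("类别")[["口味", "人均消费", "性价比"]].mean()

print(by_cat)
```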

Outlier handling – Box‑plot visualisation is used to detect outliers; an outlier‑removal function deletes extreme values before scoring.
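The source does not say which rule its outlier‑removal function applies. One common choice that matches what a box plot draws is the 1.5 × IQR rule; the sketch below assumes that rule, and the helper name drop_outliers is hypothetical.

```python
import pandas as pd

def drop_outliers(frame, column, k=1.5):
    """Keep rows whose `column` lies within [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: the box-plot (Tukey) rule; the article's own function may differ.
    """
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    mask = frame[column].between(q1 - k * iqr, q3 + k * iqr)
    return frame[mask]

# Invented average-spend values with one extreme entry.
df = pd.DataFrame({"人均消费": [30, 35, 40, 45, 50, 900]})
trimmed = drop_outliers(df, "人均消费")

print(trimmed["人均消费"].tolist())
```

Filtering before the group means are computed keeps a handful of extreme prices from dominating a category's average.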

Scoring formula – A weighted formula (e.g., 口味:人均消费:性价比 = 2:5:3) combines the three indicators into a final score. The three intermediate DataFrames are merged with pd.merge and the final score is computed.
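A minimal sketch of the merge and weighted score, under two assumptions the source does not confirm: each indicator is min‑max scaled to [0, 1] before weighting, and a higher value counts as better for all three. The category labels A, B, C and all numbers are placeholders, and the column name score is hypothetical.

```python
import pandas as pd

def min_max(series):
    """Rescale a Series to the range [0, 1] (assumed normalisation)."""
    return (series - series.min()) / (series.max() - series.min())

# Three per-category frames with placeholder categories and invented means.
taste = pd.DataFrame({"类别": ["A", "B", "C"], "口味": [7.8, 7.2, 7.0]})
spend = pd.DataFrame({"类别": ["A", "B", "C"], "人均消费": [45, 110, 70]})
value = pd.DataFrame({"类别": ["A", "B", "C"], "性价比": [0.52, 0.20, 0.30]})

for frame, col in [(taste, "口味"), (spend, "人均消费"), (value, "性价比")]:
    frame[col + "_norm"] = min_max(frame[col])

# Join the three frames on the category key.
merged = pd.merge(pd.merge(taste, spend, on="类别"), value, on="类别")

# Weights 2:5:3 expressed as fractions of 10.
merged["score"] = (
    0.2 * merged["口味_norm"]
    + 0.5 * merged["人均消费_norm"]
    + 0.3 * merged["性价比_norm"]
)
ranked = merged.sort_values("score", ascending=False)

print(ranked[["类别", "score"]])
```

The ranking is sensitive to both the weights and the normalisation, so any change to either should be checked against the final ordering.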

Result – The category "甜品" (dessert) obtains the highest score, making it the most competitive category.

Location analysis – Using QGIS, layers for population density, road density, restaurant density and competitor density are prepared, together with the longitude and latitude of each cell. Missing values are filled with zero (dealdata.fillna(0, inplace=True)) and the features are standardised. A composite index is then calculated with weights 4:3:2:1 for the four density metrics.
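The composite index can be sketched as below. Several details are assumptions rather than facts from the source: the column names are hypothetical, standardisation is taken to be min‑max scaling, the weights are applied in the order the metrics are listed, and competitor density is inverted so that fewer competitors scores higher. The grid values are invented.

```python
import numpy as np
import pandas as pd

# Invented grid cells exported from QGIS; column names are hypothetical.
dealdata = pd.DataFrame({
    "lng": [121.40, 121.45, 121.50],
    "lat": [31.20, 31.25, 31.30],
    "pop_density": [100.0, 300.0, np.nan],
    "road_density": [2.0, 6.0, 4.0],
    "restaurant_density": [10.0, 30.0, 20.0],
    "competitor_density": [1.0, 5.0, np.nan],
})

# Cells with no features in a layer come back empty; treat them as zero.
dealdata = dealdata.fillna(0)

def min_max(series):
    """Rescale a Series to [0, 1] (assumed form of standardisation)."""
    return (series - series.min()) / (series.max() - series.min())

pop = min_max(dealdata["pop_density"])
road = min_max(dealdata["road_density"])
rest = min_max(dealdata["restaurant_density"])
# Assumption: competition is a cost, so the scaled value is inverted.
comp = 1 - min_max(dealdata["competitor_density"])

# Weights 4:3:2:1 expressed as fractions of 10.
dealdata["composite"] = 0.4 * pop + 0.3 * road + 0.2 * rest + 0.1 * comp

print(dealdata[["lng", "lat", "composite"]])
```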

Visualization – Matplotlib scatter plots and Bokeh interactive maps show the spatial distribution of the composite index and highlight the highest-scoring sites, at approximately (121.472°E, 31.301°N), (121.473°E, 31.274°N) and (121.493°E, 31.244°N).
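A minimal Matplotlib sketch of such a scatter plot, with the three highest-scoring cells circled. The coordinates and index values are invented, the column name composite is hypothetical, and the Bokeh version is not shown.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented grid cells with a precomputed composite index.
grid = pd.DataFrame({
    "lng": [121.40, 121.45, 121.50, 121.55, 121.60],
    "lat": [31.20, 31.25, 31.30, 31.22, 31.28],
    "composite": [0.21, 0.90, 0.35, 0.77, 0.64],
})

# The three cells with the highest composite index.
top = grid.nlargest(3, "composite")

fig, ax = plt.subplots(figsize=(6, 5))
# Colour every cell by its composite index.
points = ax.scatter(grid["lng"], grid["lat"], c=grid["composite"], cmap="viridis", s=40)
# Circle the top three cells.
ax.scatter(top["lng"], top["lat"], facecolors="none", edgecolors="red", s=160, label="top 3")
fig.colorbar(points, ax=ax, label="composite index")
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
ax.legend()
fig.savefig("composite_index.png", dpi=150)
```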

Conclusion – The three coordinates above score highest on the composite index, so they are the recommended sites for opening a dessert shop, the category that ranked as most competitive.



