
What Makes a Good Model? Understanding Model Concepts, Types, and Evaluation in Data Science

This article explores the definition of a model, distinguishes business, data, and function models, discusses criteria for a good model—including performance, fidelity to real‑world relationships, and interpretability—and examines why a universal model does not exist, all within the context of data science and AI.

DataFunTalk

Introduction: What makes a model a good model? It is a question every data analyst and big-data AI engineer has quietly pondered.

To answer it comprehensively, we discuss three aspects: how to understand a “model”, what “good” means, and whether an all‑purpose model exists.

01 How to Understand “Model”?

The English word "model" derives from the Latin modus, meaning "measure" or "standard". In Chinese, the characters "模" (norm) and "型" (form) together convey the idea of a form created according to a certain norm or standard.

Two key points are emphasized:

A. A model references a norm or standard but does not need to be an exact copy. For example, a physical airplane model replicates the real aircraft’s shape, while a user‑behavior interaction model abstracts payment actions into a data representation.

B. The realized form of a model can be a physical entity, a static abstract expression, or a dynamic relationship between entities. Examples include physical prototypes, tabular user profiles, family‑relationship diagrams, or causal statements such as “temperature rise causes ice to melt”.

In data‑science contexts, the “norms” can be complex real‑world phenomena, and the resulting “forms” are various data representations, both static and dynamic.

Data‑science models can be classified into three major categories:

1. Business Model

A business model reshapes real‑world problems into an abstract representation. The norm is the real‑world process, while the form is an abstract, often visual, depiction such as a flowchart of an e‑commerce app’s order‑processing logic (see Figure 1).

Figure 1: Business logic model of an e‑commerce app.
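A business model of this kind can also be expressed directly in code. The sketch below is a hypothetical illustration (the `OrderState` enum, `TRANSITIONS` table, and `advance` function are invented for this example, not taken from the article's Figure 1): it abstracts an order-processing flow into states and allowed transitions, so the business rules become explicit and checkable.

```python
from enum import Enum, auto

class OrderState(Enum):
    CREATED = auto()
    PAID = auto()
    SHIPPED = auto()
    DELIVERED = auto()
    CANCELLED = auto()

# Allowed transitions encode the business rules: an order can only
# ship after payment, and only unshipped orders may be cancelled.
TRANSITIONS = {
    OrderState.CREATED: {OrderState.PAID, OrderState.CANCELLED},
    OrderState.PAID: {OrderState.SHIPPED, OrderState.CANCELLED},
    OrderState.SHIPPED: {OrderState.DELIVERED},
    OrderState.DELIVERED: set(),
    OrderState.CANCELLED: set(),
}

def advance(state: OrderState, target: OrderState) -> OrderState:
    """Move an order to `target`, enforcing the business model."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition: {state.name} -> {target.name}")
    return target
```

Here the real-world process is the norm, and the state machine is the abstract form: any code path that violates the flowchart's logic is rejected at runtime.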

2. Data Model

A data model describes reality using organized data structures. Its norms come from real‑world scenarios, and its form is a concrete data representation, typically stored in databases or data warehouses.

Key components of a data model are:

Data structure – the types, attributes, and relationships of the data.

Data operations – the actions that can be performed on the data.

Data constraints – rules ensuring consistency, correctness, and integrity.

These foundations give rise to the three classic data models: hierarchical, network, and relational.
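The relational model makes all three components concrete. The following minimal sketch (a hypothetical two-table schema, not from the article) uses Python's built-in `sqlite3` to show data structure as tables and relationships, data operations as SQL statements, and data constraints as declarative rules the database enforces.

```python
import sqlite3

# In-memory relational database illustrating the three components.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable integrity enforcement

# Data structure: tables, attributes, and a relationship between them.
conn.execute("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL                -- constraint: must be present
    )""")
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL REFERENCES users(user_id),  -- integrity
        amount   REAL CHECK (amount > 0)                      -- correctness
    )""")

# Data operations: insert and query.
conn.execute("INSERT INTO users VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders VALUES (10, 1, 42.5)")
row = conn.execute(
    "SELECT name, amount FROM orders JOIN users USING (user_id)"
).fetchone()
```

An insert that references a nonexistent user, or a non-positive amount, is rejected by the constraints rather than silently corrupting the data.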

3. Function Model

A function model expresses the transformation relationship between variables. Its norm is the true relationship between real‑world variables, and its form is a mathematical function defined by domain, codomain, and mapping rule.

In machine learning, a function model is first chosen (the form), then its parameters are either manually set or learned from data. When parameters are learned from data, the model becomes a machine‑learning model, and the learning process is called training.

Common machine‑learning models, including deep‑learning models, are function models.
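The form-then-parameters idea can be sketched with the simplest function model, f(x) = w·x + b. In this illustrative example (the `train` function and its hyperparameters are assumptions for the sketch, not a specific library API), the form is chosen by hand and the parameters are learned from data by gradient descent on mean squared error, i.e., training.

```python
# The chosen form: f(x) = w * x + b.  Learning w and b from data
# turns this function model into a (trained) machine-learning model.
def train(xs, ys, lr=0.01, epochs=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Data generated by the "true" relationship y = 2x + 1 (the norm).
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
w, b = train(xs, ys)  # the learned form converges to w ~ 2, b ~ 1
```

Deep-learning models follow the same pattern at much larger scale: the network architecture fixes the form, and training fits its parameters to data.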

02 How to Understand “Good”?

A good model should reflect the real world with minimal cost while maintaining high fidelity. However, the concrete shape of “good” varies across scenarios.

In data‑science practice, the adage “good data beats good features, which beats good algorithms” highlights that data and features set the upper bound of performance; the model merely strives to approach that bound.

Three perspectives on a good model:

It delivers better business outcomes for a given data scale and dimensionality.

It faithfully captures the underlying business relationships, guided by appropriate inductive bias.

It offers sufficient interpretability to explain predictions, which is crucial in domains such as medical diagnosis or financial risk control.

Interpretability can stem from explicit importance scores (e.g., decision‑tree, random‑forest, linear models) or from transparent structures (e.g., K‑NN, clustering). Neural‑network models, however, often lack interpretability, limiting their adoption in structured‑data tasks.
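A one-split decision stump makes the "transparent structure" point concrete. The sketch below (the `best_stump` helper and the toy risk-control data are invented for illustration) learns a single threshold rule; the model *is* its explanation, which is exactly the property regulated domains value.

```python
def best_stump(xs, ys):
    """Fit a one-split decision stump and return a readable rule.

    The learned model is fully transparent: the rule itself
    explains every prediction it makes.
    """
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        # Misclassification count using the majority label on each side.
        err = sum(y != max(set(left), key=left.count) for y in left) \
            + sum(y != max(set(right), key=right.count) for y in right)
        if best is None or err < best[0]:
            best = (err, t,
                    max(set(left), key=left.count),
                    max(set(right), key=right.count))
    err, t, left_label, right_label = best
    return f"if x <= {t}: predict {left_label}, else predict {right_label}"

# Toy risk-control data: small amounts pass, large ones need review.
rule = best_stump([10, 20, 30, 200, 300],
                  ["ok", "ok", "ok", "review", "review"])
```

A neural network fit to the same data might predict just as well, but it could not hand an auditor a single-sentence rule like this one.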

03 Is There an All‑Purpose Model?

A universal model that handles every combination of data, features, and business knowledge is practically impossible to find. For a specific scenario, the principle of Occam's razor—"entities should not be multiplied beyond necessity"—guides model selection.

Modifying a model arbitrarily without understanding its inductive bias can degrade performance; instead, one should evaluate the assumptions behind different models and feature‑engineering steps.

04 Conclusion

Data science is an information game that bridges technology, theory, and business. While a truly universal model is elusive, transfer learning (e.g., BERT, GPT‑3) demonstrates how pre‑trained models can achieve remarkable performance across many tasks.

Thank you for reading.

Tags: machine learning, AI, model evaluation, data science, model interpretability
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
