Formal Naming of Data Schemas, Data Structures, and Data Models: Definitions and Hierarchy
The article explains how to formally name data schemas, data structures, and data models, clarifies their relationships, introduces physical, business, logical, and deployment schemas, and proposes strategic and tactical schemas to improve data architecture and management across organizations.
After discussing data architecture and data structures, the article asks what distinguishes data architecture from data structure and how data schema, data structure, and data model should be formally named.
Historically, many attempts have been made to name specific data structures and the data models that contain them, often with reference to the Zachman framework. The result has been confusion and a proliferation of inconsistent names.
A better approach is to formally name each data schema (outline), then name data structures based on the schema and domain, and finally name data models according to the structures they contain.
A schema is a diagrammatic representation, a structured framework or outline; a data schema is simply a diagram of a data structure. A data structure represents the arrangement, relationships, and content of data resources and must be documented with formal names, comprehensive definitions, and precise integrity rules.
With the emergence of databases in the mid‑20th century, two data schemas were identified: an internal (physical) schema that describes how data is stored in the database, and an external (business) schema that describes how applications use the data.
Because internal and external schemas often differ greatly, multiple external schemas can be developed for a single internal schema, leading to the definition of a third, conceptual schema as the common denominator between them.
To give these three schemas clearer meaning, the internal schema is renamed to *physical schema*, the external schema to *business schema*, and the conceptual schema to *logical schema*, establishing a development order of business → logical → physical.
Business data schemas are normalized into logical schemas, but normalization separates data, making it difficult to group similar entities (e.g., all employee data). Without a technique for regrouping the separated data, many organizations end up with significant data gaps.
Adding *data view schemas*—the result of normalization—addresses this issue. By combining appropriate data view schemas, similar data can be grouped together. The process follows: business schema → normalized to data view schema → optimized to logical schema → denormalized to physical schema.
Before distributed data processing, this four-step sequence worked well. Once data could be distributed, confusion arose between data distribution and denormalization; it was resolved by inserting a *deployment schema* between the logical and physical schemas. Logical schemas are transformed into deployment schemas through a process called data de-optimization.
The resulting order is: business schema, normalized to data view schema, optimized to logical schema, de‑optimized to deployment schema, and finally denormalized to physical schema. These five basic schemas are illustrated below:
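As a minimal sketch, the five-schema sequence above can be written down as an ordered set of transitions. The names `DetailedSchema`, `TRANSITIONS`, and `development_path` are illustrative assumptions and do not come from the article; only the schema names and the process names are taken from the text.

```python
from enum import Enum


class DetailedSchema(Enum):
    """The five detailed schemas, in development order."""
    BUSINESS = "business"
    DATA_VIEW = "data view"
    LOGICAL = "logical"
    DEPLOYMENT = "deployment"
    PHYSICAL = "physical"


# Each schema maps to the process applied to it and the schema that results.
TRANSITIONS = {
    DetailedSchema.BUSINESS: ("normalization", DetailedSchema.DATA_VIEW),
    DetailedSchema.DATA_VIEW: ("optimization", DetailedSchema.LOGICAL),
    DetailedSchema.LOGICAL: ("de-optimization", DetailedSchema.DEPLOYMENT),
    DetailedSchema.DEPLOYMENT: ("denormalization", DetailedSchema.PHYSICAL),
}


def development_path(start=DetailedSchema.BUSINESS):
    """Return (source, process, target) steps from `start` to the physical schema."""
    steps = []
    current = start
    while current in TRANSITIONS:
        process, target = TRANSITIONS[current]
        steps.append((current.value, process, target.value))
        current = target
    return steps


if __name__ == "__main__":
    for source, process, target in development_path():
        print(f"{source} schema --{process}--> {target} schema")
```

Keeping the process name next to each transition makes it explicit that every step between two schemas is a distinct, named activity rather than an informal conversion.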
While these five schemas are effective for detailed data‑resource design, terminology challenges around the conceptual schema remain.
Introducing *strategic* and *tactical* data schemas resolves this: strategic schemas represent executive‑level perspectives, while tactical schemas represent management‑level perspectives.
Strategic and tactical schemas are essentially logical and are placed on top of the logical schema. A strategic schema can be refined into a tactical schema, which can be further refined into a logical schema through specialization; the reverse can occur via generalization.
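The refinement chain described above can be sketched as a simple ordered list, with specialization moving toward more detail and generalization moving back. The function names `specialize` and `generalize` and the list `LEVELS` are hypothetical helpers chosen for illustration, not terms defined by the article beyond the process names themselves.

```python
# Ordered from most general (executive view) to most detailed.
LEVELS = ["strategic", "tactical", "logical"]


def specialize(level):
    """Return the next, more detailed schema level."""
    index = LEVELS.index(level)
    if index == len(LEVELS) - 1:
        raise ValueError(f"{level!r} cannot be specialized further in this chain")
    return LEVELS[index + 1]


def generalize(level):
    """Return the next, more general schema level."""
    index = LEVELS.index(level)
    if index == 0:
        raise ValueError(f"{level!r} is already the most general level")
    return LEVELS[index - 1]


if __name__ == "__main__":
    print(specialize("strategic"))  # tactical
    print(generalize("logical"))    # tactical
```

The sketch treats the two processes as inverses along one chain, which matches the statement that the reverse of specialization occurs via generalization.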
The diagram below shows two main parts: general schemas (strategic and tactical) and detailed schemas (business, data view, logical, deployment, physical). Together they form a three-layer, five-schema concept: a strategic layer, a tactical layer, and a detailed layer containing the five detailed schemas.
A natural question is whether eight more general schemas are needed, namely strategic and tactical versions of the business, data view, deployment, and physical schemas. The answer is that they are likely unnecessary, or at least less useful than the formally named schemas.
The seven formally named schemas (five detailed plus two general) can be prefixed with domain areas to create clear data‑structure names such as "facility strategic data structure" or "employee logical data structure".
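The naming rule above (domain prefix, then schema name, then "data structure") is easy to apply mechanically. The following is a minimal sketch; the function `data_structure_name` and the constant `SCHEMA_NAMES` are assumptions made for illustration, while the seven schema names and the two example outputs come from the text.

```python
# The seven formally named schemas: two general, five detailed.
SCHEMA_NAMES = (
    "strategic", "tactical",
    "business", "data view", "logical", "deployment", "physical",
)


def data_structure_name(domain, schema):
    """Build a formal data-structure name from a domain area and a schema name."""
    if schema not in SCHEMA_NAMES:
        raise ValueError(f"unknown schema: {schema!r}")
    domain = domain.strip().lower()
    if not domain:
        raise ValueError("a domain area is required")
    return f"{domain} {schema} data structure"


if __name__ == "__main__":
    print(data_structure_name("facility", "strategic"))
    print(data_structure_name("employee", "logical"))
```

Rejecting schema names outside the fixed list is one way to keep ad hoc names from creeping back in, which is the problem the formal naming is meant to solve.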
A data model is more than a data structure: in addition to the structure, it must include formal names, comprehensive definitions, and precise integrity rules. Combining structures with these components yields models such as "facility strategic data model" or "employee logical data model".
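One way to picture the difference is to treat a data model as a data structure plus the components listed above, and to check whether any are missing. This is a sketch under stated assumptions: the classes `DataStructure` and `DataModel`, the method `missing_components`, and the sample definition and rule text are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataStructure:
    """A data structure identified by its domain area and schema."""
    domain: str
    schema: str

    @property
    def name(self):
        return f"{self.domain} {self.schema} data structure"


@dataclass
class DataModel:
    """A data structure combined with a definition and integrity rules."""
    structure: DataStructure
    definition: str = ""
    integrity_rules: List[str] = field(default_factory=list)

    @property
    def name(self):
        return f"{self.structure.domain} {self.structure.schema} data model"

    def missing_components(self):
        """List the components still needed before the model is complete."""
        missing = []
        if not self.definition.strip():
            missing.append("definition")
        if not self.integrity_rules:
            missing.append("integrity rules")
        return missing


if __name__ == "__main__":
    model = DataModel(DataStructure("employee", "logical"))
    print(model.name, "is missing:", model.missing_components())
    # Placeholder content, for illustration only.
    model.definition = "Describes employee data at the logical level."
    model.integrity_rules.append("Every employee has exactly one identifier.")
    print(model.name, "is missing:", model.missing_components())
```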
Data‑management professionals must formally name schemas, structures, and complete data models, develop them within a unified organizational architecture, and follow formal processes for normalization, optimization, de‑optimization, specialization, and generalization; otherwise, chaos, data divergence, and insufficient data resources will result.
Architects Research Society