How to build an effective data model design
In our previous article, we introduced data modelling and emphasized its importance along with the key frameworks necessary for creating a robust data model. This article looks into the process of designing a data model, drawing an analogy to creating a blueprint for a house. This comparison helps clarify the steps involved in data modelling, from gathering and defining requirements to designing and documenting the model. By examining data modelling through this lens, organizations can gain a deeper understanding of each phase's significance, ensuring that the final system is both efficient and aligned with their goals.
You can also watch the webinar here to learn more about the Fundamentals of data modelling in humanitarian and development contexts
Understanding requirements: Gathering needs
Designing a data model can be compared to building a blueprint for a house. Just as understanding the homeowner’s preferences; such as the number of rooms, style, layout, and functionality; is essential for creating the right house design, gathering requirements from stakeholders is crucial for developing a data model. Here is a list of things the stakeholders should be aware of:
1. Purpose and objectives: Theory of Change
To create a successful data model, stakeholders need to have a clear understanding of the purpose and objectives of the system. This often aligns with the Theory of Change, which focuses on the overall mission of the project and the steps needed to achieve it.
Key questions stakeholders should answer:
- What is the objective of my intervention?
- What is the pathway of change? (i.e., how does the intervention lead to the desired outcome?)
2. Data requirements: MEAL plan
For a data model to be effective, it must incorporate the organization’s data requirements, which are often guided by a MEAL plan.
Key questions include:
- What are the indicators that will help me monitor and evaluate progress?
- How are these indicators calculated?
- What is the data source for these calculations, and in what format is the data stored?
- How will I use this information to make decisions?
3. Requirements from stakeholders: Data flow
To ensure that the data model reflects the real-world flow of information, it is essential to gather requirements from stakeholders about the data lifecycle—how data is collected, accessed, and analyzed within the organization.
Key questions to address:
- Who collects the information, and how often?
- Who has access to the information, and how frequently?
- Who is responsible for analyzing the information, and how often is this done?
Designing a data model: A checklist
When designing a data model, it’s essential to ask key questions to ensure the model meets your needs effectively. Use this checklist to guide you through the process:
1. Define your objectives (WHY) Begin by outlining the objectives of your database. Clearly specify the use case, such as tracking project performance, monitoring funding impacts, or generating reports. Understanding the purpose of the database helps in aligning its design with the goals of the organization or project, ensuring it provides the necessary insights and supports decision-making effectively.
2. Identify data to capture (WHAT) Determine the types of data that need to be captured to meet the defined objectives. This includes specifying the data elements, their sources, and how they will be collected.
3. Identify database users (WHO) Identify who will be using the database. Consider their roles, skills, and the importance of the information to them. Designing the database with the end-users in mind ensures that it is tailored to their needs and capabilities, facilitating easier adoption and more effective use.
4. Describe data usage (HOW) Detail how users will interact with the data. This includes outlining data flow, connections, validation processes, and analysis methods. Describe data retention policies and quality standards. Understanding these aspects is essential for designing a model that supports the needs of key decision-makers and ensures data integrity and usability.
Making data model design easy
As discussed, above, there are some steps we need follow when designing a data model:
- Identify entities
- Identify attributes
- Identify relationships
- Apply normalization
- Apply constraints and rules
- Document
Let’s see in detail what each step entails while drawing parallels from the world of house construction.
House | Data model |
---|---|
Rooms (kitchen, bathroom) | Entities (beneficiaries, relief items, donors, distribution centres) |
Rooms’ characteristics (dimensions, window’s locations, doors placements, types of flooring) | Attributes (beneficiary ID, Name, Age, Gender, Location) |
Hallways and other connections (stairs and corridors) | Relationships (how data entities are related to each other) |
Room optimization and structure integrity (one central kitchen instead of multiple smaller ones) | Normalization (tables to avoid redundancy, data linked with keys) |
Laws and regulations for legal standards (zoning laws, safety regulations, building codes) | Constraints and Validation (primary and foreign keys, validation rules) |
Architectural plans, specifications | Documentation (data dictionaries, entity-relationship diagrams, descriptions of attributes and relationships) |
Entities: Rooms and spaces
Once you have a clear understanding of your requirements, the next step is to identify the entities within your data model. In a data model, entities represent the primary components or objects, much like the rooms and spaces in a house blueprint. Just as a house blueprint outlines the structure and layout of a home, a data model defines the structure and relationships of the data within a system.
House blueprint example: In a house blueprint, you have various rooms such as the kitchen, bedroom, bathroom, and living room.
Data model example: Similarly, in a data model, entities might include Beneficiaries, Relief Items, Donors, and Distribution Centers.
Beneficiaries: Among these entities, it is crucial to identify the focal entity of your data model. The focal entity is central to your model and often reflects the primary focus of the system's functionality.
Identifying and defining your entities: accurately ensures that your data model is well-organized and effectively supports the objectives of your system.
Attributes: Room features and details
Attributes provide detailed information about each entity, much like the specific features and dimensions of each room in a house blueprint. These attributes define the characteristics and functionality of the entity within the system, ensuring that each entity is uniquely identifiable and serves its intended purpose.
House blueprint example: In a house blueprint, attributes of rooms include dimensions, window locations, door placements, and types of flooring. These details define each room’s function and ensure that it fits within the overall design of the house.
Data model example: In a data model, attributes provide detailed information about entities, making them unique and functional within the system. For example, for the Beneficiary entity, attributes might include Beneficiary ID, Name, Age, Gender, and Location. Each of these attributes is crucial for accurately identifying and differentiating between beneficiaries.
Attributes help to describe and distinguish entities in a data model, allowing for precise and meaningful data management.
Relationships: Hallways and connections part
Relationships define how entities are connected to one another, much like how hallways, doors, and stairs connect different rooms in a house. These connections illustrate how various parts of the system interact and depend on one another.
House blueprint example: In a house blueprint, hallways connect rooms like the living room to the kitchen, while stairs provide access between different floors. These connections facilitate movement and interaction between different parts of the house, allowing the space to function cohesively.
Data model example: In a data model, relationships link entities to show how they interact. For example, beneficiaries might receive relief items, or donors might fund distribution centres. These relationships illustrate how data entities are related and interact with each other. Effective modelling of these relationships ensures that data integration across various tables and reports is seamless and accurate.
The house floor plan highlights the kitchen as the focal area, with lines connecting it to other rooms such as the living room, bedroom, and dining room. This emphasizes the central role of the kitchen in the layout.
The connections illustrate the relationships between the focal entity and other entities.
The database schema highlights the 'beneficiary' table as the central entity, with connections to related tables like 'personal details', 'services received', and 'contact information'. This illustrates the relationships between the focal entity and other entities in the database.
Establishing relationships between entities is crucial for data consistency and accuracy. These relationships are defined as cardinality.
One-to-One: In a one-to-one relationship, each entity in a pair is related to exactly one entity in the other pair. For example, a refugee might have a unique identification card. In this case, each refugee is connected to one ID card, ensuring that there are no duplicates and each refugee can be uniquely identified.
One-to-Many: In a one-to-many relationship, a single entity is related to multiple instances of another entity. For example, consider a humanitarian organization that provides various services to refugees, such as language classes or health checkups. One organization might offer multiple services, and each refugee can access multiple services. Here, one organization is related to many service instances.
Many-to-One: In a many-to-one relationship, multiple instances of one entity relate to a single instance of another entity. For instance, if multiple refugees are served by a single distribution centre, many refugees (many-to-one) are associated with one distribution centre. This relationship helps in tracking and managing the distribution of resources efficiently.
Many-to-Many: In a many-to-many relationship, multiple instances of one entity relate to multiple instances of another entity. For example, multiple refugees might participate in multiple support programs. In this case, we track which refugees attended which programs. This relationship requires a junction table to manage the associations and ensure accurate tracking of program participation.
Normalization: Structural integrity and optimization
Normalization in a data model ensures data is organized efficiently and without redundancy, similar to how an architect ensures the structural integrity and optimal layout of a house.
House blueprint example: Ensuring each room serves a distinct purpose and optimizing space to avoid unnecessary duplication (e.g., one central pantry instead of multiple small storage areas).
Data model example: Organizing data into tables to avoid redundancy, ensuring each piece of data is stored only once, and linked appropriately through keys.
How to effectively employ normalization: Following the first rule of normalization, Each attribute (column) in a table must contain only atomic (indivisible) values. This means that each cell of the table should hold a single, non-repeating value. Attributes should not contain lists, arrays, or nested structures.
Following the second rule of normalization, all other values must be functionally dependent on the whole primary key.
Lastly, following the third rule of normalization, there should not be any transitive dependencies. A transitive dependency occurs when a non-key attribute depends on another non-key attribute, rather than directly on the primary key.
Constraints and rules: Building codes and regulations
Constraints and rules in a data model ensure data integrity and accuracy, just as building codes and regulations ensure the safety and compliance of a house.
House blueprint example: In a house blueprint, builders must adhere to building codes, zoning laws, and safety regulations. These rules govern everything from electrical wiring to fire safety and structural stability, ensuring that the house is safe for inhabitants and compliant with legal standards.
Data model example: Similarly, in a data model, constraints and rules help maintain data integrity. This involves implementing primary keys (unique identifiers for each entity), foreign keys (which establish relationships between entities), and validation rules (to ensure data accuracy and consistency).
Documentation: Detailed plans and specifications
Comprehensive documentation is essential for ensuring clarity and guiding successful implementation in both construction and data modelling. Proper documentation acts as a reference, ensuring that all stakeholders understand the design, structure, and details of the system or project.
House blueprint example: In the context of a house blueprint, documentation includes detailed architectural plans, material lists, and specifications. These documents provide a clear roadmap for contractors and builders, outlining every aspect of the construction process, from the size of the rooms to the type of materials required.
Data model example: Similarly, in a data model, comprehensive documentation includes data dictionaries, entity-relationship diagrams, and thorough descriptions of all data attributes and relationships. A data dictionary defines the meaning, format, and usage of each attribute, while entity-relationship diagrams visually represent how entities interact with one another. Proper documentation ensures that developers, analysts, and end-users can effectively navigate and work with the data model.
Conclusion
In essence, a data model is crucial for effectively organizing and managing data within a system. It simplifies complex data interactions through a structured approach, particularly with relational models that use tables and keys for flexibility and scalability.
The analogy presented in this article exhibits data modelling as comparable to building a house because both processes require careful planning, structure, and adherence to stakeholder needs. Just as a house blueprint outlines the layout and function of rooms, a data model defines entities, attributes, and relationships that guide how data is organized and flows through a system. By understanding requirements, ensuring proper connections, and following best practices like normalization and constraint setting, data modelling achieves efficiency and integrity. Like detailed architectural plans, documentation in data modelling provides clarity for both developers and users, ensuring the system serves its intended purpose effectively.
Resources and further reading
Ai, P. (2023). Understanding normalization in databases: a crucial step in data organization. https://www.linkedin.com/pulse/understanding-normalization-databases-crucial-step-data-organization-dwnwc/
Relational Databases: Examples, management & Types. (n.d.). Vaia. https://www.vaia.com/en-us/explanations/computer-science/databases/relational-databases/
Shan, J. (2024). Universal Data Modeling Series 3: Conceptual Model 1. Intro to Entity. Medium. https://medium.com/@junshan0/conceptual-model-1-intro-to-entity-b353f405ee5f
Watt, A. (2014). Chapter 8 The Entity Relationship Data Model. Pressbooks. https://opentextbc.ca/dbdesign01/chapter/chapter-8-entity-relationship-model/
What is a Database Model. (n.d.). Lucidchart. https://www.lucidchart.com/pages/database-diagram/database-models
What is data modeling and why is it important. (2024). Zuci Systems. https://www.zucisystems.com/blog/what-is-data-modeling-and-why-is-it-important/