Fundamentals of data modelling in humanitarian and development contexts

About the webinar

The Data Modelling webinar series is a sequence of two live sessions designed for professionals interested in mastering data modelling concepts and techniques. Together, the webinars form a comprehensive course that provides a deep understanding of data modelling, including creating data models with Miro, putting them to use in ActivityInfo, and adopting best practices for effective data modelling.

The series is aimed at entry- to intermediate-level professionals. To benefit fully from the course, it is highly recommended to attend both sessions, or watch the recordings, in order.

About this session

A good data model that is tailored to your project requirements is an essential part of designing and implementing an effective information management system.

During this webinar, we explain the importance of creating an effective data model when it comes to designing databases and walk you through some practical steps for creating a data model for your own project’s database.

In summary, we explore:

  • What is data modelling?
  • Why create a data model?
  • Fundamental principles and frameworks of data modelling
  • Importance of data models in organizing and analyzing information

The data modelling process:

  • Identifying data entities
  • Creating entity relationship diagrams

Data modelling best practices:

  • Considering the role of end user experience
  • Tips for aligning user experience with database functionality
  • Creating data models that facilitate analysis
  • Most common data models in humanitarian and development contexts

View the presentation slides of the webinar.

Is this Webinar series for me?

  • Do you wish to understand the basics of data modelling so you can design your own databases?
  • Are you looking for information and inspiration for building an information system for your organization but don't know where to start?
  • Are you an ActivityInfo database administrator or is this a role you would like to take on?

Then, watch our webinar!

Questions and answers

What are the best practices for selecting attributes when designing a data model for individual entities, and how can we ensure these attributes are relevant and comprehensive?

This is easily one of the most important things to be mindful of during the data modelling process. When designing a data model for individual entities, best practice is to ensure that each attribute is unique, clearly defined, and essential for identifying or describing the entity. Attributes should be relevant to the purpose of the database, avoiding redundancy and focusing on those that will be used for analysis or operations. To ensure attributes are relevant and comprehensive, engage stakeholders to gather requirements, review existing data sources for necessary fields, and continually validate and update the model based on real-world use and feedback to maintain accuracy and utility.
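As a minimal sketch of these principles in Python (the entity and its attributes are hypothetical examples, not a prescribed schema):

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical "Beneficiary" entity: each attribute is unique, clearly
# defined, and kept only because it is needed for identification or analysis.
@dataclass
class Beneficiary:
    beneficiary_id: str    # unique identifier for the entity
    date_of_birth: date    # raw value; age is derivable, so no redundant "age" field
    location_code: str     # links to the admin-area list used in analysis
    enrollment_date: date  # needed for reporting on programme coverage

# A derivable attribute such as age is computed on demand, not stored.
def age_on(person: Beneficiary, as_of: date) -> int:
    years = as_of.year - person.date_of_birth.year
    if (as_of.month, as_of.day) < (person.date_of_birth.month, person.date_of_birth.day):
        years -= 1
    return years
```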

Explain 3NF with more practical examples. Differentiate between 2NF and 3NF.

2NF ensures there are no partial dependencies: every non-key attribute must depend on the whole primary key, not on just part of a composite key. 3NF goes further and ensures there are no transitive dependencies: non-key attributes must depend only on the primary key, not on other non-key attributes. For example, if an Employees table stores both DepartmentID and DepartmentName, then DepartmentName depends on DepartmentID (a non-key attribute) rather than on the employee's key, so the table violates 3NF; moving DepartmentName into a separate Departments table resolves this.
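As a minimal sketch of that 3NF decomposition, using Python's built-in sqlite3 module (the table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NOT in 3NF: DepartmentName depends on DepartmentID (a non-key attribute),
# not directly on the key EmployeeID -- a transitive dependency.
conn.execute("""
    CREATE TABLE EmployeesFlat (
        EmployeeID     INTEGER PRIMARY KEY,
        EmployeeName   TEXT,
        DepartmentID   INTEGER,
        DepartmentName TEXT  -- repeated for every employee in the department
    )
""")

# 3NF: the transitively dependent attribute moves to its own table.
conn.execute("""
    CREATE TABLE Departments (
        DepartmentID   INTEGER PRIMARY KEY,
        DepartmentName TEXT
    )
""")
conn.execute("""
    CREATE TABLE Employees (
        EmployeeID   INTEGER PRIMARY KEY,
        EmployeeName TEXT,
        DepartmentID INTEGER REFERENCES Departments(DepartmentID)
    )
""")
```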

In your explanation of Second Normal Form, you focused a lot on the primary key. I felt the idea of creating a new table for attributes that relate to multiple rows was not well elaborated. Kindly revisit this.

To achieve Second Normal Form (2NF) and eliminate partial dependencies, you create new tables for attributes that do not depend on the entire composite primary key. For example, in a StudentCourses table with StudentID and CourseID as the composite primary key, attributes like StudentName and CourseName depend on only part of the key and should be moved to separate tables (Students and Courses). Students then stores StudentID and StudentName, Courses stores CourseID and CourseName, and StudentCourses links the two with StudentID and CourseID as foreign keys. This ensures that every non-key attribute in each table fully depends on that table's primary key, achieving 2NF. The sketch below shows this decomposition.
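A minimal version of that decomposition, again using Python's built-in sqlite3 module (the Grade column is added here only to show an attribute that legitimately depends on the whole composite key):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Before 2NF, StudentName depended only on StudentID and CourseName only on
# CourseID: partial dependencies on the composite key (StudentID, CourseID).
# After 2NF, each partially dependent attribute lives in its own table.
conn.executescript("""
    CREATE TABLE Students (
        StudentID   INTEGER PRIMARY KEY,
        StudentName TEXT
    );
    CREATE TABLE Courses (
        CourseID   INTEGER PRIMARY KEY,
        CourseName TEXT
    );
    -- The linking table keeps the composite key, plus any attribute that
    -- depends on the key as a whole (e.g. a grade for that enrollment).
    CREATE TABLE StudentCourses (
        StudentID INTEGER REFERENCES Students(StudentID),
        CourseID  INTEGER REFERENCES Courses(CourseID),
        Grade     TEXT,
        PRIMARY KEY (StudentID, CourseID)
    );
""")
```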

In terms of normalization, when is First Normal Form used? Is it during the first phase of identifying entities or at a much later stage?

First Normal Form (1NF) is used during the initial phase of designing a database when you are identifying entities and their attributes. The goal at this stage is to ensure that the data is organized into tables where each column contains only atomic, indivisible values and each row is unique. This is a foundational step to eliminate repeating groups and ensure that each field contains only one piece of information.
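As a small illustration in Python (the records are hypothetical):

```python
# NOT in 1NF: "Courses" packs several values into one field.
student_flat = {"StudentID": 1, "Name": "Amina", "Courses": "Math, Biology"}

# 1NF: one atomic value per field, one row per student-course pair.
enrollments = [
    {"StudentID": 1, "Course": "Math"},
    {"StudentID": 1, "Course": "Biology"},
]
```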

Is there any difference between indicators and entities?

Entities and indicators serve different purposes in data management and evaluation. Entities are objects or concepts, like "Students," "Teachers," or "Courses," about which data is collected and stored in a database, with each entity having specific attributes. Indicators, on the other hand, are measurable metrics used to assess the performance or impact of a program or project, such as "Student Enrollment Rates" or "Average Test Scores." While entities form the structural foundation of a database by organizing data into identifiable objects, indicators are used to monitor and evaluate progress and outcomes, providing insights into the effectiveness of activities and interventions.
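As a small illustration in Python (the records and field names are hypothetical), an indicator is computed from entity data rather than stored as an entity itself:

```python
# Entity records: rows of data about identifiable objects (here, students).
students = [
    {"student_id": 1, "enrolled": True},
    {"student_id": 2, "enrolled": True},
    {"student_id": 3, "enrolled": False},
]

# Indicator: a metric derived from the entity data.
def enrollment_rate(records) -> float:
    return sum(r["enrolled"] for r in records) / len(records)

print(f"Student enrollment rate: {enrollment_rate(students):.0%}")  # 67%
```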

Can you explain the trade-offs between the different normal forms in data normalization, particularly when it comes to balancing data integrity and query performance in large-scale databases?

A very important question. Balancing data integrity and query performance in large-scale databases does indeed involve trade-offs between the different levels of normalization. First Normal Form (1NF) simplifies the data structure but may still leave redundancy. Second Normal Form (2NF) and Third Normal Form (3NF) reduce redundancy and improve data integrity by eliminating partial and transitive dependencies, respectively, but they also increase the number of tables and the joins queries require, which can slow them down. Higher normalization ensures maximum data consistency, but at the cost of query complexity and performance. Often a hybrid approach is used: normalizing core tables for data integrity while selectively denormalizing, or using indexing, to enhance query performance, thus achieving a practical balance for specific application needs. The sketch below shows both techniques.
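A minimal sketch of both mitigations, using Python's built-in sqlite3 module with illustrative table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Departments (
        DepartmentID   INTEGER PRIMARY KEY,
        DepartmentName TEXT
    );
    CREATE TABLE Employees (
        EmployeeID   INTEGER PRIMARY KEY,
        EmployeeName TEXT,
        DepartmentID INTEGER REFERENCES Departments(DepartmentID)
    );

    -- Indexing the join column speeds up queries while keeping 3NF.
    CREATE INDEX idx_employees_department ON Employees(DepartmentID);

    -- Selective denormalization: a pre-joined reporting table that trades
    -- some redundancy for faster reads.
    CREATE TABLE EmployeeReport AS
        SELECT e.EmployeeID, e.EmployeeName, d.DepartmentName
        FROM Employees AS e
        JOIN Departments AS d USING (DepartmentID);
""")
```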

How do I ensure data integrity with an inherited database?

There are many things you can do. Generally, to ensure data integrity with an inherited database, start by thoroughly auditing the existing schema and data to identify any inconsistencies or anomalies. Implement appropriate normalization if needed to reduce redundancy and ensure proper relationships between tables. Establish and enforce data constraints such as primary keys, foreign keys, unique constraints, and check constraints to maintain accurate and consistent data.
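A minimal sketch of such constraints, using Python's built-in sqlite3 module (the Projects/Activities schema is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
    CREATE TABLE Projects (
        ProjectID   INTEGER PRIMARY KEY,       -- primary key constraint
        ProjectCode TEXT NOT NULL UNIQUE,      -- unique constraint
        Budget      REAL CHECK (Budget >= 0)   -- check constraint
    );
    CREATE TABLE Activities (
        ActivityID INTEGER PRIMARY KEY,
        ProjectID  INTEGER NOT NULL
                   REFERENCES Projects(ProjectID)  -- foreign key constraint
    );
""")
```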

Is it a must for an entity to have different components/elements for fulfilling its criteria?

No, it is not a must for an entity to have different components or elements to fulfill its criteria. An entity can be simple, consisting of a single attribute or element, as long as it effectively represents a distinct and identifiable object or concept within the database. However, in many cases, entities have multiple attributes to provide a more detailed and comprehensive description, ensuring they meet the specific requirements and use cases of the database design. The complexity of an entity depends on the needs of the database or project and the level of detail required for accurate data representation and management.

How can data modeling be effectively utilized to enhance decision-making and impact assessment in humanitarian and development operations?

We need to start from the premise that data modeling makes data more understandable and actionable. By organizing complex data into clear, visual frameworks, data modeling supports better planning, quicker responses, and more effective interventions, leading to improved outcomes for affected communities. It helps teams predict needs, allocate resources efficiently, and gain a bird's-eye view of the areas that matter most when evaluating program success. The data modeling exercise itself also improves coordination among stakeholders, which is a precursor to effective decision-making.

List of sessions

Data modelling series:

  1. Fundamentals of data modelling in humanitarian and development contexts (current)
  2. Data Modelling in practice

About the Trainer

Victoria Manya has a diverse background and extensive expertise in data-driven impact, project evaluation, and organizational learning. She holds a Master's degree in local development strategies from Erasmus University in the Netherlands and is currently pursuing a Ph.D. at the African Studies Center at Leiden University. With over ten years of experience, Victoria has collaborated with NGOs, law firms, SaaS companies, tech-enabled startups, higher institutions, and governments across three continents, specializing in research, policy, strategy, knowledge valorization, evaluation, customer education, and learning for development. Her previous roles as a knowledge valorization manager at the INCLUDE platform and as an Organizational Learning Advisor at Sthrive B.V. involved delivering high-quality M&E reports and trainings, ensuring practical knowledge management, and moderating learning platforms. Today, as a Customer Education Specialist at ActivityInfo, Victoria leverages her experience and her understanding of how to put data to work to assist customers in successfully deploying ActivityInfo.

Sign up for our newsletter

Sign up for our newsletter and get notified about new resources on M&E and other interesting articles and ActivityInfo news.
