Description
The SUMMARIZES
function summarizes its input table, grouping by one or more columns.
Usage
SUMMARIZE (<Table Expression>, <groupBy_columnName>[, <groupBy_columnName>]…[, <name>, <expression>]…)
The summarize function evaluates to a new table. The new table will include any columns that are grouped upon, as well as the new expressions.
Examples
Avoid double count activity reports
If you have a programme with the goal of promoting certain nutritional practices for mothers of children five and under, and have partners reporting aggregated output indicators to you such as:
- Number of mothers reached through door-to-door outreach (
INPERSON
) - Number of estimated mothers reached through radio broadcasts (
RADIO
)
You may want to estimate your overall reach by combining these two indicators. However, if you have one partner conducting door-to-door outreach and another partner running radio broadcasts in the same geographic area, you don't want to double count those indicators.
Partner | District | INPERSON | RADIO |
---|---|---|---|
A | North | 500 | |
A | South | 600 | 1,000 |
B | North | 2,000 | |
B | East | 5,000 | |
C | West | 300 |
In this case, if you added up all of these indicators, you would risk double counting the outputs in the Nothern district where 500 mothers were reached in person, but 2,000 were reached via the radio. It's likely that many mothers received both an in-person visit and heard the message on the radio.
One solution to this is to first add up the results at the district level, and take either the in-person numbers, or the radio numbers, which ever is greater.
We'll first use the SUMMARIZE function to compute a new table with SUMS of the two fields by district.
SUMMARIZE([Partner reports], Province, "TOTAL_INPERSON", SUM(INPERSON), "TOTAL_RADIO", SUM(RADIO))
This gives us a new table that looks like:
District | TOTAL_INPERSON | TOTAL_RADIO |
---|---|---|
North | 500 | 2,000 |
South | 600 | 1,000 |
East | 5,000 | |
West | 300 |
Now to find the total reach, we can use the SUMX function to calculate the MAX of each row, and then sum the maxes:
SUMMARIZE([Partner reports], Province, "TOTAL_INPERSON", SUM(INPERSON), "TOTAL_RADIO", SUM(RADIO)) |>
SUMX(MAX(TOTAL_INPERSON, TOTAL_RADIO))
This gives us a final result of 2,000 + 1,000 + 5,000 + 300 or 8,300.
Tracking changes in health status over time
Suppose you are tracking the number of sick family members across households over time. You have a form “Household Health Check-in” that contains records for the number of sick family members on a given date of reporting as follows:
Family | Number of sick family members | Date of reporting |
---|---|---|
Ahunna | 3 | 2023-01-15 |
Bilal | 1 | 2023-02-10 |
Ahunna | 4 | 2023-01-20 |
Bilal | 2 | 2023-01-15 |
You can define a calculated table that shows the earliest and latest reported number of sick family members for each family using the following formula:
SUMMARIZE([Household Health Check-in], [Family],
"Baseline", FIRST([Number of sick family members], [Date of
reporting]),
"Final", LAST ([Number of sick family members], [Date of reporting])) |>
ADDCOLUMNS(
"Change",
Final - Baseline,
"Pct change", (Final - Baseline) / Baseline)
The result would be a table:
Family | Baseline | Final | Change | Pct Change |
---|---|---|---|---|
Ahunna | 3 | 4 | 1 | 33% |
Bilal | 2 | 1 | -1 | -33% |