Regular expressions are powerful tools for describing patterns in text, including phone numbers. You can use this function to validate whether a collected phone number is well-formed.
In this tutorial, we'll look at how to build up a regular expression step-by-step to validate collected phone numbers.
Understanding the domain
Before writing a regular expression, it's important to understand the expected format of the text. In this example, we'll look at validating mobile phone numbers from Democratic Republic of the Congo (DRC).
In the DRC, the numbering scheme for mobile phones is as follows:
- Vodacom: 81X XXX XXX or 82X XXX XXX
- Supercell: 80X XXX XXX
- Congo-Chine Telecom: 84X XXX XXX
- Celltell: 97X XXX XXX, 98X XXX XXX. or 99X XXX XXX
Where "X" is any digit, 0-9.
We can use the patterns above to write a validation rule that tests for a valid mobile number.
A simple regular expression
Let's start by matching the Vodacom numbers. Each Vodacom mobile number starts with "81" or "82" and another digit, and then two groups of three digits:
REGEXMATCH(PHONE_NUMBER, "8[12][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9]")
Here we use the character class notation to specific a range of characters:
[12]
matches either "1" or "2"[0-9]
matches any character between "0" and "9"
We can shorten this expression by using repeating character classes:
REGEXMATCH(PHONE_NUMBER, "8[12][0-9] [0-9]{3} [0-9]{3}")
The {3}
indicates that the character before must be matched exactly three times.
Matching any of the mobile providers
The big difference between the different providers is the first group of digits:
Provider | Pattern of first group |
---|---|
Vodacom | 8[12][0-9] |
Supercell | 80[0-9] |
Congo-Chine | 84[0-9] |
Celltell | 9[789][0-9] |
We can match any of the providers above by grouping them together using parentheses, and using the pipe operator (|) to specify alternatives:
REGEXMATCH(PHONE_NUMBER, "((8[12][0-9])|(80[0-9])|(84[0-9])|(9[789][0-9])) [0-9]{3} [0-9]{3}")
Actually, the only difference between the first three providers is the second digit, so we could further simply to:
REGEXMATCH(PHONE_NUMBER, "((8[0124][0-9])|(9[789][0-9])) [0-9]{3} [0-9]{3}")
However, this will match a phone number wherever it occurs in the text. So "801 343 933" will match, but so will "TEL: 801 343 933". We'd prefer that only the phone number is entered into the text field, so we can use ^
symbol to require matching at the beginning of the field, and $
to require matching at the end:
REGEXMATCH(PHONE_NUMBER, "^((8[0124][0-9])|(9[789][0-9])) [0-9]{3} [0-9]{3}$")
Calculating the name of the mobile providers
We can also use the regular expression to calculate the name of the mobile provider from the phone number using the IF
function together with the REGEXMATCH
function:
IF(REGEXMATCH(PHONE_NUMBER, "^8[12][0-9]"), "Vodacom",
IF(REGEXMATCH(PHONE_NUMBER, "^80[0-9]"), "Supercell",
IF(REGEXMATCH(PHONE_NUMBER, "^84[0-9]"), "Congo-Chine",
IF(REGEXMATCH(PHONE_NUMBER, "^9[789][0-9]"), "Celltell",
"Other"))))
Building the form
Now that we have our two formulas, we can add a data collection form in ActivityInfo.
Add the form
- Navigate to your database.
- Click "Add Form"
- Enter the form name "Registration"
Add the phone number field
- Add a text field with the name "Phone Number" and the code
PHONE_NUMBER
. - Check "Set Validation Rules"
- Paste the validation formula above into the validation rules box, or open the formula editor
Add the mobile provider field
- Add a calculated field called "Mobile Provider"
- In the formula editor, paste the IF-statement from above.
Testing the form
When you open the form in data entry, typing a phone number that doesn't match the format for DRC should result in a validation error:
When the input does match, the field is validated, and the calculated field shows the correct mobile provider:
You can try the form yourself.