Wednesday 3 August 2011

Validation: black or white list

When you’re validating data, whether on the client or the server side, there are basically two strategies to choose between: you can blacklist data or you can whitelist it. Blacklisting seems to be the more popular way to validate data, but whitelisting is so much better. Here’s a brief description of the two strategies and why whitelisting is better.

The blacklisting strategy validates your input against a list of characters that are illegal in the input. You can either reject input containing the blacklisted characters or just remove them from the input.
The whitelisting strategy is just about the opposite. You have a list of allowed characters, and any character that isn’t in the list is rejected or removed.
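Here is a minimal sketch of the two strategies in Python, each in its "reject" and "remove" variant. The function names and the character sets passed in are assumptions made for illustration; they don't come from any particular library.

```python
def whitelist_is_valid(text, allowed):
    """Reject variant: valid only if every character is in the allowed set."""
    return all(ch in allowed for ch in text)


def whitelist_clean(text, allowed):
    """Remove variant: keep only the characters that are in the allowed set."""
    return "".join(ch for ch in text if ch in allowed)


def blacklist_is_valid(text, forbidden):
    """Reject variant: valid only if no character is in the forbidden set."""
    return not any(ch in forbidden for ch in text)


def blacklist_clean(text, forbidden):
    """Remove variant: drop only the characters that are in the forbidden set."""
    return "".join(ch for ch in text if ch not in forbidden)
```

The only real difference is the direction of the test: the whitelist functions let through what is listed, the blacklist functions let through everything that is not.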
Suppose the user has to input a phone number. It may be written in a number of different ways:
80808080
8080-8080
80-80-80-80
80 80 80 80
+45 8080 8080
So when whitelisting, we’re allowing digits and only digits; the hyphens and spaces in the examples above are rejected or removed.
When blacklisting, we’re disallowing letters, spaces and special characters.
Suppose we want to allow international numbers, which are often written with a plus sign to signify the international access code.
Altering the whitelist is quite simple: we add the plus character to the list of allowed characters.
Altering the blacklist is a mess. We change “no special characters” to allow the plus character but not any other special character. But do you know them all?
When you whitelist, you know what you allow to pass through. But when you blacklist, do you know what you’ve forgotten?
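To make the difference concrete, here is a rough Python sketch of the phone number case. The constant and function names are hypothetical, and using `string.punctuation` (which only covers ASCII punctuation) as the list of "special characters" is an assumption standing in for whatever list a real blacklist would use.

```python
import string

DIGITS = set(string.digits)

# Whitelist: digits only, then extended with the plus sign.
WHITE_LIST = DIGITS
WHITE_LIST_INTL = DIGITS | {"+"}

# Blacklist: letters, whitespace and "special characters".
# Assumption: ASCII letters and ASCII punctuation stand in for the full lists.
BLACK_LIST = (
    set(string.ascii_letters) | set(string.whitespace) | set(string.punctuation)
)
BLACK_LIST_INTL = BLACK_LIST - {"+"}


def passes_white_list(text, allowed):
    """True if every character of text is explicitly allowed."""
    return all(ch in allowed for ch in text)


def passes_black_list(text, forbidden):
    """True if no character of text is explicitly forbidden."""
    return not any(ch in forbidden for ch in text)


if __name__ == "__main__":
    # "\u2013" is an en dash and "\u00e9" is an accented e; neither is in the blacklist.
    for candidate in ["+4580808080", "8080\u20138080", "8080808\u00e9"]:
        print(
            repr(candidate),
            passes_white_list(candidate, WHITE_LIST_INTL),
            passes_black_list(candidate, BLACK_LIST_INTL),
        )
```

Both lists accept `+4580808080`, but the blacklist also accepts the en dash and the accented letter, simply because nobody thought to list them. The whitelist rejects both without ever having heard of them.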