Here in Australia, 6–13 June 2008 is National E-Security Awareness Week. In line with this theme, Formulate would like to shed some light on security for electronic forms, particularly those on the Web.
Security is a massive issue, and in no way can we cover all the relevant points in one article. Nor would we claim to have all the knowledge: it's saying something that there are far more security specialists than form specialists! However, there are some 'basics' that anybody working with web forms would be well placed to get a handle on, and these basics are what this article hopes to help with.
An ever-present risk: transmission of data
As discussed in a previous article, there are many different types of electronic forms, depending on the degree of interactivity and intelligence that the form exhibits. Therefore, the term 'electronic form' encompasses everything from a PDF form that must be printed to be filled out (by hand) through to an entire, self-contained desktop application with multiple screens and parts for entering and retrieving information.
When considering issues of security, what matters is not so much what type of electronic form is being used but rather how the data entered into that form is transmitted to wherever it is going to be stored (and used). In the PDF form example above, the data can be transmitted in three ways, namely by:
- Postal mail;
- Facsimile; or
- Hand (i.e. in person by the form-filler, their proxy or a delivery person).
In the desktop application example, the data will often be transmitted to an external data source, such as a company server, which may be in an altogether different physical location to the form-filler.
In both the hand-filled PDF form and desktop application examples, there is the transmission of data into an environment that is 'external' to the form-filler. This means that there is potential for the data to be compromised, and measures to ensure the security of the data — commensurate to the corresponding risk — should be put in place.
This message is worth dwelling on for a moment: no matter how data is transmitted, if it moves from one location to another, there is a security risk. Mail can be stolen, faxes can be intercepted (or sent to the wrong number) and the physical form can be taken, lost or compromised in the process of hand delivery. Similarly, data that is transmitted electronically can be intercepted, and taken or corrupted, before it gets to its destination.
How data is transmitted electronically
In the electronic world, data is transmitted from one machine to another using a physical network of some sort. This network includes wired methods like the telephone line or optical cable, as well as wireless methods such as radio, microwave or infra-red frequencies.
It's important to understand that for many networks, it is not just the form-filler's computer and the data store that are involved: a number of different computers will be used in the transmission of the data.
For example, when transmitting over the Internet, at the very least the router(s) of an Internet Service Provider (ISP) will be used. When I ran a quick check today I found that there were 10 different computers between my own and one instance of Google.com (as you can imagine, Google has lots of computers). These different machines are involved because the data has to be moved across several organisations to reach its final destination.
To enable the physical movement of data, there are various consistent 'systems' in place, such as the postal service and courier companies. These systems work in a certain way: in Australia you have to put mail in a specially marked post box, with a certain style of address and the correct amount of postage. Anyone wanting to use the system has to conform to these "protocols".
Movement of data over an electronic network also relies on protocols. For example, POP (Post Office Protocol) is commonly used for retrieving email. Voice and video calls over the Internet often use the Session Initiation Protocol (SIP). For transmitting web pages and web form data, the main protocol is HyperText Transfer Protocol, or HTTP for short.
The problem with unsecure protocols
These protocols describe how the data can be transmitted. They are the equivalent of the postal system specifying what is and is not an acceptable address. However, like the postal service, they don't provide any particular security arrangements or guarantees.
With a little effort, anyone can break into a post box on the street and steal the mail it contains: an envelope does nothing to "protect" a letter from being opened and read. Similarly, the electronic protocols described above send data 'in the clear' (i.e. without being encrypted). This means that with a little effort, the data can be read, altered or deleted by any computer between the form-filler and the data store. This computer could be one of the many that are normally part of the network, as described above, or it could be a different computer that has 'broken into' the transmission.
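To make 'in the clear' concrete, here is a minimal sketch in Python of what a simple form submission looks like on the wire when plain HTTP is used. The host name, path, field names and card number are all made up for illustration, and a real browser would send more headers than this.

```python
from urllib.parse import urlencode


def build_plain_http_post(host: str, path: str, fields: dict) -> bytes:
    """Return the bytes of a minimal form POST as sent over plain HTTP."""
    # Form fields are encoded as name=value pairs joined by "&".
    body = urlencode(fields).encode("ascii")
    headers = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return headers + body


if __name__ == "__main__":
    # Hypothetical form data; the card number is a made-up test value.
    request = build_plain_http_post(
        "forms.example",
        "/join",
        {"name": "Ann Lee", "card_number": "4111111111111111"},
    )
    print(request.decode("ascii"))
```

Running it prints the request as ordinary readable text, card number included. Any machine that handles those bytes along the way can read them just as easily.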
The solution to the problem is to use a secure version of the relevant protocol. For web forms the appropriate protocol is HTTPS: HTTP with "S" for "Secure" added to it. (There are equivalent versions for other protocols, such as POP, but for simplicity I won't discuss those here.)
HTTPS provides security by encrypting the data, which means it cannot be read without first unlocking it, using a special code or "key". In this way the secure version of the transmission protocol is analogous to putting the physical letter inside a locked safe before you put it in the mail box.
If a web form contains even just one field collecting private data (e.g. credit card number, date of birth or a password), it should probably be implemented using HTTPS. Otherwise, you are opening up the data to being seen, used and compromised by a third party.
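One quick check that follows from this is whether a form's data will actually be sent to an HTTPS address. The sketch below is one way to do that in Python; the function name and the example addresses are assumptions made for illustration, not part of any particular tool.

```python
from urllib.parse import urljoin, urlparse


def form_submits_over_https(page_url: str, form_action: str) -> bool:
    """Return True if the form's data would be sent to an https:// address.

    page_url    -- address of the page that contains the form
    form_action -- value of the form's action attribute (may be relative or empty)
    """
    # A relative or empty action is resolved against the page's own address,
    # which is how a browser works out where to send the data.
    target = urljoin(page_url, form_action)
    return urlparse(target).scheme == "https"


if __name__ == "__main__":
    # Hypothetical pages and actions.
    print(form_submits_over_https("https://forms.example/join", "/submit"))
    print(form_submits_over_https("http://forms.example/join", "/submit"))
```

Note that this only looks at where the data is sent. A page that is itself delivered over plain HTTP can still be altered on the way to the form-filler, so it is safer for the page containing the form to use HTTPS as well.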
HTTPS and digital certificates
To use HTTPS, you must have a "digital certificate". The digital certificate contains a key used in setting up the encryption (the site's public key), as well as a digital signature. This digital signature should come from an independent third party (e.g. VeriSign, Thawte or Comodo).
Having the digital signature come from an independent third party provides an additional layer of security: authentication. When the data transmission is requested, the identity of the destination computer can be verified, by using this third party.
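As a rough illustration of this verification step, the Python sketch below uses the standard ssl module to connect to a server and refuse the connection unless its certificate is signed by an authority the computer already trusts and matches the host name. The function names are mine, and the host you pass in is up to you; nothing here is specific to any one certificate provider.

```python
import socket
import ssl


def make_verifying_context() -> ssl.SSLContext:
    """Build a context that insists on a valid, matching certificate."""
    # create_default_context() loads the system's trusted certificate
    # authorities and switches on certificate and host name checking.
    return ssl.create_default_context()


def fetch_certificate(host: str, port: int = 443) -> dict:
    """Connect to host and return the details of its certificate.

    Raises ssl.SSLCertVerificationError if the certificate is not signed
    by a trusted authority or does not match the host name.
    """
    context = make_verifying_context()
    with socket.create_connection((host, port), timeout=10) as raw:
        with context.wrap_socket(raw, server_hostname=host) as secure:
            return secure.getpeercert()
```

Calling fetch_certificate with the host name of a site you use returns a dictionary that includes who the certificate was issued to and who signed it. Web browsers perform an equivalent check automatically each time an HTTPS page is requested.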
Malicious use of electronic forms
So far we've focused on securing data from the form-filler to the data store. There is, however, another whole major area of form security that should be raised in any half decent beginner's guide: malicious use.
Forms, unlike other web pages, represent an opportunity for people to send data to a computer. This data can be things like offensive text, advertising, malware or viruses. And the data doesn't need to be entered by a human: more often than not, malicious use of forms is done by scripts and robots.
The malicious use of electronic forms can be minimised by:
- Programming your forms to accept only legitimate data (e.g. limiting the number of characters in a street number and preventing the entry of hyperlinks)
- Implementing a mechanism to prevent submission of your form by non-humans.
You'll note that the first option — programming your forms to accept only legitimate data — is good practice from the perspective of usability as well as security. Such limitations reduce the proportion of unintentional errors and provide visual cues about what sort of data is needed. Be careful, however, to ensure that every possible case has been catered for (within reason). There's nothing more frustrating than having a form not allow you to enter legitimate data. How many of you have filled out a form that purports to be international, but only accepts postal addresses in the local format?
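Here is a minimal sketch, in Python, of the two checks mentioned above: limiting what a street number field accepts, and rejecting values that contain hyperlinks. The length limit and the patterns are assumptions chosen for illustration. In line with the warning about legitimate data, they would need testing against real addresses and probably widening for other countries' formats.

```python
import re

MAX_STREET_NUMBER_LENGTH = 10  # assumed limit, for illustration only

# Digits with an optional letter, and "/" or "-" allowed between parts,
# e.g. "12", "12A", "3/45", "10-12".
STREET_NUMBER_PATTERN = re.compile(r"[0-9]+[A-Za-z]?(?:[/-][0-9]+[A-Za-z]?)*")

# Crude signs that a value contains a hyperlink.
LINK_PATTERN = re.compile(r"https?://|www\.|<a\s", re.IGNORECASE)


def is_valid_street_number(value: str) -> bool:
    """Accept short values that look like a street number."""
    value = value.strip()
    if not value or len(value) > MAX_STREET_NUMBER_LENGTH:
        return False
    return STREET_NUMBER_PATTERN.fullmatch(value) is not None


def contains_hyperlink(value: str) -> bool:
    """Return True if the text appears to include a web link."""
    return LINK_PATTERN.search(value) is not None
```

Checks like these need to run on the computer that receives the data, not just in the form itself, because a script can send data straight to the server without ever displaying the form.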
Preventing non-humans from filling out forms is a little trickier. The underlying principle is to present the user with a task that only a human could complete, thereby verifying that they are a legitimate submitter of the form. Such a task is called a CAPTCHA, which is an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart, and was developed initially at Carnegie Mellon University.
However, developing a computer-based task that only humans can do is quite difficult, especially if you're also trying to minimise the burden on the respondent and make the task accessible to people with disabilities. Common approaches — that don't necessarily conquer all these issues — include:
- Text-based CAPTCHA, where the user has to do things like enter a string of characters from a scrambled/animated picture or sound bite, perform a mathematical calculation (displayed in an image that can't be 'read' by a computer) or answer a 'common sense' question (e.g. Which day comes after Tuesday?).
- Image-based CAPTCHA, where the user has to do things like pick an odd image out of a set (e.g. a picture of a bucket amongst pictures of trees) or use an image with numbers next to each item it contains (a scene of a room, for example) and enter the number that corresponds to a certain item.
- Measure the time it takes for the form to be filled out and discard the submission if that time is below a certain minimum threshold.
- Include a hidden field on the form. Some robots and scripts will see this hidden field and complete it, whereas the theory is that humans will not. Therefore, if the form data being submitted has the hidden field completed, it can be disregarded. However, many robots and scripts are clever enough to know not to complete a hidden field, and some humans may complete the field if they are using assistive technology which happens to "display" it.
- Allow form submission only after authentication of the form-filler (e.g. require login to access the form).
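The timing and hidden field approaches from the list above are simple enough to sketch in a few lines of Python. The field name "website", the five-second threshold and the Submission structure are all assumptions made for this example; a real form would need its own values.

```python
from dataclasses import dataclass

MIN_SECONDS_TO_COMPLETE = 5.0  # assumed threshold; tune for the real form


@dataclass
class Submission:
    fields: dict                # field name -> submitted value
    seconds_to_complete: float  # time from serving the form to receiving it


def looks_automated(submission: Submission,
                    hidden_field: str = "website",
                    min_seconds: float = MIN_SECONDS_TO_COMPLETE) -> bool:
    """Return True if the submission shows signs of coming from a script."""
    # Hidden field check: a person should never see this field, so any
    # value in it suggests a script filled in every field it found.
    if submission.fields.get(hidden_field, "").strip():
        return True
    # Timing check: the form came back faster than a person could fill it in.
    return submission.seconds_to_complete < min_seconds
```

For example, looks_automated(Submission({"name": "Ann"}, 1.2)) returns True because the form came back in just over a second. Given the caveat about assistive technology, it may be kinder to set flagged submissions aside for review than to silently throw them away.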
Unfortunately, soon after most new techniques for verifying legitimate web form submission come out, there are tools that can be used to break them (with varying degrees of effort). Also, the more intelligent computers become, the more work will be required to distinguish ourselves from them. For example, many people are currently working on improving computers' ability to understand (i.e. 'parse') natural human languages. This work will improve the usefulness of things like search tools, but also make it easier for computers to complete the tasks we set them.
Balancing cost and benefits
These are difficult problems at the frontline of research into web form security. It is likely that we will never be able to 100% ensure the security of our forms and the data they submit. But we can work to minimise the problems that are incurred and the costs of these problems to both form-fillers and users of the resultant data.
The thing to remember is that, as Bruce Schneier says, "security is a trade-off". You need to weigh up the risk of the status quo (problem likelihood and consequences) against the cost of mitigation (both in terms of resources and the user experience).
Schneier B. The psychology of security. Blog of Bruce Schneier. Dated 18 January 2008, accessed 10 June 2008.
Cisco Internetworking Basics. Not dated, accessed 10 June 2008.
CAPTCHA: Telling Humans and Computers Apart Automatically. Not dated, accessed 10 June 2008.
Inaccessibility of CAPTCHA - Alternatives to Turing Tests on the Web. Dated 23 November 2005, accessed 10 June 2008.
The CAPTCHA Alternatives. Dated 9 November 2006, accessed 10 June 2008.