Anybody who spends time building networking devices realizes that there are significant development challenges to overcome. Linux-based devices are no exception. Without proper care, projects can end up with excessive code size (and thus poor cost-effectiveness), poor performance, or interoperability issues.
Nowhere is this more true than in low-cost VPNs, which demand strong cryptography support. In this market, price is everything. Users want all of the new features and robustness of Linux, but they want it cheap. The cost of a second Flash memory can make the difference between product success and failure; adding a hardware crypto engine is even worse. Vendors can be tempted to cut corners: stability, standards compliance, processor independence, and overall code quality can suffer as a result.
The Special Needs of VPN
Low-cost VPN devices immediately face several hurdles:
Current crypto algorithms require significant processing power, but low cost boxes usually use low-speed, limited functionality devices.
Adding specialty crypto processors adds even more expense.
Writing or re-writing crypto algorithms is difficult work, and the field of experts is limited.
Desktop and server systems use software crypto, but with processor speeds that are often ten or twenty times higher than those used in low-cost routers, and with much more memory, which helps mask any remaining speed issues. Performance is generally good, but some important aspects of desktop code, such as the configuration loader, are often terribly bloated by embedded system standards.
Many small devices are derived from desktop code (even if adapted to an RTOS); software crypto is the rule, occasionally with some minimal hardware support for time-critical functions, or those which can be incorporated into hardware relatively easily (such as DES/3DES). The limited power of many embedded processors makes writing optimized crypto algorithms quite difficult, and so that is not usually done either. Not surprisingly, the results are often less than ideal.
The architecture (processor and system) of small embedded devices is rarely even a shadow of that used on a desktop or server, so adapting code derived from desktop sources can involve a massive amount of work. If that work is done well, the quality of the cryptography is good, even if the performance is poor; unfortunately, the processor limitations often force shortcuts that hamper the quality. This is especially true of commonly used packages such as FreeSwan, which is very large and complex. Thus, the number of low-cost products such as VPN boxes that include good-quality, high-performance crypto support has been limited.
Accelerator chips can help immensely, and some vendors take that path. However, such chips greatly increase the Bill of Material cost and so cost-conscious vendors have been unwilling to include them. And there is still integration work to do; as a result, some products have not been fully standards-compliant.
Weighing these factors for current devices, it would seem that the best alternative would be to use optimized software for most crypto functions, especially those that are either relatively straightforward or that are used infrequently. For more commonly used functions, hardware support can improve overall performance. Good performance can be gained by incorporating ‘inner-loop’ functions in hardware, with appropriate added software. But as noted previously, many processors don’t include any form of crypto hardware support, so the chosen system should have a plug-in architecture that allows use of acceleration hardware when it is available, but with a fall-back to software replacements if needed on a given device.
Security should be implemented in the IP stack, i.e., IPsec. The reason is simple: there’s no need to create custom (secure) packages for the higher software levels, and they can all take advantage of stack-level security without modification. Imagine the mess if you had to create custom versions of every common application program on a system, such as FTP clients, web browsers, etc.
IPsec, though quite complex, is built upon two key components:
A Tunnel component used to encrypt, authenticate, and encapsulate IP packets between two hosts or VPN networks.
An Internet Key Exchange (IKE) component used to negotiate the algorithms and ciphers (that is, to generate and respond to a Security Association) used for the tunnel, via ‘shared secrets’.
There are a variety of implementations available for these components. For example, the tunnel component from FreeSwan is called ‘Klips’. A custom tunnel component is used on many current routers, running on an RTOS. Another package, from Tobias Ringstrom, is called ‘ipsec_tunnel’. Yet another is from BSD.
Interoperability between tunnel implementations is desirable, although some vendors might be tempted to create devices that can only communicate with their own products, resulting in vendor lock-in. At least a few vendors claim to be certified, but careful testing is still warranted. When testing devices, verify that they can work with devices from other vendors, and with a variety of settings for each SA mechanism. For example, be certain to test both authenticated and encrypted headers, with a variety of ciphers (always including at least DES and 3DES), and with at least MD5 digests and SHA1 secure hashes.
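The combinations described above can be enumerated mechanically so that none is skipped. The following is a minimal sketch only: the labels and the function name build_test_matrix are hypothetical, and a real harness would have to translate each case into the configuration syntax of the devices under test.

```python
from itertools import product

# Hypothetical labels for the settings named in the text.
PROTOCOLS = ["AH", "ESP"]
CIPHERS = ["DES", "3DES"]
DIGESTS = ["MD5", "SHA1"]


def build_test_matrix(protocols, ciphers, digests):
    """Return every protocol/cipher/digest combination to test per peer."""
    cases = []
    for protocol, digest in product(protocols, digests):
        if protocol == "AH":
            # AH authenticates only, so there is no cipher to vary.
            cases.append({"protocol": protocol, "cipher": None, "digest": digest})
        else:
            for cipher in ciphers:
                cases.append({"protocol": protocol, "cipher": cipher, "digest": digest})
    return cases


if __name__ == "__main__":
    for case in build_test_matrix(PROTOCOLS, CIPHERS, DIGESTS):
        print(case)
```

Each case would then be run against every peer device in the test lab, in both initiator and responder roles.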
IKE components are also available from a variety of sources; one such source is ISAKMPD from OpenBSD. There are a large number of options associated with IKE, and while it is not necessary to support all functions and options, in general one will find that supporting a larger number of options will result in the ability to communicate with a larger number of other devices, some of which may be using older encryption mechanisms.
IPsec communication occurs over a ‘tunnel’; a tunnel is another name for the encapsulation of data, in other words, cryptographic modification of the IP packet contents, and replacement of key data fields within the IP header with non-encrypted extensions (so that they can be handled by a remote TCP/IP stack). As an example, the length of the packet data may change after encryption (usually, longer). The ‘new’ packet header must reflect that fact. Remember that an IP packet can contain many other types of data, each of which might include its own headers and ‘sub-data’ (if you will), and you can see that simply encrypting the packets themselves would make it practically impossible to recover the other data.
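To make the length change concrete, the sketch below estimates the size of a tunnel-mode ESP packet. It is an illustration under stated assumptions, not a reference implementation: it assumes an outer IPv4 header without options, a block cipher with an 8-byte block and 8-byte IV (as with DES or 3DES in CBC mode), and a 12-byte authentication value. The function name esp_tunnel_packet_length is hypothetical.

```python
def esp_tunnel_packet_length(inner_len, block_size=8, iv_len=8, icv_len=12):
    """Estimate the on-the-wire size of an ESP tunnel-mode packet.

    Assumed layout: outer IPv4 header (20 bytes, no options), ESP header
    (4-byte SPI plus 4-byte sequence number), IV, the encrypted inner
    packet with padding and a 2-byte trailer (pad length, next header),
    and finally the integrity check value (ICV).
    """
    outer_ip_header = 20
    esp_header = 8
    esp_trailer = 2

    # Pad so that inner packet + padding + trailer fills whole cipher blocks.
    unpadded = inner_len + esp_trailer
    pad_len = (-unpadded) % block_size
    encrypted_len = unpadded + pad_len

    return outer_ip_header + esp_header + iv_len + encrypted_len + icv_len


if __name__ == "__main__":
    for size in (100, 576, 1500):
        print(size, "->", esp_tunnel_packet_length(size))
```

Under these assumptions a 1500-byte inner packet grows by about 50 bytes, which is why the new header must carry the new length and why tunnel overhead matters when packets are already near the link MTU.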
Although it is common to use encrypted packets (Encapsulating Security Payload, or ESP), it is also possible to use a simpler mechanism known as the Authentication Header (AH). AH does not encrypt the packet data, and thus does not provide the confidentiality of ESP, but it is useful to prevent tampering, and is faster to apply. However, any decent product will include support for both mechanisms.
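The tamper protection that AH offers rests on a keyed digest computed over the packet. The sketch below shows the idea with an HMAC truncated to 96 bits. It is deliberately simplified: a real AH implementation also has to handle header fields that change in transit, and the function names compute_icv and verify_icv are hypothetical.

```python
import hashlib
import hmac


def compute_icv(key, packet_bytes, digest=hashlib.md5, icv_len=12):
    """Return a keyed digest of the packet, truncated to icv_len bytes."""
    return hmac.new(key, packet_bytes, digest).digest()[:icv_len]


def verify_icv(key, packet_bytes, received_icv, digest=hashlib.md5):
    """Recompute the value and compare it in constant time."""
    expected = compute_icv(key, packet_bytes, digest, len(received_icv))
    return hmac.compare_digest(expected, received_icv)


if __name__ == "__main__":
    key = b"0123456789abcdef"          # placeholder key for illustration
    packet = b"example packet payload"
    icv = compute_icv(key, packet)
    print("intact packet accepted:", verify_icv(key, packet, icv))
    print("altered packet accepted:", verify_icv(key, packet + b"!", icv))
```

Note that the packet contents remain readable throughout; only modification is detected.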
Each IPsec tunnel (whether directly between two hosts, or between two networks) has one or more Security Associations (SA) that specify details about how the encryption was applied: this includes the type of encryption and authentication used, the cipher and digest keys, the local and remote IP gateway addresses, and a unique 32-bit ID referred to as the Security Parameters Index (SPI). The SPI simplifies the process of handling encrypted data by pointing to the appropriate SA table entry for a given tunnel. Remember that there can be multiple tunnels in existence at any given time, and each might use a different mix of techniques. Without the SPI, it would be necessary to try each currently active mechanism to determine which one applies to a given packet; this would be computationally wasteful, and worse, the process would have to be applied to all packets, including those that would otherwise be trivially rejected because they are intended for other machines. It would also require the IP stack to have knowledge of the type of data in a packet, which would be cumbersome at best, and not easily extensible. Yet another thing to watch for is how the SPI is assigned. At least one major vendor (who claims to be certified) issues sequential SPI numbers; the standard calls for random numbers.
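The role of the SPI as a table index, and the difference between random and sequential assignment, can be sketched as follows. This is an illustration only: the class SATable and its fields are hypothetical, the table is keyed on the SPI alone for brevity (real implementations also take the destination address and protocol into account), and SPI values 0 through 255 are skipped because they are reserved.

```python
import secrets

SPI_MIN = 256            # values 0 through 255 are reserved
SPI_MAX = 0xFFFFFFFF     # the SPI is a 32-bit field


class SATable:
    """Maps an SPI to the Security Association that describes a tunnel."""

    def __init__(self):
        self._entries = {}

    def allocate_spi(self):
        """Pick an unused SPI at random rather than sequentially."""
        while True:
            spi = SPI_MIN + secrets.randbelow(SPI_MAX - SPI_MIN + 1)
            if spi not in self._entries:
                return spi

    def add(self, sa):
        """Store an SA (a dict in this sketch) and return its new SPI."""
        spi = self.allocate_spi()
        self._entries[spi] = sa
        return spi

    def lookup(self, spi):
        """Return the SA for an incoming packet's SPI, or None."""
        return self._entries.get(spi)


if __name__ == "__main__":
    table = SATable()
    spi = table.add({"protocol": "ESP", "cipher": "3DES", "digest": "SHA1"})
    print(hex(spi), table.lookup(spi))
    print("unknown SPI:", table.lookup(1))
```

A single dictionary lookup replaces trial decryption with every active SA, and a packet with an unknown SPI can be dropped immediately.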
IKE (Internet Key Exchange) is both a client and a server program, which listens and transmits (using User Datagram Protocol or UDP, port 500), in order to negotiate with other IKE systems to generate and agree upon Security Associations for the IPsec tunnels. As noted above, associations are an agreement between two systems as to the type of headers used (ESP or AH), cipher type (DES, 3DES, AES, etc.), and the digest method (MD4, MD5, SHA1, etc.). The SA indicates the method for each, so that the receiving system can apply the appropriate inverse handlers.
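The agreement step can be pictured as matching a list of offered combinations against what the local system supports. The sketch below is a much-simplified model of that idea and not the IKE wire protocol: the tuple layout and the function name select_proposal are hypothetical, and real negotiation involves several message exchanges, lifetimes, and other attributes.

```python
def select_proposal(offered, supported):
    """Return the first offered (protocol, cipher, digest) tuple that the
    responder fully supports, or None if there is no common ground.

    'offered' is assumed to be in the initiator's order of preference.
    A cipher of None means the proposal does not encrypt (as with AH).
    """
    for protocol, cipher, digest in offered:
        if protocol not in supported["protocols"]:
            continue
        if cipher is not None and cipher not in supported["ciphers"]:
            continue
        if digest not in supported["digests"]:
            continue
        return (protocol, cipher, digest)
    return None


if __name__ == "__main__":
    offered = [("ESP", "AES", "SHA1"), ("ESP", "3DES", "MD5"), ("AH", None, "MD5")]
    small_device = {
        "protocols": {"ESP", "AH"},
        "ciphers": {"DES", "3DES"},
        "digests": {"MD5", "SHA1"},
    }
    print(select_proposal(offered, small_device))
```

In this example the device lacks AES, so the second offer is chosen. A device supporting fewer mechanisms will find common ground with fewer peers.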
A common IKE option is the list of mechanisms available for shared secrets. These include Perfect Forward Secrecy (PFS) key sharing, using techniques such as Diffie-Hellman Group 1 (768-bit MODP) or Group 2 (1024-bit MODP). It is also very common to rely on manual key exchange using a pre-shared secret. The interested reader is encouraged to consult the web for the latest information on such mechanisms. A few systems are now beginning to support Diffie-Hellman Group 5 (1536-bit MODP), which is currently considered to be ‘super secure’. However, even without DH Group 5, a system basing its tunnels and IKE upon the above options will interoperate with virtually all current IPsec packages including those for Linux, BSD, and Microsoft 2000/XP, as well as some private implementations such as those used on Linksys VPN routers.
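The Diffie-Hellman exchange itself is a short piece of modular arithmetic. The sketch below uses toy parameters chosen for readability: the modulus 2**127 - 1 is a known prime but is far too small for real use and is not one of the MODP groups named above, and the function names are hypothetical.

```python
import secrets

# Toy parameters for illustration only. These are NOT the Group 1, 2,
# or 5 values, which use much larger, specifically chosen primes.
P = 2**127 - 1
G = 3


def make_keypair():
    """Return (private, public), where public = G**private mod P."""
    private = secrets.randbelow(P - 3) + 2   # range 2 .. P-2
    public = pow(G, private, P)
    return private, public


def shared_secret(own_private, peer_public):
    """Combine our private value with the peer's public value."""
    return pow(peer_public, own_private, P)


if __name__ == "__main__":
    a_private, a_public = make_keypair()
    b_private, b_public = make_keypair()
    # Only the public values cross the network.
    print(shared_secret(a_private, b_public) == shared_secret(b_private, a_public))
```

Both ends arrive at the same value without ever transmitting it. The cost of the modular exponentiation grows quickly with the size of the group, which is one reason small devices are tempted to offer only the smaller groups.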
Not surprisingly, IKE can require extensive modification in order to be fully effective with, and optimized for, a given system. In addition, most vendors support only a subset of the available cryptographic mechanisms, in order to reduce code size and overhead, or to limit interoperability with products from other vendors. However, as with tunnel components, IKE interoperability is a requirement for quality products, and strong testing programs including full regression tests are absolutely mandatory in a production environment.
Testing and Verification
Lead members of the IP Security Protocol (IPsec) working group at the IETF (Internet Engineering Task Force) have a free IKE/ISAKMP compatibility and interoperability test suite available. It is worth noting that the suite only tests IKE; it does not test the tunnels. For that, the best approach is rigorous interoperability testing with a variety of other devices. It is critically important that testing include several devices that use the FreeSwan ‘Pluto’ IPsec implementation, since it is the base for a large number of products. At least a few products experience difficulties when attempting to specify the Diffie-Hellman group used for a given SA; rigorous testing is the current best mechanism available to avoid problems in the field.
An important part of the test process is VPN throughput measurement. This will vary greatly depending on type of processor and the use (or not) of hardware encryption. In a router/gateway or VPN, usually the WAN port is the only port that uses encryption. Very low cost, low speed devices with software encryption might allow the WAN port to be used for DSL connections up to about 8 megabits/second. For speeds up to about 100 megabits/second, the smallest devices are usually not adequate unless they include a hardware accelerator.
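As a rough first step before measuring a complete tunnel, the raw speed of a single crypto primitive on the target processor can be timed in isolation. The sketch below times a software digest over packet-sized buffers and reports megabits per second. It is a micro-benchmark under stated assumptions, not a VPN throughput test: it ignores encryption, the network stack, and I/O, and the function name digest_throughput_mbps and its default sizes are hypothetical.

```python
import hashlib
import time


def digest_throughput_mbps(digest_name="sha1", packet_size=1400, packets=20000):
    """Hash 'packets' buffers of 'packet_size' bytes; return megabits/second."""
    payload = bytes(packet_size)
    start = time.perf_counter()
    for _ in range(packets):
        hashlib.new(digest_name, payload).digest()
    elapsed = time.perf_counter() - start
    total_bits = packet_size * packets * 8
    return total_bits / elapsed / 1_000_000


if __name__ == "__main__":
    for size in (64, 512, 1400):
        rate = digest_throughput_mbps(packet_size=size)
        print(f"{size:5d}-byte packets: {rate:.1f} Mbit/s")
```

Running the measurement at several packet sizes is worthwhile, since per-packet overhead tends to dominate with small packets.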
When testing throughput, be aware that some vendors play a very good specsmanship game. Their products appear to be tested with very contrived settings that do not reflect ‘real world’ operating conditions. Examine test claims carefully.
Slow devices with limited memory, unless carefully designed, may use a restricted set of ciphers and keys to reduce code size and avoid communication using computationally ‘expensive’ crypto methods. Unfortunately, such devices may not be as secure as devices which include support for more mechanisms and larger key sizes. When comparing devices, it is worth noting the mechanisms supported by a device, to assure that it will meet both the speed and security needs of the intended users.
Improving Performance, and the CryptoAPI
As noted earlier in this article, it is possible to implement some crypto functions (or portions thereof) directly in hardware, to improve performance. For example, DES and triple DES (3DES), including variations such as chaining, are quite easy to implement in hardware. A flexible crypto package will be able to take full advantage of such hardware when it exists, without rewriting portions of the system.
One way to do this is by using a plug-in interface which can interact with a hardware driver transparently. In other words, a function such as encrypt() can be created in such a way that it uses a software encryption routine (of the appropriate type) when hardware crypto is not available, but the same function calls a hardware engine if it exists. A programmer calling the encrypt() function is unaware of any differences – except that the system runs much faster.
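The plug-in idea can be sketched as a small registry that prefers a hardware implementation and falls back to software. This is an illustration of the dispatch pattern only: the class CryptoRegistry and its methods are hypothetical and do not represent any particular kernel or library interface, and the XOR routine is a placeholder standing in for a real cipher, not usable cryptography.

```python
class CryptoRegistry:
    """Dispatches encrypt() to hardware when present, else to software."""

    def __init__(self):
        self._software = {}
        self._hardware = {}

    def register_software(self, name, func):
        self._software[name] = func

    def register_hardware(self, name, func):
        self._hardware[name] = func

    def encrypt(self, name, key, data):
        # Prefer the hardware engine; callers never need to know which ran.
        impl = self._hardware.get(name) or self._software.get(name)
        if impl is None:
            raise KeyError(f"no implementation registered for {name!r}")
        return impl(key, data)


def xor_placeholder(key, data):
    """Placeholder 'cipher' for the sketch only; NOT real cryptography."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


if __name__ == "__main__":
    registry = CryptoRegistry()
    registry.register_software("demo", xor_placeholder)
    print(registry.encrypt("demo", b"k3y", b"hello"))   # software path

    # A driver would register its engine when the hardware is detected.
    registry.register_hardware("demo", xor_placeholder)
    print(registry.encrypt("demo", b"k3y", b"hello"))   # hardware path
```

The calling code is identical in both cases, so the same firmware image can run on boards with and without an accelerator.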
On Linux systems, portions of the crypto subsystem will be implemented as modules (such as ipsec_tunnel). It does not matter whether the module is statically or dynamically linked for purposes of this discussion. There are also userland applications, such as the isakmpd daemon, and libcrypto libraries that can be used by a variety of other programs.
Beginning with Linux kernel version 2.5, a CryptoAPI is defined that simplifies the use of cryptographic functions (at the time of this writing, Linux kernel version 2.5 is still under development, and has not been released as a stable version). Advantages of CryptoAPI include a smaller code base, reduced memory usage and binary image size, the ability to share common algorithms, and ways to transparently integrate hardware acceleration support without modifying existing kernel modules, libraries, or userland apps. A discussion of cipher and digest methods supported by CryptoAPI, the access mechanisms used by other kernel-space modules, as well as the available system calls that allow userland access to the API, can be found at various web sites, including kernel.org.
Summary and References
This article has discussed some of the issues facing designers of small, low-cost VPN routers, and the tradeoffs associated with those options. Any given system is a collection of individual choices that affect the final product, but with care, it is possible to build small, cost-effective VPN products.
A small IPsec tunnel component can be found from Tobias Ringstrom: http://ringstrom.mine.nu/ipsec_tunnel/
The IETF IKE test suite can be found at: http://isakmp-test.ssh.fi/cgi-bin/nph-isakmp-test
uClinux-related Web Sites:
Main uClinux site: www.uclinux.org
uClinux-related site: www.ucdot.org
Company Web Sites:
Arcturus Networks: www.arcturusnetworks.com
RedHat: www.redhat.com
SnapGear: www.snapgear.com
About the author
John Drabik is the CTO for Arcturus Networks Inc. He has been involved in system design and development for dedicated and embedded systems for over 25 years. John has authored numerous articles and papers on the embedded Linux environment, system design, business strategies, licensing issues, and product directions. He is the former VP and CTO for Digital Media at Lineo, and was the Chief Engineer and Architect of Lineo’s DTV and RG stacks. John is also the former Technical Chair of TV Linux Alliance.