2 Background

This chapter should give the reader an understanding of concepts used throughout this report. We first describe the foundation for our work, cryptography, and then describe the model we will be working in, namely Public Key Infrastructure (PKI). We proceed by discussing the Domain Name System, and how it relates to PKIs. We conclude by giving an overview of one major application of PKIs on the Internet, namely Secure Messaging or more specifically, secure email.

2.1 Cryptography

Cryptography is a enabler of secure communication. The word comes from the Greek words kryptos for hidden and graphein for writing. Cryptography is thus the science (or art) of ``secret writing''. The following is based on similar material from [89], [29] and [26].

When talking about cryptography, we refer to senders and receivers wishing to exchange messages or plaintext by exchanging ciphertext. It is assumed that an eavesdropper reading ciphertext should not be able to extract corresponding plaintext. This characteristic is called confidentiality. The process performed by a sender to hide plaintext is called encryption, the reverse operation is called decryption. These processes are often expressed as mathematic functions or computing algorithms. The encryption and decryption algorithms together constitute a cipher. Cipher algorithms intended for general use cannot be secret. So cannot the eavesdropper just invoke the decryption process to extract plaintext? Ciphers use keys to solve this problem. The key is used by the encryption process. A key is one element out of a large set of elements, the key-space. Figure 2.1 illustrates these concepts.

**Figure 2.1:** Some basic cryptographic concepts
$\includegraphics[scale=0.4]{cryptoconcepts}$

The usual mathematical description of ciphers uses E to the denote the encryption function and D to denote the decryption function. Plaintext is denoted by M and ciphertext by C. If the encryption or decryption functions are key dependent they are denoted by E_K and D_K respectively. Of course, E and D should have the property D_K(E_K(M)) = M. Ciphers where the encryption and decryption keys are equal are called symmetric ciphers. Such ciphers have the property that sender and receiver must have agreed on a certain key prior to the secure communication. Such keys are called shared secret keys. A good cipher should have the property that discovering the key, e.g., by inspecting ciphertext, should take an unreasonably large amount of time or be very expensive.

In sending and receiving messages, parties are often interested in three properties of the communication other than confidentiality. Integrity means that the sender and receiver should be able to verify that a message has not been modified in transit. As a consequence, this means that an intruder should not be able to substitue a false message for a legitimate one without being detected. Authentication means that the receiving party should be able to ascertain the origin of a message. Nonrepudiation means that the sender should not be able to falsely deny that she sent a message.

Using shared secret keys, it is possible to calculate Integrity check values from a message to achieve integrity. The integrity check value should depend on all bits of the plaintext. Should any bits of the message be changed between the sender and recipient, the recipient would calculate a different integrity check value. An adversary modifying a message might as well modify the check value too, but without knowledge of the secret key she cannot duplicate the correct integrity check value. If the receiver correctly verifies the integrity check value, she knows the message was generated by someone who knew the key. An important application of integrity check values is Message Authentication Codes [47] which use a symmetric block cipher (e.g., the Data Encryption Standard [91]). Integrity check values are also known as Message Integrity Check. Hash functions can also be used to provide an integrity check value, this mechanism is called a Hashed Message Authentication Code or a keyed hash function [25].

In [16] Diffie and Hellman introduced the concept of public key cryptography, independently invented by Merkle [68]. Such ciphers are also known as asymmetric ciphers. Unlike symmetric ciphers, a public key cipher uses two related keys - one for encryption and one for decryption. In addition to the requirements on symmetric keys, it should also be ``infeasible'' (be prohibitly expensive or take large amounts of time) to learn the decryption key given the encryption key and/or ciphertext. The encryption (public) key can be made available to anyone who wishes to securely communicate with an entity. There is no longer a need for prior key arrangement between sender and receiver.

One of the first suggested and still most commonly used public key ciphers is the RSA cipher [88]. The security of the RSA cipher depends on the difficulty of finding the prime factorization of large integers. To describe how RSA works, we begin by noting how keys are generated. Two large prime numbers p and q (of roughly equal length) are chosen randomly. In practice, the size of p, q is recommended to be on the order of 100 decimal (non-zero) digits, or larger. The encryption key e is now chosen, randomly, such that e and (p - 1)(q - 1) are relatively prime. The decryption key d is calculated as a inverse of e modulo (p - 1)(q - 1), in other words by solving ed $\equiv$ 1(mod (p-1)(q-1)) for d. Together e and n are the public key, and d is the private key. A plaintext m is encrypted to ciphertext c by simple modular exponentiation, c = m^e(mod n). Decryption is performed by m = c^d(mod n). This works since c^d = (m^e)^d = m^ed = m^{k(p-1)(q-1)+1} = mm^k(p-1)(q-1) = { by Fermat's small theorem } = m^.1 = m, all calculations modulo n.

**Figure 2.2:** Simple key transfer. For symmetric keys, the communication must be private, integrity protected and authenticated. For asymmetric keys, the communication must be integrity protected and authenticated
$\includegraphics[scale=0.5]{keyexchange}$

Public key ciphers place a large burden on users to somehow distribute these public keys. The straight forward method, by directly contacting the receiver and ask him for his key, is still possible. This is illustrated in figure 2.2. For symmetric keys this communication must be private (no-one can read the key), be integrity protected (no-one can modify the key) and authenticated (you know who you are talking to). For asymmetric keys, you only need integrity and authentication. The keys are, after all, public. In practice, this is a negligible advantage--current techniques to guarantee integrity and authentication also require keying material. This would create a chicken and egg problem. However, public key cryptography opens up for other forms of key distribution. Before venturing into this field we need more concepts to work with.

Written signatures are used as proof of authorship of a document, or at least proof of agreement with a document, such as a contract. Several equivalent electronic methods have been suggested. A digital signature is a piece of data which accompanies a message. It is used to ascertain originator of message and the integrity of the message. Figure 2.3 illustrate how digital signatures are intended to operate.

**Figure 2.3:** Digital Signature
$\includegraphics[scale=0.5]{signature}$

Some (but not all) public-key ciphers are able to operate as digital signature algorithms. RSA is able to operate in an authentication mode to provide digital signatures. The authentication mode is simply to use the private key for encryption and the public key for decryption, the inverse of regular use. This way, only the originator is able to encrypt (sign) the message, and everyone that knows the public key are able to decrypt (verify). To boost efficiency, digital signatures are rarely calculated on the entire input document but rather on a one-way hash value calculated from the document. Similar to ciphers, one-way hash functions are often expressed as mathematical functions or computing algorithms. One-way hash functions are one-way in the sense that they should be fast to compute but difficult to invert. In mathematical terms, calculating hash value h as h = H(m) for a hash function H on a message m should be fast. But given h it should be infeasible to compute any member of the set {m| m = H^-1(h)}. h is uniformly distributed. Hash functions used cryptographically, for digital signatures, should have two properties:

Hash functions usually produce output of a fixed length, commonly a few hundred bits. This is in contrast to documents, that may have arbitrary size. For RSA, the hash value is encrypted using the private key, which generates the signature. On the receiving end, the verification procedure is performed. The verification procedure consists of computing the hash value of the input document and comparing it with a decryption of the signature, using the public key of the sender. Again, not all public-key ciphers are able to operate in this way to achieve signing, but we are satisfied in noting that other algorithms are able to achieve the same goals using other methods.

In the previous discussion, we have avoided to define what we mean by ``computationally infeasible'' or ``hard'', this is the topic of many text books on complexity theory and we refer to them (for example [37]).

Now the reader should have an understanding of techniques used to solve the problem that motivated this digresssion; namely how to distribute public keys in a secure fashion.

**Figure 2.4:** A digital certificate: digitally signed information containing a public key and some information related to that key
$\includegraphics[scale=0.5]{certificate}$

Recall our discussion about direct key transfer between two entities from the beginning of this section, also illustrated in the figure 2.2. Unless this communication is integrity protected and authenticated (and encrypted in the case of symmetric keys), an adversary might be able to intercept and replace the key. Then Bob will believe the key he received belongs to Alice, and might use it to encrypt sensitive information intended for Alice's eyes only. This information might be decrypted and read by the adversary since he replaced Alice's key for his own. It might also be re-encrypted (with Alice's public key) and passed on to Alice, to reduce the risk of getting caught. This form of attack is known as a man in the middle attack. We will see that the use of public-key certificates is a practical method to solve this problem.

In [58] Kohnfelder introduced the notion of public-key certificates. A public-key certificate is a public key, digitally signed by a trustworthy entity. A certificate usually contains information about the owner and the signer, such as names or email addresses. Figure 2.4 illustrates this.

By having public keys digitally signed by a mutually trusted third party, all three problems with distributing keys are solved: Protecting the key, maintaining data integrity, and authenticating data. Since public-key technology is used, there is no need to protect (encrypt) the key. The certificate is digitally signed, and thus provides data integrity. By signing the certificate, it is later possible to authenticate the data contained in the certificate (i.e., the public key). To authentiate a certificate, the knowledge of the public key of the trusted party is required. Since this mutually trusted third part can issues many certificates, an entity is able to strongly trust the validity of many public keys by knowing and trusting one public key.

By using the concept of certificates and a trusted third party, Bob can get Alice's public key without requiring an integrity protected and authenticated channel between Bob and Alice. Alice sends her certificate, issued by the trusted third party, to Bob. Bob knows the trusted third party's public key and is able to verify the signature of the certificate. He is then able to trust that the public key he received is correct and actually belongs to Alice. This is illustrated in figure 2.5. Of course, the man in the middle attack can now target the communication between the trusted third party and either Alice or Bob. Since this communication only takes place when certificates are issued or renewed, special care could be taken to protect that communication.

**Figure 2.5:** Secure key transfer. The key is actually a signed public key, a Certificate
$\includegraphics[scale=0.5]{cakeyexchange}$

Using certificates also opens up for the possibility of removing the key transfer between Alice and Bob completely! This is an important feature when secure communication is being implemented between a large number of participants. This is because none of Bob's ability to trust the public key he receives now depends on whether he is actually communicating with Alice or not. His ability to trust the key depends solely upon his trust relationship with the trusted third party. This means that Bob might actually be talking to a database that stores certificates for many users, much like a phone book. Bob can use his trusted third party's public key to determine if he should trust a certificate or not.

Since the introduction of certificates, several standards have been suggested to implement this idea. The preveiling format is X.509 Certificates, as standardized by the International Standards Institute [11] in the late 1980's. The format has evolved, and has been profiled for use on the Internet by the Internet Engineering Task Force (IETF). This profile is known as the Internet X.509 PKI (PKIX) profile. Still, to benefit from using certificates in applications, the issue of determining how the ``phone book'' should operate must be solved. This problem, and others, are solved under the framework of a Public Key Infrastructure. This thesis studies how the Domain Name System, a protocol already widely used on the Internet, can be used within a public key infrastructure to locate and retrieve certificates.

2.2 Internet and the Domain Name System

The Internet consists of loosely interconnected networks of computers located around the world. Computers communicate with each other by exchanging packets according to various protocols. Computers wishing to participate on the Internet need to follow the protocols used by other members of the Internet. The lowest level common protocol used on the Internet is named ``Internet Protocol'', often refered to as IP [82]. The addressing mechanism used by IP is similar to phone numbers. All entities that communicate on the Internet must have an IP address. An IP address may look like 195.42.214.244. Protocols are often layered on top of each other, to provide more specialized functions. For example, IP does not guarantee delivery of packets between entities. For those applications that require guaranteed packet delivery another protocol exists, layered on top of IP, that provides these functions. This protocol is known as Transmission Control Protocol (TCP) [83]. So how can TCP provide guaranteed packet delivery when TCP only uses IP, which does not provide guaranteed delivery? TCP accomplishes this by enumerating and acknowledging each packet. Using retransmission-timers TCP detects when acknowledgements are lost. If a packet (or acknowledgement) is lost, the packet will be re-transmitted until an acknowledgment is received. Many popular application protocols are layered on top of TCP. Examples include the Hypertext Transfer Protocol (HTTP) [28] for Web Browsing, the Simple Mail Transfer Protocol (SMTP) [13] and the Internet Mail Access Protocol (IMAP) [12] for Electronic Mail.

Using IP addresses to locate resources on the Internet has several problems. One of the most important problems is that IP addresses are hard to remember for humans. There is no obvious connection between real-world names of companies or persons that can be used to find the IP address. Returning to our phone number analogy, we observe that phone books often are used to collect phone numbers ordered by other information. They can be ordered by company name, personal names etc. If the same idea was used on the Internet, we would attach ordinary names to machines. It also must be possible to somehow convert this name into the actual IP address; this is the job for our Internet ``phone book''.

**Figure 2.6:** Brief example of the DNS hierarchy
$\includegraphics[scale=0.4]{dnsoverview}$

This problem was solved in the middle of the 1980's, and the solution is called the ``Domain Name System'' or DNS. The DNS organizes names of machines in a hierarchy. Universities may organize the names of their machines within a local hierarchy such as departments, and the university itself may be located within the ``educational hierarchy''. Top-level hierarchies of the Domain Name System include ``Educational'', ``Organizations'', ``Companies'', ``Millitary'' and a hierarchy for each country around the world, such as ``Sweden''. The DNS uses shorter forms for brevity, ``edu'' for educational, ``mil'' for millitary, ``se'' for Sweden etc. Figure 2.6 illustrates a few members of the DNS hierarchy.

Because names are easier to remember than IP addresses, the Domain Name System hierarchy and the names stored in it are often used by application protocols--such as web browsing and electronic mail.

This last observation is important, and combined with the flexibility of DNS, is crucial to our work. Since the domain name hierarchy is used by many modern electronic applications, there are several advantages to being able to store related information about a domain name (such as a public key in the form of a certificate) in DNS. To see this, consider a separate infrastructure for locating certificates. This system must provide a global infrastructure, similar to how DNS works, so that it is possible to look up information from everywhere without any special knowledge of which server to query etc. Such attempts exist, and they often use their own form of addressing mechanisms. The prominent example is the ISO X.500 initiative. Now, if we wish to use another directory system with domain names (that is the usual address mechanism of the Internet) we depend on the possibility and success of locating and using domain names as an addressing mechanism within the other directory system. We will see that this is a complicated issue, and that no satisfactory solution exists to this date.

On the other hand, the Domain Name System provides the flexibility to allow us to store any data attached to a domain name. For example, it can attach ``certificate'' data to a domain name in the ``phone book''. If this is the case, we do not need to involve another directory to locate and retrieve certificates. In this report we will argue that storing application keying material, or certificates, in the Domain Name System is a promising idea.

2.3 Public Key Infrastructure

Public Key Infrastructures (PKI) consists of services required to make use of public-key based technologies on a large scale. The first service of a PKI that comes to mind is locating and retrieving certificates, but many other aspects constitute a larger part of actual PKIs. Non-technical aspects such as legal considerations are also a part of what makes up a PKI. We build on the concepts introduced in the last chapter, and continue by introducing the entities of a PKI. They are illustrated in figure 2.7.

**Figure 2.7:** Players of a PKI
$\includegraphics[scale=0.5]{pkiplayers}$

Some of the more important operations performed by the PKI entities include the following:

2.4 Domain Name System

As mentioned, the DNS ``phone book'' looks like a hierarchical system. This is also reflected in how the actual database which holds the information is implemented. Instead of having a big database containing answers to all queries, DNS works by delegating the responsibility, or ``authority'', for each hierarchical component (i.e., a subtree of the DNS hierarchical tree, also called a ``zone'') to the servers that should be responsible for that component. These servers can further sub-delegate authority, resulting in a distributed database based on a tree topology.

To see how this works, consider looking up some piece of data attached to a DNS domain name. In our example we will look up the IP address attached to the domain name www.nada.kth.se. We begin with a somewhat simplified description. A client that wishes to look up a name must know the address to the ``root'' servers. The ``root'' servers are the servers located at the top of the DNS tree. The client sends a query for www.nada.kth.se to the root server, and usually the root server only knows who is responsible for the next sub-component in the query. In our example, this means the root server is only able to tell the client the addresses of the servers responsible for se. The client will now forward the same query to the se servers, and in our example these servers do not know the answer to the query either, but have delegated the authority over the kth.se zone to certain servers and it informs the client of their addresses. The client repeats this procedure, asking kth.se, and this time it receives addresses to the servers responsible for nada.kth.se. As it happens, this server will know the correct answer and construct a response and sends it to the client. The client has now received the IP address of www.nada.kth.se.

In our example, we have made some simplifications. Specifically, if the DNS system worked as we just described it is not difficult to see that the root servers would receive an enormous amount of traffic. In practice this problem is solved in two ways. First, the ``clients'' in our example will cache all answers it receives. This means that once it has received the addresses for ``se'' from the root server it is able to query these servers directly for addresses within the ``se'' zone in the future. Secondly, the ``clients'' are usually not applications or even individual workstations but rather a server located close to the workstation which usually serves many workstations. The server aggregates the DNS needs of many workstations in a smaller environment, such as a department. This means the local server holds a cache for several workstations at once. Since usage patterns on workstations often are quite similar this reduces traffic. Also, since the server is not re-started or turned off as frequently as workstations often are, the cache will be more effective.

We will see in more detail what the actual protocol that implements this looks like in section 4.4.2.

2.5 Electronic Messaging

Electronic Messaging is a conceptually well-known and well-understood application. The diversity of existing electronic messaging clients is, not surprisingly, very large. Many different classes of messaging exists, with many competeting technologies in each class.

The classic example is Electronic Mail (or email). While there exist several implementations such as X.400 and MEMO, the Internet standard for text messages [13] and its usual transport mechanism [84] is the dominating implementation. Other classes of messaging include Instant Messaging. Instant Messaging is a real-time short message service between two entities, well-known applications include the Short Message Service (SMS) on GSM Digital Mobile Phones, ICQ [46] and America On Line's AIM [2] on desktop computers. A variation of Instant Messaging is Public Chat Groups, which is a real-time short message service between several, often near-anonymous entities. The largest implementation of Public Chat Groups is IRC [48]. Another class is Public Discussion Forums, with the prominent example of Usenet [40], but still other implementations exist such as Fidonet [27] and KOM [80].

The previous discussion of Electronic Messaging technologies is far from exhaustive. The intention is to give the reader a sense of the diversity that exists in this field.

2.5.1 Secure Electronic Messaging

None of the previously mentioned messaging technologies has been designed with strong security in mind. Other features are often thought to be of more importance in the design phase. In order to accommodate professional use of messaging technology, security extensions are critical. Several security extensions for Internet Mail have been proposed, such as PEM, MSP [17], (Open-)PGP, Security Multiparts for MIME, MOSS [14], PGP/MIME and S/MIME. Before we go on and discuss some of these security extensions, we need to establish a vocabulary. It is used throughout this report.

The terminology used when talking about messaging is due to the OSI X.400 Message Handling System Model [69], the following description is due to [29, page 154] and is illustrated in figure 2.8. Messages originate from and are ultimately received by Users, which may be people or mail-enabled application programs. A message has one Originator and one or more Recipients. A user is supported by software called a User Agent, which performs such tasks as preparing and submitting messages for its user, and receiving and preprocessing received messages for its user. A User Agent may be a stand alone software application (sometimes called a Mailer), or it may be integrated into another application such as a Web Browser. The message transfer backbone comprises systems called Message Transfer Agents (MTAs). A message is submitted at an originating MTA, which delivers it to a recipient user agent. MTAs may be store-and-forward message switches of a given messaging technology, or they may be mail gateways between different technologies.

**Figure 2.8:** Message Handling System Model
$\includegraphics[scale=0.4]{mailmodel}$

2.5.2 Multipurpose Internet Mail Extension

Several Secure Electronic Mail proposals, and most of the successful ones, are related to the Multipurpose Internet Mail Extensions (MIME). MIME is described in the five-part standard suite [31] [32] [72] [33] and [30]. MIME is a framework that extends the one-dimensional Internet Mail Message Format known as RFC 822 [13]. This traditional format has several drawbacks:

These concerns were the driving force behind development of MIME. MIME has been a success. Many standards, some even unrelated to mail, uses parts of the MIME standard. Examples include the Hypertext Transfer Protocol (HTTP) [7] and Internet Mail Access Protocol (IMAP) [12].

2.5.3 Privacy Enhanced Mail

The following overview of PEM is based on similar material from [29], [9], [26].

Privacy Enhanced Mail (PEM) was the first serious effort to secure Internet mail. The Internet Resources Task Force (IRTF) Privacy and Security Research Group (PSRG) did the initial design. The Internet Engineering Task Force (IETF) PEM Working Group continued development for three years, resulting in a four-part Proposed Internet Standard published in early 1993 [64] [56] [5] [55]. PEM is a broad standard suite, it provides encryption, authentication, message integrity and key management. PEM supports both symmetric and asymmetric (public-key) key management schemes. PEM uses DES for encryption, MD2 or MD5 for authentication and X.509v1 with RSA for public-key management. The standard also allows for different suites of algorithms to be defined later. PEM is designed to be taken into use selectively, by site or by user, without affecting other parts of the network.

**Figure 2.9:** The PEM Public Key Infrastructure
$\includegraphics[scale=0.5]{pempki}$

Even though PEM is a landmark protocol in the development of secure messaging, and is also generally considered to be of sound technical design [29], it did not catch on. This was mainly due to two reasons. First, the message syntax that PEM describes was incompatible with the widely successful MIME message syntax that emerged at the same time [29, p. 156]. Secondly, the public-key management described by PEM restricted the Certificate structure [9, p. 51]. Namely, it required a top-down Certificate Authority (CA) approach. An entity, the Internet Policy Registration Authority (IPRA), establishes global certification policies by certifying Policy Certification Authorities (PCAs). Each PCA in turn certifies CAs that will follow the certificate policy of that PCA. This hierarchy is illustrated in figure 2.9. A strict hierarchical approach works well in strict hierarchical organizations, but this is a feature that the Internet lacks. Two other complications with the public-key management approach also turned out to cause problems. First, PEM required the IPRA to maintain a database of unique Distinguished Name (DN), that all PCAs were supposed to query before certifying Certificate Authorities. Secondly, the X.509 version 1 certificate format does not contain fields for certificate policies, forcing all applications to be aware of PCA policies by other means, which PEM did not provide for.

2.5.4 Pretty Good Privacy

Pretty Good Privacy (PGP) was developed during the same period as PEM, in the early 1990's. PGP was originally designed for securing Internet mail. PGP shares most technical features, such as digital signatures and public-key based encryption, with PEM. Like PEM it uses a proprietary, non-MIME-compatible, message format [3]. However, later MIME-compatible variations have evolved [22]. PGP's main difference from other proposals is its key management system. It does not use X.509 Certificates, but rather a proprietary syntax. Also, it uses a non-hierarchical certification model known as ``web of trust''. We will not study PGP further, a good reference is [98], and an account of PGP History can be found in [4].

2.5.5 Security Multiparts for MIME

Security Multiparts for MIME [35] is a simple framework for adding security enhancement to Internet Email by using MIME. Basicly, it describes how you combine a text message (or other data) with cryptographic information. It does not describe the cryptographic operations themselves, that is left for other specifications. It has gained wide popularity. Security Multiparts for MIME is used by at least three protocols; MOSS [14], PGP/MIME [22] and S/MIME, of which the latter two have been successfully deployed and are widely used today.

2.5.6 Secure MIME

Secure MIME (S/MIME) [86] [85] combines the previously mentioned Security Multiparts for MIME framework with the Cryptographic Message Syntax (CMS) [44] standard. CMS is derived from the Public-Key Cryptographic Standards 7 (PKCS#7) [60]. The differences between PKCS #7 and CMS are minor. Added features to CMS include support for key agreement techniques. As stated in the previous section, the Security Multipart framework does not define cryptographic operations or specific cryptographic message syntax. S/MIME supports both signed and encrypted messages. We now turn to a brief overview of the CMS standard because an S/MIME messages essentially is a CMS messages.

CMS defines a syntax for data that has cryptographic operations applied to it. Cryptographic operations include digital signatures and encryption. The syntax is described using OSI Abstract Syntax Notation One (ASN.1) [49], a language often used to describe data structures. The details of how ASN.1 works are not essential here, a good introduction to ASN.1 can be found in [54]. Returning to CMS, it describes how plain text (a file or network stream) is wrapped into data, after signing or encryption operations have been performed. The data structure contains information used by the receiver to understand what treatment the data has been subjected to. This enables the receiver to restore the original data, and to properly verify or decrypt the content. The CMS format is compatible with the message format used in PEM, in the sense that CMS messages can be converted from and to PEM messages without any cryptographic operations. CMS does not require a certain key management procedure or a special security infrastructure. Further, CMS can be used either inside a public key infrastructure, or in an infrastructure using shared symmetric keys.

2.3 Public Key Infrastructure

2.5 Electronic Messaging

Footnotes