Two wins for Internet Privacy on the same day

Today, the 17th of May 2016, the RFC Editor published RFC 7844, an anonymity profile for DHCPv4 and DHCPv6 clients, and RFC 7858, a specification for sending DNS requests over TLS. These two RFCs can close two important avenues for leaking metadata over the Internet.

I started working on what became RFC 7844 in November 2014. The work on MAC Address Randomization was progressing swiftly, and the corresponding feature would ship in Windows 10 the next year. MAC Address Randomization ensures that your laptop or your smartphone cannot be tracked by its Wi-Fi MAC Address, but we quickly observed that doing that was not sufficient. When your computer joins a network, it executes a series of low-level protocols to “get configured.” One of these protocols is DHCP, which is used to obtain an Internet Protocol address. The problem is that DHCP is very chatty, and by default provides all kinds of information: your computer’s name, software version, hardware model, etc. I worked with the members of the DHCP working group in the IETF to remedy that, and their response was great. They produced thorough analyses of privacy issues in DHCPv4 and in DHCPv6, which have just been published as RFC 7819 and RFC 7824. RFC 7844 patches all these issues. And the best part is that an implementation already shipped in the November 2015 update of Windows 10.
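To make the “chattiness” concrete, here is a sketch of the option list that an anonymity-friendly DHCPv4 DISCOVER could carry, in the spirit of RFC 7844: keep the message type and a client identifier derived from the (randomized) MAC address, and drop identifying options such as Host Name or Vendor Class. This is an illustration, not the Windows implementation; the address bytes are made up.

#include <stdint.h>

/* Sketch: DHCPv4 DISCOVER options under an RFC 7844 style profile. */
static const uint8_t anonymous_options[] = {
    53, 1, 1,                 /* option 53: DHCP Message Type = DISCOVER      */
    61, 7, 0x01,              /* option 61: Client Identifier, htype 1, then  */
    0x02, 0x5c, 0x11, 0x22, 0x33, 0x44, /* the randomized MAC, not a stable ID */
    55, 3, 1, 3, 6,           /* option 55: ask for subnet mask, router, DNS  */
    255                       /* end. No Host Name (12), Vendor Class (60),   */
                              /* or Client FQDN (81) that would leak identity */
};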

The work on DNS Privacy is just as important. By default, your computer issues “name resolution” requests in clear text for each web site and each Internet service that it uses. This stream of requests is a rich set of metadata, rich enough to allow for identification of the computer’s user, and then for tracking of their activities. It is there for the taking, by shady hot spot providers, spies, or maybe criminals. RFC 7858 defines how to send these requests in a secure fashion to a trusted DNS server, effectively closing that source of metadata leakage.
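The mechanics are simple: the client opens a TLS connection to port 853 of the chosen resolver, then sends ordinary DNS messages prefixed with a two-byte length, just as in DNS over TCP. Here is a minimal sketch using OpenSSL; the resolver address is a placeholder and error handling is omitted.

#include <openssl/ssl.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

int main(void)
{
    /* Plain TCP connection to the resolver, on port 853 instead of 53. */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sa = { 0 };
    sa.sin_family = AF_INET;
    sa.sin_port = htons(853);                       /* DNS over TLS port    */
    inet_pton(AF_INET, "192.0.2.1", &sa.sin_addr);  /* placeholder resolver */
    connect(fd, (struct sockaddr *)&sa, sizeof sa);

    /* TLS handshake on top of the TCP connection. */
    SSL_CTX *ctx = SSL_CTX_new(TLS_client_method());
    SSL *ssl = SSL_new(ctx);
    SSL_set_fd(ssl, fd);
    SSL_connect(ssl);

    /* A DNS query for "example.com", type A, preceded by the two-byte
       length prefix that DNS uses over TCP and over TLS. */
    unsigned char query[] = {
        0x00, 0x1d,                     /* length: 29 bytes follow          */
        0x12, 0x34, 0x01, 0x00,         /* ID, flags: recursion desired     */
        0x00, 0x01, 0x00, 0x00,         /* 1 question, 0 answers            */
        0x00, 0x00, 0x00, 0x00,         /* 0 authority, 0 additional        */
        7, 'e','x','a','m','p','l','e', 3, 'c','o','m', 0,
        0x00, 0x01, 0x00, 0x01          /* QTYPE = A, QCLASS = IN           */
    };
    SSL_write(ssl, query, sizeof query);

    unsigned char answer[1500];
    int n = SSL_read(ssl, answer, sizeof answer);
    /* ... parse the length-prefixed DNS response in answer[0..n-1] ... */

    SSL_free(ssl);
    SSL_CTX_free(ctx);
    close(fd);
    return 0;
}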

Of course, there is more work to do. But the 17th of May 2016 was a great day for Internet Privacy.


MAC Address Randomization in Windows 10

As you may know, I care a lot about Internet privacy. The main tool for privacy is encryption, hiding your communications from potential spies. But that’s not enough. We also need to deal with web privacy, the tracking of your web browsing by advertisers using cookies. And we need to minimize the “metadata” that your devices disclose when you are connecting to the Internet. I work in the Windows networking team at Microsoft, and I had a chance to work specifically on this metadata problem, helping the team implement “MAC Address Randomization.”

You may have heard of systems that track your movements by recording the Wi-Fi hardware address of your phone or your laptop – the address also known as the “MAC Address.” Windows 10 includes protection against that. It implements a form of MAC Address Randomization. The randomization function can be controlled through the Settings application. You will first have to go to the Wi-Fi page in the Settings UI, and if you scroll to the bottom of the page you will see two options.

If you click the “Manage Wi-Fi Settings” link, you will get to a page that controls all the “global” options for the Wi-Fi interface. MAC Address Randomization is one of these options. If your hardware is capable of supporting MAC Address Randomization, the page will show the corresponding toggle.

The feature is supported on recent hardware. If your hardware does not support randomization, the UI will simply not present the option. If randomization is supported, flip the toggle to turn the feature ON or OFF.

On the phone, the UI is slightly different. You will need to click the “Manage” button at the bottom of the Wi-Fi page to get to the manage settings page, but the logic is the same.

If the option is turned ON, your phone or laptop will start using random MAC addresses when “roaming” between connections. Normally, when your device is not connected, it will wake up every minute or two and try to find out whether there is a Wi-Fi network available in the vicinity. For that, it will send “probes.” In the absence of randomization, these probes are sent from a constant or “permanent” MAC address that uniquely identifies your device. Trackers listen to these probes, for example in some shopping malls, department stores, or other public areas. When randomization is turned on, the system sends these probes from a random MAC address. This defeats a large part of the “Wi-Fi” tracking currently going on.

Eventually, you will want to connect to a Wi-Fi network. At this point, the system acts differently depending on whether this is the first connection to a new network, or you already connected to that same network in the past. If this is a new network and randomization is ON, the system will pick a random MAC address for that network, and use it for the new connection. If randomization is OFF, the system uses the permanent MAC address embedded in the hardware. For repeated connections, by default, the system will keep using the same MAC address that it used on the first connection: random if randomization was ON for the first connection, permanent if randomization was OFF.
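The exact algorithm used by Windows is beside the point here, but a per-network random address can be derived along the following lines: hash the network name together with a secret seed kept on the device, so that the same network always sees the same address while different networks see unrelated ones. A minimal sketch, using OpenSSL’s SHA-256:

#include <openssl/sha.h>
#include <string.h>

void random_mac_for_network(const char *ssid,
                            const unsigned char secret[32],
                            unsigned char mac[6])
{
    unsigned char digest[SHA256_DIGEST_LENGTH];
    SHA256_CTX sha;

    SHA256_Init(&sha);
    SHA256_Update(&sha, ssid, strlen(ssid)); /* same SSID, same address */
    SHA256_Update(&sha, secret, 32);         /* per-device secret seed  */
    SHA256_Final(digest, &sha);

    memcpy(mac, digest, 6);
    mac[0] = (mac[0] | 0x02) & 0xFE;  /* locally administered, unicast */
}

A nice property of this kind of construction is that the system does not need to store an address per network: the address can always be recomputed from the SSID and the secret.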

There are networks for which you care more about usability and management, and you should keep using the permanent MAC address there. A classic example is the corporate network, where IT managers want to precisely track who is connecting. Another example is a home network where an overactive owner has turned on the “MAC Address Filtering” feature. You should turn MAC Address Randomization OFF before connecting to such networks. You can turn randomization back ON after the first connection is complete; the system will remember to use the permanent MAC address for these networks.

Even when randomization is ON, the system will by default use the same random MAC address each time you return to the same Wi-Fi network. This design attempts to balance privacy and usability. Suppose for example that you are travelling and that you connect to the hotel’s Wi-Fi. On your first connection, you will probably have to fill in a web form to get authorized. The hotel’s Wi-Fi will then remember the MAC address of your device, and let it connect. If the system kept changing the MAC address of your Wi-Fi interface, you would have to fill in that form again each time you reconnect, and that would be fairly annoying. But if you keep using the same random MAC address for each connection, the network will recognize you, and you will not have to fill in a form or, in the worst case, pay again for the connection.

Of course, if you go to a different Wi-Fi network, the system will pick a new random MAC address for each of these networks. Each network knows that the same person is connecting again and again, but different networks cannot tell that the same person is moving from the hotel to the airport or to a café. We believe that most people will be happy with that compromise, but if you are not, you can use the UI to change the setting. Suppose for example that you come every day to the same café, and that you don’t like the idea that once the system picks a random MAC address for the café, observers could track it. Again, you will go to the Wi-Fi UI and look at the options at the bottom of the page. Instead of selecting “manage Wi-Fi settings,” you will select “advanced options,” where you will find the “per network” randomization option.

Opening the drop-down box reveals three possible settings: On, Off, and “Change daily.”

On the phone, the UI is slightly different. In the “Manage Wi-Fi” UI, you will need to click on a network name to “edit” the network properties. At the bottom of the properties page, you will see the same control as described above, with the same choice between On, Off, and “Change daily.”

The “On” setting is the default when randomization is turned on: the system picks a random MAC address and keeps using it for this network. The “Off” setting forces the system to use the permanent MAC address. The “Change daily” setting instructs the system to pick a different MAC address every day. This is the setting that you want to use if you are concerned about your privacy when you regularly visit the same place. Of course, if you choose the “change daily” option, you may have to fill in a new web form every day when you connect. But that’s your choice!
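Following the same logic as the sketch above, “change daily” can be obtained by folding the current date into the hash; again, this is an illustration, not the actual Windows code.

#include <openssl/sha.h>
#include <string.h>
#include <time.h>

void daily_mac_for_network(const char *ssid,
                           const unsigned char secret[32],
                           unsigned char mac[6])
{
    char day[9];                            /* "YYYYMMDD" */
    time_t now = time(NULL);
    unsigned char digest[SHA256_DIGEST_LENGTH];
    SHA256_CTX sha;

    strftime(day, sizeof day, "%Y%m%d", localtime(&now));

    SHA256_Init(&sha);
    SHA256_Update(&sha, ssid, strlen(ssid));
    SHA256_Update(&sha, day, 8);            /* changes the result every day */
    SHA256_Update(&sha, secret, 32);
    SHA256_Final(digest, &sha);

    memcpy(mac, digest, 6);
    mac[0] = (mac[0] | 0x02) & 0xFE;        /* locally administered, unicast */
}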


The Quest for Internet Privacy

Two years have passed since the Snowden revelations, and almost two years since the IETF meeting in Vancouver. There was a palpable sense of urgency in the meeting, with more than a few hints of anger, as you can see watching the recording. But even with that kind of pressure, there were always some doubts that standard groups like the IETF could achieve significant results in a short time. Well, if we allow that two years is short enough for the IETF, we have in fact seen results.

I don’t know if RFC 7624 will ever become a best seller, but it does present a comprehensive review of the threats to Internet privacy posed by global surveillance. It took two years and the merging of several drafts, in typical IETF fashion, but the message is clear. The analysis has informed two different types of actions: deploying encryption, and reducing the amount of metadata leaked by various protocols.

Previous standards like HTTPS were already there, and the industry started deploying encryption without waiting. We could see the entries in the EFF’s Encrypt the Web Report progressively turn green as big companies encrypted their web pages and services, their data storage, or email transmission. But new standards make the encryption more efficient and easier to deploy. In particular, the combination of HTTP/2.0 and TLS/1.3 will make encrypted transmission faster than legacy clear-text HTTP/1.0.

My personal efforts have been centered on the other part of the threat, traffic analysis and metadata collection, and I am glad that lots of my initial analyses found their way into RFC 7624. The connection of your computer or smartphone to the Internet relies on protocols like Ethernet, Wi-Fi, DHCP or the Domain Name System that were designed in the 80’s or 90’s. The priority then was to make networks as reliable and as easy to manage as possible. As a result, the protocols carry identifiers like MAC addresses, unique device identifiers, or unique names. As you move through the Internet and connect to a variety of hot spots, this metadata reveals your identity, and allows for location tracking and for correlating IP addresses with user names. Traffic analysis gets much easier if the user of the IP address is known!

We are making progress. MAC Address Randomization, DHCP Anonymity and DNS Privacy are still works in progress, but the standards and early implementations are getting ready. That’s the good news. The bad news is that even when we are done, there will still be two worrisome kinds of tracking: advertisement on the web, and cell phone locations. Ad blockers may impede web tracking, but for cell phones the only plausible solution so far is “airplane mode.” A little more work is needed there!


Hiding a Wi-Fi network is worse than Security Theater

Last month, I spent a lot of time looking at Wi-Fi protocols, and in particular at the privacy implications of Wi-Fi on mobile devices. The main privacy issue with Wi-Fi is the use of “worldwide unique” MAC addresses, which enable really efficient tracking of devices and their owners. The industry is starting to address this. But a close second is the practice of “hiding the SSID,” in a misguided attempt at increasing a network’s security. The idea is to hide the name of your Wi-Fi network from people in your neighborhood. The effect is to have your phone broadcast the name of the network every few minutes, negating any privacy gain from techniques like MAC address randomization.

When you set up a Wi-Fi network, you are supposed to use the management interface of your router and assign a name to the network. (If you don’t do that, you get a default name like “linksys” or “D-Link,” which is not a very good idea.) For example, I gave my network the name “9645NE32.” In the standard Wi-Fi setup, the wireless access points broadcast their availability by announcing their name, their SSID in Wi-Fi standard jargon. These broadcasts are captured by your device, and presented in the menu of available networks. When you want to connect to a network, you pick a name in the menu and you get connected. In many cases, the device will remember the networks that you connect to, and reconnect automatically when the network is in range. Life is good.

In the early days of Wi-Fi, some people were very concerned that outsiders would try to connect to their network. They looked for a way to “hide” the network, so the name would not appear by default in the connection menus of phones or laptops. Access point manufacturers obliged, and provided a setting to “not broadcast the SSID.” In order to connect, users cannot just click; they have to manually enter the name of the network on their device. In short, the name acts as some kind of password. If you don’t know it, you cannot enter the network. It seemed like a good idea, an extra layer of security. The problem is that it is at best very weak protection, analogous to sending a clear-text password over the radio. And it allows for very efficient tracking of devices.

In the previous paragraph, I wrote that the access points broadcast their presence, and that the devices listen to these broadcasts. They do, but if the device only listened to broadcast data, the discovery would be very slow. Access points operate on specific frequency bands, the Wi-Fi channels. The precise number of available channels varies from country to country, but you can count 3 or 4 popular channels at 2.4 GHz, and maybe 20 channels at 5 GHz. A device only listens to one channel at a time, and an access point only broadcasts at fixed intervals. Passive discovery would involve listening on a channel for 2 or 3 broadcast intervals, then switching to the next channel and repeating. Very slow, and also power consuming, since the receiver has to be active for long periods. Instead of passive listening, devices accelerate the process by sending “probes.” They will switch to a channel and send a probe message asking “is there anyone here?” The access point that receives the message is supposed to answer immediately, “Yes, I am serving network SO-AND-SO.” Since the response is almost immediate, the device need only wait a short time to find out whether there is an access point serving the channel or not. It can then move to the next channel, repeat the process, and so on until all channels have been scanned.
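In pseudo-C, the active scan loop looks something like this; the radio primitives are hypothetical stand-ins for whatever the driver provides, not a real API:

#include <stdbool.h>

typedef struct { int channel; char ssid[33]; } scan_result;

extern void tune_radio(int channel);                          /* hypothetical */
extern void send_probe_request(void);                         /* hypothetical */
extern bool wait_for_probe_response(int ms, scan_result *r);  /* hypothetical */

int active_scan(const int *channels, int n_channels, scan_result *results)
{
    int found = 0;

    for (int i = 0; i < n_channels; i++) {
        tune_radio(channels[i]);
        send_probe_request();      /* "is there anyone here?" */
        /* responses come almost immediately, so a short wait per channel
           is enough; passive scanning would wait several beacon intervals */
        if (wait_for_probe_response(50, &results[found]))
            found++;
    }
    return found;
}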

In the case of hidden networks, things become a bit more complicated. The access point does answer the probes, but with a cryptic message, “Yes, I am serving some network on this channel but I won’t tell you which one.” That way, the network name is not broadcast and does not end up in the connection menus. The user will enter the network name, and at that point the device will send a new probe, one that includes the network name, “are you network SO-AND-SO?” If the name is indeed that of the hidden network, device and access point will establish the connection. Of course, users don’t want to be always entering the network name in the connection dialog, so the device’s software remembers that. It will start systematically probing for the hidden networks to which it might connect.

The problem of course is that the probing traffic can be listened to by anyone with a Wi-Fi sniffer. A sniffer near a hidden network will of course discover the network name, just by listening to the probe traffic. An active sniffer might emulate an access point to trick local devices into sending probes, for very quick discovery. So much for the “added security” part. But it gets worse. When you go to a café, to a hotel, to an airport, in fact pretty much anywhere near a Wi-Fi network, your device will keep sending these probes. “I am looking for network SO-AND-SO, are you it?” Nice way to follow you around, isn’t it?

In short, hiding the network name has no security benefit, and has a clear negative effect on privacy. It probably also opens the door to instant attacks, in which access points are programmed to automatically spoof the hidden network and trick devices into attempting to connect. It is a very bad idea, worse than Security Theater. If someone reads this and stops hiding their network, I will be happy!


Blinding the spy in our pockets

Ars Technica and NPR collaborated on an interesting story. The Ars reporter, Sean Gallagher, installed a “modified” Wi-Fi router in the office of the NPR journalist, Steve Henn. The “PwnPlug” router can capture and analyze the traffic, mimicking what the NSA can do by tapping the network. And, of course, the network analysis revealed lots of data about Steve Henn. It is as if each of us were carrying a personal spy in his or her pocket! When I heard about this on the radio, I was not really surprised by the result, but I was interested that this study was making its way to prime-time radio news. That’s a good first step. If more people are aware of the spying going on, someone may eventually do something about it.

The most striking part of the report concerned what happens when Steve merely brings his iPhone in his office, without actually using it or starting any application. As soon as it obtains connectivity, the phone starts contacting a variety of web sites to update business and personal mail, calendar, maps, notes, twitter, Facebook, and probably a few more. Some of that traffic is encrypted, but even then the IP addresses are enough to understand who connects where. Enough headers are still in clear text to provide good correlation, pretty much as I was describing in an Internet Draft last year.

The same week, Quartz reported on a small feature in the latest version of Apple’s iOS: MAC address randomization. The article just provides a general description of the feature, picking a random MAC address before connecting to a Wi-Fi network. The article does not provide much detail about the implementation, but I assume that the software developers at Apple spent a lot of time tuning the feature. In general, changing the MAC address as often as possible provides better privacy, but changing it too often may force the user to repeat network authentication procedures, which is cumbersome. The IETF IPv6 working group debated a similar feature for “randomized IPv6 addresses” for over a year before publishing RFC 7217. Whatever the details of the implementation, it feels like a step in the right direction, and I hope that other operating systems will soon follow. (I don’t work in the Windows Wireless team anymore, so I have no idea whether my colleagues are working on something like that.) But then, Wi-Fi is only one part of the problem. Your phone probably has three other wireless interfaces: Near Field Communication (NFC), Bluetooth, and of course the 2G, 3G, 4G or LTE cellular radio.

The very short range of NFC reduces the risk of it being used for tracking. The user may deliberately use NFC to provide information, for example to perform an electronic payment, but then the tracking is pretty similar to what could be done with a credit card payment.

Bluetooth has a nominal range of a few meters, but spies can use powerful antennas and listen to Bluetooth at longer distances. Store or building owners can install networks of “Bluetooth tracking devices” to follow the movement of people and their cell phones around their stores. This can only be thwarted if we develop some form of Bluetooth privacy feature. Of course, the simplest solution is to turn Bluetooth off when not needed, but this is quite inconvenient. I like my phone to discover my car automatically, and would not like to have to manually turn Bluetooth back on when stepping into the driver’s seat! The most convenient way is probably some form of “passive discovery” mode, which used to be the default behavior for Windows PCs. In short, Bluetooth needs the same kind of attention as Wi-Fi.

The cellular radio poses a much more difficult problem. The connection between a cell phone and a mobile service starts with an authentication exchange. Each phone has a unique 15-digit “International Mobile Subscriber Identity” (IMSI). At the beginning of the connection, the phone presents its IMSI to the visited network. The Mobility Management Entity of the visited network uses the first 5 or 6 digits of the IMSI to identify the “home network” of the phone, and requests authentication data from the Home Subscriber Server (HSS) to verify that the phone is authorized to connect. It then uses this data in an authentication request sent to the phone. The authentication request also includes information provided by the HSS to authenticate the visited network. When the exchange concludes, the phone and the network have verified each other’s identities, and have negotiated keys to secure and encrypt the exchanges.
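The structure of the IMSI explains why the visited network only needs a short prefix of it for routing, and why disclosing the rest is gratuitous. A small illustration, with a made-up IMSI:

#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *imsi = "310150123456789";  /* made-up 15-digit IMSI */
    char mcc[4] = { 0 }, mnc[4] = { 0 };

    memcpy(mcc, imsi, 3);        /* mobile country code                */
    memcpy(mnc, imsi + 3, 3);    /* mobile network code, 2 or 3 digits */

    /* routing only needs mcc+mnc; the remaining digits uniquely
       identify the subscriber, and that is the privacy problem */
    printf("home network: MCC %s, MNC %s; subscriber part: %s\n",
           mcc, mnc, imsi + 6);
    return 0;
}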

There are two obvious problems with that design. First, the mobile phone provides its identity to the network before it has had a chance to authenticate the network. This vulnerability is currently exploited by devices like the Stingray phone tracker, which various agencies use to spy on phone users without any need for warrants. The second problem is that the protocol discloses the actual identity of the device, when in fact it only needs the identity of the home service.

The same problems used to happen in another context: network authentication using the EAP protocols. The IETF working groups provided a solution with methods like EAP-TLS, which use two distinct identities. The mobile presents to the visited network a privacy-preserving Network Access Identifier (NAI) that identifies the home server but randomizes the identity of the mobile, as defined in RFC 4282. The real identity is then sent in an encrypted way to the home server, without being revealed to the visited network.

The EAP-TLS solution cannot be easily implemented in the LTE world, if only because at this stage of deployment the LTE protocol will be very hard to change. But a Mobile Network Operator, or maybe a Virtual Mobile Operator, could still adopt something similar. Suppose for example that each mobile is provisioned with a set of “throw-away” IMSIs that can play the same role as the NAI of EAP-TLS. When the mobile wants to connect, it picks one of the throw-away IMSIs. The request is routed to the HSS of the MNO, which knows the relation between the throw-away ID and the real identity. Later, the throw-away IMSI is discarded, and the phone is provisioned with a new one. The connection completes, as expected by the visited network, but the identity is not disclosed, and devices like the Stingray become pretty much useless. Of course, this will need more engineering than a two-line analysis; there may well be some tricky accounting issues, and the provisioning protocols will have to be defined, but it seems that a willing provider could offer much better privacy than is currently the norm.

We are seeing the first move in the right direction. Let’s hope that after randomizing Wi-Fi we will see some protection for Bluetooth and for LTE. And more encryption to plug all these leaks detailed in the Ars Technica article!


DMARC or not, can email evolve?

Many years ago, I worked on email standards, developing for example a gateway between SMTP/TCP-IP and X.400. We used it in the very early years of the Internet, from 1983 to about 1990, when European research networks finally gave up on the OSI/ITU standards and fully embraced the Internet. Since then, there have been hits and misses. The big success was the standardization of MIME, which allowed multimedia content. Everybody uses it to send pictures or presentations as email attachments. The big failure has been security.

Email security has progressed in the last 20 years, but only very slowly. The IETF made several attempts at defining “secure mail” extensions. “Privacy Enhanced Mail” never got deployed. S/MIME got widely implemented, but hardly anybody uses it. PGP is popular among security experts and paranoid users, but the general public pretty much ignores it. The vast majority of email is SPAM, despite attempts at filtering, blacklisting, and origin verification protocols like SPF or DKIM. And then there is “phishing,” which turns out to be a very vicious threat.

Phishing operates by sending an email to the target, the “phish.” The email will typically contain some kind of trap, maybe an attachment that conceals a virus, or maybe a link to an infected web site. If the phish opens the attachment or follows the link, they are hooked. In large part, phishing is enabled by the very weak “origin control” in email messages. It is the same weakness that we used to exploit for practical jokes, such as sending this kind of email to a newly hired coworker:

From: Bill Gates

To: John Newbee

Subject: New Hire Welcome Party

Please join me for the new hire welcome party today at 5 pm in the Building 7 lobby.

Hilarity ensued when the poor guy prepared for and tried to join the non-existent party. (There is no Building 7 on the Microsoft campus.) The prank was very easy to play: just telnet to port 25 of an email server, and tell that server that there was an incoming mail from “billg@microsoft.com.” Back then, the message was just accepted, no questions asked.
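A session looked something like this; the host names are placeholders, and the server responses are reconstructed for illustration:

telnet mail.example.com 25
220 mail.example.com ESMTP
HELO prankster.example.net
250 mail.example.com
MAIL FROM:<billg@microsoft.com>
250 OK
RCPT TO:<johnnewbee@microsoft.com>
250 OK
DATA
354 End data with <CR><LF>.<CR><LF>
From: Bill Gates
To: John Newbee
Subject: New Hire Welcome Party

Please join me for the new hire welcome party today at 5 pm in the Building 7 lobby.
.
250 OK: queued
QUIT

Things have progressed somewhat with the deployment of DKIM and SPF, and the “real” sender of a message will now be tracked. The same prank is still doable, but it requires creating a relay domain under the control of the prankster, plus a fair amount of configuration of that domain. If the sender field is displayed, the receiver will have more information: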

Sender: Postmaster <postmaster @ mailrelay.example.net>

From: Bill Gates <billg @ microsoft.com>

To: John Newbee

Subject: New Hire Welcome Party

Please join me for the new hire welcome party today at 5 pm in the Building 7 lobby.

In theory, an alert reader will notice that the “sender” and “from” fields do not match, and will discard the message as a joke. But in practice, many people will still be fooled. Some mail programs may not have a good convention for displaying the sender information, and some do not display it at all. Users may or may not understand the significance of the “sender” field.

Phishing is often a numbers game, in which the attackers may try many potential phishes in a target organization. They only need to hook one of them to get a beachhead inside the target, and proceed from there on their way to the trove of secrets that they covet. A single lapse of judgment by one user is enough to compromise the whole organization. And that is why we now see a proposed escalation on top of SPF and DKIM, with DMARC.

DMARC allows domain owners to set a policy on the handling of email coming from their domains. It basically directs email recipients to use SPF and DKIM to check the origin of the email, and to verify that the “sender” domain matches the “from” information. If there is a mismatch, the recipients are instructed to either flag the message for further inspection, or possibly to reject it outright. The sending domain chooses between the “flag” and “reject” policies, and the receivers are expected to just follow orders.
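Concretely, the policy is published in the DNS, as a TXT record under the _dmarc label of the sending domain. A sketch, with a placeholder domain:

; DMARC policy record.  p=quarantine asks receivers to flag failing
; mail; p=reject asks them to reject it outright.  rua is the address
; that receives aggregate reports.
_dmarc.example.com.  IN  TXT  "v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.com"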

In theory, DMARC would make some kind of phishing much harder. In practice, it turns out to be incompatible with the existing practice of remailers and mailing lists. For example, if I send a message to the IETF mailing list, the recipients will see it appear as:

Sender: ietf <ietf-bounces@ietf.org>

From: Christian Huitema <huitema@microsoft.com>

This is exactly the same relay structure that could be abused by phishers, so of course it is incompatible with DMARC. All mail delivered by such mailing lists will thus be flagged as “potential phishing message” by the DMARC filter. Yahoo went one step further, and asked DMARC-compliant recipients to automatically reject such messages, causing a great uproar among mailing list managers. Just check the IETF list archive, and you will see hundreds of messages in a few days, most stating that “Yahoo broke mailing lists, they need to fix it,” with a few DMARC supporters asserting that “mailing lists are stuck in the 80’s, they need to evolve.”

The problem of course is that mailing lists cannot evolve easily. Many mailing list agents reformat the subject field or add some text to the email, which breaks DKIM. All mailing list operations include sending the message through a relay that has no relation with the “from” address, which either breaks SPF or requires different “sender” and “from” addresses, breaking DMARC’s “same origin” rule. The only way for list agents to comply would be to completely rewrite the message and create a new origin within the mailing list domain, something like:

Sender: ietf <ietf-bounces@ietf.org>

From: Christian Huitema <ietf-christian-huitema@ietf.org>

Reply-To: Christian Huitema <huitema@microsoft.com>

Such a rewrite would comply with DMARC, because the origin of the message would be clearly in the IETF domain. Of course, we would want private replies to also work, and for that the mailing list agent would need to put the original sender address in the “reply-to” field. In short, we would get a “DMARC compliant mailing list” that works, at some increased operational cost. But the problem is that the result is not really different from another form of phishing, in which the phisher creates addresses in his own domain, such as:

From: Christian Huitema <christian-huitema@all-your-mail-belong-to-us.info>

By encouraging mailing lists to rewrite addresses, we would encourage users to disregard the “email address” part of the “from” field, because it varies from list to list. Instead, users would just look at the common name, which can be easily forged by phishers. A strict DMARC policy would cause mail agents to create workarounds that make phishing easier, not harder.

I don’t know what will happen next. Reasonable minds would think that Yahoo will revert their DMARC policy from “reject” to “flag.” After all, once a message is flagged, inspection software can reasonably be expected to tell the difference between a reputable mailing list and a phishing domain. The software could learn which mailing lists a user is subscribed to. At the same time, we may expect mailing list practices to evolve so as not to break DKIM, for example by placing the mailing-list-specific information in a new header instead of rewriting subject and message fields that are covered by the DKIM signature. All that will probably happen within the next months, and hopefully phishing will become harder.

The main lesson of this debate is that changing an old standard is really hard. Email has been around since the beginnings of the Internet, and each little detail of the e-mail headers is probably there to serve some group of users somewhere. Security changes have to accommodate all these constituencies. That will be slow!

 


The Apple TLS bug, and coding guidelines

Right when the whole industry appears to respond to the NSA spying by reinforcing their encryption defense, we learn about a bug in Apple’s TLS implementation. There are many comments on the web about the genesis of this bug, such as for example this one from “Imperial Violet,” which provides details about the coding mistake that caused the error. I like this bug, because among other things it exposes the need for good coding guidelines.

Of course, bugs happen. Having worked for some time on Windows, I know that. Critical procedures like the negotiation of TLS encryption keys ought to be tested. For me, one of the basic rules of system programming is, “if it was not tested, it probably does not work.” In this case, good engineering requires a unit test that checks the various error conditions that the procedure is supposed to detect. One wonders why the Apple processes do not call for that. But this bug was particularly silly: a duplicated line in the source code caused a critical test to be ignored. According to the Imperial Violet web page, which quotes Apple’s published source code, the offending code looks like this:

static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
uint8_t *signature, UInt16 signatureLen)
{
    OSStatus err;
    ...

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;
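        /* ^ the duplicated "goto fail" executes unconditionally, so the
           final check below is skipped while err still holds 0 (success) */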

    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;
    ...

fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
}

You can see the duplicated “goto fail” statement that causes the subsequent test to be ignored. This kind of error is normally caught by the compiler, but only if you call the compiler with sufficiently strict options. Windows developers are required to use “warning level 4,” which would have flagged the problem, because the “SSLHashSHA1.final” test is unreachable. Windows automated code checks would also have caught the error. Apparently, Apple does not require this kind of precaution.
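For illustration, here is the kind of diagnostic that strict compiler settings produce on such code; the file name and line number are invented:

cl /W4 /c tls_check.c
tls_check.c(42) : warning C4702: unreachable code

clang -Wunreachable-code -c tls_check.c
tls_check.c:42:9: warning: code will never be executed [-Wunreachable-code]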

But apart from restating the rules about “writing secure code,” this bug also reminds me of the importance of good coding standards. Writing code is not just about encoding an algorithm; it is also about communicating with other developers, like the folks who will have to update that code in a couple of years, or your buddies who will review the code before you check it in. We require code reviews before any check-in at Microsoft, and I assume they do the same at Apple, but obviously the reviewers did not find the bug, maybe because they were fooled by the indentation. This particular bug is actually a kind of optical illusion, because the duplicated line has the same indentation as the original line. This is where coding standards kick in.

Coding standards or coding guidelines often specify things like when to use upper case and lower case in the names of variables, what kind of indentation to use, what kind of comments, etc. This is a bit arbitrary, and different teams will often have different guidelines, but having common guidelines makes code easier to review, and that is a big advantage. One such standard is to never use the “if () instruction;” construct, but to always use curly brackets for the conditional branch. For example, instead of writing:

    if ( cashAvailable < price)
        refuseTheSale();

We would require that the developer writes:

    if ( cashAvailable < price)
    {
        refuseTheSale();
    }

Apart from making the program more readable, it also avoids future maintenance errors, such as writing:

    if ( cashAvailable < price)
        AddToRefusedSalesStatistics(cashAvailable, price);
        refuseTheSale();

Instead of:

    if ( cashAvailable < price)
    {
        AddToRefusedSalesStatistics(cashAvailable, price);
        refuseTheSale();
    }

That is, this coding guideline reduces the risk of confusion between “instruction” and “block,” a classic pitfall in “if” statements. Applying it would have made the error quite obvious, and also harmless. The code would have looked like this:

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
    {
        goto fail;
    }
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
    {
        goto fail;
        goto fail;
    }
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
    {
        goto fail;
    }

The extra curly brackets add a bit to the line count, but they definitely make the code much easier to review, and the errors much easier to detect.

The other point that strikes me in that code is the use of the “goto” statement. I am probably in a minority here, but I have a lot of sympathy for Dijkstra’s statement, “Go To Statement Considered Harmful.” The code above could be easily written without using the goto statement:

static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
uint8_t *signature, UInt16 signatureLen)
{
    OSStatus err;
    ...

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) == 0)
    {
        if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) == 0)
        {
            if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) == 0)
            {
                ...
            }
        }
    }
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
}

The code is easy to read, it will not terminate too early, but it requires extra indentation, and that can be painful. If the “standard path” code represented by the ellipses is long, we end up having to count curly brackets over many lines of code, which is error prone. That’s the main reason why many of my colleagues prefer the “goto fail” pattern. On the other hand, if that is really a problem, it is always possible to factor the “common case” out as a separate procedure, as in the sketch below. In any case, if you want to use goto statements, if you think that you know better than Edsger Dijkstra, you have to be extra careful!
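For example, the hashing sequence above could be factored out along these lines, keeping the same implicit context as Apple’s excerpt; the helper’s name is mine:

/* Sketch: the sequence of checks moves to a helper that returns early,
   so there is no goto, and the cleanup code runs exactly once. */
static OSStatus HashTheHandshakeRecords(void)
{
    OSStatus err;

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        return err;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        return err;
    return SSLHashSHA1.final(&hashCtx, &hashOut);
}

static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
uint8_t *signature, UInt16 signatureLen)
{
    OSStatus err = HashTheHandshakeRecords();

    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
}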
