Cracking the SNI encryption nut

Back in May and June this year, I was reviewing the state of SNI encryption. I started by going through the archives of the TLS mailing list. I collected the list of attacks that demonstrated holes in previous proposals, and documented them in an Internet draft (https://datatracker.ietf.org/doc/draft-huitema-tls-sni-encryption/) that has just been adopted as a work item by the TLS working group (https://datatracker.ietf.org/doc/draft-ietf-tls-sni-encryption/). The draft also listed two proposals that have resisted attacks so far, one based on a form of TLS in TLS tunneling, and the other based on extended session resume tokens. But something kept bothering me. Both proposals are complex to implement, the session resume token proposal does not cover the first connection between a user and a hidden node, and the tunneling proposal is difficult to extend to protocols like QUIC that use TLS but do not use TCP. In short, SNI encryption is a tough nut to crack. But I think that I have cracked that nut. Let’s introduce the proposed solution, the SNI and ALPN Encryption eXtension (SAEX).

The idea of SAEX is to have a mutable extension that can be carried in the Client Hello. The extension takes two forms: a clear text form, and an encrypted form. The clear text form contains the real SNI, the real ALPN, and a nonce. The encrypted form is a string of bytes that only the receiver of the Client Hello can decode. When the client asks the TLS stack to prepare the Client Hello message, it includes the clear text form of the SAEX extension in the message, but prior to actually sending that message on the wire it substitutes the encrypted form. Similarly, when the server receives the Client Hello message, it decodes the encrypted value of the SAEX extension and substitutes the clear text form. The server then passes the Client Hello for further processing to the subsystem identified by the “real SNI” named in the SAEX extension.

The most vexing of the various threats listed in the draft (https://datatracker.ietf.org/doc/draft-ietf-tls-sni-encryption/) is probably the replay attack. The adversary just observes the token carrying the encrypted SNI and replays it in its own Client Hello. If the connection succeeds, it learns the identity of the hidden site by observing the certificate or the web pages returned by the server. SAEX is robust against that attack when using TLS 1.3, because all the secrets computed by TLS depend on the content of the Client Hello, which with SAEX includes the clear text form of the extension. The adversaries can only observe the encrypted form of the extension. They could guess the value of the SNI and ALPN, but they cannot guess the nonce. That means that they can replay the encrypted extension, but they won’t be able to understand the server’s responses and determine the server’s identity.
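To make the replay argument concrete, here is a minimal Python sketch. It is not the SAEX wire format and not a real TLS stack. All names (`Server`, `clear_form`, `client_hello`, `handshake_secret`) are hypothetical, the "encrypted form" is simulated by an opaque random token that only the server can map back to the clear text form, and a single HMAC over a hash of the Client Hello stands in for the TLS 1.3 key schedule. The only point it illustrates is that the secrets depend on the clear text form, nonce included.

```python
import hashlib
import hmac
import os


def clear_form(sni: str, alpn: str, nonce: bytes) -> bytes:
    """Serialize the clear text form: real SNI, real ALPN, and a nonce."""
    return b"|".join([sni.encode(), alpn.encode(), nonce.hex().encode()])


class Server:
    """Toy server. The 'encrypted form' is an opaque token only it can decode.

    Assumption: a lookup table replaces real encryption for illustration.
    """

    def __init__(self) -> None:
        self._table: dict[bytes, bytes] = {}

    def issue(self, sni: str, alpn: str) -> tuple[bytes, bytes]:
        """Return (clear text form, encrypted form) for the client to use."""
        clear = clear_form(sni, alpn, os.urandom(16))
        token = os.urandom(32)
        self._table[token] = clear
        return clear, token

    def decode(self, token: bytes) -> bytes:
        """Substitute the clear text form for the encrypted form."""
        return self._table[token]


def client_hello(extension: bytes) -> bytes:
    """Toy Client Hello: fixed fields followed by the SAEX extension value."""
    return b"ClientHello|other-fields|" + extension


def handshake_secret(hello_with_clear_form: bytes) -> bytes:
    """Stand-in for the TLS 1.3 key schedule, keyed by the hello's hash."""
    transcript_hash = hashlib.sha256(hello_with_clear_form).digest()
    return hmac.new(b"toy key schedule", transcript_hash, hashlib.sha256).digest()


if __name__ == "__main__":
    server = Server()
    clear, token = server.issue("hidden.example", "h2")

    # The client computes secrets over the clear text form, sends the token.
    client_secret = handshake_secret(client_hello(clear))
    # The server decodes the token and computes over the same clear text form.
    server_secret = handshake_secret(client_hello(server.decode(token)))
    # The adversary replays the token and guesses SNI and ALPN, not the nonce.
    guess = clear_form("hidden.example", "h2", os.urandom(16))
    adversary_secret = handshake_secret(client_hello(guess))

    print("client and server agree:", client_secret == server_secret)
    print("adversary agrees:", adversary_secret == server_secret)
```

In this toy model the adversary can make the server accept the replayed token, but its own secret differs from the server's, so it cannot read the responses.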

There is one threat in the list that SAEX cannot mitigate, the issue of standing out. A Client Hello that carries the SAEX extension will automatically look different from one that doesn’t. Opinions vary about the importance of this issue. On one hand, we can see censors ordering their great firewalls to drop all packets carrying the extension, forcing users to choose between connectivity and privacy. On the other hand, if some big services adopted the extension, this censoring strategy would be very costly. On balance, it is probably worth trying.

Of course, there are many details to get right. The clients need to obtain both the clear text and the encrypted value of the extension before starting the connection, and that needs to be properly engineered. The server could use a variety of encryption methods, with diverse properties. Used improperly, the nonce in the extension could serve as a cookie that identifies the user. But working groups are good at ironing out these kinds of issues. And at least, the SAEX design proves that we can solve the SNI encryption issue. That tough nut can be cracked.


Newspapers, subscriptions, and privacy

Quite often now, when I click on a link to a news article, I am greeted by a message explaining that I will not be able to see it. In some cases, the news site asks me to please turn off the ad blocker. In other cases, the site will ask me to please buy a subscription. What I do then is just close the window. I will get my information some other way. And I will continue doing that until the news sites stop their gross attacks against their readers’ privacy.

Many newspapers are ad funded. Even those that are funded by subscriptions also run ads. They were doing that in print form, and they continue doing that with the web. Now, to be clear, I did not care much for the printed ads, but that never stopped me buying newspapers. Ads on the web are different. They tend to be more aggressive than print ads, what with animations, interstitials and pop ups. That’s annoying, but that’s not the worst. The worst is the tracking technology behind these ads.

On the web, ads bring more money if they are “targeted”. This desire for targeting created an arms race between advertisers, eager to find new ways to learn about their targets – that is, about us. The advertisement technology has given us ad auctions and a complex opaque ecosystem that basically attempts to keep tabs on people and their browsing history. Many of us believe that this “corporate surveillance” system is just evil. The best protection against that surveillance is to use ad blockers, or more specifically tracking blockers.

Of course, blocking ads also blocks the revenues of the ad-funded newspapers. They would like us to turn off the blocker, or to buy a subscription. And it makes sense, because we need to pay the journalists. The problem there is that buying a subscription does not prevent the tracking. Whenever I get one of those suggestions from a news site, I attempt to find and read their privacy policy. If I did buy a subscription, what would they do with my personal data?

The typical “privacy” policy is hard to find. For the New York Times, for example, you have to scroll to the very bottom of the home page, the very last line of gray tiny print, and find the “privacy” keyword. (It is here: https://www.nytimes.com/content/help/rights/privacy/policy/privacy-policy.html.) If you click on it, you get a long page that was probably vetted by a team of lawyers and describes every kind of tracking and advertising that they could think of. Another example would be “Wired”, for which the privacy policy link hides in small print in the middle of the subscription form. The link points to the generic policy of the publishing group, Conde Nast (http://www.condenast.com/privacy-policy/). Again, the text reserves the right to track the user in every which way and use the information however they see fit. A newspaper like the Guardian will quite often publish articles railing against state surveillance, but take a look at their privacy policy: https://www.theguardian.com/help/privacy-policy. Again, they reserve the right to collect all kinds of data and use it for targeting advertisements.

Nowhere will you get something as simple as “if you subscribe and you opt to not be tracked, we will not track you and we will also not let third parties track you through our site.” And that’s too bad, because if they did say that I would in fact subscribe to many of these newspapers. But what you get is instead, “if you subscribe, we will verify your identity in order to check the subscription, and we will track you even more efficiently than if you were not a subscriber.” No, thanks.


Privacy’s Reductio Ad Absurdum

For Internet Privacy advocates like me, the recent vote by Congress to authorize ISPs to sell customer information is disheartening. But it is also a proof that the current attacks on privacy in the Internet are not sustainable, a “reductio ad absurdum” of what will happen if the current trends continue. The press is full of discussions about the evil of telecommunication companies, and the congressmen that they lobbied. The blame there is well deserved, but the ISPs are actually minor players in the “war on privacy”. The major players? Companies like Facebook and Google, and generally the “advertisement funded” business model pushed by Silicon Valley investors. Indeed, the main argument of the ISP lobbyists was that if Google can benefit from collecting private information, why shouldn’t Verizon or AT&T? The lobbyists demonstrated that they could sway Congress with that argument, no doubt aided by a generous helping of “campaign donations.”

I don’t know whether the congressmen understood the consequences of their actions. Basically, they voted to authorize a free for all. Anybody who can grab your personal information is authorized by law to sell it. The state will not be picking favorites between ISPs, search engines, social media providers, or car ride companies. What goes for one, goes for everybody. Of course, that’s very scary. In the name of more efficient advertisements, companies will collect your browsing history, your interests, your lists of friends, the various places that you visit, and of course everything that you purchased. In fact, there is no reason to stop at advertisements. The prospects are endless. Someone will sell the data to hiring managers, to make sure that they select prospective employees with the right “cultural fit” with the companies. The managers of political campaigns will buy the data and tune individual messages based on the biases of individual voters. Banks will obtain information before agreeing to a loan. Secret police forces will probably get their countries to pass laws giving them free access to the advertisers’ data. And because everybody can collect the data, there will be very few places to hide.

So, in a sense, there is a silver lining to Congress’ decision. In its absurdity, it demonstrates to our society the extreme danger of the business model pushed by Silicon Valley in the last 20 years. A classic “reductio ad absurdum”. If one company does it, it is probably not too much of a problem. But if you allow one company to do it, you must allow all of them. And if all of them do it, the result is patently absurd.


Two wins for Internet Privacy on the same day

Today, the 17th of May 2016, the RFC Editor published RFC 7844, an Anonymity Profile for DHCPv4 and DHCPv6 clients, and RFC 7858, transmission of DNS requests over TLS. These two RFCs can close two important avenues for leaking metadata over the Internet.

I started working on what became RFC 7844 in November 2014. The work on MAC Address Randomization was progressing swiftly, and the corresponding feature would ship in Windows 10 the next year. MAC Address Randomization ensures that your laptop or your smartphone cannot be tracked by its Wi-Fi MAC Address, but we quickly observed that doing that was not sufficient. When your computer joins a network, it executes a series of low level protocols to “get configured.” One of these protocols is DHCP, which is used to obtain an Internet Protocol Address. The problem is that DHCP is very chatty, and by default provides all kinds of information about your computer name, software version, model, etc. I worked with the members of the DHCP working group in the IETF to remedy that, and their response was great. They produced thorough analyses of privacy issues in DHCPv4 and in DHCPv6, which have just been published as RFC 7819 and RFC 7824. RFC 7844 patches all these issues. And the best part is that an implementation already shipped in the November 2015 update of Windows 10.

The work on DNS Privacy is just as important. By default, your computer issues “name resolution” requests in clear text for each web site and each Internet service that it uses. This stream of requests is a rich set of metadata, rich enough to allow for identification of the computer’s user, and then to track its activities. It is here for the taking, by shady hot spot providers, spies, or maybe criminals. RFC 7858 defines how to send these requests in a secure fashion to a trusted DNS server, effectively closing that source of metadata leakage.
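As a rough illustration of what sending these requests in a secure fashion looks like on the wire, here is a small Python sketch of a DNS over TLS client. The query is an ordinary DNS message, framed with the two-byte length prefix used for DNS over TCP, and sent over a TLS connection to port 853 as specified in RFC 7858. The function names are hypothetical, the sketch handles only a single A query, and the resolver address and name passed to `query_over_tls` are whatever trusted server you choose; none of them come from the post.

```python
import socket
import ssl
import struct


def build_query(name: str, query_id: int) -> bytes:
    """Build a minimal DNS query for the A record of `name`."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 (A), QCLASS 1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def frame(message: bytes) -> bytes:
    """Prefix the message with its length as a two-byte big-endian integer."""
    return struct.pack("!H", len(message)) + message


def _read_exact(sock: ssl.SSLSocket, count: int) -> bytes:
    """Read exactly `count` bytes or raise if the connection closes early."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before full response")
        data += chunk
    return data


def query_over_tls(name: str, server_ip: str, server_name: str,
                   port: int = 853, timeout: float = 5.0) -> bytes:
    """Send one query over TLS and return the raw DNS response message."""
    context = ssl.create_default_context()  # verifies the server certificate
    with socket.create_connection((server_ip, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=server_name) as tls:
            tls.sendall(frame(build_query(name, query_id=0x1234)))
            (length,) = struct.unpack("!H", _read_exact(tls, 2))
            return _read_exact(tls, length)
```

An observer on the local network sees a TLS connection to the resolver, but not the names being resolved. The resolver itself still sees every query, which is why it has to be one that you trust.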

Of course, there is more work to do. But the 17th of May 2016 was a great day for Internet Privacy.


MAC Address Randomization in Windows 10

As you may know, I care a lot about Internet privacy. The main tool for privacy is encryption, hiding your communications from potential spies. But that’s not enough. We also need to deal with web privacy, the tracking of your web browsing by advertisers using cookies. And we need to minimize the “metadata” that your devices disclose when you are connecting to the Internet. I work in the Windows networking team at Microsoft, and I had a chance to work specifically on this metadata problem, helping the team implement “MAC Address Randomization.”

You may have heard of systems that track your movements by tracking the Wi-Fi hardware address of your phone or your laptop, the address also known as “MAC Address.” Windows 10 includes protection against that. It implements a form of MAC Address Randomization. The randomization function can be controlled through the settings application. You will first have to go to the Wi-Fi page in the settings UI, and if you scroll to the bottom of the page you will see two options, “Advanced options” and “Manage Wi-Fi Settings.”

If you click the “Manage Wi-Fi Settings” link, you will get to a page that controls all the “global” options for the Wi-Fi interface. MAC Address Randomization is one of these options. If your hardware is capable of supporting MAC Address randomization, the page will show a toggle for it.

The feature is supported on the recent hardware. If your hardware does not support randomization, the UI will of course not present the option. If randomization is supported, switch the toggle to turn the feature ON or OFF.

On the phone, the UI is slightly different. You will need to click the “Manage” button at the bottom of the Wi-Fi page to get to the manage settings page, but the logic is the same.

If the option is turned ON, your phone or laptop will start using random MAC Addresses when “roaming” between connections. Normally, when your device is not connected, it will wake up every minute or two, and try finding if there is a Wi-Fi network available in the vicinity. For that, it will send “probes.” In the absence of randomization, these probes are sent from a constant or “permanent” MAC address that uniquely identifies your device. Trackers listen to these probes, for example in some shopping malls, department stores, or other public areas. When randomization is turned on, the system sends these probes from a random MAC Address. This defeats a large part of the “Wi-Fi” tracking currently going on.

Eventually, you will want to connect to a Wi-Fi network. At this point, the system acts differently if it is the first connection to a new network, or if you already connected to that same network in the past. If this is a new network and randomization is ON, the system will pick a random MAC Address for that network, and use it for the new connection. If randomization is OFF, the system uses the permanent MAC Address embedded in the hardware. For repeated connections, by default, the system will keep using the same MAC Address that it used on the first connection, random if randomization was ON for the first connection, permanent if randomization was OFF.

There are networks for which you care more about usability and management, and you should keep using the permanent MAC address there. A classic example are corporate networks, in which IT managers want to precisely track who is connecting. Another example are some home networks, in which an overactive owner decided to turn on the “MAC Address Filtering” feature. You should turn MAC Address randomization OFF before connecting to such networks. You can turn randomization back ON after the first connection is complete. The system will remember to use the permanent MAC Address for these networks.

Even when randomization is ON, the system will by default use the same random MAC Address each time you return to the same Wi-Fi network. This design attempts to balance privacy and usability. Suppose for example that you are travelling and that you connected to the hotel’s Wi-Fi. On your first connection, you will probably have to fill out a web form to get authorized. The hotel’s Wi-Fi will then remember the MAC address of your device, and let it connect. If the system kept changing the MAC Address of your Wi-Fi interface, you would have to fill out that form again each time you reconnect, and that would be fairly annoying. But if you keep using the same random MAC address for each connection, the network will recognize you, and you will not have to fill out a form, or, in the worst case, pay again for the connection.

Of course, if you go to a different Wi-Fi network, the system will pick a new random MAC address for that network. Each network knows that the same person is connecting again and again, but different networks cannot track that the same person is moving from the hotel to the airport or to a café. We believe that most people will be happy with that compromise, but if you are not, you can use the UI to change the setting. Suppose for example that you are coming every day to the same café, and that you don’t like the idea that once the system picks a random MAC address for the café, observers could track it. Again, you will go to the Wi-Fi UI and look at the options at the bottom of the page. Instead of selecting the “manage Wi-Fi settings” option, you will select the “advanced options,” and you will see the “per network” randomization option.

Selecting the drop-down box unveils three possible settings: “On,” “Off,” and “Change daily.”

On the phone, the UI is slightly different. In the “Manage Wi-Fi” UI, you will need to click on a network name to “edit” the network properties. At the bottom of the properties page, you will see the same control as described above, with the same choice between “On,” “Off,” and “Change daily.”

The “On” setting is the default when randomization is turned on. The system picks a random MAC address and keeps using it for this network. The “Off” setting forces the system to use the permanent MAC address. The “Change daily” setting instructs the system to pick a different MAC address every day. This is the setting that you want to use if you are concerned about your privacy when you are regularly visiting the same place. Of course, if you chose the “change daily” option, you may have to fill a new web form every day when you connect. But that’s your choice!


The Quest for Internet Privacy

Two years have passed since the Snowden revelations, and almost two years since the IETF meeting in Vancouver. There was a palpable sense of urgency in the meeting, with more than a few hints of anger, as you can see watching the recording. But even with that kind of pressure, there were always some doubts that standard groups like the IETF could achieve significant results in a short time. Well, if we allow that two years is short enough for the IETF, we have in fact seen results.

I don’t know if RFC 7624 will ever become a best seller, but it does present a comprehensive review of the threats to Internet privacy posed by global surveillance. It took two years and the merging of several drafts, in typical IETF fashion, but the message is clear. The analysis has informed two different types of actions: deploy encryption, and reduce the amount of meta-data leaked by various protocols.

Previous standards like HTTPS were already there, and the industry started deploying encryption without waiting. We could see the entries in the EFF’s Encrypt the Web Report progressively turn green as big companies encrypted their web pages and services, their data storage, or email transmission. But new standards make the encryption more efficient and easier to deploy. In particular, the combination of HTTP/2.0 and TLS/1.3 will make encrypted transmission faster than legacy clear-text HTTP/1.0.

My personal efforts have been centered on the other part of the threat, traffic analysis and metadata collection, and I am glad that lots of my initial analyses found their way into RFC 7624. The connection of your computer or smart phone to the Internet relies on protocols like Ethernet, Wi-Fi, DHCP or the Domain Name System that were designed in the 80’s or 90’s. The priority then was to make the networks as reliable and as easy to manage as possible. As a result, the protocols carry identifiers like MAC Addresses, unique device identifiers or unique names. As you move through the Internet and connect to a variety of hot spots, the metadata reveals your identity and allows for location tracking and IP address to user name correlation. Traffic analysis gets much easier if the user of the IP address is known!

We are making progress. MAC Address Randomization, DHCP Anonymity and DNS Privacy are still work in progress, but the standards and early implementations are getting ready. That’s the good news. The bad news is that even when we are done, there will still be two worrisome kinds of tracking: advertisement on the web, and cell phone locations. Ad blockers may impede web tracking, but for cell phones the only plausible solution so far is the “airplane mode.” A little more work is needed there!


Hiding a Wi-Fi network is worse than Security Theater

Last month, I spent a lot of time looking at Wi-Fi protocols, and in particular at the privacy implications of Wi-Fi on mobile devices. The main privacy issue with Wi-Fi is the use of “worldwide unique” MAC addresses, which enable really efficient tracking of devices and their owners. The industry is starting to address this. But a close second is the practice of “hiding the SSID,” in a misguided attempt at increasing a network’s security. The idea was to hide the name of your Wi-Fi network from people in your neighborhood. The effect is to have your phone broadcast the name of the network every few minutes, negating any privacy gain from techniques like MAC address randomization.

When you set up a Wi-Fi network, you are supposed to use the management interface of your router and assign a name to the network. (If you don’t do that, you get a default name like “linksys” or “D-Link”, which is not a very good idea.) For example, I gave my network the name “9645NE32.” In the standard Wi-Fi setup, the wireless access points broadcast their availability by announcing their name, their SSID in Wi-Fi standard jargon. These broadcasts are captured by your device, and presented in the menu of available networks. When you want to connect to a network, you pick a name in the menu and you get connected. In many cases, the device will remember the networks that you connect to, and reconnect automatically when the network is in range. Life is good.

In the early days of Wi-Fi, some people were very concerned that outsiders would try to connect to their network. They looked for a way to “hide” the network, so the name would not appear by default in the connection menus of phones or laptops. Access Point manufacturers obliged, and provided a setting to “not broadcast the SSID.” In order to connect, the users cannot just click. They will have to manually enter the name of the network on their device. In short, the name acts as some kind of password. If you don’t know it, you cannot enter the network. It seemed like a good idea, an extra layer of security. The problem is, it is at best a very weak protection, analogous to sending a clear text password over the radio. And it allows for very efficient tracking of devices.

In the previous paragraph, I wrote that the access points broadcast their presence, and that the devices listen to these broadcasts. They do, but if the device only listened to broadcast data the discovery would be very slow. Access points operate on specific frequency bands, the Wi-Fi channels. The precise number of available channels varies from country to country, but you can count 3 or 4 popular channels at 2.4 GHz, and maybe 20 channels at 5 GHz. A device only listens to one channel at a time, and an access point only broadcasts at fixed intervals. Passive discovery would involve listening on a channel for 2 or 3 broadcast intervals, then switching to the next channel and repeating. Very slow, and also power consuming since the receiver has to be active for long periods. Instead of passive listening, devices accelerate the process by sending “probes.” They will switch to a channel and send a probe message asking “is there anyone here?” The access point that receives the message is supposed to answer immediately, “Yes, I am serving network SO-AND-SO.” Since the response is almost immediate, the device need only wait a short time to find out whether there is an access point serving the channel or not. It can then move to the next channel, repeat the process, and so on until all channels have been scanned.

In the case of hidden networks, things become a bit more complicated. The access point does answer the probes, but with a cryptic message, “Yes, I am serving some network on this channel but I won’t tell you which one.” That way, the network name is not broadcast and does not end up in the connection menus. The user will enter the network name, and at that point the device will send a new probe, one that includes the network name, “are you network SO-AND-SO?” If the name is indeed that of the hidden network, device and access point will establish the connection. Of course, users don’t want to be always entering the network name in the connection dialog, so the device’s software remembers that. It will start systematically probing for the hidden networks to which it might connect.

The problem of course is that the probing traffic can be listened to by anyone with a Wi-Fi sniffer. A sniffer near a hidden network will of course discover the network name, just by listening to the probe traffic. An active sniffer might emulate an access point to trick local devices to send probes, for very quick discovery. So much for the “Added security part.” But it gets worse. When you go to a café, to a hotel, to an airport, in fact pretty much anywhere near a Wi-Fi network, your device will keep sending these probes. “I am looking for network SO-AND-SO, are you it?” Nice way to follow you around, isn’t it?
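The leak is easy to see in a toy model. The Python sketch below is a deliberate simplification of the scanning behavior described above, not an implementation of the Wi-Fi standard: probes are represented as plain `(channel, ssid)` pairs, an empty SSID stands for the generic “is there anyone here?” probe, and the device sends a named probe on every channel for each hidden network it remembers. All names in the code, including the “CafeGuest” network, are hypothetical.

```python
from dataclasses import dataclass

WILDCARD = ""  # generic probe that names no network


@dataclass(frozen=True)
class RememberedNetwork:
    ssid: str
    hidden: bool


def probes_for_scan(remembered: list[RememberedNetwork],
                    channels: list[int]) -> list[tuple[int, str]]:
    """Return the probes a device sends during one scan of all channels."""
    probes = []
    for channel in channels:
        # Visible networks answer the generic probe with their name.
        probes.append((channel, WILDCARD))
        # Hidden networks only answer when asked for by name.
        for network in remembered:
            if network.hidden:
                probes.append((channel, network.ssid))
    return probes


def sniffed_names(probes: list[tuple[int, str]]) -> set[str]:
    """What a passive sniffer learns: every network name sent in the clear."""
    return {ssid for _, ssid in probes if ssid != WILDCARD}


if __name__ == "__main__":
    remembered = [
        RememberedNetwork("9645NE32", hidden=True),
        RememberedNetwork("CafeGuest", hidden=False),
    ]
    scan = probes_for_scan(remembered, channels=[1, 6, 11])
    print(sniffed_names(scan))  # only the hidden network's name is exposed
```

In this model the visible network never appears in the device's own transmissions, while the hidden one is announced on every channel of every scan, wherever the device happens to be.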

In short, hiding the network name has no security benefit, and has a clear negative effect on privacy. It probably also opens the door for instant attacks, in which access points are programmed to automatically spoof the hidden network and trick devices into attempting to connect. In short, it is a very bad idea, worse than Security Theater. If someone reads this and stops, I would be happy!
