Blinding the spy in our pockets

Ars Technica and NPR collaborated on an interesting story. The Ars reporter, Sean Gallagher, installed a “modified” Wi-Fi router in the office of the NPR journalist, Steve Henn. The “PwnPlug” router can capture and analyze the traffic, mimicking what the NSA can do by tapping the network. And, of course, the network analysis revealed lots of data about Steve Henn. It is as if each of us were carrying a personal spy in his or her pocket! When I heard about this on the radio, I was not really surprised by the result, but I was interested that this study was making its way to prime-time radio news. That’s a good first step. If more people are aware of the spying going on, someone may eventually do something about it.

The most striking part of the report concerned what happens when Steve merely brings his iPhone into his office, without actually using it or starting any application. As soon as it obtains connectivity, the phone starts contacting a variety of web sites to update business and personal mail, calendar, maps, notes, Twitter, Facebook, and probably a few more. Some of that traffic is encrypted, but even then the IP addresses are enough to understand who connects where. Enough headers are still in clear text to provide good correlation, pretty much as I was describing in an Internet Draft last year.

The same week, Quartz reported about a small feature in the latest version of Apple’s iOS: MAC address randomization. The article just provides a general description of the feature, picking a random MAC address before connecting to a Wi-Fi network. It does not provide much detail about the implementation, but I assume that the software developers at Apple spent a lot of time tuning the feature. In general, changing the MAC address as often as possible provides better privacy, but changing it too often may force the user to repeat network authentication procedures, which is cumbersome. The IETF IPv6 working group debated a similar feature for “randomized IPv6 addresses” for over a year before publishing RFC 7217. Whatever the details of the implementation, it feels like a step in the right direction, and I hope that other operating systems will soon follow. (I don’t work in the Windows Wireless team anymore, so I have no idea whether my colleagues are working on something like that.) But then, Wi-Fi is only one part of the problem. Your phone probably has three other wireless interfaces: Near Field Communication (NFC), Bluetooth, and of course the 2G, 3G, 4G or LTE cellular radio.
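
The article does not say how Apple implemented the feature, but the basic operation is easy to sketch. The fragment below is a minimal illustration, not anything Apple ships: it picks a random address and sets the “locally administered” bit so that the result cannot collide with a manufacturer-assigned unicast address. A real implementation would use a cryptographic random number generator rather than the C library’s rand().

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Pick a random MAC address. Setting bit 0x02 of the first byte marks it
       as "locally administered"; clearing bit 0x01 keeps it unicast. */
    static void random_mac(uint8_t mac[6])
    {
        for (int i = 0; i < 6; i++)
            mac[i] = (uint8_t)(rand() & 0xff);
        mac[0] = (uint8_t)((mac[0] | 0x02) & 0xfe);
    }

    int main(void)
    {
        uint8_t mac[6];
        srand((unsigned)time(NULL));   /* illustration only; use a crypto RNG in practice */
        random_mac(mac);
        printf("%02x:%02x:%02x:%02x:%02x:%02x\n",
               mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
        return 0;
    }

The hard part, as the developers at Apple surely found, is not generating the address but deciding when to rotate it.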

The very short range of NFC reduces the risk of it being used for tracking. The user may deliberately use NFC to provide information, for example to perform an electronic payment, but then the tracking risk is pretty similar to that of a credit card payment.

Bluetooth has a nominal range of a few meters, but spies can use powerful antennas and listen to Bluetooth at longer distances. Store or building owners can install networks of “Bluetooth Tracking Devices” to follow the movement of people and their cell phones around their stores. This can only be thwarted if we develop some form of Bluetooth privacy feature. Of course, the simplest solution is to turn Bluetooth off when not needed, but this is quite inconvenient. I like my phone to discover my car automatically, and would not like to have to manually turn Bluetooth back on when stepping into the driver’s seat! The most convenient way is probably some form of “passive discovery” mode, which used to be the default behavior for Windows PCs. In short, Bluetooth needs the same kind of attention as Wi-Fi.

The cellular radio poses a much more difficult problem. The connection between a cell phone and a mobile service starts with an authentication exchange. Each phone has a unique 15-digit “International Mobile Subscriber Identity” (IMSI). At the beginning of the connection, the phone presents its IMSI to the visited network. The Mobility Management Entity of the visited network uses the first 5 or 6 digits of the IMSI to identify the “home network” of the phone, and requests authentication data from the Home Subscriber Server (HSS) to verify that the phone is authorized to connect. It then uses this data in an authentication request sent to the phone. The authentication request also includes information provided by the HSS to authenticate the visited network. When the exchange concludes, the phone and the network have verified each other’s identities, and have negotiated keys to secure and encrypt the exchanges.
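
To make the routing step concrete, only the first few digits of the IMSI matter for finding the home network. A minimal sketch, with a made-up IMSI value and a 3-digit network code assumed (the network code is 2 or 3 digits depending on the country):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const char *imsi = "310150123456789";   /* hypothetical 15-digit IMSI */
        char mcc[4] = { 0 }, mnc[4] = { 0 };

        memcpy(mcc, imsi, 3);       /* mobile country code  */
        memcpy(mnc, imsi + 3, 3);   /* mobile network code  */
        printf("home network: MCC %s, MNC %s; subscriber: %s\n",
               mcc, mnc, imsi + 6);
        return 0;
    }

The country and network codes are all that is needed to locate the HSS; the subscriber part is precisely what a privacy-conscious design should avoid broadcasting.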

There are two obvious problems with that design. First, the mobile phone provides its identity to the network before it has had a chance to authenticate the network. This vulnerability is currently exploited by devices like the Stingray phone tracker, which various agencies use to spy on phone users without any need for warrants. The second problem is that the protocol discloses the actual identity of the device, when in fact the visited network only needs the identity of the home service.

The same problems used to happen in another context, network authentication using the EAP protocols. The IETF working groups provided a solution with methods like EAP-TLS, which use two distinct identities. The mobile presents to the visited network a privacy Network Access Identifier (NAI) that identifies the home server but hides the identity of the mobile, as defined in RFC 4282. The real identity is then sent, encrypted, to the home server, without being revealed to the visited network.
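
In practice the two identities could look something like the following; the realm name is invented, and only the first identity is ever visible to the visited network:

    Outer identity, presented in clear to the visited network:
        anonymous@home-operator.example
    Inner identity, carried inside the TLS tunnel to the home server:
        christian@home-operator.example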

The EAP-TLS solution cannot be easily implemented in the LTE world, if only because at this stage of deployment the LTE protocol will be very hard to change. But a Mobile Network Operator, or maybe a Virtual Mobile Operator, could still adopt something similar. Suppose for example that each mobile is provisioned with a set of “throw-away” IMSIs that can play the same role as the NAI of EAP-TLS. When the mobile wants to connect, it picks one of the throw-away IMSIs. The request is routed to the HSS of the MNO, which knows the relation between the throw-away ID and the real identity. Later, the throw-away IMSI is discarded, and the phone is provided with a new one. The connection completes, as expected by the visited network, but the identity is not disclosed, and devices like the Stingray become pretty much useless. Of course, this will need more engineering than a two-line analysis; there may well be some tricky accounting issues, and the provisioning protocols will have to be defined, but it seems that a willing provider could offer much better privacy than is currently the norm.
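
As a minimal sketch of the bookkeeping on the home operator’s side, consider the fragment below. Everything in it is invented for illustration; the real engineering, provisioning and accounting included, would be considerably more involved.

    #include <stddef.h>
    #include <string.h>

    /* The visited network only ever sees a throw-away IMSI. The home operator
       keeps the mapping to the permanent identity, and discards each
       throw-away value after a single use. */
    struct imsi_mapping {
        char throwaway_imsi[16];   /* presented to the visited network */
        char real_imsi[16];        /* permanent subscriber identity    */
        int  used;
    };

    /* Find the real subscriber behind a throw-away IMSI, and make sure the
       same throw-away value is never accepted twice. */
    const char *resolve_throwaway(struct imsi_mapping *table, size_t n,
                                  const char *seen_imsi)
    {
        for (size_t i = 0; i < n; i++) {
            if (!table[i].used &&
                strcmp(table[i].throwaway_imsi, seen_imsi) == 0) {
                table[i].used = 1;
                return table[i].real_imsi;
            }
        }
        return NULL;   /* unknown or reused identity: reject the attachment */
    }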

We are seeing the first move in the right direction. Let’s hope that after randomizing Wi-Fi we will see some protection for Bluetooth and for LTE. And more encryption to plug all these leaks detailed in the Ars Technica article!


DMARC or not, can email evolve?

Many years ago, I worked on email standards, developing for example a gateway between SMTP/TCP-IP and X.400. We used it in the very early years of the Internet, from 1983 to about 1990, when European research networks finally gave up on the OSI/ITU standards and fully embraced the Internet. Since then, there have been hits and misses. The big success was the standardization of MIME, which allowed multimedia content. Everybody uses it to send pictures or presentations as email attachments. The big failure has been security.

Email security has progressed in the last 20 years, but only very slowly. The IETF made several attempts at defining “secure mail” extensions. “Privacy Enhanced Mail” never got deployed. S/MIME got widely implemented, but hardly anybody uses it. PGP is popular among security experts and paranoid users, but the general public pretty much ignores it. The vast majority of email is spam, despite attempts at filtering, blacklisting, and origin verification protocols like SPF or DKIM. And then there is “phishing,” which turns out to be a very vicious threat.

Phishing operates by sending an email to the target, the “phish.” The email will typically contain some kind of trap, maybe an attachment that conceals a virus, or maybe a link to an infected web site. If the phish opens the attachment or follows the link, they are hooked. In large part, phishing is enabled by the very weak “origin control” in email messages. It is the same weakness that we used to exploit for practical jokes, such as sending this kind of email to a newly hired coworker:

From: Bill Gates
To: John Newbee
Subject: New Hire Welcome Party

Please join me for the new hire welcome party today at 5 pm in the Building 7 lobby.

Hilarity ensued when the poor guy prepared for and tried to join the non-existent party. (There is no Building 7 on the Microsoft campus.) The prank was very easy to do: just telnet to port 25 of an email server, and tell that server that there was an incoming mail from “billg@microsoft.com.” Back then, the message was just accepted, no questions asked. Things have progressed somewhat with the deployment of DKIM and SPF, and the “real” sender of the message can now be tracked. The same prank is still doable, but it requires creating a relay domain under the control of the prankster, plus a fair amount of configuration of that domain. If the sender field is displayed, the receiver will have more information (a sketch of the underlying SMTP dialogue appears a bit further below):

Sender: Postmaster <postmaster @ mailrelay.example.net>
From: Bill Gates <billg @ microsoft.com>
To: John Newbee
Subject: New Hire Welcome Party

Please join me for the new hire welcome party today at 5 pm in the Building 7 lobby.

In theory, an alert reader would notice that the “sender” and “from” fields do not match, and would discard the message as a joke. But in practice, many people will still be fooled. Some mail programs do not have a good convention for displaying the sender information, and some do not display it at all. Users may or may not understand the significance of the “sender” field.
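
For the record, here is roughly what the modern version of the prank looks like on the wire, the one that produces the mismatched “sender” and “from” fields shown above. Host and mailbox names are invented, and the numbered lines are the server’s replies:

    $ telnet mailrelay.example.net 25
    220 mailrelay.example.net ESMTP
    HELO prankster.example.net
    250 mailrelay.example.net
    MAIL FROM:<postmaster@mailrelay.example.net>
    250 OK
    RCPT TO:<john.newbee@microsoft.com>
    250 OK
    DATA
    354 Start mail input; end with <CRLF>.<CRLF>
    From: Bill Gates <billg@microsoft.com>
    To: John Newbee
    Subject: New Hire Welcome Party

    Please join me for the new hire welcome party today at 5 pm in the Building 7 lobby.
    .
    250 OK

SPF and DKIM authenticate the relay that spoke those commands, which is why the “sender” information can be verified; the “From” header typed inside the DATA section is taken on faith, and that is exactly the gap DMARC tries to close.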

Phishing is often a numbers game, in which the attackers may try many potential phishes in a target organization. They only need to hook one of them to get a beachhead inside the target, and proceed from there on their way to the trove of secrets that they covet. A single lapse of judgment by one user is enough to compromise the whole organization. And that’s why we now see a proposed escalation on top of SPF and DKIM, with DMARC.

DMARC allows domain owners to set a policy on the handling of email coming from their domains. It basically directs email recipients to use SPF and DKIM to check the origin of the email, and to verify that the “sender” domain matches the “from” information. If there is a mismatch, the recipients are instructed to either flag the message for further inspection, or possibly to reject it outright. The sending domain chooses between the “flag” and “reject” policies, and the receivers are expected to just follow orders.
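
Concretely, the policy is published in the DNS as a TXT record under the “_dmarc” label of the sending domain. The record below is only an illustration with a made-up domain; “p=quarantine” corresponds to the “flag” behavior described above, “p=reject” asks recipients to refuse the mail outright, and the “rua” tag says where to send aggregate reports:

    _dmarc.example.com.  IN TXT  "v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.com"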

In theory, DMARC would make some kinds of phishing much harder. In practice, it turns out to be incompatible with the existing practice of remailers and mailing lists. For example, if I send a message to the IETF mailing list, the recipients will see it appear as:

Sender: ietf <ietf-bounces@ietf.org>
From: Christian Huitema <huitema@microsoft.com>

This is exactly the same relay structure that could be abused by phishers, so of course it is incompatible with DMARC. All mail delivered by such mailing lists will thus be flagged as “potential phishing message” by the DMARC filter. Yahoo went one step further, and asked DMARC-compliant recipients to automatically reject such messages, causing a great uproar among mailing list managers. Just check the IETF list archive, and you will see hundreds of messages in a few days, most stating that “Yahoo broke mailing lists, they need to fix it,” with a few DMARC supporters asserting that “mailing lists are stuck in the 80’s, they need to evolve.”

The problem of course is that mailing lists cannot evolve easily. Many mailing list agents reformat the subject field or add some text to the email, which breaks DKIM. All mailing list operations involve sending the message through a relay that has no relation with the “from” address, which either breaks SPF or requires different “sender” and “from” addresses, breaking DMARC’s “same origin” rule. The only way for list agents to comply would be to completely rewrite the message and create a new origin within the mailing list domain, something like:

Sender: ietf <ietf-bounces@ietf.org>
From: Christian Huitema <ietf-christian-huitema@ietf.org>
Reply-To: Christian Huitema <huitema@microsoft.com>

Such a rewrite would comply with DMARC, because the origin of the message would be clearly in the IETF domain. Of course, we would want private replies to also work, and for that the mailing list agent would need to put the original sender address in the “reply-to” field. In short, we would get a “DMARC-compliant mailing list” that works, at some increased operational cost. But the problem is that the result is not really different from another form of phishing, in which the phisher creates addresses in his own domain, such as:

From: Christian Huitema <christian-huitema@all-your-mail-belong-to-us.info>

By encouraging mailing lists to rewrite addresses, we would encourage users to disregard the “email address” part of the “from” field, because it varies from list to list. Instead, users would just look at the common name, which can be easily forged by phishers. A strict DMARC policy would thus cause mail agents to create workarounds that make phishing easier, not harder.

I don’t know what will happen next. Reasonable minds would think that Yahoo would revert their DMARC policy from “reject” to “flag.” After all, once a message is flagged, inspection software can reasonably be expected to tell the difference between a reputable mailing list and a phishing domain. The software could learn the mailing lists that a user is subscribed to. At the same time, we may expect mailing list practices to evolve and not break DKIM, for example by placing the mailing-list-specific information in a new header instead of rewriting subject and message fields that are covered by the DKIM signature. All that will probably happen within the next few months, and hopefully phishing will become harder.

The main lesson of this debate is that changing an old standard is really hard. Email has been around since the beginnings of the Internet, and each little detail of the e-mail headers is probably there to serve some group of users somewhere. Security changes have to accommodate all these constituencies. That will be slow!

 


The Apple TLS bug, and coding guidelines

Right when the whole industry appears to be responding to the NSA spying by reinforcing its encryption defenses, we learn about a bug in Apple’s TLS implementation. There are many comments on the web about the genesis of this bug, such as this one from “Imperial Violet,” which provides details about the coding mistake that caused the error. I like this bug because, among other things, it exposes the need for good coding guidelines.

Of course, bugs happen. Having worked for some time on Windows, I know that. But critical procedures like the negotiation of TLS encryption keys ought to be tested. For me, one of the basic rules of system programming is, “if it was not tested, it probably does not work.” In this case, good engineering requires a unit test that checks the various error conditions that the procedure is supposed to detect. One wonders why the Apple processes do not call for that. But this bug was particularly silly, a duplicated line in the source code that caused a critical test to be ignored. According to the Imperial Violet web page, which quotes Apple’s published source code, the offending source code looks like this:

static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
                                 uint8_t *signature, UInt16 signatureLen)
{
    OSStatus err;

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;

fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
}

You can see the duplicated “goto fail” statement that causes subsequent tests to be ignored. This kind of error is normally caught by the compiler, but only if you call the compiler with a sufficiently restrictive option. Windows developers are required to use “warning level 4,” which would have produced a compile error because the “SSLHashSHA1.final” test is unreachable. Windows automated code checks would also have caught the error. Apparently, Apple does not require these kinds of precautions.
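
As an illustration, and assuming a plain command-line build of the offending file, the relevant warnings can be requested explicitly; the exact flags and what they catch vary with compiler versions:

    # Microsoft compiler: /W4 enables C4702 ("unreachable code"), /WX makes warnings fatal
    cl /W4 /WX sslKeyExchange.c

    # Clang: the unreachable-code warning is not part of -Wall and must be requested
    clang -Wall -Wextra -Wunreachable-code -Werror -c sslKeyExchange.c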

But apart from restating the rules about “writing secure code,” this bug also reminds me of the importance of good coding standards. Writing code is not just about encoding an algorithm, it is also about communicating with other developers, for example the folks who will have to update that code in a couple of years, or your buddies who will review the code before you check it in. We require code reviews before any check-in at Microsoft, and I assume they do the same at Apple, but obviously the reviewers did not find the bug, maybe because they were fooled by the indentation. This particular bug is actually a kind of optical illusion, because the duplicated line has the same indentation as the original line. This is where coding standards kick in.

Coding standards or coding guidelines often specify things like when to use upper case and lower case in the name of variables, what kind of indentation to use, what kind of comments, etc. This is a bit arbitrary, different teams will often have different guidelines, but having common guidelines makes code easier to review, and that is a big advantage. One such standard is to never use the “if () instruction;” construct, but to always use brackets for the conditional branch. For example, instead of writing:

    if ( cashAvailable < price)
        refuseTheSale();

We would require that the developer writes:

    if ( cashAvailable < price)
    {
        refuseTheSale();
    }

Apart from making the program more readable, it also avoids future maintenance errors, such as writing:

    if ( cashAvailable < price)
        AddToRefusedSalesStatistics(cashAvailable, price);
        refuseTheSale();

Instead of:

    if ( cashAvailable < price)
    {
        AddToRefusedSalesStatistics(cashAvailable, price);
        refuseTheSale();
    }

That is, this coding guideline reduces the risk of confusion between “instruction” and “block,” a classic pitfall in “if” statements. Applying it would have made the error quite obvious, and also harmless. The code would have looked like this:

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
    {
        goto fail;
    }
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
    {
        goto fail;
        goto fail;
    }
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
    {
        goto fail;
    }

The extra curly brackets add a bit to the line count, but they definitely make the code much easier to review, and the errors much easier to detect.

The other point that strikes me in that code is the use of the “goto” statement. I am probably in a minority here, but I have a lot of sympathy for Dijkstra’s statement, “Go To Statement Considered Harmful.” The code above could be easily written without using the goto statement:

static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
                                 uint8_t *signature, UInt16 signatureLen)
{
    OSStatus err;

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) == 0)
    {
        if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) == 0)
        {
            if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) == 0)
            {
                ...
            }
        }
    }
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
}

The code is easy to read, it will not terminate too early, but it requires extra indentation and that can be painful. If the “standard path” code represented by the ellipses is long, we end up having to count curly brackets over many lines of code, which is error prone. That’s the main reason why many of my colleagues prefer the “goto fail” pattern. On the other hand, if that is really a problem, it is always possible to factor the “common case” into a separate procedure. In any case, if you want to use goto statements, if you think that you know better than Edsger Dijkstra, you have to be extra careful!
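
Here is a minimal sketch of that last option, based on the abridged code above; the helper name is invented, and the point is only that the nested error checks stay short once the long “standard path” lives in its own procedure:

    /* The former ellipsis, factored out into a separate procedure. */
    static OSStatus HandleVerifiedKeyExchange(SSLContext *ctx);

    static OSStatus
    SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
                                     uint8_t *signature, UInt16 signatureLen)
    {
        OSStatus err;

        if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) == 0)
        {
            if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) == 0)
            {
                if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) == 0)
                {
                    err = HandleVerifiedKeyExchange(ctx);   /* the standard path */
                }
            }
        }
        SSLFreeBuffer(&signedHashes);
        SSLFreeBuffer(&hashCtx);
        return err;
    }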


On IPv6 security, SEND, CGA, and alternatives

Some time ago, in 2005, Tuomas Aura of Microsoft Research proposed a way to embed a “cryptographic proof” in IPv6 addresses. The idea was to tie the address to a public key, by encoding in the IID part of the address the hash of a public key. This would allow IPv6 nodes to prove ownership of the address by signing a message with their private key, and then showing how the IID part of the address is derived from the public key. Tuomas envisaged a set of applications, e.g. encryption of data using IPSEC, or secure roaming with IP mobility, but the IETF only picked up the idea in the context of “Secure Neighbor Discovery” (SEND), defined in RFC 3971. The address format defined by Tuomas came to be known as “Cryptographically Generated Addresses” (CGA), defined in RFC 3972. SEND and CGA looked like good ideas at the time, but 9 years have passed and we don’t see a lot of adoption.

More recently, another researcher, Hosnieh Rafiee, revisited that design and proposed an alternative way to tie a public key to an IPv6 address, which she calls “A Simple Secure Addressing Scheme for IPv6 AutoConfiguration” (SSAS). She also wrote a draft presenting her view of the issues with SEND, titled “Recommendations for Local Security Deployments.” I am happy to see researchers working on IPv6 and revisiting the issues that we studied years ago, but I have a problem with this specific proposal. The SSAS algorithm is insecure, as its strength is limited to exactly 64 bits, way too short to prevent spoofing. The local security arguments against CGA and RA-Guard are specious, and in at least one case just wrong. For me, it would be much more pleasant to just help a young researcher, but it is one of those cases where I have to write a review that “rejects the paper.”

Draft-rafiee-6man-local-security-00.txt appears to be a motivating draft, designed to encourage adoption of SSAS. Hosnieh has to make two points: first, that the current solutions for local network security are inadequate because SEND is not deployed, and second, that the reason SEND is not deployed is a set of issues with the CGA specification, which would be fixed by her SSAS proposal. The current solution for local security is to rely on link-level security procedures like 802.1X or Wi-Fi Protected Access (WPA), complemented with filtering of router advertisements by the local switches, defined in RFC 6105, IPv6 Router Advertisement Guard (RA-Guard). Of course, applications should also use end-to-end security protocols like IPSEC or TLS.

Section 3 of draft-rafiee-6man-local-security-00.txt purports to demonstrate the insufficiencies of the RA-Guard solutions. I found it somewhat confusing. Someone else may want to review it, but I believe that some of the statements made there are just wrong. For example, saying that “this feature is supported only in the ingress direction” is bizarre; the whole point is for switches to filter spurious RAs from misconfigured local hosts, or from local hosts infected by a virus. Similarly, saying that this does not protect tunnels is also a bit bizarre, since point-to-point tunnels are protected by their own configuration methods.

 

Section 4.1.1.1 of draft-rafiee-6man-local-security-00.txt asserts that the IPR restrictions on CGA have prevented the deployment of SEND. This is a bit specious, since the IPR owners, Ericsson and Microsoft, have both granted royalty-free licenses. It is true that some open source developers are put off by any licensing condition, even royalty-free, but there are plenty of examples of protocols with similar restrictions that have been widely adopted. In any case, the IPR situation cannot explain why SEND was not deployed by Microsoft, which owns one of the CGA patents.

 

Section 4.1.1.3 of draft-rafiee-6man-local-security-00.txt purports to demonstrate an attack against CGA. The discussion on the mailing list showed that this attack relies on a downgrade of the CGA SEC value. But since the SEC value is actually part of the address, the attack cannot actually be used against CGA+SEND. Several people pointed that out, but Hosnieh kept reasserting the same arguments, and the discussion was really not productive.

 

The valid argument against CGA is that it is only secure if we use non-zero values of the SEC parameter, but that doing so requires expensive computation. This consumes too much power, and may deplete a mobile device’s battery. Nobody doubts that, but it is pretty much a consequence of the short size of the host identifier field. When it comes to cryptography, 64 bits is not a large number. 80 bits might be acceptable now, but we should really only use 128 bits in the future. CGA anticipated that problem with the SEC field, which complements the limited-size host identifier with a requirement that the hash start with a specified number of zeroes. This is expensive but proven. Bitcoin, for example, uses a similar algorithm for its “proof of work.”
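
In RFC 3972 terms, a non-zero SEC value requires that the 16×SEC leftmost bits of a second hash of the public key be zero, so address generation keeps retrying, on average 2^(16×SEC) times, until it finds a qualifying hash. A minimal sketch of that test, with the hash computation itself left out:

    #include <stddef.h>
    #include <stdint.h>

    /* Return 1 if the first 16*sec bits of the hash are all zero. Finding a
       public key and modifier that pass this test is the "proof of work" that
       compensates for the short 64-bit interface identifier. */
    static int cga_sec_check(const uint8_t *hash, size_t hash_len, unsigned sec)
    {
        size_t zero_bytes = 2 * sec;            /* 16*sec bits = 2*sec bytes */
        if (zero_bytes > hash_len)
            return 0;
        for (size_t i = 0; i < zero_bytes; i++) {
            if (hash[i] != 0)
                return 0;
        }
        return 1;
    }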

 

The purported advantage of SSAS is that the IID is much easier to compute than CGA hashes with non-zero SEC numbers. That is true, but the corollary is that SSAS is subject to an obvious attack. According to <draft-rafiee-6man-ssas-08.txt>, the “secure” IP address would be derived by concatenating the local prefix and a host ID derived from a public key. The public key is normally an ECC key longer than 196 bits. The host ID is derived as follows:

 

2. Divide the public key array of bytes into two half byte array (see figure 1). Obtain the first 4 bytes from the first half byte array and call it the partial IID1. Obtain the first 4 bytes of the second half byte array and call this the partial IID2. (Dividing the public key is only for randomization)

 

3. Concatenate partial IID1 with partial IID2 and call this the IID.

 

The problem with that approach is obvious. An attacker can simply generate a large number of public keys, and build a catalog of 64-bit identifiers. In at most 2^64 attempts, the attacker will have obtained an IID that matches any target address. Since there is no hashing or proof of work involved, there is no way to improve the security in the future.

 

This is a fundamental flaw, and I wonder why Hosnieh continues to push this flawed approach. The only significant change between the initial SSAS algorithm and draft 08 is to replace the use of RSA public keys by ECC keys, probably based on the belief that ECC keys cannot be generated so easily. But picking a new ECC key is actually rather simple. RFC 6090 describes it in section “5.3.1. Keypair Generation:”

 

The private key z is an integer between 1 and q-1, inclusive, generated uniformly at random. (See Appendix B regarding random integers.) The public key is the group element Y = alpha^z. Each public key is associated with a particular parameter set as per Section 3.3.

 

Given that, the attack by generation of 2^64 keys appears well within the means of today’s attackers. Wait a few turns of Moore’s law, or implementations on GPU or FPGA, and SSAS will be trivially broken by random hackers.

 


A Server in Every Home

Our representatives just voted down the amendment to defund the NSA domestic monitoring program. The good news is that 205 representatives had the courage to vote “Yes,” but the leaders of both Democrats and Republicans supported the NSA. What that means is clear: the spying will continue. We can expect the “intelligence establishment” to push for ever more powers. In the 90’s, they were using pedophiles as the scarecrow that justified controlling encryption. Since 2001, they have been using the specter of Bin Laden. Something else will come up, to justify ever greater powers, ever more monitoring. There will always be more billions of dollars to buy big computer centers, and whatever new equipment becomes fashionable. The “intelligence industry complex” will feast on that, and of course, a fraction of that budget will be used to lobby Congress and justify more spending and more intrusions.

This move to total surveillance is also a byproduct of “The Cloud.” In the last decade, we saw the Internet move towards centralization. Where we initially had a web of small servers interconnected by hyperlinks, we now have a very small number of big computer centers managing services for millions of users. Facebook, Google, Apple, Microsoft, Amazon and a few smaller players centralize much of the activity. They perform their own “monitoring” of our electronic communications, so they can better “serve advertisements.” Each web page now comes equipped with a variety of trackers, Google reads the content of each message sent to Gmail, and Facebook can analyze our posts. No wonder the spooks became jealous, and wished they had the same information. Such big repositories are just too tempting. The intelligence lobbies will get it, even if that means creating new laws and secret courts. For the intelligence services, the cloud is just transparent.

The obvious conclusion is that we cannot have big centralized servers and privacy. If we care about privacy, we must get away from “the cloud.” The cloud companies will provide our data to whoever asks with a proper court order, and the orders from the secret court will specify that they should not tell us. The only way around that is to run our own servers, in our own houses. If we want a really decentralized Internet, we need A Server in Every Home!

Of course, there are reasons why people don’t run their own servers today. Drew Crawford posted a guide on running your own mail server: NSA-proof your e-mail in 2 hours. This is very well done, but it does involve configuring a Linux server and setting up a number of programs. If we want to get a server in every house, we need to do better than that. The server has to be dead simple to deploy and run. The development team of the Windows Home Server at Microsoft spent a lot of time dealing with these issues, making it as simple as possible. We know that there is basic software available, we understand the concept, and we understand the usability requirements. It will probably require a year or so of work by a dedicated team to assemble an “easy to use” appliance and work out the kinks, but it is doable. We can come to a point where servers can indeed be deployed in every home. Maybe there is hope after all.

 

 


 


Let’s build a cookie exchange

Bruce Schneier’s post on Internet privacy hits the nail on the head. He is not the first one to make the point. Scott McNealy did that in 1999. Reporters were asking questions about the privacy implications of Sun’s Java/Jini technology, and he quipped “you have no privacy, get over it.” That was 14 years ago. Since then, we have seen “web 2.0” technology drive surveillance to ever greater extremes, in the name of better advertisements. Bruce is making the strong argument that, by now, individual actions to protect privacy are futile. The only real solution would be political: that our elected representatives pass laws that forbid such surveillance. But at the same time, there is so much “synergy” between surveillance by governments and tracking by advertisers that such laws are very unlikely to get passed, let alone enforced.

I am all for political action and trying to pass such laws, but I think we should also start developing “protest technology” that actually fights back against tracking by advertisers. My pet project would be a “cookie exchange.” The idea is to mess with the tracking, so that the services end up collecting lots of fallacious information. In effect, this will poison the data collected by trackers, diminish its value, and hopefully make tracking much less profitable.

Tracking services get developers to insert a reference to their services in web pages, typically in exchange for better analytics, or as part of a display advertisement service. When we visit those web pages, the tracking services get their own cookies back. The same tracking cookie identifies a given user on many web pages, allowing for correlation and profiling. The standard defense is to “block third-party cookies,” but that’s not always available. In any case, blocking cookies only reduces the total amount of information in the database.

Let’s suppose now that whenever a browser receives a cookie from a tracking site, it sends a copy of that cookie to our “cookie exchange,” and receives back a cookie that was allocated to somebody else. The next time the browser accesses a web page, it serves back the exchanged cookie instead of the real one. Voilà, the tracking service starts getting confused: it will believe that the page was accessed by that other person. If many people play that game, the database and the statistics will be seriously flawed.
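
To make the idea a bit more concrete, here is a minimal sketch of the core swap, assuming a simple in-memory pool of cookies per tracker domain. All names and sizes are invented, and a real exchange would also need the filtering and lifetime rules discussed below:

    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>

    #define POOL_SIZE 1024

    /* One pool per tracker domain: clients deposit the cookie they just
       received and walk away with one deposited earlier by somebody else. */
    struct cookie_pool {
        char *domain;                 /* e.g. "tracker.example.com" */
        char *cookies[POOL_SIZE];
        size_t count;
    };

    /* Deposit "incoming" and return a cookie deposited by another client.
       The caller owns (and eventually frees) the returned string. */
    char *exchange_cookie(struct cookie_pool *pool, const char *incoming)
    {
        if (pool->count < POOL_SIZE) {
            pool->cookies[pool->count++] = strdup(incoming);
            if (pool->count == 1)
                return strdup(incoming);   /* nothing to swap with yet */
            return strdup(pool->cookies[rand() % (pool->count - 1)]);
        }
        size_t pick = (size_t)rand() % POOL_SIZE;
        char *outgoing = pool->cookies[pick];
        pool->cookies[pick] = strdup(incoming);
        return outgoing;
    }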

Of course, we need to get a few engineering details right. For example, we have to decide how often the local cookie should be swapped with the exchange. We have to find the right way to design cookie exchange plug-ins for the browsers. We have to look at some filtering procedure to avoid swapping the “good” cookies, such as the access tokens to our bank accounts. The exchange will have to understand the lifetime of cookies, so as to avoid serving obsolete ones. If we cannot modify the browsers, we may want to consider implementing the exchange inside a web proxy.

There will be a cat-and-mouse aspect to all that, with advertisers trying counter-measures, and exchange developers hacking back. But all in all it sounds like fun. If you are interested in such a project, drop me an e-mail!


Looking back at Teredo, IPv6 deployment, and protocol design

I just read the paper on Teredo published in the Computer Communication Review: Investigating the IPv6 Teredo Tunnelling Capability and Performance of Internet Clients, by Sebastian Zander, Lachlan L. H. Andrew, Grenville Armitage, Geoff Huston and George Michaelson. This is a very well done study, which used image links on web sites to test the capability of clients to use IPv4 and IPv6, and to distinguish, within IPv6, native connections from Teredo connections. Their conclusion is that many more hosts would use IPv6 if Microsoft shipped Teredo in the “active” instead of the “dormant” state in Windows Vista and Windows 7, but that communications using Teredo incur some very long delays, at least 1.5 seconds more to fetch a page than with native IPv4 or IPv6 connections. Both of these issues can be traced to specific elements of the protocol design, and especially to our emphasis on security over performance.

I proposed the first Teredo drafts back in 2000, shortly after joining Microsoft. The idea was simple: develop a mechanism to tunnel IPv6 packets over the IPv4 Internet, in a fashion that works automatically across NAT and firewalls. It seemed obvious, but was also quite provocative – the part about working automatically across firewalls did not sit well with firewall vendors and other security experts. In fact, this was so controversial that I had to revise the proposal almost 20 times between July 2000 and the eventual publication of RFC 4380 in February 2006. Some of the revisions dealt with deployment issues, such as minimizing the impact on the server, but most were answers to security considerations. When Microsoft finally shipped Teredo, my colleagues added quite a few restrictions of their own, again largely to mitigate security issues and some deployment issues.

The connection between Teredo clients and IPv6 servers is certainly slowed down by decisions we made in the name of security. When a Teredo client starts a connection with an IPv6 server, the first packet is not the TCP “SYN,” but rather a Teredo “bubble” encapsulated in an IPv6 ICMP echo request (ping) packet. The client will then wait to receive a response from the server through the Teredo relay closest to that server, and will then send the SYN packet through that relay. Instead of a single round trip, we have at least two, one for the ICMP exchange and another for the SYN itself. That means at a minimum twice the setup delay. But in fact, since the client is dormant, it will first send a qualification request to the Teredo server, to make sure that the server will be able to relay the exchange, thus adding another round trip, for a total of three. The server often happens to be quite overloaded, and queuing delays in the servers can add quite a bit of latency. This is very much what the study is demonstrating.

We could have engineered the exchange for greater speed. For example, instead of first sending a ping to the server, we could have just sent the TCP SYN to the server, and used the SYN response to discover the relay. This would have probably increased the connection success rate, as many servers are “protected” by firewalls that discard the ICMP packets. But at the time we convinced ourselves that it would be too easy to attack. A hacker could send a well-timed spoofed TCP SYN response and hijack the connection. The randomness of the source port number and of the initial TCP sequence number provide some protection against spoofing, but these are only 16 and 32 bits, and that was deemed too risky. The ICMP exchange, in contrast, can carry a large random number and is almost impossible to spoof by hackers not in the path. So the protocol design picked the “slow and secure” option.

The connection to IPv6 hosts is just an example of these design choices for security over performance. There are quite a few other parts of the protocol where we could have chosen more aggressive options, using optimistic early transmission instead of relying on preliminary synchronization. But we really wanted to deliver a secure solution; secure computing was indeed becoming a core tenet of the Microsoft culture by that time. We were also concerned that if Teredo was perceived as insecure, more and more firewalls would simply block it, and our deployment efforts would fail. All of these were valid reasons, but the long latencies observed in the study are also an impediment to deployment. If I did it again, I would probably tilt the balance a bit more towards the side of performance.

But then, the really cool part of the study is their point that removing some of the restrictions on Teredo would almost triple the number of hosts capable of downloading Internet content over IPv6, adding IPv6 capability to 15-16% of Internet hosts. That would be very nice, and I would be happy to see that!
