Implementing Cubic congestion control in QUIC

Back in January, I observed that free-for-all innovation on congestion control might be problematic. I was invited to discuss that in front of the TCPM working group during the IETF meeting in Montreal this July. After presenting the issue in a few slides, I saw a number of TCP experts line up at the microphone and explain to me that I was wrong. Their main argument was that if this were true, the Internet would have collapsed long ago, due for example to video streaming, or to servers using very large initial congestion windows. OK. So I went on to tune the congestion control algorithms used in Picoquic, my implementation of the QUIC protocol. The first step was to make my implementation of Cubic actually work.

The QUIC recovery draft specifies use of the Reno congestion algorithm, but Reno is old and does not perform very well on high-speed, high-latency links. Cubic was developed to provide better stability and scalability. It is implemented in a number of TCP stacks, including Windows and Linux, and has probably become the most used algorithm on the Internet – at least until BBR catches on. Cubic is described in several academic papers and in RFC 8312, and I based my initial implementation on the text in that RFC. It kind of worked, but there were three issues: the implementation of the TCP friendly algorithm, making Hystart work with noisy RTT measurements, and recovery from spurious retransmissions.

Revisiting Cubic’s TCP Friendly definition

The RFC explains that Cubic, like Reno, can operate in two modes: slow start and congestion avoidance. In congestion avoidance mode, Cubic uses a time-based formula to grow the congestion window as:

W_cubic(t) = C*(t-K)^3 + W_max
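
As a side note, here is a minimal sketch of that computation in C, using the constants and the definition of K from RFC 8312 (C = 0.4, beta_cubic = 0.7); the function and variable names are mine, not those of picoquic:

    #include <math.h>

    #define CUBIC_C 0.4      /* aggressiveness constant, per RFC 8312 */
    #define CUBIC_BETA 0.7   /* multiplicative decrease factor, per RFC 8312 */

    /* Window predicted by the cubic formula, in MSS units, at time t
     * (seconds) since the start of the current congestion avoidance epoch.
     * w_max is the window reached just before the last congestion event. */
    static double cubic_window(double t, double w_max)
    {
        /* K is the time at which the cubic curve comes back to w_max. */
        double k = cbrt(w_max * (1.0 - CUBIC_BETA) / CUBIC_C);
        double dt = t - k;

        return CUBIC_C * dt * dt * dt + w_max;
    }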

The window grows relatively fast at the beginning of the interval, stays relatively stable when “t” is near the value “K”, and then grows quickly to probe for available bandwidth. During the initial part of that period, the window could well be lower than what would have been computed using TCP Reno. The RFC describes that as the “TCP friendly region”. The RFC suggests computing the window that Reno would have found and using it if it is larger than what the Cubic formula provides. I implemented that, and the results were surprising. The traces showed that the congestion window was swinging widely, being reevaluated after pretty much every received acknowledgement. After a bit of head scratching, I concluded that this was caused by the formula used in the draft to compute the “TCP friendly” window:

W_est(t) = W_max*beta_cubic + [3*(1-beta_cubic)/(1+beta_cubic)] * (t/RTT)

This formula captures the way New Reno grows the TCP window by one packet per RTT, but the text in the RFC does not specify what RTT value shall be used. In my initial implementation, I simply used the “smoothed RTT” computed by the QUIC recovery algorithm, but this smoothed RTT is of course not constant. The randomness in the RTT measurement causes it to vary up and down, which caused inversely proportional changes in the congestion window. I suppose that we could use an arbitrary constant value instead of the smoothed RTT, such as the same RTT value used to compute the Cubic coefficient K, but then the formula would not describe the actual result of the New Reno evaluation. In my code, I fixed the issue by simply running the New Reno computation in parallel to Cubic, processing the received acknowledgements and increasing the window accordingly.
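
Here is a rough sketch of that fix, with invented names; it is my reading of the approach rather than the actual picoquic code. A Reno-style window is grown on every acknowledgement, independently of the cubic formula, and the connection uses whichever of the two estimates is larger:

    /* Illustrative state only; both windows are initialized to the
     * congestion window at the start of the congestion avoidance epoch. */
    typedef struct st_cubic_tcp_friendly_t {
        double w_cubic; /* window from the cubic formula, in bytes */
        double w_reno;  /* window maintained the New Reno way, in bytes */
    } cubic_tcp_friendly_t;

    /* Called for every acknowledgement received in congestion avoidance.
     * New Reno adds about one MSS per RTT, i.e. MSS * bytes_acked / window
     * per acknowledgement, with no explicit RTT estimate needed. */
    static double cubic_update_on_ack(cubic_tcp_friendly_t* s,
                                      double bytes_acked, double mss)
    {
        s->w_reno += mss * bytes_acked / s->w_reno;

        /* The "TCP friendly" rule becomes: use the larger of the two. */
        return (s->w_cubic > s->w_reno) ? s->w_cubic : s->w_reno;
    }

Because the Reno window is driven directly by acknowledgements, there is no smoothed RTT in the loop and nothing left to oscillate.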

Hystart and noisy RTT measurements

The RFC mentions that implementations might use the hybrid slow start algorithm, usually referred to as Hystart. The algorithm itself is not defined in the RFC, but it can be found in a research paper. In addition to exiting slow start on the first packet loss, the way New Reno does, Hystart also monitors the evolution of the RTT to estimate the bandwidth and the target window, and exits slow start before any packet is lost. This can result in connections that perform very well and often conclude without losing any packets. In my initial tests, using Hystart resulted in significant performance improvements most of the time. However, there were also cases in which Hystart would exit slow start too soon, resulting in degraded performance. A look at the RTT measurements explains why:

[Figure: RTT measurements over the course of the session, in microseconds]

The graph above shows a series of RTT measurements. The X axis shows the time of the measurement since the start of the session, the Y axis shows the RTT measured as the delay between sending a packet and receiving the acknowledgement – all values are in microseconds. We can visually identify 6 intervals in the session: 3 intervals in which the RTT remains low, and 3 intervals during which congestion causes higher delays. The interesting part is the high variability of the measurements during the “low delay” intervals – it might be due to specificities of the Wi-Fi and LTE links used in the experiment, or to queuing delays in the UDP sockets.

Traditionally, delay-based algorithms like Hystart do not account for this variability. They typically reference an “RTT min”, measured as the lowest delay since the beginning of measurements. When the measured delay is larger than that minimum, they infer the beginning of congestion. But if the delay varies somewhat randomly, as in the example above, the RTT min will ratchet down each time a lucky packet experiences a low delay, and congestion will be detected too soon when an unlucky packet gets a randomly high delay.

My solution was to implement two filters. Instead of measuring the RTT “min”, I measure an RTT “mode” as:

RTT_mode = min(RTT_mode, max(RTT_n, …, RTT_n-4))

Then, I would detect congestion when 5 consecutive RTT measurements are all larger than the mode. This stabilized my implementation of Hystart, and greatly reduced the number of “false exits”.
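
A compact sketch of those two filters in C, with invented names, assuming raw RTT samples are fed in one at a time (in microseconds) into a zero-initialized structure:

    #include <stdint.h>
    #include <stdbool.h>

    #define RTT_FILTER_DEPTH 5

    typedef struct st_rtt_filter_t {
        uint64_t samples[RTT_FILTER_DEPTH]; /* sliding window of raw RTTs */
        int nb_samples;
        uint64_t rtt_mode;      /* filtered value, used instead of RTT min */
        int nb_above_mode;      /* consecutive samples above the mode */
    } rtt_filter_t;

    /* Returns true when 5 consecutive measurements exceed the mode,
     * which is the signal to exit slow start. */
    static bool hystart_test_rtt(rtt_filter_t* f, uint64_t rtt)
    {
        uint64_t max_recent = rtt;

        /* Keep the last 5 samples and compute their maximum. */
        f->samples[f->nb_samples % RTT_FILTER_DEPTH] = rtt;
        f->nb_samples++;
        for (int i = 0; i < RTT_FILTER_DEPTH && i < f->nb_samples; i++) {
            if (f->samples[i] > max_recent) {
                max_recent = f->samples[i];
            }
        }

        /* First filter: the mode only ratchets down when five samples in
         * a row are all low, so a single lucky packet cannot lower it. */
        if (f->nb_samples >= RTT_FILTER_DEPTH &&
            (f->rtt_mode == 0 || max_recent < f->rtt_mode)) {
            f->rtt_mode = max_recent;
        }

        /* Second filter: only declare congestion after five consecutive
         * samples above the mode, so a single unlucky packet is ignored. */
        if (f->rtt_mode != 0 && rtt > f->rtt_mode) {
            f->nb_above_mode++;
        } else {
            f->nb_above_mode = 0;
        }

        return f->nb_above_mode >= RTT_FILTER_DEPTH;
    }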

Spurious retransmissions

If you look closely at the graph of RTT above, you will see a number of vertical bars with multiple measurements happening at almost the same time. This is a classic symptom of “ACK compression”. Some mechanism in the path causes ACK packets to be queued and then all released at the same time. I copied into my implementation the packet retransmission algorithms described in the QUIC recovery specification, but those algorithms were not designed to deal with this kind of ACK compression. The queuing of the acknowledgments caused packets to be retransmitted too soon. The ACK was received some time later, causing the retransmission to be classified as “spurious”, but since the Cubic RFC does not say anything about spurious retransmissions, I had not written any code to deal with them. The results were not good:

[Figure: congestion window ratcheting down after each spurious retransmission]

Each time a packet is retransmitted, the Cubic algorithm exits congestion avoidance mode and recomputes its coefficients. We see in the graph that each spurious retransmission is mistaken for a congestion signal, causing the congestion window to “ratchet down”. Performance becomes really bad.

The solution is to implement some form of spurious retransmission recovery. Suppose that a retransmission caused the congestion window to shrink and is later recognized as spurious. Spurious retransmission recovery will reset the congestion control to the state that it would have had if the retransmission had not happened. In Cubic, that means restoring key parameters like the coefficients K and W_max or the beginning of the epoch to their values before the retransmission. After doing that, Cubic performs much better:

[Figure: congestion window evolution with spurious retransmission recovery enabled]
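
Here is a minimal sketch of that save-and-restore logic; the structure and field names are invented for illustration and only approximate what a real Cubic implementation needs to remember:

    #include <stdint.h>

    /* The Cubic parameters that a congestion event overwrites. */
    typedef struct st_cubic_snapshot_t {
        double w_max;         /* window level at the last congestion event */
        double k;             /* time offset of the cubic origin */
        uint64_t epoch_start; /* start time of the current epoch */
        double cwin;          /* current congestion window, in bytes */
    } cubic_snapshot_t;

    /* On packet loss: remember the pre-loss state, then apply the normal
     * Cubic reduction. (K would be recomputed from the new w_max.) */
    static void cubic_notify_loss(cubic_snapshot_t* state,
                                  cubic_snapshot_t* saved, uint64_t now)
    {
        *saved = *state;
        state->w_max = state->cwin;
        state->cwin *= 0.7;          /* beta_cubic from RFC 8312 */
        state->epoch_start = now;
    }

    /* When the retransmission turns out to be spurious: put the algorithm
     * back in the state it had before the false congestion signal. */
    static void cubic_notify_spurious(cubic_snapshot_t* state,
                                      const cubic_snapshot_t* saved)
    {
        *state = *saved;
    }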

During the QUIC Interop sessions, we run tests comparing the performance of QUIC and HTTP/3 to the performance of a classic HTTP/2 server. On my server, I do that by comparing the performance of Picoquic to that of an Apache2 server running on the same machine. The test is officially described as: Download objects named 5000000 and 10000000 on separate connections over both HTTP/2 and HTTP/3 (0.9 if H3 not available). Overall transfer time should be no worse than 10% slower than HTTP/2.

With the fixes described above, the QUIC server is definitely not 10% slower than the HTTP/2 server. In fact, it is close to twice as fast. Innovation is good!


He said we run ads, but he didn’t say that’s why we have fake news

“Senator, we run ads.” That was Mark Zuckerberg’s response to Orrin Hatch, senator from Utah, asking how Facebook could possibly sustain a business without having customers pay. The senator was then mocked as an old guy who obviously did not have a clue about the Internet economy. Well, not so fast. Orrin Hatch is definitely not young, but you should not laugh at his question. Zuckerberg’s response actually points at the root cause of the “fake news” problem. You get fake news because the algorithms that manage your news feed realize that you like it, and that brings more ad revenues.

Back in the late 90’s, the budding Internet economy had a problem. Nobody was paying for content. In theory, people were not unwilling to pay, and in fact at the time people routinely paid to get a copy of their daily newspaper. But that did not work on the Internet. If you interrupted the readers and asked them for a payment, they would typically close the page and go to another site. This is well explained in a paper by Andrew Odlyzko, “The case against micropayments“. Web sites and the Silicon Valley venture capitalists behind them quickly realized that to make money, they had to rely on advertisements.

I did not find advertising particularly shocking back then. Lots of newspapers were supported by a mix of subscriptions, sales and advertisements. Having an ad banner on a web page felt very much like having a printed ad in a newspaper. But that was before Google. In the old print world, advertisers would try to place the ads near content that attracted the desired type of readers. Fishing gear would be advertised in a fishing magazine, and fancy dresses in a fashion magazine. The Internet started like that, and even Google started the same way. They would display ads for fancy dresses if you searched for fashion, and ads for fishing gear if you searched for fishing stories. The ad was still tied to the content. In retrospect, these were the good old days.

Soon after that, Google realized that they could make much more money if instead of targeting the content, they targeted the user. If they knew you were spending your weekends fishing, it did not matter whether you were currently researching references for the next term’s paper. They could still feed you ads about fishing gear, and there was a pretty good chance that you would click on them. Google then theorized that the more they know about you, the more “relevant” ads they could feed you. They built an awesome infrastructure to learn as much as they could about you, measuring your web activity, your cell phone activity, and even installing gadgets to monitor your home. I imagine that the next step will be some kind of body implant. That is bringing them a lot of money.

Remember the line of the Sean Parker character in the “Social Network” movie, telling the young Zuckerberg that he should aim bigger than “a chain of very successful yogurt shops”? That was all about ads and revenues and competing with Google. It turns out that this game of “relevant advertisements” can only be played if you are very big, if you have lots of users, and if you can observe these users frequently enough to learn a lot about them. Facebook embraced that vision, and as they say in business talks, they executed rather well. After the company was listed on the stock exchange, the investors and Zuckerberg got very rich. But then, lofty stock prices can only be sustained if the company keeps growing.

When you have captured the vast number of users that Facebook has, further growth cannot be accomplished merely by getting more users. At some point, that number of users will reach a plateau. To keep growing revenues, you need users to spend more time on Facebook, what the business folks call “increased engagement”. One way they do that is by manipulating the content that they present, so that you are ever more likely to read the next page, and the next, and in the process get more and more occasions to see advertisements, click on them, and bring revenues to Facebook. Google, YouTube and Twitter follow the same model. You could say that they are just feeding users the content they like, much like a successful novelist writes books that his readership likes, but this is quite different.

In the big platforms like Facebook, the choice of “what to show next” is driven by automated algorithms that constantly learn about you, but also about people like you. They don’t just show you the next piece of content that you will like; they aim further, like a chess master planning several moves ahead. Out of the many available pieces of content that they know you will like to read, they know which ones are more likely to make you read the next one after that, and so on. These algorithms appear to love propaganda, because propaganda plays with your mind. The more propaganda you read, the more convinced you become, and the more likely you are to read the next batch of similar stories. If you are fed a constant diet of such propaganda, you will become a true believer. For the Facebook algorithm, it does not matter whether the propaganda is about racism, or vaccines, or veganism. It also does not matter whether the news is true or fake, as long as it keeps you engaged and clicking.

At their best, services like Facebook unite people and let them keep in touch with distant friends and family. But that alone would not bring enough revenues. Advertisements need data about you, what has come to be known as Surveillance Capitalism. Advertisement revenues only grow if you look at the platform often. Revenue grows best if you become addicted. News from grandma will not suffice for that, but propaganda and fake news will. Like all addictions, it will sicken your mind, but Facebook’s algorithms do not care. The senator’s question about business models uncovers the root of the problem: yes, Facebook runs ads. But depending on ads leads to surveillance, addiction and fake news. And of course Mark Zuckerberg would rather not say that.


Deploying AQM and enabling transport innovation

In a previous post, I pointed out that the move to “application level” transports enables transport innovation and bypasses the traditional gatekeepers, the OS developers. The one big fear is that applications would seek greater performance through selfish innovation, and that the combined effect of many large applications doing that could destabilize the Internet. The proposed solution is to have networks manage traffic, so each user gets a fair share. But the big question is whether this kind of traffic management can in fact be deployed on the current Internet.

Let’s not forget that innovation is generally a good thing. The congestion control algorithms currently used are certainly not the best that we can think of, and in fact many experimental designs have demonstrated better latency, or better throughput, or both; this paper on “protocol design contests” provides many examples. But innovation is not fair. For example, a protocol that tracks transmission delays will not behave the same way as a protocol that tracks packet losses, and it is very hard to tune them so that they compete fairly. This will be a problem if some applications feel penalized, and the competition could degenerate into a catastrophic race to the bottom. But if the network is designed to enforce fairness, then it can enable innovation without this major inconvenience. That’s why I personally care about traffic management, or active queue management.

Before answering whether the AQM algorithms can be deployed, let’s look at where we need them most. Congestion appears first at bottlenecks, where the incoming traffic exceeds the available capacity. Bottlenecks happen on customers’ downlinks, when too many devices try to download too much data simultaneously, or on customers’ uplinks, when applications like games, VPNs and video chat compete. They happen on wireless links when the radio fades, or when too many users are using the same cell. They may happen on the backhaul networks that connect cell towers to the Internet. Thanks to large-capacity optical fibers, congestion does not happen so often anymore on transoceanic links, but it still happens in some countries where the international connections are under-provisioned. Is it possible to shape traffic at all these places?

In fact, these traffic management techniques are already used in many of these places. Take wireless networks. Wireless networks already manage bandwidth allocation, so that the scarce resource can be fairly shared between competing users. If an application tries to send too much data, it will only manage to create queues and increase latency for that application’s user, without impacting other users too much. In that case, applications have a vested interest in using efficient congestion control mechanisms. If they do, the application will appear much more responsive than if they don’t.

Traffic management is also used in cable networks, at least in those networks that abide by the DOCSIS specifications from CableLabs. Active queue management was added to the cable modem specifications with DOCSIS 3.1, published in 2014. There are also implementations of active queue management, such as FQ-CoDel for OpenWrt, the open source software for home routers. Clearly, deployment of AQM on simple devices is possible.

The edge of the network appears covered. But congestion might also happen in the middle of the network, for example in constrained backhaul networks, at interconnection points between networks, or on some international connections. The traffic at those places is handled by high-performance routers, in which the forwarding of packets is handled by “data plane” modules optimized for speed. Programming AQM algorithms there is obviously harder than on general purpose processors, but some forms of control are plausible. Many of these routers support “traffic shaping” algorithms that resemble AQM algorithms, so there is certainly some hope.

But rather than speculating, it might be more useful to have a discussion between the developers of routers and the developers of end-to-end transport algorithms. How do we best unleash innovation while keeping the network stable? Hopefully, that discussion can happen at the IETF meeting in Prague, next March.


Will transport innovation collapse the Internet?

Maybe you have heard of QUIC, a new transport protocol that combines functionalities of TCP and TLS, plus some new features like parallel handling of streams and latency minimization through 0-RTT. The work started at Google 5 years ago, as part of their effort to speed up the web. The IETF picked up the early work from Google, and is working through its standardization. There is a great deal of innovation happening there, and it is made possible by a series of design choices, such as running over UDP to cross NAT and firewalls, complete encryption of the transport headers to prevent interference from the middle of the network, and implementation in user space. It is all very cool, but there is a potential danger. Implementations can innovate and deploy new congestion control algorithms pretty much at will. That could trigger a race to the bottom, and potentially break the Internet, unless network providers start deploying adequate Active Queue Management algorithms.

Before 1988, the Internet was occasionally victim of congestion events. I was working at INRIA in France at the time, and some links in our early TCP-IP network were quite slow, X.25 virtual circuits that maxed out at 9600 bps. It worked some of the time, and then there would be a congestion event. Some node would start sending too fast, queues would build up, delays would increase and trigger retransmissions which would contribute to more queues, and very soon nothing useful could be done. This was eventually fixed in 1988, with the work of Lixia Zhang on adaptive timers and of Van Jacobson on congestion control algorithms – the algorithm that came to be known as “TCP Reno”.

The Internet has grown a lot since 1988. Link speeds quickly reached megabits per second, and then gigabits. Some people may complain that their connection is too slow, but we have not experienced anything like the congestion collapses of the early era. The traditional stance is that the Internet only remained stable because most of the traffic uses TCP, with congestion control algorithms designed to back off in the presence of congestion. The design of the TCP stacks tends to be very conservative. There were some updates in the 90’s to accommodate larger transmission speeds and more efficient retransmissions, but the congestion control logic remained very much the same. The only major change happened in the Linux kernel in 2006, with the replacement of “New Reno” by “Cubic” – 18 years after Van Jacobson’s design. Cubic does in fact keep a lot of the design principles of Reno, probing slowly for increased capacity and backing off promptly in case of congestion. This slow rate of change might soon end.

Change was slow in large part because the TCP stacks were shipped as part of the operating systems, and there never were very many operating systems. Operating system developers are by nature conservative. If you are working on Windows, Linux or FreeBSD, you don’t want to ship a code update that might cause Internet congestion. Anything you do will have to pass multiple reviews, and deviation from standards would require a lot of justification. In fact, that conservative stance was pretty much one of the motivations for developing QUIC as an “application level” protocol. It runs over UDP, and the code is shipped as a library, compiled with the application. Google does not have to wait for Microsoft to update the TCP stack in Windows: it can ship a new version of the QUIC library in an update to the Chrome browser. Instead of getting an operating system update every year or so, the application can be updated every month, maybe even every day. The operating system developers acted as gatekeepers of transport innovation. Shipping the code with the application bypasses these gatekeepers.

On one hand, removing the gatekeepers has immediate benefits, because it enables the development of new transport features. QUIC can for example deploy 0-RTT transmission, so that the first web request can be sent from client to server without waiting for complete establishment of the connection. It mitigates head-of-line blocking by managing the simultaneous transmission of multiple streams. Future updates may bring partial reliability or forward error correction. This increased deployment agility is most welcome. On the other hand, that same agility enables immediate deployment of “alternative” congestion control algorithms. Some of that may be good, such as designing algorithms that react better than New Reno or Cubic to the specific behavior of wireless links. But there is a risk that developers will abuse this newfound agility and engage in a race towards ever more aggressive behaviors. If everybody tunes their algorithms to be a little faster than the competition, the net result could well be a return of congestion collapses, just like we saw in the pre-1988 Internet.

In fact, Google already started shipping QUIC with a new congestion control algorithm called BBR. BBR operates by tracking the end-to-end delay and trying to keep it small, backing off when an increase in sending rate creates a disproportionate increase in queuing delay. It does deliver high bandwidth and low latency, which is great. But it also delivers something else: domination over the competitors. The graph above is taken from a presentation by Geoff Huston at the RIPE 76 conference. It shows what happens when a 10 Gbps link is shared between two connections, one managed using the Cubic algorithm that ships with Linux and Windows, and the other with BBR. In Geoff’s words, “that was brutal”: very soon after the BBR connection starts, the throughput of the Cubic connection drops to almost zero. I don’t have actual measurements, but it may well be that if two users are trying to access the web from the same home, the one accessing Google servers using Chrome will get most of the bandwidth, while the other, accessing some other server, will be left with a very slow connection.

Let’s be clear and not rush to assign bad intentions to my colleagues at Google. There is no proof that the dramatic effect shown in Geoff’s slide was deliberate. It may well be the unintended side effect of the design priority, achieving low latency. It may well be that the design will be fixed to achieve better behavior, since after all the code can be readily updated. Geoff Huston’s  detailed analysis of BBR shows that there are indeed some efforts in that direction. But still, the optics are not good.

Faced with results like that, the competitors are going to react. They cannot really accept being pushed into a slow lane. The most expedient response would be to just use QUIC, since the IETF standardization is well along, and of course to ship it with their own copy of BBR. But there is the latent temptation to do even better, to “move to an even faster lane”. Maybe they will just change a couple of parameter values in the BBR algorithm to make it “faster”. Maybe they will invent their very own congestion control algorithm. Remember, there are no gatekeepers any more. If you can ship the same software to your app and to your servers, you can compete.

That’s not an entirely new problem. Some applications have been accused of doing that for a long time. For example, some video conference applications are said to send video just as fast as they can, because they are “real-time”. (In fact, congestion control for video conferences was demonstrated in the early 90’s.) But widespread deployment of new congestion control algorithms changes the game. In fact, it exposes a gap in the current Internet architecture. The gatekeepers in the operating systems pretty much guaranteed that network managers could rely on end-to-end congestion control to guarantee stability. Well, not anymore.

The solution is probably a widespread deployment of Active Queue Management. There are many AQM algorithms. Random Early Detection and Explicit Congestion Notification were proposed years ago. More recent algorithms were developed in the IETF AQM Working Group to address the “buffer bloat” problem, such as FQ-CODEL or PIE. My hope is that we will see a virtuous cycle: AQM algorithms get deployed at bottlenecks and enforce that each class of traffic gets a fair share. If network providers do that, there will be much less incentive to deploy “unfair” congestion control algorithms in applications. On the contrary, the algorithms will evolve to stay within the allocated fair share, while keeping queues as short as possible. That would be much better than collapsing the Internet!


Having fun and surprises with IPv6

(Corrected March 4, 2018)

I am participating in the standardization of the QUIC protocol. That’s why I am writing a prototype implementation of the new transport protocol: Picoquic. The development involves regular testing against other prototypes, the results of which are shown in this test result matrix. This is work in progress on a complex protocol development, so when we test we certainly expect to find bugs and glitches, either because the spec is not yet fully clear, or because somebody did not read it correctly. But this morning, I woke up to find an interesting message from Patrick McManus, who is also developing an implementation of QUIC at Mozilla. Something is weird, he said. The first data message that your server sends, with sequence number N, always arrives before the final handshake message, with sequence number N-1. That inversion appears to happen systematically.

We both wondered for some time what kind of bug this was, until finally we managed to get a packet capture of the data.

And then, we were able to build a theory. The exchange was over IPv6. Upon receiving a connection request from Patrick’s implementation, my server was sending back a handshake packet. In QUIC, the handshake packet responds to the connection request with a TLS “server hello” message and associated extensions, which set the encryption keys for the connection. Immediately after that, my implementation was sending its first data packet, which happens to be an MTU probe. This is a test message with the largest plausible length. If it is accepted, the connection can use messages of that length. If it is not, it will have to use shorter packets.

It turns out that the messages that my implementation was sending were indeed a bit long, and they triggered an interesting behavior. My test server runs in a virtual machine in AWS. AWS exposes an Ethernet interface to virtual machines, but applications cannot use the full Ethernet packet length, 1536 bytes. The AWS machinery probably uses some form of tunneling, which reduces the available size. The test message that I was sending was 1518 bytes, which is larger than the 1500-byte MTU. The IPv6 fragmentation code in the Linux kernel splits it in two: a large initial fragment, 1496 bytes long, and a small second fragment, 78 bytes long. You could think that fragmentation is no big deal, since fragments would just be reassembled at the destination, but you would be wrong.
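
Those two fragment sizes are consistent with ordinary IPv6 fragmentation, assuming the 1518 bytes count the whole IPv6 packet, a 1500-byte path MTU, a 40-byte IPv6 header, an 8-byte fragment header, and a first fragment payload rounded down to a multiple of 8:

    original packet: 40 (IPv6 header) + 1478 (payload) = 1518 bytes
    fragment 1: 40 + 8 (fragment header) + 1448 = 1496 bytes
    fragment 2: 40 + 8 + (1478 - 1448) = 78 bytes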

Some routers on the path try to be helpful. They have learned from past experience that short packets often carry important data, and so they try to route them faster than long data packets. And here is what happens in our case:

  • The server prepares and sends a Handshake packet, 590 bytes long.
  • The server then prepares the MTU probe, 1518 bytes long.
  • The MTU probe is split into fragment 1, 1496 bytes, and fragment 2, 78 bytes.
  • The handshake and the long fragment are routed on the normal path, but the small fragment is routed at a higher priority level.
  • The Linux driver at the destination receives the small fragment first. It queues everything behind that until it receives the long fragment.
  • The Linux driver passes the reassembled packet to the application, which cannot do anything with it because the encryption keys can only be obtained from the handshake packet.
  • The Linux driver then passes the handshake packet to the application.

And it took the two of us the best part of the day to explore possible failure modes in our code before we understood that. Which confirms an old opinion: when routers try to be smart and helpful, they end up being dumb and harmful. Please just send the packets in the order you get them!

More explanations, after 1 day of comments and further inquiries

It seems clear now that the fragmentation happened at the source, in the Linux kernel. This leaves one remaining issue: the out-of-order delivery.

There are in fact two separate out-of-order delivery issues. One is having the second fragment arrive before the first one, and the other is having the MTU probe arrive before the previously sent Handshake packet. The inversion between the fragments may be due to code in the Linux kernel that believes that sending the last fragment first speeds up reassembly at the receiver. The inversion between the fragmented MTU probe and the Handshake packet has two plausible causes:

  • Some router on the path may be forwarding the small fragment at a higher priority level.
  • Some routers may be implementing equal cost multipath, and then placing fragmented packets and regular packets into separate hash buckets.

The summary for developers, and for QUIC in particular, is that we should really avoid triggering IPv6 fragmentation. It can lead to packet losses when NATs and firewalls cannot find the UDP payload type and the port numbers in the fragments. And it can also lead to out-of-order delivery, as we just saw. As for my own code, the lesson is simple: I really need to set the IPv6 “Don’t Fragment” option when sending MTU probes, per section 11.2 of RFC 3542.
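
For completeness, here is a minimal sketch of setting that option on a POSIX socket; the availability of IPV6_DONTFRAG varies by platform, and error handling and the rest of the socket setup are omitted:

    #include <netinet/in.h>
    #include <sys/socket.h>

    /* Ask the stack to fail with an error (typically EMSGSIZE) for
     * datagrams larger than the path MTU instead of fragmenting them,
     * per RFC 3542 section 11.2. An oversized MTU probe then fails
     * locally rather than being silently split into fragments. */
    static int set_ipv6_dont_fragment(int fd)
    {
        int on = 1;
        return setsockopt(fd, IPPROTO_IPV6, IPV6_DONTFRAG, &on, sizeof(on));
    }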

 


Cracking the SNI encryption nut

Back in May and June this year, I was reviewing the state of SNI encryption, digging through the archives of the TLS mailing list. I collected the list of attacks that demonstrated holes in previous proposals, and documented them in an Internet draft (https://datatracker.ietf.org/doc/draft-huitema-tls-sni-encryption/) that has just been adopted as a work item by the TLS working group (https://datatracker.ietf.org/doc/draft-ietf-tls-sni-encryption/). The draft also lists two proposals that have resisted attacks so far, one based on a form of TLS-in-TLS tunneling, and the other based on extended session resume tokens. But something kept bothering me. Both proposals are complex to implement, the session resume token proposal does not cover the first connection between a user and a hidden node, and the tunneling proposal is difficult to extend to protocols like QUIC that use TLS but do not use TCP. In short, SNI encryption is a tough nut to crack. But I think that I have cracked that nut. Let’s introduce the proposed solution, the SNI and ALPN Encryption eXtension (SAEX).

The idea of SAEX is to have a mutable extension that can be carried in the Client Hello. The extension takes two forms: a clear text form and an encrypted form. The clear text form contains the real SNI, the real ALPN, and a nonce. The encrypted form is a string of bytes that only the receiver of the Client Hello can decode. When the client asks the TLS stack to prepare the Client Hello message, it includes the clear text form of the SAEX extension in the message, but prior to actually sending that message on the wire it substitutes the encrypted form. Similarly, when the server receives the Client Hello message, it decodes the encrypted value of the SAEX extension and substitutes the clear text form. The server then passes the Client Hello for further processing to the subsystem identified by the “real SNI” named in the SAEX extension.
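
To make that description concrete, here is a purely hypothetical sketch of the two forms of the extension; none of these names, sizes or layouts come from an actual specification:

    #include <stdint.h>
    #include <stddef.h>

    /* Clear text form, as seen by the client and by the receiving server.
     * The nonce makes every Client Hello, and therefore every secret
     * derived from it under TLS 1.3, unique. */
    typedef struct st_saex_clear_t {
        char real_sni[256];   /* the hidden server name */
        char real_alpn[32];   /* the hidden application protocol */
        uint8_t nonce[16];    /* random value, never sent in the clear */
    } saex_clear_t;

    /* Encrypted form, the only thing that appears on the wire. Only the
     * server that receives the Client Hello holds the key needed to map
     * it back to the clear text form before further processing. */
    typedef struct st_saex_encrypted_t {
        size_t length;
        uint8_t opaque[512];
    } saex_encrypted_t;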

The most vexing of the various threats listed in the draft (https://datatracker.ietf.org/doc/draft-ietf-tls-sni-encryption/) is probably the replay attack. The adversary just observes the token carrying the encrypted SNI and replays it in its own Client Hello. If the connection succeeds, it learns the identity of the hidden site by observing the certificate or the web pages returned by the server. SAEX is robust against that attack when using TLS 1.3, because all the secrets computed by TLS depend on the content of the Client Hello, which with SAEX includes the clear text form of the extension. The adversaries can only observe the encrypted form of the extension. They could guess the value of the SNI and ALPN, but they cannot guess the nonce. That means that they can replay the encrypted extension, but they won’t be able to understand the server’s responses and determine the server’s identity.

There is one threat in the list that SAEX cannot mitigate: the issue of standing out. A Client Hello that carries the SAEX extension will automatically look different from one that doesn’t. Opinions vary about the importance of this issue. On one hand, we can see censors ordering their great firewalls to drop all packets carrying the extension, forcing users to choose between connectivity and privacy. On the other hand, if some big services adopted the extension, this censoring strategy would be very costly. On balance, it is probably worth trying.

Of course, there are many details to get right. The clients need to obtain both the clear text and the encrypted value of the extension before starting the connection, and that needs to be properly engineered. The server could use a variety of encryption methods, with diverse properties. Used improperly, the nonce in the extension could serve as a cookie that identifies the user. But working groups are good at ironing out these kinds of issues. And at least the SAEX design proves that we can solve the SNI encryption issue. That tough nut can be cracked.

*Correction, March 2, 2018*

It turns out that this kind of solution has a big downside, which is explained in the latest version of https://datatracker.ietf.org/doc/draft-ietf-tls-sni-encryption/. If someone replays the connection attempt, it will fail because they don’t have access to the encryption key in the extension. That failure protects the user somewhat, but it also tells the adversary that there was something hiding there. And then who knows what kind of questions the secret police will be asking…


Newspapers, subscriptions, and privacy

Quite often now, when I click on a link to a news article, I am greeted by a message explaining that I will not be able to see it. In some cases, the news site asks me to please turn off my ad blocker. In other cases, the site asks me to please buy a subscription. What I do then is just close the window. I will get my information some other way. And I will continue doing that until the news sites stop their gross attacks on their readers’ privacy.

Many newspapers are ad-funded. Even those that are funded by subscriptions also run ads. They were doing that in print form, and they continue doing that on the web. Now, to be clear, I did not care much for the printed ads, but that never stopped me from buying newspapers. Ads on the web are different. They tend to be more aggressive than print ads, what with animations, interstitials and pop-ups. That’s annoying, but it is not the worst. The worst is the tracking technology behind these ads.

On the web, ads bring more money if they are “targeted”. This desire for targeting created an arms race between advertisers, eager to find new ways to learn about their targets – that is, about us. The advertisement technology has given us ad auctions and a complex opaque ecosystem that basically attempts to keep tabs on people and their browsing history. Many of us believe that this “corporate surveillance” system is just evil. The best protection against that surveillance is to use ad blockers, or more specifically tracking blockers.

Of course, blocking ads also blocks the revenues of the ad-funded newspapers. They would like us to turn off the blocker, or to buy a subscription. And it makes sense, because we need to pay the journalists. The problem is that buying a subscription does not prevent the tracking. Whenever I get one of those suggestions from a news site, I attempt to find and read their privacy policy. If I did buy a subscription, what would they do with my personal data?

The typical “privacy” policy is hard to find. For the New York Times, for example, you have to scroll to the very bottom of the home page, to the very last line of tiny gray print, and find the “privacy” keyword. (It is here: https://www.nytimes.com/content/help/rights/privacy/policy/privacy-policy.html.) If you click on it, you get a long page that was probably vetted by a team of lawyers and describes every kind of tracking and advertising that they could think of. Another example would be “Wired”, for which the privacy policy link hides in small print in the middle of the subscription form. The link points to the generic policy of the publishing group, Conde Nast (http://www.condenast.com/privacy-policy/). Again, the text reserves the right to track the user in every which way and use the information however they see fit. A newspaper like the Guardian will quite often publish papers railing against state surveillance, but take a look at their privacy policy: https://www.theguardian.com/help/privacy-policy. Again, they reserve the right to collect all kinds of data and use it for targeting advertisements.

Nowhere will you get something as simple as “if you subscribe and you opt to not be tracked, we will not track you and we will also not let third parties track you through our site.” And that’s too bad, because if they did say that I would in fact subscribe to many of these newspapers. But what you get is instead, “if you subscribe, we will verify your identity in order to check the subscription, and we will track you even more efficiently than if you were not a subscriber.” No, thanks.
