Can Telex.CC build censorship evasion in the network?

A colleague pointed me to Telex.CC, a project from the University of Michigan that proposes “a new approach to circumventing Internet censorship that is intended to help citizens of repressive governments freely access online services and information.” They start from the observation that current anti-evasion techniques are really a game of cat-and-mice. When a site gets blocked, users try to access it through some proxies, but the censors will discover and block the proxies, so users get to constantly seek new proxies. Also, if a proxy is discovered, the censors can find out who is attempting to use it, and maybe tell the local police. What if there was no need for proxies? What if the proxies were somehow built in the fabric of the Internet?

The idea is quite neat. Assume that the censor is willing to allow communication with various sites that use SSL/TLS. They have to if their country wants to continue benefiting from e-commerce over the Internet. Assume now that the user has some kind of browser plug-in that detects the blocked sites, and replaces the connection request by a connection to some other innocuous site. The connection to “https://blocked.com” will be replaced by a connection request to some other site, say “https://nonblocked.com.” The TCP connection will proceed to the requested address, but the packets will then get “magically” routed to the blocked site, “https://blocked.com.”

 

The diagram explains how the magic is possible. Somewhere on the path to the non-blocked site, the traffic encounters a “diversion” router that recognizes the original intent of the client and diverts the connection towards the blocked site. The information was carefully hidden in the HTTP header sent by the client. According to the “Telex” design, it is actually encrypted by the client using the public key of the diversion service, and then woven into the HTTP header using steganography. The censor’s firewall cannot find the information, because it does not have access to the diversion service’s private key.

Of course, there are some daunting engineering problems, and it is not clear that the Telex project has solved them all. Steganography is difficult to engineer in practice, and I have no idea how good the proposed solution actually is. The client software needs to maintain lists of blocked sites and adequate “non-blocked” sites, and these lists have to be somehow passed through the censor’s firewall. But the greatest difficulty stems from the dynamic nature of Internet routing.

The diversion will only work if there is a diversion box on the path between the client and the “non-blocked” site. In practice, that requires either placing a lot of diversion routers in the Internet, or placing the diversion routers very close to the non-blocked sites. The latter is probably more realistic, at least during the early deployments, but that means a relatively small number of non-blocked sites. If there are few such sites, the censors may just block them. Let’s hope that the Telex project can deploy enough of these boxes!

Advertisements
Posted in Uncategorized | Leave a comment

Can we achieve private publication in a public DHT?

Many peer-to-peer systems rely on Distributed Hash Tables. DHT allow publication of indexed data through a set of cooperating nodes. For example, in the P2P SIP design, a SIP node can publish its current location, e.g. “host-29.some-network.net,” indexed by the SIP URL of the user, e.g. sip:alice@example.com. The publication process starts by hashing the index to the corresponding hash code, typically a large integer. The message is then pushed to the DHT node whose index is closest to the expected hash code. When someone later wants to contact sip:alice@example.com, they will in turn compute the hash of the SIP URL, retrieve the data indicating the current location of Alice, and establish a P2P SIP session.

This is well and good, but not very private. Anyone can access the DHT and retrieve Alice’s location. In fact, we see examples of that in Bit Torrent systems that also rely on DHT to publish the location of files. That’s quite convenient, as it eliminates any need for a central repository of file locations. But if they publish copyrighted material, the copyright holders or they representatives can find out and sue them. The data published in a DHT is just as public as if it was published on a big web service! Anyone can simply use a DHT client, seek the record describing where a specific file is published, and find the address of the publisher.

Suppose that we want to design a system where “only my friends can see me.” We will still want to use an existing DHT, because engineering a new DHT is really hard. So, we need to rely on some form of encryption to ensure that the communication remains private. Let’s go back to our example of P2P SIP, and assume that Alice wants to publish information for all her friends. In her first attempt at privacy, she makes two changes:

  • Instead of publishing a location under the well-known index sip:alice@example.com she can pick a random index, say “jabberwocky2011.”
  • Instead of publishing her location in clear-text, she encrypts it with a key shared with her friends.

She has tried to achieve two goals. Since only her friends know that “jabberwocky2011” is Alice, third parties cannot know whether Alice is online. They may be able to scan the DHT and find that there is a record under the “jabberwocky2011” index, but in theory they cannot link that to Alice. And in any case, as long as only her friends know the secret key used to encrypt the location data, only her friends will be able to know her location.

All that may work well, until something happens. Suppose that her former friend Eve becomes not so friendly anymore. Eve knew the random index used by Alice, and she also knew the key used to encrypt the location. She can turn that information to third parties. This is exactly how many French underground cells fell during WW2. The Gestapo would somehow manage to arrest a member of the cell and make him speak, and he would turn out information leading to the other members. The cell could only remained safe if it quickly noticed the former member’s arrest, cut all communication with him, and change all passwords and other secrets that the former member knew. If she wants to retain her privacy, Alice would have to do exactly the same thing: cut Eve off, create a new random identifier and a new password, and tell all that to her remaining friends. That may now be practical.

The first design that Alice envisaged is flawed because the same secret information is shared with all her friends. Thinking again, Alice devises a new system, in which she exchanges different identifiers and secret keys with each of her friends – one for Bob, another one for Carol, and yet another one for Eve. When she comes online, she publishes as many location records as she has friends. Her friends know which specific record they should retrieve, and they can only decrypt the individual record meant for them. As soon as Alice learns that Eve cannot be trusted anymore, she destroys the identifier and secret key dedicated to Eve. She does not need to destroy the information relating to her other friends, and she can still communicate with Bob and Carol.

Alice believes that she has done well, but there is still an issue. DHT nodes cannot be all trusted, yet they can find out Alice’s location, or at least her IP address, at the time of publication. The request to publish information in the DHT is sent over the network, and the DHT nodes can find the message’s origin. When Bob retrieves Alice’s location, the DHT nodes can also find Bob’s address. They may not understand the random index and the encrypted value of the location message, but they can find out that the message was published by a certain IP address and retrieved by another IP address, and thus infer that there is a relation between these two IP addresses.

A quick web search turns out at least one paper discussion this issue, “Agyaat: Providing Mutually Anonymous Services over Structured P2P Networks,” by Aameek Singh and Ling Liu from Georgia Tech. Their design proposes to add to the DHT a layer of “onion routing,” similar to Tor. The message requesting publication or retrieval would be first routed through a set of friendly DHT nodes to obscure its actual origin. I don’t know whether that’s necessary or practical, but that’s certainly an option.

 

 

Posted in Uncategorized | 1 Comment

SOPA and the Great Firewalling of America

It appears that Microsoft does not in fact support SOPA, the draconian copyright protection proposal that the congress is working on. That’s very good news, and we have to enjoy every pieces of good news these days. Why do all these folks try to break the Internet? Why are we not fighting harder against these censors?

SOPA would allow the US government to blacklist foreign web sites that somehow infringe US copyright laws. It sets a really bad example for every dictatorship or other authoritarian governments to block websites that somehow infringe their decrees. In her well-publicized Remarks on Internet Freedom, Secretary of State Hillary Clinton said: “In the last year, we’ve seen a spike in threats to the free flow of information. China, Tunisia, and Uzbekistan have stepped up their censorship of the internet. In Vietnam, access to popular social networking sites has suddenly disappeared…” Well, SOPA would allow the US government to do exactly that, make access to popular website suddenly disappear.

The simple idea of censoring foreign web sites is bad enough, but the provisions of SOPA are really a recipe for breaking the Internet. The first step is to ask ISP to block some “offending” DNS names. You would try to access www.example.com, but if the “example” site is banned you would just get a “no such domain” response. Of course, there are easy workarounds, for example creating a non-blocked alias for the “example.com.” At that point, enforcement becomes a game of whack-a-mole: alias pops-up, copyright enforcers find out, alias gets blocked, new alias pops up… SOPA tries to make this kind of evasion harder, but in doing so it justifies all kinds of new censorship techniques, from snooping on packet content to blocking IP addresses. If the law passes, the US would get its own “Great Firewall,” much like China.

The first defense against the Great Firewalling of America is of course the political process. We should tell our friends, explain how bad this is, and make sure that our senators and representative hear about it. With enough pressure, maybe the law won’t pass. But I also believe we can build Internet technologies that thwart this kind of attempts. Do you remember hearing that the decentralized Internet was so robust that it would resist an atom bomb and repair itself? That’s not quite true anymore, because we developed big centralized services like the DNS, because we allowed telecommunication companies to snoop on our unencrypted traffic, and because the large cloud services are an easy prey for censors and other government agents. Why are we not fixing that?

Posted in Uncategorized | Leave a comment

Peer to peer, Friend to friend

Social networks are exclusive. Only my selected “friends” can see what I publish on Facebook. That would be great privacy feature, but there is a catch. All my information is stored in “the cloud,” and the masters of the cloud find and use my relations, my interest and my whereabouts. Right now, they mostly use it to tune the advertisements that appear on their pages, which is arguably benign. But the potential for abuse is there, and potentials have a way to turn into future realities. Secret polices, for example, will learn the lessons of the “Arab Spring” and find ways to access the information in the cloud.
Peer-to-peer networks have no master. Everybody is equal. There is no central repository. That should be a great privacy features, but peer-to-peer networks are not exclusive. Everybody can find what anyone else publishes on Bit Torrent. This is precisely what the agents of copyright holders do. They navigate the network, find which IP address publishes copies of protected work, and work from there to identify and sue the publisher. If copyright holders can do that, we can be assured that secret polices can do it too.
Could we combine the exclusivity of social networks and the decentralized nature of peer-to-peer networks? Suppose for example that we build a peer-to-peer service in which we would only establish connections with our close friends. Only my friends would see my IP address. There would be no central storage of my publications. It could easily be used to spread the messages of the next Arab spring if our friends redistribute them to their friends, much like we see jokes being e-mailed from friend to friend today. That would be really private.
Of course, there are engineering issues. We will need to identify our friends online without relying on a central server. We will need to discover our friends’ addresses before establishing the peer-to-peer connections without disclosing too much information. We will need some kind of relay to ensure that messages can be forwarded even if two friends are not online at the same time. We can probably to that with a combination of public key cryptography, distributed hash tables and peer-to-peer store-and-forward, much like PGP, Kademlia and Skype groups. This will not be a simple mash up, especially if we want to make it easy to use. But engineering challenges are fun!

Posted in Uncategorized | Tagged , , | Leave a comment

Safebook, an interesting peer-to-peer social network

Jon Crowcroft pointed me to Safebook, a peer to peer social network developed by a team of researchers at Eurecom. Eurecom is research lab and engineering school in Sophia-Antipolis, and I used to have closed contact with them 15 years ago, when I was researching networks at INRIA in Sophia-Antipolis. The Safebook work appears very interesting, and I am happy to see that Sophia-Antipolis continues to produce interesting networking project!

I have yet to finish reading their papers, but the general architecture of Safebook appears quite nice. The general idea is to replace the centralized repository of social interactions by a peer-to-peer network, and to also use this peer-to-peer network to relay transactions and obfuscate the source and origin of traffic, much like what is done with Tor.

The obvious issue with such architecture is the relative weakness of P2P networks against determined intruders. Classic architectures based on distributed hash tables are rather easy to penetrate by determined attackers. They would set up their own nodes, participate in the network, and either observe the traffic or in some cases selectively disrupt it. They use a centralized identity server to assign unique identities to network participants, which if done correctly could mitigate the “sybil” attack. The network appears organized around expanding circles of trust which they call “matriochkas,” and they replicate the user data on a small set of trusted peers, instead of the random nodes of classic DHT, which should contribute to better reliability. I think I will spend some time reading a bunch of papers!

Of course, a sound architecture does not guarantee success. The centralized nature of Facebook may be a target for all kind of interferences, but it is also very popular and very convenient. Being almost as good but way more private appeals to guys like me, but not always with the general public. It will be interesting to see whether we can develop specific services that can work peer to peer in Safebook but would be hard to develop in centralized services like Facebook or Twitter. I suppose that’s for the next project!

Posted in Uncategorized | 1 Comment

IPv6, Privacy and the IEEE 802 identifiers

Recently, I found myself writing to an IPv6 IETF mailing list that a critical part of the IPv6 design “looked like a good idea at the time” but in fact was not. Back when we were designing IPv6, we chose a simple address structure: 64 bit for the header designating the subnet, and then 64 bit for the host identifier. That structure enabled automatic address configuration: the computers learn the prefix from the router advertisement, and can automatically compose the address by composing it with their host identifier. Of course, this only works if host identifiers are unique within the subnet, but we had a simple solution for that. We could just derive the host identifier from the unique IEEE 802 identifier embedded in the Ethernet or wireless interface, and voila, automatic configuration with minimal risks of collisions. It really looked like a good idea at the time.

We were actually surprised when a few years later privacy advocates protested against that design. Embedding a unique number in the address allowed web sites to easily track users! The IETF reaction was to create a variant of IPv6 addresses, dubbed “privacy addresses” – see RFC 3041 later updated by RFC 4941 and the message on the subject from Steve Deering and Bob Hinden.

Privacy addresses have been implemented in Windows and in IOS, but there is still some resistance to their use. A lot of network management practices rely on tracking computers by addresses, so network administrators don’t really like the idea of computers hiding their identities by changing addresses over time. So we still have lots of networks where administrators insist on disabling the “privacy” features. That was perhaps acceptable when computers where statically bound to desks, but in a world or laptop and tablets, that is very counter-productive. The simple prohibition means that when a laptop moves out of the managed network it will still not use temporary addresses. For example, when it connects to the Wi-Fi network at the airport or in a café, it will continue using to build the IPv6 address using the MAC address of the Wi-Fi interface. By insisting on stable addresses in their network, the administrators are enabling online tracking of their users!

Of course, administrators are in a pickle. They tend to care about the user privacy, but they have conflicting requirements, and they really need to be able to manage their network. This is a problem that the IETF and the industry have to solve. The design is simple, although the details probably need a lot of work. Instead of relying on MAC addresses, the addresses could be formed with a hash of the MAC address, the prefix, and maybe a security key provided by the administrators. But before any standard happens, the problem has to be acknowledged. Let’s hope that this short note will contribute to the solution!

Posted in Uncategorized | Leave a comment

A simple P2P DNS proposal

Extending DNS operation for peer-to-peer name resolution may be much simpler than many expect. DNS resolvers know how to maintain caches of previously resolved names. These caches can be used for progressive peer-to-peer name resolution without relying on centralized servers.

Let’s assume first that the names in the peer-to-peer domain can be sorted, for example using alphabetic order – case insensitive, of course, since this is DNS. A node that receives a query for a peer-to-peer name will examine the requested name. If the name is known locally, the node will provide the answer. If the name is not known, the node will forward the request to one of the peers known from the cache, whose name is closest to the requested name according to the alphabetic order. The process will repeat until the name is found, or, if the name is not reachable, no progress can be made. The response will then be returned, just like any regular DNS query.

This may be better explained with an example. Suppose that a P2P name resolver, “rose.p2p,” wants to resolve the name “flower.p2p.” It will first look for the “closest” name in its own cache, maybe “earth.p2p” and send it the request. “Earth.p2p” will look for the closest name in its cache, maybe “fire.p2p,” and forward the request. “Fire.p2p” may forward the request to “Flare.p2p”, which forwards it to “flour.p2p,” which in turn could have the data for “flower.p2p.”

What I just described is basically the PNRP name resolution algorithm, adapted to DNS names instead of PNRP’s binary identifiers. The resolution process has the side effect of propagating knowledge in the network, as all intermediate nodes can see the response and cache it. In our example, rose, earth, fire and flare all learned about “flower.” The interesting point is that the density of cached names increases closer to the name itself. I won’t go in a lengthy mathematical demonstration, but this property has a very good effect on the network performance. As the network learns, queries get resolved ever faster, allowing the system to manage very large number of names.

Of course, there is a catch. The propagation of knowledge works, but it only works because there was an initial “chain of knowledge.” In our example, “flare.p2p” knew about “flour.p2p,” which knew about “flower.p2p.” This initial knowledge cannot be discovered by the side effect of queries, and has to be somehow “seeded” in the network. This has been studied a lot in the context of distributed hash tables, and also during the design of PNRP. The solution is to add to the heuristic described above a “leaf-set management” procedure.

In the context of DHT, the leaf-set designates the other nodes in the network whose indices are closest to a particular node. We can prove that if all nodes in the network know their leaf-set, every name in the network can be resolved. Of course, in the worst case the resolution will require visiting half the nodes in the network, but in practice the caches will allow for efficient short cuts. Efficient resolution requires thus a combination of leaf-set management and efficient caching.

It turns out that the leaf-set can be managed by a simple process. At regular intervals, a node can look at its own cache and find in the cache the two closest names to its own – specifically the one just before its name in alphabetic order and the one just below. It can then send to each of these nodes a query for made up name located in alphabetical order between the sender and the receiver, in the hope of learning whether some names exist that are even closer than the ones in the cache. If new names are found, the process can be repeated until the node is convinced of knowing the two names closest to its own in the network. To each of these, the node can send an “unsolicited DNS response” advertising its own name and address. This will be an extension to the regular DNS protocol, a specific extension for peer-to-peer domains. This simple extension ensures that the network is seeded with all the relevant names, and that queries will succeed, even if they require initially many relays.

We can see that P2P DNS could be implemented with very minimal extensions to the basic DNS protocol: allow for relays of requests to the closest name in the cache instead of relying solely on NS records; perform “leaf-set discovery” by sending fake queries at regular intervals; and, perform “leaf-set maintenance” by pushing unsolicited DNS responses to the leaf-set. Of course, this is just the sketch of the solution. The full specification will need to deal with cache management issues, such as what replacement algorithm to implement if the cache is full. And it will also have to deal with security issues. But that will be the matter for another post. For now, let’s just notice that developing P2P DNS does not need to be hard!

Posted in Uncategorized | 5 Comments