GoT seeders vs HBO
Hello everyone!
Well, the title of this blog post may not sound technical, but it is actually a technical post. “Game of Thrones - S8E01” will be out tomorrow, and so will the torrent seeders pirating the episode. If you have ever tried to seed HBO’s content (or that of any other television network) from your servers, you probably remember getting abuse notices from your server provider. Do you actually understand how that works? It all comes down to how P2P protocols work. You must have heard about cryptocurrencies, decentralized networks, Tor, gossip protocols, torrent clients, some video calling solutions, and Spotify (until 2014): they all use P2P protocols.
Let’s see Peer-to-Peer 101.
In the traditional client-server networking model, all clients request services from central servers. If the central server is down, clients won’t be able to access the resources. To overcome that, we need to introduce redundancy and design the services so that the system stays resilient.
In a P2P network, by contrast, all nodes are regarded as peers: they are equally privileged and share resources amongst each other without the use of any centralized system.
When the Internet was designed, I believe it was drafted mainly with the “client-server” networking model in mind. Still, we can implement a P2P network on top of the existing network stack. The crux of the P2P model is that we have to architect our system in such a way that peers can act as “clients” as well as “servers” for other peers.
To achieve this, we can create a virtual network of nodes. Let’s try to build a baseline P2P system like BitTorrent for sharing files.
Joining the network: Let’s say a peer wants to join our virtual network. It somehow knows which other peers are in the network. It will announce its own identity to those peers and ask them for theirs. This can be a simple ping message in our protocol, sent by the peer (acting as client) to the other peers (acting as servers).
To let a peer find out about those peers, we can save that information in the foo.torrent file, which you usually download from torrent websites.
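The ping exchange described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the Peer class, the PING/PONG wire format, and the use of localhost ports are all made up for this example and are not part of BitTorrent. The point is that every peer runs a small server and can also act as a client towards other peers.

```python
import socket
import threading


class Peer:
    """Hypothetical peer: serves PING requests and can ping other peers."""

    def __init__(self, name: str):
        self.name = name
        # Port 0 asks the OS for any free port (assumption: localhost demo only).
        self._server = socket.create_server(("127.0.0.1", 0))
        self.port = self._server.getsockname()[1]
        # Server role runs in the background so the peer can also act as a client.
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self) -> None:
        while True:
            conn, _addr = self._server.accept()
            with conn:
                if conn.recv(1024) == b"PING":
                    # Reply with our identity, as the joining peer asked for it.
                    conn.sendall(b"PONG " + self.name.encode())

    def ping(self, port: int) -> str:
        """Client role: announce ourselves and read the other peer's identity."""
        with socket.create_connection(("127.0.0.1", port), timeout=2) as sock:
            sock.sendall(b"PING")
            return sock.recv(1024).decode()


if __name__ == "__main__":
    alice, bob = Peer("alice"), Peer("bob")
    print(alice.ping(bob.port))
    print(bob.ping(alice.port))
```

In a real network the list of ports (and hosts) to ping would come from foo.torrent rather than from variables in the same process.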
Peer identity: If a peer joins and then leaves the network, you may want to preserve the identity of that peer. Maybe you have some ranking algorithm based on a leeching score, which can be used when generating the foo.torrent file.
To do this, you can create a UUID and save it client-side. Ahem ahem, your centralised service storing all the torrent files can keep this list. There can be several use cases for uniquely identifying peers, depending on the problem you are modelling.
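As a rough sketch of the "create a UUID and save it client-side" idea: the function name and the choice of a plain text file are assumptions for illustration; a real client might keep the ID in its config or settings store instead.

```python
import uuid
from pathlib import Path


def load_or_create_peer_id(path: Path) -> str:
    """Return the peer ID stored at `path`, creating one on first run."""
    if path.exists():
        # The peer has joined before: reuse its identity.
        return path.read_text().strip()
    peer_id = str(uuid.uuid4())  # random (version 4) UUID
    path.write_text(peer_id)
    return peer_id


if __name__ == "__main__":
    # Hypothetical location; running this twice prints the same ID.
    print(load_or_create_peer_id(Path("peer_id.txt")))
```

Because the ID survives restarts, a ranking or leeching score can be attached to it across sessions.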
Data fetching: Now you have connected to all the peers that have the data you are interested in. How can you get that data? Well, you may just define a “FETCH” operation in your protocol in which you mention the file hash (stored in foo.torrent). Then you send it to one of the peers and get the file from it.
Instead of doing this, what if you divide the whole file into n parts and store the hash of each part in the same foo.torrent file? Then you can request data from multiple peers. This way you may even request data from peers that are themselves still fetching other parts from other peers. After you fetch a part, you can compute its hash and check it against the stored one. If the hash doesn’t match, the peer is sending invalid data and you can announce that to the network; based on ranking, network consensus can then tell every client to block that specific peer. So there are endless optimisations and fixes you can apply to this protocol.
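Here is a minimal sketch of the piece-hashing idea, with peers simulated as in-memory dictionaries instead of real network connections. The function names, the tiny piece size, and the choice of SHA-256 are assumptions made for this example, not something the protocol above prescribes.

```python
import hashlib


def split_pieces(data: bytes, piece_size: int) -> list:
    """Cut the file into fixed-size parts (the last one may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(data: bytes, piece_size: int) -> list:
    """Hashes that would be stored in foo.torrent, one per part."""
    return [hashlib.sha256(p).hexdigest() for p in split_pieces(data, piece_size)]


def verify_piece(index: int, piece: bytes, hashes: list) -> bool:
    return hashlib.sha256(piece).hexdigest() == hashes[index]


def download(hashes: list, peers: dict):
    """Fetch each part from any peer that has it, rejecting corrupt parts.

    `peers` maps a peer name to {part index: bytes}; a stand-in for FETCH calls.
    Returns the reassembled file and the set of peers that sent invalid data.
    """
    pieces, bad_peers = [], set()
    for index in range(len(hashes)):
        for name, store in peers.items():
            piece = store.get(index)
            if piece is None:
                continue  # this peer does not have the part yet
            if verify_piece(index, piece, hashes):
                pieces.append(piece)
                break
            bad_peers.add(name)  # hash mismatch: candidate for blocking
        else:
            raise ValueError(f"no peer has a valid copy of part {index}")
    return b"".join(pieces), bad_peers


if __name__ == "__main__":
    original = b"winter is coming"
    hashes = piece_hashes(original, piece_size=4)
    peers = {
        "partial": {0: b"wint", 1: b"er i"},  # still downloading itself
        "liar": {2: b"XXXX"},                 # sends invalid data
        "seed": dict(enumerate(split_pieces(original, 4))),
    }
    data, bad = download(hashes, peers)
    print(data == original, bad)
```

Note that the peer holding only two parts is still useful, which is exactly why splitting the file helps.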
Also, if you understand computer networking well, you might be thinking that all the peers would need public IPs in order to communicate like this. Yes, that’s true, but there are some ways to overcome this. One obvious way is to have a server act as a relay between two peers. A more complex technique, which does not work with every NAT, is hole punching. You can look it up to learn how two devices behind NATs can communicate with each other.
Adding new data: Let’s say we want to add a new episode to our network so that everyone who is interested in it can fetch it. Since we are relying on the central service to give us foo.torrent, the client application running on a peer can upload a list of all the files that are present locally. The central service will then build an index from those lists, which can be presented to other users.
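The central index can be pictured as a simple mapping from file name to the peers that announced it. The CentralIndex class and its method names below are hypothetical and only meant to make the idea concrete.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central service: knows who has which file, stores no files."""

    def __init__(self):
        self._peers_by_file = defaultdict(set)

    def announce(self, peer_id: str, filenames: list) -> None:
        """Called by a client to upload the list of files it holds locally."""
        for name in filenames:
            self._peers_by_file[name].add(peer_id)

    def lookup(self, filename: str) -> set:
        """Peers to contact for a file; empty if nobody announced it."""
        return set(self._peers_by_file.get(filename, set()))


if __name__ == "__main__":
    index = CentralIndex()
    index.announce("peer-1", ["s8e01.mkv", "song.mp3"])
    index.announce("peer-2", ["s8e01.mkv"])
    print(sorted(index.lookup("s8e01.mkv")))
```

The actual file data still moves between peers; only this lookup table lives on the central server.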
Napster, an early peer-to-peer music-sharing service, did something like this. But since it maintained an index of all the files on a central server, just like in the design above, the company faced legal issues and was finally shut down.
To fix this, another design came up: the Gnutella network, which is fully distributed, unlike Napster’s implementation. Ever heard of LimeWire? I remember downloading a song from there on a dial-up connection at 5 KBps, sitting beside my dad, back when I was in 8th standard. And after it downloaded successfully, I remember the happiness on my face at seeing that magic. What a time it was! LimeWire was a Gnutella client application. But the network faced scaling issues as the number of users increased. You can read more about its protocol implementation on Wikipedia.
The basis of more reliable and scalable systems is the Kademlia DHT. I may write about it some other time.
Back to the title of the blog: now I think you can guess how HBO can find out about the peers who are pirating its content and then report them to their service providers to get them taken down. To overcome this, people came up with the idea of private trackers.
That’s it for this post. Now go think about how all the use cases listed at the start of the post can be implemented using P2P protocols.
Thanks for reading!