A performance evaluation of the
Kad-protocol
Master Thesis
by
Ren´e Brunner
from Sinsheim
submitted at
Corporate Communications Department
Prof. Dr. E. Biersack
Institut Eur´ecom
France
Lehrstuhl f¨ur Praktische Informatik IV
Prof. Dr. W. Effelsberg
Fakult¨at f¨ur Mathematik und Informati k
Universit ¨a t Mannheim
Germany
November 2006
Supervisor: Dipl.-Wirtsch.-Inf. Moritz Steiner
Acknowledgements
First, I would like to thank my supervisor Prof. Dr. Ernst Biersack, who
gave me the honour to work at the Eurecom Institute. His comments and
feedback were a great enrichment for this Master Thesis as well as for my
research experience.
I am very grateful for the assistance and support of my advisor Dipl.-Wirtsch.-
Inf. Moritz Steiner. His expertise and advices supported me in many ways.
I want to express my gratitude to Prof. Dr. Wolfgang Effelsberg who made this
extraordinary exchange between the University of Mannheim and the Institute
Eur´ecom possible.
Finally, I would like to thank all people, who supported me directly or indirectly.
I am very grateful for the kindness and helpful environment during my s tay at
the Institute Eurecom.
Abstract
Most efficient peer-to-peer protocols deploy structured overlay networks based
on Distributed Hash Tables (DHTs). These have been extensively studied
through theoretical simulations and analysis over the last few years. Recently,
the popular eMule and aMule file-sharing app lications incorporate a widely-
deployed Kademlia-based DHT, called Kad. The Kad-network with over a
million simultaneous users enables the observation of its behaviour in practise.
This Master Thesis consists of two main parts. Firstly, it describes and ob-
serves the structure and processes of the Kad-protocol in detail. Secondly, it
empirically evaluates the performance of Kad with the focus on the data item
distribution process.
Contents
List of Figures viii
List of Tables xiii
Abbreviations xiv
1. Introduction 1
2. Background P2P - Related Work 3
2.1. Unstructured overlay network . . . . . . . . . . . . . . . . . . . . 4
2.2. Structured overlay network . . . . . . . . . . . . . . . . . . . . . 4
3. Kad architecture and structure 6
3.1. Kad background . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.1.1. Kad networks . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.1.2. Network p rotocol . . . . . . . . . . . . . . . . . . . . . . . 7
3.1.3. Kademlia 128-bit space . . . . . . . . . . . . . . . . . . . 8
3.2. Routing table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2.1. Contacts . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2.2. Structur e . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2.3. Insert node . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2.4. Routing table maintenance . . . . . . . . . . . . . . . . . 14
3.2.5. Number of users and files . . . . . . . . . . . . . . . . . . 15
3.3. Initialisation processes . . . . . . . . . . . . . . . . . . . . . . . . 15
3.3.1. Bootstrapping . . . . . . . . . . . . . . . . . . . . . . . . 16
3.3.2. Initial handshake . . . . . . . . . . . . . . . . . . . . . . . 17
3.3.3. Firewall check . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3.4. Find Buddy . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4. Lookup process 20
4.1. The Lookup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.1.1. Object locating . . . . . . . . . . . . . . . . . . . . . . . . 20
4.1.2. Concurrent lookup . . . . . . . . . . . . . . . . . . . . . . 21
4.1.3. Object ID . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.2. The Search object . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2.1. Search types . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2.2. Search states . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2.3. The iteration process . . . . . . . . . . . . . . . . . . . . . 28
4.2.4. Kademlia request . . . . . . . . . . . . . . . . . . . . . . . 29
iv
Contents v
5. File sharing 32
5.1. 2-level publishing s cheme . . . . . . . . . . . . . . . . . . . . . . 32
5.2. Object publishin g . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.2.1. Metadata publishing . . . . . . . . . . . . . . . . . . . . . 35
5.2.2. Source publishing . . . . . . . . . . . . . . . . . . . . . . . 36
5.2.3. Pub lish note . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.3. Overload protection . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.3.1. Reference limitation . . . . . . . . . . . . . . . . . . . . . 40
5.3.2. Repub lishing delay . . . . . . . . . . . . . . . . . . . . . . 42
5.4. Object retrieving . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.4.1. Keyword search . . . . . . . . . . . . . . . . . . . . . . . . 43
5.4.2. Source search . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.4.3. Note search . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.5. The file transfer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.5.1. Download process . . . . . . . . . . . . . . . . . . . . . . 45
5.5.2. Credit system . . . . . . . . . . . . . . . . . . . . . . . . . 45
6. Analysis framework 47
6.1. Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
6.1.1. aMule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
6.1.2. The database . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.1.3. Word list . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.1.4. General impr ovements . . . . . . . . . . . . . . . . . . . . 51
6.2. System infrastructur e . . . . . . . . . . . . . . . . . . . . . . . . 52
6.2.1. Active investigation . . . . . . . . . . . . . . . . . . . . . 52
6.2.2. Passive investigation . . . . . . . . . . . . . . . . . . . . . 53
7. Measurements and Analysis 55
7.1. Analysis of the Keywords . . . . . . . . . . . . . . . . . . . . . . 55
7.2. Analysis of iteration process . . . . . . . . . . . . . . . . . . . . . 56
7.2.1. Iteration time . . . . . . . . . . . . . . . . . . . . . . . . . 56
7.2.2. Iteration messages . . . . . . . . . . . . . . . . . . . . . . 56
7.3. Analysis of peers . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
7.3.1. Firewalled peers . . . . . . . . . . . . . . . . . . . . . . . 57
7.3.2. Peers with the aliasing effect . . . . . . . . . . . . . . . . 57
7.3.3. Host availability . . . . . . . . . . . . . . . . . . . . . . . 58
7.4. Publishin g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
7.4.1. Standard Publishing Performance . . . . . . . . . . . . . . 60
7.4.2. Variation of the size of the tolerance zone . . . . . . . . . 62
7.4.3. Variation of the content replication factor . . . . . . . . . 62
7.5. Search time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.6. Metadata distributions . . . . . . . . . . . . . . . . . . . . . . . 63
7.6.1. Distribution in different tolerance zones . . . . . . . . . . 64
7.6.2. Distribution with different content replications . . . . . . 64
7.6.3. Distribution of republished metadata . . . . . . . . . . . . 65
7.6.4. Cumulative publish distribution . . . . . . . . . . . . . . . 68
7.6.5. Popular keyword d istribution . . . . . . . . . . . . . . . . 68