Books

Designing Data Intensive Applications
Guide To Reliable Distributed Systems
Protocol Design
Introduction to Reliable and Secure Distributed Programming (Carlos Varela, Cachin)
Distributed Systems Principles and Paradigms (Tanenbaum)
Distributed Systems: Concepts and Design (Colouris)
Distributed Operating Systems (Tanenbaum)
Distributed Operating Systems: Theory and Practice (lamport)
Distributed Algorithms (Lynch)
Distributed Algorithms for Message-Passing Systems (raynal)
Distributed Systems An Algorithmic Approach (Ghosh)
Distributed Systems An Intuitive Approach (Wan Fokkink)
Notes on Theory of Distributed Systems
Scaling Reliably: Improving the Scalability of the Erlang Distributed Actor Platform

Designing Data Intensive Applications

I foundations of data systems
- 1 reliable, scalable, and maintainable applications
  - 1.1 thinking about data systems
  - 1.2 reliability
  - 1.3 scalability
  - 1.4 maintainability
- 2 data models and query languages
  - 2.1 relational model vs document model
  - 2.2 query languages for data
  - 2.3 graph-like data models
- 3 storage and retrieval
  - 3.1 data structures that power your database
  - 3.2 transaction processing or analytics?
  - 3.3 column-oriented storage
- 4 encoding and evolution
  - 4.1 formats for encoding data
  - 4.2 modes of dataflow
II distributed data
- 5 replication
  - 5.1 leaders and followers
  - 5.2 problems with replication lag
  - 5.3 multi-leader replication
  - 5.4 leaderless replication
- 6 partitioning
  - 6.1 partitioning and replication
  - 6.2 partitioning of key-value data
  - 6.3 partitioning and secondary indexes
  - 6.4 rebalancing partitions
  - 6.5 request routing
- 7 transactions
  - 7.1 the slippery concept of a transaction
  - 7.2 weak isolation levels
  - 7.3 serializability
- 8 the trouble with distributed systems
  - 8.1 faults and partial failures
  - 8.2 unreliable networks
  - 8.3 unreliable clocks
  - 8.4 knowledge, truth and lies
- 9 consistency and consensus
  - 9.1 consistency guarantees
  - 9.2 linearizability
  - 9.3 ordering guarantees
  - 9.4 distributed transactions and consensus
III derived data
- 10 batch processing
  - 10.1 batch processing with unix tools
  - 10.2 mapreduce and distributed filesystems
  - 10.3 beyond mapreduce
- 11 stream processing
  - 11.1 transmitting event streams
  - 11.2 database and streams
  - 11.3 processing streams
- 12 the future of data systems
  - 12.1 data integration
  - 12.2 unbundling databases
  - 12.3 aiming for correctness
  - 12.4 doing the right thing

Guide To Reliable Distributed Systems

I computing in the cloud
- 1 introduction
- 2 the way of the cloud
- 3 client perspective
- 4 network perspective
- 5 the structure of cloud data centers
- 6 remote procedure calls and the client/server model
- 7 corba: the common object request broker architecture
- 8 system support for fast client/server communication
II reliable distributed computing
- 9 how and why computer systems fail
- 10 overcoming failures in a distributed system
- 11 dynamic membership
- 12 group communication systems
- 13 point to point and multi-group considerations
- 14 the virtual synchrony execution model
- 15 consistency in distributed systems
III applications of reliability techniques
- 16 retrofitting reliability into complex systems
- 17 software architectures for group communication
IV related technologies
- 18 security options for distributed settings
- 19 clock synchronization and synchronization systems
- 20 transactional systems
- 21 peer-to-peer systems and probabilistic protocols
- 22 appendix A: virtualy synchronous methodology for building dynamic reliable services
- 23 appendix B: isis API

Protocol Design

1 introduction
- 1.1 what is a protocol?
- 1.2 protocols as processes
- 1.3 techniques for actual proofs
- 1.4 real protocols
- 1.5 readers's guide
2 CSP descriptions and proof rules
- 2.1 processes and process synchronization
- 2.2 channel history semantics
- 2.3 failure semantics
3 protocols and services
- 3.1 providing a service
- 3.2 service features
- 3.3 OSI and other layered architectures
4 basic protocol mechanisms
- 4.1 sequence control and error control
- 4.2 flow control
- 4.3 indication of change of peer state
- 4.4 change of service mode
- 4.5 multiplexing and splitting
- 4.6 segmentation and reassembly
- 4.7 prioritisation
5 multi-peer consensus
- 5.1 reliable consensus
- 5.2 election
- 5.3 commitment
- 5.4 byzantine agreement
- 5.5 clock synchronization
- 5.6 finding the global state
6 security
- 6.1 cryptographic methods
- 6.2 integrity
- 6.3 digital signatures
- 6.4 entity authentication
- 6.5 key exchange
- 6.6 non-cryptographic methods
7 naming addressing and routing
- 7.1 general principles of naming and addressing
- 7.2 addressing structures
- 7.3 routing
- 7.4 congestion
8 protocol encoding
- 8.1 simple binary encoding
- 8.2 TLV encoding
- 8.3 ASN.1 encoding
- 8.4 ASCII encodings
9 protocols in the OSI lower layers
- 9.1 data link layer
- 9.2 network layer
- 9.3 transport layer
10 application support protocols
- 10.1 session layer
- 10.2 presentation layer
- 10.3 application layer
- 10.4 basic application service elements
- 10.5 commitment, concurrency and recovery
- 10.6 client-server systems
- 10.7 security middleware
11 application protocols
- 11.1 file transfer
- 11.2 distributed transaction processing
- 11.3 message handling
- 11.4 hypertext and the world wide web
- 11.5 web services
A notation
- A.1 data types and variables
- A.2 data values and expressions
- A.3 processes and process expressions
- A.4 traces, failures, and transitions
- A.5 inference rules for process specifications
- A.6 security
B standarization
- B.1 standards organizations
- B.2 standards documents

Introduction to Reliable and Secure Distributed Programming (Carlos Varela, Cachin)

1 introduction
- 1.1 motivation
- 1.2 distributed programming abstractions
- 1.3 the end-to-end argument
- 1.4 software components
- 1.5 classes of algorithms
2 basic abstractions
- 2.1 distributed computation
- 2.2 abstracting processes
- 2.3 cryptographics abstractions
- 2.4 abstracting communication
- 2.5 timing assumptions
- 2.6 abstracting time
- 2.7 distributed system models
3 reliable broadcast
- 3.1 motivation
- 3.2 best-effort broadcast
- 3.3 regular reliable broadcast
- 3.4 uniform reliable broadcast
- 3.5 stubborn broadcast
- 3.6 logged best-effort broadcast
- 3.7 logged uniform reliable broadcast
- 3.8 probabilistic broadcast
- 3.9 FIFO and causal broadcast
- 3.10 byzantine consistent broadcast
- 3.11 byzantine reliable broadcast
- 3.12 byzantine broadcast channels
4 shared memory
- 4.1 introduction
- 4.2 (1,N) regular register
- 4.3 (1,N) atomic register
- 4.4 (N,N) atomic register
- 4.5 (1,N) logged regular register
- 4.6 (1,N) byzantine safe register
- 4.7 (1,N) byzantine regular register
- 4.8 (1,N) byzantine atomic register
5 consensus
- 5.1 regular consensus
- 5.2 uniform consensus
- 5.3 uniform consensus in the fail-noisy model
- 5.4 logged consensus
- 5.5 randomized consensus
- 5.6 byzantine consensus
- 5.7 byzantine randomized consensus
6 consensus variants
- 6.1 total-order broadcast
- 6.2 byzantine total order broadcast
- 6.3 terminating reliable broadcast
- 6.4 fast consensus
- 6.5 fast byzantine consensus
- 6.6 nonblocking atomic commit
- 6.7 group membership
- 6.8 view-synchronous communication
7 concluding remarks
- 7.1 implementation in appia
- 7.2 further implementations
- 7.3 further reading

Distributed Systems Principles and Paradigms (Tanenbaum)

1 introduction
- 1.1 definition of distributed system
- 1.2 goals
- 1.3 types of distributed systems
2 architectures
- 2.1 architectural styles
- 2.2 system architectures
- 2.3 architectures versus middleware
- 2.4 self-management in distributed systems
3 processes
- 3.1 threads
- 3.2 virtualization
- 3.3 clients
- 3.4 servers
- 3.5 code migration
4 communication
- 4.1 fundamentals
- 4.2 remote procedure call
- 4.3 message-oriented communication
- 4.4 stream-oriented communication
- 4.5 multicast communication
5 naming
- 5.1 names, identifiers and addresses
- 5.2 flat naming
- 5.3 structured naming
- 5.4 attribute-based naming
6 synchronization
- 6.1 clock synchronization
- 6.2 logical clocks
- 6.3 mutual exlusion
- 6.4 global positioning of nodes
- 6.5 election algorithms
7 consistency and replication
- 7.1 introduction
- 7.2 data-centric consistency models
- 7.3 client-centric consistency models
- 7.4 replica management
- 7.5 consistency protocols
8 fault tolerance
- 8.1 introduction to fault tolerance
- 8.2 process resilience
- 8.3 reliable client-server communication
- 8.4 reliable group communication
- 8.5 distributed commit
- 8.6 recovery
9 security
- 9.1 introduction to security
- 9.2 secure channels
- 9.3 access control
- 9.4 security management
10 distributed object-based systems
- 10.1 architecture
- 10.2 processes
- 10.3 communication
- 10.4 naming
- 10.5 synchronization
- 10.6 consistency and replication
- 10.7 fault tolerance
- 10.8 security
11 distributed file systems
- 11.1 architecture
- 11.2 processes
- 11.3 communication
- 11.4 naming
- 11.5 synchronization
- 11.6 consistency and replication
- 11.7 fault tolerance
- 11.8 security
12 distributed web-based systems
- 12.1 architecture
- 12.2 processes
- 12.3 communication
- 12.4 naming
- 12.5 synchronization
- 12.6 consistency and replication
- 12.7 fault tolerance
- 12.8 security
13 distributed coordination-based systems
- 13.1 introduction to coordination models
- 13.2 architectures
- 13.3 processes
- 13.4 communication
- 13.5 naming
- 13.6 synchronization
- 13.7 consistency and replication
- 13.8 fault tolerance
- 13.9 security
14 suggestions for further reading and bibliography

Distributed Systems: Concepts and Design (Colouris)

1 characterization of distributed systems
- 1.1 introduction
- 1.2 examples of distributed systems
- 1.3 trends in distributed systems
- 1.4 focus on resource sharing
- 1.5 challenges
- 1.6 case study: the world wide web
2 system models
- 2.1 introduction
- 2.2 physical models
- 2.3 architectural models
- 2.4 fundamental models
3 networking and internetworking
- 3.1 introduction
- 3.2 types of network
- 3.3 network principles
- 3.4 internet protocols
- 3.5 case studies: ethernet, wifi and bluetooth
4 interprocess communication
- 4.1 introduction
- 4.2 the api for the internet protocols
- 4.3 external data representation and marshalling
- 4.4 multicast communication
- 4.5 network virtualization: overlay networks
- 4.6 case study: MPI
5 remote invocation
- 5.1 introduction
- 5.2 request-reply protocols
- 5.3 remote procedure call
- 5.4 remote procedure invocation
- 5.5 case study: java RMI
6 indirect communication
- 6.1 introduction
- 6.2 group communication
- 6.3 publish-subscribe systems
- 6.4 message queues
- 6.5 shared memory approaches
7 operating systems support
- 7.1 introduction
- 7.2 the operating system layer
- 7.3 protection
- 7.4 processes and threads
- 7.5 communication and invocation
- 7.6 operating system architecture
- 7.7 virtualization at the operating system level
8 distributed objects and components
- 8.1 introduction
- 8.2 distributed objects
- 8.3 case study: CORBA
- 8.4 from objects to components
- 8.5 case studies: enterprise javabeans and fractal
9 web services
- 9.1 introduction
- 9.2 web services
- 9.3 service descriptions and IDL for web services
- 9.4 a directory service for use with web services
- 9.5 XML security
- 9.6 coordination of web services
- 9.7 applications of web services
10 peer-to-peer systems
- 10.1 introduction
- 10.2 napster and its legacy
- 10.3 peer-to-peer middleware
- 10.4 routing overlays
- 10.5 overlay case studies: pastry, tapestry
- 10.6 application case studies: squirrel, oceanstore, ivy
11 security
- 11.1 introduction
- 11.2 overview of security techniques
- 11.3 digital signatures
- 11.4 cryptography pragmatics
- 11.6 case studies: needham-schroeder, kerberos, tls, 802.11 wifi
12 distributed file systems
- 12.1 introduction
- 12.2 file service architecture
- 12.3 case study: sun network file system
- 12.4 case study: the andrew file system
- 12.5 enhancements and further developments
13 name services
- 13.1 introduction
- 13.2 name services and the domain name system
- 13.3 directory services
- 13.5 case study: the X.500 directory service
14 time and global states
- 14.1 introduction
- 14.2 clocks, events and process states
- 14.3 synchronizing physical clocks
- 14.4 logical time and logical clocks
- 14.5 global states
- 14.6 distributed debugging
15 coordination and agreement
- 15.1 introduction
- 15.2 distributed mutual exclusion
- 15.3 elections
- 15.4 coordination and agreement in group communication
- 15.5 consensus and related problems

Distributed Operating Systems (Tanenbaum)

1 introduction to distributed systems
- 1.1 what is a distributed system?
- 1.2 goals
- 1.3 hardware concepts
- 1.4 software concepts
- 1.5 design issues
2 communication in distributed systems
- 2.1 layered protocols
- 2.2 asynchronous transfer mode networks
- 2.3 the client-server model
- 2.4 remote procedure call
- 2.5 group communication
3 synchronization in distributed systems
- 3.1 clock synchronization
- 3.2 mutual exclusion
- 3.3 election algorithms
- 3.4 atomic transactions
- 3.5 deadlocks in distributed systems
4 processes and processors in distributed systems
- 4.1 threads
- 4.2 system models
- 4.3 processor allocation
- 4.4 scheduling in distributed systems
- 4.5 fault tolerance
- 4.6 real-time distributed systems
5 distributed file systems
- 5.1 distributed file system design
- 5.2 distributed file system implementation
- 5.3 trends in distributed file systems
6 distributed shared memory
- 6.1 introduction
- 6.2 what is shared memory?
- 6.3 consistency models
- 6.4 paged-based distributed shared memory
- 6.5 shared-variable distributed shared memory
- 6.6 object-based distributed shared memory
- 6.7 comparison
7 case study 1: amoeba
- 7.1 introduction to amoeba
- 7.2 objects and capabilities in amoeba
- 7.3 process management in amoeba
- 7.4 memory management in amoeba
- 7.5 communication in amoeba
- 7.6 the amoeba servers
8 case study 2: mach
- 8.1 introduction to mach
- 8.2 process management in mach
- 8.3 memory management in mach
- 8.4 communication in mach
- 8.5 unix emulation in mach
9 case study 3: chorus
- 9.1 introduction to chorus
- 9.2 process management in chorus
- 9.3 memory management in chorus
- 9.4 communication in chorus
- 9.5 unix emulation in chorus
- 9.6 cool: an object oriented subsystem
- 9.7 comparison of amoeba, mach and chorus
10 case study 4: DCE
- 10.1 introduction to DCE
- 10.2 threads
- 10.3 remote procedure call
- 10.4 time service
- 10.5 directory service
- 10.6 security service
- 10.7 distributed file system

Distributed Operating Systems: Theory and Practice (lamport)

1 formal aspects of concurrent systems
- 1.1 a formal basis for the specification of concurrent systems
- 1.2 on the construction of distributed programs
- 1.3 derivation of distributed algorithms
2 design issues for distributed systems
- 2.1 design of highly decentralised operating systems
- 2.2 communication models for distributed computation
- 2.3 new concepts for distributed system structuring
3 hardware support for distributed computing systems
- 3.1 distributed computing system architectures: hardware
- 3.2 hardware support for the distributed operating system of the heidelberg polyp processor
4 case studies
- 4.1 the apollo domain distributed file system
- 4.2 the chorus distributed operating system: some design issues
- 4.3 the conic support environment for distributed systems
- 4.4 an experience in solving a transactions ordering problem in a distributed system
- 4.5 distributed transaction processing and the camelot system
- 4.6 work programs

Distributed Algorithms (Lynch)

I synchronous network algorithms
- 1 introduction
- 2 modelling 1: synchronous network model
- 3 leader election in a synchronous ring
- 4 algorithms in general synchronous networks
- 5 distributed consensus with link failures
- 6 distributed consensus with process failures
- 7 more consensus problems
II asynchronous algorithms
- 8 modelling 2: asynchronous system model
IIA asynchronous shared memory algorithms
- 9 modelling 3: asynchronous shared memory model
- 10 mutual exclusion
- 11 resource allocation
- 12 consensus
- 13 atomic objects
IIB asynchronous network algorithms
- 14 modelling 4: asynchronous network model
- 15 basic asynchronous network algorithms
- 16 synchronizers
- 17 shared memory versus networks
- 18 logical time
- 19 global snapshots and stable properties
- 20 network resource allocation
- 21 asynchronous network with process failures
- 22 data link protocols
III partially synchronous algorithms
- 23 partially synchronous system models
- 24 mutual exlusion with partial synchrony
- 25 consensus with partial synchrony

Distributed Algorithms for Message-Passing Systems (raynal)

I distributed graph algorithms
- 1 basic definitions and network traversal algorithms
- 2 dsitributed graph algorithms
- 3 an algorithmic framework to compute global functions on a process graph
- 4 leader election algorithms
- 5 mobile objects navigating a network
II logical time and global states in distributed systems
- 6 nature of distributed computations and the concept of a global state
- 7 logical time in asynchronous distributed systems
- 8 asynchronous distributed checkpointing
- 9 simulating synchrony on top of asynchronous systems
III mutual exclusion and resource allocation
- 10 permission-based mutual exclusion algorithms
- 11 distributed resource allocation
IV high-level communication abstractions
- 12 order constraints on message delivery
- 13 rendezvous (synchronous) communication
V detection properties on distributed executions
- 14 distributed termination detection
- 15 distributed deadlock detection
VI distributed shared memory
- 16 atomic consistency (linearizability)
- 17 sequential consistency
- 18 afterword

Distributed Systems An Algorithmic Approach (Ghosh)

I background materials
- 1 introduction
  - 1.1 what is a distributed system
  - 1.2 why distributed systems
  - 1.3 examples of distributed systems
  - 1.4 important issues in distributed systems
  - 1.5 common subproblems
  - 1.6 implementing a distributed system
  - 1.7 parallel versus distributed systems
- 2 interprocess communication: an overview
  - 2.1 introduction
  - 2.2 network protocols
  - 2.3 naming
  - 2.4 remote procedure call
  - 2.5 remote method invocation
  - 2.6 messages
  - 2.7 web services
  - 2.8 event notification
  - 2.9 virtualization: cloud computing
  - 2.10 mobile agents
  - 2.11 basic group communication services
  - 2.12 concluding remarks
II foundational topics
- 3 models for communication
  - 3.1 need for a model
  - 3.2 message-passing model for interprocess communication
  - 3.3 shared variables
  - 3.4 modeling mobile agents
  - 3.5 relationship among models
  - 3.6 classification based on special properties
  - 3.7 complexity measures
  - 3.8 concluding remarks
- 4 representing distributed algorithms: syntax and semantics
  - 4.1 introduction
  - 4.2 guarded actions
  - 4.3 nondeterminism
  - 4.4 atomic operations
  - 4.5 fairness
  - 4.6 central versus distributed schedulers
  - 4.7 concluding remarks
- 5 program correctness
  - 5.1 introduction
  - 5.2 correctness criteria
  - 5.3 correctness proofs
  - 5.4 assertional reasoning: proving safety priorities
  - 5.5 proving liveness properties using well-founded sets
  - 5.6 programming logic
  - 5.7 predicate transformers
  - 5.8 concluding remarks
- 6 time in a distributed system
  - 6.1 introduction
  - 6.2 logical clocks
  - 6.3 vector clocks
  - 6.4 physical clock synchronization
  - 6.5 concluding remarks
III important paradigms
- 7 mutual exclusion
  - 7.1 introduction
  - 7.2 solutions on message-passing systems
  - 7.3 token-passing algorithms
  - 7.4 solutions on the shared-memory model
  - 7.5 mutual exclusion using special instructions
  - 7.6 group mutual exclusion
  - 7.7 concluding remarks
- 8 distributed snapshot
  - 8.1 introduction
  - 8.2 properties of consisten snapshots
  - 8.3 chandy-lamport algorithm
  - 8.4 lai-yang algorithm
  - 8.5 distributed debugging
  - 8.6 concluding remarks
- 9 global state collection
  - 9.1 introduction
  - 9.2 elementary algorithm for all-to-all broadcasting
  - 9.3 termination-detection algorithms
  - 9.4 wave algorithms
  - 9.5 distributed deadlock detection
  - 9.6 concluding remarks
- 10 graph algorithms
  - 10.1 introduction
  - 10.2 routing algorithms
  - 10.3 graph traversal
  - 10.4 graph coloring
  - 10.5 cole-vishkin reduction algorithm for tree coloring
  - 10.6 maximal independent set: luby's algorithm
  - 10.7 concluding remarks
- 11 coordination algorithms
  - 11.1 introduction
  - 11.2 leader election
  - 11.3 synchronizers
  - 11.4 concluding remarks
IV faults and fault-tolerant systems
- 12 fault-tolerant systems
  - 12.1 introduction
  - 12.2 classification of faults
  - 12.3 specification of faults
  - 12.4 fault-tolerant systems
  - 12.5 tolerating crash failures
  - 12.6 tolerating omission failures
  - 12.7 concluding remarks
- 13 distributed consensus
  - 13.1 introduction
  - 13.2 consensus in asynchronous systems
  - 13.3 consensus in synchronous systems: byzantine generals problem
  - 13.4 paxos algorithm
  - 13.5 failure detectors
  - 13.6 concluding remarks
- 14 distributed transactions
  - 14.1 introduction
  - 14.2 classification of transactions
  - 14.3 implementing transactions
  - 14.4 concurrency control and serializability
  - 14.5 atomic commit protocols
  - 14.6 recovery from failures
  - 14.7 conclusing remarks
- 15 group communication
  - 15.1 introduction
  - 15.2 atomic multicast
  - 15.3 ip multicast
  - 15.4 application layer multicast
  - 15.5 ordered multicasts
  - 15.6 reliable multicast
  - 15.7 open groups
  - 15.8 overview of transis
  - 15.9 concluding remarks
- 16 replicated data management
  - 16.1 introduction
  - 16.2 architecture of replicated data management
  - 16.3 data-centric consistency tools
  - 16.4 client-centric consistency protocols
  - 16.5 implementation of data-centric consistency models
  - 16.6 quorum-based protcols
  - 16.7 replica placement
  - 16.8 brewer's cap theorem
  - 16.9 case studies
  - 16.10 concluding remarks
- 17 self-stabilizing systems
  - 17.1 introduction
  - 17.2 theoretical foundation
  - 17.3 stabilizing mutual exclusion
  - 17.4 stabilizing graph coloring
  - 17.5 stabilizing spanning tree protocol
  - 17.6 stabilizing maximal matching
  - 17.7 distributed reset
  - 17.8 stabilizing clock phase synchronization
  - 17.9 concluding remarks
V real-world issues
- 18 distributed discrete-event simulation
  - 18.1 introduction
  - 18.2 distributed simulation
  - 18.3 conservative simulation
  - 18.4 optimistic simulation and time warp
  - 18.5 concluding remarks
- 19 security in distributed systems
  - 19.1 introduction
  - 19.2 security mechanisms
  - 19.3 common security attacks
  - 19.4 encryption
  - 19.5 secret key cryptosystem
  - 19.6 public key cryptosystems
  - 19.7 digital signatures
  - 19.8 hashing algorithms
  - 19.9 elliptic curve cryptography
  - 19.10 authentication server
  - 19.11 digital certificates
  - 19.12 case studies
  - 19.13 virtual private networks and firewalls
  - 19.14 sharing a secret
  - 19.15 concluding remarks
- 20 sensor networks
  - 20.1 vision
  - 20.2 architecture of sensor nodes
  - 20.3 challenges in wireless sensor networks
  - 20.4 routing algorithms
  - 20.5 time synchronization using reference broadcast
  - 20.6 localization algorithms
  - 20.7 security in sensor networks
  - 20.8 applications
  - 20.9 concluding remarks
- 21 social and peer-to-peer networks
  - 21.1 introduction to social networks
  - 21.2 metrics of social networks
  - 21.3 modeling socail networks
  - 21.4 centrality measures in social networks
  - 21.5 community detection
  - 21.6 introduction to peer-to-peer networks
  - 21.7 first-generation p2p systems
  - 21.8 second-generation p2p systems
  - 21.9 koorde and de bruijn graph
  - 21.10 skip graph
  - 21.11 replication management
  - 21.12 bittorrent and free riding
  - 21.13 censorship resistance, anonimity
  - 21.14 concluding remarks

Distributed Systems An Intuitive Approach (Wan Fokkink)

I message passing
- 1 introduction
- 2 preliminaries
- 3 snapshots
- 4 waves
- 5 deadlock detection
- 6 terminatino detection
- 7 garbage collection
- 8 routing
- 9 election
- 10 anonymous networks
- 11 synchronous networks
- 12 crash failures
- 13 byzantine failures
- 14 mutual exlusion
II shared memory
- 15 preliminaries
- 16 mutual exclusion II
- 17 barriers
- 18 self-stabilization
- 19 online scheduling

Notes on Theory of Distributed Systems

I message passing
- 1 introduction
- 2 model
- 3 coordinated attack
- 4 broadcast and convergecast
- 5 distributed breadth-first search
- 6 leader election
- 7 synchronous agreement
- 8 byzantine agreement
- 9 impossibility of asynchronous agreement
- 10 paxos
- 11 failure detectors
- 12 logical clocks
- 13 synchronizers
- 14 quorum systems
II shared memory
- 15 model
- 16 distributed shared memory
- 17 mutual exclusion
- 18 the wait-free hiararchy
- 19 atomic snapshots
- 20 lower bounds on perturbable objects
- 21 restricted-use objects
- 22 common2
- 23 randomized consensus and test-and-set
- 24 renaming
- 25 software transactional memory
- 26 obstruction-freedom
- 27 BG simulation
- 28 topological methods
- 29 approximate agreement

Scaling Reliably: Improving the Scalability of the Erlang Distributed Actor Platform

1 INTRODUCTION Controlling shared state is the only way to build reliable scalable systems. State shared by multiple units of computation limits scalability due to high synchronisation and communication costs. Moreover, shared state is a threat for reliability as failures corrupting or permanently locking shared state may poison the entire system. To facilitate the development of scalable Erlang systems and make them maintainable, we have developed three new tools—Devo, SDMon, and WombatOAM—and enhanced two others—the visualisation tool Percept and the refactorer Wrangler.
2 CONTEXT
- 2.1 Scalable Reliable Programming Models
- 2.2 Actor Languages
- 2.3 Erlang’s Support for Concurrency
- 2.4 Scalability and Reliability in Erlang Systems
- 2.5 ETS: Erlang Term Storage
3 BENCHMARKS FOR SCALABILITY AND RELIABILITY
- 3.1 Orbit
- 3.2 Ant Colony Optimisation (ACO)
4 ERLANG SCALABILITY LIMITS
- 4.1 Scaling Erlang on a Single Host
- 4.2 Distributed Erlang Scalability
- 4.3 Persistent Storage
5 IMPROVING LANGUAGE SCALABILITY
- 5.1 SD Erlang Design
- 5.2 S_group Semantics
- 5.3 Semantics Validation
6 IMPROVING VM SCALABILITY
- 6.1 Improvements to Erlang Term Storage
- 6.2 Improvements to Schedulers
- 6.3 Improvements to Time Management
7 SCALABLE TOOLS
- 7.1 Refactoring for Scalability
- 7.2 Scalable Deployment
- 7.3 Monitoring and Visualisation
8 SYSTEMIC EVALUATION
- 8.1 Orbit
- 8.2 Ant Colony Optimisation (ACO)
- 8.3 Evaluation Summary
9 DISCUSSION
APPENDIX: ARCHITECTURE SPECIFICATIONS

PKB