Swarm Topology
Last updated
Last updated
Due to the ever-changing topology of the swarm network, we further extend our computational efficiency by monitoring and classifying changes and the current state of our swarm topology over time. In conjunction, we build several precomputed sharding schemas based on the commonly observed swarm topologies as we monitor our networks. In doing so, we can quickly select among sharding schemes to find one that is near-optimal for the current swarm topology. In the following section, we outline how we monitor and analyze the swarm topology.
Dynamic sharding is computationally expensive as finding an optimal sharding scheme is known to be NP-hard. We mitigate this expense by precomputing highly efficient sharding schemas for different topological states of our network on a per-model basis. As the network topology itself is dynamic, we periodically monitor our network using techniques from (weighted) persistent homology, to give snapshots of topological characteristics to guide the applications of our pre-computed sharding schemas. The main goal of this section will be to introduce many of the concepts behind persistent homology (which is not a mainstream topic) to guide intuition on how this computational tool can be leveraged to make dynamic sharding optimization much more tractable.
As a basic introduction to the implementation of this framework, we briefly illustrate the mathematical technique of simplicial homology. This uses the simplicial complex to calculate the homology groups -- these groups are formally Abelian groups but they can be thought of as vector spaces. A simplicial complex is a collection of simplices that are the convex hull of a group of vertices. More explicitly, an -simplex contains vertices and is the collection of points in such that:
Thus, a 0-simplex is a point, a 1-simplex is a line segment, a 2-simplex is a triangle, and so on. To build a simplicial complex out of a collection of simplices, we need to define how to glue the pieces together. Mathematically it is given as (linear) boundary maps which map each -simplex to its boundary which lives in one dimension lower (if it exists). Here we are using the notation to indicate the collection of all -simplices in our complex, sums of which are often referred to as -chains or simply chains. In general, we can (abstractly) represent the entire data of the simplicial complex by:
Referring back to the boundary maps , the image of will be the (simplicial) boundary of . We assign an orientation to the simplices to make boundary calculations consistent. For example, with the 1-simple as the convex hull of the image of the boundary map would be defined as , with such an orientation induced by imaging the 1-simple is a directed edge from the point to the point . More explicitly:
where we have intentionally denoted as the simplex constructed from the vertices . Looking at Figure \ref{fig:simplical_homology}, and considering the triangle at the left, the collection of 1-chains contains three edges, which are the convex hulls of the sets, , and . The arrows give the orientation of each edge, which guides the boundary map calculation. We already calculated the boundary map for the first edge. The remaining two edges, and map can be calculated as:
By linearity, we can compute the boundary for the full chain, as:
Thus, the chain has no formal boundary (to be more clear this chain is what most people identify as the triangle itself). More importantly. This example demonstrates that a cycle (a chain without a boundary) has a trivial image. This will correlate with some topological holes in the simplicial complex. Thus, cycles correspond to elements of the kernel of the boundary map (a kernel of a map is simply all elements that map to zero). Formally, we calculate the th-homology group of a simplicial complex as
where is the group of -chains that are cycles and are the group of -chains that are boundaries.
Thinking of these groups as vector spaces with each dimension corresponding to a simplex and with the operation being vector addition, this definition allows one to identify chains where the difference is a boundary (a chain that has a non-trivial image). In other words, adding a boundary chain to a cycle does not change its class under homology. This is demonstrated in Figure \ref{fig:simplical_homology} where the two diagrams are equivalent since the difference in is the edge between . Note that this is a boundary because the value under the boundary map is , according to the indicated orientation. It should be further noted that points never have boundaries so does not count holes, but counts the number of connected components. Overall, this mathematical formalism allows one to unambiguously tabulate the topological characteristics of a wide variety of different geometric objects.
Persistent homology (PH) is based on introducing a filtration of the topological space as a sequence of simplicial complexes, which for our purposes the complexes will be the derived from the graph corresponding to our network. For clarity, we will refer to the persistent th homology group for a space as . For our application, the embedding of this graph into Euclidean space is based on the geographic locations and the edge weighting is a function of individual node internet connectivity and node computational power.
The filtering process in persistent homology for a graph resembles inflating circles around each network node (vertex) and capturing data about how the resulting topology changes as the circles begin to merge. One can gain intuition behind this idea by imagining three points on the vertices of a triangle (see Figure 1). A scale parameter increases the radius of each point uniformly. Each step along this parameter change introduces a new step in the filtration of complexes. Until the circles grow enough to overlap, there will be three separate components so the zeroth homology group which measures the number of connected components will have dimension 3 and higher order groups which measure the number of higher dimensional holes will be trivial (left schematic in Figure 1). As the radii grow they eventually grow to overlapping circles. Immediately after the circles overlap there will still be a gap between the three circles centered in the middle of the triangle, so now they’re one hole and thus the first homology group suddenly becomes non-trivial with dimension 1 (center schematic in Figure 1). At the same time the number of connected components collapses to 1 so the zeroth homology group will change from three dimensional to 1 dimensional. As the circles continue to grow, the gap in the middle will fill, closing the hole and the first homology group will become trivial again, but the zeroth homology group will continue to have dimension 1 (right schematic in Figure 1). No further meaningful changes will occur to the persistent diagram after this point.
The changes illustrated above are typically captured using a persistence diagram which simply introduces a point for each birth-death pair, where the x-axis is the birth time and the y-axis is the death time. Here time is just the scaling parameter. Since births always occur before deaths, all the points on the diagram will be above the main diagonal line . Following the toy example above, one can imagine that smaller holes will have quicker deaths since they do not persist long, so they will be near the main diagonal. Conversely bigger holes will have a longer lifetime. Intuitively, this gives information about the network latency, connectivity and possible bottlenecks. The lifetimes of the connected components, measured by , give a complementary view of these characteristics. This data from both and persistence can be fed into a dedicated ML classifier to predict which precomputed sharding scheme will be most efficient based on the current network topology. As calculating persistent homology is not NP-hard even in dimensions much higher than those used to model graphs, this gives us an avenue to select a near-optimal precomputed sharding scheme at a reasonable computational cost.