Overview

Distributed inference and training across multiple nodes are essential due to the exponential growth in the size and complexity of deep learning models and the scarcity of computational resources capable of processing them. Both tasks hinge on handling vast computational loads more efficiently and on reducing the latency of generating predictions and updating model parameters. By pooling the computational resources and memory available across several processing nodes, it becomes possible to serve larger models or increase batch sizes without a proportional increase in inference or training time.

A critical aspect of efficient distributed inference and training is partitioning the computational graph of the neural network. The computational graph represents all the operations and data flows within the model from input to output. Partitioning this graph effectively means dividing the model's computations so that they can be processed in parallel or in sequence across different nodes.
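As a rough illustration of what such a partition looks like, the following sketch splits a model's layer sequence into contiguous shards, one per node. The function name `partition_layers` and the even-split heuristic are hypothetical and simplified; they are not Nesa's actual partitioning algorithm or API.

```python
# Hypothetical sketch: partition a model's consecutive layers into
# contiguous shards, one shard per node. Illustrative only.
from typing import List


def partition_layers(num_layers: int, num_nodes: int) -> List[range]:
    """Split `num_layers` consecutive layers into `num_nodes` contiguous shards."""
    base, extra = divmod(num_layers, num_nodes)
    shards, start = [], 0
    for node in range(num_nodes):
        size = base + (1 if node < extra else 0)  # spread the remainder over the first nodes
        shards.append(range(start, start + size))
        start += size
    return shards


# e.g. an 80-layer model split across 6 nodes:
# [range(0, 14), range(14, 28), range(28, 41), range(41, 54), range(54, 67), range(67, 80)]
print(partition_layers(80, 6))
```

In practice the split would also account for per-layer compute and memory cost and for each node's capacity, rather than assigning an equal number of layers to every node.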

During training, this distributed approach also allows gradient computation and parameter updates to be parallelized, significantly accelerating the training process. Efficient communication and synchronization mechanisms are equally critical, since model parameters must be updated across nodes without incurring significant delays, and we employ several strategies to minimize the communication overhead between computational units. After processing its assigned portion of the graph, each node must send its outputs to the next node in the sequence; in standard distributed approaches, this inter-node communication often happens over slower channels and can become a significant bottleneck.
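The sketch below shows where that inter-node hand-off sits in a pipelined forward pass. The `Node`, `run_shard`, and `distributed_forward` names are stand-ins for illustration, not Nesa's actual interfaces.

```python
# Hypothetical sketch of pipelined inference across shard-holding nodes.
import numpy as np


class Node:
    def __init__(self, shard_id: int):
        self.shard_id = shard_id

    def run_shard(self, activations: np.ndarray) -> np.ndarray:
        # Placeholder for executing this node's slice of the computational graph.
        return activations  # identity, for illustration only


def distributed_forward(nodes: list, inputs: np.ndarray) -> np.ndarray:
    activations = inputs
    for node in nodes:
        activations = node.run_shard(activations)
        # In a real deployment, handing `activations` to the next node is a
        # network transfer; its latency and bandwidth are the communication
        # overhead that partitioning and topology design try to minimize.
    return activations


output = distributed_forward([Node(i) for i in range(4)], np.zeros((1, 4096)))
```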

In the following sections, we introduce Nesa's innovative approaches to decentralizing AI model inference:

  • Model Partitioning and Deep Network Sharding and Dynamic Sharding of Arbitrary Neural Networks describe our approach to splitting large AI model computational graphs into small chunks to be distributed across node runners.

  • Additionally, we provide designs to further improve the efficiency of the Nesa system via Cache Optimization to Enhance Efficiency, BSNS with Parameter-efficient Fine-tuning via Adapters, Enhanced MTPP Slicing of Topological Order, and Swarm Topology.