Miner Reputation and Incentives

In decentralized inference, miners vary in reliability and responsiveness, and the system cannot know this in advance. Treating all miners equally would reward unreliable behavior and penalize dependable ones.

Nesa uses incentives to make reliable and efficient behavior consistently more profitable than unreliable behavior.


Core Principle

Miners who deliver correct and timely results should receive more opportunities and higher long-term rewards.

This principle is implemented through a continuously updated reputation score rather than manual rules or centralized control.

How Incentives Work

Reputation summarizes observed miner behavior and directly affects outcomes that matter:

  • Task assignment: higher-reputation miners are selected more often

  • Failure cost: timeouts and non-responses reduce future opportunities

  • Recovery: occasional failures are tolerated; persistent issues are deprioritized

From a miner’s perspective:

Better performance → higher reputation → more tasks → higher rewards


Design Goals

The miner reputation system is designed to be:

  • Fast to compute (per request)

  • Fair across modalities (LLMs, diffusion, evolution models, etc.)

  • Robust to failures (timeouts and non-responses)

  • Recoverable for miners with temporary issues

  • Extensible to future validation signals (e.g., output correctness)


Core Reputation Variable

Each miner maintains a reputation score:

  • Reputation R ∈ [0.1, 10]

  • Initialized at R = 1

  • Updated after every inference request

The reputation score directly influences:

  • Miner selection priority

  • Task routing tier

  • Reward and penalty magnitude


Top-Down Reputation Update (Orchestrated Routing)

In orchestrated settings, miners are assigned tasks by an agent or scheduler. Reputation is updated primarily based on completion correctness.

Update Rule

R' = R · Pen^M · Rew^(1−M)

Where:

  • R: current reputation

  • R': updated reputation

  • M ∈ {0, 1}: mistake indicator

    • M = 1 → timeout or non-response

    • M = 0 → successful execution

  • Pen = 0.8: penalty multiplier

  • Rew = 1.01 (up to 1.05 under evaluation): reward multiplier

Interpretation

  • Correct behavior leads to gradual exponential growth

  • Failures cause multiplicative decay

  • Consistently correct miners quickly separate from unreliable ones

  • Occasional mistakes are recoverable
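The top-down update can be sketched in a few lines of Python. The function name and the clamping to the documented [0.1, 10] range inside the update are illustrative assumptions; the multiplier values follow the text:

```python
def update_reputation_top_down(r: float, mistake: bool,
                               pen: float = 0.8, rew: float = 1.01,
                               r_min: float = 0.1, r_max: float = 10.0) -> float:
    """Apply R' = R * Pen^M * Rew^(1-M), clamped to [0.1, 10]."""
    m = 1 if mistake else 0
    r_new = r * (pen ** m) * (rew ** (1 - m))
    return max(r_min, min(r_max, r_new))
```

Starting from the initial score R = 1, one success raises the score to 1.01 while one failure drops it to 0.8, so sustained failures decay the score far faster than successes can rebuild it.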


Bottom-Up Reputation Update (Bidding / Open Routing)

In decentralized or bidding-based architectures, correctness alone is insufficient. The system must also filter out miners that are consistently slow or underpowered.

To address this, the reputation update incorporates performance metrics.

Extended Update Rule

R' = α · R · Pen^M · Rew^(1−M) + β · (w_S·S + w_F·F + w_B·B + w_I·I)

Where:

  • α = 0.7: weight for correctness history

  • β = 0.3: weight for performance

  • S: single-sample inference efficiency (tokens/s per block)

  • F: forward-pass throughput (batch inference)

  • B: backward-pass throughput (if applicable)

  • I: network responsiveness / bandwidth

  • w_S, w_F, w_B, w_I: configurable weights

All performance metrics are min-max normalized:

x ← (x − x_min) / (x_max − x_min)
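The extended rule can be sketched as follows, using the stated α, β, Pen, and Rew values. The dictionary-based passing of already-normalized metrics and the guard for a degenerate min/max window are illustrative assumptions:

```python
def minmax(x: float, x_min: float, x_max: float) -> float:
    """Min-max normalize x into [0, 1]; guard against a degenerate window."""
    if x_max <= x_min:
        return 0.0
    return (x - x_min) / (x_max - x_min)

def update_reputation_bottom_up(r: float, mistake: bool,
                                perf: dict, weights: dict,
                                alpha: float = 0.7, beta: float = 0.3,
                                pen: float = 0.8, rew: float = 1.01) -> float:
    """R' = alpha * R * Pen^M * Rew^(1-M) + beta * sum(w_k * metric_k).

    `perf` maps metric names (S, F, B, I) to normalized values in [0, 1];
    `weights` maps the same names to w_S, w_F, w_B, w_I.
    """
    m = 1 if mistake else 0
    correctness = r * (pen ** m) * (rew ** (1 - m))
    performance = sum(weights[k] * perf[k] for k in perf)
    r_new = alpha * correctness + beta * performance
    return max(0.1, min(10.0, r_new))
```

Because α > β, a miner with perfect performance metrics but a failed request still ends up below a correct miner with mediocre metrics, which is how correctness stays the dominant signal.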

Design Rationale

  • Correctness remains the dominant signal

  • Performance differentiates miners with similar accuracy

  • Hardware-only advantages cannot overwhelm correctness

  • Low-reputation miners can still recover over time


Efficiency Metric SS

Efficiency is measured per request as:

S_raw = (input_size + output_size) / actual_time

Normalization bounds:

  • Computed per model, not per miner

  • Window size: last 100 requests for that model

  • Strict min/max bounds (percentile-based bounds under evaluation)

A rolling average is maintained per miner:

S_rolling ← 0.9 · S_rolling + 0.1 · S
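The per-model window, min-max normalization, and per-miner rolling average above can be combined into one small tracker. The `EfficiencyTracker` class and its method names are illustrative assumptions; the 100-request window and the 0.9/0.1 rolling weights follow the text:

```python
from collections import deque

class EfficiencyTracker:
    """Tracks S for one model: a raw-efficiency window plus rolling averages."""

    def __init__(self, window: int = 100):
        self.window = deque(maxlen=window)   # last 100 requests for this model
        self.rolling = {}                    # miner_id -> rolling S

    def record(self, miner_id: str, input_size: float,
               output_size: float, actual_time: float) -> float:
        """Record one request and return the miner's updated rolling S."""
        s_raw = (input_size + output_size) / actual_time
        self.window.append(s_raw)
        lo, hi = min(self.window), max(self.window)
        s = (s_raw - lo) / (hi - lo) if hi > lo else 0.0
        prev = self.rolling.get(miner_id, 0.0)
        self.rolling[miner_id] = 0.9 * prev + 0.1 * s
        return self.rolling[miner_id]
```

Keeping one tracker per model (not per miner) matches the normalization rule above: all miners serving the same model are compared against the same recent min/max bounds.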

Timeouts and Mistakes

A request is considered a mistake (M = 1) if:

  • Execution time exceeds 1.5× the expected time

  • No response arrives within 2× the expected time

Expected time accounts for:

  • Average execution time for the model

  • Input and output size scaling

Non-responses incur a stronger penalty multiplier than slow responses.
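These rules reduce to a small classifier. The function name and the string labels distinguishing the two penalty cases are illustrative assumptions; the 1.5× and 2× thresholds follow the text:

```python
def classify_request(actual_time, expected_time):
    """Return (M, kind): M = 1 for a mistake, with the kind of failure.

    actual_time is None when the miner never responded; a response later
    than 2x the expected time is treated the same way, since the harsher
    non-response penalty applies in both cases.
    """
    if actual_time is None or actual_time > 2.0 * expected_time:
        return 1, "non_response"   # stronger penalty multiplier applies
    if actual_time > 1.5 * expected_time:
        return 1, "slow"
    return 0, None                 # successful, timely execution
```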


Rewards and Penalties

Rewards (Successful Requests)

reward = base_reward · R^γ

  • γ = 1.2 (under evaluation)

  • Encourages long-term reliability

Penalties (Failures)

penalty = base_penalty / R^δ

  • δ = 0.5

  • Prevents overly harsh punishment of high-reputation miners

  • Still discourages repeated failures
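Both formulas are direct to implement; the function names are illustrative, while γ = 1.2 and δ = 0.5 follow the text:

```python
def reward_for_success(base_reward: float, r: float, gamma: float = 1.2) -> float:
    """reward = base_reward * R^gamma: rewards grow superlinearly with reputation."""
    return base_reward * r ** gamma

def penalty_for_failure(base_penalty: float, r: float, delta: float = 0.5) -> float:
    """penalty = base_penalty / R^delta: penalties soften as reputation rises."""
    return base_penalty / r ** delta
```

For example, a miner at R = 4 pays only half the base penalty for a failure (since 4^0.5 = 2), while a miner at the floor R = 0.1 pays roughly 3.2× the base penalty, so repeated failures compound quickly for unreliable miners.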


Reputation Bounds and Recovery

  • Reputation is clamped to [0.1, 10]

  • A small catch-up factor helps low-reputation miners recover:

    • Encourages re-entry after transient issues

    • Prevents permanent exclusion
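One possible shape for the clamp plus catch-up step is sketched below. The text only says a small catch-up factor exists; the additive nudge toward the neutral score of 1 and the 0.01 rate are hypothetical choices for illustration:

```python
def clamp_with_catchup(r: float, r_min: float = 0.1, r_max: float = 10.0,
                       catchup: float = 0.01) -> float:
    """Clamp reputation to [0.1, 10] and nudge low scores toward neutral.

    The additive nudge below R = 1 is a hypothetical form of the catch-up
    factor; it lets miners with transient issues drift back into rotation.
    """
    if r < 1.0:
        r += catchup * (1.0 - r)   # drift back toward the neutral score R = 1
    return max(r_min, min(r_max, r))
```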


Empirical Observations

From large-scale inference traces:

  • Total response time is strongly correlated with model loading time

  • Inference time alone correlates weakly with model size

  • Performance distributions are heavy-tailed and zero-inflated

These observations motivate:

  • Normalization by model size

  • Separate accounting of loading vs inference time

  • Careful tuning of performance weights


Relationship to Miner Selection

Reputation directly affects:

  • Miner tier assignment

  • Selection probability in routing

  • Reputation-based rewards (RBR)

  • Fallback and retry ordering

The system favors reliable miners, but does not permanently exclude others.
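One simple policy consistent with these properties is reputation-weighted sampling: every miner with positive reputation remains selectable, but higher scores win more often. Proportional sampling is an illustrative assumption here, as the text does not specify the exact selection distribution:

```python
import random

def select_miner(reputations: dict, rng=random) -> str:
    """Pick a miner with probability proportional to its reputation score.

    Even a miner at the reputation floor (0.1) keeps a nonzero chance of
    selection, so no miner is permanently excluded.
    """
    miners = list(reputations)
    weights = [reputations[m] for m in miners]
    return rng.choices(miners, weights=weights, k=1)[0]
```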


Summary

The miner reputation system in Nesa:

  • Combines correctness and efficiency

  • Adapts across models and modalities

  • Supports decentralized participation

  • Enables recovery from transient failures

  • Remains extensible to future validation signals

This framework provides a stable foundation for trust-aware, incentive-aligned decentralized inference.
