# Scalable Private Inference with HSS-EE

**HSS-EE (Homomorphic Secret Sharing over Encrypted Embeddings)** is Nesa’s two-party inference protocol that provides stronger privacy than standard encrypted inference. Built on top of [Equivariant Encryption (EE)](/nesa/major-innovations/private-inference-for-ai/practical-approach-equivariant-encryption-ee.md), HSS-EE ensures that **no single server ever sees the full user input, not even in encrypted form**.

This makes HSS-EE ideal for use cases with **high confidentiality demands**, such as medical inference, compliance-constrained applications, or decentralized infrastructure with minimal trust assumptions.

***

### 🔑 Core Idea: Additive Sharing + Encrypted Embeddings

HSS-EE splits an encrypted user input into **two additive shares**, sending each to a separate server. Both servers run the same EE-compatible model but over different shares:

* Neither server can reconstruct the original input
* Activations and outputs remain secret-shared end-to-end
* The user alone combines results to recover the output

This achieves **information-theoretic security under non-collusion** — even if one server is fully compromised.
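The additive-sharing idea above can be sketched in a few lines. This is an illustrative example only: it splits a vector over the ring `Z_{2^32}` (a hypothetical parameter choice, not Nesa's actual ring) so that each share is uniformly random on its own, and the two shares sum back to the original:

```python
import numpy as np

MODULUS = 2**32  # shares live in the ring Z_{2^32}; illustrative choice, not Nesa's actual parameter


def share(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split an (already EE-encrypted) vector into two additive shares.

    x1 is a fresh uniform mask, so neither share alone carries
    information about x; only x1 + x2 (mod 2^32) recovers it.
    """
    rng = np.random.default_rng()
    x1 = rng.integers(0, MODULUS, size=x.shape, dtype=np.uint64)
    x2 = (x.astype(np.uint64) - x1) % MODULUS
    return x1, x2


def reconstruct(y1: np.ndarray, y2: np.ndarray) -> np.ndarray:
    """Recombine two additive shares."""
    return (y1 + y2) % MODULUS


x = np.array([17, 42, 99], dtype=np.uint64)
x1, x2 = share(x)
assert np.array_equal(reconstruct(x1, x2), x)
```

In a real deployment, `x1` would go to Server A and `x2` to Server B; the client keeps nothing but the ability to recombine the returned result shares.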

***

### 🛠️ How HSS-EE Works

1. **Preprocessing (Client-side):**
   * The user embeds their input locally (e.g., token embeddings or image patches)
   * Applies Equivariant Encryption (EE)
   * Splits the encrypted vector into two additive shares: `x = x₁ + x₂`
2. **Distributed Inference (Server-side):**
   * Server A receives `x₁`, Server B receives `x₂`
   * Each server runs the same encrypted model over their share
   * Intermediate activations remain encrypted and secret-shared
3. **Result Reconstruction (Client-side):**
   * Final results are returned as shares
   * The user reconstructs the final output locally: `y = y₁ + y₂`
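The three steps above can be sketched end-to-end with a toy stand-in for the model. For simplicity the "model" here is a single linear layer over real numbers, because additive shares commute with linear maps (`W @ (x1 + x2) = W @ x1 + W @ x2`); handling the nonlinear layers of a real EE-compatible model requires additional protocol machinery not shown here, and the names below are illustrative, not Nesa's API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for an EE-compatible model: one linear layer.
W = rng.standard_normal((4, 3))


def server_inference(W: np.ndarray, x_share: np.ndarray) -> np.ndarray:
    """Each server computes over its own share only."""
    return W @ x_share


# Step 1 (client): embed + encrypt input, then split into additive shares.
x = rng.standard_normal(3)       # stands in for the EE-encrypted embedding
mask = rng.standard_normal(3)
x1, x2 = mask, x - mask          # x = x1 + x2

# Step 2 (servers): Server A gets x1, Server B gets x2.
y1 = server_inference(W, x1)
y2 = server_inference(W, x2)

# Step 3 (client): recombine the result shares.
y = y1 + y2
assert np.allclose(y, W @ x)
```

Note that real-valued masking as above is only illustrative; information-theoretic hiding requires sampling shares from a finite ring or field, as in the earlier sharing sketch.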

<figure><img src="/files/zq3EvRct0OHDfzEKTSYY" alt=""><figcaption><p>High-level diagram of the HSS-EE workflow and broadcasting schedule</p></figcaption></figure>

***

### 🧠 Why HSS-EE?

| Feature                             | EE                      | HSS-EE              |
| ----------------------------------- | ----------------------- | ------------------- |
| Protects input from server          | ✅                       | ✅                   |
| No server sees full encrypted input | ❌                       | ✅                   |
| Collusion-resistance                | ❌ (single-server trust) | ✅ (two-party model) |
| GPU compatible                      | ✅                       | ✅                   |
| Applicable to large models          | ✅                       | ✅                   |

HSS-EE is particularly valuable in settings where:

* No single party should have full visibility
* Regulators require infrastructure separation (e.g., EU + US)
* Inference is hosted across multi-party environments (e.g., DAO + enterprise)

***

### 📊 Performance Benchmarks

All results are measured on NVIDIA A100 GPU nodes communicating over gRPC, with CUDA kernel acceleration.

| Model      | Latency (batch=1) | Throughput (QPS) | Notes                          |
| ---------- | ----------------- | ---------------- | ------------------------------ |
| LLaMA-2 7B | \~700–850 ms      | 3–5 QPS          | Sequence generation (1-token)  |
| ResNet-50  | \~400 ms          | 10–12 QPS        | Image classification           |
| T5-small   | \~540 ms          | 6 QPS            | Text generation, decoder-heavy |

➡ Compared to Equivariant Encryption (EE), HSS-EE adds \~1.5× latency overhead due to coordination and communication between the two servers.\
➡ Compared to vanilla inference, HSS-EE remains **2–3× faster than MPC or HE-based protocols** for similar security guarantees.

***

### 🛡️ Threat Model

HSS-EE assumes:

* **At most one server is compromised** (non-collusion assumption)
* User-side preprocessing is trusted (input embedding and splitting)
* Transport is secure (e.g., TLS or VPN)

Even if one server is malicious:

* It sees only a uniformly random vector (its share)
* It cannot reconstruct the plaintext input
* It cannot reverse-engineer intermediate states
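The first of these guarantees can be made concrete: a single additive share is consistent with *every* possible input, so a compromised server learns nothing from it. The small check below (an illustrative sketch over a toy 16-bit ring, not Nesa's actual parameters) shows that for any candidate input, there exists a matching second share:

```python
MOD = 2**16  # toy ring for illustration


def share(x: int, r: int) -> tuple[int, int]:
    """Additively share x using mask r."""
    return r % MOD, (x - r) % MOD


secret = 12345
x1, x2 = share(secret, r=54321)

# A compromised Server A observes only x1. For ANY candidate input,
# some second share would explain that observation, so x1 reveals
# nothing about which input was actually shared.
for candidate in (0, 1, 9999, 65535):
    x2_candidate = (candidate - x1) % MOD
    assert (x1 + x2_candidate) % MOD == candidate
```

This is exactly the non-collusion assumption at work: the guarantee collapses only if the two servers pool their shares.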

> HSS-EE raises the security bar while keeping inference scalable and GPU-compatible.

***

### 🧭 Summary

**HSS-EE enables secure, two-server inference over encrypted inputs** with near-native performance:

* Zero TEE requirement
* No single point of failure
* End-to-end secret-shared encrypted inference
* GPU-native implementation

HSS-EE is the **cryptographic backbone** for strong privacy in decentralized inference, particularly when users demand **split trust**.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.nesa.ai/nesa/major-innovations/private-inference-for-ai/scalable-private-inference-with-hss-ee.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
