# Practical Approach: Equivariant Encryption (EE)

**Equivariant Encryption (EE)** is Nesa’s core technique for enabling fast, privacy-preserving inference—without relying on heavy cryptography or trusted hardware. It ensures that large models like LLMs and vision transformers can run over encrypted data at near-native speeds, without exposing user inputs, intermediate activations, or model outputs.

***

### 🔒 Why EE?

Most privacy-preserving methods (as discussed [earlier](/nesa/major-innovations/private-inference-for-ai/why-privacy-matters-in-decentralized-ai.md)) fail under the scale and latency constraints of modern AI inference:

| Technique                   | Scales to LLMs?   | Protects Input? | Trust Assumptions    | Drawbacks                                       |
| --------------------------- | ----------------- | --------------- | -------------------- | ----------------------------------------------- |
| Homomorphic Encryption (HE) | ❌                 | ✅               | None                 | 10⁴–10⁶× slowdown, limited nonlinearity support |
| Trusted Execution (TEE)     | ⚠️ (Limited)      | ⚠️              | CPU vendor, firmware | Memory limits, side-channel attacks             |
| Differential Privacy (DP)   | ✅ (Training only) | ❌               | None                 | Not usable at inference time                    |
| Zero-Knowledge Proofs (ZKP) | ⚠️                | ❌ (alone)       | None                 | High prover cost; not private by default        |

**EE fills the gap**: it provides encrypted inference that scales, runs fast, and requires no hardware trust.

***

### ⚙️ What Is Equivariant Encryption?

EE is a lightweight transformation scheme that enables models to operate directly on encrypted data. It ensures:

* **Recoverability**: encrypted inputs can be decoded losslessly
* **Equivariance**: decrypting the result of encrypted inference yields the same output as plaintext inference

Formally, for any plaintext input `p`:

* **Recoverability**:\
  `decrypt(encrypt(p)) = p`
* **Equivariance**:\
  `decrypt(F(encrypt(p))) = F(p)`

Here, `F` is any supported operation (e.g., linear layers, ReLU, GeLU, LayerNorm).

This allows secure inference pipelines without altering model logic or degrading output quality.
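
The two properties can be checked on a toy example. The sketch below is only an illustration, not Nesa's actual scheme: it assumes the "encryption" is a secret permutation of vector positions (the helper names `make_key`, `encrypt`, and `decrypt` are hypothetical) and takes `F` to be an element-wise ReLU, which commutes with any reordering of entries.

```python
import random

def make_key(n, seed=0):
    """Hypothetical key: a secret permutation of the n vector positions."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm

def encrypt(p, key):
    # Position i of the ciphertext holds plaintext entry key[i].
    return [p[j] for j in key]

def decrypt(c, key):
    # Undo the shuffle: ciphertext entry i goes back to position key[i].
    out = [0] * len(c)
    for i, j in enumerate(key):
        out[j] = c[i]
    return out

def relu(v):
    # Element-wise, so it commutes with any reordering of the entries.
    return [max(0, x) for x in v]

key = make_key(6, seed=42)
p = [3, -1, 4, -1, -5, 9]

recoverable = decrypt(encrypt(p, key), key) == p              # decrypt(encrypt(p)) = p
equivariant = decrypt(relu(encrypt(p, key)), key) == relu(p)  # decrypt(F(encrypt(p))) = F(p)
print(recoverable, equivariant)
```

A bare permutation still reveals the multiset of input values, so this toy only demonstrates the algebraic identities, not the security of a production transform.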

***

### 🧠 How It Works

1. **Offline Transformation**\
   A one-time secure setup phase transforms a trained model into its EE form by rewriting its layer operations.
2. **Encrypted Inference**\
   The EE model is deployed to remote nodes. Users submit encrypted queries; servers run inference directly on ciphertext—without ever decrypting.
3. **Decryption**\
   The user decrypts the result using their private key.

📌 All activations and intermediate states remain encrypted throughout the process. The flowchart below gives a high-level overview.

<figure><img src="/files/mWm25r0PHvWckXkcuFcC" alt=""><figcaption><p>Equivariant Encryption (EE) end-to-end workflow.<br>A secure one-time setup phase transforms a trained neural network into its EE-compatible form. The encrypted model is then uploaded to cloud storage and distributed to untrusted inference servers. During inference, user queries remain encrypted throughout processing, with only the client able to decrypt the result. Remote compute resources may assist in sub-tasks, but all activations, parameters, and outputs stay encrypted end to end.</p></figcaption></figure>
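
The three steps can be sketched on a tiny integer MLP. This is a minimal illustration under stated assumptions, not the production transform: it assumes each layer boundary is protected by a secret permutation, and `transform_model`, `permute`, and `unpermute` are hypothetical helpers. Integer weights are used so the comparison with plaintext inference is exact.

```python
import random

def permute(v, perm):
    return [v[j] for j in perm]

def unpermute(v, perm):
    out = [0] * len(v)
    for i, j in enumerate(perm):
        out[j] = v[i]
    return out

def linear(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(v):
    return [max(0, x) for x in v]

def mlp(layers, x):
    # Linear -> ReLU for every layer except the last, which is linear only.
    for k, (W, b) in enumerate(layers):
        x = linear(W, b, x)
        if k < len(layers) - 1:
            x = relu(x)
    return x

def transform_model(layers, perms):
    """Offline step: perms[k] shuffles the input of layer k, perms[k + 1] its output."""
    out = []
    for k, (W, b) in enumerate(layers):
        p_in, p_out = perms[k], perms[k + 1]
        W_enc = [[W[r][c] for c in p_in] for r in p_out]  # reorder columns and rows
        b_enc = [b[r] for r in p_out]
        out.append((W_enc, b_enc))
    return out

rng = random.Random(7)
sizes = [4, 5, 3]  # input, hidden, output widths
layers = [([[rng.randint(-3, 3) for _ in range(sizes[k])] for _ in range(sizes[k + 1])],
           [rng.randint(-3, 3) for _ in range(sizes[k + 1])])
          for k in range(len(sizes) - 1)]
perms = []
for n in sizes:
    perm = list(range(n))
    rng.shuffle(perm)
    perms.append(perm)

ee_layers = transform_model(layers, perms)  # 1. offline transformation
x = [2, -1, 0, 5]
query = permute(x, perms[0])                # client encrypts the query
enc_result = mlp(ee_layers, query)          # 2. server computes on ciphertext only
result = unpermute(enc_result, perms[-1])   # 3. client decrypts the result
print(result == mlp(layers, x))
```

In this sketch the server runs the same `mlp` code on the transformed weights, which is why no extra latency appears beyond the one-time setup; only the holder of `perms[0]` and `perms[-1]` can prepare queries and read results.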

***

### ✅ Key Advantages

| Property                     | EE Description                                        |
| ---------------------------- | ----------------------------------------------------- |
| **Server Blindness**         | All inputs, activations, and outputs stay encrypted   |
| **Runtime Speed**            | Near-identical latency to vanilla inference           |
| **Deep Model Compatibility** | Supports transformers, CNNs, RAG, LayerNorm, and more |
| **No Hardware Dependency**   | GPU-native; no enclave or vendor lock-in              |
| **Plug-and-Play**            | Minimal code changes (e.g., replace layer types)      |

***

### 📊 Benchmarking Results

* **Latency overhead**: < 9% (measured on LLaMA-8B, with and without vLLM)
* **Fidelity score**: > 99.99% match with vanilla inference
* **Applications tested**: IMDB classification, MT-Bench QA, ShareGPT prompts, RAG

🧪 EE enables LLMs and RAG pipelines to maintain **full accuracy and response quality**—at production speed.

***

### 🛡️ Threat Model & Attack Resistance

EE is designed for robustness even under full adversarial observability:

* Inputs and outputs are transformed via one-way, high-dimensional mappings
* Reversing EE requires solving combinatorial permutation problems (e.g., a search space of 128k! orderings for an LLM vocabulary of 128k tokens)
* Known attack strategies (brute-force, hill-climbing, LLM-as-a-judge) are computationally infeasible in practice

> EE's security comes from **combinatorial hardness**, not access control or TEE black boxes.
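
To give a sense of scale for the `128k!` figure, the following sketch computes the base-2 logarithm of the factorial. It assumes "128k" means 128,000 tokens (the source does not say whether 128,000 or 131,072 is meant) and uses `math.lgamma` to avoid building the huge integer. The result, on the order of two million bits, describes only the size of the exhaustive-search space; it is not by itself a statement about other attack strategies.

```python
import math

def log2_factorial(n):
    # lgamma(n + 1) equals ln(n!), so dividing by ln(2) gives log2(n!).
    return math.lgamma(n + 1) / math.log(2)

vocab_size = 128_000  # assumption: "128k" read as 128,000 tokens
bits = log2_factorial(vocab_size)

print(f"log2({vocab_size}!) is about {bits:,.0f} bits")
print(f"roughly {bits / 128:,.0f} times the exponent of a 128-bit key space")
```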

***

### 🧪 Deployment Scenarios

* **LLMs**: Token embeddings remain encrypted during generation
* **Vision Models**: Feature maps remain protected throughout convolutional and attention layers
* **RAG Pipelines**: Queries and retrieved documents are encrypted end-to-end
* **Multi-modal Models**: Encrypted inputs across modalities remain isolated from untrusted nodes

🛠 EE supports these models in sharded and parallelized settings, ensuring no plaintext data is leaked across the network.
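
For the LLM case, one way to picture the idea is a secret relabeling of token IDs. The sketch below is a toy under loud assumptions: the "model" is a single embedding lookup followed by an output head, the key `pi` is a hypothetical vocabulary permutation, and only the row order of the tables is hidden (a real deployment would also have to protect the vectors themselves).

```python
import random

rng = random.Random(1)
vocab = ["<pad>", "the", "cat", "sat", "mat", "on"]
dim = 3
V = len(vocab)

# Toy plaintext model: one embedding row and one output-head row per token.
embed = [[rng.randint(-4, 4) for _ in range(dim)] for _ in range(V)]
head = [[rng.randint(-4, 4) for _ in range(dim)] for _ in range(V)]

def logits(embed_table, head_table, token_id):
    h = embed_table[token_id]
    return [sum(w * x for w, x in zip(row, h)) for row in head_table]

# Secret key (hypothetical): plaintext id t is sent to the server as pi[t].
pi = list(range(V))
rng.shuffle(pi)
inv = [0] * V
for t, s in enumerate(pi):
    inv[s] = t

# Offline: row s of each server-side table holds the row of plaintext token inv[s].
embed_enc = [embed[inv[s]] for s in range(V)]
head_enc = [head[inv[s]] for s in range(V)]

token = vocab.index("cat")
enc_logits = logits(embed_enc, head_enc, pi[token])  # server sees only permuted ids
dec_logits = [enc_logits[pi[t]] for t in range(V)]   # client maps logits back
print(dec_logits == logits(embed, head, token))
```

The server only ever handles `pi[token]` and logits indexed by permuted IDs, so mapping them back to words requires the key.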

***

### 📈 Comparison with Homomorphic Encryption (HE)

| Property             | EE                          | Fully Homomorphic Encryption (FHE) |
| -------------------- | --------------------------- | ---------------------------------- |
| **Latency Overhead** | < 9% (LLaMA-8B benchmark)   | 10⁴–10⁶×                           |
| **Nonlinear Ops**    | Exact (ReLU, GeLU, etc.)    | Approximate only                   |
| **Integration**      | Layer-local transforms      | Full model rewrite                 |
| **Accuracy**         | Matches plaintext inference | May degrade                        |
| **Hardware**         | Commodity GPU               | Often CPU-based                    |
| **Key Management**   | Lightweight, per-user       | Complex, scheme-bound              |

***

### 🧭 Summary

**Equivariant Encryption (EE)** delivers:

* Encrypted inference for large models at production speed
* Compatibility with modern deep learning architectures
* No hardware dependencies
* Mathematically provable correctness and privacy

While EE provides an efficient and blind inference framework over encrypted models, it operates under a single-server assumption. But what if the goal is to split trust between multiple servers—ensuring that **no single machine ever sees even encrypted embeddings alone**?

This is where **HSS-EE** comes in.

HSS-EE combines Equivariant Encryption with **Homomorphic Secret Sharing (HSS)**, enabling secure two-party inference for large models like LLaMA-7B with **sub-second latency** and **zero reliance on trusted hardware**. By splitting each user query into additive shares and computing on both simultaneously, HSS-EE achieves **information-theoretic security** under non-collusion—while still preserving EE’s model-blindness guarantees.

→ *Continue to HSS-EE: Secure Two-Party Inference at Scale*

