Equivariant Encryption (EE)

Equivariant Encryption (EE) is Nesa’s core technique for enabling fast, privacy-preserving inference—without relying on heavy cryptography or trusted hardware. It ensures that large models like LLMs and vision transformers can run over encrypted data at near-native speeds, without exposing user inputs, intermediate activations, or model outputs.


🔒 Why EE?

Most privacy-preserving methods (as discussed earlier) fail under the scale and latency constraints of modern AI inference:

| Technique | Scales to LLMs? | Protects Input? | Trust Assumptions | Drawbacks |
| --- | --- | --- | --- | --- |
| Homomorphic Encryption (HE) | ❌ | ✅ | None | 10⁴–10⁶× slowdown, limited nonlinearity support |
| Trusted Execution (TEE) | ⚠️ (Limited) | ⚠️ | CPU vendor, firmware | Memory limits, side-channel attacks |
| Differential Privacy (DP) | ✅ (Training only) | ❌ | None | Not usable at inference time |
| Zero-Knowledge Proofs (ZKP) | ⚠️ | ❌ (alone) | None | High prover cost; not private by default |

EE fills the gap: it provides encrypted inference that scales, runs fast, and requires no hardware trust.


⚙️ What Is Equivariant Encryption?

EE is a lightweight transformation scheme that enables models to operate directly on encrypted data. It ensures:

  • Recoverability: encrypted inputs can be decoded losslessly

  • Equivariance: the result of encrypted inference is identical to that of plaintext inference

Formally, for any plaintext input p:

  • Recoverability: `decrypt(encrypt(p)) = p`

  • Equivariance: `decrypt(F(encrypt(p))) = F(p)`

where F is a supported operation (e.g., linear layers, ReLU, GeLU, LayerNorm).

This allows secure inference pipelines without altering model logic or degrading output quality.
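As a minimal illustration of the two properties above, here is a toy sketch in which "encryption" is a secret coordinate permutation. Elementwise operations such as ReLU commute with permutations, so equivariance holds exactly. This is an illustrative stand-in only, not Nesa's actual scheme:

```python
import random

random.seed(0)

# Toy "EE" sketch: encrypt a vector by permuting its coordinates with a
# secret permutation (the key). Elementwise ops commute with permutations.
n = 8
perm = list(range(n))
random.shuffle(perm)                 # secret key
inv = [0] * n
for i, p_i in enumerate(perm):
    inv[p_i] = i                     # inverse permutation, for decryption

def encrypt(p):
    return [p[j] for j in perm]

def decrypt(c):
    return [c[j] for j in inv]

def F(x):                            # a supported nonlinearity: ReLU
    return [max(v, 0.0) for v in x]

p = [random.uniform(-1, 1) for _ in range(n)]

assert decrypt(encrypt(p)) == p          # Recoverability
assert decrypt(F(encrypt(p))) == F(p)    # Equivariance
```

Because the server only ever sees the permuted (encrypted) vector, it can apply F without learning the plaintext; real EE applies analogous transformations at the layer level.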


🧠 How It Works

  1. Offline Transformation: A secure setup phase transforms a model into its EE form by modifying layer operations.

  2. Encrypted Inference: The EE model is deployed to remote nodes. Users submit encrypted queries; servers run inference directly on ciphertext—without ever decrypting.

  3. Decryption: The user decrypts the result using their private key.

📌 All activations and intermediate states remain encrypted throughout the process. The flowchart below gives a high-level overview.

Equivariant Encryption (EE) end-to-end workflow. A secure one-time setup phase transforms a trained neural network into its EE-compatible form. The encrypted model is then uploaded to cloud storage and distributed to untrusted inference servers. During inference, user queries remain encrypted throughout processing, with only the client able to decrypt the result. Remote compute resources may assist in sub-tasks, but all activations, parameters, and outputs stay encrypted end to end.

✅ Key Advantages

| Property | EE Description |
| --- | --- |
| Server Blindness | All inputs, activations, and outputs stay encrypted |
| Runtime Speed | Near-identical latency to vanilla inference |
| Deep Model Compatibility | Supports transformers, CNNs, RAG, LayerNorm, and more |
| No Hardware Dependency | GPU-native; no enclave or vendor lock-in |
| Plug-and-Play | Minimal code changes (e.g., replace layer types) |
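The "plug-and-play" property can be pictured as a layer-swap pass over a model description: each supported layer type is replaced by its EE counterpart while the architecture is left untouched. The `EELinear`/`EEGeLU`/`EELayerNorm` names below are hypothetical placeholders, not Nesa's actual API:

```python
# Hypothetical sketch of "replace layer types" integration. Layer names are
# illustrative strings; a real integration would swap module classes instead.
PLAIN_TO_EE = {
    "Linear": "EELinear",
    "GeLU": "EEGeLU",
    "LayerNorm": "EELayerNorm",
}

def to_ee(model_layers):
    """Return a copy of the layer spec with supported layers replaced."""
    return [PLAIN_TO_EE.get(layer, layer) for layer in model_layers]

model = ["Embedding", "Linear", "GeLU", "LayerNorm", "Linear"]
print(to_ee(model))
# -> ['Embedding', 'EELinear', 'EEGeLU', 'EELayerNorm', 'EELinear']
```

Unsupported layers (here, the embedding) pass through unchanged, which is why integration stays layer-local rather than requiring a full model rewrite.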


📊 Benchmarking Results

  • Latency overhead: < 9% (measured on LLaMA-8B, with and without vLLM)

  • Fidelity score: > 99.99% match with vanilla inference

  • Applications tested: IMDB classification, MT-Bench QA, ShareGPT prompts, RAG

🧪 EE enables LLMs and RAG pipelines to maintain full accuracy and response quality—at production speed.


🛡️ Threat Model & Attack Resistance

EE is designed for robustness even under full adversarial observability:

  • Inputs and outputs are transformed via one-way, high-dimensional mappings

  • Reversing EE requires solving combinatorial permutation problems (e.g., on the order of 128,000! for a 128k-token LLM vocabulary)

  • Known attack strategies (brute-force, hill-climbing, LLM-as-a-judge) are computationally infeasible in practice

EE's security comes from combinatorial hardness, not access control or TEE black boxes.
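The scale of the 128k! search space above is easy to sanity-check numerically. The factorial itself far exceeds floating-point range, so the computation is done in log space via the log-gamma function:

```python
import math

# Back-of-envelope check of the claimed search space: recovering a secret
# permutation over a 128,000-token vocabulary means distinguishing among
# 128000! orderings. lgamma(n + 1) = ln(n!), so divide by ln(2) for bits.
vocab = 128_000
log2_perms = math.lgamma(vocab + 1) / math.log(2)   # log2(128000!)
print(f"approx. 2^{log2_perms:,.0f} candidate permutations")
```

The result is on the order of 2^1,900,000 candidates, dwarfing the ~2^128 work factor usually considered cryptographically infeasible.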


🧪 Deployment Scenarios

  • LLMs: Token embeddings remain encrypted during generation

  • Vision Models: Feature maps remain protected throughout convolutional and attention layers

  • RAG Pipelines: Queries and retrieved documents are encrypted end-to-end

  • Multi-modal Models: Encrypted inputs across modalities remain isolated from untrusted nodes

🛠 EE supports these models in sharded and parallelized settings, ensuring no plaintext data is leaked across the network.


📈 Comparison with Homomorphic Encryption (HE)

| Property | EE | Fully Homomorphic Encryption (FHE) |
| --- | --- | --- |
| Latency Overhead | Near-zero | 10⁴–10⁶× |
| Nonlinear Ops | Exact (ReLU, GeLU, etc.) | Approximate only |
| Integration | Layer-local transforms | Full model rewrite |
| Accuracy | Matches plaintext inference | May degrade |
| Hardware | Commodity GPU | Often CPU-based |
| Key Management | Lightweight, per-user | Complex, scheme-bound |


🧭 Summary

Equivariant Encryption (EE) delivers:

  • Encrypted inference for large models at production speed

  • Compatibility with modern deep learning architectures

  • No hardware dependencies

  • Mathematically provable correctness, with privacy grounded in combinatorial hardness

While EE provides an efficient and blind inference framework over encrypted models, it operates under a single-server assumption. But what if the goal is to split trust between multiple servers—ensuring that no single machine ever sees even encrypted embeddings alone?

This is where HSS-EE comes in.

HSS-EE combines Equivariant Encryption with Homomorphic Secret Sharing (HSS), enabling secure two-party inference for large models like LLaMA-7B with sub-second latency and zero reliance on trusted hardware. By splitting each user query into additive shares and computing on both simultaneously, HSS-EE achieves information-theoretic security under non-collusion—while still preserving EE’s model-blindness guarantees.
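The additive-share step can be sketched as follows. The prime modulus and the scalar operation are illustrative stand-ins for HSS-EE's actual tensor arithmetic; the point is that each server holds one share that reveals nothing on its own, yet linear work can proceed share-wise:

```python
import secrets

# Additive secret sharing over a prime field (a minimal sketch, not the
# production HSS-EE protocol). A query x is split as x = s0 + s1 (mod P);
# server 0 sees only s0, server 1 only s1.
P = 2**61 - 1  # a Mersenne prime modulus

def share(x):
    """Split x into two additive shares."""
    s0 = secrets.randbelow(P)
    return s0, (x - s0) % P

def reconstruct(s0, s1):
    return (s0 + s1) % P

query = 123456789
s0, s1 = share(query)
assert reconstruct(s0, s1) == query

# Linear operations run share-wise: each server scales its own share,
# and the results still reconstruct to the scaled query.
k = 42
assert reconstruct((k * s0) % P, (k * s1) % P) == (k * query) % P
```

Under the non-collusion assumption, each share is uniformly random on its own, which is the source of the information-theoretic guarantee mentioned above.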

Continue to HSS-EE: Secure Two-Party Inference at Scale
