Equivariant Encryption (EE)
Equivariant Encryption (EE) is Nesa’s core technique for enabling fast, privacy-preserving inference—without relying on heavy cryptography or trusted hardware. It ensures that large models like LLMs and vision transformers can run over encrypted data at near-native speeds, without exposing user inputs, intermediate activations, or model outputs.
🔒 Why EE?
Most privacy-preserving methods (as discussed earlier) fail under the scale and latency constraints of modern AI inference:
| Method | Low overhead? | Protects inference data? | Hardware trust | Key limitation |
| --- | --- | --- | --- | --- |
| Homomorphic Encryption (HE) | ❌ | ✅ | None | 10⁴–10⁶× slowdown, limited nonlinearity support |
| Trusted Execution (TEE) | ⚠️ (Limited) | ⚠️ | CPU vendor, firmware | Memory limits, side-channel attacks |
| Differential Privacy (DP) | ✅ (Training only) | ❌ | None | Not usable at inference time |
| Zero-Knowledge Proofs (ZKP) | ⚠️ | ❌ (alone) | None | High prover cost; not private by default |
EE fills the gap: it provides encrypted inference that scales, runs fast, and requires no hardware trust.
⚙️ What Is Equivariant Encryption?
EE is a lightweight transformation scheme that enables models to operate directly on encrypted data. It ensures:
Recoverability: encrypted inputs can be decoded losslessly
Equivariance: the result of encrypted inference is identical to that of plaintext inference
Formally, for any plaintext input `p` and any supported operation `F` (e.g., linear layers, ReLU, GeLU, LayerNorm):
Recoverability: `decrypt(encrypt(p)) = p`
Equivariance: `decrypt(F(encrypt(p))) = F(p)`
This allows secure inference pipelines without altering model logic or degrading output quality.
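To make these two properties concrete, here is a minimal, self-contained sketch that uses a single secret permutation as the key and a one-layer ReLU network as `F`. It is only an illustration of the recoverability and equivariance checks, not Nesa's production construction; the function names and the permutation-based transform are assumptions made for this example.

```python
# Illustrative sketch only: a permutation-keyed transform on a tiny ReLU layer.
import numpy as np

rng = np.random.default_rng(0)
d = 6
perm = rng.permutation(d)          # secret key: a permutation of feature indices

def encrypt(x):                    # apply the permutation to a plaintext vector
    return x[perm]

def decrypt(y):                    # invert the permutation
    out = np.empty_like(y)
    out[perm] = y
    return out

W = rng.normal(size=(d, d))        # plaintext layer weights

def F(x):                          # plaintext operation: ReLU(W @ x)
    return np.maximum(W @ x, 0.0)

# Offline transform: re-index the weights once so the layer maps encrypted
# inputs to encrypted outputs. Elementwise ReLU commutes with the permutation,
# so the nonlinearity needs no approximation.
W_ee = W[perm][:, perm]

def F_ee(x_enc):
    return np.maximum(W_ee @ x_enc, 0.0)

x = rng.normal(size=d)
assert np.allclose(decrypt(encrypt(x)), x)           # recoverability
assert np.allclose(decrypt(F_ee(encrypt(x))), F(x))  # equivariance
```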
🧠 How It Works
1. **Offline Transformation.** A secure setup phase transforms the model into its EE form by modifying layer operations.
2. **Encrypted Inference.** The EE model is deployed to remote nodes. Users submit encrypted queries; servers run inference directly on ciphertext, without ever decrypting it.
3. **Decryption.** The user decrypts the result using their private key.
📌 All activations and intermediate states remain encrypted throughout the process. A high-level overview of this flow is shown in the flowchart below.

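The sketch below wires these three phases together to show where the trust boundary sits: the key and decryption stay with the user, while the remote node only ever holds EE-form weights and ciphertext. The client/server split and the helper names are illustrative assumptions, not Nesa's actual API.

```python
# Hypothetical client/server split for the three phases above (illustration only).
import numpy as np

rng = np.random.default_rng(1)
d = 6
key = rng.permutation(d)                   # secret key, never leaves the client

# 1) Offline transformation (secure setup phase, done once):
W = rng.normal(size=(d, d))                # original layer weights
W_ee = W[key][:, key]                      # EE-form weights shipped to the server

def server_infer(x_enc):                   # 2) the server computes on ciphertext only;
    return np.maximum(W_ee @ x_enc, 0.0)   #    it never sees x or the key

def client_query(x):
    x_enc = x[key]                         #    encrypt with the private key
    y_enc = server_infer(x_enc)            #    a remote call in a real deployment
    y = np.empty_like(y_enc)
    y[key] = y_enc                         # 3) decrypt locally
    return y

x = rng.normal(size=d)
assert np.allclose(client_query(x), np.maximum(W @ x, 0.0))   # matches plaintext inference
```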
✅ Key Advantages
| Advantage | What it means |
| --- | --- |
| Server Blindness | All inputs, activations, and outputs stay encrypted |
| Runtime Speed | Near-identical latency to vanilla inference |
| Deep Model Compatibility | Supports transformers, CNNs, RAG, LayerNorm, and more |
| No Hardware Dependency | GPU-native; no enclave or vendor lock-in |
| Plug-and-Play | Minimal code changes (e.g., replace layer types; see the sketch below) |
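As a rough illustration of the plug-and-play point, the sketch below wraps a standard PyTorch `nn.Linear` in a hypothetical `EELinear` whose weights have been re-indexed with secret permutations, so the layer consumes and produces permuted vectors. `EELinear`, its constructor, and the key handling are invented for this example and are not Nesa's interface.

```python
# Hypothetical drop-in layer swap (EELinear is not a real Nesa or PyTorch API).
import torch
import torch.nn as nn

class EELinear(nn.Module):
    """A Linear layer re-indexed with secret permutations: it maps permuted
    (encrypted) inputs to permuted (encrypted) outputs."""
    def __init__(self, linear: nn.Linear, perm_in: torch.Tensor, perm_out: torch.Tensor):
        super().__init__()
        w = linear.weight.detach()[perm_out][:, perm_in]
        b = linear.bias.detach()[perm_out] if linear.bias is not None else None
        self.ee = nn.Linear(linear.in_features, linear.out_features, bias=b is not None)
        with torch.no_grad():
            self.ee.weight.copy_(w)
            if b is not None:
                self.ee.bias.copy_(b)

    def forward(self, x_enc):
        return self.ee(x_enc)

torch.manual_seed(0)
plain = nn.Linear(8, 8)
perm = torch.randperm(8)
ee = EELinear(plain, perm, perm)           # replace the layer type, nothing else

x = torch.randn(8)
x_enc = x[perm]                            # client-side encryption
y_enc = ee(x_enc)                          # server-side inference on ciphertext
y = torch.empty_like(y_enc)
y[perm] = y_enc                            # client-side decryption
assert torch.allclose(y, plain(x), atol=1e-5)
```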
📊 Benchmarking Results
Latency overhead: < 9% (measured on LLaMA-8B, with and without vLLM)
Fidelity score: > 99.99% match with vanilla inference (see the match-rate sketch below)
Applications tested: IMDB classification, MT-Bench QA, ShareGPT prompts, RAG
🧪 EE enables LLMs and RAG pipelines to maintain full accuracy and response quality—at production speed.
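A fidelity score of this kind can be read as a token-level match rate between vanilla and EE-decrypted outputs. The snippet below shows one simple way such a rate could be computed; the exact metric used in Nesa's benchmarks is not specified here.

```python
# One simple definition of a fidelity score: the fraction of positions where the
# decrypted EE output agrees with the vanilla (plaintext) output.
def fidelity(vanilla_tokens, ee_tokens):
    pairs = list(zip(vanilla_tokens, ee_tokens))
    return sum(a == b for a, b in pairs) / len(pairs) if pairs else 1.0

assert fidelity([1, 2, 3, 4], [1, 2, 3, 4]) == 1.0
assert fidelity([1, 2, 3, 4], [1, 2, 9, 4]) == 0.75
```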
🛡️ Threat Model & Attack Resistance
EE is designed for robustness even under full adversarial observability:
Inputs and outputs are transformed via one-way, high-dimensional mappings
Reversing EE requires solving combinatorial permutation problems (e.g., 128k! possible mappings for LLM vocabularies; see the estimate below)
Known attack strategies (brute-force, hill-climbing, LLM-as-a-judge) are computationally infeasible in practice
EE's security comes from combinatorial hardness, not access control or TEE black boxes.
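To put the 128k! figure in perspective, the short calculation below estimates the size of the permutation search space for a vocabulary of 128,000 tokens (the vocabulary size is taken from the example above; real tokenizers vary).

```python
# Rough size of the keyspace implied by permuting a 128,000-token vocabulary:
# log2(128000!) computed via the log-gamma function.
import math

vocab = 128_000
log2_keyspace = math.lgamma(vocab + 1) / math.log(2)   # log2(vocab!)
print(f"~2^{log2_keyspace:,.0f} possible vocabulary permutations")
# Close to two million bits of entropy, versus 128 or 256 bits for a
# conventional symmetric cipher key, so exhaustive search is hopeless.
```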
🧪 Deployment Scenarios
LLMs: Token embeddings remain encrypted during generation
Vision Models: Feature maps remain protected throughout convolutional and attention layers
RAG Pipelines: Queries and retrieved documents are encrypted end-to-end
Multi-modal Models: Encrypted inputs across modalities remain isolated from untrusted nodes
🛠 EE supports these models in sharded and parallelized settings, ensuring no plaintext data is leaked across the network.
📈 Comparison with Homomorphic Encryption (HE)
| Aspect | EE | HE |
| --- | --- | --- |
| Latency Overhead | Near-zero | 10⁴–10⁶× |
| Nonlinear Ops | Exact (ReLU, GeLU, etc.) | Approximate only |
| Integration | Layer-local transforms | Full model rewrite |
| Accuracy | Matches plaintext inference | May degrade |
| Hardware | Commodity GPU | Often CPU-based |
| Key Management | Lightweight, per-user | Complex, scheme-bound |
🧭 Summary
Equivariant Encryption (EE) delivers:
Encrypted inference for large models at production speed
Compatibility with modern deep learning architectures
No hardware dependencies
Mathematically provable correctness and privacy
While EE provides an efficient and blind inference framework over encrypted models, it operates under a single-server assumption. But what if the goal is to split trust between multiple servers—ensuring that no single machine ever sees even encrypted embeddings alone?
This is where HSS-EE comes in.
HSS-EE combines Equivariant Encryption with Homomorphic Secret Sharing (HSS), enabling secure two-party inference for large models like LLaMA-7B with sub-second latency and zero reliance on trusted hardware. By splitting each user query into additive shares and computing on both simultaneously, HSS-EE achieves information-theoretic security under non-collusion—while still preserving EE’s model-blindness guarantees.
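As a flavor of the additive-sharing idea, here is a simplified illustration rather than the full HSS-EE protocol: the query vector is split into two random shares that sum back to the original, each server computes a linear layer on its share alone, and the user recombines the results. In practice shares are drawn over a finite ring so that each share alone reveals nothing about the query; real-valued shares are used here only to keep the sketch short.

```python
# Simplified additive secret sharing for a single linear layer (not the full
# HSS-EE protocol). Each server sees only one share of the query.
import numpy as np

rng = np.random.default_rng(2)
d = 6
x = rng.normal(size=d)                 # the user's query embedding

share_1 = rng.normal(size=d)           # random mask, sent to server 1
share_2 = x - share_1                  # complement, sent to server 2

W = rng.normal(size=(d, d))            # (EE-transformed) linear layer weights
y1 = W @ share_1                       # server 1 computes on its share
y2 = W @ share_2                       # server 2 computes on its share

assert np.allclose(y1 + y2, W @ x)     # recombined result equals the plaintext computation
```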
→ Continue to HSS-EE: Secure Two-Party Inference at Scale