Equivariant Encryption (EE)

Equivariant Encryption (EE) is Nesa’s core technique for enabling fast, privacy-preserving inference—without relying on heavy cryptography or trusted hardware. It ensures that large models like LLMs and vision transformers can run over encrypted data at near-native speeds, without exposing user inputs, intermediate activations, or model outputs.


🔒 Why EE?

Most privacy-preserving methods (as discussed earlier) fail under the scale and latency constraints of modern AI inference:

| Technique | Scales to LLMs? | Protects Input? | Trust Assumptions | Drawbacks |
| --- | --- | --- | --- | --- |
| Homomorphic Encryption (HE) | ❌ | ✅ | None | 10⁴–10⁶× slowdown, limited nonlinearity support |
| Trusted Execution (TEE) | ⚠️ (Limited) | ⚠️ | CPU vendor, firmware | Memory limits, side-channel attacks |
| Differential Privacy (DP) | ✅ (Training only) | ❌ | None | Not usable at inference time |
| Zero-Knowledge Proofs (ZKP) | ⚠️ | ❌ (alone) | None | High prover cost; not private by default |

EE fills the gap: it provides encrypted inference that scales, runs fast, and requires no hardware trust.


⚙️ What Is Equivariant Encryption?

EE is a lightweight transformation scheme that enables models to operate directly on encrypted data. It ensures:

  • Recoverability: encrypted inputs can be decoded losslessly

  • Equivariance: the result of encrypted inference is identical to that of plaintext inference

Formally, for any plaintext input p:

  • Recoverability: `decrypt(encrypt(p)) = p`

  • Equivariance: `decrypt(F(encrypt(p))) = F(p)`

where F is a supported operation (e.g., linear layers, ReLU, GeLU, LayerNorm).

This allows secure inference pipelines without altering model logic or degrading output quality.
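As a minimal illustration of the two properties above, here is a toy sketch in which "encryption" is a secret coordinate permutation. Elementwise operations such as ReLU commute with permutations, so equivariance holds exactly. This is an illustrative stand-in only, not Nesa's actual scheme:

```python
import random

random.seed(0)

# Toy "EE" sketch: encrypt a vector by permuting its coordinates with a
# secret permutation (the key). Elementwise ops commute with permutations.
n = 8
perm = list(range(n))
random.shuffle(perm)                 # secret key
inv = [0] * n
for i, p_i in enumerate(perm):
    inv[p_i] = i                     # inverse permutation, for decryption

def encrypt(p):
    return [p[j] for j in perm]

def decrypt(c):
    return [c[j] for j in inv]

def F(x):                            # a supported nonlinearity: ReLU
    return [max(v, 0.0) for v in x]

p = [random.uniform(-1, 1) for _ in range(n)]

assert decrypt(encrypt(p)) == p          # Recoverability
assert decrypt(F(encrypt(p))) == F(p)    # Equivariance
```

Because the server only ever sees the permuted (encrypted) vector, it can apply F without learning the plaintext; real EE applies analogous transformations at the layer level.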


🧠 How It Works

  1. Offline Transformation: A secure setup phase transforms a model into its EE form by modifying layer operations.

  2. Encrypted Inference: The EE model is deployed to remote nodes. Users submit encrypted queries; servers run inference directly on ciphertext—without ever decrypting.

  3. Decryption: The user decrypts the result using their private key.

📌 All activations and intermediate states remain encrypted throughout the process. The flowchart below gives a high-level overview.

Equivariant Encryption (EE) end-to-end workflow. A secure one-time setup phase transforms a trained neural network into its EE-compatible form. The encrypted model is then uploaded to cloud storage and distributed to untrusted inference servers. During inference, user queries remain encrypted throughout processing, with only the client able to decrypt the result. Remote compute resources may assist in sub-tasks, but all activations, parameters, and outputs stay encrypted end to end.

✅ Key Advantages

| Property | EE Description |
| --- | --- |
| Server Blindness | All inputs, activations, and outputs stay encrypted |
| Runtime Speed | Near-identical latency to vanilla inference |
| Deep Model Compatibility | Supports transformers, CNNs, RAG, LayerNorm, and more |
| No Hardware Dependency | GPU-native; no enclave or vendor lock-in |
| Plug-and-Play | Minimal code changes (e.g., replace layer types) |
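The "plug-and-play" property can be pictured as a layer-swap pass over a model description: each supported layer type is replaced by its EE counterpart while the architecture is left untouched. The `EELinear`/`EEGeLU`/`EELayerNorm` names below are hypothetical placeholders, not Nesa's actual API:

```python
# Hypothetical sketch of "replace layer types" integration. Layer names are
# illustrative strings; a real integration would swap module classes instead.
PLAIN_TO_EE = {
    "Linear": "EELinear",
    "GeLU": "EEGeLU",
    "LayerNorm": "EELayerNorm",
}

def to_ee(model_layers):
    """Return a copy of the layer spec with supported layers replaced."""
    return [PLAIN_TO_EE.get(layer, layer) for layer in model_layers]

model = ["Embedding", "Linear", "GeLU", "LayerNorm", "Linear"]
print(to_ee(model))
# -> ['Embedding', 'EELinear', 'EEGeLU', 'EELayerNorm', 'EELinear']
```

Unsupported layers (here, the embedding) pass through unchanged, which is why integration stays layer-local rather than requiring a full model rewrite.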


📊 Benchmarking Results

  • Latency overhead: < 9% (measured on LLaMA-8B, with and without vLLM)

  • Fidelity score: > 99.99% match with vanilla inference

  • Applications tested: IMDB classification, MT-Bench QA, ShareGPT prompts, RAG

🧪 EE enables LLMs and RAG pipelines to maintain full accuracy and response quality—at production speed.


🛡️ Threat Model & Attack Resistance

EE is designed for robustness even under full adversarial observability:

  • Inputs and outputs are transformed via one-way, high-dimensional mappings

  • Reversing EE requires solving combinatorial permutation problems (e.g., on the order of 128,000! for a 128k-token LLM vocabulary)

  • Known attack strategies (brute-force, hill-climbing, LLM-as-a-judge) are computationally infeasible in practice

EE's security comes from combinatorial hardness, not access control or TEE black boxes.
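The scale of the 128k! search space above is easy to sanity-check numerically. The factorial itself far exceeds floating-point range, so the computation is done in log space via the log-gamma function:

```python
import math

# Back-of-envelope check of the claimed search space: recovering a secret
# permutation over a 128,000-token vocabulary means distinguishing among
# 128000! orderings. lgamma(n + 1) = ln(n!), so divide by ln(2) for bits.
vocab = 128_000
log2_perms = math.lgamma(vocab + 1) / math.log(2)   # log2(128000!)
print(f"approx. 2^{log2_perms:,.0f} candidate permutations")
```

The result is on the order of 2^1,900,000 candidates, dwarfing the ~2^128 work factor usually considered cryptographically infeasible.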


🧪 Deployment Scenarios

  • LLMs: Token embeddings remain encrypted during generation

  • Vision Models: Feature maps remain protected throughout convolutional and attention layers

  • RAG Pipelines: Queries and retrieved documents are encrypted end-to-end

  • Multi-modal Models: Encrypted inputs across modalities remain isolated from untrusted nodes

🛠 EE supports these models in sharded and parallelized settings, ensuring no plaintext data is leaked across the network.


📈 Comparison with Homomorphic Encryption (HE)

| Property | EE | Fully Homomorphic Encryption (FHE) |
| --- | --- | --- |
| Latency Overhead | Near-zero | 10⁴–10⁶× |
| Nonlinear Ops | Exact (ReLU, GeLU, etc.) | Approximate only |
| Integration | Layer-local transforms | Full model rewrite |
| Accuracy | Matches plaintext inference | May degrade |
| Hardware | Commodity GPU | Often CPU-based |
| Key Management | Lightweight, per-user | Complex, scheme-bound |


🧭 Summary

Equivariant Encryption (EE) delivers:

  • Encrypted inference for large models at production speed

  • Compatibility with modern deep learning architectures

  • No hardware dependencies

  • Mathematically provable correctness, with privacy grounded in combinatorial hardness

While EE provides an efficient and blind inference framework over encrypted models, it operates under a single-server assumption. But what if the goal is to split trust between multiple servers—ensuring that no single machine ever sees even encrypted embeddings alone?

This is where HSS-EE comes in.

HSS-EE combines Equivariant Encryption with Homomorphic Secret Sharing (HSS), enabling secure two-party inference for large models like LLaMA-7B with sub-second latency and zero reliance on trusted hardware. By splitting each user query into additive shares and computing on both simultaneously, HSS-EE achieves information-theoretic security under non-collusion—while still preserving EE’s model-blindness guarantees.
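The additive-share step can be sketched as follows. The prime modulus and the scalar operation are illustrative stand-ins for HSS-EE's actual tensor arithmetic; the point is that each server holds one share that reveals nothing on its own, yet linear work can proceed share-wise:

```python
import secrets

# Additive secret sharing over a prime field (a minimal sketch, not the
# production HSS-EE protocol). A query x is split as x = s0 + s1 (mod P);
# server 0 sees only s0, server 1 only s1.
P = 2**61 - 1  # a Mersenne prime modulus

def share(x):
    """Split x into two additive shares."""
    s0 = secrets.randbelow(P)
    return s0, (x - s0) % P

def reconstruct(s0, s1):
    return (s0 + s1) % P

query = 123456789
s0, s1 = share(query)
assert reconstruct(s0, s1) == query

# Linear operations run share-wise: each server scales its own share,
# and the results still reconstruct to the scaled query.
k = 42
assert reconstruct((k * s0) % P, (k * s1) % P) == (k * query) % P
```

Under the non-collusion assumption, each share is uniformly random on its own, which is the source of the information-theoretic guarantee mentioned above.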

Continue to HSS-EE: Secure Two-Party Inference at Scale
