SEMQ — The semantic state layer for reproducible AI infrastructure

The Problem

The representation problem.

Modern AI systems rely on continuous vector embeddings as their semantic foundation.

While effective for short-term similarity, continuous representations lack structural guarantees. They drift over time, are expensive to persist at scale, and couple semantic meaning to magnitude and precision.

As systems grow larger, more distributed, and more persistent, these limitations compound. Memory becomes unstable. Routing becomes probabilistic. Storage and transmission become increasingly costly.

The issue is not performance. It is representation.

Typical embedding vector — 1536 dimensions

6,144 bytes · hardware-dependent · non-reproducible · un-diffable
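
The 6,144-byte figure follows from 1536 dimensions at 4 bytes per FP32 value. The short sketch below checks that arithmetic with the Python standard library. It also shows one well-known reason float pipelines are hard to reproduce bit-for-bit: floating-point addition is not associative, so a different reduction order can change the result. The variable names are illustrative only.

```python
import struct

DIMS = 1536
vector = [0.0] * DIMS

# Pack as little-endian FP32: 4 bytes per component.
payload = struct.pack(f"<{DIMS}f", *vector)
print(len(payload))  # 6144

# Floating-point addition is not associative, so summation order matters.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left)  # False
```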

The Insight

Semantic content lives in direction, not magnitude.

The meaningful structure of a high-dimensional vector is not in its raw values but in its orientation relative to other vectors. SEMQ extracts that directional structure as a compact set of symbolic angles: a fixed-size encoding that is hardware-independent, deterministic, and human-inspectable.

The angle representation preserves the relational structure that matters (nearest neighbors, clustering geometry, and directional similarity) while discarding the float noise that doesn't.
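
A quick way to see why direction carries the signal: cosine similarity, the usual comparison for embeddings, does not change when a vector is rescaled. The sketch below uses values borrowed from the sample vector on this page. The function name and the scale factor are illustrative assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

a = [-0.321, 0.552, -0.113, 0.984]
b = [-0.447, 0.218, -0.032, 0.761]
scaled_a = [10.0 * x for x in a]  # same direction, ten times the magnitude

original = cosine_similarity(a, b)
rescaled = cosine_similarity(scaled_a, b)
print(math.isclose(original, rescaled))  # True
```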

Embedding space → symbolic angular domain

160 points · scattered magnitude → unit-circle direction

Before & after encoding

Traditional embedding

FP32 dense vector

[-0.321, 0.552, -0.113, 0.984, -0.447, 0.218,
-0.032, 0.761, ...]

Traditional high-dimensional FP32 embedding (768 floats).

Same meaning, smaller symbolic representation.

Why it matters

Dense FP32 vectors balloon storage and query costs. SEMQ keeps the semantic meaning while shrinking payloads so indexes stay light and queries fly.

Less I/O per lookup
Smaller indexes & RAM
Cheaper scaling
Directional semantics preserved

How It Works

A topology-preserving projection.

SEMQ encodes pairs of vector components as angular coordinates on the unit circle. Each angle is a compact summary of a local relationship within the embedding. Taken together, the angle sequence is an address in semantic space that is independent of floating-point precision or hardware rounding.

The encoding is proven to preserve pairwise similarity to within a configurable error bound. As the number of angle dimensions increases, the bound tightens.
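
To make the idea concrete, here is a minimal sketch of a pair-to-angle encoding. It is not SEMQ's actual algorithm, which this page does not specify. It assumes consecutive components are paired as (x, y), each pair is mapped to its angle with `atan2`, and each angle is quantized into 16 bins (4 bits). The pairing scheme, the bin count, and the function names `encode_angles` and `pack_symbols` are all assumptions made for illustration.

```python
import math

BITS = 4            # assumed bits per angle symbol
LEVELS = 1 << BITS  # 16 angular bins of 22.5 degrees each

def encode_angles(vector, levels=LEVELS):
    """Map consecutive component pairs to quantized angle symbols."""
    if len(vector) % 2 != 0:
        raise ValueError("vector length must be even")
    symbols = []
    for i in range(0, len(vector), 2):
        x, y = vector[i], vector[i + 1]
        theta = math.atan2(y, x) % (2 * math.pi)  # angle in [0, 2*pi)
        # The trailing modulo guards against theta rounding up to 2*pi.
        symbols.append(int(theta / (2 * math.pi) * levels) % levels)
    return symbols

def pack_symbols(symbols):
    """Pack two 4-bit symbols into each byte, padding with 0 if needed."""
    if len(symbols) % 2 != 0:
        symbols = symbols + [0]
    return bytes((hi << 4) | lo for hi, lo in zip(symbols[0::2], symbols[1::2]))

sample = [-0.321, 0.552, -0.113, 0.984, -0.447, 0.218, -0.032, 0.761]
symbols = encode_angles(sample)
print(symbols)                      # [5, 4, 6, 4]
print(pack_symbols(symbols).hex())  # 5464

# Size check for a 768-float vector under these assumptions.
packed = pack_symbols(encode_angles([1.0, 0.0] * 384))
print(768 * 4, len(packed))         # 3072 192
```

Under these assumed settings a 768-float vector shrinks from 3,072 bytes to 192 bytes, a 16x reduction. That happens to match the compression figure quoted on this page, but the page does not say how that figure is derived, so treat the match as illustrative.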

Research

Read the paper.

The SEMQ whitepaper is available here as a preprint.

Preprint — 2026

Symbolic Angle Encoding for Deterministic Semantic State in Large-Scale AI Systems

SEMQ Team — [Venue / arXiv:XXXX.XXXXX]

Read the paper →

Infrastructure Impact

A new primitive for AI systems.

Symbolic angles compose naturally with existing infrastructure. They are small enough to store in a database column, stable enough to use as cache keys, and structured enough to diff across model versions. SEMQ provides the semantic state layer that slots under your existing AI stack.

What becomes possible

Reproducibility

Deterministic encodings make AI pipeline outputs stable across hardware and time.

Compression

16x smaller state footprint. Cache more, store longer, move faster.

Versioning

Diff model outputs across versions like source code. Catch semantic drift early.

Caching

Stable angle keys enable exact-match and approximate semantic caches at scale.

Observability

Structured angle sequences make AI state auditable and loggable for the first time.

Portability

Hardware-agnostic encoding. Move workloads freely without state migration pain.
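
Two of the capabilities above, caching and versioning, can be sketched in a few lines. The sketch assumes an encoding is already available as a sequence of small integer symbols (0 to 255). The helper names `cache_key` and `diff_symbols` are hypothetical and are not part of any published SEMQ API.

```python
import hashlib

def cache_key(symbols):
    """Derive a stable exact-match key from a symbol sequence."""
    return hashlib.sha256(bytes(symbols)).hexdigest()

def diff_symbols(old, new):
    """Return (position, old_symbol, new_symbol) for each changed position."""
    if len(old) != len(new):
        raise ValueError("sequences must have the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(old, new)) if a != b]

model_v1 = [5, 4, 6, 4]  # illustrative symbols from one model version
model_v2 = [5, 4, 7, 4]  # same input, hypothetical newer model version

print(diff_symbols(model_v1, model_v2))               # [(2, 6, 7)]
print(cache_key(model_v1) == cache_key([5, 4, 6, 4]))  # True
print(cache_key(model_v1) == cache_key(model_v2))      # False
```

Because the symbols are integers rather than floats, equal encodings hash to equal keys, and a change between versions shows up as a specific position rather than a small numeric wobble.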

For Teams Building AI

Start shipping.

SEMQ integrates at the embedding layer of your pipeline, between your model and your storage, cache, or retrieval system. Generate a .semq file and you have a portable semantic state.
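
The .semq file format is not described on this page, so the sketch below does not attempt to produce one. It only illustrates the integration point: an encoded state, represented here by placeholder bytes, sits between the model and storage, in this case a database column. The table name, column names, and helper functions are hypothetical.

```python
import sqlite3

def store_state(conn, doc_id, encoded):
    """Persist an encoded semantic state as a BLOB keyed by document id."""
    conn.execute(
        "INSERT OR REPLACE INTO semantic_state (doc_id, encoded) VALUES (?, ?)",
        (doc_id, encoded),
    )

def load_state(conn, doc_id):
    """Fetch the encoded state for a document, or None if absent."""
    row = conn.execute(
        "SELECT encoded FROM semantic_state WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    return None if row is None else row[0]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE semantic_state (doc_id TEXT PRIMARY KEY, encoded BLOB NOT NULL)"
)

encoded = bytes([0x54, 0x64])  # placeholder for an encoder's output
store_state(conn, "doc-1", encoded)
print(load_state(conn, "doc-1") == encoded)  # True
print(load_state(conn, "missing"))           # None
```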

The semantic state layer for reproducible AI infrastructure.