SEMQ
High-dimensional vectors, encoded as compact symbolic angles.
The Problem
Modern AI systems rely on continuous vector embeddings as their semantic foundation.
While effective for short-term similarity, continuous representations lack structural guarantees. They drift over time, are expensive to persist at scale, and couple semantic meaning to magnitude and precision.
As systems grow larger, more distributed, and more persistent, these limitations compound. Memory becomes unstable. Routing becomes probabilistic. Storage and transmission become increasingly costly.
The issue is not performance. It is representation.
Typical embedding vector — 1536 dimensions
6,144 bytes (1536 × 4-byte float32) · hardware-dependent · non-reproducible · un-diffable
The Insight
The meaningful structure of a high-dimensional vector is not in its raw values. It is in its orientation relative to other vectors. SEMQ extracts that directional structure as a compact set of symbolic angles: a fixed-size encoding that is hardware-independent, deterministic, and human-inspectable.
The angle representation preserves the relational structure that matters (nearest neighbors, clustering geometry, and directional similarity) while discarding the floating-point noise that doesn't.
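To make the point about orientation concrete, here is a minimal sketch using cosine similarity, a standard direction-only measure. The vectors and the `cosine` helper are illustrative assumptions, not part of SEMQ.

```python
import math

def cosine(a, b):
    """Cosine similarity: depends only on the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Two made-up vectors standing in for embeddings.
a = [0.3, -1.2, 0.8, 2.0]
b = [0.1, -0.9, 1.1, 1.7]

# Rescaling a vector changes its magnitude but not its direction.
a_scaled = [10.0 * x for x in a]

# The similarity to b is unchanged (up to floating-point rounding).
same_direction = math.isclose(cosine(a, b), cosine(a_scaled, b), rel_tol=1e-12)
```

Multiplying `a` by ten leaves its similarity to `b` untouched, which is why a direction-only encoding can keep neighbor relationships while dropping magnitude.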
Embedding space → symbolic angular domain
160 points · scattered magnitude → unit-circle direction
Before & after encoding
How It Works
SEMQ encodes pairs of vector components as angular coordinates on the unit circle. Each angle is a compact summary of a local relationship within the embedding. Taken together, the angle sequence is an address in semantic space that is independent of floating-point precision and hardware rounding.
The encoding is proven to preserve pairwise similarity to within a configurable error bound. As the number of angle dimensions increases, the bound tightens.
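One way to picture the pairing step is the sketch below. It is an illustration under stated assumptions, not the SEMQ algorithm: the function name `encode_angles`, the choice of consecutive pairs, and the uniform quantization into `bins` steps are all hypothetical.

```python
import math

def encode_angles(vec, bins=256):
    """Map consecutive component pairs (x, y) to quantized angles.

    Hypothetical sketch, not the SEMQ algorithm: each pair becomes an
    integer in range(bins) naming a direction on the unit circle.
    """
    if len(vec) % 2 != 0:
        raise ValueError("expected an even number of components")
    codes = []
    for i in range(0, len(vec), 2):
        x, y = vec[i], vec[i + 1]
        # atan2 returns (-pi, pi]; wrap it onto one full turn.
        theta = math.atan2(y, x) % (2 * math.pi)
        # Round to the nearest of `bins` directions; the final modulo
        # folds a full turn back onto bin 0.
        codes.append(round(theta / (2 * math.pi) * bins) % bins)
    return codes

# Four pairs pointing east, north, west, south.
v = [1.0, 0.0, 0.0, 1.0, -1.0, 0.0, 0.0, -1.0]
codes = encode_angles(v, bins=16)            # [0, 4, 8, 12]

# Doubling every component leaves the codes unchanged.
scaled_codes = encode_angles([2.0 * x for x in v], bins=16)
```

The output is a list of small integers, so two machines that agree on the bin for each pair agree on the whole sequence. Note that this toy version discards the magnitude of each pair entirely; how SEMQ bounds the resulting similarity error is covered in the paper, not reproduced here.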
Research
The SEMQ whitepaper is available here as a preprint.
SEMQ Team — [Venue / arXiv:XXXX.XXXXX]
Read the paper →
Infrastructure Impact
Symbolic angles compose naturally with existing infrastructure. They are small enough to store in a database column, stable enough to use as cache keys, and structured enough to diff across model versions. SEMQ provides the semantic state layer that slots under your existing AI stack.
What becomes possible
Deterministic encodings make AI pipeline outputs stable across hardware and time.
16x smaller state footprint. Cache more, store longer, move faster.
Diff model outputs across versions like source code. Catch semantic drift early.
Stable angle keys enable exact-match and approximate semantic caches at scale.
Structured angle sequences make AI state auditable and loggable for the first time.
Hardware-agnostic encoding. Move workloads freely without state migration pain.
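As a rough illustration of diffing across versions, the sketch below compares two angle sequences position by position. The helper `diff_angles`, the 256-bin code range, and the sample values are assumptions made for the example, not SEMQ tooling.

```python
def diff_angles(old, new, bins=256):
    """List (index, old_code, new_code, circular_step) where codes differ."""
    if len(old) != len(new):
        raise ValueError("sequences must have the same length")
    changes = []
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b:
            # Shortest distance around the circle, so 255 -> 1 counts as 2.
            step = min((a - b) % bins, (b - a) % bins)
            changes.append((i, a, b, step))
    return changes

# Made-up codes for the same input under two model versions.
v1 = [12, 200, 45, 255]
v2 = [12, 203, 45, 1]

changes = diff_angles(v1, v2)
# [(1, 200, 203, 3), (3, 255, 1, 2)]
```

Because each position holds a discrete code, a change report reads like a line diff: which positions moved, and by how many steps.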
For Teams Building AI
SEMQ integrates at the embedding layer of your pipeline, between your model and your storage, cache, or retrieval system. Generate a .semq file and you have a portable semantic state.
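The sketch below shows where such a step could sit, using an exact-match cache keyed by the encoded state. Everything in it is hypothetical: `AngleCache`, `toy_embed`, and `toy_encode` are stand-ins invented for the example, and because the .semq file layout is not described on this page, the state is kept in memory instead of being written to disk.

```python
from typing import Callable, Dict, List, Optional

class AngleCache:
    """Exact-match cache keyed by a packed angle sequence (hypothetical)."""

    def __init__(self,
                 embed: Callable[[str], List[float]],
                 encode: Callable[[List[float]], List[int]]) -> None:
        self._embed = embed      # your embedding model
        self._encode = encode    # the angle encoder sitting after it
        self._store: Dict[bytes, str] = {}

    def key_for(self, text: str) -> bytes:
        codes = self._encode(self._embed(text))
        return bytes(codes)      # requires every code to be in range(256)

    def put(self, text: str, value: str) -> None:
        self._store[self.key_for(text)] = value

    def get(self, text: str) -> Optional[str]:
        return self._store.get(self.key_for(text))

# Toy stand-ins so the sketch runs; replace with a real model and encoder.
def toy_embed(text: str) -> List[float]:
    return [float(ord(c)) for c in text[:4].ljust(4)]

def toy_encode(vec: List[float]) -> List[int]:
    return [int(v) % 256 for v in vec]

cache = AngleCache(toy_embed, toy_encode)
cache.put("hello", "cached answer")

hit = cache.get("hello")     # found: same text, same key
miss = cache.get("world")    # None: different key
```

The design choice to inject `embed` and `encode` as callables keeps the cache independent of any particular model or encoder, which matches the idea of a layer that sits between the two.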