Vera VB — Voice Biometric Identity Platform

01 — The Challenge

Voice IP is being extracted before creators know what they're signing away.

Every major AI voice platform requires creators to upload audio data. The terms governing that upload (perpetual licenses, irrevocable rights, rights to train on and commercialize the voice) are written to maximize the platform's flexibility and minimize the creator's future options. This is not a new pattern. It is the same pattern the music industry used for decades, now playing out at the speed of software deployments rather than contract negotiations.

The problem is structural. There is no platform-independent infrastructure for voice creators to establish verified ownership of their own biometric data before platforms capture it. Existing provenance standards like C2PA are designed to prove file authorship — they do not cover biometric voice identity, clone detection, or longitudinal aging analysis. Vera VB fills that gap: not as a protection tool or a monitoring service, but as an identity layer that creators own independently of any platform relationship.

0 Independent Registries

No platform-neutral infrastructure exists for creators to establish verified ownership of their voice biometrics before engaging AI voice platforms.

Perpetual & Irrevocable

Standard voice-platform terms of service grant perpetual, irrevocable, worldwide licenses to uploaded voice data, and those licenses survive termination of the platform relationship.

No Identity Standard

C2PA and emerging provenance standards address file-level authorship. Biometric voice identity — speaker embeddings, aging characteristics, clone detection — has no standard.

02 — Strategic Approach

Build the identity layer, not the tool. Survive the standards cycle.

The most important architectural decision was framing. Vera VB is not a "voice protection tool" — a category that will be marginalized as platform compliance features improve. It is a biometric identity layer — a category that is additive to every provenance standard that emerges, because no provenance standard covers biometric voice identity.

C2PA will commoditize file-level timestamps. Blockchain provenance will handle content authenticity. None of these systems address the biometric identity question: is this voice sample from the same speaker as a reference set registered two years ago, and can that claim be verified independently? That question requires speaker embeddings, longitudinal aging analysis, and clone detection — capabilities that Vera VB builds as its core, not as features.

Zero-knowledge privacy architecture: Vera VB never stores decrypted audio. Only derived features (cryptographic hashes, speaker embeddings, temporal metrics, synthetic detection scores) are retained on the platform. Users hold their own audio in their own Google Drive.
Quarterly re-registration as the core product mechanic — each registration adds a temporal layer to a longitudinal biometric record. A single registration is useful. Twelve quarterly data points with documented aging characteristics are something no competitor can retroactively manufacture.
Open-source ML models requiring assembly rather than invention — the value is in the architecture, the verification stack design, and the accumulated longitudinal dataset, not in proprietary algorithms.
Five independent proof layers, each independently valuable and independently verifiable — designed so that the failure or commoditization of any single layer does not compromise the others.

03 — Technical Architecture

Five independent proof layers. Zero audio on platform.

The architecture is designed around a single constraint that drives every other decision: Vera VB must never be a voice data honeypot. If the platform held audio, it would be the single most valuable, and most targeted, voice data repository on the internet. Because the platform holds only derived features, a breach yields no recoverable audio. This constraint is a structural advantage, not a limitation.

The five-layer verification stack is the core product. Each layer is independently verifiable by a third party — courts, platforms, compliance officers, or the creator themselves:

L1 — File Hash

SHA-256 hash of the original audio file, which the user holds. Tamper-evident, platform-independent, and impossible for anyone to block: the creator controls this layer entirely. The hash proves that a file is bit-for-bit identical to the one registered; paired with the L3 blockchain timestamp, it proves that the file existed at a specific time.
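A minimal sketch of how the L1 hash could be computed and later re-checked by a third party, using Python's standard `hashlib`. The function names `file_sha256` and `matches_registration` are hypothetical, not Vera VB's code. Reading in chunks keeps memory use flat even for long recordings.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks
    so a large audio file never has to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_registration(path: str, registered_hex: str) -> bool:
    """Hypothetical third-party check: does this file match the hash recorded
    at registration? Case-insensitive on the stored hex string."""
    return file_sha256(path) == registered_hex.lower()
```

Anyone holding the original file and the registered digest can run this check without contacting the platform, which is what makes the layer platform-independent.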

L2 — Embeddings

Acoustic speaker embeddings extracted using Resemblyzer (GE2E) and ECAPA-TDNN models. These speaker vectors capture the biometric identity of the voice independent of content: the same speaker saying different words lands in the same embedding neighborhood. Stored as derived features only; the original recording cannot be reconstructed from them.
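Verification at this layer reduces to a vector comparison. The sketch below assumes embeddings have already been extracted (in Resemblyzer, `VoiceEncoder.embed_utterance` produces one vector per utterance) and uses plain Python lists in their place. Averaging the reference set into a centroid and applying a cosine-similarity threshold is one common approach, assumed here rather than taken from the source; the 0.75 threshold is a placeholder, not a calibrated value.

```python
import math
from typing import List, Sequence

# ASSUMPTION: placeholder threshold. A real deployment would calibrate it per
# model (GE2E vs. ECAPA-TDNN) on held-out same-speaker / different-speaker pairs.
SAME_SPEAKER_THRESHOLD = 0.75


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in v]


def centroid(references: Sequence[Sequence[float]]) -> List[float]:
    """Average the normalized reference embeddings, then re-normalize."""
    if not references:
        raise ValueError("need at least one reference embedding")
    normed = [l2_normalize(r) for r in references]
    dim = len(normed[0])
    mean = [sum(r[i] for r in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def same_speaker(query: Sequence[float],
                 references: Sequence[Sequence[float]],
                 threshold: float = SAME_SPEAKER_THRESHOLD) -> bool:
    """True if the query embedding is close enough to the reference centroid."""
    return cosine_similarity(query, centroid(references)) >= threshold
```

Comparing against a centroid rather than a single sample makes the check less sensitive to one unusual reference recording.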

L3 — Blockchain Timestamp

OriginStamp integration anchors the registration hash into the Bitcoin and Ethereum blockchains. Provides a publicly verifiable timestamp that cannot be backdated, independent of any relationship with Vera VB as a company.
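The source does not specify how the registration hash is assembled. One plausible construction, sketched here as an assumption, hashes a canonical JSON record containing the L1 file hash and a digest of the L2 embedding, so that a single 64-character digest is all that leaves the platform for anchoring. The field names are hypothetical, and the OriginStamp submission call is deliberately omitted rather than guessed.

```python
import hashlib
import json
import struct
from typing import Sequence


def embedding_digest(embedding: Sequence[float]) -> str:
    """Hash the embedding as little-endian float32 bytes so the same
    vector always yields the same digest."""
    packed = struct.pack(f"<{len(embedding)}f", *embedding)
    return hashlib.sha256(packed).hexdigest()


def registration_digest(file_sha256: str, embedding: Sequence[float],
                        registered_at: str, subject_id: str) -> str:
    """Hash a canonical (sorted-key, compact) JSON record. This digest is
    what would be submitted to a timestamping service; the record itself
    never needs to be published."""
    record = {
        "file_sha256": file_sha256,            # L1 (hypothetical field name)
        "embedding_sha256": embedding_digest(embedding),  # L2 summary
        "registered_at": registered_at,        # ISO-8601 string, assumed
        "subject_id": subject_id,              # hypothetical identifier
    }
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Canonical serialization matters here: if key order or whitespace could vary, the same registration could produce different digests and the public timestamp would no longer be reproducible.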

L4 — Copyright Office

Facilitated Copyright Office registration of voice samples as expressive works. Establishes legal standing independent of platform terms of service. This layer is human-institutional rather than technical; its value lies in the legal presumptions it creates.

L5 — C2PA (Roadmap)

Content Credentials integration is on the roadmap. It will complement rather than replace the existing layers: C2PA handles content authorship, while L1–L4 handle biometric identity. The two approaches answer different verification questions.

The processing pipeline is designed for ephemeral execution — decrypted audio is never written to disk. Client-side Web Crypto API encryption precedes upload; Cloud Functions process audio ephemerally in memory; only derived features are written to Firestore and BigQuery. User audio is stored only in the user's own Google Drive, under their own Google account credentials.
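A sketch of the ephemeral step, assuming the function has already decrypted the upload into a `bytes` object and that the audio is PCM WAV (the format Python's standard `wave` module reads). Everything runs on in-memory buffers, and only the returned dictionary would be persisted. The function name and feature keys are hypothetical; embedding extraction and synthetic-detection scoring would slot in where the comment indicates.

```python
import hashlib
import io
import wave
from typing import Dict, Union


def derive_features(decrypted_wav: bytes) -> Dict[str, Union[str, float, int]]:
    """Extract derived features from decrypted WAV bytes entirely in memory.

    ASSUMPTION: decryption already happened in memory upstream. Nothing here
    touches the filesystem; the returned dict is the only output that would
    be written to Firestore / BigQuery.
    """
    with wave.open(io.BytesIO(decrypted_wav), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        channels = wav.getnchannels()
    # A real pipeline would also compute speaker embeddings and synthetic
    # detection scores here, from the same in-memory buffer.
    return {
        "file_sha256": hashlib.sha256(decrypted_wav).hexdigest(),
        "duration_s": frames / rate,
        "sample_rate_hz": rate,
        "channels": channels,
    }
```

Because the function returns only derived values, the decrypted bytes go out of scope when the invocation ends and are never serialized anywhere.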

GCP · Cloud Run · Cloud Functions · Next.js · Python · Resemblyzer · Wav2Vec2 · Parselmouth · Web Crypto API · Firestore · BigQuery · Firebase Auth

04 — Current Status & Metrics

Private Beta — validating the identity layer with voice professionals.

5

Independent Proof Layers

Each layer is independently verifiable by a third party; four are live today, with C2PA (L5) on the roadmap. The redundancy is structural: the failure or commoditization of any single layer does not compromise the verification value of the others.

0

Raw Audio Stored on Platform

Derived features only: cryptographic hashes, speaker embeddings, temporal metrics, and synthetic detection scores. A breach yields no recoverable audio. Trust is built into the architecture, not bolted on as a policy.

<1%

Infrastructure Cost / Revenue

Ephemeral processing with no long-term audio storage means infrastructure costs are dominated by compute during registration events — not storage. The unit economics improve as volume scales.

Beta

Private Beta Phase

Currently validating the product with voice professionals — voice actors, podcast hosts, audiobook narrators — to confirm that the registration workflow and verification outputs meet their practical needs before broader release.

05 — Key Takeaways

Architecture and strategy lessons from building biometric identity infrastructure.

"Build the Identity Layer, Not the Tool"

C2PA will commoditize file-level timestamps. Platform compliance features will commoditize basic content provenance. By building the biometric identity layer — speaker embeddings, longitudinal aging analysis, clone detection — as the core, Vera VB complements rather than competes with every provenance standard that emerges. The framing decision (identity layer vs. protection tool) determines the entire competitive trajectory. Tools get replaced. Infrastructure layers get integrated.

"Zero-Knowledge Is a Structural Advantage"

Not storing user audio isn't primarily a privacy feature — it's an architectural advantage that eliminates the platform's own exposure as a data target. A platform holding a large voice database is an attractive target for both hackers and legal discovery. A platform holding only derived features — hashes, embeddings, metrics — is not. The "zero audio stored" constraint shaped every other architectural decision in the pipeline, and it's the property that makes the trust model work without requiring users to trust the platform's security posture.

"Temporal Data Is the Moat"

A single voice registration is useful as a point-in-time record. A longitudinal record spanning 12+ quarterly registrations — with documented aging characteristics, consistent embedding distance from the baseline, and a temporal chain anchored to public blockchains at each point — is something no competitor can retroactively manufacture. The quarterly re-registration mechanic is designed to compound dataset value over time. The users who register earliest and most consistently build the strongest biometric identity records.
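One way to model the longitudinal record, sketched under assumptions: each quarterly entry stores its cosine distance from the baseline embedding plus the digest of the previous entry, forming a hash chain. `LongitudinalRecord` and its fields are hypothetical. Note that a chain on its own can be rebuilt from scratch by whoever holds it; it is the per-entry blockchain timestamp (L3) that rules out a retroactively manufactured history.

```python
import hashlib
import json
import math
from dataclasses import dataclass, field
from typing import List, Sequence

GENESIS = "0" * 64  # placeholder "previous digest" for the first entry


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("zero-length embedding")
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (na * nb)


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class LongitudinalRecord:
    baseline: List[float]
    entries: List[dict] = field(default_factory=list)

    def add_quarter(self, quarter: str, embedding: Sequence[float]) -> dict:
        """Append one quarterly registration, linked to the previous entry."""
        body = {
            "quarter": quarter,
            "distance_from_baseline": round(cosine_distance(self.baseline, embedding), 6),
            "prev_digest": self.entries[-1]["digest"] if self.entries else GENESIS,
        }
        entry = {**body, "digest": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify_chain(self) -> bool:
        """Recompute every link; editing any earlier entry breaks the chain."""
        prev = GENESIS
        for e in self.entries:
            body = {k: e[k] for k in ("quarter", "distance_from_baseline", "prev_digest")}
            if e["prev_digest"] != prev or _digest(body) != e["digest"]:
                return False
            prev = e["digest"]
        return True
```

Each new `digest` is the natural value to anchor at L3, so every quarter's position in the history is fixed publicly at the time it is created.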

"Regulatory Timing Matters"

Five state and federal legislative efforts are building the regulatory framework for voice IP protection. Building the infrastructure before regulation arrives means being the platform that both creators and AI companies need when compliance requirements materialize. Regulation doesn't create markets. It crystallizes them. The companies that have already built compliant infrastructure when regulation arrives are positioned as solutions rather than problems.
