Document Classifier

MOSAIC

The first classifier that learns from your folder structure and classifies long documents where current systems fail. No manual labeling. No predefined schemas.

91.25% Global Accuracy (real AEAT-Legal corpus)

92.2% DAIC Accuracy (distributed-signal documents)

1.1ms Median Latency (74.7% resolved without AI)

0 Annotated Docs (existing folders only)

The problem nobody solved

There is a class of documents that no current system classifies well: those whose type cannot be determined from the first page. The discriminative signal is spread throughout the document — in clause 7, in article 18, in the combination of obligations that only appears on page 12.

A reader that only sees the beginning fails. A reader that sees the first eight pages also fails. Current systems reach between 40% and 73% on this type of document. MOSAIC reaches 92.2%.

How it works

MOSAIC applies a proprietary progressive hypothesis reduction approach. Instead of classifying the document against all possible types at once, the system progressively eliminates candidates using specialized observers of increasing cost — until the decision becomes trivial.
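
The approach itself is proprietary, so the following is only a minimal sketch of the general idea of a cascade that eliminates candidates step by step. Every name here (`classify`, `Decision`, `keyword_observer`, `clause_observer`, the keyword table) is hypothetical and chosen for illustration; none of it is MOSAIC's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    label: Optional[str]        # None means "ambiguous, route to human review"
    resolved_by: Optional[str]  # name of the observer that settled the decision
    remaining: frozenset        # candidates still standing at the end


def classify(text, candidates, observers):
    """Run observers from cheapest to costliest, stopping as soon as one type is left."""
    remaining = frozenset(candidates)
    for name, observer in observers:
        # An observer may only eliminate candidates, never add new ones.
        remaining = frozenset(observer(text, remaining)) & remaining
        if len(remaining) == 1:
            return Decision(next(iter(remaining)), name, remaining)
        if not remaining:
            return Decision(None, name, remaining)
    return Decision(None, None, remaining)


def keyword_observer(text, remaining):
    # Cheap step (assumed): drop types whose required keyword never appears.
    required = {"invoice": "invoice", "lease": "tenant", "loan": "interest"}
    lowered = text.lower()
    return frozenset(c for c in remaining if required.get(c, "") in lowered)


def clause_observer(text, remaining):
    # Costlier step (a stand-in for a model call): look for a clause-level cue.
    if "repayment schedule" in text.lower():
        return remaining & {"loan"}
    return remaining


decision = classify(
    "The tenant pays interest under the repayment schedule in clause 7.",
    {"invoice", "lease", "loan"},
    [("keyword", keyword_observer), ("clause", clause_observer)],
)
```

In this toy run the cheap keyword step removes one candidate and the costlier step only has to choose between the two that remain, which is the point of ordering observers by cost.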

74.7% of documents are resolved in less than 1ms, at zero API cost

5% require deep reasoning, and only over ~2 final candidates, not the entire type space

$0.000056 effective cost per document in real production

Why it's different

Learns from your folders

No need to label documents one by one, or use vendor schemas. MOSAIC distills the tacit knowledge that already exists in your directory structure — the classification an expert perfected over years of operational practice.
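
As a rough illustration of how a folder tree can stand in for manual labels, the sketch below treats the name of each top-level folder as the document type of every file beneath it. The function name `labels_from_folders` and the example folder names are assumptions made for this example, not part of MOSAIC.

```python
import tempfile
from pathlib import Path


def labels_from_folders(root):
    """Pair every file under `root` with the name of its top-level folder."""
    root = Path(root)
    labeled = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        if len(relative.parts) > 1:  # skip files sitting directly in the root
            labeled.append((path, relative.parts[0]))
    return labeled


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "leases" / "2021").mkdir(parents=True)
    (Path(tmp) / "leases" / "2021" / "a.txt").write_text("lease text")
    (Path(tmp) / "invoices").mkdir()
    (Path(tmp) / "invoices" / "b.txt").write_text("invoice text")
    (Path(tmp) / "loose.txt").write_text("unfiled")
    pairs = [(path.name, label) for path, label in labels_from_folders(tmp)]
```

Nested subfolders such as `2021` are ignored here on the assumption that only the first level encodes the type; a real deployment would need to decide which level of the hierarchy carries that meaning.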

Epistemic honesty

When evidence is insufficient, MOSAIC declares the document ambiguous instead of forcing a low-confidence classification. In regulated environments, a human review signal is worth more than an incorrect classification presented with confidence.
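
One common way to express this kind of abstention is a rule that refuses to answer when the top score is low or too close to the runner-up. The sketch below assumes per-type scores between 0 and 1; the function `decide` and both thresholds are illustrative values, not MOSAIC's.

```python
def decide(scores, min_confidence=0.8, min_margin=0.2):
    """Return the best-scoring label, or None when the evidence is too weak."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return None
    best_label, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    # Abstain if the winner is weak or barely ahead of the second candidate.
    if best_score < min_confidence or best_score - runner_up < min_margin:
        return None
    return best_label
```

A `None` result is what would be surfaced as "ambiguous" and routed to a human reviewer.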

Auditable decisions

Each classification cites concrete evidence: which step resolved it, which document fragment was decisive, and at what confidence level. Not an anonymous probability distribution — a traceable justification for audit.
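
A traceable justification of this kind could be stored as one structured record per decision. The field names below mirror the three items listed above (step, decisive fragment, confidence) but are otherwise assumptions, as are the example values.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    document_id: str
    label: str
    resolved_by: str   # which step settled the decision
    evidence: str      # decisive fragment quoted from the document
    confidence: float


record = AuditRecord(
    document_id="doc-001",
    label="loan",
    resolved_by="clause_observer",
    evidence="repayment schedule in clause 7",
    confidence=0.93,
)

# One JSON line per decision keeps the trail easy to append to and to search.
audit_line = json.dumps(asdict(record), sort_keys=True)
```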

New types in days

Adding a document type does not require retraining the complete pipeline. Just create a folder with representative examples. Competitors require weeks or months of retraining.

Detects filing errors

MOSAIC automatically detects misfiled documents and marks them for correction, improving document order as a byproduct of deployment.
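
A simple way to picture this check: compare the folder a document sits in with the type the classifier predicts, and flag only confident disagreements. The function `find_misfiled`, the tuple layout, and the 0.9 threshold are assumptions for this sketch.

```python
def find_misfiled(items, min_confidence=0.9):
    """items: iterable of (doc_id, folder_label, predicted_label, confidence)."""
    return [
        (doc_id, folder, predicted)
        for doc_id, folder, predicted, confidence in items
        # Ambiguous documents (predicted is None) are never flagged as misfiled.
        if predicted is not None
        and predicted != folder
        and confidence >= min_confidence
    ]


flagged = find_misfiled([
    ("doc-001", "leases", "leases", 0.97),   # agrees with its folder
    ("doc-002", "leases", "loans", 0.95),    # confident disagreement
    ("doc-003", "invoices", "loans", 0.60),  # disagreement, but not confident
    ("doc-004", "invoices", None, 0.00),     # declared ambiguous
])
```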

Cost proportional to difficulty

Easy documents are resolved for free in less than 1ms. Only genuinely difficult ones escalate to costly steps. Average production cost per document: $0.000056.

Classifier comparison

MOSAIC is a document classifier, not an OCR system. It was evaluated on the real AEAT-Legal corpus: 2,037 Spanish tax documents across 14 types, 96.6% of which are difficult documents.

Classifier	Global Accuracy	Hard-Doc Accuracy	Latency	Auditable	Admits Uncertainty	Time to Add a Type
Pure Regex	~65%	~35%	<10ms	YES	NO	Days
Short-window classifier	90.0%	91.3%	93ms	NO	NO	Months
Long-window classifier	~87%	~87%	350ms	NO	NO	Months
Concatenated ensemble	~87%	~85%	380ms	NO	NO	Months
Direct LLM (no reduction)	~83%	58.0%	>3s	Partial	NO	Days
MOSAIC	91.25%	92.2%	1.1ms	YES	YES	Days

“The only classifier that simultaneously resolves distributed-signal documents with high precision, justifies each decision by citing evidence from the text, admits uncertainty when it cannot determine the type, and is robust to real-world filing noise.”

Use cases

Legal / Tax

Tax and legal firms

Automatic classification of files, contracts and notarial deeds where the type emerges from clauses distributed throughout the document, not from the header.

Banking & Insurance

Credit documentation processing

Policies, loan contracts, appraisal reports. Documents where the financial product type appears in the combination of clauses, not at the beginning.

Public Administration

Procedural document management

Any organization that keeps its documents in folders by type can deploy MOSAIC without prior labeling, using only its existing folder structure.

Ready to classify documents with real precision?

Deploy MOSAIC on your existing folder structure and get results from day one — no labeling, no complex integrations, and at a near-zero cost per document.

Request a demo

Talk to an expert