One API call. Send PDFs, images, HTML, or emails with your own schema — get back validated JSON with confidence scores. No custom parsers. No regex. Ever.
From messy file → clean JSON in seconds
INVOICE #INV-2026-0891 From: NovaTech Industries Bill To: Globex Corp Date: March 25, 2026 · Due: April 24, 2026 Enterprise Software License 5× $1,200 = $6,000.00 Implementation Services 40h $150.00 = $6,000.00 Annual Support Plan 1× $2,500 = $2,500.00 Training Workshop (on-site) 2d $800.00 = $1,600.00 ──────────────────────────────────────────────────── Subtotal: $16,100.00 · Tax (9.5%): $1,529.50 TOTAL DUE: $17,629.50 · Terms: Net 30
{
"success": true,
"data": {
"vendor": "NovaTech Industries",
"bill_to": "Globex Corp",
"invoice_number": "INV-2026-0891",
"date": "2026-03-25",
"due_date": "2026-04-24",
"subtotal": 16100.00,
"tax": 1529.50,
"total": 17629.50,
"payment_terms": "Net 30"
},
"confidence": {
"vendor": 0.99,
"total": 0.99,
"invoice_number": 0.99
},
"meta": { "processing_ms": 1380 }
}Stop writing brittle parsers and regex that break every time a format changes.
Every business has critical data locked inside unstructured content. Getting it out is painful.
Someone reads an invoice, types the numbers into a spreadsheet. Slow, expensive, error-prone. Doesn't scale.
Regex, OCR pipelines, scraping scripts. Brittle — breaks every time a vendor changes their invoice format.
One service for OCR, another for scraping, another for PDFs, another for emails. Complex, expensive, hard to maintain.
Send any content — a photo of a receipt, a PDF invoice, a product page URL, a customer email — plus a JSON Schema describing what you want. Get back validated, typed, structured JSON. Every time. No custom code. No multiple tools. No manual entry.
Any workflow that needs structured data from unstructured content.
Your agent processes documents, emails, and web pages but needs clean structured data to act on. CleanJSON gives it guaranteed-valid JSON with confidence scores — so it knows exactly how much to trust each value before making decisions.
You're building a pipeline that pulls data from invoices, receipts, forms, or web pages into your database or CRM. Instead of writing and maintaining custom parsers for every format, you send content to one endpoint and get typed JSON back.
Your team processes hundreds or thousands of invoices, purchase orders, compliance forms, or contracts. Instead of manual data entry or fragile OCR pipelines, every document goes through one API and comes out as clean, validated JSON.
You're reading a PDF and typing numbers into a spreadsheet. You're copying product specs from a website. You're extracting dates from emails. CleanJSON does this in seconds — accurately, every time, at any scale.
Your agent sends any content plus a schema. We handle the rest.
POST an image, PDF, HTML, URL, raw text, or email. Base64 or plain text. We accept everything.
Provide a standard JSON Schema describing the fields you want. Any shape, any depth, any types.
Receive validated, typed JSON with per-field confidence scores. Schema-compliant. Ready to use.
One endpoint. Six input types. Zero configuration.
Send messy real-world data. Get back clean, typed, validated JSON.
POST /api/v1/extract
{
"input_type": "text",
"content": "Invoice #INV-2026-042 from Acme Corp.
Date: March 15, 2026. Total: $1,250.00
Items: 2x Widget Pro ($500 ea),
1x Service Fee ($250)",
"schema": {
"type": "object",
"properties": {
"vendor": { "type": "string" },
"invoice_number": { "type": "string" },
"date": { "type": "string", "format": "date" },
"total": { "type": "number" },
"line_items": { "type": "array", "items": {...} }
},
"required": ["vendor", "total"]
}
}{
"success": true,
"validated": true,
"data": {
"vendor": "Acme Corp",
"invoice_number": "INV-2026-042",
"date": "2026-03-15",
"total": 1250.00,
"line_items": [
{ "description": "Widget Pro",
"quantity": 2, "unit_price": 500, "total": 1000 },
{ "description": "Service Fee",
"quantity": 1, "unit_price": 250, "total": 250 }
]
},
"confidence": {
"vendor": 0.99,
"invoice_number": 0.99,
"date": 0.90,
"total": 0.99,
"line_items": 0.95
},
"meta": {
"input_type": "text",
"processing_ms": 1240,
"tokens_used": 820,
"retries": 0
}
}Pick a real document and watch CleanJSON extract structured data in seconds.
No API key needed to preview · Full playground available after free signup — includes 5,000 free tokens
Every response is validated against your JSON Schema with Ajv. Failed validations auto-retry with error context. You get typed data or clear errors — never malformed JSON.
Every field gets a 0.0–1.0 confidence score. Set a threshold and we'll reject low-confidence extractions automatically. Your agent knows exactly how much to trust each value.
"$1,250.00" becomes 1250.00. "March 15, 2026" becomes "2026-03-15". "yes" becomes true. We handle the messy real-world formatting so your agent gets clean typed values.
If a value isn't in the source, we return null — never a plausible guess. Confidence 0.0 for anything uncertain. Your agent can trust every non-null value has evidence in the input.
Token-based billing. No subscriptions required. Starts free.
From invoices to product pages — if it has data, CleanJSON extracts it.
Sign up free, grab your API key, and make your first extraction. No credit card. No contracts. No minimums.
Try it free →5,000 tokens free · No credit card required