Schemas
JSON-Schema documents that the AI scraper config (AI_CONF mode) feeds to the LLM to constrain its output. Two flavours coexist:
- Global schemas: admin-managed templates (the BEBO-1406 SchemaSeeder ships the initial set). Read-only for customers; create/update/delete is admin-only.
- Customer schemas: owned by the authenticated customer. Full CRUD via the endpoints below. Used to fork a global template and tailor it to a customer’s shape, or to define a brand-new extraction schema from scratch.
When the AI scraper runs, it sends the schema to the LLM and requires the response to match. Wrong shape → row rejected. Schemas are the contract between your data model and the AI extractor.
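To make the "wrong shape, row rejected" rule concrete, here is a minimal sketch of a shape check in Python. It is an illustration only: it covers just the `type`, `properties`, and `required` keywords, and the function name `shape_errors` is hypothetical. It is not the platform's validator, which may apply the full JSON-Schema rules.

```python
# Illustrative stand-in for JSON-Schema validation (subset only).
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}

# Same schema as the PRODUCT example in this document.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "sku": {"type": "string"},
    },
    "required": ["title", "price"],
}


def shape_errors(row, schema):
    """Return a list of problems; an empty list means the row fits the schema."""
    if not isinstance(row, dict):
        return ["row is not an object"]
    errors = []
    for name in schema.get("required", []):
        if name not in row:
            errors.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name not in row:
            continue  # optional field that was not extracted
        type_name = spec.get("type")
        expected = JSON_TYPES.get(type_name)
        if expected is None:
            continue  # keyword outside this sketch's subset
        value = row[name]
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(value, bool) and type_name in ("number", "integer")
        if bool_as_number or not isinstance(value, expected):
            errors.append(f"field {name} should be {type_name}")
    return errors
```

A row such as `{"title": "Widget", "price": "9.99"}` fails this check because `price` arrives as a string rather than a number.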
Endpoint summary
| Method | Path | Operation ID | Auth scope |
|---|---|---|---|
| PUT | /api/schema/customer | scrapewise_create_customer_schema | bearer + idempotency-key |
| GET | /api/schema/customer | scrapewise_list_customer_schema | bearer |
| GET | /api/schema/get/{id} | — | bearer |
| DELETE | /api/schema/customer/{id} | scrapewise_delete_customer_schema | bearer + idempotency-key |
| POST | /api/schema/customer/{id}/preview-delete | scrapewise_delete_customer_schema_preview | bearer |
| GET | /api/schema/items/existing | — | bearer |
| GET | /api/schema/{type}/{version} | — | admin |
| GET | /api/schema | — | admin |
| PUT | /api/schema | — | admin |
Create or update a customer schema — PUT /api/schema/customer
PUT /api/schema/customer
Authorization: Bearer <key>
Idempotency-Key: <uuid>
Content-Type: application/json

{
  "id": null,
  "version": 1,
  "type": "PRODUCT",
  "description": "Product schema for competitor-x.com",
  "content": {
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "price": { "type": "number" },
      "sku": { "type": "string" }
    },
    "required": ["title", "price"]
  },
  "templateId": null,
  "templateName": null,
  "domainPatterns": ["competitor-x.com"]
}
Request fields
| Field | Required | Type | Description |
|---|---|---|---|
| id | no | string | Set to update an existing schema; omit/null to create new |
| version | yes | int | Version number for change tracking |
| type | yes | enum | SchemaType, one of PRODUCT, REVIEW, ARTICLE, etc. |
| content | yes | object | The JSON-Schema document: properties map, required array |
| description | no | string | Human-readable purpose, surfaced when a scraper references this schema |
| templateId | no | string | Provenance if forked from a global template |
| templateName | no | string | Display name of the source template |
| domainPatterns | no | string[] | Domains this schema applies to (used by the schema-suggest UI) |
Upsert semantics: if id is set and matches an existing schema owned by the caller, that schema is updated; otherwise a new schema is created. The response always carries the persisted id.
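The request above can be assembled with the Python standard library. The sketch below only builds the request object and does not send it. The base URL is an assumption taken from the curl examples in this document, and `build_upsert_request` is a hypothetical helper name.

```python
import json
import urllib.request
import uuid

# Assumption: base URL copied from the curl examples in this document.
BASE_URL = "https://portal.scrapewise.ai/api/scraper-api"


def build_upsert_request(api_key, schema_type, version, content,
                         schema_id=None, description=None, domain_patterns=None):
    """Build (but do not send) a PUT /api/schema/customer request."""
    payload = {
        "id": schema_id,  # None serialises to null, which creates a new schema
        "version": version,
        "type": schema_type,
        "content": content,
    }
    # Optional fields are only included when the caller supplies them.
    if description is not None:
        payload["description"] = description
    if domain_patterns is not None:
        payload["domainPatterns"] = list(domain_patterns)
    return urllib.request.Request(
        url=f"{BASE_URL}/api/schema/customer",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Idempotency-Key": str(uuid.uuid4()),
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; to update rather than create, pass the `id` from an earlier response as `schema_id`.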
Response (200)
{
  "id": "5f9a1b2c3d4e5f6a7b8c9d0e",
  "version": 1,
  "type": "PRODUCT",
  "description": "Product schema for competitor-x.com",
  "content": { /* the JSON-Schema document */ },
  "customerRef": "cust_abc123",
  "domainPatterns": ["competitor-x.com"]
}
customerRef is server-set from the auth principal, so you cannot forge another customer’s schema.
Errors
| Code | Meaning |
|---|---|
| 400 | content violates JSON-Schema syntax or required fields missing |
| 401 | Missing/invalid bearer |
| 403 | N/A (no scope restriction beyond auth) |
| 404 | N/A (upsert never 404s) |
| 429 | Rate-limited (per-key throttle) |
| 500 | Persistence failure (correlation id in response envelope) |
Idempotency. This endpoint requires Idempotency-Key. Retrying with the same key returns the previously-persisted response without re-executing the upsert.
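The point of the key is that one logical operation keeps one key across all of its retries. A minimal sketch of that pattern follows; `send` is a hypothetical transport callable supplied by the caller, and the choice of `ConnectionError` as the retry trigger and the linear backoff are assumptions made for illustration.

```python
import time
import uuid


def send_with_retries(send, payload, max_attempts=3, backoff_seconds=0.5):
    """Call send(payload, idempotency_key) until it succeeds, reusing one key.

    `send` is a hypothetical transport callable; it is expected to raise
    ConnectionError on a transient failure such as a timeout.
    """
    key = str(uuid.uuid4())  # generated once per operation, not once per attempt
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload, key)
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and surface the last failure
            time.sleep(backoff_seconds * attempt)
```

Generating a fresh key inside the loop would defeat the purpose: each retry would look like a new upsert to the server.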
List customer schemas — GET /api/schema/customer
curl -H "Authorization: Bearer $KEY" \
  https://portal.scrapewise.ai/api/scraper-api/api/schema/customer
Returns every JSON-Schema document owned by the authenticated customer. Global schemas are excluded; the endpoints that list them are admin-only (see Get global schema).
Response (200)
[
  { "id": "5f9a...", "name": "Product schema for competitor-x.com", "type": "PRODUCT", "version": 1 },
  { "id": "60ab...", "name": "Review schema", "type": "REVIEW", "version": 2 }
]
Each entry is a lightweight SchemaListDTO (id, name, type, version), which is enough to build a picker UI or feed an agent’s “which schema?” decision. For the full content, call GET /api/schema/get/{id} with the id from this list.
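As a sketch of the list-then-fetch flow, the helper below picks one entry from a decoded list response and derives the detail path for it. Selecting the highest version of a given type is an assumed policy for illustration, and both function names are hypothetical.

```python
def latest_schema_id(entries, schema_type):
    """Return the id of the highest-version entry of the given type, or None."""
    matches = [entry for entry in entries if entry["type"] == schema_type]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry["version"])["id"]


def detail_path(schema_id):
    """Path of the endpoint that returns the full schema, including content."""
    return f"/api/schema/get/{schema_id}"
```

With the list decoded into `entries`, `detail_path(latest_schema_id(entries, "PRODUCT"))` gives the path to request next.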
Errors
| Code | Meaning |
|---|---|
| 400 | N/A (no validation) |
| 401 | Missing/invalid bearer |
| 403 | N/A |
| 404 | N/A (empty list returned if no schemas) |
| 429 | Rate-limited |
| 500 | Mongo lookup failure |
Get any schema by id — GET /api/schema/get/{id}
curl -H "Authorization: Bearer $KEY" \
  https://portal.scrapewise.ai/api/scraper-api/api/schema/get/5f9a1b2c3d4e5f6a7b8c9d0e
Looks up a schema by Mongo ObjectId. Global schemas (customerRef = null) are readable by any authenticated customer; customer-scoped schemas are only readable by the owning customer. Cross-tenant access surfaces as 404 (intentional info-hide: an attacker cannot distinguish “schema id exists but isn’t yours” from “schema id doesn’t exist”).
Auth gap fixed 2026-05-13. Before the fix this endpoint performed no ownership check: any authenticated user could read any schema by id, including other customers’ scoped schemas. The gap was surfaced and fixed in the pre-PR-3v7 audit.
Response (200) — full SchemaDTO including the content JSON-Schema document.
Errors
| Code | Meaning |
|---|---|
| 400 | id is not a valid ObjectId |
| 401 | Missing/invalid bearer |
| 403 | N/A (cross-tenant returns 404) |
| 404 | Schema does not exist OR is owned by another customer |
| 429 | Rate-limited |
| 500 | Mongo lookup failure |
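A client can avoid the 400 case by checking the id locally and can treat the remaining codes uniformly. The sketch below is one possible mapping, not a prescribed one: the action labels are hypothetical, and treating 500 as retryable is an assumption. The ObjectId check relies on the standard 24-character hexadecimal string form.

```python
import re


def is_object_id(value):
    """True if value has the 24-character hexadecimal form of a Mongo ObjectId."""
    return isinstance(value, str) and re.fullmatch(r"[0-9a-fA-F]{24}", value) is not None


def interpret_get_schema_status(status):
    """Map an HTTP status from GET /api/schema/get/{id} to a client action."""
    if status == 200:
        return "ok"
    if status == 400:
        return "fix-request"  # id is not a valid ObjectId
    if status == 401:
        return "reauthenticate"
    if status == 404:
        # Missing and owned-by-someone-else are indistinguishable by design.
        return "not-found"
    if status in (429, 500):
        return "retry-later"  # assumption: both are worth retrying after a pause
    return "unexpected"
```

Because a cross-tenant id and a nonexistent id both yield 404, client code should not try to tell the two apart.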
Delete a customer schema (destructive — two-call protocol) — DELETE /api/schema/customer/{id}
Destructive operation. Deleting a schema is permanent. Any AI_CONF scraper that references this schema will fail at runtime with a missing-schema error. Reattach or delete dependent scrapers before committing the delete.
ADR-012 two-call pattern: first preview, then commit within 5 minutes.
Steps
1. POST /api/schema/customer/{id}/preview-delete: mints a 5-minute DestructiveOpToken and returns a preview summary (warnings about dependent scrapers).
2. DELETE /api/schema/customer/{id}: commits the deletion; an idempotency key is required.
Skipping the preview step deletes the schema without confirmation.
Step 1 — Preview
POST /api/schema/customer/5f9a1b2c3d4e5f6a7b8c9d0e/preview-delete
Authorization: Bearer <key>
Response (200)
{
  "token": "8f5c9e2a-...-...",
  "opName": "scrapewise_delete_customer_schema",
  "targetEntityId": "5f9a1b2c3d4e5f6a7b8c9d0e",
  "previewSummary": {
    "entityName": "5f9a1b2c3d4e5f6a7b8c9d0e",
    "entityType": "customer_schema",
    "cascadeCounts": {},
    "warnings": [
      "AI_CONF scrapers that reference this schema will fail at runtime with a missing-schema error. Reattach or delete dependent scrapers BEFORE committing."
    ]
  }
}
Render previewSummary to the user. Only proceed to step 2 on explicit approval.
Step 2 — Commit
DELETE /api/schema/customer/5f9a1b2c3d4e5f6a7b8c9d0e
Authorization: Bearer <key>
Idempotency-Key: <uuid>
Response: 204 No Content.
Verify deletion by re-listing customer schemas and confirming the id is gone.
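The two-call flow can be sketched as a single client function that previews, asks for approval, and only then commits. Several things here are assumptions: `http(method, path, headers)` is a hypothetical transport that adds the bearer header and returns `(status, body)`, and `approve` is a caller-supplied callback. This document does not show how the DestructiveOpToken is presented on commit, so the sketch does not send it and only enforces the 5-minute window locally.

```python
import time
import uuid

PREVIEW_TTL_SECONDS = 5 * 60  # the preview is described as valid for 5 minutes


def delete_schema(http, schema_id, approve, clock=time.monotonic):
    """Preview, ask for approval, then commit. Returns True if deleted.

    `http` and `approve` are hypothetical callables supplied by the caller.
    """
    status, preview = http("POST", f"/api/schema/customer/{schema_id}/preview-delete", {})
    if status != 200:
        raise RuntimeError(f"preview failed with HTTP {status}")
    previewed_at = clock()

    # Show the warnings to a human (or policy) and stop unless approved.
    if not approve(preview["previewSummary"]):
        return False
    if clock() - previewed_at > PREVIEW_TTL_SECONDS:
        raise TimeoutError("preview expired; request a new preview before committing")

    headers = {"Idempotency-Key": str(uuid.uuid4())}
    status, _ = http("DELETE", f"/api/schema/customer/{schema_id}", headers)
    if status != 204:
        raise RuntimeError(f"delete failed with HTTP {status}")
    return True
```

Keeping the approval callback between the two calls means a declined preview never reaches the DELETE request at all.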
Errors (both steps)
| Code | Meaning |
|---|---|
| 400 | id is not a valid ObjectId |
| 401 | Missing/invalid bearer |
| 403 | N/A |
| 404 | Schema does not exist or is owned by another customer |
| 429 | Rate-limited |
| 500 | Persistence failure mid-delete (correlation id in response envelope) |
List predefined schema-item names — GET /api/schema/items/existing
curl -H "Authorization: Bearer $KEY" \
  https://portal.scrapewise.ai/api/scraper-api/api/schema/items/existing
Catalogue of common field names (price, productName, imageUrl, etc.) used by the schema-builder UI for autocomplete. Call this when authoring a schema to align with the rest of the platform’s naming conventions.
Response (200)
[
  { "name": "productName", "type": "string" },
  { "name": "price", "type": "number" },
  { "name": "imageUrl", "type": "string" }
]
Errors
| Code | Meaning |
|---|---|
| 400 | N/A |
| 401 | Missing/invalid bearer |
| 403 | N/A |
| 404 | N/A |
| 429 | Rate-limited |
| 500 | Service failure |
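One way to use the catalogue while authoring is to flag property names that are not in it. The sketch below assumes the catalogue response has already been decoded into a list of dicts; `unknown_property_names` is a hypothetical helper, and an unknown name is only a hint, not an error.

```python
def unknown_property_names(schema_content, catalogue):
    """Return the schema's property names that the catalogue does not list, sorted."""
    known = {item["name"] for item in catalogue}
    properties = schema_content.get("properties", {})
    return sorted(name for name in properties if name not in known)
```

For the PRODUCT example in this document, `title` and `sku` would be reported against the three-item sample catalogue above, which suggests `productName` may be the conventional spelling for the title field.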
Admin-only endpoints
The following are documented for completeness but use authorizeRoot() — non-admin requests get 401:
Get global schema — GET /api/schema/{type}/{version}
Returns the canonical schema document for the given (type, version) pair.
List global schemas — GET /api/schema
Returns every global schema (id, name, type, version). The scraper-builder UI picker uses this.
Create or update a global schema — PUT /api/schema
Upserts a global schema by id (or by (type, version) if id is absent). The BEBO-1406 SchemaSeeder ships the initial set; this endpoint is for ad-hoc admin updates.
If you need a global schema modified for your customer, fork it as a customer schema via PUT /api/schema/customer instead.
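Forking amounts to copying a global schema's content into a new customer payload and recording where it came from. A minimal sketch, assuming the global schema has already been fetched as a dict with `id`, `type`, and `content` keys; the helper name, the choice to restart `version` at 1, and passing the template's display name in explicitly are all assumptions.

```python
import copy


def fork_global_schema(global_schema, template_name, domain_patterns):
    """Build a PUT /api/schema/customer payload derived from a global schema."""
    return {
        "id": None,  # null id creates a new customer schema
        "version": 1,  # assumption: the fork starts its own version history
        "type": global_schema["type"],
        "description": global_schema.get("description"),
        # Deep copy so tailoring the fork never mutates the fetched template.
        "content": copy.deepcopy(global_schema["content"]),
        "templateId": global_schema["id"],
        "templateName": template_name,
        "domainPatterns": list(domain_patterns),
    }
```

The returned dict can be edited (for example, adding properties to `content`) before it is sent as the request body.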
See also
- Scrapers: attach a schema to an AI_CONF scraper via the scraper’s schema field
- REST overview, errors: error envelope shape
- Authentication: bearer token lifecycle