
Schemas

JSON-Schema documents that the AI scraper config (AI_CONF mode) feeds to the LLM to constrain its output. Two flavours coexist:

  • Global schemas — admin-managed templates (BEBO-1406 SchemaSeeder ships the initial set). Read-only for customers; create/update/delete is admin-only.
  • Customer schemas — owned by the authenticated customer. Full CRUD via the endpoints below. Used to fork a global template and tailor it to a customer’s shape, or to define a brand-new extraction schema from scratch.

When the AI scraper runs, it sends the schema to the LLM and requires the response to match it. A response with the wrong shape causes the row to be rejected. Schemas are the contract between your data model and the AI extractor.
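To make the "contract" idea concrete, here is a minimal sketch of a shape check over an extracted row. It is illustrative only: `shape_errors` is a hypothetical helper, it covers just the top-level `properties` and `required` keywords, and the validation the platform actually performs is not described on this page.

```python
from typing import Any

# Maps JSON-Schema primitive type names to Python types (subset only).
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def shape_errors(schema: dict[str, Any], row: Any) -> list[str]:
    """Return a list of problems; an empty list means the row matches.

    Hypothetical helper: handles object schemas with top-level
    "properties" and "required" only, not the full JSON-Schema vocabulary.
    """
    if not isinstance(row, dict):
        return ["row is not an object"]
    errors: list[str] = []
    for name in schema.get("required", []):
        if name not in row:
            errors.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name not in row:
            continue
        type_name = spec.get("type")
        expected = _JSON_TYPES.get(type_name) if isinstance(type_name, str) else None
        if expected is None:
            continue  # unknown or composite type: not checked in this sketch
        value = row[name]
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and type_name in ("number", "integer")
        if not isinstance(value, expected) or is_bool_as_number:
            errors.append(f"wrong type for {name}: expected {type_name}")
    return errors
```

A row is acceptable under this sketch when `shape_errors(schema, row)` returns an empty list; a full JSON-Schema validator would be the safer choice for anything beyond a quick local check.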

Endpoint summary

| Method | Path | Operation ID | Auth scope |
| --- | --- | --- | --- |
| PUT | /api/schema/customer | scrapewise_create_customer_schema | bearer + idempotency-key |
| GET | /api/schema/customer | scrapewise_list_customer_schema | bearer |
| GET | /api/schema/get/{id} | | bearer |
| DELETE | /api/schema/customer/{id} | scrapewise_delete_customer_schema | bearer + idempotency-key |
| POST | /api/schema/customer/{id}/preview-delete | scrapewise_delete_customer_schema_preview | bearer |
| GET | /api/schema/items/existing | | bearer |
| GET | /api/schema/{type}/{version} | | admin |
| GET | /api/schema | | admin |
| PUT | /api/schema | | admin |

Create or update a customer schema — PUT /api/schema/customer

    PUT /api/schema/customer
    Authorization: Bearer <key>
    Idempotency-Key: <uuid>
    Content-Type: application/json

    {
      "id": null,
      "version": 1,
      "type": "PRODUCT",
      "description": "Product schema for competitor-x.com",
      "content": {
        "type": "object",
        "properties": {
          "title": { "type": "string" },
          "price": { "type": "number" },
          "sku": { "type": "string" }
        },
        "required": ["title", "price"]
      },
      "templateId": null,
      "templateName": null,
      "domainPatterns": ["competitor-x.com"]
    }

Request fields

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| id | no | string | Set to update an existing schema; omit/null to create new |
| version | yes | int | Version number for change tracking |
| type | yes | enum | SchemaType: one of PRODUCT, REVIEW, ARTICLE, etc. |
| content | yes | object | The JSON-Schema document (properties map, required array) |
| description | no | string | Human-readable purpose, surfaced when a scraper references this schema |
| templateId | no | string | Provenance if forked from a global template |
| templateName | no | string | Display name of the source template |
| domainPatterns | no | string[] | Domains this schema applies to (used by the schema-suggest UI) |

Upsert semantics: if id is set and matches an existing customer-owned schema, that schema is updated; otherwise a new schema is created. The response always carries the persisted id.
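The request above can be assembled with the Python standard library. This is a sketch, not an official client: `build_upsert_request` is a hypothetical helper, and `BASE_URL` is assumed to be the same prefix used by the curl examples on this page.

```python
import json
import urllib.request
import uuid

# Assumption: same base URL as the curl examples on this page.
BASE_URL = "https://portal.scrapewise.ai/api/scraper-api"


def build_upsert_request(api_key, schema, schema_id=None, idempotency_key=None):
    """Build (but do not send) a PUT /api/schema/customer request.

    schema_id=None serialises to "id": null, which asks for a new schema;
    passing an existing customer-owned id asks for an update.
    """
    body = dict(schema)
    body["id"] = schema_id
    req = urllib.request.Request(
        BASE_URL + "/api/schema/customer",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
    )
    req.add_header("Authorization", f"Bearer {api_key}")
    # A fresh UUID is minted unless the caller supplies a key to reuse.
    req.add_header("Idempotency-Key", idempotency_key or str(uuid.uuid4()))
    req.add_header("Content-Type", "application/json")
    return req
```

Passing the result to `urllib.request.urlopen` would send it; keeping construction separate from sending makes the request easy to inspect or log first.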

Response (200)

    {
      "id": "5f9a1b2c3d4e5f6a7b8c9d0e",
      "version": 1,
      "type": "PRODUCT",
      "description": "Product schema for competitor-x.com",
      "content": { /* the JSON-Schema document */ },
      "customerRef": "cust_abc123",
      "domainPatterns": ["competitor-x.com"]
    }

customerRef is server-set from the auth principal — you cannot forge another customer’s schema.

Errors

| Code | Meaning |
| --- | --- |
| 400 | content violates JSON-Schema syntax or required fields missing |
| 401 | Missing/invalid bearer |
| 403 | N/A (no scope restriction beyond auth) |
| 404 | N/A (upsert never 404s) |
| 429 | Rate-limited (per-key throttle) |
| 500 | Persistence failure (correlation id in response envelope) |

Idempotency. This endpoint requires Idempotency-Key. Retrying with the same key returns the previously-persisted response without re-executing the upsert.
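The practical consequence is that a retry must reuse the key from the first attempt rather than mint a new one. The sketch below shows that pattern; `send` is a hypothetical callable that performs the PUT with the given key and returns the HTTP status, and treating 429 and 500 as retryable is an assumption, not something this page specifies.

```python
import uuid
from typing import Callable


def send_with_retries(send: Callable[[str], int], max_attempts: int = 3) -> tuple[int, str]:
    """Call send(idempotency_key) until it returns a non-retryable status.

    Returns the last status and the key that was used for every attempt.
    """
    key = str(uuid.uuid4())  # minted once, reused for every attempt
    status = 0
    for _ in range(max_attempts):
        status = send(key)
        # Assumption: 429 and 500 are worth retrying; other codes are final.
        if status not in (429, 500):
            break
    return status, key
```

A production version would normally sleep with backoff between attempts; that is omitted here to keep the key-reuse logic in focus.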

List customer schemas — GET /api/schema/customer

    curl -H "Authorization: Bearer $KEY" \
      https://portal.scrapewise.ai/api/scraper-api/api/schema/customer

Returns one entry for every schema owned by the authenticated customer. Global schemas are excluded: listing them is admin-only (see List global schemas), although a single global schema can still be read by id via GET /api/schema/get/{id}.

Response (200)

    [
      { "id": "5f9a...", "name": "Product schema for competitor-x.com", "type": "PRODUCT", "version": 1 },
      { "id": "60ab...", "name": "Review schema", "type": "REVIEW", "version": 2 }
    ]

Each entry is a lightweight SchemaListDTO (id, name, type, version) — enough to build a picker UI or feed an agent’s “which schema?” decision. For the full content, call GET /api/schema/get/{id} with the id from this list.
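The list-then-fetch pattern can be sketched as two small helpers operating on the parsed list response. Both function names are hypothetical, and picking the highest version per type is just one plausible selection rule.

```python
def latest_schema_id(entries, schema_type):
    """Return the id of the highest-version entry of the given type, or None."""
    matches = [e for e in entries if e.get("type") == schema_type]
    if not matches:
        return None
    return max(matches, key=lambda e: e["version"])["id"]


def schema_detail_path(schema_id):
    """Path of the endpoint that returns the full schema, content included."""
    return f"/api/schema/get/{schema_id}"
```

An agent or picker UI would call `latest_schema_id` on the list response and then request `schema_detail_path(...)` only for the schema it actually needs.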

Errors

| Code | Meaning |
| --- | --- |
| 400 | N/A (no validation) |
| 401 | Missing/invalid bearer |
| 403 | N/A |
| 404 | N/A (empty list returned if no schemas) |
| 429 | Rate-limited |
| 500 | Mongo lookup failure |

Get any schema by id — GET /api/schema/get/{id}

    curl -H "Authorization: Bearer $KEY" \
      https://portal.scrapewise.ai/api/scraper-api/api/schema/get/5f9a1b2c3d4e5f6a7b8c9d0e

Looks up a schema by Mongo ObjectId. Global schemas (customerRef = null) are readable by any authenticated customer; customer-scoped schemas are only readable by the owning customer. Cross-tenant access surfaces as 404 (intentional info-hide: an attacker cannot distinguish “schema id exists but isn’t yours” from “schema id doesn’t exist”).

Auth gap fixed 2026-05-13. Before the fix, this endpoint performed no authorization check, so any authenticated user could read any schema by id, including other customers’ scoped schemas. The gap was surfaced and fixed in the pre-PR-3v7 audit.

Response (200) — full SchemaDTO including the content JSON-Schema document.

Errors

| Code | Meaning |
| --- | --- |
| 400 | id is not a valid ObjectId |
| 401 | Missing/invalid bearer |
| 403 | N/A (cross-tenant returns 404) |
| 404 | Schema does not exist OR is owned by another customer |
| 429 | Rate-limited |
| 500 | Mongo lookup failure |
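The 400 case can usually be avoided client-side, since a MongoDB ObjectId is conventionally written as 24 hexadecimal characters. The check below is a sketch with a hypothetical helper name; the server remains the authority on what it accepts.

```python
import re

# An ObjectId in its usual string form: exactly 24 hex characters.
_OBJECT_ID = re.compile(r"[0-9a-fA-F]{24}")


def looks_like_object_id(value: str) -> bool:
    """Cheap pre-check to avoid a round trip that would end in a 400."""
    return isinstance(value, str) and _OBJECT_ID.fullmatch(value) is not None
```

Note that a well-formed id can still return 404, and by design the response does not reveal whether the schema is missing or belongs to another customer.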

Delete a customer schema (destructive — two-call protocol) — DELETE /api/schema/customer/{id}

Destructive operation. Deleting a schema is permanent. Any AI_CONF scraper that references this schema will fail at runtime with a missing-schema error. Reattach or delete dependent scrapers before committing the delete.

ADR-012 two-call pattern: first preview, then commit within 5 minutes.

Steps

  1. POST /api/schema/customer/{id}/preview-delete — mints a 5-minute DestructiveOpToken and returns a preview summary (warnings about dependent scrapers).
  2. DELETE /api/schema/customer/{id} — commits the deletion, idempotency-key required.

Skipping the preview step deletes the schema without any confirmation, so always call preview-delete first and show its summary to the user.

Step 1 — Preview

    POST /api/schema/customer/5f9a1b2c3d4e5f6a7b8c9d0e/preview-delete
    Authorization: Bearer <key>

Response (200)

    {
      "token": "8f5c9e2a-...-...",
      "opName": "scrapewise_delete_customer_schema",
      "targetEntityId": "5f9a1b2c3d4e5f6a7b8c9d0e",
      "previewSummary": {
        "entityName": "5f9a1b2c3d4e5f6a7b8c9d0e",
        "entityType": "customer_schema",
        "cascadeCounts": {},
        "warnings": [
          "AI_CONF scrapers that reference this schema will fail at runtime with a missing-schema error. Reattach or delete dependent scrapers BEFORE committing."
        ]
      }
    }

Render previewSummary to the user. Only proceed to step 2 on explicit approval.
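One way to render the summary as plain text for a confirmation prompt is sketched below. `render_preview` is a hypothetical helper; the field names come from the sample response above, and since the meaning of the keys inside cascadeCounts is not documented here, they are printed verbatim.

```python
def render_preview(preview: dict) -> str:
    """Turn a preview-delete response into a short confirmation message."""
    summary = preview.get("previewSummary", {})
    entity_type = summary.get("entityType", "entity")
    entity_name = summary.get("entityName", preview.get("targetEntityId", "?"))
    lines = [f"Delete {entity_type} {entity_name}?"]
    # cascadeCounts keys are shown as-is; their semantics are not documented here.
    for name, count in sorted(summary.get("cascadeCounts", {}).items()):
        lines.append(f"  cascade {name}: {count}")
    for warning in summary.get("warnings", []):
        lines.append(f"  WARNING: {warning}")
    return "\n".join(lines)
```

The returned string can be shown in a CLI prompt or a dialog before asking the user for explicit approval.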

Step 2 — Commit

    DELETE /api/schema/customer/5f9a1b2c3d4e5f6a7b8c9d0e
    Authorization: Bearer <key>
    Idempotency-Key: <uuid>

Response: 204 No Content.

Verify deletion by re-listing customer schemas and confirming the id is gone.
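Putting the two steps together, a client-side flow might look like the sketch below. Everything outside the documented paths and headers is an assumption: `call` is a hypothetical transport returning `(status, parsed_json_or_None)`, `approve` is a hypothetical callback that asks the user, and because this page does not show how the preview token is passed on commit, the sketch only enforces the 5-minute window locally and does not send the token.

```python
import time
import uuid

TOKEN_TTL_SECONDS = 5 * 60  # the preview token is documented as valid for 5 minutes


def delete_schema(schema_id, call, approve, now=time.monotonic):
    """Preview, ask for approval, then commit. Returns True only if deleted."""
    status, preview = call(
        "POST", f"/api/schema/customer/{schema_id}/preview-delete", {}
    )
    if status != 200:
        raise RuntimeError(f"preview failed with status {status}")
    minted_at = now()
    if not approve(preview):
        return False  # user declined: nothing is deleted
    if now() - minted_at > TOKEN_TTL_SECONDS:
        raise TimeoutError("approval took longer than 5 minutes; preview again")
    # How the preview token is submitted on commit is not shown in the source,
    # so only the documented Idempotency-Key header is sent here.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    status, _ = call("DELETE", f"/api/schema/customer/{schema_id}", headers)
    if status != 204:
        raise RuntimeError(f"delete failed with status {status}")
    return True
```

Injecting `call` and `approve` keeps the flow testable without network access and makes the approval step impossible to skip by accident.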

Errors (both steps)

| Code | Meaning |
| --- | --- |
| 400 | id is not a valid ObjectId |
| 401 | Missing/invalid bearer |
| 403 | N/A |
| 404 | Schema does not exist or is owned by another customer |
| 429 | Rate-limited |
| 500 | Persistence failure mid-delete (correlation id in response envelope) |

List predefined schema-item names — GET /api/schema/items/existing

    curl -H "Authorization: Bearer $KEY" \
      https://portal.scrapewise.ai/api/scraper-api/api/schema/items/existing

Catalogue of common field names (price, productName, imageUrl, etc.) used by the schema-builder UI for autocomplete. Call this when authoring a schema to align with the rest of the platform’s naming conventions.

Response (200)

    [
      { "name": "productName", "type": "string" },
      { "name": "price", "type": "number" },
      { "name": "imageUrl", "type": "string" }
    ]
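When authoring a schema, the catalogue can be used to flag property names that diverge from platform conventions. The helper below is a hypothetical sketch that compares the top-level properties of a schema's content against the catalogue response.

```python
def fields_outside_catalogue(schema_content, catalogue):
    """Return top-level property names that the catalogue does not list."""
    known = {item["name"] for item in catalogue}
    properties = schema_content.get("properties", {})
    return sorted(name for name in properties if name not in known)
```

A non-empty result is not an error: custom field names are allowed, and the list is only a hint that a conventional name may already exist.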

Errors

| Code | Meaning |
| --- | --- |
| 400 | N/A |
| 401 | Missing/invalid bearer |
| 403 | N/A |
| 404 | N/A |
| 429 | Rate-limited |
| 500 | Service failure |

Admin-only endpoints

The following are documented for completeness but use authorizeRoot() — non-admin requests get 401:

Get global schema — GET /api/schema/{type}/{version}

Returns the canonical schema document for the given (type, version) pair.

List global schemas — GET /api/schema

Returns every global schema (id, name, type, version). The scraper-builder UI picker uses this.

Create or update a global schema — PUT /api/schema

Upserts a global schema by id (or by (type, version) if id is absent). The BEBO-1406 SchemaSeeder ships the initial set; this endpoint is for ad-hoc admin updates.

If you need a global schema modified for your customer, fork it as a customer schema via PUT /api/schema/customer instead.
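Forking amounts to copying the global schema's content into a new customer-schema body and recording its provenance. The sketch below builds such a body; `fork_global_schema` is a hypothetical helper, starting the fork at version 1 is an assumption, and the display name is passed in explicitly because the full schema response is not shown to include one.

```python
import copy


def fork_global_schema(global_schema, template_name, domain_patterns=None):
    """Build a PUT /api/schema/customer body derived from a global schema."""
    return {
        "id": None,  # null id asks the upsert to create a new schema
        "version": 1,  # assumption: a fork starts its own version history
        "type": global_schema["type"],
        "description": global_schema.get("description"),
        # Deep copy so later edits to the fork never touch the source dict.
        "content": copy.deepcopy(global_schema["content"]),
        "templateId": global_schema["id"],
        "templateName": template_name,
        "domainPatterns": list(domain_patterns or []),
    }
```

The returned dict can be tailored (for example by adding properties to `content`) and then sent as the JSON body of PUT /api/schema/customer.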
