Concepts

A quick map of the Scrapewise data model. Useful when you’re sketching an integration or reading API responses.

Customer

Your Scrapewise account. Every object below (scrapers, schemas, data, keys) is tenant-scoped to one customer: you can’t see another customer’s data and they can’t see yours. Customer scoping is enforced at the auth layer, so once you’re signed in, every request is automatically filtered to your tenant.

Your customer is identified by an internal reference, the customerUniqueRef (described further down this page). It shows up as customerRef in /whoami responses, and it also appears in audit-log entries.

Scraper

A reusable extraction recipe. A scraper bundles together:

One or more start URLs (where to begin)
A schema (the shape of the data to extract — see below)
Extraction rules (CSS selectors, AI-driven field hints, navigation patterns)
Run settings (concurrency, schedule, etc.)

A scraper can be run on demand or on a cron schedule. Each run creates a job (see below), which in turn produces rows of data.

You can have many scrapers; each one is independent.
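
To make the four parts of a scraper concrete, here is a minimal sketch of a scraper definition as a Python dataclass. It is an illustration only: the class and every field name (start_urls, extraction_rules, concurrency, cron) are assumptions chosen to mirror the list above, not the actual Scrapewise payload format.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ScraperDefinition:
    """Hypothetical shape of a scraper; field names are assumptions."""

    name: str
    start_urls: list[str]                 # where to begin
    schema: dict[str, Any]                # shape of the data to extract
    extraction_rules: dict[str, str] = field(default_factory=dict)  # field -> CSS selector
    concurrency: int = 1                  # run setting
    cron: Optional[str] = None            # run setting; None means on-demand only

    def is_scheduled(self) -> bool:
        """True when the scraper runs on a cron schedule."""
        return self.cron is not None


pricing = ScraperDefinition(
    name="competitor-pricing",
    start_urls=["https://example.com/products"],
    schema={"title": "string", "price": "number"},
    extraction_rules={"title": "h1.product-title", "price": "span.price"},
    cron="0 6 * * *",  # every day at 06:00
)
```

The definition carries no run history; each execution is recorded separately as a job.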

Scraper group

A folder that holds related scrapers. Use groups to:

Organize many scrapers under one logical project (e.g. a “competitor pricing” group)
Run multiple scrapers as a single batch
Share schemas across the group’s scrapers

Groups are optional — a scraper can live at the top level without belonging to any group.

Job

A single execution of a scraper. Each job has:

A start time and an end time
A status (PENDING, RUNNING, SUCCEEDED, FAILED, CANCELLED)
A count of rows produced
Logs / progress events (streamable via SSE — see REST quickstart)

If you run a scraper 10 times, you have 10 jobs. The scraper definition stays the same; the jobs are the historical run records.
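
When polling a job, client code usually needs to know whether a status is final. The sketch below models the five statuses listed above as an enum. Treating SUCCEEDED, FAILED, and CANCELLED as terminal is an assumption inferred from the names, and is_terminal is a hypothetical helper, not part of any Scrapewise SDK.

```python
from enum import Enum


class JobStatus(str, Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


# Assumption: these three statuses never change once reached.
TERMINAL_STATUSES = {JobStatus.SUCCEEDED, JobStatus.FAILED, JobStatus.CANCELLED}


def is_terminal(status: str) -> bool:
    """Return True when a job in this status will not progress further.

    Raises ValueError for a status string that is not one of the five above.
    """
    return JobStatus(status) in TERMINAL_STATUSES
```

A polling loop can stop as soon as is_terminal returns True for the job's current status.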

Schema

The shape of data a scraper produces. Each schema is a JSON object describing the fields:


{
  "title": "string",
  "price": "number",
  "in_stock": "boolean",
  "tags": ["string"]
}

Schemas serve three purposes:

Validation — Scrapewise rejects rows that don’t match the schema
Type information — exports (Excel, JSON) preserve types correctly
AI hints — when you use preview-from-url or AI-driven extraction, the schema tells the extractor what to look for

A schema can be defined inline on a scraper, or saved as a reusable “customer schema” and referenced by name from multiple scrapers.
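
The validation purpose can be sketched client-side, for example to pre-check rows in a test. The code below checks a row against the example schema above. It is not Scrapewise's validator: it assumes every schema field is required, that extra fields are ignored, and that "number" accepts both integers and floats.

```python
from typing import Any

SCHEMA = {
    "title": "string",
    "price": "number",
    "in_stock": "boolean",
    "tags": ["string"],
}


def matches(value: Any, spec: Any) -> bool:
    """Check one value against one type spec from the schema."""
    if isinstance(spec, list):
        # A one-element list such as ["string"] means "list of strings".
        return isinstance(value, list) and all(matches(v, spec[0]) for v in value)
    if spec == "string":
        return isinstance(value, str)
    if spec == "boolean":
        return isinstance(value, bool)
    if spec == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    raise ValueError(f"unknown type spec: {spec!r}")


def validate_row(row: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the row conforms."""
    errors = []
    for name, spec in schema.items():
        if name not in row:
            errors.append(f"missing field: {name}")
        elif not matches(row[name], spec):
            errors.append(f"wrong type for field: {name}")
    return errors


good = {"title": "Widget", "price": 9.99, "in_stock": True, "tags": ["new"]}
bad = {"title": "Widget", "price": "9.99", "in_stock": True}
```

Here validate_row(good, SCHEMA) returns an empty list, while bad is flagged for a string price and a missing tags field.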

Row (scraped data)

One extracted record. Rows are returned by GET /api/scraper/data (paginated). Each row has:

The fields defined by its scraper’s schema
A unique id
A scrapedAt timestamp
A reference to the job that produced it
A reference to the source URL
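
Because the data endpoint is paginated, reading all rows means looping over pages. The sketch below shows that loop with the HTTP call injected as a function, so it runs without network access. The zero-based page number and the response keys rows and hasMore are assumptions for illustration; check the REST quickstart for the real pagination parameters.

```python
from typing import Any, Callable, Iterator

Page = dict[str, Any]


def iter_rows(fetch_page: Callable[[int], Page]) -> Iterator[dict[str, Any]]:
    """Yield rows from consecutive pages until the server reports no more.

    fetch_page(n) is expected to return the decoded JSON body of page n
    from GET /api/scraper/data. The "rows" and "hasMore" keys are hypothetical.
    """
    page = 0
    while True:
        body = fetch_page(page)
        rows = body.get("rows", [])
        yield from rows
        if not rows or not body.get("hasMore", False):
            return
        page += 1


# Stand-in for the real HTTP call, using two canned pages.
fake_pages = [
    {"rows": [{"id": "r1"}, {"id": "r2"}], "hasMore": True},
    {"rows": [{"id": "r3"}], "hasMore": False},
]
ids = [row["id"] for row in iter_rows(lambda n: fake_pages[n])]
```

In real use, fetch_page would wrap an authenticated HTTP request and return the parsed response body.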

API key

A long-lived bearer token for accessing the REST API or MCP gateway. Each key has:

A prefix (visible in the portal, e.g. sw_live_abc1234.) for identification
A secret (shown only once, at mint time; only its hash is stored)
A scope (USER / LLM_READ / LLM_FULL / MCP_GATEWAY) — see Scopes
A name (human-readable label, e.g. my-laptop, claude-desktop)

Keys are independently revocable from the portal. Revocation is immediate.
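
Since a key is a bearer token, a request would normally carry it in the standard Authorization header. The sketch below builds such a request with Python's standard library without sending it. BASE_URL is a placeholder host, not the real API address, and the exact header format Scrapewise expects is an assumption to confirm against the REST quickstart.

```python
from urllib.request import Request

# Hypothetical base URL; substitute the real API host from the REST quickstart.
BASE_URL = "https://api.scrapewise.example"


def authorized_request(path: str, api_key: str) -> Request:
    """Build (but do not send) a request carrying the key as a bearer token."""
    return Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},
    )


# "YOUR_API_KEY" stands in for the full key shown once at mint time.
req = authorized_request("/api/scraper/data", "YOUR_API_KEY")
```

Passing req to urllib.request.urlopen would send it. Load the key from an environment variable or a secret store instead of hard-coding it, since the secret cannot be displayed again after minting.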

Scope

What a key is allowed to do. There are three customer-mintable scopes:

| Scope | What it can do | Typical use |
| --- | --- | --- |
| `USER` | All portal REST endpoints. Cannot use the MCP gateway. | Human / portal-equivalent integrations from your code. |
| `LLM_READ` | Read-only MCP gateway access. List/get/read but not create/update/delete. | AI agents you trust to look but not touch. |
| `LLM_FULL` | Full MCP gateway access including destructive operations. | AI agents authorized to take actions. |

There’s also MCP_GATEWAY (gateway-internal; not customer-mintable) for the platform itself.

For the full matrix of which tools each scope can call, see Scopes.
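
The gateway rules for the three customer-mintable scopes can be written as a small decision function. This sketch only restates the rules given above: the operation names are the six verbs mentioned for LLM_READ, gateway_allows is a hypothetical helper, and the real per-tool matrix lives on the Scopes page.

```python
READ_OPS = {"list", "get", "read"}
WRITE_OPS = {"create", "update", "delete"}


def gateway_allows(scope: str, operation: str) -> bool:
    """Return True if a key with this scope may perform the MCP gateway operation.

    Covers only the customer-mintable scopes; anything else raises ValueError.
    """
    if operation not in READ_OPS | WRITE_OPS:
        raise ValueError(f"unknown operation: {operation!r}")
    if scope == "LLM_FULL":
        return True                      # full access, including destructive operations
    if scope == "LLM_READ":
        return operation in READ_OPS     # look but not touch
    if scope == "USER":
        return False                     # portal REST only, no gateway access
    raise ValueError(f"not a customer-mintable scope: {scope!r}")
```

For example, gateway_allows("LLM_READ", "delete") is False, while the same operation with LLM_FULL is allowed.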

Customer Unique Ref (`customerUniqueRef`)

Every customer has an immutable customerUniqueRef string. You’ll see it in API responses (as customerRef), URLs, and audit logs. It’s the tenant identifier; the platform uses it to scope all queries.

You typically don’t need to use it directly — your API key already authenticates you to a specific customer.

Putting it together — a typical flow

Sign up → create a customer
Mint an API key (USER scope for your code, LLM_READ for Claude)
Build a scraper in the portal (or via POST /api/scraper) — this saves a scraper definition
Trigger a run → Scrapewise creates a job → the job runs → produces rows
Read the rows via GET /api/scraper/data (paginated)
Export to Excel or read them programmatically — they conform to the scraper’s schema

What’s next

Mint your first key → Sign up + first key
Run a scraper from curl → REST quickstart
Wire Claude to Scrapewise → MCP quickstart
Scope details → Scopes