Concepts
A quick map of the Scrapewise data model. Useful when you’re sketching an integration or reading API responses.
Customer
Your Scrapewise account. Every object below (scrapers, schemas, data, keys) is tenant-scoped to one customer: you can’t see another customer’s data and they can’t see yours. Customer scoping is enforced at the auth layer, so once you’re signed in, the platform automatically filters every request to your tenant.
Your customer is identified by an internal customerRef. You’ll see it in /whoami responses and in audit-log entries.
Scraper
A reusable extraction recipe. A scraper bundles together:
- One or more start URLs (where to begin)
- A schema (the shape of the data to extract — see below)
- Extraction rules (CSS selectors, AI-driven field hints, navigation patterns)
- Run settings (concurrency, schedule, etc.)
A scraper can be run on demand or on a cron schedule. Each run creates a job (see below), and the job produces rows of data.
You can have many scrapers; each one is independent.
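To make the bundle concrete, here is a minimal sketch of a scraper definition as a Python data structure. The class name, field names, and example values are all hypothetical and chosen for illustration; they are not the Scrapewise API's actual payload shape.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScraperDefinition:
    """Illustrative model of a scraper; every field name here is an assumption."""
    name: str
    start_urls: list[str]                                  # where to begin
    schema: dict                                           # shape of the data to extract
    extraction_rules: dict = field(default_factory=dict)   # e.g. field -> CSS selector
    concurrency: int = 1                                   # a run setting
    cron: Optional[str] = None                             # None means on-demand only

    def is_scheduled(self) -> bool:
        # A scraper with a cron expression runs on a schedule.
        return self.cron is not None


scraper = ScraperDefinition(
    name="competitor-pricing",
    start_urls=["https://example.com/products"],
    schema={"title": "string", "price": "number"},
    extraction_rules={"title": "h1.product-title", "price": "span.price"},
    cron="0 6 * * *",  # standard cron syntax: every day at 06:00
)
```

The definition carries no run history, which matches the model described here: runs are recorded separately as jobs.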
Scraper group
A folder that holds related scrapers. Use groups to:
- Organize many scrapers under one logical project (e.g. “competitor pricing” group)
- Run multiple scrapers as a single batch
- Share schemas across the group’s scrapers
Groups are optional — a scraper can live at the top level without belonging to any group.
Job
A single execution of a scraper. Each job has:
- A start time + end time
- A status (PENDING, RUNNING, SUCCEEDED, FAILED, CANCELLED)
- A count of rows produced
- Logs / progress events (streamable via SSE — see REST quickstart)
If you run a scraper 10 times, you have 10 jobs. The scraper definition stays the same; the jobs are the historical run records.
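When polling a job from your own code, it helps to model the status values explicitly. The sketch below uses the five status names listed above. Treating SUCCEEDED, FAILED, and CANCELLED as terminal states is an assumption inferred from their names, and `is_finished` is a hypothetical helper.

```python
from enum import Enum


class JobStatus(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


# ASSUMPTION: these three statuses never change once reached.
TERMINAL_STATUSES = {JobStatus.SUCCEEDED, JobStatus.FAILED, JobStatus.CANCELLED}


def is_finished(status: str) -> bool:
    """Return True if a status string denotes a job that has stopped running."""
    # JobStatus(status) raises ValueError for an unknown status string.
    return JobStatus(status) in TERMINAL_STATUSES
```

A polling loop would call `is_finished` on each status it reads and stop once it returns True.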
Schema
The shape of data a scraper produces. Each schema is a JSON object describing the fields:
{
  "title": "string",
  "price": "number",
  "in_stock": "boolean",
  "tags": ["string"]
}
Schemas serve three purposes:
- Validation — Scrapewise rejects rows that don’t match the schema
- Type information — exports (Excel, JSON) preserve types correctly
- AI hints — when you use preview-from-url or AI-driven extraction, the schema tells the extractor what to look for
A schema can be defined inline on a scraper, or saved as a reusable “customer schema” and referenced by name from multiple scrapers.
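To illustrate what validating a row against a schema can look like, here is a small client-side sketch using the schema notation shown above. It is not Scrapewise's validator: the rules it applies (every schema field is required, extra fields are ignored, a one-element list means "list of that type") are assumptions, and `validate_row` is a hypothetical name.

```python
def matches(value, spec) -> bool:
    """Check one value against one type spec from the schema."""
    if isinstance(spec, list):
        # ASSUMPTION: ["string"] means a list whose items are all strings.
        return isinstance(value, list) and all(matches(v, spec[0]) for v in value)
    if spec == "string":
        return isinstance(value, str)
    if spec == "boolean":
        return isinstance(value, bool)
    if spec == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    raise ValueError(f"unknown type spec: {spec!r}")


def validate_row(row: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the row conforms."""
    errors = []
    for name, spec in schema.items():
        if name not in row:
            errors.append(f"missing field: {name}")
        elif not matches(row[name], spec):
            errors.append(f"wrong type for field: {name}")
    return errors


schema = {"title": "string", "price": "number", "in_stock": "boolean", "tags": ["string"]}
good = {"title": "Kettle", "price": 24.5, "in_stock": True, "tags": ["kitchen"]}
bad = {"title": "Kettle", "price": "24.50", "in_stock": True, "tags": ["kitchen"]}
```

Here `validate_row(good, schema)` returns an empty list, while `validate_row(bad, schema)` reports the string-typed price.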
Row (scraped data)
One extracted record. Rows are returned by GET /api/scraper/data (paginated). Each row has:
- The fields defined by its scraper’s schema
- A unique id
- A scrapedAt timestamp
- A reference to the job that produced it
- A reference to the source URL
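Because the data endpoint is paginated, client code usually wraps it in a loop. The sketch below separates the paging loop from the HTTP call so the loop can be exercised without a network. Several details are assumptions not stated on this page: the base URL is a placeholder, the `page` and `size` query parameters are hypothetical names, the response is assumed to be a JSON array of rows, and an empty page is assumed to mark the end.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

BASE_URL = "https://scrapewise.example"  # placeholder, not the real host


def http_fetch_page(api_key: str, page: int, size: int = 100) -> list:
    """Fetch one page of rows. Parameter names and response shape are ASSUMED."""
    query = urllib.parse.urlencode({"page": page, "size": size})
    request = urllib.request.Request(
        f"{BASE_URL}/api/scraper/data?{query}",
        headers={"Authorization": f"Bearer {api_key}"},  # keys are bearer tokens
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_rows(fetch_page: Callable[[int], list]) -> Iterator[dict]:
    """Yield rows page by page until a page comes back empty (ASSUMED end signal)."""
    page = 0
    while True:
        rows = fetch_page(page)
        if not rows:
            return
        yield from rows
        page += 1
```

With a real key, usage would look like `iter_rows(lambda p: http_fetch_page(key, p))`; check the REST quickstart for the actual pagination parameters before relying on this.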
API key
A long-lived bearer token for accessing the REST API or MCP gateway. Each key has:
- A prefix (visible in the portal — sw_live_abc1234.) for identification
- A secret (only shown once at mint time — only the hash is stored)
- A scope (USER / LLM_READ / LLM_FULL / MCP_GATEWAY) — see Scopes
- A name (human-readable label, e.g. my-laptop, claude-desktop)
Keys are independently revocable from the portal. Revocation is immediate.
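The prefix-plus-hash pattern explains why a lost secret cannot be recovered, only replaced. The sketch below shows the general technique, not Scrapewise's implementation: SHA-256 as the hash, the 15-character prefix length (taken from the example prefix above), and the function names are all assumptions.

```python
import hashlib
import hmac
import secrets

PREFIX_LEN = 15  # ASSUMPTION: length of the example prefix "sw_live_abc1234"


def mint_key() -> tuple[dict, str]:
    """Create a key. Returns (record to store, secret to show the user once)."""
    secret = "sw_live_" + secrets.token_urlsafe(32)
    record = {
        "prefix": secret[:PREFIX_LEN],                           # safe to display
        "hash": hashlib.sha256(secret.encode()).hexdigest(),    # ASSUMED algorithm
    }
    return record, secret


def verify_key(presented: str, record: dict) -> bool:
    """Check a presented secret against the stored hash in constant time."""
    digest = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(digest, record["hash"])


record, secret = mint_key()
```

The stored record never contains the secret itself, so anyone who only sees the record (for example, in the portal) can identify a key by its prefix but cannot use it.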
Scope
What a key is allowed to do. Three customer-mintable scopes:
| Scope | What it can do | Typical use |
|---|---|---|
| USER | All portal REST endpoints. Cannot use the MCP gateway. | Human / portal-equivalent integrations from your code. |
| LLM_READ | Read-only MCP gateway access. List/get/read but not create/update/delete. | AI agents you trust to look but not touch. |
| LLM_FULL | Full MCP gateway access including destructive operations. | AI agents authorized to take actions. |
There’s also MCP_GATEWAY (gateway-internal; not customer-mintable) for the platform itself.
For the full matrix of which tools each scope can call, see Scopes.
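The MCP gateway rules in the scope table above can be restated as a small lookup. This sketch models only what the table states; the operation names are taken from the table's wording ("list/get/read", "create/update/delete"), the function name is hypothetical, and MCP_GATEWAY is deliberately left unmodeled because it is platform-internal.

```python
READ_OPS = {"list", "get", "read"}
WRITE_OPS = {"create", "update", "delete"}


def mcp_allowed(scope: str, operation: str) -> bool:
    """Return True if a key with this scope may perform the MCP gateway operation."""
    if operation not in READ_OPS | WRITE_OPS:
        raise ValueError(f"unknown operation: {operation!r}")
    if scope == "USER":
        return False                  # USER keys cannot use the MCP gateway
    if scope == "LLM_READ":
        return operation in READ_OPS  # look but not touch
    if scope == "LLM_FULL":
        return True                   # includes destructive operations
    # MCP_GATEWAY and anything else is outside this sketch.
    raise ValueError(f"scope not modeled here: {scope!r}")
```

For example, `mcp_allowed("LLM_READ", "delete")` is False, which is the property that makes LLM_READ the safer default for an AI agent.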
Customer Unique Ref (customerUniqueRef)
Every customer has an immutable customerUniqueRef string. It appears in API responses (as customerRef), in URLs, and in audit logs. It’s the tenant identifier: the platform uses it to scope all queries.
You typically don’t need to use it directly — your API key already authenticates you to a specific customer.
Putting it together — a typical flow
- Sign up → create a customer
- Mint an API key (USER scope for your code, LLM_READ for Claude)
- Build a scraper in the portal (or via POST /api/scraper) — this saves a scraper definition
- Trigger a run → Scrapewise creates a job → the job runs → produces rows
- Read the rows via GET /api/scraper/data (paginated)
- Export to Excel or read them programmatically — they conform to the scraper’s schema
What’s next
- Mint your first key → Sign up + first key
- Run a scraper from curl → REST quickstart
- Wire Claude to Scrapewise → MCP quickstart
- Scope details → Scopes