Sites

A Site is the URL inventory for a scraper that runs against a fixed set of pages (as opposed to a paginated category listing). Use Sites when you want a scraper to iterate over “these 50 specific URLs” rather than crawl a category page.

Mutating endpoints are subject to the SITE_MAP plan feature. Read endpoints work for any authenticated customer who owns the scraper.

Endpoint summary

Method	Path	Operation ID	Auth scope
PUT	`/api/scraper/site`	`scrapewise_create_scraper_site`	bearer + `SITE_MAP` + idempotency-key
GET	`/api/scraper/{id}/site`	`scrapewise_get_scraper_site`	bearer
GET	`/api/scraper/site/{siteId}/links`	`scrapewise_get_scraper_site_links`	bearer
DELETE	`/api/scraper/site/{siteId}`	`scrapewise_delete_scraper_site`	bearer + idempotency-key

Create or update a Site — `PUT /api/scraper/site`


PUT /api/scraper/site
Authorization: Bearer <key>
Idempotency-Key: <uuid>
Content-Type: application/json
 
{
  "id": null,
  "name": "competitor-x product pages",
  "scraperRef": "scr_abc123",
  "links": [
    { "url": "https://competitor-x.com/product/1", "title": "Product 1" },
    { "url": "https://competitor-x.com/product/2", "title": "Product 2" }
  ]
}

An empty id (null, as in the example above) creates a new Site; a non-empty id updates the existing Site. The initial Link list can be passed inline via links[].
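As a rough illustration of the create/update switch, the Python sketch below builds the PUT request without sending it. The helper name build_site_put is hypothetical, and BASE_URL is an assumption taken from the curl examples on this page.

```python
import json
import urllib.request
import uuid

# Assumption: base URL copied from the curl examples on this page.
BASE_URL = "https://portal.scrapewise.ai/api/scraper-api"


def build_site_put(api_key, name, scraper_ref, links, site_id=None):
    """Build (but do not send) a PUT /api/scraper/site request.

    site_id=None asks the server to create a Site; a non-empty site_id
    asks it to update that Site.
    """
    body = {
        "id": site_id,
        "name": name,
        "scraperRef": scraper_ref,
        "links": links,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/scraper/site",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Idempotency-Key": str(uuid.uuid4()),
            "Content-Type": "application/json",
        },
    )


req = build_site_put(
    "example-key",
    "competitor-x product pages",
    "scr_abc123",
    [{"url": "https://competitor-x.com/product/1", "title": "Product 1"}],
)
# Sending is left out so the sketch runs offline:
# urllib.request.urlopen(req)
```

Passing site_id="..." with the id from an earlier response would turn the same call into an update.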

Response (200) — persisted SiteDTO with the assigned id.

Errors — 400 (validation) / 401 / 402 (plan lacks SITE_MAP) / 403 (N/A) / 404 (N/A) / 429 / 500.

Adding more links to an existing Site: today there is no incremental add-link endpoint — re-PUT the full SiteDTO with the union of old + new links.
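One way to compute that union on the client is sketched below in Python. The merge_links helper is hypothetical, and de-duplicating by url is an assumption; the server's own handling of duplicate URLs is not documented here.

```python
def merge_links(existing, new):
    """Return the union of two link lists, de-duplicated by URL.

    Order follows first appearance. When a URL occurs in both lists the
    later entry wins, so a new link can replace an existing title.
    """
    merged = {}
    for link in list(existing) + list(new):
        merged[link["url"]] = link
    return list(merged.values())


old_links = [
    {"url": "https://competitor-x.com/product/1", "title": "Product 1"},
    {"url": "https://competitor-x.com/product/2", "title": "Product 2"},
]
new_links = [
    {"url": "https://competitor-x.com/product/2", "title": "Product 2 (renamed)"},
    {"url": "https://competitor-x.com/product/3", "title": "Product 3"},
]
all_links = merge_links(old_links, new_links)
```

The resulting all_links list would then go into the links[] field of the SiteDTO that is re-PUT with the Site's existing id.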

Get a scraper’s Site — `GET /api/scraper/{id}/site`


curl -H "Authorization: Bearer $KEY" \
  https://portal.scrapewise.ai/api/scraper-api/api/scraper/scr_abc123/site

Returns the Site attached to a given scraper (note: the path id is the scraper’s id, NOT the site’s id — the lookup is “what site does this scraper use?”).

Response (200) — SiteDTO or null. Not every scraper uses a Site (single-page scrapers and paginated category-listing scrapers don’t). For the actual link inventory, follow up with /api/scraper/site/{siteId}/links using the site id from this response.
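The two-step lookup described above can be sketched in Python as a pair of URL builders. Both function names are hypothetical, BASE_URL is an assumption taken from the curl examples, and the code only assumes that a SiteDTO carries an id field.

```python
from urllib.parse import quote

# Assumption: base URL copied from the curl examples on this page.
BASE_URL = "https://portal.scrapewise.ai/api/scraper-api"


def site_url_for_scraper(scraper_id):
    """URL for step 1: which Site does this scraper use?"""
    return f"{BASE_URL}/api/scraper/{quote(scraper_id, safe='')}/site"


def links_url_from_site(site_dto):
    """URL for step 2, or None when the scraper has no Site (null response)."""
    if site_dto is None:
        return None
    return f"{BASE_URL}/api/scraper/site/{quote(site_dto['id'], safe='')}/links"
```

A caller would GET the first URL, decode the JSON body, and only issue the second request when the result is not None.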

Errors — 400 (N/A) / 401 / 403 (N/A) / 404 (N/A) / 429 / 500.

List Site links (paginated, filterable) — `GET /api/scraper/site/{siteId}/links`


curl -H "Authorization: Bearer $KEY" \
  "https://portal.scrapewise.ai/api/scraper-api/api/scraper/site/5f9a.../links?page=0&size=100&sortField=url&sortDirection=asc"

Paginated URL inventory. Each Link carries url, title, and per-link state (visited, errored, pending).

Query params

Param	Default	Description
`page`	`0`	Zero-indexed page number
`size`	`100`	Page size, capped at 1000
`sortField`	`url`	One of `url`, `title`, `curl` (whitelisted)
`sortDirection`	`asc`	`asc` or `desc` (case-insensitive)
`filters`	—	URL-encoded MongoDB-style filter expression, e.g. `{"state":"ERRORED"}`

Out-of-range page and size values are silently clamped rather than producing a 500. Unknown sortField values fall back to url, and the server logs a warning.
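A client can mirror these rules locally so that the values it sends match what the server will actually use. The Python sketch below is an approximation, not the server's code: the lower bound of 1 for size and the fallback to asc for an unrecognised sortDirection are assumptions.

```python
ALLOWED_SORT_FIELDS = {"url", "title", "curl"}
MAX_PAGE_SIZE = 1000


def normalize_list_params(page=0, size=100, sort_field="url", sort_direction="asc"):
    """Clamp paging values and fall back to documented defaults."""
    page = max(0, page)
    # Upper cap of 1000 is documented; the lower bound of 1 is an assumption.
    size = min(max(1, size), MAX_PAGE_SIZE)
    if sort_field not in ALLOWED_SORT_FIELDS:
        sort_field = "url"
    sort_direction = sort_direction.lower()
    if sort_direction not in ("asc", "desc"):
        sort_direction = "asc"  # assumption: behaviour for other values is not documented
    return {
        "page": page,
        "size": size,
        "sortField": sort_field,
        "sortDirection": sort_direction,
    }


params = normalize_list_params(page=-3, size=5000, sort_field="state", sort_direction="DESC")
```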

Filter fields are whitelisted: url, title, curl. Logical operators $and, $or, $nor are supported and recursively sanitised.
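To show what a recursive whitelist check over such an expression could look like, here is a client-side Python sketch. It is not the server's sanitiser: the helper names are hypothetical, the sketch raises on a non-whitelisted key where the server may handle it differently, and ownerId is a made-up field used only to trigger the check.

```python
import json
from urllib.parse import urlencode

ALLOWED_FILTER_FIELDS = {"url", "title", "curl"}
LOGICAL_OPERATORS = {"$and", "$or", "$nor"}


def disallowed_keys(expr):
    """Recursively collect keys that are neither whitelisted fields nor logical operators."""
    bad = []
    for key, value in expr.items():
        if key in LOGICAL_OPERATORS:
            # Logical operators hold a list of sub-expressions; check each one.
            for sub in value:
                bad.extend(disallowed_keys(sub))
        elif key not in ALLOWED_FILTER_FIELDS:
            bad.append(key)
    return bad


def encode_filters(expr):
    """Return a URL-encoded filters=... query fragment for a whitelisted expression."""
    bad = disallowed_keys(expr)
    if bad:
        raise ValueError(f"non-whitelisted filter keys: {bad}")
    return urlencode({"filters": json.dumps(expr, separators=(",", ":"))})


good = {"$or": [{"title": "Product 1"}, {"title": "Product 2"}]}
query = encode_filters(good)

# ownerId is a hypothetical field that is not on the whitelist.
nested = {"$and": [{"url": "https://competitor-x.com/product/1"}, {"$nor": [{"ownerId": "x"}]}]}
rejected = disallowed_keys(nested)
```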

Response (200) — Spring Page<LinkDTO> ({ content, totalElements, totalPages, ... }).
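Walking the whole inventory then comes down to requesting pages until totalPages is reached. The Python sketch below assumes a hypothetical fetch_page(page, size) callable that performs the GET and returns the decoded page as a dict; only the content and totalPages fields named above are used.

```python
def iter_links(fetch_page, size=100):
    """Yield every link by requesting page 0, 1, 2, ... until totalPages is reached.

    fetch_page(page, size) is a hypothetical callable supplied by the caller.
    """
    page = 0
    while True:
        body = fetch_page(page, size)
        yield from body["content"]
        page += 1
        if page >= body["totalPages"]:
            break
```

Because the function is a generator, callers can stop early (for example after the first errored link) without fetching the remaining pages.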

Errors — 400 (invalid siteId ObjectId / malformed filters JSON / site not found for customer) / 401 / 403 (N/A) / 404 (N/A — 400 used) / 429 / 500.

Delete a Site (destructive) — `DELETE /api/scraper/site/{siteId}`

Destructive operation. Deleting a Site cascades to delete every Link attached to it. The scraper itself is preserved — only the URL inventory is removed.

This endpoint is idempotent by design (ADR-013): deleting a site that’s already gone returns 204 No Content rather than 404. Agent retries are safe.


DELETE /api/scraper/site/5f9a1b2c3d4e5f6a7b8c9d0e
Authorization: Bearer <key>
Idempotency-Key: <uuid>

Steps (single-call delete — Sites are an exception to the two-call destructive pattern because the cascade impact is bounded to the Link inventory):

DELETE /api/scraper/site/{siteId} with Idempotency-Key.

Always send the Idempotency-Key header. A retried delete is a no-op on the second call thanks to ADR-013, but the header is still required by @RequireIdempotencyKey.

Response — 204 No Content. Verify deletion by re-fetching /api/scraper/{scraperId}/site, which should now return null.

Errors

Code	Meaning
400	`siteId` is not a valid ObjectId
401	Missing/invalid bearer
403	N/A (cross-tenant returns 204 via ADR-013)
404	N/A (idempotent — 204 used instead)
429	Rate-limited
500	Persistence failure
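Putting the status codes above together, a retrying delete might look like the Python sketch below. The send(method, path, headers) callable is hypothetical and returns the HTTP status code. Retrying on 429 and 500, and reusing one Idempotency-Key across the retries, are assumptions about sensible client behaviour rather than documented requirements; a real client would also wait between attempts.

```python
import uuid

# Assumption: only these statuses are worth retrying.
RETRYABLE = {429, 500}


def delete_site(send, site_id, max_attempts=3):
    """Delete a Site, retrying transient failures with the same Idempotency-Key.

    Returns True on 204 No Content; raises RuntimeError otherwise.
    """
    headers = {"Idempotency-Key": str(uuid.uuid4())}  # one key for all attempts
    path = f"/api/scraper/site/{site_id}"
    last_status = None
    for _ in range(max_attempts):
        last_status = send("DELETE", path, headers)
        if last_status == 204:
            return True
        if last_status not in RETRYABLE:
            break  # 400 and 401 will not succeed on retry
    raise RuntimeError(f"DELETE {path} failed with HTTP {last_status}")
```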
