Preview — auto-detect scraper configs

Two convenience endpoints used by the scraper-builder UI’s “Simple Mode”. Hand over a target URL (or a curl invocation) and the service runs every applicable detection strategy in parallel — SINGLE_PRODUCT, MULTIPLE_PRODUCTS, APPLICATION_LD_JSON, plus the Amazon / Google specialisations — and returns the candidate scraper configs that successfully produced data, along with their sample rows.

Each candidate is plan-feature gated; only configs the customer’s plan can actually run are returned. Nothing persists. Pick a candidate, then save it via PUT /api/scraper.

Endpoint summary

| Method | Path | Operation ID | Auth scope |
| --- | --- | --- | --- |
| POST | /api/scraper-simple/url-based | scrapewise_preview_scraper_from_url | bearer |
| POST | /api/scraper-simple/curl-based | scrapewise_preview_scraper_from_curl | bearer + API feature |

Preview from URL — POST /api/scraper-simple/url-based

POST /api/scraper-simple/url-based?tryWithHiddenData=false&useCache=false
Authorization: Bearer <key>
Content-Type: application/json

{
  "id": null,
  "name": "competitor-prices",
  "groupId": "grp_xyz",
  "url": "https://competitor.com/products",
  "itemsConfig": null
}

Request body (SimpleModeWithUrlDTO)

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| id | no | string | Existing scraper id when previewing changes to a saved scraper |
| name | yes | string | Display name for the future scraper (server uses it as a tag on candidate DTOs) |
| groupId | yes | string | Owning group’s MongoDB ObjectId (see Groups) |
| url | yes | string | The page to analyse. Public HTTP(S); internal / link-local hosts are SSRF-blocked |
| itemsConfig | no | array | Pre-existing field selectors to bias detection toward |
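
A quick client-side check against the field table can catch obvious mistakes before a request is sent. The sketch below is illustrative only: `validateUrlPreviewBody` is a hypothetical helper, not part of any Scrapewise SDK. It makes no attempt to reproduce the server's SSRF check, so the server's 400 response remains the authority.

```javascript
// Hypothetical helper (not a Scrapewise API): returns a list of problems
// found in a SimpleModeWithUrlDTO-shaped request body.
function validateUrlPreviewBody(body) {
  const problems = [];

  // name, groupId and url are marked "required" in the field table.
  for (const field of ['name', 'groupId', 'url']) {
    if (typeof body[field] !== 'string' || body[field].trim() === '') {
      problems.push(`${field} is required and must be a non-empty string`);
    }
  }

  // The url must be an absolute HTTP(S) URL.
  if (typeof body.url === 'string' && body.url.trim() !== '') {
    try {
      const { protocol } = new URL(body.url);
      if (protocol !== 'http:' && protocol !== 'https:') {
        problems.push('url must use http or https');
      }
    } catch {
      problems.push('url is not a valid absolute URL');
    }
  }

  // itemsConfig is optional, but must be an array when present.
  if (body.itemsConfig != null && !Array.isArray(body.itemsConfig)) {
    problems.push('itemsConfig must be an array when provided');
  }

  return problems;
}
```

An empty result means the body passes these local checks; it does not guarantee the server will accept it.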

Query params

| Param | Default | Description |
| --- | --- | --- |
| tryWithHiddenData | false | Enable a more aggressive JSON-LD pass that reads scripts hidden by the renderer |
| useCache | false | Reuse a recent fetch instead of hitting the network. Faster but may serve stale HTML |
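
Both flags travel in the query string, not the JSON body. As a small sketch, the endpoint URL can be assembled with the standard `URL` API. The base URL is taken from this page's Node.js example; `buildUrlPreviewEndpoint` is a hypothetical helper name.

```javascript
// Assumption: base URL as used in this page's Node.js example.
const PREVIEW_BASE = 'https://portal.scrapewise.ai/api/scraper-api';

// Hypothetical helper: builds the /url-based endpoint with both query flags.
// Defaults mirror the documented defaults (false / false).
function buildUrlPreviewEndpoint({ tryWithHiddenData = false, useCache = false } = {}) {
  const url = new URL(`${PREVIEW_BASE}/api/scraper-simple/url-based`);
  url.searchParams.set('tryWithHiddenData', String(tryWithHiddenData));
  url.searchParams.set('useCache', String(useCache));
  return url.toString();
}
```

Calling `buildUrlPreviewEndpoint({ tryWithHiddenData: true })` enables the hidden-data pass while leaving `useCache` at its default.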

Response (200): Set<ScraperSampleDataDTO>. One entry per detection strategy that produced data, plan-filtered to what your plan can run:

[
  {
    "scraperDTO": {
      "name": "competitor-prices",
      "groupRef": "grp_xyz",
      "type": "MULTIPLE_PRODUCTS",
      "sourceConfig": {
        "url": "https://competitor.com/products",
        "pagination": { "type": "URL_PAGE_PARAM", "param": "page", "max": 50 }
      },
      "itemsConfig": [
        { "name": "title", "selector": "h2.product-title", "kind": "TEXT" },
        { "name": "price", "selector": ".price", "kind": "TEXT" },
        { "name": "imageUrl", "selector": "img.product-image", "kind": "ATTR", "attr": "src" }
      ]
    },
    "sampleData": [
      { "title": "Premium Widget XL", "price": "29.99", "imageUrl": "https://..." },
      { "title": "Standard Widget", "price": "19.99", "imageUrl": "https://..." }
    ],
    "executionTimeSec": 2.4
  }
]

An empty array is a normal success state: no detection strategy succeeded for this URL on this plan. Treat it as “this URL is unsupported by the available detectors”, not as an error.
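
Because the response can hold zero, one, or several candidates, client code needs an explicit selection rule. The sketch below shows one possible rule, not a recommendation from Scrapewise: prefer a detector type if the caller names one, otherwise take the candidate with the most sample rows. `pickCandidate` is a hypothetical helper; the field names come from the example response above.

```javascript
// Hypothetical helper: selects one candidate from a preview response.
// Returns null for an empty response, which is a valid "no detector matched" result.
function pickCandidate(candidates, preferredTypes = []) {
  if (!Array.isArray(candidates) || candidates.length === 0) return null;

  // First choice: the earliest preferred detector type that produced data.
  for (const type of preferredTypes) {
    const match = candidates.find((c) => c.scraperDTO?.type === type);
    if (match) return match;
  }

  // Fallback (an assumption, not documented behaviour): most sample rows wins.
  return candidates.reduce((best, c) =>
    (c.sampleData?.length ?? 0) > (best.sampleData?.length ?? 0) ? c : best
  );
}
```

For example, `pickCandidate(candidates, ['MULTIPLE_PRODUCTS'])` favours list-page configs and falls back to the richest sample otherwise.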

Errors — 400 (validation failed) / 401 / 403 (N/A) / 404 (N/A) / 429 / 500 (correlationId returned if a detector or downstream proxy crashes).
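
One way to act on these status codes is to map each to a client-side decision. This is a sketch only: the action names are invented for illustration, and retrying on 429 is an assumption about sensible client behaviour, since this page does not document a retry policy or where exactly the correlationId appears in the 500 response.

```javascript
// Hypothetical mapping from /url-based status codes to a client-side action.
function classifyPreviewStatus(status) {
  if (status === 200) return 'ok';                 // may still be an empty array
  if (status === 400) return 'fix-request';        // validation failed
  if (status === 401) return 'check-credentials';  // missing or invalid bearer key
  if (status === 429) return 'retry-later';        // assumption: back off, then retry
  if (status >= 500) return 'report-with-correlation-id';
  return 'unexpected';
}
```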

Node.js

const res = await fetch(
  'https://portal.scrapewise.ai/api/scraper-api/api/scraper-simple/url-based',
  {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.SCRAPEWISE_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      name: 'competitor-prices',
      groupId: 'grp_xyz',
      url: 'https://competitor.com/products',
    }),
  }
);
// Surface 4xx/5xx responses instead of trying to parse them as candidates.
if (!res.ok) throw new Error(`Preview failed: HTTP ${res.status}`);

const candidates = await res.json();
// An empty array is a valid response, so `pick` may be undefined.
const pick = candidates[0]; // pick whichever shape matches your need
// → then persist via PUT /api/scraper with `pick.scraperDTO`

Preview from curl — POST /api/scraper-simple/curl-based

Same idea, but the input is a full curl invocation. Use it when the target endpoint needs a custom request shape (cookies, headers, an auth bearer, or a POST body) that wouldn’t survive a plain GET. Scrapewise parses the curl, replays the exact request, then runs detection on the response.

Requires the API plan feature.

POST /api/scraper-simple/curl-based
Authorization: Bearer <key>
Content-Type: application/json

{
  "name": "vendor-graphql-products",
  "groupId": "grp_xyz",
  "curl": "curl 'https://vendor.example/graphql' -H 'Cookie: session=...' -H 'Content-Type: application/json' --data-raw '{\"query\":\"{ products { id title price } }\"}'"
}

Request body (SimpleModeWithCurlDTO)

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| id | no | string | Existing scraper id when previewing changes |
| name | yes | string | Display name |
| groupId | yes | string | Owning group’s MongoDB ObjectId |
| curl | yes | string | Full curl command (single-line or with line continuations). Supports -X / -H / -d / --cookie / etc. |

Response (200) — same Set<ScraperSampleDataDTO> shape as /url-based.

Errors — 400 (curl string malformed) / 401 / 402 (plan lacks API feature) / 403 (N/A) / 404 (N/A) / 429 / 500.
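
Hand-writing the `curl` string inside JSON means escaping quotes twice, which is an easy source of 400 responses. The sketch below assembles the command from parts and leaves the JSON escaping to `JSON.stringify`. `buildCurlPreviewBody` and `shellQuote` are hypothetical helpers, and the sketch assumes the Scrapewise curl parser accepts standard POSIX single-quote escaping (`'\''`), which this page does not confirm.

```javascript
// Hypothetical helper: wraps a value in single quotes, POSIX-shell style.
function shellQuote(value) {
  return `'${String(value).replace(/'/g, `'\\''`)}'`;
}

// Hypothetical helper: builds a SimpleModeWithCurlDTO-shaped body.
// Only -H and --data-raw are emitted, matching the flags in the example request.
function buildCurlPreviewBody({ name, groupId, url, headers = {}, data }) {
  const parts = [`curl ${shellQuote(url)}`];
  for (const [key, value] of Object.entries(headers)) {
    parts.push(`-H ${shellQuote(`${key}: ${value}`)}`);
  }
  if (data !== undefined) {
    parts.push(`--data-raw ${shellQuote(data)}`);
  }
  return { name, groupId, curl: parts.join(' ') };
}
```

Passing the result through `JSON.stringify` produces the request body, with the inner double quotes of a GraphQL payload escaped automatically.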

What to do with a candidate

The response is a list. Each entry’s scraperDTO is a ready-to-persist payload. To save the picked candidate as a real scraper:

curl -X PUT \
  -H "Authorization: Bearer $KEY" \
  -H "Idempotency-Key: $(uuidgen)" \
  -H "Content-Type: application/json" \
  -d "$(cat candidate-from-preview.json)" \
  "https://portal.scrapewise.ai/api/scraper-api/api/scraper"

That returns the persisted scraper with its id. Then trigger runs via GET /api/scraper/{id}/run.
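
The same save step can be sketched in Node.js, mirroring the curl command's headers. `buildSaveRequest` is a hypothetical helper. It sends only the candidate's `scraperDTO` (not `sampleData`) and assumes a runtime where `globalThis.crypto.randomUUID()` is available (recent Node.js versions) when no idempotency key is supplied.

```javascript
// Assumption: base URL as used in the curl command above.
const SCRAPER_API_BASE = 'https://portal.scrapewise.ai/api/scraper-api';

// Hypothetical helper: builds the fetch() arguments for PUT /api/scraper.
function buildSaveRequest(candidate, apiKey, idempotencyKey = globalThis.crypto.randomUUID()) {
  return {
    url: `${SCRAPER_API_BASE}/api/scraper`,
    init: {
      method: 'PUT',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Idempotency-Key': idempotencyKey,
        'Content-Type': 'application/json',
      },
      // Persist the config only; sample rows and timing are preview output.
      body: JSON.stringify(candidate.scraperDTO),
    },
  };
}

// Usage (not executed here):
//   const { url, init } = buildSaveRequest(pick, process.env.SCRAPEWISE_KEY);
//   const saved = await (await fetch(url, init)).json();
```

Reusing the same idempotency key when retrying a failed save is the usual reason to generate it once and pass it in explicitly.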

When to use Preview vs a real scraper

| Question | Preview | Real scraper |
| --- | --- | --- |
| Just exploring what a page yields? | ✓ | |
| Need to scrape the same page repeatedly? | | ✓ |
| Need pagination / scheduled runs / historical storage? | | ✓ |
| Want to A/B several detector strategies before committing? | ✓ | |

Preview is the “test drive.” For production, save the picked candidate as a real scraper.

See also