Preview — auto-detect scraper configs
Two convenience endpoints used by the scraper-builder UI’s “Simple Mode”. Hand over a target URL (or a curl invocation) and the service runs every applicable detection strategy in parallel — SINGLE_PRODUCT, MULTIPLE_PRODUCTS, APPLICATION_LD_JSON, plus the Amazon / Google specialisations — and returns the candidate scraper configs that successfully produced data, along with their sample rows.
Each candidate is plan-feature gated; only configs the customer’s plan can actually run are returned. Nothing persists. Pick a candidate, then save it via PUT /api/scraper.
Endpoint summary
| Method | Path | Operation ID | Auth scope |
|---|---|---|---|
| POST | /api/scraper-simple/url-based | scrapewise_preview_scraper_from_url | bearer |
| POST | /api/scraper-simple/curl-based | scrapewise_preview_scraper_from_curl | bearer + API feature |
Preview from URL — POST /api/scraper-simple/url-based
POST /api/scraper-simple/url-based?tryWithHiddenData=false&useCache=false
Authorization: Bearer <key>
Content-Type: application/json
{
"id": null,
"name": "competitor-prices",
"groupId": "grp_xyz",
"url": "https://competitor.com/products",
"itemsConfig": null
}
Request body (SimpleModeWithUrlDTO)
| Field | Required | Type | Description |
|---|---|---|---|
| id | no | string | Existing scraper id when previewing changes to a saved scraper |
| name | yes | string | Display name for the future scraper (server uses it as a tag on candidate DTOs) |
| groupId | yes | string | Owning group’s MongoDB ObjectId (see Groups) |
| url | yes | string | The page to analyse. Public HTTP(S); internal / link-local hosts are SSRF-blocked |
| itemsConfig | no | array | Pre-existing field selectors to bias detection toward |
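A client can catch the most common 400s before the request leaves the machine. The sketch below checks the three required fields and the URL scheme from the table above; `buildUrlPreviewBody` is a hypothetical helper name, not part of the Scrapewise API, and the server remains the authority on validation (including the SSRF checks, which cannot be reproduced reliably on the client).

```javascript
// Hypothetical client-side helper: builds a SimpleModeWithUrlDTO-shaped body.
// The checks mirror the "Required" column above; the server may enforce more.
function buildUrlPreviewBody({ id, name, groupId, url, itemsConfig }) {
  for (const [field, value] of Object.entries({ name, groupId, url })) {
    if (typeof value !== 'string' || value.trim() === '') {
      throw new TypeError(`"${field}" is required and must be a non-empty string`);
    }
  }

  // new URL() throws a TypeError on malformed input.
  const parsed = new URL(url);
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new TypeError('"url" must use http or https');
  }

  // Optional fields are left out entirely when not provided.
  const body = { name, groupId, url };
  if (id != null) body.id = id;
  if (itemsConfig != null) body.itemsConfig = itemsConfig;
  return body;
}

const exampleBody = buildUrlPreviewBody({
  name: 'competitor-prices',
  groupId: 'grp_xyz',
  url: 'https://competitor.com/products',
});
console.log(JSON.stringify(exampleBody));
```

The returned object can be passed straight to `JSON.stringify` as the `body` of the `fetch` call.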
Query params
| Param | Default | Description |
|---|---|---|
| tryWithHiddenData | false | Enable a more aggressive JSON-LD pass that reads scripts hidden by the renderer |
| useCache | false | Reuse a recent fetch instead of hitting the network. Faster but may serve stale HTML |
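Both flags travel in the query string, not the JSON body. A minimal sketch of assembling the request URL follows; `buildPreviewUrl` is a hypothetical helper, and `BASE_URL` is assumed from the Node.js example on this page.

```javascript
// Base path assumed from the Node.js example on this page.
const BASE_URL = 'https://portal.scrapewise.ai/api/scraper-api';

// Hypothetical helper: appends the two documented query params,
// defaulting both to false as in the table above.
function buildPreviewUrl({ tryWithHiddenData = false, useCache = false } = {}) {
  const url = new URL(`${BASE_URL}/api/scraper-simple/url-based`);
  url.searchParams.set('tryWithHiddenData', String(tryWithHiddenData));
  url.searchParams.set('useCache', String(useCache));
  return url.toString();
}

console.log(buildPreviewUrl({ tryWithHiddenData: true }));
```

One possible pattern is to call with the defaults first and repeat with `tryWithHiddenData: true` only when the first response is an empty array.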
Response (200) — Set<ScraperSampleDataDTO>. One entry per detection strategy that produced data, plan-filtered to what your plan can run:
[
{
"scraperDTO": {
"name": "competitor-prices",
"groupRef": "grp_xyz",
"type": "MULTIPLE_PRODUCTS",
"sourceConfig": {
"url": "https://competitor.com/products",
"pagination": { "type": "URL_PAGE_PARAM", "param": "page", "max": 50 }
},
"itemsConfig": [
{ "name": "title", "selector": "h2.product-title", "kind": "TEXT" },
{ "name": "price", "selector": ".price", "kind": "TEXT" },
{ "name": "imageUrl", "selector": "img.product-image", "kind": "ATTR", "attr": "src" }
]
},
"sampleData": [
{ "title": "Premium Widget XL", "price": "29.99", "imageUrl": "https://..." },
{ "title": "Standard Widget", "price": "19.99", "imageUrl": "https://..." }
],
"executionTimeSec": 2.4
}
]
An empty array is a normal success state: no detection strategy succeeded for this URL on this plan. Treat it as “this URL is unsupported by the available detectors”.
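Because the response can hold zero, one, or several candidates, client code usually needs a selection rule. The sketch below is one possible rule, not something the API prescribes: prefer a caller-supplied order of `type` values, otherwise fall back to the candidate with the most sample rows. `pickCandidate` is a hypothetical name.

```javascript
// Hypothetical selection rule over the Set<ScraperSampleDataDTO> response.
// Returns null for an empty array, which the API treats as a normal result.
function pickCandidate(candidates, preferredTypes = []) {
  if (!Array.isArray(candidates) || candidates.length === 0) return null;

  // 1. Honour the caller's preferred strategy order, if any matches.
  for (const type of preferredTypes) {
    const match = candidates.find((c) => c.scraperDTO?.type === type);
    if (match) return match;
  }

  // 2. Otherwise take the candidate that produced the most sample rows.
  return candidates.reduce((best, c) =>
    (c.sampleData?.length ?? 0) > (best.sampleData?.length ?? 0) ? c : best
  );
}

const sample = [
  { scraperDTO: { type: 'SINGLE_PRODUCT' }, sampleData: [{ title: 'A' }] },
  { scraperDTO: { type: 'MULTIPLE_PRODUCTS' }, sampleData: [{ title: 'A' }, { title: 'B' }] },
];
console.log(pickCandidate(sample).scraperDTO.type); // MULTIPLE_PRODUCTS
```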
Errors — 400 (validation failed) / 401 / 403 (N/A) / 404 (N/A) / 429 / 500 (correlationId returned if a detector or downstream proxy crashes).
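The status codes above map naturally onto a small set of client reactions. The sketch below is illustrative only: the action labels are invented for this example, and reading `correlationId` as a top-level field of the JSON error body is an assumption, since the exact error shape is not shown on this page. The 402 case applies to the curl-based endpoint described further down.

```javascript
// Hypothetical mapping from a failed preview response to a client action.
// `body` is the parsed JSON error body, if any (its shape is assumed).
function describePreviewFailure(status, body = {}) {
  switch (status) {
    case 400:
      return { action: 'fix-request', retryable: false };
    case 401:
      return { action: 'check-api-key', retryable: false };
    case 402: // curl-based only: plan lacks the API feature
      return { action: 'upgrade-plan', retryable: false };
    case 429:
      return { action: 'back-off-and-retry', retryable: true };
    case 500:
      // Keep the correlationId so it can be quoted in a support request.
      return { action: 'report', retryable: false, correlationId: body.correlationId ?? null };
    default:
      return { action: 'unknown', retryable: false };
  }
}

console.log(describePreviewFailure(429).action); // back-off-and-retry
```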
Node.js
const res = await fetch(
'https://portal.scrapewise.ai/api/scraper-api/api/scraper-simple/url-based',
{
method: 'POST',
headers: {
Authorization: `Bearer ${process.env.SCRAPEWISE_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
name: 'competitor-prices',
groupId: 'grp_xyz',
url: 'https://competitor.com/products',
}),
}
);
if (!res.ok) throw new Error(`Preview failed with HTTP ${res.status}`);
const candidates = await res.json();
const pick = candidates[0]; // pick whichever shape matches your need (undefined if the array is empty)
// → then persist via PUT /api/scraper with `pick.scraperDTO`
Preview from curl — POST /api/scraper-simple/curl-based
Same idea but the input is a full curl invocation. Use when the target endpoint needs custom request shape — cookies, headers, an auth bearer, or a POST body — that wouldn’t survive a plain GET. Scrapewise parses the curl, replays the exact request, then runs detection on the response.
Requires the API plan feature.
POST /api/scraper-simple/curl-based
Authorization: Bearer <key>
Content-Type: application/json
{
"name": "vendor-graphql-products",
"groupId": "grp_xyz",
"curl": "curl 'https://vendor.example/graphql' -H 'Cookie: session=...' -H 'Content-Type: application/json' --data-raw '{\"query\":\"{ products { id title price } }\"}'"
}
Request body (SimpleModeWithCurlDTO)
| Field | Required | Type | Description |
|---|---|---|---|
| id | no | string | Existing scraper id when previewing changes |
| name | yes | string | Display name |
| groupId | yes | string | Owning group’s MongoDB ObjectId |
| curl | yes | string | Full curl command (single-line or with line continuations). Supports -X / -H / -d / --cookie / etc. |
Response (200) — same Set<ScraperSampleDataDTO> shape as /url-based.
Errors — 400 (curl string malformed) / 401 / 402 (plan lacks API feature) / 403 (N/A) / 404 (N/A) / 429 / 500.
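Hand-escaping a curl command inside JSON is error-prone, as the `\"` sequences in the request example show. A minimal sketch that lets `JSON.stringify` do the escaping follows. `buildCurlPreviewRequest` is a hypothetical helper, `BASE_URL` is assumed from the Node.js example on this page, and the `curl` prefix check is a client-side convenience, not a documented server rule.

```javascript
// Base path assumed from the Node.js example on this page.
const BASE_URL = 'https://portal.scrapewise.ai/api/scraper-api';

// Hypothetical helper: returns the URL and fetch options for /curl-based.
function buildCurlPreviewRequest(apiKey, { id, name, groupId, curl }) {
  if (typeof curl !== 'string' || !/^curl\s/.test(curl.trim())) {
    throw new TypeError('"curl" must be a full curl command');
  }
  const body = { name, groupId, curl };
  if (id != null) body.id = id;
  return {
    url: `${BASE_URL}/api/scraper-simple/curl-based`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      // JSON.stringify escapes the quotes and newlines inside the curl string.
      body: JSON.stringify(body),
    },
  };
}

const curlCommand =
  "curl 'https://vendor.example/graphql' -H 'Content-Type: application/json' " +
  `--data-raw '{"query":"{ products { id title price } }"}'`;

const request = buildCurlPreviewRequest('YOUR_KEY', {
  name: 'vendor-graphql-products',
  groupId: 'grp_xyz',
  curl: curlCommand,
});
// Then: const res = await fetch(request.url, request.init);
```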
What to do with a candidate
The response is a list. Each entry’s scraperDTO is a ready-to-persist payload. To save the picked candidate as a real scraper:
curl -X PUT \
-H "Authorization: Bearer $KEY" \
-H "Idempotency-Key: $(uuidgen)" \
-H "Content-Type: application/json" \
-d "$(cat candidate-from-preview.json)" \
"https://portal.scrapewise.ai/api/scraper-api/api/scraper"
That returns the persisted scraper with its id. Then trigger runs via GET /api/scraper/{id}/run.
When to use Preview vs a real scraper
| Question | Preview | Real scraper |
|---|---|---|
| Just exploring what a page yields? | ✅ | — |
| Need to scrape the same page repeatedly? | — | ✅ |
| Need pagination / scheduled runs / historical storage? | — | ✅ |
| Want to A/B several detector strategies before committing? | ✅ | — |
Preview is the “test drive.” For production, save the picked candidate as a real scraper.
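The save step from “What to do with a candidate” can also be sketched in Node.js. `buildSaveRequest` and `buildRunUrl` are hypothetical helpers; the base path is assumed from the examples on this page, and the run URL assumes the same base path applies to GET /api/scraper/{id}/run.

```javascript
// Base path assumed from the examples on this page.
const BASE_URL = 'https://portal.scrapewise.ai/api/scraper-api';

// Hypothetical helper: builds the PUT /api/scraper request for a picked candidate.
// Pass a fresh idempotency key per save, e.g. randomUUID() from 'node:crypto'.
function buildSaveRequest(apiKey, candidate, idempotencyKey) {
  if (!candidate || !candidate.scraperDTO) {
    throw new TypeError('candidate.scraperDTO is missing');
  }
  return {
    url: `${BASE_URL}/api/scraper`,
    init: {
      method: 'PUT',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Idempotency-Key': idempotencyKey,
        'Content-Type': 'application/json',
      },
      // Only the scraperDTO is persisted; sampleData stays behind.
      body: JSON.stringify(candidate.scraperDTO),
    },
  };
}

// Hypothetical helper: URL for triggering a run of the saved scraper.
function buildRunUrl(scraperId) {
  return `${BASE_URL}/api/scraper/${encodeURIComponent(scraperId)}/run`;
}

// Usage (network calls left commented out):
// const save = buildSaveRequest(process.env.SCRAPEWISE_KEY, pick, randomUUID());
// const saved = await (await fetch(save.url, save.init)).json();
// await fetch(buildRunUrl(saved.id), { headers: { Authorization: `Bearer ${process.env.SCRAPEWISE_KEY}` } });
```

Reusing the same idempotency key when retrying a failed save is the usual way to avoid creating duplicates, assuming the server honours the header as the curl example suggests.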
See also
- Scrapers — Create or update — save a picked candidate
- Schemas — for AI_CONF scrapers; preview won’t generate one, you bring your own
- Plan features — which tier unlocks the API feature for the curl-based variant