Sometimes the data you want is on a website, not in a feed: a supplier’s catalog, your own store on another platform, or a reference range you want to match. Web Scraping Import crawls a public site, uses AI to extract each product (name, SKU, brand, price, images, attributes), and imports the products into WISEPIM. There is no file to prepare and no API to connect. This lets you onboard a supplier or an existing storefront in minutes instead of asking for a feed, and brings the products into WISEPIM, where you can enrich, translate, and publish them.
Web Scraping Import works on public websites and needs no API keys or setup. It always shows you a live preview of one extracted product before you commit, so you can confirm the data looks right first.

How it works

1. Pick a source mode

  • Category page: walks a listing page and its pagination to find every product. Best for a supplier or competitor catalog.
  • Sitemap: starts from one product URL and finds similar pages across the site.
  • Manual list: takes the product URLs you paste in, one per line.
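To picture what Category page mode does, here is a minimal Python sketch of a bounded pagination walk. It is not WISEPIM’s implementation: `walk_category`, `fetch_listing`, and the `ListingPage` shape are hypothetical, and the real work of fetching a page and finding its links is left to whatever function you pass in (the usage example fakes a two-page category in memory).

```python
from typing import Callable, Optional

# Hypothetical shape of one fetched listing page: the product URLs it links
# to, plus the URL of the next page (None on the last page).
ListingPage = tuple[list[str], Optional[str]]


def walk_category(
    seed_url: str,
    fetch_listing: Callable[[str], ListingPage],
    max_listing_pages: int = 5,
) -> list[str]:
    """Follow pagination from seed_url and collect unique product URLs.

    Stops after the last page, after max_listing_pages pages, or when a
    page URL repeats (a guard against pagination that loops back).
    """
    visited: set[str] = set()
    seen_products: set[str] = set()
    products: list[str] = []
    url: Optional[str] = seed_url
    while url is not None and len(visited) < max_listing_pages:
        if url in visited:
            break
        visited.add(url)
        product_urls, url = fetch_listing(url)
        for product_url in product_urls:
            if product_url not in seen_products:  # products often repeat across pages
                seen_products.add(product_url)
                products.append(product_url)
    return products


# Usage with an in-memory stand-in for a two-page category.
site = {
    "https://shop.example/c/shoes": (
        ["https://shop.example/p/1", "https://shop.example/p/2"],
        "https://shop.example/c/shoes?page=2",
    ),
    "https://shop.example/c/shoes?page=2": (
        ["https://shop.example/p/2", "https://shop.example/p/3"],
        None,
    ),
}
print(walk_category("https://shop.example/c/shoes", site.__getitem__))
```

Capping the page count and remembering visited pages is what keeps a walk like this predictable: a site whose “next” link loops back can’t keep it running forever.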

2. Add the URL and limits

Paste the seed URL. Optionally set a URL pattern (to include only the right pages) and caps on how many listing pages and products to pull, so a first run stays small.
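A URL pattern and a product cap act as simple filters on the URLs a crawl finds. The sketch below assumes the pattern is a plain substring matched against the URL path; WISEPIM’s own matching rules may differ, and `select_product_urls` is a hypothetical name.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def select_product_urls(
    urls: Iterable[str],
    pattern: Optional[str] = None,
    max_products: Optional[int] = None,
) -> list[str]:
    """Keep unique URLs whose path contains `pattern`, stopping at `max_products`."""
    selected: list[str] = []
    seen: set[str] = set()
    for url in urls:
        if max_products is not None and len(selected) >= max_products:
            break  # the cap keeps a first run small
        # Match against the path only, so "/p/" inside a query string doesn't count.
        if pattern and pattern not in urlsplit(url).path:
            continue
        if url in seen:
            continue
        seen.add(url)
        selected.append(url)
    return selected


candidates = [
    "https://shop.example/p/1",
    "https://shop.example/blog/spring-sale",
    "https://shop.example/p/2",
    "https://shop.example/search?q=/p/",
    "https://shop.example/p/1",
    "https://shop.example/p/3",
]
print(select_product_urls(candidates, pattern="/p/", max_products=2))
```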

3. Preview one product

Run the preview. WISEPIM reports how many product URLs it matched, the pattern it detected, a few sample URLs, and one fully extracted product so you can check the fields landed correctly.
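One plausible way to arrive at a detected pattern is to look for the path prefix most product URLs share. This Python sketch uses that heuristic purely as an illustration; it is an assumption, not how WISEPIM detects patterns, and `detect_pattern` and `preview_summary` are hypothetical names.

```python
from collections import Counter
from typing import Optional
from urllib.parse import urlsplit


def detect_pattern(urls: list[str]) -> Optional[str]:
    """Guess a product URL prefix such as "/products/" from a sample of URLs.

    Heuristic: among paths with at least two segments (a prefix plus a slug),
    take the most common first segment.
    """
    prefixes: Counter[str] = Counter()
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        if len(segments) >= 2:
            prefixes[segments[0]] += 1
    if not prefixes:
        return None
    prefix, _count = prefixes.most_common(1)[0]
    return f"/{prefix}/"


def preview_summary(urls: list[str], sample_size: int = 3) -> dict:
    """Report what a preview shows: matched count, detected pattern, sample URLs."""
    pattern = detect_pattern(urls)
    matched = [u for u in urls if pattern and urlsplit(u).path.startswith(pattern)]
    return {"matched_count": len(matched), "pattern": pattern, "samples": matched[:sample_size]}


found = [
    "https://shop.example/products/red-shoe",
    "https://shop.example/products/blue-shoe",
    "https://shop.example/blog/new-arrivals",
    "https://shop.example/about",
    "https://shop.example/products/green-shoe",
    "https://shop.example/products/black-boot",
]
print(preview_summary(found))
```

A heuristic like this is also why the preview matters: if blog or category URLs outnumber product URLs, the most common prefix is the wrong one, and the product URL pattern override is the fix.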

4. Import

Happy with the preview? Start the import. It runs in the background, so you can leave the page and watch progress in the Process Tracker. When it finishes, the products are in your catalog, ready to work with.
Keep your first run small. Set the Max products and Max listing pages caps low, preview the result, and confirm the data looks right before you let it pull the whole catalog.

Controls you can set

You shape each scrape with a few optional overrides. The defaults work for most sites, so reach for these only when a run needs a nudge:
  • Sitemap URL override: point WISEPIM at the right sitemap when a site doesn’t declare one in its robots.txt. Use this if the sitemap mode can’t find product URLs on its own.
  • Product URL pattern override: tell WISEPIM which URLs count as products (for example /p/ or /products/) when the auto-detected pattern picks up the wrong pages.
  • Max listing pages: how many pagination pages of a category to walk. Raise it for large catalogs, keep it low for a quick test.
  • Max products: an upper bound on how many products a run imports. A safety cap that keeps a first run small and predictable.
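Sites usually declare their sitemap with a `Sitemap:` line in robots.txt, and the sitemap URL override matters when that line is missing. Here is a standard-library Python sketch of that lookup and of reading the `<loc>` entries from a sitemap. The function names are hypothetical, and fetching the files over HTTP is left out.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return URLs declared on 'Sitemap:' lines (the field name is case-insensitive)."""
    found: list[str] = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def choose_sitemaps(robots_txt: str, override: Optional[str] = None) -> list[str]:
    """An explicit sitemap URL override wins over whatever robots.txt declares."""
    if override:
        return [override]
    return sitemaps_from_robots(robots_txt)


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Collect every <loc> value from a sitemap (or sitemap index) document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


robots = """User-agent: *
Disallow: /cart
Sitemap: https://shop.example/sitemap.xml
"""
print(choose_sitemaps(robots))
print(choose_sitemaps("User-agent: *\n", override="https://shop.example/products-sitemap.xml"))
```

For a sitemap index file, the `<loc>` entries point to child sitemaps rather than products, so a real crawler would fetch those in turn.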

Reading the preview

The preview exists so you never import blind:
  • Matched URL count tells you whether the crawl found roughly the number of products you expected. Zero or far too few means the pattern or seed URL needs adjusting.
  • The detected pattern shows which URLs will be treated as products. If it is catching category or blog pages, tighten the pattern with the product URL pattern override.
  • The extracted sample is the real test: check that name, price, images, and key attributes mapped correctly before you commit to the full run.
If WISEPIM can’t extract a product from the first URL, it shows a clear warning rather than failing silently. Extraction may still succeed on the other pages, so it’s worth previewing once more, or starting a small run and checking the results. If the sample stays empty, adjust the seed URL or pattern and preview again.
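Checking the extracted sample comes down to spotting fields that came back empty. A small Python sketch of that check; `REQUIRED_FIELDS` and the field names in it are hypothetical stand-ins for whatever your catalog needs, and a price of 0 deliberately counts as present.

```python
from typing import Any, Optional

# Hypothetical field names for an extracted product; adjust to your own mapping.
REQUIRED_FIELDS = ("name", "price", "images")


def is_empty(value: Any) -> bool:
    """True for None, blank strings, and empty collections (0 counts as a value)."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def missing_fields(product: Optional[dict], required=REQUIRED_FIELDS) -> list[str]:
    """List required fields the sample is missing; None means nothing was extracted."""
    product = product or {}
    return [name for name in required if is_empty(product.get(name))]


sample = {"name": "Red Shoe", "sku": "RS-1", "price": None, "images": []}
print(missing_fields(sample))
```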

Act on what you find

The seed URL or pattern is off. For a category page, make sure you pasted the listing page (not a single product); for sitemap mode, paste a real product URL so WISEPIM can learn the pattern. Adjust the pattern override and preview again. Outcome: the crawl finds the full set before you spend an import run on it.
Some sites bury data in scripts or images. Re-preview to confirm the gap is consistent rather than a one-off, import what extracts cleanly, then fill the gaps with Enriching Products (AI can read the product images to recover attributes). Outcome: a complete catalog even when the source page was thin.
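A common place for data hidden in scripts is schema.org `Product` markup inside `<script type="application/ld+json">` tags. This sketch reads it with Python’s standard library; it illustrates the idea and is not necessarily how WISEPIM’s AI extraction works. `products_from_jsonld` and `JsonLdCollector` are hypothetical names.

```python
import json
from html.parser import HTMLParser
from typing import Any


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def _is_product(type_value: Any) -> bool:
    # "@type" may be a single string or a list of types.
    if isinstance(type_value, list):
        return "Product" in type_value
    return type_value == "Product"


def products_from_jsonld(html: str) -> list[dict]:
    """Return every schema.org Product object found in the page's JSON-LD."""
    parser = JsonLdCollector()
    parser.feed(html)
    parser.close()
    products = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip a malformed block instead of failing the whole page
        if isinstance(data, dict) and "@graph" in data:
            items = data["@graph"]
        elif isinstance(data, list):
            items = data
        else:
            items = [data]
        for item in items:
            if isinstance(item, dict) and _is_product(item.get("@type")):
                products.append(item)
    return products


page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Red Shoe",
 "sku": "RS-1", "offers": {"@type": "Offer", "price": "59.00"}}
</script>
<script type="application/ld+json">{"@type": "BreadcrumbList"}</script>
</head><body></body></html>"""
print(products_from_jsonld(page))
```

When a page has no such markup, the product details may only exist in the rendered text or in images, which is where the image-reading enrichment comes in.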
Note the settings that worked: the source mode, the seed or category URL, and any pattern or sitemap overrides. Next time the supplier updates, enter the same values to pull the changes. For sources you re-import often, a structured feed is the more reliable long-term option when one is available. Outcome: repeatable supplier onboarding.
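If you keep those notes outside WISEPIM, a small structured record makes them easy to reuse. A Python sketch; the field names and the `category` / `sitemap` / `manual` labels are hypothetical and don’t correspond to any WISEPIM export format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical labels for the three source modes (category page, sitemap, manual list).
SOURCE_MODES = {"category", "sitemap", "manual"}


@dataclass
class ScrapeSettings:
    source_mode: str
    seed_url: str
    url_pattern: Optional[str] = None
    sitemap_url: Optional[str] = None
    max_listing_pages: Optional[int] = None
    max_products: Optional[int] = None

    def __post_init__(self) -> None:
        # Reject typos in the mode so a saved note can't silently mean something else.
        if self.source_mode not in SOURCE_MODES:
            raise ValueError(f"unknown source mode: {self.source_mode!r}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ScrapeSettings":
        return cls(**json.loads(text))


supplier_a = ScrapeSettings(
    source_mode="category",
    seed_url="https://supplier.example/catalog/shoes",
    url_pattern="/p/",
    max_listing_pages=3,
    max_products=50,
)
saved = supplier_a.to_json()
print(saved)
```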
If the source can give you an XML or CSV feed, prefer Feed Hub import or file import: structured feeds are faster and more reliable than crawling. Use scraping when no feed is available. Outcome: the right tool for each source.

How it compares

| | Web Scraping Import | File / Feed import | Web Research |
|---|---|---|---|
| Input | A live website URL | An XML / CSV / JSON file or feed | A search query or competitor URL |
| Best for | Sites with no feed available | Suppliers and channels that publish a feed | Gathering facts to enrich existing products |
| Output | Products in your catalog | Products in your catalog | Research you apply to content |
| AI does | Extracts fields from the page | Maps columns to fields | Searches and summarizes |

Related guides

  • Importing Products: file-based import (CSV, Excel) when you have structured data.
  • Feed Hub: import from and publish to XML / feed sources.
  • Web Research: research products on the web to enrich what you already have.
  • Enriching Products: fill any gaps the scrape left, with AI.