
ZeroTrace OSINT

Web Crawler

Multi-page crawl with email, phone, external-domain extraction and per-page tech-stack hints.

The web crawler walks a site, page by page, and extracts the contact-and-context data scattered across its public pages. It is the broad-strokes reconnaissance tool — give it an apex URL, get back a structured digest of everything publicly visible.

What you get

For a configurable depth and page count, the crawler returns:

| Section | What it surfaces |
| --- | --- |
| Pages crawled | URL, status code, title, content length per page |
| Emails | Every email address discovered across the crawl, deduped |
| Phone numbers | Every phone number discovered, with E.164 normalisation |
| External domains | Every external host the site links to, with link counts |
| Form actions | Every form's action URL — endpoints that accept user input |
| Per-page tech-stack hints | A lightweight version of site analysis per page, surfacing CMS / framework changes across the site |
| Sitemap auto-seed | The crawl seeds itself with the URLs found in the site's sitemap (auto-composed via the robots/sitemap tool) |
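The crawl itself can be pictured as a breadth-first walk bounded by the max-pages and max-depth caps, with an optional same-host filter. A minimal, network-free sketch (function and parameter names are hypothetical, and `fetch` / `extract_links` are injected rather than real HTTP):

```python
from collections import deque
from urllib.parse import urlparse

def crawl(start_url, fetch, extract_links,
          max_pages=100, max_depth=3, same_host_only=True):
    """Breadth-first crawl bounded by page count and link depth.

    `fetch(url)` returns page content; `extract_links(content, base_url)`
    returns absolute URLs. Both are injected so the sketch stays offline.
    """
    start_host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        content = fetch(url)
        visited.append(url)
        if depth >= max_depth:          # don't follow links past the depth cap
            continue
        for link in extract_links(content, url):
            if link in seen:
                continue
            if same_host_only and urlparse(link).netloc != start_host:
                continue                # external hosts are recorded elsewhere
            seen.add(link)
            queue.append((link, depth + 1))
    return visited
```

Breadth-first order matters here: with a hard page cap, it surveys the shallow, high-value pages of the site before descending into deep link chains.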

Configuration

Inputs:

  • Start URL — the page the crawl begins from.
  • Max pages — hard cap on pages visited. Default 100.
  • Max depth — hard cap on link-distance from the start URL. Default 3.
  • Same-host only — restrict to the start URL's host (recommended). Default on.
  • Wordlist seed — optional list of additional paths to try (/admin, /api/v1, etc.).
  • Crawl delay — politeness delay between requests. Defaults to a courteous value.

The crawler respects robots.txt by default. A toggle disables this, but do not turn it off without a clear, authorised reason.
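Honouring robots.txt amounts to checking every candidate URL against the parsed rules before fetching, and reading any crawl-delay directive. Python's standard `urllib.robotparser` covers both (the robots.txt content and user-agent string below are illustrative):

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content; in a real crawl this is fetched
# from the target's /robots.txt
robots_txt = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 2
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

def allowed(url, agent="ZeroTraceCrawler"):
    """Return True if the given user agent may fetch this URL."""
    return rp.can_fetch(agent, url)
```

The parsed `Crawl-delay` value (here `rp.crawl_delay("ZeroTraceCrawler")`) can feed directly into the politeness-delay setting above.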

The web crawler generates HTTP traffic against the target. Use it only on sites where you have authorisation, or sites that are openly published for public consumption. Aggressive crawling can be misread as an attack.

Email and phone extraction

Every page is scanned for:

  • Email addresses matching standard patterns, plus common obfuscations (name [at] example [dot] com).
  • Phone numbers matching international and national patterns, normalised to E.164 where possible.

Deduplication is automatic. The result table shows the count of pages each email / phone appeared on, so the most-mentioned contact rises to the top.
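The extraction step can be sketched as de-obfuscation followed by regex matching, with a naive E.164 normaliser and a page counter. The patterns and the `default_country` assumption are illustrative, not the tool's actual rules (real normalisation needs country-aware parsing):

```python
import re
from collections import Counter

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Undo common obfuscations like "name [at] example [dot] com" before matching
OBFUSCATIONS = [(re.compile(r"\s*\[at\]\s*", re.I), "@"),
                (re.compile(r"\s*\[dot\]\s*", re.I), ".")]

def extract_emails(text):
    for pattern, replacement in OBFUSCATIONS:
        text = pattern.sub(replacement, text)
    return set(EMAIL_RE.findall(text))

def normalize_phone(raw, default_country="+1"):
    """Very rough E.164 normalisation: strip punctuation, keep a leading +."""
    digits = re.sub(r"[^\d+]", "", raw)
    if digits.startswith("+"):
        return digits
    if digits.startswith("00"):          # international-access prefix
        return "+" + digits[2:]
    return default_country + digits.lstrip("0")

def page_counts(per_page):
    """per_page maps URL -> set of emails; returns email -> pages seen on."""
    return Counter(e for emails in per_page.values() for e in emails)
```

Deduplication falls out of the `set` per page plus the `Counter` across pages, which is what lets the most-mentioned contact rise to the top.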

Form actions

Every `<form>` element on every page contributes its `action` attribute to the form-actions list. This tells you:

  • Endpoints that accept user input — login forms, search forms, contact forms, upload forms, comment forms.
  • Cross-origin form submissions — forms that POST to a different origin (often legitimate third-party services, sometimes a misconfiguration).

For authorised security testing, the form-actions list is the input to web-application-test planning.
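Collecting form actions needs only an HTML parser, URL resolution against the page URL, and an origin comparison. A stdlib sketch (class and function names hypothetical):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class FormActionCollector(HTMLParser):
    """Collects each form's resolved action URL and a cross-origin flag."""

    def __init__(self, base_url):
        super().__init__()
        self.base = base_url
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            action = dict(attrs).get("action") or ""   # empty action posts to self
            absolute = urljoin(self.base, action)
            cross_origin = urlparse(absolute).netloc != urlparse(self.base).netloc
            self.actions.append((absolute, cross_origin))

def form_actions(html, base_url):
    parser = FormActionCollector(base_url)
    parser.feed(html)
    return parser.actions
```

Resolving relative actions with `urljoin` is what makes the cross-origin check meaningful: `/login` and `https://pay.example.net/charge` only become comparable once both are absolute.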

Secrets in JS

The crawler optionally scans inline and linked JavaScript for common secret patterns:

  • AWS access keys (AKIA...).
  • Google Cloud API keys.
  • Slack tokens (xoxb-, xoxp-).
  • Stripe keys.
  • Generic API-key-shaped strings.

A match is a finding to investigate, not a confirmed leak — many matches are placeholders or test keys. But every real secret leak in a public JS bundle started as a pattern hit somewhere.
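A pattern-based scan of this kind boils down to a catalogue of named regexes run over each script body. The patterns below are an illustrative subset, not the tool's bundled catalogue:

```python
import re

# Illustrative secret patterns; a real catalogue is larger and more precise
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "google_api_key": re.compile(r"\bAIza[0-9A-Za-z_\-]{35}\b"),
    "slack_token":    re.compile(r"\bxox[bp]-[0-9A-Za-z\-]{10,}\b"),
    "stripe_key":     re.compile(r"\b[sr]k_(?:live|test)_[0-9a-zA-Z]{16,}\b"),
}

def scan_js(source):
    """Return (pattern_name, matched_string) pairs found in a JS body."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(source):
            hits.append((name, match.group()))
    return hits
```

Every hit still needs manual verification, for exactly the reason above: the regexes cannot distinguish a live key from a placeholder.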

Per-page tech-stack hints

Each page contributes its own lightweight tech-stack fingerprint to the crawl. Useful for spotting:

  • Stack discontinuities — a primary site running on WordPress with one section running on a Django subdomain.
  • Multiple CMSes on the same domain — often migration artefacts.
  • Embedded third-party tools — admin areas, support widgets, embedded apps.

For full per-page detail, pivot to site analysis.
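A lightweight per-page fingerprint typically combines the `generator` meta tag with known stack markers in the HTML. A minimal sketch, with an illustrative marker table (not the tool's real detection rules):

```python
import re

# Marker substring -> technology; illustrative subset only
MARKERS = {
    "wp-content": "WordPress",
    "csrfmiddlewaretoken": "Django",
    "Drupal.settings": "Drupal",
    "__NEXT_DATA__": "Next.js",
}

GENERATOR_RE = re.compile(
    r'<meta[^>]+name=["\']generator["\'][^>]+content=["\']([^"\']+)', re.I)

def tech_hints(html):
    """Return the set of tech-stack hints found in one page's HTML."""
    hints = set()
    generator = GENERATOR_RE.search(html)
    if generator:
        hints.add(generator.group(1))
    for marker, tech in MARKERS.items():
        if marker in html:
            hints.add(tech)
    return hints
```

Comparing these per-page sets across the crawl is what exposes the discontinuities listed above: a page whose hint set suddenly differs from its neighbours marks a stack boundary.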

Pivots

| Click on... | Pivot to |
| --- | --- |
| Page URL | Site analysis, redirect analyzer, Wayback |
| Email | Email analyzer, password breach lookup, person investigation |
| Phone | Phone lookup |
| External domain | DNS, WHOIS, certificate transparency, site analysis |
| Form action URL | URL parser, redirect analyzer |
| JS secret pattern | (no pivot — copy and verify externally) |

Sources

  • Direct HTTP requests to each crawled URL (rate-limited, robots.txt-honouring by default).
  • The site's own sitemap.xml for auto-seed.
  • A bundled secret-pattern catalogue for the JS scan.

The crawler does not call any external API — it only fetches the target site itself.
