katana
Fast web crawler from ProjectDiscovery for collecting URLs and endpoints.
Quickstart
katana -u https://target.com
katana -u https://target.com -headless
katana -list urls.txt
katana -u https://target.com -silent | nuclei -silent
Core Concepts
| Concept | Description |
| --- | --- |
| Crawling | Follow links and discover endpoints |
| Headless | Use a browser for JS-heavy sites |
| Scope | Control what gets crawled |
| Passive | Collect URLs from third-party sources without requesting the target |
Syntax
katana -u <url> [options]
katana -list <file> [options]
Options
| Option | Description |
| --- | --- |
| -u <url> | Single URL |
| -list <file> | URL list |
| - | Read from stdin |
| -resume <file> | Resume from file |
Crawling
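| Option | Description |
| --- | --- |
| -d <n> | Max depth (default 3) |
| -jc | Parse and crawl endpoints in JS files |
| -ct <sec> | Maximum crawl duration |
| -kf <type> | Crawl known files (all, robotstxt, sitemapxml) |
| -ef <ext> | Exclude these extensions from output |
| -em <ext> | Match only these extensions in output |
| -fs <field> | Field scope (dn, rdn, fqdn, or a custom regex) |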
Headless
| Option | Description |
| --- | --- |
| -headless, -hl | Enable headless browser crawling |
| -sc | Use the locally installed Chrome |
| -xhr | Extract XHR requests |
| -ws | Extract WebSocket URLs |
Scope
| Option | Description |
| --- | --- |
| -fs <field> | Field scope (dn, rdn, fqdn, or a custom regex) |
| -cs <regex> | In-scope URL regex the crawler follows |
| -cos <regex> | Out-of-scope URL regex the crawler skips |
| -do | Display out-of-scope URLs |
Output
| Option | Description |
| --- | --- |
| -o <file> | Output file |
| -json | JSON Lines output (-jsonl in newer releases) |
| -silent | Silent mode (results only) |
| -nc | No color |
| -v | Verbose |
Performance
| Option | Description |
| --- | --- |
| -c <n> | Concurrency (default 10) |
| -p <n> | Parallelism |
| -rl <n> | Rate limit (requests per second) |
| -timeout <sec> | Request timeout |
| -retry <n> | Retries |
Request
| Option | Description |
| --- | --- |
| -H "Header: val" | Custom header |
| -proxy <url> | HTTP proxy |
| -xhr | XHR extraction |
Recipes
Basic Crawling
katana -u https://target.com
katana -u https://target.com -d 5
katana -u https://target.com -silent
katana -list urls.txt -silent
JS-Heavy Sites
katana -u https://target.com -headless
katana -u https://target.com -headless -xhr
katana -u https://target.com -headless -sc
Endpoint Discovery
katana -u https://target.com -jc
katana -u https://target.com -kf all
katana -u https://target.com -f qurl
Scope Control
katana -u https://target.com -fs dn
katana -u https://target.com -fs rdn
katana -u https://target.com -ef png,jpg,gif,css,woff
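Scope flags only govern what katana itself follows. When its output is later merged with URLs from other tools, a root-domain check can be re-applied as a post-filter. The sketch below is one way to do that, assuming one absolute URL per line on stdin and a lowercase root domain; in_root_scope is a hypothetical helper name, not a katana feature.

```shell
# Keep only URLs whose host is the given root domain or one of its subdomains.
# Hypothetical helper, not part of katana; expects one absolute URL per line.
in_root_scope() {
  awk -v root="$1" '
    {
      host = $0
      sub("^[a-zA-Z][a-zA-Z0-9+.-]*://", "", host)  # drop scheme
      sub("[/?#].*$", "", host)                      # drop path, query, fragment
      sub("^.*@", "", host)                          # drop userinfo
      sub(":[0-9]+$", "", host)                      # drop port
      host = tolower(host)
      n = length(host); m = length(root)
      # exact match, or host ends with "." followed by the root domain
      if (host == root || (n > m && substr(host, n - m) == "." root)) print $0
    }'
}
```

Usage would look like: katana -u https://target.com -silent | in_root_scope target.com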
Pipeline Integration
katana -u https://target.com -silent | nuclei -silent
subfinder -d target.com -silent | httpx -silent | katana -silent
katana -u https://target.com -silent | gf xss
katana -u https://target.com -silent | grep "?" | sort -u
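The grep and sort -u recipe above still keeps one line for every distinct parameter value, which can flood later tools with near-duplicates. A rough sketch of a stricter filter follows: it keeps the first URL seen for each combination of path and parameter names. uniq_params is a hypothetical helper, and it assumes one URL per line with &-separated query parameters.

```shell
# Keep the first URL for each distinct (path, set of parameter names).
# Hypothetical helper; URLs without a query string are dropped.
uniq_params() {
  awk '
    {
      url = $0
      sub("#.*$", "", url)                 # ignore fragments
      q = index(url, "?")
      if (q == 0) next                     # no query string
      base = substr(url, 1, q - 1)
      n = split(substr(url, q + 1), pairs, "&")
      for (i = 1; i <= n; i++) {           # keep names, discard values
        names[i] = pairs[i]
        sub("=.*$", "", names[i])
      }
      for (i = 2; i <= n; i++) {           # insertion sort so order is irrelevant
        v = names[i]
        for (j = i - 1; j >= 1 && names[j] > v; j--) names[j + 1] = names[j]
        names[j + 1] = v
      }
      key = base "?"
      for (i = 1; i <= n; i++) key = key names[i] "&"
      if (!(key in seen)) { seen[key] = 1; print $0 }
    }'
}
```

Usage would look like: katana -u https://target.com -silent | uniq_params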
API Endpoint Discovery
katana -u https://target.com -silent | grep -E "/api/|/v[0-9]/"
katana -u https://target.com -json -o crawl.json
katana -u https://target.com -silent | \
sed 's/\?.*//' | sort -u
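Stripping the query string, as above, still leaves /api/users/1 and /api/users/2 as separate lines. The sketch below goes one step further and collapses numeric and UUID path segments into a placeholder so that each API route shape appears once. api_shapes and the {id} placeholder are illustrative assumptions, not katana output.

```shell
# Collapse API URLs that differ only in numeric or UUID path segments.
# Hypothetical helper; back-to-back ID segments (/1/2) would need a second pass.
api_shapes() {
  sed -E \
    -e 's/[?#].*$//' \
    -e 's#/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(/|$)#/{id}\1#g' \
    -e 's#/[0-9]+(/|$)#/{id}\1#g' |
    grep -E '/api/|/v[0-9]+/' |
    sort -u
}
```

Usage would look like: katana -u https://target.com -silent | api_shapes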
Through Proxy
katana -u https://target.com -proxy http://127.0.0.1:8080
katana -u https://target.com -H "Authorization: Bearer token"
Output & Parsing
katana -u https://target.com -json -o results.json
jq -r '.request.endpoint' results.json
katana -u https://target.com -silent | sort -u > endpoints.txt
katana -u https://target.com -silent | grep -E "\.(php|asp|jsp)"
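Before grepping for specific extensions, it can help to see which ones a crawl actually returned. The following is a minimal sketch that tallies URLs by file extension, most common first. ext_count is a hypothetical helper; it assumes one URL per line and groups extensionless paths under "(none)".

```shell
# Count crawled URLs by file extension, most common first.
# Hypothetical helper; extensions are lowercased before counting.
ext_count() {
  awk '
    {
      path = $0
      sub("[?#].*$", "", path)                           # drop query and fragment
      sub("^[a-zA-Z][a-zA-Z0-9+.-]*://[^/]*", "", path)  # drop scheme and host
      n = split(path, seg, "/")
      file = seg[n]                                      # last path segment
      ext = "(none)"
      if (match(file, "\\.[A-Za-z0-9]+$")) ext = tolower(substr(file, RSTART + 1))
      count[ext]++
    }
    END { for (e in count) print count[e], e }' |
    LC_ALL=C sort -k1,1nr -k2,2
}
```

Usage would look like: katana -u https://target.com -silent | ext_count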
Troubleshooting
| Issue | Solution |
| --- | --- |
| Missing JS endpoints | Use -headless, or -jc to parse JS files |
| Too slow | Reduce -d, increase -c |
| Stuck on site | Cap crawl time with -ct |
| Scope issues | Check the -fs and -cs settings |