Web scraping - Claude skills for journalism

When to use

Three-tier fallback: Trafilatura (fast) to Requests (HTTP) to Playwright (JavaScript rendering with stealth).

Detect paywalls, CAPTCHAs, rate limits, Cloudflare, and login walls with pattern matching.

Find and use hidden APIs via browser dev tools, with examples for autocomplete endpoints.

yt-dlp for YouTube/TikTok, instaloader for Instagram, with metadata extraction and download patterns.

Try multiple extraction strategies with automatic fallback:

Fast

Lightweight extraction for standard articles. Best for news sites and blogs.

Medium

HTTP requests with rotating user agents. Good for static content.

Heavy

Full JavaScript rendering with anti-bot bypass. For SPAs and protected sites.

Type	Detection patterns
Paywall	"subscribe to continue", "you've reached your limit"
CAPTCHA	"verify you are human", "robot verification"
Rate limit	"too many requests", HTTP 429
Cloudflare	"checking your browser", "ddos protection"
Login required	"sign in to continue", "create an account"

# Recommended: install the dev-toolkit plugin

/plugin marketplace add jamditis/claude-skills-journalism

/plugin install dev-toolkit@claude-skills-journalism

# Or copy just this skill from the plugin tree

git clone https://github.com/jamditis/claude-skills-journalism.git

cp -r claude-skills-journalism/dev-toolkit/skills/web-scraping ~/.claude/skills/

Or browse this skill in the GitHub repository.

Web archiving Python pipeline Social media intelligence

Cascade architecture, poison pill detection, and social media tools in one skill.