# Captcha Solving & Free Tool Scraping

## CapSolver (PRIMARY)
- **API Key**: `CAP-3581EA483014F075D79F2796B99E2978BC89FD9FCE11E7C27B8DEDA61A65A51C`
- **Balance**: ~$6 (Apr 2026)
- **Endpoint**: `https://api.capsolver.com/createTask` / `https://api.capsolver.com/getTaskResult`
- **Task type**: `ReCaptchaV3EnterpriseTaskProxyLess`
- **Scores**: 0.1-0.4 (sufficient for SEMrush free tools)
- **Advantage over 2Captcha**: higher scores; works with SEO Checker (2Captcha does not)

## 2Captcha (BACKUP)
- **API Key**: `95ee103bd6a8e103e90ac2181d5895d1`
- **Balance**: ~$4.80 (Apr 2026)
- **Endpoint**: `https://2captcha.com/in.php` (submit) / `https://2captcha.com/res.php` (poll)
- **Limitation**: max score ~0.3; does not work with SEO Checker

## SEMrush Free Tools Scraping (ALL)
- **Shared sitekey**: `6LcwzpErAAAAALabrs4m1pOnRKkikqUUsc3jQ49j` (reCAPTCHA Enterprise)
- **Rate limit**: 3 checks/IP/day per tool
- **Bypass**: Webshare proxy rotation (50K IPs)
- **TLS**: Node.js is blocked (403); Python `requests` works → subprocess bridge
- **Script**: `/home/ubuntu/proyectos-cloud/hub-beepeek/apps/hub/server/services/semrush_scrape.py`
- **Bridge**: `/home/ubuntu/proyectos-cloud/hub-beepeek/apps/hub/server/services/semrush-scraper.js`

| Tool | API Endpoint | Input Field | Page URL | Status |
|------|-------------|-------------|----------|--------|
| authority | `/v2/website-authority-checker/` | `website` | `/free-tools/website-authority-checker/` | INTEGRATED |
| ai-visibility | `/v2/ai-visibility-checker/` | `domain` | `/free-tools/ai-search-visibility-checker/` | INTEGRATED |
| keyword | `/v2/keyword-checker/` | `keyword`+`country` | `/free-tools/keyword-checker/` | INTEGRATED |
| seo-checker | `/v2/seo-checker/` | `url` | `/siteaudit/` | INTEGRATED |

## SE Ranking Free Tool Scraping (HTTP-only, no captcha)
- **URL**: `https://seranking.com/domain-trust-checker.html`
- **API**: `POST https://seranking.com/wp-admin/admin-ajax.php` action=`se_backlinks_results`
- **Nonce**: extract from the HTML with a regex
- **Rate limit**: 5 checks/IP/day
- **Data**: backlinks, refdomains, domain_inlink_rank (Domain Trust), dofollow/nofollow

## Official SEMrush API (via shared accounts)
- **API Key activa**: `214581fc91d0e623e79e1bebfa4f7e6b` (Guru trial, user 29159276)
- **Base**: `https://api.semrush.com/?type=TYPE&key=KEY&domain=DOMAIN&database=DB`
- **Response format**: CSV delimited by `;`, near-instant response (<1s)
- **No captcha, no proxy, no attempt limits**
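
A minimal sketch of how a request URL following the Base pattern could be assembled and a `;`-delimited response parsed, using only the Python standard library. The helper names (`build_url`, `parse_report`), the `SEMRUSH_API_KEY` environment variable, and the sample response are assumptions for illustration and are not taken from the actual integration; the sample rows are invented and no request is sent.

```python
import csv
import io
import os
from urllib.parse import urlencode

BASE_URL = "https://api.semrush.com/"


def build_url(report_type, domain, database, key):
    """Assemble a request URL following the Base pattern above."""
    params = {
        "type": report_type,
        "key": key,
        "domain": domain,
        "database": database,
    }
    return BASE_URL + "?" + urlencode(params)


def parse_report(text):
    """Parse a ';'-delimited response into a list of dicts keyed by header."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return list(reader)


# Hypothetical key source: reading from the environment keeps the key out of code.
api_key = os.environ.get("SEMRUSH_API_KEY", "YOUR_KEY")
url = build_url("domain_ranks", "example.com", "us", api_key)

# Invented sample response, only to demonstrate the parsing step.
sample = "Keyword;Position;Search Volume\nexample one;3;1200\nexample two;7;450\n"
rows = parse_report(sample)
print(rows[0]["Keyword"], rows[0]["Position"])
```

All parsed values are strings; numeric columns would need explicit conversion before any arithmetic.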

### Working endpoints
| Type | Returns | Cost |
|------|-------------|-------|
| `domain_organic` | **Ranking keywords** (position, volume, CPC, URL, traffic) | OK |
| `domain_ranks` | Summary: rank, total keywords, traffic, organic cost | OK |
| `phrase_this` | Keyword overview (volume, CPC, competition) | OK |
| `phrase_related` | Related keywords | OK |
| `url_organic` | Keywords for a specific URL | OK |
| `domain_organic_organic` (with competitors export_columns) | Organic competitors | OK |

### Endpoints that do NOT work (no units)
- `domain_organic_organic` (overview) - API UNITS BALANCE ZERO
- `backlinks_overview` - not found

### Useful columns (export_columns)
- `Ph` = Keyword, `Po` = Position, `Nq` = Search Volume, `Cp` = CPC
- `Ur` = URL, `Tr` = Traffic %, `Tc` = Traffic Cost %, `Co` = Competition
- `Nr` = Number of Results, `Td` = Trends (12 months)
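
One way to avoid memorising the two-letter codes is a small lookup that turns readable names into an `export_columns` value. This is a sketch only: the `COLUMN_CODES` dict and the `export_columns` helper are hypothetical names, and it assumes the parameter takes a comma-separated list of codes.

```python
# Column codes copied from the list above.
COLUMN_CODES = {
    "Keyword": "Ph",
    "Position": "Po",
    "Search Volume": "Nq",
    "CPC": "Cp",
    "URL": "Ur",
    "Traffic %": "Tr",
    "Traffic Cost %": "Tc",
    "Competition": "Co",
    "Number of Results": "Nr",
    "Trends": "Td",
}


def export_columns(names):
    """Return a comma-separated code string for the given readable names."""
    unknown = [name for name in names if name not in COLUMN_CODES]
    if unknown:
        # Fail early instead of sending a request with a bad column list.
        raise KeyError(f"Unknown column name(s): {unknown}")
    return ",".join(COLUMN_CODES[name] for name in names)


print(export_columns(["Keyword", "Position", "Search Volume"]))  # Ph,Po,Nq
```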

### Account rotation (jonyonlinecash.com)
- **URL**: `https://jonyonlinecash.com/herramienta-semrush2/`
- **Login**: `tumejorfisico100@gmail.com` / `01091988aA1º*`
- **4 SEMrush accounts** with cookies embedded in the HTML
- **Flow when the key fails**: log in to jony → parse cookies → Patchright (VPS) → load cookies into the browser → navigate to SEMrush → extract `window.sm2.user.api_key`
- **Problem**: Python requests gets a 403 on SEMrush (TLS fingerprinting); Patchright is required
- **PENDING**: automatic rotation script (build it when the current key fails)

## Authority Tab Strategy
- **Primary**: SE Ranking (free, no captcha, HTTP-only)
- **Secondary/complementary**: SEMrush Authority Checker (with CapSolver + proxy)
- **DR History**: Ahrefs API `/domain-rating-history`
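
The primary/secondary ordering above could be expressed as a simple fallback chain. The sketch below is an assumption about how the tab might pick a source, not the existing implementation: `first_available` and both stub fetchers are hypothetical, and it reads "secondary" as a fallback. If the two sources are meant to be merged rather than tried in order, different logic would apply.

```python
def first_available(domain, providers):
    """Try each (name, fetch) pair in order; return the first successful result."""
    errors = {}
    for name, fetch in providers:
        try:
            return name, fetch(domain)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors[name] = str(exc)
    raise RuntimeError(f"All providers failed for {domain}: {errors}")


# Stub fetchers standing in for the real integrations (hypothetical).
def se_ranking_stub(domain):
    raise TimeoutError("daily limit reached")


def semrush_stub(domain):
    return {"domain": domain, "source": "semrush"}


source, data = first_available(
    "example.com",
    [("se_ranking", se_ranking_stub), ("semrush", semrush_stub)],
)
print(source)  # semrush
```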