Bright Data Google Search¶
This document describes how to use the Bright Data SERP API for Google searches.
Overview¶
By integrating Bright Data's SERP API, we can bypass Google's anti-bot mechanisms and get structured search results reliably.
Class Overview¶
BrightDataSearch - A class that provides Bright Data Google search functionality
Initialization Parameters¶
- api_keys: Optional[str] = None - Comma-separated Bright Data API tokens. If not provided, the class tries to read them from the BRIGHTDATA_API_KEY environment variable
- zone: Optional[str] = None - Bright Data zone name (default: "mcp_unlocker")
- rate_limit_delay: float = 1.0 - Delay between requests in seconds to avoid rate limits
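To make the api_keys fallback concrete, here is a minimal sketch of how a comma-separated token string might be resolved, with BRIGHTDATA_API_KEY used when no argument is passed. The helper name resolve_api_keys is hypothetical and is not part of toolregistry_hub; the library's own parsing may differ.

```python
import os
from typing import List, Optional


def resolve_api_keys(api_keys: Optional[str] = None) -> List[str]:
    """Hypothetical helper: split comma-separated tokens, falling back to the env var."""
    raw = api_keys if api_keys is not None else os.environ.get("BRIGHTDATA_API_KEY", "")
    # Strip whitespace and drop empty entries such as those left by a trailing comma.
    keys = [token.strip() for token in raw.split(",") if token.strip()]
    if not keys:
        raise ValueError("No Bright Data API token provided")
    return keys
```

Raising ValueError for a missing token is an assumption chosen to match the "Configuration error" branch in the error handling example later in this document.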
Architecture¶
The Bright Data search implementation uses a universal Google result parser (GoogleResultParser) that:
- Handles variations in API response formats
- Provides consistent result scoring based on search position
- Simplifies maintenance and reduces code duplication
- Enables easy integration of new Google search providers
Configuration¶
1. Get API Token¶
- Visit Bright Data and register an account
- Create an API Token in the console
- (Optional) Create or use an existing Web Unlocker Zone
2. Set Environment Variables¶
# Required: API Token
export BRIGHTDATA_API_KEY="your_api_token_here"
# Optional: Custom Zone (default is mcp_unlocker)
export BRIGHTDATA_ZONE="your_zone_name"
Alternatively, set the same variables in a .env file.
Free Tier¶
Bright Data offers a free tier with the following allowances:
- 5,000 free queries per month upon registration
- No personal information verification or detailed documentation required
- Token box appears in the bottom right corner after registration
- Immediately available, no approval waiting required
Maximizing Free Usage¶
- Multi-Token Rotation: Use multiple API tokens for load balancing
- Plan Search Frequency: Control search frequency wisely to avoid wasting quota
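This page does not specify how BrightDataSearch distributes requests across tokens. As one plausible strategy, the sketch below rotates tokens in round-robin order; the function name token_rotation is hypothetical.

```python
from itertools import cycle
from typing import Iterator, List


def token_rotation(tokens: List[str]) -> Iterator[str]:
    """Return an iterator that yields the tokens in round-robin order, indefinitely."""
    if not tokens:
        raise ValueError("At least one token is required")
    return cycle(tokens)


rotation = token_rotation(["token1", "token2", "token3"])
# Take four tokens: the fourth wraps around to the first one again.
first_four = [next(rotation) for _ in range(4)]
```

With three tokens, every third request reuses the same token, so each token's monthly quota is consumed at roughly one third of the overall rate.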
Free Tier Policy
All free tier information may be subject to provider policy changes. Information is accurate at the time of writing.
Usage¶
Python API¶
from toolregistry_hub.websearch import BrightDataSearch
# Initialize search client
search = BrightDataSearch()
# Basic search
results = search.search("python web scraping", max_results=10)
for result in results:
    print(f"Title: {result.title}")
    print(f"URL: {result.url}")
    print(f"Content: {result.content[:200]}...")
    print(f"Score: {result.score}")  # Score based on search position
    print("-" * 50)
Using Multiple API Keys¶
```python
from toolregistry_hub.websearch import BrightDataSearch

# Create search instance with multiple API keys for load balancing
api_keys = "token1,token2,token3"
search = BrightDataSearch(api_keys=api_keys)

# Execute search
results = search.search("machine learning tutorial", max_results=10)

# Process search results
for result in results:
    print(f"Title: {result.title}")
    print(f"URL: {result.url}")
    print(f"Content: {result.content[:200]}...")
    print("-" * 50)
```
# Paginated search (get page 2 results)
results_page2 = search.search(
    "artificial intelligence",
    max_results=10,
    cursor="1",  # Page numbers start from 0, so "1" is the second page
)
# Custom timeout
results = search.search(
    "machine learning",
    max_results=5,
    timeout=30.0,
)
REST API¶
POST /tools/web/brightdata_search/search
Request Example¶
```bash
curl -X POST "http://localhost:8000/tools/web/brightdata_search/search" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_auth_token" \
  -d '{
    "query": "python web scraping",
    "max_results": 10,
    "timeout": 10.0,
    "cursor": "0"
  }'
```
Request Parameters¶
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | - | Search query string |
| max_results | integer | No | 5 | Number of results to return (1-20) |
| timeout | float | No | 10.0 | Request timeout in seconds |
| cursor | string | No | "0" | Pagination cursor (page number, starts from 0) |
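The same request can be assembled from Python. The sketch below uses only the standard library to build, but not send, a POST request with the parameters and defaults from the table above. The function name build_search_request and the base URL argument are illustrative assumptions; the endpoint path and headers are taken from the curl example.

```python
import json
import urllib.request


def build_search_request(base_url, auth_token, query,
                         max_results=5, timeout=10.0, cursor="0"):
    """Hypothetical helper: build the POST request for the search endpoint."""
    # Enforce the documented 1-20 range before any network traffic happens.
    if not 1 <= max_results <= 20:
        raise ValueError("max_results must be between 1 and 20")
    payload = {
        "query": query,
        "max_results": max_results,
        "timeout": timeout,
        "cursor": cursor,
    }
    return urllib.request.Request(
        url=f"{base_url}/tools/web/brightdata_search/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {auth_token}",
        },
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen against a running server, for example one listening on http://localhost:8000 as in the curl example.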
Response Example¶
{
  "results": [
    {
      "title": "Python Web Scraping Tutorial",
      "url": "https://example.com/tutorial",
      "content": "Learn how to scrape websites using Python...",
      "score": 0.95
    },
    {
      "title": "Best Python Scraping Libraries",
      "url": "https://example.com/libraries",
      "content": "A comprehensive guide to Python scraping tools...",
      "score": 0.9
    }
  ]
}
Note: The score field reflects the result's position in the search results: results closer to the top receive higher scores.
Advanced Usage¶
Batch Search¶
from toolregistry_hub.websearch import BrightDataSearch
search = BrightDataSearch()
queries = ["python", "javascript", "golang"]
all_results = []
for query in queries:
    results = search.search(query, max_results=5)
    all_results.extend(results)
print(f"Total results retrieved: {len(all_results)}")
Deep Search (Multiple Pages)¶
from toolregistry_hub.websearch import BrightDataSearch
search = BrightDataSearch()
# Get first 50 results (automatic pagination)
results = search.search("machine learning", max_results=50)
# Or manually control pagination
all_results = []
for page in range(3):  # Get first 3 pages
    results = search.search(
        "deep learning",
        max_results=20,
        cursor=str(page)
    )
    all_results.extend(results)
Custom Configuration¶
from toolregistry_hub.websearch import BrightDataSearch
# Use custom configuration
search = BrightDataSearch(
    api_keys="your_custom_token",
    zone="custom_zone_name",
    rate_limit_delay=2.0  # 2 seconds delay between requests
)
results = search.search("custom query")
Result Scoring¶
Results are scored based on their position in search results:
- Position 1: score = 0.95
- Position 2: score = 0.90
- Position 3: score = 0.85
- And so on...
The scoring formula is: score = 1.0 - (position * 0.05), clamped between 0.0 and 1.0.
This provides a more accurate representation of result relevance compared to a fixed score.
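The formula can be written out as a short function. This is a sketch rather than the parser's actual code: it assumes position is 1-based, which is what the listed values imply, and it rounds to two decimals to suppress floating-point noise. The name position_score is hypothetical.

```python
def position_score(position: int) -> float:
    """Compute score = 1.0 - position * 0.05, clamped to the range [0.0, 1.0]."""
    raw = 1.0 - position * 0.05
    clamped = max(0.0, min(1.0, raw))
    # Rounding is an added convenience, not part of the documented formula.
    return round(clamped, 2)


scores = [position_score(p) for p in (1, 2, 3)]
```

Under this formula the score reaches 0.0 at position 20, and the clamp keeps it at 0.0 for any later position.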
Zone Explanation¶
Zone is a core concept in Bright Data, similar to a "proxy pool" or "service instance":
- Each Zone has independent quota and billing
- The default is the mcp_unlocker zone (Web Unlocker type)
- Can be customized via the BRIGHTDATA_ZONE environment variable
- Auto-creation: If the zone doesn't exist, the system will automatically create it (requires a valid API key)
Zone Auto-creation¶
When you initialize BrightDataSearch, the system will:
- Check if the specified zone exists
- If not, automatically create a Web Unlocker type zone
- If creation fails, log a warning but continue running (zone may be created by Bright Data on first use)
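The three steps above can be sketched as a small best-effort routine. The callables zone_exists and create_zone are placeholders for whatever Bright Data API calls the library actually makes, which this page does not describe; only the control flow is illustrated.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def ensure_zone(zone: str,
                zone_exists: Callable[[str], bool],
                create_zone: Callable[[str], None]) -> bool:
    """Return True if the zone exists or was created, False if creation failed."""
    if zone_exists(zone):
        return True
    try:
        create_zone(zone)
        return True
    except Exception as exc:
        # Creation is best-effort: log a warning and let the caller continue.
        logger.warning("Could not create zone %r: %s", zone, exc)
        return False
```

Returning False instead of raising mirrors the "log a warning but continue running" behavior described in the last step.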
You can also manually create a Zone:
- Log in to Bright Data Console
- Click the "Add" button
- Select "Unlocker zone"
- Enter zone name and create
- Set the zone name in environment variables
Error Handling¶
Common Errors¶
1. Authentication Failed (401)¶
Solution: Check if the API token is correctly set.
2. Zone Not Found (422)¶
Solution: Create the zone in Bright Data console, or use the default mcp_unlocker.
3. Rate Limit (429)¶
Solution: Increase the rate_limit_delay parameter value.
4. Timeout Error¶
Solution: Increase the timeout parameter value.
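For logging or diagnostics, the status codes above can be collected into a lookup table. This is only an illustrative sketch: the names ERROR_HINTS and hint_for_status are hypothetical, and the page does not say whether BrightDataSearch exposes HTTP status codes to callers.

```python
from typing import Optional

# Hints copied from the solutions listed above, keyed by HTTP status code.
ERROR_HINTS = {
    401: "Check that the API token is set correctly.",
    422: "Create the zone in the Bright Data console, or use the default mcp_unlocker.",
    429: "Increase the rate_limit_delay parameter value.",
}


def hint_for_status(status_code: int) -> Optional[str]:
    """Return the documented hint for a status code, or None if there is none."""
    return ERROR_HINTS.get(status_code)
```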
Error Handling Example¶
from toolregistry_hub.websearch import BrightDataSearch
try:
    search = BrightDataSearch()
    results = search.search("test query")
    if not results:
        print("No results found or error occurred")
    else:
        for result in results:
            print(f"{result.title} (score: {result.score})")
except ValueError as e:
    print(f"Configuration error: {e}")
except Exception as e:
    print(f"Search failed: {e}")
Performance Optimization¶
1. Rate Limiting¶
2. Timeout Settings¶
3. Batch Processing¶
Limitations¶
- Maximum 20 results per request
- Maximum 180 results total (via pagination)
- Subject to Bright Data account quota limits
- Only supports Google search (no Bing, Yandex)
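To show how the first two limits interact, the sketch below splits a requested result count into per-request sizes, capping each request at 20 and the total at 180. It is an arithmetic illustration only; the helper plan_pages is hypothetical and the library's own pagination logic is not described here.

```python
from typing import List

MAX_PER_REQUEST = 20  # per-request cap stated above
MAX_TOTAL = 180       # overall cap stated above


def plan_pages(max_results: int) -> List[int]:
    """Split a requested result count into per-request sizes within the limits."""
    remaining = max(0, min(max_results, MAX_TOTAL))
    sizes = []
    while remaining > 0:
        size = min(remaining, MAX_PER_REQUEST)
        sizes.append(size)
        remaining -= size
    return sizes


plan = plan_pages(50)
```

A request for 50 results is split into sizes 20, 20, and 10, and any request above 180 is capped at nine requests of 20 results each.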
Testing¶
Run tests:
# Run all Bright Data tests
pytest tests/websearch/test_websearch_brightdata.py -v
# Run specific test
pytest tests/websearch/test_websearch_brightdata.py::TestBrightDataGoogleSearch::test_search_basic -v
# Run debug tests to see raw API responses
python tests/websearch/test_debug_google_apis.py
Technical Details¶
Universal Parser¶
Bright Data search uses the GoogleResultParser with the following configuration:
BRIGHTDATA_CONFIG = GoogleAPIConfig(
    results_key="organic",
    url_keys=["link", "url"],
    description_keys=["description", "snippet"],
    position_key="rank",
    use_position_scoring=True,
)
This configuration tells the parser:
- Where to find organic results in the API response
- Which fields to check for URLs (in priority order)
- Which fields to check for descriptions
- How to calculate relevance scores
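The "priority order" lookup can be illustrated with a small stand-alone sketch. It is not the GoogleResultParser implementation: the functions first_present and extract_results are hypothetical, and only the key names come from the configuration above.

```python
from typing import Any, Dict, List, Optional, Sequence


def first_present(item: Dict[str, Any], keys: Sequence[str]) -> Optional[Any]:
    """Return the value of the first key in `keys` that holds a non-empty value."""
    for key in keys:
        value = item.get(key)
        if value:
            return value
    return None


def extract_results(response: Dict[str, Any],
                    results_key: str = "organic",
                    url_keys: Sequence[str] = ("link", "url"),
                    description_keys: Sequence[str] = ("description", "snippet"),
                    ) -> List[Dict[str, str]]:
    """Collect url/content pairs, skipping entries that have no URL field."""
    rows = []
    for item in response.get(results_key, []):
        url = first_present(item, url_keys)
        if url is None:
            continue
        rows.append({"url": url, "content": first_present(item, description_keys) or ""})
    return rows


sample = {
    "organic": [
        {"link": "https://example.com/a", "description": "first", "rank": 1},
        {"url": "https://example.com/b", "snippet": "second", "rank": 2},
        {"title": "entry without a URL"},
    ]
}
rows = extract_results(sample)
```

Checking several candidate keys in order is what lets one parser accept responses that name the same field differently, which is the "variations in API response formats" point from the Architecture section.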
Related Resources¶
- Bright Data Official Documentation
- Bright Data API Reference
- Bright Data Console
- Universal Google Parser Documentation
License¶
This integration follows the project's MIT license. Using Bright Data services requires compliance with their terms of service.