Apify
Social media and web scraping powered by Apify actors.
Provider: Apify.com
Authentication: API key required
Category: Social Media & Web Scraping
Credit Cost: 5 credits per actor run
Overview
Apify tools provide access to powerful scraping actors for social media platforms and specialized websites. These actors handle complex scraping challenges including anti-bot protection, rate limiting, and data normalization.
Setup
Get Apify API Key
- Sign up at apify.com
- Navigate to Settings → Integrations
- Copy your API token
- Add to Reeva:
- Dashboard → Accounts → Add Account
- Select Apify
- Paste API key
- Save
Available Tools
Reddit Scraper
Scrape Reddit posts, comments, communities, and user profiles.
Tool ID: apify_Reddit_Scraper
Credit Cost: 5 credits
Apify Actor: reddit-scraper-lite
Parameters:
- start_urls (array, optional): Direct URLs to Reddit posts or profiles
- searches (array, optional): Search terms to find posts
- subreddits (array, optional): Subreddit names to scrape
- users (array, optional): User profiles to scrape
- search_posts (boolean, optional): Include posts in search. Default: true
- search_comments (boolean, optional): Include comments in search. Default: false
- sort (string, optional): Sort order (new, hot, top, rising). Default: "new"
- time (string, optional): Time filter (hour, day, week, month, year, all). Default: "all"
- max_items (integer, optional): Maximum items to scrape. Default: 50
Example Usage:
# Python - Scrape subreddit
response = client.call_tool(
    name="apify_Reddit_Scraper",
    arguments={
        "subreddits": ["machinelearning"],
        "sort": "top",
        "time": "week",
        "max_items": 25
    }
)

for post in response["items"]:
    print(f"{post['title']} - {post['upvotes']} upvotes")
// TypeScript - Search Reddit
const response = await client.callTool({
  name: "apify_Reddit_Scraper",
  arguments: {
    searches: ["MCP protocol"],
    search_posts: true,
    max_items: 10
  }
});
Use Cases:
- Market research and sentiment analysis
- Track brand mentions
- Gather community feedback
- Monitor competitor discussions
- Research trends and topics
Tweet Scraper
Extract tweets via searches, handles, or direct URLs.
Tool ID: apify_Tweet_Scraper
Credit Cost: 5 credits
Apify Actor: tweet-scraper
Status: Currently inactive
Parameters:
- start_urls (array, optional): Direct tweet URLs
- search_terms (array, optional): Keywords to search for
- twitter_handles (array, optional): User handles to scrape
- conversation_ids (array, optional): Conversation thread IDs
- max_items (integer, optional): Maximum tweets to scrape
- sort (string, optional): Sort order
- tweet_language (string, optional): Filter by language code (e.g., "en")
- only_verified_users (boolean, optional): Only verified accounts
- only_twitter_blue (boolean, optional): Only Twitter Blue subscribers
- only_image (boolean, optional): Only tweets with images
- only_video (boolean, optional): Only tweets with videos
- only_quote (boolean, optional): Only quote tweets
- minimum_retweets (integer, optional): Minimum retweet count
- minimum_favorites (integer, optional): Minimum like count
- minimum_replies (integer, optional): Minimum reply count
Example Usage:
# Python - Monitor brand mentions
response = client.call_tool(
    name="apify_Tweet_Scraper",
    arguments={
        "search_terms": ["@YourBrand"],
        "max_items": 50,
        "sort": "Latest"
    }
)
Use Cases:
- Social media monitoring
- Influencer analysis
- Sentiment tracking
- Competitive intelligence
- Trend analysis
YouTube Transcripts
Extract video transcripts and metadata from YouTube.
Tool ID: apify_Scrape_YouTube_Transcripts
Credit Cost: 5 credits
Apify Actor: youtube-transcripts
Parameters:
- urls (array, required): YouTube video URLs
- output_format (string, optional): Format for transcripts. Default: "captions"
- max_retries (integer, optional): Retry attempts for failed requests. Default: 8
- include_channel_name (boolean, optional): Include channel name. Default: true
- include_channel_id (boolean, optional): Include channel ID. Default: true
- include_date_published (boolean, optional): Include publish date. Default: true
- include_view_count (boolean, optional): Include view count. Default: false
- include_likes (boolean, optional): Include likes count. Default: false
- include_comments (boolean, optional): Include comments. Default: false
- include_keywords (boolean, optional): Include video keywords. Default: false
- include_thumbnail (boolean, optional): Include thumbnail URL. Default: false
- include_description (boolean, optional): Include video description. Default: false
Response:
{
  "items": [
    {
      "video_id": "dQw4w9WgXcQ",
      "title": "Video Title",
      "channel_name": "Channel Name",
      "channel_id": "UCxxxxxxxx",
      "date_published": "2025-11-22",
      "transcript": "Full transcript text...",
      "duration": "3:45"
    }
  ]
}
Example Usage:
# Python - Get video transcript
response = client.call_tool(
    name="apify_Scrape_YouTube_Transcripts",
    arguments={
        "urls": ["https://www.youtube.com/watch?v=VIDEO_ID"],
        "include_description": True,
        "include_view_count": True
    }
)

transcript = response["items"][0]["transcript"]
print(f"Transcript length: {len(transcript)} characters")
// TypeScript - Batch transcript extraction
const videoUrls = [
  "https://www.youtube.com/watch?v=VIDEO1",
  "https://www.youtube.com/watch?v=VIDEO2",
  "https://www.youtube.com/watch?v=VIDEO3"
];

const response = await client.callTool({
  name: "apify_Scrape_YouTube_Transcripts",
  arguments: {
    urls: videoUrls,
    include_keywords: true,
    include_thumbnail: true
  }
});
Use Cases:
- Content analysis and summarization
- SEO research and keyword extraction
- Educational content processing
- Accessibility (add captions to videos)
- Research and fact-checking
Zillow Scraper
Extract real estate property details from Zillow.
Tool ID: apify_Zillow_Scraper
Credit Cost: 5 credits
Apify Actor: zillow-detail-scraper
Parameters:
- startUrls (array, optional): Direct Zillow property URLs. Format: https://www.zillow.com/homedetails/Address/12345678_zpid/
- addresses (array, optional): Property addresses. Format: "123 Main St, City, State"
- propertyStatus (string, optional): Property status filter. Options: "FOR_SALE", "RECENTLY_SOLD", "FOR_RENT". Default: "RECENTLY_SOLD"
- extractBuildingUnits (string, optional): Extract individual units from buildings. Options: "disabled", "all", "for_sale", "recently_sold", "for_rent", "off_market". Default: "disabled"
Response:
{
  "items": [
    {
      "address": "123 Main St, San Francisco, CA 94102",
      "zpid": "12345678",
      "price": 1250000,
      "bedrooms": 3,
      "bathrooms": 2.5,
      "sqft": 1800,
      "lot_size": 4500,
      "year_built": 2010,
      "property_type": "Single Family",
      "listing_url": "https://www.zillow.com/...",
      "images": ["url1", "url2"],
      "description": "Beautiful home..."
    }
  ]
}
Example Usage:
# Python - Scrape by address
response = client.call_tool(
    name="apify_Zillow_Scraper",
    arguments={
        "addresses": [
            "123 Main St, San Francisco, CA",
            "456 Oak Ave, Los Angeles, CA"
        ],
        "propertyStatus": "FOR_SALE"
    }
)

for prop in response["items"]:
    print(f"{prop['address']}: ${prop['price']:,}")
// TypeScript - Scrape by URLs
const response = await client.callTool({
  name: "apify_Zillow_Scraper",
  arguments: {
    startUrls: [
      { url: "https://www.zillow.com/homedetails/Address/12345678_zpid/" }
    ],
    propertyStatus: "RECENTLY_SOLD"
  }
});
Use Cases:
- Real estate market analysis
- Property investment research
- Comparative market analysis (CMA)
- Price trend monitoring
- Rental market research
Cost Management
Understanding Apify Credits
- Apify charges separately for actor runs (consumes Apify credits)
- Reeva charges 5 credits per tool execution
- Total cost = Reeva credits (5) + Apify credits (varies)
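As a rough sketch, the cost formula above can be turned into a small estimator. The Reeva side is fixed at 5 credits per run; the Apify-side figure varies by actor and is a placeholder you would read from your own Apify dashboard:

```python
# Reeva's per-run charge, as documented above.
REEVA_CREDITS_PER_RUN = 5

def estimate_total_credits(num_runs: int, apify_credits_per_run: float) -> float:
    """Total cost = Reeva credits (5 per run) + Apify credits (varies by actor)."""
    return num_runs * (REEVA_CREDITS_PER_RUN + apify_credits_per_run)

# e.g. 10 actor runs at roughly 3 Apify credits each
print(estimate_total_credits(10, 3))  # 80
```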
Optimization Tips
- Batch Requests: Scrape multiple items in one call
  - Use max_items to control volume
  - Pass multiple URLs/searches in a single request
- Filter Early: Use parameters to reduce data
  - only_verified_users for Twitter
  - propertyStatus for Zillow
  - Time filters for Reddit
- Cache Results: Store scraped data to avoid re-scraping
  - Use Supabase or Notion to cache
  - Check cache before making new requests
- Monitor Usage: Track Apify actor consumption
  - Review Apify dashboard regularly
  - Set up usage alerts
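The caching tip can be sketched as a thin wrapper around `client.call_tool` (the call shape follows the examples above). This version uses an in-memory dict for illustration; in practice you would persist the cache in Supabase or Notion as suggested:

```python
import hashlib
import json

# Minimal in-memory cache keyed by tool name + arguments.
# Swap this dict for a Supabase/Notion-backed store in production.
_cache: dict = {}

def cache_key(name: str, arguments: dict) -> str:
    """Stable key derived from the tool name and its arguments."""
    payload = json.dumps({"name": name, "args": arguments}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def call_with_cache(client, name: str, arguments: dict) -> dict:
    """Check the cache before spending credits on a fresh scrape."""
    key = cache_key(name, arguments)
    if key not in _cache:
        _cache[key] = client.call_tool(name=name, arguments=arguments)
    return _cache[key]
```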
Best Practices
Rate Limiting
- Apify actors handle rate limiting automatically
- Spread large scraping jobs over time
- Use Apify's built-in retry mechanisms
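One way to spread a large job over time is to split the input into smaller actor runs with a pause between them. This is only a sketch: the tool name and response shape mirror the transcript examples above, and the batch size and pause are arbitrary starting points:

```python
import time

def scrape_in_batches(client, urls, batch_size=10, pause_seconds=30):
    """Split a large URL list into smaller actor runs, pausing between them."""
    items = []
    for i in range(0, len(urls), batch_size):
        batch = urls[i:i + batch_size]
        response = client.call_tool(
            name="apify_Scrape_YouTube_Transcripts",
            arguments={"urls": batch},
        )
        items.extend(response["items"])
        if i + batch_size < len(urls):
            time.sleep(pause_seconds)  # spread load over time
    return items
```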
Data Quality
- Validate scraped data before processing
- Handle missing fields gracefully
- Be aware of platform changes affecting scrapers
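Handling missing fields gracefully usually means normalizing each scraped item before use. A minimal sketch, assuming the Reddit post fields shown in the examples above (scrapers can return partial records when a platform changes its markup):

```python
def normalize_post(raw: dict) -> dict:
    """Defensively pull fields from a scraped item, supplying safe defaults."""
    return {
        "title": raw.get("title", ""),
        "url": raw.get("url", ""),
        "upvotes": int(raw.get("upvotes") or 0),
        "body": raw.get("body", ""),
    }
```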
Legal Compliance
- Review platform terms of service
- Respect robots.txt and rate limits
- Use scraped data ethically
- Attribute data sources appropriately
Integration Examples
Example 1: Reddit Sentiment Analysis
# Scrape Reddit, analyze sentiment, store in Notion
def analyze_brand_sentiment(brand_name):
    # Scrape Reddit mentions
    reddit_data = client.call_tool(
        name="apify_Reddit_Scraper",
        arguments={
            "searches": [brand_name],
            "max_items": 100,
            "time": "week"
        }
    )

    # Analyze each post
    for post in reddit_data["items"]:
        # Use Perplexity to analyze sentiment
        sentiment = client.call_tool(
            name="Perplexity_Ask",
            arguments={
                "question": f"Analyze sentiment of this Reddit post: {post['title']} {post['body']}"
            }
        )

        # Store in Notion
        client.call_tool(
            name="notion_create_page",
            arguments={
                "title": post["title"],
                "properties": {
                    "Source": "Reddit",
                    "Sentiment": sentiment["answer"],
                    "Upvotes": post["upvotes"],
                    "URL": post["url"]
                }
            }
        )
Example 2: YouTube Content Aggregator
# Extract transcripts and create summaries
def process_youtube_playlist(video_urls):
    # Get transcripts
    transcripts = client.call_tool(
        name="apify_Scrape_YouTube_Transcripts",
        arguments={
            "urls": video_urls,
            "include_description": True,
            "include_keywords": True
        }
    )

    # Process each video
    for video in transcripts["items"]:
        # Summarize transcript
        summary = client.call_tool(
            name="web_scraper_Summarize_Webpage",
            arguments={
                "url": f"https://youtube.com/watch?v={video['video_id']}",
                "max_length": 200
            }
        )

        # Store in database
        client.call_tool(
            name="supabase_create_records",
            arguments={
                "table": "youtube_content",
                "records": [{
                    "video_id": video["video_id"],
                    "title": video["title"],
                    "transcript": video["transcript"],
                    "summary": summary["summary"],
                    "keywords": video.get("keywords", [])
                }]
            }
        )
Example 3: Real Estate Market Monitor
# Track property prices in target areas
def monitor_real_estate(addresses):
    # Scrape current listings
    properties = client.call_tool(
        name="apify_Zillow_Scraper",
        arguments={
            "addresses": addresses,
            "propertyStatus": "FOR_SALE"
        }
    )

    # Analyze and alert
    for prop in properties["items"]:
        # Check if price is below threshold
        if prop["price"] < 500000:
            # Send alert
            client.call_tool(
                name="HTTPS_Call",
                arguments={
                    "method": "POST",
                    "url": "https://hooks.slack.com/services/YOUR/WEBHOOK/URL",
                    "json": {
                        "text": f"🏠 Deal Alert: {prop['address']} for ${prop['price']:,}"
                    }
                }
            )
Troubleshooting
"Actor not found" Error
Cause: Required actor not available in your Apify account
Solutions:
- Verify actor name in Apify dashboard
- Check actor is active and accessible
- Ensure Apify subscription includes required actors
Slow Execution
Cause: Actors can take time to complete
Solutions:
- This is normal for large scraping jobs
- Reduce max_items for faster results
- Use Apify's async execution for very large jobs
Missing Data
Cause: Platform blocks or rate limits
Solutions:
- Actors handle this automatically
- Retry failed requests
- Check Apify actor run logs
- Reduce scraping frequency
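Retrying failed requests can be wrapped around `client.call_tool` with exponential backoff. This sketch assumes failures surface as exceptions; adjust the exception type to whatever your client actually raises:

```python
import time

def call_with_retries(client, name, arguments, attempts=3, base_delay=2.0):
    """Retry a failed tool call, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return client.call_tool(name=name, arguments=arguments)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts; let the caller handle it
            time.sleep(base_delay * 2 ** attempt)
```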
Related Tools
- Web Scraper - Simple webpage scraping
- Firecrawl - Advanced web scraping
- Perplexity - Analyze scraped content
- Notion - Store scraped data
- Supabase - Database storage
See Also
- Creating Custom Tools - Pre-configure scraping parameters
- Managing Credentials - Store Apify API key
- All Tools - Complete tool catalog