Apify

Social media and web scraping powered by Apify actors.

Provider: Apify.com
Authentication: API key required
Category: Social Media & Web Scraping
Credit Cost: 5 credits per actor run

Overview

Apify tools provide access to powerful scraping actors for social media platforms and specialized websites. These actors handle complex scraping challenges including anti-bot protection, rate limiting, and data normalization.

Setup

Get Apify API Key

  1. Sign up at apify.com
  2. Navigate to Settings → Integrations
  3. Copy your API token
  4. Add to Reeva:
    • Dashboard → Accounts → Add Account
    • Select Apify
    • Paste API key
    • Save

Available Tools

Reddit Scraper

Scrape Reddit posts, comments, communities, and user profiles.

Tool ID: apify_Reddit_Scraper
Credit Cost: 5 credits
Apify Actor: reddit-scraper-lite

Parameters:

  • start_urls (array, optional): Direct URLs to Reddit posts or profiles
  • searches (array, optional): Search terms to find posts
  • subreddits (array, optional): Subreddit names to scrape
  • users (array, optional): User profiles to scrape
  • search_posts (boolean, optional): Include posts in search
    • Default: true
  • search_comments (boolean, optional): Include comments in search
    • Default: false
  • sort (string, optional): Sort order (new, hot, top, rising)
    • Default: "new"
  • time (string, optional): Time filter (hour, day, week, month, year, all)
    • Default: "all"
  • max_items (integer, optional): Maximum items to scrape
    • Default: 50

Example Usage:

# Python - Scrape subreddit
response = client.call_tool(
    name="apify_Reddit_Scraper",
    arguments={
        "subreddits": ["machinelearning"],
        "sort": "top",
        "time": "week",
        "max_items": 25
    }
)

for post in response["items"]:
    print(f"{post['title']} - {post['upvotes']} upvotes")

// TypeScript - Search Reddit
const response = await client.callTool({
  name: "apify_Reddit_Scraper",
  arguments: {
    searches: ["MCP protocol"],
    search_posts: true,
    max_items: 10
  }
});

Use Cases:

  • Market research and sentiment analysis
  • Track brand mentions
  • Gather community feedback
  • Monitor competitor discussions
  • Research trends and topics

Tweet Scraper

Extract tweets via searches, handles, or direct URLs.

Tool ID: apify_Tweet_Scraper
Credit Cost: 5 credits
Apify Actor: tweet-scraper
Status: Currently inactive

Parameters:

  • start_urls (array, optional): Direct tweet URLs
  • search_terms (array, optional): Keywords to search for
  • twitter_handles (array, optional): User handles to scrape
  • conversation_ids (array, optional): Conversation thread IDs
  • max_items (integer, optional): Maximum tweets to scrape
  • sort (string, optional): Sort order
  • tweet_language (string, optional): Filter by language code (e.g., "en")
  • only_verified_users (boolean, optional): Only verified accounts
  • only_twitter_blue (boolean, optional): Only Twitter Blue subscribers
  • only_image (boolean, optional): Only tweets with images
  • only_video (boolean, optional): Only tweets with videos
  • only_quote (boolean, optional): Only quote tweets
  • minimum_retweets (integer, optional): Minimum retweet count
  • minimum_favorites (integer, optional): Minimum like count
  • minimum_replies (integer, optional): Minimum reply count

Example Usage:

# Python - Monitor brand mentions
response = client.call_tool(
    name="apify_Tweet_Scraper",
    arguments={
        "search_terms": ["@YourBrand"],
        "max_items": 50,
        "sort": "Latest"
    }
)

Use Cases:

  • Social media monitoring
  • Influencer analysis
  • Sentiment tracking
  • Competitive intelligence
  • Trend analysis

YouTube Transcripts

Extract video transcripts and metadata from YouTube.

Tool ID: apify_Scrape_YouTube_Transcripts
Credit Cost: 5 credits
Apify Actor: youtube-transcripts

Parameters:

  • urls (array, required): YouTube video URLs
  • output_format (string, optional): Format for transcripts
    • Default: "captions"
  • max_retries (integer, optional): Retry attempts for failed requests
    • Default: 8
  • include_channel_name (boolean, optional): Include channel name
    • Default: true
  • include_channel_id (boolean, optional): Include channel ID
    • Default: true
  • include_date_published (boolean, optional): Include publish date
    • Default: true
  • include_view_count (boolean, optional): Include view count
    • Default: false
  • include_likes (boolean, optional): Include likes count
    • Default: false
  • include_comments (boolean, optional): Include comments
    • Default: false
  • include_keywords (boolean, optional): Include video keywords
    • Default: false
  • include_thumbnail (boolean, optional): Include thumbnail URL
    • Default: false
  • include_description (boolean, optional): Include video description
    • Default: false

Response:

{
  "items": [
    {
      "video_id": "dQw4w9WgXcQ",
      "title": "Video Title",
      "channel_name": "Channel Name",
      "channel_id": "UCxxxxxxxx",
      "date_published": "2025-11-22",
      "transcript": "Full transcript text...",
      "duration": "3:45"
    }
  ]
}

Example Usage:

# Python - Get video transcript
response = client.call_tool(
    name="apify_Scrape_YouTube_Transcripts",
    arguments={
        "urls": ["https://www.youtube.com/watch?v=VIDEO_ID"],
        "include_description": True,
        "include_view_count": True
    }
)

transcript = response["items"][0]["transcript"]
print(f"Transcript length: {len(transcript)} characters")

// TypeScript - Batch transcript extraction
const videoUrls = [
  "https://www.youtube.com/watch?v=VIDEO1",
  "https://www.youtube.com/watch?v=VIDEO2",
  "https://www.youtube.com/watch?v=VIDEO3"
];

const response = await client.callTool({
  name: "apify_Scrape_YouTube_Transcripts",
  arguments: {
    urls: videoUrls,
    include_keywords: true,
    include_thumbnail: true
  }
});

Use Cases:

  • Content analysis and summarization
  • SEO research and keyword extraction
  • Educational content processing
  • Accessibility (add captions to videos)
  • Research and fact-checking

Zillow Scraper

Extract real estate property details from Zillow.

Tool ID: apify_Zillow_Scraper
Credit Cost: 5 credits
Apify Actor: zillow-detail-scraper

Parameters:

  • startUrls (array, optional): Direct Zillow property URLs
    • Format: https://www.zillow.com/homedetails/Address/12345678_zpid/
  • addresses (array, optional): Property addresses
    • Format: "123 Main St, City, State"
  • propertyStatus (string, optional): Property status filter
    • Options: "FOR_SALE", "RECENTLY_SOLD", "FOR_RENT"
    • Default: "RECENTLY_SOLD"
  • extractBuildingUnits (string, optional): Extract individual units from buildings
    • Options: "disabled", "all", "for_sale", "recently_sold", "for_rent", "off_market"
    • Default: "disabled"

Response:

{
  "items": [
    {
      "address": "123 Main St, San Francisco, CA 94102",
      "zpid": "12345678",
      "price": 1250000,
      "bedrooms": 3,
      "bathrooms": 2.5,
      "sqft": 1800,
      "lot_size": 4500,
      "year_built": 2010,
      "property_type": "Single Family",
      "listing_url": "https://www.zillow.com/...",
      "images": ["url1", "url2"],
      "description": "Beautiful home..."
    }
  ]
}

Example Usage:

# Python - Scrape by address
response = client.call_tool(
    name="apify_Zillow_Scraper",
    arguments={
        "addresses": [
            "123 Main St, San Francisco, CA",
            "456 Oak Ave, Los Angeles, CA"
        ],
        "propertyStatus": "FOR_SALE"
    }
)

for prop in response["items"]:
    print(f"{prop['address']}: ${prop['price']:,}")

// TypeScript - Scrape by URLs
const response = await client.callTool({
  name: "apify_Zillow_Scraper",
  arguments: {
    startUrls: [
      { url: "https://www.zillow.com/homedetails/Address/12345678_zpid/" }
    ],
    propertyStatus: "RECENTLY_SOLD"
  }
});

Use Cases:

  • Real estate market analysis
  • Property investment research
  • Comparative market analysis (CMA)
  • Price trend monitoring
  • Rental market research

Cost Management

Understanding Apify Credits

  • Apify charges separately for actor runs (consumes Apify credits)
  • Reeva charges 5 credits per tool execution
  • Total cost = Reeva credits (5) + Apify credits (varies)

Optimization Tips

  1. Batch Requests: Scrape multiple items in one call

    • Use max_items to control volume
    • Pass multiple URLs/searches in single request
  2. Filter Early: Use parameters to reduce data

    • only_verified_users for Twitter
    • propertyStatus for Zillow
    • Time filters for Reddit
  3. Cache Results: Store scraped data to avoid re-scraping

    • Use Supabase or Notion to cache
    • Check cache before making new requests
  4. Monitor Usage: Track Apify actor consumption

    • Review Apify dashboard regularly
    • Set up usage alerts
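The caching tip above can be sketched as a thin wrapper around a tool call. This is an illustrative in-memory version; `run_scraper` is a hypothetical stand-in for `client.call_tool`, and a production cache would persist results in Supabase or Notion as suggested:

```python
import hashlib
import json

_cache = {}  # illustrative; persist this in Supabase or Notion in practice


def cached_call(tool_name, arguments, run_scraper):
    """Return cached results when the same tool + arguments were seen before.

    The cache key is a stable hash of the tool name and its arguments, so
    identical requests are only scraped (and billed) once.
    """
    key = hashlib.sha256(
        json.dumps({"tool": tool_name, "args": arguments}, sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        _cache[key] = run_scraper(tool_name, arguments)  # scrape only on a miss
    return _cache[key]
```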

Best Practices

Rate Limiting

  • Apify actors handle rate limiting automatically
  • Spread large scraping jobs over time
  • Use Apify's built-in retry mechanisms
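Spreading a large job over time can be done by chunking the inputs and pausing between runs. A minimal sketch; `run_batch` is a hypothetical stand-in for whatever scrape call processes one chunk of URLs or search terms:

```python
import time


def scrape_in_batches(items, batch_size, run_batch, pause_seconds=60, sleep=time.sleep):
    """Split a large scrape into smaller runs with a pause between them.

    Avoids hammering a platform with one huge job: each chunk of `batch_size`
    items is scraped, then the loop waits before starting the next chunk.
    """
    results = []
    for start in range(0, len(items), batch_size):
        results.extend(run_batch(items[start:start + batch_size]))
        if start + batch_size < len(items):  # no pause after the final chunk
            sleep(pause_seconds)
    return results
```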

Data Quality

  • Validate scraped data before processing
  • Handle missing fields gracefully
  • Be aware of platform changes affecting scrapers

Ethics & Compliance

  • Review platform terms of service
  • Respect robots.txt and rate limits
  • Use scraped data ethically
  • Attribute data sources appropriately
Integration Examples

Example 1: Reddit Sentiment Analysis

# Scrape Reddit, analyze sentiment, store in Notion
def analyze_brand_sentiment(brand_name):
    # Scrape Reddit mentions
    reddit_data = client.call_tool(
        name="apify_Reddit_Scraper",
        arguments={
            "searches": [brand_name],
            "max_items": 100,
            "time": "week"
        }
    )

    # Analyze each post
    for post in reddit_data["items"]:
        # Use Perplexity to analyze sentiment
        sentiment = client.call_tool(
            name="Perplexity_Ask",
            arguments={
                "question": f"Analyze sentiment of this Reddit post: {post['title']} {post['body']}"
            }
        )

        # Store in Notion
        client.call_tool(
            name="notion_create_page",
            arguments={
                "title": post["title"],
                "properties": {
                    "Source": "Reddit",
                    "Sentiment": sentiment["answer"],
                    "Upvotes": post["upvotes"],
                    "URL": post["url"]
                }
            }
        )

Example 2: YouTube Content Aggregator

# Extract transcripts and create summaries
def process_youtube_playlist(video_urls):
    # Get transcripts
    transcripts = client.call_tool(
        name="apify_Scrape_YouTube_Transcripts",
        arguments={
            "urls": video_urls,
            "include_description": True,
            "include_keywords": True
        }
    )

    # Process each video
    for video in transcripts["items"]:
        # Summarize transcript
        summary = client.call_tool(
            name="web_scraper_Summarize_Webpage",
            arguments={
                "url": f"https://youtube.com/watch?v={video['video_id']}",
                "max_length": 200
            }
        )

        # Store in database
        client.call_tool(
            name="supabase_create_records",
            arguments={
                "table": "youtube_content",
                "records": [{
                    "video_id": video["video_id"],
                    "title": video["title"],
                    "transcript": video["transcript"],
                    "summary": summary["summary"],
                    "keywords": video.get("keywords", [])
                }]
            }
        )

Example 3: Real Estate Market Monitor

# Track property prices in target areas
def monitor_real_estate(addresses):
    # Scrape current listings
    properties = client.call_tool(
        name="apify_Zillow_Scraper",
        arguments={
            "addresses": addresses,
            "propertyStatus": "FOR_SALE"
        }
    )

    # Analyze and alert
    for prop in properties["items"]:
        # Check if price is below threshold
        if prop["price"] < 500000:
            # Send alert
            client.call_tool(
                name="HTTPS_Call",
                arguments={
                    "method": "POST",
                    "url": "https://hooks.slack.com/services/YOUR/WEBHOOK/URL",
                    "json": {
                        "text": f"🏠 Deal Alert: {prop['address']} for ${prop['price']:,}"
                    }
                }
            )

Troubleshooting

"Actor not found" Error

Cause: Required actor not available in your Apify account

Solutions:

  • Verify actor name in Apify dashboard
  • Check actor is active and accessible
  • Ensure Apify subscription includes required actors

Slow Execution

Cause: Actors can take time to complete

Solutions:

  • Expect longer runtimes for large scraping jobs; this is normal
  • Reduce max_items for faster results
  • Use Apify's async execution for very large jobs

Missing Data

Cause: Platform blocks or rate limits

Solutions:

  • Actors handle most blocking automatically, but some items can still be missed
  • Retry failed requests
  • Check Apify actor run logs
  • Reduce scraping frequency
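Retrying failed requests can be wrapped in a small backoff helper. A sketch only; `run_tool` is a hypothetical stand-in for `client.call_tool`:

```python
import time


def call_with_retries(run_tool, tool_name, arguments,
                      attempts=3, base_delay=2.0, sleep=time.sleep):
    """Retry a flaky scrape with exponential backoff.

    Waits base_delay, then 2x, 4x, ... between attempts; the last failure
    is re-raised so the caller can inspect the actor run logs.
    """
    for attempt in range(attempts):
        try:
            return run_tool(tool_name, arguments)
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```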

See Also