Comparison of popular web scraping API services

A showdown between ScraperAPI, ScrapingBee, ScrapeStack, ScrapeUp, ProxyCrawl and more.

Web scraping is a challenging task. We can fetch pages with curl or Axios and extract the data ourselves, but a plain HTTP client is often not enough to handle a massive amount of extraction.

Ever had trouble extracting data from a giant e-commerce website with millions of pages for your price comparison website?

  • You will need access to the giant's official API, which may be non-existent or very costly depending on the use case.
  • Some sites will block your IP if you visit them too frequently, show a CAPTCHA or restrict you in other ways. You will need a solution to handle CAPTCHAs as well.
  • Some sites will be slow to load from your region, or will change the price depending on your location and timezone. You will need proxies to handle those.

That's where scraping API services come into play. They make web scraping a breeze. Most of them support many languages such as Bash, Node, Python and PHP, and offer important features like rotating proxies, CAPTCHA handling, JavaScript rendering, custom headers, sessions and location spoofing.

Scraping with ScraperAPI

Let's start with a sample use case in Node.js. Python, Dart or almost any other programming language would work too, since these services provide an HTTP API as well as SDKs and libraries.

We will create a simple project using npm.

mkdir scraper && cd scraper
npm init -y
npm install axios

For the target website, we will use example.com to get a very basic idea of how it works. Save the following as index.js.

const axios = require('axios').default;

async function scrape(){
    const response = await axios
        .get('http://api.scraperapi.com', {
            params: {
                api_key: "API_KEY",
                url: "http://example.com"
            }});
    console.log(response.data);
}

scrape();

Replace API_KEY with the API key you get from the ScraperAPI dashboard.
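Instead of pasting the key into the source file, it can be read from the environment. The following is a minimal sketch, assuming a hypothetical environment variable named SCRAPERAPI_KEY (the name is our choice, not something the service requires) and a hypothetical helper called buildParams.

```javascript
// Build the query parameters for a request, reading the key from the
// environment. SCRAPERAPI_KEY is a hypothetical variable name for this sketch.
function buildParams(targetUrl, env = process.env) {
    const apiKey = env.SCRAPERAPI_KEY;
    if (!apiKey) {
        // Fail early with a clear message instead of sending a bad request.
        throw new Error('Set the SCRAPERAPI_KEY environment variable first');
    }
    return { api_key: apiKey, url: targetUrl };
}

// Usage with the code above:
//   axios.get('http://api.scraperapi.com', { params: buildParams('http://example.com') });
```

This keeps the key out of version control, and the script can then be run as SCRAPERAPI_KEY=... node index.js.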

Once you run this code, you will see the response in the console.

➜  node index.js
<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;

    }
...
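The response body is plain HTML, so the extraction step is still up to us. As a rough sketch, the page title can be pulled out with a regular expression. The extractTitle helper is hypothetical, and a regex like this is only good for a quick check, while a real HTML parser is a safer choice for anything more complex.

```javascript
// Return the text inside the first <title> element, or null if there is none.
function extractTitle(html) {
    const match = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
    return match ? match[1].trim() : null;
}

// Usage inside scrape():
//   console.log(extractTitle(response.data));
```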

What if you want to extract data from a dynamic website, for example a DuckDuckGo search results page? You just need to add render: true to the params.

const axios = require('axios').default;

async function scrape(){
    const response = await axios
        .get('http://api.scraperapi.com', {
            params: {
                api_key: "API_KEY",
                url: "https://duckduckgo.com/?q=test&t=h_&ia=web",
                render: true
            }});
    console.log(response.data);
}

scrape();

The service renders the page with its scripts and returns the HTML of the rendered page, which is very useful for dynamic websites built with frameworks like React, Next.js, Angular and Vue.js.
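One detail worth noting: the target URL has its own query string, so it must be percent-encoded when it is passed as the url parameter. Axios does this for us when serializing params. The sketch below shows the same idea with Node's built-in URL class, which is useful if you call the API with another HTTP client. The buildRequestUrl helper is hypothetical, and it only uses the api_key, url and render parameters shown above.

```javascript
// Build the full request URL by hand, encoding the target URL safely.
function buildRequestUrl(apiKey, targetUrl, render = false) {
    const requestUrl = new URL('http://api.scraperapi.com/');
    requestUrl.searchParams.set('api_key', apiKey);
    // searchParams.set percent-encodes '&', '?' and '=' inside targetUrl,
    // so the target's own query string is not mixed up with ours.
    requestUrl.searchParams.set('url', targetUrl);
    if (render) {
        requestUrl.searchParams.set('render', 'true');
    }
    return requestUrl.toString();
}

console.log(buildRequestUrl('API_KEY', 'https://duckduckgo.com/?q=test&t=h_&ia=web', true));
```

Without the encoding, t and ia would be read as parameters of the scraping API request rather than as part of the DuckDuckGo URL.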

It takes only a few minutes to set up and get started, whereas setting up a project using Chromium takes hours of work. Of course, both approaches have their own perks, but a scraping API is a good fit for prototypes.

Comparison between Scraping API Services

Even though ScraperAPI is a great start, it's best to compare several services to see how fast, powerful, developer-friendly and customer-friendly they are.

We will use the following sites to see whether each service can load the pages at all. Most of them are highly protected, so it's alright to fail a test, but it is a huge plus point if the page loads.

  • Example.com, for a quick latency test. The faster it loads, the better, regardless of the location of the service and my current location.
  • Datadome blog, which is protected by DataDome, one of the stronger anti-bot services. The scraping proxy should load the page without any problem.
  • Booking.com, which is protected by PerimeterX, another strong anti-bot service. The page should load even after multiple requests.
  • Sephora.com, which is protected by Cloudflare. Normal bots shouldn't even be able to load it, since they would be blocked before the request reaches the site.
  • FastPeopleSearch, which is protected by multiple services, including Cloudflare and several CAPTCHA services. Normal bots shouldn't even be able to load it.
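A check like this can be automated with a small harness. The following is only a sketch of the idea, not the exact script used for the results in this article. The runChecks function, the target objects and the expect field (a piece of text that should appear on a successfully loaded page) are all hypothetical names. fetchPage can be any async function that takes a URL and returns the HTML, for example a wrapper around the axios call shown earlier.

```javascript
// Run every target through the given fetch function and record
// whether the page loaded and how long the request took.
async function runChecks(fetchPage, targets) {
    const results = [];
    for (const target of targets) {
        const started = Date.now();
        let ok = false;
        try {
            const html = await fetchPage(target.url);
            // Count the page as loaded only if it contains the expected text,
            // since a block page can still come back with a 200 status.
            ok = typeof html === 'string' && html.includes(target.expect);
        } catch (err) {
            // Network errors and HTTP error statuses both count as a failure.
            ok = false;
        }
        results.push({ name: target.name, ok, ms: Date.now() - started });
    }
    return results;
}

// Usage (fetchWithScraperApi is a hypothetical wrapper around the axios call):
//   runChecks(fetchWithScraperApi, [
//       { name: 'Example', url: 'http://example.com', expect: 'Example Domain' },
//   ]).then(console.table);
```

Running the targets one after another keeps the timings comparable and stays well under the concurrency limits of the entry-level plans.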

ScraperAPI

One of the oldest and best web scraping API services. They have the best customer support and can handle massive loads. They are trusted by the founder of Parse, the optimization director at SquareTrade and many more. They are my go-to solution most of the time.

Pricing:

  • 10% discount if promo code TAHER10 is used
  • 5000 Free Credits
  • $29 per month for 250K API calls with 10 concurrent threads.
  • $249 per month for 3M API calls with 50 concurrent threads.
  • Enterprise plans could cost $700-$1000 per month with custom anti-bot bypasses.

Features:

  • JavaScript rendering
  • Automatic proxy rotation
  • Residential proxies
  • Geotargeting
  • Custom Sessions

Test Results:

  • ✔️ Example
  • ✔️ Datadome
  • ✔️ Booking
  • ✔️ Sephora
  • ✔️ FastPeopleSearch

ScrapeUp

They are a fairly new service with a good amount of free credits.

Pricing:

  • 10000 Free Credits (⚠️ but you will need to add your credit card)
  • $29 per month for 250K API calls with 10 concurrent threads.
  • $249 per month for 3M API calls with 50 concurrent threads.
  • Annual plan with 10% discount.

Features:

  • JavaScript rendering
  • Pagination and infinite scroll
  • Automatic proxy rotation
  • Residential proxies
  • Geotargeting

ScrapeStack

ScrapeStack is a service provided by apilayer, whose various API products serve companies like Amazon, Slack and Zendesk.

Pricing:

  • 250 Free Credits
  • $19.99 per month for 200K API calls with concurrency.
  • $199.99 per month for 3M API calls with concurrency.
  • Enterprise plans.
  • Annual subscription has 20% discount.

Features:

  • JavaScript rendering
  • Automatic proxy rotation
  • Residential proxies
  • Geotargeting

Test Results:

  • ✔️ Example
  • ❌ Datadome
  • ✔️ Booking
  • ❌ Sephora
  • ❌ FastPeopleSearch

ScrapingBee

ScrapingBee is not just a simple HTML scraper like the others. They also have post-load functions such as screenshots and data extraction.

Pricing:

  • 1000 Free Credits
  • $99 per month for 1M API calls with 10 concurrent threads.
  • $249 per month for 2.5M API calls with 40 concurrent threads.
  • Enterprise plans could cost $700-$1000 per month.

Features:

  • JavaScript rendering
  • Automatic proxy rotation
  • Residential proxies
  • Geotargeting
  • Screenshots, extraction rules, Google Search API

Test Results:

  • ✔️ Example
  • ✔️ Datadome
  • ✔️ Booking
  • ✔️ Sephora
  • ❌ FastPeopleSearch

ProxyCrawl

The scraping API is just one of their features. They also have APIs for crawling, backconnect proxies, leads, screenshots and many more.

Pricing:

  • 1000 Free Credits (2000 credits if joined through a referral link)
  • $29 per month for 50K API calls with 10 concurrent threads.
  • $149 per month for 1M API calls with 30 concurrent threads.

Features:

  • JavaScript rendering
  • Automatic proxy rotation
  • Residential proxies
  • Geotargeting

Test Results:

  • ✔️ Example
  • ✔️ Datadome
  • ✔️ Booking
  • ✔️ Sephora
  • ❌ FastPeopleSearch
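The plans are easier to compare once the prices are normalized. The sketch below takes the entry-level paid plan listed for each service and computes the cost per one million API calls. The costPerMillion helper is our own, and the calculation is deliberately simple: it treats every call as equal and ignores concurrency limits, discounts and free credits.

```javascript
// Entry-level paid plans as listed in the pricing sections above.
const plans = [
    { service: 'ScraperAPI', usd: 29, calls: 250000 },
    { service: 'ScrapeUp', usd: 29, calls: 250000 },
    { service: 'ScrapeStack', usd: 19.99, calls: 200000 },
    { service: 'ScrapingBee', usd: 99, calls: 1000000 },
    { service: 'ProxyCrawl', usd: 29, calls: 50000 },
];

// Monthly price scaled to one million calls.
function costPerMillion(plan) {
    return (plan.usd * 1000000) / plan.calls;
}

for (const plan of plans) {
    console.log(`${plan.service}: $${costPerMillion(plan).toFixed(2)} per 1M calls`);
}
```

A lower number is not automatically better value, since a cheaper service that fails on the protected sites you care about costs more in the end.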

At a glance

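Site/Service | ScraperAPI | ScrapingBee | ScrapeStack | ProxyCrawl
Example.com | ✔️ | ✔️ | ✔️ | ✔️
Datadome Blog | ✔️ | ✔️ | ❌ | ✔️
Sephora.com | ✔️ | ✔️ | ❌ | ✔️
Booking.com | ✔️ | ✔️ | ✔️ | ✔️
FastPeopleSearch.com | ✔️ | ❌ | ❌ | ❌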

If you want to extract data from big companies' websites for machine learning or educational research, it is best to leverage a scraping API instead of trying to handle this mess yourself, unless you have very custom requirements or need some level of web automation.