Data scraping is one of the main use cases for Apify. We have built actors, which give you a platform to run any job in the cloud. On top of that, we have built the Apify SDK, a JavaScript scraping and automation library that enables you to write really powerful scrapers efficiently and without hard-to-solve bugs. We have also built three generic scrapers which come with a nice UI and allow rapid development and easy setup for new users.

That's a lot of features, and we know how overwhelming they can feel to new users. You, as a user, care much more about extracting high-quality data that can power your business than about the technical details of our platform. That's why we try to make sure that we provide you with the support, tutorials and articles you need to easily choose the right tool for the job.

Apify Store

Okay, so you arrive on our website and wonder what's the quickest path to the data you need. Apify Store is definitely the first place you should look. It's full of open-source actors (Google Search, TripAdvisor, Booking, Instagram, etc.) that you can use right away without any upfront commitment. Just sign up for a free account, set your input, press run, and you will see the data flowing in.

But if you need a more custom solution, or want to scrape a lesser-known website, you will probably need to think further. The next step is to ask yourself whether you want to develop the scrapers yourself (or with your team), or whether you'd rather let our experts prepare a solution for you from scratch.

Apify Marketplace

On Apify Marketplace, we connect our customers with talented developers who have been through our internal training. You can think of it as a more specialized and more managed Freelancer.com. You can be sure that the developers know our platform well and have lots of scraping expertise. You simply provide a specification and the developers bid their price, so you can choose whichever offer makes the most sense for you. The developer then builds the actor and copies it to your account along with an initial dataset. You own the code, and the project comes with a guarantee period during which Apify will fix any problems with the actor for free.

Apify for Enterprise

If you don't want to spend time managing multiple projects on Apify Marketplace, we also offer a premium service called Apify for Enterprise. As an enterprise customer, you will get a dedicated data expert who will manage the whole project, send you periodic reports and communicate with you via whatever tools you prefer (email, Slack, UberConference, etc.). We will sign a contract with a specific SLA, integrate with your platform and maintain the smooth workflow of the project in the long term (if needed).

Develop your own solution

The most interesting, but also the most difficult, path is do-it-yourself. You will need to write at least a bit of JavaScript, but you definitely don't need to be a senior developer. Just don't give up too easily! Fortunately, Apify provides you with some of the best development tools on the market. In the rest of this article, we will discuss these tools and their particular pros and cons.

Scrapers vs. SDK

We have built three generic scrapers, namely Web Scraper, Cheerio Scraper and Puppeteer Scraper. Their main goal is to provide a nice UI plus configuration that will help you start extracting data as quickly as possible. Usually, you just provide a simple JavaScript function, set one or two parameters, and you're good to go. Since scraping comes in various forms, we decided to build not just one, but three scrapers, so you can always pick the right tool for the job.

Moving on to more complicated development, Apify SDK is just a JavaScript library. You will be developing a standard Node.js application, so be prepared to write a little more code. The reward for that is almost infinite flexibility. There aren't many problems that you cannot solve with the whole JavaScript ecosystem at your disposal (and Apify SDK to guide you through it).

Let's discuss each particular tool and its advantages and disadvantages.

Web Scraper

If you remember our old crawler platform, the Web Scraper is just like that, but cooler. It simply crawls the website's pages with a Headless Chrome browser and executes a bit of code (that we call a pageFunction) on each page.
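To give you an idea, here is a minimal pageFunction sketch. It assumes the scraper's option to inject jQuery into the page is enabled; the object you return gets stored in your dataset.

// Runs inside the browser on every crawled page.
async function pageFunction(context) {
    // Available because the "Inject jQuery" option is turned on.
    const $ = context.jQuery;
    return {
        url: context.request.url,
        title: $('title').text(),
        heading: $('h1').first().text().trim(),
    };
}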

+ simple
+ fully JavaScript-rendered pages

- can only execute client-side (inside the browser) JavaScript

Cheerio Scraper

The UI of Cheerio Scraper is almost the same as that of Web Scraper. What changes is the underlying technology. Instead of using a browser to crawl the website, it fires a series of simple HTTP requests to get each page's HTML. Your code then executes on top of that HTML with the help of the Cheerio parser, which is basically jQuery on the server. The main goal here is speed (and therefore cost-effectiveness).
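A pageFunction for Cheerio Scraper looks almost the same, except there's no browser: context.$ is a Cheerio object preloaded with the downloaded HTML. A minimal sketch (the .product selector is just an illustration):

// Runs in Node.js; $ mimics the jQuery API on top of the raw HTML.
async function pageFunction(context) {
    const { $, request } = context;
    return {
        url: request.url,
        title: $('title').text(),
        // Hypothetical selector, adapt it to the website you're scraping.
        productNames: $('.product h2').map((i, el) => $(el).text().trim()).get(),
    };
}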

+ simple
+ fast

- some pages may not be fully rendered (lacking JavaScript rendering)
- can execute only server-side (in Node.js) JavaScript

Puppeteer Scraper

Unlike the two previous scrapers, Puppeteer Scraper doesn't focus primarily on simplicity, but provides you with a greater variety of powerful features. You have the whole Puppeteer library for managing Headless Chrome at your disposal.
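For example, a pageFunction in Puppeteer Scraper runs in Node.js and receives the Puppeteer page object, so you can mix server-side browser control with client-side extraction. A rough sketch (the .load-more selector is hypothetical):

async function pageFunction(context) {
    const { page, request } = context;

    // Server-side: drive the browser with Puppeteer methods.
    await page.waitForSelector('h1');
    // await page.click('.load-more'); // e.g. to reveal lazy-loaded content

    // Client-side: run code inside the page to extract the data.
    const title = await page.evaluate(() => document.querySelector('h1').textContent);

    return { url: request.url, title: title.trim() };
}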

+ powerful Puppeteer functions (methods on the page object)
+ can execute both server-side and client-side JavaScript

- more complex

Apify SDK

If you love JavaScript and Node.js as much as we do, definitely go for our SDK. Apify SDK is fully open source on GitHub and many developers actively contribute to its continuous growth. It is meant mainly as a tool for developers who know at least the basics of Node.js, so they can get up to speed quickly and build powerful applications. The SDK offloads the hard problems (managing concurrency, auto-scaling, request queues, etc.) from the developer to the library, so you can just focus on the task you want to complete.
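To illustrate, here is a minimal sketch of a complete scraper built with the SDK (APIs as of the time of writing; they may differ between versions):

const Apify = require('apify');

Apify.main(async () => {
    // The request queue persists the URLs to crawl.
    const requestQueue = await Apify.openRequestQueue();
    await requestQueue.addRequest({ url: 'https://example.com' });

    // CheerioCrawler handles concurrency and auto-scaling for you.
    const crawler = new Apify.CheerioCrawler({
        requestQueue,
        handlePageFunction: async ({ request, $ }) => {
            // Push each scraped record to the default dataset.
            await Apify.pushData({
                url: request.url,
                title: $('title').text(),
            });
        },
    });

    await crawler.run();
});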

+ complete freedom, maximum flexibility
+ full power of JavaScript ecosystem (npm) and Apify SDK

- requires that you write some boilerplate code
- more complex, higher chance of making mistakes (the flip side of freedom)

Epilogue

Whatever tool you choose, don't feel trapped by your initial choice. Play and experiment: you can easily transfer large parts of the code between all of our tools. Over time you'll get a feeling for when to use what and you may also find your personal favorite.

If you feel like our scrapers aren't doing what they should, you can always report an issue on their public GitHub pages.

Just remember that you shouldn't post issues when your own code fails for some reason. If that is the case and you cannot make progress despite your best efforts, it's best to post the problem on Stack Overflow with an apify tag. Our experts go through these posts periodically and will leave you a reply anybody can learn from. Or you can always just contact our support team!

Happy scraping!
