This article relates to the legacy Apify Crawler product, which is being retired in favor of the apify/legacy-phantomjs-crawler actor. All the information in this article is still valid, and applies to both the legacy Crawler product and the new actor. For more information, please read this blog post.
For new projects, we recommend using the newer apify/web-scraper actor that is based on the modern headless Chrome browser.
In this article, we'll show you how to transfer cookies from your web browser to an Apify crawler in order to crawl a website that requires a login. Note that this is not the only way to crawl pages behind a login form. For example, you can also have your crawler fill in the login form directly, or send a POST request to the form from the Start URLs.
First, install the EditThisCookie extension in your web browser. Once it is installed, go to the website you'd like to crawl and log in using your credentials.
Click the EditThisCookie button next to the address bar and then click Export. The cookies will be copied to the clipboard as a JSON array that is compatible with the cookie format used by PhantomJS (the headless web browser that we use for crawling).
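Before pasting the clipboard contents anywhere, it can help to confirm that the export really is a well-formed JSON array of cookie objects. The following is a minimal Python sketch of such a check. The sample cookies, the check_cookies helper and the set of required keys (name, value, domain) are assumptions made for illustration; a real export may contain additional or differently named fields.

```python
import json

# Hypothetical sample of an exported cookie array. The field names and
# values are illustrative assumptions, not real cookies.
EXPORTED = """
[
  {"domain": ".example.com", "name": "session_id", "value": "abc123",
   "path": "/", "secure": true, "httpOnly": true, "expirationDate": 1893456000},
  {"domain": ".example.com", "name": "lang", "value": "en",
   "path": "/", "session": true}
]
"""

# Keys this sketch assumes every cookie object should carry.
REQUIRED_KEYS = ("name", "value", "domain")


def check_cookies(text):
    """Parse exported cookie JSON and return a list of problems found."""
    try:
        cookies = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(cookies, list):
        return ["expected a JSON array of cookie objects"]
    problems = []
    for index, cookie in enumerate(cookies):
        if not isinstance(cookie, dict):
            problems.append(f"item {index} is not an object")
            continue
        for key in REQUIRED_KEYS:
            if key not in cookie:
                problems.append(f"item {index} is missing '{key}'")
    return problems


if __name__ == "__main__":
    print(check_cookies(EXPORTED) or "cookie array looks well-formed")
```

An empty list means the text parsed as an array and every item had the assumed keys. It says nothing about whether the website will still accept the session.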
In the next step, create a new crawler on Apify, set the Start URLs to point to the website you want to crawl, and then paste the exported cookies into the Initial cookies field.
Now run your crawler and voilà, it is immediately logged in to the desired website.
Note that in the previous step, the Cookies persistence field should also be set to Over all crawler runs. With this setting, the Initial cookies are automatically updated whenever your crawler finishes, which helps prevent the cookies from expiring. If you run your crawler often enough, it might remain logged in to the website for a very long time. Whenever the login expires, you'll need to repeat the above steps.
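To get a rough idea of when a login might lapse, you can look at the expiry times in the exported cookie array. The Python sketch below finds the cookie that expires first. It assumes that each cookie is a JSON object and that expiry is stored in an expirationDate field as a Unix timestamp in seconds. Both the field name and the soonest_expiry helper are assumptions for illustration, so adjust them to match your actual export.

```python
import json
from datetime import datetime, timezone


def soonest_expiry(cookies_json):
    """Return (name, expiry as UTC datetime) of the first cookie to expire.

    Cookies without a numeric 'expirationDate' (assumed to be session
    cookies) are skipped. Returns None if no cookie carries an expiry.
    """
    cookies = json.loads(cookies_json)
    dated = [
        c for c in cookies
        if isinstance(c.get("expirationDate"), (int, float))
    ]
    if not dated:
        return None
    first = min(dated, key=lambda c: c["expirationDate"])
    expiry = datetime.fromtimestamp(first["expirationDate"], tz=timezone.utc)
    return first["name"], expiry


if __name__ == "__main__":
    # Hypothetical cookies, for illustration only.
    sample = json.dumps([
        {"name": "session_id", "value": "abc123", "expirationDate": 1893456000},
        {"name": "remember_me", "value": "1", "expirationDate": 1924992000},
        {"name": "lang", "value": "en"},
    ])
    print(soonest_expiry(sample))
```

Keep in mind that a website can invalidate a session on its own side before the cookie's stated expiry, so treat the result as an upper bound and not as a guarantee.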
Happy logged-in crawling!