In this article, we will go through the most basic workflow with the Apify crawler API: running a crawler and fetching its results.

All the information we will use here is covered in more detail in this part of the Apify documentation:
https://www.apify.com/docs/api/v1#/

Starting execution

The first part is to start the crawler we have prepared on our account. We will use this API endpoint:
https://api.apify.com/v1/user_id/crawlers/crawler_id/execute?token=rWLaYmvZeK55uatRrZib4xbZs

This request has three mandatory parameters: 'user_id', 'crawler_id' and your API token ('token'), which you can find in the Account => Integrations tab of the Apify web app.

If we send a POST request to this endpoint, the crawler will start just as if we had pressed the Run button in the web app. We can also send an optional JSON body with the request to override any part of the crawler's settings. For example, if we want to change 'startUrls' and add 'customData', we send this body to the endpoint:

{
  "startUrls": [
    {
      "key": "START",
      "value": "https://google.com"
    },
    {
      "key": "START",
      "value": "http://example.com"
    }
  ],
  "customData": "some custom content"
}
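If you prefer calling the API from code, here is a minimal sketch in Python (using the requests library); the user ID, crawler ID and token are placeholders you need to replace with your own values:

import requests

# Placeholder values - replace them with your own user ID, crawler ID and API token
USER_ID = "user_id"
CRAWLER_ID = "crawler_id"
API_TOKEN = "your_api_token"

url = f"https://api.apify.com/v1/{USER_ID}/crawlers/{CRAWLER_ID}/execute"

# Optional JSON body overriding parts of the crawler settings
body = {
    "startUrls": [
        {"key": "START", "value": "https://google.com"},
        {"key": "START", "value": "http://example.com"},
    ],
    "customData": "some custom content",
}

response = requests.post(url, params={"token": API_TOKEN}, json=body)
response.raise_for_status()
execution = response.json()

# Keep the execution '_id' - you will need it later to fetch the results
print(execution["_id"])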

Waiting for data

We have three basic options for how to set a connection between starting the crawler and getting the results.

1. Using a finish webhook - This is probably the most elegant solution. In the advanced settings of the crawler, you set a finish webhook URL, to which our API will send a POST request when the crawler finishes in any way (succeeds, is stopped, fails, etc.). You can check its body for information about the run and extract the '_id' of the finished execution to use when getting the results. You can also attach any custom data to it. This is an example of what the webhook POST body looks like:
{
  "_id": "faRp7LosWiuRtamsH",
  "actId": "e7a3EfjwZjyXda8eP",
  "data": { "myData": "my data" }
}
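
For illustration, here is a minimal sketch of such a webhook receiver in Python using Flask; the framework and the '/apify-finished' route are assumptions made for this example, not part of the Apify API - the only thing Apify defines is the POST body shown above:

from flask import Flask, request

app = Flask(__name__)

# Hypothetical route - point the crawler's finish webhook URL at this endpoint
@app.route("/apify-finished", methods=["POST"])
def apify_finished():
    payload = request.get_json(force=True)
    execution_id = payload["_id"]       # '_id' of the finished execution
    custom_data = payload.get("data")   # data you attached to the webhook, if any
    # ...fetch the results here using execution_id (see "Getting the results" below)...
    return "", 200

if __name__ == "__main__":
    app.run(port=8080)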

2. Using the wait parameter - When you send a POST request to start a crawler, as shown in the previous part of this article, you can add one more parameter - 'wait'. Its value is the number of seconds (from 0 up to 120) that you want to wait for the crawler to finish. If the crawler finishes within this period, our API will respond with some information about the run, including the '_id' of the execution, which you then use to get the results.

Here's a sample POST request: 

https://api.apify.com/v1/user_id/crawlers/crawler_id/execute?token=rWLaYmvZeK55uatRrZib4xbZs&wait=120

This is a sample response:


{
  "_id": "br9CKmk457",
  "actId": "i6tjys5XNh",
  "startedAt": "2015-10-29T07:34:24.202Z",
  "finishedAt": "null",
  "status": "SUCCEEDED",
  "statusMessage": "null",
  "tag": "my_test_run",
  "stats": {
    "downloadedBytes": 74232,
    "pagesInQueue": 1,
    "pagesCrawled": 3,
    "pagesOutputted": 3,
    "pagesFailed": 0,
    "pagesCrashed": 0,
    "resultCount": 8,
    "totalPageRetries": 0,
    "storageBytes": 24795
  },
  "meta": {
    "source": "API",
    "method": "POST",
    "clientIp": "1.2.3.4",
    "userAgent": "curl/7.43.0",
    "scheduleId": "3ioW6u35s8g7kHDoE",
    "scheduledActId": "vJmysCj4xx98ftgKo",
    "scheduledAt": "2016-12-22T11:30:00.000Z"
  },
  "detailsUrl": "https://api.apify.com/v1/execs/br9CKmk457",
  "resultsUrl": "https://api.apify.com/v1/execs/br9CKmk457/results"
}

If the status is 'SUCCEEDED', the crawler finished properly and you can ask for the results. If you omitted the 'wait' parameter or the crawler didn't finish within the specified time (120 seconds at most), the status will be 'RUNNING'.
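
A minimal Python sketch of this pattern (the placeholders again need to be replaced with your own values):

import requests

# Placeholder values - replace them with your own
USER_ID = "user_id"
CRAWLER_ID = "crawler_id"
API_TOKEN = "your_api_token"

url = f"https://api.apify.com/v1/{USER_ID}/crawlers/{CRAWLER_ID}/execute"

# Start the crawler and wait up to 120 seconds for it to finish
response = requests.post(url, params={"token": API_TOKEN, "wait": 120})
response.raise_for_status()
execution = response.json()

if execution["status"] == "SUCCEEDED":
    print("Finished, results at:", execution["resultsUrl"])
else:
    # Still 'RUNNING' - fall back to the finish webhook or to polling
    print("Not finished yet, execution id:", execution["_id"])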

3. Periodically polling the API - The last option is to periodically poll our API for the details of the execution. You use this endpoint: https://api.apify.com/v1/execs/execution_id where 'execution_id' is the '_id' you got back when starting the crawler without waiting. The response is the same as the one shown above. You can, for example, check this endpoint every minute and, once it returns a status other than 'RUNNING', ask for the results. Please keep in mind that polling too often in short intervals will lead to errors instead of the information you want.
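
A minimal Python sketch of such a polling loop, using the endpoint exactly as shown above, might look like this:

import time
import requests

EXECUTION_ID = "execution_id"   # the '_id' returned when you started the crawler

url = f"https://api.apify.com/v1/execs/{EXECUTION_ID}"

# Check the execution details once a minute until it is no longer running
while True:
    details = requests.get(url).json()
    if details["status"] != "RUNNING":
        break
    time.sleep(60)

print("Execution finished with status:", details["status"])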

Getting the results

After choosing one of the ways described above to wait for the crawler to finish, you should now have the '_id' of the execution and know that it has finished. The last step is to get the data.
You use this endpoint for that:
https://api.apify.com/v1/execs/execution_id/results?format=json&simplified=1

Replace 'execution_id' with the '_id' you got; you can then specify a few other parameters such as 'format', 'offset', 'limit', etc. Keep in mind that if you want to get more than 100,000 results, you need to paginate through them using the 'limit' and 'offset' parameters. You can read the detailed documentation here: https://www.apify.com/docs/api/v1#/reference/results/get-execution-results
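
A minimal Python sketch of fetching all results with pagination; it assumes the simplified JSON format returns a plain array of results, so adjust the parsing if you pick a different format:

import requests

EXECUTION_ID = "execution_id"   # the '_id' of the finished execution

url = f"https://api.apify.com/v1/execs/{EXECUTION_ID}/results"

# Page through the results using 'limit' and 'offset'
results = []
offset = 0
limit = 10000
while True:
    params = {"format": "json", "simplified": 1, "limit": limit, "offset": offset}
    page = requests.get(url, params=params).json()
    if not page:
        break
    results.extend(page)
    offset += limit

print("Fetched", len(results), "results")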
