Once you've set up and tested your crawler, you can integrate it into your application via our API. Many of the questions we get are about API integration from PHP, so we've prepared this article on using Apify's API from a PHP application.

What is our goal?

The main goal of this article is to give you an example of how to start your crawler and fetch results from a PHP application. We'll also show you how to alter the crawler configuration when starting the crawler.

Where can I find the API endpoint for my crawler?

Each crawler you set up at Apify has its own automatically-generated API endpoint. You can find this in the API section of the crawler configuration page. The API for starting the crawler has the following format:

https://api.apify.com/v1/USER_ID/crawlers/CRAWLER_NAME/execute?token=SECURITY_TOKEN

You can test the API there, and the same page also contains more information and a security warning.

Starting the crawler and fetching results from PHP

We can now use this API from our PHP application. Here's an example of how to start the crawler and get a response describing the crawler run:

<?php

// Start the crawler with an empty POST request and capture the JSON response
$ch = curl_init('https://api.apify.com/v1/USER_ID/crawlers/CRAWLER_NAME/execute?token=SECURITY_TOKEN');
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
$result_json = curl_exec($ch);
curl_close($ch);

In $result_json you can find the response describing the crawler run:

{
  "_id": "SiTBQ4HdCKT3cJQCq",
  "actId": "TSyzQvetkP56rMPXw",
  "startedAt": "2016-07-27T08:41:30.544Z",
  "finishedAt": null,
  "status": "RUNNING",
  "statusMessage": null,
  "stats": {
    "workersUsed": 0,
    "downloadedBytes": 0,
    "pagesInQueue": 0,
    "pagesCrawled": 0,
    "pagesOutputted": 0,
    "pagesFailed": 0,
    "pagesCrashed": 0,
    "pagesRetried": 0,
    "totalPageRetries": 0,
    "storageBytes": 0
  },
  "detailsUrl": "https://api.apify.com/v1/execs/SiTBQ4HdCKT3cJQCq",
  "resultsUrl": "https://api.apify.com/v1/execs/SiTBQ4HdCKT3cJQCq/results"
}

Even while the crawler is still running, you can fetch results (the data you output from your pageFunction) using the resultsUrl. To get the complete results once the crawler has finished, you can periodically check detailsUrl for the crawler status, or better yet, use a webhook, which you can set up under advanced settings on the crawler configuration page.
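For illustration, here's a minimal sketch of that polling approach, continuing directly from the first example above (appended to the same script, so $result_json is still available). It assumes detailsUrl returns the same execution object shown earlier, including its status field; the 10-second interval is just an example:

// A sketch only: decode the run descriptor from the first example,
// poll detailsUrl until the run is no longer RUNNING, then fetch the results
$execution = json_decode($result_json, true);

do {
    sleep(10); // wait between status checks

    $ch = curl_init($execution['detailsUrl']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $details = json_decode(curl_exec($ch), true);
    curl_close($ch);
} while ($details['status'] === 'RUNNING');

// Fetch the results (the data output by your pageFunction)
$ch = curl_init($execution['resultsUrl']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$results_json = curl_exec($ch);
curl_close($ch);

echo $results_json;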

Changing crawler configuration when starting the crawler

Sometimes you want to run your crawler with some of its settings changed just for that run. A typical use case is supplying a different set of Start URLs when starting the crawl. You can achieve this by sending the new crawler settings as JSON POST data when starting the crawler via the API.

Here's an example in PHP (with some basic output):

<?php

// New settings for this run only: override the crawler's Start URLs
$data = array(
    "startUrls" => array(
        array("key" => "TEST", "value" => "http://www.example.com"),
        array("key" => "TEST2", "value" => "http://www.example.com?test2")
    )
);
$data_json = json_encode($data);

echo "<h1>Invoking Apify API with:</h1>";
echo "<pre>\n".$data_json."</pre>";
echo "<br><br>";

// Start the crawler, passing the new settings as a JSON POST body
$ch = curl_init('https://api.apify.com/v1/USER_ID/crawlers/CRAWLER_NAME/execute?token=SECURITY_TOKEN');
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_POSTFIELDS, $data_json);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Content-Type: application/json',
    'Content-Length: ' . strlen($data_json)
));
$result_json = curl_exec($ch);
curl_close($ch);

echo "<h1>Result:</h1>";
echo "<pre>\n".$result_json."</pre>";

Note that this only changes the crawler settings for the current run. This enables you to run crawls in parallel with different configurations.
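As a hypothetical sketch of that idea, you could start several runs of the same crawler, each with its own Start URLs. Each API call returns a separate execution with its own detailsUrl and resultsUrl, and the runs proceed concurrently on Apify even though the HTTP requests below are made one after another (the "RUN_" keys are just illustrative labels):

<?php

// Start one run per start URL; collect the run descriptors
$startUrls = array("http://www.example.com", "http://www.example.com?test2");
$executions = array();

foreach ($startUrls as $i => $url) {
    $data_json = json_encode(array(
        "startUrls" => array(array("key" => "RUN_" . $i, "value" => $url))
    ));

    $ch = curl_init('https://api.apify.com/v1/USER_ID/crawlers/CRAWLER_NAME/execute?token=SECURITY_TOKEN');
    curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data_json);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
    $executions[] = json_decode(curl_exec($ch), true);
    curl_close($ch);
}

// $executions now holds one run descriptor per configuration;
// poll each detailsUrl or fetch each resultsUrl as shown earlier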

Let us know if you come across any issues or would like to contribute examples for other languages.
