Keboola is a cloud-based service that enables companies to integrate data from several sources into a single cloud storage. Keboola then makes it possible to transform that data and load it into several databases and platforms, such as GoodData, Tableau or Google Sheets.
We want to offer all these Keboola features to our customers, so we integrated Apify into the Keboola platform. In this article, I'll show you how to easily set up the integration for your existing crawler or actor.
Getting started with Keboola
First, we have to sign in to the Keboola Connection portal. If you don't have an account yet, you can create one from this link. Next, go to the Extractors section in the left-hand menu and click on the New Extractor button.
After that, we have to find the Apify extractor and click on it.
On the Apify extractor page, you can see all your configurations. To create a new one, click on the New Configuration button.
In the next step, enter a name and description for your extractor configuration and click on the Create Configuration button.
Configure Apify extractor
With the configuration created, we can set up the extractor to get the data that we want. We can do that by clicking on the Configure Crawler button.
In the next step, we can choose an action. All possible actions are described below.
Run Crawler - This action runs the selected crawler and waits until the crawler run finishes. After the crawler run finishes, the results will be pushed to Keboola Storage.
Run Actor - This action runs the selected actor, waits until the actor finishes, then pushes all items from the default dataset to Keboola Storage.
Retrieve results from Crawler run - This action takes the executionId of a finished crawler run and retrieves all results from it.
Retrieve items from Dataset - This action takes a dataset ID or dataset name and retrieves all items from that dataset.
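The two Run actions follow the same pattern: start a run, wait until it finishes, then fetch the results. Here is a minimal Python sketch of that pattern. The callables start_run, get_status and fetch_results and the status names are hypothetical stand-ins for illustration; they are not the extractor's actual code or the Apify API.

```python
import time
from typing import Any, Callable, List

# Status names are assumptions made for this sketch only.
FINISHED = "SUCCEEDED"
FAILED = {"FAILED", "STOPPED", "TIMED_OUT"}


def run_and_collect(
    start_run: Callable[[], str],
    get_status: Callable[[str], str],
    fetch_results: Callable[[str], List[Any]],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> List[Any]:
    """Start a run, poll until it finishes, then return its results."""
    run_id = start_run()
    for _ in range(max_polls):
        status = get_status(run_id)
        if status == FINISHED:
            # Only fetch results once the run has completed successfully.
            return fetch_results(run_id)
        if status in FAILED:
            raise RuntimeError(f"Run {run_id} ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Run {run_id} did not finish after {max_polls} polls")


def demo() -> List[Any]:
    """Exercise the flow with in-memory fakes instead of a real service."""
    statuses = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
    return run_and_collect(
        start_run=lambda: "run-1",
        get_status=lambda run_id: next(statuses),
        fetch_results=lambda run_id: [{"url": "https://example.com"}],
        poll_seconds=0,
    )
```

The two Retrieve actions skip the start-and-wait part: they begin from an existing executionId or dataset and only perform the final fetch.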
After hitting "Next", you have to set up your Apify API credentials. You can go to your Apify account page, where you can copy & paste your credentials into the form.
In the next step, you can set up options for a specific run. For example, for the Run Crawler action, you can configure these options:
Crawler - You can choose which crawler you want to run. All crawlers from your account will be loaded in the select box.
Input Table - You can choose a table from the Keboola platform to be sent to the crawler using the customData attribute.
Crawler Setting - You can override the crawler settings for this specific run in JSON format. For example, you can override the start URLs.
After you fill in all the options, save them using the Save button.
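To make the Crawler Setting option more concrete, here is a small Python sketch that builds a JSON override for the start URLs and shows how it would sit on top of a crawler's saved settings. The startUrls key with its key/value entries, the maxCrawledPages setting and the shallow-merge behavior are assumptions for illustration; check your own crawler's settings for the exact attribute names.

```python
import json
from typing import Any, Dict


def apply_override(saved: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the saved settings with top-level keys replaced by the override."""
    merged = dict(saved)
    merged.update(override)
    return merged


# Hypothetical saved crawler settings (attribute names are assumptions).
saved_settings = {
    "startUrls": [{"key": "home", "value": "https://example.com"}],
    "maxCrawledPages": 100,
}

# The JSON text you would paste into the Crawler Setting field.
override_json = json.dumps(
    {"startUrls": [{"key": "blog", "value": "https://example.com/blog"}]},
    indent=2,
)

# Settings that would apply to this run only; the saved settings stay untouched.
effective = apply_override(saved_settings, json.loads(override_json))
print(override_json)
```

Only the keys present in the override change for this run, so anything you leave out keeps the value saved with the crawler.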
Run configured extractor
Once your extractor is configured, you can run it with the Run button in the upper right corner of your configuration.
After you run the extractor, you can open the job detail, which you can find in the list in the right-hand column.
After the run finishes, you can find the results on the job detail page under the link in the Storage Stats section.
As I said at the beginning, you can integrate your results with the dozens of other services that Keboola integrates with. Check out the full list here. You can set up a writer for a selected service using Keboola Writer. You can also set up orchestrations, where you can transform, merge or split the data from your results.
And that's it, thanks for reading! We'd love to hear from you if you've found a great use case for the Keboola <> Apify integration. Just let us know at firstname.lastname@example.org or contact us through chat.