Crawlers are a great way to get started on Apify and easily scrape big websites. But because all of your code runs inside a web browser, a crawler cannot process its data after it finishes. That's one of the reasons we created Apify actors.

Actors can run arbitrary code and implement complicated workflows that are impossible inside a crawler's page function. One way to use actors is for post-processing data from a crawler. You can send the data to a database, attach it to an email, or simply modify it and save it again.

We will assume that you already have a working crawler that outputs some data. Now let's move to the actor. Start by creating a new actor, naming it postprocessing and then heading to the API tab, where you can find the Run actor API endpoint URL. Copy it to your clipboard and return to your crawler.

Open the advanced settings tab, scroll down to the Finish webhook URL input and paste the copied URL there. Press save and we're done with the crawler.

Let's move back to our actor. After the crawler finishes, it calls the run endpoint and the actor starts with the input it receives from the webhook. By default, the input looks like this:

{ "_id": "S76d9xzpvY7NLfSJc", "actId": "lepE4f93lkDPqojdC" }

We are interested in the _id property, which is the ID of the crawler run that just finished. With it, we can load the run's results into our actor using the Apify client. It is pretty simple.

const response = await Apify.client.crawlers.getExecutionResults({
    executionId: input._id
});
const data = response.items;

Now that we have the data loaded, we can manipulate it and save it to the dataset, a storage type very similar to the crawler's execution results. Let's imagine we have a list of people and we want to keep only the women.

const processedData = data.filter(item => item.gender === 'woman');
await Apify.pushData(processedData);
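To see the filtering step on its own, here is a minimal self-contained sketch that runs without the Apify SDK. The sample records and their field values are hypothetical; in the actor, data comes from getExecutionResults instead.

```javascript
// Hypothetical sample of crawler results, stand-ins for response.items.
const data = [
    { name: 'Alice', gender: 'woman' },
    { name: 'Bob', gender: 'man' },
    { name: 'Carol', gender: 'woman' }
];

// Keep only the women, exactly as in the actor code above.
const processedData = data.filter(item => item.gender === 'woman');

console.log(processedData.map(item => item.name)); // [ 'Alice', 'Carol' ]
```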

Now we have saved the processed data to a new dataset that we can access in the actor interface itself.

The whole actor looks like this:

const Apify = require('apify');

Apify.main(async () => {
    const input = await Apify.getValue('INPUT');

    const response = await Apify.client.crawlers.getExecutionResults({
        executionId: input._id
    });
    const data = response.items;

    const processedData = data.filter(item => item.gender === 'woman');
    await Apify.pushData(processedData);
});
