This article relates to the legacy Apify Crawler product, which is being retired in favor of the apify/legacy-phantomjs-crawler actor. All the information in this article is still valid, and applies to both the legacy Crawler product and the new actor. For more information, please read this blog post.

For new projects, we recommend using the newer apify/web-scraper actor that is based on the modern headless Chrome browser.


If you want to save whole images during crawling (not just their URLs), you can do that directly in pageFunction by encoding them as base64, without any external library or API.

First, we will create a generic function called encodeImageFromUrl that encodes the image at any URL. Note that the function is asynchronous: the image has to load before it can be encoded. We therefore pass a callback to encodeImageFromUrl, and in pageFunction we use context.willFinishLater() and context.finish() to tell the crawler when the result is ready.

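function encodeImageFromUrl(url, callback){
    var can = document.createElement('canvas');
    var ctx = can.getContext('2d');
    var img = new Image();
    // request the image with CORS; set this before src so it applies to the load
    img.crossOrigin = 'anonymous';
    img.onload = function(){
        // resize the canvas to the image and draw the image onto it
        can.width  = img.width;
        can.height = img.height;
        ctx.drawImage(img, 0, 0, img.width, img.height);
        // serialize the canvas content to a base64 string
        var data = can.toDataURL();
        callback(data);
    };
    // assigning src starts the download
    img.src = url;
}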

In this code, we create a canvas element and get its 2D drawing context. Then we instantiate a new image object and attach its onload callback. We set the image source to the URL and crossOrigin to anonymous so that an image from another domain can be read back from the canvas (the server hosting the image must permit this with CORS headers). In the onload function, we set the canvas size to match the image and then draw the image on the canvas. We use the native toDataURL() method to encode the canvas content to base64. The returned string can be very long. Finally, we call the callback with the base64 string as an argument.
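The function above never calls its callback if the image fails to load, which would leave the page waiting after context.willFinishLater(). A minimal sketch of one way to guard against that follows. It is not part of the original crawler code: the name encodeImageFromUrlSafe, the error-first callback signature, and the 10-second default timeout are all assumptions made for illustration.

```javascript
// Hypothetical variant of encodeImageFromUrl: the callback is called exactly
// once, as callback(error, data), even when loading or encoding fails.
function encodeImageFromUrlSafe(url, callback, timeoutMs) {
    var done = false;
    var timer = null;

    function finishOnce(err, data) {
        if (done) return;      // ignore late events after the first outcome
        done = true;
        clearTimeout(timer);
        callback(err, data);
    }

    // Assumed default of 10 seconds; adjust to the pages being crawled.
    timer = setTimeout(function () {
        finishOnce(new Error('Timed out loading ' + url));
    }, timeoutMs || 10000);

    var img = new Image();
    img.crossOrigin = 'anonymous';
    img.onload = function () {
        try {
            var can = document.createElement('canvas');
            can.width = img.width;
            can.height = img.height;
            can.getContext('2d').drawImage(img, 0, 0, img.width, img.height);
            finishOnce(null, can.toDataURL());
        } catch (e) {
            // toDataURL() can throw, for example when the canvas holds
            // cross-origin data that the server did not allow to be read
            finishOnce(e);
        }
    };
    img.onerror = function () {
        finishOnce(new Error('Failed to load ' + url));
    };
    img.src = url;
}
```

Inside pageFunction, the callback could then store either the data or the error message in the result and call context.finish(result) in both cases, so that a broken image does not stall the page.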

The whole pageFunction might look like this:

function pageFunction(context) {
    // called on every page the crawler visits, use it to extract data from it
    var $ = context.jQuery;
    // tell the crawler that the result will be delivered asynchronously
    context.willFinishLater();
    var result = {};
    // 'my-image-element' is a placeholder; use a selector that matches your image
    var myUrl = $('my-image-element').attr('src');

    encodeImageFromUrl(myUrl, function(data){
        result.base64 = data;
        context.finish(result);
    });

    function encodeImageFromUrl(url, callback){
        var can = document.createElement('canvas');
        var ctx = can.getContext('2d');
        var img = new Image();
        // request the image with CORS; set this before src so it applies to the load
        img.crossOrigin = 'anonymous';
        img.onload = function(){
            can.width  = img.width;
            can.height = img.height;
            ctx.drawImage(img, 0, 0, img.width, img.height);
            var data = can.toDataURL();
            callback(data);
        };
        img.src = url;
    }
}

In the results, under a "base64" key, we will find a string like this, which is a base64 representation of our image (the string is huge, this is just a small snippet):

"base64":"iVBORw0KGgoAAAANSUhEUgAAAHAAAACnCAYAAADJ29jcAAAACXBIWXMAAAsTAAALEwEAmpwYAAAgAElEQVR4AQAJgPZ/Ad7o5/8AAAAAAgICAAEBAQABAQEAAAAAAAAAAAAAAAAAAgICAAICAgACAgIAAQEBAAAAAAD///8A/v7+AP7+/gAHBwcA////AP7+/gD+/v4A/v7+AAAAAAABAQEAAAAAAAEBAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA+vn5AAYGBg"}
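To turn a stored string back into an image file, the base64 text has to be decoded into bytes. The following is a minimal sketch that assumes a separate Node.js script run outside the crawler; the helper name decodeDataUrl is hypothetical. The canvas toDataURL() method normally returns a data URL beginning with a prefix such as data:image/png;base64, while the snippet above shows only the base64 payload, so the sketch accepts both forms.

```javascript
// Hypothetical helper: split a data URL (or a bare base64 string)
// into its MIME type and the decoded image bytes.
function decodeDataUrl(dataUrl) {
    var match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
    var mimeType = match ? match[1] : null;   // null when there is no prefix
    var payload = match ? match[2] : dataUrl; // the base64 text itself
    return {
        mimeType: mimeType,
        bytes: Buffer.from(payload, 'base64')
    };
}

// Example input: the base64 encoding of the 8-byte PNG file signature.
var decoded = decodeDataUrl('data:image/png;base64,iVBORw0KGgo=');
console.log(decoded.mimeType, decoded.bytes.length);
```

The resulting bytes can then be written to disk, for example with fs.writeFileSync('image.png', decoded.bytes).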