When you try to crawl multiple pages that share the same URL and differ only in their POST data, the crawler treats them all as a single page and loads only the first one. For example, let's say you have the following two Start URLs:

https://www.example.com/test.html[POST]data=1
https://www.example.com/test.html[POST]data=2

Once you run the crawler, you'll notice that it only loads the first Start URL and skips the second one. That's because when deciding whether a page has already been visited, the crawler looks only at the URL and ignores the POST data. Since both URLs are identical, it concludes the page was already visited.

To work around that, you have three options:

1) Add a dummy query parameter to URLs to make them unique. For example:

https://www.example.com/test.html?dummy=1[POST]data=1
https://www.example.com/test.html?dummy=2[POST]data=2

This will ensure that each page has a unique URL and the crawler visits them all.

2) Implement the interceptRequest function and assign a different uniqueKey to each of the URLs, so that the crawler considers the pages unique and visits them both. Learn more about the uniqueKey in the Apify Crawler documentation.
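A minimal sketch of option 2), assuming the legacy Crawler's interceptRequest(context, newRequest) signature and assuming the request object exposes its POST payload as postData (check the Crawler documentation for the exact property names):

```javascript
// Sketch: make the uniqueKey include the POST payload, so two requests
// to the same URL with different POST data are both visited.
// Assumption: the request exposes its POST body as `postData`.
function interceptRequest(context, newRequest) {
    if (newRequest.postData) {
        // Combine URL and POST data into a key that is unique per request.
        newRequest.uniqueKey = newRequest.url + '[POST]' + newRequest.postData;
    }
    return newRequest;
}
```

With this in place, the two example Start URLs above would get different uniqueKeys and the crawler would visit both.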

3) Use context.enqueuePage() to add the second URL and set its uniqueKey to something different. This option might be easier to implement than option 2).
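A sketch of option 3), called from inside your page function; the request property names (method, postData, uniqueKey) are assumptions based on the legacy Crawler API, so verify them against the documentation:

```javascript
// Sketch: enqueue the same URL again with different POST data,
// giving it an explicit uniqueKey so the crawler does not skip it
// as a duplicate of the first request.
function pageFunction(context) {
    context.enqueuePage({
        url: 'https://www.example.com/test.html',
        method: 'POST',
        postData: 'data=2',
        uniqueKey: 'https://www.example.com/test.html[POST]data=2',
    });
    // ... the rest of your page function
}
```

Because the uniqueKey differs from the one generated for the first request, the crawler treats this as a new page and visits it.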

Still not clear? Let us know.
