Mastering Request Interceptions in Puppeteer

Puppeteer is a Node.js library developed by Google that offers a high-level API for controlling headless or full browsers via the DevTools Protocol. One powerful feature of Puppeteer is the ability to intercept and manipulate network requests, allowing developers to customize requests, supply custom responses, and control data flow during web scraping or automation.

Understanding Request Interception

Request interception in Puppeteer allows you to observe, modify, or block outgoing HTTP requests, and to answer them with your own responses instead of letting them reach the network. This feature is handy when optimizing page loading, simulating failed requests or mocked backends, or handling dynamic content loading.

Enabling Request Interception

To activate request interception in Puppeteer, you use two functions: page.setRequestInterception() and page.on().

This involves three essential steps:

  1. Activate request interception on the page: page.setRequestInterception(true).
  2. Register a page.on('request') handler. Puppeteer emits a 'request' event for each network request the page makes, and every intercepted request must be continued, aborted, or responded to; otherwise it stalls.
  3. Use page.on('response') to capture the responses the page receives.

await page.setRequestInterception(true);
page.on('request', (request) => {
  // Your custom logic here
  request.continue();
});
page.on('response', (response) => {
  // Your response handling logic here
});
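
As a minimal sketch of how the two listeners fit together, the helper below logs one line per request and one per response. The names describeRequest and attachLogging are hypothetical and exist only for this illustration; the request and response methods they call (method(), resourceType(), url(), status(), isInterceptResolutionHandled(), continue()) are part of Puppeteer's API.

```javascript
// Hypothetical helper: one-line summary of an intercepted request.
function describeRequest(request) {
  return `${request.method()} ${request.resourceType()} ${request.url()}`;
}

// Hypothetical helper: registers logging listeners on a Puppeteer page.
// Assumes page.setRequestInterception(true) has already been awaited.
function attachLogging(page, log = console.log) {
  page.on('request', (request) => {
    log(`-> ${describeRequest(request)}`);
    // Another listener may already have continued, aborted, or answered it.
    if (request.isInterceptResolutionHandled()) return;
    request.continue();
  });
  page.on('response', (response) => {
    log(`<- ${response.status()} ${response.url()}`);
  });
}
```

With a real page, this would be called as attachLogging(page) right after await page.setRequestInterception(true).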

Modifying Requests

Request interception lets you modify the properties of outgoing requests, such as setting custom headers, altering the request method, or adjusting the request payload.

page.on('request', (request) => {
  const headers = request.headers();
  headers['Authorization'] = 'Bearer YOUR_TOKEN';
  request.continue({ headers });
});

In this example, we add an Authorization header to each outgoing request. 
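
Adding the token to every request also sends it to third-party hosts such as analytics services or CDNs. A minimal sketch of a narrower version attaches the header only to requests for one origin and copies the headers instead of mutating them. The origin https://api.example.com and the helper name authOverrides are assumptions made for this illustration.

```javascript
// Assumption: the only origin that should receive the token.
const API_ORIGIN = 'https://api.example.com';

// Hypothetical helper: returns the overrides to pass to request.continue().
// An empty object means "continue the request unchanged".
function authOverrides(request, token) {
  if (new URL(request.url()).origin !== API_ORIGIN) return {};
  return {
    // Puppeteer reports header names in lower case.
    headers: { ...request.headers(), authorization: `Bearer ${token}` },
  };
}

// Usage with interception enabled:
//   page.on('request', (request) => {
//     request.continue(authOverrides(request, 'YOUR_TOKEN'));
//   });
```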

Blocking Requests

Another powerful aspect of request interception is the ability to block specific requests based on certain conditions.

page.on('request', (request) => {
  if (request.url().includes('blocked-resource')) {
    request.abort();
  } else {
    request.continue();
  }
});

In this instance, requests to a resource containing ‘blocked-resource’ in its URL are blocked.
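
The URL is one condition; the resource type is another common one. The sketch below skips images, stylesheets, fonts, and media, which can reduce load time when only the page's data is needed. The helper name shouldBlock and the chosen set of types are assumptions; request.resourceType() is the Puppeteer method that reports the type.

```javascript
// Resource types to skip, as reported by request.resourceType().
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font', 'media']);

// Hypothetical helper: decides whether an intercepted request is dropped.
function shouldBlock(request) {
  return BLOCKED_TYPES.has(request.resourceType());
}

// Usage with interception enabled:
//   page.on('request', (request) => {
//     if (shouldBlock(request)) request.abort();
//     else request.continue();
//   });
```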

Real-world Examples

Let’s explore practical use cases for request interception in Puppeteer:

  1. Dynamic Content Loading

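Many modern websites load content dynamically via AJAX requests. Intercepting these requests tells you when the data is being fetched, so you can wait for the resulting element before your automation moves on.

page.on('request', (request) => {
  if (request.url().includes('dynamic-content')) {
    // Start waiting for the rendered element without blocking the handler.
    page
      .waitForSelector('.loaded-element')
      .then(() => console.log('Dynamic content loaded'))
      .catch((error) => console.error('Element did not appear:', error.message));
  }
  // Every intercepted request must still be continued.
  request.continue();
});

This example starts waiting for an element with the class ‘loaded-element’ as soon as a request to a URL containing ‘dynamic-content’ is intercepted. The handler does not block on that wait, so the request continues immediately, and a timeout is logged rather than surfacing as an unhandled promise rejection.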
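
If the goal is only to wait for the data, a different technique from interception may be enough: page.waitForResponse() accepts a predicate and resolves once a matching response arrives. The sketch below illustrates that alternative under stated assumptions: isDynamicContentResponse is a hypothetical helper, and the '#load-more' selector in the usage comment is invented for the example.

```javascript
// Hypothetical predicate: a successful response from the dynamic-content URL.
function isDynamicContentResponse(response) {
  return response.url().includes('dynamic-content') && response.ok();
}

// Usage inside an async function (no request interception needed).
// Start waiting before triggering the request so the response is not missed:
//   const [response] = await Promise.all([
//     page.waitForResponse(isDynamicContentResponse),
//     page.click('#load-more'), // hypothetical trigger for the AJAX call
//   ]);
//   await page.waitForSelector('.loaded-element');
```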

  2. API Mocking

During testing, it may be necessary to simulate different scenarios by mocking API responses. Request interception facilitates this process.

page.on('request', (request) => {
  if (request.url().includes('mock-api')) {
    request.respond({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ mockData: true }),
    });
  } else {
    request.continue();
  }
});

Here, any request to a URL containing ‘mock-api’ will receive a mocked JSON response.
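
When several endpoints need mocking, a lookup table keeps the handler short. The following is a sketch only: the MOCKS entries, their URLs and payloads, and the helper name mockFor are all made up for illustration. The returned object uses the status, contentType, and body fields that request.respond() accepts.

```javascript
// Hypothetical mock table: URL fragment -> status and payload.
const MOCKS = [
  { match: 'mock-api/users', status: 200, data: [{ id: 1, name: 'Test User' }] },
  { match: 'mock-api/error', status: 500, data: { error: 'Simulated failure' } },
];

// Hypothetical helper: returns the argument for request.respond(),
// or null when the URL has no mock and should go to the network.
function mockFor(url) {
  const mock = MOCKS.find((entry) => url.includes(entry.match));
  if (!mock) return null;
  return {
    status: mock.status,
    contentType: 'application/json',
    body: JSON.stringify(mock.data),
  };
}

// Usage with interception enabled:
//   page.on('request', (request) => {
//     const mock = mockFor(request.url());
//     if (mock) request.respond(mock);
//     else request.continue();
//   });
```

Including an error entry makes it possible to exercise the page's failure handling without touching the real backend.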

 

Note:

Keep in mind that Puppeteer’s page.on("request") fires for requests issued by that page: the navigation itself, its subresources, and the XHR and fetch requests made by scripts running in the page. Requests that originate outside that page, for example from another tab or popup, or from a service worker, may not reach the handler.

An alternative way to access requests and responses without request interception

Instead of enabling interception, we can listen for a response, read the headers of the request that produced it, and then make our own fetch call to the same endpoint with a modified payload.

Practical implementation of the alternative approach

Now let’s implement this approach on the IRCTC website.

const puppeteer = require("puppeteer-extra");
const pluginStealth = require("puppeteer-extra-plugin-stealth")();
puppeteer.use(pluginStealth);
const scrape = async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
  });
  const page = await browser.newPage();

  await page.goto("https://www.irctc.co.in/nget/train-search", {
    waitUntil: "networkidle0",
  });

  await page.type("#destination > span > input", "MAS");
  await page.keyboard.press("ArrowDown");
  await page.keyboard.press("Enter");
  await page.type("#origin > span > input", "KRR");
  await page.keyboard.press("ArrowDown");
  // Register the listener before submitting the search so the response is not missed.
  let headers;
  page.on("response", async (response) => {
    if (
      response
        .request()
        .url()
        .includes(
          "https://www.irctc.co.in/eticketing/protected/mapps1/altAvlEnq/TC"
        )
    ) {
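      // Reuse the headers of the page's own request for our call.
      // The global fetch used here requires Node.js 18 or later.
      headers = response.request().headers();
      const apiRes = await fetch(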
        "https://www.irctc.co.in/eticketing/protected/mapps1/altAvlEnq/TC",
        {
          headers,
          body: '{"concessionBooking":false,"srcStn":"MAS","destStn":"MMCT","jrnyClass":"","jrnyDate":"20240225","quotaCode":"GN","currentBooking":"false","flexiFlag":false,"handicapFlag":false,"ticketType":"E","loyaltyRedemptionBooking":false,"ftBooking":false}',
          method: "POST",
          credentials: "omit",
        }
      );
      console.log(await apiRes.json());
    }
  });
  await page.keyboard.press("Enter");
  await page.click("[label='Find Trains']");
};
scrape().catch(console.error);

In the above code, we register the response listener before submitting the search form, which was filled in with the stations MAS and KRR. The fetch call, however, sends its own body with MMCT as the destination station (destStn). The API therefore answers according to that body rather than the form, and we can read the data from the parsed JSON.
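
The hand-written JSON string in the fetch call is easy to get wrong when only one field changes. A minimal sketch of an alternative builds the body from an object and overrides just the fields of interest. The names DEFAULT_ENQUIRY and buildReplayOptions are hypothetical; the field names and default values are copied from the request body above.

```javascript
// Field names and values taken from the request body used in the article.
const DEFAULT_ENQUIRY = {
  concessionBooking: false,
  srcStn: 'MAS',
  destStn: 'MMCT',
  jrnyClass: '',
  jrnyDate: '20240225',
  quotaCode: 'GN',
  currentBooking: 'false',
  flexiFlag: false,
  handicapFlag: false,
  ticketType: 'E',
  loyaltyRedemptionBooking: false,
  ftBooking: false,
};

// Hypothetical helper: builds the options object for fetch() from the
// headers captured in the response listener plus any field overrides.
function buildReplayOptions(capturedHeaders, overrides = {}) {
  return {
    method: 'POST',
    headers: capturedHeaders,
    body: JSON.stringify({ ...DEFAULT_ENQUIRY, ...overrides }),
  };
}

// Usage inside the response listener:
//   const apiRes = await fetch(url, buildReplayOptions(headers, { jrnyDate: '20240226' }));
```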

Note: Sometimes the above code doesn’t work because IRCTC asks for a login; in such cases, wait for some time and try again.

Conclusion

Delving into Puppeteer’s request interception opens up a treasure trove of possibilities for web automation and testing. Picture this: you have the power to tweak headers, block specific requests, or even mock entire API responses at your fingertips.

Imagine being able to seamlessly modify the flow of requests, crafting an intricate dance between your script and the web server. With each intercepted request, you’re not just automating tasks; you’re orchestrating a symphony of digital interactions.

So, dive in, explore the possibilities, and let your creativity run wild. Whether you’re an experienced developer or just beginning with web automation, the road ahead promises excitement, discovery, and plenty of opportunities for innovation. Happy coding!

Saairaam Prasad KV
