Puppeteer is a Node.js library developed by Google that offers a high-level API for controlling headless or full browsers via the DevTools Protocol. One powerful feature of Puppeteer is the ability to intercept and manipulate network requests, allowing developers to customize requests, mock responses, and control data flow during web scraping or automation.
Understanding Request Interception
Request interception in Puppeteer allows you to observe, modify, or block outgoing HTTP requests, and to answer them with responses of your own. This feature is handy when optimizing page loading, simulating various network conditions, or handling dynamic content loading.
Enabling Request Interception
To activate request interception in Puppeteer, you use two functions: page.setRequestInterception() and page.on(). This involves three essential steps:
- Activate request interception on the page with page.setRequestInterception(true).
- Use page.on('request') to handle the event that Puppeteer emits for each network request made on the site.
- Use page.on('response') to capture all API responses on the site.
await page.setRequestInterception(true);

page.on('request', (request) => {
  // Your custom logic here
  request.continue();
});

page.on('response', (response) => {
  // Your response handling logic here
});
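Once interception is enabled, every intercepted request has to be resolved with continue(), abort(), or respond(), otherwise it stalls. Resolving the same request twice throws an error, which can happen when several listeners are attached. The sketch below is one way to guard against that using isInterceptResolutionHandled(); the helper name safeContinue is hypothetical and not part of Puppeteer.

```javascript
// Hypothetical helper: continue a request only if no other handler
// has already resolved it (via continue, abort, or respond).
async function safeContinue(request, overrides = {}) {
  if (request.isInterceptResolutionHandled()) {
    return false; // another listener already resolved this request
  }
  await request.continue(overrides);
  return true;
}

// Usage inside a listener (page has interception enabled):
// page.on('request', (request) => safeContinue(request));
```

The return value only reports whether this call resolved the request, which can be useful for logging.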
Modifying Requests
Request interception facilitates modification of outgoing requests’ properties, such as setting custom headers, altering request methods, or adjusting the request payload.
page.on('request', (request) => {
  const headers = request.headers();
  headers['Authorization'] = 'Bearer YOUR_TOKEN';
  request.continue({ headers });
});
In this example, we add an Authorization header to each outgoing request.
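Headers are not the only thing request.continue() can override; it also accepts method, postData, and url. As a rough sketch of adjusting the request payload, the helper below rewrites the JSON body of selected POST requests. The helper name buildOverrides, the '/api/search' path, and the pageSize field are all assumptions made for illustration.

```javascript
// Hypothetical helper: build the overrides object passed to request.continue().
// Only POST requests whose URL contains '/api/search' (an assumed path) change.
function buildOverrides(request) {
  if (request.method() !== 'POST' || !request.url().includes('/api/search')) {
    return {}; // empty overrides: the request continues unchanged
  }
  const payload = JSON.parse(request.postData() || '{}');
  payload.pageSize = 50; // assumed field, for illustration only
  return {
    postData: JSON.stringify(payload),
    headers: { ...request.headers(), 'content-type': 'application/json' },
  };
}

// Usage inside a listener:
// page.on('request', (request) => request.continue(buildOverrides(request)));
```

Keeping the decision in a plain function makes it easy to check against stub requests without launching a browser.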
Blocking Requests
Another powerful aspect of request interception is the ability to block specific requests based on certain conditions.
page.on('request', (request) => {
  if (request.url().includes('blocked-resource')) {
    request.abort();
  } else {
    request.continue();
  }
});
In this instance, requests to a resource containing ‘blocked-resource’ in its URL are blocked.
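Blocking does not have to be based on the URL. A common variation, sketched here, is to decide by request.resourceType() so that images, stylesheets, and fonts are skipped while scraping. The set of blocked types and the helper name shouldBlock are assumptions for illustration.

```javascript
// Resource types to skip; this particular set is an assumption for illustration.
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font', 'media']);

// Returns true when the request should be aborted.
function shouldBlock(request) {
  return BLOCKED_TYPES.has(request.resourceType());
}

// Usage inside a listener:
// page.on('request', (request) => {
//   if (shouldBlock(request)) request.abort();
//   else request.continue();
// });
```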
Real-world Examples
Let’s explore practical use cases for request interception in Puppeteer:
- Dynamic Content Loading
Many modern websites load content dynamically via AJAX requests. Intercepting these requests allows you to pause your automation until specific data has been loaded.
page.on('request', async (request) => {
  if (request.url().includes('dynamic-content')) {
    await request.continue();
    await page.waitForSelector('.loaded-element');
  } else {
    request.continue();
  }
});
This example waits for an element with the class ‘loaded-element’ to appear after intercepting a request to a URL containing ‘dynamic-content.’
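If the goal is to pause the main script itself rather than the event handler, one option is page.waitForResponse(), which works with or without request interception. The following is a minimal sketch under that assumption; the helper names and the triggerAction callback are hypothetical, and the URL fragment and selector are reused from the example above.

```javascript
// Predicate: true for a successful response whose URL contains 'dynamic-content'.
function isDynamicContentResponse(response) {
  return response.url().includes('dynamic-content') && response.ok();
}

// Hypothetical helper: run an action that triggers the AJAX call, then wait
// for the matching response and for the element it renders.
async function waitForDynamicContent(page, triggerAction) {
  const [response] = await Promise.all([
    page.waitForResponse(isDynamicContentResponse), // start waiting first
    triggerAction(),
  ]);
  await page.waitForSelector('.loaded-element');
  return response;
}

// Usage (the '#load-more' selector is an assumption):
// await waitForDynamicContent(page, () => page.click('#load-more'));
```

Starting the wait before the action runs avoids missing a response that arrives quickly.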
- API Mocking
During testing, it may be necessary to simulate different scenarios by mocking API responses. Request interception facilitates this process.
page.on('request', (request) => {
  if (request.url().includes('mock-api')) {
    request.respond({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ mockData: true }),
    });
  } else {
    request.continue();
  }
});
Here, any request to a URL containing ‘mock-api’ will receive a mocked JSON response.
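To simulate several scenarios at once, such as a success and a server error, the mocked responses can be kept in a small lookup table. This is only a sketch: the MOCKS entries, their URL fragments, and the helper name findMock are invented for illustration.

```javascript
// Hypothetical mock table: URL fragment -> canned response data.
const MOCKS = [
  { match: 'mock-api/users', status: 200, body: { users: [] } },
  { match: 'mock-api/error', status: 500, body: { error: 'simulated failure' } },
];

// Returns an object suitable for request.respond(), or null if nothing matches.
function findMock(url) {
  const mock = MOCKS.find((entry) => url.includes(entry.match));
  if (!mock) return null;
  return {
    status: mock.status,
    contentType: 'application/json',
    body: JSON.stringify(mock.body),
  };
}

// Usage inside a listener:
// page.on('request', (request) => {
//   const mock = findMock(request.url());
//   if (mock) request.respond(mock);
//   else request.continue();
// });
```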
Note: Keep in mind that Puppeteer’s page.on("request") only captures requests made by that page, for example the navigation started by page.goto or the XHR and fetch requests made by scripts running in the page’s context (including code run through page.evaluate). It may not intercept requests initiated outside the page’s context, such as those made by a service worker or by another tab.
An alternative way to access requests and responses without request interception
Without enabling request interception, we can listen for the response event, read the headers of the request behind it, and make our own fetch call with a modified payload.
Practical implementation of the alternative approach
Now let’s implement this alternative approach on the IRCTC website.
// Requires Node.js 18+ for the global fetch() used below.
const puppeteer = require("puppeteer-extra");
const pluginStealth = require("puppeteer-extra-plugin-stealth")();

puppeteer.use(pluginStealth);

// Availability API whose request we replay with our own payload.
const API_URL =
  "https://www.irctc.co.in/eticketing/protected/mapps1/altAvlEnq/TC";

const scrape = async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
  });
  const page = await browser.newPage();
  await page.goto("https://www.irctc.co.in/nget/train-search", {
    waitUntil: "networkidle0",
  });

  // Fill in the search form: MAS as the origin, KRR as the destination.
  await page.type("#origin > span > input", "MAS");
  await page.keyboard.press("ArrowDown");
  await page.keyboard.press("Enter");
  await page.type("#destination > span > input", "KRR");
  await page.keyboard.press("ArrowDown");

  // Register the listener before the search is submitted.
  page.on("response", async (response) => {
    if (response.request().url().includes(API_URL)) {
      // Reuse the headers of the request the page itself sent.
      const headers = response.request().headers();
      // Call the API again; destStn is MMCT here, not the KRR typed in the form.
      const apiRes = await fetch(API_URL, {
        headers,
        body: '{"concessionBooking":false,"srcStn":"MAS","destStn":"MMCT","jrnyClass":"","jrnyDate":"20240225","quotaCode":"GN","currentBooking":"false","flexiFlag":false,"handicapFlag":false,"ticketType":"E","loyaltyRedemptionBooking":false,"ftBooking":false}',
        method: "POST",
        credentials: "omit",
      });
      console.log(await apiRes.json());
    }
  });

  // Submit the search, which triggers the API call we are listening for.
  await page.keyboard.press("Enter");
  await page.click("[label='Find Trains']");
};

scrape();
In the above code, we listen on the response event and fill in the search form with the stations MAS and KRR. However, in the body of the fetch call, we use MMCT as the destination station. Thus, the response we get matches the body we sent rather than the form, and we can access the data accordingly.
Note: Sometimes the above code doesn’t work because IRCTC asks for a login; in such cases, wait for some time and try again later.
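Instead of hard-coding the whole body, the replayed payload can be derived from the request the page actually sent. The sketch below assumes the captured request carries a JSON body; the helper name buildReplayOptions is hypothetical, and the field names come from the body shown above.

```javascript
// Hypothetical helper: build fetch() options that replay a captured request
// with some of its JSON body fields changed.
function buildReplayOptions(request, changes) {
  const original = JSON.parse(request.postData() || '{}');
  return {
    method: request.method(),
    headers: request.headers(),
    body: JSON.stringify({ ...original, ...changes }),
  };
}

// Usage inside the response listener:
// const captured = response.request();
// const options = buildReplayOptions(captured, { destStn: 'MMCT' });
// const apiRes = await fetch(captured.url(), options);
```

Only the fields listed in changes are replaced, so the rest of the payload stays in sync with what the site sends.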
Conclusion
Delving into the realm of Puppeteer’s request interception opens up a treasure trove of possibilities for web automation and testing. Picture this: you have the power to tweak headers, block specific requests, or even mock entire API responses, all from a few lines of script.
Imagine being able to seamlessly modify the flow of requests, crafting an intricate dance between your script and the web server. With each intercepted request, you’re not just automating tasks; you’re orchestrating a symphony of digital interactions.
So, dive in, explore the possibilities, and let your creativity run wild. Whether you’re an experienced developer or just beginning your journey into web automation, the journey ahead promises excitement, discovery, and limitless opportunities for innovation. Happy coding!