I am trying to create a website monitoring webapp using PHP. At the minute I’m using curl to collect headers from different websites and update a MySQL database when a website’s status changes (e.g. if a site that was ‘up’ goes ‘down’).
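For context, the status-change write boils down to something like this (the `sites` table, its columns, and the `record_status()` name are illustrative, not my real schema):

```php
<?php
// Update a site's stored status only when it has actually changed.
// Hypothetical schema: sites(id, url, status).
function record_status(PDO $db, string $url, string $newStatus): bool
{
    // The WHERE clause skips the write when the status is unchanged,
    // so rowCount() doubles as a "did the site flip up/down?" flag.
    $stmt = $db->prepare(
        'UPDATE sites SET status = :new WHERE url = :url AND status <> :old'
    );
    $stmt->execute([
        ':new' => $newStatus,
        ':url' => $url,
        ':old' => $newStatus,
    ]);
    return $stmt->rowCount() > 0; // true => status changed
}
```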
I’m using curl_multi (via the Rolling Curl X class, which I’ve adapted slightly) to process 20 sites in parallel, which seems to give the fastest results, with CURLOPT_NOBODY set so that only headers are collected. I’ve tried to streamline the script to make it as fast as possible.
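Stripped of the Rolling Curl X wrapper, the parallel header check amounts to roughly this (function names, the 10-second timeout, and the batching are illustrative):

```php
<?php
// Check one batch of URLs in parallel with curl_multi; headers only (CURLOPT_NOBODY).
function check_batch(array $urls): array
{
    $multi   = curl_multi_init();
    $handles = [];

    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_NOBODY         => true,  // HEAD-style request: no body download
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_TIMEOUT        => 10,    // don't let one slow site stall the batch
        ]);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until every handle has finished.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running && curl_multi_select($multi) === -1) {
            usleep(1000); // avoid a busy loop if select() reports no fds
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $url => $ch) {
        // 0 means the request failed outright (DNS error, refused, timeout...)
        $results[$url] = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }
    curl_multi_close($multi);

    return $results;
}

// Process the full list 20 at a time, as described above.
function check_all(array $urls, int $window = 20): array
{
    $results = [];
    foreach (array_chunk($urls, $window) as $batch) {
        $results += check_batch($batch); // union keeps the URL keys
    }
    return $results;
}
```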
It is working OK and I can process 40 sites in approx. 2-4 seconds. My plan has been to run the script via cron every minute, so it looks like I will be able to process about 600 websites per minute. Although this is fine at the minute, it won’t be enough in the long term.
So how can I scale this? Is it possible to run multiple cron jobs in parallel, or will that run into bottlenecking issues?
Off the top of my head I was thinking that I could break the monitored sites into groups of 400 and run a separate script for each group (e.g. IDs 1-400, 401-800, 801-1200, etc.), so there would be no danger of database corruption. That way each script would complete within a minute.
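The crontab would then look something like this (check.php and its --from/--to arguments are made-up names for illustration, each entry covering one block of 400 IDs):

```
# One cron entry per block of 400 site IDs, all firing every minute.
* * * * * php /path/to/check.php --from=1   --to=400
* * * * * php /path/to/check.php --from=401 --to=800
* * * * * php /path/to/check.php --from=801 --to=1200
```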
However, it feels like this might not work, since the one script running curl_multi seems to max out performance at 20 requests in parallel. So will this work, or is there a better approach?