How to preload your page cache using this handy little script.
It’s great for any scenario where you need automated cache-warming:
- Sites using cache without a built-in cache prebuild function.
- Large sites (with many pages) that want to prebuild more often.
- Server admins warming caches for many sites.
- Those of you on restricted webhosts using proprietary caching (Kinsta, WPEngine, etc) can definitely use this to precache your sites.
The only requirements are that you have server access or cron jobs, and that your site has an XML sitemap.
Optimus Cache Prime (by Patrick Mylund Nielsen)
Before we start, let’s give a special thanks to Patrick for writing this awesome script many years ago. Also a special thanks to Fabio Tielen of https://codeagency.be/ for answering n00b questions I had.
OCP basically crawls all the pages in your XML sitemap to prebuild your cache. This helps avoid the slow “uncached hits” effect on sites without a cache prebuild function, or sites with too many pages to prebuild quickly enough.
Installing and testing OCP:
- Download OCP from https://patrickmn.com/projects/ocp/
- Extract on your server. (Can be root for self-managed servers, and user directory for shared servers.)
- Test it with
/path/to/ocp -v https://yoursite.com/sitemap.xml
and check that the verbose output looks correct.
- If it works, you can set a cron job for it:
/path/to/ocp https://yoursite.com/sitemap.xml
- If you’re curious, try
tail -f /var/log/cron
to see if it runs.
I was using CentOS 7 with WHM/cPanel. Fabio was on Ubuntu and used something like cd /var/www/ocp/; ./ocp -v=true https://yoursite.com/sitemap.xml
but I didn’t find that form necessary.
FYI: the commands above are only possible for those with SSH access to the server. If you don’t have SSH access, you can still use this script as a cron job.
Running OCP as cron job:
Sure, you can simply cron the command above but it’s not recommended for large sites that don’t finish pre-caching before the next cron interval. I like to do a check every 5 or 10 minutes but some sites might take an hour or more to pre-cache, which means you run the risk of multiple cron jobs hitting the same script.
So to get around this, we use one.sh (another neat little script by Patrick). It runs commands but first checks to make sure they’re not already running…which is ideal for triggering frequent cron runs without the risk of multiple executions. It’s super handy not only for OCP but for other scripts as well.
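To picture the "don't start if it's already running" idea, here is a minimal sketch. It is not the source of one.sh and may work quite differently from it; the function name run_once and the lock directory path are made up for illustration.

```shell
#!/bin/sh
# Illustrative sketch only: NOT the actual one.sh source.
# run_once LOCKDIR COMMAND [ARGS...]
# Runs COMMAND unless an earlier run still holds LOCKDIR.
run_once() {
    lockdir=$1
    shift
    # mkdir is atomic: only one caller can create the directory.
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "skipped: previous run still active" >&2
        return 0
    fi
    "$@"
    status=$?
    # Release the lock so the next scheduled run can start.
    rmdir "$lockdir"
    return "$status"
}
```

It could be called as run_once /tmp/ocp.lock /path/to/ocp https://yoursite.com/sitemap.xml. One weakness of this simple version: if the process is killed mid-run, the lock directory stays behind and has to be removed by hand.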
- Download one.sh from https://patrickmn.com/projects/one/
- Extract on your server. (I like to put it in the same OCP directory.)
- Set a cron job for
/path/to/one.sh /path/to/ocp https://yoursite.com/sitemap.xml
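For anyone editing a crontab directly, the full entry might look like the sketch below. The 10-minute schedule is only an example taken from the interval mentioned earlier, and the paths are the same placeholders used throughout this guide.

```
# minute  hour  day-of-month  month  day-of-week  command
*/10 * * * * /path/to/one.sh /path/to/ocp https://yoursite.com/sitemap.xml
```

In a control panel such as cPanel, the five schedule fields are usually entered in separate boxes and only the command part goes in the command field.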
Other tips:
- If your script doesn’t run, make sure you have the right path and also proper permissions (755).
- You should read the documentation to see the different commands available.
- I listed the commands using the HTTPS domain path to make this guide easier for non-Linux people, but large sites should use the local file path (“/path/to/sitemap.xml”) for better performance, since reading the sitemap locally avoids the DNS lookup and HTTPS handshake.
- To prevent high server load, use the -c option to limit how many pages are crawled at once. It will look something like
ocp -c 5 https://yoursite.com/sitemap.xml
which means 5 pages at a time.
- Thoughts on running cron as root.
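The path and permission tip above can be checked before setting up the cron job. The helper below is a hypothetical sketch (check_script is not part of OCP or one.sh) that reports the most common reasons a scheduled command fails to start.

```shell
#!/bin/sh
# Hypothetical helper, not part of OCP or one.sh.
# check_script PATH: report why a cron command at PATH might not start.
check_script() {
    if [ ! -e "$1" ]; then
        echo "missing: $1"
    elif [ -d "$1" ]; then
        # The path points at a folder; the binary is probably inside it.
        echo "directory, not a file: $1"
    elif [ ! -x "$1" ]; then
        echo "not executable (try chmod 755): $1"
    else
        echo "ok: $1"
    fi
}
```

Run it once for each file in the cron command, for example check_script /path/to/one.sh and check_script /path/to/ocp.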
Leif @ CdnGuide
Just a thought: could I run this on one server, and run cron jobs for multiple domains, including domains on other servers too? Since the cron is written with https://example.com/sitemap.xml, it should work, right?
Yin
Absolutely! You can designate one server as a cache-warming server for all other sites. I’ve already schemed way ahead of you….making a central sitemap where other sitemaps can be added to it easily and not requiring extra cron jobs. Hahaha. Anyway…if there’s any issue, it might be a connection limit issue as most server security won’t allow one IP to make so many connections since that might be considered a form of DDOS attack. So in that case, doing it from the local server may be less hassle. But do play around and let me know what you find. 🙂
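One way to picture the central sitemap idea is a sitemap index file that simply lists the other sites' sitemaps. The sketch below only builds such a file; make_sitemap_index is a made-up name, the example domains are placeholders, and it assumes OCP follows sitemap index files (a later comment in this thread runs it against a main sitemap that contains only sub-sitemap URLs).

```shell
#!/bin/sh
# Hypothetical helper: print a sitemap index that lists each URL given.
# Assumes the URLs contain no characters that need XML escaping (such as &).
make_sitemap_index() {
    echo '<?xml version="1.0" encoding="UTF-8"?>'
    echo '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    for url in "$@"; do
        printf '  <sitemap><loc>%s</loc></sitemap>\n' "$url"
    done
    echo '</sitemapindex>'
}
```

For example, make_sitemap_index https://site-one.example/sitemap.xml https://site-two.example/sitemap.xml > central-sitemap.xml writes a file that a single cron job could then point at.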
Leif @ CdnGuide
So if I set up a small server in a couple of regions and run it combined with a nice website and some automation, one might have a CDN/website cache warmer side hustle business then .. 😉 .. Just some food for thought ..
Could make some money here .. 😉
George
Yes I already do this with my custom cache preloader script for 32+ geographical locations to warm up Cloudflare edge caches for non-Enterprise plans. For Cloudflare Enterprise, I use their native Cloudflare cache prefetcher as it does a better job of pre-warming all 200+ CF datacenters https://support.cloudflare.com/hc/en-us/articles/206776707-Does-Cloudflare-Do-Prefetching- 🙂
Thanks Yin, going to give OCP a try too
Leif @ CdnGuide
Yes, I know about CF, but a lot of us just use CF for DNS and another CDN for speed. Check out BunnyCDN’s new Perma-Cache function, a better choice in my eyes .. =)
Tom
Thanks Yin. If I run Litespeed with crawler set to 10 minutes, do I need to run this too?
Yin
You don’t need OCP. But if you want something faster and more aggressive than LS crawler, OCP is nice.
Martin
Hi. Could you kindly provide an example of a cron job that can be used in Cpanel?
Yin
I already did! A cron job is just a scheduled server command.
Martin
What I mean is something a beginner like me can just copy, edit and paste to cpanel, e.g using this example /path/to/one.sh /path/to/ocp https://yoursite.com/sitemap.xml, and if i uploaded the two files above inside a folder called ‘example’ in wp-content on a litespeed shared server.
Yin
Go to your File Manager and you’ll see the path. It’s probably something like “/home/user/public_html/wp-content/example/ocp”.
Martin
So is this cron job example correct? /home/user/public_html/wp-content/example/one.sh /home/user/public_html/wp-content/example/ocp https://yoursite.com/sitemap.xml
Yin
Assuming the paths are correct, that looks like what I showed in my guide. Try and see.
Martin
I just saw this error via email from the cron==> /usr/local/cpanel/bin/jailshell: /home/user/public_html/wp-content/example/one.sh: Permission denied.
Yin
That sounds like your server is blocking SSH. Maybe for whole account, maybe for only that directory or just that type of file. You’ll have to ask support. Or yes…permissions (try 644/755).
Martin
Last Question (I changed one.sh permissions to 755), What about ==> /home/user/public_html/wp-content/example/one.sh: line 26: /home/user/public_html/wp-content/example/ocp: Is a directory.
Yin
That sounds to me like it’s complaining about whatever you put on line 26.
Martin
I just uploaded the files ‘as is’ in one directory. Or perhaps there is an issue with the files? I am on shared hosting (my blog is educational and I don’t make money off it), so the LS crawler cannot be activated.
Yin
I really wouldn’t know unless I looked in there myself and saw what you did. I’m sorry but I can’t help from here.
Duc
Hello,
my server is nginx and I got 403 error when I run ocp.
Bad response for http://domain.com/: 403 Forbidden
What could be the problem and how can I fix this? Thanks
Yin
I wouldn’t know without seeing exactly for myself. What you did, what you put, other environment settings, etc.
Duc
My bad, i can run it now and the cache folder is filling up, so it seems to work. Thank you for this very useful tool.
Duc
It would be nice if there is an option to run the job reverse, so I can run 2 jobs at the same time (1 from a-z, and 1 from z-a), combined with the -l flag to skip already cached url, it will be 2x faster.
Yin
I really don’t think you need that. How many pages do you have? If you have tons of traffic, the traffic will warm it anyway. If you don’t have lots of traffic, then nobody will notice the longer crawl time. Just having an auto-crawler is enough already.
Duc
I have about 80,000 URLs. After warm-up, the cache folder is about 18 GB, and I have to let the task run overnight. When I let traffic warm the cache, it used to take 6-7 months. But I need to rewarm the cache every 2-3 months after updating the theme and plugins. And it is a long-running task that I only run manually when needed, so warming with traffic or a cron job is not really a solution for me. So the reverse option is only my suggestion (it may be useful for similar use cases), because OCP already does the job very well.
Yin
In a moment like this, I might consider warming only part of your site and not the entire sitemap. Or perhaps split your sitemap into parts and then have a separate OCP job for each one.
Duc
That’s also possible. The plugin already split main_sitemap into 80 sub_sitemaps, but that’s still too much work. But I found a workaround: I downloaded the main sitemap (it contains only sub-sitemap URLs), reversed the content order, and then ran 2 OCP jobs locally, one with the original sitemap and one with the edited one.
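The reversal step in this workaround could be scripted roughly as follows. This is a sketch, not what Duc actually ran: reverse_sitemap_index is a made-up name, and it assumes the main sitemap is a sitemap index whose entries only need their loc URLs kept.

```shell
#!/bin/sh
# Hypothetical helper: read a sitemap index on stdin and write a new index
# to stdout with the <sitemap> entries in reverse order.
# Only the <loc> values are kept; other tags such as <lastmod> are dropped.
reverse_sitemap_index() {
    echo '<?xml version="1.0" encoding="UTF-8"?>'
    echo '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    # Pull out every <loc>...</loc>, then print them last to first.
    grep -o '<loc>[^<]*</loc>' |
        awk '{ line[NR] = $0 }
             END { for (i = NR; i > 0; i--) print "  <sitemap>" line[i] "</sitemap>" }'
    echo '</sitemapindex>'
}
```

Something like reverse_sitemap_index < sitemap.xml > sitemap-reversed.xml would produce the second file, and one OCP job can then be started on each file.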
Percy King
Hi @Yin, sorry if this is a newb question.
1) I have an ecommerce site, I would like to know if this will also keep cache warm for both logged-in and logged-out users. Or does it only keep cache warm for site visitors that aren’t logged in?
2) Will this be of benefit to my site users who navigate my website via their mobile devices or just will benefit users who use desktop to navigate my website?
Yin
1. It’s a good question and feature request but I’d say it warms only for logged-out visitors.
2. Of course, it should benefit all users (especially assuming you’re not using a separate mobile design), unless your cache mechanism for whatever reason doesn’t build the desktop and mobile cache versions together.
Richie
Hi Yin,
An interesting read, thank you.
May I expose my ignorance with a quick question? 😉 Will OCP still be effective if a CDN is being used? Would any changes to the guidance above be required?
Thanks.
Yin
OCP is for pre-warming local cache. CDN is delivering assets to faraway visitors via local mirrors. Could OCP be used to pre-warm a PULL-CDN cache? Sure. Anyway, for things like this…it’s best you test and see for yourself. Much easier that way.
Richie
Ah, I feared it might be something like that. Thanks for your feedback.
Michel
Hi Yin, Looks like a great way of warming up cache, but I can’t seem to get it to work.
I put the files in public_html/wp-content/ocp and I am on a LiteSpeed server. I can’t seem to find the correct path so it will work.
/home/yoursite/public_html/wp-content/ocp -v https://yoursite.com/sitemap.xml
What is the exact cronjob URL I need to set? For both OCP and ONE?
Hope you can help!
Yin
This guide literally spells everything out. I don’t understand where your problem is. Maybe you can post in the FB group and share screenshots of exactly what you put down, and then someone can correct your command syntax, etc.
Sovichetra
What about a sitemap that contains thousands of URLs? Does it take longer to finish? Is there any way to generate a sitemap that contains only the latest posts and some manually chosen pages?