How to Find the Largest, Most Bandwidth-Heavy Files on Your Site

We recently experienced a large increase in traffic on Paleo Porn and received a notice from Synthesis that we were on pace to exceed our monthly bandwidth allotment.

At first glance it’s a simple fix: look at the most heavily trafficked pages and posts, find the largest files on those pages, and reduce or replace them. Unfortunately for us, our most heavily trafficked posts yielded no such culprits.

And with over 400 posts, discovering those needles in the haystack is quite a challenge. Because WordPress automatically crops and scales every uploaded image into several sizes, I couldn’t simply search our file server for the largest images: the biggest file on disk isn’t necessarily the version a post actually serves. I couldn’t find a good method, either in WordPress or via an SSH terminal, for discovering which pages or posts have large images inserted.

Instead, I worked it out using a more analog method. It’s not highly technical by any means, but it’s a solution nonetheless.

  1. Install SiteSucker, an OS X app that will download an entire site, both the HTML pages and the files they reference. For those of you using a Windows machine, first I’m sorry to hear that ;), but you’ll want to give HTTrack a try.

  2. Next, download your entire site using SiteSucker and then, using Finder, perform a search for . (a period) within that newly downloaded directory. Be sure your Finder window is in List View, and if you don’t see a column for Size, add it by right-clicking on a column heading. Then click the Size column to sort in descending order.

  3. You now have the largest files currently in use on your site, listed in descending order. Open a new Finder window and search again within your downloaded site directory, this time for the file names from your first Finder window, one at a time. Each search will yield two or more results: the image itself and every HTML document that contains a reference to that image. Using those HTML files, you can pinpoint the precise URLs that have bandwidth-heavy files.
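If you’d rather not click through Finder, steps 2 and 3 can be roughly sketched in Python. This is only a minimal sketch, not part of the method I actually used. It assumes the site has already been downloaded to a local folder, that pages end in .html or .htm, and that a page references a file by its plain file name (URL-encoded names would be missed). The function names are made up for illustration.

```python
from pathlib import Path

# Assumption: downloaded pages use one of these extensions.
HTML_SUFFIXES = {".html", ".htm"}


def largest_files(site_dir, limit=20):
    """Return (size_in_bytes, path) pairs for the largest non-HTML files, biggest first."""
    root = Path(site_dir)
    sized = [
        (p.stat().st_size, p)
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() not in HTML_SUFFIXES
    ]
    sized.sort(key=lambda pair: pair[0], reverse=True)
    return sized[:limit]


def pages_referencing(site_dir, file_name):
    """Return the HTML files whose text mentions file_name (plain substring match)."""
    root = Path(site_dir)
    hits = []
    for page in root.rglob("*"):
        if page.is_file() and page.suffix.lower() in HTML_SUFFIXES:
            text = page.read_text(encoding="utf-8", errors="ignore")
            if file_name in text:
                hits.append(page)
    return sorted(hits)


def report(site_dir, limit=20):
    """Build one line per large file, followed by the pages that reference it."""
    lines = []
    for size, path in largest_files(site_dir, limit):
        lines.append(f"{size / 1024:.0f} KB  {path}")
        for page in pages_referencing(site_dir, path.name):
            lines.append(f"    referenced by {page}")
    return lines
```

To try it, call `report()` with the folder SiteSucker (or HTTrack) created and print each returned line. The substring match is deliberately crude, so treat the results as leads to check by hand rather than a definitive list.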

As an added bonus, you can also have a look at the Error Log to uncover broken links.