The lazy person’s guide to confirming that a move to a static site worked.

Overview:

Download all relevant URLs

I’m picking one approximate source of truth: the URLs that received impressions in Google Search. This list doesn’t need to be comprehensive, just broader than what I’d pick manually. In general, any reasonable sample will include URLs from a variety of templates / sections of the site – and problems are usually not unique to individual URLs, but to templates / sections. You could also use a Google Analytics export, for example. I use Search Console.

  1. Verify ownership (if necessary – then wait a few days for the data to appear)
  2. Go to the performance report, pick full time frame (16 months).
  3. Export to CSV file
  4. Done.

Pros & amateurs

HTML

Pros:

Cons:

Verdict: If all else fails, use this.
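The post doesn’t show what this looks like; my reading is that it means writing the anchor as raw HTML directly in the source, so no processing step can break it:

```
<!-- raw HTML in the page source; nothing for a pre-processor to get wrong -->
<a href="https://example.com/" rel="nofollow">example link</a>
```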

Append annotation

Setup: append an annotation that doesn’t block direct clicking of the URL, but which can be caught by a pre-processor and turned into a nofollow link. Use something like # so the annotation doesn’t affect the destination.
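As a sketch – the `#nofollow` marker and the sed pass are my assumption of how such a pre-processor could look, not the post’s actual implementation:

```shell
# Turn links annotated with a trailing "#nofollow" fragment into real
# rel="nofollow" links. If this pass never runs, the link still works:
# the "#..." fragment doesn't change the destination page.
html='<a href="https://example.com/page#nofollow">example</a>'
echo "$html" | sed -E 's|<a href="([^"]*)#nofollow">|<a href="\1" rel="nofollow">|'
```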

Pros:

Cons:

Verdict: Fine if you don’t strongly care about nofollow – that is, if dropping the nofollow wouldn’t bother you, should processing fail. Since not caring about nofollow goes against why I’m setting this up, … meh.

Prepend annotation

Setup: Prepend an annotation that drops the link should processing fail. Use something like # to break the link completely.
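A sketch of this variant – again, the marker convention and the sed pass are assumptions, not the post’s implementation:

```shell
# A leading "#" turns the href into an on-page fragment, so the link is
# dead unless the pre-processor rewrites it into a real nofollow link.
html='<a href="#https://example.com/page">example</a>'
echo "$html" | sed -E 's|<a href="#(https?://[^"]*)">|<a href="\1" rel="nofollow">|'
```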

Pros:

Cons:

Verdict: Links that don’t work for users, should processing fail, seems annoying. Skip this.

Bounce-pad

Setup: Create a bounce page that redirects to the destination. On the bounce URL, recognize a parameter that points at the final URL and redirect as appropriate. Block the bounce URLs with robots.txt to prevent any crawling, and add a noindex, nofollow in case the robots.txt doesn’t get uploaded. Have the site generator swap out the bounce URL against a normal nofollow link.
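A minimal sketch of the bounce page, with `Disallow: /bounce` in robots.txt – the `/bounce` path and the `to` parameter name are assumptions, not from the post:

```
<!-- /bounce page: noindex/nofollow as a backup for robots.txt, then a
     client-side redirect to the URL in the "to" parameter. A real
     version should check "to" against your own domain, or you've built
     an open redirect. -->
<meta name="robots" content="noindex, nofollow">
<script>
  var to = new URLSearchParams(location.search).get('to');
  if (to) location.replace(to);
</script>
```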

Pros:

Cons:

Verdict: Works for me. It’s implemented here.

Don’t forget

If you’re messing with nofollow links on your site, make sure you can see them: add nofollow link highlighting to your browser, and to your site’s CSS.
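For the CSS side, something like this one-liner does it – the selector and styling are just one possible choice:

```
/* Flag nofollow links while developing; adjust the styling to taste. */
a[rel~="nofollow"] { outline: 2px dashed red; }
```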

Overall

Which one to pick? Up to you.

Convert download into a URL list

Search Console produces a funky ZIP file with data spread across several CSV files. We’ll unzip it and take the URLs out of the Pages CSV file. We’ll drop the rest (you can keep it, I don’t want it).

$ unzip yoursite.com-Performance-on-Search-2021-04-24.zip 

Archive:  yoursite.com-Performance-on-Search-2021-04-24.zip
  inflating: Queries.csv             
  inflating: Pages.csv               
  inflating: Countries.csv           
  inflating: Devices.csv             
  inflating: Search appearance.csv   
  inflating: Dates.csv               
  inflating: Filters.csv             

$ rm Queries.csv Countries.csv Devices.csv "Search appearance.csv" \
  Dates.csv Filters.csv

$

Outcome: we have Pages.csv

Extract URL list

First: Fix the wonky Search Console multi-line CSV file. URLs with spaces in them may be line-wrapped, making it impossible to parse the CSV file on a per-line basis.

# Re-join quoted URLs that Search Console wraps across two lines,
# so the file can be parsed line by line afterwards.
prev="";
while IFS= read -r line ; do
  if [[ $line == \"* ]] ; then
    prev="$line ";     # line opens a quoted, wrapped URL: hold it
  else
    echo "$prev$line"; # emit any held fragment plus the current line
    prev="";
  fi;
done < Pages.csv > PagesClean.csv

If you have access to csvtool, everything is trivial. If you don’t, you can simplify the file (dropping some URLs) and use the remaining sample. These commands remove the header line, take the first column of the CSV file, and put it into a urls.txt file.

$ # With csvtool
$ csvtool format "%(1)\n" PagesClean.csv | tail -n +2 >urls.txt

$ # Without csvtool (drop lines with quotes)
$ grep -v \" PagesClean.csv | awk -F',' '{print $1}' | tail -n +2 >urls.txt

Outcome: we have urls.txt

Check for http/https redirect (if needed)

If, like me, you were too lazy to move to HTTPS, here’s a way to check for the redirects. This creates a tab-separated file with the URL, the HTTP status code, and any redirect target.

while IFS= read -r line ; do
    echo -ne "$line\t";   # URL first, then a tab
    curl -sI "$line" | grep -E "(^HTTP|^Location)" | tr '\n\r' '\t';
    echo "";              # end the record
done < urls.txt > urls-result.txt

(The loop goes through the list of URLs, fetches the headers for each, and prints the URL, the HTTP status code, and the Location field.)

We can also just list the ones that have a missing redirect to the HTTPS version:

while IFS= read -r line ; do
  result=$(curl -sI "$line" | grep -E "(^HTTP|^Location)" | tr '\n\r' '\t');
  if [[ ${result} != *" 301 "* ]]; then
    echo "$line - Missing 301: $result";
  else
    httpsurl=$(echo "$line" | sed "s|http://|https://|");
    if [[ ${result} != *"$httpsurl"* ]]; then
      echo "$line - Wrong redirect: $result";
    fi;
  fi;
done < urls.txt

(The loop goes through the list of URLs, fetches the headers for each, looks for a “301”, and checks that the https version of the URL appears in the result.)

Check that HTTPS URLs are accessible with a 200 result code

while IFS= read -r line ; do
  httpsurl=$(echo "$line" | sed "s|http://|https://|");
  result=$(curl -sIL "$httpsurl" | grep -E "(^HTTP)");  # -L follows redirects
  if [[ ${result} != *" 200 "* ]]; then
    echo "$httpsurl - Missing 200: $result";
  fi;
done < urls.txt

Well, it looks like I still have work to do. :-)