How to check a website for broken links

Check a website for broken links using the HTTrack website copier.

Install HTTrack.

$ sudo apt-get install -y httrack

Check the website for broken links using 8 concurrent connections and /tmp/spider-check as the temporary directory, ignoring robots.txt and logging to the /tmp/spider-check.log file.

$ httrack --spider --robots=0 --sockets=8 --path /tmp/spider-check --verbose https://blog.sleeplessbeastie.eu/ | tee /tmp/spider-check.log
HTTrack3.49-2 launched on Sun, 04 Nov 2018 21:56:47 at https://blog.sleeplessbeastie.eu/
(httrack -p0C0I0t -s0 -c8 -O /tmp/spider-check -v https://blog.sleeplessbeastie.eu/ )

Information, Warnings and Errors reported for this mirror:
note:   the hts-log.txt file, and hts-cache folder, may contain sensitive information,
        such as username/password authentication for websites mirrored in this project
        do not share these files/folders if you want these information to remain private

Mirror launched on Sun, 04 Nov 2018 21:56:47 by HTTrack Website Copier/3.49-2 [XR&CO'2014]
mirroring https://blog.sleeplessbeastie.eu/ with the wizard help..
21:59:18        Warning:        Retry after error -4 (Incorrect length (0 Bytes, 790 expected)) at link https://blog.sleeplessbeastie.eu/assets/uploads/2016/07/awstats.simple.patch (from https://blog.sleeplessbeastie.eu/2016/07/04/how-to-add-menu-to-awstats-web-interface/)
21:59:18        Warning:        Retry after error -4 (Incorrect length (0 Bytes, 772 expected)) at link https://blog.sleeplessbeastie.eu/assets/uploads/2016/07/awstats.advanced.patch (from https://blog.sleeplessbeastie.eu/2016/07/04/how-to-add-menu-to-awstats-web-interface/)
21:59:18        Warning:        Retry after error -4 (Incorrect length (0 Bytes, 3592696 expected)) at link https://blog.sleeplessbeastie.eu/assets/uploads/2012/07/WAG120N-EU-ANNEXA-ETSI-1.00.16code.bin.7z (from https://blog.sleeplessbeastie.eu/2012/07/28/how-to-reboot-linksys-wag120n-router/)
21:59:18        Warning:        Retry after error -4 (Incorrect length (0 Bytes, 3589948 expected)) at link https://blog.sleeplessbeastie.eu/assets/uploads/2012/07/WAG120N-EU-ANNEXB-ETSI-1.00.16code.bin.7z (from https://blog.sleeplessbeastie.eu/2012/07/28/how-to-reboot-linksys-wag120n-router/)
21:59:19        Error:  "Not Found" (404) at link https://blog.sleeplessbeastie.eu/2018/03/28/how-to-install-docker-on-debian-stretch/ (from https://blog.sleeplessbeastie.eu/2018/04/16/how-to-setup-private-docker-registry/)
21:59:19        Error:  "Not Found" (404) at link https://blog.sleeplessbeastie.eu/privacy/ (from https://blog.sleeplessbeastie.eu/2013/06/16/wordpress-to-jekyll-migration/)
21:59:19        Warning:        Retry after error -4 (Incorrect length (0 Bytes, 790 expected)) at link https://blog.sleeplessbeastie.eu/assets/uploads/2016/07/awstats.simple.patch (from https://blog.sleeplessbeastie.eu/2016/07/04/how-to-add-menu-to-awstats-web-interface/)
21:59:19        Warning:        Retry after error -4 (Incorrect length (0 Bytes, 772 expected)) at link https://blog.sleeplessbeastie.eu/assets/uploads/2016/07/awstats.advanced.patch (from https://blog.sleeplessbeastie.eu/2016/07/04/how-to-add-menu-to-awstats-web-interface/)
21:59:19        Warning:        Retry after error -4 (Incorrect length (0 Bytes, 3592696 expected)) at link https://blog.sleeplessbeastie.eu/assets/uploads/2012/07/WAG120N-EU-ANNEXA-ETSI-1.00.16code.bin.7z (from https://blog.sleeplessbeastie.eu/2012/07/28/how-to-reboot-linksys-wag120n-router/)
21:59:19        Warning:        Retry after error -4 (Incorrect length (0 Bytes, 3589948 expected)) at link https://blog.sleeplessbeastie.eu/assets/uploads/2012/07/WAG120N-EU-ANNEXB-ETSI-1.00.16code.bin.7z (from https://blog.sleeplessbeastie.eu/2012/07/28/how-to-reboot-linksys-wag120n-router/)
21:59:20        Error:  "Incorrect length (0 Bytes, 790 expected)" (-4) after 2 retries at link https://blog.sleeplessbeastie.eu/assets/uploads/2016/07/awstats.simple.patch (from https://blog.sleeplessbeastie.eu/2016/07/04/how-to-add-menu-to-awstats-web-interface/)
21:59:20        Error:  "Incorrect length (0 Bytes, 772 expected)" (-4) after 2 retries at link https://blog.sleeplessbeastie.eu/assets/uploads/2016/07/awstats.advanced.patch (from https://blog.sleeplessbeastie.eu/2016/07/04/how-to-add-menu-to-awstats-web-interface/)
21:59:20        Error:  "Incorrect length (0 Bytes, 3592696 expected)" (-4) after 2 retries at link https://blog.sleeplessbeastie.eu/assets/uploads/2012/07/WAG120N-EU-ANNEXA-ETSI-1.00.16code.bin.7z (from https://blog.sleeplessbeastie.eu/2012/07/28/how-to-reboot-linksys-wag120n-router/)
21:59:20        Error:  "Incorrect length (0 Bytes, 3589948 expected)" (-4) after 2 retries at link https://blog.sleeplessbeastie.eu/assets/uploads/2012/07/WAG120N-EU-ANNEXB-ETSI-1.00.16code.bin.7z (from https://blog.sleeplessbeastie.eu/2012/07/28/how-to-reboot-linksys-wag120n-router/)

HTTrack Website Copier/3.49-2 mirror complete in 2 minutes 33 seconds : 598 links scanned, 584 files written (11857590 bytes overall) [3024769 bytes received at 19769 bytes/sec], 11875420 bytes transferred using HTTP compression in 586 files, ratio 23%, 49.8 requests per connection
(6 errors, 8 warnings, 0 messages)
Done.
Thanks for using HTTrack!

Inspect the /tmp/spider-check.log file for details. The pattern below matches any three-character code in parentheses, such as the HTTP status (404).

$ grep "(...)" /tmp/spider-check.log
21:59:19        Error:  "Not Found" (404) at link https://blog.sleeplessbeastie.eu/2018/03/28/how-to-install-docker-on-debian-stretch/ (from https://blog.sleeplessbeastie.eu/2018/04/16/how-to-setup-private-docker-registry/)
21:59:19        Error:  "Not Found" (404) at link https://blog.sleeplessbeastie.eu/privacy/ (from https://blog.sleeplessbeastie.eu/2013/06/16/wordpress-to-jekyll-migration/)
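
The matching lines can be reduced to a compact "broken link, referring page" report. The following is a minimal sketch, assuming the log line layout shown above (`(404) at link URL (from URL)`); the function name `broken_links` is hypothetical and not part of HTTrack.

```shell
# Hypothetical helper: print "broken-link <- referring-page" for every 404
# line in an HTTrack log. Assumes the line layout shown in the output above.
broken_links() {
  # \1 captures the broken URL, \2 captures the page that links to it.
  sed -n 's/.*(404) at link \([^ ]*\) (from \([^)]*\)).*/\1 <- \2/p' "$1"
}
```

Call it as `broken_links /tmp/spider-check.log`. Lines with other error codes, such as (-4), are left out because the pattern only matches (404).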

Get the summary line with the error and warning counts.

$ tail -3 /tmp/spider-check.log | head -1
(6 errors, 8 warnings, 0 messages)
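
Picking the summary by its position from the end of the file depends on the exact number of trailing lines. As an alternative sketch, the error count can be matched by pattern instead, assuming the summary keeps the `(N errors, N warnings, N messages)` form shown above; the function name `spider_error_count` is hypothetical.

```shell
# Hypothetical helper: print the number of errors from the HTTrack summary
# line, e.g. "(6 errors, 8 warnings, 0 messages)". Assumes that line format.
spider_error_count() {
  # Keep only the last match in case the log holds more than one run.
  sed -n 's/^(\([0-9][0-9]*\) errors, .*/\1/p' "$1" | tail -n 1
}
```

The printed number can then be compared in a script, for example to send a notification only when the count is not zero.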

Filter by MIME type to skip the unimportant "Incorrect length" errors reported for downloadable files and focus on broken links.

$ httrack --spider --robots=0 --sockets=8 --path /tmp/spider-check --verbose https://blog.sleeplessbeastie.eu/ '-mime:*' '+mime:text/*'
HTTrack3.49-2 launched on Sun, 04 Nov 2018 22:15:10 at https://blog.sleeplessbeastie.eu/ -mime:* +mime:text/*
(httrack -p0C0I0t -s0 -c8 -O /tmp/spider-check -v https://blog.sleeplessbeastie.eu/ -mime:* +mime:text/* )

Information, Warnings and Errors reported for this mirror:
note:   the hts-log.txt file, and hts-cache folder, may contain sensitive information,
        such as username/password authentication for websites mirrored in this project
        do not share these files/folders if you want these information to remain private

Mirror launched on Sun, 04 Nov 2018 22:15:10 by HTTrack Website Copier/3.49-2 [XR&CO'2014]
mirroring https://blog.sleeplessbeastie.eu/ -mime:* +mime:text/* with the wizard help..
22:17:39        Error:  "Not Found" (404) at link https://blog.sleeplessbeastie.eu/2018/03/28/how-to-install-docker-on-debian-stretch/ (from https://blog.sleeplessbeastie.eu/2018/04/16/how-to-setup-private-docker-registry/)
22:17:39        Error:  "Not Found" (404) at link https://blog.sleeplessbeastie.eu/privacy/ (from https://blog.sleeplessbeastie.eu/2013/06/16/wordpress-to-jekyll-migration/)

HTTrack Website Copier/3.49-2 mirror complete in 2 minutes 29 seconds : 584 links scanned, 582 files written (11107549 bytes overall) [2931119 bytes received at 19671 bytes/sec], 11125379 bytes transferred using HTTP compression in 584 files, ratio 23%, 73.0 requests per connection
(2 errors, 0 warnings, 0 messages)
Done.
Thanks for using HTTrack!

Remove the log file and the temporary directory.

$ rm /tmp/spider-check.log
$ rm -rf /tmp/spider-check
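
Instead of a fixed path under /tmp that has to be removed by hand, the check could run against a throwaway directory that is cleaned up automatically. This is only a sketch: the wrapper name `with_tmp_dir` is hypothetical, and it assumes the wrapped command accepts the directory as its last argument.

```shell
# Hypothetical wrapper: run a command with a fresh temporary directory
# appended as its last argument, then remove the directory and return
# the command's own exit status.
with_tmp_dir() {
  dir=$(mktemp -d) || return 1
  "$@" "$dir"
  status=$?
  rm -rf "$dir"
  return "$status"
}
```

With HTTrack, this could be used by placing `--path` last, for example `with_tmp_dir httrack --spider --robots=0 --sockets=8 --verbose https://blog.sleeplessbeastie.eu/ --path`, so that the temporary directory becomes the value of `--path`.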