How to remotely archive websites using ArchiveBox

Use nginx with the Lua module to archive websites remotely with ArchiveBox.

Install the nginx web server and its Lua module.

$ sudo apt install nginx libnginx-mod-http-lua

Disable default configuration.

$ sudo unlink /etc/nginx/sites-enabled/default

Create the /etc/nginx/sites-available/archivebox configuration file.

This is a deliberately simple, naive solution: a POST request to the URL /archive_url carrying a secret token (secret_token) and a url parameter. Remember to create and configure an SSL certificate, since the token is otherwise sent in clear text.

server {
  listen 80;
  server_name _;

  root /srv/archivebox/output/;
  index index.html;

  location / {
    try_files $uri $uri/ =404;
  }

  location /archive/ {
    autoindex on;
  }

  location /archive_url {
    default_type text/plain;
    charset utf-8;

    content_by_lua_block {
      local method = ngx.var.request_method
      if method == "POST" then
        ngx.req.read_body()
        local args = ngx.req.get_post_args()
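        if args["token"] == "secret_token" and type(args["url"]) == "string" then
          -- Encode whitespace, then single-quote the URL so the shell cannot interpret it
          local url = string.gsub(args["url"], "%s+", '%%20')
          local quoted = "'" .. string.gsub(url, "'", "'\\''") .. "'"
          local exec = assert(io.popen("cd /srv/archivebox/; export $(grep -v '^#' etc/ArchiveBox.conf | xargs); echo " .. quoted .. " | /srv/archivebox/archive", 'r'))
          local output = assert(exec:read('*a'))
          exec:close()
          -- Log the output only, so the secret token does not end up in the log
          ngx.log(ngx.INFO, output)
          ngx.say(output)
        else
          -- Wrong token or missing url parameter
          ngx.status = 403
          ngx.say("invalid token or missing url")
        end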
      else
        ngx.status = 404
        ngx.exit(404)
      end
    }
  }
}
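
The secret_token value in the configuration is only a placeholder and should be replaced with a long random string. A minimal sketch, assuming nothing beyond the standard head, od and tr utilities; the variable name token is just for illustration:

```shell
# Read 32 random bytes and print them as 64 hexadecimal characters
token=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')
echo "$token"
```

Put the printed value in place of secret_token in the Lua block and use the same value in the client request.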

Enable this specific configuration.

$ sudo ln -s /etc/nginx/sites-available/archivebox /etc/nginx/sites-enabled/

Reload nginx service.

$ sudo systemctl reload nginx
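
A slightly more defensive variant of the reload step checks the configuration first. This is a sketch only; validate_and_reload is a hypothetical helper name, and nginx -t tests the nginx configuration itself, so mistakes inside the Lua block may still surface only when the first request arrives.

```shell
# Hypothetical helper: reload nginx only if the configuration test passes
validate_and_reload() {
  sudo nginx -t && sudo systemctl reload nginx
}
```

Call validate_and_reload after every change to the configuration file.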

Use curl to archive a specific URL.

$ curl -X POST http://archivebox.example.org/archive_url -d 'token=secret_token&url=https://www.debian.org/'
[*] [2019-06-16 22:17:45] Parsing new links from output/sources/stdin-1560723465.txt...
    > Adding 1 new links to index (parsed import as Plain Text)
[*] [2019-06-16 22:17:45] Saving main index files...
    √ output/index.json
    √ output/index.html
[▶] [2019-06-16 22:17:45] Updating content for 1 pages in archive...

[+] [2019-06-16 22:17:45] "https://www.debian.org/"
    https://www.debian.org/
    > output/archive/1560723465
      > title
      > favicon
      > wget
      > pdf
      > screenshot
      > dom
      > archive_org
[√] [2019-06-16 22:17:53] Update of 1 pages complete (7.35 sec)
    - 0 links skipped
    - 1 links updated
    - 0 links had errors
    To view your archive, open: output/index.html
[*] [2019-06-16 22:17:53] Saving main index files...
    √ output/index.json
    √ output/index.html
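
URLs that contain characters such as & or spaces would be split or mangled when passed with a plain -d, so a small wrapper around curl can help. The following is a sketch under assumptions: archive_url, ARCHIVEBOX_ENDPOINT, ARCHIVEBOX_TOKEN and CURL are hypothetical names chosen for this example, not part of ArchiveBox.

```shell
# Hypothetical helper; the function and variable names are illustrative only
archive_url() {
  local url=$1
  local endpoint=${ARCHIVEBOX_ENDPOINT:-http://archivebox.example.org/archive_url}
  local token=${ARCHIVEBOX_TOKEN:-secret_token}
  # --data-urlencode percent-encodes each value, so '&' or spaces inside
  # the URL are not mistaken for form-field separators.
  # CURL can be overridden (for example CURL=echo) to inspect the arguments.
  ${CURL:-curl} --silent --show-error --fail -X POST "$endpoint" \
    --data-urlencode "token=$token" \
    --data-urlencode "url=$url"
}
```

Usage: archive_url 'https://www.debian.org/'. With --fail, curl exits with a non-zero status when the server answers with an HTTP error, which makes the helper easier to use from scripts or cron jobs.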

This is really cool. You can extend this configuration in the same way to expose other ArchiveBox actions.