How to pretty print size in bytes using AWK

Convert a size in bytes to a human-readable string using AWK.

I will use mawk, as it is installed by default on Debian-based distributions.

$ awk -W version
mawk 1.3.3 Nov 1996, Copyright (C) Michael D. Brennan

compiled limits:
max NF             32767
sprintf buffer      2040

AWK script

Sample AWK script that defines the convert function.

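# Divide value by base until it drops below base, then append the matching suffix.
# skip: number of suffixes to skip when value is not given in bytes.
# The names after the extra spaces are local variables.
function convert(value, base, suffixes_string, skip,    suffixes, suffixes_len, n_suffix) {
  suffixes_len = split(suffixes_string, suffixes)

  n_suffix = 1 + skip
  while (value >= base && n_suffix < suffixes_len) {
    value /= base
    n_suffix++
  }

  return sprintf("%.2f %s", value, suffixes[n_suffix])
}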

function convert1000(value, skip) {
  return convert(value,1000,"B kB MB GB TB PB EB ZB YB", skip)
}

function convert1024(value, skip) {
  return convert(value,1024,"B KiB MiB GiB TiB PiB EiB ZiB YiB", skip)
}

BEGIN {
  n = split("4096 273001 4030110 8020090009", values)
  # for (i in values) visits elements in an unspecified order, so use a counter
  for (i = 1; i <= n; i++) {
    printf("%s bytes is %s (base 1024) or %s (base 1000)\n", values[i], convert1024(values[i]), convert1000(values[i]))
  }
}

Script output.

$ awk -f ~/awk/convert.awk
4096 bytes is 4.00 KiB (base 1024) or 4.10 kB (base 1000)
273001 bytes is 266.60 KiB (base 1024) or 273.00 kB (base 1000)
4030110 bytes is 3.84 MiB (base 1024) or 4.03 MB (base 1000)
8020090009 bytes is 7.47 GiB (base 1024) or 8.02 GB (base 1000)

AWK one-liner

Now you can use this knowledge to create a simple one-liner.

$ ls -l /usr/share/doc/debian | sed 1d
-rw-r--r-- 1 root root  2095 Aug 28  2016 bug-log-access.txt
-rw-r--r-- 1 root root  3115 Aug 28  2016 bug-log-mailserver.txt.gz
-rw-r--r-- 1 root root  2384 Aug 28  2016 bug-mailserver-refcard.txt
-rw-r--r-- 1 root root  6092 Aug 28  2016 bug-maint-info.txt.gz
-rw-r--r-- 1 root root  6246 Aug 28  2016 bug-maint-mailcontrol.txt.gz
-rw-r--r-- 1 root root  5985 Aug 28  2016 bug-reporting.txt.gz
-rw-r--r-- 1 root root  9580 Aug 28  2016 constitution.1.0.txt.gz
-rw-r--r-- 1 root root  9352 Aug 28  2016 constitution.1.1.txt.gz
-rw-r--r-- 1 root root  9523 Aug 28  2016 constitution.1.2.txt.gz
-rw-r--r-- 1 root root  9776 Aug 28  2016 constitution.1.3.txt.gz
-rw-r--r-- 1 root root  9785 Aug 28  2016 constitution.1.4.txt.gz
-rw-r--r-- 1 root root 10025 Aug 28  2016 constitution.1.5.txt.gz
-rw-r--r-- 1 root root 10027 Aug 28  2016 constitution.1.6.txt.gz
-rw-r--r-- 1 root root  9993 Aug 28  2016 constitution.txt.gz
-rw-r--r-- 1 root root  2966 Dec 24  2013 debian-manifesto.gz
drwxr-xr-x 2 root root  4096 Aug  5 17:33 FAQ
-rw-r--r-- 1 root root 14230 Aug 28  2016 mailing-lists.txt.gz
-rw-r--r-- 1 root root  2563 Aug 28  2016 social-contract.1.0.txt.gz
-rw-r--r-- 1 root root  2549 Aug 28  2016 social-contract.txt.gz
-rw-r--r-- 1 root root  2592 Dec 24  2013 source-unpack.txt
$ ls -l /usr/share/doc/debian/ | sed 1d | awk 'function convert(v,    s, s_len, n) {s_len = split("B KiB MiB GiB TiB PiB EiB ZiB YiB", s); n = 1; while(v >= 1024 && n < s_len) {v /= 1024; n++}; return sprintf("%7.2f %3s", v, s[n])} {$5=convert($5); print}'
-rw-r--r-- 1 root root    2.05 KiB Aug 28 2016 bug-log-access.txt
-rw-r--r-- 1 root root    3.04 KiB Aug 28 2016 bug-log-mailserver.txt.gz
-rw-r--r-- 1 root root    2.33 KiB Aug 28 2016 bug-mailserver-refcard.txt
-rw-r--r-- 1 root root    5.95 KiB Aug 28 2016 bug-maint-info.txt.gz
-rw-r--r-- 1 root root    6.10 KiB Aug 28 2016 bug-maint-mailcontrol.txt.gz
-rw-r--r-- 1 root root    5.84 KiB Aug 28 2016 bug-reporting.txt.gz
-rw-r--r-- 1 root root    9.36 KiB Aug 28 2016 constitution.1.0.txt.gz
-rw-r--r-- 1 root root    9.13 KiB Aug 28 2016 constitution.1.1.txt.gz
-rw-r--r-- 1 root root    9.30 KiB Aug 28 2016 constitution.1.2.txt.gz
-rw-r--r-- 1 root root    9.55 KiB Aug 28 2016 constitution.1.3.txt.gz
-rw-r--r-- 1 root root    9.56 KiB Aug 28 2016 constitution.1.4.txt.gz
-rw-r--r-- 1 root root    9.79 KiB Aug 28 2016 constitution.1.5.txt.gz
-rw-r--r-- 1 root root    9.79 KiB Aug 28 2016 constitution.1.6.txt.gz
-rw-r--r-- 1 root root    9.76 KiB Aug 28 2016 constitution.txt.gz
-rw-r--r-- 1 root root    2.90 KiB Dec 24 2013 debian-manifesto.gz
drwxr-xr-x 2 root root    4.00 KiB Aug 5 17:33 FAQ
-rw-r--r-- 1 root root   13.90 KiB Aug 28 2016 mailing-lists.txt.gz
-rw-r--r-- 1 root root    2.50 KiB Aug 28 2016 social-contract.1.0.txt.gz
-rw-r--r-- 1 root root    2.49 KiB Aug 28 2016 social-contract.txt.gz
-rw-r--r-- 1 root root    2.53 KiB Dec 24 2013 source-unpack.txt
$ ls -lh /usr/share/doc/debian | sed 1d
-rw-r--r-- 1 root root 2.1K Aug 28  2016 bug-log-access.txt
-rw-r--r-- 1 root root 3.1K Aug 28  2016 bug-log-mailserver.txt.gz
-rw-r--r-- 1 root root 2.4K Aug 28  2016 bug-mailserver-refcard.txt
-rw-r--r-- 1 root root 6.0K Aug 28  2016 bug-maint-info.txt.gz
-rw-r--r-- 1 root root 6.1K Aug 28  2016 bug-maint-mailcontrol.txt.gz
-rw-r--r-- 1 root root 5.9K Aug 28  2016 bug-reporting.txt.gz
-rw-r--r-- 1 root root 9.4K Aug 28  2016 constitution.1.0.txt.gz
-rw-r--r-- 1 root root 9.2K Aug 28  2016 constitution.1.1.txt.gz
-rw-r--r-- 1 root root 9.3K Aug 28  2016 constitution.1.2.txt.gz
-rw-r--r-- 1 root root 9.6K Aug 28  2016 constitution.1.3.txt.gz
-rw-r--r-- 1 root root 9.6K Aug 28  2016 constitution.1.4.txt.gz
-rw-r--r-- 1 root root 9.8K Aug 28  2016 constitution.1.5.txt.gz
-rw-r--r-- 1 root root 9.8K Aug 28  2016 constitution.1.6.txt.gz
-rw-r--r-- 1 root root 9.8K Aug 28  2016 constitution.txt.gz
-rw-r--r-- 1 root root 2.9K Dec 24  2013 debian-manifesto.gz
drwxr-xr-x 2 root root 4.0K Aug  5 17:33 FAQ
-rw-r--r-- 1 root root  14K Aug 28  2016 mailing-lists.txt.gz
-rw-r--r-- 1 root root 2.6K Aug 28  2016 social-contract.1.0.txt.gz
-rw-r--r-- 1 root root 2.5K Aug 28  2016 social-contract.txt.gz
-rw-r--r-- 1 root root 2.6K Dec 24  2013 source-unpack.txt

Additional notes

Use the skip parameter if the value is given not in bytes but in kB/KiB (skip: 1), MB/MiB (skip: 2), and so on.
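As a rough sketch of the skip parameter in action, the snippet below wraps the convert function in a shell helper that takes a size in KiB (for example, a value taken from df -k output) and passes skip = 1. The helper name human_kib is made up for this illustration and is not part of the original script.

```shell
# Hypothetical helper: print a size given in KiB as a human-readable string.
human_kib() {
  awk -v kib="$1" '
    # Same logic as the convert function from the script; locals follow the extra spaces.
    function convert(value, base, suffixes_string, skip,    suffixes, suffixes_len, n_suffix) {
      suffixes_len = split(suffixes_string, suffixes)
      n_suffix = 1 + skip
      while (value >= base && n_suffix < suffixes_len) { value /= base; n_suffix++ }
      return sprintf("%.2f %s", value, suffixes[n_suffix])
    }
    # skip = 1 because the input is already expressed in KiB
    BEGIN { print convert(kib, 1024, "B KiB MiB GiB TiB PiB EiB ZiB YiB", 1) }
  '
}

human_kib 2048   # prints 2.00 MiB
human_kib 512    # prints 512.00 KiB
```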

Feel free to experiment with the value >= base condition inside the while loop. For example, change it to value >= base * 0.8 to get 0.98 KiB instead of 1001.00 B.
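One way to try this without editing the loop each time is to pass the multiplier in as a variable. This is only a sketch: the helper name human_bytes and the factor argument are assumptions introduced here, and the base is fixed at 1024 to keep it short.

```shell
# Hypothetical helper: human_bytes SIZE [FACTOR], FACTOR defaults to 1.
human_bytes() {
  awk -v bytes="$1" -v factor="${2:-1}" '
    function convert(value, base, factor,    s, s_len, n) {
      s_len = split("B KiB MiB GiB TiB PiB EiB ZiB YiB", s)
      n = 1
      # Move to the next unit as soon as value reaches base * factor
      while (value >= base * factor && n < s_len) { value /= base; n++ }
      return sprintf("%.2f %s", value, s[n])
    }
    BEGIN { print convert(bytes, 1024, factor) }
  '
}

human_bytes 1001       # prints 1001.00 B
human_bytes 1001 0.8   # prints 0.98 KiB
```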

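To make it more interesting, you can use the condition value > 10^int(log(base)/log(10)) inside the while loop. The threshold is then the largest power of ten that does not exceed the base (1000 for base 1024), so it scales with the base in use. Be careful when the base is itself an exact power of ten, such as 1000: floating-point rounding can leave log(base)/log(10) just below a whole number, in which case int() truncates it and the threshold comes out one power lower (100).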

$ awk 'BEGIN{print 10^int(log(5)/log(10))}'
1
$ awk 'BEGIN{print 10^int(log(58)/log(10))}'
10
$ awk 'BEGIN{print 10^int(log(748)/log(10))}'
100
$ awk 'BEGIN{print 10^int(log(1020)/log(10))}'
1000

... and so on.
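The same expression can be wrapped in a small helper to check which threshold a given base produces. This is a sketch, not part of the original script: the name threshold is made up, and the 1e-9 added before int() is an assumption meant to guard against the quotient landing marginally below a whole number when the base is an exact power of ten.

```shell
# Hypothetical helper: largest power of ten not exceeding BASE.
threshold() {
  # 1e-9 nudges results such as 2.9999999999999996 back over the integer boundary
  awk -v base="$1" 'BEGIN { print 10 ^ int(log(base) / log(10) + 1e-9) }'
}

threshold 58     # prints 10
threshold 1000   # prints 1000
threshold 1024   # prints 1000
```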