How to extract cover image from an e-book

I have successfully used Google Drive and Insync to organize all of the e-books I have acquired over the last few years, but I now plan to upload them to my personal DokuWiki instance, since I use it more every day. Before I can start, I need to extract the cover images so that the result looks decent.

Requirements

Installing the ImageMagick package is enough to perform PDF-to-image conversion, provided that Ghostscript, which ImageMagick uses to render PDF pages, is also available.

$ sudo apt-get install imagemagick

Additionally, you can install the Poppler utilities to get PDF details.

$ sudo apt-get install poppler-utils
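Before running anything, it may help to confirm that the tools are actually on your PATH. The following is a minimal sketch; the check_tools function name is my own (hypothetical), and gs is included on the assumption that your ImageMagick build renders PDF files through Ghostscript, which is the usual setup.

```bash
#!/bin/bash
# check_tools TOOL...: report every TOOL that cannot be found on PATH.
# Returns 0 when all tools are present, 1 otherwise.
# (hypothetical helper, not part of ImageMagick or Poppler)
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# usage: check_tools convert identify gs pdfinfo
```

For example, `check_tools convert identify gs pdfinfo` prints the name of every missing tool to standard error and returns a non-zero status if any of them is missing.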

Extract single cover image

Use the convert utility to convert the first page to an image.

$ convert Linux-Voice-Issue-016.pdf[0] Linux-Voice-Issue-016.png

You can perform additional operations (like resizing in this example) on the image during the conversion process.

$ convert Linux-Voice-Issue-016.pdf[0] -resize 200x300 Linux-Voice-Issue-016.png

Note that ImageMagick numbers pages starting from 0, so [0] refers to the first page.
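If you think in regular page numbers, a small wrapper can do the 0-based translation for you. This is a sketch with a hypothetical extract_page function; it keeps the [index] suffix inside the quotes so that the shell never treats it as a glob pattern.

```bash
#!/bin/bash
# extract_page PDF PAGE OUTPUT [SIZE]  (hypothetical helper)
# PAGE is a regular 1-based page number; ImageMagick counts from 0,
# so the index passed in brackets is PAGE - 1.
extract_page() {
  local pdf="$1" page="$2" output="$3" size="${4:-}"
  local index=$((page - 1))
  if [ -n "$size" ]; then
    # resize while converting, as in the example above
    convert "${pdf}[${index}]" -resize "$size" "$output"
  else
    convert "${pdf}[${index}]" "$output"
  fi
}
```

For instance, `extract_page Linux-Voice-Issue-016.pdf 1 Linux-Voice-Issue-016.png 200x300` runs the same conversion as the resize example above.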

Extract multiple cover images

Use a simple Bash script to extract cover images from e-books found in sub-directories and store them in a covers directory inside each sub-directory.

#!/bin/bash
# Create cover images from e-books in sub-directories
# This shell script is not recursive

# maximum width and height of the output image (aspect ratio is preserved)
maxsize="200x200"

for directory in */; do
  if [ -d "$directory" ]; then
    echo "Processing sub-directory: ${directory%/}"
    mkdir -p "${directory}covers"
    for ebook in "${directory}"*.pdf; do
      ebook="$(basename "$ebook")"
      cover="${directory}covers/${ebook%.pdf}.png"
      # skip e-books that already have a cover, and the literal pattern left when no PDF matches
      if [ ! -f "$cover" ] && [ -f "${directory}${ebook}" ]; then
        echo "  Processing e-book: $ebook"
        # [0] selects the first page; keep it inside the quotes so the shell does not treat it as a glob
        convert "${directory}${ebook}[0]" -resize "$maxsize" "$cover" 2>/dev/null
      fi
    done
  fi
done
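The script above works one level deep on purpose. If your e-books are nested deeper, a recursive variant can be sketched with find. The extract_covers_recursive name is hypothetical, it keeps the same covers sub-directory convention, and it skips covers directories so generated images are never scanned; treat it as a starting point rather than a tested tool.

```bash
#!/bin/bash
# extract_covers_recursive [ROOT] [MAXSIZE]  (hypothetical helper)
# Walk every directory below ROOT and store each cover next to its e-book,
# in a "covers" sub-directory, as the non-recursive script does.
extract_covers_recursive() {
  local root="${1:-.}" maxsize="${2:-200x200}" ebook dir name cover
  # -print0 / read -d '' keep file names with spaces intact;
  # covers directories are pruned so generated files are never revisited
  while IFS= read -r -d '' ebook; do
    dir="$(dirname "$ebook")"
    name="$(basename "$ebook" .pdf)"
    cover="${dir}/covers/${name}.png"
    [ -f "$cover" ] && continue          # cover already extracted
    mkdir -p "${dir}/covers"
    echo "Processing e-book: $ebook"
    # stdin is redirected so convert cannot consume the list of file names
    convert "${ebook}[0]" -resize "$maxsize" "$cover" 2>/dev/null </dev/null
  done < <(find "$root" -type d -name covers -prune -o -type f -name '*.pdf' -print0)
}
```

Call it as `extract_covers_recursive ~/ebooks` (the path is just an example). The -print0 and read -d '' pair keeps file names containing spaces intact.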

The output will look similar to the following.

Processing sub-directory: BSDmag
  Processing e-book: BSD_2008_01.pdf
  Processing e-book: BSD_2008_02.pdf
[...]
Processing sub-directory: LinuxFormat
  Processing e-book: LXF134.complete.pdf
  Processing e-book: LXF135.book.pdf
[...]  
Processing sub-directory: LinuxVoice
  Processing e-book: Linux-Voice-Issue-001.pdf
  Processing e-book: Linux-Voice-Issue-002.pdf
[...]

Simple shell script to generate wiki content

It is just an ugly snippet, but it will help you quickly build a list of PDF files.

#!/bin/bash
# create DokuWiki content
# create list of PDF files in current directory

dir=$(basename "$PWD")

for pdf in *.pdf; do
  [ -f "$pdf" ] || continue                  # no PDF files here
  name="${pdf%.pdf}"
  title="${name//[_-]/ }"                    # underscores and hyphens become spaces
  author=$(pdfinfo "$pdf" | sed -ne "/^Author:/ {s/^Author: *//;p}")
cat << EOF
{{:bookshelf:$dir:covers:${name}.png?nolink |}}
**${title}**\\\\
//${author}//

{{:bookshelf:$dir:${pdf}|Download e-book}}
----

EOF
done

Sample output.

[...]
{{:bookshelf:pragprog:covers:the-viml-primer_p1_0.png?nolink |}}
**the viml primer p1 0**\\
//Benjamin Klein//

{{:bookshelf:pragprog:the-viml-primer_p1_0.pdf|Download e-book}}
----

{{:bookshelf:pragprog:covers:tmux_p3_0.png?nolink |}}
**tmux p3 0**\\
//Brian P. Hogan//

{{:bookshelf:pragprog:tmux_p3_0.pdf|Download e-book}}
----
[...]
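
If you prefer something less ugly, the same entry can be produced by a function that takes the DokuWiki namespace as an argument instead of deriving it from the current directory. This is a sketch; wiki_entry is a hypothetical name, and it uses bash pattern substitution in place of sed to turn underscores and hyphens into spaces.

```bash
#!/bin/bash
# wiki_entry NAMESPACE PDF  (hypothetical helper)
# Print one DokuWiki entry for PDF, e.g. NAMESPACE=bookshelf:pragprog
wiki_entry() {
  local ns="$1" pdf="$2" name title author
  name="${pdf%.pdf}"
  title="${name//[_-]/ }"      # underscores and hyphens become spaces
  author="$(pdfinfo "$pdf" 2>/dev/null | awk '/^Author:/ { sub(/^Author: */, ""); print; exit }')"
  printf '{{:%s:covers:%s.png?nolink |}}\n' "$ns" "$name"
  printf '**%s**\\\\\n' "$title"           # \\ is a DokuWiki forced line break
  printf '//%s//\n\n' "$author"
  printf '{{:%s:%s|Download e-book}}\n' "$ns" "$pdf"
  printf '%s\n\n' '----'
}
```

Running `for pdf in *.pdf; do wiki_entry bookshelf:pragprog "$pdf"; done` inside the pragprog directory should print entries in the same layout as the sample output.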

Note that DokuWiki does not handle mixed-case names well, as it converts page and media names to lowercase; see the Page Names documentation.
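One way to sidestep mixed-case names is to rename the e-books to lowercase before generating covers and wiki content. A sketch with a hypothetical lowercase_pdfs function, which relies on bash 4 case conversion and on mv -n so that it never overwrites an existing file:

```bash
#!/bin/bash
# lowercase_pdfs [DIR]  (hypothetical helper)
# Rename mixed-case PDF files in DIR (default: current directory) to lowercase.
lowercase_pdfs() {
  local dir="${1:-.}" pdf base lower
  for pdf in "$dir"/*.pdf "$dir"/*.PDF; do
    [ -f "$pdf" ] || continue        # pattern did not match anything
    base="$(basename "$pdf")"
    lower="${base,,}"                # bash 4+ lowercase expansion
    [ "$base" = "$lower" ] && continue
    echo "Renaming: $base -> $lower"
    mv -n -- "$pdf" "$dir/$lower"    # -n: never overwrite an existing file
  done
}
```

For example, `lowercase_pdfs LinuxFormat` would rename LXF134.complete.pdf to lxf134.complete.pdf.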

Additional information

The most efficient way to get the number of pages in a PDF e-book is to use the pdfinfo utility from the Poppler utilities package mentioned earlier.

$ pdfinfo Linux-Voice-Issue-016.pdf | awk '/^Pages:/ { print $2 }'
116
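
Combined with ImageMagick's 0-based page indexing, the page count makes it easy to reach the last page, which for magazines is often the back cover. A sketch with a hypothetical extract_last_page function:

```bash
#!/bin/bash
# extract_last_page PDF OUTPUT  (hypothetical helper)
# pdfinfo reports the page count; ImageMagick indexes pages from 0,
# so the last page is at index pages - 1.
extract_last_page() {
  local pdf="$1" output="$2" pages
  pages="$(pdfinfo "$pdf" 2>/dev/null | awk '/^Pages:/ { print $2 }')"
  if ! [[ "$pages" =~ ^[1-9][0-9]*$ ]]; then
    echo "cannot read page count: $pdf" >&2
    return 1
  fi
  convert "${pdf}[$((pages - 1))]" "$output"
}
```

With the 116-page issue above, `extract_last_page Linux-Voice-Issue-016.pdf back.png` would pass index 115 to convert.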

You can also use ImageMagick's identify command to get the same information, but it is very slow, as it renders every page as an image.

$ identify -format "%n\n" Linux-Voice-Issue-016.pdf | head -1
116
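
If you want a page count that works whether or not Poppler is installed, you can prefer pdfinfo and fall back to identify. A sketch with a hypothetical page_count function; the \n in the format string keeps one count per line, so head can pick the first one:

```bash
#!/bin/bash
# page_count PDF  (hypothetical helper)
# Prefer the fast pdfinfo; fall back to the slow ImageMagick identify.
page_count() {
  local pdf="$1"
  if command -v pdfinfo >/dev/null 2>&1; then
    pdfinfo "$pdf" | awk '/^Pages:/ { print $2 }'
  else
    # identify renders every page and prints the total once per page
    identify -format "%n\n" "$pdf" | head -n 1
  fi
}
```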

You can analyze the first eleven pages (0 to 10) and print the number of the one with the most colors using the following command (%s is the page number, %k the number of unique colors).

$ identify -format "%s %k\n" "Linux-Voice-Issue-016.pdf[0-10]" | sort -nrk2 | awk 'NR==1 {print $1}'
3

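Since page numbers start from 0, 3 refers to the fourth page. This command can be very useful when the cover is not the first page, as the most colorful page is often a good candidate for the cover image.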
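The heuristic can be wrapped in a function and its result fed straight into convert. A sketch with hypothetical most_colorful_page and cover_from_colors functions; the default range of 0 to 10 and the 200x200 size are just the values used elsewhere in this article:

```bash
#!/bin/bash
# most_colorful_page PDF [LAST]  (hypothetical helper)
# Print the 0-based number of the page with the most unique colors (%k)
# among pages 0..LAST (default 10).
most_colorful_page() {
  local pdf="$1" last="${2:-10}"
  identify -format "%s %k\n" "${pdf}[0-${last}]" 2>/dev/null \
    | sort -nrk2 | awk 'NR==1 { print $1 }'
}

# cover_from_colors PDF OUTPUT [SIZE]  (hypothetical helper)
# Use the most colorful page as the cover image.
cover_from_colors() {
  local pdf="$1" output="$2" size="${3:-200x200}" page
  page="$(most_colorful_page "$pdf")"
  [ -n "$page" ] || return 1
  convert "${pdf}[${page}]" -resize "$size" "$output"
}
```

Keep in mind this is only a heuristic: a colorful advertisement can win over the actual cover, so it is worth checking the results.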


About Milosz Galazka

Milosz is a system administrator working for a successful Polish company and a long-time supporter of the Free Software Foundation and the Debian operating system.

Gdansk, Poland https://sleeplessbeastie.eu