Prepare release 0.1.0.1
Make naming of pointers a bit more consistent
Prevent deletion of document before deletion of pages
The pdftotext package provides functions for extracting plain text from PDF documents. It uses the C++ library Poppler, which must be installed on the system. The output of the Haskell pdftotext library is identical to the output of Poppler's pdftotext tool.

    import qualified Data.Text.IO as T
    import Pdftotext

    main :: IO ()
    main = do
      Just pdf <- openFile "path/to/file.pdf"
      T.putStrLn $ pdftotext Physical pdf

The package also comes with an executable program, pdftotext.hs, which can print the text extracted from a PDF as well as basic information about the document.

    $> pdftotext.hs info test/simple.pdf
    File    : test/simple.pdf
    Pages   : 4
    Properties
      Title   : Simple document for testing
      Author  : G. Eyaeb
      Subject : Testing
      Creator : pdflatex
      Producer: LaTeX with hyperref
      Keywords: haskell,pdf

    $> pdftotext.hs text --pages 1,4 test/simple.pdf
    Simple document for testing
    deserve neither liberty nor safety.

See help for more information:

    $> pdftotext.hs --help
    $> pdftotext.hs text --help
    $> pdftotext.hs info --help

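The `--pages 1,4` option shown earlier takes a comma-separated list of page numbers. The sketch below illustrates how such a list could be parsed and applied using only `base`. It is not the executable's actual code: `parsePages`, `parsePage`, `splitOnComma` and `selectPages` are hypothetical names, and the rule that page numbers start at 1 is an assumption.

```haskell
import Data.Char (isDigit)

-- Split a string on commas, keeping empty chunks so that
-- malformed input such as "1,,4" can be rejected later.
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnComma rest

-- Accept only non-empty runs of digits that denote a page >= 1.
parsePage :: String -> Maybe Int
parsePage chunk
  | not (null chunk) && all isDigit chunk && n >= 1 = Just n
  | otherwise                                       = Nothing
  where
    n = read chunk  -- only forced after the digit check succeeds

-- Parse a whole list such as "1,4"; any bad chunk fails the parse.
parsePages :: String -> Maybe [Int]
parsePages = traverse parsePage . splitOnComma

-- Keep the items whose 1-based position was requested.
-- Results come back in document order; out-of-range numbers are ignored.
selectPages :: [Int] -> [a] -> [a]
selectPages wanted items =
  [item | (i, item) <- zip [1 ..] items, i `elem` wanted]
```

With these definitions, `parsePages "1,4"` yields `Just [1,4]`, while `parsePages "1,,4"` yields `Nothing`. Returning `Maybe` lets the caller report a usage error instead of crashing on malformed input.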
The library calls Poppler through the FFI, so internally all functions have type IO. However, their non-IO variants (which use unsafePerformIO) should be safe to use. The module Pdftotext.Internal exposes all IO functions.
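The general pattern of pairing an IO binding with a pure wrapper can be sketched with `base` alone. In the example below, libc's `strlen` stands in for a Poppler call. The names `c_strlen`, `cLengthIO` and `cLength` are hypothetical and are not taken from the pdftotext sources.

```haskell
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize (..))
import System.IO.Unsafe (unsafePerformIO)

-- Foreign import of a C function. It is given an IO type because,
-- in general, a foreign call may read or modify external state.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- IO variant: marshal the Haskell string to a temporary C string,
-- call into C, and convert the result.
cLengthIO :: String -> IO Int
cLengthIO s = withCString s (fmap fromIntegral . c_strlen)

-- Pure variant. This is reasonable only because the action has no
-- observable side effects and returns the same result for the same
-- argument within a given run.
cLength :: String -> Int
cLength = unsafePerformIO . cLengthIO
```

For example, `cLength "hello"` evaluates to 5. The wrapper is safe only as long as the underlying action behaves like a pure function. Keeping the IO variant available lets callers who need explicit sequencing use it directly.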
The project is hosted at https://sr.ht/~geyaeb/haskell-pdftotext/ . The homepage provides links to the Mercurial repository, the mailing list, and the ticket tracker.
Patches, suggestions, questions, and general discussion can be sent to the mailing list. Detailed information about sending patches by email can be found at https://man.sr.ht/hg.sr.ht/email.md.