Scanning a book & converting to ebook
I recently asked whether Haymarket or Brill have digital copies of Lenin Rediscovered: 'What Is to Be Done?' in Context. Sadly, they do not. Now, it so happens I also have a hardcopy of it (it's rather big, though, and I read mostly on the train or in other spots where big books are inconvenient).
So, I had a rather radical idea: cut the hardcopy apart page by page, scan it at high DPI and make a PDF out of it.
However, and here is the issue, what software tools can turn a PDF into a real text document? As far as I know, a scanned-document PDF is really just a bunch of pictures put into a PDF container. And to make it into an ebook, conversion to EPUB or (for Kindles) MOBI format is also necessary; otherwise the whole endeavour is useless.
So, how do I scan into a text PDF (as opposed to a picture PDF) while holding onto the formatting (I really wouldn't want to redo the layout of an 800+ page document!), so it can be converted into more usable ebook formats?
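Whether a given PDF already has a text layer can be checked roughly by looking for font resources: showing text in a PDF requires a font, while a pure picture-PDF from a scanner usually holds only image objects. The sketch below is a crude heuristic under that assumption, not a real PDF parser (a scan with, say, a stamped header could still contain a font); `has_text_layer` is a hypothetical helper name.

```python
import re
import zlib

# Raw bytes between a PDF "stream" keyword and its matching "endstream".
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the PDF appears to reference any fonts.

    Image-only scans usually contain just image objects; OCR tools that add a
    text layer also have to add font resources. Those font dictionaries may sit
    inside Flate-compressed object streams, so such streams are inflated too.
    """
    if b"/Font" in pdf_bytes:
        return True
    for match in _STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj ignores trailing bytes such as the EOL before "endstream"
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-encoded (e.g. JPEG or CCITT image data)
        if b"/Font" in data:
            return True
    return False
```

Usage would be something like `has_text_layer(open("book.pdf", "rb").read())`, where `book.pdf` is a placeholder for the scanned file.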
Dr. Fish
24th March 2012, 21:20
That'd be a really big project. Whether you end up with images or a PDF right away depends on how you're scanning it. If your scanner makes PDFs, you'll want to make PDFs for chapters instead of the entire book, cuz otherwise you'll be driven insane. If it only makes images, you can paste them into an .odt or .doc or whatever, and then "print" to PDF or export a PDF from that. If you go down that avenue, I suggest using OpenOffice.
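Splitting the work per chapter, as suggested above, mostly means knowing which scans belong to which chapter. A minimal sketch, assuming the scan files are already in page order and that you note the (hypothetical) page number where each chapter starts; each resulting list could then be turned into its own chapter PDF or document.

```python
from bisect import bisect_right


def group_by_chapter(page_files, chapter_starts):
    """Split page-ordered scan files into one list per chapter.

    page_files: scan filenames in reading order (first entry is page 1).
    chapter_starts: ascending 1-based page numbers where chapters begin;
    the first must be 1 so every page belongs to some chapter.
    """
    if not chapter_starts or chapter_starts[0] != 1:
        raise ValueError("the first chapter must start on page 1")
    if list(chapter_starts) != sorted(set(chapter_starts)):
        raise ValueError("chapter starts must be strictly ascending")
    groups = [[] for _ in chapter_starts]
    for page_no, name in enumerate(page_files, start=1):
        # index of the last chapter whose start page is <= page_no
        idx = bisect_right(chapter_starts, page_no) - 1
        groups[idx].append(name)
    return groups
```

For example, seven scans with chapters starting on pages 1, 3 and 6 give groups of 2, 3 and 2 files.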
If you're making an EPUB, then I'd advise making the .odt and then converting the .odt into whatever format you want: MOBI, EPUB, &c. Converting PDF to really any other format doesn't really work; it always comes out fucked up.
But then, it also might be good to just try to read it as it is. It'd be a cool demonstration of endurance &c.
PM me if you finish.
NewLeft
24th March 2012, 21:46
I do this often:
1) Get ABBYY FineReader.
2) Remove the binding.
3) Do a double-sided scan using the document feeder, about 50 sheets at a time (depends on the feeder size).
4) Scan the first sides at 300dpi, black and white (unless you need color).
5) Once the first sides are done, flip the stack of papers to do the other sides.
6) Repeat until you've done the entire book.
7) Import into FineReader.
8) Save it as a formatted PDF.
9) Share/spread the wealth.
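Steps 4 and 5 produce two separate passes, fronts and backs, that have to be merged back into reading order. A minimal sketch of that interleaving, assuming the flipped stack feeds the last sheet first so the backs arrive reversed; whether that is true depends on the scanner and how you flip the stack, so check with a few sheets first. `interleave_duplex` is a hypothetical helper name.

```python
def interleave_duplex(fronts, backs, backs_reversed=True):
    """Merge two single-sided scan passes into reading order.

    fronts: scans of the front sides, first sheet first.
    backs: scans of the back sides from the second pass. If the flipped
    stack feeds the last sheet first, they arrive reversed (the default here).
    """
    if len(fronts) != len(backs):
        raise ValueError(f"{len(fronts)} fronts but {len(backs)} backs; was a sheet skipped?")
    if backs_reversed:
        backs = list(reversed(backs))
    pages = []
    for front, back in zip(fronts, backs):
        pages.extend((front, back))  # front of sheet n, then its back
    return pages
```

The length check is there because a double-fed or skipped sheet in either pass would otherwise silently shift every page after it.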
1. While ABBYY seems to be pretty good OCR software, it has no Linux version (and the Wine database has no reports on it, though I might try it). Are there any open source alternatives? The most promising one seems to be OCRFeeder (http://live.gnome.org/OCRFeeder).
2. I was considering a Stanley knife for this, but are there easier/better ways?
3 - 5. I don't have a scanner with a feeder. Just a flatbed. So yeah.
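One open-source OCR engine that can produce a text PDF is Tesseract. Its command line is `tesseract imagename outputbase [options] [configfile]`, and in later releases the `pdf` config writes a PDF with the page image plus an invisible text layer. That gives a searchable PDF, not reflowable ebook text, so it is closer to a "text PDF" than to an EPUB. A sketch assuming `tesseract` is on the PATH and the page images sit in one folder; the helper names and the `*.png` pattern are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_pdf_command(image: Path, lang: str = "eng") -> list[str]:
    """Build a Tesseract call that writes <image stem>.pdf next to the image."""
    output_base = image.with_suffix("")  # tesseract appends ".pdf" itself
    return ["tesseract", str(image), str(output_base), "-l", lang, "pdf"]


def ocr_pages(image_dir: Path, pattern: str = "*.png", lang: str = "eng") -> list[Path]:
    """OCR every matching image in name order; returns the per-page PDFs."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract is not installed or not on PATH")
    pdfs = []
    for image in sorted(image_dir.glob(pattern)):
        subprocess.run(tesseract_pdf_command(image, lang), check=True)
        pdfs.append(image.with_suffix(".pdf"))
    return pdfs
```

The per-page PDFs would still need merging into one file with a separate tool (poppler's `pdfunite` is one option).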
After converting it into a text PDF, there is another issue that, while not strictly necessary, makes the ebook much more readable: linking, especially in the table of contents. I've found Sigil (http://code.google.com/p/sigil/) as a possible editor for that, but I wonder if there are easier options (preferably ones that run on Linux). Anyway, this will probably still involve quite a bit of work.
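For reference, the linked table of contents in an EPUB 3 file is a navigation document: an XHTML file with a `<nav epub:type="toc">` holding an ordered list of links, declared in the package (OPF) manifest with `properties="nav"` (EPUB 2 readers use a separate `toc.ncx` instead). Editors like Sigil generate this for you; the sketch below only shows what gets generated, assuming chapter files with hypothetical names such as `text/intro.xhtml`.

```python
from html import escape

NAV_TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{title}</title></head>
<body>
  <nav epub:type="toc" id="toc">
    <h1>Contents</h1>
    <ol>
{items}
    </ol>
  </nav>
</body>
</html>
"""


def build_nav_document(book_title, entries):
    """Return EPUB 3 navigation XHTML for (label, href) pairs in reading order."""
    items = "\n".join(
        # escape() turns &, <, > and quotes into entities so the XHTML stays valid
        f'      <li><a href="{escape(href)}">{escape(label)}</a></li>'
        for label, href in entries
    )
    return NAV_TEMPLATE.format(title=escape(book_title), items=items)
```

Links can point at a fragment inside a chapter (e.g. `text/notes.xhtml#n1`), which is also how endnote links work.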
NewLeft
24th March 2012, 22:56
I only have experience with ABBYY; try this list: http://en.wikipedia.org/wiki/List_of_optical_character_recognition_software
If you don't have a feeder, it will take a long time. In that case, set it to 150dpi, black and white, and just rip off the cover and bend the book along the middle. If you can thin out the binding, it'll be easier to fold. As for linking, I never do it, so I can't really help you with that.
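The 300dpi vs 150dpi trade-off is easy to put numbers on: halving the resolution roughly quarters the pixels per page. A sketch of the uncompressed size of one scan, assuming a hypothetical 6 x 9 inch page (measure the real book) and rows padded to whole bytes as in raw 1-bit bitmap formats; `raw_scan_bytes` is a hypothetical name.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Uncompressed size of one scanned page, in bytes.

    bits_per_pixel: 1 = black and white, 8 = greyscale, 24 = colour.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8  # each row padded to whole bytes
    return row_bytes * height_px


page_300 = raw_scan_bytes(6, 9, 300)  # 1800 x 2700 px, 225 bytes per row
page_150 = raw_scan_bytes(6, 9, 150)  # 900 x 1350 px, 113 bytes per row
book_300 = 800 * page_300             # an 800-page book, before compression
```

At 300dpi black and white that is 607,500 bytes per page and about 486 MB for 800 pages before compression; 150dpi brings a page down to 152,550 bytes. Greyscale multiplies the 300dpi figure by eight, which is why black and white is the sensible default for text.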
ellipsis
25th March 2012, 00:17
Maybe find somebody local who has the resources for professional or academic use. Tell them you are trying to digitally preserve an out-of-print text. If the text is out of copyright, you could probably send it to Project Gutenberg or Google; you would clearly "lose control" over the process, but you would get a high-quality, free and distributed product.
Sadly the book is not out of print and copyright has not expired yet (it was only published in 2006, by Brill). There are some services I've found that do the scanning for me, but crucially don't do proof-reading (to cut costs). So, I'll probably end up doing it myself anyhow.
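Since the proofreading then falls to you, a small script can at least point at likely OCR slips. A minimal sketch assuming the most common confusions are O/0 and l/1, which produce tokens mixing letters and digits; `suspicious_tokens` is a hypothetical helper, and it will also flag legitimate tokens such as "3rd", so treat its output as places to look rather than errors.

```python
import re

# A token is any run of letters and digits (underscores excluded).
_TOKEN_RE = re.compile(r"[^\W_]+")


def suspicious_tokens(text):
    """Return tokens that mix letters and digits, e.g. '19O2' or 'c1ass'."""
    return [
        tok for tok in _TOKEN_RE.findall(text)
        if any(c.isdigit() for c in tok) and any(c.isalpha() for c in tok)
    ]
```

Running it over each chapter's OCR text and searching for the flagged tokens is much quicker than rereading 800 pages line by line, though it will not catch errors that happen to form real words.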
ellipsis
25th March 2012, 01:07
Seems like a lot of work just to be able to read one book on the go and electronically, to me, but it sounds like you're into it.
Samwise
9th April 2013, 16:10
You might want to try this set-up instead:
www.instructables.com/id/Bargain-Price-Book-Scanner-From-A-Cardboard-Box/
Much, much faster than a scanner, and it doesn't destroy the book.
That actually looks like an easy enough setup. Might consider that. Thanks!