
PDF Help

Brown Jenkin

First Post
ENWorld has wasted way too much of my work time. Now hopefully it can give back a little. I need to create a PDF of one of our 150-page books to make available for download on our website. Unfortunately it is over 10 years old, so we only have the printed version to scan (no electronic copies). Can anyone tell me the best way to create a PDF from TIFF scans that produces the smallest yet still readable files? We have the full version of Acrobat 5.0 currently, but could upgrade if it is absolutely necessary. The original pages are 8 1/2 x 11 B&W; so far, converting to 150 dpi JPGs resulted in 17 MB for the first 16 pages.

Thanks.
 


hope this helps

You may be able to go to your local Kinko's. The one I used to work at had a Xerox DigiPath that can scan to PDF. The only downside is that the scans will all be images of the pages and not searchable. The compression used is LZW TIFFs embedded in the PDF file format, so you may be able to compress the files yourself first and then convert them to PDF. But that would take longer; if I remember the equipment, it should take at most around 1-2 hours, even if the guy running the machine doesn't know what he is doing.

From there, Adobe lists a utility called ISICopy which, if I read correctly, will OCR the text for you. Don't ask me for any details; I just saw it on their website a couple of days ago.
adobe.com said:

by Image Solutions ISICopy works with Adobe® Acrobat® software to extract text from image-based PDF files, converting it into valuable editable text. There is no need to OCR an entire page; if you have a paper-based PDF file, you can select the precise amount of text you want to copy and then paste it into any application.
http://www.adobe.com/store/products/plugIn.jhtml?id=catPlugins_ISI_iCopy
Or, if you have a really good OCR program, you can feed the TIFFs into that to output straight text, with any graphs and pics scanned separately.

Either way there is a bunch of work ahead of you.
 



Are there many images in the book? Or is it just text? It makes a difference. Also, since it's b/w, you can significantly cut down the file size while keeping the resolution. For example, are the images in RGB or CMYK mode? If so, converting them to greyscale or even bitmap can make the files smaller. 150 dpi is a good resolution to keep the PDF readable but not too large. Sometimes, though, you can go as low as 100 dpi. I've done it with color images, and at 100 dpi they still look good, as long as you don't zoom in too closely.
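To put rough numbers on the color-mode point, here is a minimal Python sketch of the uncompressed size of one 8 1/2 x 11 page in each mode. It assumes 8 bits per channel for CMYK, RGB, and greyscale and 1 bit per pixel for bitmap; the function name and table are made up for illustration. Compressed JPG or TIFF sizes will be smaller and depend on the page content, so treat these only as relative figures.

```python
# Uncompressed size of one scanned page, before any JPEG/LZW compression.
# Assumed bit depths: 8 bits per channel, 1 bit per pixel for bitmap.
BITS_PER_PIXEL = {"CMYK": 32, "RGB": 24, "greyscale": 8, "bitmap": 1}


def raw_page_bytes(dpi, mode, width_in=8.5, height_in=11.0):
    """Return the raw pixel data size in bytes for one page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * BITS_PER_PIXEL[mode] // 8


for dpi in (100, 150):
    for mode in BITS_PER_PIXEL:
        mb = raw_page_bytes(dpi, mode) / 1_000_000
        print(f"{dpi} dpi {mode:9s} {mb:6.2f} MB")
```

Going from RGB to greyscale cuts the raw data to a third, and going from 150 dpi to 100 dpi cuts the pixel count to a bit under half.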

Anyway, if you can, give a little more info on it. If it's a book, the OCR option may be a good solution as well.
 

The book is an architectural survey book. With the exception of a few full-page photos, most pages are 80% text and 20% photos (1-3 small photos per page). All the pages were scanned as full-resolution TIFFs, and have also been saved as 150 dpi greyscale JPGs. As a test I have done the first 15 pages (out of 150) without OCR and now have the size down to 7.5 MB. This would equal 75 MB for the entire book. I have looked at other PDF books out there with roughly equivalent text/image ratios and page counts, and they have been 10-12 MB in size. These other books are real text and not all image files, which accounts for the discrepancy. Time is a factor, so I will rephrase my request slightly.

Would a 75 MB PDF be unreasonable?
What if it was available by chapter as well?
What about if it was available by request on CD ($1-$3 to cover materials and postage)?

Now, if I go OCR and convert it to text, how difficult will it be?
Is there software that would convert the text but leave the layout so it doesn't have to be reformatted? What software?
Even given hi-res TIFFs, how accurate is OCR? Will I have to spend hours looking for mistakes and correcting them manually (and I will have to ensure accuracy)?
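The size estimate in this post is a straight-line extrapolation from a sample of pages, which can be sketched in Python as below. The function name is made up for illustration, and the sketch assumes the sample pages are representative of the whole book, which may not hold if the photo-heavy pages are unevenly distributed.

```python
def estimate_total_mb(sample_mb, sample_pages, total_pages):
    """Scale the size of a sample of pages up to the whole book."""
    return sample_mb / sample_pages * total_pages


# Greyscale test: 15 pages came to 7.5 MB.
print(estimate_total_mb(7.5, 15, 150))
# Earlier attempt: 16 pages came to 17 MB.
print(estimate_total_mb(17, 16, 150))
```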
 


Hmm, tough. 75MB is a very large file. Will this be for people to download? If so, you will definitely want to break it out by chapter. It will pretty much rule out people on dialup. OCR is pretty good these days, but it's not going to retain the layout well enough. You'd have to go through it. And I suspect you'd need to go through and check for errors. Though most of it will just be going through and redoing the formatting.

However, there may be a program out there I'm unfamiliar with. Otherwise, you don't have great options. The best would require a great deal of time. Doing it as you are now, it's going to be very big.

My advice is to split the file out by chapters (as well as one large file--some people could handle it) and offer the whole thing on CD as well. If you have time later on, you can do an OCR scan and get a better version.
 

75MB is too big for a lot of sans-broadband folks.

You say you have the full version of Adobe Acrobat 5, so lemme ask you this: did you go to File >> Preferences >> Options and check Save As Optimizes for Fast Web View?

And when you scan a document as a TIFF file, do you use compression? You should have an option when you save the document to do that.

I've found PaperPort 8.0 and the accompanying OmniPage OCR to be excellent at converting documents like the one you mention.
 

I would also recommend OmniPage Pro for OCR. I have had pretty good luck using it, even for pages of both text and images. Other than that, I have nothing of any use to add to the discussion. Sorry.

zen
 

Felon said:
75MB is too big for a lot of sans-broadband folks.

Not that it matters too much, but the final full copy came out to only 50 MB. Still too big for non-broadband, but I think that it is unlikely to be of use to non-broadband users no matter what. Even with OCR and text the file would likely be 10 MB, and from experience with a 56k modem even 10 MB is out of reach.
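For a sense of scale, a rough Python sketch of the download times involved. It assumes 1 MB = 1,000,000 bytes and that the modem sustains its nominal 56 kbit/s with no protocol overhead; real dial-up throughput is lower, so these are best-case figures. The function and parameter names are made up for illustration.

```python
def download_minutes(size_mb, link_kbps=56.0, efficiency=1.0):
    """Best-case transfer time in minutes for a file of size_mb megabytes."""
    bits = size_mb * 1_000_000 * 8
    seconds = bits / (link_kbps * 1000 * efficiency)
    return seconds / 60


for size_mb in (10, 50):
    print(f"{size_mb} MB: about {download_minutes(size_mb):.0f} minutes at 56k")
```

Even under these generous assumptions, the 10 MB file takes a bit under 24 minutes and the 50 MB file about two hours.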

Felon said:
You say you have the full version of Adobe Acrobat 5, so lemme ask you this: did you go to File >> Preferences >> Options and check Save As Optimizes for Fast Web View?

We expect people to print out at least portions to use. I know that limiting it to just screen resolution you can get down to 72dpi but we want at least 150dpi for printing. Won't I lose decent printing resolution if I go with Fast Web View?

Felon said:
And when you scan a document as a TIFF file, do you use compression? You should have an option when you save the document to do that.

As I stated above, the original scans are uncompressed TIFFs to allow the most flexibility for whatever option is desired. Copies are then made as 150 dpi JPGs for use in the PDF.

Felon said:
I've found PaperPort 8.0 and the accompanying OmniPage OCR to be excellent at converting documents like the one you mention.

Thanks for the recommendations. I will look into those programs.
 

Brown Jenkin said:
We expect people to print out at least portions to use. I know that limiting it to just screen resolution you can get down to 72dpi but we want at least 150dpi for printing. Won't I lose decent printing resolution if I go with Fast Web View?

For reasons having to do with internal design, PDF is much more effective when you use image resolutions that are multiples of 96 dpi. I don't know how much of a difference that will make on printing, but it will speed up the performance of the PDF in Acrobat.

Or so I was told at a conference; I switched the image resolutions in the newsletter I do to multiples of 288 and I find it does make a difference.

On the other hand, I'm not particularly optimizing for PDF size (beyond the options that the software hands me).
 
