About extracting illustrations from print files

Admittedly, this is a very tempting operation. It can be done.

It cannot be done properly by extracting the pertinent page from the print file (for example, with Ghostview's page selection), using an illustration editing program to "White Out(tm)" the undesired material on the page, adding EPS DSC material to the resulting file, and then using the result as an EPS file.

With generally poor results and considerable effort, it can be done by extracting the pertinent page from the print file, converting the page to a bitmap image, editing the bitmap, selecting the desired portion of it, and generating a PostScript bitmap image. There is generally a major loss of resolution and fine detail when this is done. The file size also generally grows significantly: 10 000 byte files can easily become 10 000 000 byte files.
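The size growth can be estimated with simple arithmetic. The following is only a rough sketch: the function name `raster_bytes` and the example figure size are illustrative assumptions, and it assumes an uncompressed image whose rows are padded to whole bytes and whose data is written as hexadecimal text (two characters per byte).

```python
def raster_bytes(width_in, height_in, dpi, bits_per_pixel=1, hex_encoded=True):
    """Estimate the size of uncompressed bitmap data for a figure.

    Assumption: each row is padded to a whole number of bytes, and
    hex encoding doubles the byte count (line breaks are ignored).
    """
    cols = round(width_in * dpi)
    rows = round(height_in * dpi)
    bytes_per_row = (cols * bits_per_pixel + 7) // 8
    raw = bytes_per_row * rows
    return raw * 2 if hex_encoded else raw


# A hypothetical 4 x 3 inch figure at 300 dots per inch.
print(raster_bytes(4, 3, 300, bits_per_pixel=1))  # 1-bit black and white
print(raster_bytes(4, 3, 300, bits_per_pixel=8))  # 8-bit gray scale
```

Even a small gray-scale figure lands in the millions of bytes under these assumptions, which is consistent with the growth described above.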

With generally good results, it can be done by extracting the pertinent page from the print file, text editing the file to remove the encapsulating material, reconstructing the frequently deleted EPS DSC material and analyzing for and removing any undesired annotations. The resultant EPS file should be given the same testing as any other EPS file. Here is a basic set of EPS headers:

    %!PS-Adobe-2.0 EPSF-1.2
    %%BoundingBox: llx lly urx ury
    %%EndComments

The BoundingBox numbers are, in pairs, the coordinates of the lower left (ll) and upper right (ur) corners of the box that surrounds the figure. They must be integers. The horizontal (x) coordinate is the first coordinate in a pair; the vertical (y) coordinate is the second. The origin is the lower left corner of the interpreter’s imaging area. The units are PostScript points (1/72 of an inch, or 352 7/9 micrometers). Select values that lie a small, nearly uniform distance outside the actual figure.
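As a minimal sketch of that rule, the helper below builds the header lines from measured figure extents. The name `eps_header` and the default 2 point margin are assumptions made for illustration. It rounds outward, so the integer box always encloses the figure plus the margin.

```python
import math


def eps_header(llx, lly, urx, ury, margin=2.0):
    """Return basic EPS header lines for a figure whose extents are in points.

    The margin value is an arbitrary illustrative choice.
    """
    # Round the lower left corner down and the upper right corner up.
    box = (
        math.floor(llx - margin),
        math.floor(lly - margin),
        math.ceil(urx + margin),
        math.ceil(ury + margin),
    )
    return (
        "%!PS-Adobe-2.0 EPSF-1.2\n"
        "%%BoundingBox: {} {} {} {}\n"
        "%%EndComments\n".format(*box)
    )


# Hypothetical measured extents of a figure, in points.
print(eps_header(72.4, 100.2, 300.7, 250.1), end="")
```

The measured extents still have to come from somewhere, for example from reading coordinates off a Ghostview display.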

The difficulty in the last method lies mostly in removing the undesired annotations. With LaTeX, the encapsulating material is quite stylized and therefore usually easy to remove. Ghostview can be used to reconstruct the BoundingBox data, which is the hardest of the required EPS DSC material to reconstruct.

About files

Whereas Adobe PostScript interpreters can process files with any of UNIX, Apple Macintosh or MS-DOS line terminators, the same is not generally true of the application programs that process Adobe PostScript files. To each other's operating system, UNIX and Apple MacOS files appear to be one long line. This can cause problems with line buffer sizes in applications and print spoolers. Adobe PostScript files should be copied between dissimilar operating systems as text files.
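When a file has already been copied without conversion, the line terminators can be repaired afterwards. The sketch below is one possible approach, with illustrative function names. It assumes the file is pure text: a file carrying binary image data would be damaged by this rewrite.

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Convert MS-DOS (CR LF) and Macintosh (CR) line ends to UNIX (LF)."""
    # Handle CR LF first so that it does not turn into two line breaks.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def longest_line(data: bytes) -> int:
    """Length of the longest LF-terminated line, as a UNIX tool would see it."""
    return max(len(line) for line in data.split(b"\n"))


# A small Macintosh-style sample: every line ends in CR only.
mac_file = b"%!PS\r/x 1 def\rshowpage\r"
print(longest_line(mac_file))                    # whole file is one "line"
print(longest_line(to_unix_newlines(mac_file)))  # after conversion
```

Comparing the two numbers shows why unconverted files can overflow line buffers: before conversion, the longest line is the entire file.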

Whereas Apple MacOS considers null (zero) bytes to be white space characters, UNIX considers them to be character string terminators. Unless a UNIX application is carefully coded to handle character streams containing null bytes as data, omissions of data may occur. Many Adobe PostScript interpreters will treat null bytes as white space.
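A quick check for null bytes before handing a file to other tools might look like the sketch below. Both function names are illustrative assumptions; `as_c_string` only imitates what a naive routine based on C-style strings would retain.

```python
def null_byte_offsets(data: bytes, limit: int = 10) -> list:
    """Return the offsets of the first few null bytes in the data."""
    offsets = []
    pos = data.find(b"\x00")
    while pos != -1 and len(offsets) < limit:
        offsets.append(pos)
        pos = data.find(b"\x00", pos + 1)
    return offsets


def as_c_string(data: bytes) -> bytes:
    """Everything before the first null byte: what a careless reader keeps."""
    return data.split(b"\x00", 1)[0]


sample = b"/x 1 def\x00/y 2 def\n"
print(null_byte_offsets(sample))
print(as_c_string(sample))
```

In the sample, everything after the null byte, including the definition of `/y`, would be lost by the careless reader.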

About Ghostview

Ghostview is dependent on proper Adobe DSC. Ghostview uses these comments to locate the file’s prolog, which defines procedures and values used by the individual page descriptions; the individual page descriptions themselves; and the file’s epilog, which may do some cleanup. If the application generating the file does not generate proper DSC or obey the constraints of page independence, the individual pages may not be properly handed to Ghostscript for display. A common symptom is an eyeball hang. Ghostview presents an eyeball cursor until the underlying Ghostscript interpreter signals that it has presented the page to the display. Then Ghostview switches to a cross cursor. If the showpage command, which should be part of the page description, is not executed, then the cursor does not switch.

If Ghostview erroneously includes extra material after the showpage, such as when it combines multiple page descriptions into a single offering to Ghostscript, then the extra material is silently ignored. This can result in a file that appears to display properly in Ghostview but prints extra pages on a printer.
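Both symptoms can sometimes be anticipated by counting showpage tokens between the %%Page: comments. The sketch below is a rough heuristic under stated assumptions, not a DSC parser: the function name is illustrative, inline comments and strings are not handled, and a count of zero is not proof of a problem, because many generators call showpage from a procedure defined in the prolog.

```python
import re


def showpages_per_page(ps_text: str) -> dict:
    """Map each %%Page: label to the number of literal showpage tokens after it."""
    counts = {}
    label = None
    for line in ps_text.splitlines():
        if line.startswith("%%Page:"):
            parts = line.split()
            label = parts[1] if len(parts) > 1 else "?"
            counts[label] = 0
        elif line.startswith("%%Trailer"):
            label = None  # the page descriptions have ended
        elif label is not None and not line.startswith("%"):
            # Count showpage only as a whole, space-delimited token.
            counts[label] += len(re.findall(r"(?<!\S)showpage(?!\S)", line))
    return counts


sample = """%!PS-Adobe-2.0
%%Pages: 2
%%EndProlog
%%Page: 1 1
0 0 moveto showpage
%%Page: 2 2
0 0 moveto
%%Trailer
"""
print(showpages_per_page(sample))
```

A page with no showpage is a candidate for the eyeball hang, and a page with more than one is a candidate for extra printed pages.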

Font Selection

Yes, I know that Palatino looks good. But it is not present in many interpreters and printers. If you want your document to be presentable on most printers and display programs, then you should use only the basic 13 fonts that the Apple LaserWriter and almost all other printers have:

  • Courier
  • Courier-Bold
  • Courier-BoldOblique
  • Courier-Oblique
  • Helvetica
  • Helvetica-Bold
  • Helvetica-BoldOblique
  • Helvetica-Oblique
  • Times-Roman
  • Times-Bold
  • Times-BoldItalic
  • Times-Italic
  • Symbol

Even the Helvetica-Narrow family is suspect; some interpreters have Helvetica-Condensed instead.
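A document can be screened against such a list before distribution. The following is a minimal sketch with illustrative function names. It only recognizes the literal pattern `/Name findfont`, so fonts selected in other ways (for example through procedures defined in the prolog) will be missed. The caller supplies the list of acceptable fonts.

```python
import re


def fonts_requested(ps_text: str) -> set:
    """Collect font names that appear literally as '/Name findfont'."""
    return set(re.findall(r"/([A-Za-z0-9][A-Za-z0-9_-]*)\s+findfont", ps_text))


def unexpected_fonts(ps_text: str, allowed) -> set:
    """Return requested fonts that are not in the allowed collection."""
    return fonts_requested(ps_text) - set(allowed)


sample = (
    "/Palatino-Roman findfont 12 scalefont setfont\n"
    "/Times-Roman findfont 10 scalefont setfont\n"
)
# Only part of the font list is passed here, to keep the example short.
print(unexpected_fonts(sample, {"Times-Roman", "Courier"}))
```

An empty result does not guarantee a portable file, but a non-empty one points at fonts worth replacing.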

Paper bin, tray, size, feed mode, etc. selection

In a word, don’t do this in a document intended for general distribution! What your printer calls bin 0 may mean manual feed, or may not exist, on another printer or interpreter. The entire world does not use either A4 (about 8.27 by 11.7 inches) or American letter (215.9 by 279.4 mm) paper. Instead of selecting a paper size, set your margins so that the print area is neither too high for an American letter page nor too wide for A4 paper.
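The arithmetic behind that advice is short: take the smaller width (A4) and the smaller height (American letter), then subtract the margins. In the sketch below the function names and the 36 point (half inch) margin are illustrative assumptions, and the print area is assumed to be positioned suitably on whichever page is actually used.

```python
def mm_to_pt(mm: float) -> float:
    """Convert millimeters to PostScript points (72 per inch)."""
    return mm * 72 / 25.4


A4 = (mm_to_pt(210.0), mm_to_pt(297.0))
LETTER = (mm_to_pt(215.9), mm_to_pt(279.4))


def common_print_area(margin_pt: float = 36.0):
    """Width and height, in points, of an area that fits both paper sizes."""
    width = min(A4[0], LETTER[0]) - 2 * margin_pt
    height = min(A4[1], LETTER[1]) - 2 * margin_pt
    return width, height


width, height = common_print_area()
print(round(width, 1), round(height, 1))
```

With these assumptions the limiting width comes from A4 and the limiting height from American letter, as the text above says.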

Using an unavailable paper tray, bin, size, feed mode, etc. is a good way to cause the print job to be canceled without being printed!

There are tricks available to resize print files to fit the other paper sizes.

In General …

PostScript Sins lists failings of a number of applications that generate and handle the Adobe PostScript language.