
Convert a PDF Into a Series of Images Using C# and GhostScript



A Technical Blog, originally posted at http://cyotek.com/blog/convert-a-pdf-into-a-series-of-images-using-csharp-and-ghostscript?source=rss

How to convert a PDF into a series of images using C# and GhostScript

Type: Technical Blog

Licence: CPOL

First Posted: 20 Jan 2012


By Richard James Moss, 21 Jun 2013

GhostScript .NET integration component - 11.7 KB

PDF conversion component - 5.4 KB

Introduction

An application I was recently working on received PDF files from a web service, which it then needed to store in a database. I wanted the ability to display previews of these documents within the application. While there are a number of solutions for creating PDF files from C#, options for viewing a PDF within your application are much more limited, unless you purchase expensive commercial products or use COM interop to embed Acrobat Reader into your application.

This article describes an alternative solution, in which the pages of a PDF are converted into images using GhostScript, after which you can display them in your application.

To avoid huge walls of text, this article has been split into two parts: the first deals with the actual conversion of a PDF, and the second demonstrates how to extend the ImageBox control to display the images.

Caveat Emptor

Before we start, some quick points:

The method I'm about to demonstrate converts each page of the PDF into an image. This means that it is very suitable for viewing, but interactive elements such as forms, hyperlinks and even good old text selection are not available.

GhostScript has a number of licenses associated with it, but I can't find any information on the pricing of commercial licenses.

The GhostScript API integration library used by this project isn't complete, and I'm not going to go into the bells and whistles of how it works in this pair of articles; once I've completed the outstanding functionality, I'll create a new article for it.

Getting Started

You can download the two libraries used in this article from the links below. These are:


Convert a PDF into a Series of Images using C# and Gho... http://www.codeproject.com/Articles/317700/Convert-a...

1 of 7 1/29/2014 9:27 PM


Cyotek.GhostScript - core library providing GhostScript integration support

Cyotek.GhostScript.PdfConversion - support library for converting a PDF document into images

Please note that the native GhostScript DLL is not included in these downloads; you will need to obtain it from the GhostScript project page.
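A minimal sketch, assuming a Windows build: a reader in the comments only got the conversion working once the native DLL was placed in the project's bin folder, so a start-up check can give a clearer message than a failed native call. The names gsdll32.dll and gsdll64.dll are the standard GhostScript Windows DLL names; which of them Cyotek.GhostScript actually loads, and the GhostScriptDllCheck class itself, are assumptions for illustration.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Cyotek.GhostScript.
static class GhostScriptDllCheck
{
    // Picks the DLL name matching the bitness of the current process,
    // since a 32-bit process cannot load a 64-bit DLL and vice versa.
    // Which name the wrapper really expects is an assumption.
    public static string NativeDllName(bool is64BitProcess)
    {
        return is64BitProcess ? "gsdll64.dll" : "gsdll32.dll";
    }

    // Builds the expected path inside the given folder (for example the
    // application's bin folder) and reports whether the file exists there.
    public static bool IsNativeDllPresent(string appDirectory, bool is64BitProcess, out string expectedPath)
    {
        expectedPath = Path.Combine(appDirectory, NativeDllName(is64BitProcess));
        return File.Exists(expectedPath);
    }
}
```

At start-up you might call `GhostScriptDllCheck.IsNativeDllPresent(AppContext.BaseDirectory, Environment.Is64BitProcess, out string path)` and report `path` to the user when it returns false.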

Using the GhostScriptAPI Class

As mentioned above, the core GhostScript library isn't complete yet, so I'll just give a description of the basic functionality required by the conversion library.

The GhostScriptAPI class handles all communication with GhostScript. When you create an instance of the class, it automatically calls gsapi_new_instance in the native GhostScript DLL. When the class is disposed, it automatically releases any handles and calls the native gsapi_exit and gsapi_delete_instance methods.
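The create/dispose lifecycle described above can be sketched as an IDisposable wrapper that releases the native instance exactly once. This is not the Cyotek.GhostScript implementation: the NativeInstance class is hypothetical, and its three delegates stand in for P/Invoke calls to gsapi_new_instance, gsapi_exit and gsapi_delete_instance, so the release logic can be shown without the native DLL.

```csharp
using System;

// Hypothetical wrapper; the delegates stand in for the native calls.
sealed class NativeInstance : IDisposable
{
    private readonly Action<IntPtr> _exit;
    private readonly Action<IntPtr> _delete;
    private IntPtr _handle;

    public NativeInstance(Func<IntPtr> create, Action<IntPtr> exit, Action<IntPtr> delete)
    {
        _exit = exit;
        _delete = delete;
        _handle = create(); // e.g. wraps gsapi_new_instance
        if (_handle == IntPtr.Zero)
            throw new InvalidOperationException("Failed to create GhostScript instance");
    }

    public IntPtr Handle
    {
        get
        {
            if (_handle == IntPtr.Zero)
                throw new ObjectDisposedException(nameof(NativeInstance));
            return _handle;
        }
    }

    public void Dispose()
    {
        if (_handle == IntPtr.Zero)
            return; // already released; never release twice

        IntPtr handle = _handle;
        _handle = IntPtr.Zero;
        try
        {
            _exit(handle);   // e.g. wraps gsapi_exit
        }
        finally
        {
            _delete(handle); // e.g. wraps gsapi_delete_instance, even if exit throws
        }
    }
}
```

A production wrapper would probably also use a SafeHandle or a finalizer, so that the instance is released even if Dispose is never called.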

To actually call GhostScript, you call the Execute method, passing in either a string array of all the arguments to pass to GhostScript, or a typed dictionary of commands and values. The GhostScriptCommand enum contains most of the commands supported by GhostScript, and using it may be preferable to trying to remember the parameter names themselves.
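To illustrate how a typed dictionary might be turned into the string array GhostScript expects, here is a hedged sketch. The GsSwitch enum and GsArguments class are hypothetical stand-ins for GhostScriptCommand and the conversion inside GhostScriptAPI, covering only a few switches; their spellings follow the example GhostScript command line in "The Inner Workings".

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical subset of commands; the real GhostScriptCommand enum is larger.
enum GsSwitch { Silent, Safer, Batch, NoPause, Device, OutputFile, FirstPage, LastPage, Resolution, UseCropBox, InputFile }

static class GsArguments
{
    // Turns typed (command, value) pairs into GhostScript command-line switches.
    public static string[] ToArgs(IEnumerable<KeyValuePair<GsSwitch, object>> commands)
    {
        var args = new List<string>();
        string inputFile = null;

        foreach (KeyValuePair<GsSwitch, object> pair in commands)
        {
            string value = FormatValue(pair.Value);
            switch (pair.Key)
            {
                case GsSwitch.Silent: args.Add("-q"); break;
                case GsSwitch.Safer: args.Add("-dSAFER"); break;
                case GsSwitch.Batch: args.Add("-dBATCH"); break;
                case GsSwitch.NoPause: args.Add("-dNOPAUSE"); break;
                case GsSwitch.Device: args.Add("-sDEVICE=" + value); break;
                case GsSwitch.OutputFile: args.Add("-sOutputFile=" + value); break;
                case GsSwitch.FirstPage: args.Add("-dFirstPage=" + value); break;
                case GsSwitch.LastPage: args.Add("-dLastPage=" + value); break;
                case GsSwitch.Resolution: args.Add("-r" + value); break;
                case GsSwitch.UseCropBox: args.Add("-dUseCropBox=" + value); break;
                case GsSwitch.InputFile: inputFile = value; break;
            }
        }

        // Switches only affect files that follow them, so the input file is
        // always emitted last, whatever order the pairs arrive in.
        if (inputFile != null)
            args.Add(inputFile);

        return args.ToArray();
    }

    private static string FormatValue(object value)
    {
        if (value == null)
            return null;
        if (value is bool)
            return (bool)value ? "true" : "false"; // PostScript booleans are lower case
        return Convert.ToString(value, CultureInfo.InvariantCulture);
    }
}
```

If the resulting array is passed to the native gsapi_init_with_args function, note that it ignores argv[0], so a wrapper typically prepends a dummy program name.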

Defining Conversion Settings

The Pdf2ImageSettings class allows you to customize various properties of the output image. The following properties are available:

AntiAliasMode - specifies the anti-aliasing level: Low, Medium or High. Internally, this sets the -dTextAlphaBits and -dGraphicsAlphaBits GhostScript switches to appropriate values.

Dpi - dots per inch. Internally sets the -r switch. This property is ignored when TrimMode is set to PaperSize.

GridFitMode - controls the text readability mode. Internally sets the -dGridFitTT switch.

ImageFormat - specifies the output image format. Internally sets the -sDEVICE switch.

PaperSize - specifies a paper size from one of the standard sizes supported by GhostScript.

TrimMode - specifies how the image should be sized. Your mileage may vary if you try to use the paper size option. Internally sets either the -dFIXEDMEDIA and -sPAPERSIZE switches, or the -dUseCropBox or -dUseTrimBox switch.
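The settings-to-switch mapping described above can be sketched as follows. The AntiAliasLevel and OutputFormat enums are hypothetical stand-ins for the library's AntiAliasMode and ImageFormat, and using the enum value directly as the alpha-bits count is an assumption (GhostScript accepts 1, 2 or 4 for -dTextAlphaBits and -dGraphicsAlphaBits). The device names png16m, png256, pngalpha, jpeg, tiff24nc and bmp16m are standard GhostScript output devices.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enums; the numeric anti-alias values are assumptions.
enum AntiAliasLevel { None = 0, Low = 1, Medium = 2, High = 4 }
enum OutputFormat { Png24, Png8, PngAlpha, Jpeg, Tiff24, Bmp24 }

static class SettingsToSwitches
{
    // Maps an output format to a GhostScript device name (-sDEVICE).
    public static string DeviceName(OutputFormat format)
    {
        switch (format)
        {
            case OutputFormat.Png24: return "png16m";
            case OutputFormat.Png8: return "png256";
            case OutputFormat.PngAlpha: return "pngalpha";
            case OutputFormat.Jpeg: return "jpeg";
            case OutputFormat.Tiff24: return "tiff24nc";
            case OutputFormat.Bmp24: return "bmp16m";
            default: throw new ArgumentOutOfRangeException(nameof(format));
        }
    }

    // Builds the device, anti-aliasing and resolution switches.
    public static List<string> Build(AntiAliasLevel antiAlias, OutputFormat format, int dpi)
    {
        var switches = new List<string> { "-sDEVICE=" + DeviceName(format) };

        if (antiAlias != AntiAliasLevel.None)
        {
            int bits = (int)antiAlias; // alpha bits: 1, 2 or 4
            switches.Add("-dTextAlphaBits=" + bits);
            switches.Add("-dGraphicsAlphaBits=" + bits);
        }

        switches.Add("-r" + dpi);
        return switches;
    }
}
```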

Typical settings could look like this:

```csharp
Pdf2ImageSettings settings;

settings = new Pdf2ImageSettings();
settings.AntiAliasMode = AntiAliasMode.High;
settings.Dpi = 300;
settings.GridFitMode = GridFitMode.Topological;
settings.ImageFormat = ImageFormat.Png24;
settings.TrimMode = PdfTrimMode.CropBox;
```

Converting the PDF

To convert a PDF file into a series of images, use the Pdf2Image class. It offers the following properties and methods:

ConvertPdfPageToImage - converts a given page in the PDF into an image, which is saved to disk

GetImage - converts a page in the PDF into an image and returns the image

GetImages - converts a range of pages in the PDF into images and returns an image array

PageCount - returns the number of pages in the source PDF

PdfFilename - returns or sets the filename of the PDF document to convert

PdfPassword - returns or sets the password of the PDF document to convert

Settings - returns or sets the settings object described above

A typical example to convert the first page of a PDF document:

```csharp
Bitmap firstPage = new Pdf2Image("sample.pdf").GetImage();
```

The Inner Workings

Most of the code in the class is taken up by the GetConversionArguments method. This method looks at the various properties of the conversion, such as output format and quality, and returns the appropriate commands to pass to GhostScript:

```csharp
protected virtual IDictionary<GhostScriptCommand, object> GetConversionArguments(string pdfFileName, string outputImageFileName, int pageNumber, string password, Pdf2ImageSettings settings)
{
  IDictionary<GhostScriptCommand, object> arguments;

  arguments = new Dictionary<GhostScriptCommand, object>();

  // basic GhostScript setup
  arguments.Add(GhostScriptCommand.Silent, null);
  arguments.Add(GhostScriptCommand.Safer, null);
  arguments.Add(GhostScriptCommand.Batch, null);
  arguments.Add(GhostScriptCommand.NoPause, null);

  // specify the output
  arguments.Add(GhostScriptCommand.Device, GhostScriptAPI.GetDeviceName(settings.ImageFormat));
  arguments.Add(GhostScriptCommand.OutputFile, outputImageFileName);

  // page numbers
  arguments.Add(GhostScriptCommand.FirstPage, pageNumber);
  arguments.Add(GhostScriptCommand.LastPage, pageNumber);

  // graphics options
  arguments.Add(GhostScriptCommand.UseCIEColor, null);

  if (settings.AntiAliasMode != AntiAliasMode.None)
  {
    arguments.Add(GhostScriptCommand.TextAlphaBits, settings.AntiAliasMode);
    arguments.Add(GhostScriptCommand.GraphicsAlphaBits, settings.AntiAliasMode);
  }

  arguments.Add(GhostScriptCommand.GridToFitTT, settings.GridFitMode);

  // image size
  if (settings.TrimMode != PdfTrimMode.PaperSize)
    arguments.Add(GhostScriptCommand.Resolution, settings.Dpi.ToString());

  switch (settings.TrimMode)
  {
    case PdfTrimMode.PaperSize:
      if (settings.PaperSize != PaperSize.Default)
      {
        arguments.Add(GhostScriptCommand.FixedMedia, true);
        arguments.Add(GhostScriptCommand.PaperSize, settings.PaperSize);
      }
      break;
    case PdfTrimMode.TrimBox:
      arguments.Add(GhostScriptCommand.UseTrimBox, true);
      break;
    case PdfTrimMode.CropBox:
      arguments.Add(GhostScriptCommand.UseCropBox, true);
      break;
  }

  // pdf password
  if (!string.IsNullOrEmpty(password))
    arguments.Add(GhostScriptCommand.PDFPassword, password);

  // pdf filename
  arguments.Add(GhostScriptCommand.InputFile, pdfFileName);

  return arguments;
}
```

As you can see from the method above, the commands are returned as a strongly typed dictionary. The GhostScriptAPI class converts these into the correct GhostScript switches, but the enum is much easier to work with from your code! The following is an example of the typical GhostScript commands to convert a single page of a PDF document:

```
-q -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m -sOutputFile=tmp78BC.tmp -dFirstPage=1 -dLastPage=1 -dUseCIEColor -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -dGridFitTT=2 -r150 -dUseCropBox=true sample.pdf
```
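The -r150 switch in the command line above is what turns page dimensions into pixels. A PDF page is measured in points (1/72 inch), so the rendered size is roughly points x dpi / 72, and the uncompressed memory footprint grows with the square of the resolution. A minimal sketch of that arithmetic; the RasterSize class is hypothetical, and GhostScript's own rounding may differ by a pixel.

```csharp
using System;

// Hypothetical helper for estimating output size from the Dpi setting.
static class RasterSize
{
    // Converts a page size in points to pixels at the given resolution,
    // rounded to the nearest whole pixel.
    public static (int Width, int Height) Pixels(double widthPt, double heightPt, int dpi)
    {
        return ((int)Math.Round(widthPt * dpi / 72.0),
                (int)Math.Round(heightPt * dpi / 72.0));
    }

    // Rough uncompressed size of one rendered page held in memory.
    public static long Bytes(int width, int height, int bytesPerPixel)
    {
        return (long)width * height * bytesPerPixel;
    }
}
```

For example, a US Letter page (612 x 792 points) comes out at 1275 x 1650 pixels at 150 dpi, and at 2550 x 3300 pixels at 300 dpi, which is roughly 25 million bytes for a single 24-bit page held uncompressed.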

The next step is to call GhostScript and convert the PDF, which is done using the ConvertPdfPageToImage method:

```csharp
public void ConvertPdfPageToImage(string outputFileName, int pageNumber)
{
  if (pageNumber < 1 || pageNumber > this.PageCount)
    throw new ArgumentException("Page number is out of bounds", "pageNumber");

  using (GhostScriptAPI api = new GhostScriptAPI())
    api.Execute(this.GetConversionArguments(this._pdfFileName, outputFileName, pageNumber, this.PdfPassword, this.Settings));
}
```

As you can see, this is a very simple call: create an instance of the GhostScriptAPI class and then pass in the list of parameters to execute. The GhostScriptAPI class takes care of everything else.

Once the file is saved to disk, you can then load it into a Bitmap or Image object for use in your application. Don't forget to delete the file when you are finished with it!

Alternatively, the GetImage method will convert the page and return the bitmap for you, automatically deleting the temporary file. This saves you from having to provide and delete the output file yourself, but it does mean you are responsible for disposing of the returned bitmap.

```csharp
public Bitmap GetImage(int pageNumber)
{
  Bitmap result;
  string workFile;

  if (pageNumber < 1 || pageNumber > this.PageCount)
    throw new ArgumentException("Page number is out of bounds", "pageNumber");

  workFile = Path.GetTempFileName();

  try
  {
    this.ConvertPdfPageToImage(workFile, pageNumber);

    // a Bitmap created from a stream needs that stream to stay open for its
    // lifetime, so copy it into an independent Bitmap before the stream closes
    using (FileStream stream = new FileStream(workFile, FileMode.Open, FileAccess.Read))
    using (Bitmap loaded = new Bitmap(stream))
      result = new Bitmap(loaded);
  }
  finally
  {
    File.Delete(workFile);
  }

  return result;
}
```
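The convert-to-a-temporary-file, read, then delete pattern used by GetImage can be factored into a small reusable helper. The TempFile class below is hypothetical and not part of Cyotek.GhostScript.PdfConversion; it guarantees the temporary file is deleted even when the producer or the consumer throws.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the conversion library.
static class TempFile
{
    // Lets a producer write to a fresh temporary file, reads the result back
    // with a consumer, and always deletes the file afterwards.
    public static T Use<T>(Action<string> produce, Func<string, T> consume)
    {
        string path = Path.GetTempFileName();
        try
        {
            produce(path);
            return consume(path);
        }
        finally
        {
            File.Delete(path); // no-op if the file is already gone
        }
    }
}
```

With the Pdf2Image class, the producer could be `path => converter.ConvertPdfPageToImage(path, 1)` (where `converter` is a Pdf2Image instance) and the consumer a lambda that loads the file into a Bitmap.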

You could also convert a range of pages at once using the GetImages method:

```csharp
public Bitmap[] GetImages(int startPage, int lastPage)
{
  List<Bitmap> results;

  if (startPage < 1 || startPage > this.PageCount)
    throw new ArgumentException("Start page number is out of bounds", "startPage");

  if (lastPage < 1 || lastPage > this.PageCount)
    throw new ArgumentException("Last page number is out of bounds", "lastPage");
  else if (lastPage < startPage)
    throw new ArgumentException("Last page cannot be less than start page", "lastPage");

  results = new List<Bitmap>();
  for (int i = startPage; i <= lastPage; i++)
    results.Add(this.GetImage(i));

  return results.ToArray();
}
```
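GetImages keeps every page in memory until the caller has finished with the whole array, which for long documents or high Dpi values can use a lot of memory. A lazily evaluated alternative yields one page at a time, so each page can be disposed before the next is rendered. This is a sketch rather than part of the library: PageStream is hypothetical, and the render delegate would typically be a Pdf2Image instance's GetImage method.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical alternative to GetImages that renders pages on demand.
static class PageStream
{
    // Validates the range immediately, then returns a lazy sequence.
    public static IEnumerable<T> Pages<T>(int startPage, int lastPage, int pageCount, Func<int, T> render)
    {
        if (startPage < 1 || startPage > pageCount)
            throw new ArgumentOutOfRangeException(nameof(startPage));
        if (lastPage < startPage || lastPage > pageCount)
            throw new ArgumentOutOfRangeException(nameof(lastPage));

        return PagesIterator(startPage, lastPage, render);
    }

    // Each page is only rendered when the caller asks for it.
    private static IEnumerable<T> PagesIterator<T>(int startPage, int lastPage, Func<int, T> render)
    {
        for (int i = startPage; i <= lastPage; i++)
            yield return render(i);
    }
}
```

A caller could then write `foreach (Bitmap page in PageStream.Pages(1, pdf.PageCount, pdf.PageCount, pdf.GetImage)) using (page) { /* display or save the page */ }`, where `pdf` is a Pdf2Image instance. The iterator is split into two methods so that bad arguments are reported at the call site rather than on first enumeration.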

In Conclusion

The above methods provide a simple way of adding basic PDF viewing to your applications. In the next part of this series, we describe how to extend the ImageBox component to support conversion and navigation.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Richard James Moss

Software Developer (Senior)

United Kingdom

No Biography provided


Comments and Discussions


Member 6431561 6-Nov-13 4:13

Hi, how can I convert a multi-page PDF file to a TIFF as a single long page?

Thanks.

Richard James Moss 7-Nov-13 2:42

Well, if you have converted the PDF into a series of images, you can query their Size properties to determine the final size of the image, create a new Bitmap object and then use the methods of the Graphics class to draw the different images appropriately into the final image. As .NET supports reading and writing TIFF files (I'm not too sure about multi-page TIFF files, though), you would then be able to save your new image in that format.

Seems pretty straightforward to me...

Regards;

Richard Moss
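The approach in the reply above boils down to a layout calculation followed by drawing. The combined image is as wide as the widest page and as tall as all pages together, with each page drawn at the running total of the heights above it. The VerticalStack class is hypothetical and works only on sizes, so the layout can be checked without System.Drawing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for stacking page images top to bottom.
static class VerticalStack
{
    // Returns the Y offset of each page and the overall canvas size.
    public static int[] Layout(IList<(int Width, int Height)> pages, out int canvasWidth, out int canvasHeight)
    {
        var offsets = new int[pages.Count];
        canvasWidth = 0;
        canvasHeight = 0;

        for (int i = 0; i < pages.Count; i++)
        {
            offsets[i] = canvasHeight;              // page i starts below the previous pages
            canvasHeight += pages[i].Height;
            canvasWidth = Math.Max(canvasWidth, pages[i].Width);
        }

        return offsets;
    }
}
```

To produce the image, you would create a `Bitmap` of `canvasWidth` x `canvasHeight`, obtain a `Graphics` object with `Graphics.FromImage`, draw each page with `DrawImage(page, 0, offsets[i])`, and save with `System.Drawing.Imaging.ImageFormat.Tiff`, fully qualified here to avoid a possible clash with the conversion library's own ImageFormat type.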


Member 10029390 8-Oct-13 21:24

Hi, I am working on a web application and I have to convert a PDF into an image there. Does this also work for web applications?

Richard James Moss 10-Oct-13 7:58

Hello,

Yes, this works with ASP.NET applications (with a caveat or 10); you can see a follow-up article I wrote describing my experiences with this here.

Hope that helps!

Regards;

Richard Moss

Member 10067330 27-Aug-13 19:53

It would be better to use a more complete GhostScript wrapper, such as https://ghostscriptnet.codeplex.com

It allows you to render PDF pages to images in memory, so you don't need to save them to disk first. It has a lot more features...

Member 10194914 7-Aug-13 20:33

Hi, I downloaded your code and compiled both DLLs; the compile was OK. I added them to my project.

When I try to use:

Bitmap firstPage = new Pdf2Image(@"D:\test1.pdf").GetImage();

I catch the exception "Failed to process GhostScript command". What can it be?

I had no gsdll32.dll on my PC at the beginning. When I tried to use

Bitmap firstPage = new Pdf2Image(@"D:\test1.pdf").GetImage();

I got an exception along the lines of "cannot find gsdll32 module". I found and downloaded the DLL and tried placing it in Windows\system32 and in Windows\syswow64. There was no effect, except that I got the new kind of exception mentioned above.

What additional information do you need to answer?

============================================================================

Solved! Please mention in your article that the user needs the latest version of the GhostScript DLL in the project's bin folder, not anywhere else!

modified 8-Aug-13 5:46am.

torti83 27-Jun-13 23:52

hi,

I tried to batch-convert PDFs into JPGs. Unfortunately, I got an out of memory exception after 48 files. Is there a bug in the solution, or is GhostScript not able to do that?

Thanks,

thorsten

Richard James Moss 28-Jun-13 5:45

Where does the out of memory exception come from? I'd be more inclined to assume the error came from how it was being used. For example, you said 48 files: how many images were in these files, and were you disposing of the images once you'd finished with them?

Regards;

Richard Moss

matrix37 18-Jul-13 23:52

I'm getting roughly the same problem.

The app spikes from 40 MB to 900 MB when I call the GetImages code. I'm just testing against a relatively small PDF file.

Granted, I'm setting the quality to a higher value, but after the processing is done, the memory is never released, even if I explicitly kill all the values.

If I run the process a second time, it gives me the out of memory exception.

Is there something else I need to do to free up the resources after the processing is done?

Richard James Moss 22-Jul-13 6:36

Are you calling Dispose on each Bitmap object after you're finished with it?

I suppose it's possible the garbage collector isn't clearing up fast enough; you could try introducing a GC.Collect call or two, although this will affect the performance of your application and slow it down a touch.

Article Copyright 2012 by Richard James Moss

Everything else Copyright © CodeProject, 1999-2014

Last Updated 21 Jun 2013