PDFBox Concise Tutorial
PDFBox - Converting PDF To Image.
In the previous chapter, we saw how to merge multiple PDF documents. In this chapter, we will see how to generate an image from a page of a PDF document.
Generating an Image from a PDF Document
The PDFBox library provides a class named PDFRenderer, which renders a PDF document into an AWT BufferedImage.
The following steps generate an image from a PDF document.
Step 1: Loading an Existing PDF Document
Load an existing PDF document using the static method load() of the PDDocument class. This method accepts a File object as a parameter; because it is static, you can invoke it using the class name, as shown below. (This applies to PDFBox 2.x; in PDFBox 3.x, loading moved to Loader.loadPDF(file).)
File file = new File("path of the document");
PDDocument document = PDDocument.load(file);
Step 2: Instantiating the PDFRenderer Class
The PDFRenderer class renders a PDF document into an AWT BufferedImage. Instantiate it by passing the document object created in the previous step to its constructor, as shown below.
PDFRenderer renderer = new PDFRenderer(document);
Step 3: Rendering Image from the PDF Document
You can render a particular page as an image using the renderImage() method of the PDFRenderer class. Pass this method the zero-based index of the page to be rendered; 0 denotes the first page.
BufferedImage image = renderer.renderImage(0);
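The single-argument renderImage() call renders at a scale of 1, which corresponds to 72 DPI. PDFRenderer also offers renderImageWithDPI(pageIndex, dpi) for higher resolutions. The following is a minimal sketch of the arithmetic behind that choice, using only the standard library. The class name RenderSizeSketch, its helper methods, and the US Letter page size are assumptions made for illustration, and the library's own rounding may differ by a pixel.

```java
public class RenderSizeSketch {
    // PDF page dimensions are expressed in points; by default 72 points = 1 inch.
    public static final double POINTS_PER_INCH = 72.0;

    // Scale factor that maps points to pixels at the requested DPI.
    public static double scaleForDpi(int dpi) {
        return dpi / POINTS_PER_INCH;
    }

    // Approximate pixel length of a page edge given in points, at the requested DPI.
    public static int pixels(double points, int dpi) {
        // Multiply before dividing so whole-number results stay exact.
        return (int) Math.max(Math.floor(points * dpi / POINTS_PER_INCH), 1);
    }

    public static void main(String[] args) {
        // Hypothetical page: US Letter, 612 x 792 points.
        double widthPt = 612;
        double heightPt = 792;
        for (int dpi : new int[] {72, 150, 300}) {
            System.out.println(dpi + " DPI (scale " + scaleForDpi(dpi) + "): "
                + pixels(widthPt, dpi) + " x " + pixels(heightPt, dpi) + " pixels");
        }
        // With PDFBox on the classpath, the matching call would be:
        // BufferedImage image = renderer.renderImageWithDPI(0, 300);
    }
}
```

Higher DPI values give sharper output at the cost of memory, since the pixel count grows with the square of the scale.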
Step 4: Writing the Image to a File
You can write the image rendered in the previous step to a file using the write() method of the ImageIO class. You need to pass three parameters to this method:
- The rendered image object.
- A string representing the format of the image (for example, jpg or png).
- The File object to which the rendered image is to be saved.
ImageIO.write(image, "JPEG", new File("C:/PdfBox_Examples/myimage.jpg"));
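ImageIO.write() returns a boolean, and it returns false rather than throwing when no writer is available for the requested format, so a failed write can go unnoticed. The sketch below checks that return value. The class WriteImageSketch and the helper writeOrFail are hypothetical names, and a blank BufferedImage stands in for the image that renderImage() would return.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class WriteImageSketch {
    // Writes the image and fails loudly if ImageIO finds no writer for the format.
    public static void writeOrFail(BufferedImage image, String format, File target)
            throws IOException {
        boolean written = ImageIO.write(image, format, target);
        if (!written) {
            throw new IOException("No ImageIO writer found for format: " + format);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the result of renderer.renderImage(0): a white RGB canvas.
        BufferedImage image = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, 200, 100);
        g.dispose();

        // A temporary file is used here so the sketch runs on any machine.
        File target = File.createTempFile("myimage", ".png");
        writeOrFail(image, "png", target);
        System.out.println("Wrote " + target.length() + " bytes to " + target);
    }
}
```

PNG is lossless and is often a reasonable choice for pages that are mostly text, while JPEG tends to produce smaller files for photographic content.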
Example
Suppose we have a PDF document, sample.pdf, in the path C:\PdfBox_Examples\, and its first page contains an image.
This example demonstrates how to convert the above PDF document into an image file. Here, we render the first page of the PDF document and save it as myimage.jpg. Save this code as PdfToImage.java.
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

public class PdfToImage {
   public static void main(String[] args) throws Exception {
      // Loading an existing PDF document; try-with-resources closes the
      // document even if rendering or writing fails
      File file = new File("C:/PdfBox_Examples/sample.pdf");
      try (PDDocument document = PDDocument.load(file)) {

         // Instantiating the PDFRenderer class
         PDFRenderer renderer = new PDFRenderer(document);

         // Rendering the first page (index 0) as an image
         BufferedImage image = renderer.renderImage(0);

         // Writing the image to a file
         ImageIO.write(image, "JPEG", new File("C:/PdfBox_Examples/myimage.jpg"));
         System.out.println("Image created");
      }
   }
}
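The program above writes the JPEG with ImageIO's default compression settings. If file size or image quality matters, the standard ImageWriter API lets you set the quality explicitly. The following is a minimal sketch using only the standard library. The class JpegQualitySketch, the helper writeJpeg, and the quality value 0.9 are assumptions for illustration, and a blank image stands in for the rendered page.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

public class JpegQualitySketch {
    // Writes the image as JPEG with an explicit quality between 0.0 and 1.0.
    public static void writeJpeg(BufferedImage image, File target, float quality)
            throws IOException {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("jpg");
        if (!writers.hasNext()) {
            throw new IOException("No JPEG writer available");
        }
        ImageWriter writer = writers.next();

        // Switch from the default settings to an explicit quality value.
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);

        // Remove any old file so stale bytes cannot remain after the new data.
        Files.deleteIfExists(target.toPath());
        try (ImageOutputStream out = ImageIO.createImageOutputStream(target)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the BufferedImage returned by renderer.renderImage(0).
        BufferedImage image = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        File target = File.createTempFile("myimage", ".jpg");
        writeJpeg(image, target, 0.9f);
        System.out.println("Wrote " + target.length() + " bytes to " + target);
    }
}
```

In the tutorial program, a call such as writeJpeg(image, new File("C:/PdfBox_Examples/myimage.jpg"), 0.9f) could take the place of the ImageIO.write() line.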
Compile and execute the saved Java file from the command prompt using the following commands.
javac PdfToImage.java
java PdfToImage
Upon execution, the above program renders the first page of the given PDF document as an image and displays the following message.
Image created
If you check the given path, you will find that the image has been generated and saved as myimage.jpg.