PDFBox - Quick Guide

PDFBox - Overview

The Portable Document Format (PDF) is a file format that presents data independently of application software, hardware, and operating systems.

Each PDF file holds a description of a fixed-layout flat document, including the text, fonts, graphics, and other information needed to display it.

Several libraries are available for creating and manipulating PDF documents programmatically, such as −

  1. Adobe PDF Library − This library provides APIs in languages such as C++, .NET, and Java. With it, you can edit, view, print, and extract text from PDF documents.

  2. Formatting Objects Processor − An open-source print formatter driven by XSL Formatting Objects and an output-independent formatter. Its primary output target is PDF.

  3. iText − This library provides APIs in languages such as Java, C#, and other .NET languages. With it, you can create and manipulate PDF, RTF, and HTML documents.

  4. JasperReports − This is a Java reporting tool that generates reports in PDF as well as Microsoft Excel, RTF, ODT, comma-separated values, and XML formats.

What is PDFBox

Apache PDFBox is an open-source Java library that supports the development and conversion of PDF documents. Using this library, you can develop Java programs that create, convert, and manipulate PDF documents.

In addition, PDFBox includes a command-line utility for performing various operations on PDF files using the available JAR file.

Features of PDFBox

Following are the notable features of PDFBox −

  1. Extract Text − Using PDFBox, you can extract Unicode text from PDF files.

  2. Split & Merge − Using PDFBox, you can divide a single PDF file into multiple files, and merge them back into a single file.

  3. Fill Forms − Using PDFBox, you can fill in the form data of a document.

  4. Print − Using PDFBox, you can print a PDF file using the standard Java printing API.

  5. Save as Image − Using PDFBox, you can save PDFs as image files, such as PNG or JPEG.

  6. Create PDFs − Using PDFBox, you can create new PDF files from Java programs, including images and fonts.

  7. Signing − Using PDFBox, you can add digital signatures to PDF files.

Applications of PDFBox

The following applications use PDFBox −

  1. Apache Nutch − Apache Nutch is open-source web-search software. It builds on Apache Lucene, adding web-specific components such as a crawler, a link-graph database, and parsers for HTML and other document formats.

  2. Apache Tika − Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.

Components of PDFBox

The following are the four main components of PDFBox −

  1. PDFBox − This is the main component. It contains the classes and interfaces related to content extraction and manipulation.

  2. FontBox − This contains the classes and interfaces related to fonts. Using these classes, you can modify the font of the text in a PDF document.

  3. XmpBox − This contains the classes and interfaces that handle XMP metadata.

  4. Preflight − This component is used to verify PDF files against the PDF/A-1b standard.

PDFBox - Environment

Installing PDFBox

Following are the steps to download Apache PDFBox −

Step 1 − Open the homepage of Apache PDFBox at https://pdfbox.apache.org/

Step 2 − On the homepage, click the Downloads link. This takes you to the downloads page of PDFBox.

Step 3 − On the downloads page, click the link for the latest release. This guide uses PDFBox 2.0.1 as its example; clicking that link takes you to the required JAR files.

Step 4 − Download the JAR files pdfbox-2.0.1.jar, fontbox-2.0.1.jar, preflight-2.0.1.jar, xmpbox-2.0.1.jar, and pdfbox-tools-2.0.1.jar.

Eclipse Installation

After downloading the required JAR files, you have to add them to your Eclipse environment. You can do this either by setting the build path to these JAR files or by using pom.xml.

Setting Build Path

Following are the steps to install PDFBox in Eclipse −

Step 1 − Ensure that Eclipse is installed on your system. If not, download and install it.

Step 2 − Open Eclipse, click File, then New, and open a new project.

Step 3 − In the New Project wizard, select Java Project and click Next.

Step 4 − In the New Java Project wizard, create a new project and click Next.

Step 5 − After creating the project, right-click it, select Build Path, and click Configure Build Path…

Step 6 − In the Java Build Path wizard, select Add External JARs.

Step 7 − Select the JAR files fontbox-2.0.1.jar, pdfbox-2.0.1.jar, pdfbox-tools-2.0.1.jar, preflight-2.0.1.jar, and xmpbox-2.0.1.jar, and click Open. The files are added to your library.

Step 8 − Click OK. The required JAR files are now added to the current project; you can verify this by expanding Referenced Libraries.

Using pom.xml

Convert the project into a Maven project and add the following content to its pom.xml.

<project xmlns="http://maven.apache.org/POM/4.0.0"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
   https://maven.apache.org/xsd/maven-4.0.0.xsd">
   <modelVersion>4.0.0</modelVersion>
   <groupId>my_project</groupId>
   <artifactId>my_project</artifactId>
   <version>0.0.1-SNAPSHOT</version>

   <build>
      <sourceDirectory>src</sourceDirectory>
      <plugins>
         <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.3</version>
            <configuration>
               <source>1.8</source>
               <target>1.8</target>
            </configuration>
         </plugin>
      </plugins>
   </build>

   <dependencies>
      <dependency>
         <groupId>org.apache.pdfbox</groupId>
         <artifactId>pdfbox</artifactId>
         <version>2.0.1</version>
      </dependency>

      <dependency>
         <groupId>org.apache.pdfbox</groupId>
         <artifactId>fontbox</artifactId>
         <version>2.0.1</version>
      </dependency>

      <dependency>
         <groupId>org.apache.pdfbox</groupId>
         <artifactId>jempbox</artifactId>
         <version>1.8.11</version>
      </dependency>

      <dependency>
         <groupId>org.apache.pdfbox</groupId>
         <artifactId>xmpbox</artifactId>
         <version>2.0.1</version>
      </dependency>

      <dependency>
         <groupId>org.apache.pdfbox</groupId>
         <artifactId>preflight</artifactId>
         <version>2.0.1</version>
      </dependency>

      <dependency>
         <groupId>org.apache.pdfbox</groupId>
         <artifactId>pdfbox-tools</artifactId>
         <version>2.0.1</version>
      </dependency>

   </dependencies>

</project>

PDFBox - Creating a PDF Document

Let us now understand how to create a PDF document using the PDFBox library.

Creating an Empty PDF Document

You can create an empty PDF document by instantiating the PDDocument class. You can save the document to your desired location using the save() method.

Following are the steps to create an empty PDF document.

Step 1: Creating an Empty Document

The PDDocument class, which belongs to the package org.apache.pdfbox.pdmodel, is an in-memory representation of a PDF document. Therefore, by instantiating this class, you can create an empty PDF document as shown in the following code block.

PDDocument document = new PDDocument();

Step 2: Saving the Document

After creating the document, you need to save it to the desired path. You can do so using the save() method of the PDDocument class. This method accepts a string value, representing the path where you want to store the document, as a parameter. Following is a typical call to the save() method of the PDDocument class.

document.save("Path");
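Because save() writes to the path you give it, the target folder generally has to exist before the call. The following is a minimal sketch, using only the standard Java library, of one way to create the folder first. The class name OutputPaths and the method prepare() are hypothetical helpers introduced for illustration; they are not part of PDFBox.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class OutputPaths {

   // Creates the parent folder of the target file if it is missing
   // and returns the absolute path of the target as a string.
   static String prepare(String target) {
      Path path = Paths.get(target).toAbsolutePath();
      Path parent = path.getParent();
      try {
         if (parent != null) {
            Files.createDirectories(parent);
         }
      } catch (IOException e) {
         throw new UncheckedIOException(e);
      }
      return path.toString();
   }

   public static void main(String[] args) {
      // Hypothetical location under the system temp folder
      String target = Paths.get(System.getProperty("java.io.tmpdir"),
         "PdfBox_Examples", "my_doc.pdf").toString();

      // The returned string could then be passed to document.save(...)
      System.out.println(prepare(target));
   }
}
```

The returned string can be handed to save() in place of a hard-coded path.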

Step 3: Closing the Document

When your task is complete, close the PDDocument object using the close() method. Following is a typical call to the close() method of the PDDocument class.

document.close();
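If an exception is thrown before close() is reached, the document stays open. PDDocument implements java.io.Closeable, so it can be declared in a try-with-resources statement, which closes it automatically. The sketch below shows the pattern with a hypothetical stand-in class, FakeDocument, so that it runs without PDFBox on the classpath; with the real library you would declare a PDDocument in the same position.

```java
import java.io.Closeable;

class TryWithResourcesSketch {

   // Hypothetical stand-in for PDDocument (not a PDFBox class)
   static class FakeDocument implements Closeable {
      boolean closed = false;

      void save(String path) {
         System.out.println("Saving to " + path);
      }

      @Override
      public void close() {
         closed = true;
      }
   }

   // Saves the document inside a try-with-resources block and
   // returns it afterwards so the caller can inspect its state.
   static FakeDocument saveAndClose(String path) {
      FakeDocument handle = null;
      try (FakeDocument document = new FakeDocument()) {
         handle = document;
         document.save(path);
      }
      // close() has already been called at this point
      return handle;
   }

   public static void main(String[] args) {
      System.out.println("Closed: " + saveAndClose("my_doc.pdf").closed);
   }
}
```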

Example

This example demonstrates the creation of a PDF document. Here, we will create a Java program that generates a PDF document named my_doc.pdf and saves it in the path C:/PdfBox_Examples/. Save this code in a file named Document_Creation.java.

import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;

public class Document_Creation {

   public static void main (String args[]) throws IOException {

      //Creating PDF document object
      PDDocument document = new PDDocument();

      //Saving the document
      document.save("C:/PdfBox_Examples/my_doc.pdf");

      System.out.println("PDF created");

      //Closing the document
      document.close();

   }
}

Compile and execute the saved Java file from the command prompt using the following commands. The PDFBox JAR files must be on the classpath for both commands.

javac Document_Creation.java
java Document_Creation

Upon execution, the above program creates a PDF document and displays the following message.

PDF created

If you check the specified path, you can find the created PDF document.

Since this document contains no pages, a PDF viewer displays an error message when you try to open it.

PDFBox - Adding Pages

In the previous chapter, we saw how to create a PDF document. After creating a PDF document, you need to add pages to it. Let us now understand how to add pages to a PDF document.

Adding Pages to a PDF Document

You can create an empty page by instantiating the PDPage class and add it to the PDF document using the addPage() method of the PDDocument class.

Following are the steps to create an empty document and add pages to it.

Step 1: Creating an Empty Document

Create an empty PDF document by instantiating the PDDocument class as shown below.

PDDocument document = new PDDocument();

Step 2: Creating a Blank Page

The PDPage class represents a page in a PDF document. Therefore, you can create an empty page by instantiating this class as shown in the following code block.

PDPage my_page = new PDPage();

Step 3: Adding Page to the Document

You can add a page to the PDF document using the addPage() method of the PDDocument class. You need to pass the PDPage object to this method as a parameter.

Therefore, add the blank page created in the previous step to the PDDocument object as shown in the following code block.

document.addPage(my_page);

In this way, you can add as many pages as you want to a PDF document.

Step 4: Saving the Document

After adding all the pages, save the PDF document using the save() method of the PDDocument class as shown in the following code block.

document.save("Path");

Step 5: Closing the Document

Finally, close the document using the close() method of the PDDocument class as shown below.

document.close();

Example

This example demonstrates how to create a PDF document and add pages to it. Here, we will create a PDF document named my_doc.pdf, add 10 blank pages to it, and save it in the path C:/PdfBox_Examples/. Save this code in a file named Adding_Pages.java.

import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public class Adding_Pages {

   public static void main(String args[]) throws IOException {

      //Creating PDF document object
      PDDocument document = new PDDocument();

      for (int i=0; i<10; i++) {
         //Creating a blank page
         PDPage blankPage = new PDPage();

         //Adding the blank page to the document
         document.addPage( blankPage );
      }

      //Saving the document
      document.save("C:/PdfBox_Examples/my_doc.pdf");
      System.out.println("PDF created");

      //Closing the document
      document.close();

   }
}

Compile and execute the saved Java file from the command prompt using the following commands −

javac Adding_Pages.java
java Adding_Pages

Upon execution, the above program creates a PDF document with blank pages and displays the following message −

PDF created

If you check the specified path, you can find the created PDF document.

PDFBox - Loading a Document

In the previous examples, you saw how to create a new document and add pages to it. This chapter teaches you how to load a PDF document that already exists on your system and perform some operations on it.

Loading an Existing PDF Document

The load() method of the PDDocument class is used to load an existing PDF document. Follow the steps given below to load an existing PDF document.

Step 1: Loading an Existing PDF Document

Load an existing PDF document using the static method load() of the PDDocument class. This method accepts a File object as a parameter. Since it is a static method, you can invoke it using the class name as shown below.

File file = new File("path of the document");
PDDocument document = PDDocument.load(file);

Step 2: Perform the Required Operations

Perform the required operations on the loaded document, such as adding pages, adding text, or adding images.

Step 3: Saving the Document

After performing the required operations, save the PDF document using the save() method of the PDDocument class as shown in the following code block.

document.save("Path");
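The example later in this chapter saves the modified document over the file it was loaded from. As a precaution, you may want to copy the original first. The following is a minimal sketch using only the standard Java library; the class name BackupBeforeSave, the method backup(), and the .bak suffix are assumptions made for illustration and are not part of PDFBox.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class BackupBeforeSave {

   // Copies "name.pdf" to "name.pdf.bak" in the same folder
   // and returns the path of the backup copy.
   static Path backup(Path original) {
      Path copy = original.resolveSibling(original.getFileName() + ".bak");
      try {
         return Files.copy(original, copy, StandardCopyOption.REPLACE_EXISTING);
      } catch (IOException e) {
         throw new UncheckedIOException(e);
      }
   }

   public static void main(String[] args) {
      if (args.length == 0) {
         System.out.println("usage: java BackupBeforeSave <file.pdf>");
         return;
      }
      System.out.println("Backup written to " + backup(Paths.get(args[0])));
   }
}
```

Calling backup() before save() leaves a copy of the original file next to the modified one.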

Step 4: Closing the Document

Finally, close the document using the close() method of the PDDocument class as shown below.

document.close();
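Before passing a file to load(), it can be useful to check that it exists and looks like a PDF. PDF files normally begin with the marker %PDF-, so a quick pre-check can read the first five bytes. The sketch below uses only the standard Java library; the class name PdfHeaderCheck and the method looksLikePdf() are hypothetical, and the check is only a rough filter, not a substitute for the validation that load() performs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

class PdfHeaderCheck {

   // Returns true when the file is readable and begins with "%PDF-"
   static boolean looksLikePdf(Path path) {
      byte[] expected = "%PDF-".getBytes(StandardCharsets.US_ASCII);
      byte[] actual = new byte[expected.length];
      try (InputStream in = Files.newInputStream(path)) {
         int read = 0;
         while (read < actual.length) {
            int n = in.read(actual, read, actual.length - read);
            if (n < 0) {
               return false; // file is shorter than the marker
            }
            read += n;
         }
      } catch (IOException e) {
         return false; // missing or unreadable file
      }
      return Arrays.equals(expected, actual);
   }

   public static void main(String[] args) {
      if (args.length == 0) {
         System.out.println("usage: java PdfHeaderCheck <file>");
         return;
      }
      System.out.println(looksLikePdf(Paths.get(args[0])));
   }
}
```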

Example

Suppose we have a PDF document named sample.pdf, containing a single page, in the path C:/PdfBox_Examples/.

This example demonstrates how to load an existing PDF document. Here, we will load sample.pdf, add a page to it, and save it in the same path with the same name.

Save this code in a file named LoadingExistingDocument.java.

import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
public class LoadingExistingDocument {

   public static void main(String args[]) throws IOException {

      //Loading an existing document
      File file = new File("C:/PdfBox_Examples/sample.pdf");
      PDDocument document = PDDocument.load(file);

      System.out.println("PDF loaded");

      //Adding a blank page to the document
      document.addPage(new PDPage());

      //Saving the document
      document.save("C:/PdfBox_Examples/sample.pdf");

      //Closing the document
      document.close();

   }
}

Compile and execute the saved Java file from the command prompt using the following commands.

javac LoadingExistingDocument.java
java LoadingExistingDocument

Upon execution, the above program loads the specified PDF document, adds a blank page to it, and displays the following message.

PDF loaded

If you check the specified path, you can find the additional page added to the specified PDF document.

PDFBox - Removing Pages

Let us now learn how to remove pages from a PDF document.

Removing Pages from an Existing Document

You can remove a page from an existing PDF document using the removePage() method of the PDDocument class.

Step 1: Loading an Existing PDF Document

Load an existing PDF document using the static method load() of the PDDocument class. This method accepts a File object as a parameter. Since it is a static method, you can invoke it using the class name as shown below.

File file = new File("path of the document");
PDDocument document = PDDocument.load(file);

Step 2: Listing the Number of Pages

You can get the number of pages in the PDF document using the getNumberOfPages() method as shown below.

int noOfPages= document.getNumberOfPages();
System.out.print(noOfPages);

Step 3: Removing the Page

You can remove a page from the PDF document using the removePage() method of the PDDocument class. You need to pass the index of the page to be deleted to this method.

Page indexes in a PDF document start from zero; that is, to delete the 1st page, the index value must be 0. The following call therefore removes the 3rd page.

document.removePage(2);
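Since readers usually think in page numbers that start at 1, a small conversion helper can reduce off-by-one mistakes. The following is a minimal sketch in plain Java; the class name PageIndex and the method toIndex() are hypothetical helpers, and the page count is assumed to come from getNumberOfPages().

```java
class PageIndex {

   // Converts a 1-based page number into the 0-based index
   // expected by removePage(), rejecting out-of-range values.
   static int toIndex(int pageNumber, int pageCount) {
      if (pageNumber < 1 || pageNumber > pageCount) {
         throw new IllegalArgumentException(
            "Page " + pageNumber + " is outside 1.." + pageCount);
      }
      return pageNumber - 1;
   }

   public static void main(String[] args) {
      // For a three-page document, page 3 maps to index 2
      System.out.println(toIndex(3, 3));
   }
}
```

With such a helper, removing the third page of a three-page document could be written as document.removePage(PageIndex.toIndex(3, document.getNumberOfPages())).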

Step 4: Saving the Document

After removing the page, save the PDF document using the save() method of the PDDocument class as shown in the following code block.

document.save("Path");

Step 5: Closing the Document

Finally, close the document using the close() method of the PDDocument class as shown below.

document.close();

Example

Suppose we have a PDF document named sample.pdf that contains three empty pages.

This example demonstrates how to remove a page from an existing PDF document. Here, we will load sample.pdf, remove its third page, and save it in the path C:/PdfBox_Examples/. Save this code in a file named RemovingPages.java.

import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;

public class RemovingPages {

   public static void main(String args[]) throws IOException {

      //Loading an existing document
      File file = new File("C:/PdfBox_Examples/sample.pdf");
      PDDocument document = PDDocument.load(file);

      //Listing the number of existing pages
      int noOfPages= document.getNumberOfPages();
      System.out.println(noOfPages);

      //Removing the pages
      document.removePage(2);

      System.out.println("page removed");

      //Saving the document
      document.save("C:/PdfBox_Examples/sample.pdf");

      //Closing the document
      document.close();

   }
}

Compile and execute the saved Java file from the command prompt using the following commands.

javac RemovingPages.java
java RemovingPages

Upon execution, the above program removes the specified page from the PDF document and displays the following message.

3
page removed

If you check the specified path, you can find that the page was deleted and only two pages remain in the document.

PDFBox - Document Properties

Like other files, a PDF document has document properties. These properties are key-value pairs, and each property gives particular information about the document.

Following are the properties of a PDF document −

  1. File − This property holds the name of the file.

  2. Title − Using this property, you can set the title of the document.

  3. Author − Using this property, you can set the name of the author of the document.

  4. Subject − Using this property, you can specify the subject of the PDF document.

  5. Keywords − Using this property, you can list the keywords with which the document can be searched.

  6. Created − Using this property, you can set the creation date of the document.

  7. Modified − Using this property, you can set the modification date of the document.

  8. Application − Using this property, you can set the application used to create the document.

These properties appear in the document properties dialog of a PDF viewer.

Setting the Document Properties

PDFBox provides a class named PDDocumentInformation. This class has a set of setter and getter methods.

The setter methods of this class set values for the various properties of a document, and the getter methods retrieve these values.

Following are the setter methods of the PDDocumentInformation class.

  1. setAuthor(String author) − This method sets the value of the Author property of the PDF document.

  2. setTitle(String title) − This method sets the value of the Title property of the PDF document.

  3. setCreator(String creator) − This method sets the value of the Creator property of the PDF document.

  4. setSubject(String subject) − This method sets the value of the Subject property of the PDF document.

  5. setCreationDate(Calendar date) − This method sets the value of the CreationDate property of the PDF document.

  6. setModificationDate(Calendar date) − This method sets the value of the ModificationDate property of the PDF document.

  7. setKeywords(String keywords) − This method sets the value of the Keywords property of the PDF document.

Example

This example demonstrates how to add properties such as Author, Title, Date, and Subject to a PDF document. Here, we will create a PDF document named doc_attributes.pdf, add various attributes to it, and save it in the path C:/PdfBox_Examples/. Save this code in a file named AddingDocumentAttributes.java.

import java.io.IOException;
import java.util.Calendar;
import java.util.GregorianCalendar;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.pdmodel.PDPage;

public class AddingDocumentAttributes {
   public static void main(String args[]) throws IOException {

      //Creating PDF document object
      PDDocument document = new PDDocument();

      //Creating a blank page
      PDPage blankPage = new PDPage();

      //Adding the blank page to the document
      document.addPage( blankPage );

      //Creating the PDDocumentInformation object
      PDDocumentInformation pdd = document.getDocumentInformation();

      //Setting the author of the document
      pdd.setAuthor("Tutorialspoint");

      // Setting the title of the document
      pdd.setTitle("Sample document");

      //Setting the creator of the document
      pdd.setCreator("PDF Examples");

      //Setting the subject of the document
      pdd.setSubject("Example document");

      //Setting the created date of the document
      //(Calendar months are zero-based, so the named constants are used)
      Calendar date = new GregorianCalendar();
      date.set(2015, Calendar.NOVEMBER, 5);
      pdd.setCreationDate(date);

      //Setting the modified date of the document
      date.set(2016, Calendar.JUNE, 5);
      pdd.setModificationDate(date);

      //Setting keywords for the document
      pdd.setKeywords("sample, first example, my pdf");

      //Saving the document
      document.save("C:/PdfBox_Examples/doc_attributes.pdf");

      System.out.println("Properties added successfully ");

      //Closing the document
      document.close();

   }
}

Compile and execute the saved Java file from the command prompt using the following commands.

javac AddingDocumentAttributes.java
java AddingDocumentAttributes

Upon execution, the above program adds all the specified attributes to the document and displays the following message.

Properties added successfully
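One detail worth noting when setting dates is that java.util.Calendar counts months from zero (Calendar.JANUARY is 0), so a call such as date.set(2015, 11, 5) denotes 5 December rather than 5 November. The following is a minimal sketch in plain Java of a helper that accepts ordinary month numbers; the class name CalendarMonths and the method dateOf() are hypothetical and not part of PDFBox.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

class CalendarMonths {

   // Builds a calendar date from a human-readable month number (1 = January)
   static Calendar dateOf(int year, int month, int day) {
      Calendar date = new GregorianCalendar();
      date.clear();                   // reset the time-of-day fields
      date.set(year, month - 1, day); // Calendar expects a zero-based month
      return date;
   }

   public static void main(String[] args) {
      Calendar date = dateOf(2015, 11, 5);
      // Checks that month 11 was stored as November
      System.out.println(date.get(Calendar.MONTH) == Calendar.NOVEMBER);
   }
}
```

The resulting Calendar object can be passed to setCreationDate() or setModificationDate().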

Now, if you visit the given path, you can find the created PDF. Right-click the document and select the document properties option.

This opens the document properties window, where you can observe that all the properties of the document are set to the specified values.

Retrieving the Document Properties

You can retrieve the properties of a document using the getter methods provided by the PDDocumentInformation class.

Following are the getter methods of the PDDocumentInformation class.

  1. getAuthor() − This method retrieves the value of the Author property of the PDF document.

  2. getTitle() − This method retrieves the value of the Title property of the PDF document.

  3. getCreator() − This method retrieves the value of the Creator property of the PDF document.

  4. getSubject() − This method retrieves the value of the Subject property of the PDF document.

  5. getCreationDate() − This method retrieves the value of the CreationDate property of the PDF document.

  6. getModificationDate() − This method retrieves the value of the ModificationDate property of the PDF document.

  7. getKeywords() − This method retrieves the value of the Keywords property of the PDF document.

Example

This example demonstrates how to retrieve the properties of an existing PDF document. Here, we will create a Java program that loads the PDF document named doc_attributes.pdf, which is saved in the path C:/PdfBox_Examples/, and retrieves its properties. Save this code in a file named RetrivingDocumentAttributes.java.

import java.io.File;
import java.io.IOException;
import java.text.SimpleDateFormat;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

public class RetrivingDocumentAttributes {
   public static void main(String args[]) throws IOException {

      //Loading an existing document
      File file = new File("C:/PdfBox_Examples/doc_attributes.pdf");
      PDDocument document = PDDocument.load(file);

      //Getting the PDDocumentInformation object
      PDDocumentInformation pdd = document.getDocumentInformation();

      //The date getters return Calendar objects, so format them for display
      SimpleDateFormat dateFormat = new SimpleDateFormat("M/d/yyyy");

      //Retrieving the info of a PDF document
      System.out.println("Author of the document is :"+ pdd.getAuthor());
      System.out.println("Title of the document is :"+ pdd.getTitle());
      System.out.println("Subject of the document is :"+ pdd.getSubject());

      System.out.println("Creator of the document is :"+ pdd.getCreator());
      System.out.println("Creation date of the document is :"+
         dateFormat.format(pdd.getCreationDate().getTime()));
      System.out.println("Modification date of the document is :"+
         dateFormat.format(pdd.getModificationDate().getTime()));
      System.out.println("Keywords of the document are :"+ pdd.getKeywords());

      //Closing the document
      document.close();
   }
}

Compile and execute the saved Java file from the command prompt using the following commands.

javac RetrivingDocumentAttributes.java
java RetrivingDocumentAttributes

Upon execution, the above program retrieves all the attributes of the document and displays them as shown below.

Author of the document is :Tutorialspoint
Title of the document is :Sample document
Subject of the document is :Example document
Creator of the document is :PDF Examples
Creation date of the document is :11/5/2015
Modification date of the document is :6/5/2016
Keywords of the document are :sample, first example, my pdf
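Note that getCreationDate() and getModificationDate() return java.util.Calendar objects, so the exact text printed for the two dates depends on how the Calendar is converted to a string. The following is a minimal sketch, using only the JDK, of one way to format such a value. The class name DateFormatting, the yyyy-MM-dd pattern, and the "(not set)" placeholder are illustrative assumptions, not part of PDFBox.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical helper: formats the Calendar values returned by the date getters
final class DateFormatting {

   // Returns the date as yyyy-MM-dd, or "(not set)" when the property is absent
   static String format(Calendar date) {
      if (date == null) {
         return "(not set)";
      }
      SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
      // Use the calendar's own time zone so the printed day does not shift
      formatter.setTimeZone(date.getTimeZone());
      return formatter.format(date.getTime());
   }

   public static void main(String[] args) {
      // Stand-in for a value such as pdd.getCreationDate()
      Calendar created = new GregorianCalendar(2015, Calendar.NOVEMBER, 5);
      System.out.println("Creation date of the document is :" + format(created));
   }
}
```

In the program above, you could pass pdd.getCreationDate() to this helper instead of printing the Calendar directly.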

PDFBox - Adding Text

In an earlier chapter, we discussed how to add pages to a PDF document. In this chapter, we will discuss how to add text to an existing PDF document.

Adding Text to an Existing PDF Document

The PDFBox library provides a class named PDPageContentStream, which contains the methods required to insert text, images, and other types of content into a page of a PDDocument.

Following are the steps to load an existing document and add text to one of its pages.

Step 1: Loading an Existing Document

You can load an existing document using the static load() method of the PDDocument class. Because the method is static, invoke it on the class name as shown below.

File file = new File("Path of the document");
PDDocument doc = PDDocument.load(file);

Step 2: Getting the Required Page

You can get the required page of a document using the getPage() method. Pass the zero-based index of the page to this method to retrieve its page object; the index 1 used below therefore refers to the second page.

PDPage page = doc.getPage(1);
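Because getPage() takes a zero-based index, asking for an index that the document does not have fails at run time. The sketch below, which uses only the JDK, shows one way to convert a human-friendly page number into an index after checking it against the page count. The class and method names are hypothetical, and the page count is assumed to come from a call such as doc.getNumberOfPages().

```java
// Hypothetical helper: maps a 1-based page number to the 0-based index used by getPage()
final class PageIndex {

   static int toIndex(int pageNumber, int pageCount) {
      if (pageNumber < 1 || pageNumber > pageCount) {
         throw new IllegalArgumentException(
            "Page " + pageNumber + " is outside 1.." + pageCount);
      }
      return pageNumber - 1;
   }

   public static void main(String[] args) {
      // For a three-page document, page 2 is stored at index 1
      System.out.println(toIndex(2, 3));
   }
}
```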

Step 3: Preparing the Content Stream

You can insert various kinds of data elements using an object of the class PDPageContentStream. Its constructor takes the document object and the page object, so instantiate the class by passing the two objects created in the previous steps, as shown below. Note that this two-argument constructor overwrites any content already present on the page.

PDPageContentStream contentStream = new PDPageContentStream(doc, page);

Step 4: Beginning the Text

While inserting text in a PDF document, you mark the start and end of the text using the beginText() and endText() methods of the PDPageContentStream class, as shown below.

contentStream.beginText();
// code to add text content
contentStream.endText();

Therefore, begin the text using the beginText() method as shown below.

contentStream.beginText();

Step 5: Setting the Position of the Text

Using the newLineAtOffset() method, you can set the position at which the text starts on the page.

//Setting the position for the line
contentStream.newLineAtOffset(25, 700);
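By default, positions in a content stream are measured in points (1/72 inch) from the bottom-left corner of the page, so a larger y value is closer to the top. The following sketch, using only the JDK, converts a distance measured from the top edge into such a y value. The class name is hypothetical, and the page height of 792 points assumes a US Letter page.

```java
// Hypothetical helper for working out text positions in PDF coordinates
final class PageCoordinates {

   static final float POINTS_PER_INCH = 72f;

   // Converts inches to points, the default unit of a content stream
   static float inchesToPoints(float inches) {
      return inches * POINTS_PER_INCH;
   }

   // Converts a distance measured down from the top edge into a y coordinate
   static float yFromTop(float pageHeight, float distanceFromTop) {
      return pageHeight - distanceFromTop;
   }

   public static void main(String[] args) {
      float letterHeight = 792f; // assumption: US Letter page, 11 inches tall
      System.out.println(yFromTop(letterHeight, 92f));
   }
}
```

The first newLineAtOffset() call after beginText() is relative to the page origin; later calls within the same text block are relative to the start of the current line.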

Step 6: Setting the Font

You can set the font of the text using the setFont() method of the PDPageContentStream class, as shown below. Pass the font and the font size to this method.

contentStream.setFont( font_type, font_size );

Step 7: Inserting the Text

You can insert the text into the page using the showText() method of the PDPageContentStream class, as shown below. This method accepts the required text as a string.

contentStream.showText(text);

Step 8: Ending the Text

After inserting the text, you need to end the text using the endText() method of the PDPageContentStream class as shown below.

contentStream.endText();

Step 9: Closing the PDPageContentStream

Close the PDPageContentStream object using the close() method as shown below.

contentStream.close();

Step 10: Saving the Document

After adding the required content, save the PDF document using the save() method of the PDDocument class as shown in the following code block.

doc.save("Path");

Step 11: Closing the Document

Finally, close the document using the close() method of the PDDocument class as shown below.

doc.close();

Example

This example demonstrates how to add content to a page in a document. Here, we create a Java program that loads the PDF document named my_doc.pdf, saved in the path C:/PdfBox_Examples/, and adds some text to it. Save this code in a file named AddingContent.java.

import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

public class AddingContent {
   public static void main (String args[]) throws IOException {

      //Loading an existing document
      File file = new File("C:/PdfBox_Examples/my_doc.pdf");
      PDDocument document = PDDocument.load(file);

      //Retrieving the pages of the document
      PDPage page = document.getPage(1);
      PDPageContentStream contentStream = new PDPageContentStream(document, page);

      //Begin the Content stream
      contentStream.beginText();

      //Setting the font to the Content stream
      contentStream.setFont(PDType1Font.TIMES_ROMAN, 12);

      //Setting the position for the line
      contentStream.newLineAtOffset(25, 500);

      String text = "This is the sample document and we are adding content to it.";

      //Adding text in the form of string
      contentStream.showText(text);

      //Ending the content stream
      contentStream.endText();

      System.out.println("Content added");

      //Closing the content stream
      contentStream.close();

      //Saving the document
      document.save(new File("C:/PdfBox_Examples/new.pdf"));

      //Closing the document
      document.close();
   }
}

Compile and execute the saved Java file from the command prompt using the following commands.

javac AddingContent.java
java AddingContent

Upon execution, the above program adds the given text to the document and displays the following message.

Content added

If you open the PDF document new.pdf in the specified path, you can see that the given content has been added to the document.

[Image: adding text]

PDFBox - Adding Multiple Lines

The example in the previous chapter added text to a page of a PDF, but that program can only add text that fits on a single line. If you try to add more content, the text that extends beyond the width of the page is not displayed.

For example, if you run the program from the previous chapter with the following string, only a part of it is displayed.

String text = "This is an example of adding text to a page in the pdf document. we can"
   + " add as many lines as we want like this using the showText() method of the"
   + " ContentStream class";

Replace the string text in the example from the previous chapter with the string above and execute it. Upon execution, you will receive the following output.

[Image: single line extended]

If you observe the output carefully, you will notice that only a part of the string is displayed.

To add multiple lines to a PDF, set the leading (the vertical distance between successive lines) using the setLeading() method, and move to a new line using the newLine() method after finishing each line.
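Before calling showText() and newLine(), a long string has to be broken into pieces that each fit on one line. The following is a rough sketch of a greedy word wrap using only the JDK. It limits lines by character count, which is a simplifying assumption: a more accurate version would measure each candidate line with the font's metrics. The class name and the limit of 40 characters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits text into lines of at most maxChars characters
final class LineWrapper {

   static List<String> wrap(String text, int maxChars) {
      List<String> lines = new ArrayList<>();
      StringBuilder current = new StringBuilder();
      // Splitting on whitespace also removes embedded line breaks
      for (String word : text.trim().split("\\s+")) {
         if (word.isEmpty()) {
            continue;
         }
         // Start a new line when adding this word would exceed the limit
         if (current.length() > 0 && current.length() + 1 + word.length() > maxChars) {
            lines.add(current.toString());
            current.setLength(0);
         }
         if (current.length() > 0) {
            current.append(' ');
         }
         current.append(word);
      }
      if (current.length() > 0) {
         lines.add(current.toString());
      }
      return lines;
   }

   public static void main(String[] args) {
      String text = "This is an example of adding text to a page in the pdf document.";
      for (String line : wrap(text, 40)) {
         System.out.println(line);
      }
   }
}
```

Each returned line could then be passed to showText(), followed by a call to newLine(). A word longer than the limit is kept whole on its own line.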

Steps

Following are the steps to load an existing document and add multiple lines of text to one of its pages.

Step 1: Loading an Existing Document

You can load an existing document using the static load() method of the PDDocument class. Because the method is static, invoke it on the class name as shown below.

File file = new File("Path of the document");
PDDocument doc = PDDocument.load(file);

Step 2: Getting the Required Page

You can get the required page of a document using the getPage() method. Pass the zero-based index of the page to this method to retrieve its page object, as shown below.

PDPage page = doc.getPage(1);

Step 3: Preparing the Content stream

You can insert various kinds of data elements using an object of the class named PDPageContentStream. Its constructor takes the document object and the page object, so instantiate the class by passing the two objects created in the previous steps, as shown below.

PDPageContentStream contentStream = new PDPageContentStream(doc, page);

Step 4: Beginning the Text

While inserting text in a PDF document, you mark the start and end of the text using the beginText() and endText() methods of the PDPageContentStream class, as shown below.

contentStream.beginText();
// code to add text content
contentStream.endText();

Therefore, begin the text using the beginText() method as shown below.

contentStream.beginText();

Step 5: Setting the Position of the Text

Using the newLineAtOffset() method, you can set the position at which the text starts on the page.

//Setting the position for the line
contentStream.newLineAtOffset(25, 700);

Step 6: Setting the Font

You can set the font of the text using the setFont() method of the PDPageContentStream class, as shown below. Pass the font and the font size to this method.

contentStream.setFont( font_type, font_size );

Step 7: Setting the Text Leading

You can set the text leading using the setLeading() method as shown below.

contentStream.setLeading(14.5f);

Step 8: Inserting Multiple Strings Using newLine()

You can insert multiple strings using the showText() method of the PDPageContentStream class, separating them with calls to the newLine() method as shown below.

contentStream.showText(text1);
contentStream.newLine();
contentStream.showText(text2);

Step 9: Ending the Text

After inserting the text, you need to end the text using the endText() method of the PDPageContentStream class as shown below.

contentStream.endText();

Step 10: Closing the PDPageContentStream

Close the PDPageContentStream object using the close() method as shown below.

contentStream.close();

Step 11: Saving the Document

After adding the required content, save the PDF document using the save() method of the PDDocument class as shown in the following code block.

doc.save("Path");

Step 12: Closing the Document

Finally, close the document using the close() method of the PDDocument class as shown below.

doc.close();

Example

This example demonstrates how to add multiple lines in a PDF using PDFBox. Save this program in a file named AddMultipleLines.java.

import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

public class AddMultipleLines {
   public static void main(String args[]) throws IOException {

      //Loading an existing document
      File file = new File("C:/PdfBox_Examples/my_pdf.pdf");
      PDDocument doc = PDDocument.load(file);

      //Retrieving the page (the index is zero-based)
      PDPage page = doc.getPage(1);

      PDPageContentStream contentStream = new PDPageContentStream(doc, page);

      //Begin the Content stream
      contentStream.beginText();

      //Setting the font to the Content stream
      contentStream.setFont( PDType1Font.TIMES_ROMAN, 16 );

      //Setting the leading
      contentStream.setLeading(14.5f);

      //Setting the position for the line
      contentStream.newLineAtOffset(25, 725);

      String text1 = "This is an example of adding text to a page in the pdf document."
         + " we can add as many lines";
      String text2 = "as we want like this using the ShowText()  method of the"
         + " ContentStream class";

      //Adding text in the form of string
      contentStream.showText(text1);
      contentStream.newLine();
      contentStream.showText(text2);

      //Ending the content stream
      contentStream.endText();

      System.out.println("Content added");

      //Closing the content stream
      contentStream.close();

      //Saving the document
      doc.save(new File("C:/PdfBox_Examples/new.pdf"));

      //Closing the document
      doc.close();
   }
}

Compile and execute the saved Java file from the command prompt using the following commands.

javac AddMultipleLines.java
java AddMultipleLines

Upon execution, the above program adds the given text to the document and displays the following message.

Content added

If you open the PDF document new.pdf in the specified path, you can see that the given content has been added to the document in multiple lines.

[Image: adding multiple lines]

PDFBox - Reading Text

In the previous chapter, we saw how to add text to an existing PDF document. In this chapter, we will discuss how to read text from an existing PDF document.

Extracting Text from an Existing PDF Document

Extracting text is one of the main features of the PDFBox library. You can extract text using the getText() method of the PDFTextStripper class. This class extracts all the text from the given PDF document.

Following are the steps to extract text from an existing PDF document.

Step 1: Loading an Existing PDF Document

Load an existing PDF document using the static method load() of the PDDocument class. This method accepts a File object as a parameter. Because it is a static method, you invoke it on the class name as shown below.

File file = new File("path of the document");
PDDocument document = PDDocument.load(file);

Step 2: Instantiate the PDFTextStripper Class

The PDFTextStripper class provides methods to retrieve text from a PDF document. Instantiate this class as shown below.

PDFTextStripper pdfStripper = new PDFTextStripper();

Step 3: Retrieving the Text

You can retrieve the text of a PDF document using the getText() method of the PDFTextStripper class. Pass the document object to this method as a parameter. The method extracts the text of the given document and returns it as a String object.

String text = pdfStripper.getText(document);
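Since getText() returns one String for the whole document, any further processing is ordinary string handling. As a small illustration using only the JDK, the sketch below splits such a string into trimmed, non-blank lines. The class name and the sample input are hypothetical; in a real program the input would be the value returned by getText().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: post-processes text returned by a text extraction call
final class ExtractedText {

   // Splits on any line break and drops lines that contain only whitespace
   static List<String> nonBlankLines(String text) {
      List<String> lines = new ArrayList<>();
      for (String line : text.split("\\R")) {
         String trimmed = line.trim();
         if (!trimmed.isEmpty()) {
            lines.add(trimmed);
         }
      }
      return lines;
   }

   public static void main(String[] args) {
      // Stand-in for the String returned by pdfStripper.getText(document)
      String text = "First line of the page\r\n\r\n   Second line of the page\n";
      for (String line : nonBlankLines(text)) {
         System.out.println(line);
      }
   }
}
```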

Step 4: Closing the Document

Finally, close the document using the close() method of the PDDocument class as shown below.

document.close();

Example

Suppose we have a PDF document with some text in it, as shown below.

[Image: example pdf]

This example demonstrates how to read text from the above mentioned PDF document. Here, we create a Java program that loads a PDF document named new.pdf, saved in the path C:/PdfBox_Examples/. Save this code in a file named ReadingText.java.

import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
public class ReadingText {

   public static void main(String args[]) throws IOException {

      //Loading an existing document
      File file = new File("C:/PdfBox_Examples/new.pdf");
      PDDocument document = PDDocument.load(file);

      //Instantiate PDFTextStripper class
      PDFTextStripper pdfStripper = new PDFTextStripper();

      //Retrieving text from PDF document
      String text = pdfStripper.getText(document);
      System.out.println(text);

      //Closing the document
      document.close();

   }
}

Compile and execute the saved Java file from the command prompt using the following commands.

javac ReadingText.java
java ReadingText

Upon execution, the above program retrieves the text from the given PDF document and displays it as shown below.

This is an example of adding text to a page in the pdf document. we can add as many lines
as we want like this using the ShowText() method of the ContentStream class.

PDFBox - Inserting Image

In the previous chapter, we saw how to extract text from an existing PDF document. In this chapter, we will discuss how to insert an image into a PDF document.

Inserting Image to a PDF Document

You can insert an image into a PDF document using the createFromFile() method of the PDImageXObject class and the drawImage() method of the PDPageContentStream class.

Following are the steps to insert an image into an existing PDF document.

Step 1: Loading an Existing PDF Document

Load an existing PDF document using the static method load() of the PDDocument class. This method accepts a File object as a parameter. Because it is a static method, you invoke it on the class name as shown below.

File file = new File("path of the document");
PDDocument doc = PDDocument.load(file);

Step 2: Retrieving a Page

Select a page in the PDF document and retrieve its page object using the getPage() method as shown below.

PDPage page = doc.getPage(0);

Step 3: Creating PDImageXObject object

The class PDImageXObject in the PDFBox library represents an image. It provides the methods required to perform operations related to an image, such as inserting it and setting its height and width.

We can create an object of this class using the createFromFile() method. To this method, we pass the path of the image to add, as a string, and the document object to which the image is to be added.

PDImageXObject pdImage = PDImageXObject.createFromFile("C:/logo.png", doc);

Step 4: Preparing the Content Stream

You can insert various kinds of data elements using an object of the class named PDPageContentStream. Its constructor takes the document object and the page object, so instantiate the class by passing the two objects created in the previous steps, as shown below.

PDPageContentStream contentStream = new PDPageContentStream(doc, page);

Step 5: Drawing the Image in the PDF Document

You can insert an image in the PDF document using the drawImage() method. To this method, pass the image object created in the previous step and the x and y coordinates at which the lower-left corner of the image should be placed, as shown below.

contentStream.drawImage(pdImage, 70, 250);
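The three-argument drawImage() call draws the image at its own pixel size, which may be larger than the page. An overload that also takes a width and a height lets you scale the image. The sketch below, using only the JDK, computes a size that fits inside a given box while preserving the aspect ratio. The class name and the sample dimensions are hypothetical.

```java
// Hypothetical helper: scales image dimensions to fit inside a bounding box
final class ImageScaler {

   // Returns {width, height}; the image is shrunk if needed but never enlarged
   static float[] fit(float imageWidth, float imageHeight, float maxWidth, float maxHeight) {
      float scale = Math.min(1f, Math.min(maxWidth / imageWidth, maxHeight / imageHeight));
      return new float[] { imageWidth * scale, imageHeight * scale };
   }

   public static void main(String[] args) {
      // Assumed example: a 400 x 200 image placed in a 100 x 100 box
      float[] size = fit(400f, 200f, 100f, 100f);
      System.out.println(size[0] + " x " + size[1]);
   }
}
```

The resulting values could be passed as the last two arguments of drawImage(pdImage, x, y, width, height).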

Step 6: Closing the PDPageContentStream

Close the PDPageContentStream object using the close() method as shown below.

contentStream.close();

Step 7: Saving the Document

After adding the required content, save the PDF document using the save() method of the PDDocument class as shown in the following code block.

doc.save("Path");

Step 8: Closing the Document

Finally, close the document using the close() method of the PDDocument class as shown below.

doc.close();

Example

Suppose we have a PDF document named sample.pdf in the path C:/PdfBox_Examples/, with empty pages as shown below.

[Image: sample document]

This example demonstrates how to add an image to a blank page of the above mentioned PDF document. Here, we load the PDF document named sample.pdf and add an image to it. Save this code in a file named InsertingImage.java.

import java.io.File;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

public class InsertingImage {

   public static void main(String args[]) throws Exception {
      //Loading an existing document
      File file = new File("C:/PdfBox_Examples/sample.pdf");
      PDDocument doc = PDDocument.load(file);

      //Retrieving the page
      PDPage page = doc.getPage(0);

      //Creating PDImageXObject object
      PDImageXObject pdImage = PDImageXObject.createFromFile("C:/PdfBox_Examples/logo.png",doc);

      //creating the PDPageContentStream object
      PDPageContentStream contents = new PDPageContentStream(doc, page);

      //Drawing the image in the PDF document
      contents.drawImage(pdImage, 70, 250);

      System.out.println("Image inserted");

      //Closing the PDPageContentStream object
      contents.close();

      //Saving the document
      doc.save("C:/PdfBox_Examples/sample.pdf");

      //Closing the document
      doc.close();

   }
}

Compile and execute the saved Java file from the command prompt using the following commands.

javac InsertingImage.java
java InsertingImage

Upon execution, the above program inserts an image into the specified page of the given PDF document and displays the following message.

Image inserted

If you open the document sample.pdf, you can see that the image has been inserted in it.

[Image: inserting image]

PDFBox - Encrypting a PDF Document

In the previous chapter, we saw how to insert an image in a PDF document. In this chapter, we will discuss how to encrypt a PDF document.

Encrypting a PDF Document

You can encrypt a PDF document using the methods provided by the StandardProtectionPolicy and AccessPermission classes.

The AccessPermission class is used to protect the PDF document by assigning access permissions to it. Using this class, you can restrict users from performing the following operations.

  1. Print the document

  2. Modify the content of the document

  3. Copy or extract content of the document

  4. Add or modify annotations

  5. Fill in interactive form fields

  6. Extract text and graphics for accessibility to visually impaired people

  7. Assemble the document

  8. Print in degraded quality

The StandardProtectionPolicy class is used to add password-based protection to a document.

Following are the steps to encrypt an existing PDF document.

Step 1: Loading an Existing PDF Document

Load an existing PDF document using the static method load() of the PDDocument class. This method accepts a File object as a parameter. Because it is a static method, you invoke it on the class name as shown below.

File file = new File("path of the document");
PDDocument document = PDDocument.load(file);

Step 2: Creating Access Permission Object

Instantiate the AccessPermission class as shown below.

AccessPermission accessPermission = new AccessPermission();

Step 3: Creating StandardProtectionPolicy Object

Instantiate the StandardProtectionPolicy class by passing the owner password, the user password, and the AccessPermission object as shown below.

StandardProtectionPolicy spp = new StandardProtectionPolicy("1234","1234",accessPermission);

Step 4: Setting the Length of the Encryption Key

Set the encryption key length, in bits, using the setEncryptionKeyLength() method as shown below.

spp.setEncryptionKeyLength(128);
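Only certain key lengths are accepted by setEncryptionKeyLength(). The sketch below, using only the JDK, checks a configured value before it is passed on. It assumes that the supported lengths are 40, 128, and 256 bits, which should be confirmed against the Javadoc of the PDFBox version in use. The class and method names are hypothetical.

```java
// Hypothetical helper: validates a key length before it is given to the protection policy
final class KeyLengths {

   // Assumption: 40, 128 and 256 bits are the supported lengths
   static boolean isSupported(int bits) {
      return bits == 40 || bits == 128 || bits == 256;
   }

   static int requireSupported(int bits) {
      if (!isSupported(bits)) {
         throw new IllegalArgumentException("Unsupported key length: " + bits);
      }
      return bits;
   }

   public static void main(String[] args) {
      // The value used in this chapter
      System.out.println(requireSupported(128));
   }
}
```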

Step 5: Setting the Permissions

Set the permissions using the setPermissions() method of the StandardProtectionPolicy class. This method accepts an AccessPermission object as a parameter.

spp.setPermissions(accessPermission);

Step 6: Protecting the Document

You can protect your document using the protect() method of the PDDocument class as shown below. Pass the StandardProtectionPolicy object as a parameter to this method.

document.protect(spp);

Step 7: Saving the Document

After protecting the document, save it using the save() method of the PDDocument class as shown in the following code block.

document.save("Path");

Step 8: Closing the Document

Finally, close the document using the close() method of the PDDocument class as shown below.

document.close();

Example

Suppose we have a PDF document named sample.pdf in the path C:/PdfBox_Examples/, with empty pages as shown below.

[Image: sample document]

This example demonstrates how to encrypt the above mentioned PDF document. Here, we load the PDF document named sample.pdf and encrypt it. Save this code in a file named EncriptingPDF.java.

import java.io.File;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.pdmodel.encryption.StandardProtectionPolicy;
public class EncriptingPDF {

   public static void main(String args[]) throws Exception {
      //Loading an existing document
      File file = new File("C:/PdfBox_Examples/sample.pdf");
      PDDocument document = PDDocument.load(file);

      //Creating access permission object
      AccessPermission ap = new AccessPermission();

      //Creating StandardProtectionPolicy object
      StandardProtectionPolicy spp = new StandardProtectionPolicy("1234", "1234", ap);

      //Setting the length of the encryption key
      spp.setEncryptionKeyLength(128);

      //Setting the access permissions
      spp.setPermissions(ap);

      //Protecting the document
      document.protect(spp);

      System.out.println("Document encrypted");

      //Saving the document
      document.save("C:/PdfBox_Examples/sample.pdf");
      //Closing the document
      document.close();

   }
}

Compile and execute the saved Java file from the command prompt using the following commands.

javac EncriptingPDF.java
java EncriptingPDF

Upon execution, the above program encrypts the given PDF document and displays the following message.

Document encrypted

If you try to open the document sample.pdf, it does not open directly because it is encrypted. Instead, the viewer prompts you to type the password to open the document.

[Image: document encryption]

PDFBox - JavaScript in PDF Document

In earlier chapters, we learnt how to insert an image into a PDF document and how to encrypt it. In this chapter, we will discuss how to add JavaScript to a PDF document.

Adding JavaScript to a PDF Document

You can add JavaScript actions to a PDF document using the PDActionJavaScript class, which represents a JavaScript action.

Following are the steps to add JavaScript actions to an existing PDF document.

Step 1: Loading an Existing PDF Document

Load an existing PDF document using the static method load() of the PDDocument class. This method accepts a File object as a parameter. Because it is a static method, you invoke it on the class name as shown below.

File file = new File("path of the document");
PDDocument document = PDDocument.load(file);

Step 2: Creating the PDActionJavaScript Object

Instantiate the PDActionJavaScript class as shown below. To its constructor, pass the required JavaScript as a String.

String javaScript = "app.alert( {cMsg: 'this is an example', nIcon: 3,"
   + " nType: 0,cTitle: 'PDFBox Javascript example' } );";
PDActionJavaScript PDAjavascript = new PDActionJavaScript(javaScript);
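Because the script is assembled as a Java string that itself contains single-quoted JavaScript strings, a stray quote in the message or title breaks the script. The following sketch, using only the JDK, builds the same app.alert call from a message and a title while escaping backslashes and single quotes. The class and method names are hypothetical, and line breaks inside the text are not handled.

```java
// Hypothetical helper: builds the alert script used in this chapter
final class AlertScript {

   // Escapes backslashes and single quotes for use inside a single-quoted JavaScript string
   static String escape(String text) {
      return text.replace("\\", "\\\\").replace("'", "\\'");
   }

   static String alert(String message, String title) {
      return "app.alert( {cMsg: '" + escape(message) + "', nIcon: 3,"
         + " nType: 0, cTitle: '" + escape(title) + "' } );";
   }

   public static void main(String[] args) {
      System.out.println(alert("this is an example", "PDFBox Javascript example"));
   }
}
```

The returned string could then be passed to the PDActionJavaScript constructor in place of the hand-written literal.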

Step 3: Embedding JavaScript in the Document

Embed the JavaScript action in the PDF document by setting it as the document's open action, as shown below.

document.getDocumentCatalog().setOpenAction(PDAjavascript);

Step 4: Saving the Document

After embedding the JavaScript, save the PDF document using the save() method of the PDDocument class as shown in the following code block.

document.save("Path");

Step 5: Closing the Document

最后,使用 PDDocument 类的 close() 方法关闭文档,如下所示。

Finally, close the document using close() method of the PDDocument class as shown below.

document.close();
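
If an exception is thrown before close() is reached, the document stays open. PDDocument implements java.io.Closeable, so it can be declared in a try-with-resources statement instead. The sketch below shows the closing behaviour with a hypothetical stand-in class, TrackedResource, so that it runs without PDFBox on the classpath; it is an illustration of the Java language feature, not of the PDFBox API.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

final class CloseOrderDemo {

    // Hypothetical stand-in for a PDDocument-like resource; it only records when it is closed.
    static final class TrackedResource implements Closeable {
        private final String name;
        private final List<String> log;

        TrackedResource(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("closed " + name);
        }
    }

    // Opens a resource, fails while working with it, and returns what was logged.
    static List<String> runWithFailure() {
        List<String> log = new ArrayList<>();
        try (TrackedResource document = new TrackedResource("document", log)) {
            log.add("working");
            throw new IllegalStateException("simulated failure while editing");
        } catch (IllegalStateException e) {
            // The resource has already been closed when this block runs.
            log.add("caught");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runWithFailure()); // prints [working, closed document, caught]
    }
}
```

With PDFBox, the same shape would be try (PDDocument document = PDDocument.load(file)) { ... }, which makes the explicit close() call unnecessary.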

Example

假设我们有一个名为 sample.pdf 的 PDF 文档,路径为 C:/PdfBox_Examples/ ,并且页面为空,如下所示。

Suppose, we have a PDF document named sample.pdf, in the path C:/PdfBox_Examples/ with empty pages as shown below.

sample document

此示例演示如何将 JavaScript 嵌入上述 PDF 文档。此处,我们将加载名为 sample.pdf 的 PDF 文档,并在其中嵌入 JavaScript。将此代码保存到名为 AddJavaScript.java. 的文件中。

This example demonstrates how to embed JavaScript in the above mentioned PDF document. Here, we will load the PDF document named sample.pdf and embed JavaScript in it. Save this code in a file with name AddJavaScript.java.

import java.io.File;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.action.PDActionJavaScript;

public class AddJavaScript {

   public static void main(String args[]) throws Exception {

      //Loading an existing file
      File file = new File("C:/PdfBox_Examples/sample.pdf");
      PDDocument document = PDDocument.load(file);

      String javaScript = "app.alert( {cMsg: 'this is an example', nIcon: 3,"
         + " nType: 0, cTitle: 'PDFBox Javascript example'} );";

      //Creating PDActionJavaScript object
      PDActionJavaScript PDAjavascript = new PDActionJavaScript(javaScript);

      //Embedding java script
      document.getDocumentCatalog().setOpenAction(PDAjavascript);

      //Saving the document
      document.save( new File("C:/PdfBox_Examples/new.pdf") );
      System.out.println("Data added to the given PDF");

      //Closing the document
      document.close();

   }
}

使用以下命令从命令提示符处编译并执行已保存的 Java 文件。

Compile and execute the saved Java file from the command prompt using the following commands.

javac AddJavaScript.java
java AddJavaScript

执行后,上述程序将 JavaScript 嵌入给定 PDF 文档,显示以下消息。

Upon execution, the above program embeds JavaScript in the given PDF document displaying the following message.

Data added to the given PDF

如果您尝试打开文档 new.pdf ,它会显示如下所示的警报消息。

If you try to open the document new.pdf it will display an alert message as shown below.

adding javascript

PDFBox - Splitting a PDF Document

在上一个章节中,我们已经看到如何向 PDF 文档添加 JavaScript。现在,让我们学习如何将给定的 PDF 文档拆分为多个文档。

In the previous chapter, we have seen how to add JavaScript to a PDF document. Let us now learn how to split a given PDF document into multiple documents.

Splitting the Pages in a PDF Document

You can split a given PDF document into multiple PDF documents using the class named Splitter.

Following are the steps to split an existing PDF document.

Step 1: Loading an Existing PDF Document

Load an existing PDF document using the static method load() of the PDDocument class. This method accepts a File object as a parameter. Because it is static, you can invoke it using the class name, as shown below.

File file = new File("path of the document");
PDDocument document = PDDocument.load(file);

Step 2: Instantiate the Splitter Class

The class named Splitter contains the methods to split a given PDF document. Instantiate this class as shown below.

Splitter splitter = new Splitter();

Step 3: Splitting the PDF Document

You can split the given document using the split() method of the Splitter class. This method accepts an object of the PDDocument class as a parameter.

List<PDDocument> Pages = splitter.split(document);

split() 方法将给定文档的每页作为单独的文档进行拆分,并以列表的形式返回所有这些文档。

The split() method splits each page of the given document as an individual document and returns all these in the form of a list.

Step 4: Creating an Iterator Object

To traverse the list of documents acquired in the previous step, get an iterator object for it using the listIterator() method, as shown below.

Iterator<PDDocument> iterator = Pages.listIterator();
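
While traversing the list, each document needs its own output name. The example later in this chapter builds the name by appending a counter to a fixed string. As an alternative, the sketch below derives numbered sibling paths from the source file with java.nio.file.Path. The class NumberedOutputs and the method forSource are hypothetical names used for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class NumberedOutputs {

    // Returns 'count' paths next to 'source', named <base>1.pdf, <base>2.pdf, and so on.
    static List<Path> forSource(Path source, int count) {
        String fileName = source.getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        // Drop the extension if there is one; otherwise use the whole file name.
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        List<Path> outputs = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            outputs.add(source.resolveSibling(base + i + ".pdf"));
        }
        return outputs;
    }

    public static void main(String[] args) {
        // A relative path is used here so that the sketch behaves the same on every platform.
        for (Path output : forSource(Paths.get("PdfBox_Examples", "sample.pdf"), 2)) {
            System.out.println(output);
        }
    }
}
```

Each returned path can be converted with toFile() and passed to the save() method of the corresponding split document.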

Step 5: Closing the Document

Finally, close the document using the close() method of the PDDocument class as shown below.

document.close();

Example

比如说,有一个名称为 sample.pdf 的 PDF 文档,位于路径 C:\PdfBox_Examples\ 当中,此文档包含两页——一页包含图片,另一页包含文本,如下所示。

Suppose, there is a PDF document with name sample.pdf in the path C:\PdfBox_Examples\ and this document contains two pages — one page containing image and another page containing text as shown below.

split page

This example demonstrates how to split the above mentioned PDF document. Here, we will split the PDF document named sample.pdf into two different documents, sample1.pdf and sample2.pdf. Save this code in a file with name SplitPages.java.

import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.Iterator;

public class SplitPages {
   public static void main(String[] args) throws IOException {

      //Loading an existing PDF document
      File file = new File("C:/PdfBox_Examples/sample.pdf");
      PDDocument document = PDDocument.load(file);

      //Instantiating Splitter class
      Splitter splitter = new Splitter();

      //splitting the pages of a PDF document
      List<PDDocument> Pages = splitter.split(document);

      //Creating an iterator
      Iterator<PDDocument> iterator = Pages.listIterator();

      //Saving each page as an individual document
      int i = 1;
      while(iterator.hasNext()) {
         PDDocument pd = iterator.next();
         pd.save("C:/PdfBox_Examples/sample"+ i++ +".pdf");
         //Closing each split document once it has been saved
         pd.close();
      }
      System.out.println("Multiple PDF’s created");
      document.close();
   }
}

Compile and execute the saved Java file from the command prompt using the following commands.

javac SplitPages.java
java SplitPages

Upon execution, the above program splits the given PDF document into separate documents and displays the following message.

Multiple PDF’s created

If you verify the given path, you can observe that two PDF files named sample1.pdf and sample2.pdf were created, as shown below.

split first
split second

PDFBox - Merging Multiple PDF Documents

在前一章中,我们已经了解如何将给定的 PDF 文档拆分为多个文档。现在我们来了解如何合并多个 PDF 文档为一个文档。

In the previous chapter, we have seen how to split a given PDF document into multiple documents. Let us now learn how to merge multiple PDF documents as a single document.

Merging Multiple PDF Documents

You can merge multiple PDF documents into a single PDF document using the class named PDFMergerUtility. This class provides methods to merge two or more PDF documents into a single PDF document.

以下是合并多个 PDF 文档的步骤。

Following are the steps to merge multiple PDF documents.

Step 1: Instantiating the PDFMergerUtility class

按如下所示实例化合并实用程序类。

Instantiate the merge utility class as shown below.

PDFMergerUtility PDFmerger = new PDFMergerUtility();

Step 2: Setting the destination file

Set the destination file using the setDestinationFileName() method as shown below.

PDFmerger.setDestinationFileName("C:/PdfBox_Examples/data1/merged.pdf");

Step 3: Setting the source files

按如下所示使用 addSource() 方法设置源文件。

Set the source files using the addSource() method as shown below.

File file = new File("path of the document");
PDFmerger.addSource(file);
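
In PDFBox 2.x, addSource(File) is declared to throw FileNotFoundException, so a wrong path surfaces only when the source is added. Checking all sources up front can give a clearer error report. The following is a minimal sketch using only the standard library; the class SourceCheck and the method unreadable are hypothetical names introduced for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SourceCheck {

    // Returns the entries that are not readable regular files, in their original order.
    static List<File> unreadable(List<File> sources) {
        List<File> problems = new ArrayList<>();
        for (File source : sources) {
            if (!source.isFile() || !source.canRead()) {
                problems.add(source);
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for real source documents: one temporary file that exists, one path that does not.
        File present = File.createTempFile("sample1", ".pdf");
        File absent = new File(present.getParentFile(), "no-such-file.pdf");
        System.out.println("Unreadable sources: " + unreadable(Arrays.asList(present, absent)));
        present.delete();
    }
}
```

If the returned list is empty, every file can be handed to addSource() before the merge is started.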

Step 4: Merging the documents

Merge the documents using the mergeDocuments() method of the PDFMergerUtility class as shown below.

PDFmerger.mergeDocuments();

Example

Suppose we have two PDF documents, sample1.pdf and sample2.pdf, in the path C:\PdfBox_Examples\ as shown below.

image file
content file

This example demonstrates how to merge the above PDF documents. Here, we will merge the PDF documents named sample1.pdf and sample2.pdf into a single PDF document, merged.pdf. Save this code in a file with name MergePDFs.java.

import org.apache.pdfbox.multipdf.PDFMergerUtility;
import java.io.File;
import java.io.IOException;
public class MergePDFs {
   public static void main(String[] args) throws IOException {
      //Source files to be merged
      File file1 = new File("C:/PdfBox_Examples/sample1.pdf");
      File file2 = new File("C:/PdfBox_Examples/sample2.pdf");

      //Instantiating PDFMergerUtility class
      PDFMergerUtility PDFmerger = new PDFMergerUtility();

      //Setting the destination file
      PDFmerger.setDestinationFileName("C:/PdfBox_Examples/merged.pdf");

      //adding the source files
      PDFmerger.addSource(file1);
      PDFmerger.addSource(file2);

      //Merging the two documents
      PDFmerger.mergeDocuments();
      System.out.println("Documents merged");
   }
}

使用以下命令从命令提示符处编译并执行已保存的 Java 文件。

Compile and execute the saved Java file from the command prompt using the following commands.

javac MergePDFs.java
java MergePDFs

Upon execution, the above program merges the given PDF documents and displays the following message.

Documents merged

如果您验证给定的路径,您可以观察到创建了名为 merged.pdf 的 PDF 文档,其中包含源文档的所有页面,如下所示。

If you verify the given path, you can observe that a PDF document with name merged.pdf is created and this contains the pages of both the source documents as shown below.

merged

PDFBox - Converting PDF To Image

In the previous chapter, we have seen how to merge multiple PDF documents. In this chapter, we will understand how to render a page of a PDF document as an image.

Generating an Image from a PDF Document

PDFBox 库提供了一个名为 PDFRenderer 的类,该类将 PDF 文档呈现为 AWT BufferedImage。

PDFBox library provides you a class named PDFRenderer which renders a PDF document into an AWT BufferedImage.

以下是要从 PDF 文档中生成图像的步骤。

Following are the steps to generate an image from a PDF document.

Step 1: Loading an Existing PDF Document

Load an existing PDF document using the static method load() of the PDDocument class. This method accepts a File object as a parameter. Because it is static, you can invoke it using the class name, as shown below.

File file = new File("path of the document");
PDDocument document = PDDocument.load(file);

Step 2: Instantiating the PDFRenderer Class

名为 PDFRenderer 的类将 PDF 文档呈现为 AWT BufferedImage 。因此,您需要如下所示实例化此类。此类的构造函数接受一个文档对象;如以下所示传递先前步骤中创建的文档对象。

The class named PDFRenderer renders a PDF document into an AWT BufferedImage. Therefore, you need to instantiate this class as shown below. The constructor of this class accepts a document object; pass the document object created in the previous step as shown below.

PDFRenderer renderer = new PDFRenderer(document);

Step 3: Rendering Image from the PDF Document

You can render a particular page as an image using the renderImage() method of the PDFRenderer class. To this method, you need to pass the zero-based index of the page to be rendered.

BufferedImage image = renderer.renderImage(0);
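
The single-argument renderImage() renders at 72 DPI, so one PDF point becomes one pixel. PDFBox 2.x also offers renderImageWithDPI(pageIndex, dpi) for higher resolutions; check the Javadoc of the version you use. The sketch below only does the arithmetic to estimate the resulting image size. The class RenderSize is a hypothetical helper, and the US Letter page size is an assumption made for the example.

```java
final class RenderSize {

    // PDF user space uses 72 units (points) per inch by default.
    static final float POINTS_PER_INCH = 72f;

    // Converts a length in points to an approximate length in pixels at the given resolution.
    static int toPixels(float points, float dpi) {
        return Math.round(points * dpi / POINTS_PER_INCH);
    }

    public static void main(String[] args) {
        // Assumption: a US Letter page, 612 x 792 points (8.5 x 11 inches).
        System.out.println(toPixels(612f, 72f) + " x " + toPixels(792f, 72f));   // prints 612 x 792
        System.out.println(toPixels(612f, 300f) + " x " + toPixels(792f, 300f)); // prints 2550 x 3300
    }
}
```

A higher DPI gives a sharper image at the cost of more memory, since the pixel count grows with the square of the resolution.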

Step 4: Writing the Image to a File

You can write the image rendered in the previous step to a file using the write() method of the ImageIO class. To this method, you need to pass three parameters −

  1. The rendered image object.

  2. String representing the format of the image (for example, JPEG or PNG).

  3. File object to which you need to save the rendered image.

ImageIO.write(image, "JPEG", new File("C:/PdfBox_Examples/myimage.jpg"));
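
ImageIO.write() returns false, without throwing an exception, when no suitable writer is found for the requested format, so the return value is worth checking. The sketch below shows one way to do that using only the standard library. A blank BufferedImage stands in for the image returned by renderImage(), and the class ImageWriteCheck is a hypothetical name used for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

final class ImageWriteCheck {

    // Writes the image as PNG and fails loudly if ImageIO reports that nothing was written.
    static void writePng(BufferedImage image, File target) throws IOException {
        boolean written = ImageIO.write(image, "png", target);
        if (!written) {
            throw new IOException("No ImageIO writer available for format png");
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the rendered page: a blank 100 x 50 RGB image.
        BufferedImage image = new BufferedImage(100, 50, BufferedImage.TYPE_INT_RGB);
        File target = File.createTempFile("myimage", ".png");
        writePng(image, target);
        System.out.println("Wrote " + target.length() + " bytes to " + target);
    }
}
```

PNG is used here because it is lossless, which usually suits rendered text; the same check applies when writing JPEG.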

Step 5: Closing the Document

最后,使用 PDDocument 类的 close() 方法关闭文档,如下所示。

Finally, close the document using the close() method of the PDDocument class as shown below.

document.close();

Example

假设,我们在 C:\PdfBox_Examples\ 路径中有一个 PDF 文档 — sample.pdf ,其中在第一页中包含一个图像,如下所示。

Suppose, we have a PDF document — sample.pdf in the path C:\PdfBox_Examples\ and this contains an image in its first page as shown below.

sample image

This example demonstrates how to convert the above PDF document into an image file. Here, we will render the first page of the PDF document as an image and save it as myimage.jpg. Save this code as PdfToImage.java.

import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;
public class PdfToImage {

   public static void main(String args[]) throws Exception {

      //Loading an existing PDF document
      File file = new File("C:/PdfBox_Examples/sample.pdf");
      PDDocument document = PDDocument.load(file);

      //Instantiating the PDFRenderer class
      PDFRenderer renderer = new PDFRenderer(document);

      //Rendering an image from the PDF document
      BufferedImage image = renderer.renderImage(0);

      //Writing the image to a file
      ImageIO.write(image, "JPEG", new File("C:/PdfBox_Examples/myimage.jpg"));

      System.out.println("Image created");

      //Closing the document
      document.close();

   }
}

使用以下命令从命令提示符处编译并执行已保存的 Java 文件。

Compile and execute the saved Java file from the command prompt using the following commands.

javac PdfToImage.java
java PdfToImage

Upon execution, the above program renders the first page of the given PDF document as an image and displays the following message.

Image created

如果验证给定路径,则可以观察到图像已生成并保存为 myimage.jpg ,如下所示。

If you verify the given path, you can observe that the image is generated and saved as myimage.jpg as shown below.

generateimage

PDFBox - Adding Rectangles

本章将教您如何在 PDF 文档页面中创建彩色方框。

This chapter teaches you how to create color boxes in a page of a PDF document.

Creating Boxes in a PDF Document

可以使用 PDPageContentStream 类的 addRect() 方法在 PDF 页面中添加矩形方框。

You can add rectangular boxes in a PDF page using the addRect() method of the PDPageContentStream class.

以下是如何在 PDF 文档页面中创建矩形形状的步骤。

Following are the steps to create rectangular shapes in a page of a PDF document.

Step 1: Loading an Existing PDF Document

Load an existing PDF document using the static method load() of the PDDocument class. This method accepts a File object as a parameter. Because it is static, you can invoke it using the class name, as shown below.

File file = new File("path of the document");
PDDocument document = PDDocument.load(file);

Step 2: Getting the Page Object

您需要使用 PDDocument 类的 getPage() 方法检索您想在其中添加矩形的所需页面的 PDPage 对象。对该方法,您需要传递您想在其中添加矩形的页面的索引。

You need to retrieve the PDPage object of the required page where you want to add rectangles using the getPage() method of the PDDocument class. To this method you need to pass the index of the page where you want to add rectangles.

PDPage page = document.getPage(0);

Step 3: Preparing the Content Stream

You can insert various kinds of data elements using an object of the class named PDPageContentStream. The constructor of this class takes the document object and the page object. Instantiate the class by passing the two objects created in the previous steps, as shown below.

PDPageContentStream contentStream = new PDPageContentStream(document, page);

Step 4: Setting the Non-stroking Color

可以使用 PDPageContentStream 类的 setNonStrokingColor() 方法向矩形设置非描边颜色。对该方法,您需要将所需的颜色作为参数传递,如下所示。

You can set the non-stroking color to the rectangle using the setNonStrokingColor() method of the class PDPageContentStream. To this method, you need to pass the required color as a parameter as shown below.

contentStream.setNonStrokingColor(Color.DARK_GRAY);

Step 5: Drawing the rectangle

Draw the rectangle using the addRect() method. To this method, you need to pass the x and y coordinates of the rectangle's lower-left corner, followed by its width and height, as shown below.

contentStream.addRect(200, 650, 100, 100);
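
In the default PDF coordinate system the origin is the lower-left corner of the page and y grows upward, which is why a y value of 650 places the box near the top. If you prefer to think in distances from the top edge, the conversion is a single subtraction. The sketch below shows it; the class PageCoordinates is a hypothetical helper, and the page height of 792 points (US Letter) is an assumption. With PDFBox, the actual height can be read from page.getMediaBox().getHeight().

```java
final class PageCoordinates {

    // Converts a distance measured down from the top edge of the page into the
    // y coordinate of the rectangle's lower-left corner.
    static float lowerLeftY(float pageHeight, float top, float rectHeight) {
        return pageHeight - top - rectHeight;
    }

    public static void main(String[] args) {
        float pageHeight = 792f; // assumption: a US Letter page, 792 points tall
        // A box 100 points high whose top edge sits 42 points below the top of the page.
        System.out.println(lowerLeftY(pageHeight, 42f, 100f)); // prints 650.0
    }
}
```

Under that assumption, the result is the y value passed to addRect() in the step above.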

Step 6: Filling the Rectangle

The fill() method of the PDPageContentStream class fills the current path, here the rectangle added in the previous step, with the non-stroking color, as shown below.

contentStream.fill();

Step 7: Closing the Document

Finally, close the document using the close() method of the PDDocument class as shown below.

document.close();

Example

假设我们在路径 C:\PdfBox_Examples\ 中有一个名为 blankpage.pdf 的 PDF 文档,其中包含一个空白页,如下所示。

Suppose we have a PDF document named blankpage.pdf in the path C:\PdfBox_Examples\ and this contains a single blank page as shown below.

blankpage

此示例演示如何在 PDF 文档中创建/插入矩形。在此,我们将在空白 PDF 中创建一个方框。将此代码保存为 AddRectangles.java

This example demonstrates how to create/insert rectangles in a PDF document. Here, we will create a box in a Blank PDF. Save this code as AddRectangles.java.

import java.awt.Color;
import java.io.File;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
public class AddRectangles {

   public static void main(String args[]) throws Exception {

      //Loading an existing document
      File file = new File("C:/PdfBox_Examples/BlankPage.pdf");
      PDDocument document = PDDocument.load(file);

      //Retrieving a page of the PDF Document
      PDPage page = document.getPage(0);

      //Instantiating the PDPageContentStream class
      PDPageContentStream contentStream = new PDPageContentStream(document, page);

      //Setting the non stroking color
      contentStream.setNonStrokingColor(Color.DARK_GRAY);

      //Drawing a rectangle
      contentStream.addRect(200, 650, 100, 100);

      //Filling the rectangle with the non-stroking color
      contentStream.fill();

      System.out.println("rectangle added");

      //Closing the ContentStream object
      contentStream.close();

      //Saving the document
      File file1 = new File("C:/PdfBox_Examples/colorbox.pdf");
      document.save(file1);

      //Closing the document
      document.close();
   }
}

使用以下命令从命令提示符处编译并执行已保存的 Java 文件。

Compile and execute the saved Java file from the command prompt using the following commands.

javac AddRectangles.java
java AddRectangles

Upon execution, the above program creates a rectangle in a PDF document and displays the following message.

rectangle added

如果您验证给定的路径并打开已保存的文档 — colorbox.pdf ,您可以观察到其中插入了一个框,如下所示。

If you verify the given path and open the saved document — colorbox.pdf, you can observe that a box is inserted in it as shown below.

coloredbox