Tika Tutorial

This tutorial provides a basic understanding of the Apache Tika library, the file formats it supports, and how to extract content and metadata with Apache Tika.

Audience

This tutorial is designed for Java enthusiasts who want to learn document type detection and content extraction with Apache Tika.

Prerequisites

To make the most of this tutorial, readers should have prior experience with Java programming (JDK 1.6 or later) and with Java IO concepts.