Lucene Concise Tutorial
Lucene - Search Operation
Searching is one of the core functionalities provided by Lucene. The following diagram illustrates the process and its use. IndexSearcher is one of the core components of the searching process.
We first create one or more Directory objects containing the indexes and pass them to an IndexSearcher, which opens each Directory using an IndexReader. We then create a Query from a Term and run the search by passing the Query to the IndexSearcher. The IndexSearcher returns a TopDocs object, which contains the search details along with the IDs of the documents that match the search.
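The flow described above (look up a term, get back document IDs, then fetch the documents by ID) can be sketched with a toy in-memory index. This is only a conceptual model and not Lucene's implementation or API: the class ToyIndex and its methods are hypothetical names introduced for illustration, and the sketch does no scoring, whereas Lucene ranks the hits it returns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of "term -> document IDs -> documents".
class ToyIndex {
    private final List<String> documents = new ArrayList<String>();
    private final Map<String, List<Integer>> postings =
        new HashMap<String, List<Integer>>();

    // Stores the text and returns its document ID (its position in the list).
    int add(String text) {
        int docId = documents.size();
        documents.add(text);
        // Lower-case the text and split it on non-word characters.
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            List<Integer> ids = postings.get(token);
            if (ids == null) {
                ids = new ArrayList<Integer>();
                postings.put(token, ids);
            }
            if (!ids.contains(docId)) {
                ids.add(docId);
            }
        }
        return docId;
    }

    // Returns at most maxHits IDs of documents that contain the term.
    List<Integer> search(String term, int maxHits) {
        List<Integer> ids = postings.get(term.toLowerCase());
        if (ids == null) {
            return Collections.emptyList();
        }
        return new ArrayList<Integer>(ids.subList(0, Math.min(maxHits, ids.size())));
    }

    // Fetches the stored text for a document ID returned by search().
    String document(int docId) {
        return documents.get(docId);
    }
}
```

In this model, search() plays the role of IndexSearcher.search(), the returned ID list stands in for TopDocs, and document() corresponds to looking a Document up by its ID.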
We will now walk through the searching process step by step, using a basic example.
Create a QueryParser
The QueryParser class parses user input into a query in a format that Lucene understands. Follow these steps to create a QueryParser −
Step 1 − Create a QueryParser object.
Step 2 − Initialize the QueryParser object with the Lucene version, the name of the field that queries should search by default, and a StandardAnalyzer created for the same version.
QueryParser queryParser;

public Searcher(String indexDirectoryPath) throws IOException {
    // Parse queries against the CONTENTS field by default.
    queryParser = new QueryParser(Version.LUCENE_36,
        LuceneConstants.CONTENTS,
        new StandardAnalyzer(Version.LUCENE_36));
}
Create an IndexSearcher
The IndexSearcher class is the core component that searches the indexes created during the indexing process. Follow these steps to create an IndexSearcher −
Step 1 − Create an IndexSearcher object.
Step 2 − Create a Lucene Directory that points to the location where the indexes are stored.
Step 3 − Initialize the IndexSearcher object with the index directory.
IndexSearcher indexSearcher;

public Searcher(String indexDirectoryPath) throws IOException {
    // Open the directory that holds the index files.
    Directory indexDirectory =
        FSDirectory.open(new File(indexDirectoryPath));
    indexSearcher = new IndexSearcher(indexDirectory);
}
Make a Search
Follow these steps to make a search −
Step 1 − Create a Query object by parsing the search expression with the QueryParser.
Step 2 − Run the search by calling the IndexSearcher.search() method.
Query query;

public TopDocs search(String searchQuery) throws IOException, ParseException {
    query = queryParser.parse(searchQuery);
    // Return at most MAX_SEARCH hits.
    return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
}
Get the Document
The following method shows how to get the document for a search hit.
public Document getDocument(ScoreDoc scoreDoc)
    throws CorruptIndexException, IOException {
    // scoreDoc.doc is the ID of the matching document.
    return indexSearcher.doc(scoreDoc.doc);
}
Close IndexSearcher
The following method shows how to close the IndexSearcher.
public void close() throws IOException {
    indexSearcher.close();
}
Example Application
Let us create a test Lucene application to test the searching process.
| Step | Description |
|------|-------------|
| 1 | Create a project with a name LuceneFirstApplication under a package com.tutorialspoint.lucene as explained in the Lucene - First Application chapter. You can also use the project created in Lucene - First Application chapter as such for this chapter to understand the searching process. |
| 2 | Create LuceneConstants.java, TextFileFilter.java and Searcher.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged. |
| 3 | Create LuceneTester.java as mentioned below. |
| 4 | Clean and Build the application to make sure business logic is working as per the requirements. |
LuceneConstants.java
This class provides the constants used across the sample application.
package com.tutorialspoint.lucene;

public class LuceneConstants {
    public static final String CONTENTS = "contents";
    public static final String FILE_NAME = "filename";
    public static final String FILE_PATH = "filepath";
    public static final int MAX_SEARCH = 10;
}
TextFileFilter.java
This class is used as a .txt file filter.
package com.tutorialspoint.lucene;

import java.io.File;
import java.io.FileFilter;

public class TextFileFilter implements FileFilter {
    @Override
    public boolean accept(File pathname) {
        // Match the .txt extension regardless of case.
        return pathname.getName().toLowerCase().endsWith(".txt");
    }
}
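As a minimal sketch of how such a filter can be applied, the snippet below passes it to File.listFiles() to pick out the text files in a directory. The filter is repeated without its package declaration so the sketch compiles on its own; TextFileFilterDemo and listTextFiles are hypothetical names that are not part of the tutorial's project, and the temporary directory stands in for the data directory.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Copy of the filter above so that this sketch is self-contained.
class TextFileFilter implements FileFilter {
    @Override
    public boolean accept(File pathname) {
        return pathname.getName().toLowerCase().endsWith(".txt");
    }
}

// Hypothetical demo class, not part of the tutorial's project.
class TextFileFilterDemo {
    // Returns the sorted names of the entries in dir that the filter accepts.
    static List<String> listTextFiles(File dir) {
        File[] matches = dir.listFiles(new TextFileFilter());
        List<String> names = new ArrayList<String>();
        if (matches != null) { // listFiles() returns null if dir is not a directory
            for (File file : matches) {
                names.add(file.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the data directory.
        Path dir = Files.createTempDirectory("lucene-data");
        Files.createFile(dir.resolve("record1.txt"));
        Files.createFile(dir.resolve("RECORD2.TXT"));
        Files.createFile(dir.resolve("notes.md"));
        System.out.println(listTextFiles(dir.toFile()));
    }
}
```

Because the filter checks only the name, a directory whose name ends in .txt would also be accepted. Adding a pathname.isFile() check is one way to rule that out if it matters.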
Searcher.java
This class reads the indexes built from the raw data and searches them using the Lucene library.
package com.tutorialspoint.lucene;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Searcher {
    IndexSearcher indexSearcher;
    QueryParser queryParser;
    Query query;

    public Searcher(String indexDirectoryPath) throws IOException {
        // Open the directory that holds the index files.
        Directory indexDirectory =
            FSDirectory.open(new File(indexDirectoryPath));
        indexSearcher = new IndexSearcher(indexDirectory);
        // Parse queries against the CONTENTS field by default.
        queryParser = new QueryParser(Version.LUCENE_36,
            LuceneConstants.CONTENTS,
            new StandardAnalyzer(Version.LUCENE_36));
    }

    public TopDocs search(String searchQuery)
        throws IOException, ParseException {
        query = queryParser.parse(searchQuery);
        // Return at most MAX_SEARCH hits.
        return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
    }

    public Document getDocument(ScoreDoc scoreDoc)
        throws CorruptIndexException, IOException {
        // scoreDoc.doc is the ID of the matching document.
        return indexSearcher.doc(scoreDoc.doc);
    }

    public void close() throws IOException {
        indexSearcher.close();
    }
}
LuceneTester.java
This class is used to test the searching capability of the Lucene library.
package com.tutorialspoint.lucene;

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class LuceneTester {
    String indexDir = "E:\\Lucene\\Index";
    String dataDir = "E:\\Lucene\\Data";
    Searcher searcher;

    public static void main(String[] args) {
        LuceneTester tester;
        try {
            tester = new LuceneTester();
            tester.search("Mohan");
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        }
    }

    private void search(String searchQuery) throws IOException, ParseException {
        searcher = new Searcher(indexDir);
        try {
            long startTime = System.currentTimeMillis();
            TopDocs hits = searcher.search(searchQuery);
            long endTime = System.currentTimeMillis();

            System.out.println(hits.totalHits +
                " documents found. Time :" + (endTime - startTime) + " ms");
            // Print the stored file path of every matching document.
            for (ScoreDoc scoreDoc : hits.scoreDocs) {
                Document doc = searcher.getDocument(scoreDoc);
                System.out.println("File: " + doc.get(LuceneConstants.FILE_PATH));
            }
        } finally {
            // Close the searcher even if the search fails.
            searcher.close();
        }
    }
}
Data & Index Directory Creation
We have used 10 text files named record1.txt to record10.txt, containing the names and other details of students, and placed them in the directory E:\Lucene\Data. The index directory should be created at E:\Lucene\Index. After running the indexing program from the chapter Lucene - Indexing Process, you can see the list of index files created in that folder.
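If you do not have the test files at hand, the following sketch shows one way to create the folder layout described above. It is an illustration under stated assumptions: the class TestDataSetup and its methods are hypothetical, the base folder is passed as an argument rather than hard-coded to E:\Lucene, and the file contents are placeholders, since the original student records are not reproduced in this chapter.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the tutorial's project.
class TestDataSetup {
    // File names used in this chapter: record1.txt to record10.txt.
    static List<String> recordFileNames() {
        List<String> names = new ArrayList<String>();
        for (int i = 1; i <= 10; i++) {
            names.add("record" + i + ".txt");
        }
        return names;
    }

    // Creates <base>/Data with placeholder files and an empty <base>/Index.
    static void create(Path base) throws IOException {
        Path dataDir = Files.createDirectories(base.resolve("Data"));
        Files.createDirectories(base.resolve("Index"));
        for (String name : recordFileNames()) {
            // Placeholder text only; replace it with real student details.
            String text = "placeholder student details for " + name + "\n";
            Files.write(dataDir.resolve(name), text.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the base folder as the first argument, for example E:\Lucene.
        Path base = Paths.get(args.length > 0 ? args[0] : "Lucene");
        create(base);
        System.out.println("Created test data under " + base.toAbsolutePath());
    }
}
```

With placeholder contents like these, the sample search for Mohan finds nothing. It only returns a hit once at least one file actually contains that name.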
Running the Program
Once you have created the source, the raw data, the data directory, the index directory and the indexes, you can compile and run your program. To do this, keep the LuceneTester.java file tab active and either use the Run option available in the Eclipse IDE or press Ctrl + F11 to compile and run your LuceneTester application. If your application runs successfully, it will print the following message in the Eclipse IDE's console −
1 documents found. Time :29 ms
File: E:\Lucene\Data\record4.txt