Lucene - Sắp xếp

Trong chương này, chúng ta sẽ xem xét các thứ tự sắp xếp trong đó Lucene đưa ra kết quả tìm kiếm theo mặc định hoặc có thể được thao tác theo yêu cầu.

Sắp xếp theo mức độ liên quan

Đây là chế độ sắp xếp mặc định được Lucene sử dụng. Lucene cung cấp kết quả theo lượt truy cập có liên quan nhất ở trên cùng.

private void sortUsingRelevance(String searchQuery)
   throws IOException, ParseException {
   searcher = new Searcher(indexDir);
   long startTime = System.currentTimeMillis();
   
   //create a term to search file name
   Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
   //create the term query object
   Query query = new FuzzyQuery(term);
   searcher.setDefaultFieldSortScoring(true, false);
   //do the search
   TopDocs hits = searcher.search(query,Sort.RELEVANCE);
   long endTime = System.currentTimeMillis();

   System.out.println(hits.totalHits +
      " documents found. Time :" + (endTime - startTime) + "ms");
   for(ScoreDoc scoreDoc : hits.scoreDocs) {
      Document doc = searcher.getDocument(scoreDoc);
      System.out.print("Score: "+ scoreDoc.score + " ");
      System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH));
   }
   searcher.close();
}

Sắp xếp theo IndexOrder

Chế độ sắp xếp này được sử dụng bởi Lucene. Tại đây, tài liệu đầu tiên được lập chỉ mục được hiển thị đầu tiên trong kết quả tìm kiếm.

private void sortUsingIndex(String searchQuery)
   throws IOException, ParseException {
   searcher = new Searcher(indexDir);
   long startTime = System.currentTimeMillis();
   
   //create a term to search file name
   Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
   //create the term query object
   Query query = new FuzzyQuery(term);
   searcher.setDefaultFieldSortScoring(true, false);
   //do the search
   TopDocs hits = searcher.search(query,Sort.INDEXORDER);
   long endTime = System.currentTimeMillis();

   System.out.println(hits.totalHits +
      " documents found. Time :" + (endTime - startTime) + "ms");
   for(ScoreDoc scoreDoc : hits.scoreDocs) {
      Document doc = searcher.getDocument(scoreDoc);
      System.out.print("Score: "+ scoreDoc.score + " ");
      System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH));
   }
   searcher.close();
}

Ứng dụng mẫu

Hãy để chúng tôi tạo một ứng dụng Lucene thử nghiệm để kiểm tra quá trình sắp xếp.

Bươc	Sự miêu tả
1	Tạo một dự án với tên LuceneFirstApplication dưới một gói com.tutorialspoint.lucene như đã giải thích trong chương Lucene - Ứng dụng đầu tiên . Bạn cũng có thể sử dụng dự án được tạo trong chương Lucene - First Application như vậy cho chương này để hiểu quá trình tìm kiếm.
2	Tạo LuceneConstants.java và Searcher.java như đã giải thích trong chương Lucene - Ứng dụng đầu tiên . Giữ phần còn lại của các tệp không thay đổi.
3	Tạo LuceneTester.java như được đề cập bên dưới.
4	Làm sạch và xây dựng ứng dụng để đảm bảo logic nghiệp vụ đang hoạt động theo yêu cầu.

LuceneConstants.java

Lớp này được sử dụng để cung cấp các hằng số khác nhau được sử dụng trong ứng dụng mẫu.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

Searcher.java

Lớp này được sử dụng để đọc các chỉ mục được tạo trên dữ liệu thô và tìm kiếm dữ liệu bằng thư viện Lucene.

package com.tutorialspoint.lucene;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Searcher {
	
IndexSearcher indexSearcher;
   QueryParser queryParser;
   Query query;

   public Searcher(String indexDirectoryPath) throws IOException {
      Directory indexDirectory 
         = FSDirectory.open(new File(indexDirectoryPath));
      indexSearcher = new IndexSearcher(indexDirectory);
      queryParser = new QueryParser(Version.LUCENE_36,
         LuceneConstants.CONTENTS,
         new StandardAnalyzer(Version.LUCENE_36));
   }

   public TopDocs search( String searchQuery) 
      throws IOException, ParseException {
      query = queryParser.parse(searchQuery);
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public TopDocs search(Query query) 
      throws IOException, ParseException {
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public TopDocs search(Query query,Sort sort) 
      throws IOException, ParseException {
      return indexSearcher.search(query, 
         LuceneConstants.MAX_SEARCH,sort);
   }

   public void setDefaultFieldSortScoring(boolean doTrackScores, 
      boolean doMaxScores) {
      indexSearcher.setDefaultFieldSortScoring(
         doTrackScores,doMaxScores);
   }

   public Document getDocument(ScoreDoc scoreDoc) 
      throws CorruptIndexException, IOException {
      return indexSearcher.doc(scoreDoc.doc);	
   }

   public void close() throws IOException {
      indexSearcher.close();
   }
}

LuceneTester.java

Lớp này dùng để kiểm tra khả năng tìm kiếm của thư viện Lucene.

package com.tutorialspoint.lucene;

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopDocs;

public class LuceneTester {
	
   String indexDir = "E:\\Lucene\\Index";
   String dataDir = "E:\\Lucene\\Data";
   Indexer indexer;
   Searcher searcher;

   public static void main(String[] args) {
      LuceneTester tester;
      try {
          tester = new LuceneTester();
          tester.sortUsingRelevance("cord3.txt");
          tester.sortUsingIndex("cord3.txt");
      } catch (IOException e) {
          e.printStackTrace();
      } catch (ParseException e) {
          e.printStackTrace();
      }		
   }

   private void sortUsingRelevance(String searchQuery)
      throws IOException, ParseException {
      searcher = new Searcher(indexDir);
      long startTime = System.currentTimeMillis();
      
      //create a term to search file name
      Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
      //create the term query object
      Query query = new FuzzyQuery(term);
      searcher.setDefaultFieldSortScoring(true, false);
      //do the search
      TopDocs hits = searcher.search(query,Sort.RELEVANCE);
      long endTime = System.currentTimeMillis();

      System.out.println(hits.totalHits +
         " documents found. Time :" + (endTime - startTime) + "ms");
      for(ScoreDoc scoreDoc : hits.scoreDocs) {
         Document doc = searcher.getDocument(scoreDoc);
         System.out.print("Score: "+ scoreDoc.score + " ");
         System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH));
      }
      searcher.close();
   }

   private void sortUsingIndex(String searchQuery)
      throws IOException, ParseException {
      searcher = new Searcher(indexDir);
      long startTime = System.currentTimeMillis();
      //create a term to search file name
      Term term = new Term(LuceneConstants.FILE_NAME, searchQuery);
      //create the term query object
      Query query = new FuzzyQuery(term);
      searcher.setDefaultFieldSortScoring(true, false);
      //do the search
      TopDocs hits = searcher.search(query,Sort.INDEXORDER);
      long endTime = System.currentTimeMillis();

      System.out.println(hits.totalHits +
      " documents found. Time :" + (endTime - startTime) + "ms");
      for(ScoreDoc scoreDoc : hits.scoreDocs) {
         Document doc = searcher.getDocument(scoreDoc);
         System.out.print("Score: "+ scoreDoc.score + " ");
         System.out.println("File: "+ doc.get(LuceneConstants.FILE_PATH));
      }
      searcher.close();
   }
}

Tạo thư mục dữ liệu & chỉ mục

Chúng tôi đã sử dụng 10 tệp văn bản từ record1.txt sang record10.txt chứa tên và các thông tin chi tiết khác của học sinh và đưa chúng vào thư mục E:\Lucene\Data. Dữ liệu thử nghiệm . Đường dẫn thư mục chỉ mục nên được tạo dưới dạng E: \ Lucene \ Index. Sau khi chạy chương trình lập chỉ mục trong chươngLucene - Indexing Process, bạn có thể xem danh sách các tệp chỉ mục được tạo trong thư mục đó.

Chạy chương trình

Khi bạn đã hoàn tất việc tạo nguồn, dữ liệu thô, thư mục dữ liệu, thư mục chỉ mục và các chỉ mục, bạn có thể biên dịch và chạy chương trình của mình. Để làm điều này, hãy giữLuceneTester.Java tab tệp đang hoạt động và sử dụng tùy chọn Chạy có sẵn trong IDE Eclipse hoặc sử dụng Ctrl + F11 để biên dịch và chạy LuceneTesterứng dụng. Nếu ứng dụng của bạn chạy thành công, nó sẽ in thông báo sau trong bảng điều khiển của Eclipse IDE:

10 documents found. Time :31ms
Score: 1.3179655 File: E:\Lucene\Data\record3.txt
Score: 0.790779 File: E:\Lucene\Data\record1.txt
Score: 0.790779 File: E:\Lucene\Data\record2.txt
Score: 0.790779 File: E:\Lucene\Data\record4.txt
Score: 0.790779 File: E:\Lucene\Data\record5.txt
Score: 0.790779 File: E:\Lucene\Data\record6.txt
Score: 0.790779 File: E:\Lucene\Data\record7.txt
Score: 0.790779 File: E:\Lucene\Data\record8.txt
Score: 0.790779 File: E:\Lucene\Data\record9.txt
Score: 0.2635932 File: E:\Lucene\Data\record10.txt
10 documents found. Time :0ms
Score: 0.790779 File: E:\Lucene\Data\record1.txt
Score: 0.2635932 File: E:\Lucene\Data\record10.txt
Score: 0.790779 File: E:\Lucene\Data\record2.txt
Score: 1.3179655 File: E:\Lucene\Data\record3.txt
Score: 0.790779 File: E:\Lucene\Data\record4.txt
Score: 0.790779 File: E:\Lucene\Data\record5.txt
Score: 0.790779 File: E:\Lucene\Data\record6.txt
Score: 0.790779 File: E:\Lucene\Data\record7.txt
Score: 0.790779 File: E:\Lucene\Data\record8.txt
Score: 0.790779 File: E:\Lucene\Data\record9.txt