Lucene - Ứng dụng đầu tiên

Trong chương này, chúng ta sẽ học lập trình thực tế với Lucene Framework. Trước khi bắt đầu viết ví dụ đầu tiên bằng cách sử dụng khung công tác Lucene, bạn phải đảm bảo rằng bạn đã thiết lập môi trường Lucene đúng cách như đã giải thích trong hướng dẫn Thiết lập Môi trường - Lucene . Bạn nên có kiến thức về Eclipse IDE.

Bây giờ chúng ta hãy tiếp tục bằng cách viết một Ứng dụng Tìm kiếm đơn giản sẽ in ra số lượng kết quả tìm kiếm được tìm thấy. Chúng tôi cũng sẽ thấy danh sách các chỉ mục được tạo trong quá trình này.

Bước 1 - Tạo dự án Java

Bước đầu tiên là tạo một Dự án Java đơn giản bằng Eclipse IDE. Làm theo tùy chọnFile > New -> Project và cuối cùng chọn Java Projectwizard từ danh sách wizard. Bây giờ đặt tên cho dự án của bạn làLuceneFirstApplication bằng cách sử dụng cửa sổ thuật sĩ như sau:

Khi dự án của bạn được tạo thành công, bạn sẽ có nội dung sau trong Project Explorer -

Bước 2 - Thêm Thư viện Bắt buộc

Bây giờ chúng ta hãy thêm thư viện Lucene core Framework trong dự án của mình. Để làm điều này, hãy nhấp chuột phải vào tên dự án của bạnLuceneFirstApplication và sau đó làm theo tùy chọn sau có sẵn trong menu ngữ cảnh: Build Path -> Configure Build Path để hiển thị cửa sổ Java Build Path như sau:

Bây giờ sử dụng Add External JARs nút có sẵn dưới Libraries để thêm JAR cốt lõi sau từ thư mục cài đặt Lucene -

lucene-core-3.6.2

Bước 3 - Tạo tệp nguồn

Bây giờ chúng ta hãy tạo các tệp nguồn thực tế trong LuceneFirstApplicationdự án. Đầu tiên, chúng ta cần tạo một gói có têncom.tutorialspoint.lucene. Để làm điều này, nhấp chuột phải vào src trong phần trình khám phá gói và làm theo tùy chọn: New -> Package.

Tiếp theo chúng ta sẽ tạo LuceneTester.java và các lớp java khác trong com.tutorialspoint.lucene gói hàng.

LuceneConstants.java

Lớp này được sử dụng để cung cấp các hằng số khác nhau được sử dụng trong ứng dụng mẫu.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

TextFileFilter.java

Lớp này được sử dụng như một .txt file bộ lọc.

package com.tutorialspoint.lucene;

import java.io.File;
import java.io.FileFilter;

public class TextFileFilter implements FileFilter {

   @Override
   public boolean accept(File pathname) {
      return pathname.getName().toLowerCase().endsWith(".txt");
   }
}

Indexer.java

Lớp này được sử dụng để lập chỉ mục dữ liệu thô để chúng ta có thể tìm kiếm nó bằng thư viện Lucene.

package com.tutorialspoint.lucene;

import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {

   private IndexWriter writer;

   public Indexer(String indexDirectoryPath) throws IOException {
      //this directory will contain the indexes
      Directory indexDirectory = 
         FSDirectory.open(new File(indexDirectoryPath));

      //create the indexer
      writer = new IndexWriter(indexDirectory, 
         new StandardAnalyzer(Version.LUCENE_36),true, 
         IndexWriter.MaxFieldLength.UNLIMITED);
   }

   public void close() throws CorruptIndexException, IOException {
      writer.close();
   }

   private Document getDocument(File file) throws IOException {
      Document document = new Document();

      //index file contents
      Field contentField = new Field(LuceneConstants.CONTENTS, new FileReader(file));
      //index file name
      Field fileNameField = new Field(LuceneConstants.FILE_NAME,
         file.getName(),Field.Store.YES,Field.Index.NOT_ANALYZED);
      //index file path
      Field filePathField = new Field(LuceneConstants.FILE_PATH,
         file.getCanonicalPath(),Field.Store.YES,Field.Index.NOT_ANALYZED);

      document.add(contentField);
      document.add(fileNameField);
      document.add(filePathField);

      return document;
   }   

   private void indexFile(File file) throws IOException {
      System.out.println("Indexing "+file.getCanonicalPath());
      Document document = getDocument(file);
      writer.addDocument(document);
   }

   public int createIndex(String dataDirPath, FileFilter filter) 
      throws IOException {
      //get all files in the data directory
      File[] files = new File(dataDirPath).listFiles();

      for (File file : files) {
         if(!file.isDirectory()
            && !file.isHidden()
            && file.exists()
            && file.canRead()
            && filter.accept(file)
         ){
            indexFile(file);
         }
      }
      return writer.numDocs();
   }
}

Searcher.java

Lớp này dùng để tìm kiếm các chỉ mục do Indexer tạo ra để tìm kiếm nội dung được yêu cầu.

package com.tutorialspoint.lucene;

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Searcher {
	
   IndexSearcher indexSearcher;
   QueryParser queryParser;
   Query query;
   
   public Searcher(String indexDirectoryPath) 
      throws IOException {
      Directory indexDirectory = 
         FSDirectory.open(new File(indexDirectoryPath));
      indexSearcher = new IndexSearcher(indexDirectory);
      queryParser = new QueryParser(Version.LUCENE_36,
         LuceneConstants.CONTENTS,
         new StandardAnalyzer(Version.LUCENE_36));
   }
   
   public TopDocs search( String searchQuery) 
      throws IOException, ParseException {
      query = queryParser.parse(searchQuery);
      return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
   }

   public Document getDocument(ScoreDoc scoreDoc) 
      throws CorruptIndexException, IOException {
      return indexSearcher.doc(scoreDoc.doc);	
   }

   public void close() throws IOException {
      indexSearcher.close();
   }
}

LuceneTester.java

Lớp này được sử dụng để kiểm tra khả năng lập chỉ mục và tìm kiếm của thư viện lucene.

package com.tutorialspoint.lucene;

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class LuceneTester {
	
   String indexDir = "E:\\Lucene\\Index";
   String dataDir = "E:\\Lucene\\Data";
   Indexer indexer;
   Searcher searcher;

   public static void main(String[] args) {
      LuceneTester tester;
      try {
         tester = new LuceneTester();
         tester.createIndex();
         tester.search("Mohan");
      } catch (IOException e) {
         e.printStackTrace();
      } catch (ParseException e) {
         e.printStackTrace();
      }
   }

   private void createIndex() throws IOException {
      indexer = new Indexer(indexDir);
      int numIndexed;
      long startTime = System.currentTimeMillis();	
      numIndexed = indexer.createIndex(dataDir, new TextFileFilter());
      long endTime = System.currentTimeMillis();
      indexer.close();
      System.out.println(numIndexed+" File indexed, time taken: "
         +(endTime-startTime)+" ms");		
   }

   private void search(String searchQuery) throws IOException, ParseException {
      searcher = new Searcher(indexDir);
      long startTime = System.currentTimeMillis();
      TopDocs hits = searcher.search(searchQuery);
      long endTime = System.currentTimeMillis();
   
      System.out.println(hits.totalHits +
         " documents found. Time :" + (endTime - startTime));
      for(ScoreDoc scoreDoc : hits.scoreDocs) {
         Document doc = searcher.getDocument(scoreDoc);
            System.out.println("File: "
            + doc.get(LuceneConstants.FILE_PATH));
      }
      searcher.close();
   }
}

Bước 4 - Tạo thư mục Data & Index

Chúng tôi đã sử dụng 10 tệp văn bản từ record1.txt sang record10.txt chứa tên và các thông tin chi tiết khác của học sinh và đưa chúng vào thư mục E:\Lucene\Data. Dữ liệu thử nghiệm . Một đường dẫn thư mục chỉ mục nên được tạo nhưE:\Lucene\Index. Sau khi chạy chương trình này, bạn có thể xem danh sách các tệp chỉ mục được tạo trong thư mục đó.

Bước 5 - Chạy chương trình

Khi bạn đã hoàn tất việc tạo nguồn, dữ liệu thô, thư mục dữ liệu và thư mục chỉ mục, bạn đã sẵn sàng cho việc biên dịch và chạy chương trình của mình. Để làm điều này, hãy giữLuceneTester.Java tab tệp đang hoạt động và sử dụng Run tùy chọn có sẵn trong IDE Eclipse hoặc sử dụng Ctrl + F11 để biên dịch và chạy LuceneTesterứng dụng. Nếu ứng dụng chạy thành công, nó sẽ in thông báo sau trong bảng điều khiển của Eclipse IDE:

Indexing E:\Lucene\Data\record1.txt
Indexing E:\Lucene\Data\record10.txt
Indexing E:\Lucene\Data\record2.txt
Indexing E:\Lucene\Data\record3.txt
Indexing E:\Lucene\Data\record4.txt
Indexing E:\Lucene\Data\record5.txt
Indexing E:\Lucene\Data\record6.txt
Indexing E:\Lucene\Data\record7.txt
Indexing E:\Lucene\Data\record8.txt
Indexing E:\Lucene\Data\record9.txt
10 File indexed, time taken: 109 ms
1 documents found. Time :0
File: E:\Lucene\Data\record4.txt

Khi bạn đã chạy chương trình thành công, bạn sẽ có nội dung sau trong index directory -