Lucene complete project resource - CSDN Library

14 files in total

jar: 4

java: 2

class: 2

Uploaded 2013-03-08 10:06:29 · 1.13 MB ZIP

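**Complete Lucene Project Explained**

Lucene is an open-source full-text search library developed under the Apache Software Foundation. It provides an extensible, high-performance search framework that lets developers add full-text retrieval to their own applications. The archive titled "lucence complete project" contains a small example project for learning Lucene, covering its basic concepts, core components, and practical use.

**1. Lucene basics**

Lucene's core concepts are the index (Index), the document (Document), the field (Field), and the query (Query). During indexing, each document is broken into fields, and an inverted index is built for every field that is marked as indexed. An inverted index maps each term to the documents that contain it, which is what makes Lucene searches fast.

**2. Documents and fields**

In Lucene, a document consists of several fields, each representing one aspect of the data. A news article, for example, might have title, author, and content fields. Different fields can be analyzed by different analyzers (Analyzer), for instance through PerFieldAnalyzerWrapper, to control tokenization and other preprocessing.

**3. Analyzers**

The analyzer is a key Lucene component: it turns raw text into the terms (Term) that are written to the index. Analysis usually involves tokenization, stop-word removal, and sometimes stemming or lemmatization. The sample code in this project uses StandardAnalyzer; a different or custom analyzer can be substituted to suit other kinds of text.

**4. Indexing and searching**

Building the index is the first step. The IndexWriter class adds documents to the index and can also update or delete existing ones. Searching usually starts with QueryParser, which converts the user's query string into a Lucene Query object (queries such as TermQuery or BooleanQuery can also be constructed directly). An IndexSearcher then executes the query and returns the matching documents.

**5. Example project structure**

The file name "lucene0210_02" alone says nothing about the contents, but a complete Lucene project typically includes:

- **Source code**: Java files showing how to create an index, add documents, and search.
- **Configuration files**: possibly settings for the Analyzer or other components.
- **Data files**: the text files to be indexed.
- **Index directory**: the folder that stores the Lucene index.
- **Test code**: JUnit tests that verify indexing and searching.

**6. Learning and practice**

For beginners, this project is a useful learning resource. Reading the code shows Lucene's basic workflow: creating an index, building a query, and running a search. The comments explain the purpose of each step.

In short, the "lucence complete project" archive is a hands-on example of Lucene full-text search and a helpful reference for developers who want to understand how Lucene works before adding full-text search to their own projects.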
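To make the inverted index from section 1 and the analysis step from section 3 more concrete, here is a minimal sketch in plain Java. It is not Lucene code and does not reflect Lucene's internal data structures; the class name `InvertedIndexSketch` and its methods are hypothetical, and the "analyzer" is only a lowercase-and-split stand-in for a real one such as StandardAnalyzer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index for illustration only; hypothetical, not part of Lucene. */
class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    /** Simplified analysis: lowercase, then split on anything that is not a letter or digit. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    /** Indexes one document by recording its id under every term it contains. */
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Looks up a single term; returns a copy so callers cannot modify the index. */
    Set<Integer> search(String term) {
        TreeSet<Integer> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? new TreeSet<Integer>() : new TreeSet<>(hits);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Lucene can power a search engine");
        index.add(2, "Baidu and Google are search engines");
        System.out.println(index.search("search")); // prints [1, 2]
        System.out.println(index.search("lucene")); // prints [1]
    }
}
```

A lookup is a single map access regardless of how many documents were indexed, which is the property the description attributes to the inverted index. Note that "engine" would match only document 1 here, because this sketch does no stemming.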


lucene0210_02.zip (14 files)

- lucene0210_02
  - .project (386B)
  - src
    - itcast0210
      - lucene
        - hellworld
          - HelloWorld.java (4KB)
        - bean
          - Article.java (481B)
  - lib
    - lucene-core-3.0.1.jar (1002KB)
    - lucene-memory-3.0.1.jar (27KB)
    - lucene-analyzers-3.0.1.jar (196KB)
    - lucene-highlighter-3.0.1.jar (46KB)
  - .settings
    - org.eclipse.jdt.core.prefs (629B)
  - .classpath (706B)
  - bin
    - itcast0210
      - lucene
        - hellworld
          - HelloWorld.class (5KB)
        - bean
          - Article.class (989B)
  - indexDir
    - segments_2 (230B)
    - _0.cfs (524B)
    - segments.gen (20B)

package cn.itcast0210.lucene.hellworld;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;

import cn.itcast0210.lucene.bean.Article;

/**
 * Adds one article to the index and then retrieves it.
 */
public class HelloWorld {

    /**
     * Creates an index entry for the article.
     */
    @Test
    public void createIndex() throws Exception {
        // 1. Create a sample article object.
        Article article = new Article();
        article.setId(1);
        article.setTitle("lucene可以做搜索引擎"); // "lucene can be used to build a search engine"
        article.setContent("baidu,google都是很好的搜索引擎"); // "baidu and google are both good search engines"

        // 2. Put it into the index.
        // IndexWriter takes a Directory (the index location), an Analyzer
        // (the tokenizer), and a MaxFieldLength.
        // The index is created in ./indexDir under the current directory.
        Directory directory = FSDirectory.open(new File("./indexDir"));
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
        IndexWriter indexWriter = new IndexWriter(directory, analyzer, MaxFieldLength.LIMITED);
        try {
            // Convert the Article into a Lucene Document.
            Document document = new Document();
            // Field takes four arguments: name, value, Store, Index.
            Field idField = new Field("id", article.getId().toString(), Store.YES, Index.NOT_ANALYZED);
            Field titleField = new Field("title", article.getTitle(), Store.YES, Index.ANALYZED);
            Field contentField = new Field("content", article.getContent(), Store.YES, Index.ANALYZED);
            document.add(idField);
            document.add(titleField);
            document.add(contentField);
            indexWriter.addDocument(document);
        } finally {
            // Always close the writer so the index write lock is released.
            indexWriter.close();
        }
    }

    /**
     * Searches the index by keyword.
     */
    @Test
    public void searchIndex() throws Exception {
        List<Article> articleList = new ArrayList<Article>();

        // QueryParser takes three arguments: version, default field, analyzer.
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
        QueryParser queryParser = new QueryParser(Version.LUCENE_30, "content", analyzer);
        // Search keyword. Note: the sample article above contains "lucene" only in
        // its title, so this query against "content" should find nothing; pass
        // "title" as the default field to match it.
        Query query = queryParser.parse("lucene");

        Directory directory = FSDirectory.open(new File("./indexDir"));
        IndexSearcher indexSearcher = new IndexSearcher(directory);
        try {
            TopDocs topDocs = indexSearcher.search(query, 1); // return at most 1 hit
            int totalCount = topDocs.totalHits; // total number of matching documents
            ScoreDoc[] scoreDocs = topDocs.scoreDocs;
            for (int i = 0; i < scoreDocs.length; i++) {
                int index = scoreDocs[i].doc; // internal document number
                float score = scoreDocs[i].score; // relevance score
                Document document = indexSearcher.doc(index);
                // Convert the Document back into an Article.
                Article article = new Article();
                article.setId(Integer.parseInt(document.get("id")));
                article.setTitle(document.get("title"));
                article.setContent(document.get("content"));
                articleList.add(article);
            }
        } finally {
            indexSearcher.close();
        }

        for (Article article : articleList) {
            System.out.println(article.getId());
            System.out.println(article.getTitle());
            System.out.println(article.getContent());
        }
    }
}
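HelloWorld depends on the `Article` bean, which appears in the archive listing but whose source is not reproduced on this page. The following is a guess at its shape, inferred only from the calls HelloWorld makes (`setId`/`getId`, `setTitle`/`getTitle`, `setContent`/`getContent`); the `Integer` type for `id` is an assumption based on `article.getId().toString()`, and the real file may differ.

```java
// Assumed reconstruction, not the original Article.java.
// To compile it alongside HelloWorld, declare the class public and place it in
// package cn.itcast0210.lucene.bean (the package HelloWorld imports it from).
class Article {
    private Integer id;      // assumed Integer, since HelloWorld calls getId().toString()
    private String title;
    private String content;

    public Integer getId() {
        return id;
    }

    public void setId(Integer id) {
        this.id = id;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getContent() {
        return content;
    }

    public void setContent(String content) {
        this.content = content;
    }
}
```

With `id` typed as `Integer`, both `article.setId(1)` and `article.setId(Integer.parseInt(...))` compile through autoboxing.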
