Lucene Core Syntax

2014-08-04 | Author: jun

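Abstract: 1. Using indexes in project development 2. What is Lucene? What can Lucene do? 3. Lucene quick start 4. How the index is structured internally 5. The core API in detail 6. Create, read, update, and delete with Lucene backed by a database 7. Caveats when using Lucene (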

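1. Lucene and search overview

1.1. The history of search

Search engine: builds an index over resources on the internet to speed up searching
FTP resources, web pages ---------- audio, video, and images are indexed as well
Robot (web robot): a program that runs automatically on the internet and carries out a specific task
Spider (web crawler): a special kind of web robot that downloads resources from the internet so that they can be indexed

Crawlers are the foundation of every search engine.

---- Not covered further here; look it up on Baidu yourself.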

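1.2. Applications of search technology

Use 1: application software (Word, the Windows operating system, MyEclipse)

Use 2: Tieba boards, forums, and blogs (searching articles) ---- the most common use

Use 3: site search (product search on JD.com, job search on 51job) --- very widely used

Use 4: specialized search (vertical search such as the 818 job site; general search engines such as Baidu and Google)

The information search process

Step 1: build a text store (extract the text from each kind of resource to be searched)

Step 2: build an index over the text

Step 3: run the search against the index

Step 4: sort the search results and display them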
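
The four steps can be sketched in plain Java without any search library. This is only an illustration under stated assumptions: the class name SearchPipelineSketch and its methods are hypothetical, the text is split on non-word characters (which suits English only), and ranking is by raw occurrence count, which is far simpler than the scoring a real engine uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SearchPipelineSketch {

    // Step 2: index the text store as term -> (document id -> occurrence count).
    public static Map<String, Map<Integer, Integer>> buildIndex(Map<Integer, String> textStore) {
        Map<String, Map<Integer, Integer>> index = new HashMap<>();
        for (Map.Entry<Integer, String> entry : textStore.entrySet()) {
            for (String term : entry.getValue().toLowerCase().split("\\W+")) {
                if (term.isEmpty()) {
                    continue;
                }
                index.computeIfAbsent(term, t -> new HashMap<>())
                     .merge(entry.getKey(), 1, Integer::sum);
            }
        }
        return index;
    }

    // Steps 3 and 4: look the term up in the index, then sort the hits by
    // occurrence count (highest first), breaking ties by document id.
    public static List<Integer> search(Map<String, Map<Integer, Integer>> index, String term) {
        Map<Integer, Integer> hits = index.getOrDefault(term.toLowerCase(), new HashMap<>());
        List<Integer> docIds = new ArrayList<>(hits.keySet());
        docIds.sort((a, b) -> {
            int byCount = Integer.compare(hits.get(b), hits.get(a));
            return byCount != 0 ? byCount : Integer.compare(a, b);
        });
        return docIds;
    }

    public static void main(String[] args) {
        // Step 1: the text store, i.e. plain text already extracted from each resource.
        Map<Integer, String> textStore = new LinkedHashMap<>();
        textStore.put(1, "Lucene is a search library");
        textStore.put(2, "Search engines index text so that search is fast");
        textStore.put(3, "Java is a programming language");

        Map<String, Map<Integer, Integer>> index = buildIndex(textStore);
        // Document 2 contains "search" twice and document 1 once, so 2 ranks first.
        System.out.println(search(index, "search")); // prints [2, 1]
    }
}
```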

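1.3. The most common index in search systems --- the inverted index

A traditional linear scan of a 10 MB Word file takes roughly 3 seconds when the keyword sits at the very end of the document.

An inverted index works much like the table of contents of a book: it maps each term to the documents that contain it, so a lookup no longer has to read every document. Indexing is an optimization technique that speeds up searching.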
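
To make the contrast concrete, here is a minimal sketch that answers the same query with a linear scan and with an inverted index. The class InvertedIndexSketch and its methods are made up for this illustration, and the tokenizer simply splits on non-word characters, so it assumes English-like text. It is not how Lucene stores its index on disk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class InvertedIndexSketch {

    // Splits text into lowercase terms (assumption: words are separated by non-word characters).
    public static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Linear search: every query re-reads every document from start to end.
    public static List<Integer> linearSearch(List<String> docs, String term) {
        List<Integer> hits = new ArrayList<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            if (tokenize(docs.get(docId)).contains(term.toLowerCase())) {
                hits.add(docId);
            }
        }
        return hits;
    }

    // Inverted index: built once, maps each term to the sorted ids of the documents containing it.
    public static Map<String, SortedSet<Integer>> buildIndex(List<String> docs) {
        Map<String, SortedSet<Integer>> index = new HashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : tokenize(docs.get(docId))) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    // Indexed search: a single map lookup, independent of how long the documents are.
    public static List<Integer> indexedSearch(Map<String, SortedSet<Integer>> index, String term) {
        SortedSet<Integer> postings = index.get(term.toLowerCase());
        return postings == null ? Collections.<Integer>emptyList() : new ArrayList<>(postings);
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("the quick brown fox", "the lazy dog", "a quick dog");
        Map<String, SortedSet<Integer>> index = buildIndex(docs);
        System.out.println(linearSearch(docs, "quick"));   // prints [0, 2]
        System.out.println(indexedSearch(index, "quick")); // prints [0, 2]
    }
}
```

Both calls return the same documents; the difference is that the indexed version pays the scanning cost once, at build time, rather than on every query.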

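2. Lucene quick start

2.1. Lucene overview

Question: What is Lucene?

A Java framework for full-text information retrieval, provided by Apache (open source and free).

Lucene is not a search engine and cannot be used directly as a finished application or product; you use Lucene to build a search engine.

Question: What is full-text retrieval?
An indexing program scans every word in a document and builds an index entry for each word, recording how many times and at which positions the word appears in the document.
When a user issues a query, the retrieval program looks it up in the index built in advance and returns the matches to the user.
The key point: an index entry is built for every word in the text -------- hence full-text retrieval (full-text indexing).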
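
The idea of recording each word's count and positions can be illustrated with a short sketch. The class TermPositionsSketch is hypothetical and is not part of Lucene; it assumes words are separated by non-word characters, so Chinese text would need a real analyzer instead of this split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TermPositionsSketch {

    // Maps each word to the 0-based word offsets at which it occurs.
    // The number of occurrences is simply the length of that list.
    public static Map<String, List<Integer>> indexPositions(String text) {
        Map<String, List<Integer>> positions = new TreeMap<>();
        int position = 0;
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) {
                continue;
            }
            positions.computeIfAbsent(word, w -> new ArrayList<>()).add(position);
            position++;
        }
        return positions;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> index = indexPositions("to be or not to be");
        System.out.println(index); // prints {be=[1, 5], not=[3], or=[2], to=[0, 4]}
        System.out.println("'to' occurs " + index.get("to").size() + " times"); // prints 'to' occurs 2 times
    }
}
```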

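Lucene is a full-text indexing tool.

Official site: http://lucene.apache.org/ (download the development JARs here)

Companies that use Lucene typically download Lucene (latest version at the time of writing: 4.9) together with Solr (a search server built on Lucene).

To develop with Lucene, import the core JAR, lucene-core-3.6.2.jar.
The contrib directory holds the utility JARs that Lucene development depends on. For a real project, import the core JAR plus the contrib JARs you need.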

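2.2. Quick start

Import the JARs into the project

Step 1: extract the text data --- convert it into a Document object (which is what gets stored in the index)
Step 2: use the Lucene API to index the Document
Step 3: use the Lucene API to query the index

Create an entity class, Article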

package cn.itcast.lucene;

public class Article {

	private String id;// identifier
	private String title;// title
	private String content;// body text

	public String getId() {
		return id;
	}

	public void setId(String id) {
		this.id = id;
	}

	public String getTitle() {
		return title;
	}

	public void setTitle(String title) {
		this.title = title;
	}

	public String getContent() {
		return content;
	}

	public void setContent(String content) {
		this.content = content;
	}

}

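Convert the object data into a Document (whatever the source data is, Lucene only works with Document objects)

Fields are built with Field, the implementation class of the Fieldable interface

Field.Store controls whether the field's value is stored in the index (and can therefore be returned with search results)

Field.Index controls whether the field is indexed (and can therefore be matched during a search)

Indexing the Document

Set the index directory: Directory
Set the analyzer: Analyzer
Create the index through IndexWriter

Tool for inspecting index contents --- Luke (an executable JAR file, run with java -jar)

Searching against the index --- testSearchIndex

Take the search keywords (user input)
Set the index directory: Directory
Set the analyzer: Analyzer
Obtain a Query object (QueryParser analyzes the input into terms)
Run the search through an IndexSearcher object
The results come back ranked by score: TopDocs
Get the per-document score objects: ScoreDoc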

package cn.itcast.lucene;

import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;

public class LuceneDemo {

	@Test
	public void testLucene() throws Exception {
		// Build an index for article data
		// 1. The text data
		Article article = new Article();
		article.setId("100");
		article.setTitle("lucene简介");
		article.setContent("Lucene是apache软件基金会4 jakarta项目组的一个子项目，是一个开放源代码的全文检索引擎工具包，即它不是一个完整的全文检索引擎，而是一个全文检索引擎的架构");
		// 2. Convert the text data into a Document object
		// One Document corresponds to one article
		Document document = new Document();
		// id: stored but not indexed, so it is returned with hits but cannot be searched
		document.add(new Field("id", article.getId(), Store.YES, Index.NO));
		// title: indexed but not stored, so it can be searched but is not returned with hits
		document.add(new Field("title", article.getTitle(), Store.NO, Index.ANALYZED));
		// content: stored and indexed
		document.add(new Field("content", article.getContent(), Store.YES, Index.ANALYZED));
		// 3. Build the index
		// Location of the index files (relative to the working directory)
		Directory directory = FSDirectory.open(new File("index"));
		// Analyzer used to split the text into terms
		Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
		// Create the IndexWriterConfig
		IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_36, analyzer);
		// Create the IndexWriter
		IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
		// Add the document, then close the writer to commit the changes
		indexWriter.addDocument(document);
		indexWriter.close();
	}

	@Test
	public void testSearchIndex() throws Exception {
		// 1. The search text entered by the user
		String searchContent = "jakarta项目";
		// 2. Set the index directory and the analyzer (same analyzer as at indexing time)
		Directory directory = FSDirectory.open(new File("index"));
		Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
		// 3. Build the Query object; "content" is the default field to search
		QueryParser queryParser = new QueryParser(Version.LUCENE_36, "content", analyzer);
		Query query = queryParser.parse(searchContent);
		// 4. Run the query through an IndexSearcher
		IndexReader indexReader = IndexReader.open(directory);
		IndexSearcher indexSearcher = new IndexSearcher(indexReader);
		TopDocs topDocs = indexSearcher.search(query, 100000000);
		// Number of documents that actually matched
		System.out.println("Found " + topDocs.totalHits + " result(s)");
		// Fetch the matching Documents
		ScoreDoc[] scoreDocs = topDocs.scoreDocs;
		for (ScoreDoc scoreDoc : scoreDocs) {
			System.out.println("Document score: " + scoreDoc.score);
			// Load the Document by its internal document number
			int documentId = scoreDoc.doc;
			Document document = indexSearcher.doc(documentId);
			// Read stored field values by field name
			System.out.println("id:" + document.get("id"));
			// title was indexed with Store.NO, so this prints null
			System.out.println("title:" + document.get("title"));
			System.out.println("content:" + document.get("content"));
		}
		// Release resources; closing the searcher does not close a reader that was passed in
		indexSearcher.close();
		indexReader.close();
	}
}
