$Id: README.txt,v 1.5 2004/10/30 11:36:18 lhelper Exp $
For Chinese documentation, see: http://www.chedong.com/tech/weblucene.html
WebLucene
=========
A web interface for Lucene that uses XML as a lightweight protocol. Developers convert a data source (text, DB, MS Word, PDF, etc.) into XML, index it with the Lucene engine, and retrieve full-text search results over HTTP as XML. The output can easily be integrated with a JSP, ASP, or PHP front end, or transformed on the server side with XSLT.
Indexing Process
================
MySQL   \                                         / JSP
Oracle  - DB - ==> XML ==> (Lucene Index) ==> XML - ASP
MSSQL   /                                         - PHP
MS Word          /                                \          / XHTML
PDF             /                                   =XSLT=>  - TEXT
                                                             \ XML
               \____________WebLucene_____________/
i18n: because Java is Unicode based, XML data sources in different charsets can be indexed into a single Lucene index (in Unicode), and results can be output according to the languages the client browser supports.
GBK        \                       / BIG5
BIG5       - UNICODE ====> Unicode - GB2312
SJIS       -  (XML)         (XML)  - SJIS
ISO-8859-1 /                       \ ISO-8859-1
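
The charset flow above can be sketched in plain Java. This is a minimal illustration, not WebLucene's own code: the class and method names (CharsetBridge, transcode, charsetOrUtf8) are hypothetical. Bytes are decoded from the source charset into a Java String (Unicode) and re-encoded in whatever charset the client accepts. Only ISO-8859-1 and UTF-8 are guaranteed on every JVM, so the sketch checks at runtime whether a charset such as GBK is available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: source charset -> Unicode String -> client charset. */
final class CharsetBridge {

    /** Decode the bytes using the source charset, then encode them in the target charset. */
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String unicode = new String(input, from); // Java strings are Unicode internally
        return unicode.getBytes(to);
    }

    /** Look up a charset by name, falling back to UTF-8 if this JVM does not provide it. */
    static Charset charsetOrUtf8(String name) {
        return Charset.isSupported(name) ? Charset.forName(name) : StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(latin1.length + " bytes in, " + utf8.length + " bytes out");
        System.out.println("GBK available on this JVM: " + Charset.isSupported("GBK"));
    }
}
```

The same call would apply to GBK, BIG5, or SJIS input wherever the JVM ships those charsets.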
Searching Process
=================
Input/Output: "HTTP GET"/XML
Client Browser Input ==(HTTP GET)==> WebLuceneServlet ==> XML Result Set ==(XSLT)==> XHTML output ==> Client Browser
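
The XSLT step of this pipeline can be illustrated with the standard javax.xml.transform API. The sketch below is an assumption-laden example rather than the servlet's actual code: the class name ResultRenderer, the sample result XML (a Record with a title child), and the stylesheet are all made up for illustration; the real element layout is defined by weblucene_results.dtd.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical sketch: apply an XSLT stylesheet to an XML result set. */
final class ResultRenderer {

    // Assumed result layout, for illustration only.
    static final String SAMPLE_XML =
            "<WebLuceneResult><ResultSet>"
          + "<Record><title>Hello Lucene</title></Record>"
          + "</ResultSet></WebLuceneResult>";

    // A tiny stylesheet that renders each record title as a list item.
    static final String SAMPLE_XSLT =
            "<xsl:stylesheet version=\"1.0\" "
          + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
          + "<xsl:template match=\"/\"><ul>"
          + "<xsl:for-each select=\"//Record\">"
          + "<li><xsl:value-of select=\"title\"/></li>"
          + "</xsl:for-each>"
          + "</ul></xsl:template></xsl:stylesheet>";

    /** Transform the XML text with the stylesheet text and return the output as a string. */
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE_XML, SAMPLE_XSLT));
    }
}
```

Compiling a stylesheet is relatively expensive, which is presumably why the source tree includes an XsltCache class.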
XML format search result
========================
Lucene_result.dtd
Chinese_gbk.xml      Simplified Chinese indexing source sample
Chinese_big5.xml     Traditional Chinese indexing source sample
Japanese_sjis.xml    Japanese indexing source sample
English_en.xml       English indexing source sample
Each sample contains 5 articles.
indexing source: XML format
Lucene_index.dtd
WebLuceneSource
Document:
Field:
title author content pub_date lang meta_info(not stored)
Index:
all_idx: title + content + meta_info, for full text searching
author_idx: indexed without tokenizing, for author matching
date_idx: indexed without tokenizing, for date range search
lang_idx: indexed without tokenizing, for language filter search
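
The relationship between document fields and index fields can be sketched without any Lucene dependency. This is only an illustration: the class name IndexFieldSketch is hypothetical, and the pairing of author, pub_date, and lang with author_idx, date_idx, and lang_idx is inferred from the field names rather than stated by the DTD.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: derive the four index fields from one source document. */
final class IndexFieldSketch {

    /** Build index field values from the document's fields; missing fields become empty strings. */
    static Map<String, String> toIndexFields(Map<String, String> doc) {
        Map<String, String> idx = new LinkedHashMap<>();
        // all_idx combines title, content and meta_info for full-text search.
        idx.put("all_idx", String.join(" ",
                doc.getOrDefault("title", ""),
                doc.getOrDefault("content", ""),
                doc.getOrDefault("meta_info", "")).trim());
        // The remaining fields are kept whole (not tokenized) for exact or range matching.
        idx.put("author_idx", doc.getOrDefault("author", ""));
        idx.put("date_idx", doc.getOrDefault("pub_date", ""));
        idx.put("lang_idx", doc.getOrDefault("lang", ""));
        return idx;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "WebLucene");
        doc.put("content", "Lucene over XML");
        doc.put("author", "Che Dong");
        System.out.println(toIndexFields(doc));
    }
}
```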
searching result: XML format
WebLuceneResult: Simple search result
ResultSet:
Record: contains the stored fields
Field name
Query
QueryString
OffSet
PageSize
OutputFormat
Filter field type(match/prefix/before/after)
SortType (score/doc/doc_desc)
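
A search request carrying these query settings is sent as an HTTP GET. The sketch below builds such a URL using the parameter names from the test entry in the file list (dir, q, encoding, offset, size). The class name SearchUrlBuilder is hypothetical, and the sketch covers only those five parameters, since the names of the filter and sort parameters are not given here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical sketch: build a WebLucene search URL with URL-encoded parameters. */
final class SearchUrlBuilder {

    /** Returns a relative URL such as WebLucene?dir=demo&q=keyword&encoding=utf-8&offset=10&size=10. */
    static String build(String dir, String query, String encoding, int offset, int size) {
        try {
            return "WebLucene?dir=" + URLEncoder.encode(dir, encoding)
                    + "&q=" + URLEncoder.encode(query, encoding)
                    + "&encoding=" + encoding
                    + "&offset=" + offset
                    + "&size=" + size;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("demo", "keyword", "utf-8", 10, 10));
    }
}
```

Encoding the query with the same charset that is passed in the encoding parameter keeps multi-byte (for example CJK) keywords intact on the server side.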
result xslt transform:
source map:
1 lucene extension:
CJKTokenizer: a simple tokenizer that supports European and East Asian languages
org/apache/lucene/analysis/cjk/
IndexOrderSearcher: docID based result sorting
org/apache/lucene/search/
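
This README does not describe how CJKTokenizer splits text. A common approach for mixed CJK and Western text, shown here purely as an assumption, is to emit overlapping two-character tokens (bigrams) for CJK runs and whole lowercased words for everything else. The class BigramSketch below is a hypothetical standalone illustration and does not use the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: whole words for Western text, overlapping bigrams for CJK runs. */
final class BigramSketch {

    /** True for characters in a few common CJK blocks (an illustrative, incomplete set). */
    static boolean isCjk(char c) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(c);
        return block == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
                || block == Character.UnicodeBlock.HIRAGANA
                || block == Character.UnicodeBlock.KATAKANA
                || block == Character.UnicodeBlock.HANGUL_SYLLABLES;
    }

    /** Split text into lowercased words and overlapping CJK bigrams. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        StringBuilder cjkRun = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isCjk(c)) {
                flushWord(word, tokens);
                cjkRun.append(c);
            } else if (Character.isLetterOrDigit(c)) {
                flushCjk(cjkRun, tokens);
                word.append(Character.toLowerCase(c));
            } else { // whitespace or punctuation ends the current token
                flushWord(word, tokens);
                flushCjk(cjkRun, tokens);
            }
        }
        flushWord(word, tokens);
        flushCjk(cjkRun, tokens);
        return tokens;
    }

    private static void flushWord(StringBuilder word, List<String> out) {
        if (word.length() > 0) {
            out.add(word.toString());
            word.setLength(0);
        }
    }

    private static void flushCjk(StringBuilder run, List<String> out) {
        if (run.length() == 1) {
            out.add(run.toString()); // a lone CJK character becomes its own token
        }
        for (int i = 0; i + 1 < run.length(); i++) {
            out.add(run.substring(i, i + 2)); // overlapping two-character tokens
        }
        run.setLength(0);
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Lucene \u4e2d\u6587\u68c0\u7d22"));
    }
}
```

Bigrams avoid the need for a dictionary-based word segmenter, at the cost of a larger index.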
2 web application:
FileBasedPropertiesSupplier.java: properties supplier; supplies properties to SimplePropertiesFactory
PropertyFileFilenameFilter.java: properties file filter; filters files by filename
SimplePropertiesConsumer.java: properties consumer; acquires data from SimplePropertiesFactory
SimplePropertiesFactory.java: properties container
com/chedong/properties
ParamUtil.java: validates request parameters
RequestParser.java: a utility for parsing the HTTP request
com/chedong/util/
WebLuceneAdminServlet.java: global configuration viewer and reloader
WebLucenePropertiesConsumer.java:
WebLucenePropertiesPreprocessor.java:
WebLuceneServlet.java: search entry point; constructs the WebLuceneQuery and chooses the XSLT used to transform the XML output
com/chedong/weblucene/
SAXIndexer: SAX based lucene xml source indexer
com/chedong/weblucene/index/
DOMSearcher.java: invokes the Lucene IndexSearcher, highlights result hits, and converts the search result to XML
WebLuceneHighlighter.java: highlights search result hits and builds abstracts
WebLuceneQuery.java: Search Query Bean
WebLuceneResultSet.java:
WebLuceneSearcherBase.java:
com/chedong/weblucene/search/
XsltCache: xslt transformer caching
com/chedong/xslt/
3 application file list:
BUILD.txt install document in Chinese
README.txt this readme document
INSTALL.txt install document
CHANGES.txt change log
LICENSE.txt we use "The Apache Software License"
build.xml ant build file
webapp/
index.html test entrance ==> WebLucene?dir=demo&q=keyword&encoding=utf-8&offset=10&size=10
weblucene_results.dtd search results XML definition
weblucene_index.dtd indexing source XML definition
WEB-INF/
web.xml webapp configuration
src/ java source directory
test/ unit test directory
classes/ java classes directory
bin/ shell commands: java LuceneXMLIndexer input.xml output directory
conf/
weblucene.conf global config file
log4j.conf config file for log4j
blog.conf config file for demo, which can override the declaration in weblucene.conf
var/ lucene indices
blog/ demo directory
index/ lucene index library
html.xsl xslt template for html
rss.xsl xslt template for rss
lib/ include jar files
java-getopt.jar java command line get options
lucene.jar Lucene: core full text index engine
xerces.jar XML parser
xalan.jar XSLT
log4j.jar logger
README.txt download locations for the jar files
WebLucene: an XML interface to the Lucene search engine. It provides SAX-based indexing, result sorting by indexing sequence, and XML output with highlighting support. The CJKTokenizer supports Chinese, Japanese, and Korean simultaneously with Western languages.