tika+lucene complete JAR package resource - CSDN Library

2000 files in total

html: 6470

txt: 70

jar: 65

tika

lucene

JAR package

Points required: 10 | Views: 26 | Uploaded: 2019-03-11 09:51:16 | Size: 141.46MB RAR

tika+lucene complete JAR package (2000 files)

stylesheet.css 14KB

DTDDocStyle.css 2KB

dtreeStyle.css 932B

ChangesFancyStyle.css 575B

ChangesFixedWidthStyle.css 445B

ChangesSimpleStyle.css 154B

README.htm 1KB

Changes.html 1.12MB

QueryNode.html 252KB

LuceneTestCase.html 250KB

overview-tree.html 224KB

overview-tree.html 223KB

BytesRef.html 183KB

constant-values.html 174KB

BaseDocValuesFormatTestCase.html 166KB

BaseTokenStreamTestCase.html 160KB

AbstractAnalysisFactory.html 159KB

IndexWriter.html 157KB

Query.html 148KB

Directory.html 139KB

allclasses-frame.html 127KB

overview-tree.html 127KB

TokenFilterFactory.html 126KB

PlanetModel.html 124KB

package-use.html 124KB

Plane.html 117KB

GeoPoint.html 116KB

QueryNodeException.html 113KB

UnifiedHighlighter.html 112KB

constant-values.html 112KB

allclasses-noframe.html 111KB

AnalyzingInfixSuggester.html 108KB

QueryParserBase.html 105KB

package-summary.html 105KB

TestUtil.html 98KB

BaseCompoundFormatTestCase.html 96KB

IndexWriterConfig.html 95KB

BaseGeoPointTestCase.html 94KB

LuceneContribQuery.dtd.html 94KB

IndexSearcher.html 93KB

IOContext.html 93KB

package-use.html 92KB

package-summary.html 92KB

FilterFileSystemProvider.html 87KB

Membership.html 87KB

overview-tree.html 87KB

overview-tree.html 86KB

Term.html 85KB

BaseTermVectorsFormatTestCase.html 85KB

BasePostingsFormatTestCase.html 84KB

BaseExplanationTestCase.html 83KB

Automaton.html 83KB

BaseDirectoryTestCase.html 83KB

BaseNormsFormatTestCase.html 82KB

MultiFieldQueryParser.html 81KB

SerializableObject.html 81KB

BaseCompressingDocValuesFormatTestCase.html 78KB

QueryNodeProcessorPipeline.html 78KB

ValueSource.html 78KB

BaseSegmentInfoFormatTestCase.html 76KB

BaseFieldInfoFormatTestCase.html 74KB

ThreadedIndexingAndSearchingTestCase.html 73KB

MockDirectoryWrapper.html 73KB

allclasses-frame.html 72KB

CustomAnalyzer.Builder.html 72KB

BasePointsFormatTestCase.html 72KB

BaseStoredFieldsFormatTestCase.html 71KB

StandardQueryParser.html 71KB

LeafReaderContext.html 71KB

SegmentInfos.html 71KB

Lucene Change Log

For more information on past and future Lucene versions, please see:
http://s.apache.org/luceneversions

======================= Lucene 7.7.1 =======================

(No Changes)

======================= Lucene 7.7.0 =======================

Changes in Runtime Behavior

* LUCENE-8527: StandardTokenizer and UAX29URLEmailTokenizer now support Unicode 9.0,
  and provide Unicode UTS#51 v11.0 Emoji tokenization with the "<EMOJI>" token type.

Build

* LUCENE-8611: Update randomizedtesting to 2.7.2, JUnit to 4.12, add hamcrest-core
  dependency. (Dawid Weiss)

* LUCENE-8537: ant test command fails under lucene/tools (Peter Somogyi)

Bug fixes:

* LUCENE-8669: Fix LatLonShape WITHIN queries that fail with Multiple search Polygons
  that share the dateline. (Nick Knize)

* LUCENE-8603: Fix the inversion of right ids for additional nouns in the Korean
  user dictionary. (Yoo Jeongin via Jim Ferenczi)

* LUCENE-8624: int overflow in ByteBuffersDataOutput.size(). (Mulugeta Mammo, Dawid Weiss)

* LUCENE-8625: int overflow in ByteBuffersDataInput.sliceBufferList. (Mulugeta Mammo,
  Dawid Weiss)

* LUCENE-8639: Newly created threadstates while flushing / refreshing can cause
  duplicated sequence IDs on IndexWriter. (Simon Willnauer)

* LUCENE-8649: LatLonShape's within and disjoint queries can return false positives
  with indexed multi-shapes. (Ignacio Vera)

* LUCENE-8654: Polygon2D#relateTriangle returns the wrong answer if polygon is inside
  the triangle. (Ignacio Vera)

* LUCENE-8650: ConcatenatingTokenStream did not correctly clear its state in reset(),
  and was not propagating final position increments from its child streams correctly.
  (Dan Meehl, Alan Woodward)

* LUCENE-8676: The Korean tokenizer does not update the last position if the backtrace
  is caused by a big buffer (1024 chars). (Jim Ferenczi)

New Features

* LUCENE-8026: ExitableDirectoryReader may now time out queries that run on points
  such as range queries or geo queries. (Christophe Bismuth via Adrien Grand)

* LUCENE-8508: IndexWriter can now set the created version via
  IndexWriterConfig#setIndexCreatedVersionMajor. This is an expert feature.
  (Adrien Grand)

* LUCENE-8601: Attributes set in the IndexableFieldType for each field during indexing
  will now be recorded into the corresponding FieldInfo's attributes, accessible at
  search time (Murali Krishna P)

Improvements

* LUCENE-8463: TopFieldCollector can now early-terminate queries when sorting by
  SortField.DOC. (Christophe Bismuth via Jim Ferenczi)

* LUCENE-8562: Speed up merging segments of points with data dimensions by only
  sorting on the indexed dimensions. (Ignacio Vera)

* LUCENE-8529: TopSuggestDocsCollector will now use the completion key to tiebreak
  completion suggestions with identical scores. (Jim Ferenczi)

* LUCENE-8575: SegmentInfos#toString now includes attributes and diagnostics.
  (Namgyu Kim via Adrien Grand)

* LUCENE-8548: The KoreanTokenizer no longer splits unknown words on combining
  diacritics and detects script boundaries more accurately with
  Character#UnicodeScript#of. (Christophe Bismuth, Jim Ferenczi)

* LUCENE-8581: Change LatLonShape encoding to use 4 bytes Per Dimension.
  (Ignacio Vera, Nick Knize, Adrien Grand)

* LUCENE-8527: Upgrade JFlex dependency to 1.7.0; in StandardTokenizer and
  UAX29URLEmailTokenizer, increase supported Unicode version from 6.3 to 9.0,
  and support Unicode UTS#51 v11.0 Emoji tokenization.

* LUCENE-8640: Date Range format validation (Lucky Sharma, David Smiley via
  Mikhail Khludnev)

Optimizations

* LUCENE-8552: FieldInfos.getMergedFieldInfos no longer does any merging if there
  is <= 1 segment. (Christophe Bismuth via David Smiley)

* LUCENE-8590: BufferedUpdates now uses an optimized storage for buffering docvalues
  updates that can save up to 80% of the heap used compared to the previous
  implementation and uses non-object based datastructures.
  (Simon Willnauer, Mike McCandless, Shai Erera, Adrien Grand)

* LUCENE-8598: Moving to the default accepted overhead ratio for packed ints in
  DocValuesFieldUpdates yields an up-to 4x performance improvement when applying
  doc values updates. (Simon Willnauer, Adrien Grand)

* LUCENE-8599: Use sparse bitset to store docs in SingleValueDocValuesFieldUpdates.
  (Simon Willnauer, Adrien Grand)

* LUCENE-8600: Doc-value updates get applied faster by sorting with quicksort,
  rather than an in-place mergesort, which needs to perform fewer swaps.
  (Adrien Grand)

* LUCENE-8623: Decrease I/O pressure when merging high dimensional points.
  (Ignacio Vera)

Other

* LUCENE-8573: BKDWriter now uses FutureArrays#mismatch to compute shared prefixes.
  (Christoph Büscher via Adrien Grand)

* LUCENE-8605: Separate bounding box spatial logic from query logic on
  LatLonShapeBoundingBoxQuery. (Ignacio Vera)

* LUCENE-8609: Deprecated IndexWriter#numDocs() and IndexWriter#maxDoc() in favor of
  IndexWriter#getDocStats(), which returns consistent numDocs and maxDoc stats that
  are not subject to concurrent changes. (Simon Willnauer, Nhat Nguyen)

======================= Lucene 7.6.0 =======================

Build

* LUCENE-8504: Upgrade forbiddenapis to version 2.6. (Uwe Schindler)

* LUCENE-8498: Deprecate LowerCaseTokenizer and CharTokenizer static methods
  that take normalizer functions (Alan Woodward)

* LUCENE-8493: Stop publishing insecure .sha1 files with releases (janhoy)

Bug fixes

* LUCENE-8479: QueryBuilder#analyzeGraphPhrase now throws TooManyClause exception
  if the number of expanded paths reaches the BooleanQuery#maxClause limit.
  (Jim Ferenczi)

* LUCENE-8522: throw InvalidShapeException when constructing a polygon and
  all points are coplanar. (Ignacio Vera)

* LUCENE-8531: QueryBuilder#analyzeGraphPhrase now creates one phrase query per
  finite string in the graph if the slop is greater than 0. Span queries cannot
  be used in this case because they don't handle slop the same way as phrase
  queries. (Steve Rowe, Uwe Schindler, Jim Ferenczi)

* LUCENE-8524: Add the Hangul Letter Araea (interpunct) as a separator in Nori's
  tokenizer. This change also removes empty terms and trims surface forms in
  Nori's Korean dictionary. (Trey Jones, Jim Ferenczi)

* LUCENE-8550: Fix filtering of coplanar points when creating linked list on
  polygon tessellator. (Ignacio Vera)

* LUCENE-8549: Polygon tessellator throws an error if some parts of the shape
  could not be processed. (Ignacio Vera)

* LUCENE-8540: Better handling of min/max values for Geo3d encoding. (Ignacio Vera)

* LUCENE-8534: Fix incorrect computation for triangles intersecting polygon edges in
  shape tessellation. (Ignacio Vera)

* LUCENE-8559: Fix bug where polygon edges were skipped when checking for
  intersections. (Ignacio Vera)

* LUCENE-8556: Use latitude and longitude instead of encoding values to check if
  triangle is ear when using morton optimisation. (Ignacio Vera)

* LUCENE-8586: Intervals.or() could get stuck in an infinite loop on certain indexes
  (Alan Woodward)

* LUCENE-8589: Fix MultiPhraseQuery to not duplicate terms when building the phrase's
  weight. (Jim Ferenczi)

* LUCENE-8595: Fix interleaved DV update and reset. An interleaved update and reset
  of a value for the same doc in the same updates package loses the update if the
  reset comes before the update, and loses the reset if the update comes first.
  (Simon Willnauer, Adrien Grand)

* LUCENE-8592: Fix index sorting corruption due to numeric overflow. The merge of
  sorted segments can produce an invalid sort if the sort field is an Integer/Long
  that uses reverse order and contains values equal to Integer/Long#MIN_VALUE.
  These values are always sorted first during a merge (instead of last because of
  the reverse order) due to this bug. Indices affected by the bug can be detected by
  running the CheckIndex command on a distribution that contains the fix (7.6+).
  (Jim Ferenczi, Adrien Grand, Mike McCandless, Simon Willnauer)

New Fea
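The LUCENE-8592 entry describes MIN_VALUE keys sorting first instead of last under reverse order because of numeric overflow. The sketch below reproduces that symptom in plain Java, assuming the reverse order is obtained by negating the key. That is an assumption made for illustration, not a claim about Lucene's actual merge code. The class and field names (`ReverseSortOverflow`, `NEGATING_REVERSE`, `SAFE_REVERSE`) are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

class ReverseSortOverflow {
    // Hypothetical buggy reverse order: negate each key, then compare ascending.
    // -Long.MIN_VALUE overflows back to Long.MIN_VALUE, so that key sorts first.
    static final Comparator<Long> NEGATING_REVERSE =
            (a, b) -> Long.compare(-a, -b);

    // Overflow-free reverse order: swap the arguments instead of negating.
    static final Comparator<Long> SAFE_REVERSE =
            (a, b) -> Long.compare(b, a);

    // Returns a sorted copy of the values under the given order.
    static long[] sorted(Comparator<Long> order, long... values) {
        return Arrays.stream(values)
                .boxed()
                .sorted(order)
                .mapToLong(Long::longValue)
                .toArray();
    }

    public static void main(String[] args) {
        long[] buggy = sorted(NEGATING_REVERSE, 5L, Long.MIN_VALUE, -3L);
        long[] safe = sorted(SAFE_REVERSE, 5L, Long.MIN_VALUE, -3L);
        // Buggy order puts MIN_VALUE first: [-9223372036854775808, 5, -3]
        System.out.println(Arrays.toString(buggy));
        // Safe order puts MIN_VALUE last: [5, -3, -9223372036854775808]
        System.out.println(Arrays.toString(safe));
    }
}
```

Swapping the comparator arguments (or using `Comparator.reverseOrder()`) avoids the overflow because no arithmetic is performed on the keys.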
