package org.apache.lucene.index;
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.search.Similarity;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.Lock;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.store.AlreadyClosedException;
import org.apache.lucene.util.BitVector;
import org.apache.lucene.util.Constants;
import java.io.File;
import java.io.IOException;
import java.io.PrintStream;
import java.util.List;
import java.util.Collection;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Set;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Iterator;
/**
An <code>IndexWriter</code> creates and maintains an index.
<p>The <code>create</code> argument to the
<a href="#IndexWriter(org.apache.lucene.store.Directory, org.apache.lucene.analysis.Analyzer, boolean)"><b>constructor</b></a>
determines whether a new index is created, or whether an existing index is
opened. Note that you
can open an index with <code>create=true</code> even while readers are
using the index. The old readers will continue to search
the "point in time" snapshot they had opened, and won't
see the newly created index until they re-open. There are
also <a href="#IndexWriter(org.apache.lucene.store.Directory, org.apache.lucene.analysis.Analyzer)"><b>constructors</b></a>
with no <code>create</code> argument which
will create a new index if there is not already an index at the
provided path and otherwise open the existing index.</p>
<p>In either case, documents are added with <a
href="#addDocument(org.apache.lucene.document.Document)"><b>addDocument</b></a>
and removed with <a
href="#deleteDocuments(org.apache.lucene.index.Term)"><b>deleteDocuments(Term)</b></a>
or <a
href="#deleteDocuments(org.apache.lucene.search.Query)"><b>deleteDocuments(Query)</b></a>.
A document can be updated with <a href="#updateDocument(org.apache.lucene.index.Term, org.apache.lucene.document.Document)"><b>updateDocument</b></a>
(which simply deletes the documents matching the term and then adds the new document).
When finished adding, deleting and updating documents, <a href="#close()"><b>close</b></a> should be called.</p>
<a name="flush"></a>
<p>These changes are buffered in memory and periodically
flushed to the {@link Directory} (during the above method
calls). A flush is triggered when there are enough
buffered deletes (see {@link #setMaxBufferedDeleteTerms})
or enough added documents since the last flush, whichever
is sooner. For the added documents, flushing is triggered
either by RAM usage of the documents (see {@link
#setRAMBufferSizeMB}) or the number of added documents.
The default is to flush when RAM usage hits 16 MB. For
best indexing speed you should flush by RAM usage with a
large RAM buffer. Note that flushing just moves the
internal buffered state in IndexWriter into the index, but
these changes are not visible to IndexReader until either
{@link #commit()} or {@link #close()} is called. A flush may
also trigger one or more segment merges which by default
run with a background thread so as not to block the
addDocument calls (see <a href="#mergePolicy">below</a>
for changing the {@link MergeScheduler}).</p>
<a name="autoCommit"></a>
<p>The optional <code>autoCommit</code> argument to the <a
href="#IndexWriter(org.apache.lucene.store.Directory, boolean, org.apache.lucene.analysis.Analyzer)"><b>constructors</b></a>
controls visibility of the changes to {@link IndexReader}
instances reading the same index. When this is
<code>false</code>, changes are not visible until {@link
#close()} or {@link #commit()} is called. Note that changes will still be
flushed to the {@link org.apache.lucene.store.Directory}
as new files, but are not committed (no new
<code>segments_N</code> file is written referencing the
new files, nor are the files sync'd to stable storage)
until {@link #close()} or {@link #commit()} is called. If something
goes terribly wrong (for example the JVM crashes), then
the index will reflect none of the changes made since the
last commit, or the starting state if commit was not called.
You can also call {@link #rollback}, which closes the writer
without committing any changes, and removes any index
files that had been flushed but are now unreferenced.
This mode is useful for preventing readers from refreshing
at a bad time (for example after you've done all your
deletes but before you've done your adds). It can also be
used to implement simple single-writer transactional
semantics ("all or none"). You can do a two-phase commit
by calling {@link #prepareCommit()}
followed by {@link #commit()}. This is necessary when
Lucene is working with an external resource (for example,
a database) and both must either commit or roll back the
transaction.</p>
<p>When <code>autoCommit</code> is <code>true</code> then
the writer will periodically commit on its own. [<b>Deprecated</b>: Note that in 3.0, IndexWriter will
no longer accept autoCommit=true (it will be hardwired to
false). You can always call {@link #commit()} yourself
when needed.] There is
no guarantee when exactly an auto commit will occur (it
used to be after every flush, but it is now after every
completed merge, as of 2.4). If you want to force a
commit, call {@link #commit()} or close the writer. Once
a commit has finished, newly opened {@link IndexReader} instances will
see the changes to the index as of that commit. When
running in this mode, be careful not to refresh your
readers while optimize or segment merges are taking place
as this can tie up substantial disk space.</p>
<p>Regardless of <code>autoCommit</code>, an {@link
IndexReader} or {@link org.apache.lucene.search.IndexSearcher} will only see the
index as of the "point in time" that it was opened. Any
changes committed to the index after the reader was opened
are not visible until the reader is re-opened.</p>
<p>If an index will not have more documents added for a while and optimal search
performance is desired, then either the full <a href="#optimize()"><b>optimize</b></a>
method or partial {@link #optimize(int)} method should be
called before the index is closed.</p>
<p>Opening an <code>IndexWriter</code> creates a lock file for the directory in use. Trying to open
another <code>IndexWriter</code> on the same directory will lead to a
{@link LockObtainFailedException}. The {@link LockObtainFailedException}
is also thrown if an IndexReader on the same directory is used to delete documents
from the index.</p>
<a name="deletionPolicy"></a>
<p>Expert: <code>IndexWriter</code> allows an optional
{@link IndexDeletionPolicy} implementation to be
specified. You can use this to control when prior commits
are deleted from the index. The default policy is {@link
KeepOnlyLastCommitDeletionPolicy} which removes all prior
commits as soon as