* Functional specification for issue #516: Obliterate *
#TODO
+ add the missing functional requirements
+ add more details and examples - where needed - on the functional requirements
+ add the missing non-functional requirements
+ probably need to split up some of the requirements to make them smaller
(and SMARTer).
+ add a clear description of the cascading effect:
revisions contain directories, directories contain files and property
changes, files contain content changes and property changes, and content and
property changes can be merged.
+ verify that with the documented requirements (functional and non-functional)
the use cases and examples can be 'solved'.
+ fill in use case vs requirements table.
+ let a native English speaker review for clarity and propose better keywords
+ review on mailing list(s).
+ finalize
#END-TODO
I. Overview
This document serves as the functional specification for what's commonly
called the 'svn obliterate' feature.
II. Use Cases
1. Disable all access to confidential information in a repository.
[security]
A. Description
This is the case where a user has added information to the repository
that should not have been made public. The distribution of this
information must be halted, and where it has been distributed, it must
be removed.
This use case typically requires removal of any trace of that
information from the whole history of the repository. In short, if a
confidential file was copied, also obliterate the copy.
B. Examples
+ User adding documents with confidential information to the repository.
Needs to stop distribution to working copies and mirrors ASAP.
+ User adding source code to the repository, who finds out later that it
infringes certain intellectual property rights. Need to remove all traces of
the infringing source code, including all derivatives, from the
repository.
C. Primary actor triggering this use case
A key user of the repository who knows what confidential information
should be removed, and who can estimate the impact of obliteration
(which paths, which revision range(s), etc.).
Normal users should not be able to obliterate. For those users we
already have 'svn rm'.
2. Remove obsolete information from a repository and free the associated
disc space.
[disc space]
A. Description
This is the case where unneeded or obsolete information is stored in the
repository, taking up lots of disc space. In order to free up disc
space, this information may be obliterated.
This use case typically requires removal of certain subsets of the
repository while leaving later revisions intact. In short, if an
obsolete file was copied, leave the copy intact.
This use case is often combined with archiving of the obsolete
information: archive first, then obliterate.
B. Examples
+ User adding a whole set of development tools, huge binaries or
external libraries to the product by mistake.
+ Users managing huge files (MBs/GBs) as part of their normal workflow.
These files can be removed when work on newer versions has started.
+ Users adding source code, assets and build deliverables in the same
repository. Certain assets or build deliverables can be removed.
+ When a project is moved to its own repository, the project's files may
be obliterated from the original repository. This includes moving old
projects to an archive repository.
+ Repositories set up to store product deliverables. Those deliverables
for old unmaintained versions, like everything older than a revision
or date, may be obliterated from the repository.
+ Removal of dead branches whose changes have not been and will not be
included in the main development line.
C. Primary actor triggering this use case
A repository administrator who is concerned about disc space usage.
However, only a key user can decide which information may be
obliterated.
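The two use cases differ in how copies of an obliterated file are treated.
The following is a minimal sketch of that difference on a toy model, not a
description of how Subversion stores history. The function name, the
`copies` mapping and the example paths are all hypothetical.

```python
def select_nodes(copies, target, follow_copies):
    """Return the set of paths that would be obliterated.

    copies maps a copy destination to its copy source (toy model).
    follow_copies=True models the [security] case: copies are obliterated too.
    follow_copies=False models the [disc space] case: copies stay intact.
    """
    selected = {target}
    if not follow_copies:
        return selected
    # Keep adding copy destinations until no new path is found, so that
    # copies of copies are selected as well.
    changed = True
    while changed:
        changed = False
        for dest, src in copies.items():
            if src in selected and dest not in selected:
                selected.add(dest)
                changed = True
    return selected


# Hypothetical copy history: trunk -> branch -> tag, plus an unrelated copy.
copies = {
    "branches/b1/secret.txt": "trunk/secret.txt",
    "tags/t1/secret.txt": "branches/b1/secret.txt",
    "trunk/other.txt": "trunk/readme.txt",
}
security = select_nodes(copies, "trunk/secret.txt", follow_copies=True)
disc_space = select_nodes(copies, "trunk/secret.txt", follow_copies=False)
```

In the security case the branch and tag copies are selected along with the
original; in the disc space case only the original path is selected.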
III. Current solution
1. Dump -> Filter -> Load
Subversion already has a solution in place to completely remove
information from a repository. It's a combination of dumping a
repository to text format (svnadmin dump), using filters to remove some
nodes or revisions from the text (svndumpfilter) and then loading it
back into a new repository (svnadmin load).
Where svndumpfilter is used to remove information from a repository,
obliterate should cover at least all of its features.
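As an illustration of the dump, filter and load sequence described above,
the sketch below only builds the command lines and does not run them. The
repository locations and the excluded path are hypothetical placeholders;
`svnadmin create` is included on the assumption that the target repository
does not exist yet.

```python
import shlex


def build_filter_commands(source_repo, target_repo, excluded_paths):
    """Return shell command strings for a dump -> filter -> load run.

    Nothing is executed here; the caller decides how to run the commands.
    """
    if not excluded_paths:
        raise ValueError("at least one path to exclude is required")
    create = ["svnadmin", "create", target_repo]
    dump = ["svnadmin", "dump", source_repo]
    filt = ["svndumpfilter", "exclude", *excluded_paths]
    load = ["svnadmin", "load", target_repo]
    # The dump stream is piped through the filter into the new repository.
    pipeline = " | ".join(shlex.join(cmd) for cmd in (dump, filt, load))
    return [shlex.join(create), pipeline]


# Hypothetical locations and path, for illustration only.
commands = build_filter_commands(
    "/srv/svn/old", "/srv/svn/new", ["trunk/secret.txt"]
)
```

Note that the whole repository passes through the pipeline even when only
one path is excluded.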
2. Advantages of current solution
+ svndumpfilter exists today.
+ It has the most basic include and exclude filters built-in.
+ Its functionality is reasonably well understood.
3. Disadvantages of current solution
+ svndumpfilter has a series of issues (8 right now, see the issue
tracker).
+ Its filtering options are limited to include or exclude paths, no
wildcard support...
+ Filtering is based on pathnames, not on nodes.
+ Due to its streaming way of working it has no random access to the
source or target repository, hence it can't rewrite copies or later
modifications on filtered files.
+ Uses an intermediate text format and requires filtering the whole
repository, not only the relevant revisions -> Slow.
+ Requires the extra disc space for the output repository.
+ The svndumpfilter code is not actively maintained.
+ Slow.
+ Requires shell access on repository server or at least access to
dump files.
IV. Detailed functional requirements
0. Overview
The workflow of the obliterate solution can be defined in six steps:
1. SELECT the lines of history to obliterate.
2. LIMIT the range of obliteration to a revision or revision range.
3. DEFINE how to handle the consequences of obliteration on derivative
modifications. [#TODO: this needs a clearer keyword]
4. HIDE the selected modifications.
5. If needed, UNHIDE selected modifications.
6. OBLITERATE the selected modifications from the repository.
While in the final solution step 4 HIDE and step 6 OBLITERATE may be
combined into one, as that is probably much easier to implement, there are
some clear advantages to keeping the HIDE step separate:
+ In the security use case, hiding confidential information is much more
time-critical than the final obliteration.
+ Hiding information can be done by a key user, whereas obliteration
should be done by an administrator with direct repository access.
Note: while there's certainly a need to have repository administration
control without requiring shell access to a server, this need is not
obliterate specific and as such doesn't have to be solved in the scope
of this solution.
+ Hiding information can be seen as a dry run for final obliteration. It
allows the key user to analyse the impact of the selected filters,
hide extra information or recover where needed before committing to
removing it from the repository.
Each of these steps is detailed in the following list of functional
requirements. We'll probably find that the differences in requirements
between the use cases are mainly in steps 3 and 4.
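The HIDE, UNHIDE and OBLITERATE steps can be read as transitions between
three states of a selected modification. The sketch below is one possible
reading, not a design decision: the state names and the `apply_action`
helper are hypothetical, and it assumes that HIDE is kept as a separate
step, so that only hidden modifications can be obliterated.

```python
from enum import Enum


class State(Enum):
    VISIBLE = "visible"
    HIDDEN = "hidden"
    OBLITERATED = "obliterated"


# Allowed transitions (assumption: obliteration requires a prior HIDE).
TRANSITIONS = {
    ("hide", State.VISIBLE): State.HIDDEN,
    ("unhide", State.HIDDEN): State.VISIBLE,
    ("obliterate", State.HIDDEN): State.OBLITERATED,
}


def apply_action(state, action):
    """Return the new state, or raise ValueError for a disallowed step."""
    try:
        return TRANSITIONS[(action, state)]
    except KeyError:
        raise ValueError(
            f"cannot {action} a modification that is {state.value}"
        ) from None


hidden = apply_action(State.VISIBLE, "hide")
restored = apply_action(hidden, "unhide")
gone = apply_action(hidden, "obliterate")
```

In this reading OBLITERATED has no outgoing transitions, which reflects that
hiding is reversible while obliteration is final.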
Priorities are one of: ( MoSCoW )