http://bugzilla.lysator.liu.se/show_bug.cgi?id=1616
------- Additional Comments From ceder(a)lysator.liu.se 2006-10-07 21:49 -------
lyskomd should be able to convert legacy databases to the new format on-the-fly.
Generation 0 is reserved for this. A server that is started with an old-style
database will set the generation to 0 in all text-stats as they are read. It
will write a new database file with the new-style text-stats.
Generation 0 means that a text is stored in the file "lyskomd-texts", at the
position specified by "position". No metadata is stored in the file.
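Since no metadata precedes a generation-0 text, reading one is a plain seek-and-read in the legacy file. Below is a minimal C sketch using stdio. The function name `read_legacy_text` and its `len` parameter are hypothetical, and the sketch assumes the text's length is already known from elsewhere.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read `len` bytes of a generation-0 text that
 * starts at byte offset `position` in the legacy "lyskomd-texts" file.
 * Returns a malloc'ed, NUL-terminated buffer, or NULL on error. */
char *
read_legacy_text(FILE *legacy, long position, size_t len)
{
    assert(legacy != NULL);

    char *buf = malloc(len + 1);
    if (buf == NULL)
        return NULL;

    /* No metadata precedes the text: seek straight to its first byte. */
    if (fseek(legacy, position, SEEK_SET) != 0
        || fread(buf, 1, len, legacy) != len) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```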
During startup, the server will count how many texts there are. It will assign
them to new generations, using the configuration that specifies when new
generation files should be created. Say that there are 10000 old texts. The
server might assign 1000 texts per generation, which means that the old texts
will be assigned to generations 1, 2, 3, ..., 10. The server would have
generation 11 as the current generation (where new texts will be created).
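The arithmetic in the example can be sketched as follows. This assumes the configured policy is simply a fixed number of texts per generation, as in the example. The real configuration may instead be size- or age-based. Both function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical sketch: map the i:th legacy text (0-based, in text-number
 * order) to a new-style generation, given a fixed number of texts per
 * generation. Generation 0 stays reserved for the legacy file. */
long
legacy_generation_for(long index, long texts_per_generation)
{
    assert(index >= 0 && texts_per_generation > 0);
    return 1 + index / texts_per_generation;
}

/* The generation new texts are written to: one past the last generation
 * that holds converted legacy texts. */
long
first_current_generation(long legacy_texts, long texts_per_generation)
{
    assert(legacy_texts >= 0 && texts_per_generation > 0);
    /* Ceiling division: 10000 texts at 1000 per generation -> 10. */
    long used = (legacy_texts + texts_per_generation - 1)
                / texts_per_generation;
    return used + 1;
}
```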
In the background, the old texts will be written to new-style generation
files. Once all the generations are created (10 different generation files
in the example) and the text-stats have been committed to disk, the
lyskomd-texts file can be removed.
lyskomd should periodically write an index file, which mentions all the
generation files that are in use. dbck should be taught to check if there
are any stray generation files left.
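A stray-file check could compare each directory entry against the index. Below is a minimal sketch, assuming the index has been loaded into an array of (generation, reclamation) pairs and that files follow the naming proposed in the next paragraph. The struct and function names are hypothetical, not existing dbck code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical in-memory form of one index-file entry. */
struct gen_file_id {
    long generation;
    long reclamation;
};

/* Returns true if `name` looks like a generation file but is not listed
 * in the index. Names that do not match "lyskomd-texts-<gen>-<recl>"
 * exactly (including the legacy "lyskomd-texts") are ignored. */
bool
is_stray_generation_file(const char *name,
                         const struct gen_file_id *index, size_t n)
{
    long gen, recl;
    int consumed = 0;

    assert(name != NULL);
    if (sscanf(name, "lyskomd-texts-%ld-%ld%n", &gen, &recl, &consumed) != 2
        || name[consumed] != '\0')
        return false;

    for (size_t i = 0; i < n; i++)
        if (index[i].generation == gen && index[i].reclamation == recl)
            return false;
    return true;
}
```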
Generation files could be named using the printf-style format string
"lyskomd-texts-%d-%d", with the generation and reclamation numbers as its
arguments.
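The naming scheme with `snprintf` might look like this. The comment above uses `%d`. This sketch uses `%ld` because the text-stat fields proposed in bug 1616 are `long`. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: build the file name for a generation and
 * reclamation sequence number. Returns 0 on success, -1 if the name
 * does not fit in `buf`. */
int
generation_file_name(char *buf, size_t size, long generation, long reclamation)
{
    assert(generation > 0 && reclamation >= 0);
    int n = snprintf(buf, size, "lyskomd-texts-%ld-%ld",
                     generation, reclamation);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```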
------- You are receiving this mail because: -------
You are the QA contact for the bug, or are watching the QA contact.
http://bugzilla.lysator.liu.se/show_bug.cgi?id=1616
Summary: Store text mass in generation files
Product: lyskomd
Version: 2.1.2
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: server
AssignedTo: ceder(a)lysator.liu.se
ReportedBy: ceder(a)lysator.liu.se
QAContact: lyskomd-qa(a)lists.lysator.liu.se
The actual texts are currently stored in a single, large file. New texts are
appended to the tail. Old deleted texts remain there forever (or until the
file is compacted using "dbck -g" -- something that requires downtime).
A more efficient structure, with automatic reclamation of the space used by
deleted texts, is needed to increase availability on large installations.
Here is a draft suggestion for a new format:
Texts are stored in generation files. That is, texts created at roughly the
same time will be stored in the same file. Each file consists of:
- a magic marker, for sanity checking
- the generation number
- the reclamation sequence number
- a number of texts
Each text is stored as:
- the text number
- the actual text, as a hollerith string
- the checksum method as a hollerith string (currently always "5HSHA-1")
- a checksum as a hollerith string containing the hexadecimal
representation of the checksum, using 0-9, a-f (and not A-F)
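One text record could be written as sketched below. The draft does not say how fields are separated. This sketch assumes single spaces between fields and a newline after each record. The SHA-1 digest (20 raw bytes) is assumed to be computed elsewhere. The file header (magic marker, generation, reclamation) is not shown, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Write `len` bytes of `data` as a Hollerith string: "<len>H<data>". */
static int
put_hollerith(FILE *fp, const unsigned char *data, size_t len)
{
    if (fprintf(fp, "%zuH", len) < 0)
        return -1;
    return fwrite(data, 1, len, fp) == len ? 0 : -1;
}

/* Hypothetical record writer: text number, text, checksum method,
 * and the lower-case hex checksum, each as described in the draft. */
int
write_text_record(FILE *fp, long text_no,
                  const unsigned char *text, size_t text_len,
                  const unsigned char digest[20])
{
    static const char hexdigits[] = "0123456789abcdef"; /* never A-F */
    unsigned char hex[40];

    assert(fp != NULL && text != NULL && digest != NULL);
    for (int i = 0; i < 20; i++) {
        hex[2 * i] = (unsigned char)hexdigits[digest[i] >> 4];
        hex[2 * i + 1] = (unsigned char)hexdigits[digest[i] & 0x0f];
    }

    if (fprintf(fp, "%ld ", text_no) < 0
        || put_hollerith(fp, text, text_len) != 0
        || fputs(" 5HSHA-1 ", fp) == EOF
        || put_hollerith(fp, hex, sizeof hex) != 0
        || fputc('\n', fp) == EOF)
        return -1;
    return 0;
}
```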
lyskomd starts by writing to generation 1. (Generation 0 is reserved for
online conversion of legacy databases; see below.) Each new text will be
appended to the tail of the generation 1 file. The reclamation sequence
number is 0.
Once the file reaches a configurable size, or the oldest text in it reaches
a configurable age (whichever happens first), a new generation file is
created. The generation number of that file will be one more than the
previous generation number.
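The "whichever happens first" rule could be sketched as follows. The configuration struct, the bookkeeping struct and both functions are hypothetical. The draft only says that both limits are configurable. Treating the creation time of an empty file as the age of its oldest text is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical configuration knobs. */
struct rotation_config {
    long max_file_size;     /* bytes */
    double max_age_seconds; /* age of the oldest text in the file */
};

/* Hypothetical view of the generation file currently being appended to. */
struct current_generation {
    long generation;
    long reclamation;
    long size;              /* bytes written so far */
    time_t oldest_text;     /* time of the first text (or file creation) */
};

/* True once either limit is reached, whichever happens first. */
bool
should_start_new_generation(const struct rotation_config *cfg,
                            const struct current_generation *cur,
                            time_t now)
{
    assert(cfg != NULL && cur != NULL);
    return cur->size >= cfg->max_file_size
        || difftime(now, cur->oldest_text) >= cfg->max_age_seconds;
}

/* Move on to the next generation: number + 1, reclamation 0, empty file. */
void
start_new_generation(struct current_generation *cur, time_t now)
{
    cur->generation += 1;
    cur->reclamation = 0;
    cur->size = 0;
    cur->oldest_text = now;
}
```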
When a configurable percentage of the texts (or text size) of an old
generation file has been deleted, a copy of the generation file is created.
The reclamation sequence number is increased. Only the texts that still
exist are written to the new copy. This operation happens in the
background. Once the file has been completely written, and all indexes
in the text-stats have been written to disk, the old generation file can be removed.
If the total size of two adjacent generation files drops below a configurable
size, they will be combined into a single file, which gets the generation
number of the older file and a reclamation number one larger than the
previous reclamation number of that file. Perhaps this should only happen
as part of a normal reclamation operation.
The text-stat will have to change. The "long file_pos;" attribute needs
to be replaced with something like this:
    long generation;
    long reclamation;
    long position;
    long generation_b;
    long reclamation_b;
    long position_b;
The "_b" variants are needed to keep references to both the previous and
the current file while a new generation file is being written. Maybe
that state can be kept separately, since it is only needed while a new
generation file is being written.
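The proposed fields could be grouped as sketched below. The "_b" triple becomes a `pending` location that is only valid while a copy is in progress. Two choices here are assumptions, not part of the draft: readers keep using the old location until the move is committed, and the pending state stays inside the text-stat rather than in a separate table. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Where one text lives; generation 0 means the legacy file. */
struct text_location {
    long generation;
    long reclamation;
    long position;
};

/* Hypothetical replacement for "long file_pos;". `pending` plays the
 * role of the *_b fields and is only meaningful while `has_pending`. */
struct text_stat_location {
    struct text_location current;
    struct text_location pending;
    bool has_pending;
};

/* Record where the text landed in the new file being written. */
void
begin_move(struct text_stat_location *t, struct text_location dest)
{
    assert(!t->has_pending);
    t->pending = dest;
    t->has_pending = true;
}

/* Called once the new file and this text-stat have reached the disk:
 * the new location becomes the only one. */
void
commit_move(struct text_stat_location *t)
{
    assert(t->has_pending);
    t->current = t->pending;
    t->has_pending = false;
}

/* Readers use the old file until the move is committed. */
const struct text_location *
location_for_read(const struct text_stat_location *t)
{
    return &t->current;
}
```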
http://bugzilla.lysator.liu.se/show_bug.cgi?id=1615
ceder(a)lysator.liu.se changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
Target Milestone|--- |Future
http://bugzilla.lysator.liu.se/show_bug.cgi?id=1614
ceder(a)lysator.liu.se changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
Target Milestone|--- |Future
http://bugzilla.lysator.liu.se/show_bug.cgi?id=168
ceder(a)lysator.liu.se changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
Target Milestone|Future |2.1.3
------- Additional Comments From ceder(a)lysator.liu.se 2006-10-01 22:10 -------
Fixed in r5594.