http://bugzilla.lysator.liu.se/show_bug.cgi?id=1616
Summary: Store text mass in generation files
Product: lyskomd
Version: 2.1.2
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: server
AssignedTo: ceder@lysator.liu.se
ReportedBy: ceder@lysator.liu.se
QAContact: lyskomd-qa@lists.lysator.liu.se
The actual texts are currently stored in a single, large file. New texts are appended to the tail. Old deleted texts remain there forever (or until the file is compacted using "dbck -g" -- something that requires downtime).
A more efficient structure with automatic reclaiming is needed to increase the availability of large installations.
Here is a draft suggestion on a new format:
Texts are stored in generation files. That is, texts created at roughly the same time will be stored in the same file. Each file consists of:
- a magic marker, for sanity checking
- the generation number
- the reclamation sequence number
- a number of texts
Each text is stored as:
- the text number
- the actual text, as a hollerith string
- the checksum method as a hollerith string (currently always "5HSHA-1")
- a checksum as a hollerith string containing the hexadecimal representation of the checksum, using 0-9, a-f (and not A-F)
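As an illustration of the record fields above, here is a minimal sketch of hollerith encoding and lowercase hex rendering. The function names hollerith_encode and hex_lower are hypothetical and not taken from lyskomd; computing the SHA-1 digest itself is left to whatever library the server would use.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write `len` bytes of `data` as a hollerith string ("<len>H<bytes>")
 * into `out`.  Returns the number of bytes written (excluding the
 * trailing NUL), or 0 if `out` is too small.  Hypothetical helper. */
size_t hollerith_encode(const char *data, size_t len,
                        char *out, size_t outsize)
{
    assert(data != NULL && out != NULL);
    int n = snprintf(out, outsize, "%zuH", len);
    if (n < 0 || (size_t)n + len + 1 > outsize)
        return 0;
    memcpy(out + n, data, len);
    out[n + len] = '\0';
    return (size_t)n + len;
}

/* Render a binary digest as lowercase hexadecimal (0-9, a-f).
 * `out` must have room for 2 * len + 1 bytes.  Hypothetical helper. */
void hex_lower(const unsigned char *digest, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";
    for (size_t i = 0; i < len; i++) {
        out[2 * i] = digits[digest[i] >> 4];
        out[2 * i + 1] = digits[digest[i] & 0x0f];
    }
    out[2 * len] = '\0';
}
```

Encoding the five-byte method name "SHA-1" this way yields the "5HSHA-1" string mentioned in the list.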
lyskomd starts by writing to generation 1. (Generation 0 is reserved for online conversion of legacy databases; see below.) Each new text will be appended to the tail of the generation 1 file. The reclamation sequence count is 0.
Once the file reaches a configurable size, or the oldest text in it reaches a configurable age (whichever happens first), a new generation file is created. The generation number of that file will be one more than the previous generation number.
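The rollover rule could be checked roughly as follows. This is only a sketch: struct gen_limits, its field names and generation_is_full are assumptions made for illustration, and the caller is assumed to skip the age test while the file holds no texts.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical configuration; the names are illustrative only. */
struct gen_limits {
    long max_bytes;     /* start a new generation at this file size */
    long max_age_secs;  /* ... or when the oldest text is this old */
};

/* Return true when the current generation file should be closed and
 * the next generation (current number + 1) started.  Whichever limit
 * is reached first triggers the rollover. */
bool generation_is_full(const struct gen_limits *lim, long file_bytes,
                        time_t oldest_text, time_t now)
{
    assert(lim != NULL);
    if (file_bytes >= lim->max_bytes)
        return true;
    return difftime(now, oldest_text) >= (double)lim->max_age_secs;
}
```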
When a configurable percentage of the texts (or text size) of an old generation file has been deleted, a copy of the generation file is created. The reclamation sequence number is increased. Only the texts that still exist are written to the new copy. This operation happens in the background. Once the file has been completely written, and all indexes in the text statuses have been written to disk, the old generation file can be removed.
If the total size of two adjacent generation files drops below a configurable size, they will be combined into a single file, which gets the generation number of the older file and a reclamation number one larger than the previous reclamation number of that file. This should maybe only happen as part of a normal reclamation operation.
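The numbering of a combined file might be derived like this. The sketch assumes that a lower generation number means an older file; struct gen_id, should_merge and merged_id are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Identity of a generation file (hypothetical type). */
struct gen_id {
    long generation;
    long reclamation;
};

/* Two adjacent files qualify for merging when their combined size
 * drops below the configured limit. */
bool should_merge(long bytes_a, long bytes_b, long merge_limit)
{
    return bytes_a + bytes_b < merge_limit;
}

/* The combined file keeps the generation number of the older file
 * and gets that file's reclamation number plus one. */
struct gen_id merged_id(struct gen_id a, struct gen_id b)
{
    assert(a.generation != b.generation);
    struct gen_id older = (a.generation < b.generation) ? a : b;
    older.reclamation += 1;
    return older;
}
```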
The text-stat will have to change. The "long file_pos;" attribute needs to be replaced with something like this:
    long generation;
    long reclamation;
    long position;
    long generation_b;
    long reclamation_b;
    long position_b;
The "_b" variants are needed to keep references to both the previous and the current file while a new generation file is being written. Maybe that state can be kept separately, since it is only needed during the rewrite.
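One way to keep the "_b" state separately is a side table of pending moves that is applied once the new file is safely on disk. This is only a sketch of that alternative; struct text_location, struct pending_move and commit_moves are hypothetical, and the table of text statuses is assumed to be an array indexed by text number.

```c
#include <assert.h>
#include <stddef.h>

/* Where a text lives: the replacement for "long file_pos;". */
struct text_location {
    long generation;
    long reclamation;
    long position;
};

/* Side-table entry recording where a text will live once the
 * rewritten generation file is complete (replaces the "_b" fields). */
struct pending_move {
    long text_no;
    struct text_location new_loc;
};

/* Switch every moved text to its new location.  Intended to be
 * called only after the new file has been completely written. */
void commit_moves(struct text_location *stats, size_t nstats,
                  const struct pending_move *moves, size_t nmoves)
{
    assert(stats != NULL && (moves != NULL || nmoves == 0));
    for (size_t i = 0; i < nmoves; i++) {
        assert(moves[i].text_no >= 0);
        assert((size_t)moves[i].text_no < nstats);
        stats[moves[i].text_no] = moves[i].new_loc;
    }
}
```

Until commit_moves runs, readers keep using the old location, so the per-text status stays at three fields instead of six.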