Multifile

class Multifile

Bases: ReferenceCount

A file that contains a set of files.

Multifile(void)

bool add_signature(Filename const &certificate, Filename const &chain, Filename const &pkey, std::string const &password = "")

bool add_signature(Filename const &composite, std::string const &password = "")

Adds a new signature to the Multifile. This signature associates the indicated certificate with the current contents of the Multifile. When the Multifile is read later, the signature will still be present only if the Multifile is unchanged; any subsequent changes to the Multifile will automatically invalidate and remove the signature.

The chain filename may be empty if the certificate does not require an authenticating certificate chain (e.g. because it is self-signed).

The specified private key must match the certificate, and the Multifile must be open in read-write mode. The private key is only used for generating the signature; it is not written to the Multifile and cannot be retrieved from the Multifile later. (However, the certificate can be retrieved from the Multifile later, to identify the entity that created the signature.)

This implicitly causes a repack() operation if one is needed. Returns true on success, false on failure.

This flavor of add_signature() reads the certificate and private key from PEM-formatted files, for instance as generated by the openssl command. If the private key file is password-encrypted, the password parameter will be used to decrypt it.

This flavor of add_signature() reads the certificate, private key, and certificate chain from the same PEM-formatted file. It takes the first private key found as the intended key, and then uses the first certificate found that matches that key as the signing certificate. Any other certificates in the file are taken to be part of the chain.

The signature certificate is the first certificate on the CertChain object. Any remaining certificates are support certificates to authenticate the first one.

The specified private key must match the certificate, and the Multifile must be open in read-write mode. The private key is only used for generating the signature; it is not written to the Multifile and cannot be retrieved from the Multifile later. (However, the certificate can be retrieved from the Multifile later, to identify the entity that created the signature.)

This implicitly causes a repack() operation if one is needed. Returns true on success, false on failure.

std::string add_subfile(std::string const &subfile_name, Filename const &filename, int compression_level)

std::string add_subfile(std::string const &subfile_name, std::istream *subfile_data, int compression_level)

Adds a file on disk as a subfile to the Multifile. The file named by filename will be read and added to the Multifile at the next call to flush(). If there already exists a subfile with the indicated name, it is replaced without examining its contents (but see also update_subfile).

Either Filename::set_binary() or set_text() must have been called previously to specify the nature of the source file. If set_text() was called, the text flag will be set on the subfile.

Returns the subfile name on success (it might have been modified slightly), or empty string on failure.

Adds a file from a stream as a subfile to the Multifile. The indicated istream will be read and its contents added to the Multifile at the next call to flush(). The file will be added as a binary subfile.

Note that the istream must remain untouched and unused by any other code until flush() is called. At that time, the Multifile will read the entire contents of the istream from the current file position to the end of the file. Afterwards, the Multifile will not close or delete the istream. It is the caller’s responsibility to ensure that the istream is not destroyed during the lifetime of the Multifile.

Returns the subfile name on success (it might have been modified slightly), or empty string on failure.
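
The "current file position to the end of the file" rule for the istream variant can be pictured with a small standalone sketch. It uses only the standard library, not the Multifile class, and the helper name read_remaining is hypothetical.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Multifile): reads everything from the
// stream's current get position up to end-of-file. The stream is neither
// closed nor deleted here; the caller keeps ownership, mirroring the
// ownership rule described for add_subfile().
std::string read_remaining(std::istream &in) {
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}
```

If the caller has already consumed part of the stream, only the remainder is picked up, which is why the stream should be left untouched between add_subfile() and flush().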

void close(void): Closes the Multifile if it is open. All changes are flushed to disk, and the file becomes invalid for further operations until the next call to open().

static void close_read_subfile(std::istream *stream): Closes a file opened by a previous call to open_read_subfile(). This really just deletes the istream pointer, but it is recommended to use this interface instead of deleting it explicitly, to help work around compiler issues.

bool compare_subfile(int index, Filename const &filename)

Performs a byte-for-byte comparison of the indicated file on disk with the nth subfile. Returns true if the files are equivalent, or false if they are different (or the file is missing).

If Filename::set_binary() or set_text() has already been called, it specifies the nature of the source file. If this is different from the text flag of the subfile, the comparison will always return false. If this has not been specified, it will be set from the text flag of the subfile.
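
A byte-for-byte comparison of the kind described above might look like the following sketch. It compares two generic standard streams, and the name streams_equal is hypothetical. The text-flag rule and the "file is missing" case are not modeled here.

```cpp
#include <istream>
#include <iterator>
#include <sstream>  // only needed by callers that use string streams

// Hypothetical helper (not part of Multifile): returns true only if both
// streams yield identical bytes and reach end-of-file at the same time.
bool streams_equal(std::istream &a, std::istream &b) {
  std::istreambuf_iterator<char> ia(a), ib(b), end;
  while (ia != end && ib != end) {
    if (*ia != *ib) {
      return false;  // first differing byte
    }
    ++ia;
    ++ib;
  }
  // Equal only if neither stream has bytes left over.
  return ia == end && ib == end;
}
```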

bool extract_subfile(int index, Filename const &filename): Extracts the nth subfile into a file with the given name.

bool extract_subfile_to(int index, std::ostream &out): Extracts the nth subfile to the indicated ostream.

int find_subfile(std::string const &subfile_name) const: Returns the index of the subfile with the indicated name, or -1 if the named subfile is not within the Multifile.
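
As a rough illustration of the index-or-minus-one convention, the sketch below searches a sorted list of names. It assumes the names are kept in the alphabetical order described under get_num_subfiles(); the function find_name and the use of a binary search are illustrative assumptions, not a description of the class's internals.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not part of Multifile): returns the index of `name`
// in an alphabetically sorted list, or -1 if it is not present.
int find_name(const std::vector<std::string> &sorted_names,
              const std::string &name) {
  auto it = std::lower_bound(sorted_names.begin(), sorted_names.end(), name);
  if (it == sorted_names.end() || *it != name) {
    return -1;
  }
  return static_cast<int>(it - sorted_names.begin());
}
```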

bool flush(void)

Writes all contents of the Multifile to disk. Until flush() is called, add_subfile() and remove_subfile() do not actually do anything to disk. At this point, all of the recently-added subfiles are read and their contents are added to the end of the Multifile, and the recently-removed subfiles are marked gone from the Multifile.

This may result in a suboptimal index. To guarantee that the index is written at the beginning of the file, call repack() instead of flush().

It is not necessary to call flush() explicitly unless you are concerned about reading the recently-added subfiles immediately.

Returns true on success, false on failure.

std::string const &get_encryption_algorithm(void) const: Returns the encryption algorithm that was specified by set_encryption_algorithm().

bool get_encryption_flag(void) const: Returns the flag indicating whether subsequently-added subfiles should be encrypted before writing them to the multifile. See set_encryption_flag().

int get_encryption_iteration_count(void) const: Returns the value that was specified by set_encryption_iteration_count().

int get_encryption_key_length(void) const: Returns the encryption key length, in bits, that was specified by set_encryption_key_length().

std::string const &get_encryption_password(void) const: Returns the password that will be used to encrypt subfiles subsequently added to the multifile. See set_encryption_password().

std::string const &get_header_prefix(void) const: Returns the string that preceded the Multifile header on the file, if any. See set_header_prefix().

std::streamoff get_index_end(void) const

Returns the first byte that is guaranteed to follow any index byte already written to disk in the Multifile.

This number is largely meaningless in many cases, but if needs_repack() is false, and the file is flushed, this will indicate the number of bytes in the header + index. Everything at this byte position and later will be actual data.

std::string get_magic_number(void): Returns a string with the first n bytes written to a Multifile, to identify it as a Multifile.

Filename const &get_multifile_name(void) const: Returns the filename of the Multifile, if it is available.

int get_num_signatures(void) const

Returns the number of matching signatures found on the Multifile. These signatures may be iterated via get_signature() and related methods.

A signature on this list is guaranteed to match the Multifile contents, proving that the Multifile has been unmodified since the signature was applied. However, this does not guarantee that the certificate itself is actually from who it says it is from; only that it matches the Multifile contents. See validate_signature_certificate() to authenticate a particular certificate.

int get_num_subfiles(void) const: Returns the number of subfiles within the Multifile. The subfiles may be accessed in alphabetical order by iterating through [0 .. get_num_subfiles()).

bool get_record_timestamp(void) const: Returns the flag indicating whether timestamps should be recorded within the Multifile or not. See set_record_timestamp().

std::size_t get_scale_factor(void) const: Returns the internal scale factor for this Multifile. See set_scale_factor().

std::string get_signature_friendly_name(int n) const

Returns a “friendly name” for the nth signature found on the Multifile. This attempts to extract out the most meaningful part of the subject name. It returns the emailAddress, if it is defined; otherwise, it returns the commonName.

See the comments in get_num_signatures().

std::string get_signature_public_key(int n) const

Returns the public key used for the nth signature found on the Multifile. This is encoded in DER form and returned as a string of hex digits.

This can be used, in conjunction with the subject name (see get_signature_subject_name()), to uniquely identify a particular certificate and its subsequent reissues. See the comments in get_num_signatures().
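
"Returned as a string of hex digits" means each byte of the DER encoding becomes two characters. A minimal sketch of such an encoding follows. The helper to_hex is hypothetical, and the choice of lowercase digits is an assumption; the text above does not say which case the real method uses.

```cpp
#include <string>

// Hypothetical helper (not part of Multifile): encodes raw bytes as a
// string of hex digits, two characters per byte. Lowercase is an assumption.
std::string to_hex(const std::string &bytes) {
  static const char digits[] = "0123456789abcdef";
  std::string out;
  out.reserve(bytes.size() * 2);
  for (unsigned char c : bytes) {
    out.push_back(digits[c >> 4]);    // high nibble
    out.push_back(digits[c & 0x0f]);  // low nibble
  }
  return out;
}
```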

std::string get_signature_subject_name(int n) const: Returns the “subject name” for the nth signature found on the Multifile. This is a string formatted according to RFC2253 that should more-or-less identify a particular certificate; when paired with the public key (see get_signature_public_key()), it can uniquely identify a certificate. See the comments in get_num_signatures().

std::size_t get_subfile_internal_length(int index) const: Returns the number of bytes the indicated subfile consumes within the archive. For compressed subfiles, this will generally be smaller than get_subfile_length(); for encrypted (but noncompressed) subfiles, it may be slightly different; for noncompressed and nonencrypted subfiles, it will be equal.

std::streamoff get_subfile_internal_start(int index) const: Returns the starting byte position within the Multifile at which the indicated subfile begins. This may be used, with get_subfile_internal_length(), for low-level access to the subfile, but usually it is better to use open_read_subfile() instead (which automatically decrypts and/or uncompresses the subfile data).
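
The low-level access mentioned above amounts to reading a byte range out of the archive stream. The sketch below shows that idea on a generic std::istream; the helper read_range is hypothetical. The bytes obtained this way are the stored bytes, so they would still be compressed and/or encrypted if the subfile is.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // only needed by callers that use string streams
#include <string>

// Hypothetical helper (not part of Multifile): returns up to `length` raw
// bytes beginning at absolute offset `start` of the stream. Fewer bytes are
// returned if the stream ends early.
std::string read_range(std::istream &in, std::streamoff start,
                       std::size_t length) {
  std::string buf(length, '\0');
  in.clear();  // drop any earlier EOF/fail state before seeking
  in.seekg(start, std::ios::beg);
  in.read(&buf[0], static_cast<std::streamsize>(length));
  buf.resize(static_cast<std::size_t>(in.gcount()));
  return buf;
}
```

In terms of the methods above, `start` would come from get_subfile_internal_start() and `length` from get_subfile_internal_length().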

std::size_t get_subfile_length(int index) const: Returns the uncompressed data length of the nth subfile. This might return 0 if the subfile has recently been added and flush() has not yet been called.

std::string const &get_subfile_name(int index) const: Returns the name of the nth subfile.

time_t get_subfile_timestamp(int index) const: Returns the modification time of the nth subfile. If this is called on an older .mf file, which did not store individual timestamps in the file (or if get_record_timestamp() is false), this will return the modification time of the overall multifile.

time_t get_timestamp(void) const: Returns the modification timestamp of the overall Multifile. This indicates the most recent date at which subfiles were added or removed from the Multifile. Note that it is logically possible for an individual subfile to have a more recent timestamp than the overall timestamp.

bool has_directory(std::string const &subfile_name) const: Returns true if the indicated subfile name is the directory prefix to one or more files within the Multifile. That is, the Multifile contains at least one file named “subfile_name/…”.

bool is_read_valid(void) const: Returns true if the Multifile has been opened for read mode and there have been no errors, and individual Subfile contents may be extracted.

bool is_subfile_compressed(int index) const: Returns true if the indicated subfile has been compressed when stored within the archive, false otherwise.

bool is_subfile_encrypted(int index) const: Returns true if the indicated subfile has been encrypted when stored within the archive, false otherwise.

bool is_subfile_text(int index) const: Returns true if the indicated subfile represents text data, or false if it represents binary data. If the file is text data, it may have been processed by end-of-line conversion when it was added. (But the actual bits in the multifile will represent the standard Unix end-of-line convention, e.g. \n instead of \r\n.)
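
The end-of-line conversion mentioned for text subfiles can be pictured with a small sketch that rewrites Windows-style line endings to the Unix convention. The helper to_unix_newlines is hypothetical, and leaving a lone carriage return untouched is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Multifile): replaces each "\r\n" pair
// with a single "\n". A carriage return that is not followed by a newline
// is kept as-is (an assumption of this sketch).
std::string to_unix_newlines(const std::string &text) {
  std::string out;
  out.reserve(text.size());
  for (std::size_t i = 0; i < text.size(); ++i) {
    if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
      continue;  // skip the '\r'; the '\n' is copied on the next iteration
    }
    out.push_back(text[i]);
  }
  return out;
}
```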

bool is_write_valid(void) const: Returns true if the Multifile has been opened for write mode and there have been no errors, and Subfiles may be added or removed from the Multifile.

void ls(std::ostream &out = ::std::cout) const: Shows a list of all subfiles within the Multifile.

bool needs_repack(void) const: Returns true if the Multifile index is suboptimal and should be repacked. Call repack() to achieve this.

bool open_read(Filename const &multifile_name, std::streamoff const &offset = 0)

bool open_read(IStreamWrapper *multifile_stream, bool owns_pointer = false, std::streamoff const &offset = 0)

Opens the named Multifile on disk for reading. The Multifile index is read in, and the list of subfiles becomes available; individual subfiles may then be extracted or read, but the list of subfiles may not be modified.

Also see the version of open_read() which accepts an istream. Returns true on success, false on failure.

Opens an anonymous Multifile for reading using an istream. There must be seek functionality via seekg() and tellg() on the istream.

If owns_pointer is true, then the Multifile assumes ownership of the stream pointer and will delete it when the multifile is closed, including if this function returns false.

std::istream *open_read_subfile(int index)

Returns an istream that may be used to read the indicated subfile. You may seek() within this istream to your heart’s content; even though it will be a reference to the already-opened pfstream of the Multifile itself, byte 0 appears to be the beginning of the subfile and EOF appears to be the end of the subfile.

The returned istream will have been allocated via new; you should pass the pointer to close_read_subfile() when you are finished with it to delete it and release its resources.

Any future calls to repack() or close() (or the Multifile destructor) will invalidate all currently open subfile pointers.

The return value will be NULL if the stream cannot be opened for some reason.

This variant of open_read_subfile() is used internally only, and accepts a pointer to the internal Subfile object, which is assumed to be valid and written to the multifile.

bool open_read_write(Filename const &multifile_name)

bool open_read_write(std::iostream *multifile_stream, bool owns_pointer = false)

Opens the named Multifile on disk for reading and writing. If there already exists a file by that name, its index is read. Subfiles may be added or removed, and the resulting changes will be written to the named file.

Also see the version of open_read_write() which accepts an iostream. Returns true on success, false on failure.

Opens an anonymous Multifile for reading and writing using an iostream. There must be seek functionality via seekg()/seekp() and tellg()/tellp() on the iostream.

If owns_pointer is true, then the Multifile assumes ownership of the stream pointer and will delete it when the multifile is closed, including if this function returns false.

bool open_write(Filename const &multifile_name)

bool open_write(std::ostream *multifile_stream, bool owns_pointer = false)

Opens the named Multifile on disk for writing. If there already exists a file by that name, it is truncated. The Multifile is then prepared for accepting a brand new set of subfiles, which will be written to the indicated filename. Individual subfiles may not be extracted or read.

Also see the version of open_write() which accepts an ostream. Returns true on success, false on failure.

Opens an anonymous Multifile for writing using an ostream. There must be seek functionality via seekp() and tellp() on the ostream.

If owns_pointer is true, then the Multifile assumes ownership of the stream pointer and will delete it when the multifile is closed, including if this function returns false.

void output(std::ostream &out) const

void print_signature_certificate(int n, std::ostream &out) const: Writes the certificate for the nth signature, in user-readable verbose form, to the indicated stream. See the comments in get_num_signatures().

vector_uchar read_subfile(int index)

Returns a vector_uchar that contains the entire contents of the indicated subfile.

Fills a string with the entire contents of the indicated subfile.

Fills a pvector with the entire contents of the indicated subfile.

void remove_subfile(int index)

bool remove_subfile(std::string const &subfile_name)

Removes the named subfile from the Multifile, if it exists; returns true if successfully removed, or false if it did not exist in the first place. The file will not actually be removed from the disk until the next call to flush().

Note that this does not actually remove the data from the indicated subfile; it simply removes it from the index. The Multifile will not be reduced in size after this operation, until the next call to repack().

Removes the nth subfile from the Multifile. This will cause all subsequent index numbers to decrease by one. The file will not actually be removed from the disk until the next call to flush().

Note that this does not actually remove the data from the indicated subfile; it simply removes it from the index. The Multifile will not be reduced in size after this operation, until the next call to repack().

bool repack(void)

Forces a complete rewrite of the Multifile and all of its contents, so that its index will appear at the beginning of the file with all of the subfiles listed in alphabetical order. This is considered optimal for reading, and is the standard configuration; but it is not essential to do this.

It is only valid to call this if the Multifile was opened using open_read_write() and an explicit filename, rather than an iostream. Also, we must have write permission to the directory containing the Multifile.

Returns true on success, false on failure.

bool scan_directory(vector_string &contents, std::string const &subfile_name) const

Considers subfile_name to be the name of a subdirectory within the Multifile, but not a file itself; fills the given vector up with the sorted list of subdirectories or files within the named directory.

Note that directories do not exist explicitly within a Multifile; this just checks for the existence of files with the given initial prefix.

Returns true if successful, false otherwise.
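
Because directories exist only as name prefixes, both has_directory() and scan_directory() reduce to prefix tests over the subfile names. The sketch below models that idea on a plain list of names. The functions has_directory_prefix and list_directory are hypothetical, they return their results directly rather than through an output vector, and treating an empty directory name as the root is an assumption.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: true if at least one name starts with "dir/".
bool has_directory_prefix(const std::vector<std::string> &names,
                          const std::string &dir) {
  const std::string prefix = dir + "/";
  for (const std::string &name : names) {
    if (name.compare(0, prefix.size(), prefix) == 0) {
      return true;
    }
  }
  return false;
}

// Hypothetical helper: sorted, de-duplicated immediate children (files or
// subdirectories) of `dir`. An empty `dir` is treated as the root.
std::vector<std::string> list_directory(const std::vector<std::string> &names,
                                        const std::string &dir) {
  const std::string prefix = dir.empty() ? std::string() : dir + "/";
  std::set<std::string> children;  // std::set keeps entries sorted and unique
  for (const std::string &name : names) {
    if (name.compare(0, prefix.size(), prefix) == 0) {
      const std::string rest = name.substr(prefix.size());
      children.insert(rest.substr(0, rest.find('/')));
    }
  }
  return std::vector<std::string>(children.begin(), children.end());
}
```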

void set_encryption_algorithm(std::string const &encryption_algorithm)

Specifies the encryption algorithm that should be used for future calls to add_subfile(). The default is whatever is specified by the encryption-algorithm config variable. The complete set of available algorithms is defined by the current version of OpenSSL.

If an invalid algorithm is specified, there is no immediate error return code, but flush() will fail and the file will be invalid.

It is possible to apply a different encryption algorithm to different files, and unlike the password, this does not interfere with mounting the multifile via VFS. Changing this value may cause an implicit call to flush().

void set_encryption_flag(bool flag)

Sets the flag indicating whether subsequently-added subfiles should be encrypted before writing them to the multifile. If true, subfiles will be encrypted; if false (the default), they will be written without encryption.

When true, subfiles will be encrypted with the password specified by set_encryption_password(). It is possible to apply a different password to different files, but the resulting file can’t be mounted via VFS.

void set_encryption_iteration_count(int encryption_iteration_count)

Specifies the number of times to repeatedly hash the key before writing it to the stream in future calls to add_subfile(). Its purpose is to make it computationally more expensive for an attacker to search the key space exhaustively. This should be a multiple of 1,000 and should not exceed about 65 million; the value 0 indicates just one application of the hashing algorithm.

The default is whatever is specified by the multifile-encryption-iteration-count config variable.

It is possible to apply a different iteration count to different files, and unlike the password, this does not interfere with mounting the multifile via VFS. Changing this value causes an implicit call to flush().

void set_encryption_key_length(int encryption_key_length)

Specifies the length of the key, in bits, that should be used to encrypt the stream in future calls to add_subfile(). The default is whatever is specified by the encryption-key-length config variable.

If an invalid key_length for the chosen algorithm is specified, there is no immediate error return code, but flush() will fail and the file will be invalid.

It is possible to apply a different key length to different files, and unlike the password, this does not interfere with mounting the multifile via VFS. Changing this value may cause an implicit call to flush().

void set_encryption_password(std::string const &encryption_password)

Specifies the password that will be used to encrypt subfiles subsequently added to the multifile, if the encryption flag is also set true (see set_encryption_flag()).

It is possible to apply a different password to different files, but the resulting file can’t be mounted via VFS. Changing this value may cause an implicit call to flush().

void set_header_prefix(std::string const &header_prefix)

Sets the string which is written to the Multifile before the Multifile header. This string must begin with a hash mark and end with a newline character; and if it includes embedded newline characters, each one must be followed by a hash mark. If these conditions are not initially true, the string will be modified as necessary to make it so.

This is primarily useful as a simple hack to allow p3d applications to be run directly from the command line on Unix-like systems.

Note that the method is declared void: a prefix that violates the above rules is corrected as described rather than rejected, so no success flag is returned.
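
The three formatting rules for the header prefix (leading hash mark, trailing newline, hash mark after every embedded newline) can be sketched as a normalization step. The function normalize_header_prefix is hypothetical and only illustrates the rules as stated. How the real method performs its corrections, and how it treats an empty string, are assumptions here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Multifile): adjusts a prefix so that it
// begins with '#', ends with '\n', and has '#' after each embedded newline.
std::string normalize_header_prefix(std::string prefix) {
  if (prefix.empty()) {
    return prefix;  // assumption: an empty string means "no prefix"
  }
  if (prefix.front() != '#') {
    prefix.insert(prefix.begin(), '#');
  }
  if (prefix.back() != '\n') {
    prefix.push_back('\n');
  }
  // Every newline except the final one must be followed by a hash mark.
  for (std::size_t i = 0; i + 1 < prefix.size(); ++i) {
    if (prefix[i] == '\n' && prefix[i + 1] != '#') {
      prefix.insert(i + 1, 1, '#');
    }
  }
  return prefix;
}
```

A shebang-style line such as "#!/usr/bin/env some_runtime" (a made-up command) already satisfies the first rule and only gains a trailing newline.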

void set_multifile_name(Filename const &multifile_name): Replaces the filename of the Multifile. This is used primarily for documentation purposes; changing this name does not open the indicated file. See open_read() or open_write() for that.

void set_record_timestamp(bool record_timestamp)

Sets the flag indicating whether timestamps should be recorded within the Multifile or not. The default is true, indicating the Multifile will record timestamps for the overall file and also for each subfile.

If this is false, the Multifile will not record timestamps internally. In this case, the values returned by get_timestamp() and get_subfile_timestamp() will be estimates.

You may want to set this false to minimize the bitwise difference between independently-generated Multifiles.

void set_scale_factor(std::size_t scale_factor)

Changes the internal scale factor for this Multifile.

This is normally 1, but it may be set to any arbitrary value (greater than zero) to support Multifile archives that exceed 4GB, if necessary. (Individual subfiles may still not exceed 4GB.)

All addresses within the file are rounded up to the next multiple of _scale_factor, and zeros are written to the file to fill the resulting gaps. Then the address is divided by _scale_factor and written out as a 32-bit integer. Thus, setting a scale factor of 2 supports up to 8GB files, 3 supports 12GB files, etc.

Calling this function on an already-existing Multifile will have no immediate effect until a future call to repack() or close() (or until the Multifile is destructed).
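
The address arithmetic described above can be worked through in a few lines. This is a sketch of the stated rule only (round up to a multiple of the scale factor, divide, store as a 32-bit integer); the function names are hypothetical and the scale factor is assumed to be greater than zero, as required above.

```cpp
#include <cstdint>

// Hypothetical helper: rounds `address` up to the next multiple of `scale`.
// `scale` must be greater than zero.
std::uint64_t round_up_address(std::uint64_t address, std::uint64_t scale) {
  return ((address + scale - 1) / scale) * scale;
}

// Hypothetical helper: the 32-bit value that would be written for `address`.
std::uint32_t stored_address(std::uint64_t address, std::uint64_t scale) {
  return static_cast<std::uint32_t>(round_up_address(address, scale) / scale);
}

// Hypothetical helper: number of bytes addressable with a 32-bit stored value.
std::uint64_t max_addressable_bytes(std::uint64_t scale) {
  return scale * (std::uint64_t(1) << 32);
}
```

For example, with a scale factor of 4, an address of 10 is padded with zeros up to 12 and stored as 3. A scale factor of 2 doubles the addressable range from 4GB to 8GB, matching the figures quoted above.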

void set_timestamp(time_t timestamp): Changes the overall modification timestamp of the multifile. Note that this will be reset to the current time every time you modify a subfile. Only set this if you know what you are doing!

std::string update_subfile(std::string const &subfile_name, Filename const &filename, int compression_level)

Adds a file on disk to the Multifile as a subfile. If a subfile already exists with the same name, its contents are compared byte-for-byte to the disk file, and it is replaced only if it is different; otherwise, the multifile is left unchanged.

Either Filename::set_binary() or set_text() must have been called previously to specify the nature of the source file. If set_text() was called, the text flag will be set on the subfile.

int validate_signature_certificate(int n) const: Checks that the certificate used for the nth signature is a valid, authorized certificate with some known certificate authority. Returns 0 if it is valid, -1 if there is some error, or the corresponding OpenSSL error code if it is invalid, out-of-date, or self-signed.

void write_signature_certificate(int n, std::ostream &out) const: Writes the certificate for the nth signature, in PEM form, to the indicated stream. See the comments in get_num_signatures().