DiSMEC++
dismec::io::model::PartialModelSaver Class Reference

Manage saving a model consisting of multiple partial models.

#include <model-io.h>

Inherits dismec::io::model::PartialModelIO.

Public Member Functions

 PartialModelSaver (path target_file, SaveOption options, bool load_partial=false)
 Create a new PartialModelSaver.
 
std::future< WeightFileEntry > add_model (const std::shared_ptr< const Model > &model, const std::optional< std::string > &file_path={})
 Adds the weights of a partial model asynchronously.
 
void update_meta_file ()
 Updates the metadata file.
 
void finalize ()
 Checks that all weights have been written and updates the metadata file.
 
std::pair< label_id_t, label_id_t > get_missing_weights () const
 Get an interval of labels for which weights are missing.
 
bool any_weight_vector_for_interval (label_id_t begin, label_id_t end) const
 Checks if there are any weight vectors for the given interval.
 
void insert_sub_file (const WeightFileEntry &data)
 Inserts a new sub-file entry into the metadata object.
 
- Public Member Functions inherited from dismec::io::model::PartialModelIO
long num_labels () const noexcept
 Gets the total number of labels.
 
long num_features () const noexcept
 Gets the total number of features.
 

Private Attributes

SaveOption m_Options
 
path m_MetaFileName
 

Additional Inherited Members

- Protected Types inherited from dismec::io::model::PartialModelIO
using weight_file_iter_t = std::vector< WeightFileEntry >::const_iterator
 
- Protected Member Functions inherited from dismec::io::model::PartialModelIO
 PartialModelIO ()=default
 
 ~PartialModelIO ()=default
 
void read_metadata_file (const path &meta_file)
 
void insert_sub_file (const WeightFileEntry &data)
 Inserts a new sub-file entry into the metadata object.
 
weight_file_iter_t label_lower_bound (label_id_t pos) const
 Gets an iterator into the weight-file list that points to the first element whose starting label is larger than or equal to pos.
 
- Protected Attributes inherited from dismec::io::model::PartialModelIO
long m_TotalLabels = -1
 
long m_NumFeatures = -1
 
std::vector< WeightFileEntry > m_SubFiles
 

Detailed Description

Manage saving a model consisting of multiple partial models.

In large-scale settings, training often does not produce a single, complete model. Instead, multiple partial models are trained, either successively or in a distributed, parallel fashion. In that case, the save_model function is not suitable, and this class has to be used.

The workflow looks like this: First, a model saver instance is created. You pass in the name of the metadata file, and additional options as a SaveOption.

SaveOption options;
PartialModelSaver saver(target_file, options);

Then the partial models will be added. The first model will be used to determine the total number of labels and features, and all further partial models added will be verified to be compatible. Note that PartialModelSaver::add_model() does write the weight files, but it does not yet update the metadata file.

for(auto& partial_model : generator_of_partial_models) {
saver.add_model(partial_model);
}
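
The compatibility check described above can be illustrated with a small standalone sketch. It is not taken from the DiSMEC sources: the ShapeCheck type and its check_or_init method are hypothetical names, and the only detail borrowed from this page is that the label and feature counts start out as -1 until the first partial model is seen.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical helper, not part of DiSMEC: remembers the shape of the first
// partial model and rejects later ones that disagree with it.
struct ShapeCheck {
    long total_labels = -1;   // -1 means "not yet known"
    long num_features = -1;   // -1 means "not yet known"

    // Records the shape on the first call; verifies it on every later call.
    void check_or_init(long labels, long features) {
        assert(labels > 0 && features > 0);
        if (total_labels == -1) {
            total_labels = labels;
            num_features = features;
            return;
        }
        if (labels != total_labels || features != num_features) {
            throw std::logic_error("partial model is incompatible with earlier ones");
        }
    }
};
```

Throwing std::logic_error mirrors the exception type documented for add_model; the message text is made up for this sketch.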

After all partial models have been added, call PartialModelSaver::finalize(). This method first verifies that you have in fact submitted all partial models, i.e. that, taken together, a weight vector has been saved for each label. If that is the case, it updates the metadata file.

If you want to checkpoint the models saved so far, but have not yet saved all weights, use PartialModelSaver::update_meta_file() instead. This only updates the metadata file and does not check that all weights are present. You can continue writing more partial models to that PartialModelSaver.

Another typical use case is continuing to add to the save file in a later run of the program. In that case, load the partial save file by passing true as the load_partial argument to the constructor.

saver.update_meta_file(); // model is still incomplete
// ...
PartialModelSaver continued_save(target_file, options, true); // load the partial save file
continued_save.add_model(missing_parts); // add more partial models
continued_save.finalize(); // call finalize once it is complete.

Definition at line 236 of file model-io.h.

Constructor & Destructor Documentation

◆ PartialModelSaver()

PartialModelSaver::PartialModelSaver ( path  target_file,
SaveOption  options,
bool  load_partial = false 
)

Create a new PartialModelSaver.

Parameters
target_file: File name of the metadata file. Also used as the base name for automatically generated weight file names.
options: Options used for saving the weights of all partial models submitted to add_model. The SaveOption::SplitFiles parameter is ignored, though, since file splits are done explicitly through the partial models.
load_partial: If this is set to true, target_file is expected to already exist, and the corresponding metadata will be loaded (but not the weights). This allows adding more weights to an unfinished save file.
Note
When continuing from an existing checkpoint, we assume that all existing partial weights are valid. If that is not the case, the PartialModelSaver will still be able to add more weights (that don't overlap with the existing ones), but actually loading the resulting model file will be impossible.

Definition at line 160 of file model-io.cpp.

References m_MetaFileName, and dismec::io::model::PartialModelIO::read_metadata_file().

Member Function Documentation

◆ add_model()

std::future< WeightFileEntry > PartialModelSaver::add_model ( const std::shared_ptr< const Model > &  model,
const std::optional< std::string > &  file_path = {} 
)

Adds the weights of a partial model asynchronously.

Saves the weights of model in a weights file and appends that file to the internal list of weight files. This does not update the metadata file. If this method is called for the first time, the total number of labels and features is extracted. All subsequent calls verify that the partial models have the same number of labels and features. This function operates asynchronously, using std::async to launch the actual writing of the weights.

Parameters
model: The model whose weights to save.
file_path: If given, this will be used as the file path for the weights file. Otherwise, an automatically generated file name is used.
Exceptions
std::logic_error: If model is incompatible because it has a different number of features/labels than the other partial models, or if there is an overlap with already saved weights.
Returns
A future that becomes ready after the weight file has been written. Its value is the new weight file entry that has been added to the list of weight files.
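
To illustrate how a caller might handle the returned futures, here is a minimal standalone sketch. It does not use the DiSMEC API: Entry, save_async and wait_for_all are hypothetical stand-ins, and the "write" is simulated by simply returning the entry. The only assumption carried over from the description above is that each save is launched with std::async and yields an entry describing the written file.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a weight-file entry: labels [first, first + count).
struct Entry {
    long first;
    long count;
    std::string file_name;
};

// Hypothetical stand-in for an asynchronous save. A real implementation
// would write the weights to `file_name` inside the task.
std::future<Entry> save_async(long first, long count, std::string file_name) {
    assert(count > 0);
    return std::async(std::launch::async,
                      [first, count, name = std::move(file_name)]() {
                          return Entry{first, count, name};
                      });
}

// Blocks until every pending save has finished and collects the entries.
// get() also rethrows any exception raised inside a task.
std::vector<Entry> wait_for_all(std::vector<std::future<Entry>>& pending) {
    std::vector<Entry> done;
    done.reserve(pending.size());
    for (auto& f : pending) {
        done.push_back(f.get());
    }
    return done;
}
```

Keeping the futures in a container lets several writes overlap. A future obtained directly from std::async blocks in its destructor until the task has finished, so discarding it immediately serializes the writes; whether that applies here depends on how add_model constructs its return value.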

Definition at line 172 of file model-io.cpp.

References dismec::io::model::PartialModelIO::m_NumFeatures, and dismec::io::model::PartialModelIO::m_TotalLabels.

Referenced by TrainingProgram::run(), and TEST_CASE().

◆ any_weight_vector_for_interval()

bool dismec::io::model::PartialModelSaver::any_weight_vector_for_interval ( label_id_t  begin,
label_id_t  end 
) const

Checks if there are any weight vectors for the given interval.

This can be used to make sure that training is only run for label ids for which there is no weight vector yet.
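
The following standalone sketch shows one way such an interval query can be answered on a sorted list of entries. It is an illustration, not the DiSMEC implementation: Entry and any_entry_in_interval are hypothetical names, and the sketch assumes that the interval is half-open and that entries are sorted by starting label and do not overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a weight-file entry: labels [first, first + count).
struct Entry {
    long first;
    long count;
};

// Returns true if any entry in `files` (sorted by `first`, non-overlapping)
// intersects the half-open interval [begin, end).
bool any_entry_in_interval(const std::vector<Entry>& files, long begin, long end) {
    assert(begin <= end);
    if (begin == end) return false;  // an empty interval intersects nothing
    // First entry whose starting label is >= begin.
    auto it = std::lower_bound(files.begin(), files.end(), begin,
                               [](const Entry& e, long pos) { return e.first < pos; });
    // That entry overlaps if it starts before `end`.
    if (it != files.end() && it->first < end) return true;
    // The preceding entry starts before `begin` but may reach into the interval.
    if (it != files.begin()) {
        const Entry& prev = *(it - 1);
        if (prev.first + prev.count > begin) return true;
    }
    return false;
}
```

The binary search plays the role that label_lower_bound() is documented to have: it finds the first entry whose starting label is larger than or equal to a given position.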

Definition at line 255 of file model-io.cpp.

References dismec::io::model::PartialModelIO::label_lower_bound(), and dismec::io::model::PartialModelIO::m_SubFiles.

Referenced by TrainingProgram::parse_label_range().

◆ finalize()

void PartialModelSaver::finalize ( )

Checks that all weights have been written and updates the metadata file.

This function checks that the PartialModelSaver has received weight vectors for every label. If that is true, it updates the metadata file.

Exceptions
std::logic_error: If there are missing weight vectors.
Todo:
Do we need this function? It seems like a nice idea to have a check if everything is done, but currently we don't use it.

Definition at line 277 of file model-io.cpp.

References dismec::io::model::PartialModelIO::m_SubFiles, dismec::io::model::PartialModelIO::m_TotalLabels, and update_meta_file().

Referenced by TEST_CASE().

◆ get_missing_weights()

std::pair< label_id_t, label_id_t > PartialModelSaver::get_missing_weights ( ) const

Get an interval of labels for which weights are missing.

If all weights are present, the returned pair is (num_weights, num_weights). Otherwise, it is a half-open interval over label ids for which the PartialModelSaver doesn't have weights available.
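
A minimal standalone sketch of finding such a gap is given below. It is not the DiSMEC implementation: Entry and first_missing_interval are hypothetical names, entries are assumed to be sorted by starting label and non-overlapping, and the sketch assumes that the "all present" sentinel is the total number of labels.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a weight-file entry: labels [first, first + count).
struct Entry {
    long first;
    long count;
};

// Returns the first half-open interval [begin, end) of labels in [0, total)
// that is not covered by `files` (sorted by `first`, non-overlapping).
// If every label is covered, returns (total, total).
std::pair<long, long> first_missing_interval(const std::vector<Entry>& files, long total) {
    assert(total >= 0);
    long covered_up_to = 0;  // all labels below this value are covered
    for (const Entry& e : files) {
        if (e.first > covered_up_to) {
            return {covered_up_to, e.first};  // gap before this entry
        }
        covered_up_to = e.first + e.count;
    }
    if (covered_up_to < total) {
        return {covered_up_to, total};  // gap after the last entry
    }
    return {total, total};
}
```

A caller can loop on this function, training the returned interval and saving the result, until the returned interval is empty.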

Definition at line 292 of file model-io.cpp.

References dismec::io::model::PartialModelIO::m_SubFiles, and dismec::io::model::PartialModelIO::m_TotalLabels.

Referenced by TrainingProgram::parse_label_range().

◆ insert_sub_file()

void dismec::io::model::PartialModelIO::insert_sub_file

Inserts a new sub-file entry into the metadata object.

Parameters
data: The specifications of the sub file to be added.
Exceptions
std::logic_error: If data has weights for labels that overlap with already existing weights.
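
The overlap rule can be illustrated with a standalone sketch of a sorted insertion. This is not the DiSMEC implementation: Entry and insert_entry are hypothetical names, and keeping the list sorted by starting label is an assumption made for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a weight-file entry: labels [first, first + count).
struct Entry {
    long first;
    long count;
};

// Inserts `data` into `files`, keeping the list sorted by `first`.
// Throws std::logic_error if `data` overlaps an existing entry.
void insert_entry(std::vector<Entry>& files, const Entry& data) {
    assert(data.count > 0);
    // First existing entry whose starting label is >= data.first.
    auto it = std::lower_bound(files.begin(), files.end(), data.first,
                               [](const Entry& e, long pos) { return e.first < pos; });
    // The new entry must end before the following entry starts.
    if (it != files.end() && it->first < data.first + data.count) {
        throw std::logic_error("new entry overlaps the following entry");
    }
    // The preceding entry must end before the new entry starts.
    if (it != files.begin()) {
        const Entry& prev = *(it - 1);
        if (prev.first + prev.count > data.first) {
            throw std::logic_error("new entry overlaps the preceding entry");
        }
    }
    files.insert(it, data);
}
```

Because both checks run before the insertion, a rejected entry leaves the list unchanged.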

Definition at line 184 of file model-io.cpp.

◆ update_meta_file()

void PartialModelSaver::update_meta_file ( )

Updates the metadata file.

This ensures that all weight files that have been created due to add_model calls will be listed in the metadata file. Use this function to checkpoint partial saving. If all partial models have been added, call finalize() instead, which also verifies the completeness of the data.

Definition at line 234 of file model-io.cpp.

References m_MetaFileName, dismec::io::model::PartialModelIO::m_NumFeatures, dismec::io::model::PartialModelIO::m_SubFiles, dismec::io::model::PartialModelIO::m_TotalLabels, and dismec::io::model::to_string().

Referenced by finalize(), and TrainingProgram::run().

Member Data Documentation

◆ m_MetaFileName

path dismec::io::model::PartialModelSaver::m_MetaFileName
private

Definition at line 309 of file model-io.h.

Referenced by PartialModelSaver(), and update_meta_file().

◆ m_Options

SaveOption dismec::io::model::PartialModelSaver::m_Options
private

Definition at line 308 of file model-io.h.


The documentation for this class was generated from the following files: