DiSMEC++
Manage saving a model consisting of multiple partial models. More...
#include <model-io.h>
Public Member Functions | |
PartialModelSaver (path target_file, SaveOption options, bool load_partial=false) | |
Create a new PartialModelSaver. More... | |
std::future< WeightFileEntry > | add_model (const std::shared_ptr< const Model > &model, const std::optional< std::string > &file_path={}) |
Adds the weights of a partial model asynchronously. More... | |
void | update_meta_file () |
Updates the metadata file. More... | |
void | finalize () |
Checks that all weights have been written and updates the metadata file. More... | |
std::pair< label_id_t, label_id_t > | get_missing_weights () const |
Get an interval of labels for which weights are missing. More... | |
bool | any_weight_vector_for_interval (label_id_t begin, label_id_t end) const |
Checks if there are any weight vectors for the given interval. More... | |
void | insert_sub_file (const WeightFileEntry &data) |
Inserts a new sub-file entry into the metadata object. More... | |
Public Member Functions inherited from dismec::io::model::PartialModelIO | |
long | num_labels () const noexcept |
Gets the total number of labels. More... | |
long | num_features () const noexcept |
Gets the total number of features. More... | |
Private Attributes | |
SaveOption | m_Options |
path | m_MetaFileName |
Additional Inherited Members | |
Protected Types inherited from dismec::io::model::PartialModelIO | |
using | weight_file_iter_t = std::vector< WeightFileEntry >::const_iterator |
Protected Member Functions inherited from dismec::io::model::PartialModelIO | |
PartialModelIO ()=default | |
~PartialModelIO ()=default | |
void | read_metadata_file (const path &meta_file) |
void | insert_sub_file (const WeightFileEntry &data) |
Inserts a new sub-file entry into the metadata object. More... | |
weight_file_iter_t | label_lower_bound (label_id_t pos) const |
Gets an iterator into the weight-file list that points to the first element whose starting label is larger than or equal to pos. More... | |
Protected Attributes inherited from dismec::io::model::PartialModelIO | |
long | m_TotalLabels = -1 |
long | m_NumFeatures = -1 |
std::vector< WeightFileEntry > | m_SubFiles |
Manage saving a model consisting of multiple partial models.
In large-scale settings, model training does not produce a single, complete model. Instead, multiple partial models are trained (either successively or in a distributed, parallel fashion). In that case, the save_model function is not suitable, and this class has to be used.
The workflow looks like this: First, a model saver instance is created. You pass in the name of the metadata file, and additional options as SaveOption.
Then the partial models will be added. The first model will be used to determine the total number of labels and features, and all further partial models added will be verified to be compatible. Note that PartialModelSaver::add_model() does write the weight files, but it does not yet update the metadata file.
After all partial models have been added, you should call PartialModelSaver::finalize(). This method first verifies that you have in fact submitted all partial models, i.e. that the weight vector for every label has been saved. If that is the case, it updates the metadata file.
If you want to checkpoint the models you have saved up to a certain point, but have not yet saved all weights, you can instead use PartialModelSaver::update_meta_file(). This only updates the metadata file and does not check that all weights are present. You can continue adding partial models to that PartialModelSaver afterwards.
Another typical use case is continuing to add to the save file in a later run of the program. In that case, you can load the partial save file by passing true as the load_partial argument to the constructor.
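The order of operations described above (add, optionally checkpoint, then finalize) can be illustrated with a small stand-in. This is only a sketch: ToyPartialSaver and its members are hypothetical names invented for this example. It is not the DiSMEC++ implementation, it writes no files, and it assumes label ranges are half-open intervals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in, NOT the DiSMEC++ class: it only tracks which labels
// are covered, so that the add / checkpoint / finalize order can be shown.
class ToyPartialSaver {
public:
    // Register a partial model covering labels [begin, end) of total_labels.
    void add_model(long begin, long end, long total_labels) {
        if (total_labels < 0 || begin < 0 || begin > end || end > total_labels)
            throw std::invalid_argument("bad label range");
        if (m_TotalLabels == -1) {
            // The first partial model determines the total number of labels.
            m_TotalLabels = total_labels;
            m_Covered.assign(static_cast<std::size_t>(total_labels), false);
        } else if (m_TotalLabels != total_labels) {
            throw std::logic_error("incompatible number of labels");
        }
        // Check for overlap first, so a rejected model changes nothing.
        for (long l = begin; l < end; ++l)
            if (m_Covered[l]) throw std::logic_error("overlap with saved weights");
        for (long l = begin; l < end; ++l) m_Covered[l] = true;
    }

    // Checkpoint: record what has been added so far, without a completeness check.
    void update_meta_file() { m_Listed = covered(); }

    // Finalize: refuse unless every label has a weight vector, then checkpoint.
    void finalize() {
        if (m_TotalLabels == -1 || covered() != m_TotalLabels)
            throw std::logic_error("missing weight vectors");
        update_meta_file();
    }

    // Number of labels listed at the last checkpoint.
    long listed() const { return m_Listed; }

private:
    long covered() const {
        return static_cast<long>(std::count(m_Covered.begin(), m_Covered.end(), true));
    }
    long m_TotalLabels = -1;
    std::vector<bool> m_Covered;
    long m_Listed = 0;
};
```

In this sketch, calling finalize() after covering only part of the label range throws std::logic_error, while update_meta_file() succeeds at any point, mirroring the distinction drawn in the description.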
Definition at line 236 of file model-io.h.
PartialModelSaver::PartialModelSaver | ( | path | target_file, |
SaveOption | options, | ||
bool | load_partial = false |
||
) |
Create a new PartialModelSaver.
target_file | File name of the metadata file. Also used as the base name for automatically generated weight file names. |
options | Options that will be used for saving the weights of all partial models that will be submitted to add_model . The SaveOption::SplitFiles parameter is ignored, though, since file splits are done explicitly through the partial models. |
load_partial | If this is set to true, then it is expected that target_file already exists, and the corresponding metadata will be loaded (but not the weights). This allows adding more weights to an unfinished save file. |
PartialModelSaver will still be able to add more weights (that don't overlap with the existing ones), but actually loading the resulting model file will be impossible.
Definition at line 160 of file model-io.cpp.
References m_MetaFileName, and dismec::io::model::PartialModelIO::read_metadata_file().
std::future< WeightFileEntry > PartialModelSaver::add_model | ( | const std::shared_ptr< const Model > & | model, |
const std::optional< std::string > & | file_path = {} |
||
) |
Adds the weights of a partial model asynchronously.
Saves the weights of model in a weights file and appends that file to the internal list of weight files. This does not update the metadata file. If this method is called for the first time, the total number of labels and features is extracted. All subsequent calls verify that the partial models have the same number of labels and features. This function operates asynchronously, using std::async to launch the actual writing of the weights.
model | The model whose weights to save. |
file_path | If given, this will be used as the file path for the weights file. Otherwise, an automatically generated file name is used. |
std::logic_error | if model is incompatible because it has a different number of features/labels than the other partial models, or if there is an overlap with already saved weights. |
Definition at line 172 of file model-io.cpp.
References dismec::io::model::PartialModelIO::m_NumFeatures, and dismec::io::model::PartialModelIO::m_TotalLabels.
Referenced by TrainingProgram::run(), and TEST_CASE().
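Because add_model returns a std::future, the caller decides when to wait for the write to complete. The following sketch shows that general pattern with std::async. ToyEntry, add_model_async and wait_for_all are hypothetical names for illustration only, and the "write" is simulated rather than performed.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <vector>

// Hypothetical record describing one written weight file (illustration only).
struct ToyEntry {
    long first_label;
    long label_count;
    std::string file_name;
};

// Launch the (simulated) write on another thread and return a future for it.
std::future<ToyEntry> add_model_async(long first_label, long label_count,
                                      std::string file_name) {
    return std::async(std::launch::async,
                      [first_label, label_count, file_name]() {
                          // A real implementation would write weights to disk here.
                          return ToyEntry{first_label, label_count, file_name};
                      });
}

// Wait for all pending writes. get() rethrows any exception raised in a task.
std::vector<ToyEntry> wait_for_all(std::vector<std::future<ToyEntry>>& pending) {
    std::vector<ToyEntry> done;
    for (auto& f : pending) done.push_back(f.get());
    return done;
}
```

Collecting the futures and calling get() on each before updating the metadata is one way to make sure no write is still in flight. Whether the library itself requires this is not stated here.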
bool dismec::io::model::PartialModelSaver::any_weight_vector_for_interval | ( | label_id_t | begin, |
label_id_t | end | ||
) | const |
Checks if there are any weight vectors for the given interval.
This can be used to make sure that training is only run for label ids for which there is no weight vector yet.
Definition at line 255 of file model-io.cpp.
References dismec::io::model::PartialModelIO::label_lower_bound(), and dismec::io::model::PartialModelIO::m_SubFiles.
Referenced by TrainingProgram::parse_label_range().
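One way such a check can be done is with a lower-bound search over a list of weight files sorted by starting label, similar in spirit to label_lower_bound(). The sketch below is an assumption about the technique, not the library's code. ToyEntry and any_weights_in are hypothetical names, and the interval is assumed to be half-open, [begin, end).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical entry: a weight file covering labels [first, first + count).
struct ToyEntry {
    long first;
    long count;
};

// Assumes `files` is sorted by `first` and that entries do not overlap.
// Returns true if any entry has a label inside [begin, end).
bool any_weights_in(const std::vector<ToyEntry>& files, long begin, long end) {
    if (begin >= end) return false;  // empty query interval
    // First entry whose starting label is >= begin.
    auto it = std::lower_bound(
        files.begin(), files.end(), begin,
        [](const ToyEntry& e, long pos) { return e.first < pos; });
    // That entry overlaps the query if it starts before `end`.
    if (it != files.end() && it->first < end) return true;
    // The preceding entry overlaps the query if it extends past `begin`.
    if (it != files.begin()) {
        const ToyEntry& prev = *(it - 1);
        return prev.first + prev.count > begin;
    }
    return false;
}
```

Only the entry found by the search and its predecessor need to be inspected, because the list is sorted and free of overlaps.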
void PartialModelSaver::finalize | ( | ) |
Checks that all weights have been written and updates the metadata file.
This function checks that the PartialModelSaver has received weight vectors for every label. If that is true, it updates the metadata file.
std::logic_error | If there are missing weight vectors. |
Definition at line 277 of file model-io.cpp.
References dismec::io::model::PartialModelIO::m_SubFiles, dismec::io::model::PartialModelIO::m_TotalLabels, and update_meta_file().
Referenced by TEST_CASE().
std::pair< label_id_t, label_id_t > PartialModelSaver::get_missing_weights | ( | ) | const |
Get an interval of labels for which weights are missing.
If all weights are present, the returned pair is (num_weights, num_weights). Otherwise, it is a half-open interval over label ids for which the PartialModelSaver doesn't have weights available.
Definition at line 292 of file model-io.cpp.
References dismec::io::model::PartialModelIO::m_SubFiles, and dismec::io::model::PartialModelIO::m_TotalLabels.
Referenced by TrainingProgram::parse_label_range().
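The return convention can be illustrated with a scan over a sorted list of weight files. This is a sketch under assumptions: ToyEntry and first_missing are hypothetical names, and returning the first gap (rather than some other gap) is a choice made for this example that the description above does not specify.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical entry: a weight file covering labels [first, first + count).
struct ToyEntry {
    long first;
    long count;
};

// Assumes `files` is sorted by `first` and that entries do not overlap.
// Returns the first half-open gap [a, b) without weights, or
// (total_labels, total_labels) if every label is covered.
std::pair<long, long> first_missing(const std::vector<ToyEntry>& files,
                                    long total_labels) {
    long next = 0;  // smallest label not yet known to be covered
    for (const ToyEntry& e : files) {
        if (e.first > next) return {next, e.first};  // gap before this file
        next = e.first + e.count;
    }
    if (next < total_labels) return {next, total_labels};  // gap at the end
    return {total_labels, total_labels};                   // nothing missing
}
```

A caller can test for completeness by comparing both members of the result with the total number of labels, or simply by checking whether the interval is empty.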
void dismec::io::model::PartialModelIO::insert_sub_file |
Inserts a new sub-file entry into the metadata object.
data | The specifications of the sub file to be added. |
std::logic_error | if data has weights for labels that overlap with already existing weights. |
Definition at line 184 of file model-io.cpp.
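Keeping the list of sub-files sorted by starting label makes the overlap check local: only the two neighbours of the insertion point can conflict with the new entry. The following is a minimal sketch of that idea, not the library's implementation. ToyEntry and insert_entry are hypothetical names, and label ranges are assumed to be half-open.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical entry: a weight file covering labels [first, first + count).
struct ToyEntry {
    long first;
    long count;
};

// Insert `data` into `files`, which is kept sorted by `first`.
// Throws std::logic_error if the new range overlaps an existing entry.
void insert_entry(std::vector<ToyEntry>& files, const ToyEntry& data) {
    // Insertion point: first entry whose starting label is >= data.first.
    auto it = std::lower_bound(
        files.begin(), files.end(), data.first,
        [](const ToyEntry& e, long pos) { return e.first < pos; });
    // The following entry must start at or after the end of the new range.
    if (it != files.end() && it->first < data.first + data.count)
        throw std::logic_error("overlap with following weights");
    // The preceding entry must end at or before the start of the new range.
    if (it != files.begin()) {
        const ToyEntry& prev = *(it - 1);
        if (prev.first + prev.count > data.first)
            throw std::logic_error("overlap with preceding weights");
    }
    files.insert(it, data);
}
```

Both checks run before the vector is modified, so a rejected entry leaves the list unchanged.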
void PartialModelSaver::update_meta_file | ( | ) |
Updates the metadata file.
This ensures that all weight files that have been created due to add_model calls will be listed in the metadata file. Use this function to checkpoint partial saving. If all partial models have been added, call finalize() instead, which also verifies the completeness of the data.
Definition at line 234 of file model-io.cpp.
References m_MetaFileName, dismec::io::model::PartialModelIO::m_NumFeatures, dismec::io::model::PartialModelIO::m_SubFiles, dismec::io::model::PartialModelIO::m_TotalLabels, and dismec::io::model::to_string().
Referenced by finalize(), and TrainingProgram::run().
|
private |
Definition at line 309 of file model-io.h.
Referenced by PartialModelSaver(), and update_meta_file().
|
private |
Definition at line 308 of file model-io.h.