DiSMEC++
dismec::model::Model Class Reference (abstract)

A model combines a set of weights with some meta-information about these weights.

#include <model.h>

Inheritance diagram for dismec::model::Model:
dismec::model::DenseModel dismec::model::SparseModel dismec::model::SubModelWrapper< T >

Public Types

using PredictionMatrixOut = Eigen::Ref< PredictionMatrix >
 
using FeatureMatrixIn = GenericInMatrix
 
using WeightVectorIn = GenericInVector
 

Public Member Functions

 Model (PartialModelSpec spec)
 
virtual ~Model ()=default
 
long num_labels () const noexcept
 How many labels are in the underlying dataset.
 
virtual long num_features () const =0
 How many weights are in each weight vector, i.e. how many features the input should have.
 
long num_weights () const noexcept
 How many weight vectors are in this model.
 
virtual bool has_sparse_weights () const =0
 Whether this model stores the weights in a sparse format or in a dense format.
 
bool is_partial_model () const
 Returns true if this instance only stores part of the weights of an entire model.
 
label_id_t labels_begin () const noexcept
 
label_id_t labels_end () const noexcept
 
long contained_labels () const noexcept
 How many labels are in this submodel.
 
void get_weights_for_label (label_id_t label, Eigen::Ref< DenseRealVector > target) const
 Gets the weights for the given label as a dense vector.
 
void set_weights_for_label (label_id_t label, const WeightVectorIn &weights)
 Sets the weights for a label.
 
void predict_scores (const FeatureMatrixIn &instances, PredictionMatrixOut target) const
 Calculates the scores for all examples and all labels in this model.
 

Protected Member Functions

label_id_t adjust_label (label_id_t label) const
 

Private Member Functions

virtual void predict_scores_unchecked (const FeatureMatrixIn &instances, PredictionMatrixOut target) const =0
 Unchecked version of predict_scores().
 
virtual void get_weights_for_label_unchecked (label_id_t label, Eigen::Ref< DenseRealVector > target) const =0
 Unchecked version of get_weights_for_label().
 
virtual void set_weights_for_label_unchecked (label_id_t label, const WeightVectorIn &weights)=0
 Unchecked version of set_weights_for_label().
 

Private Attributes

label_id_t m_LabelsBegin
 
label_id_t m_LabelsEnd
 
long m_NumLabels
 Total number of labels of the complete model.
 

Detailed Description

A model combines a set of weights with some meta-information about these weights.

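The weights may be represented as a dense or as a sparse matrix. The model class allows access to model meta-data through functions such as num_labels(), num_features(), and has_sparse_weights(), as well as access to the weights via get_weights_for_label() and set_weights_for_label().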

Finally, the class provides an interface for performing predictions. This is handled by the Model::predict_scores function, which exists in a version that handles dense features and one version that handles sparse feature vectors.

The current design abstracts away whether the weights are stored internally in a dense or a sparse representation, but this comes at the cost that getting and setting weights has sub-optimal performance characteristics for both data formats. However, these functions are mostly used for:
1. IO. In this case we expect the actual IO cost to be much higher than the cost of the in-memory copies.
2. Setting weights after training. The same argument applies.
In both cases, the solution is to have one buffer per thread, and only do a single read or write.

For performance reasons, the model functions do not return newly constructed vectors, but instead expect the caller to provide a buffer in which they will place their values. Where appropriate, arguments are taken as Eigen::Ref objects so they can bind to sub-matrices. In that case, the caller needs to make sure that the resulting sub-objects have an inner stride of one. This makes it possible, e.g., to parallelize the prediction process over the examples.

Todo:
what about per-label prediction? That would also be a valid axis for parallelizing.

Instances of the Model class can either represent an entire model, or a sub-model that only contains a contiguous subset of the weight vectors of the whole model. This is necessary e.g. to facilitate distributed-memory training and prediction. If only a subset of the weights is contained, then the range of contained labels can be queried by Model::labels_begin() and Model::labels_end().
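
The relationship between a sub-model and the complete model can be illustrated with a small standalone sketch. The LabelRange struct and the make_part helper below are hypothetical and not part of DiSMEC++; they use plain long indices instead of label_id_t and only mirror the semantics documented for labels_begin(), labels_end(), num_labels(), num_weights(), and is_partial_model(). The even split in make_part is just one possible way to assign label ranges.

```cpp
#include <cassert>

// Hypothetical stand-in for the label bookkeeping of a (partial) model.
// Plain long indices are used here instead of label_id_t.
struct LabelRange {
    long labels_begin;  // first label for which weights are stored
    long labels_end;    // one past the last label for which weights are stored
    long num_labels;    // number of labels of the complete model

    // Number of weight vectors actually stored in this (sub-)model.
    long num_weights() const { return labels_end - labels_begin; }

    // The model is partial if it does not cover the whole range [0, num_labels).
    bool is_partial_model() const {
        return labels_begin != 0 || labels_end != num_labels;
    }
};

// Splits num_labels labels into num_parts contiguous, nearly equal ranges
// and returns the range assigned to part `index`.
inline LabelRange make_part(long num_labels, long num_parts, long index) {
    assert(num_parts > 0 && index >= 0 && index < num_parts);
    long begin = num_labels * index / num_parts;
    long end = num_labels * (index + 1) / num_parts;
    return LabelRange{begin, end, num_labels};
}
```

Under these assumptions, each worker in a distributed setting would hold one such range and only the weight vectors that belong to it.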

Definition at line 63 of file model.h.

Member Typedef Documentation

◆ FeatureMatrixIn

Definition at line 66 of file model.h.

◆ PredictionMatrixOut

Definition at line 65 of file model.h.

◆ WeightVectorIn

Definition at line 67 of file model.h.

Constructor & Destructor Documentation

◆ Model()

Model::Model ( PartialModelSpec  spec)
explicit

◆ ~Model()

virtual dismec::model::Model::~Model ( )
virtual default

Member Function Documentation

◆ adjust_label()

label_id_t Model::adjust_label ( label_id_t  label) const
protected

This function verifies that label is a valid label, i.e. that it lies in [labels_begin(), labels_end()), and returns a zero-based label index by subtracting labels_begin().
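
A minimal sketch of this check, written as a free function over plain long indices, might look as follows. The name adjust_label_sketch and the choice of std::out_of_range are assumptions made for illustration; the exception type actually thrown by the library is not stated here.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical free-function version of the range check described above.
// begin and end play the role of labels_begin() and labels_end().
inline long adjust_label_sketch(long label, long begin, long end) {
    assert(begin <= end);  // the range itself must be well-formed
    if (label < begin || label >= end) {
        // The exception type is an assumption of this sketch.
        throw std::out_of_range("label outside [labels_begin, labels_end)");
    }
    return label - begin;  // zero-based index into the stored weight vectors
}
```

For a sub-model covering labels 3 to 7, label 5 would map to the local index 2, while label 8 would be rejected.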

Definition at line 28 of file model.cpp.

References labels_begin(), labels_end(), and dismec::opaque_int_type< Tag, T >::to_index().

Referenced by get_weights_for_label(), and set_weights_for_label().

◆ contained_labels()

long dismec::model::Model::contained_labels ( ) const
inline noexcept

How many labels are in this submodel.

Definition at line 105 of file model.h.

References m_LabelsBegin, and m_LabelsEnd.

Referenced by dismec::io::model::load_dense_weights_npy(), dismec::io::model::save_as_sparse_weights_txt(), and dismec::io::model::save_dense_weights_npy().

◆ get_weights_for_label()

void Model::get_weights_for_label ( label_id_t  label,
Eigen::Ref< DenseRealVector >  target 
) const

Gets the weights for the given label as a dense vector.

Since we do not know whether the weights saved in the model are sparse or dense vectors, we cannot simply return a const reference here. Instead, the user is required to provide a pre-allocated buffer target into which the weights will be copied.

Exceptions
If target does not have the correct size, or if label is invalid.
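
The split into a checked public function and an unchecked private virtual function can be sketched with standard-library types only. Everything below is hypothetical: ToyModel and ToyDenseModel are illustrative stand-ins, std::vector<double> replaces the Eigen types, labels are plain long indices starting at 0, and std::invalid_argument is an assumed exception type.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified model interface (not the DiSMEC++ class).
class ToyModel {
public:
    virtual ~ToyModel() = default;
    virtual long num_features() const = 0;
    virtual long num_weights() const = 0;

    // Checked entry point: validates the arguments, then delegates.
    void get_weights_for_label(long label, std::vector<double>& target) const {
        if (label < 0 || label >= num_weights())
            throw std::invalid_argument("invalid label");
        if (static_cast<long>(target.size()) != num_features())
            throw std::invalid_argument("target has wrong size");
        get_weights_for_label_unchecked(label, target);
    }

private:
    // Implementations may assume that label and target are valid.
    virtual void get_weights_for_label_unchecked(
        long label, std::vector<double>& target) const = 0;
};

// Dense storage: one row of num_features weights per label, row-major.
class ToyDenseModel : public ToyModel {
public:
    ToyDenseModel(long labels, long features)
        : m_Labels(labels), m_Features(features),
          m_Weights(static_cast<std::size_t>(labels * features), 0.0) {
        assert(labels >= 0 && features >= 0);
    }

    long num_features() const override { return m_Features; }
    long num_weights() const override { return m_Labels; }

    // Direct element access, only used to fill the toy model.
    double& weight(long label, long feature) {
        return m_Weights[static_cast<std::size_t>(label * m_Features + feature)];
    }

private:
    // Copies the stored row into the caller-provided buffer.
    void get_weights_for_label_unchecked(
        long label, std::vector<double>& target) const override {
        for (long f = 0; f < m_Features; ++f)
            target[static_cast<std::size_t>(f)] =
                m_Weights[static_cast<std::size_t>(label * m_Features + f)];
    }

    long m_Labels;
    long m_Features;
    std::vector<double> m_Weights;
};
```

The caller allocates the buffer once and can reuse it for every label, which matches the one-buffer-per-thread usage described in the detailed description.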

Definition at line 41 of file model.cpp.

References adjust_label(), get_weights_for_label_unchecked(), and num_features().

Referenced by anonymous_namespace{weights.cpp}::save_weights().

◆ get_weights_for_label_unchecked()

virtual void dismec::model::Model::get_weights_for_label_unchecked ( label_id_t  label,
Eigen::Ref< DenseRealVector >  target 
) const
private pure virtual

Unchecked version of get_weights_for_label().

Since we do not know whether the weights saved in the model are sparse or dense vectors, we cannot simply return a const reference here. Instead, the user is required to provide a pre-allocated buffer target into which the weights will be copied.

Exceptions
If target does not have the correct size, or if label is invalid.
Note
This function is called from get_weights_for_label() and can assume that label is a valid label index that has been corrected for partial models (i.e. such that the first label of the partial model gets the index 0). target can be assumed to be of the correct size.

Implemented in dismec::model::SubModelWrapper< T >, dismec::model::SparseModel, and dismec::model::DenseModel.

Referenced by get_weights_for_label().

◆ has_sparse_weights()

virtual bool dismec::model::Model::has_sparse_weights ( ) const
pure virtual

Whether this model stores the weights in a sparse format or in a dense format.

Implemented in dismec::model::SubModelWrapper< T >, dismec::model::SparseModel, and dismec::model::DenseModel.

◆ is_partial_model()

bool Model::is_partial_model ( ) const

Returns true if this instance only stores part of the weights of an entire model.

Definition at line 37 of file model.cpp.

References m_LabelsBegin, m_LabelsEnd, num_labels(), and dismec::opaque_int_type< Tag, T >::to_index().

Referenced by TEST_CASE().

◆ labels_begin()

label_id_t dismec::model::Model::labels_begin ( ) const
inline noexcept

If this is a partial model, returns the index of the first label for which weight vectors are available. For a complete model this always returns 0.

Definition at line 98 of file model.h.

References m_LabelsBegin.

Referenced by adjust_label(), dismec::model::SubModelWrapper< T >::get_weights_for_label_unchecked(), dismec::io::model::load_sparse_weights_txt(), anonymous_namespace{weights.cpp}::load_weights(), num_weights(), anonymous_namespace{weights.cpp}::save_weights(), dismec::model::SubModelWrapper< T >::set_weights_for_label_unchecked(), and TEST_CASE().

◆ labels_end()

label_id_t dismec::model::Model::labels_end ( ) const
inline noexcept

If this is a partial model, returns the first label index for which no weights are available. For a complete model this returns num_labels().

Definition at line 102 of file model.h.

References m_LabelsEnd.

Referenced by adjust_label(), dismec::io::model::load_sparse_weights_txt(), anonymous_namespace{weights.cpp}::load_weights(), num_weights(), anonymous_namespace{weights.cpp}::save_weights(), and TEST_CASE().

◆ num_features()

virtual long dismec::model::Model::num_features ( ) const
pure virtual

How many weights are in each weight vector, i.e. how many features the input should have.

◆ num_labels()

long dismec::model::Model::num_labels ( ) const
inline noexcept

How many labels are in the underlying dataset.

If is_partial_model() is false, this is equal to the number of weight vectors in this model.

Definition at line 78 of file model.h.

References m_NumLabels.

Referenced by is_partial_model(), dismec::io::model::load_sparse_weights_txt(), and TEST_CASE().

◆ num_weights()

long dismec::model::Model::num_weights ( ) const
inline noexcept

How many weight vectors are in this model.

If is_partial_model() is false, this is equal to the number of labels.

Definition at line 87 of file model.h.

References labels_begin(), and labels_end().

Referenced by predict_scores(), and TEST_CASE().

◆ predict_scores()

void Model::predict_scores ( const FeatureMatrixIn &  instances,
PredictionMatrixOut  target 
) const

Calculates the scores for all examples and all labels in this model.

This is just the matrix multiplication of the input instances and the weight matrix. This function can be called safely from multiple threads. Note that Eigen::Ref requires that the passed submatrix has an inner stride of 1, i.e. that features for a single instance are provided as contiguous memory (in case of dense features), and the same for the pre-allocated buffer for the targets. This can be achieved e.g. by using rows of a row-major matrix.

Parameters
instances: Feature vectors of the instances for which we want to predict the scores. This is handled as an Eigen::Ref parameter so that subsets of a large dataset can be passed without needing to copy data. Should have a number of columns equal to the number of features. The GenericInMatrix allows different data formats to be passed; however, some data formats may be more efficient than others.
target: This is the matrix to which the scores will be written. Has to have the correct size, i.e. the same number of rows as instances and a number of columns equal to the number of labels.
Exceptions
If instances and target have a different number of rows, if the number of columns in instances does not match num_features(), or if the number of columns in target does not match num_weights().
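
The shape checks and the matrix product described above can be sketched without Eigen. The RowMajorMatrix struct and predict_scores_sketch below are hypothetical illustrations: a row-major layout is assumed so that each row is contiguous (the analogue of an inner stride of one), the weights are stored with one row per label, and std::invalid_argument is an assumed exception type.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical row-major dense matrix; each row is contiguous in memory.
struct RowMajorMatrix {
    long rows;
    long cols;
    std::vector<double> data;  // size rows * cols, zero-initialized

    RowMajorMatrix(long r, long c)
        : rows(r), cols(c), data(static_cast<std::size_t>(r * c), 0.0) {
        assert(r >= 0 && c >= 0);
    }
    double& at(long r, long c) {
        return data[static_cast<std::size_t>(r * cols + c)];
    }
    double at(long r, long c) const {
        return data[static_cast<std::size_t>(r * cols + c)];
    }
};

// weights:   one row per label,   one column per feature
// instances: one row per example, one column per feature
// target:    one row per example, one column per label
inline void predict_scores_sketch(const RowMajorMatrix& weights,
                                  const RowMajorMatrix& instances,
                                  RowMajorMatrix& target) {
    if (instances.rows != target.rows)
        throw std::invalid_argument("instances and target differ in rows");
    if (instances.cols != weights.cols)
        throw std::invalid_argument("instances has wrong number of features");
    if (target.cols != weights.rows)
        throw std::invalid_argument("target has wrong number of labels");

    // Score of example i for label l is the dot product of their rows.
    for (long i = 0; i < instances.rows; ++i) {
        for (long l = 0; l < weights.rows; ++l) {
            double score = 0.0;
            for (long f = 0; f < weights.cols; ++f)
                score += instances.at(i, f) * weights.at(l, f);
            target.at(i, l) = score;
        }
    }
}
```

Because every example only writes to its own row of target, disjoint blocks of rows could be handled by different threads in such a design.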

Definition at line 60 of file model.cpp.

References dismec::types::EigenVariantWrapper< Types >::cols(), num_features(), num_weights(), predict_scores_unchecked(), and dismec::types::EigenVariantWrapper< Types >::rows().

◆ predict_scores_unchecked()

virtual void dismec::model::Model::predict_scores_unchecked ( const FeatureMatrixIn &  instances,
PredictionMatrixOut  target 
) const
private pure virtual

Unchecked version of predict_scores().

This is just the matrix multiplication of the input instances and the weight matrix. This function can be called safely from multiple threads. Note that Eigen::Ref requires that the passed submatrix has an inner stride of 1, i.e. that features for a single instance are provided as contiguous memory (in case of dense features), and the same for the pre-allocated buffer for the targets. This can be achieved e.g. by using rows of a row-major matrix.

Parameters
instances: Feature vectors of the instances for which we want to predict the scores. This is handled as an Eigen::Ref parameter so that subsets of a large dataset can be passed without needing to copy data. Should have a number of columns equal to the number of features. The GenericInMatrix allows different data formats to be passed; however, some data formats may be more efficient than others.
target: This is the matrix to which the scores will be written. Has to have the correct size, i.e. the same number of rows as instances and a number of columns equal to the number of labels.
Exceptions
If instances and target have a different number of rows, if the number of columns in instances does not match num_features(), or if the number of columns in target does not match num_weights().
Note
This function is called from predict_scores and can assume that the shapes of instances and target have been verified.

Implemented in dismec::model::SubModelWrapper< T >, dismec::model::SparseModel, and dismec::model::DenseModel.

Referenced by predict_scores().

◆ set_weights_for_label()

void Model::set_weights_for_label ( label_id_t  label,
const WeightVectorIn &  weights 
)

Sets the weights for a label.

This assigns the given vector as the weights for the label with index label. For non-overlapping label parameters this function can safely be called concurrently from different threads. This function can be used for both dense and sparse internal weight representations. If the model internally uses a sparse representation, the zeros will be filtered out.

Exceptions
If weights does not have the correct size, or if label is invalid.
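
How zeros might be filtered out when a dense vector is assigned to a sparsely stored label can be sketched as follows. The SparseWeights alias, to_sparse, and ToySparseStore are hypothetical names used only for illustration, and the sketch uses assert for argument validation where the documented function throws.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sparse weight vector: (feature index, value) pairs for the
// non-zero entries only.
using SparseWeights = std::vector<std::pair<long, double>>;

// Converts a dense weight vector into the sparse form, dropping exact zeros.
inline SparseWeights to_sparse(const std::vector<double>& dense) {
    SparseWeights result;
    for (std::size_t i = 0; i < dense.size(); ++i) {
        if (dense[i] != 0.0) {
            result.emplace_back(static_cast<long>(i), dense[i]);
        }
    }
    return result;
}

// Hypothetical sparse storage: one SparseWeights entry per label.
struct ToySparseStore {
    long num_features;
    std::vector<SparseWeights> per_label;

    // Replaces the weights of one label; other labels are not touched.
    void set_weights_for_label(long label, const std::vector<double>& weights) {
        assert(label >= 0 && label < static_cast<long>(per_label.size()));
        assert(static_cast<long>(weights.size()) == num_features);
        per_label[static_cast<std::size_t>(label)] = to_sparse(weights);
    }
};
```

Since each call only modifies the entry of its own label, calls for different labels do not interfere with each other in this sketch, which is the property that makes concurrent use for non-overlapping labels plausible.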

Definition at line 51 of file model.cpp.

References adjust_label(), num_features(), set_weights_for_label_unchecked(), and dismec::types::EigenVariantWrapper< Types >::size().

Referenced by dismec::io::model::load_sparse_weights_txt(), and anonymous_namespace{weights.cpp}::load_weights().

◆ set_weights_for_label_unchecked()

virtual void dismec::model::Model::set_weights_for_label_unchecked ( label_id_t  label,
const WeightVectorIn &  weights 
)
private pure virtual

Unchecked version of set_weights_for_label().

This assigns the given vector as the weights for the label with index label. For non-overlapping label parameters this function can safely be called concurrently from different threads. This function can be used for both dense and sparse internal weight representations. If the model internally uses a sparse representation, the zeros will be filtered out.

Exceptions
If weights does not have the correct size, or if label is invalid.
Note
This function is called from set_weights_for_label and can assume label is a valid label index that has been corrected for partial models (i.e. such that the first label of the partial model will get the index 0). weights can be assumed to be of correct size.

Implemented in dismec::model::SparseModel, dismec::model::DenseModel, and dismec::model::SubModelWrapper< T >.

Referenced by set_weights_for_label().

Member Data Documentation

◆ m_LabelsBegin

label_id_t dismec::model::Model::m_LabelsBegin
private

Definition at line 176 of file model.h.

Referenced by contained_labels(), is_partial_model(), labels_begin(), and Model().

◆ m_LabelsEnd

label_id_t dismec::model::Model::m_LabelsEnd
private

Definition at line 177 of file model.h.

Referenced by contained_labels(), is_partial_model(), labels_end(), and Model().

◆ m_NumLabels

long dismec::model::Model::m_NumLabels
private

Total number of labels of the complete model.

If this is a partial model, the total number of labels cannot be extracted from the weights.

Definition at line 185 of file model.h.

Referenced by Model(), and num_labels().


The documentation for this class was generated from the following files: model.h and model.cpp.