DiSMEC++
dismec::prediction::PredictionBase Class Reference

Base class for handling predictions.

#include <prediction.h>

Inherits dismec::parallel::TaskGenerator.
Inherited by dismec::prediction::FullPredictionTaskGenerator and dismec::prediction::TopKPredictionTaskGenerator.

Public Member Functions

 PredictionBase (const DatasetBase *data, std::shared_ptr< const Model > model)
 Constructor, checks that data and model are compatible.
 
Public Member Functions inherited from dismec::parallel::TaskGenerator
virtual ~TaskGenerator ()=default
 
virtual long num_tasks () const =0
 
virtual void run_tasks (long begin, long end, thread_id_t thread_id)=0
 
virtual void prepare (long num_threads, long chunk_size)
 Called to notify the TaskGenerator about the number of threads.
 
virtual void finalize ()
 Called after all threads have finished their tasks.
 

Protected Member Functions

void make_thread_local_features (long num_threads)
 
void init_thread (thread_id_t thread_id) final
 Called once a thread has spun up, but before it runs its first task.
 
void do_prediction (long begin, long end, thread_id_t thread_id, Eigen::Ref< PredictionMatrix > target)
 Predicts the scores for a subset of the instances given by the half-open interval [begin, end).
 

Protected Attributes

const DatasetBase * m_Data
 Data on which the prediction is run.
 
std::shared_ptr< const Model > m_Model
 Model (possibly partial) for which prediction is run.
 

Private Attributes

parallel::NUMAReplicator< const GenericFeatureMatrix > m_FeatureReplicator
 The NUMAReplicator that generates NUMA-local copies for the feature matrices.
 
std::vector< std::shared_ptr< const GenericFeatureMatrix > > m_ThreadLocalFeatures
 

Additional Inherited Members

Public Types inherited from dismec::parallel::TaskGenerator
using thread_id_t = dismec::parallel::thread_id_t
 

Detailed Description

Base class for handling predictions.

This class manages the dataset and model used for batch prediction. It ensures that the features are replicated across the NUMA nodes. Batch prediction is a difficult process, because:

  1. The model may be too large to fit into RAM at once.
  2. The predicted scores may be too many to fit into RAM at once.

There are several ways to counter this. If only the model is too large, then one may load the model piece by piece, and predict sequentially for the different parts of the model. Parallelism can be achieved by having different threads handle different examples. Such a process is implemented in FullPredictionTaskGenerator.
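The piece-by-piece strategy can be illustrated with a minimal sketch. It is not the DiSMEC++ implementation: `PartialModel`, `Matrix`, `predict_chunk` and `predict_full` are hypothetical stand-ins built on `std::vector`, the model slices are assumed to be dense, and the chunks are processed serially here, whereas a real task generator would hand each half-open chunk to a different thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;   // row-major: [row][column]

// Hypothetical stand-in for one slice of the model: dense weight vectors for
// the labels [label_begin, label_begin + weights.size()).
struct PartialModel {
    std::size_t label_begin;
    Matrix weights;   // one weight vector per label in this slice
};

// Scores the instances in the half-open interval [begin, end) against one
// model slice. Chunks touch disjoint rows of `scores`, so different chunks
// could be handled by different threads without synchronisation.
void predict_chunk(const Matrix& features, const PartialModel& part,
                   std::size_t begin, std::size_t end, Matrix& scores) {
    for (std::size_t i = begin; i < end; ++i) {
        for (std::size_t l = 0; l < part.weights.size(); ++l) {
            float dot = 0.0f;
            for (std::size_t f = 0; f < features[i].size(); ++f)
                dot += features[i][f] * part.weights[l][f];
            scores[i][part.label_begin + l] = dot;
        }
    }
}

// Walks over the model one slice at a time. In a real setting each slice
// would be loaded, used, and released before the next one is read.
Matrix predict_full(const Matrix& features, const std::vector<PartialModel>& parts,
                    std::size_t num_labels, std::size_t chunk_size) {
    assert(chunk_size > 0);
    Matrix scores(features.size(), std::vector<float>(num_labels, 0.0f));
    for (const PartialModel& part : parts) {
        for (std::size_t begin = 0; begin < features.size(); begin += chunk_size) {
            const std::size_t end = std::min(features.size(), begin + chunk_size);
            predict_chunk(features, part, begin, end, scores);
        }
    }
    return scores;
}
```

Note that this sketch still holds the complete score matrix in memory, so it only addresses the first of the two problems listed above.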

However, if the predictions cannot be stored in memory, things become more complicated. For many applications, though, one does not actually need the full vector of prediction scores, because the prediction is interpreted as a ranking task and only the top-k entries are relevant. In that case, the top-k reduction can be performed as an accumulation step over the different partial models, and the full score vector never needs to be stored. This type of prediction is realized through TopKPredictionTaskGenerator.
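The accumulation idea can be sketched for a single instance as follows. This is an illustration under stated assumptions, not the code of TopKPredictionTaskGenerator: `TopKAccumulator`, `ScoredLabel` and the min-heap bookkeeping are hypothetical choices made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using ScoredLabel = std::pair<float, long>;   // (score, label id)

// Keeps the k best (score, label) pairs seen so far for one instance.
// The entries form a min-heap, so the weakest kept entry sits at the front.
class TopKAccumulator {
public:
    explicit TopKAccumulator(std::size_t k) : m_K(k) {}

    // Merges the scores produced by one partial model whose first label
    // has the global id `label_begin`.
    void accumulate(const std::vector<float>& partial_scores, long label_begin) {
        for (std::size_t j = 0; j < partial_scores.size(); ++j) {
            const ScoredLabel entry{partial_scores[j], label_begin + static_cast<long>(j)};
            if (m_Heap.size() < m_K) {
                m_Heap.push_back(entry);
                std::push_heap(m_Heap.begin(), m_Heap.end(), std::greater<>{});
            } else if (m_K > 0 && entry > m_Heap.front()) {
                // Replace the weakest kept entry with the better candidate.
                std::pop_heap(m_Heap.begin(), m_Heap.end(), std::greater<>{});
                m_Heap.back() = entry;
                std::push_heap(m_Heap.begin(), m_Heap.end(), std::greater<>{});
            }
        }
    }

    // Returns the kept entries ordered from highest to lowest score.
    std::vector<ScoredLabel> sorted() const {
        std::vector<ScoredLabel> result = m_Heap;
        std::sort(result.begin(), result.end(), std::greater<>{});
        return result;
    }

private:
    std::size_t m_K;
    std::vector<ScoredLabel> m_Heap;
};
```

Calling `accumulate` once per partial model keeps the memory per instance at k entries, independent of the total number of labels.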

Definition at line 34 of file prediction.h.

Constructor & Destructor Documentation

◆ PredictionBase()

PredictionBase::PredictionBase ( const DatasetBase * data,
std::shared_ptr< const Model > model 
)

Constructor, checks that data and model are compatible.

Definition at line 17 of file prediction.cpp.

References m_Model, dismec::DatasetBase::num_features(), and dismec::DatasetBase::num_labels().

Member Function Documentation

◆ do_prediction()

void PredictionBase::do_prediction ( long  begin,
long  end,
thread_id_t  thread_id,
Eigen::Ref< PredictionMatrix > target 
)
protected

Predicts the scores for a subset of the instances given by the half-open interval [begin, end).

This function is to be used by derived classes to generate the (partial) score predictions.

Parameters
  begin      The index of the first instance for which prediction will be performed.
  end        The index past the last instance for which prediction will be performed.
  thread_id  Index of the thread which performs the prediction. This argument is needed so the function knows which feature replication to use for optimal performance.
  target     Reference to the target array in which the predictions will be saved.
Exceptions
  std::logic_error  if the shape of target is not compatible.

Definition at line 51 of file prediction.cpp.

References m_Model, m_ThreadLocalFeatures, anonymous_namespace{prediction.cpp}::make_matrix(), dismec::opaque_int_type< Tag, T >::to_index(), and dismec::types::visit().

Referenced by dismec::prediction::FullPredictionTaskGenerator::run_tasks(), and dismec::prediction::TopKPredictionTaskGenerator::run_tasks().
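The interplay between the half-open interval and the shape check can be sketched as follows. The exact compatibility rule is not stated on this page, so the sketch assumes that the target needs one row per instance in [begin, end) and one column per weight vector of the (partial) model. `ScoreBuffer`, `shape_is_compatible` and `check_target_shape` are hypothetical names, and the real function operates on an Eigen matrix reference rather than this stand-in.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a dense, row-major prediction buffer.
struct ScoreBuffer {
    long rows;
    long cols;
    std::vector<float> values;
    ScoreBuffer(long r, long c)
        : rows(r), cols(c), values(static_cast<std::size_t>(r * c), 0.0f) {}
};

// Assumed rule: [begin, end) contains end - begin instances, so the target
// needs exactly that many rows, and one column per weight vector.
bool shape_is_compatible(long begin, long end, long num_weights,
                         const ScoreBuffer& target) {
    return begin <= end && target.rows == end - begin && target.cols == num_weights;
}

// Mirrors the documented behaviour of signalling a bad shape with
// std::logic_error before any scores are written.
void check_target_shape(long begin, long end, long num_weights,
                        const ScoreBuffer& target) {
    if (!shape_is_compatible(begin, end, num_weights, target))
        throw std::logic_error("target shape is not compatible with [begin, end)");
}
```

Under these assumptions, a task covering instances 10 to 13 of a model slice with 5 weight vectors would need a 4 x 5 target, since the interval [10, 14) excludes its upper bound.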

◆ init_thread()

void PredictionBase::init_thread ( parallel::thread_id_t  thread_id)
final protected virtual

Called once a thread has spun up, but before it runs its first task.

This function is called from inside the thread that also will run the tasks.

Reimplemented from dismec::parallel::TaskGenerator.

Definition at line 38 of file prediction.cpp.

References m_FeatureReplicator, m_ThreadLocalFeatures, and dismec::opaque_int_type< Tag, T >::to_index().

◆ make_thread_local_features()

void PredictionBase::make_thread_local_features ( long  num_threads)
protected

This function resizes the internal thread-local feature buffer to match the number of threads. It needs to be called before any call to init_thread().

Definition at line 34 of file prediction.cpp.

References m_ThreadLocalFeatures.

Referenced by dismec::prediction::FullPredictionTaskGenerator::prepare(), and dismec::prediction::TopKPredictionTaskGenerator::prepare().
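The required call order (resize first, then fill one slot per thread) can be illustrated with a small sketch. `FeatureMatrix`, `ThreadLocalFeatures` and `features_for` are hypothetical stand-ins, and the NUMA-local copy is passed in by the caller here, whereas PredictionBase obtains it from m_FeatureReplicator.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a feature matrix; numa_node only marks which
// copy a thread received.
struct FeatureMatrix {
    int numa_node;
};

class ThreadLocalFeatures {
public:
    // Allocates one (initially empty) slot per thread. Must run before
    // any thread calls init_thread().
    void make_thread_local_features(long num_threads) {
        m_Slots.resize(static_cast<std::size_t>(num_threads));
    }

    // Called from inside the worker thread: stores the copy that is local
    // to that thread. at() throws std::out_of_range if the buffer was not
    // resized beforehand.
    void init_thread(long thread_id, std::shared_ptr<const FeatureMatrix> local_copy) {
        m_Slots.at(static_cast<std::size_t>(thread_id)) = std::move(local_copy);
    }

    // Looks up the copy for a thread by indexing with its id.
    const FeatureMatrix& features_for(long thread_id) const {
        return *m_Slots.at(static_cast<std::size_t>(thread_id));
    }

    std::size_t num_slots() const { return m_Slots.size(); }

private:
    std::vector<std::shared_ptr<const FeatureMatrix>> m_Slots;
};
```

Because each thread writes only its own slot, no locking is needed once the vector has its final size, which is one plausible reason for separating the resize from the per-thread initialization.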

Member Data Documentation

◆ m_Data

const DatasetBase* dismec::prediction::PredictionBase::m_Data
protected

Data on which the prediction is run.

Definition at line 40 of file prediction.h.

Referenced by dismec::prediction::FullPredictionTaskGenerator::num_tasks(), and dismec::prediction::TopKPredictionTaskGenerator::num_tasks().

◆ m_FeatureReplicator

parallel::NUMAReplicator<const GenericFeatureMatrix> dismec::prediction::PredictionBase::m_FeatureReplicator
private

The NUMAReplicator that generates NUMA-local copies for the feature matrices.

Definition at line 63 of file prediction.h.

Referenced by init_thread().

◆ m_Model

std::shared_ptr<const Model> dismec::prediction::PredictionBase::m_Model
protected

Model (possibly partial) for which prediction is run.

◆ m_ThreadLocalFeatures

std::vector<std::shared_ptr<const GenericFeatureMatrix> > dismec::prediction::PredictionBase::m_ThreadLocalFeatures
private

Vector that stores references (as shared_ptr) to the NUMA-local copies of the feature matrices, in such a way that the correct copy can be found by indexing with the thread id.

Definition at line 67 of file prediction.h.

Referenced by do_prediction(), init_thread(), and make_thread_local_features().


The documentation for this class was generated from the following files: