DiSMEC++
Base class for handling predictions.
#include <prediction.h>
Public Member Functions
PredictionBase (const DatasetBase *data, std::shared_ptr< const Model > model)
Constructor, checks that data and model are compatible.
Public Member Functions inherited from dismec::parallel::TaskGenerator
virtual | ~TaskGenerator ()=default
virtual long | num_tasks () const =0
virtual void | run_tasks (long begin, long end, thread_id_t thread_id)=0
virtual void | prepare (long num_threads, long chunk_size)
Called to notify the TaskGenerator about the number of threads.
virtual void | finalize ()
Called after all threads have finished their tasks.
Protected Member Functions
void | make_thread_local_features (long num_threads)
void | init_thread (thread_id_t thread_id) final
Called once a thread has spun up, but before it runs its first task.
void | do_prediction (long begin, long end, thread_id_t thread_id, Eigen::Ref< PredictionMatrix > target)
Predicts the scores for a subset of the instances given by the half-open interval [begin, end).
Protected Attributes
const DatasetBase * | m_Data
Data on which the prediction is run.
std::shared_ptr< const Model > | m_Model
Model (possibly partial) for which prediction is run.
Private Attributes
parallel::NUMAReplicator< const GenericFeatureMatrix > | m_FeatureReplicator
The NUMAReplicator that generates NUMA-local copies for the feature matrices.
std::vector< std::shared_ptr< const GenericFeatureMatrix > > | m_ThreadLocalFeatures
Additional Inherited Members
Public Types inherited from dismec::parallel::TaskGenerator
using | thread_id_t = dismec::parallel::thread_id_t
Base class for handling predictions.
This class manages the dataset and model used for batch prediction. It ensures the features are replicated across the NUMA nodes. Batch-prediction is a difficult process, because:
FullPredictionTaskGenerator.
However, if the predictions cannot be stored in memory, things become more complicated. For many applications, though, one does not actually need the full vector of prediction scores, because the prediction is interpreted as a ranking task and only the top-k entries are relevant. In that case, the top-k reduction can be performed as an accumulation step over the different partial models, and the full score vector never needs to be stored. This type of prediction is realized through TopKPredictionTaskGenerator.
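The accumulation step described above can be sketched as follows. This is a minimal illustration, not the DiSMEC++ implementation: the `ScoredLabel` alias, the `accumulate_top_k` function, and the `first_label` offset are hypothetical names, and plain `std::vector` is used in place of the library's matrix types. Each partial model contributes its scores for one instance, and only the k best candidates survive each merge.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical type for illustration: (score, global label index).
using ScoredLabel = std::pair<float, long>;

// Merge the scores of one partial model into the running top-k list of a
// single instance. `first_label` is the global index of the first label
// covered by the partial model (an assumption made for this sketch).
void accumulate_top_k(std::vector<ScoredLabel>& top_k, std::size_t k,
                      const std::vector<float>& partial_scores,
                      long first_label) {
    // Precondition: the running list has already been reduced to at most k.
    assert(top_k.size() <= k);
    for (std::size_t j = 0; j < partial_scores.size(); ++j) {
        top_k.emplace_back(partial_scores[j],
                           first_label + static_cast<long>(j));
    }
    // Keep only the k best candidates, sorted by descending score.
    const std::size_t keep = std::min(k, top_k.size());
    std::partial_sort(top_k.begin(),
                      top_k.begin() + static_cast<std::ptrdiff_t>(keep),
                      top_k.end(), std::greater<ScoredLabel>());
    top_k.resize(keep);
}
```

Calling `accumulate_top_k` once per partial model keeps the memory per instance bounded by k plus the number of labels in a single partial model, regardless of the total number of labels.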
Definition at line 34 of file prediction.h.
PredictionBase::PredictionBase (const DatasetBase *data, std::shared_ptr< const Model > model)
Constructor, checks that data and model are compatible.
Definition at line 17 of file prediction.cpp.
References m_Model, dismec::DatasetBase::num_features(), and dismec::DatasetBase::num_labels().
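void PredictionBase::do_prediction (long begin, long end, thread_id_t thread_id, Eigen::Ref< PredictionMatrix > target)
protected
Predicts the scores for a subset of the instances given by the half-open interval [begin, end).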
This function is to be used by derived classes to generate the (partial) score predictions.
Parameters
begin | The index of the first instance for which prediction will be performed.
end | The index past the last instance for which prediction will be performed.
thread_id | Index of the thread which performs the prediction. This argument is needed so the function knows which feature replication to use for optimal performance.
target | Reference to the target array in which the predictions will be saved.
Exceptions
std::logic_error | if the shape of target is not compatible.
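To make the half-open interval and the shape check concrete, here is a small sketch under stated assumptions. `DenseMatrix` and `predict_range` are hypothetical stand-ins (DiSMEC++ uses Eigen types), the scores are assumed to come from a linear model, and the expected target shape of `(end - begin)` rows by one column per label is an assumption, since the documentation only says the shape must be compatible.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical row-major dense matrix used only for this sketch.
struct DenseMatrix {
    long rows;
    long cols;
    std::vector<float> data;
    DenseMatrix(long r, long c)
        : rows(r), cols(c), data(static_cast<std::size_t>(r * c), 0.0f) {}
    float& at(long r, long c) {
        return data[static_cast<std::size_t>(r * cols + c)];
    }
    float at(long r, long c) const {
        return data[static_cast<std::size_t>(r * cols + c)];
    }
};

// Scores the instances in [begin, end). `weights` holds one row per label.
// Row i - begin of `target` receives the scores of instance i.
void predict_range(const DenseMatrix& features, const DenseMatrix& weights,
                   long begin, long end, DenseMatrix& target) {
    assert(0 <= begin && begin <= end && end <= features.rows);
    // Assumed compatibility rule: one row per instance, one column per label.
    if (target.rows != end - begin || target.cols != weights.rows) {
        throw std::logic_error("target has an incompatible shape");
    }
    for (long i = begin; i < end; ++i) {          // end itself is excluded
        for (long l = 0; l < weights.rows; ++l) {
            float score = 0.0f;
            for (long f = 0; f < features.cols; ++f) {
                score += features.at(i, f) * weights.at(l, f);
            }
            target.at(i - begin, l) = score;
        }
    }
}
```

Because the interval is half-open, consecutive calls with `[0, n)` and `[n, m)` cover every instance exactly once without overlap.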
Definition at line 51 of file prediction.cpp.
References m_Model, m_ThreadLocalFeatures, anonymous_namespace{prediction.cpp}::make_matrix(), dismec::opaque_int_type< Tag, T >::to_index(), and dismec::types::visit().
Referenced by dismec::prediction::FullPredictionTaskGenerator::run_tasks(), and dismec::prediction::TopKPredictionTaskGenerator::run_tasks().
void PredictionBase::init_thread (thread_id_t thread_id)
final, protected, virtual
Called once a thread has spun up, but before it runs its first task.
This function is called from inside the thread that will also run the tasks.
Reimplemented from dismec::parallel::TaskGenerator.
Definition at line 38 of file prediction.cpp.
References m_FeatureReplicator, m_ThreadLocalFeatures, and dismec::opaque_int_type< Tag, T >::to_index().
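The order of the callbacks documented on this page (prepare, then init_thread inside each worker, then run_tasks, then finalize) can be illustrated with a simplified driver. Everything below is a hypothetical sketch: `SimpleTaskGenerator`, `run_all`, and `RangeSum` are not part of DiSMEC++, thread ids are plain `long` values instead of `thread_id_t`, and the way chunks are handed out is an assumption.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical, simplified interface modelled on the member list above.
struct SimpleTaskGenerator {
    virtual ~SimpleTaskGenerator() = default;
    virtual long num_tasks() const = 0;
    virtual void run_tasks(long begin, long end, long thread_id) = 0;
    virtual void prepare(long /*num_threads*/, long /*chunk_size*/) {}
    virtual void init_thread(long /*thread_id*/) {}
    virtual void finalize() {}
};

// Runs all tasks of `gen` on `num_threads` workers in chunks of `chunk_size`.
void run_all(SimpleTaskGenerator& gen, long num_threads, long chunk_size) {
    assert(num_threads > 0 && chunk_size > 0);
    gen.prepare(num_threads, chunk_size);      // before any worker starts
    const long total = gen.num_tasks();
    std::atomic<long> next{0};                 // start of the next free chunk
    std::vector<std::thread> workers;
    for (long t = 0; t < num_threads; ++t) {
        workers.emplace_back([&gen, &next, total, chunk_size, t] {
            gen.init_thread(t);                // inside the worker thread
            for (;;) {
                const long begin = next.fetch_add(chunk_size);
                if (begin >= total) break;
                gen.run_tasks(begin, std::min(begin + chunk_size, total), t);
            }
        });
    }
    for (auto& w : workers) w.join();
    gen.finalize();                            // after all workers finished
}

// Example generator: sums the task indices 0..9.
struct RangeSum : SimpleTaskGenerator {
    std::atomic<long> sum{0};
    bool finalized = false;
    long num_tasks() const override { return 10; }
    void run_tasks(long begin, long end, long) override {
        for (long i = begin; i < end; ++i) sum += i;
    }
    void finalize() override { finalized = true; }
};
```

Calling `init_thread` from within the worker matters for NUMA-aware code, because the thread that later reads the data is the one that selects its local copy.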
void PredictionBase::make_thread_local_features (long num_threads)
protected
This function resizes the internal thread-local feature buffer to match the number of threads. It must be called before any call to init_thread().
Definition at line 34 of file prediction.cpp.
References m_ThreadLocalFeatures.
Referenced by dismec::prediction::FullPredictionTaskGenerator::prepare(), and dismec::prediction::TopKPredictionTaskGenerator::prepare().
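The reason for resizing the buffer up front can be shown with a small sketch. The class `ThreadLocalFeatures`, the placeholder `FeatureMatrix`, and the `features_for` accessor are hypothetical, and passing the local copy as an argument is a simplification: in the documented class, init_thread() takes only the thread id and references m_FeatureReplicator.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical placeholder for a feature matrix living on one NUMA node.
struct FeatureMatrix {
    int numa_node;
};

class ThreadLocalFeatures {
public:
    // Allocates one slot per thread. Must run before any init_thread() call,
    // so that the vector is never resized while workers write to it.
    void make_thread_local_features(long num_threads) {
        m_Slots.resize(static_cast<std::size_t>(num_threads));
    }

    // Called from inside the worker; each thread writes only its own slot.
    void init_thread(long thread_id,
                     std::shared_ptr<const FeatureMatrix> local_copy) {
        assert(thread_id >= 0 &&
               static_cast<std::size_t>(thread_id) < m_Slots.size());
        m_Slots[static_cast<std::size_t>(thread_id)] = std::move(local_copy);
    }

    // Looks up the copy that belongs to the given thread.
    const FeatureMatrix& features_for(long thread_id) const {
        return *m_Slots.at(static_cast<std::size_t>(thread_id));
    }

private:
    std::vector<std::shared_ptr<const FeatureMatrix>> m_Slots;
};
```

With the slots allocated in advance, each worker touches a distinct element, so no synchronization is needed when the slots are filled.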
const DatasetBase* PredictionBase::m_Data
protected
Data on which the prediction is run.
Definition at line 40 of file prediction.h.
Referenced by dismec::prediction::FullPredictionTaskGenerator::num_tasks(), and dismec::prediction::TopKPredictionTaskGenerator::num_tasks().
parallel::NUMAReplicator< const GenericFeatureMatrix > PredictionBase::m_FeatureReplicator
private
The NUMAReplicator that generates NUMA-local copies for the feature matrices.
Definition at line 63 of file prediction.h.
Referenced by init_thread().
std::shared_ptr< const Model > PredictionBase::m_Model
protected
Model (possibly partial) for which prediction is run.
Definition at line 41 of file prediction.h.
Referenced by do_prediction(), PredictionBase(), dismec::prediction::TopKPredictionTaskGenerator::prepare(), dismec::prediction::TopKPredictionTaskGenerator::run_tasks(), and dismec::prediction::TopKPredictionTaskGenerator::update_model().
std::vector< std::shared_ptr< const GenericFeatureMatrix > > PredictionBase::m_ThreadLocalFeatures
private
Vector that stores references (as shared_ptr) to the NUMA-local copies of the feature matrices, in such a way that the correct copy can be found by indexing with the thread id.
Definition at line 67 of file prediction.h.
Referenced by do_prediction(), init_thread(), and make_thread_local_features().