- Member anonymous_namespace{sparse.cpp}::check_positive (long v, const char *error_msg)
- we have this code twice now.
- Member anonymous_namespace{xmc.cpp}::count_features_per_example (std::istream &source, std::size_t num_examples=100'000)
- also count the number of labels, so that we can reserve the label vector as well.
- Member anonymous_namespace{xmc.cpp}::parse_xmc_header (const std::string &content)
- throw if there is more data on the line.
- Member dismec::BinaryLabelVector
- decide whether this should be a matrix or an array type
- Class dismec::DiSMECTraining
- Figure out why we do this and put a reference/explanation here.
- Member dismec::io::model::PartialModelSaver::finalize ()
- Do we need this function? It seems like a nice idea to check whether everything is done, but currently we don't use it.
- Member dismec::io::prediction::read_sparse_prediction (std::istream &source)
- document this format and provide a link.
- Member dismec::io::save_xmc_dataset (std::ostream &target, const MultiLabelData &data)
- insert proper checks that data is sparse
- Class dismec::model::Model
- what about per-label prediction? That would also be a valid axis for parallelizing.
- Member dismec::objective::Objective::declare_vector_on_last_line (const HashVector &location, real_t t)
- improve this interface, together with project_to_line, to be less error-prone!
- Member dismec::stats::make_stat_from_json (const nlohmann::json &source)
- document the JSON format.
- Class dismec::TrainingSpec
- should we pass the dataset to each operation, or just set it once at the beginning? At some point we should probably convert all the dataset handling to use shared_ptr everywhere.
- Member TEST_CASE ("save dense npy")
- test cases for mismatch
- Member TEST_CASE ("parse labels")
- we should also check that the returned pointer is valid.