DiSMEC++
xmc.cpp File Reference
#include "io/xmc.h"
#include "io/common.h"
#include "data/data.h"
#include <fstream>
#include "spdlog/spdlog.h"
#include "spdlog/fmt/fmt.h"
#include "spdlog/stopwatch.h"
#include "doctest.h"


Classes

struct  anonymous_namespace{xmc.cpp}::XMCHeader
 Collects the data from the header of an XMC file (see XMC data format).
 

Namespaces

 anonymous_namespace{xmc.cpp}
 

Functions

XMCHeader anonymous_namespace{xmc.cpp}::parse_xmc_header (const std::string &content)
 Parses the header (number of examples, features, labels) of an XMC dataset file.
 
std::vector< long > anonymous_namespace{xmc.cpp}::count_features_per_example (std::istream &source, std::size_t num_examples=100'000)
 Extracts the number of nonzero features for each instance.
 
template<class F >
const char * anonymous_namespace{xmc.cpp}::parse_labels (const char *line, F &&callback)
 Parses the labels part of an XMC dataset line.
 
template<long IndexOffset>
void anonymous_namespace{xmc.cpp}::read_into_buffers (std::istream &source, SparseFeatures &feature_buffer, std::vector< std::vector< long >> &label_buffer)
 Iterates over the lines in source and puts the corresponding features and labels into the given buffers.
 
std::ostream & anonymous_namespace{xmc.cpp}::write_label_list (std::ostream &stream, const std::vector< int > &labels)
 
 TEST_CASE ("parse valid header")
 
 TEST_CASE ("parse invalid header")
 
 TEST_CASE ("count features")
 
 TEST_CASE ("parse labels errors")
 
 TEST_CASE ("parse labels")
 
 TEST_CASE ("read into buffers bounds checks")
 

Function Documentation

◆ TEST_CASE() [1/6]

TEST_CASE ( "count features"  )
Test:
Checks that the number of features is counted correctly. Since feature counting is only a simple approximation that assumes correctly formatted input data, we test only that (1) the counts are correct and (2) empty lines and comments are skipped.

Definition at line 376 of file xmc.cpp.

References anonymous_namespace{xmc.cpp}::count_features_per_example().
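A minimal sketch of the counting behaviour this test describes, assuming the stream is already positioned after the header line, that comment lines start with '#', and that every feature is written as index:value, so that counting ':' characters yields the number of nonzero features. The name count_features_sketch is hypothetical; the real count_features_per_example also takes a num_examples parameter (default 100'000), which this sketch omits.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch, not the xmc.cpp implementation.
// Assumes '#' starts a comment line and each feature contains exactly one ':'.
std::vector<long> count_features_sketch(std::istream& source) {
    std::vector<long> counts;
    std::string line;
    while (std::getline(source, line)) {
        // Skip empty (or whitespace-only) lines and comment lines.
        const auto first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos || line[first] == '#') {
            continue;
        }
        // Approximation: one ':' per "index:value" feature entry.
        counts.push_back(static_cast<long>(std::count(line.begin(), line.end(), ':')));
    }
    return counts;
}
```

Like the original function, this is only an approximation: a malformed line with a stray ':' would be miscounted, which is acceptable if full validation happens later during parsing.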

◆ TEST_CASE() [2/6]

TEST_CASE ( "parse invalid header"  )
Test:
Checks that invalid XMC headers cause an exception. A header is invalid if it does not contain the expected number of values, or if any of the supplied counts is non-positive.

Definition at line 357 of file xmc.cpp.

References anonymous_namespace{xmc.cpp}::parse_xmc_header().
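A minimal sketch of the header validation exercised by the valid and invalid header tests, assuming the header consists of exactly three whitespace-separated integers in the order examples, features, labels. XMCHeaderSketch and parse_header_sketch are hypothetical names, and reporting errors via std::runtime_error is an assumption about the error type.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for XMCHeader; the field names are assumptions.
struct XMCHeaderSketch {
    long num_examples;
    long num_features;
    long num_labels;
};

// Parses "<examples> <features> <labels>" and rejects headers with the wrong
// number of values or with non-positive counts.
XMCHeaderSketch parse_header_sketch(const std::string& content) {
    std::istringstream in(content);
    long values[3];
    for (long& v : values) {
        if (!(in >> v)) {
            throw std::runtime_error("XMC header: expected three integer counts");
        }
    }
    // Anything other than whitespace after the third value is an error.
    std::string rest;
    if (in >> rest) {
        throw std::runtime_error("XMC header: unexpected trailing data '" + rest + "'");
    }
    for (long v : values) {
        if (v <= 0) {
            throw std::runtime_error("XMC header: counts must be positive");
        }
    }
    return {values[0], values[1], values[2]};
}
```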

◆ TEST_CASE() [3/6]

TEST_CASE ( "parse labels errors"  )
Test:
This test checks that parsing the labels of incorrectly formatted XMC lines results in errors. Note that some formatting problems (such as a space before a comma: 5 ,1 10:3.0) are legal from a label-parsing point of view, but result in an error during the subsequent feature parsing. We currently check the following error conditions:
  • trailing comma
  • not-a-number
  • not an integer
  • wrong separator between labels

Definition at line 421 of file xmc.cpp.

References anonymous_namespace{xmc.cpp}::parse_labels().

◆ TEST_CASE() [4/6]

TEST_CASE ( "parse labels"  )
Test:
This test checks that parsing the labels of correctly formatted XMC lines gives the desired results. We check that whitespace handling is robust to both spaces and tabs, and that the case of an empty label list is correctly recognised.
Todo:
We should also check that the returned pointer is valid.

Definition at line 443 of file xmc.cpp.

References anonymous_namespace{xmc.cpp}::parse_labels().
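The two label tests can be illustrated with a minimal sketch of a callback-based parser in the spirit of parse_labels. It assumes labels are non-negative decimal integers separated by commas without spaces, that a line starting with a space or tab has no labels, and that errors are reported by throwing std::runtime_error; parse_labels_sketch is a hypothetical name. The returned pointer points at the whitespace separating labels from features, which is where feature parsing would continue.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: calls callback(label) for each label at the start of
// `line` and returns a pointer to the first character after the label list.
template<class F>
const char* parse_labels_sketch(const char* line, F&& callback) {
    // No labels: the line starts directly with the whitespace before the features.
    if (*line == ' ' || *line == '\t') {
        return line;
    }
    const char* pos = line;
    while (true) {
        // Every label must start with a digit; this rejects "abc" (not a number)
        // and a trailing comma such as "1, 3:1.0".
        if (!std::isdigit(static_cast<unsigned char>(*pos))) {
            throw std::runtime_error("expected a label at: '" + std::string(pos) + "'");
        }
        char* end = nullptr;
        errno = 0;
        const long label = std::strtol(pos, &end, 10);
        if (errno == ERANGE) {
            throw std::runtime_error("label out of range");
        }
        callback(label);
        pos = end;
        if (*pos == ',') {
            ++pos;  // another label has to follow
        } else if (*pos == ' ' || *pos == '\t' || *pos == '\n' || *pos == '\0') {
            return pos;  // end of the label list
        } else {
            // e.g. "1.5" (not an integer) or "1;2" (wrong separator)
            throw std::runtime_error(std::string("unexpected character '") + *pos +
                                     "' in label list");
        }
    }
}
```

Consistent with the note in the error test, "5 ,1 10:3.0" is accepted here with the single label 5; the stray ",1" would only be rejected by a later feature-parsing step.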

◆ TEST_CASE() [5/6]

TEST_CASE ( "parse valid header"  )
Test:
Checks that valid XMC headers are parsed correctly.

Definition at line 337 of file xmc.cpp.

References anonymous_namespace{xmc.cpp}::parse_xmc_header().

◆ TEST_CASE() [6/6]

TEST_CASE ( "read into buffers bounds checks"  )
Test:
This test verifies that read_into_buffers performs correct bounds checking for features, labels and examples. We check that both overflow (for examples, labels and features) and underflow (for labels and features) are detected, and that the checks take into account whether indexing is one-based or zero-based.

Definition at line 488 of file xmc.cpp.
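A minimal sketch of offset-aware bounds checking as described for this test, assuming that the IndexOffset template parameter (0 for zero-based, 1 for one-based data) is subtracted from each index read from the file before it is compared against the counts from the header. checked_index and check_example_row are hypothetical helpers, not functions from xmc.cpp, and std::out_of_range is an assumed error type.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts an index as written in the file into a
// zero-based index and checks it against the declared count.
template<long IndexOffset>
long checked_index(long raw_index, long count, const std::string& what) {
    const long index = raw_index - IndexOffset;
    if (index < 0) {
        // Underflow, e.g. index 0 in a one-based file.
        throw std::out_of_range(what + " index " + std::to_string(raw_index) +
                                " is smaller than " + std::to_string(IndexOffset));
    }
    if (index >= count) {
        // Overflow, e.g. index 5 in a zero-based file that declares 5 entries.
        throw std::out_of_range(what + " index " + std::to_string(raw_index) +
                                " exceeds the declared count " + std::to_string(count));
    }
    return index;
}

// Hypothetical helper: example rows are counted internally from zero, so only
// overflow (more data lines than the header announced) can occur.
inline void check_example_row(long row, long num_examples) {
    if (row >= num_examples) {
        throw std::out_of_range("file contains more examples than the header declares (" +
                                std::to_string(num_examples) + ")");
    }
}
```

The same raw index can be valid or invalid depending on IndexOffset: 5 passes for a one-based file with 5 features but fails for a zero-based one, which is why the test has to cover both variants.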