API

Base classes

class subwabbit.base.VowpalWabbitBaseFormatter[source]

Formatter translates structured information about context and items to Vowpal Wabbit’s input format: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Input-format

It can also implement the reverse translation, from Vowpal Wabbit's feature names into human-readable feature names.

format_common_features(common_features: Any, debug_info: Any = None) → str[source]

Return the part of the VW line with features that are common to one call of predict/train. This method runs only once per call of subwabbit.base.VowpalWabbitBaseModel’s predict() or train() method.

Parameters:
  • common_features – Features common for all items
  • debug_info – Optional dict that can be filled by information useful for debugging
Returns:

Part of line that is common for each item in one call. Returned string has to start with ‘|’ symbol.
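A minimal sketch of what a format_common_features implementation might look like. It is written as a standalone function so it runs without subwabbit installed; in real use it would be a method on a subclass of VowpalWabbitBaseFormatter. The dict-shaped input, the namespace name a_common and the name_value token scheme are assumptions made for illustration only.

```python
def format_common_features(common_features, debug_info=None):
    """Format features shared by all items as a single VW namespace."""
    # Assumed input shape: a flat dict such as {'user_id': 'u12', 'hour': 14}.
    # Sorting the keys keeps the generated line deterministic.
    tokens = ['{}_{}'.format(name, value)
              for name, value in sorted(common_features.items())]
    if debug_info is not None:
        # Record the generated tokens for later inspection.
        debug_info['common_tokens'] = tokens
    # 'a_common' is a made-up namespace name; the result starts with '|'.
    return '|a_common ' + ' '.join(tokens)


debug = {}
line = format_common_features({'user_id': 'u12', 'hour': 14}, debug)
print(line)
```

Values containing spaces, ':' or '|' would need escaping or replacing before they are used as tokens, because those characters have a meaning in the VW input format.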

format_item_features(common_features: Any, item_features: Any, debug_info: Any = None) → str[source]

Return the part of the VW line with features specific to each item. This method runs once for each item in every call of subwabbit.base.VowpalWabbitBaseModel’s predict() or train() method.

Note

It is a good idea to cache results of this method.

Parameters:
  • common_features – Features common for all items
  • item_features – Features for item
  • debug_info – Optional dict that can be filled by information useful for debugging
Returns:

Part of line that is specific for item. Its required form depends on whether the format_common_features method uses namespaces:

  • namespaces are used: returned string has to start with '|NAMESPACE_NAME' where NAMESPACE_NAME is the name of some namespace
  • namespaces are not used: returned string should not contain ‘|’ symbol

get_formatted_example(common_line_part: str, item_line_part: str, label: Optional[float] = None, weight: Optional[float] = None, debug_info: Optional[Dict[Any, Any]] = None)[source]

Compose valid VW line from its common and item-dependent parts.

Parameters:
  • common_line_part – Part of line that is common for each item in one call.
  • item_line_part – Part of line specific for each item
  • label – Label of this row
  • weight – Optional weight of row
  • debug_info – Optional dict that can be filled by information useful for debugging
Returns:

One VW line in input format: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Input-format
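One plausible way to compose such a line, shown as a standalone sketch rather than the library's actual implementation. It relies only on the VW input format rule that the label comes first and an optional importance weight follows it, before the first '|'. The function name compose_vw_line is hypothetical.

```python
def compose_vw_line(common_line_part, item_line_part, label=None, weight=None):
    """Join label, weight and feature parts into one VW input line."""
    parts = []
    if label is not None:
        parts.append(str(label))
        # VW reads the number right after the label as the importance weight.
        if weight is not None:
            parts.append(str(weight))
    parts.append(common_line_part)
    parts.append(item_line_part)
    return ' '.join(parts)


train_line = compose_vw_line('|a_common hour_14', '|b_item id_7', label=1.0, weight=2.0)
predict_line = compose_vw_line('|a_common hour_14', '|b_item id_7')
print(train_line)
print(predict_line)
```

A weight without a label is ignored in this sketch, since a weight is only meaningful for a labelled example.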

get_human_readable_explanation(explanation_string: str, feature_translator: Any = None) → List[Dict[KT, VT]][source]

Transform explanation string into more readable form. Every feature used for prediction is translated into this structure:

{
    # For each feature used in higher interaction there is a 2-tuple
    'names': [('Human readable namespace name 1', 'Human readable feature name 1'), ...],
    'original_feature_name': 'c^c8*f^f102',  # feature name as Vowpal sees it
    'hashindex': 123,  # Vowpal's internal hash of feature name
    'value': 0.123, # value for feature in input line
    'weight': -0.534, # weight learned by VW for this feature
    'potential': value * weight,
    'relative_potential': abs(potential) / sum_of_abs_potentials_for_all_features
}
Parameters:
  • explanation_string – Explanation string from explain_vw_line()
  • feature_translator – Any object that can help you with translation of feature names into human readable form, for example some database connection. See parse_element()
Returns:

List of dicts, sorted by contribution to final score
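The potential and relative_potential fields of the structure described above can be computed as in the following sketch. It assumes that "sorted by contribution" means descending relative_potential, which the source does not state explicitly; the function name rank_features and the sample numbers are made up.

```python
def rank_features(features):
    """Add potential fields to feature dicts and sort by contribution."""
    rows = []
    for feature in features:
        row = dict(feature)  # copy so the input is left untouched
        row['potential'] = row['value'] * row['weight']
        rows.append(row)
    total = sum(abs(row['potential']) for row in rows)
    for row in rows:
        # Guard against an all-zero explanation.
        row['relative_potential'] = abs(row['potential']) / total if total else 0.0
    # Assumption: the largest absolute contribution comes first.
    return sorted(rows, key=lambda row: row['relative_potential'], reverse=True)


ranked = rank_features([
    {'original_feature_name': 'c^c8', 'value': 1.0, 'weight': 0.25},
    {'original_feature_name': 'f^f102', 'value': 1.0, 'weight': -0.75},
])
for row in ranked:
    print(row['original_feature_name'], row['potential'], row['relative_potential'])
```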

get_human_readable_explanation_html(explanation_string: str, feature_translator: Any = None, max_rows: Optional[int] = None)[source]

Visualize importance of features in Jupyter notebook.

Parameters:
  • explanation_string – Explanation string from explain_vw_line()
  • feature_translator – Any object that can help you with translation, e.g. some database connection.
  • max_rows – Maximum number of most important features to show. None returns all used features.
Returns:

IPython.core.display.HTML

parse_element(element: str, feature_translator: Any = None) → Tuple[str, str][source]

This method is supposed to translate namespace name and feature name to human readable form.

For example, element can be “a_item_id^i123” and result can be (‘Item ID’, ‘News of the day: ID of item is 123’)

Parameters:
  • element – namespace name and feature name, e.g. a_item_id^i123
  • feature_translator – Any object that can help you with translation, e.g. some database connection
Returns:

tuple(human understandable namespace name, human understandable feature name)
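A possible parse_element override, written as a standalone function for illustration. It assumes the element always has the form namespace^feature, and it uses a plain dict as the feature_translator; the lookup table NAMESPACE_NAMES and the sample title are hypothetical.

```python
# Hypothetical mapping from VW namespace names to readable labels.
NAMESPACE_NAMES = {'a_item_id': 'Item ID'}


def parse_element(element, feature_translator=None):
    """Split 'namespace^feature' and translate both halves."""
    namespace, _, feature = element.partition('^')
    readable_namespace = NAMESPACE_NAMES.get(namespace, namespace)
    readable_feature = feature
    if feature_translator is not None:
        # Here the translator is a dict; it could equally wrap a database.
        readable_feature = feature_translator.get(feature, feature)
    return readable_namespace, readable_feature


titles = {'i123': 'News of the day: ID of item is 123'}
result = parse_element('a_item_id^i123', titles)
print(result)
```

Unknown namespaces and features fall back to their raw names, so the explanation stays usable even when the translator has gaps.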

class subwabbit.base.VowpalWabbitDummyFormatter[source]

Formatter that assumes that both common features and item features are already strings formatted in the VW input format.

class subwabbit.base.VowpalWabbitBaseModel(formatter: subwabbit.base.VowpalWabbitBaseFormatter)[source]

Declaration of Vowpal Wabbit model interface.

explain_vw_line(vw_line: str, link_function: bool = False)[source]

Uses VW audit mode to inspect weights used for prediction. Audit mode has to be turned on by passing audit_mode=True to constructor.

Parameters:
  • vw_line – String in VW line format
  • link_function – If your model uses a link function, pass True
Returns:

(raw prediction without use of link function, explanation string)
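Because the returned prediction is raw, it may need to be passed through the link function by hand before it is compared with regular predictions. The sketch below assumes the model was trained with VW's logistic link (--link logistic); other link functions need a different transformation. The helper name logistic is hypothetical.

```python
import math


def logistic(raw_prediction):
    """Map a raw VW score to a probability, in a numerically stable way."""
    if raw_prediction >= 0:
        return 1.0 / (1.0 + math.exp(-raw_prediction))
    # For negative inputs, avoid exp() of a large positive number.
    z = math.exp(raw_prediction)
    return z / (1.0 + z)


print(logistic(0.0))
print(logistic(2.0))
```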

predict(common_features: Any, items_features: Iterable[Any], timeout: Optional[float] = None, debug_info: Any = None, metrics: Optional[Dict[KT, VT]] = None, detailed_metrics: Optional[Dict[KT, VT]] = None) → Iterable[Union[float, str]][source]

Transforms iterable with item features to iterator of predictions.

Parameters:
  • common_features – Features common for all items
  • items_features – Iterable with features for each item
  • timeout – Optionally specify how much time in seconds is desired for computing predictions. If a timeout is passed, the returned iterator can have fewer items than the items_features iterable.
  • debug_info – Some object that can be filled by information useful for debugging.
  • metrics – Optional dict that is populated with some metrics that are good to monitor.
  • detailed_metrics – Optional dict with more detailed (and more time consuming) metrics that are good for debugging and profiling.
Returns:

Iterable with predictions for each item from items_features
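Since a timeout can cut the predictions short, calling code should not assume one prediction per item. The sketch below pairs items with whatever predictions arrived, assuming predictions come back in the same order as items_features and are floats. The list iterator stands in for the result of a real predict() call, and the helper name is hypothetical.

```python
def rank_scored_items(items, predictions):
    """Pair items with predictions and sort by score, best first."""
    # zip() stops at the shorter iterable, so items the model did not
    # reach before the timeout are simply dropped.
    scored = list(zip(items, predictions))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


items = ['a', 'b', 'c', 'd']
# Stand-in for a predict() result that was cut short by the timeout.
predictions = iter([0.2, 0.9, 0.5])
ranked = rank_scored_items(items, predictions)
print(ranked)
```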

train(common_features: Any, items_features: Iterable[Any], labels: Iterable[float], weights: Iterable[Optional[float]], debug_info: Any = None) → None[source]

Transform features, label and weight into VW line format and send it to Vowpal.

Parameters:
  • common_features – Features common for all items
  • items_features – Iterable with features for each item
  • labels – Iterable with same length as items features with label for each item
  • weights – Iterable with same length as items features with optional weight for each item
  • debug_info – Some object that can be filled by information useful for debugging
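The same-length requirement on items_features, labels and weights is easy to violate when the three iterables are built separately. A small, hypothetical pre-check such as the one below can catch that before anything is sent to Vowpal; it is not part of subwabbit.

```python
def build_training_rows(items_features, labels, weights):
    """Return (item, label, weight) triples after checking the lengths."""
    items_features = list(items_features)
    labels = list(labels)
    weights = list(weights)
    if not (len(items_features) == len(labels) == len(weights)):
        raise ValueError('items_features, labels and weights must have the same length')
    return list(zip(items_features, labels, weights))


rows = build_training_rows(
    [{'id': 1}, {'id': 2}],
    [1.0, 0.0],
    [None, 2.0],  # None means "no explicit weight" for that item
)
print(rows)
```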

Blocking implementation

class subwabbit.blocking.VowpalWabbitProcess(formatter: subwabbit.base.VowpalWabbitBaseFormatter, vw_args: List[T], batch_size: int = 20, write_only: bool = False, audit_mode: bool = False)[source]

Class representing a Vowpal Wabbit model. It runs the vw command through the subprocess library and communicates through pipes.

__init__(formatter: subwabbit.base.VowpalWabbitBaseFormatter, vw_args: List[T], batch_size: int = 20, write_only: bool = False, audit_mode: bool = False)[source]
Parameters:
  • formatter – Instance of subwabbit.base.VowpalWabbitBaseFormatter
  • vw_args – List of command line arguments for the vw command, e.g. [‘-q’, ‘::’]. This list MUST NOT specify the -p argument for the vw command.
  • batch_size – Number of lines communicated to Vowpal in one system call; it influences performance. Smaller batches slightly reduce both latency and throughput.
  • write_only – Set to True when the model will only be trained and no predictions are expected. This can greatly improve training performance but disables predicting.
  • audit_mode – When set to True, VW is launched in audit mode with the -a argument (overwrites the -t argument). This allows running the explain_vw_line and get_human_readable_explanation methods.

Warning

When audit_mode is turned on, it is not possible to call any method other than explain_vw_line.
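The restriction on the -p argument can be enforced before the process is started. The helper below is a hypothetical pre-check, not part of subwabbit; it rejects -p and its long form --predictions, and it does not try to catch every spelling the vw command line parser may accept.

```python
def check_vw_args(vw_args):
    """Raise ValueError if vw_args tries to set the predictions output."""
    forbidden = {'-p', '--predictions'}
    for arg in vw_args:
        # Also catch the '--predictions=FILE' spelling.
        if arg in forbidden or arg.startswith('--predictions='):
            raise ValueError('vw_args must not specify {!r}'.format(arg))
    return list(vw_args)


args = check_vw_args(['-q', '::', '--quiet'])
print(args)
```

The checked list can then be passed as vw_args to the constructor.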

close(timeout=120)[source]

Gracefully stop Vowpal Wabbit process

Parameters:
  • timeout – Timeout for closing the VW process.

explain_vw_line(vw_line: str, link_function=False)[source]

Uses VW audit mode to inspect weights used for prediction. Audit mode has to be turned on by passing audit_mode=True to constructor.

Parameters:
  • vw_line – String in VW line format
  • link_function – If your model uses a link function, pass True
Returns:

(raw prediction without use of link function, explanation string)

predict(common_features: Any, items_features: Iterable[Any], timeout: Optional[float] = None, debug_info: Any = None, metrics: Optional[Dict[KT, VT]] = None, detailed_metrics: Optional[Dict[KT, VT]] = None) → Iterable[Union[float, str]][source]

Transforms iterable with item features to iterator of predictions.

Parameters:
  • common_features – Features common for all items
  • items_features – Iterable with features for each item
  • timeout – Optionally specify how much time in seconds is desired for computing predictions. If a timeout is passed, the returned iterator can have fewer items than the items_features iterable.
  • debug_info – Some object that can be filled by information useful for debugging.
  • metrics

    Optional dict populated with metrics that are good to monitor:

    • prepare_time - Time from call start to start of prediction loop, including format_common_features call
    • total_time - Total time spent in predict call
    • num_lines - Count of predictions performed
  • detailed_metrics
    Optional dict with more detailed (and more time consuming) metrics that are good
    for debugging and profiling:
    • generating_lines_time - time spent by generating VW line
    • sending_lines_time - time spent by sending VW lines to OS pipe buffer
    • receiving_lines_time - time spent by reading predictions from OS pipe buffer

    For each key, there will be list of tuples (time, metric value).

Returns:

Iterable with predictions for each item from items_features
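Because each detailed_metrics key holds a list of (time, metric value) tuples, the raw dict usually needs to be aggregated before it is useful for profiling. A minimal sketch, with made-up sample numbers and a hypothetical helper name:

```python
def summarize_detailed_metrics(detailed_metrics):
    """Collapse each list of (time, value) tuples into count and total."""
    summary = {}
    for key, samples in detailed_metrics.items():
        values = [value for _time, value in samples]
        summary[key] = {'count': len(values), 'total': sum(values)}
    return summary


# Made-up data in the documented shape: key -> list of (time, value).
detailed = {
    'generating_lines_time': [(0.0, 0.5), (1.0, 0.25)],
    'sending_lines_time': [(0.5, 0.125)],
}
summary = summarize_detailed_metrics(detailed)
print(summary)
```

In real use, an empty dict would be passed as detailed_metrics to predict(), the returned iterator would be consumed, and the dict would then be summarized.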

train(common_features: Any, items_features: Iterable[Any], labels: Iterable[float], weights: Iterable[Optional[float]], debug_info: Any = None) → None[source]

Transform features, label and weight into VW line format and send it to Vowpal.

Parameters:
  • common_features – Features common for all items
  • items_features – Iterable with features for each item
  • labels – Iterable with same length as items features with label for each item
  • weights – Iterable with same length as items features with optional weight for each item
  • debug_info – Some object that can be filled by information useful for debugging

Nonblocking implementation

class subwabbit.nonblocking.VowpalWabbitNonBlockingProcess(formatter: subwabbit.base.VowpalWabbitBaseFormatter, vw_args: List[T], batch_size: int = 20, audit_mode: bool = False, max_pending_lines: int = 20, write_timeout_ms: float = 0.001, pipe_buffer_size_bytes: Optional[int] = None)[source]

Class representing a Vowpal Wabbit model. It runs the vw command through the subprocess library and communicates through non-blocking pipes.

Warning

Available on Linux only.

__init__(formatter: subwabbit.base.VowpalWabbitBaseFormatter, vw_args: List[T], batch_size: int = 20, audit_mode: bool = False, max_pending_lines: int = 20, write_timeout_ms: float = 0.001, pipe_buffer_size_bytes: Optional[int] = None)[source]
Parameters:
  • formatter – Instance of subwabbit.base.VowpalWabbitBaseFormatter
  • vw_args – List of command line arguments for the vw command, e.g. [‘-q’, ‘::’]. This list MUST NOT specify the -p argument for the vw command.
  • batch_size – Maximum number of lines communicated to Vowpal in one system call. Larger batches mean less system call overhead, but also a higher risk of leaving pending lines behind for subsequent calls to clean up.
  • audit_mode – When turned on, VW is launched in audit mode with the -a argument (overwrites the -t argument). This allows running the explain_vw_line and get_human_readable_explanation methods.
  • max_pending_lines – How many lines can wait for prediction in buffers. It is recommended to set it to the same value as batch_size, but it can be higher.
  • write_timeout_ms – When predict is called with a timeout, sending lines to Vowpal stops write_timeout_ms before the timeout expires. This leaves time to finish the work without leaving pending lines that the next call has to clean up.
  • pipe_buffer_size_bytes – Optionally set the size of the system buffer for sending lines to Vowpal. None means use the default buffer size. For more details see http://man7.org/linux/man-pages/man7/pipe.7.html and the detailed_metrics argument of the predict() method.

Warning

When audit_mode is turned on, it is not possible to call any method other than explain_vw_line.

cleanup(deadline: Optional[float] = None, debug_info: Any = None)[source]

Cleans buffers after previous calls

Parameters:
  • deadline – Optional unix timestamp to end
  • debug_info – Some object that can be filled by information useful for debugging

explain_vw_line(vw_line: str, link_function=False)[source]

Uses VW audit mode to inspect weights used for prediction. Audit mode has to be turned on by passing audit_mode=True to constructor.

Parameters:
  • vw_line – String in VW line format
  • link_function – If your model uses a link function, pass True
Returns:

(raw prediction without use of link function, explanation string)

predict(common_features: Any, items_features: Iterable[Any], timeout: Optional[float] = None, debug_info: Any = None, metrics: Optional[Dict[KT, VT]] = None, detailed_metrics: Optional[Dict[KT, VT]] = None) → Iterable[Union[float, str]][source]

Transforms iterable with item features to iterator of predictions.

Parameters:
  • common_features – Features common for all items
  • items_features – Iterable with features for each item
  • timeout – Optionally specify how much time in seconds is desired for computing predictions. If a timeout is passed, the returned iterator can have fewer items than the items_features iterable.
  • debug_info – Some object that can be filled by information useful for debugging.
  • metrics

    Optional dict populated with metrics that are good to monitor:

    • cleanup_time - Time spent on cleaning buffers after last calls
    • before_cleanup_pending_lines - Count of lines pending in buffers before cleaning
    • after_cleanup_pending_lines - Count of lines pending in buffers after cleaning
    • prepare_time - Time from call start to start of prediction loop, including format_common_features call
    • total_time - Total time spent in predict call
    • num_lines - Count of predictions performed
  • detailed_metrics
    Optional dict with more detailed (and more time consuming) metrics that are good
    for debugging and profiling:
    • sending_bytes - number of bytes (VW lines) sent to OS pipe buffer
    • receiving_bytes - number of bytes (predictions) received from OS pipe buffer
    • pending_lines - number of pending lines sent to VW at the time
    • generating_lines_time - time spent by generating VW lines batch
    • sending_lines_time - time spent by sending lines to OS pipe buffer
    • receiving_lines_time - time spent by receiving predictions from OS pipe buffer

    For each key, there will be list of tuples (time, metric value).

Returns:

Iterable with predictions for each item from items_features
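The metrics dict lends itself to a few derived numbers worth monitoring, such as throughput and how many stale lines the cleanup step had to deal with. The sketch below uses only the keys listed above; the sample values and the helper name are made up, and the time unit is left unspecified because the source does not state it.

```python
def summarize_predict_metrics(metrics):
    """Derive throughput and cleanup figures from a predict() metrics dict."""
    total_time = metrics['total_time']
    before = metrics['before_cleanup_pending_lines']
    after = metrics['after_cleanup_pending_lines']
    return {
        # Predictions per unit of time, guarding against a zero duration.
        'lines_per_time_unit': metrics['num_lines'] / total_time if total_time else 0.0,
        # Lines left over from earlier calls that cleanup managed to drain.
        'lines_cleaned': before - after,
        # Lines still pending after cleanup; a persistently non-zero value
        # suggests earlier calls leave more work than cleanup can absorb.
        'leftover_lines': after,
    }


# Made-up values in the documented shape.
sample = {
    'cleanup_time': 0.125,
    'before_cleanup_pending_lines': 4,
    'after_cleanup_pending_lines': 1,
    'prepare_time': 0.25,
    'total_time': 0.5,
    'num_lines': 100,
}
report = summarize_predict_metrics(sample)
print(report)
```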

train(common_features: Any, items_features: Iterable[Any], labels: Iterable[float], weights: Iterable[Optional[float]], debug_info: Any = None) → None[source]

Transform features, label and weight into VW line format and send it to Vowpal.

Parameters:
  • common_features – Features common for all items
  • items_features – Iterable with features for each item
  • labels – Iterable with same length as items features with label for each item
  • weights – Iterable with same length as items features with optional weight for each item
  • debug_info – Some object that can be filled by information useful for debugging