API

Base classes

class subwabbit.base.VowpalWabbitBaseFormatter

A formatter translates structured information about the context and items into Vowpal Wabbit’s input format: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Input-format

It can also implement the reverse translation, from Vowpal Wabbit’s feature names into human-readable feature names.

format_common_features(common_features: Any, debug_info: Any = None) → str

Return the part of the VW line with features that are common to all items in one call of predict/train. This method runs only once per call of subwabbit.base.VowpalWabbitBaseModel’s predict() or train() method.

Parameters:

common_features – Features common to all items
debug_info – Optional dict that can be filled with information useful for debugging

Returns:

Part of the line that is common to every item in one call. The returned string has to start with the ‘|’ symbol.
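For illustration, here is a minimal sketch of a format_common_features implementation that puts all common features into a single namespace. It is written as a standalone function instead of a method on a VowpalWabbitBaseFormatter subclass. Its input schema (a flat dict) and the namespace name u are hypothetical choices, not part of subwabbit.

```python
from typing import Any, Dict, Optional


def _sanitize(token: str) -> str:
    # Spaces, '|' and ':' are delimiters in VW's input format, so they must not appear in feature names
    return token.replace(" ", "_").replace("|", "_").replace(":", "_")


def format_common_features(common_features: Dict[str, Any],
                           debug_info: Optional[Dict[str, Any]] = None) -> str:
    # Hypothetical schema: a flat dict such as {"user_id": 42, "hour": 13}
    features = " ".join(
        f"{_sanitize(str(key))}={_sanitize(str(value))}"
        for key, value in common_features.items()
    )
    line_part = f"|u {features}"  # the common part has to start with '|'
    if debug_info is not None:
        debug_info["common_line_part"] = line_part
    return line_part
```

Encoding each key=value pair as one string feature is only one convention. Numeric features could instead use VW's name:value syntax.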

format_item_features(common_features: Any, item_features: Any, debug_info: Any = None) → str

Return the part of the VW line with features specific to one item. This method runs once for each item in every call of subwabbit.base.VowpalWabbitBaseModel’s predict() or train() method.

Note

It is a good idea to cache the results of this method.

Parameters:

common_features – Features common to all items
item_features – Features of the item
debug_info – Optional dict that can be filled with information useful for debugging

Returns:

The part of the line that is specific to the item. Its form depends on whether namespaces are used in the format_common_features method:

namespaces are used: the returned string has to start with '|NAMESPACE_NAME', where NAMESPACE_NAME is the name of some namespace
namespaces are not used: the returned string should not contain the ‘|’ symbol

get_formatted_example(common_line_part: str, item_line_part: str, label: Optional[float] = None, weight: Optional[float] = None, debug_info: Optional[Dict[Any, Any]] = None)

Compose a valid VW line from its common and item-dependent parts.

Parameters:

common_line_part – Part of the line that is common to every item in one call
item_line_part – Part of the line specific to one item
label – Label of this row
weight – Optional weight of this row
debug_info – Optional dict that can be filled with information useful for debugging

Returns:

One VW line in the input format: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Input-format
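The composition can be sketched as follows. The sketch assumes VW's documented layout: an optional label, an optional importance weight, and then the feature namespaces. It is an illustration, not subwabbit's own code.

```python
from typing import Any, Dict, Optional


def get_formatted_example(common_line_part: str, item_line_part: str,
                          label: Optional[float] = None, weight: Optional[float] = None,
                          debug_info: Optional[Dict[Any, Any]] = None) -> str:
    # VW line layout: [label [importance]] followed by the namespaces with features
    header = ""
    if label is not None:
        header = f"{label} " if weight is None else f"{label} {weight} "
    # Without a label (plain prediction), the importance weight is not written at all
    return f"{header}{common_line_part} {item_line_part}"
```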

get_human_readable_explanation(explanation_string: str, feature_translator: Any = None) → List[Dict]

Transform the explanation string into a more readable form. Every feature used for the prediction is translated into this structure:

{
    # One 2-tuple for each feature taking part in the (possibly higher-order) interaction
    'names': [('Human readable namespace name 1', 'Human readable feature name 1'), ...],
    'original_feature_name': 'c^c8*f^f102',  # feature name as Vowpal sees it
    'hashindex': 123,  # Vowpal's internal hash of the feature name
    'value': 0.123,  # value of the feature in the input line
    'weight': -0.534,  # weight learned by VW for this feature
    'potential': value * weight,
    'relative_potential': abs(potential) / sum_of_abs_potentials_for_all_features
}

Parameters:

explanation_string – Explanation string from explain_vw_line()
feature_translator – Any object that can help with translating feature names into human-readable form, for example a database connection. See parse_element().

Returns:

List of dicts, sorted by their contribution to the final score
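The potential and relative_potential fields above can be computed as in the sketch below. It assumes each feature dict already carries value and weight. It also assumes that "sorted by contribution" means descending absolute potential. Both are assumptions made for illustration.

```python
from typing import Any, Dict, List


def add_potentials(features: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Each input dict is assumed to already carry 'value' and 'weight' for one feature
    for feature in features:
        feature["potential"] = feature["value"] * feature["weight"]
    total = sum(abs(feature["potential"]) for feature in features)
    for feature in features:
        feature["relative_potential"] = abs(feature["potential"]) / total if total else 0.0
    # Most influential features first (ranking by absolute potential is an assumption)
    return sorted(features, key=lambda feature: abs(feature["potential"]), reverse=True)
```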

get_human_readable_explanation_html(explanation_string: str, feature_translator: Any = None, max_rows: Optional[int] = None)

Visualize the importance of features in a Jupyter notebook.

Parameters:

explanation_string – Explanation string from explain_vw_line()
feature_translator – Any object that can help with translation, e.g. a database connection
max_rows – Maximum number of the most important features to show. None returns all used features.

Returns:

IPython.core.display.HTML object
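The sketch below gives a rough idea of what such a visualization involves. It renders already-ranked rows as an HTML table and honours max_rows. The function name and column choice are hypothetical. In a notebook, the returned string could be wrapped in IPython.display.HTML for display.

```python
import html
from typing import Any, Dict, List, Optional


def explanation_to_html(rows: List[Dict[str, Any]], max_rows: Optional[int] = None) -> str:
    # rows are assumed to be sorted already, most important first
    shown = rows if max_rows is None else rows[:max_rows]
    body = "".join(
        "<tr><td>{}</td><td>{:.3f}</td><td>{:.1%}</td></tr>".format(
            html.escape(row["original_feature_name"]),  # escape <, >, & in feature names
            row["potential"],
            row["relative_potential"],
        )
        for row in shown
    )
    header = "<tr><th>feature</th><th>potential</th><th>share</th></tr>"
    return "<table>" + header + body + "</table>"
```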

parse_element(element: str, feature_translator: Any = None) → Tuple[str, str]

This method is supposed to translate a namespace name and a feature name into human-readable form.

For example, element can be “a_item_id^i123” and the result can be (‘Item ID’, ‘News of the day: ID of item is 123’)

Parameters:

element – Namespace name and feature name, e.g. a_item_id^i123
feature_translator – Any object that can help with translation, e.g. a database connection

Returns:

Tuple (human-understandable namespace name, human-understandable feature name)
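Here is a possible parse_element implementation for the example above. It assumes that elements are split on the ^ character and that feature_translator is a plain dict; in practice it could be a database connection. NAMESPACE_LABELS is a hypothetical lookup table.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping from VW namespace names to labels shown to people
NAMESPACE_LABELS: Dict[str, str] = {"a_item_id": "Item ID"}


def parse_element(element: str,
                  feature_translator: Optional[Dict[str, str]] = None) -> Tuple[str, str]:
    # An element looks like "<namespace>^<feature>", e.g. "a_item_id^i123"
    namespace, _, feature = element.partition("^")
    namespace_label = NAMESPACE_LABELS.get(namespace, namespace)
    if feature_translator is not None:
        # Here the translator is just a dict; unknown features keep their raw name
        feature = feature_translator.get(feature, feature)
    return namespace_label, feature
```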

class subwabbit.base.VowpalWabbitDummyFormatter

Formatter that assumes that both common features and item features are already formatted as strings in the VW input format.
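Conceptually, such a formatter just passes its inputs through. The sketch below illustrates that idea. The class name PassThroughFormatter is hypothetical, and the class does not subclass subwabbit's base class.

```python
from typing import Any


class PassThroughFormatter:
    """Sketch of the idea: inputs are already VW input-format strings."""

    def format_common_features(self, common_features: str, debug_info: Any = None) -> str:
        return common_features  # e.g. "|u user_id=42"

    def format_item_features(self, common_features: str, item_features: str,
                             debug_info: Any = None) -> str:
        return item_features  # e.g. "|i item_id=7"
```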

class subwabbit.base.VowpalWabbitBaseModel(formatter: subwabbit.base.VowpalWabbitBaseFormatter)

Declaration of the Vowpal Wabbit model interface.

explain_vw_line(vw_line: str, link_function: bool = False)

Uses VW audit mode to inspect the weights used for prediction. Audit mode has to be turned on by passing audit_mode=True to the constructor.

Parameters:

vw_line – String in VW line format
link_function – If your model uses a link function, pass True

Returns:

Tuple (raw prediction without the link function applied, explanation string)

predict(common_features: Any, items_features: Iterable[Any], timeout: Optional[float] = None, debug_info: Any = None, metrics: Optional[Dict] = None, detailed_metrics: Optional[Dict] = None) → Iterable[Union[float, str]]

Transforms an iterable of item features into an iterator of predictions.

Parameters:

common_features – Features common for all items
items_features – Iterable with features for each item
timeout – Optionally specify how much time, in seconds, is available for computing predictions. If a timeout is passed, the returned iterator can have fewer items than the items_features iterable.
debug_info – Some object that can be filled with information useful for debugging.
metrics – Optional dict that is populated with some metrics that are good to monitor.
detailed_metrics – Optional dict with more detailed (and more time consuming) metrics that are good for debugging and profiling.

Returns:

Iterable with predictions for each item from items_features
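When a timeout is set, predict may return fewer predictions than items, so it is safest to pair items and predictions with zip. The sketch below uses a hypothetical FakeModel in place of a real subwabbit model. It assumes predictions arrive in item order, so a truncated result covers a prefix of the items.

```python
import time
from typing import Any, Iterable, Iterator, List, Optional, Tuple


class FakeModel:
    """Hypothetical stand-in mimicking predict(): one score per item, stopping early on timeout."""

    def predict(self, common_features: Any, items_features: Iterable[Any],
                timeout: Optional[float] = None) -> Iterator[float]:
        deadline = None if timeout is None else time.monotonic() + timeout
        for item in items_features:
            if deadline is not None and time.monotonic() >= deadline:
                return  # out of time: remaining items get no prediction
            yield float(len(str(item)))  # dummy score


def score_items(model: Any, common_features: Any, items: List[Any],
                timeout: Optional[float] = None) -> List[Tuple[Any, float]]:
    predictions = model.predict(common_features, items, timeout=timeout)
    # zip stops at the shorter iterable, so items without a prediction are left out
    return list(zip(items, predictions))
```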

train(common_features: Any, items_features: Iterable[Any], labels: Iterable[float], weights: Iterable[Optional[float]], debug_info: Any = None) → None

Transform features, labels and weights into VW line format and send them to Vowpal.

Parameters:

common_features – Features common to all items
items_features – Iterable with features for each item
labels – Iterable of the same length as items_features, with a label for each item
weights – Iterable of the same length as items_features, with an optional weight for each item
debug_info – Some object that can be filled with information useful for debugging
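The labels and weights passed to train have to line up with items_features. The sketch below builds them from click feedback. Several choices in it are hypothetical: the item schema, the {-1, 1} labeling (the convention VW's logistic loss expects), and the upweighting of positives.

```python
from typing import Any, Dict, List, Optional, Set, Tuple


def build_training_batch(shown_items: List[Dict[str, Any]], clicked_ids: Set[int]
                         ) -> Tuple[List[float], List[Optional[float]]]:
    # Hypothetical feedback schema: label 1 for clicked items, -1 otherwise
    labels = [1.0 if item["id"] in clicked_ids else -1.0 for item in shown_items]
    # Upweight positives; None is assumed to mean "use the default importance"
    weights = [2.0 if label > 0 else None for label in labels]
    return labels, weights
```

The resulting lists would then be passed as model.train(common_features, shown_items, labels, weights). Using None for negatives assumes that a missing weight means the default importance, as the Optional type suggests.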

Blocking implementation

class subwabbit.blocking.VowpalWabbitProcess(formatter: subwabbit.base.VowpalWabbitBaseFormatter, vw_args: List, batch_size: int = 20, write_only: bool = False, audit_mode: bool = False)

Class representing a Vowpal Wabbit model. It runs the vw command through the subprocess library and communicates with it through pipes.

__init__(formatter: subwabbit.base.VowpalWabbitBaseFormatter, vw_args: List, batch_size: int = 20, write_only: bool = False, audit_mode: bool = False)

Parameters:

formatter – Instance of subwabbit.base.VowpalWabbitBaseFormatter
vw_args – List of command-line arguments for the vw command, e.g. ['-q', '::']. This list MUST NOT specify the -p argument for the vw command.
batch_size – Number of lines communicated to Vowpal in one system call; it influences performance. Smaller batches slightly reduce both latency and throughput.
write_only – Set to True if you only intend to train and do not need predictions. This can greatly improve training performance but disables predicting.
audit_mode – When set to True, VW is launched in audit mode with the -a argument (overrides the -t argument). This makes it possible to run the explain_vw_line and get_human_readable_explanation methods.

Warning

When audit_mode is turned on, it is not possible to call methods other than explain_vw_line.
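The pipe-based communication, and why batch_size matters, can be illustrated with a short sketch. It is not subwabbit's implementation. A small Python child process stands in for vw and answers each input line with a fake "prediction", namely its token count.

```python
import subprocess
import sys
from typing import Iterator, List

# Stand-in for the vw process: reads lines from stdin, prints one number per line
CHILD_SCRIPT = (
    "import sys\n"
    "while True:\n"
    "    line = sys.stdin.readline()\n"
    "    if not line:\n"
    "        break\n"
    "    print(len(line.split()), flush=True)\n"
)


def predict_in_batches(lines: List[str], batch_size: int = 20) -> Iterator[float]:
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SCRIPT],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    try:
        for start in range(0, len(lines), batch_size):
            batch = lines[start:start + batch_size]
            # One write + flush per batch: fewer system calls than one per line
            proc.stdin.write("".join(line + "\n" for line in batch))
            proc.stdin.flush()
            # Blocking reads: wait for exactly one answer per line sent
            for _ in batch:
                yield float(proc.stdout.readline())
    finally:
        proc.stdin.close()  # EOF lets the child exit
        proc.wait(timeout=10)
        proc.stdout.close()
```

Writing a whole batch before reading the answers saves system calls. However, each batch has to stay small enough that neither pipe buffer fills up while the other side is waiting; otherwise both processes can block on each other.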

close(timeout=120)

Gracefully stop the Vowpal Wabbit process.

Parameters:

timeout – Timeout for closing the VW process.
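A graceful stop like this is commonly implemented by closing the child's stdin and waiting with a timeout. The sketch below shows that pattern with the standard subprocess module. The kill fallback is an assumption about what "graceful" should degrade to, not documented subwabbit behavior.

```python
import subprocess
import sys


def close_process(proc: subprocess.Popen, timeout: float = 120) -> int:
    """Graceful-stop sketch: signal EOF, wait, and kill only if the process hangs."""
    if proc.stdin is not None:
        proc.stdin.close()  # a process reading stdin sees EOF and can finish its work
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # last resort after the grace period
        return proc.wait()


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import sys; sys.stdin.read()"],
                             stdin=subprocess.PIPE)
    print(close_process(child, timeout=10))
```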

explain_vw_line(vw_line: str, link_function=False)

Uses VW audit mode to inspect the weights used for prediction. Audit mode has to be turned on by passing audit_mode=True to the constructor.

Parameters:

vw_line – String in VW line format
link_function – If your model uses a link function, pass True

Returns:

Tuple (raw prediction without the link function applied, explanation string)

predict(common_features: Any, items_features: Iterable[Any], timeout: Optional[float] = None, debug_info: Any = None, metrics: Optional[Dict] = None, detailed_metrics: Optional[Dict] = None) → Iterable[Union[float, str]]

Transforms an iterable of item features into an iterator of predictions.

Parameters:

common_features – Features common for all items
items_features – Iterable with features for each item
timeout – Optionally specify how much time, in seconds, is available for computing predictions. If a timeout is passed, the returned iterator can have fewer items than the items_features iterable.
debug_info – Some object that can be filled with information useful for debugging.
metrics –
Optional dict populated with metrics that are good to monitor:
- prepare_time - Time from the start of the call to the start of the prediction loop, including the format_common_features call
- total_time - Total time spent in the predict call
- num_lines - Count of predictions performed
detailed_metrics –
Optional dict with more detailed (and more time-consuming) metrics that are good for debugging and profiling:
- generating_lines_time - time spent generating VW lines
- sending_lines_time - time spent sending VW lines to the OS pipe buffer
- receiving_lines_time - time spent reading predictions from the OS pipe buffer
For each key, there will be a list of tuples (time, metric value).

Returns:

Iterable with predictions for each item from items_features
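Every detailed_metrics key holds a list of (time, metric value) tuples, so a per-key total is easy to derive. This helper is hypothetical. You would call it on the dict you passed as detailed_metrics, after consuming the returned predictions.

```python
from typing import Dict, List, Tuple


def summarize_detailed_metrics(
        detailed_metrics: Dict[str, List[Tuple[float, float]]]) -> Dict[str, float]:
    # Each key maps to (time, metric value) samples; add up the metric values per key
    return {key: sum(value for _, value in samples)
            for key, samples in detailed_metrics.items()}
```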

train(common_features: Any, items_features: Iterable[Any], labels: Iterable[float], weights: Iterable[Optional[float]], debug_info: Any = None) → None

Transform features, labels and weights into VW line format and send them to Vowpal.

Parameters:

common_features – Features common to all items
items_features – Iterable with features for each item
labels – Iterable of the same length as items_features, with a label for each item
weights – Iterable of the same length as items_features, with an optional weight for each item
debug_info – Some object that can be filled with information useful for debugging

Nonblocking implementation

class subwabbit.nonblocking.VowpalWabbitNonBlockingProcess(formatter: subwabbit.base.VowpalWabbitBaseFormatter, vw_args: List, batch_size: int = 20, audit_mode: bool = False, max_pending_lines: int = 20, write_timeout_ms: float = 0.001, pipe_buffer_size_bytes: Optional[int] = None)

Class representing a Vowpal Wabbit model. It runs the vw command through the subprocess library and communicates with it through non-blocking pipes.

Warning

Available on Linux only.
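The non-blocking mode relies on writes that return immediately when the pipe buffer is full, instead of waiting. Below is a minimal sketch of that mechanism using the standard os module. It is Unix only and is not subwabbit's code.

```python
import os


def send_available(fd: int, payload: bytes) -> bytes:
    """Write as much of payload as a non-blocking pipe accepts now; return the unsent rest."""
    view = memoryview(payload)
    while view:
        try:
            written = os.write(fd, view)
        except BlockingIOError:
            break  # pipe buffer is full; the caller can do other work and retry later
        view = view[written:]
    return bytes(view)


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # writes now fail fast instead of waiting
    rest = send_available(write_fd, b"|u a |i b\n" * 1_000_000)
    print("unsent bytes:", len(rest))
```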

__init__(formatter: subwabbit.base.VowpalWabbitBaseFormatter, vw_args: List, batch_size: int = 20, audit_mode: bool = False, max_pending_lines: int = 20, write_timeout_ms: float = 0.001, pipe_buffer_size_bytes: Optional[int] = None)

Parameters:

formatter – Instance of subwabbit.base.VowpalWabbitBaseFormatter
vw_args – List of command-line arguments for the vw command, e.g. ['-q', '::']. This list MUST NOT specify the -p argument for the vw command.
batch_size – Maximum number of lines communicated to Vowpal in one system call. Larger batches mean less system-call overhead, but also a higher risk of leaving unprocessed lines that later calls have to clean up.
audit_mode – When turned on, VW is launched in audit mode with the -a argument (overrides the -t argument). This makes it possible to run the explain_vw_line and get_human_readable_explanation methods.
max_pending_lines – Maximum number of lines that can wait for prediction in buffers. It is recommended to set it to the same value as batch_size, but it can be higher.
write_timeout_ms – When predict is called with a timeout, sending lines to Vowpal stops write_timeout_ms before the timeout expires. This leaves time to finish outstanding work without leaving unprocessed lines that the next call has to clean up.
pipe_buffer_size_bytes – Optionally set the size of the system buffer used for sending lines to Vowpal. None means the default buffer size is used. For more details see http://man7.org/linux/man-pages/man7/pipe.7.html and the detailed_metrics argument of the predict() method.

Warning

When audit_mode is turned on, it is not possible to call methods other than explain_vw_line.

cleanup(deadline: Optional[float] = None, debug_info: Any = None)

Cleans the buffers left over from previous calls.

Parameters:

deadline – Optional Unix timestamp by which cleanup has to end
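Cleaning up after a previous call essentially means reading and discarding predictions for lines that were sent but never collected. The sketch below does that with select and a Unix-timestamp deadline. It is a simplified illustration rather than subwabbit's implementation; for example, it drops a trailing incomplete line.

```python
import os
import select
import time
from typing import Optional


def drain_pending(fd: int, pending_lines: int, deadline: Optional[float] = None) -> int:
    """Read and discard leftover prediction lines; return how many are still pending."""
    partial = b""
    while pending_lines > 0:
        remaining = None if deadline is None else deadline - time.time()
        if remaining is not None and remaining <= 0:
            break  # deadline (a Unix timestamp) has passed
        readable, _, _ = select.select([fd], [], [], remaining)
        if not readable:
            break  # timed out waiting for more output
        chunk = os.read(fd, 65536)
        if not chunk:
            break  # writer closed the pipe
        partial += chunk
        # Count complete lines; keep an unfinished last line for the next read
        *complete, partial = partial.split(b"\n")
        pending_lines -= len(complete)
    return max(pending_lines, 0)
```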

explain_vw_line(vw_line: str, link_function=False)

Uses VW audit mode to inspect the weights used for prediction. Audit mode has to be turned on by passing audit_mode=True to the constructor.

Parameters:

vw_line – String in VW line format
link_function – If your model uses a link function, pass True

Returns:

Tuple (raw prediction without the link function applied, explanation string)

predict(common_features: Any, items_features: Iterable[Any], timeout: Optional[float] = None, debug_info: Any = None, metrics: Optional[Dict] = None, detailed_metrics: Optional[Dict] = None) → Iterable[Union[float, str]]

Transforms an iterable of item features into an iterator of predictions.

Parameters:

common_features – Features common for all items
items_features – Iterable with features for each item
timeout – Optionally specify how much time, in seconds, is available for computing predictions. If a timeout is passed, the returned iterator can have fewer items than the items_features iterable.
debug_info – Some object that can be filled with information useful for debugging.
metrics –
Optional dict populated with metrics that are good to monitor:
- cleanup_time - Time spent cleaning buffers after previous calls
- before_cleanup_pending_lines - Count of lines pending in buffers before cleaning
- after_cleanup_pending_lines - Count of lines pending in buffers after cleaning
- prepare_time - Time from the start of the call to the start of the prediction loop, including the format_common_features call
- total_time - Total time spent in the predict call
- num_lines - Count of predictions performed
detailed_metrics –
Optional dict with more detailed (and more time-consuming) metrics that are good for debugging and profiling:
- sending_bytes - number of bytes (VW lines) sent to the OS pipe buffer
- receiving_bytes - number of bytes (predictions) received from the OS pipe buffer
- pending_lines - number of lines sent to VW and still pending at the time
- generating_lines_time - time spent generating a batch of VW lines
- sending_lines_time - time spent sending lines to the OS pipe buffer
- receiving_lines_time - time spent receiving predictions from the OS pipe buffer
For each key, there will be a list of tuples (time, metric value).

Returns:

Iterable with predictions for each item from items_features

train(common_features: Any, items_features: Iterable[Any], labels: Iterable[float], weights: Iterable[Optional[float]], debug_info: Any = None) → None

Transform features, labels and weights into VW line format and send them to Vowpal.

Parameters:

common_features – Features common to all items
items_features – Iterable with features for each item
labels – Iterable of the same length as items_features, with a label for each item
weights – Iterable of the same length as items_features, with an optional weight for each item
debug_info – Some object that can be filled with information useful for debugging