API
Base classes
-
class subwabbit.base.VowpalWabbitBaseFormatter
Formatter translates structured information about context and items into Vowpal Wabbit’s input format: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Input-format
It can also implement the reverse translation, from Vowpal Wabbit’s feature names into human-readable feature names.
-
format_common_features(common_features: Any, debug_info: Any = None) → str
Return the part of a VW line with features that are common to one call of predict/train. This method runs just once per call of subwabbit.base.VowpalWabbitBaseModel’s predict() or train() method.
Parameters:
- common_features – Features common to all items
- debug_info – Optional dict that can be filled with information useful for debugging
Returns: Part of the line that is common to each item in one call. The returned string has to start with the ‘|’ symbol.
-
format_item_features(common_features: Any, item_features: Any, debug_info: Any = None) → str
Return the part of a VW line with features specific to each item. This method runs once for each item per call of subwabbit.base.VowpalWabbitBaseModel’s predict() or train() method.
Note
It is a good idea to cache the results of this method.
Parameters:
- common_features – Features common to all items
- item_features – Features for the item
- debug_info – Optional dict that can be filled with information useful for debugging
Returns: Part of the line that is specific to the item. Its form depends on whether namespaces are used in the format_common_features method:
- namespaces are used: the returned string has to start with '|NAMESPACE_NAME', where NAMESPACE_NAME is the name of some namespace
- namespaces are not used: the returned string should not contain the ‘|’ symbol
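The contracts of the two formatting methods above can be sketched as follows. This is a minimal standalone illustration, not the library's implementation: the class name `SketchFormatter`, the namespace names `c` and `i`, and the dict-based feature layout are all assumptions, and the class deliberately does not subclass `VowpalWabbitBaseFormatter` so that it runs without subwabbit installed.

```python
from typing import Any, Dict, Optional


class SketchFormatter:
    """Hypothetical formatter that mimics the documented method contracts."""

    def format_common_features(self, common_features: Dict[str, Any],
                               debug_info: Optional[dict] = None) -> str:
        # Namespace "c" is an assumed name; the result must start with '|'.
        feats = " ".join(f"{k}={v}" for k, v in sorted(common_features.items()))
        return f"|c {feats}"

    def format_item_features(self, common_features: Dict[str, Any],
                             item_features: Dict[str, Any],
                             debug_info: Optional[dict] = None) -> str:
        # The common part uses a namespace, so the item part has to start
        # with '|NAMESPACE_NAME' as well (here the assumed namespace "i").
        feats = " ".join(f"{k}={v}" for k, v in sorted(item_features.items()))
        return f"|i {feats}"


formatter = SketchFormatter()
common = {"hour": 9, "country": "cz"}
common_part = formatter.format_common_features(common)
item_part = formatter.format_item_features(common, {"item_id": 123})
print(common_part)  # |c country=cz hour=9
print(item_part)    # |i item_id=123
```

Because the item part depends only on the item's own features in this sketch, its result could be cached per item, as the note above recommends.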
-
get_formatted_example(common_line_part: str, item_line_part: str, label: Optional[float] = None, weight: Optional[float] = None, debug_info: Optional[Dict[Any, Any]] = None)
Compose a valid VW line from its common and item-dependent parts.
Parameters:
- common_line_part – Part of the line that is common to each item in one call
- item_line_part – Part of the line specific to each item
- label – Label of this row
- weight – Optional weight of this row
- debug_info – Optional dict that can be filled with information useful for debugging
Returns: One VW line in input format: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Input-format
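One plausible way to compose such a line is sketched below. It follows the general VW input layout, in which an optional label and importance weight precede the feature namespaces. The helper name `compose_vw_line` is hypothetical, and the real method may differ in details such as trailing newlines or how a weight without a label is handled.

```python
from typing import Optional


def compose_vw_line(common_line_part: str, item_line_part: str,
                    label: Optional[float] = None,
                    weight: Optional[float] = None) -> str:
    """Join label, weight and feature parts into one VW-style line (a sketch)."""
    parts = []
    if label is not None:
        parts.append(str(label))
        # In this sketch a weight is only emitted together with a label.
        if weight is not None:
            parts.append(str(weight))
    parts.extend([common_line_part, item_line_part])
    return " ".join(parts)


predict_line = compose_vw_line("|c hour=9", "|i item_id=123")
train_line = compose_vw_line("|c hour=9", "|i item_id=123", label=1.0, weight=2.0)
print(predict_line)  # |c hour=9 |i item_id=123
print(train_line)    # 1.0 2.0 |c hour=9 |i item_id=123
```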
-
get_human_readable_explanation(explanation_string: str, feature_translator: Any = None) → List[Dict[KT, VT]]
Transform the explanation string into a more readable form. Every feature used for prediction is translated into this structure:
{
    # For each feature used in a higher-order interaction there is a 2-tuple
    'names': [('Human readable namespace name 1', 'Human readable feature name 1'), ...],
    'original_feature_name': 'c^c8*f^f102',  # feature name as Vowpal sees it
    'hashindex': 123,  # Vowpal's internal hash of feature name
    'value': 0.123,  # value for feature in input line
    'weight': -0.534,  # weight learned by VW for this feature
    'potential': value * weight,
    'relative_potential': abs(potential) / sum_of_abs_potentials_for_all_features
}
Parameters:
- explanation_string – Explanation string from explain_vw_line()
- feature_translator – Any object that can help you translate feature names into human-readable form, for example a database connection. See parse_element()
Returns: List of dicts, sorted by contribution to the final score
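The `potential` and `relative_potential` fields follow directly from the formulas in the structure above. The sketch below computes them for a hand-written list of features. The helper name `rank_by_potential` and the sample numbers are made up for illustration, and sorting by `relative_potential` is an assumption about what "contribution to final score" means.

```python
from typing import Any, Dict, List


def rank_by_potential(features: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Add potential fields to each feature dict and sort by contribution."""
    rows = [dict(f, potential=f["value"] * f["weight"]) for f in features]
    total = sum(abs(r["potential"]) for r in rows)
    for r in rows:
        # Guard against division by zero when every potential is 0.
        r["relative_potential"] = abs(r["potential"]) / total if total else 0.0
    return sorted(rows, key=lambda r: r["relative_potential"], reverse=True)


sample = [
    {"original_feature_name": "c^c8", "value": 1.0, "weight": 0.5},
    {"original_feature_name": "f^f102", "value": 2.0, "weight": -1.0},
]
ranked = rank_by_potential(sample)
for row in ranked:
    print(row["original_feature_name"], row["potential"], row["relative_potential"])
```

A feature with a large negative weight can rank first here, because the ranking uses the absolute value of the potential.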
-
get_human_readable_explanation_html(explanation_string: str, feature_translator: Any = None, max_rows: Optional[int] = None)
Visualize the importance of features in a Jupyter notebook.
Parameters:
- explanation_string – Explanation string from explain_vw_line()
- feature_translator – Any object that can help you with translation, e.g. a database connection
- max_rows – Maximum number of most important features to show. None returns all used features.
Returns: IPython.core.display.HTML
-
parse_element(element: str, feature_translator: Any = None) → Tuple[str, str]
This method is supposed to translate a namespace name and feature name into human-readable form.
For example, element can be “a_item_id^i123” and the result can be (‘Item ID’, ‘News of the day: ID of item is 123’).
Parameters:
- element – Namespace name and feature name, e.g. a_item_id^i123
- feature_translator – Any object that can help you with translation, e.g. a database connection
Returns: tuple(human-readable namespace name, human-readable feature name)
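A possible translation for the example above is sketched here as a standalone function. The `NAMESPACE_NAMES` table and the use of a plain dict as `feature_translator` are assumptions made for illustration. In a real formatter the translator could be any object, such as a database connection.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping from VW namespace names to readable names.
NAMESPACE_NAMES = {"a_item_id": "Item ID"}


def parse_element(element: str,
                  feature_translator: Optional[Dict[str, str]] = None) -> Tuple[str, str]:
    """Split 'namespace^feature' and translate both halves where possible."""
    namespace, _, feature = element.partition("^")
    translator = feature_translator or {}
    # Fall back to the raw names when no translation is known.
    readable_namespace = NAMESPACE_NAMES.get(namespace, namespace)
    readable_feature = translator.get(feature, feature)
    return readable_namespace, readable_feature


titles = {"i123": "News of the day: ID of item is 123"}
translated = parse_element("a_item_id^i123", titles)
print(translated)  # ('Item ID', 'News of the day: ID of item is 123')
```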
-
-
class subwabbit.base.VowpalWabbitDummyFormatter
Formatter that assumes that both common features and item features are already formatted as VW input format strings.
-
class subwabbit.base.VowpalWabbitBaseModel(formatter: subwabbit.base.VowpalWabbitBaseFormatter)
Declaration of the Vowpal Wabbit model interface.
-
explain_vw_line(vw_line: str, link_function: bool = False)
Uses VW audit mode to inspect the weights used for prediction. Audit mode has to be turned on by passing audit_mode=True to the constructor.
Parameters:
- vw_line – String in VW line format
- link_function – If your model uses a link function, pass True
Returns: (raw prediction without use of link function, explanation string)
-
predict(common_features: Any, items_features: Iterable[Any], timeout: Optional[float] = None, debug_info: Any = None, metrics: Optional[Dict[KT, VT]] = None, detailed_metrics: Optional[Dict[KT, VT]] = None) → Iterable[Union[float, str]]
Transforms an iterable of item features into an iterator of predictions.
Parameters:
- common_features – Features common to all items
- items_features – Iterable with features for each item
- timeout – Optional time budget in seconds for computing predictions. If a timeout is passed, the returned iterator can have fewer items than the items_features iterable.
- debug_info – Some object that can be filled with information useful for debugging
- metrics – Optional dict that is populated with metrics that are good to monitor
- detailed_metrics – Optional dict with more detailed (and more time-consuming) metrics that are good for debugging and profiling
Returns: Iterable with predictions for each item from items_features
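Since a timeout can cut the predictions short, callers need to cope with fewer predictions than items. The sketch below shows one way to split items into scored and unscored ones. The helper `pair_with_predictions` is hypothetical, and it assumes that predictions arrive in the same order as the items, which the text above implies but does not state outright.

```python
from typing import Any, Iterable, List, Tuple


def pair_with_predictions(items: List[Any],
                          predictions: Iterable[float]
                          ) -> Tuple[List[Tuple[Any, float]], List[Any]]:
    """Pair items with predictions; items past the last prediction stay unscored."""
    # zip stops at the shorter input, so a truncated prediction stream is safe.
    scored = list(zip(items, predictions))
    unscored = items[len(scored):]
    return scored, unscored


# Simulated result of a predict() call that timed out after two predictions.
scored, unscored = pair_with_predictions(["a", "b", "c"], iter([0.7, 0.1]))
print(scored)    # [('a', 0.7), ('b', 0.1)]
print(unscored)  # ['c']
```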
-
train(common_features: Any, items_features: Iterable[Any], labels: Iterable[float], weights: Iterable[Optional[float]], debug_info: Any = None) → None
Transform features, labels and weights into VW line format and send them to Vowpal.
Parameters:
- common_features – Features common to all items
- items_features – Iterable with features for each item
- labels – Iterable of the same length as items_features, with a label for each item
- weights – Iterable of the same length as items_features, with an optional weight for each item
- debug_info – Some object that can be filled with information useful for debugging
-
Blocking implementation
-
class subwabbit.blocking.VowpalWabbitProcess(formatter: subwabbit.base.VowpalWabbitBaseFormatter, vw_args: List[T], batch_size: int = 20, write_only: bool = False, audit_mode: bool = False)
Class representing a Vowpal Wabbit model. It runs the vw command through the subprocess library and communicates with it through pipes.
-
__init__(formatter: subwabbit.base.VowpalWabbitBaseFormatter, vw_args: List[T], batch_size: int = 20, write_only: bool = False, audit_mode: bool = False)
Parameters:
- formatter – Instance of subwabbit.base.VowpalWabbitBaseFormatter
- vw_args – List of command line arguments for the vw command, e.g. [‘-q’, ‘::’]. This list MUST NOT specify the -p argument for the vw command.
- batch_size – Number of lines communicated to Vowpal in one system call; it influences performance. Smaller batches slightly reduce both latency and throughput.
- write_only – Whether we will just train without expecting predictions. This can greatly improve training performance but disables predicting.
- audit_mode – When set to True, VW is launched in audit mode with the -a argument (overwrites the -t argument). This allows running the explain_vw_line and get_human_readable_explanation methods.
Warning
When audit_mode is turned on, it is not possible to call methods other than explain_vw_line.
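The general pattern of talking to a child process through pipes in batches can be sketched with the standard library alone. This is not the library's code: the child here is a small Python script standing in for the vw command, and the helper `predict_in_batches` and its one-answer-per-line protocol are assumptions made so the example runs without Vowpal Wabbit installed.

```python
import subprocess
import sys

# Stand-in for the vw command (assumption): a child Python process that
# answers every input line with the length of that line.
CHILD_CODE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(len(line.strip()), flush=True)\n"
)


def predict_in_batches(lines, batch_size=2):
    """Send lines to the child in batches and read one answer per line."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    results = []
    try:
        for start in range(0, len(lines), batch_size):
            batch = lines[start:start + batch_size]
            # One write and flush per batch, instead of one per line.
            proc.stdin.write("".join(line + "\n" for line in batch))
            proc.stdin.flush()
            # Block until an answer has arrived for every line of the batch.
            for _ in batch:
                results.append(int(proc.stdout.readline()))
    finally:
        proc.stdin.close()
        proc.stdout.close()
        proc.wait(timeout=10)
    return results


answers = predict_in_batches(["|c a", "|c bb", "|c ccc"])
print(answers)  # [4, 5, 6]
```

Reading the answers right after each batch is what makes this variant blocking: the caller waits for the child before it sends more lines.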
-
close(timeout=120)
Gracefully stop the Vowpal Wabbit process.
Parameters: timeout – Timeout for closing the VW process
-
explain_vw_line(vw_line: str, link_function=False)
Uses VW audit mode to inspect the weights used for prediction. Audit mode has to be turned on by passing audit_mode=True to the constructor.
Parameters:
- vw_line – String in VW line format
- link_function – If your model uses a link function, pass True
Returns: (raw prediction without use of link function, explanation string)
-
predict(common_features: Any, items_features: Iterable[Any], timeout: Optional[float] = None, debug_info: Any = None, metrics: Optional[Dict[KT, VT]] = None, detailed_metrics: Optional[Dict[KT, VT]] = None) → Iterable[Union[float, str]]
Transforms an iterable of item features into an iterator of predictions.
Parameters:
- common_features – Features common to all items
- items_features – Iterable with features for each item
- timeout – Optional time budget in seconds for computing predictions. If a timeout is passed, the returned iterator can have fewer items than the items_features iterable.
- debug_info – Some object that can be filled with information useful for debugging
- metrics – Optional dict populated with metrics that are good to monitor:
  - prepare_time - Time from call start to the start of the prediction loop, including the format_common_features call
  - total_time - Total time spent in the predict call
  - num_lines - Count of predictions performed
- detailed_metrics – Optional dict with more detailed (and more time-consuming) metrics that are good for debugging and profiling:
  - generating_lines_time - Time spent generating VW lines
  - sending_lines_time - Time spent sending VW lines to the OS pipe buffer
  - receiving_lines_time - Time spent reading predictions from the OS pipe buffer
  For each key, there will be a list of tuples (time, metric value).
Returns: Iterable with predictions for each item from items_features
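Because each detailed metric is a list of (time, metric value) tuples, a small helper can condense them for logging. The function `summarize_detailed_metrics` and the sample numbers below are hypothetical. Only the tuple layout is taken from the description above.

```python
from typing import Dict, List, Tuple


def summarize_detailed_metrics(
        detailed_metrics: Dict[str, List[Tuple[float, float]]]
) -> Dict[str, Dict[str, float]]:
    """Reduce each list of (time, value) samples to count, total and max."""
    summary = {}
    for key, samples in detailed_metrics.items():
        values = [value for _time, value in samples]
        if not values:
            continue  # Nothing was recorded for this key.
        summary[key] = {
            "count": len(values),
            "total": sum(values),
            "max": max(values),
        }
    return summary


# Made-up samples in the documented (time, metric value) layout.
collected = {
    "sending_lines_time": [(0.0, 0.25), (0.5, 0.75)],
    "receiving_lines_time": [(0.25, 0.5)],
}
summary = summarize_detailed_metrics(collected)
print(summary["sending_lines_time"])  # {'count': 2, 'total': 1.0, 'max': 0.75}
```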
-
train(common_features: Any, items_features: Iterable[Any], labels: Iterable[float], weights: Iterable[Optional[float]], debug_info: Any = None) → None
Transform features, labels and weights into VW line format and send them to Vowpal.
Parameters:
- common_features – Features common to all items
- items_features – Iterable with features for each item
- labels – Iterable of the same length as items_features, with a label for each item
- weights – Iterable of the same length as items_features, with an optional weight for each item
- debug_info – Some object that can be filled with information useful for debugging
-
Nonblocking implementation
-
class subwabbit.nonblocking.VowpalWabbitNonBlockingProcess(formatter: subwabbit.base.VowpalWabbitBaseFormatter, vw_args: List[T], batch_size: int = 20, audit_mode: bool = False, max_pending_lines: int = 20, write_timeout_ms: float = 0.001, pipe_buffer_size_bytes: Optional[int] = None)
Class representing a Vowpal Wabbit model. It runs the vw command through the subprocess library and communicates with it through non-blocking pipes.
Warning
Available on Linux only.
-
__init__(formatter: subwabbit.base.VowpalWabbitBaseFormatter, vw_args: List[T], batch_size: int = 20, audit_mode: bool = False, max_pending_lines: int = 20, write_timeout_ms: float = 0.001, pipe_buffer_size_bytes: Optional[int] = None)
Parameters:
- formatter – Instance of subwabbit.base.VowpalWabbitBaseFormatter
- vw_args – List of command line arguments for the vw command, e.g. [‘-q’, ‘::’]. This list MUST NOT specify the -p argument for the vw command.
- batch_size – Maximum number of lines communicated to Vowpal in one system call. Larger batches mean less system call overhead, but also a higher risk of leaving unprocessed lines behind for later calls to clean up.
- audit_mode – When turned on, VW is launched in audit mode with the -a argument (overwrites the -t argument). This allows running the explain_vw_line and get_human_readable_explanation methods.
- max_pending_lines – How many lines can wait for prediction in buffers. It is recommended to set it to the same value as batch_size, but it can be higher.
- write_timeout_ms – When predict is called with a timeout, sending lines to Vowpal stops write_timeout_ms before the timeout expires. This leaves time to finish pending work without leaving a mess that the next call has to clean up.
- pipe_buffer_size_bytes – Optionally set the size of the system buffer for sending lines to Vowpal. None means use the default buffer size. For more details see http://man7.org/linux/man-pages/man7/pipe.7.html and the detailed_metrics argument of the predict() method.
Warning
When audit_mode is turned on, it is not possible to call methods other than explain_vw_line.
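The role of the write timeout can be illustrated with a small loop that stops sending shortly before the overall deadline. This is only a sketch of the idea, not the library's code: the function `send_until_cutoff`, its `send` callback and the injectable `clock` are hypothetical, and both time arguments are taken in seconds here for simplicity.

```python
import time
from typing import Callable, Iterable


def send_until_cutoff(lines: Iterable[str], timeout: float, write_timeout: float,
                      send: Callable[[str], None],
                      clock: Callable[[], float] = time.monotonic) -> int:
    """Send lines until write_timeout seconds before the deadline; return count sent."""
    deadline = clock() + timeout
    write_cutoff = deadline - write_timeout
    sent = 0
    for line in lines:
        # Stop early so that answers to already sent lines can still be read.
        if clock() >= write_cutoff:
            break
        send(line)
        sent += 1
    return sent


# Deterministic demonstration with a fake clock instead of real time.
ticks = iter([0.0, 0.0, 0.4, 0.8, 1.2])
outbox = []
sent_count = send_until_cutoff(
    ["line1", "line2", "line3", "line4"],
    timeout=1.0,
    write_timeout=0.3,
    send=outbox.append,
    clock=lambda: next(ticks),
)
print(sent_count, outbox)  # 2 ['line1', 'line2']
```

With a deadline at 1.0 and a cutoff at 0.7, the third line is not sent because the fake clock already reads 0.8 when it is checked.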
-
cleanup(deadline: Optional[float] = None, debug_info: Any = None)
Cleans buffers after previous calls.
Parameters: deadline – Optional Unix timestamp by which the cleanup should end
-
explain_vw_line(vw_line: str, link_function=False)
Uses VW audit mode to inspect the weights used for prediction. Audit mode has to be turned on by passing audit_mode=True to the constructor.
Parameters:
- vw_line – String in VW line format
- link_function – If your model uses a link function, pass True
Returns: (raw prediction without use of link function, explanation string)
-
predict(common_features: Any, items_features: Iterable[Any], timeout: Optional[float] = None, debug_info: Any = None, metrics: Optional[Dict[KT, VT]] = None, detailed_metrics: Optional[Dict[KT, VT]] = None) → Iterable[Union[float, str]]
Transforms an iterable of item features into an iterator of predictions.
Parameters:
- common_features – Features common to all items
- items_features – Iterable with features for each item
- timeout – Optional time budget in seconds for computing predictions. If a timeout is passed, the returned iterator can have fewer items than the items_features iterable.
- debug_info – Some object that can be filled with information useful for debugging
- metrics – Optional dict populated with metrics that are good to monitor:
  - cleanup_time - Time spent cleaning buffers after previous calls
  - before_cleanup_pending_lines - Count of lines pending in buffers before cleaning
  - after_cleanup_pending_lines - Count of lines pending in buffers after cleaning
  - prepare_time - Time from call start to the start of the prediction loop, including the format_common_features call
  - total_time - Total time spent in the predict call
  - num_lines - Count of predictions performed
- detailed_metrics – Optional dict with more detailed (and more time-consuming) metrics that are good for debugging and profiling:
  - sending_bytes - Number of bytes (VW lines) sent to the OS pipe buffer
  - receiving_bytes - Number of bytes (predictions) received from the OS pipe buffer
  - pending_lines - Number of pending lines sent to VW at the time
  - generating_lines_time - Time spent generating a batch of VW lines
  - sending_lines_time - Time spent sending lines to the OS pipe buffer
  - receiving_lines_time - Time spent receiving predictions from the OS pipe buffer
  For each key, there will be a list of tuples (time, metric value).
Returns: Iterable with predictions for each item from items_features
-
train(common_features: Any, items_features: Iterable[Any], labels: Iterable[float], weights: Iterable[Optional[float]], debug_info: Any = None) → None
Transform features, labels and weights into VW line format and send them to Vowpal.
Parameters:
- common_features – Features common to all items
- items_features – Iterable with features for each item
- labels – Iterable of the same length as items_features, with a label for each item
- weights – Iterable of the same length as items_features, with an optional weight for each item
- debug_info – Some object that can be filled with information useful for debugging
-