atnlp.eval

Model evaluation

html.py

Classes for rendering documents as html.

class atnlp.eval.html.Report[source]

Simple html report class based on bootstrap

Use the interface to add elements (titles, figures, tables, text, etc.), then use write to dump the rendered html to file.

add_figure(cap='')[source]

Add figure to document

Call this directly after creating a figure with matplotlib. The figure will be embedded into the html document.

Parameters:
  • cap – figure caption (optional)
add_section(title)[source]

Add section to document

Parameters:
  • title – section title
add_styled_table(tab, cap='')[source]

Add styled table to document

Note: full control over html style is given to the Styler and bootstrap css is not used, so it can be difficult to get something that actually looks good.

Parameters:
  • tab – table (pandas Styler)
  • cap – caption (optional)
add_table(tab, cap='')[source]

Add table to document

Parameters:
  • tab – table (pandas DataFrame)
  • cap – caption (optional)
add_text(text)[source]

Add paragraph text to document

Parameters:
  • text – text string
add_title(title, par=None)[source]

Add title to document

Parameters:
  • title – title string
  • par – paragraph to go with title (optional)
write(filename)[source]

Write rendered html to file

Parameters:
  • filename – path to output file
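
A minimal usage sketch (the constructor signature is not shown here, so a no-argument constructor is assumed; the figure data and file names are placeholders):

import matplotlib.pyplot as plt
import pandas as pd

from atnlp.eval.html import Report

report = Report()
report.add_title("Topic labelling evaluation", par="Results on the held-out test set.")
report.add_section("Score distribution")

# draw a figure with matplotlib, then embed it straight away
plt.figure()
plt.hist([0.1, 0.4, 0.35, 0.8], bins=4)
report.add_figure(cap="Distribution of predicted scores")

# tables are passed as pandas DataFrames
df = pd.DataFrame({"topic": ["earn", "acq"], "f1": [0.95, 0.91]})
report.add_table(df, cap="Per-topic F1 scores")

report.add_text("All metrics are computed on the test split.")
report.write("report.html")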

metrics.py

Functionality for computing performance metrics, typically custom metrics not provided by sklearn.

atnlp.eval.metrics.flpd_score(Y_true, Y_pred)[source]

Return ‘false labels per document’ score

‘False labels per document’ is defined as:

score := total number of false labels / number of examples
Parameters:
  • Y_true – ground truth topic labels (one-hot format)
  • Y_pred – topic predictions (one-hot format)
Returns:

false labels per document score

atnlp.eval.metrics.mlpd_score(Y_true, Y_pred)[source]

Return ‘missing labels per document’ score

‘Missing labels per document’ is defined as:

score := total number of missing labels / number of examples
Parameters:
  • Y_true – ground truth topic labels (one-hot format)
  • Y_pred – topic predictions (one-hot format)
Returns:

missing labels per document score

atnlp.eval.metrics.recall_all_score(Y_true, Y_pred)[source]

Return the ‘recall all’ score

‘Recall all’ is defined as:

score := number of examples with all labels correct / number of examples
Parameters:
  • Y_true – ground truth topic labels (one-hot format)
  • Y_pred – topic predictions (one-hot format)
Returns:

recall all score
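
The three definitions above can be written out directly for one-hot numpy arrays; the following is an illustrative sketch of those formulas, not the library implementation:

import numpy as np

Y_true = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 0, 1]])
Y_pred = np.array([[1, 1, 0],
                   [0, 1, 0],
                   [1, 0, 1]])

n_examples = Y_true.shape[0]

# false label: predicted 1 where the ground truth is 0
flpd = ((Y_pred == 1) & (Y_true == 0)).sum() / n_examples

# missing label: predicted 0 where the ground truth is 1
mlpd = ((Y_pred == 0) & (Y_true == 1)).sum() / n_examples

# recall all: fraction of examples with every label correct
recall_all = (Y_pred == Y_true).all(axis=1).mean()

print(flpd, mlpd, recall_all)  # 0.333..., 0.333..., 0.333...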

plot.py

Functionality for creating performance summary plots.

atnlp.eval.plot.background_composition_pie(Y_true, Y_score, topic, threshold, min_topic_frac=0.05)[source]

Create a pie chart illustrating the major background contributions for given label

Background topics contributing less than min_topic_frac will be merged into a single contribution called “Other”.

A bar chart is also included illustrating the overall topic composition.

Parameters:
  • Y_true – ground truth topic labels (one-hot format)
  • Y_score – topic probability predictions (shape: samples x topics)
  • topic – name of topic to investigate
  • threshold – threshold above which to investigate background contributions
  • min_topic_frac – minimum background sample fraction
Returns:

tuple (figure, list of axes)
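
A hedged usage sketch; Y_true and Y_score are assumed to share the same topic (column) ordering, and the topic name and file name below are placeholders:

from atnlp.eval.plot import background_composition_pie

fig, axes = background_composition_pie(
    Y_true, Y_score, topic="earn", threshold=0.5, min_topic_frac=0.05
)
fig.savefig("earn_background.png")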

atnlp.eval.plot.binary_classification_accuracy_overlays(classifiers, X_train, y_train, X_test, y_test)[source]

Create overlays of binary classification accuracy for multiple classifiers

Parameters:
  • classifiers – list of tuples (name, classifier)
  • X_train – training data
  • y_train – binary training labels
  • X_test – testing data
  • y_test – binary testing labels
Returns:

tuple (figure, axis)
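
A hedged sketch comparing two sklearn-style classifiers on the same binary task (X_train, y_train, X_test and y_test are assumed to exist in the caller's scope):

from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

from atnlp.eval.plot import binary_classification_accuracy_overlays

classifiers = [
    ("logreg", LogisticRegression()),
    ("naive bayes", MultinomialNB()),
]
fig, ax = binary_classification_accuracy_overlays(
    classifiers, X_train, y_train, X_test, y_test
)
fig.savefig("accuracy_overlays.png")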

atnlp.eval.plot.create_awesome_plot_grid(nminor, ncol=5, maj_h=2, maj_w=3, min_xlabel=None, min_ylabel=None, maj_xlabel=None, maj_ylabel=None, grid=True)[source]

Returns an awesome plot grid

The grid includes a specified number (nminor) of minor plots (unit size in the grid) and a single major plot whose size can be specified in grid units (maj_h and maj_w).

The major plot is located top-right. If either dimension is 0 the major plot is omitted.

The minor plots are tiled from left-to-right, top-to-bottom on a grid of width ncol and will be spaced around the major plot.

The grid will look something like this:

#----#----#----#---------#
|    |    |    |         |
|    |    |    |         |
#----#----#----#         |
|    |    |    |         |
|    |    |    |         |
#----#----#----#----#----#
|    |    |    |    |    |
|    |    |    |    |    |
#----#----#----#----#----#
|    |    |
|    |    | -->
#----#----#
Parameters:
  • nminor – number of minor plots
  • ncol – width of grid (in grid units)
  • maj_h – height of major plot (in grid units)
  • maj_w – width of major plot (in grid units)
  • min_xlabel – x-axis label of minor plots
  • min_ylabel – y-axis label of minor plots
  • maj_xlabel – x-axis label of major plot
  • maj_ylabel – y-axis label of major plot
  • grid – draw grid lines (if True)
Returns:

tuple (figure, major axis, minor axes (flat list), minor axes (2D list))
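
For example, a grid of 13 minor plots with the default 5-column layout and a 2x3 major plot in the top-right corner could be built as follows (the axis labels and plotted values are placeholders):

from atnlp.eval.plot import create_awesome_plot_grid

fig, ax_major, axes_flat, axes_2d = create_awesome_plot_grid(
    nminor=13, ncol=5, maj_h=2, maj_w=3,
    min_xlabel="threshold", min_ylabel="precision",
    maj_xlabel="threshold", maj_ylabel="precision",
)
# fill the minor plots (e.g. one per topic) and the major plot (e.g. the average)
for ax in axes_flat:
    ax.plot([0.0, 0.5, 1.0], [1.0, 0.8, 0.3])
ax_major.plot([0.0, 0.5, 1.0], [1.0, 0.85, 0.4])
fig.savefig("grid.png")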

atnlp.eval.plot.false_labels_matrix(Y_true, Y_pred)[source]

Create MxM false labels matrix for M topics

Each column represents a given ground truth topic label. Each row represents the absolute number of false predicted labels.

Parameters:
  • Y_true – ground truth topic labels (one-hot format)
  • Y_pred – topic predictions (one-hot format)
Returns:

tuple (figure, axis)

atnlp.eval.plot.get_multimodel_sample_size_dependence(models, datasets, labels, sample_fracs, scoring=None, cat_scoring=None)[source]

Return performance metrics vs training sample size

Fractions of data (sample_fracs) are randomly sampled from the training dataset and used to train the models, which are always evaluated on the full testing datasets.

Parameters:
  • models – list of topic labelling models
  • datasets – list of input data for models (each is (training, testing) tuple)
  • labels – tuple (train, test) of ground truth topic labels (one-hot format)
  • sample_fracs – list of sample fractions to scan
  • scoring – sklearn scorer or scoring name for topic averaged metric
  • cat_scoring – sklearn scorer or scoring name for individual topic metric
Returns:

tuple (entries per step, averaged model scores for each step, model scores for each topic for each step)

atnlp.eval.plot.keras_train_history_graph(history, metrics)[source]

Plot selected performance metrics as a function of training epoch.

Parameters:
  • history – keras training history
  • metrics – list of metric names to plot
Returns:

tuple (figure, list of axes)
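
A hedged sketch; history is the object returned by keras Model.fit, and the metric names must match keys of history.history (the model and data names below are placeholders):

from atnlp.eval.plot import keras_train_history_graph

history = model.fit(
    X_train, Y_train,
    validation_data=(X_test, Y_test),
    epochs=10,
)
fig, axes = keras_train_history_graph(history, metrics=["loss", "val_loss"])
fig.savefig("training_history.png")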

atnlp.eval.plot.multimodel_sample_size_dependence_graph(models, model_names, datasets, labels, sample_fracs, scoring=None, cat_scoring=None)[source]

Create graph of performance metric vs training sample size

Fractions of data (sample_fracs) are randomly sampled from the training dataset and used to train the models, which are always evaluated on the full testing datasets.

Parameters:
  • models – list of topic labelling models
  • model_names – list of model names
  • datasets – list of input data for models (each is (training, testing) tuple)
  • labels – tuple (train, test) of ground truth topic labels (one-hot format)
  • sample_fracs – list of sample fractions to scan
  • scoring – sklearn scorer or scoring name for topic averaged metric
  • cat_scoring – sklearn scorer or scoring name for individual topic metric
Returns:

tuple (figure, major axis, minor axes (flat list), minor axes (2D list))
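
A hedged usage sketch for two models; the model, dataset and scorer names are placeholders, and each dataset is a (training, testing) tuple as described above:

from atnlp.eval.plot import multimodel_sample_size_dependence_graph

fig, ax_major, axes_flat, axes_2d = multimodel_sample_size_dependence_graph(
    models=[model_a, model_b],
    model_names=["tfidf-svm", "cnn"],
    datasets=[(X_train_a, X_test_a), (X_train_b, X_test_b)],
    labels=(Y_train, Y_test),
    sample_fracs=[0.1, 0.25, 0.5, 1.0],
    scoring="f1_macro",
)
fig.savefig("sample_size_dependence.png")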

atnlp.eval.plot.topic_correlation_matrix(Y)[source]

Create MxM correlation matrix for M topics

Each column represents a given ground truth topic label. Each row represents the relative frequency with which other ground truth labels co-occur.

Parameters:
  • Y – ground truth topic labels (one-hot format)
Returns:

tuple (figure, axis)

atnlp.eval.plot.topic_labelling_barchart(Y_true, Y_preds, model_names)[source]

Create topic labelling barchart

The figure includes a 1x4 grid of bar charts, illustrating the number of samples, precision, recall and f1 scores for each topic. The scores are overlaid for each model.

Parameters:
  • Y_true – ground truth topic labels (one-hot format)
  • Y_preds – topic predictions for each model (list of one-hot formats)
  • model_names – topic labelling model names
Returns:

tuple (figure, list of axes)
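
A hedged sketch comparing two models (the model names and prediction arrays are placeholders, all in one-hot format):

from atnlp.eval.plot import topic_labelling_barchart

fig, axes = topic_labelling_barchart(
    Y_true, [Y_pred_a, Y_pred_b], model_names=["tfidf-svm", "cnn"]
)
fig.savefig("topic_barchart.png")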

atnlp.eval.plot.topic_labelling_barchart_cv(models, model_names, model_inputs, Y, cv=10)[source]

Create topic labelling barchart with k-fold cross-validation

Figure layout is the same as in topic_labelling_barchart().

K-fold cross-validation is used to estimate uncertainties on the metrics.

Parameters:
  • models – list of topic labelling models
  • model_names – list of model names
  • model_inputs – list of input data for models
  • Y – ground truth topic labels (one-hot format)
  • cv – number of folds for cross-validation
Returns:

tuple (figure, list of axes)

atnlp.eval.plot.topic_labelling_scatter_plots(Y_true, Y_pred, sample_min=None, thresholds=None)[source]

Create scatter plots comparing precision, recall and number of samples

Parameters:
  • Y_true – ground truth topic labels (one-hot format)
  • Y_pred – topic predictions (one-hot format)
  • sample_min – minimum number of examples per topic
  • thresholds – list of thresholds per category (optional)
Returns:

tuple (figure, list of axes)

atnlp.eval.plot.topic_migration_matrix(Y_true, Y_pred)[source]

Create MxM migration matrix for M topics

Each column represents a given ground truth topic label. Each row represents the relative frequency with which predicted labels are assigned.

Parameters:
  • Y_true – ground truth topic labels (one-hot format)
  • Y_pred – topic predictions (one-hot format)
Returns:

tuple (figure, axis)
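
A hedged sketch producing both the label co-occurrence and migration matrices for one set of one-hot labels and predictions (file names are placeholders):

from atnlp.eval.plot import topic_correlation_matrix, topic_migration_matrix

fig_corr, ax_corr = topic_correlation_matrix(Y_true)
fig_mig, ax_mig = topic_migration_matrix(Y_true, Y_pred)
fig_corr.savefig("topic_correlation.png")
fig_mig.savefig("topic_migration.png")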

table.py

Functionality for creating performance summary tables.

atnlp.eval.table.multimodel_topic_labelling_summary_tables(Y_true, Y_preds, model_names, sample_min=None, thresholds=None)[source]

Return dictionary of topic labelling summary tables for multiple model predictions

The dictionary includes a single table for each of the metrics included in topic_labelling_summary_table(), where the key is the metric name.

An overall summary table (with key summary) is also provided.

In each table, metrics are given for each of the models.

If sample_min is specified, topics with fewer examples will be omitted.

thresholds is a list of one threshold per category per model; if specified, the thresholds are applied to Y_preds to generate class predictions. In this case each element of Y_preds is assumed to be a matrix of class probability scores rather than predictions.

Parameters:
  • Y_true – ground truth topic labels (one-hot format)
  • Y_preds – list of topic predictions for each model (one-hot format)
  • model_names – name of each model
  • sample_min – minimum number of examples per topic
  • thresholds – list of thresholds per category (optional)
Returns:

dict of summary tables (pandas DataFrames)
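
A hedged sketch for two models (the model names and prediction arrays are placeholders); the returned dictionary is keyed by metric name, plus the overall summary table:

from atnlp.eval.table import multimodel_topic_labelling_summary_tables

tables = multimodel_topic_labelling_summary_tables(
    Y_true, [Y_pred_a, Y_pred_b],
    model_names=["tfidf-svm", "cnn"],
    sample_min=20,
)
print(tables.keys())       # one table per metric, plus 'summary'
print(tables["summary"])   # overall model comparison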

atnlp.eval.table.topic_labelling_summary_table(Y_true, Y_pred, sample_min=None, thresholds=None)[source]

Return topic labelling summary table for single model predictions

Contents of the table includes the following entries per topic:

  • samples: total number of examples
  • standard metrics: precision, recall, f1
  • fl: total number of false labels (for topic)
  • flps: false labels for topic / topic samples
  • flpd: false labels for topic / total documents
  • ml: total number of missing labels (for topic)
  • mlps: missing labels for topic / topic samples
  • mlpd: missing labels for topic / total documents

If sample_min is specified, topics with fewer examples will be omitted.

thresholds is a list of one threshold per category; if specified, the thresholds are applied to Y_pred to generate class predictions. In this case Y_pred is assumed to be a matrix of class probability scores rather than predictions.

Parameters:
  • Y_true – ground truth topic labels (one-hot format)
  • Y_pred – topic predictions (one-hot format)
  • sample_min – minimum number of examples per topic
  • thresholds – list of thresholds per category (optional)
Returns:

summary table (pandas DataFrame)
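
A hedged sketch that builds the per-topic table and renders it through the Report class from html.py (the sample_min value and file name are placeholders):

from atnlp.eval.html import Report
from atnlp.eval.table import topic_labelling_summary_table

table = topic_labelling_summary_table(Y_true, Y_pred, sample_min=20)

report = Report()
report.add_title("Topic labelling summary")
report.add_table(table, cap="Per-topic metrics (topics with fewer than 20 examples omitted)")
report.write("summary.html")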