
Model evaluation

Classes for rendering documents as html.

class atnlp.eval.html.Report[source]

Simple html report class based on bootstrap

Use the interface to add elements (title, figures, tables, text, etc), then use write to dump rendered html to file.


Add figure to document

Call this directly after creating a figure with matplotlib. The figure will be embedded into the html document.

Parameters: cap – figure caption (optional)

Add section to document

Parameters: title – section title
add_styled_table(tab, cap='')[source]

Add styled table to document

Note: full control over html style is given to the Styler and bootstrap css is not used, so it can be difficult to get something that actually looks good.

  • tab – table (pandas Styler)
  • cap – caption (optional)
add_table(tab, cap='')[source]

Add table to document

  • tab – table (pandas DataFrame)
  • cap – caption (optional)

Add paragraph text to document

Parameters: text – text string
add_title(title, par=None)[source]

Add title to document

  • title – title string
  • par – paragraph to go with title (optional)

Write rendered html to file

Parameters: filename – path to output file
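
The add-then-write workflow described for Report can be illustrated with a small stand-in class. This is only a sketch using the standard library: MiniReport, its method bodies, and the names add_section, add_text and render are hypothetical, and it does not reproduce atnlp's implementation or its bootstrap styling.

```python
import html
import os
import tempfile


class MiniReport:
    """Hypothetical stand-in mimicking an add-then-write report interface."""

    def __init__(self):
        self._parts = []  # rendered html fragments, in insertion order

    def add_title(self, title, par=None):
        self._parts.append("<h1>{}</h1>".format(html.escape(title)))
        if par is not None:
            self.add_text(par)

    def add_section(self, title):
        self._parts.append("<h2>{}</h2>".format(html.escape(title)))

    def add_text(self, text):
        self._parts.append("<p>{}</p>".format(html.escape(text)))

    def render(self):
        body = "\n".join(self._parts)
        return "<html><body>\n{}\n</body></html>".format(body)

    def write(self, filename):
        # Dump the rendered html to file in one go.
        with open(filename, "w", encoding="utf-8") as f:
            f.write(self.render())


report = MiniReport()
report.add_title("Model evaluation", par="Scores for a <toy> model.")
report.add_section("Metrics")
report.add_text("All numbers are illustrative.")

path = os.path.join(tempfile.mkdtemp(), "report.html")
report.write(path)
```

Elements are rendered in the order they were added, and nothing touches the disk until write is called.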

Functionality for computing performance metrics. These are typically custom metrics that sklearn does not provide.

atnlp.eval.metrics.flpd_score(Y_true, Y_pred)[source]

Return ‘false labels per document’ score

‘False labels per document’ is defined as:

score := total number of false labels / number of examples
  • Y_true – ground truth topic labels (one-hot format)
  • Y_pred – topic predictions (one-hot format)

Returns: false labels per document score
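
The definition above can be sketched in plain Python. This assumes a "false label" is a label that is predicted (1) but absent from the ground truth (0), and it uses nested lists for the one-hot matrices; flpd_sketch is a hypothetical name, not the library function.

```python
def flpd_sketch(Y_true, Y_pred):
    """False labels per document: labels predicted but not in the truth (assumed)."""
    n_false = sum(
        1
        for true_row, pred_row in zip(Y_true, Y_pred)
        for t, p in zip(true_row, pred_row)
        if p == 1 and t == 0
    )
    return n_false / len(Y_true)


# Three documents, three topics (toy data).
Y_true = [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
Y_pred = [[1, 1, 0], [0, 1, 0], [1, 1, 1]]
score = flpd_sketch(Y_true, Y_pred)  # 3 false labels over 3 documents
```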

atnlp.eval.metrics.mlpd_score(Y_true, Y_pred)[source]

Return ‘missing labels per document’ score

‘Missing labels per document’ is defined as:

score := total number of missing labels / number of examples
  • Y_true – ground truth topic labels (one-hot format)
  • Y_pred – topic predictions (one-hot format)

Returns: missing labels per document score
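
A comparable sketch for this score, assuming a "missing label" is a ground truth label (1) that was not predicted (0). The name mlpd_sketch and the toy data are illustrative only.

```python
def mlpd_sketch(Y_true, Y_pred):
    """Missing labels per document: true labels that were not predicted (assumed)."""
    n_missing = sum(
        1
        for true_row, pred_row in zip(Y_true, Y_pred)
        for t, p in zip(true_row, pred_row)
        if t == 1 and p == 0
    )
    return n_missing / len(Y_true)


# Three documents, three topics (toy data).
Y_true = [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
Y_pred = [[1, 1, 0], [0, 1, 0], [1, 1, 1]]
score = mlpd_sketch(Y_true, Y_pred)  # 1 missing label over 3 documents
```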

atnlp.eval.metrics.recall_all_score(Y_true, Y_pred)[source]

Return the ‘recall all’ score

‘Recall all’ is defined as:

score := number of examples with all labels correct / number of examples
  • Y_true – ground truth topic labels (one-hot format)
  • Y_pred – topic predictions (one-hot format)

Returns: recall all score
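
One possible reading of this definition is sketched below. It assumes "all labels correct" means the predicted row matches the ground truth row exactly; the library may instead only require that every true label is recalled. recall_all_sketch is a hypothetical name.

```python
def recall_all_sketch(Y_true, Y_pred):
    """Fraction of examples whose predicted labels match the truth exactly (assumed)."""
    n_exact = sum(
        1
        for true_row, pred_row in zip(Y_true, Y_pred)
        if list(true_row) == list(pred_row)
    )
    return n_exact / len(Y_true)


# Four documents, three topics (toy data).
Y_true = [[1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 1, 0]]
Y_pred = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
score = recall_all_sketch(Y_true, Y_pred)  # documents 0 and 2 match
```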

Functionality for creating performance summary plots.

atnlp.eval.plot.background_composition_pie(Y_true, Y_score, topic, threshold, min_topic_frac=0.05)[source]

Create a pie chart illustrating the major background contributions for given label

Background topics contributing less than min_topic_frac will be merged into a single contribution called “Other”.

A bar chart is also included illustrating the overall topic composition.

  • Y_true – ground truth topic labels (one-hot format)
  • Y_score – topic probability predictions (shape: samples x topics)
  • topic – name of topic to investigate
  • threshold – threshold above which to investigate background contributions
  • min_topic_frac – minimum background sample fraction

Returns: tuple (figure, list of axes)
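
The merging of small contributions into "Other" can be sketched as follows. This assumes the fraction is taken relative to the total background count; merge_small_contributions and the topic counts are made up for illustration.

```python
def merge_small_contributions(counts, min_topic_frac=0.05):
    """Merge topics below min_topic_frac of the total into a single 'Other' entry."""
    total = sum(counts.values())
    if total == 0:
        return {}
    merged = {}
    other = 0
    for topic, n in counts.items():
        if n / total < min_topic_frac:
            other += n  # too small to show as its own slice
        else:
            merged[topic] = n
    if other:
        merged["Other"] = other
    return merged


# Hypothetical background counts for one label above threshold.
background_counts = {"grain": 60, "trade": 30, "ship": 6, "gold": 3, "tin": 1}
merged = merge_small_contributions(background_counts, min_topic_frac=0.05)
```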

atnlp.eval.plot.binary_classification_accuracy_overlays(classifiers, X_train, y_train, X_test, y_test)[source]

Create overlays of binary classification accuracy for multiple classifiers

  • classifiers – list of tuples (name, classifier)
  • X_train – training data
  • y_train – binary training labels
  • X_test – testing data
  • y_test – binary testing labels

Returns: tuple (figure, axis)

atnlp.eval.plot.create_awesome_plot_grid(nminor, ncol=5, maj_h=2, maj_w=3, min_xlabel=None, min_ylabel=None, maj_xlabel=None, maj_ylabel=None, grid=True)[source]

Returns an awesome plot grid

The grid includes a specified number (nminor) of minor plots (unit size in the grid) and a single major plot whose size can be specified in grid units (maj_h and maj_w).

The major plot is located top-right. If either dimension is 0 the major plot is omitted.

The minor plots are tiled from left-to-right, top-to-bottom on a grid of width ncol and will be spaced around the major plot.

The grid will look something like this

#----#----#----#---------#
|    |    |    |         |
|    |    |    |         |
#----#----#----#         |
|    |    |    |         |
|    |    |    |         |
#----#----#----#----#----#
|    |    |    |    |    |
|    |    |    |    |    |
#----#----#----#----#----#
|    |    |
|    |    | -->
#----#----#

  • nminor – number of minor plots
  • ncol – width of grid (in grid units)
  • maj_h – height of major plot (in grid units)
  • maj_w – width of major plot (in grid units)
  • min_xlabel – x-axis label of minor plots
  • min_ylabel – y-axis label of minor plots
  • maj_xlabel – x-axis label of major plot
  • maj_ylabel – y-axis label of major plot
  • grid – draw grid lines (if True)

Returns: tuple (figure, major axis, minor axes (flat list), minor axes (2D list))
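
The tiling rule described above (minor plots fill left-to-right, top-to-bottom around a top-right major plot) can be sketched without any plotting library. minor_plot_cells is a hypothetical helper that only computes grid cells; how the library actually creates the axes is not shown here.

```python
def minor_plot_cells(nminor, ncol=5, maj_h=2, maj_w=3):
    """Return (row, col) cells for minor plots, skipping the major plot area."""
    if ncol < 1:
        raise ValueError("ncol must be at least 1")
    # The major plot is omitted if either of its dimensions is 0.
    has_major = maj_h > 0 and maj_w > 0
    cells = []
    row = 0
    while len(cells) < nminor:
        for col in range(ncol):
            # Major plot occupies the top maj_h rows of the right-most maj_w columns.
            in_major = has_major and row < maj_h and col >= ncol - maj_w
            if not in_major and len(cells) < nminor:
                cells.append((row, col))
        row += 1
    return cells


cells = minor_plot_cells(nminor=7, ncol=5, maj_h=2, maj_w=3)
```

With these arguments the first two rows offer two free columns each, and the remaining three minor plots start the third row.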

atnlp.eval.plot.false_labels_matrix(Y_true, Y_pred)[source]

Create MxM false labels matrix for M topics

Each column represents a given ground truth topic label. Each row represents the absolute number of false predicted labels.

  • Y_true – ground truth topic labels (one-hot format)
  • Y_pred – topic predictions (one-hot format)

Returns: tuple (figure, axis)
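
The counts behind such a matrix could be accumulated as in the sketch below. It assumes that, for each document, every falsely predicted topic (row) is counted once against every ground truth topic of that document (column); false_labels_counts is a hypothetical name and the plotting step is omitted.

```python
def false_labels_counts(Y_true, Y_pred):
    """Return an MxM list of lists: matrix[false predicted topic][true topic]."""
    n_topics = len(Y_true[0])
    matrix = [[0] * n_topics for _ in range(n_topics)]
    for true_row, pred_row in zip(Y_true, Y_pred):
        true_topics = [j for j, t in enumerate(true_row) if t == 1]
        false_topics = [
            i
            for i, (t, p) in enumerate(zip(true_row, pred_row))
            if p == 1 and t == 0
        ]
        for i in false_topics:
            for j in true_topics:
                matrix[i][j] += 1
    return matrix


# Three documents, three topics (toy data).
Y_true = [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
Y_pred = [[1, 1, 0], [0, 1, 0], [1, 1, 1]]
matrix = false_labels_counts(Y_true, Y_pred)
```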

atnlp.eval.plot.get_multimodel_sample_size_dependence(models, datasets, labels, sample_fracs, scoring=None, cat_scoring=None)[source]

Return performance metrics vs training sample size

Fractions of data (sample_fracs) are randomly sampled from the training dataset and used to train the models, which are always evaluated on the full testing datasets.

  • models – list of topic labelling models
  • datasets – list of input data for models (each is (training, testing) tuple)
  • labels – tuple (train, test) of ground truth topic labels (one-hot format)
  • sample_fracs – list of sample fractions to scan
  • scoring – sklearn scorer or scoring name for topic averaged metric
  • cat_scoring – sklearn scorer or scoring name for individual topic metric

Returns: tuple (entries per step, averaged model scores for each step, model scores for each topic for each step)
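
The sampling step can be sketched as drawing one random subset of training indices per fraction. This assumes sampling without replacement and a fixed seed for reproducibility; sample_training_indices is a hypothetical helper, and model training and scoring are left out.

```python
import random


def sample_training_indices(n_train, sample_fracs, seed=0):
    """Return one sorted list of training indices per sample fraction."""
    rng = random.Random(seed)
    subsets = []
    for frac in sample_fracs:
        # Keep at least one example so a model can always be trained.
        n = max(1, int(round(frac * n_train)))
        subsets.append(sorted(rng.sample(range(n_train), n)))
    return subsets


subsets = sample_training_indices(n_train=20, sample_fracs=[0.1, 0.5, 1.0])
sizes = [len(s) for s in subsets]  # number of training entries at each step
```

Each subset would be used to train the models, with evaluation always done on the full testing data.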

atnlp.eval.plot.keras_train_history_graph(history, metrics)[source]

Plot selected performance metrics as a function of training epoch.

  • history – keras training history
  • metrics – list of metric names to plot

Returns: tuple (figure, list of axes)

atnlp.eval.plot.multimodel_sample_size_dependence_graph(models, model_names, datasets, labels, sample_fracs, scoring=None, cat_scoring=None)[source]

Create graph of performance metric vs training sample size

Fractions of data (sample_fracs) are randomly sampled from the training dataset and used to train the models, which are always evaluated on the full testing datasets.

  • models – list of topic labelling models
  • model_names – list of model names
  • datasets – list of input data for models (each is (training, testing) tuple)
  • labels – tuple (train, test) of ground truth topic labels (one-hot format)
  • sample_fracs – list of sample fractions to scan
  • scoring – sklearn scorer or scoring name for topic averaged metric
  • cat_scoring – sklearn scorer or scoring name for individual topic metric

Returns: tuple (figure, major axis, minor axes (flat list), minor axes (2D list))


Create MxM correlation matrix for M topics

Each column represents a given ground truth topic label. Each row represents the relative frequency with which other ground truth labels co-occur.

Parameters: Y – ground truth topic labels (one-hot format)

Returns: tuple (figure, axis)

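
The co-occurrence frequencies described above can be sketched in plain Python. This assumes the entry in row i, column j is the fraction of documents carrying label j that also carry label i; cooccurrence_matrix is a hypothetical name and the plotting step is omitted.

```python
def cooccurrence_matrix(Y):
    """Return MxM list of lists: matrix[i][j] = P(label i | label j), estimated from Y."""
    n_topics = len(Y[0])
    matrix = [[0.0] * n_topics for _ in range(n_topics)]
    for j in range(n_topics):
        docs_with_j = [row for row in Y if row[j] == 1]
        if not docs_with_j:
            continue  # topic never occurs: leave the column at zero
        for i in range(n_topics):
            matrix[i][j] = sum(row[i] for row in docs_with_j) / len(docs_with_j)
    return matrix


# Four documents, three topics (toy data).
Y = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 0]]
matrix = cooccurrence_matrix(Y)
```

Under this reading the diagonal is 1 for every topic that occurs at least once.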
atnlp.eval.plot.topic_labelling_barchart(Y_true, Y_preds, model_names)[source]

Create topic labelling barchart

The figure includes a 1x4 grid of bar charts, illustrating the number of samples, precision, recall and f1 scores for each topic. The scores are overlaid for each model.

  • Y_true – ground truth topic labels (one-hot format)
  • Y_preds – topic predictions for each model (list of one-hot formats)
  • model_names – topic labelling model names

Returns: tuple (figure, list of axes)

atnlp.eval.plot.topic_labelling_barchart_cv(models, model_names, model_inputs, Y, cv=10)[source]

Create topic labelling barchart with k-fold cross-validation

Figure layout is the same as in topic_labelling_barchart().

K-fold cross-validation is used to estimate uncertainties on the metrics.

  • models – list of topic labelling models
  • model_names – list of model names
  • model_inputs – list of input data for models
  • Y – ground truth topic labels (one-hot format)
  • cv – number of folds for cross-validation

Returns: tuple (figure, list of axes)

atnlp.eval.plot.topic_labelling_scatter_plots(Y_true, Y_pred, sample_min=None, thresholds=None)[source]

Create scatter plots comparing precision, recall and number of samples

  • Y_true – ground truth topic labels (one-hot format)
  • Y_pred – topic predictions (one-hot format)
  • sample_min – minimum number of examples per topic
  • thresholds – list of thresholds per category (optional)

Returns: tuple (figure, list of axes)

atnlp.eval.plot.topic_migration_matrix(Y_true, Y_pred)[source]

Create MxM migration matrix for M topics

Each column represents a given ground truth topic label. Each row represents the relative frequency with which predicted labels are assigned.

  • Y_true – ground truth topic labels (one-hot format)
  • Y_pred – topic predictions (one-hot format)

Returns: tuple (figure, axis)
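
A sketch of the underlying frequencies, assuming the entry in row i, column j is the fraction of documents with ground truth label j for which label i was predicted. migration_matrix is a hypothetical name and the plotting step is omitted.

```python
def migration_matrix(Y_true, Y_pred):
    """Return MxM list of lists: matrix[predicted topic][true topic] as relative frequency."""
    n_topics = len(Y_true[0])
    matrix = [[0.0] * n_topics for _ in range(n_topics)]
    for j in range(n_topics):
        # Predictions for the documents whose ground truth includes topic j.
        preds_for_j = [p for t, p in zip(Y_true, Y_pred) if t[j] == 1]
        if not preds_for_j:
            continue  # topic never occurs: leave the column at zero
        for i in range(n_topics):
            matrix[i][j] = sum(p[i] for p in preds_for_j) / len(preds_for_j)
    return matrix


# Four documents, two topics (toy data).
Y_true = [[1, 0], [1, 0], [0, 1], [1, 1]]
Y_pred = [[1, 0], [0, 1], [0, 1], [1, 1]]
matrix = migration_matrix(Y_true, Y_pred)
```

Under this reading the diagonal entry for a topic equals its recall.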

Functionality for creating performance summary tables.

atnlp.eval.table.multimodel_topic_labelling_summary_tables(Y_true, Y_preds, model_names, sample_min=None, thresholds=None)[source]

Return dictionary of topic labelling summary tables for multiple model predictions

The dictionary includes a single table for each of the metrics included in topic_labelling_summary_table(), where the key is the metric name.

An overall summary table (with key summary) is also provided, including the following metrics:

In each table, metrics are provided for each of the models provided.

If sample_min is specified, topics with fewer examples will be omitted.

thresholds is a list of one threshold per category per model, which if specified, will be applied to Y_preds to generate class predictions. In this case Y_preds is assumed to be a list of class probability score matrices rather than predictions.

  • Y_true – ground truth topic labels (one-hot format)
  • Y_preds – list of topic predictions for each model (one-hot format)
  • model_names – name of each model
  • sample_min – minimum number of examples per topic
  • thresholds – list of thresholds per category (optional)

Returns: dict of summary tables (pandas DataFrames)

atnlp.eval.table.topic_labelling_summary_table(Y_true, Y_pred, sample_min=None, thresholds=None)[source]

Return topic labelling summary table for single model predictions

The table includes the following entries per topic:

  • samples: total number of examples
  • standard metrics: precision, recall, f1
  • fl: total number of false labels (for topic)
  • flps: false labels for topic / topic samples
  • flpd: false labels for topic / total documents
  • ml: total number of missing labels (for topic)
  • mlps: missing labels for topic / topic samples
  • mlpd: missing labels for topic / total documents

If sample_min is specified, topics with fewer examples will be omitted.

thresholds is a list of one threshold per category, which if specified, will be applied to Y_pred to generate class predictions. In this case Y_pred is assumed to be a matrix of class probability scores rather than predictions.

  • Y_true – ground truth topic labels (one-hot format)
  • Y_pred – topic predictions (one-hot format)
  • sample_min – minimum number of examples per topic
  • thresholds – list of thresholds per category (optional)

Returns: summary table (pandas DataFrame)
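
The thresholding step and the per-topic label counts listed above can be sketched as follows. This assumes a score strictly above the threshold counts as a predicted label, and it returns plain dictionaries rather than a DataFrame; per_topic_label_errors is a hypothetical name, and precision, recall and f1 are omitted.

```python
def per_topic_label_errors(Y_true, Y_score, thresholds):
    """Apply one threshold per topic, then count false and missing labels per topic."""
    n_docs = len(Y_true)
    rows = []
    for k, thr in enumerate(thresholds):
        true_col = [row[k] for row in Y_true]
        pred_col = [1 if row[k] > thr else 0 for row in Y_score]  # assumed: strict >
        samples = sum(true_col)
        fl = sum(1 for t, p in zip(true_col, pred_col) if p == 1 and t == 0)
        ml = sum(1 for t, p in zip(true_col, pred_col) if p == 0 and t == 1)
        rows.append({
            "samples": samples,
            "fl": fl,
            "flps": fl / samples if samples else float("nan"),
            "flpd": fl / n_docs,
            "ml": ml,
            "mlps": ml / samples if samples else float("nan"),
            "mlpd": ml / n_docs,
        })
    return rows


# Four documents, two topics (toy data); scores are made-up probabilities.
Y_true = [[1, 0], [1, 1], [0, 1], [0, 0]]
Y_score = [[0.9, 0.2], [0.4, 0.8], [0.7, 0.6], [0.1, 0.3]]
rows = per_topic_label_errors(Y_true, Y_score, thresholds=[0.5, 0.5])
```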