mir_eval.hierarchy
Evaluation criteria for hierarchical structure analysis.
Hierarchical structure analysis seeks to annotate a track with a nested
decomposition of the temporal elements of the piece, effectively providing
a kind of “parse tree” of the composition. Unlike the flat segmentation
metrics defined in mir_eval.segment, which can only encode one level of
analysis, hierarchical annotations expose the relationships between short
segments and the larger compositional elements to which they belong.
Conventions
Annotations are assumed to take the form of an ordered list of segmentations.
As in the mir_eval.segment metrics, each segmentation itself consists of
an n-by-2 array of interval times, so that the i-th segment spans time
intervals[i, 0] to intervals[i, 1].
Hierarchical annotations are ordered by increasing specificity, so that the first segmentation contains the fewest segments and the last segmentation contains the most.
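As a minimal illustration of this convention, the sketch below builds a two-layer annotation and checks that its layers run from coarsest to finest. Plain nested lists stand in for the n-by-2 arrays, and the helper `is_ordered_by_specificity` is hypothetical (it is not part of mir_eval); it reads "increasing specificity" as a non-decreasing segment count, which is an assumption.

```python
# A two-layer hierarchy: each layer is a list of [start, end] intervals in seconds.
# Layer 0 is the coarsest (fewest segments); layer 1 is the most specific.
intervals_hier = [
    [[0.0, 30.0], [30.0, 60.0]],
    [[0.0, 15.0], [15.0, 30.0], [30.0, 45.0], [45.0, 60.0]],
]


def is_ordered_by_specificity(hier):
    """Return True if no layer has fewer segments than the layer before it.

    Hypothetical helper, for illustration only.
    """
    counts = [len(layer) for layer in hier]
    return all(a <= b for a, b in zip(counts, counts[1:]))


# The i-th segment of a layer spans intervals[i][0] to intervals[i][1].
start, end = intervals_hier[1][2]
print(start, end)
print(is_ordered_by_specificity(intervals_hier))
```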
Metrics
- mir_eval.hierarchy.tmeasure(): Precision, recall, and F-measure of triplet-based frame accuracy for boundary detection.
- mir_eval.hierarchy.lmeasure(): Precision, recall, and F-measure of triplet-based frame accuracy for segment labeling.
- mir_eval.hierarchy.validate_hier_intervals(intervals_hier)
Validate a hierarchical segment annotation.
- Parameters:
- intervals_hier : ordered list of segmentations
- Raises:
- ValueError
If any segmentation does not span the full duration of the top-level segmentation.
If any segmentation does not start at 0.
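The two documented failure conditions can be restated as a short sketch. The function `check_hier_intervals` below is hypothetical and is not the library's implementation; it uses exact comparisons on plain nested lists, whereas a real implementation may well allow a numerical tolerance.

```python
def check_hier_intervals(intervals_hier):
    """Hypothetical restatement of the two documented checks.

    Raises ValueError if a layer does not start at 0 or does not end
    where the top-level layer ends.
    """
    top_end = max(end for _, end in intervals_hier[0])
    for level, layer in enumerate(intervals_hier):
        layer_start = min(start for start, _ in layer)
        layer_end = max(end for _, end in layer)
        if layer_start != 0:
            raise ValueError(f"layer {level} starts at {layer_start}, not 0")
        if layer_end != top_end:
            raise ValueError(
                f"layer {level} ends at {layer_end}, expected {top_end}"
            )


good = [[[0, 60]], [[0, 30], [30, 60]]]
check_hier_intervals(good)  # passes silently

bad = [[[0, 60]], [[0, 30], [30, 50]]]  # second layer stops 10 s early
try:
    check_hier_intervals(bad)
except ValueError as err:
    print("rejected:", err)
```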
- mir_eval.hierarchy.tmeasure(reference_intervals_hier, estimated_intervals_hier, transitive=False, window=15.0, frame_size=0.1, beta=1.0)
Compute the tree measures for hierarchical segment annotations.
- Parameters:
- reference_intervals_hier : list of ndarray
reference_intervals_hier[i] contains the segment intervals (in seconds) for the i-th layer of the annotations. Layers are ordered from top to bottom, so that the last list of intervals should be the most specific.
- estimated_intervals_hier : list of ndarray
Like reference_intervals_hier, but for the estimated annotation.
- transitive : bool
whether to compute the t-measures using transitivity or not.
- window : float > 0
size of the window (in seconds). For each query frame q, result frames are only counted within q ± window.
- frame_size : float > 0
length (in seconds) of frames. The frame size cannot be longer than the window.
- beta : float > 0
beta parameter for the F-measure.
- Returns:
- t_precision : number [0, 1]
T-measure Precision
- t_recall : number [0, 1]
T-measure Recall
- t_measure : number [0, 1]
F-beta measure for (t_precision, t_recall)
- Raises:
- ValueError
If either of the input hierarchies is inconsistent.
If the input hierarchies have different time durations.
If frame_size > window or frame_size <= 0.
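To make the role of beta and the frame_size/window constraint concrete, here is a small sketch. `f_beta` implements the standard F-beta formula, and `check_window` is a hypothetical helper that mirrors the documented ValueError condition; neither is taken from the library, and the handling of the all-zero edge case is an assumption.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0  # assumed convention for the degenerate case
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def check_window(frame_size, window):
    """Hypothetical check mirroring the documented ValueError condition."""
    if frame_size <= 0 or frame_size > window:
        raise ValueError("frame_size must satisfy 0 < frame_size <= window")


check_window(frame_size=0.1, window=15.0)  # the documented defaults are valid

print(f_beta(0.5, 1.0))            # beta = 1 gives the harmonic mean
print(f_beta(0.5, 1.0, beta=2.0))  # beta > 1 weights recall more heavily
```

With beta=1.0 the score treats precision and recall symmetrically; larger values of beta favor recall, smaller values favor precision.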
- mir_eval.hierarchy.lmeasure(reference_intervals_hier, reference_labels_hier, estimated_intervals_hier, estimated_labels_hier, frame_size=0.1, beta=1.0)
Compute the tree measures for hierarchical segment annotations.
- Parameters:
- reference_intervals_hier : list of ndarray
reference_intervals_hier[i] contains the segment intervals (in seconds) for the i-th layer of the annotations. Layers are ordered from top to bottom, so that the last list of intervals should be the most specific.
- reference_labels_hier : list of list of str
reference_labels_hier[i] contains the segment labels for the i-th layer of the annotations.
- estimated_intervals_hier : list of ndarray
- estimated_labels_hier : list of ndarray
Like reference_intervals_hier and reference_labels_hier, but for the estimated annotation.
- frame_size : float > 0
length (in seconds) of frames. The frame size cannot be longer than the window.
- beta : float > 0
beta parameter for the F-measure.
- Returns:
- l_precision : number [0, 1]
L-measure Precision
- l_recall : number [0, 1]
L-measure Recall
- l_measure : number [0, 1]
F-beta measure for (l_precision, l_recall)
- Raises:
- ValueError
If either of the input hierarchies is inconsistent.
If the input hierarchies have different time durations.
If frame_size > window or frame_size <= 0.
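Because the metric is defined over frames, it can help to see how a labeled segmentation turns into one label per frame at each layer. The sketch below is illustrative only: `labels_at_frames` is a hypothetical helper, it assumes sorted, contiguous intervals starting at 0, and the library's actual sampling and label comparison may differ.

```python
def labels_at_frames(intervals, labels, frame_size):
    """Sample one label per frame, taking frame k at time k * frame_size.

    Illustrative helper; assumes intervals are sorted, contiguous,
    and start at 0.
    """
    duration = intervals[-1][1]
    n_frames = int(round(duration / frame_size))
    sampled = []
    seg = 0
    for k in range(n_frames):
        t = k * frame_size
        # Advance to the segment whose [start, end) range contains t.
        while t >= intervals[seg][1]:
            seg += 1
        sampled.append(labels[seg])
    return sampled


ref_i = [[[0, 30], [30, 60]], [[0, 15], [15, 30], [30, 45], [45, 60]]]
ref_l = [['A', 'B'], ['a', 'b', 'a', 'c']]

# A coarse 15-second frame keeps the output short; the documented default is 0.1.
frame_labels = [
    labels_at_frames(intervals, labels, frame_size=15.0)
    for intervals, labels in zip(ref_i, ref_l)
]
print(frame_labels)
```

Each inner list holds the labels of one layer, aligned frame by frame, which is the form in which label agreement across layers can be compared.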
- mir_eval.hierarchy.evaluate(ref_intervals_hier, ref_labels_hier, est_intervals_hier, est_labels_hier, **kwargs)
Compute all hierarchical structure metrics for the given reference and estimated annotations.
- Parameters:
- ref_intervals_hier : list of list-like
- ref_labels_hier : list of list of str
- est_intervals_hier : list of list-like
- est_labels_hier : list of list of str
Hierarchical annotations are encoded as an ordered list of segmentations. Each segmentation itself is a list (or list-like) of intervals (*_intervals_hier) and a list of lists of labels (*_labels_hier).
- **kwargs
additional keyword arguments to the evaluation metrics.
- Returns:
- scores : OrderedDict
Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.
T-measures are computed in both the “full” (transitive=True) and “reduced” (transitive=False) modes.
- Raises:
- ValueError
Thrown when the provided annotations are not valid.
Examples
A toy example with two two-layer annotations
>>> import mir_eval
>>> ref_i = [[[0, 30], [30, 60]], [[0, 15], [15, 30], [30, 45], [45, 60]]]
>>> est_i = [[[0, 45], [45, 60]], [[0, 15], [15, 30], [30, 45], [45, 60]]]
>>> ref_l = [ ['A', 'B'], ['a', 'b', 'a', 'c'] ]
>>> est_l = [ ['A', 'B'], ['a', 'a', 'b', 'b'] ]
>>> scores = mir_eval.hierarchy.evaluate(ref_i, ref_l, est_i, est_l)
>>> dict(scores)
{'T-Measure full': 0.94822745804853459,
 'T-Measure reduced': 0.8732458222764804,
 'T-Precision full': 0.96569179094693058,
 'T-Precision reduced': 0.89939075137018787,
 'T-Recall full': 0.93138358189386117,
 'T-Recall reduced': 0.84857799953694923}
A more realistic example, using SALAMI pre-parsed annotations
>>> def load_salami(filename):
...     "load SALAMI event format as labeled intervals"
...     events, labels = mir_eval.io.load_labeled_events(filename)
...     intervals = mir_eval.util.boundaries_to_intervals(events)[0]
...     return intervals, labels[:len(intervals)]
>>> ref_files = ['data/10/parsed/textfile1_uppercase.txt',
...              'data/10/parsed/textfile1_lowercase.txt']
>>> est_files = ['data/10/parsed/textfile2_uppercase.txt',
...              'data/10/parsed/textfile2_lowercase.txt']
>>> ref = [load_salami(fname) for fname in ref_files]
>>> ref_int = [seg[0] for seg in ref]
>>> ref_lab = [seg[1] for seg in ref]
>>> est = [load_salami(fname) for fname in est_files]
>>> est_int = [seg[0] for seg in est]
>>> est_lab = [seg[1] for seg in est]
>>> # Pass the estimated intervals (est_int) alongside their labels.
>>> scores = mir_eval.hierarchy.evaluate(ref_int, ref_lab,
...                                      est_int, est_lab)
>>> dict(scores)
{'T-Measure full': 0.66029225561405358,
 'T-Measure reduced': 0.62001868041578034,
 'T-Precision full': 0.66844764668949885,
 'T-Precision reduced': 0.63252297209957919,
 'T-Recall full': 0.6523334654992341,
 'T-Recall reduced': 0.60799919710921635}