mir_eval.transcription_velocity

Transcription evaluation, as defined in mir_eval.transcription, does not take into account the velocities of reference and estimated notes. This submodule implements a variant of mir_eval.transcription.precision_recall_f1_overlap() which additionally considers note velocity when determining whether a note is correctly transcribed. This is done by defining a new function mir_eval.transcription_velocity.match_notes() which first calls mir_eval.transcription.match_notes() to get a note matching based on onset, offset, and pitch. Then, we follow the evaluation procedure described in [1] to test whether an estimated note should be considered correct:

  1. Reference velocities are re-scaled to the range [0, 1].

  2. A linear regression is performed to estimate global scale and offset parameters which minimize the L2 distance between the velocities of matched estimated and (rescaled) reference notes.

  3. The scale and offset parameters are used to rescale estimated velocities.

  4. An estimated/reference note pair which has been matched according to onset, offset, and pitch is then only considered correct if the rescaled velocities are within a predefined threshold of each other, defaulting to 0.1.
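The four steps above can be sketched in NumPy as follows. This is an illustrative sketch rather than the library's own code: the helper name filter_matching_by_velocity is hypothetical, the matching argument is assumed to be a list of (reference index, estimated index) pairs such as the one produced by the onset/offset/pitch matching, and the guard against a zero velocity range and the strict comparison in step 4 are assumptions.

```python
import numpy as np


def filter_matching_by_velocity(matching, ref_velocities, est_velocities,
                                velocity_tolerance=0.1):
    """Keep only the (ref, est) index pairs whose velocities agree.

    Hypothetical helper for illustration; not part of mir_eval.
    """
    if len(matching) == 0:
        return []
    ref_velocities = np.asarray(ref_velocities, dtype=float)
    est_velocities = np.asarray(est_velocities, dtype=float)

    # Step 1: rescale reference velocities to [0, 1].
    # Assumption: use a range of at least 1 to avoid dividing by zero.
    v_min, v_max = ref_velocities.min(), ref_velocities.max()
    ref_scaled = (ref_velocities - v_min) / max(1.0, v_max - v_min)

    # Gather the velocities of the matched notes only.
    pairs = np.asarray(matching)
    ref_matched = ref_scaled[pairs[:, 0]]
    est_matched = est_velocities[pairs[:, 1]]

    # Step 2: least-squares fit of ref ~= slope * est + intercept.
    design = np.column_stack([est_matched, np.ones_like(est_matched)])
    slope, intercept = np.linalg.lstsq(design, ref_matched, rcond=None)[0]

    # Step 3: rescale the estimated velocities with the fitted parameters.
    est_rescaled = slope * est_matched + intercept

    # Step 4: keep pairs whose velocity error is within the tolerance.
    # Assumption: a strict comparison is used here.
    keep = np.abs(est_rescaled - ref_matched) < velocity_tolerance
    return [tuple(int(k) for k in pair) for pair in pairs[keep]]
```

Because the scale and offset are fitted globally, a transcription whose velocities are a consistent linear function of the reference velocities is not penalized; only notes that deviate from that global trend are rejected.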

mir_eval.transcription_velocity.match_notes() is used to define a new variant mir_eval.transcription_velocity.precision_recall_f1_overlap() which considers velocity.

Conventions

This submodule follows the conventions of mir_eval.transcription and additionally requires velocities to be provided as MIDI velocities in the range [0, 127].
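As a small illustration of these conventions, the arrays for three made-up reference notes could be built as shown here. The note values are invented for the example, and the MIDI-note-to-Hertz conversion is the standard equal-temperament formula (A4 = MIDI note 69 = 440 Hz), not something this submodule provides.

```python
import numpy as np

# Three made-up notes: one (onset, offset) row per note, in seconds.
ref_intervals = np.array([[0.0, 0.5],
                          [0.5, 1.0],
                          [1.0, 2.0]])

# Pitches are expected in Hertz, so MIDI note numbers are converted
# with the standard equal-temperament formula (A4 = 69 = 440 Hz).
midi_notes = np.array([60, 64, 69])
ref_pitches = 440.0 * 2.0 ** ((midi_notes - 69) / 12.0)

# Velocities are passed as raw MIDI velocities in [0, 127];
# they are not normalized by the caller.
ref_velocities = np.array([30, 80, 127])
```

The estimated arrays follow the same layout, with m rows instead of n.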

Metrics

References

mir_eval.transcription_velocity.validate(ref_intervals, ref_pitches, ref_velocities, est_intervals, est_pitches, est_velocities)

Check that the input annotations have valid time intervals, pitches, and velocities, and throw helpful errors if not.

Parameters:
ref_intervals : np.ndarray, shape=(n,2)

Array of reference notes time intervals (onset and offset times)

ref_pitches : np.ndarray, shape=(n,)

Array of reference pitch values in Hertz

ref_velocities : np.ndarray, shape=(n,)

Array of MIDI velocities (i.e. between 0 and 127) of reference notes

est_intervals : np.ndarray, shape=(m,2)

Array of estimated notes time intervals (onset and offset times)

est_pitches : np.ndarray, shape=(m,)

Array of estimated pitch values in Hertz

est_velocities : np.ndarray, shape=(m,)

Array of MIDI velocities (i.e. between 0 and 127) of estimated notes
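The exact checks performed by validate() are not spelled out here. As a rough illustration only, assuming the velocity checks amount to one velocity per pitch and the MIDI range stated in the conventions, a hypothetical stand-alone helper (check_velocities is not part of mir_eval) might look like this:

```python
import numpy as np


def check_velocities(pitches, velocities):
    """Hypothetical velocity check; raises ValueError on invalid input."""
    pitches = np.asarray(pitches)
    velocities = np.asarray(velocities)
    # Assumption: there must be exactly one velocity per note.
    if velocities.shape != pitches.shape:
        raise ValueError("pitches and velocities must have the same length")
    # Assumption: velocities must lie in the MIDI range [0, 127].
    if velocities.size and (velocities.min() < 0 or velocities.max() > 127):
        raise ValueError("velocities must be MIDI velocities in [0, 127]")
```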

mir_eval.transcription_velocity.match_notes(ref_intervals, ref_pitches, ref_velocities, est_intervals, est_pitches, est_velocities, onset_tolerance=0.05, pitch_tolerance=50.0, offset_ratio=0.2, offset_min_tolerance=0.05, strict=False, velocity_tolerance=0.1)

Match notes, taking note velocity into consideration.

This function first calls mir_eval.transcription.match_notes() to match notes according to the supplied intervals and pitches and the onset, offset, and pitch tolerances. Reference velocities are normalized to the range [0, 1], and the velocities of the matched notes are then used to estimate a slope and intercept which rescale the estimated velocities so that they are as close as possible (in the L2 sense) to their matched reference velocities. An estimated note is then only considered correct if its rescaled velocity is within velocity_tolerance of the velocity of its matched (according to pitch and timing) reference note.

Parameters:
ref_intervals : np.ndarray, shape=(n,2)

Array of reference notes time intervals (onset and offset times)

ref_pitches : np.ndarray, shape=(n,)

Array of reference pitch values in Hertz

ref_velocities : np.ndarray, shape=(n,)

Array of MIDI velocities (i.e. between 0 and 127) of reference notes

est_intervals : np.ndarray, shape=(m,2)

Array of estimated notes time intervals (onset and offset times)

est_pitches : np.ndarray, shape=(m,)

Array of estimated pitch values in Hertz

est_velocities : np.ndarray, shape=(m,)

Array of MIDI velocities (i.e. between 0 and 127) of estimated notes

onset_tolerance : float > 0

The tolerance for an estimated note’s onset deviating from the reference note’s onset, in seconds. Default is 0.05 (50 ms).

pitch_tolerance : float > 0

The tolerance for an estimated note’s pitch deviating from the reference note’s pitch, in cents. Default is 50.0 (50 cents).

offset_ratio : float > 0 or None

The ratio of the reference note’s duration used to define the offset_tolerance. Default is 0.2 (20%), meaning the offset_tolerance will equal the ref_duration * 0.2, or offset_min_tolerance (0.05 by default, i.e. 50 ms), whichever is greater. If offset_ratio is set to None, offsets are ignored in the matching.

offset_min_tolerance : float > 0

The minimum tolerance for offset matching. See offset_ratio description for an explanation of how the offset tolerance is determined. Note: this parameter only influences the results if offset_ratio is not None.

strict : bool

If strict=False (the default), threshold checks for onset, offset, and pitch matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).

velocity_tolerance : float > 0

Estimated notes are considered correct if, after rescaling and normalization to [0, 1], their velocities are within velocity_tolerance of those of their matched reference notes.

Returns:
matching : list of tuples

A list of matched reference and estimated notes. Each element is a tuple (i, j), meaning that reference note i matches estimated note j.

mir_eval.transcription_velocity.precision_recall_f1_overlap(ref_intervals, ref_pitches, ref_velocities, est_intervals, est_pitches, est_velocities, onset_tolerance=0.05, pitch_tolerance=50.0, offset_ratio=0.2, offset_min_tolerance=0.05, strict=False, velocity_tolerance=0.1, beta=1.0)

Compute the Precision, Recall and F-measure of correctly vs. incorrectly transcribed notes, and the Average Overlap Ratio for correctly transcribed notes (see mir_eval.transcription.average_overlap_ratio()). “Correctness” is determined based on note onset, velocity, pitch and (optionally) offset. An estimated note is considered correct if:

  1. Its onset is within onset_tolerance (default ±50 ms) of a reference note’s onset

  2. Its pitch (F0) is within ±pitch_tolerance (default one quarter tone, 50 cents) of the corresponding reference note’s pitch

  3. Its velocity, after normalizing reference velocities to the range [0, 1] and globally rescaling estimated velocities to minimize the L2 distance to matched reference notes, is within velocity_tolerance (default 0.1) of the corresponding reference note’s velocity

  4. If offset_ratio is None, note offsets are ignored in the comparison. Otherwise, on top of the above requirements, a correctly returned note is required to have an offset value within offset_ratio (default 20%) of the reference note’s duration around the reference note’s offset, or within offset_min_tolerance (default 50 ms), whichever is larger.
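The timing and pitch criteria (items 1, 2 and 4) can be illustrated for a single reference/estimate pair with the sketch below. The helper pair_is_match is hypothetical and only tests one pair in isolation; it does not perform the note-to-note matching across all notes that the library does, and it leaves out the velocity criterion (item 3), which depends on a global fit over all matched notes. The cents formula 1200 * log2(f_est / f_ref) is the standard definition.

```python
import math


def pair_is_match(ref_interval, ref_pitch, est_interval, est_pitch,
                  onset_tolerance=0.05, pitch_tolerance=50.0,
                  offset_ratio=0.2, offset_min_tolerance=0.05, strict=False):
    """Test onset, pitch and offset criteria for one pair (illustrative)."""
    # strict=True uses <, strict=False uses <= for every threshold check.
    if strict:
        within = lambda distance, tolerance: distance < tolerance
    else:
        within = lambda distance, tolerance: distance <= tolerance

    # Criterion 1: onset distance in seconds.
    onset_ok = within(abs(est_interval[0] - ref_interval[0]), onset_tolerance)

    # Criterion 2: pitch distance in cents (pitches are given in Hertz).
    cents = 1200.0 * abs(math.log2(est_pitch / ref_pitch))
    pitch_ok = within(cents, pitch_tolerance)

    # Criterion 4: offsets are ignored entirely when offset_ratio is None.
    if offset_ratio is None:
        offset_ok = True
    else:
        ref_duration = ref_interval[1] - ref_interval[0]
        offset_tolerance = max(offset_ratio * ref_duration,
                               offset_min_tolerance)
        offset_ok = within(abs(est_interval[1] - ref_interval[1]),
                           offset_tolerance)

    return onset_ok and pitch_ok and offset_ok
```

For a one-second reference note the offset tolerance is 0.2 s (20% of the duration), whereas for a 0.1 s note the 50 ms minimum applies, since 20% of the duration would only be 20 ms.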

Parameters:
ref_intervals : np.ndarray, shape=(n,2)

Array of reference notes time intervals (onset and offset times)

ref_pitches : np.ndarray, shape=(n,)

Array of reference pitch values in Hertz

ref_velocities : np.ndarray, shape=(n,)

Array of MIDI velocities (i.e. between 0 and 127) of reference notes

est_intervals : np.ndarray, shape=(m,2)

Array of estimated notes time intervals (onset and offset times)

est_pitches : np.ndarray, shape=(m,)

Array of estimated pitch values in Hertz

est_velocities : np.ndarray, shape=(m,)

Array of MIDI velocities (i.e. between 0 and 127) of estimated notes

onset_tolerance : float > 0

The tolerance for an estimated note’s onset deviating from the reference note’s onset, in seconds. Default is 0.05 (50 ms).

pitch_tolerance : float > 0

The tolerance for an estimated note’s pitch deviating from the reference note’s pitch, in cents. Default is 50.0 (50 cents).

offset_ratio : float > 0 or None

The ratio of the reference note’s duration used to define the offset_tolerance. Default is 0.2 (20%), meaning the offset_tolerance will equal the ref_duration * 0.2, or offset_min_tolerance (0.05 by default, i.e. 50 ms), whichever is greater. If offset_ratio is set to None, offsets are ignored in the evaluation.

offset_min_tolerance : float > 0

The minimum tolerance for offset matching. See offset_ratio description for an explanation of how the offset tolerance is determined. Note: this parameter only influences the results if offset_ratio is not None.

strict : bool

If strict=False (the default), threshold checks for onset, offset, and pitch matching are performed using <= (less than or equal). If strict=True, the threshold checks are performed using < (less than).

velocity_tolerance : float > 0

Estimated notes are considered correct if, after rescaling and normalization to [0, 1], their velocities are within velocity_tolerance of those of their matched reference notes.

beta : float > 0

Weighting factor for f-measure (default value = 1.0).

Returns:
precision : float

The computed precision score

recall : float

The computed recall score

f_measure : float

The computed F-measure score

avg_overlap_ratio : float

The computed Average Overlap Ratio score
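To make the role of beta concrete, the sketch below computes precision, recall and the weighted F-measure from note counts. It assumes the usual definitions (precision is the fraction of estimated notes that were matched, recall is the fraction of reference notes that were matched) and returns zeros for empty inputs; the helper name precision_recall_f is hypothetical and the Average Overlap Ratio is not covered.

```python
def precision_recall_f(n_matched, n_ref, n_est, beta=1.0):
    """Precision, recall and F-measure from note counts (illustrative)."""
    # Assumption: scores are defined as 0 when either side has no notes.
    if n_ref == 0 or n_est == 0:
        return 0.0, 0.0, 0.0
    precision = n_matched / n_est
    recall = n_matched / n_ref
    if precision == 0.0 and recall == 0.0:
        return 0.0, 0.0, 0.0
    # beta > 1 weights recall more heavily, beta < 1 favours precision.
    f_measure = ((1.0 + beta ** 2) * precision * recall
                 / (beta ** 2 * precision + recall))
    return precision, recall, f_measure
```

With 3 matched notes out of 4 reference and 6 estimated notes, this gives a precision of 0.5 and a recall of 0.75.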

mir_eval.transcription_velocity.evaluate(ref_intervals, ref_pitches, ref_velocities, est_intervals, est_pitches, est_velocities, **kwargs)

Compute all metrics for the given reference and estimated annotations.

Parameters:
ref_intervals : np.ndarray, shape=(n,2)

Array of reference notes time intervals (onset and offset times)

ref_pitches : np.ndarray, shape=(n,)

Array of reference pitch values in Hertz

ref_velocities : np.ndarray, shape=(n,)

Array of MIDI velocities (i.e. between 0 and 127) of reference notes

est_intervals : np.ndarray, shape=(m,2)

Array of estimated notes time intervals (onset and offset times)

est_pitches : np.ndarray, shape=(m,)

Array of estimated pitch values in Hertz

est_velocities : np.ndarray, shape=(m,)

Array of MIDI velocities (i.e. between 0 and 127) of estimated notes

**kwargs

Additional keyword arguments which will be passed to the appropriate metric or preprocessing functions.

Returns:
scores : dict

Dictionary of scores, where the key is the metric name (str) and the value is the (float) score achieved.