ARCH is a framework for benchmarking audio representations. It gives researchers a unified way to compare their audio representations and gives the community a common benchmark for evaluating models. The project is currently in its first release. Details about the datasets and the models are available in the GitHub repository.



Results on the ARCH benchmark - Version 1.0

Column groups: Audio Events (ESC-50, US8K, FSD50K, VIVAE); Music (FMA, MTT, IRMAS, MS-DB); Speech (RAVDESS, A-MNIST, SLURP, EMOVO).

| Model | Size | ESC-50 | US8K | FSD50K | VIVAE | FMA | MTT | IRMAS | MS-DB | RAVDESS | A-MNIST | SLURP | EMOVO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| facebook/wav2vec2-base | B | 45.73 | 55.48 | 19.39 | 31.47 | 50.54 | 37.56 | 35.14 | 66.06 | 55.32 | 86.38 | 14.37 | 31.80 |
| microsoft/wavlm-base | B | 49.88 | 61.84 | 17.63 | 36.31 | 48.71 | 34.93 | 32.62 | 54.18 | 67.94 | 99.50 | 30.98 | 43.08 |
| microsoft/wavlm-base-plus | B | 58.73 | 64.07 | 21.57 | 36.17 | 56.17 | 38.24 | 35.76 | 57.51 | 52.20 | 99.63 | 28.06 | 36.73 |
| facebook/hubert-base-ls960 | B | 58.90 | 67.28 | 24.53 | 40.48 | 54.63 | 38.78 | 36.65 | 58.46 | 65.28 | 99.58 | 33.75 | 40.48 |
| facebook/data2vec-audio-base | B | 23.63 | 45.63 | 10.06 | 30.19 | 40.58 | 27.60 | 25.87 | 50.74 | 48.03 | 99.06 | 43.57 | 27.27 |
| ALM/wav2vec2-base-audioset | B | 52.61 | 70.48 | 21.29 | 31.26 | 59.50 | 37.92 | 35.85 | 64.61 | 45.94 | 88.09 | 11.00 | 30.83 |
| ALM/hubert-base-audioset | B | 68.80 | 79.09 | 31.05 | 40.06 | 65.87 | 43.44 | 47.67 | 67.81 | 63.54 | 98.84 | 20.53 | 33.39 |
| facebook/wav2vec2-large-robust | L | 13.13 | 42.70 | 5.80 | 22.01 | 41.71 | 20.95 | 19.91 | 50.23 | 11.57 | 45.74 | 7.33 | 19.27 |
| facebook/wav2vec2-xls-r-300m | L | 51.28 | 69.96 | 23.71 | 36.28 | 56.96 | 38.28 | 38.42 | 66.71 | 31.48 | 98.88 | 12.74 | 20.35 |
| microsoft/wavlm-large | L | 67.20 | 70.92 | 32.21 | 42.51 | 61.13 | 41.29 | 42.53 | 68.00 | 71.76 | 99.75 | 42.34 | 45.29 |
| facebook/hubert-large-ll60k | L | 63.98 | 70.00 | 29.51 | 40.95 | 54.79 | 38.36 | 36.81 | 64.08 | 72.57 | 99.95 | 45.26 | 43.76 |
| facebook/data2vec-audio-large | L | 25.35 | 49.15 | 10.82 | 30.57 | 43.46 | 28.52 | 27.08 | 44.20 | 45.14 | 99.15 | 28.60 | 23.07 |
| ALM/wav2vec2-large-audioset | L | 74.39 | 79.00 | 37.58 | 39.65 | 66.58 | 44.51 | 49.87 | 76.90 | 59.49 | 99.42 | 17.74 | 38.20 |
| ALM/hubert-large-audioset | L | 71.52 | 75.63 | 37.41 | 44.28 | 67.54 | 43.35 | 50.46 | 77.82 | 73.26 | 99.59 | 20.46 | 38.61 |
| facebook/wav2vec2-xls-r-1b | XL | 66.95 | 75.90 | 31.61 | 40.41 | 62.79 | 41.99 | 43.57 | 69.79 | 55.44 | 99.86 | 25.14 | 34.58 |
| facebook/hubert-xlarge-ll60k | XL | 63.40 | 69.66 | 29.32 | 42.72 | 56.25 | 37.76 | 37.30 | 64.71 | 75.69 | 99.95 | 47.81 | 47.17 |

The best-performing model per size is highlighted in bold. The best-performing model overall is underlined.
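
The highlighting rule above can be reproduced from the raw scores. The following is a minimal Python sketch, not part of ARCH itself: the helper names `best_per_size` and `best_overall` are hypothetical, and `ROWS` copies only a handful of rows and two columns (ESC-50 and SLURP) from the Version 1.0 table. The subset was chosen to include every per-size leader for those two columns, so the winners it reports should match the full table. Ties are resolved in favour of the first row encountered, which is an assumption rather than a documented rule.

```python
# Subset of the Version 1.0 results table (values copied verbatim).
ROWS = [
    {"model": "facebook/hubert-base-ls960", "size": "B", "ESC-50": 58.90, "SLURP": 33.75},
    {"model": "facebook/data2vec-audio-base", "size": "B", "ESC-50": 23.63, "SLURP": 43.57},
    {"model": "ALM/hubert-base-audioset", "size": "B", "ESC-50": 68.80, "SLURP": 20.53},
    {"model": "microsoft/wavlm-large", "size": "L", "ESC-50": 67.20, "SLURP": 42.34},
    {"model": "facebook/hubert-large-ll60k", "size": "L", "ESC-50": 63.98, "SLURP": 45.26},
    {"model": "ALM/wav2vec2-large-audioset", "size": "L", "ESC-50": 74.39, "SLURP": 17.74},
    {"model": "facebook/wav2vec2-xls-r-1b", "size": "XL", "ESC-50": 66.95, "SLURP": 25.14},
    {"model": "facebook/hubert-xlarge-ll60k", "size": "XL", "ESC-50": 63.40, "SLURP": 47.81},
]


def best_per_size(rows, dataset):
    """Return {size: (model, score)} with the top score per size (the bold entries)."""
    best = {}
    for row in rows:
        size, score = row["size"], row[dataset]
        # Strict '>' keeps the first row seen when two scores tie.
        if size not in best or score > best[size][1]:
            best[size] = (row["model"], score)
    return best


def best_overall(rows, dataset):
    """Return (model, score) for the top score across all sizes (the underlined entry)."""
    top = max(rows, key=lambda row: row[dataset])
    return top["model"], top[dataset]


if __name__ == "__main__":
    for dataset in ("ESC-50", "SLURP"):
        print(dataset)
        for size, (model, score) in best_per_size(ROWS, dataset).items():
            print(f"  best {size}: {model} ({score:.2f})")
        model, score = best_overall(ROWS, dataset)
        print(f"  best overall: {model} ({score:.2f})")
```

Extending `ROWS` with the remaining models and dataset columns would let the same two helpers recover the highlighting for the whole table.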