Wearable motion understanding

AnyMo

Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild

Baiyu Chen [1,2], Zechen Li [1], Wilson Wongso [1,2], Lihuan Li [1,2], Xiachong Lin [1], Hao Xue [1,2,3,4], Benjamin Tag [1], Flora Salim [1,2]

[1] The University of New South Wales

[2] ARC Centre of Excellence for Automated Decision Making + Society

[3] The Hong Kong University of Science and Technology (Guangzhou)

[4] The Hong Kong University of Science and Technology

AnyMo learns wearable IMU motion representations that transfer across sensing setups and datasets, while connecting sparse wearable signals to open-vocabulary recognition, retrieval, and motion captioning.

+11.7% Accuracy: average zero-shot HAR gain across 14 unseen datasets
+28.6% Text-to-IMU MRR: stronger bidirectional motion-language retrieval
+18.8% BERT-F1: zero-shot wearable IMU captioning improvement

Figure: AnyMo method family overview and performance comparison.

Overview

Wearable setup variation is structured, not arbitrary.

The signal measured by a wearable IMU is produced by the interaction of body motion, body-surface geometry, local sensor orientation, and device response. This structure explains why a wrist watch, glasses, or a phone in a pocket can observe the same activity through very different inertial patterns.

AnyMo uses this structure as an inductive bias. It simulates dense plausible IMU candidates over body-surface placements, pre-trains a spatio-temporal graph encoder from paired placement views and masked sparse observations, then converts setup-stable motion latents into compact full-body IMU tokens for motion-language modeling.

Physics-grounded surface simulation

Local surface frames from body normals and tangent planes define plausible wearable placements, orientations, and noisy IMU signals on the body mesh.
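As a rough illustration of the idea above, the sketch below builds a local frame from a surface normal and expresses a simulated accelerometer reading in that frame. It is not the authors' simulator: the function names, the z-up world frame, the finite-difference acceleration, and the Gaussian noise model are all assumptions made for this example, and the gyroscope channel is omitted.

```python
import numpy as np

# Assumed world frame: z points up, units are m/s^2.
GRAVITY = np.array([0.0, 0.0, -9.81])

def surface_frame(normal):
    """Return a 3x3 rotation whose columns are (tangent, bitangent, normal)."""
    n = normal / np.linalg.norm(normal)
    reference = np.array([0.0, 0.0, 1.0])
    # Switch reference axis when the normal is almost parallel to it.
    if abs(np.dot(n, reference)) > 0.99:
        reference = np.array([1.0, 0.0, 0.0])
    t = np.cross(reference, n)
    t = t / np.linalg.norm(t)
    b = np.cross(n, t)  # completes a right-handed frame: t x b = n
    return np.stack([t, b, n], axis=1)

def simulate_accelerometer(positions, frames, dt, noise_std=0.0, rng=None):
    """Simulate accelerometer readings for one placement.

    positions: (T, 3) world-frame positions of the surface point.
    frames:    (T, 3, 3) sensor-to-world rotations, e.g. from surface_frame.
    Returns a (T - 2, 3) array of specific force in the sensor frame.
    """
    # Second-order central difference for linear acceleration.
    accel_world = (positions[2:] - 2.0 * positions[1:-1] + positions[:-2]) / dt**2
    specific_force = accel_world - GRAVITY
    # Rotate world vectors into the sensor frame: R^T v for each time step.
    accel_local = np.einsum("tij,ti->tj", frames[1:-1], specific_force)
    if noise_std > 0.0:
        if rng is None:
            rng = np.random.default_rng(0)
        accel_local = accel_local + rng.normal(0.0, noise_std, accel_local.shape)
    return accel_local
```

For a stationary point whose normal points straight up, the sketch returns roughly 9.81 m/s^2 on the normal axis and zero on the two tangent axes, which matches what a resting accelerometer reports.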

Setup-agnostic representation learning

Paired placement views and masked sparse observations train a graph encoder to recover full-body motion structure from setup-specific IMU windows.
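The two training signals named above can be sketched in a few lines. The page does not state the exact objectives, so this example assumes a symmetric InfoNCE loss for the paired placement views and simple zero-masking of hidden placements; `mask_placements`, `info_nce`, and the temperature value are hypothetical names and settings, not the paper's.

```python
import numpy as np

def mask_placements(window, keep, rng):
    """Hide all but `keep` randomly chosen placements.

    window: (N, T, C) IMU signals at N body placements.
    Returns the masked window and a boolean visibility vector of length N.
    """
    n = window.shape[0]
    visible = np.zeros(n, dtype=bool)
    visible[rng.choice(n, size=keep, replace=False)] = True
    masked = np.where(visible[:, None, None], window, 0.0)
    return masked, visible

def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE: row i of z_a and row i of z_b form a positive pair."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

In this sketch, `z_a` and `z_b` would be encoder outputs for the same motion window seen from two different simulated placements, so a low loss means the encoder maps both views to nearby latents.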

Full-body tokenization and alignment

The learned motion representation is quantized into compact full-body IMU tokens, then aligned with language for recognition, retrieval, and captioning.
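Quantization of this kind is often done with a nearest-neighbor codebook lookup, as in VQ-VAE-style tokenizers. The page does not say which quantizer AnyMo uses, so the following is only a minimal sketch under that assumption; `quantize` and the toy codebook are illustrative.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry.

    latents:  (T, D) continuous motion latents.
    codebook: (K, D) learned code vectors.
    Returns token ids of shape (T,) and the quantized latents of shape (T, D).
    """
    # Squared Euclidean distance between every latent and every code.
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    tokens = d2.argmin(axis=1)
    return tokens, codebook[tokens]

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
latents = np.array([[0.1, -0.1], [3.5, 3.9], [0.9, 1.2]])
tokens, quantized = quantize(latents, codebook)
print(tokens.tolist())  # [0, 2, 1]
```

The resulting integer ids are what a language model can consume as a discrete motion vocabulary alongside text tokens.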

Interactive geometry

Explore body-surface tangent planes.

The visualization shows template mesh points, segment placements, and the local surface geometry used to construct plausible wearable sensor setups. The spike vectors are surface normals, while the translucent squares are local tangent planes.

Normals show the outward direction of each candidate surface placement.
Tangent planes show the local surface frame used to orient simulated wearable IMUs.

Method

From surface-aware simulation to language.

Physics-Grounded Geometry-Aware Simulation

Geometry-Aware Setup-Agnostic Pre-Training and Tokenization

Tokenization and Pre-Training Detail

Contrastive Instruction Tuning and Inference

Quantitative Results

AnyMo improves recognition, retrieval, and captioning.

The tables summarize the main benchmark results from the paper.

Zero-Shot HAR Comparison

Recognition performance across easy, medium, and hard datasets.

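| Method | Metric | Opportunity | UCI-HAR | w-HAR | RealWorld | TNDA-HAR | EgoExo4D | OpenPack | PAMAP2 | USC-HAD | WISDM | DSADS | UTD-MHAD | Ego4D | MMEA | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of Classes | | 4 | 6 | 7 | 8 | 8 | 8 | 10 | 12 | 12 | 18 | 19 | 27 | 31 | 32 | |
| Level | | Easy | Easy | Easy | Easy | Easy | Easy | Medium | Medium | Medium | Medium | Medium | Hard | Hard | Hard | Average |
| ImageBind | Acc | 59.3 | 14.0 | 8.2 | 17.4 | 22.6 | 8.2 | 11.6 | 12.8 | 11.2 | 6.9 | 6.2 | 2.3 | 2.8 | 4.1 | 13.4 |
| ImageBind | F1 | 39.6 | 6.7 | 6.1 | 10.4 | 18.6 | 8.5 | 8.3 | 7.7 | 6.9 | 4.4 | 3.6 | 0.8 | 0.8 | 1.3 | 8.8 |
| ImageBind | R@2 | 83.5 | 20.2 | 52.5 | 27.4 | 35.3 | 12.5 | 20.7 | 15.5 | 23.5 | 13.5 | 8.2 | 4.7 | 6.4 | 7.6 | 23.7 |
| IMU2CLIP | Acc | 47.3 | 30.1 | 36.1 | 27.3 | 26.8 | 10.9 | 9.5 | 14.0 | 14.8 | 12.4 | 11.5 | 3.7 | 1.8 | 5.3 | 18.0 |
| IMU2CLIP | F1 | 37.6 | 27.3 | 20.9 | 20.5 | 22.3 | 5.9 | 4.5 | 11.3 | 11.6 | 8.3 | 8.3 | 0.4 | 0.4 | 3.1 | 13.0 |
| IMU2CLIP | R@2 | 69.0 | 55.1 | 60.7 | 43.9 | 46.2 | 18.5 | 19.5 | 26.6 | 29.1 | 20.4 | 17.6 | 7.9 | 3.1 | 10.8 | 30.6 |
| IMUGPT | Acc | 10.1 | 1.1 | 67.2 | 16.9 | 14.3 | 12.5 | 11.4 | 8.9 | 6.0 | 8.3 | 7.5 | 3.7 | 4.6 | 3.4 | 12.6 |
| IMUGPT | F1 | 10.4 | 0.3 | 38.8 | 4.0 | 6.1 | 7.6 | 9.1 | 1.5 | 6.9 | 6.6 | 2.0 | 0.3 | 1.9 | 1.8 | 7.0 |
| IMUGPT | R@2 | 33.7 | 18.2 | 67.2 | 33.7 | 28.5 | 27.4 | 22.9 | 19.3 | 31.8 | 13.9 | 14.6 | 8.8 | 9.3 | 6.7 | 24.0 |
| HARGPT | Acc | 28.8 | 15.0 | 4.9 | 12.7 | 13.7 | 16.1 | 10.2 | 11.1 | 9.5 | 5.5 | 5.8 | 3.3 | 3.6 | 2.4 | 10.2 |
| HARGPT | F1 | 17.3 | 12.7 | 3.1 | 5.3 | 5.4 | 12.0 | 5.5 | 2.1 | 3.6 | 3.5 | 3.4 | 1.5 | 1.1 | 0.9 | 5.5 |
| HARGPT | R@2 | 47.0 | 31.4 | 11.5 | 31.7 | 25.2 | 32.0 | 22.7 | 23.0 | 17.6 | 11.8 | 12.1 | 9.3 | 7.8 | 6.3 | 20.7 |
| UniMTS | Acc | 45.9 | 35.2 | 59.0 | 43.6 | 59.1 | 23.1 | 11.5 | 47.2 | 30.5 | 27.8 | 31.5 | 22.8 | 3.7 | 6.1 | 31.9 |
| UniMTS | F1 | 42.2 | 22.0 | 42.9 | 36.7 | 53.7 | 18.4 | 7.5 | 43.6 | 27.8 | 25.5 | 23.7 | 18.5 | 4.3 | 2.8 | 26.4 |
| UniMTS | R@2 | 80.0 | 53.1 | 60.7 | 64.0 | 77.5 | 47.0 | 21.9 | 63.2 | 45.4 | 47.1 | 46.0 | 32.6 | 6.9 | 10.7 | 46.9 |
| NormWear | Acc | 26.0 | 3.7 | 3.3 | 16.8 | 12.2 | 10.5 | 9.8 | 7.9 | 8.7 | 4.4 | 0.7 | 3.7 | 2.0 | 2.7 | 8.0 |
| NormWear | F1 | 10.3 | 1.6 | 1.3 | 3.8 | 2.8 | 3.1 | 3.1 | 1.6 | 1.4 | 0.9 | 0.1 | 0.3 | 0.2 | 0.3 | 2.2 |
| NormWear | R@2 | 66.1 | 29.6 | 3.3 | 20.4 | 19.1 | 20.2 | 16.5 | 10.5 | 12.2 | 12.4 | 2.6 | 7.4 | 5.8 | 6.0 | 16.6 |
| Gemma 4 26B Text | Acc | 35.9 | 29.4 | 29.5 | 30.8 | 28.2 | 18.0 | 9.6 | 13.3 | 29.7 | 11.6 | 10.7 | 4.7 | 2.4 | 5.6 | 18.5 |
| Gemma 4 26B Text | F1 | 23.5 | 19.9 | 12.7 | 18.8 | 19.3 | 11.8 | 7.6 | 7.4 | 12.4 | 8.0 | 7.6 | 2.0 | 1.2 | 1.9 | 11.0 |
| Gemma 4 26B Text | R@2 | 68.4 | 60.5 | 31.1 | 58.9 | 53.1 | 40.7 | 19.4 | 31.1 | 42.7 | 20.8 | 22.0 | 10.2 | 6.7 | 9.4 | 33.9 |
| Gemma 4 26B Plot | Acc | 39.8 | 29.8 | 27.9 | 31.2 | 27.8 | 24.8 | 7.7 | 19.4 | 32.9 | 9.4 | 11.5 | 5.6 | 3.5 | 4.4 | 19.7 |
| Gemma 4 26B Plot | F1 | 33.0 | 21.6 | 10.6 | 23.6 | 15.1 | 11.9 | 4.5 | 10.8 | 14.0 | 6.1 | 7.5 | 1.2 | 1.3 | 2.0 | 11.7 |
| Gemma 4 26B Plot | R@2 | 74.4 | 59.6 | 32.8 | 58.1 | 50.1 | 24.8 | 18.2 | 35.4 | 46.8 | 16.9 | 21.5 | 10.7 | 7.9 | 7.9 | 33.2 |
| AnyMo | Acc | 59.4 | 56.5 | 57.4 | 48.4 | 59.4 | 30.2 | 13.1 | 52.6 | 27.7 | 25.4 | 36.3 | 16.3 | 8.6 | 8.1 | 35.7 (+11.7%) |
| AnyMo | F1 | 58.8 | 51.6 | 42.2 | 37.2 | 53.1 | 24.1 | 11.6 | 41.5 | 22.6 | 18.6 | 29.5 | 11.3 | 6.3 | 4.0 | 29.5 (+11.6%) |
| AnyMo | R@2 | 83.5 | 89.5 | 98.4 | 77.6 | 87.9 | 51.6 | 28.0 | 78.2 | 64.0 | 41.1 | 53.0 | 24.2 | 13.7 | 13.9 | 57.5 (+22.6%) |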
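For readers unfamiliar with how zero-shot scores like these are typically produced, the sketch below ranks label-text embeddings by cosine similarity to each IMU embedding. It assumes that Acc is top-1 accuracy and that R@2 is the fraction of windows whose true label ranks in the top two; the page does not define either metric, and the function names and toy embeddings are illustrative.

```python
import numpy as np

def zero_shot_scores(imu_emb, label_emb):
    """Cosine similarity between each IMU window and each label embedding."""
    imu = imu_emb / np.linalg.norm(imu_emb, axis=1, keepdims=True)
    lab = label_emb / np.linalg.norm(label_emb, axis=1, keepdims=True)
    return imu @ lab.T

def recall_at_k(scores, targets, k):
    """Fraction of rows whose target class is among the k highest scores."""
    top_k = np.argsort(-scores, axis=1)[:, :k]
    return float(np.mean((top_k == targets[:, None]).any(axis=1)))

# Toy example: 3 windows, 3 candidate labels.
label_emb = np.eye(3)
imu_emb = np.array([[0.9, 0.1, 0.0], [0.2, 0.5, 0.3], [0.4, 0.5, 0.1]])
targets = np.array([0, 2, 0])
scores = zero_shot_scores(imu_emb, label_emb)
acc = recall_at_k(scores, targets, k=1)   # top-1 accuracy
r_at_2 = recall_at_k(scores, targets, k=2)
```

In the toy example only the first window is classified correctly at top-1, while all three have the true label within the top two, which is why R@2 is always at least as high as Acc in the table.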

Cross-Modal Retrieval

Unseen and zero-shot retrieval on Nymeria held-out and EgoExo4D.

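Column groups: Nymeria Held-out and EgoExo4D Zero-shot, each evaluated on 100 Samples and on All Samples.

| Direction | Method | Nymeria/100 R@1 | Nymeria/100 R@5 | Nymeria/100 R@10 | Nymeria/100 MRR | Nymeria/All R@1 | Nymeria/All R@5 | Nymeria/All R@10 | Nymeria/All MRR | EgoExo4D/100 R@1 | EgoExo4D/100 R@5 | EgoExo4D/100 R@10 | EgoExo4D/100 MRR | EgoExo4D/All R@1 | EgoExo4D/All R@5 | EgoExo4D/All R@10 | EgoExo4D/All MRR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IMU -> Text | ImageBind | 0.0 | 6.0 | 14.0 | 5.0 | 0.1 | 0.2 | 0.3 | 0.3 | 1.0 | 5.0 | 8.0 | 4.6 | 0.1 | 0.2 | 0.3 | 0.3 |
| IMU -> Text | IMU2CLIP | 1.0 | 6.0 | 12.0 | 5.5 | 0.0 | 0.1 | 0.3 | 0.3 | 2.0 | 10.0 | 23.0 | 8.2 | 0.0 | 0.3 | 0.5 | 0.4 |
| IMU -> Text | UniMTS | 4.0 | 12.0 | 23.0 | 10.0 | 0.2 | 0.9 | 1.6 | 0.9 | 1.0 | 9.0 | 16.0 | 6.3 | 0.1 | 0.6 | 1.3 | 0.7 |
| IMU -> Text | GPT-5.4 Mini | 1.0 | 7.0 | 11.0 | 4.4 | -- | -- | -- | -- | 1.0 | 6.0 | 10.0 | 3.7 | -- | -- | -- | -- |
| IMU -> Text | Gemma 4 26B | 2.0 | 9.0 | 16.0 | 6.1 | -- | -- | -- | -- | 2.0 | 4.0 | 12.0 | 4.6 | -- | -- | -- | -- |
| IMU -> Text | AnyMo | 28.0 | 63.0 | 77.0 | 44.6 | 2.3 | 9.5 | 15.4 | 7.0 | 2.0 | 9.0 | 27.0 | 9.5 | 0.2 | 0.7 | 1.4 | 0.8 |
| Text -> IMU | ImageBind | 1.0 | 8.0 | 14.0 | 6.7 | 0.1 | 0.2 | 0.3 | 0.3 | 2.0 | 3.0 | 7.0 | 5.1 | 0.0 | 0.0 | 0.2 | 0.2 |
| Text -> IMU | IMU2CLIP | 0.0 | 6.0 | 14.0 | 5.0 | 0.1 | 0.2 | 0.3 | 0.3 | 1.0 | 9.0 | 17.0 | 7.7 | 0.1 | 0.3 | 0.5 | 0.4 |
| Text -> IMU | UniMTS | 1.0 | 6.0 | 12.0 | 5.5 | 0.1 | 0.2 | 0.4 | 0.3 | 1.0 | 5.0 | 10.0 | 5.3 | 0.0 | 0.1 | 0.3 | 0.2 |
| Text -> IMU | GPT-5.4 Mini | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| Text -> IMU | Gemma 4 26B | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| Text -> IMU | AnyMo | 33.0 | 60.0 | 79.0 | 46.7 | 3.0 | 9.9 | 16.1 | 7.5 | 3.0 | 10.0 | 23.0 | 9.9 | 0.0 | 0.3 | 0.6 | 0.4 |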
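R@k and MRR in a retrieval table like this are usually computed from a query-by-candidate similarity matrix. The sketch below assumes one correct candidate per query, placed on the diagonal, which is a common setup but not confirmed by the page; `retrieval_metrics` is an illustrative name, and the values it returns are fractions rather than the percentages shown above.

```python
import numpy as np

def retrieval_metrics(similarity, ks=(1, 5, 10)):
    """Compute R@k and MRR when candidate i is the correct match for query i.

    similarity: (Q, Q) matrix, rows are queries and columns are candidates.
    """
    order = np.argsort(-similarity, axis=1)            # best candidate first
    correct = np.arange(similarity.shape[0])[:, None]
    ranks = np.argmax(order == correct, axis=1) + 1    # 1-based rank of the match
    metrics = {f"R@{k}": float(np.mean(ranks <= k)) for k in ks}
    metrics["MRR"] = float(np.mean(1.0 / ranks))
    return metrics

# Toy example: the correct match is ranked 1st, 2nd, and 3rd respectively.
sim = np.array([[0.9, 0.1, 0.0],
                [0.8, 0.5, 0.1],
                [0.3, 0.2, 0.1]])
imu_to_text = retrieval_metrics(sim, ks=(1, 2))
text_to_imu = retrieval_metrics(sim.T, ks=(1, 2))  # reverse direction
```

With IMU embeddings as rows and text embeddings as columns, the matrix itself scores IMU -> Text and its transpose scores Text -> IMU. The pool size matters: a match is much easier to rank first among 100 candidates than among all samples, which is consistent with the gap between the two column groups.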

Wearable IMU Motion Captioning

Unseen and zero-shot caption generation on Nymeria and EgoExo4D.

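Nymeria columns are held-out; EgoExo4D columns are zero-shot.

| Method | Nymeria BLEU-1 | Nymeria BLEU-4 | Nymeria ROUGE-L | Nymeria METEOR | Nymeria BERT-F1 | EgoExo4D BLEU-1 | EgoExo4D BLEU-4 | EgoExo4D ROUGE-L | EgoExo4D METEOR | EgoExo4D BERT-F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-5.4 Mini | 19.2 | 0.3 | 15.7 | 25.0 | 57.3 | 12.6 | 0.0 | 15.5 | 23.8 | 56.5 |
| Gemma 4 26B | 16.2 | 0.0 | 13.6 | 21.5 | 56.5 | 3.5 | 0.0 | 4.6 | 6.4 | 55.1 |
| AnyMo | 25.0 | 6.5 | 31.1 | 33.5 | 69.7 | 20.7 | 0.4 | 19.7 | 30.3 | 67.1 |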
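To make one of these caption metrics concrete, here is a small pure-Python version of ROUGE-L as the F1 of longest-common-subsequence precision and recall. It is a simplified sketch with whitespace tokenization and a single reference; the page does not say which toolkit or ROUGE variant produced the reported numbers, so results from this function will not necessarily match them.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(hypothesis, reference):
    """ROUGE-L as the harmonic mean of LCS precision and recall."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not hyp or not ref:
        return 0.0
    lcs = lcs_length(hyp, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(hyp)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

score = rouge_l_f1("the person slowly walks forward",
                   "the person walks forward slowly")
```

In this example four of the five words appear in the same order in both captions, so precision and recall are both 0.8 and the score is 0.8.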

Qualitative Results

Visual evidence of geometry-grounded transfer.

These qualitative views show how AnyMo aligns synthetic and real wearable motion in the learned space, and how the model turns sparse IMU signals into full-body motion-language examples.

Real-synthetic alignment

Figure: UMAP visualization of AnyMo real and synthetic alignment.

Motion captioning examples

Figure: Qualitative wearable IMU captioning examples.

Resources

AnyMo-180, AnyMo Bench and synthetic data.

We curate AnyMo-180, a fine-grained activity-label vocabulary for Nymeria motion windows, and build dense body-surface IMU placements for geometry-aware simulation. Together with AnyMo Bench, these form one of the largest fine-grained IMU-based HAR training corpora and benchmarks for unseen-subject and cross-device evaluation.

180 AnyMo-180 activity classes
158,138 labeled motion windows
2,374 body-surface IMU positions

AnyMo Bench

A challenging fine-grained in-the-wild HAR benchmark.

AnyMo Bench evaluates recognition under two forms of generalization: fine-grained daily activities on unseen subjects, and cross-device transfer between co-located IMU units mounted at the head, left wrist, and right wrist.
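An unseen-subject evaluation requires that no subject contributes windows to both the training and the test set. The sketch below shows one simple way to build such a split; it is not the benchmark's official protocol, and `split_by_subject`, the test fraction, and the seed are assumptions for illustration.

```python
import random

def split_by_subject(subject_ids, test_fraction=0.2, seed=0):
    """Split window indices so that train and test subjects are disjoint.

    subject_ids: one subject identifier per motion window.
    Returns (train_indices, test_indices).
    """
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx

ids = ["a", "a", "b", "b", "c", "c", "d", "d", "e", "e"]
train_idx, test_idx = split_by_subject(ids)
```

A cross-device setting can be layered on top of this by additionally training on windows from one mounting location and testing on another, for example wrist to head.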

211.6 h of real in-the-wild IMU
196 subjects
4 evaluation settings

Baseline Results on AnyMo Bench

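| Setting | Model | Acc@1 | Acc@5 | Macro-F1 |
|---|---|---|---|---|
| Fine150 / Unseen Subject | DeepConvLSTM | 35.3 | 63.0 | 17.2 |
| Fine150 / Unseen Subject | MantisV2 | 38.5 | 65.2 | 22.8 |
| Fine150 / Unseen Subject | COMODO | 37.8 | 65.2 | 16.0 |
| Fine150 / Unseen Subject + Cross Device | DeepConvLSTM | 1.9 | 9.5 | 0.4 |
| Fine150 / Unseen Subject + Cross Device | MantisV2 | 14.4 | 39.7 | 8.6 |
| Fine150 / Unseen Subject + Cross Device | COMODO | 24.0 | 50.6 | 8.0 |
| Core50 / Unseen Subject | DeepConvLSTM | 43.2 | 75.4 | 34.5 |
| Core50 / Unseen Subject | MantisV2 | 45.8 | 76.8 | 41.3 |
| Core50 / Unseen Subject | COMODO | 46.2 | 78.8 | 37.3 |
| Core50 / Unseen Subject + Cross Device | DeepConvLSTM | 1.8 | 12.1 | 0.6 |
| Core50 / Unseen Subject + Cross Device | MantisV2 | 16.6 | 48.7 | 18.4 |
| Core50 / Unseen Subject + Cross Device | COMODO | 32.6 | 67.8 | 23.3 |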
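Macro-F1 sits well below Acc@1 in every row, which is typical when classes are imbalanced. The sketch below shows why, using a plain NumPy implementation that averages per-class F1 over all classes and counts a class with no true or predicted samples as zero. That convention is an assumption; the page does not state how the benchmark handles empty classes.

```python
import numpy as np

def macro_f1(pred, targets, num_classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (targets == c))
        fp = np.sum((pred == c) & (targets != c))
        fn = np.sum((pred != c) & (targets == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

# A classifier that always predicts the majority class.
targets = np.array([0, 0, 0, 1])
pred = np.array([0, 0, 0, 0])
accuracy = float(np.mean(pred == targets))   # 0.75
f1 = macro_f1(pred, targets, num_classes=2)  # (6/7 + 0) / 2
```

Here accuracy is 0.75 but Macro-F1 is about 0.43, because the missed minority class contributes a zero that counts as much as the majority class. With 150 or 50 fine-grained activity classes, many rare classes can pull Macro-F1 down in the same way.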

Citation

@article{chen2026anymo,
  title={AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild},
  author={Chen, Baiyu and Li, Zechen and Wongso, Wilson and Li, Lihuan and Lin, Xiachong and Xue, Hao and Tag, Benjamin and Salim, Flora},
  journal={arXiv preprint arXiv:2605.22715},
  year={2026}
}