A Robust Technique for Motion-based Video Sequences Temporal Alignment

Cheng Lu and Mrinal Mandal

Department of Electrical and Computer Engineering, University of Alberta

Abstract: In this paper, we propose a novel technique for temporal alignment of video sequences with similar planar motions acquired using uncalibrated cameras. We model the motion-based video temporal alignment problem as an alignment problem between spatio-temporal discrete trajectory point sets. First, the trajectory of the object of interest is tracked throughout the videos. A probabilistic method is then developed to calculate the spatial correspondence, i.e., the homography, between the trajectory point sets. Next, the dynamic time warping (DTW) technique is applied to the spatial correspondence information to compute the temporal alignment of the videos. The experimental results show that the proposed technique provides superior performance, an improvement of approximately 42% to 74%, over existing techniques for videos with similar trajectory patterns.
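The last two steps of the abstract (spatial correspondence followed by DTW) can be illustrated with a minimal sketch. It assumes the homography H between the two views is already known; the probabilistic estimation described in the paper is not reproduced here. It also uses the classic DTW recurrence, which may differ from the variant used in the paper. The function names `project` and `dtw_align` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def project(H, pts):
    """Apply a 3x3 homography H to an (N, 2) array of image points."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coordinates
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # back to Cartesian coordinates

def dtw_align(traj_a, traj_b, H):
    """Return a DTW warping path as a list of (frame_in_a, frame_in_b) pairs.

    traj_a, traj_b: (N, 2) and (M, 2) arrays, one tracked point per frame.
    H: homography mapping view A into view B (assumed given in this sketch).
    """
    a_in_b = project(H, traj_a)
    # Euclidean distance between every frame of A (mapped into B) and every frame of B.
    cost = np.linalg.norm(a_in_b[:, None, :] - traj_b[None, :, :], axis=2)
    n, m = cost.shape

    # Accumulated cost with a border of infinities so the path starts at (0, 0).
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )

    # Backtrack from the last frame pair to the first.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        k = int(np.argmin((acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])))
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

Each pair in the returned path states which frame of the first video corresponds to which frame of the second; a frame index that repeats in one column indicates that the other video is moving more slowly at that point.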

The experimental results for real videos are displayed below.

Experimental results

Please download the sample results (about 5 to 15 MB per file) by clicking the underlined titles, and play them with Windows Media Player or another media player. If you have any concerns, please send an email to lcheng4@uablerta.ca.

Evaluations on Synthetic data

Download the demo (80 MB). For details, please refer to the paper.

Evaluations on Real Videos (All the real video results can be downloaded as a single package here (about 60 MB).)

-Videos with the same scene

The proposed technique is compared with three other techniques (RCB [1][2], STE [3], UBD [4]) on the UCF video [5] and the coffee cup lifting videos.

UCF; Coffee cup lifting1; Coffee cup lifting2; Coffee cup lifting3.


-Videos with different scenes

Tai Chi Quan playing

The proposed technique is evaluated on one pair of videos of Tai Chi Quan playing. Note that the videos are captured from different views and in different scenes.

Ball (orange) playing

The proposed technique is evaluated on three pairs of videos with a ball-throwing motion. Note that the videos are captured from different views and in different scenes.

Coffee cup lifting

The proposed technique is evaluated on four pairs of videos of lifting and putting down a coffee cup. Note that the same kind of action, i.e., lifting and putting down the coffee cup, is performed by two different people in different scenes and at different speeds.




1.       C. Rao, A. Yilmaz, and M. Shah, “View-invariant representation and recognition of actions,” International Journal of Computer Vision, vol. 50, no. 2, pp. 203-226, Nov, 2002.

2.       C. Rao, A. Gritai, M. Shah, and T. F. S. Mahmood, “View-invariant alignment and matching of video sequences,” in Proc. ICCV, pp. 939-945, 2003.

3.       M. Singh, et al., “Optimization of Symmetric Transfer Error for Sub-frame Video Synchronization,” in Computer Vision - ECCV 2008, Part II, Proceedings, vol. 5303, D. Forsyth, et al., Eds., pp. 554-567, 2008.

4.       C. Lu and M. Mandal, “Efficient Temporal Alignment of Video Sequences Using Unbiased Bidirectional Dynamic Time Warping,” Journal of Electronic Imaging, vol. 19, no. 4, pp. 0501-0504, Aug 2010.

5.       Cen Rao, “View-Invariant Representations for Human Activity Recognition”, http://server.cs.ucf.edu/~vision/projects/ViewInvariance/ViewInvariance.html.