swaps the audio from one video into another, synchronizing with video hit points along the way.
The synchronization is done by aligning the video's original audio with the new audio, using the beat, bar and section boundaries detected by The Echo Nest's analysis. Alignment happens at all three levels. The incoming audio is time-stretched using Dirac, with stretching ratios determined beat to beat, so that the attacks of the incoming song's beats land as close as possible to the attacks in the original. Since most video editing aligns attacks with hit points, the new attacks also (hopefully!) line up with the visual hit points. Magic!
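The beat-to-beat ratios described above can be sketched roughly as follows. This is only an illustration, not the tool's actual code: `beat_stretch_ratios` is a hypothetical helper, beat start times are assumed to be plain lists of seconds, and the real time-stretching (done by Dirac) is not shown.

```python
def beat_stretch_ratios(original_beats, incoming_beats):
    """Return one stretch ratio per beat interval (hypothetical helper).

    Both arguments are lists of beat start times in seconds.
    A ratio above 1.0 means the incoming beat has to be lengthened to
    match the original; below 1.0 means it has to be shortened.
    """
    # Only beats that exist in both tracks can be paired up.
    n = min(len(original_beats), len(incoming_beats))
    ratios = []
    for i in range(n - 1):
        original_duration = original_beats[i + 1] - original_beats[i]
        incoming_duration = incoming_beats[i + 1] - incoming_beats[i]
        ratios.append(original_duration / incoming_duration)
    return ratios


# Made-up beat times, purely for illustration.
original = [0.0, 0.5, 1.0, 2.0]
incoming = [0.0, 0.25, 0.75, 1.75]
ratios = beat_stretch_ratios(original, incoming)
print(ratios)
```

Stretching each incoming beat by its own ratio, rather than applying one global ratio, is what keeps every attack close to its counterpart even when the two tempos drift.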
In the name of section alignment, the mash will always favor the shorter of each pair of sections, so the cut vids will be a bit shorter than either source.
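To see why the result ends up shorter than both sources, here is a small sketch. It assumes sections are paired in order and measured in beats, which may not be how the tool measures them; `aligned_section_lengths` is a hypothetical name.

```python
def aligned_section_lengths(original_sections, incoming_sections):
    """Pair sections in order and keep the shorter length of each pair.

    Each argument is a list of section lengths in beats (an assumption
    made for this sketch). Unpaired trailing sections are dropped.
    """
    return [min(a, b) for a, b in zip(original_sections, incoming_sections)]


# Made-up section lengths, purely for illustration.
original_sections = [16, 32, 16]
incoming_sections = [16, 24, 32]
aligned = aligned_section_lengths(original_sections, incoming_sections)
print(aligned, sum(aligned))
```

Here the originals total 64 and 72 beats, while the aligned result totals 56, since each source loses the part of any section that outlasts its partner.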