Research

Current and past research projects

Current

Sparse Attention Patterns in Video Diffusion: Do They Preserve Semantic Consistency Across Frames?

● in progress

Sparse and linear attention approximations cut compute along the spatial dimensions, but video models also rely on temporal attention to keep frames consistent. We're probing whether the sparse attention masks used for efficiency inadvertently disrupt cross-frame token interactions, and whether simple structured sparsity patterns (such as attending to the first frame plus a local window) can recover consistency at lower cost. This is directly relevant to autoregressive streaming pipelines, where full temporal attention is the main bottleneck.
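As a rough illustration of the "first frame plus local window" pattern, here is a minimal sketch of how such a mask could be built. It is not the project's implementation: the function names, the frame-major token layout, and the choice of a causal window (each frame sees itself and the frames just before it) are assumptions made for the example.

```python
def first_frame_plus_window_mask(num_frames, tokens_per_frame, window):
    """Boolean token-level mask: mask[q][k] is True if query token q may attend to key token k.

    Assumptions (hypothetical, for illustration only):
      - tokens are laid out frame-major, so token i belongs to frame i // tokens_per_frame
      - the local window is causal: a query frame sees itself and the previous
        (window - 1) frames, plus frame 0 as a fixed anchor
    """
    def frame_allowed(query_frame, key_frame):
        is_anchor = key_frame == 0
        in_window = 0 <= query_frame - key_frame < window
        return is_anchor or in_window

    n = num_frames * tokens_per_frame
    return [
        [frame_allowed(q // tokens_per_frame, k // tokens_per_frame) for k in range(n)]
        for q in range(n)
    ]


def mask_density(mask):
    """Fraction of query/key pairs that are kept; a crude proxy for attention cost."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    return kept / total
```

For example, `mask_density(first_frame_plus_window_mask(4, 2, 2))` evaluates to 0.5625, against 1.0 for full attention. In a real pipeline the same pattern would be expressed as a tensor mask or a block-sparse kernel layout rather than nested lists.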

Sparse Attention, Video Diffusion, Temporal Consistency, SLA, SageAttention

Temporal Attention Drift in Autoregressive Video Diffusion

● in progress

As autoregressive video models generate longer sequences, how do temporal attention patterns evolve? We're studying whether attention to early frames dilutes over time and whether this dilution directly causes the quality degradation seen in long sequences. We're also testing whether forced attention anchoring to keyframes can recover consistency without retraining. This is pure inference-time analysis, intended to give a mechanistic explanation for what LongLive and Self-Forcing++ are implicitly trying to fix.
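To make "dilution" and "anchoring" concrete, the sketch below measures how much of one query's attention probability lands on an anchor frame, and shows one possible inference-time intervention: adding a constant bias to the pre-softmax scores of anchor-frame keys. The function names, the frame-major token layout, and the additive-bias form of anchoring are all assumptions for illustration; the project may measure or enforce anchoring differently.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def anchor_mass(scores, tokens_per_frame, anchor_frames=(0,)):
    """Fraction of one query's attention probability that falls on the anchor frames.

    Assumes (hypothetically) a frame-major key layout: key k belongs to
    frame k // tokens_per_frame.
    """
    probs = softmax(scores)
    return sum(p for k, p in enumerate(probs) if k // tokens_per_frame in anchor_frames)


def anchor_scores(scores, tokens_per_frame, bias, anchor_frames=(0,)):
    """Add a constant bias to the pre-softmax scores of anchor-frame keys.

    This is one simple, training-free way to force attention toward keyframes;
    it is an illustrative choice, not a method taken from the source.
    """
    return [
        s + bias if k // tokens_per_frame in anchor_frames else s
        for k, s in enumerate(scores)
    ]
```

With uniform scores and 2 tokens per frame, the anchor mass is 2/8 at 4 frames and 2/32 at 16 frames, which shows how softmax normalization alone can dilute attention to the first frame as the context grows. Applying `anchor_scores(..., bias=math.log(3))` to the 4-frame case raises the anchor mass to about 0.5.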

AR Video, Attention Drift, Keyframe Anchoring, Inference Analysis