Research
Current and past research projects
Current
Sparse Attention Patterns in Video Diffusion: Do They Preserve Semantic Consistency Across Frames?
● in progress
Sparse and linear attention approximations reduce compute along the spatial dimensions, but video also relies on temporal attention for cross-frame consistency. We're probing whether the sparse attention masks currently used for efficiency inadvertently disrupt cross-frame token interactions, and whether simple structured sparsity patterns, such as attending to the first frame plus a local window, can recover consistency at lower cost. This is directly relevant to autoregressive streaming pipelines, where full temporal attention is the main bottleneck.
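As a rough illustration of the "first frame plus a local window" pattern, the sketch below builds a token-level boolean mask in plain Python. It is a minimal sketch rather than the project's implementation: the function names, the `tokens_per_frame` layout (tokens stored frame by frame), and the causal window convention (the current frame plus the `window - 1` frames before it) are all assumptions made for the example.

```python
def first_frame_plus_window_mask(num_frames, tokens_per_frame, window):
    """Build mask[q][k]; True means query token q may attend to key token k.

    Assumed layout (hypothetical): tokens are ordered frame by frame, so
    token index // tokens_per_frame gives the frame index.
    """
    n = num_frames * tokens_per_frame
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        q_frame = q // tokens_per_frame
        for k in range(n):
            k_frame = k // tokens_per_frame
            # Anchor: every query can always see the first frame.
            is_anchor = k_frame == 0
            # Causal local window: the current frame and the
            # (window - 1) frames immediately before it.
            in_window = 0 <= q_frame - k_frame < window
            mask[q][k] = is_anchor or in_window
    return mask


def density(mask):
    """Fraction of query/key pairs kept by the mask (1.0 = full attention)."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    return kept / total


mask = first_frame_plus_window_mask(num_frames=4, tokens_per_frame=2, window=2)
print(density(mask))
```

For a fixed window, the number of keys each query sees stays bounded as the sequence grows, which is why this kind of pattern is attractive for streaming generation. Whether it preserves consistency is the open question the project asks.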
Temporal Attention Drift in Autoregressive Video Diffusion
● in progress
As autoregressive video models generate longer sequences, how do temporal attention patterns evolve? We're studying whether attention to early frames dilutes over time and whether this dilution directly causes the quality degradation seen in long sequences. We're also testing whether forcing attention to anchor on keyframes can recover consistency without retraining. This is a pure inference-time analysis, intended to give a mechanistic explanation for what Long Live and Self-Forcing++ are implicitly trying to fix.
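To make "dilution" and "anchoring" concrete, here is a toy sketch for a single query's attention logits. It assumes a hypothetical layout where keys are ordered frame by frame, and it models anchoring as an additive bias on keyframe logits before the softmax. That bias is one possible inference-time intervention chosen for illustration, not necessarily the method under study, and the helper names are made up for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def early_frame_mass(logits, tokens_per_frame, num_early_frames=1):
    """Total attention weight that lands on the first num_early_frames frames."""
    weights = softmax(logits)
    return sum(weights[: num_early_frames * tokens_per_frame])


def anchor_logits(logits, tokens_per_frame, keyframes, bias):
    """Return a copy of logits with `bias` added to every token of each keyframe.

    Hypothetical anchoring scheme: a positive bias raises the share of
    attention the keyframes receive without touching model weights.
    """
    out = list(logits)
    for f in keyframes:
        for k in range(f * tokens_per_frame, (f + 1) * tokens_per_frame):
            out[k] += bias
    return out


# With uniform logits, the first frame's share shrinks as 1 / num_frames.
for num_frames in (2, 4, 8):
    logits = [0.0] * (num_frames * 2)
    print(num_frames, early_frame_mass(logits, tokens_per_frame=2))

# Anchoring frame 0 with a bias of log(3) triples its unnormalized weight.
anchored = anchor_logits([0.0] * 8, tokens_per_frame=2, keyframes=[0], bias=math.log(3))
print(early_frame_mass(anchored, tokens_per_frame=2))
```

The uniform-logit case is only a baseline: it shows that early-frame attention mass falls with context length even when the model has no preference at all. A real measurement would read the logits from the model's temporal attention layers.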