Latest Posts

External Discussions

1. Why is Pre-norm more frequently used than Post-norm?

Conclusion: Pre-norm keeps the residual path as a pure identity, so it does not disturb gradient flow and scales cleanly to deep models.
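A minimal PyTorch sketch of the two placements (class names are illustrative, not from the post). The only difference is where the LayerNorm sits relative to the residual connection:

```python
import torch.nn as nn

class PostNormBlock(nn.Module):
    """x -> LayerNorm(x + sublayer(x)): the norm sits on the residual path."""
    def __init__(self, dim, sublayer):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.sublayer = sublayer

    def forward(self, x):
        return self.norm(x + self.sublayer(x))

class PreNormBlock(nn.Module):
    """x -> x + sublayer(LayerNorm(x)): the residual path is a pure identity."""
    def __init__(self, dim, sublayer):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.sublayer = sublayer

    def forward(self, x):
        return x + self.sublayer(self.norm(x))
```

Stacking $N$ pre-norm blocks gives $x_N = x_0 + \sum_i f_i(\mathrm{LN}(x_i))$, so the gradient always reaches every layer through the identity term; post-norm routes every gradient through $N$ LayerNorms.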

2. The shape regularity of diffusion sampling trajectories

Conclusion: Trajectories follow a linear-nonlinear-linear structure within an approximately 3D subspace.
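The subspace claim is easy to probe numerically. The sketch below is a toy placeholder: `trajectory` stands in for the stacked intermediate latents $x_t$ of a real sampler, and for the samplers studied in the post the reported ratio should be close to 1.

```python
import numpy as np

# Placeholder trajectory: replace with the (num_steps, dim) stack of
# intermediate latents x_t produced by an actual diffusion sampler.
rng = np.random.default_rng(0)
trajectory = np.cumsum(rng.normal(size=(50, 1024)), axis=0)

# PCA via SVD on the centered trajectory.
centered = trajectory - trajectory.mean(axis=0, keepdims=True)
singular_values = np.linalg.svd(centered, compute_uv=False)

# Fraction of the trajectory's variance captured by a 3D subspace.
ratio = (singular_values[:3] ** 2).sum() / (singular_values ** 2).sum()
print(f"variance explained by top-3 directions: {ratio:.1%}")
```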

3. Flexible patch size for ViT: How should we resize the patch embedding?

Conclusion: Starting from principled experiments, the authors propose PI-Resize (a transformation matrix derived from the pseudo-inverse of bilinear interpolation) for "losslessly" upsampling the patch-embedding weights.
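A NumPy sketch of the idea (my own reconstruction, not the authors' code): build the bilinear-resize matrix $B$ explicitly, then solve $B^\top w^\ast = w$ via the pseudo-inverse, so that the embedding of a bilinearly resized patch, $\langle Bx, w^\ast \rangle$, matches the original $\langle x, w \rangle$ exactly when upsampling.

```python
import numpy as np

def bilinear_resize_matrix(old, new):
    """Matrix B with B @ x.ravel() == bilinear_resize(x), built column by
    column by resizing each one-hot basis image."""
    def resize_one(img):
        ys = np.linspace(0, old - 1, new)
        xs = np.linspace(0, old - 1, new)
        y0 = np.clip(np.floor(ys).astype(int), 0, old - 2)
        x0 = np.clip(np.floor(xs).astype(int), 0, old - 2)
        wy = (ys - y0)[:, None]
        wx = (xs - x0)[None, :]
        tl = img[np.ix_(y0, x0)]
        tr = img[np.ix_(y0, x0 + 1)]
        bl = img[np.ix_(y0 + 1, x0)]
        br = img[np.ix_(y0 + 1, x0 + 1)]
        return (1 - wy) * ((1 - wx) * tl + wx * tr) + wy * ((1 - wx) * bl + wx * br)

    cols = []
    for i in range(old * old):
        basis = np.zeros(old * old)
        basis[i] = 1.0
        cols.append(resize_one(basis.reshape(old, old)).ravel())
    return np.stack(cols, axis=1)          # shape (new*new, old*old)

# PI-Resize: pick new weights w* with B^T w* = w, so the patch embedding
# <B x, w*> equals <x, w> for every patch x.
old_p, new_p = 8, 16
B = bilinear_resize_matrix(old_p, new_p)
w = np.random.default_rng(0).normal(size=old_p * old_p)   # one flattened kernel
w_star = np.linalg.pinv(B.T) @ w

x = np.random.default_rng(1).normal(size=old_p * old_p)   # a flattened patch
print(np.allclose(w @ x, w_star @ (B @ x)))               # True: lossless
```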

4. Why scale $q^\top k$ by $\frac{1}{\sqrt{d}}$ in the Transformer?

Conclusion: To keep the dot product $q^\top k$ from growing with the embedding dimension $d$ (for unit-variance entries its variance scales linearly with $d$), which would otherwise saturate the softmax and drive the attention matrix toward low rank.
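A quick NumPy check of both halves of that argument (illustrative, assuming i.i.d. unit-variance entries for $q$ and $k$):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# With i.i.d. unit-variance entries, Var(q . k) = d, so the raw logits'
# spread grows like sqrt(d); dividing by sqrt(d) pins it at 1.
for d in (16, 256, 4096):
    q, k = rng.normal(size=(2, 10_000, d))
    scores = (q * k).sum(axis=-1)
    print(f"d={d:5d}  std={scores.std():7.1f}  "
          f"scaled std={(scores / np.sqrt(d)).std():.2f}")

# Logits with std sqrt(4096) = 64 saturate the softmax into a near one-hot
# row, killing gradients and pushing the attention matrix toward rank 1;
# the scaled version stays well-spread.
logits = rng.normal(size=8) * np.sqrt(4096)
print(softmax(logits).round(3))
print(softmax(logits / np.sqrt(4096)).round(3))
```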