This webinar has ended. Thank you for your interest.

Topic
Data Science Seminar - Wenjing Liao
Date & Time
Feb 25, 2025 01:25 PM

Description
Title: Exploiting Low-Dimensional Data Structures and Understanding Neural Scaling Laws of Transformers

Abstract: When training deep neural networks, a model’s generalization error is often observed to follow a power scaling law that depends on the model size and the data size. Perhaps the best-known example of such scaling laws is for transformer-based large language models (LLMs), where networks with billions of parameters are trained on trillions of tokens of text. A theoretical interest in LLMs is to understand why transformer scaling laws exist. To answer this question, we exploit low-dimensional structures in language datasets by estimating their intrinsic dimension, and we establish statistical estimation and mathematical approximation theories for transformers to predict the scaling laws. By leveraging low-dimensional data structures, we can explain transformer scaling laws in a way that respects the data geometry. Furthermore, we test our theory against empirical observations by training LLMs on language datasets and find strong agreement between the observed empirical scaling laws and our theoretical predictions.
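The abstract refers to two concrete ingredients: estimating the intrinsic dimension of a dataset and fitting a power scaling law in the model or data size. The sketch below is purely illustrative and is not the speaker's method; it assumes the Two-NN intrinsic-dimension estimator (Facco et al., 2017) and a log-log linear fit on synthetic numbers chosen for the example.

```python
import numpy as np

def two_nn_intrinsic_dimension(X):
    """Two-NN intrinsic dimension estimate (Facco et al., 2017).

    For each point, take the ratio mu = r2 / r1 of the distances to its
    second and first nearest neighbors; the maximum-likelihood estimate
    of the intrinsic dimension is d = n / sum(log mu).
    """
    # Pairwise squared Euclidean distances without materializing all difference vectors.
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    dists = np.sqrt(d2)
    np.fill_diagonal(dists, np.inf)        # exclude self-distances
    sorted_d = np.sort(dists, axis=1)
    r1, r2 = sorted_d[:, 0], sorted_d[:, 1]
    mu = r2 / r1
    return len(X) / np.log(mu).sum()

def fit_power_law(sizes, losses):
    """Fit loss ~ a * size^(-alpha) by linear regression in log-log space."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
    return np.exp(intercept), -slope       # (a, alpha)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data with intrinsic dimension 2, linearly embedded in 50 ambient dimensions.
    latent = rng.normal(size=(1000, 2))
    X = latent @ rng.normal(size=(2, 50))
    print("estimated intrinsic dimension:", two_nn_intrinsic_dimension(X))

    # Synthetic scaling-law observations following loss = 5 * N^(-0.3) (made-up numbers).
    N = np.array([1e6, 1e7, 1e8, 1e9])
    loss = 5.0 * N ** (-0.3)
    print("fitted (a, alpha):", fit_power_law(N, loss))
```

On real language data one would estimate the intrinsic dimension of token or sentence embeddings and fit the exponent to measured validation losses at several model and data sizes; the estimator and functional form above are one common choice, not the only one.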