DOI
https://doi.org/10.25772/CPVN-K468
Author ORCID Identifier
0000-0002-9272-0077
Defense Date
2021
Document Type
Dissertation
Degree Name
Doctor of Philosophy
Department
Computer Science
First Advisor
Dr. Tomasz Arodz
Abstract
Language models employ a very large number of trainable parameters. Despite being highly overparameterized, these networks often achieve good out-of-sample test performance on the original task and fine-tune easily to related tasks. Recent observations involving, for example, the intrinsic dimension of the objective landscape and the lottery ticket hypothesis indicate that training often actively involves only a small fraction of the parameter space. Thus, the question remains of how large a parameter space needs to be in the first place; evidence from recent work on model compression, parameter sharing, factorized representations, and knowledge distillation increasingly shows that models can be made much smaller and still perform well. Here, we focus on factorized representations of the matrices that underpin dense, embedding, and self-attention layers. We use a low-rank factorized representation of a reshaped and rearranged original matrix to achieve fast, space-efficient, and expressive embeddings and linear layers. We prove that stacking such low-rank layers increases their expressiveness, providing a theoretical understanding of their effectiveness in deep networks. Our approach achieves a hundred-fold or more reduction in the space required to store the embeddings, with almost no relative drop in accuracy on practical natural language processing tasks. In Transformer models, our approach leads to a more than ten-fold reduction in the total number of trainable parameters, including embedding, attention, and feed-forward layers, with little degradation in on-task performance. The approach operates out of the box, replacing each parameter matrix with its compact equivalent while maintaining the architecture of the network.
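
To make the abstract's idea concrete, here is a minimal illustrative sketch, not the dissertation's actual implementation. One standard way to realize a "low-rank factorization of a reshaped and rearranged matrix" is via the Van Loan rearrangement, under which a rank-r factorization of the rearranged weight matrix is equivalent to approximating the original weight as a sum of r Kronecker products. The class name KroneckerLinear, the shape parameters m1 and n1, and the initialization scheme below are all assumptions made for illustration.

    import torch
    import torch.nn as nn

    class KroneckerLinear(nn.Module):
        """Linear layer whose weight is a rank-r sum of Kronecker products.

        Illustrative sketch: a rank-r factorization of the rearranged
        (reshaped) weight matrix corresponds to W ~= sum_k kron(A_k, B_k),
        storing r * (m1*n1 + m2*n2) parameters instead of m1*m2 * n1*n2.
        """
        def __init__(self, in_features, out_features, m1, n1, rank=1):
            super().__init__()
            assert out_features % m1 == 0 and in_features % n1 == 0
            m2, n2 = out_features // m1, in_features // n1
            # Small random init; the dissertation's scheme may differ.
            self.A = nn.Parameter(torch.randn(rank, m1, n1) * 0.1)
            self.B = nn.Parameter(torch.randn(rank, m2, n2) * 0.1)
            self.bias = nn.Parameter(torch.zeros(out_features))

        def forward(self, x):
            # Materialize W = sum_k kron(A_k, B_k) for clarity; a
            # memory-lean variant would instead reshape x and use
            # batched matrix multiplications without forming W.
            W = sum(torch.kron(a, b) for a, b in zip(self.A, self.B))
            return x @ W.T + self.bias

For example, KroneckerLinear(1024, 1024, m1=32, n1=32, rank=2) stores 2 * (32*32 + 32*32) = 4,096 weight parameters in place of the 1,048,576 of a dense 1024-by-1024 layer, a reduction of the order of magnitude the abstract reports, while keeping the layer a drop-in replacement for nn.Linear.
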
Rights
© Aliakbar Panahi
Is Part Of
VCU University Archives
Is Part Of
VCU Theses and Dissertations
Date of Submission
8-11-2021