DOI
https://doi.org/10.25772/CPVN-K468
Author ORCID Identifier
0000-0002-9272-0077
Defense Date
2021
Document Type
Dissertation
Degree Name
Doctor of Philosophy
Department
Computer Science
First Advisor
Dr. Tomasz Arodz
Abstract
Language models employ a very large number of trainable parameters. Despite being highly overparameterized, these networks often achieve good out-of-sample test performance on the original task and fine-tune easily to related tasks. Recent observations involving, for example, the intrinsic dimension of the objective landscape and the lottery ticket hypothesis indicate that training often actively involves only a small fraction of the parameter space. Thus, the question remains of how large a parameter space needs to be in the first place; evidence from recent work on model compression, parameter sharing, factorized representations, and knowledge distillation increasingly shows that models can be made much smaller and still perform well. Here, we focus on factorized representations of the matrices that underpin dense, embedding, and self-attention layers. We use a low-rank factorized representation of a reshaped and rearranged original matrix to achieve fast, space-efficient, and expressive embeddings and linear layers. We prove that stacking such low-rank layers increases their expressiveness, providing a theoretical understanding of their effectiveness in deep networks. Our approach achieves a hundred-fold or more reduction in the space required to store the embeddings, with almost no relative drop in accuracy on practical natural language processing tasks. In Transformer models, our approach leads to a more than ten-fold reduction in the total number of trainable parameters, including embedding, attention, and feed-forward layers, with little degradation in on-task performance. The approach operates out of the box, replacing each parameter matrix with its compact equivalent while maintaining the architecture of the network.
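
To make the abstract's idea concrete, here is a minimal illustrative sketch, not the dissertation's actual implementation. One standard way to realize a "low-rank factorization of a reshaped and rearranged matrix" is via the Van Loan rearrangement, under which a rank-r factorization of the rearranged weight matrix is equivalent to approximating the original weight as a sum of r Kronecker products. The class name KroneckerLinear, the shape parameters m1 and n1, and the initialization scheme below are all assumptions made for illustration.

    import torch
    import torch.nn as nn

    class KroneckerLinear(nn.Module):
        """Linear layer whose weight is a rank-r sum of Kronecker products.

        Illustrative sketch: a rank-r factorization of the rearranged
        (reshaped) weight matrix corresponds to W ~= sum_k kron(A_k, B_k),
        storing r * (m1*n1 + m2*n2) parameters instead of m1*m2 * n1*n2.
        """
        def __init__(self, in_features, out_features, m1, n1, rank=1):
            super().__init__()
            assert out_features % m1 == 0 and in_features % n1 == 0
            m2, n2 = out_features // m1, in_features // n1
            # Small random init; the dissertation's scheme may differ.
            self.A = nn.Parameter(torch.randn(rank, m1, n1) * 0.1)
            self.B = nn.Parameter(torch.randn(rank, m2, n2) * 0.1)
            self.bias = nn.Parameter(torch.zeros(out_features))

        def forward(self, x):
            # Materialize W = sum_k kron(A_k, B_k) for clarity; a
            # memory-lean variant would instead reshape x and use
            # batched matrix multiplications without forming W.
            W = sum(torch.kron(a, b) for a, b in zip(self.A, self.B))
            return x @ W.T + self.bias

For example, KroneckerLinear(1024, 1024, m1=32, n1=32, rank=2) stores 2 * (32*32 + 32*32) = 4,096 weight parameters in place of the 1,048,576 of a dense 1024-by-1024 layer, a reduction of the order of magnitude the abstract reports, while keeping the layer a drop-in replacement for nn.Linear.
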
Rights
© Aliakbar Panahi
Is Part Of
VCU University Archives
Is Part Of
VCU Theses and Dissertations
Date of Submission
8-11-2021