Recent Posts

In August of 2015, my hands stopped working. I could still control them, but every movement accumulated more pain, so every motion came with a cost: getting dressed in the morning, sending a text, lifting a glass. I was interning at Google that summer, about to begin a PhD in Scotland, but coding all day would have left me in agony. In relating this story, I often mention that for months before I learned to work without my hands, I had nothing to do but go to a bar and order a shot of vodka with a straw in it.

CONTINUE READING

Models can be built incrementally by modifying their hyperparameters during training. This is most common in transfer learning settings, in which we seek to adapt the knowledge in an existing model to a new domain or task. The more general problem of continuous learning is also an obvious application. Even with a predefined data set, however, incrementally constraining the topology of the network can offer benefits as a form of regularization.

Dynamic Hyperparameters

The easiest incrementally modified models to train may be those in which hyperparameters are updated at each epoch; a minimal sketch of this idea follows the excerpt.

CONTINUE READING
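A minimal sketch of the idea in the excerpt above: a hyperparameter updated once per epoch inside an ordinary training loop. The model, toy data, and annealing schedule here are illustrative assumptions, not taken from the post; PyTorch is used only as a convenient example framework.

```python
# Sketch: update a hyperparameter (here, a dropout probability) at each epoch.
# Everything below is illustrative; it is not the post's code.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(32, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # the hyperparameter we modify incrementally
    nn.Linear(64, 2),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Toy data, purely for illustration.
x = torch.randn(256, 32)
y = torch.randint(0, 2, (256,))

for epoch in range(10):
    # Anneal the dropout rate from 0.5 toward 0.0 over training:
    # one example of a hyperparameter updated once per epoch.
    new_p = 0.5 * (1 - epoch / 10)
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = new_p

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```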

[Note - This is a repost of a post I made on my old blog while I was in undergrad. I’m including it in case someone finds it useful, since my old blog is defunct. I haven’t significantly edited it, so I’m sorry if it doesn’t fit into my current style.] This post is directed to a lay CS audience. I am an undergraduate in CS, so I consider myself part of that audience.

CONTINUE READING

Selected Publications

Concerns about interpretability, computational resources, and principled inductive priors have motivated efforts to engineer sparse neural models for NLP tasks. If sparsity is important for NLP, might well-trained neural models naturally become roughly sparse? Using the Taxi-Euclidean norm to measure sparsity, we find that frequent input words are associated with concentrated or sparse activations, while frequent target words are associated with dispersed activations but concentrated gradients. We find that gradients associated with function words are more concentrated than the gradients of content words, even controlling for word frequency.
ICML Workshop on Identifying and Understanding Deep Learning Phenomena, 2019
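A brief sketch of the kind of sparsity measure referenced in the abstract above. It assumes the Taxi-Euclidean norm can be read as the ratio of the L1 (taxicab) norm to the L2 (Euclidean) norm, which is smallest when a vector's mass is concentrated in a few units; the paper's exact formulation may differ.

```python
# Assumption: "Taxi-Euclidean norm" is treated here as ||v||_1 / ||v||_2,
# a standard concentration measure; this is an illustration, not the paper's code.
import numpy as np

def taxi_euclidean(v: np.ndarray) -> float:
    """Ratio of the L1 (taxicab) norm to the L2 (Euclidean) norm."""
    return np.abs(v).sum() / np.linalg.norm(v)

dense = np.ones(100)                       # mass spread evenly across units
sparse = np.zeros(100); sparse[0] = 1.0    # mass concentrated in one unit

print(taxi_euclidean(dense))   # 10.0 (sqrt(n) for a uniform vector)
print(taxi_euclidean(sparse))  # 1.0  (minimum: a single active unit)
```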

Research has shown that neural models implicitly encode linguistic features, but there has been no research showing how these encodings arise as the models are trained. We present the first study on the learning dynamics of neural language models, using a simple and flexible analysis method called Singular Vector Canonical Correlation Analysis (SVCCA), which enables us to compare learned representations across time and across models without the need to evaluate directly on annotated data. We probe the evolution of syntactic, semantic, and topic representations and find that part-of-speech is learned earlier than topic; that recurrent layers become more similar to those of a tagger during training; and that embedding layers become less similar. Our results and methods could inform better learning algorithms for NLP models, possibly enabling them to incorporate linguistic information more effectively.
NAACL, 2019
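For readers unfamiliar with SVCCA, here is a compact sketch of the general technique named in the abstract above, not the authors' implementation: reduce each activation matrix with SVD, then compute canonical correlations between the reduced subspaces. The variance threshold and toy data are assumptions chosen for illustration.

```python
# Illustrative re-implementation of the SVCCA idea (SVD reduction + CCA),
# not the code used in the paper.
import numpy as np

def svcca(X, Y, keep=0.99):
    """Mean canonical correlation between two activation matrices (samples x neurons)."""
    def reduce(A):
        # Center, then keep the top singular directions explaining `keep` of the variance.
        A = A - A.mean(axis=0)
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        var = np.cumsum(s**2) / np.sum(s**2)
        k = np.searchsorted(var, keep) + 1
        return U[:, :k] * s[:k]

    def orthobasis(A):
        # Orthonormal basis for the column space of A.
        U, _, _ = np.linalg.svd(A, full_matrices=False)
        return U

    Qx, Qy = orthobasis(reduce(X)), orthobasis(reduce(Y))
    # Canonical correlations are the singular values of the cross-product of the bases.
    corrs = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return corrs.mean()

# Toy usage: two "layers" observed on the same 500 inputs.
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(500, 64))
acts_b = acts_a @ rng.normal(size=(64, 32))  # a linear function of acts_a
print(svcca(acts_a, acts_b))  # close to 1: the representations are highly similar
```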

Abstract Meaning Representation (AMR), an annotation scheme for natural language semantics, has drawn attention for its simplicity and representational power. Because AMR annotations are not designed for human readability, we present AMRICA, a visual aid for exploration of AMR annotations. AMRICA can visualize an AMR or the difference between two AMRs to help users diagnose interannotator disagreement or errors from an AMR parser. AMRICA can also automatically align and visualize the AMRs of a sentence and its translation in a parallel text. We believe AMRICA will simplify and streamline exploratory research on cross-lingual AMR corpora.
NAACL, 2015

Recent & Upcoming Talks

Learning Dynamics of Language Models
May 28, 2019
Paying the Panopticon
Feb 26, 2019
Learning Dynamics of Text Models With SVCCA
Jan 25, 2019