We introduce a model for constructing vector representations of words by composing characters using bidirectional LSTMs
Below are more highlights from Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation
our model requires only a single vector per character type and a fixed set of parameters for the compositional model. Despite the compactness of this model and, more importantly, the arbitrary nature of the form–function relationship in language, our “composed” word representations yield state-of-the-art results in language modeling and part-of-speech tagging. Benefits over traditional baselines are particularly pronounced in morphologically rich languages
it is manifestly clear that similarity in form is neither a necessary nor sufficient condition for similarity in function: small orthographic differences may correspond to large semantic or syntactic differences (butter vs. batter), and large orthographic differences may obscure nearly perfect functional correspondence (rich vs. affluent). Thus, any orthographically aware model must be able to capture non-compositional effects in addition to more regular effects due to, e.g., morphological processes. To model the complex form–function relationship, we turn to long short-term memories (LSTMs), which are designed to be able to capture complex non-linear and non-local dynamics in sequences
our character-based model is able to generate similar representations for words that are semantically and syntactically similar, even for words are orthographically distant (e.g., October and January)
The goal of our work is not to overcome existing benchmarks, but show that much of the feature engineering done in the benchmarks can be learnt automatically from the task specific data. More importantly, we wish to show large dimensionality word look tables can be compacted into a lookup table using characters and a compositional model allowing the model scale better with the size of the training data. This is a desirable property of the model as data becomes more abundant in many NLP tasks.
The authors have also released Java code for training neural networks.