Characterization of Learning to Generate Formal Language Strings
Bachelor Thesis
The goal is to train toy language models on formal languages from different levels of the Chomsky hierarchy and to characterize their training dynamics. Phenomena such as grokking, double descent (with respect to the number of gradient updates, the amount of training data, and the parameter count), and the formation of circuits will be investigated. Transformer-based models are the primary focus; state-space models and traditional RNNs will also be included in the experiments.
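
To make the notion of training data from different levels of the Chomsky hierarchy concrete, the following minimal sketch samples strings from three textbook languages, one per level. The choice of languages ((ab)^n, a^n b^n, a^n b^n c^n), all function names, and the uniform sampling of n are assumptions made for illustration only; the thesis does not fix any of them.

```python
import random


def sample_regular(n):
    """Regular language (ab)^n (assumed example)."""
    return "ab" * n


def sample_context_free(n):
    """Context-free language a^n b^n (assumed example)."""
    return "a" * n + "b" * n


def sample_context_sensitive(n):
    """Context-sensitive language a^n b^n c^n (assumed example)."""
    return "a" * n + "b" * n + "c" * n


def is_anbn(s):
    """Membership check for a^n b^n, usable to score model generations."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n


def make_dataset(sampler, num_strings, max_n, seed=0):
    """Draw num_strings strings, with n sampled uniformly from 1..max_n."""
    rng = random.Random(seed)  # fixed seed so a run can be reproduced
    return [sampler(rng.randint(1, max_n)) for _ in range(num_strings)]


if __name__ == "__main__":
    for sampler in (sample_regular, sample_context_free, sample_context_sensitive):
        print(sampler.__name__, make_dataset(sampler, num_strings=3, max_n=4))
```

Holding out larger values of n than those seen during training would be one possible way to probe length generalization; a membership check such as is_anbn could then serve to score strings generated by a trained model.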