Automata in Space: Formal Language Theory meets Neural Computation
PDF: https://josellarena.github.io/assets/doc/ais-paper-v4.pdf
Code: https://github.com/JoseLlarena/ais
BibTeX Citation:
@unpublished{llarena2024automata,
  author = {Llarena, J.},
  title  = {Automata in Space: Formal Language Theory meets Neural Computation},
  note   = {https://josellarena.github.io/assets/doc/ais-paper-v4.pdf},
  year   = {2024}
}
Automata in Space presents the main piece of work I’ve done in the last couple of years, laying the foundations of what I expect to be my core research topic moving forward. It sits at the intersection of Formal Language Theory, Computational Psycho-/Neuro-/Linguistics and Machine Learning.
The heart of my approach is to use weighted automata to understand neural computation. The use of automata is not new (research on recurrent neural networks drew on them as early as the 1980s1 2), but it is only recently that researchers have used them to understand non-recurrent architectures like Transformers3 and to draw stronger theoretical links between Formal Language Theory and NNs4, with extensive empirical studies mapping out the strengths and limits of different architectures5.
My work contrasts with existing research in that I:
- use the linear representation of Weighted Finite State Automata (WFSAs)6 as mathematically sound, normative reverse-engineering models against which to compare the hidden space of neural networks (a minimal sketch follows this list); vs the implicit or ad-hoc approaches often found in the literature
- set the scope to the study of all distributed representations of linguistic capabilities; vs specific architectures or general tasks
- tackle the study of neural representation mechanistically, treating models as white boxes; vs extracting automata, proving expressivity limits or establishing empirical equivalences
- favour a geometric, computational-state view, making heavy use of 3-dimensional state diagrams; vs feature-based analysis using 2D plots
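To make the first contrast concrete, here is a minimal sketch of the linear representation of a WFSA, using PARITY (the language of bit strings with an even number of 1s, mentioned further down) as the example. The states, matrices and function names are illustrative assumptions, not code from the paper or the linked repository.

```python
# A minimal sketch (not the paper's code) of the linear representation of a WFSA,
# using PARITY over {0, 1}: a string is accepted iff it has an even number of 1s.
import numpy as np

# States: q0 = "even number of 1s so far", q1 = "odd number of 1s so far".
alpha = np.array([1.0, 0.0])            # initial weights: start in q0
omega = np.array([1.0, 0.0])            # final weights: accept only in q0

# One transition matrix per symbol; entry [i, j] is the weight of moving i -> j.
A = {
    "0": np.array([[1.0, 0.0],          # reading '0' keeps the parity
                   [0.0, 1.0]]),
    "1": np.array([[0.0, 1.0],          # reading '1' flips the parity
                   [1.0, 0.0]]),
}

def weight(string: str) -> float:
    """Weight of a string: alpha^T · A[s1] · ... · A[sn] · omega."""
    state = alpha
    for symbol in string:
        state = state @ A[symbol]       # forward state vector after each prefix
    return float(state @ omega)

for s in ["", "0", "1", "1010", "111"]:
    print(f"{s!r:>8} -> {weight(s)}")   # 1.0 for even parity, 0.0 for odd
```

The point of the linear form is that the automaton’s state after each prefix is an explicit vector, which is what makes it usable as a normative model for a network’s hidden space.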
This work:

1. limits the study to tasks that can be represented as regular languages (like XOR/PARITY). I plan to lift that restriction by employing Weighted Pushdown Automata, extending the scope to context-free languages. It also restricts its remit to classification and language modelling, a limitation I plan to overcome using Weighted Finite State Transducers. See the “Conclusion and Future Work” section in the paper for other limitations of this technique and the future research paths they suggest.
2. reflects my long-standing interest in how the higher-level organisation of language emerges from a lower-level non-symbolic substrate.
3. is an attempt to cut through the historically protracted arguments about whether a computer program is or isn’t a good model for human performance, and, if not, how the human “algorithm” differs from the one learned by machines.
4. stems from a strongly held belief that neural computation is best causally explained in terms of a series of steps that (fail to) solve the task at hand, using well-understood computational models (i.e. sequential automata), as opposed to the more common approach of comparing hidden representations of inputs (a hypothetical sketch of such a step-wise comparison follows this list).
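To illustrate the kind of step-wise comparison the last point argues for, here is a hypothetical sketch (not the paper’s method): align a model’s hidden-state trajectory with the WFSA’s forward-state trajectory via a least-squares map, rather than comparing only the representations of whole inputs. The toy “hidden states” below are synthetic stand-ins for a real network’s activations.

```python
# A hypothetical sketch of a step-wise comparison: fit a linear map from a
# model's hidden-state trajectory to a WFSA's forward-state trajectory and
# report the residual. The "hidden" array here is a synthetic stand-in.
import numpy as np

def forward_states(alpha, A, string):
    """WFSA forward state vector after each prefix of the string."""
    state, states = alpha, [alpha]
    for symbol in string:
        state = state @ A[symbol]
        states.append(state)
    return np.stack(states)

def trajectory_fit(hidden, automaton):
    """Relative residual of the best linear map from hidden to automaton states."""
    W, *_ = np.linalg.lstsq(hidden, automaton, rcond=None)
    return np.linalg.norm(hidden @ W - automaton) / np.linalg.norm(automaton)

# Toy example: hidden states that are a randomly projected, noisy copy of the
# PARITY WFSA trajectory, so the step-wise fit should be close to perfect.
alpha = np.array([1.0, 0.0])
A = {"0": np.eye(2), "1": np.array([[0.0, 1.0], [1.0, 0.0]])}
auto = forward_states(alpha, A, "101101")
rng = np.random.default_rng(0)
hidden = auto @ rng.standard_normal((2, 8)) + 0.01 * rng.standard_normal((len(auto), 8))
print(f"relative residual: {trajectory_fit(hidden, auto):.3f}")  # small => step-wise match
```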
If you are interested in this line of research, do get in touch through email, LinkedIn, Mastodon or Bluesky. Looking forward to hearing your thoughts.
References
1. Servan-Schreiber, D., Cleeremans, A. and McClelland, J., 1988. Learning sequential structure in simple recurrent networks. Advances in Neural Information Processing Systems, 1.
2. Rodriguez, P., 2001. Simple recurrent networks learn context-free and context-sensitive languages by counting. Neural Computation, 13(9), pp. 2093-2118. doi:10.1162/089976601750399326.
3. Rizvi-Martel, M., Lizaire, M., Lacroce, C. and Rabusseau, G., 2024. Simulating weighted automata over sequences and trees with transformers. In International Conference on Artificial Intelligence and Statistics (pp. 2368-2376). PMLR.
4. Merrill, W., 2021. Formal language theory meets modern NLP. arXiv preprint arXiv:2102.10094.
5. Delétang, G., Ruoss, A., Grau-Moya, J., Genewein, T., Wenliang, L.K., Catt, E., Cundy, C., Hutter, M., Legg, S., Veness, J. and Ortega, P.A., 2022. Neural networks and the Chomsky hierarchy. arXiv preprint arXiv:2207.02098.
6. Balle, B., Carreras, X., Luque, F.M. et al., 2014. Spectral learning of weighted automata. Machine Learning, 96, pp. 33-63. https://doi.org/10.1007/s10994-013-5416-x.