Automata in Space: Formal Language Theory meets Neural Computation
PDF: https://josellarena.github.io/assets/doc/ais-paper-v4.pdf
Code: https://github.com/JoseLlarena/ais
BibTeX Citation:
@unpublished{llarena2024automata,
  author = {Llarena, J.},
  title  = {Automata in Space: Formal Language Theory meets Neural Computation},
  note   = {https://josellarena.github.io/assets/doc/ais-paper-v4.pdf},
  year   = {2024}
}
Automata in Space presents the main piece of work I’ve done in the last couple of years, laying the foundations of what I expect to be my core research topic moving forward. It sits at the intersection of Formal Language Theory, Computational Psycho-/Neuro-/Linguistics and Machine Learning.
The heart of my approach is to use weighted automata to understand neural computation. The use of automata is not new (research on recurrent neural networks drew on them as early as the 1980s1 2), but it is only recently that researchers have used them to understand non-recurrent architectures like Transformers3 and to draw stronger theoretical links between Formal Language Theory and NNs4, with extensive empirical studies mapping out the strengths and limits of different architectures5.
My work contrasts with existing research in that I:
- use the linear representation of Weighted Finite State Automata (WFSAs)6 as mathematically sound, normative reverse-engineering models against which to compare the hidden space of neural networks (a minimal sketch follows this list); vs the implicit or ad-hoc approaches often found in the literature
- set the scope to the study of all distributed representations of linguistic capabilities; vs specific architectures or general tasks
- tackle the study of neural representation mechanistically, treating models as white boxes; vs extracting automata, proving expressivity limits or establishing empirical equivalences
- favour a geometric, computational-state view, making heavy use of 3-dimensional state diagrams; vs feature-based analysis using 2D plots
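To make the first contrast concrete, here is a minimal sketch of the linear representation of a WFSA, using PARITY (the language of bit strings with an even number of 1s, mentioned further down) as the example. The states, matrices and function names are illustrative assumptions, not code from the paper or the linked repository.

```python
# A minimal sketch (not the paper's code) of the linear representation of a WFSA,
# using PARITY over {0, 1}: a string is accepted iff it has an even number of 1s.
import numpy as np

# States: q0 = "even number of 1s so far", q1 = "odd number of 1s so far".
alpha = np.array([1.0, 0.0])            # initial weights: start in q0
omega = np.array([1.0, 0.0])            # final weights: accept only in q0

# One transition matrix per symbol; entry [i, j] is the weight of moving i -> j.
A = {
    "0": np.array([[1.0, 0.0],          # reading '0' keeps the parity
                   [0.0, 1.0]]),
    "1": np.array([[0.0, 1.0],          # reading '1' flips the parity
                   [1.0, 0.0]]),
}

def weight(string: str) -> float:
    """Weight of a string: alpha^T · A[s1] · ... · A[sn] · omega."""
    state = alpha
    for symbol in string:
        state = state @ A[symbol]       # forward state vector after each prefix
    return float(state @ omega)

for s in ["", "0", "1", "1010", "111"]:
    print(f"{s!r:>8} -> {weight(s)}")   # 1.0 for even parity, 0.0 for odd
```

The point of the linear form is that the automaton’s state after each prefix is an explicit vector, which is what makes it usable as a normative model for a network’s hidden space.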
This work:

1. limits the study to tasks that can be represented as regular languages (like XOR/PARITY). I plan to lift that restriction by employing Weighted Pushdown Automata, extending the scope to context-free languages. It also restricts its remit to classification and language modelling, a limitation I plan to overcome using Weighted Finite State Transducers. See the “Conclusion and Future Work” section in the paper for other limitations of this technique and the future research paths they suggest.
2. reflects my long-standing interest in how the higher-level organisation of language emerges from a lower-level non-symbolic substrate.
3. is an attempt to cut through the historically protracted arguments about whether a computer program is or isn’t a good model for human performance, and, if not, how the human “algorithm” differs from the one learned by machines.
4. stems from a strongly held belief that neural computation is best causally explained in terms of a series of steps that (fail to) solve the task at hand, using well-understood computational models (i.e. sequential automata), as opposed to the more common approach of comparing hidden representations of inputs (a hypothetical sketch of such a step-wise comparison follows this list).
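To illustrate the kind of step-wise comparison the last point argues for, here is a hypothetical sketch (not the paper’s method): align a model’s hidden-state trajectory with the WFSA’s forward-state trajectory via a least-squares map, rather than comparing only the representations of whole inputs. The toy “hidden states” below are synthetic stand-ins for a real network’s activations.

```python
# A hypothetical sketch of a step-wise comparison: fit a linear map from a
# model's hidden-state trajectory to a WFSA's forward-state trajectory and
# report the residual. The "hidden" array here is a synthetic stand-in.
import numpy as np

def forward_states(alpha, A, string):
    """WFSA forward state vector after each prefix of the string."""
    state, states = alpha, [alpha]
    for symbol in string:
        state = state @ A[symbol]
        states.append(state)
    return np.stack(states)

def trajectory_fit(hidden, automaton):
    """Relative residual of the best linear map from hidden to automaton states."""
    W, *_ = np.linalg.lstsq(hidden, automaton, rcond=None)
    return np.linalg.norm(hidden @ W - automaton) / np.linalg.norm(automaton)

# Toy example: hidden states that are a randomly projected, noisy copy of the
# PARITY WFSA trajectory, so the step-wise fit should be close to perfect.
alpha = np.array([1.0, 0.0])
A = {"0": np.eye(2), "1": np.array([[0.0, 1.0], [1.0, 0.0]])}
auto = forward_states(alpha, A, "101101")
rng = np.random.default_rng(0)
hidden = auto @ rng.standard_normal((2, 8)) + 0.01 * rng.standard_normal((len(auto), 8))
print(f"relative residual: {trajectory_fit(hidden, auto):.3f}")  # small => step-wise match
```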
If you are interested in this line of research, do get in touch through email, LinkedIn, Mastodon or Bluesky. Looking forward to hearing your thoughts.
References
1. Servan-Schreiber, D., Cleeremans, A. and McClelland, J., 1988. Learning sequential structure in simple recurrent networks. Advances in Neural Information Processing Systems, 1.
2. Rodriguez, P., 2001. Simple recurrent networks learn context-free and context-sensitive languages by counting. Neural Computation, 13(9), pp. 2093-2118. doi:10.1162/089976601750399326.
3. Rizvi-Martel, M., Lizaire, M., Lacroce, C. and Rabusseau, G., 2024. Simulating weighted automata over sequences and trees with transformers. In International Conference on Artificial Intelligence and Statistics (pp. 2368-2376). PMLR.
4. Merrill, W., 2021. Formal language theory meets modern NLP. arXiv preprint arXiv:2102.10094.
5. Delétang, G., Ruoss, A., Grau-Moya, J., Genewein, T., Wenliang, L.K., Catt, E., Cundy, C., Hutter, M., Legg, S., Veness, J. and Ortega, P.A., 2022. Neural networks and the Chomsky hierarchy. arXiv preprint arXiv:2207.02098.
6. Balle, B., Carreras, X., Luque, F.M. et al., 2014. Spectral learning of weighted automata. Machine Learning, 96, pp. 33-63. https://doi.org/10.1007/s10994-013-5416-x.