Cognition, Behavior, and Memory
Author: Juan Octavio Castro | Email: joctavio287@gmail.com
Juan Octavio Castro1°2°, Joaquín E Gonzalez1°, Jazmín Vidal Dominguez1°, Pablo E Riera1°3°, Agustín Gravano4°5°, Juan E Kamienkowski1°3°6°
1° Laboratorio de Inteligencia Artificial Aplicada, Instituto de Ciencias de la Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires – CONICET, Argentina
2° Departamento de Física, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Argentina
3° Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Argentina
4° Laboratorio de Inteligencia Artificial, Universidad Torcuato Di Tella, Argentina; Escuela de Negocios, Universidad Torcuato Di Tella, Argentina
5° CONICET, Argentina
6° Maestría de Explotación de Datos y Descubrimiento del Conocimiento, Facultad de Ciencias Exactas y Naturales – Facultad de Ingeniería, Universidad de Buenos Aires, Argentina
Speech requires integrating phonetic, syntactic, semantic, and prosodic information in real time, and studying it in natural environments challenges traditional approaches to EEG analysis. In recent years, human neurophysiology studies have turned toward natural dynamic stimuli such as videos or natural speech, driven mostly by advances in signal processing, computational modeling, and machine learning. Techniques such as encoding models are key to separating the neural signal from movement artifacts, which inevitably arise from interactions with the environment, and they also allow the analysis of more complex stimuli. In recent work, we showed that these models perform well at predicting EEG signals from low-level attributes, such as the envelope or spectrogram, even during natural dialogues. In the present work, we aim to extend the study of low-level features (MFCCs, deltas) and gradually deepen the analysis toward higher-level attributes such as phonemes, phonological features, semantic properties of words, indicators of turn-taking, and leadership. Preliminary results show that models including these novel features outperform previous ones. Moreover, we plan to incorporate complex representations, mainly based on DNNs, such as wav2vec2 or x-vectors, to increase model performance, opening up new possibilities for investigating the interaction between perception and action and for working with increasingly less controlled stimuli.
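As an illustration of what an encoding model of this kind can look like, the sketch below fits a time-lagged ridge regression that predicts multichannel EEG from a single stimulus feature and scores it by the correlation between predicted and recorded signals on held-out data. It is a minimal sketch under stated assumptions, not the pipeline used in this work: the data are synthetic, and the function names, number of lags, number of channels, and regularization strength are all hypothetical choices made for the example.

```python
import numpy as np


def lagged_design(stimulus, n_lags):
    """Stack delayed copies of a 1-D stimulus feature into a (time, lags) matrix."""
    n_samples = stimulus.shape[0]
    design = np.zeros((n_samples, n_lags))
    for lag in range(n_lags):
        # Column `lag` holds the stimulus delayed by `lag` samples (zero-padded at the start).
        design[lag:, lag] = stimulus[: n_samples - lag]
    return design


def fit_ridge(design, eeg, alpha):
    """Closed-form ridge regression: weights = (X'X + alpha * I)^-1 X'Y."""
    n_features = design.shape[1]
    gram = design.T @ design + alpha * np.eye(n_features)
    return np.linalg.solve(gram, design.T @ eeg)


def pearson_per_channel(predicted, observed):
    """Pearson correlation between prediction and recording, one value per channel."""
    p = predicted - predicted.mean(axis=0)
    o = observed - observed.mean(axis=0)
    return (p * o).sum(axis=0) / np.sqrt((p ** 2).sum(axis=0) * (o ** 2).sum(axis=0))


# Synthetic data standing in for a speech envelope and a 4-channel EEG recording
# (assumption: real data would come from preprocessed audio and EEG).
rng = np.random.default_rng(0)
n_samples, n_lags, n_channels = 4000, 16, 4
envelope = np.abs(rng.standard_normal(n_samples))
envelope -= envelope.mean()
true_kernel = rng.standard_normal((n_lags, n_channels))
eeg = lagged_design(envelope, n_lags) @ true_kernel
eeg += 0.5 * rng.standard_normal(eeg.shape)

# Fit on the first half of the recording, evaluate on the held-out second half.
half = n_samples // 2
X_train = lagged_design(envelope[:half], n_lags)
X_test = lagged_design(envelope[half:], n_lags)
weights = fit_ridge(X_train, eeg[:half], alpha=1.0)
scores = pearson_per_channel(X_test @ weights, eeg[half:])
```

Richer feature sets such as MFCCs or phoneme indicators would enter the same scheme by concatenating one lagged block per feature along the column axis of the design matrix; in practice the regularization strength would be chosen by cross-validation rather than fixed.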