D-038 | Brain representation of complex speech attributes during natural dialogue

SAN 2024 Annual Meeting

Cognition, Behavior, and Memory
Author: Juan Octavio Castro | Email: joctavio287@gmail.com


Juan Octavio Castro¹,², Joaquín E. Gonzalez, Jazmín Vidal Dominguez, Pablo E. Riera¹,³, Agustín Gravano⁴,⁵, Juan E. Kamienkowski¹,³,⁶

1. Laboratorio de Inteligencia Artificial Aplicada, Instituto de Ciencias de la Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires – CONICET, Argentina
2. Departamento de Física, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Argentina
3. Departamento de Computación, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Argentina
4. Laboratorio de Inteligencia Artificial, Universidad Torcuato Di Tella, Argentina; Escuela de Negocios, Universidad Torcuato Di Tella, Argentina
5. CONICET, Argentina
6. Maestría de Explotación de Datos y Descubrimiento del Conocimiento, Facultad de Ciencias Exactas y Naturales – Facultad de Ingeniería, Universidad de Buenos Aires, Argentina

Speech requires integrating phonetic, syntactic, semantic and prosodic information in real time, and its study in natural environments challenges traditional approaches to EEG analysis. In recent years, human neurophysiology studies have turned toward natural dynamic stimuli such as videos or natural speech, largely driven by advances in signal processing, computational modeling and machine learning. Techniques such as encoding models are key to separating the neural signal from the movement artifacts that necessarily arise from interactions with the environment, and they also allow the analysis of more complex stimuli. In recent work, we showed that these models perform well even during natural dialogues when predicting EEG signals from low-level attributes such as the envelope or the spectrogram. In the present work, we aim to expand the set of low-level features (MFCCs and their deltas) and gradually deepen the analysis toward higher-level attributes such as phonemes, phonological features, semantic properties of words, and indicators of turn-taking and leadership. Preliminary results show that models including these novel features outperform previous ones. Moreover, we plan to incorporate more complex representations, mainly derived from deep neural networks such as wav2vec2 or x-vectors, to further increase model performance, opening up new possibilities for investigating the interaction between perception and action with increasingly less controlled stimuli.
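To make the encoding-model idea concrete, the sketch below shows a minimal, lagged linear encoding model (a TRF-style ridge regression) that predicts multichannel EEG from time-aligned speech features. It is an illustrative example only, not the authors' pipeline: the synthetic arrays, sampling rate, lag window, regularization strength, and function names are all assumptions made for the sake of the demonstration.

```python
# Minimal sketch of a linear (TRF-style) encoding model with ridge regression.
# All data here are synthetic stand-ins; dimensions and hyperparameters are illustrative.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
fs = 128                                         # assumed common sampling rate (Hz)
n_samples, n_channels, n_feats = 10 * fs, 8, 3   # toy dimensions
eeg = rng.standard_normal((n_samples, n_channels))    # stand-in for preprocessed EEG
features = rng.standard_normal((n_samples, n_feats))  # stand-in for speech features

def lag_matrix(x, lags):
    """Stack time-lagged copies of the feature matrix (zero-padded at the edges)."""
    n, d = x.shape
    out = np.zeros((n, d * len(lags)))
    for i, lag in enumerate(lags):
        shifted = np.roll(x, lag, axis=0)
        if lag > 0:
            shifted[:lag] = 0
        out[:, i * d:(i + 1) * d] = shifted
    return out

lags = np.arange(0, int(0.6 * fs))   # 0-600 ms window, chosen arbitrarily for the example
X = lag_matrix(features, lags)

# Cross-validated ridge regression, scored by the Pearson correlation between
# predicted and observed EEG in each channel (a common encoding-model metric).
scores = np.zeros(n_channels)
kf = KFold(n_splits=5)
for train, test in kf.split(X):
    model = Ridge(alpha=1000.0).fit(X[train], eeg[train])
    pred = model.predict(X[test])
    for ch in range(n_channels):
        scores[ch] += np.corrcoef(pred[:, ch], eeg[test, ch])[0, 1] / kf.get_n_splits()

print("mean prediction correlation per channel:", np.round(scores, 3))
```

In this framing, richer stimulus descriptions simply become additional columns of the feature matrix: MFCCs and their deltas, phoneme or phonological-feature indicators, word-level semantic properties, turn-taking markers, or embeddings extracted from pretrained models such as wav2vec2 could all be concatenated and passed through the same lagged regression.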
