D-130 | Analysis of Semantic Bias in ChatGPT: Analyzing Generated Bias Across the Model’s Layers

SAN 2024 Annual Meeting

Theoretical and Computational Neuroscience
Author: María Belén Paez | Email: bpaez2@gmail.com


María Belén Paez, Facundo Totaro, Julieta Laurino, Laura Kaczer, Juan Kamienkowski¹,²,⁴, Bruno Bianchi¹,²

1. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación. Buenos Aires, Argentina.
2. CONICET-Universidad de Buenos Aires. Instituto de Ciencias de la Computación (ICC). Buenos Aires, Argentina.
3. Universidad de Buenos Aires. Departamento de Fisiología, Biología Molecular y Celular. Buenos Aires, Argentina.
4. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Maestría en Explotación de Datos y Descubrimiento del Conocimiento. Buenos Aires, Argentina.

The assignment of meaning to a word is a key process in language comprehension. However, since words can have more than one meaning (i.e., homonymy and polysemy), this process involves additional complexity. Nevertheless, both our brain and state-of-the-art Language Models (LMs) can assign a meaning to each word as it is processed, taking into account the context in which it appears. The aim of the present study is to understand whether the mechanisms used by the brain and by LMs to solve this task are analogous. Current LMs process text by executing a series of transformations sequentially: words are converted to embeddings (i.e., vectorized representations) that are progressively modified, incorporating information about the surrounding context, as they pass through the model's layers. Our goal is to compare human meaning-assignment behavior (measured in an online experiment) with the model's neural representations across layers, focusing on how the embeddings of ambiguous words vary under several biasing contexts. We measured the model's bias as the cosine distance between the embedding of a meaning and the contextualized embedding of the ambiguous word at a given layer. Our results show a non-linear progression of the bias across layers. This is in line with previous work and suggests that middle and final layers process different aspects of the input texts.
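The bias metric described above can be sketched in a few lines. The following is a minimal illustration, not the study's actual code: function names, array shapes, and the toy vectors are assumptions for exposition, and obtaining the per-layer contextualized embeddings (e.g., from a transformer's hidden states) is left outside the snippet.

```python
import numpy as np

def cosine_distance(u, v):
    # Cosine distance = 1 - cosine similarity between two vectors.
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def layerwise_bias(contextual_embeddings, meaning_embedding):
    """Bias of an ambiguous word toward one meaning, per layer.

    contextual_embeddings: array of shape (n_layers, dim), the ambiguous
        word's embedding after each layer under a given biasing context.
    meaning_embedding: array of shape (dim,), a reference embedding for
        one of the word's meanings.
    Returns one cosine distance per layer; a lower value indicates a
    stronger bias toward that meaning.
    """
    return np.array([cosine_distance(layer_emb, meaning_embedding)
                     for layer_emb in contextual_embeddings])

# Toy example with 2-D embeddings and two layers: the first layer's
# representation points toward the meaning, the second away from it.
layers = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
meaning = np.array([1.0, 0.0])
bias_curve = layerwise_bias(layers, meaning)  # distance per layer
```

Plotting `bias_curve` against the layer index is what reveals whether the biasing progresses linearly or not across the model's depth.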
