[Paper Summary] SOM-DST: Efficient Dialogue State Tracking by Selectively Overwriting Memory
채채씨
2022. 2. 4. 19:11
SOM-DST
OVERVIEW
- Existing approaches
- Traditional neural DST approaches assume that all candidate slot-value pairs are given in advance, i.e., they perform predefined ontology-based DST
- Problems
- it is often difficult to obtain the ontology in advance, especially in a real scenario
- predefined ontology-based DST cannot handle previously unseen slot values
- the approach does not scale well, since it has to go over all slot-value candidates at every turn to predict the current dialogue state
- Solution
- Open vocabulary-based DST either directly generates or extracts a value from the dialogue context for every slot
- SOM-DST decomposes DST into two sub-tasks:
- (1) predicting the state operation on each of the memory slots (state operation prediction, which decides the types of the operations to be performed on each of the memory slots)
- (2) overwriting the memory with new values, of which only a few are generated according to the predicted state operations (slot value generation, which generates the values to be newly written on a subset of the memory slots)
- Roles of the encoder and decoder
- Our encoder, i.e., state operation predictor, can focus on selecting the slots to pass to the decoder so that the decoder, i.e., slot value generator, can focus only on generating the values of those selected slots
Selectively Overwriting Memory for Dialogue State Tracking
- Inputs and outputs of the overall architecture
- input : the previous turn dialogue utterances $D_{t-1}$, the current turn dialogue utterances $D_{t}$, and the previous dialogue state $\mathcal{B}_{t-1}$
- output : the current dialogue state $\mathcal{B}_{t}$
- Two sub-components
- State Operation Predictor : takes $D_{t-1}$, $D_{t}$, and $\mathcal{B}_{t-1}$ as the input and predicts the operations to perform on each of the slots
- Slot Value Generator : generates the values for the slots that take UPDATE as the predicted operation
- Problem setting
- Dialogue State
- $\mathcal{B}_{t}=\left\{\left(S^{j}, V_{t}^{j}\right) \mid 1 \leq j \leq J\right\}$ , at turn t
- a fixed-sized memory whose keys are slots $S^{j}$ and whose values are the corresponding slot values $V_{t}^{j}$, where $J$ is the total number of such slots
- ※ "slot" refers to the concatenation of a domain name and a slot name
- Special Value
- Special Value : NULL, DONTCARE
- NULL : no information is given about the slot up to the turn
- DONTCARE : the slot neither needs to be tracked nor is considered important in the dialogue at that time
- Operation
- $r_{t}^{j} \in \mathcal{O}=\{\text{CARRYOVER}, \text{DELETE}, \text{DONTCARE}, \text{UPDATE}\}$
- it either keeps the slot value unchanged (CARRYOVER) or changes it to some value different from the previous one (DELETE, DONTCARE, and UPDATE)
- $V_{t}^{j}= \begin{cases}V_{t-1}^{j} & \text { if } r_{t}^{j}=\text { CARRYOVER } \\ \text { NULL } & \text { if } r_{t}^{j}=\text { DELETE } \\ \text { DONTCARE } & \text { if } r_{t}^{j}=\text { DONTCARE } \\ v & \text { if } r_{t}^{j}=\text { UPDATE }\end{cases}$
- The UPDATE operation requires the slot value generator to generate a new value $v \notin\left\{V_{t-1}^{j}, \text{NULL}, \text{DONTCARE}\right\}$
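The case-by-case definition of $V_{t}^{j}$ above can be sketched as a small function that overwrites a fixed-size memory. This is a minimal illustration, not the authors' implementation: the function name `apply_operations`, the dictionary-based memory, and the slot names and values in the example are all assumptions made for the sketch.

```python
from typing import Dict

NULL = "NULL"
DONTCARE = "DONTCARE"


def apply_operations(
    prev_state: Dict[str, str],
    operations: Dict[str, str],
    generated: Dict[str, str],
) -> Dict[str, str]:
    """Build the current state from the previous state, one operation per slot.

    `generated` holds new values only for slots whose operation is UPDATE
    (in SOM-DST these would come from the slot value generator).
    """
    new_state = {}
    for slot, prev_value in prev_state.items():
        op = operations[slot]
        if op == "CARRYOVER":
            new_state[slot] = prev_value       # keep V_{t-1}^j unchanged
        elif op == "DELETE":
            new_state[slot] = NULL             # reset to the NULL special value
        elif op == "DONTCARE":
            new_state[slot] = DONTCARE         # set the DONTCARE special value
        elif op == "UPDATE":
            new_state[slot] = generated[slot]  # overwrite with a newly generated value v
        else:
            raise ValueError(f"unknown operation: {op}")
    return new_state


# Hypothetical slots and values, for illustration only.
prev_state = {"hotel-area": "north", "hotel-stars": "4", "taxi-destination": NULL}
operations = {
    "hotel-area": "CARRYOVER",
    "hotel-stars": "DELETE",
    "taxi-destination": "UPDATE",
}
generated = {"taxi-destination": "cambridge station"}
state = apply_operations(prev_state, operations, generated)
```

Only the UPDATE slot needs a generated value here, which is the point of the selective overwriting: the other slots are resolved from the predicted operation alone.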
State Operation Predictor
- Input Representation
- $D_{t}=A_{t} \oplus ; \oplus U_{t} \oplus[\mathrm{SEP}]$
- $A_{t}$ is the system response, $U_{t}$ is the user utterance
- ";" is a special token used to mark the boundary between $A_{t}$ and $U_{t}$
- "[SEP]" is a special token used to mark the end of a dialogue turn
- dialogue state representation : $B_{t}=B_{t}^{1} \oplus \ldots \oplus B_{t}^{J}$ , where $B_{t}^{j}=[\mathrm{SLOT}]^{j} \oplus S^{j} \oplus-\oplus V_{t}^{j}$
- "-" is a special token used to mark the boundary between a slot and a value
- "$[\mathrm{SLOT}]^{j}$" is a special token used to aggregate the information of the j-th slot-value pair into a single vector, like the use case of [CLS] token in BERT
- input : $X_{t}=[\mathrm{CLS}] \oplus D_{t-1} \oplus D_{t} \oplus B_{t-1}$
- the previous turn dialogue utterances + the current turn dialogue utterances + the previous turn dialogue state (which serves as an explicit, compact, and informative representation of the dialogue history)
- [CLS] is a special token added in front of every turn input
- The input to BERT is the sum of the embeddings of the input tokens $X_{t}$, segment id embeddings, and position embeddings.
- For the segment id, we use 0 for the tokens that belong to $D_{t-1}$ and 1 for the tokens that belong to $D_{t}$ or $B_{t-1}$
- The position embeddings follow the standard choice of BERT
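The input construction described above can be sketched with plain token lists. This is a rough illustration under several assumptions: whitespace splitting stands in for the BERT tokenizer, the helper names (`build_turn`, `build_state`, `build_input`) are hypothetical, the example utterances and slot are made up, and assigning segment id 0 to [CLS] is a choice made here that the summary does not specify.

```python
from typing import Dict, List, Tuple


def build_turn(system: str, user: str) -> List[str]:
    """D_t = A_t + ';' + U_t + '[SEP]' (whitespace tokens stand in for WordPiece)."""
    return system.split() + [";"] + user.split() + ["[SEP]"]


def build_state(state: Dict[str, str]) -> List[str]:
    """B_t = B_t^1 + ... + B_t^J, where each B_t^j = '[SLOT]' + S^j + '-' + V_t^j."""
    tokens: List[str] = []
    for slot, value in state.items():
        tokens += ["[SLOT]"] + slot.split() + ["-"] + value.split()
    return tokens


def build_input(
    prev_turn: List[str], curr_turn: List[str], prev_state: List[str]
) -> Tuple[List[str], List[int]]:
    """X_t = '[CLS]' + D_{t-1} + D_t + B_{t-1}, plus one segment id per token."""
    tokens = ["[CLS]"] + prev_turn + curr_turn + prev_state
    # Segment 0 for D_{t-1} (and, by assumption, [CLS]); segment 1 for D_t and B_{t-1}.
    segment_ids = [0] * (1 + len(prev_turn)) + [1] * (len(curr_turn) + len(prev_state))
    return tokens, segment_ids


# Hypothetical dialogue, for illustration only.
d_prev = build_turn("", "i need a hotel")
d_curr = build_turn("which area ?", "the north please")
b_prev = build_state({"hotel area": "NULL"})
tokens, segment_ids = build_input(d_prev, d_curr, b_prev)
```

In a real model each [SLOT] token would be a distinct position whose encoder output represents the j-th slot-value pair, so the positions of the [SLOT] tokens would also need to be recorded.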
- Encoder Output
- $H_{t} \in \mathbb{R}^{\left|X_{t}\right| \times d}$, and $h_{t}^{[\mathrm{CLS}]}, h_{t}^{[\mathrm{SLOT}]^{j}} \in \mathbb{R}^{d}$
- $h_{t}^{X}$ : sequence representation of the entire input $X_{t}$
- $h_{t}^{X}=\tanh \left(W_{\text {pool }} h_{t}^{[\mathrm{CLS}]}\right)$
- State Operation Prediction
- $P_{\text {opr }, t}^{j} \in \mathbb{R}^{|\mathcal{O}|}$ : the probability distribution over operations for the j-th slot at turn t
- $P_{o p r, t}^{j}=\operatorname{softmax}\left(W_{o p r} h_{t}^{[\text {SLOT }]^{j}}\right)$
- $|\mathcal{O}|=4$, because $\mathcal{O}=\{\text{CARRYOVER}, \text{DELETE}, \text{DONTCARE}, \text{UPDATE}\}$
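The pooling and operation-prediction formulas above amount to a linear map followed by tanh or softmax. The sketch below shows them in plain Python with toy numbers; the function names, the hidden size of 3, and the weight values are assumptions for illustration and are not trained parameters or the authors' code.

```python
import math
from typing import List, Tuple

OPERATIONS = ["CARRYOVER", "DELETE", "DONTCARE", "UPDATE"]


def matvec(W: List[List[float]], h: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pool(W_pool: List[List[float]], h_cls: List[float]) -> List[float]:
    """h_t^X = tanh(W_pool h_t^[CLS])."""
    return [math.tanh(z) for z in matvec(W_pool, h_cls)]


def predict_operation(
    W_opr: List[List[float]], h_slot: List[float]
) -> Tuple[str, List[float]]:
    """P_opr = softmax(W_opr h_t^[SLOT]^j); return the argmax operation and P_opr."""
    probs = softmax(matvec(W_opr, h_slot))
    best = max(range(len(OPERATIONS)), key=probs.__getitem__)
    return OPERATIONS[best], probs


# Toy values with d = 3; W_opr has |O| = 4 rows, one per operation.
W_opr = [
    [0.1, 0.0, 0.0],
    [0.0, 0.2, 0.0],
    [0.0, 0.0, 0.1],
    [1.0, 1.0, 1.0],
]
h_slot = [0.5, -0.2, 0.8]
op, probs = predict_operation(W_opr, h_slot)

W_pool = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
h_x = pool(W_pool, [0.5, -0.2, 0.8])
```

With these toy weights the UPDATE row produces the largest logit, so that slot would be passed on to the slot value generator.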
Slot Value Generator