Exact Decoding of Phrase-based Translation Models through Lagrangian Relaxation
Yin-Wen Chang (MIT), Michael Collins (Columbia University)
EMNLP 2011 reading
About the presenter
• Name: Yoh Okuno
• Software Engineer at a Web company
• Interests: NLP, Machine Learning, Data Mining
• Skills: C/C++, Python, Hadoop, etc.
• Weblog: http://d.hatena.ne.jp/nokuno/
Decoding in Phrase-based SMT
• Decoding in SMT is NP-hard
  – Approximate search: beam search
  – Exact search: ILP (Integer Linear Programming)
• The paper proposes Lagrangian relaxation combined with an efficient dynamic program
Phrase-based SMT Model
• Reordering makes the problem complicated
• Uses a 3-gram language model
• Scoring function (LM term + translation terms + distortion terms):

$$f(y) = h(e(y)) + \sum_{k=1}^{L} g(p_k) + \sum_{k=1}^{L-1} \eta\,\delta(t(p_k), s(p_{k+1}))$$

where the output is $y = \langle p_1 p_2 \ldots p_L \rangle$, $x$ is the input sentence, each phrase is $p_k = (s, t, e)$, $\eta$ is a negative constant, and the distortion is $\delta(t, s) = |t + 1 - s|$.
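A minimal Python sketch of this scoring function, assuming toy stand-in callables `h` (language model) and `g` (phrase score); the names and representation are illustrative, not the paper's implementation:

```python
from typing import Callable, List, Tuple

# A phrase p_k = (s, t, e): source span [s, t] and tuple of target words e.
Phrase = Tuple[int, int, Tuple[str, ...]]

ETA = -1.0  # the negative distortion constant eta (illustrative value)

def distortion(t: int, s: int) -> int:
    """delta(t, s) = |t + 1 - s|: jump between consecutive phrases."""
    return abs(t + 1 - s)

def f(y: List[Phrase],
      h: Callable[[List[str]], float],
      g: Callable[[Phrase], float]) -> float:
    """Score a derivation y = <p_1 ... p_L>: LM + translation + distortion."""
    e_words = [w for (_, _, e) in y for w in e]
    score = h(e_words)                              # language model term h(e(y))
    score += sum(g(p) for p in y)                   # translation terms g(p_k)
    score += sum(ETA * distortion(y[k][1], y[k + 1][0])
                 for k in range(len(y) - 1))        # distortion terms
    return score
```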
Decoding with constraints
• Our goal: solve $\arg\max_{y \in Y} f(y)$
• Define $y(i)$ = number of times input word $x_i$ is translated in $y$
1. Each word in the input is translated exactly once: $y(i) = 1$ for all $i$
2. Distortion limit: $\delta(t(p_k), s(p_{k+1})) \le d$
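A small sketch of checking these two constraints for a candidate derivation, following the phrase representation in the sketch above; `d` is the distortion limit (assumed helper, not from the paper):

```python
def y_counts(y: List[Phrase], N: int) -> List[int]:
    """y(i): number of times input word i is translated in derivation y."""
    counts = [0] * N
    for (s, t, _) in y:
        for i in range(s, t + 1):
            counts[i] += 1
    return counts

def is_feasible(y: List[Phrase], N: int, d: int) -> bool:
    once = all(c == 1 for c in y_counts(y, N))          # constraint 1
    dist_ok = all(abs(y[k][1] + 1 - y[k + 1][0]) <= d   # constraint 2
                  for k in range(len(y) - 1))
    return once and dist_ok
```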
Exact dynamic programming
• Use states $(w_1, w_2, b, r)$
• $w_1, w_2$: trigram language-model context words
• $b$: bit string recording which input words have been translated
• $r$: end position of the previous phrase
Exact dynamic programming
• Yet it is intractable: the bit string $b$ alone takes $2^N$ values for an $N$-word input, so the state space grows exponentially in sentence length
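A quick back-of-the-envelope check of that blow-up (illustrative numbers only):

```python
# The bit-string component b alone ranges over 2^N subsets of input positions,
# before even counting the (w1, w2, r) components of the state.
for N in (10, 25, 40):
    print(N, 2 ** N)   # 10 -> 1024, 25 -> 33554432, 40 -> ~1.1e12
```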
Decoding based on Lagrangian Relaxation
• Consider a broader set $Y' \supseteq Y$ and solve $\arg\max_{y \in Y'} f(y)$
• $Y'$ uses a looser constraint: $\sum_{i=1}^{N} y(i) = N$
• That is, $N$ words are translated in total, but an individual word may be translated several times or not at all
Efficient Dynamic Programming
• Use states $(w_1, w_2, n, r)$
  – or $(w_1, w_2, n, l, m, r)$ with the distortion limit
• $n$: number of translated words
• $(l, m)$: range of the previously translated words
• Transition = one phrase translation $p_k = (s, t, e)$ (see the transition slide at the end)
Applying Lagrangian Relaxation
• Solve the relaxed problem plus the original constraints:

$$\arg\max_{y \in Y'} f(y) \quad \text{such that} \quad \forall i,\; y(i) = 1$$

• Apply the Lagrangian method:

$$L(u, y) = f(y) + \sum_i u(i)\,(y(i) - 1)$$

• Dual objective and dual problem:

$$\min_u L(u) = \min_u \max_{y \in Y'} L(u, y)$$
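A sketch of evaluating the Lagrangian for a candidate derivation, reusing the hypothetical `y_counts` helper and the scoring sketch `f` from the earlier slides (illustrative, not the paper's code):

```python
def lagrangian(u: List[float], y: List[Phrase], f_score, N: int) -> float:
    """L(u, y) = f(y) + sum_i u(i) * (y(i) - 1)."""
    counts = y_counts(y, N)
    return f_score(y) + sum(u[i] * (counts[i] - 1) for i in range(N))
```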
Decoding by subgradient method
Intuitive interpretation
• The Lagrange multiplier $u(i)$ penalizes or rewards translating input word $i$, pushing it toward being translated exactly once
• Update: $u_t(i) = u_{t-1}(i) - \alpha_t (y_t(i) - 1)$
  – Decrease $u(i)$ if $y(i) > 1$
  – Increase $u(i)$ if $y(i) = 0$
  – Do nothing if $y(i) = 1$
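A minimal sketch of the full subgradient loop, assuming a hypothetical `relaxed_decode(u)` oracle that runs the efficient DP to return $\arg\max_{y \in Y'} L(u, y)$ (not implemented here); the step-size schedule is one common choice, not necessarily the paper's:

```python
from typing import Callable, Optional

def subgradient_decode(relaxed_decode: Callable[[List[float]], List[Phrase]],
                       N: int, max_iters: int = 120) -> Optional[List[Phrase]]:
    u = [0.0] * N
    for t in range(1, max_iters + 1):
        y = relaxed_decode(u)                 # y^t = argmax_{y in Y'} L(u^{t-1}, y)
        counts = y_counts(y, N)
        if all(c == 1 for c in counts):       # feasible => provably exact solution
            return y
        alpha = 1.0 / t                       # decreasing step size (one choice)
        # u_t(i) = u_{t-1}(i) - alpha_t * (y_t(i) - 1)
        u = [u[i] - alpha * (counts[i] - 1) for i in range(N)]
    return None                               # no certificate within max_iters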
Example run
Input: dadurch können die qualität und die regelmäßige postzustellung auch weiterhin sichergestellt werden .
[Figure residue: intermediate solutions y^t from successive subgradient iterations, in which some words are translated twice and others not at all (e.g. "the quality and also the and the quality and also ..."), until the multipliers force a feasible solution]
Output: in that way, the quality and the regular distribution should continue to be guaranteed.
Experimental summary
• Language pair: German-to-English translation
• Corpus: Europarl data (1,824 sentences)
• The proposed method finds exact solutions on 99% of the sentences
• Average run time is 120 seconds
• Moses makes search errors on 4% to 18% of sentences
Table 1: iterations until convergence
• 97% of the examples converge within 120 iterations
Table 4: ILP/LP solvers are too slow
Table 5: Moses search errors
Table 7: BLEU does not improve
Conclusion
• Described an exact decoding algorithm for SMT using Lagrangian relaxation
• The proposed method finds exact solutions on 99% of the samples, within 120 seconds on average
• Future work: apply Lagrangian relaxation to training algorithms for SMT
Transition for DP
• Define a transition as one phrase translation $p_k = (s, t, e)$ with $e = e_1 \ldots e_M$:

$$(w_1, w_2, n, l, m, r) \rightarrow (w'_1, w'_2, n', l', m', r')$$

$$(w'_1, w'_2) = \begin{cases} (e_{M-1}, e_M) & \text{if } M > 1 \\ (w_2, e_1) & \text{if } M = 1 \end{cases}$$

$$n' = n + t - s + 1$$
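A sketch of this transition on the simpler $(w_1, w_2, n, r)$ state, assuming 0-indexed target word tuples as in the earlier sketches (illustrative, not the paper's code):

```python
def transition(state: Tuple[str, str, int, int], phrase: Phrase):
    """Extend a DP state by translating one phrase p_k = (s, t, e)."""
    (w1, w2, n, r) = state
    (s, t, e) = phrase
    M = len(e)
    if M > 1:
        w1_new, w2_new = e[M - 2], e[M - 1]   # (e_{M-1}, e_M) in 1-indexed terms
    else:
        w1_new, w2_new = w2, e[0]             # (w2, e_1)
    n_new = n + t - s + 1                     # add the phrase's source length
    return (w1_new, w2_new, n_new, t)         # r' = t, end of the new phrase
```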