ECGSまとめ。

自然言語処理 LDA

Standard-CGS ・全ドキュメント、全単語の「出現回数」分処理をするので重たい。 ECGS-Shortcut ・全ドキュメントの単語は同じtopicに属するものと、ざっくり決める。・100回出現していても1回で決めれる。 ECGS-Dynamic ・全ドキュメントの単語は異なるtop…

2012-08-08

4.2 ECGS-Dynamic (4/4)

自然言語処理 LDA

Figure 5 shows the plot of E(|M|) and H(ˆk) for different values of Idi. One can observe that as the entropy increases, the expectation value increases as well for large values of Idi. Intuitively, when P(z|w) is close to an uniform distri…

2012-08-07

4.2 ECGS-Dynamic (3/4)

LDA 自然言語処理

To achieve a high sampling rate at the beginning, Γdi is initialized as [0, · · · , 0, γ] for every type, where γ is the damping factor and γ ∈ N+. まず最初に高いsampling rateを保持するために、全てのtypeに対してΓdiを[0,...0, γ ]で初期化す…

2012-08-06

4.2 ECGS-Dynamic (2/3)

自然言語処理 LDA

An illustrated comparison of different sampling procedures is provided in Figure 4. 図4に異なるsampling処理の比較図を示す。 Given for instance an 8-tokens document with 5 times “cat” and 3 times “dog”. The standard CGS samples 8 times for …

2012-08-06

4.2 ECGS-Dynamic (1/3)

自然言語処理 LDA

4.2 ECGS-Dynamic The intuition behind ECGS-Dynamic is to automatically adapt ¯ St over the iterations, where ¯ St ∈ [ Σd Nd/W, 1]. Loosely speaking, we start CGS with a high sampling rate (e.g. ¯ S1 = 1), such that every token is involved…

2012-08-05

4.1 ECGS-Shortcut

自然言語処理 LDA

4.1 ECGS-Shortcut To introduce our optimization strategies, we first define the sampling rate in CGS iteration t for type wdi as follows: 我々の最適化戦略を導入するために、CGS iteration tにおけるwdiに対するsampling rateを以下のように定義…

2012-08-05

4. ECGS Algorithm

自然言語処理 LDA

4. ECGS Algorithm In this section, two optimization strategies that form the ECGS algorithm are presented. The underling philosophy of the two strategies is the same: reducing the number of iterations Ndi in the 3rd loop. For the sake of c…

2012-08-05

文章における「単語の出現頻度」ベースで高速化、かなあ？

自然言語処理 LDA

http://home.in.tum.de/~xiaoh/pub/ecgs.html http://home.in.tum.de/~xiaoh/pub/ecgs_slides.pdf 次はこのあたりかなあ？出現回数が多いものに積極的に割り当てていく、的なものかな。 dog:5 cat:3 → だったら、dog:cat = 5:3で、と。

2012-08-04

FastLDAまとめ

自然言語処理 LDA

文書ごとにtopicはどうせ最後は集中するでしょ？だったら、0～K全部のtopicを順番に見なくてもいいじゃない。どうせ大体外れると。だったら、indx[Document][seq] にtopic参照すべき順番を入れておきましょう。例えば、t=3になる文章ならば、indx[] = { 3,…

2012-08-04

FastLDA つまり……どういうことだってばよ！？

自然言語処理 LDA

U = drand(); // draw indx = indx_dp[doc]; // point indx to current sort order U = 0.0～1.0の正規乱数。 indx は文章毎の「Z参照順番」 for (t_ = 0; t_ < T; t_++) { // for each possible topic (sorted order) t=indx[t_]; // t_ refers to sort #, …

2012-08-03

4.3 Fast LDA Algorithm

LDA 自然言語処理

4.3 Fast LDA Algorithm The sampling step for FastLDA begins with a sorted list of topics in descending order by Njk, the most popular topic for a document to the least popular. FastLDAのsampling stepは、まずNjkを逆順にするtopicのソート済…

2012-08-03

4.2 Refinement Sequence

自然言語処理 LDA

4.2 Refinement Sequence Finally, we must also consider the order in which the topics are evaluated. Execution time improves as the number of topics considered before we find the segment sk l containing u decreases. 最後に、我々はtopicを評…

2012-08-02

4.1 Upper Bounds for Z(2/2)

自然言語処理 LDA

H¨older’s inequality describes a class of bounds, for any valid choice of (p, q, r); these values are a design choice of the algorithm. A critical aspect in the choice of bounds is that it must be computationally efficient to maintain. In…

2012-08-02

4.1 Upper Bounds for Z(1/2)

自然言語処理 LDA

FastLDA depends on finding a sequence of improving bounds on the normalization constant, Z1 ≥ Z2 ≥ . . . ≥ ZK = Z. We first define Z in terms of component vectors ~a,~b,~c: FastLDAではZ1 ≥ Z2 ≥ . . . ≥ ZK = Zとなるような正規化定数において…

2012-08-02

Fast LDA(3/3)

自然言語処理 LDA

By organizing the segments in this way, we can obtain a substantial advantage: この方法でsegmentを組織化することで、以下のようなadvantageを得る事が出来る。 we can check each segments in order, knowing only p1 . . . pk and Zk, and if we fin…

2012-08-02

Fast LDA(2/3)

自然言語処理 LDA

Instead of having a single segment for topic k, of length pk/Z = p(zij = k|z¬ij , x, α, β), we instead have several segments skl . . . sk K associated with each topic. pk/Z の長さに基づくtopic kの１つのsegmentの代わりに、我々はそれぞれのt…

2012-08-02

Fast LDA(1/3)

自然言語処理 LDA

大体、CGSのLDAは挙動が解ってきたので、Fast LDAの理解に入る。 http://www.ics.uci.edu/~welling/publications/papers/rtp378-porteous.pdf 4. FAST LDA For most real data sets after several iterations of the Gibbs sampler, the probability mass o…

2012-08-01

ほそく、ほそく

LDA 自然言語処理

new_z = numpy.random.multinomial(1, p_z / p_z.sum()).argmax() multinomialは（第１引数）回だけ、（第二引数）確率の試行を繰り返し行って、その時の回数を求めるかな？で、その引数に対して最大値を求めれば「一番選択されうるもの」になると。

anobiidae's blog

個人的に自然言語処理とかに関して、調べ物したもののメモとかを残す場所。

2012-08-01から1ヶ月間の記事一覧

ECGSまとめ。

4.2 ECGS-Dynamic (4/4)

4.2 ECGS-Dynamic (3/4)

4.2 ECGS-Dynamic (2/3)

4.2 ECGS-Dynamic (1/3)

4.1 ECGS-Shortcut

4. ECGS Algorithm

文章における「単語の出現頻度」ベースで高速化、かなあ？

FastLDAまとめ

FastLDA つまり……どういうことだってばよ！？

4.3 Fast LDA Algorithm

4.2 Refinement Sequence

4.1 Upper Bounds for Z(2/2)

4.1 Upper Bounds for Z(1/2)

Fast LDA(3/3)

Fast LDA(2/3)

Fast LDA(1/3)

ほそく、ほそく