
Learning word embeddings by first learning character embeddings

Discussion in 'Computer Science' started by LLB, Sep 16, 2020.

  LLB (Guest):

    I was going through various papers on NLU (Natural Language Understanding) applications.

    There I noticed a common pattern: for word embeddings, the following three representations are combined (via concatenation or some other technique).

    1. static word embeddings (GloVe, word2vec)
    2. word embeddings learned from character embeddings
    3. contextual word embeddings

    Currently I am implementing one of those papers, and I want to learn word embeddings from character embeddings (I don't want to use pre-trained character embeddings like fastText).

    learn character embeddings --> learn word embeddings from character embeddings --> train my custom task

    So I am wondering: what is the best way to do this?

    Let's say my data looks like this:


    row-1: What a sunny day. Beautiful! I feel happy.

    Our task is to find an embedding for each word, derived from character embeddings. A minimal preprocessing sketch follows below.
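
    To make the setup concrete, here is a minimal preprocessing sketch (assuming PyTorch; names like `char2idx` are my own, not from any paper) that turns the example row into per-word character-index tensors:

```python
import torch

row = "What a sunny day. Beautiful! I feel happy."
words = row.split()  # naive whitespace tokenization, just for illustration

# Character vocabulary built from the data itself; index 0 is reserved for padding.
char2idx = {c: i + 1 for i, c in enumerate(sorted(set("".join(words))))}
PAD = 0

max_len = max(len(w) for w in words)
char_ids = torch.full((len(words), max_len), PAD, dtype=torch.long)
for wi, w in enumerate(words):
    for ci, c in enumerate(w):
        char_ids[wi, ci] = char2idx[c]

print(char_ids.shape)  # torch.Size([8, 10]) for this row
```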

    approach-1] use a character-level CNN (Conv1D)


    • Here the input features are just the unique characters in our data.


    • I am assuming the result of this step will be that I have an embedding for each character.


    • Then I will go word by word, e.g. 'sunny' --> 's'+'u'+'n'+'n'+'y', i.e. take the sum/max/avg of the character embeddings to get a word embedding.

      However, this last step doesn't sound correct with respect to backpropagation; it feels disconnected from the model chain. What am I missing here? (See the sketch after this list.)
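
    For reference, here is how I would wire approach-1 end-to-end, as a hedged sketch (PyTorch assumed; it reuses `char2idx` and `char_ids` from the preprocessing snippet, and the dimensions are arbitrary). The point it illustrates: the pooling is just another differentiable operation in the graph, so gradients from the task loss flow back through the pooled word vectors into the Conv1D filters and the character embedding table.

```python
import torch
import torch.nn as nn

class CharCNNWordEmbedder(nn.Module):
    def __init__(self, num_chars, char_dim=16, word_dim=64, kernel=3):
        super().__init__()
        # Learned character embedding table; the padding index stays at zero.
        self.char_emb = nn.Embedding(num_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, word_dim, kernel_size=kernel, padding=1)

    def forward(self, char_ids):             # (num_words, max_word_len)
        x = self.char_emb(char_ids)          # (num_words, max_len, char_dim)
        x = x.transpose(1, 2)                # Conv1d expects (N, channels, length)
        x = torch.relu(self.conv(x))         # (num_words, word_dim, max_len)
        return x.max(dim=2).values           # max-pool over characters -> word vectors

embedder = CharCNNWordEmbedder(num_chars=len(char2idx) + 1)
word_vecs = embedder(char_ids)               # (num_words, 64), fully differentiable
```

    If this module feeds the custom task's loss, backpropagation updates the character embeddings, the convolution filters, and everything downstream in one chain, so nothing is actually disconnected.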

    approach-2] Use a bidirectional LSTM, something like the setup described in this blog:

    [Image: a character-level BiLSTM run over the characters of a single word, with the final forward and backward hidden states concatenated into a word embedding]

    But a few things are not clear from this:


    1. "What a sunny day" Here how would I know word boundary? In above picture there was just 1 word so concatenation was easy.


    2. Since we are using the hidden state as the character embedding, the embedding for the same letter (e.g. 'c') will be different in two different words. Is my understanding correct? (A sketch covering both questions follows below.)
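
    Here is the corresponding hedged sketch for approach-2 (again PyTorch, reusing `char_ids`; hyperparameters are arbitrary). It also reflects how I currently understand my two questions: word boundaries come from tokenization before the LSTM, since each word's characters form a separate sequence in the batch; and while the embedding table entry for a letter is shared across all words, the hidden state after reading that letter depends on the surrounding characters, so it differs between words.

```python
import torch
import torch.nn as nn

class CharBiLSTMWordEmbedder(nn.Module):
    def __init__(self, num_chars, char_dim=16, hidden=32):
        super().__init__()
        self.char_emb = nn.Embedding(num_chars, char_dim, padding_idx=0)
        self.lstm = nn.LSTM(char_dim, hidden, bidirectional=True, batch_first=True)

    def forward(self, char_ids):                 # (num_words, max_word_len)
        x = self.char_emb(char_ids)              # (num_words, max_len, char_dim)
        _, (h_n, _) = self.lstm(x)               # h_n: (2, num_words, hidden)
        # Concatenate the final forward and backward hidden states per word.
        return torch.cat([h_n[0], h_n[1]], dim=1)  # (num_words, 2 * hidden)

embedder = CharBiLSTMWordEmbedder(num_chars=len(char2idx) + 1)
word_vecs = embedder(char_ids)                   # one vector per word
```

    (For simplicity this ignores padding inside words; a real implementation would use `torch.nn.utils.rnn.pack_padded_sequence` so the forward pass stops at each word's true length.)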
