
Tokenize text with both American and British spellings

Discussion in 'Computer Science' started by user3259111, Oct 8, 2018.


    user3259111 Guest

    I need to tokenize a corpus of abstracts from an international conference. Most abstracts are written in American English, but some are in British English.

    Consequently, I get two different tokens for “organization” and “organisation”, or for “color” and “colour”. Examples: https://en.oxforddictionaries.com/spelling/british-and-spelling

    Do you know of a (Python) library that converts British English to American English (or vice versa)?

    I would be happy to do it myself, but I am French and my English is not very good.
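    If no ready-made library turns up, one possible workaround is to normalize spellings right after tokenization, so that "organisation" and "organization" collapse into a single token. This is only a minimal sketch: the `BRITISH_TO_AMERICAN` table and the `tokenize`/`normalize` helpers are hypothetical names made up for illustration, and the mapping is deliberately tiny; a real corpus would need a much larger, curated word list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny British -> American mapping (illustration only).
# A real corpus would need a much larger, curated list of spelling pairs.
BRITISH_TO_AMERICAN = {
    "organisation": "organization",
    "organisations": "organizations",
    "colour": "color",
    "colours": "colors",
    "behaviour": "behavior",
    "analyse": "analyze",
    "centre": "center",
    "modelling": "modeling",
}

# Simple tokenizer: runs of lowercase letters only (digits and hyphens split tokens).
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


def normalize(tokens, mapping=BRITISH_TO_AMERICAN):
    """Replace each known British spelling with its American form; keep other tokens."""
    return [mapping.get(tok, tok) for tok in tokens]


if __name__ == "__main__":
    abstract = "The organisation studied colour; another organization studied color."
    counts = Counter(normalize(tokenize(abstract)))
    print(counts["organization"], counts["color"])  # prints: 2 2
```

    An explicit lookup table is used here instead of suffix rules such as "-our" -> "-or", because such rules over-apply: they would turn "hour" into "hor" and "four" into "for".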


