A Machine Readable Sense Inventory for Emoji

Here you will find links to download the EmoSim508 dataset and our emoji embedding models for calculating emoji similarity. Please cite the following paper if you use the EmoSim508 dataset in your project.

Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran. A Semantics-Based Measure of Emoji Similarity. In 2017 IEEE/WIC/ACM International Conference on Web Intelligence (Web Intelligence 2017). Leipzig, Germany; 2017. [Kno.e.sis Library Page] | [PDF] | [BibTeX] | [EmoSim508 Dataset]

EmoSim508 Dataset - Emoji Pairs

Description This dataset consists of 508 emoji pairs. We used co-occurrence frequency to select the emoji pairs for this dataset: we selected the top-k emoji pairs that cover 25% of the emoji pairs in our Twitter corpus. These 508 emoji pairs contain 158 unique emoji.
URL http://emojinet.knoesis.org/emojipairs508.htm
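
The selection step described above can be sketched roughly as follows. This is a minimal illustration, not the code used to build EmoSim508: it assumes that "covers 25%" means the selected pairs account for 25% of all pair co-occurrences, that a pair co-occurs when both emoji appear in the same tweet, and that each tweet is already reduced to a list of its emoji. The function names `count_pairs` and `top_pairs_by_coverage` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_pairs(tweets):
    """Count unordered emoji pairs that appear together in the same tweet.

    `tweets` is an iterable of lists, each holding the emoji of one tweet
    (an assumed input format).
    """
    counts = Counter()
    for emoji_in_tweet in tweets:
        # Deduplicate and sort so (a, b) and (b, a) map to the same key.
        unique = sorted(set(emoji_in_tweet))
        counts.update(combinations(unique, 2))
    return counts


def top_pairs_by_coverage(counts, coverage=0.25):
    """Return the most frequent pairs whose counts reach `coverage` of the total."""
    total = sum(counts.values())
    selected, covered = [], 0
    for pair, freq in counts.most_common():
        if covered >= coverage * total:
            break
        selected.append(pair)
        covered += freq
    return selected
```

With this reading, k is not fixed in advance; it is whatever number of top pairs is needed to reach the coverage threshold.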

EmoSim508 Dataset - User Ratings

Description We used human annotators to assign a similarity score to each emoji pair in the EmoSim508 dataset. A total of ten annotators annotated this dataset.
URL http://emojinet.knoesis.org/emojipairs508_userstudy.htm
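
One simple way to turn the ten per-pair ratings into a single score is to average them. The sketch below assumes that aggregation and an in-memory dictionary of ratings; neither the aggregation method nor the data layout is stated on this page, and `aggregate_ratings` is a hypothetical helper.

```python
from statistics import mean, pstdev


def aggregate_ratings(ratings):
    """Summarize annotator scores per emoji pair.

    `ratings` maps an emoji pair to the list of scores given by the
    annotators (assumed layout). Returns, for each pair, the mean score
    and the population standard deviation as a rough spread indicator.
    """
    return {
        pair: {"mean": mean(scores), "spread": pstdev(scores)}
        for pair, scores in ratings.items()
    }
```

A low spread suggests the annotators largely agreed on a pair, while a high spread flags pairs whose similarity was judged inconsistently.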

Semantic Similarity of Emoji - Results

Description We learn emoji embeddings from Twitter and Google corpora and use them to encode the different emoji meaning representations available in EmojiNet. We learn 8 different emoji embedding models and use each embedding model to rank the emoji pairs by similarity.
Download URL http://emojinet.knoesis.org/emojipairs508_userstudy_embedding.htm
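
Ranking emoji pairs with an embedding model might look like the sketch below. It assumes cosine similarity as the similarity measure and a plain dictionary mapping each emoji to its vector; both are illustrative assumptions, and `cosine_similarity` and `rank_pairs` are hypothetical helpers rather than part of the released models.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def rank_pairs(pairs, embeddings):
    """Score each emoji pair and sort from most to least similar.

    `embeddings` maps an emoji to its vector (assumed layout).
    """
    scored = [
        (pair, cosine_similarity(embeddings[pair[0]], embeddings[pair[1]]))
        for pair in pairs
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Running `rank_pairs` once per embedding model yields one ranking per model, which can then be compared against the ranking implied by the user ratings.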