On the Linguistic and Social Aspects of Internet Slang

Created at 4pm, Jan 2

erhant

Culture

Contract ID

hsu48tl5lPxNR-lZufLsNdSGamScONk8PDsZz-41eDc

File Type

PDF

Entry Count

Embed. Model

jina_embeddings_v2_base_en

Index Type

hnsw

Kulkarni, V., & Wang, W. Y. (2017). Tfw, damngina, juvie, and hotsie-totsie: On the linguistic and social aspects of internet slang. arXiv preprint arXiv:1712.08291.

[Figure: normalized confusion matrices, true label vs. predicted label, over the classes ALP, BLE, CLI and RED]

(a) Confusion Matrix Gold

id: a40b9fcc051719fbec27ec777a1cca6e - page: 6

(b) Confusion Matrix 600BUD

Figure 7: Performance of our character n-gram model on the gold and 600BUD test sets in the closed class setting. Note the good performance on all classes. (ALP: Alphabetisms, BLE: Blends, CLI: Clippings, REDUP: Reduplicatives.)

... score could be (a) the maximum probability over the known classes or (b) the negative entropy of the output probability distribution. Table 3 shows some of the top words detected by our model for each category on UrbanDictionary data. Our method effectively identifies instances of each class while also rejecting instances that do not belong to any of the four classes. We identify slang like E.V.I.L and S.P.E.W as Alphabetisms and detect Blends like Iretalian:Irish+Italian or Obamerica:Obama+America. Similarly, our model is able to detect Clippings like Stevie (Steven) and Bishie (bishounen), and Reduplicatives like hooty-hoo.

id: 8008d1c8ade1cc7dbb114cb84a1e22cf - page: 6
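The two candidate rejection scores mentioned above, maximum class probability and negative entropy, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `CLASSES` list, and the threshold values are hypothetical, and the input is assumed to be a probability distribution over the four known classes.

```python
import math

# Hypothetical label order for the four known classes.
CLASSES = ["ALP", "BLE", "CLI", "REDUP"]


def max_probability_score(probs):
    """Score (a): the largest probability assigned to any known class."""
    return max(probs)


def negative_entropy_score(probs):
    """Score (b): negative Shannon entropy, sum of p * log(p).

    Close to 0 for a peaked distribution, -log(K) for a uniform one.
    """
    return sum(p * math.log(p) for p in probs if p > 0.0)


def classify_or_reject(probs, threshold, score_fn=max_probability_score):
    """Return the most probable class, or None when the score is below threshold.

    The threshold is an assumed tuning parameter, not a value from the paper.
    """
    if score_fn(probs) < threshold:
        return None
    best = max(range(len(probs)), key=lambda i: probs[i])
    return CLASSES[best]


if __name__ == "__main__":
    confident = [0.94, 0.02, 0.02, 0.02]
    uncertain = [0.25, 0.25, 0.25, 0.25]
    print(classify_or_reject(confident, threshold=0.7))
    print(classify_or_reject(uncertain, threshold=0.7))
```

Both scores are high when the classifier is confident and low when the probability mass is spread out, so a single threshold on either one can separate words of a known class from words that should be rejected.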

We also evaluated our model quantitatively in a closed class setting on a balanced, manually created test sample of 600 words from the UrbanDictionary dataset (600BUD), on which we obtained a weighted F1-score of 86% (see Figure 7b for the confusion matrix on 600BUD). Finally, we evaluated our model in the open class setting using cross-class validation, which yields a mean weighted F1 score of 66.43, implying that our model generalizes reasonably well to this open set recognition setting as well.

id: 93e9239070ed73a8e1ad77dcdfec238d - page: 6

Category: Word
Alphabetisms: A.D.E.D, E.V.I.L, S.P.E.W, C.H.U.D, S.E.A.L, S.W.I.M
Blends: Iretalian (Ireland+Italian), Obamerica (Obama + America), Metapedia (Meta + Encyclopedia), Obroxious (Obnoxious+Rock), Cumbrella (Cum + Umbrella), Rainmelon (probably c.f No Rain + Blind Melon)
Clippings: stevie (Steven), Bishie (bishounen), cuttie (cutback), hattie, cottie
Reduplicatives: e-d-b-t-z, yu-gay-ho, yu-gay-oh, bug-a-boo, Roody-poo, hooty-hoo
Rejected: Darwin, edging, Pingo, Oil-Can, Wet-Seal, Flank, Baking

Table 3: Examples of top words detected by our model for each class on the UrbanDictionary dataset, along with words our algorithm correctly rejected. Observe that we can correctly identify labels for interesting words like Iretalian, obroxious and cuttie.

id: 6391e03f7ecafaf81d742d2bd7ccf873 - page: 6
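The weighted F1 scores reported above are averages of per-class F1 weighted by class support. As a rough illustration of how such a figure can be derived from a confusion matrix of counts, the sketch below computes it in plain Python. The function name and the toy matrix are made up for illustration; they are not the paper's data or code.

```python
def weighted_f1(confusion):
    """Support-weighted mean of per-class F1.

    `confusion[i][j]` is the number of items with true class i that were
    predicted as class j (rows are true labels, columns are predictions).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    score = 0.0
    for k in range(n):
        true_positive = confusion[k][k]
        support = sum(confusion[k])                       # items truly in class k
        predicted = sum(confusion[r][k] for r in range(n))  # items predicted as k
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / support if support else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        score += f1 * support / total
    return score


if __name__ == "__main__":
    # Toy two-class counts, invented for this example.
    toy = [[8, 2],
           [1, 9]]
    print(round(weighted_f1(toy), 4))
```

On a balanced sample such as 600BUD, every class has the same support, so the weighted F1 coincides with the plain (macro) average of the per-class F1 scores.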

How to Retrieve?

# Search

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "hsu48tl5lPxNR-lZufLsNdSGamScONk8PDsZz-41eDc", "query": "What is alexanDRIA library?"}'
        
# Query

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "hsu48tl5lPxNR-lZufLsNdSGamScONk8PDsZz-41eDc", "level": 2}'