
Hits 1 – 19 of 19

1	Sentence Boundary Extraction from Scientific Literature of Electric Double Layer Capacitor Domain: Tools and Techniques
	Md. Saef Ullah Miah; Junaida Sulaiman; Talha Bin Sarwar; Ateeqa Naseer; Fasiha Ashraf; Kamal Zuhairi Zamli; Rajan Jose
	In: Applied Sciences; Volume 12; Issue 3; Pages: 1352 (2022)
	BASE

2	Free Software Tools for Computational Linguistics: An Overview ...
	Đurić D., Miloš. - : Zenodo, 2021
	BASE

3	Free Software Tools for Computational Linguistics: An Overview ...
	Đurić D., Miloš. - : Zenodo, 2021
	BASE

4	Free Software Tools for Computational Linguistics: An Overview ...
	Đurić D., Miloš. - : Zenodo, 2020
	BASE

5	Free Software Tools for Computational Linguistics: An Overview ...
	Đurić D., Miloš. - : Zenodo, 2020
	BASE

6	A Framework for the Eurasian Latin Archive using CLTK and NLTK ...
	Carbé, Emmanuela; Giannelli, Nicola. - : Zenodo, 2019
	BASE

7	A Framework for the Eurasian Latin Archive using CLTK and NLTK ...
	Carbé, Emmanuela; Giannelli, Nicola. - : Zenodo, 2019
	BASE

8	Survey on Sentiment Analysis Using Machine Learning ...
	Parth Deshmukh; Adesh Gadge; Aniket Ganbote. - : Zenodo, 2019
	BASE

9	Survey on Sentiment Analysis Using Machine Learning ...
	Parth Deshmukh; Adesh Gadge; Aniket Ganbote. - : Zenodo, 2019
	BASE

10	Evaluating morphosyntactic differences in narrative re-tell tasks between bilingual children with and without language impairment using computational methods
	Dowd, Erin Adams. - 2017
	BASE

11	SNET : a statistical normalisation method for Twitter
	Sosamphan, Phavanh. - 2016
	BASE

12	SNET : a statistical normalisation method for Twitter
	Sosamphan, Phavanh. - : Unitec Institute of Technology, 2016
	BASE

13	SNET : a statistical normalisation method for Twitter
	Sosamphan, Phavanh. - 2016
	Abstract: A major problem in the era of big data is how to ‘clean’ the vast amount of data on the Internet, particularly data from the micro-blogging website Twitter. Twitter enables people to connect with friends, colleagues, or people they have never met. One of the world’s biggest social media networks, it has around 316 million users and 500 million tweets posted per day (Twitter, 2016). Social media networks create substantial opportunities for businesses to build relationships with customers, gain insight into them, and deliver more value to them. Despite these advantages, tweets may be of little use if they contain irrelevant or incomprehensible information, which makes them difficult to analyse. Tweets are commonly written in ‘ill-forms’, such as abbreviations, repeated characters, and misspelled words. These ‘noisy tweets’ pose a text normalisation challenge: selecting suitable methods to detect them and convert them into the most accurate English sentences. Several text cleaning techniques have been proposed to address these issues; however, they have limitations and still do not achieve good results overall. This research proposes the SNET, a statistical normalisation method for cleaning noisy tweets at character level (tweets that contain abbreviations, repeated letters, and misspelled words), which combines different techniques to produce more accurate and cleaner data. Existing techniques are evaluated in order to find the combination that solves all of these problems with high accuracy. Abbreviations are converted to their standard form by abbreviation dictionary lookup, while repeated characters are normalised using the Natural Language Toolkit (NLTK) platform and a dictionary-based approach.
Besides the NLTK, the edit distance algorithm is used to correct misspellings, while the “Enchant” dictionary can be used to access the spell checking library. Furthermore, existing models, such as a spell corrector, can be deployed for conversion purposes, while a text cleanser is put forward as the preferred choice for comparing the SNET with a baseline model. In experiments on a Twitter sample dataset, the SNET achieves 88% on the Bilingual Evaluation Understudy (BLEU) score and 7% on the word error rate (WER) score, both of which are better than the baseline model. Such a method for cleaning tweets can contribute to brand sentiment analysis or opinion mining, political analysis, and other applications that seek to make sound predictions.
	Keyword: 080109 Pattern Recognition and Data Mining; 150502 Marketing Communications; abbreviations; big data; data normalisation; micro-blogs; Natural Language Toolkit (NLTK); noisy tweets; normalisation; social media; spell checkers; text cleansers; tweets; Twitter
	URL: https://hdl.handle.net/10652/3508
	BASE
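The abstract of record 13 names three normalisation steps (abbreviation dictionary lookup, dictionary-checked collapsing of repeated characters, and edit-distance spelling correction) and reports a word error rate. The following is a minimal standard-library sketch of that kind of pipeline, not the SNET implementation itself: the `ABBREVIATIONS` and `VOCABULARY` tables, all function names, and the distance threshold are hypothetical, and a plain word set stands in for the NLTK and Enchant resources mentioned in the abstract.

```python
import re

# Hypothetical toy resources; a real system would load far larger dictionaries.
ABBREVIATIONS = {"u": "you", "r": "are", "gr8": "great",
                 "2moro": "tomorrow", "pls": "please"}
VOCABULARY = {"you", "are", "so", "cool", "great", "see",
              "tomorrow", "please", "good", "morning", "happy"}

# Matches a word that contains one doubled character somewhere inside it.
_REPEAT = re.compile(r"(\w*)(\w)\2(\w*)")


def collapse_repeats(word, vocabulary):
    """Drop one repeated character at a time until the word is known."""
    while word not in vocabulary:
        reduced = _REPEAT.sub(r"\1\2\3", word)
        if reduced == word:  # no doubled characters left
            break
        word = reduced
    return word


def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def correct_spelling(word, vocabulary, max_distance=2):
    """Return the closest vocabulary word, or the input if none is close."""
    if word in vocabulary:
        return word
    best = min(sorted(vocabulary), key=lambda cand: edit_distance(word, cand))
    return best if edit_distance(word, best) <= max_distance else word


def normalise_tweet(tweet):
    """Apply the three steps token by token on a lower-cased tweet."""
    cleaned = []
    for token in tweet.lower().split():
        if token in ABBREVIATIONS:
            cleaned.append(ABBREVIATIONS[token])
            continue
        token = collapse_repeats(token, VOCABULARY)
        cleaned.append(correct_spelling(token, VOCABULARY))
    return " ".join(cleaned)


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref_tokens = reference.split()
    return edit_distance(ref_tokens, hypothesis.split()) / len(ref_tokens)


print(normalise_tweet("u r sooo coool"))
print(word_error_rate("you are so cool", "you r so cool"))
```

Checking each collapsed form against the vocabulary keeps legitimate double letters, as in "happy", from being reduced; the order of the three steps here is a design choice of this sketch rather than something the abstract specifies.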

14	Computational Linguistic Analysis of Earthquake Collections
	Wakeley, Christopher; Orleans-Pobee, Kwamina; Bialousz, Kenneth. - 2014
	BASE

15	Automatic Text Analysis Using Drupal
	Chai, Herman
	In: Computer Engineering (2013)
	BASE

16	Multidisciplinary Instruction with the Natural Language Toolkit
	Bird, S; Klein; Loper...
	In: TEACH CL-08 Third Workshop on Issues in Teaching Computational Linguistics (2008)
	BASE

17	Managing Fieldwork Data with Toolbox and the Natural Language Toolkit
	Robinson, Stuart; Aumann, Greg; Bird, Steven. - : University of Hawai'i Press, 2007
	BASE

18	Managing Fieldwork Data with Toolbox and the Natural Language Toolkit
	Stuart Robinson; Greg Aumann; Steven Bird
	In: Language Documentation & Conservation, Vol 1, Iss 1 (2007)
	BASE

19	Desenvolupament d'un assistent de redacció per l'anglès com a llengua estrangera [Development of a writing assistant for English as a foreign language]
	Adrian Roman, Mar
	BASE

© 2013 - 2024 Lin|gu|is|tik