Tuesday, 27 August 2019 06:17

A MapReduce-based Quick Search Approach on Large Files

Ye-feng Li1, Jia-jin Le2, and Mei Wang2

1College of Computer Science and Technology, Beijing University of Technology, China

2College of Computer Science and Technology, Donghua University, China

Abstract: String search is an important branch of pattern matching for information retrieval in many fields. Over the past four decades, research has focused on skipping as many unnecessary characters as possible to improve search performance, while giving little consideration to large-scale data. This paper makes two main contributions. First, we propose a Quick Search algorithm for data Streams (QSS) on a single machine that supports string search in a large text file, in contrast to previous work that is limited by bounded memory. Second, we implement the search algorithm on the MapReduce framework to speed up retrieval of search results. Experiments demonstrate that our approach is fast and effective for large files.
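As background, the classic Quick Search algorithm that QSS builds on shifts the search window by a bad-character rule keyed on the text character just past the window. A minimal single-machine sketch follows; this is the textbook algorithm, not the authors' stream-adapted QSS:

```python
def quick_search(pattern, text):
    """Sunday's Quick Search: after each attempt, shift by the
    bad-character rule applied to the character just past the window."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # shift[c] = m - (index of last occurrence of c in pattern); default m + 1
    shift = {c: m - i for i, c in enumerate(pattern)}
    hits, i = [], 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            hits.append(i)
        if i + m >= n:
            break
        i += shift.get(text[i + m], m + 1)
    return hits
```

Because each shift depends only on the character one position past the window, the rule adapts naturally to buffered scanning of a file too large to hold in memory.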

Keywords: String search, MapReduce, data stream, large file.

Received May 21, 2015; accepted September 24, 2017


An Efficient Line Clipping Algorithm in 2D Space

Mamatha Elliriki1, Chandrasekhara Reddy2, and Krishna Anand3

1Department of Mathematics, GITAM University, India

2Department of Mathematics, Cambridge Institute of Technology-NC, India

3Department of Computer Science, Sreenidhi Institute of Science and Technology, India

Abstract: The clipping problem seems fairly simple from a human point of view, since by visual inspection one can easily tell whether a line lies completely inside the window and, if not, which portion of it lies outside. From the system's point of view, however, the number of computations and comparisons for lines, with floating-point calculations, is extremely large, which adds to the inherent complexity. The number of computations needs to be minimised to achieve a significant gain in efficiency. In this work, a mathematical model is proposed for evaluating intersection points and thereby clipping lines, relying largely on integer calculations. Moreover, no further computations are needed to evaluate the intersection points. The performance of the algorithm is consistently good in terms of speed for all sizes of clipping windows.
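The authors' ortho-length model is not detailed in the abstract; for reference, the standard outcode-based clipping that such work aims to streamline can be sketched as follows (a textbook Cohen-Sutherland routine, not the proposed method):

```python
# Region outcodes for a clip window [xmin, xmax] x [ymin, ymax].
INSIDE, LEFT, RIGHT, BOTTOM, TOP = 0, 1, 2, 4, 8

def outcode(x, y, xmin, ymin, xmax, ymax):
    code = INSIDE
    if x < xmin: code |= LEFT
    elif x > xmax: code |= RIGHT
    if y < ymin: code |= BOTTOM
    elif y > ymax: code |= TOP
    return code

def clip(x1, y1, x2, y2, xmin, ymin, xmax, ymax):
    """Clip segment (x1,y1)-(x2,y2) to the window; return new endpoints
    or None when the segment lies entirely outside."""
    c1 = outcode(x1, y1, xmin, ymin, xmax, ymax)
    c2 = outcode(x2, y2, xmin, ymin, xmax, ymax)
    while True:
        if not (c1 | c2):          # trivial accept: both endpoints inside
            return x1, y1, x2, y2
        if c1 & c2:                # trivial reject: both share an outside zone
            return None
        c = c1 or c2               # pick an endpoint outside the window
        if c & TOP:
            x = x1 + (x2 - x1) * (ymax - y1) / (y2 - y1); y = ymax
        elif c & BOTTOM:
            x = x1 + (x2 - x1) * (ymin - y1) / (y2 - y1); y = ymin
        elif c & RIGHT:
            y = y1 + (y2 - y1) * (xmax - x1) / (x2 - x1); x = xmax
        else:                      # LEFT
            y = y1 + (y2 - y1) * (xmin - x1) / (x2 - x1); x = xmin
        if c == c1:
            x1, y1 = x, y
            c1 = outcode(x1, y1, xmin, ymin, xmax, ymax)
        else:
            x2, y2 = x, y
            c2 = outcode(x2, y2, xmin, ymin, xmax, ymax)
```

The divisions here are exactly the floating-point cost that an integer-based intersection model seeks to avoid.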

Keywords: Ortho lengths, raster graphics system, line clipping, intersection points, geometrical slopes, rectangle window.

Received July 23, 2015; accepted June 1, 2016

Parameter Tuning of Neural Network for Financial Time Series Forecasting

Zeinab Fallahshojaei1 and Mehdi Sadeghzadeh2

1Department of Computer Engineering, Buin Zahra Branch, Islamic Azad University, Buin Zahra, Iran

2Department of Computer Engineering, Mahshahr Branch, Islamic Azad University, Mahshahr, Iran

Abstract: One of the most challenging problems in the pattern recognition domain is financial time series forecasting, which aims to accurately estimate the future value variations of a particular object. One of the best-known financial time series prediction methods is the Neural Network (NN), but it suffers from parameter-tuning issues such as the number of neurons in the hidden layer, the learning rate and the number of periods to be forecasted. To solve this problem, this paper proposes a new meta-heuristic parameter tuning scheme based on Harmony Search (HS). To improve the exploration and exploitation rates of HS, its control parameters are adapted during the generations. Evaluation of the proposed method on several financial time series datasets shows the efficiency of the improved HS for parameter setting of NNs for time series prediction.
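The idea of adapting HS control parameters over the generations can be illustrated with a generic sketch; the toy objective, bounds, memory size and linear adaptation schedules below are illustrative assumptions, not the paper's settings:

```python
import random

def harmony_search(f, bounds, iters=2000, hms=10, seed=0):
    """Minimise f over the box `bounds` with Harmony Search. HMCR and
    PAR are adapted linearly across iterations (an assumed schedule)."""
    rng = random.Random(seed)
    mem = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [f(x) for x in mem]
    for t in range(iters):
        hmcr = 0.7 + 0.25 * t / iters      # lean on memory more over time
        par = 0.5 * (1 - t / iters)        # pitch-adjust less over time
        x = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:        # draw from harmony memory
                v = mem[rng.randrange(hms)][d]
                if rng.random() < par:     # small pitch adjustment
                    v += rng.uniform(-1, 1) * 0.05 * (hi - lo)
            else:                          # random restart for this variable
                v = rng.uniform(lo, hi)
            x.append(min(hi, max(lo, v)))
        fx = f(x)
        worst = max(range(hms), key=scores.__getitem__)
        if fx < scores[worst]:             # replace the worst harmony
            mem[worst], scores[worst] = x, fx
    best = min(range(hms), key=scores.__getitem__)
    return mem[best], scores[best]
```

In the paper's setting, `f` would score an NN configuration (hidden neurons, learning rate, forecast horizon) by its validation error rather than a closed-form function.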

Keywords: Financial time series forecasting, parameter setting, NN, HS, parameter adaptation.

Received November 1, 2015; accepted March 20, 2018

An Optimized and Efficient Radial Basis Neural Network using Cluster Validity Index for Diabetes Classification

Ramalingaswamy Cheruku, Damodar Edla, and Venkatanareshbabu Kuppili

Department of Computer Science and Engineering, National Institute of Technology Goa, India

Abstract: Radial Basis Function Neural Networks (RBFNNs) have been used for classification in the medical sciences, especially diabetes classification. They are three-layer feed-forward neural networks with an input layer, a hidden layer and an output layer. As the number of training patterns increases, the number of neurons in the hidden layer of an RBFNN increases, and with it the network complexity and the classification time. Various efforts have been made to address this issue by using clustering algorithms such as k-means, k-medoids and the Self Organizing Feature Map (SOFM) to cluster the diabetic input data and so reduce the size of the hidden layer; however, the main difficulty, determining the optimal number of hidden neurons, remains unsolved. In this paper, we present an efficient method for predicting diabetes using an RBFNN with an optimal number of neurons in the hidden layer. This study focuses on determining the number of hidden neurons using cluster validity indexes and on finding the weights between the hidden and output layers using a genetic algorithm. The proposed model was applied to the Pima Indian Diabetes detection problem and gave an accuracy of 73.50%, better than most of the commonly known algorithms in the literature. The proposed methodology also reduced the complexity of the network by 90% in terms of the number of connections, and further reduced the classification time for new patterns.

Keywords: Radial basis function networks, classification, medical diagnosis, diabetes, optimal number of clusters, genetic algorithm.
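A minimal illustration of the RBFNN structure being optimised: Gaussian hidden units and a linear output layer. The XOR data, one-centre-per-pattern layout and narrow-kernel shortcut below are illustrative assumptions, not the paper's cluster-validity-driven design:

```python
import math

def rbf_predict(x, centers, weights, beta=4.0):
    """Forward pass of an RBF network: Gaussian hidden units, linear output."""
    act = [math.exp(-beta * sum((a - b) ** 2 for a, b in zip(x, c)))
           for c in centers]
    return sum(w * a for w, a in zip(weights, act))

# XOR with one hidden neuron per training pattern; with narrow Gaussians
# the hidden-layer Gram matrix is near-identity, so weights ~ targets.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]
preds = [1 if rbf_predict(x, X, y) > 0.5 else 0 for x in X]
```

Cutting the hidden layer from one neuron per pattern down to a few cluster centres is precisely what yields the complexity reduction the abstract reports.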

Received February 13, 2016; accepted February 8, 2017


Edge Detection Optimization Using Fractional Order Calculus

Mohammed Mekideche and Youcef Ferdi

Department of Electrical Engineering, Skikda University, Algeria    

Abstract: In computer vision and image processing, time and quality are major factors to be taken into account. In the edge detection process, a smoothing operation with a low-pass filter is commonly performed first to reduce the effect of noise. However, the smoothing operation requires additional computational time and alters true edges as well. To resolve these problems, a new approach to edge detection optimization is addressed in this paper. For this purpose, a short edge detector algorithm without a smoothing operation is proposed and investigated. The algorithm is based on a fractional order mask used as a convolution kernel for edge enhancement. It is shown that with the proposed algorithm the smoothing pre-process is no longer necessary, because our fractional order mask combines immunity to noise with the capability of detecting edges. Simulation results show how the quality of edge detection can be enhanced by adjusting the fractional order parameter. The proposed edge detection method can therefore be very useful in real-time applications in fields such as satellite and medical imaging.
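A common way to build such a fractional order mask is from Grünwald-Letnikov difference coefficients; the sketch below shows a 1-D version (the paper's 2-D mask and parameter choices are not reproduced here):

```python
def gl_coeffs(v, n):
    """First n Grünwald-Letnikov coefficients of fractional order v,
    via the recurrence c_k = c_{k-1} * (k - 1 - v) / k, c_0 = 1."""
    c = [1.0]
    for k in range(1, n):
        c.append(c[-1] * (k - 1 - v) / k)
    return c

def frac_diff(signal, v, taps=5):
    """1-D fractional differentiation: convolve with the GL mask."""
    c = gl_coeffs(v, taps)
    return [sum(ck * signal[i - k] for k, ck in enumerate(c) if i - k >= 0)
            for i in range(len(signal))]
```

A sanity check: at order v = 1 the mask collapses to the ordinary backward difference [1, -1], while non-integer orders spread the response over several taps, which is the source of the noise tolerance mentioned above.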

Keywords: Edge detection, fractional order calculus, computational time, smoothing operation, performances evaluation.

Received February 22, 2016; accepted April 17, 2017

Predicting the Winner of Delhi Assembly Election, 2015 from Sentiment Analysis on Twitter Data: A Big Data Perspective

Lija Mohan and Sudheep Elayidom

Division of Computer Science, Cochin University of Science and Technology, India

Abstract: Social media is currently a place where people create and share content at a massive rate. Because of its ease of use, speed and reach, it is fast changing public discourse in society and setting trends and agendas on topics ranging from the environment and politics to technology and entertainment. As it is a form of collective wisdom, we decided to investigate its power to predict real-world outcomes. The objective was to design a Twitter-based sentiment mining framework. We introduce a keyword-aware, user-based collective tweet mining approach to rank the sentiment of each user. To demonstrate the accuracy of this method, we chose an election winner prediction application and observed how the sentiments of people on different political issues at the time were reflected in their votes. A domain thesaurus was built by collecting keywords related to each issue. Since Twitter data is huge and difficult to process, we use a scalable and efficient MapReduce programming model to classify the tweets. The experiments were designed to predict the winner of the Delhi Assembly Elections 2015 by analyzing people's sentiments on political issues, and from this analysis we correctly predicted that the Aam Admi Party had higher support than the ruling Bharathiya Janatha Party (BJP). A big data approach with widespread applications in today's world is thus used for sentiment analysis on Twitter data.
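The map/reduce split described above can be shown in miniature: mappers emit (party, sentiment) pairs per tweet and a reducer sums them. The toy lexicon and party-detection rule below are made-up placeholders for the paper's domain thesaurus and classifier:

```python
from collections import defaultdict

# Hypothetical sentiment lexicon; the paper builds a real domain
# thesaurus of issue-related keywords instead.
LEXICON = {"good": 1, "great": 1, "bad": -1, "poor": -1}

def mapper(tweet):
    """Emit one (party, sentiment score) pair for a tweet."""
    party = "AAP" if "aap" in tweet.lower() else "BJP"
    score = sum(LEXICON.get(w, 0) for w in tweet.lower().split())
    yield party, score

def reducer(pairs):
    """Sum sentiment per party (the shuffle + reduce phase)."""
    totals = defaultdict(int)
    for party, score in pairs:
        totals[party] += score
    return dict(totals)

tweets = ["AAP did a great job", "bad roads under BJP", "AAP is good"]
result = reducer(p for t in tweets for p in mapper(t))
```

On a real Hadoop cluster the same mapper and reducer logic would run in parallel over tweet shards, which is what makes the approach scale.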

Keywords: Election winner prediction, big data, sentiment analysis, tweet mining, map reduce.

Received February 26, 2016; accepted June 29, 2017

A New Approach for A Domain-Independent Turkish Sentiment Seed Lexicon Compilation

Ekin Ekinci and Sevinç Omurca

Department of Computer Engineering, Kocaeli University, Turkey

Abstract: Sentiment analysis deals with opinions in documents and relies on sentiment lexicons; however, Turkish is one of the poorest languages with regard to such ready-to-use sentiment lexicons. In this article, we propose a domain-independent Turkish sentiment seed lexicon, which is extended from an initial seed lexicon consisting of 62 positive/negative seeds. The lexicon is completed by using the beam search method to propagate the sentiment values of the initial seeds, exploiting synonym and antonym relations in the Turkish Semantic Relations Dataset. The proposed method assigned 94 words as positive sentiments and 95 words as negative sentiments. To test the correctness of the sentiment seeds and their values, the first sense, total sum and weighted sum algorithms, which are based on SentiWordNet and SenticNet 3, are used. According to the weighted sum, the experimental results indicate that the beam search algorithm is a good alternative for automatic construction of a domain-independent sentiment seed lexicon.
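A simplified version of propagating seed polarities through synonym and antonym relations under a beam can be sketched as follows; the damping factor and scoring below are illustrative assumptions, not the paper's exact method:

```python
def expand_lexicon(seeds, synonyms, antonyms, beam_width=2, depth=2):
    """Propagate seed polarities: synonyms keep the sign, antonyms flip
    it, and only the `beam_width` strongest new candidates survive each
    level (the 0.8 damping per hop is an assumed parameter)."""
    lexicon = dict(seeds)
    frontier = list(seeds.items())
    for _ in range(depth):
        candidates = {}
        for word, score in frontier:
            for rel, sign in ((synonyms, 1), (antonyms, -1)):
                for other in rel.get(word, []):
                    if other not in lexicon and other not in candidates:
                        candidates[other] = sign * score * 0.8
        # beam step: keep the strongest candidates by absolute polarity
        frontier = sorted(candidates.items(), key=lambda kv: -abs(kv[1]))[:beam_width]
        lexicon.update(frontier)
    return lexicon

seeds = {"iyi": 1.0}   # "good"
lex = expand_lexicon(seeds, {"iyi": ["güzel"]}, {"iyi": ["kötü"]})
```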

Keywords: Sentiment lexicon, beam search, pattern generation, Turkish language, unsupervised framework.

Received March 29, 2016; accepted September 22, 2016

Simulating Email Worm Propagation Based on Social Network and User Behavior

Kexin Yin1, Wanlong Li1, Ming Hu3, and Jianqi Zhu2

1School of Computer Science and Engineering, Changchun University of Technology, China

2School of Computer Science and Technology, Jilin University, China

3School of Computer Technology and Engineering, Changchun Institute of Technology, China

Abstract: Email worms pose a significant security threat to organizations and computer users today. Because they propagate over a logical network, the traditional epidemic model is unsuitable for modeling their propagation over the internet; yet accurate modeling of email worm propagation undoubtedly helps to contain their attacks in advance. This paper presents a novel email worm propagation model based on a directed and weighted social network. Moreover, the effects of user behavior are also considered in the model; to the authors' knowledge, little prior work has taken these effects into account when modeling worm propagation. A simulation algorithm is designed to verify the effectiveness of the presented model, and the results show that it describes the propagation of email worms accurately. By simulating different containment strategies, we demonstrate that infected key nodes in an email social community can speed up worm propagation. Finally, a new General Susceptible Infectious Susceptible (G-SIS) email worm model is presented, which can accurately predict the propagation scale of email worms.
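A discrete-time SIS-style simulation on a directed, weighted graph, in the spirit of the model described, can be sketched as below; reading each edge weight as the probability that the recipient opens the infected attachment is an assumption made for illustration:

```python
import random

def simulate_sis(adj, infected, steps=50, cure=0.1, seed=1):
    """Discrete-time Susceptible-Infectious-Susceptible spread.
    adj[u] = [(v, p), ...] where p stands in for the user-behaviour
    factor (probability v opens the infected email from u)."""
    rng = random.Random(seed)
    infected = set(infected)
    for _ in range(steps):
        new = set(infected)
        for u in infected:                  # infection attempts along edges
            for v, p in adj.get(u, []):
                if rng.random() < p:
                    new.add(v)
        for u in list(new):                 # each node may be cured,
            if rng.random() < cure:         # returning to susceptible
                new.discard(u)
        infected = new
    return infected
```

Seeding the simulation at high-degree "key" nodes versus peripheral ones is how the effect reported in the abstract can be reproduced.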

Keywords: Network security, email worm propagation, social network, user behavior, G-SIS.

Received April 2, 2016; accepted November 27, 2017


Performance Analysis of Microsoft Network Policy Server and FreeRADIUS Authentication Systems in 802.1x based Secured Wired Ethernet using PEAP

Farrukh Chughtai1, Riaz UlAmin1, Abdul Sattar Malik2, and Nausheen Saeed3

1Department of Computer Science, Balochistan University of Information Technology Engineering and Management Sciences, Pakistan

2Department of Electrical Engineering, Bahauddin Zakariya University, Pakistan

3Department of Computer Science, Sardar Bahadur Khan University, Pakistan

Abstract: IEEE 802.1x is an industry standard for implementing physical port level security in wired and wireless Ethernets using a RADIUS infrastructure. Administrators of corporate networks need secure network admission control that adds minimal traffic overhead and does not degrade the performance of the network. This research focuses on two widely used Remote Authentication Dial In User Service (RADIUS) servers, Microsoft Network Policy Server (NPS) and FreeRADIUS, evaluating their efficiency and network overhead against a set of pre-defined key performance indicators using the Protected Extensible Authentication Protocol (PEAP) in conjunction with the Microsoft Challenge Handshake Authentication Protocol version 2 (MSCHAPv2). The key performance indicators, authentication time, reconnection time and protocol overhead, were evaluated in a real test bed configuration. The results of the experiments explain why the performance of a particular authentication system is better than the other in the given scenario.

Keywords: IEEE 802.1x, Microsoft NPS, FreeRADIUS, PEAP, MSCHAPv2, performance analysis, RADIUS.

Received May 28, 2016; accepted May 29, 2017

Tree Based Fast Similarity Query Search Indexing on Outsourced Cloud Data Streams


Balamurugan Balasubramanian1, Kamalraj Durai1, Jegadeeswari Sathyanarayanan1, and Sugumaran Muthukumarasamy2

1Research Scholar, Computer Science, Bharathiar University, India

2Computer Science and Engineering, Pondicherry Engineering College, India

Abstract: A cloud may be seen as a flexible computing infrastructure comprising many nodes that support several concurrent end users. To fully harness the power of the cloud, efficient data query processing has to be ensured. This work adds functionality to cloud data query processing through a method called Hybrid Tree Fast Similarity Query Search (HT-FSQS). The hybrid tree structure used in HT-FSQS combines an E-tree and an R+ tree for balancing the load and performing similarity search. In addition, we articulate performance optimization mechanisms for our method by indexing quasi data objects to improve the quality of similarity search using the R+ tree mechanism. Fast similarity query search indexing builds indexes over cloud data streams to handle different types of user queries and produce results in less computational time. It uses an inter-intra bin pruning technique to find the data most similar to the user query, and the branch-and-bound search of the E-/R+ tree FSQ method eliminates certain bins from consideration, speeding up the indexing operation. The experimental results demonstrate that HT-FSQS achieves significant performance gains in terms of computation time, quality of similarity search and load balance factor in comparison with non-indexing approaches.

Keywords: Cloud, hybrid tree, fast similarity query, e-tree, r+ tree.

Received June 2, 2016; accepted May 1, 2017

A Cloud-based Architecture for Mitigating Privacy Issues in Online Social Networks

Mustafa Kaiiali1, Auwal Iliyasu2, Ahmad Wazan3, Adib Habbal4, and Yusuf Muhammad5

1Centre for Secure Information Technologies, Queen's University Belfast, UK

2The Department of Computer Engineering, Kano State Polytechnic, Nigeria

3Département Informatique, Institut de Recherche en Informatique de Toulouse, France

4InterNetWorks Research Lab, School of Computing, Universiti Utara Malaysia, Malaysia

5The Department of Computer Science, Saadatu Rimi College of Education, Nigeria

Abstract: Online social media networks have revolutionized the way information is shared across our societies and around the world. Information is now delivered for free to a large audience within a short period of time, and anyone can publish news and information and become a content creator over the internet. Along with these benefits, however, privacy raises serious concern due to incidences of privacy breaches in Online Social Networks (OSNs). Various projects have been developed to protect users' privacy in OSNs. This paper discusses those projects and analyses their pros and cons. It then proposes a new cloud-based model to shield OSN users against unauthorized disclosure of their private data. The model supports both trusted (private) and untrusted (third-party) clouds. An efficiency analysis is provided at the end to show that the proposed model offers substantial improvements over existing ones.

Keywords: Online social network, cloud computing, user’s privacy, access control, broadcast encryption.

Received June 17, 2016; accepted February 27, 2017

A Trusted Virtual Network Construction Method Based on Data Sources Dependence

Xiaorong Cheng1 and Tianqi LI2

1Department Computer Science, North China Electric Power University, China

2C-Epri Electric Power Engineering CO, LTD, China

Abstract: At present, isolated, single data sources cannot meet the needs of system security. Based on research into trusted computing theory, this paper puts forward a method of constructing a trusted virtual network based on data source dependency. First, the credibility of each data source is calculated by the NEWACCU algorithm; then the trusted virtual network, composed of the data source entities, is built dynamically by calculating the credibility between data sources, providing technical support for future credibility assessment and further research on information security. Taking data from an e-commerce platform as an example, the experimental results verify the effectiveness of the method.

Keywords: Data source, credibility, trusted virtual network, dynamics, modeling and simulation.

Received June 22, 2016; accepted April 11, 2017

A Novel and Complete Approach for Storing RDF(S) in Relational Databases

Fu Zhang1, Qiang Tong2, and Jingwei Cheng1

1School of Computer Science and Engineering, Northeastern University, China

2School of Software, Northeastern University, China

Abstract: Resource Description Framework (RDF) and RDF Schema (collectively called RDF(S)) are the normative languages for describing Web resource information. With the massive growth of RDF(S) information, how to store it effectively is becoming an important research issue. By analysing the characteristics of RDF(S) data and schema semantics in depth, this paper proposes a multiple storage model for RDF(S) based on relational databases. An overall storage framework, detailed storage rules, a storage algorithm and a storage example are presented, and the correctness of the storage approach is discussed and proved. Based on the proposed approach, a prototype storage tool is implemented, and experiments show that the approach and the tool are feasible.
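The simplest relational storage scheme for RDF, the baseline against which multi-table designs like the one proposed are usually compared, is a single triple table; a minimal sketch using SQLite:

```python
import sqlite3

# One relational table holding (subject, predicate, object) rows: the
# classic "triple table" layout. The ex:/rdf: names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Bob", "rdf:type", "ex:Person"),
])
# A simple triple-pattern query: all subjects of type ex:Person.
people = [r[0] for r in conn.execute(
    "SELECT s FROM triples WHERE p = 'rdf:type' AND o = 'ex:Person' ORDER BY s")]
```

The drawback of this layout, every query becomes self-joins over one huge table, is what motivates schema-aware multiple-table storage models.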

Keywords: RDF, RDF schema, relational database, storage.

Received June 26, 2016; accepted October 11, 2017

UTP: A Novel PIN Number Based User Authentication Scheme

Srinivasan Rajarajan and Ponnada Priyadarsini

 School of Computing, SASTRA Deemed University, India

Abstract: This paper proposes a Personal Identification Number (PIN) based authentication scheme named User Transformed PIN (UTP). It introduces a simple cognitive process with which users may transform their PIN numbers into a dynamic one-time number. PIN numbers are widely used for user authentication; they are entered directly and reused many times, which makes them vulnerable to many types of attack. To overcome these drawbacks, One Time Passwords (OTPs) are combined with PIN numbers to form stronger two-factor authentication. Though OTPs are relatively difficult to attack, they are nevertheless not foolproof. In our proposed work, we have devised a new scheme that withstands many of the common attacks on PIN numbers and OTPs. In our scheme, users generate the UTP with the help of a visual pattern, a random alphabet sequence and a PIN number. Because the UTP varies for each transaction, it acts like an OTP. Our scheme conceals the PIN number within the UTP so that no direct entry of the PIN is required; the PIN can be retrieved from the UTP by the authenticator module at the server. To the best of our knowledge, this is the first scheme that lets users transform their PIN numbers into a one-time number without any special device or tool. Our scheme is inherently multi-factor, combining a knowledge factor and a possession factor within itself. The user studies we conducted on the prototype provided encouraging results supporting the scheme's security and usability.

Keywords: Personal identification number, shoulder surfing, keylogging, user authentication, OTP, internet banking.

Received July 19, 2016; accepted June 4, 2017

Detecting Sentences Types in the Standard Arabic Language

Ramzi Halimouche and Hocine Teffahi

Laboratory of Spoken Communication and Signal Processing, Electronics and Computer Science Faculty, University of Sciences and Technology Houari Boumediene, Algeria

Abstract: The standard Arabic language, like many other languages, contains prosodic features hidden in the speech signal. Studies in this field are still at a preliminary stage, which restrains the performance of communication tools; prosodic study allows people to have all the communication tools they need in their native language. We therefore propose, in this paper, a prosodic study of the various types of sentences in the standard Arabic language. Sentences are recognized according to three modalities: declarative, interrogative and exclamatory. The results of this study will be used to synthesize the different types of pronunciation, which can be exploited in several domains, notably man-machine communication. To this end, we developed a specific dataset consisting of the three types of sentences. We then tested two sets of features, prosodic features (fundamental frequency, energy and duration) and spectral features (Mel-Frequency Cepstral Coefficients and Linear Predictive Coding), as well as their combination. We adopted the Multi-Class Support Vector Machine (MC-SVM) as the classifier. The experimental results are very encouraging.

Keywords: Standard arabic language, sentence type detection, fundamental frequency, energy, duration, mel-frequency cepstral coefficients, linear predictive coding.

Received January 19, 2017; accepted August 23, 2017


Data Deduplication for Efficient Cloud Storage and Retrieval

Rishikesh Misal and Boominathan Perumal

 School of Computer Engineering, Vellore Institute of Technology University, India

Abstract: Cloud services provide flawless service to the client by increasing the geographic availability of data. Increased availability of data induces a high degree of redundancy and a large amount of space required to store that data. Data compression techniques can reduce the space required to store the data at the various sites, while ensuring no loss of availability or consistency at any site. As demand for cloud services and storage grows, so does the required investment; by using data compression we can reduce the investment required, along with the physical space and data centers needed to store the data. Various security protocols can be incorporated to secure the compressed files at the various sites. We provide a reliable technique for storing and managing deduplicated data in a secure manner, achieving high consistency as well as availability.
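The core deduplication idea, storing each unique chunk once and letting files share chunks by content hash, can be sketched as follows (the fixed-size chunking and tiny chunk size are illustrative choices):

```python
import hashlib

class DedupStore:
    """Content-addressed store: identical chunks are kept once and
    shared by reference, the basic mechanism behind cloud deduplication."""
    def __init__(self):
        self.chunks = {}        # digest -> chunk bytes
        self.files = {}         # file name -> ordered list of digests

    def put(self, name, data, chunk_size=4):
        digests = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            d = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(d, chunk)   # store each unique chunk once
            digests.append(d)
        self.files[name] = digests

    def get(self, name):
        """Reassemble a file from its chunk references."""
        return b"".join(self.chunks[d] for d in self.files[name])

store = DedupStore()
store.put("a.txt", b"AAAABBBBAAAA")
store.put("b.txt", b"AAAACCCC")
```

Here five logical chunks collapse to three stored ones; production systems typically use content-defined chunk boundaries and encrypt chunks before upload.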

Keywords: Data deduplication, cloud computing, storage, file system, distributed system.

Received February 23, 2017; accepted June 13, 2017

Self-Adaptive PSO Memetic Algorithm for Multi Objective Workflow Scheduling in Hybrid Cloud

Padmaveni Krishnan and John Aravindhar

Department of Computer Science and Engineering, Hindustan Institute of Technology and Science, India

Abstract: Cloud computing is a distributed computing technology that facilitates a pay-per-use model for solving large-scale problems. The main aim of cloud computing is to give optimal access to the distributed resources. Task scheduling in the cloud is the allocation of the best resource to each demand, considering parameters such as time, makespan, cost and throughput. Existing workflow scheduling algorithms cannot be applied directly in the cloud, since they fail to account for its elasticity and heterogeneity. In this paper, the cloud workflow scheduling problem is modeled with makespan, cost, percentage of private cloud utilization and deadline violation as the four main objectives. A hybrid of Particle Swarm Optimization (PSO) and a Memetic Algorithm (MA), called the Self-Adaptive Particle Swarm Memetic Algorithm (SPMA), is proposed. SPMA can be used by cloud providers to maximize user quality of service and resource profit using an entropy optimization model. The heuristic is tested on several workflows, and the results show that SPMA performs better than other state-of-the-art algorithms.
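For reference, the plain PSO update that SPMA hybridises with a memetic local search looks like the sketch below; the inertia/acceleration coefficients and the toy objective are illustrative, not the paper's multi-objective formulation:

```python
import random

def pso(f, bounds, n=20, iters=200, seed=0):
    """Minimise f with a plain particle swarm: each particle is pulled
    toward its personal best and the global best."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]
    pscore = [f(p) for p in pos]
    g = min(range(n), key=pscore.__getitem__)
    gbest, gscore = pbest[g][:], pscore[g]
    w, c1, c2 = 0.7, 1.5, 1.5          # inertia and acceleration weights
    for _ in range(iters):
        for i in range(n):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            s = f(pos[i])
            if s < pscore[i]:          # update personal / global bests
                pbest[i], pscore[i] = pos[i][:], s
                if s < gscore:
                    gbest, gscore = pos[i][:], s
    return gbest, gscore
```

In a memetic hybrid, a local refinement step would be applied to promising particles between swarm updates; for scheduling, `f` would map a particle to a task-to-resource assignment and score its makespan, cost and deadline violations.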

Keywords: Cloud computing, memetic algorithm, particle swarm optimization, self-adaptive particle swarm memetic algorithm.

Received April 3, 2017; accepted May 29, 2017

A Novel Adaptive Two-phase Multimodal Biometric Recognition System

Venkatramaphanikumar Sistla1, Venkata Krishna Kishore Kolli1, and Kamakshi Prasad Valurouthu2

1Department of Computer Science and Engineering, Vignan’s Foundation for Science, Technology and Research, India

2Department of Computer Science and Engineering, Jawaharlal Nehru Technological University Hyderabad College of Engineering, India

Abstract: Multimodal biometric recognition systems are intended to offer authentication without compromising security or accuracy, and are also used to address the limitations of unimodal systems such as spoofing, intra-class variations, noise and non-universality. In this paper, a novel adaptive two-phase multimodal framework is proposed using face, fingerprint and speech traits. The face trait reduces the search space by retrieving the few enrolled candidates nearest to the probe, using Gabor wavelets, semi-supervised kernel discriminant analysis and two-dimensional dynamic time warping; this nonlinear face classification serves as a search space reducer and affects the True Acceptance Rate (TAR). Level-1 and level-2 features of the fingerprint trait are then fused with Dempster-Shafer theory, achieving a high TAR. In the second phase, to reduce the False Acceptance Rate (FAR) and validate the user's identity, a text-dependent speaker verification with an RBFNN classifier is proposed. The classification accuracy of the proposed method is evaluated on our own and standard datasets, and the experimental results clearly show that the proposed technique outperforms existing techniques in terms of search time, space and accuracy.
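The Dempster-Shafer fusion step mentioned above follows Dempster's rule of combination; a minimal sketch with made-up belief masses for two fingerprint matchers:

```python
def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions whose focal elements are
    frozensets: multiply masses, pool them on intersections, and
    renormalise by the non-conflicting mass."""
    combined, conflict = {}, 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb
    k = 1.0 - conflict
    return {s: w / k for s, w in combined.items()}

G = frozenset({"genuine"})
I = frozenset({"impostor"})
U = G | I                       # total ignorance
m1 = {G: 0.6, U: 0.4}           # illustrative matcher beliefs,
m2 = {G: 0.5, I: 0.2, U: 0.3}   # not values from the paper
fused = dempster_combine(m1, m2)
```

When both matchers lean toward "genuine", the combined belief in that hypothesis exceeds either individual one, which is why this rule suits score-level fusion.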

Keywords: Gabor filters, radial basis function, discrete wavelet transform, dynamic time warping, kernel discriminant analysis.

Received April 18, 2017; accepted June 13, 2017


EncCD: A Framework for Efficient Detection of Code Clones

Minhaj Khan

Department of Computer Science, Bahauddin Zakariya University, Pakistan

Abstract: Code clones are similar snippets of code written for an application. The detection of code clones is essential for software maintenance, as modifying multiple snippets that share the same bug becomes cumbersome in a large system. Clone detection techniques perform conventional parsing before final match detection; an inefficient parsing mechanism, however, degrades the performance of the overall clone detection process. In this paper, we propose a framework called Encoded Clone Detector (EncCD), based on encoded pipeline processing, for efficiently detecting clones. The proposed framework uses efficient labelled encoding followed by tokenization and match detection. Experiments performed on Intel Core i7 and Intel Xeon based systems show that the proposed EncCD framework outperforms the widely used JCCD and CCFinder frameworks with a significant performance improvement.
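Token-based clone detection with identifier normalisation, the general family that tools like EncCD and CCFinder belong to (not the EncCD pipeline itself), can be sketched as:

```python
import re
from collections import defaultdict

KEYWORDS = {"if", "return", "for", "while"}   # tiny illustrative keyword set

def tokens(code):
    """Crude lexer: identifiers become ID and literals become NUM, so
    consistently renamed snippets (type-2 clones) still match."""
    out = []
    for tok in re.findall(r"[A-Za-z_]\w*|\d+|\S", code):
        if re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS:
            out.append("ID")
        elif tok.isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return out

def clone_pairs(snippets, k=6):
    """Report snippet pairs sharing a window of k normalised tokens."""
    index = defaultdict(set)
    for name, code in snippets.items():
        ts = tokens(code)
        for i in range(len(ts) - k + 1):
            index[tuple(ts[i:i + k])].add(name)
    pairs = set()
    for names in index.values():
        for a in names:
            for b in names:
                if a < b:
                    pairs.add((a, b))
    return pairs
```

The lexing stage dominates the cost on large code bases, which is exactly the stage EncCD's encoded processing targets.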

Keywords: Clone detection, software engineering, software maintenance, optimization, speedup.

Received February 5, 2017; accepted September 30, 2018

Sentiment Analysis with Term Weighting and Word Vectors

Metin Bilgin1 and Haldun Köktaş2

1Department of Computer Engineering, Bursa Uludağ University, Turkey

2Department of Mechatronic Engineering, Bursa Technical University, Turkey

Abstract: Sentiment analysis, which tries to predict the sentiment expressed in texts, is an area in which Natural Language Processing (NLP) techniques have been used frequently in recent years. In this study, sentiment is extracted from Turkish texts and the performance of different text representation methods is compared. Besides the Bag of Words (BoW) method traditionally used to represent texts, Word2Vec, a word vector algorithm developed in recent years, and Doc2Vec, a document vector algorithm, are used. Five different Machine Learning (ML) algorithms were used to classify texts represented in five different ways, on 3000 labeled tweets about a telecom company. Word2Vec, among the text representation methods, and Random Forest, among the ML algorithms, proved the most successful and most applicable. This is the first study in which BoW and word vectors have been compared for sentiment analysis of Turkish texts.
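The BoW representation compared in the study can be sketched in a few lines: one dimension per vocabulary term, valued by term count (whitespace tokenisation here is a simplification):

```python
def bow_vectors(docs):
    """Bag-of-words: map each document to a count vector over the
    sorted vocabulary of all documents."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = []
    for d in docs:
        v = [0] * len(vocab)
        for w in d.lower().split():
            v[index[w]] += 1
        vecs.append(v)
    return vocab, vecs
```

Unlike Word2Vec embeddings, these vectors are sparse, grow with the vocabulary, and treat every pair of distinct words as equally unrelated, which is the gap the word-vector methods in the study close.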

Keywords: Word2vec, Doc2vec, sentiment analysis, machine learning, natural language processing.

Received February 16, 2018; accepted July 22, 2018
