
A Novel Handwriting Grading System Using Gurmukhi Characters


Munish Kumar1, Manish Jindal2, and Rajendra Sharma3

1Department of Computational Sciences, Maharaja Ranjit Singh Punjab Technical University, India

2Department of Computer Science and Applications, Panjab University Regional Centre, India

3Department of Computer Science and Engineering, Thapar University, India

Abstract: This paper presents a new technique for grading writers based on their handwriting. Such grading can be helpful in organizing handwriting competitions and deciding the winners through an automated process. For the testing data set, we have collected samples from one hundred different writers. In order to establish the correctness of our approach, we have also included in the testing data set characters taken from one printed Gurmukhi font (Anandpur Sahib). For the training data set, we have considered characters taken from four printed Gurmukhi fonts, namely, Language Materials Project (LMP) Taran, Maharaja, Granthi and Gurmukhi_Lys. A Nearest Neighbour classifier has been used to obtain a classification score for each writer. Finally, the writers are graded based on their classification scores.
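
As an illustration of the grading step, the following is a minimal sketch (not the authors' code) of nearest-neighbour scoring: each writer's character samples are matched against the printed-font training prototypes, and the writer's score is the fraction classified correctly. Feature extraction (e.g., the peak extent based features) is assumed to have been done elsewhere; all names are illustrative.

```python
import numpy as np

def nn_classification_score(train_feats, train_labels, writer_feats, writer_labels):
    # Fraction of a writer's samples whose nearest training prototype
    # (Euclidean distance) carries the correct character label.
    correct = 0
    for x, y in zip(writer_feats, writer_labels):
        nearest = np.argmin(np.linalg.norm(train_feats - x, axis=1))
        correct += int(train_labels[nearest] == y)
    return correct / len(writer_labels)

def grade_writers(train_feats, train_labels, writers):
    # writers: {writer_id: (feature_matrix, label_vector)}; best score first.
    scores = {w: nn_classification_score(train_feats, train_labels, f, l)
              for w, (f, l) in writers.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```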

Keywords: Gradation, feature extraction, peak extent based features, modified division point based features, NN.

Received June 7, 2015; accepted January 13, 2016
  

SynchroState: A SPEM-based Solution for Synchronizing Activities and Products through State Transitions

Amal Rochd1, Maria Zrikem1, Thierry Millan2, Christian Percebois2, Claude Baron3, and Abderrahmane Ayadi1

1Laboratory of Modeling and Information Technologies, University of Cadi Ayyad, Morocco

2Institut de Recherche en Informatique de Toulouse, Université de Toulouse, France

3Laboratoire d’Analyse et d’Architecture des Systèmes, Université de Toulouse, France

Abstract: Software engineering research has always focused on the efficiency of software development processes. Recently, we have noticed an increasing interest in model-driven approaches in this context. Models that were once merely descriptive nowadays play a productive role in defining engineering processes and managing their lifecycles. However, one problem has not been considered enough: sustaining consistency between products and the implicated activities during the process lifecycle. This issue, identified in this paper as the synchronization problem, needs to be resolved in order to guarantee a flawless execution of a software process. In this paper, we present a SPEM-based solution named SynchroState that highlights the relationship between process activities and products. SynchroState's goal is to ensure synchronization between activities and products so that if one of these two entities undergoes a change, the dependent entities are notified and evolved to sustain consistency. In order to evaluate SynchroState, we have implemented the solution using the AspectJ language and validated it through a case study inspired by the ISPW-6 software process example. Results of this study demonstrate that the product state is automatically synchronized following a change in the activity state during process execution.
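
The synchronization mechanism can be pictured as an observer relationship between activities and products. The sketch below is a plain Python stand-in for what SynchroState achieves with AspectJ weaving; the state names and the transition rule are hypothetical.

```python
class Product:
    def __init__(self, name):
        self.name, self.state = name, "initial"

    def on_activity_change(self, activity, new_state):
        # Hypothetical transition rule: when an activity producing this
        # product finishes, the product advances to "delivered".
        if new_state == "finished":
            self.state = "delivered"

class Activity:
    def __init__(self, name):
        self.name, self.state, self.dependents = name, "ready", []

    def set_state(self, new_state):
        self.state = new_state
        for product in self.dependents:   # notify dependent products
            product.on_activity_change(self, new_state)

doc = Product("Design Document")
act = Activity("Modify Design")
act.dependents.append(doc)
act.set_state("finished")
print(doc.state)  # -> delivered: the product state followed the activity state
```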

Keywords: SynchroState, SPEM, metamodeling, process model, synchronization, AspectJ. 

Received April 17, 2015; accepted June 9, 2016
  

Security Mechanism against Sybil Attacks for High-Throughput Multicast Routing in Wireless Mesh Networks

Anitha Periasamy1 and Periasamy Pappampalayam2

1Master of Computer Applications Department, Anna University, India

2Electronics and Communication Engineering Department, Anna University, India

Abstract: Wireless Mesh Networks (WMNs) have become one of the important domains in wireless communications. They comprise a number of static wireless routers which form an access network connecting end users to IP-based services. Unlike conventional Wireless Local Area Network (WLAN) deployments, wireless mesh networks offer multihop routing, facilitating an easy and cost-effective deployment. This paper concentrates on efficient and secure multicast routing in such wireless mesh networks. It identifies novel attacks against high-throughput multicast protocols in wireless mesh networks through the Secure On-Demand Multicast Routing Protocol (S-ODMRP). The Sybil attack, in which a node illegitimately claims multiple identities, has recently been observed to be the most harmful attack in WMNs. This paper systematically analyzes the threat posed by the Sybil attack to WMNs. The Sybil attack is countered by a defense mechanism called the Random Key Predistribution technique (RKP). The performance of the proposed approach, which integrates S-ODMRP and RKP, is evaluated using the throughput performance metric. It is observed from the experimental results that the proposed approach provides good security against the Sybil attack with very high throughput.
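
To make the RKP defense concrete, the sketch below shows its basic mechanics under simple assumptions: every legitimate node is preloaded with a random key ring drawn from a global pool, and a claimed identity is validated by challenging it on keys it should share with the verifier. A Sybil identity fabricated at runtime holds no preloaded ring and fails the challenge. Pool and ring sizes are illustrative.

```python
import random

POOL_SIZE, RING_SIZE = 1000, 50

def assign_key_ring():
    # Performed offline, before deployment: each legitimate node
    # receives a random subset of key indices from the global pool.
    return set(random.sample(range(POOL_SIZE), RING_SIZE))

def shared_keys(ring_a, ring_b):
    # Two neighbours can establish a secure link iff their rings intersect.
    return ring_a & ring_b

def validate_identity(claimed_ring, verifier_ring, n_challenges=3):
    # The verifier challenges the claimant on keys they should share;
    # an identity without a genuine preloaded ring cannot answer.
    return len(shared_keys(claimed_ring, verifier_ring)) >= n_challenges
```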

Keywords: Secure multicast routing, sybil attack, random key predistribution.

Received June 29, 2015; accepted December 14, 2015
  

An Effective Sample Preparation Method for Diabetes Prediction

Shima Afzali and Oktay Yildiz

Computer Engineering Department, Gazi University, Turkey

Abstract: Diabetes is a chronic disorder caused by metabolic malfunction in carbohydrate metabolism, and it has become a serious health problem worldwide. Early and correct detection of diabetes can significantly influence the treatment process of diabetic patients and thus eliminate the associated side effects. Machine learning is an emerging field of high importance for providing prognosis and a deeper understanding of the classification of diseases such as diabetes. This study proposes a high-precision diagnostic system based on a modified k-means clustering technique. First, noisy, uncertain and inconsistent data were detected by the new clustering method and removed from the data set. Then, a diabetes prediction model was generated using a Support Vector Machine (SVM). Employing the proposed diagnostic system to classify the Pima Indians Diabetes (PID) data set resulted in 99.64% classification accuracy with 10-fold cross validation. The results of our analysis show that the new system is highly successful compared to SVM alone and to the classical k-means algorithm combined with SVM, in terms of both classification performance and time consumption. Experimental results indicate that the proposed approach outperforms previous methods.
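
One plausible reading of the preprocessing step, sketched below with scikit-learn: cluster the samples, treat points far from their cluster centre as noisy or inconsistent, drop them, and train an SVM on what remains. The cluster count and cut-off quantile are assumptions, not the paper's tuned values.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def remove_inconsistent(X, y, n_clusters=2, keep_quantile=0.9):
    # Drop samples whose distance to their cluster centre falls in the
    # top (1 - keep_quantile) tail, treating them as noisy data.
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
    dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
    keep = dist <= np.quantile(dist, keep_quantile)
    return X[keep], y[keep]

# With X, y loaded from the PID data set:
# Xc, yc = remove_inconsistent(X, y)
# print(cross_val_score(SVC(), Xc, yc, cv=10).mean())
```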

Keywords: Diabetes, clustering, classification, K-means, SVM, sample preparation.

Received November 28, 2015; accepted February 3, 2016
  

Evaluating Social Context in Arabic Opinion Mining

Mohammed Al-Kabi1, Izzat Alsmadi2, Rawan Khasawneh3, and Heider Wahsheh4

1Computer Science Department, Zarqa University, Jordan

2Computer Science Department, University of New Haven, USA

3Computer Information Systems Department, Jordan University of Science and Technology, Jordan

4Computer Science Department, King Khaled University, Saudi Arabia

Abstract: This study is based on a benchmark corpus consisting of 3,015 textual Arabic opinions collected from Facebook. These collected Arabic opinions are distributed equally among three domains (Food, Sport, and Weather) to create a balanced benchmark corpus. To accomplish this study, ten Arabic lexicons were constructed manually, and a new tool called Arabic Opinions Polarity Identification (AOPI) was designed and implemented to identify the polarity of the collected Arabic opinions using the constructed lexicons. Furthermore, this study includes a comparison between the constructed tool and two free online sentiment analysis tools (SocialMention and SentiStrength) that support the Arabic language. The effect of stemming on the accuracy of these tools is also tested. The evaluation results using machine learning classifiers show that AOPI is more effective than the other two free online sentiment analysis tools on a stemmed dataset.
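
The core of lexicon-based polarity identification can be reduced to counting lexicon hits, as in the minimal sketch below; AOPI's actual scoring, stemming and lexicon structure are richer than this.

```python
def polarity(tokens, pos_lexicon, neg_lexicon):
    # Label an opinion by whichever lexicon matches more of its tokens.
    pos = sum(t in pos_lexicon for t in tokens)
    neg = sum(t in neg_lexicon for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

# Toy example with one-word lexicons:
# polarity("الطقس جميل اليوم".split(), {"جميل"}, {"سيء"})  # -> "positive"
```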

Keywords: Big data, social networks, sentiment analysis, Arabic text classification and analysis, opinion mining.

Received November 20, 2015; accepted March 30, 2016
  

Image Quality Assessment Employing RMS Contrast and Histogram Similarity

Al-Amin Bhuiyan1 and Abdul Raouf Khan2

1Department of Computer Engineering, King Faisal University, KSA

2Department of Computer Science, King Faisal University, KSA

Abstract: This paper presents a new approach for evaluating image quality. The method is based on histogram similarity computation between images and combines quality index factors derived from the correlation coefficient, average luminance distortion and RMS contrast measurement. The effectiveness of this proposed RMS Contrast and Histogram Similarity (RCHS) based hybrid quality index has been demonstrated on Lena images under different well-known distortions and on standard image databases. Experimental results demonstrate that this image quality assessment method performs better than the widely used image distortion metrics Mean Squared Error (MSE), Structural SIMilarity (SSIM) and Histogram-based Image Quality (HIQ).
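
The two ingredients of the index are easy to state. The sketch below computes RMS contrast and a histogram-intersection similarity and combines them multiplicatively; the paper's exact weighting of the correlation, luminance and contrast factors is not reproduced here.

```python
import numpy as np

def rms_contrast(img):
    # RMS contrast: standard deviation of intensities normalized to [0, 1].
    return (img.astype(float) / 255.0).std()

def histogram_similarity(a, b, bins=256):
    # Histogram intersection of two grayscale images; 1.0 means identical.
    ha = np.histogram(a, bins=bins, range=(0, 256))[0] / a.size
    hb = np.histogram(b, bins=bins, range=(0, 256))[0] / b.size
    return np.minimum(ha, hb).sum()

def rchs_index(ref, dist):
    # Illustrative combination: histogram similarity damped by the
    # ratio of RMS contrasts (closer contrasts -> higher quality score).
    c1, c2 = rms_contrast(ref), rms_contrast(dist)
    return histogram_similarity(ref, dist) * min(c1, c2) / max(c1, c2)
```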

Keywords: Image quality measures, RMS contrast, histogram similarity, SSIM, HIQ, minkowski distance metric.

Received July 25, 2015; accepted November 29, 2015
  

Enhancing Anti-phishing by a Robust Multi-Level Authentication Technique (EARMAT)

Adwan Yasin and Abdelmunem Abuhasan

College of Engineering and Information Technology, Arab American University, Palestine

Abstract: Phishing is a kind of social engineering attack in which experienced persons or entities fool novice users into sharing their sensitive information such as usernames, passwords and credit card numbers through spoofed emails, spam, and Trojan hosts. The proposed scheme is based on designing a secure two-factor authentication web application that prevents phishing attacks instead of relying on phishing detection methods and user experience. The proposed method guarantees that authenticating users to services, such as online banking or e-commerce websites, is done in a very secure manner. The proposed system uses a mobile phone as a software token that plays the role of the second factor in the user authentication process: the web application generates a session-based one-time password and delivers it securely to the mobile application after notifying the user through the Google Cloud Messaging (GCM) service; the mobile application then completes the authentication process, after user confirmation, by encrypting the received one-time password with its own private key and sending it back to the server in a manner that is secure and transparent to the user. Once the server decrypts the received one-time password and mutually authenticates the client, it automatically authenticates the user's web session. We implemented a prototype of our authentication protocol consisting of an Android application, a Java-based web server and GCM connectivity between them. Our evaluation results indicate the viability of the authentication protocol in securing web application authentication against various types of threats.
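
"Encrypting the one-time password with the private key" amounts to a digital signature, so the exchange can be sketched as below using RSA signing from Python's `cryptography` package; key distribution, the GCM push and the session plumbing are omitted, and this is a stand-in for the paper's Android/Java implementation.

```python
import secrets
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Enrollment: the mobile app holds the private key, the server the public key.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# 1. Server generates a session-bound one-time password and pushes it
#    to the phone via GCM (the push itself is out of scope here).
otp = secrets.token_hex(16)

# 2. After user confirmation, the mobile app signs the OTP.
signature = private_key.sign(otp.encode(), padding.PKCS1v15(), hashes.SHA256())

# 3. Server verifies; raises InvalidSignature on a forgery, otherwise
#    the user's web session is authenticated automatically.
public_key.verify(signature, otp.encode(), padding.PKCS1v15(), hashes.SHA256())
print("session authenticated")
```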

Keywords: Phishing, two-factor authentication, web security, google cloud messaging, mobile authentication.

Received September 29, 2015; accepted June 1, 2016
  

Detection of Neovascularization in Proliferative Diabetic Retinopathy Fundus Images

Suma Gandhimathi1 and Kavitha Pillai2

1Department of Computer Science and Engineering, Sree Vidyanikethan Engineering College, India

2Department of Computer Science and Engineering, University College of Engineering, India

Abstract: Neovascularization is a serious vision-threatening condition arising from Proliferative Diabetic Retinopathy (PDR). The condition causes progressive retinal damage in persons suffering from Diabetes mellitus and is characterized by the abnormal growth of new blood vessels from the normal vasculature, triggered by oxygen insufficiency in the retinal capillaries, which hampers proper blood flow into the retina. The present paper aims at detecting PDR neovascularization with the help of the Adaptive Histogram Equalization technique, which enhances the green plane of the fundus image and thereby enriches the details present in it. The neovascularization blood vessels and the normal blood vessels were both segmented from the equalized image using the Fuzzy C-means clustering technique. Marking of the neovascularization region was achieved with a function matrix box based on a compactness classifier, which applied morphological and threshold techniques to the segmented image. Subsequently, a Feed-Forward Back-propagation Neural Network was applied to the extracted features (number of segments, gradient variation, mean, variance, standard deviation, contrast, correlation, entropy, energy, homogeneity and cluster shade of the candidate neovascularization region) to achieve accurate identification. The above method was tested on images from three online datasets, as well as two hospital eye clinics. Evaluated on these five image sources, the detection technique showed an overall accuracy of 94.5%, with a sensitivity of 95.4% and a specificity of 49.3%, reiterating that the method can play a vital role in the study and analysis of Diabetic Retinopathy.
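
For reference, the segmentation step rests on the standard fuzzy c-means updates. The sketch below is a generic FCM implementation, not the paper's tuned pipeline; for vessel segmentation, X would hold per-pixel features from the equalized green plane.

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=100, eps=1e-5):
    # Returns cluster centres and the fuzzy membership matrix U (n x c).
    n = X.shape[0]
    U = np.random.dirichlet(np.ones(c), size=n)  # rows sum to 1
    for _ in range(iters):
        W = U ** m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)  # standard FCM update
        if np.abs(U_new - U).max() < eps:
            return centres, U_new
        U = U_new
    return centres, U
```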

Keywords: Diabetic retinopathy, neovascularization, fuzzy C-means clustering, compactness classifier, feature extraction, neural network.

Received May 28, 2015; accepted December 27, 2015
  

Machine Learning based Intelligent Framework for Data Preprocessing

Sohail Sarwar1, Zia Ul Qayyum2, and Abdul Kaleem1

1Department of Computing, Iqra University Islamabad, Pakistan

2Department of Computer Science, National University of Computing and Emerging Sciences Islamabad, Pakistan

Abstract: Data preprocessing plays a pivotal role in data mining: by handling inconsistent, incomplete and irrelevant data through data cleansing, it reduces cost and assists knowledge workers in making effective decisions through knowledge extraction. Prevalent techniques are not very effective, requiring substantial manual effort and increased processing time while achieving lower accuracy, and they are constrained in the data volumes they can handle. In this research, a comprehensive, semi-automatic preprocessing framework for data cleansing is devised, based on a hybrid of two machine learning techniques, namely Conditional Random Fields (CRF) and Hidden Markov Models (HMM). The proposed framework is envisaged to be effective and flexible enough to manipulate data sets of any size. An inconsistent data set comprising the customer address directory of the Pakistan Telecommunication Company (PTCL) is used to conduct experiments for training and validation of the proposed approach. A small percentage of semi-cleansed data (the output of preprocessing) is passed to the HMM-CRF hybrid for learning, and the rest of the data is used for testing the model. Experiments show the superiority of the proposed hybrid approach, with a higher average accuracy of 95.50% compared to CRF (84.5%) and HMM (88.6%) applied separately.
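
As a flavour of the sequence-labelling machinery involved, the sketch below decodes the most likely tag sequence for address tokens with Viterbi over an HMM; the paper's hybrid additionally folds in CRF scores, and the tag set shown is hypothetical.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    # Tags address tokens (e.g., HOUSE_NO, STREET, CITY) with the most
    # probable state path; unseen emissions get a small floor probability.
    V = [{s: start_p[s] * emit_p[s].get(obs[0], 1e-6) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p] * trans_p[p][s])
            V[t][s] = V[t - 1][prev] * trans_p[prev][s] * emit_p[s].get(obs[t], 1e-6)
            back[t][s] = prev
    path = [max(states, key=lambda s: V[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))
```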

Keywords: Machine learning, hidden markov model, conditional random fields, preprocessing.

Received March 14, 2015; accepted June 14, 2016
  

An Empirical Study to Evaluate the Relationship of Object-Oriented Metrics and Change Proneness

Ruchika Malhotra and Megha Khanna

Department of Computer Science and Engineering, Technological University, India

Abstract: Software maintenance deals with the changes or modifications which software goes through. Change prediction models help in identifying classes/modules which are prone to change in future releases of a software product. As change-prone classes are probable sources of defects and modifications, they represent the weak areas of a product. Thus, change prediction models aid software developers in delivering an effective, quality software product by allocating more resources to change-prone classes/modules, as they need greater attention and resources for verification and meticulous testing. This reduces the probability of defects in future releases and yields a better quality product and satisfied customers. This study deals with the identification of change-prone classes in Object-Oriented (OO) software in order to evaluate whether a relationship exists between OO metrics and the change proneness attribute of a class. The study also compares the effectiveness of two sets of methods for change prediction tasks, i.e., traditional statistical methods (logistic regression) and the now widely used machine learning methods such as Bagging and the Multi-layer perceptron.
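
The statistical baseline in such studies is typically a cross-validated logistic regression over the OO metrics. A minimal scikit-learn sketch, with metric extraction assumed done elsewhere and AUC as one reasonable evaluation measure:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def evaluate_change_model(X, y):
    # X: OO metric values per class (e.g., coupling, cohesion, size);
    # y: 1 if the class changed between releases, else 0.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    probs = cross_val_predict(model, X, y, cv=10, method="predict_proba")[:, 1]
    return roc_auc_score(y, probs)
```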

Keywords: Change proneness, empirical validation, machine learning, object-oriented, software quality.

Received May 29, 2015; accepted September 20, 2015
  

Impulse Noise Reduction for Texture Images Using Real Word Spelling Correction Algorithm and Local Binary Patterns

Shervan Fekri-Ershad1, Seyed Fakhrahmad2, and Farshad Tajeripour2

1Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Najafabad, Iran

2Department of Computer Science and Engineering, Shiraz University, Shiraz, Iran

Abstract: Noise reduction is one of the most important steps in a very broad range of image processing applications such as face identification, motion tracking and visual pattern recognition. Texture images make up a huge share of the images collected as databases in these applications. In this paper, an approach is proposed for noise reduction in texture images which is based on real-word spelling correction theory from natural language processing. The proposed approach includes two main steps. In the first step, the pixels most similar to the noisy pixel in terms of textural features are generated as candidates using local binary patterns. Next, the best candidate is selected using a two-gram algorithm. The quality of the proposed approach is compared with some state-of-the-art noise reduction filters in the results section. High accuracy, a low blurring effect and low computational complexity are some advantages of the proposed approach.
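
The candidate-generation step can be pictured as follows: compute the LBP code of each pixel and keep, from a surrounding window, the pixels whose code matches the noisy pixel's texture context. The sketch below simplifies the paper's method (the two-gram selection among candidates is not shown), and the window size is an assumption.

```python
import numpy as np

def lbp_code(img, r, c):
    # 8-neighbour local binary pattern code of pixel (r, c).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr, c + dc] >= img[r, c]:
            code |= 1 << bit
    return code

def candidate_pixels(img, r, c, window=3):
    # Replacement candidates: window pixels sharing the target's LBP code.
    target = lbp_code(img, r, c)
    h, w = img.shape
    return [img[i, j]
            for i in range(max(1, r - window), min(h - 1, r + window + 1))
            for j in range(max(1, c - window), min(w - 1, c + window + 1))
            if (i, j) != (r, c) and lbp_code(img, i, j) == target]
```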

Keywords: Image noise reduction, local binary pattern, real word spelling correction, texture analysis.

Received June 22, 2015; accepted March 9, 2016
  

Using Data Mining for Predicting Cultivable Uncultivated Regions in the Middle East

Ahsan Abdullah1, Ahmed Bakhashwain2, Abdullah Basuhail3, and Ahtisham Aslam4

1Department of Information Technology, King Abdulaziz University, Saudi Arabia

2Department of Arid Regions Agriculture, King Abdulaziz University, Saudi Arabia

3Department of Computer Science, King Abdulaziz University, Saudi Arabia

4Department of Information Systems, King Abdulaziz University, Saudi Arabia

Abstract: The Middle East region is mostly characterized by a hot and dry climate, vast deserts and long coastlines. Deserts cover large areas, while agricultural lands are limited to small areas of arable land under perennial grass pastures or crops. In view of the harsh climate and falling ground-water level, it is critical to identify which agricultural produce to grow, and where to grow it. The traditional methods used for this purpose are expensive, complex, prone to subjectivity, risky and time-consuming; this points to the need to explore novel IT techniques using Geographic Information Systems (GIS). In this paper, we present a data-driven, stand-alone, flexible analysis environment, the Spatial Prediction and Overlay Tool (SPOT). SPOT is a predictive spatial data mining GIS tool designed to facilitate decision support by processing and analysing agro-meteorological and socio-economic thematic maps and generating geo-referenced crop cultivation prediction maps through predictive data mining. We present a case study of Saudi Arabia using decade-old wheat cultivation data, and compare the historically uncultivated regions predicted by SPOT with their current cultivation status. The prediction results were found to be promising after verification in time and space using the latest satellite imagery, followed by on-site physical ground verification using GPS.

Keywords: Data mining, image processing, GIS, prediction, wheat, alfalfa.

Received October 30, 2015; accepted June 1, 2016
  

Mining Consumer Knowledge from Shopping Experience: TV Shopping Industry

Chih-Hao Wen1, Shu-Hsien Liao2, and Shu-Fang Huang2

1Department of Logistics Management, National Defense University, Taiwan
2Department of Management Sciences and Decision Making, Tamkang University, Taiwan

Abstract: TV shopping has become far more popular in recent years. TV is now almost everywhere; as people watch, they grow more and more accustomed to buying goods via TV shopping channels. Even in recession, the industry is thriving and has become one of the most important consumption modes. This study uses cluster analysis to identify the profiles of TV shopping consumers. Association analysis is used to discover rules relating TV shopping spokespersons to the commodities consumers buy. By depicting the marketing knowledge map of spokespersons, the best endorsement portfolio is identified to make recommendations. Based on the analysis of spokespersons, time periods, customer profiles and products, four business modes of TV shopping are proposed for consumers: new product, knowledge, low price and luxury product; related recommendations are also provided for industry reference.
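
The association step boils down to mining rules of the form spokesperson → product with enough support and confidence. A self-contained sketch over per-customer transaction lists, with illustrative thresholds:

```python
from collections import Counter
from itertools import combinations

def association_rules(transactions, min_support=0.05, min_conf=0.5):
    # Mine simple A -> B rules (e.g., spokesperson -> product category).
    n = len(transactions)
    item_counts = Counter(i for t in transactions for i in set(t))
    pair_counts = Counter(p for t in transactions
                          for p in combinations(sorted(set(t)), 2))
    rules = []
    for (a, b), cnt in pair_counts.items():
        if cnt / n < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = cnt / item_counts[lhs]
            if confidence >= min_conf:
                rules.append((lhs, rhs, cnt / n, confidence))
    return rules

# rules = association_rules([["spokesA", "jewelry"], ["spokesA", "jewelry", "tv"]])
```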

Keywords: Consumer knowledge, data mining, TV shopping, association rules, clustering.

Received July 23, 2014; accepted June 26, 2016
  

A Physical Topology Discovery Method Based on AFTs of Down Constraint

Bin Zhang1, Xingchun Diao2, Donghong Qin3, Yi Liu4, and Yun Yu2

1Cyberspace Security Research Center, Pengcheng Laboratory, China

2Nanjing Telecommunication Technology Research Institute, China

3School of Information Science and Engineering, GuangXi University for Nationalities, China

4National Innovation Institute of Defense Technology, Beijing, China

Abstract: Network physical topology discovery is a key issue for network management and applications, and physical topology discovery based on Address Forwarding Tables (AFTs) is a hot topic in current research. This paper defines three constraints on AFTs and proposes a tree chopping algorithm based on AFTs satisfying the down constraint, which can discover the physical topology of a subnet accurately. The proposed algorithm dramatically decreases the demand for AFT completeness: relying only on the AFTs of down ports is the loosest constraint yet for discovering physical topology. The proposed algorithm can also be used in a switch domain spanning multiple subnets.
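
The down constraint admits a compact operational reading: every address a switch has learned must appear below one of its down ports, so an ancestor's down-port AFT must cover everything its descendant sees. The containment check below is a simplified illustration of that idea, not the paper's full tree chopping algorithm.

```python
def find_parent(switch, others, afts):
    # afts[s]: {down_port: set of MAC addresses learned on that port}.
    # p is an ancestor of s if one of p's down-port AFTs covers all
    # addresses below s; the parent is the ancestor with the smallest
    # such AFT.
    below = set().union(*afts[switch].values()) | {switch}
    candidates = [(len(aft), p) for p in others
                  for aft in afts[p].values() if below <= aft]
    return min(candidates)[1] if candidates else None
```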

Keywords: Physical topology discovery, address forwarding table, network management.

Received January 27, 2015; accepted September 9, 2015
  

Modified Binary Bat Algorithm for Feature Selection in Unsupervised Learning

 Rajalaxmi Ramasamy and Sylvia Rani

Department of Computer Science and Engineering, Kongu Engineering College, India

Abstract: Feature selection is the process of selecting a subset of optimal features by removing redundant and irrelevant features. In supervised learning, the feature selection process uses class labels. Feature selection is more difficult in unsupervised learning, since class labels are not present. In this paper, we present a wrapper-based unsupervised feature selection method that combines a modified binary bat approach with the k-means clustering algorithm. To ensure diversification in the search space, a mutation operator is introduced into the proposed algorithm. To validate the features selected by our method, classification algorithms such as decision tree induction, the Support Vector Machine and the Naïve Bayesian classifier are used. The results show that the proposed method identifies a minimal number of features with improved accuracy when compared with other methods.
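
A condensed sketch of the wrapper loop: binary bat positions encode feature masks, a sigmoid transfer function binarizes velocities, the added mutation operator flips random bits, and k-means clustering quality serves as fitness. The silhouette score stands in for the paper's clustering criterion, and the population sizes are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def fitness(X, mask):
    # Clustering quality on the selected feature subset.
    if mask.sum() < 2:
        return -1.0
    Xs = X[:, mask.astype(bool)]
    return silhouette_score(Xs, KMeans(n_clusters=3, n_init=10).fit_predict(Xs))

def binary_bat_select(X, n_bats=10, iters=30, p_mut=0.05):
    n_feat = X.shape[1]
    pos = np.random.randint(0, 2, (n_bats, n_feat))
    vel = np.zeros((n_bats, n_feat))
    fit = np.array([fitness(X, p) for p in pos])
    best, best_fit = pos[fit.argmax()].copy(), fit.max()
    for _ in range(iters):
        for i in range(n_bats):
            vel[i] += (pos[i] - best) * np.random.rand()
            prob = 1.0 / (1.0 + np.exp(-vel[i]))      # sigmoid transfer
            pos[i] = (np.random.rand(n_feat) < prob).astype(int)
            flip = np.random.rand(n_feat) < p_mut     # mutation operator
            pos[i][flip] ^= 1
            f = fitness(X, pos[i])
            if f > fit[i]:
                fit[i] = f
                if f > best_fit:
                    best, best_fit = pos[i].copy(), f
    return best  # binary mask of selected features
```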

Keywords: Feature selection, unsupervised learning, binary bat algorithm, mutation.

Received March 10, 2015; accepted December 21, 2015
  

Explicitly Symplectic Algorithm for Long-time Simulation of Ultra-flexible Cloth

Xiao-hui Tan1, Zhou Mingquan2, Yachun Fan2, Wang Xuesong2, and Wu Zhongke2

1College of Information and Engineering, Capital Normal University, China

2College of Information Science and Technology, Beijing Normal University, China

Abstract: In this paper, a symplectic structure-preserving algorithm is presented to solve the Hamiltonian dynamic model of ultra-flexible cloth simulation with high computational stability. Our method preserves the conserved quantity of a Hamiltonian, which enables long-time stable simulation of ultra-flexible cloth. Firstly, the dynamic equation of ultra-flexible cloth simulation is transferred into a Hamiltonian system which is slightly perturbed from the original one but retains generalized structure preservability. Secondly, semi-implicit symplectic Runge-Kutta and Euler algorithms are constructed, which can be converted into explicit algorithms for separable dynamic models. Thirdly, in order to show their advantages, the presented algorithms are used to solve a conservative system which is the primary ultra-flexible cloth model unit. The results show that the presented algorithms keep the system energy constant and give accurate results even at large time-steps, whereas ordinary non-symplectic explicit methods exhibit large errors as the time-step increases. Finally, the presented algorithms are adopted to simulate a large-area ultra-flexible cloth to validate their computational capability and stability. The method exploits symplectic features and integrates the force analytically for better stability and accuracy while keeping the integration scheme explicit. Experimental results show that our symplectic schemes are more powerful for integrating Hamiltonian systems than non-symplectic methods. Our method is a general scheme for physically based systems that simultaneously maintains real-time and long-time simulation. It has been implemented in the scene building platform World Max Studio.
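
The key property is easy to demonstrate on the paper's "primary model unit", a single conservative spring. The semi-implicit (symplectic) Euler sketch below updates momentum first and then position with the new momentum, which keeps the scheme explicit for this separable Hamiltonian while bounding the energy error at large time-steps; the parameters are illustrative.

```python
import numpy as np

def symplectic_euler_spring(q0, p0, k=10.0, m=1.0, dt=0.05, steps=2000):
    # Separable Hamiltonian H = p^2/(2m) + k*q^2/2.
    q, p, traj = q0, p0, []
    for _ in range(steps):
        p -= dt * k * q          # p_{n+1} = p_n - dt * dH/dq(q_n)
        q += dt * p / m          # q_{n+1} = q_n + dt * dH/dp(p_{n+1})
        traj.append((q, p, p * p / (2 * m) + 0.5 * k * q * q))
    return np.array(traj)

# The energy column stays bounded for long runs even at large dt,
# whereas updating both q and p from the old values (explicit,
# non-symplectic Euler) makes the energy grow without bound:
# energy = symplectic_euler_spring(1.0, 0.0)[:, 2]
```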

Keywords: Flexible cloth simulation, numerical integration, symplectic method, scene building system.

Received March 13, 2015; accepted June 14, 2016
  

Semi Fragile Watermarking for Content based Image Authentication and Recovery in the DWT-DCT Domains

Jayashree Pillai1 and Padma Theagarajan2

1Department of Computer Science, Acharya Institute of Management and Sciences, India

2Department of Computer Applications, Sona College of Technology, India

Abstract: Content authentication requires that image watermarks highlight malicious attacks while tolerating incidental modifications that do not alter the image contents beyond a certain tolerance limit. This paper proposes an authentication scheme that uses content-invariant features of the image as a self-authenticating watermark and a quantized, down-sampled approximation of the original image as a recovery watermark, both embedded securely using a pseudorandom sequence into multiple sub-bands in the Discrete Wavelet Transform (DWT) domain. The scheme is blind, as it does not require the original image during the authentication stage, and is highly tolerant to JPEG2000 compression. It also yields highly imperceptible watermarked images and is suitable for applications with low tolerance to image quality degradation after watermarking. Both the Discrete Cosine Transform (DCT) and DWT domains are used in the watermark generation and embedding process.
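
As a schematic of the embedding side, the sketch below adds a keyed pseudorandom sequence to a detail DWT sub-band using PyWavelets; the paper's actual scheme (content features, quantization, multiple sub-bands, DCT stage) is richer, and the wavelet, band and strength here are assumptions.

```python
import numpy as np
import pywt

def embed_bits(img, bits, key=42, alpha=2.0):
    # One-level DWT; hide one keyed pseudorandom sample per watermark
    # bit in the HL (horizontal detail) sub-band.
    LL, (LH, HL, HH) = pywt.dwt2(img.astype(float), "haar")
    rng = np.random.default_rng(key)
    flat = HL.ravel().copy()
    for i, b in enumerate(bits):
        pn = rng.standard_normal()
        flat[i] += alpha * pn if b else -alpha * pn
    return pywt.idwt2((LL, (LH, flat.reshape(HL.shape), HH)), "haar")
```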

Keywords: Content authentication, self authentication, recovery watermark, DWT, PQ sequence.

Received May 29, 2015; accepted February 22, 2016
  

Recognition of Handwritten Characters Based on Wavelet Transform and SVM Classifier

Malika Ait Aider1, Kamal Hammouche1, and Djamel Gaceb2

1Laboratoire Vision Artificielle et Automatique des Systèmes, Université Mouloud Mammeri, Algeria

2Laboratoire D'informatique en Image et Systèmes D'information, Institut National des Sciences Appliquées de Lyon, France

Abstract: This paper is devoted to off-line handwritten character recognition based on the two-dimensional wavelet transform and a single support vector machine classifier. The wavelet transform provides a representation of the image in independent frequency bands. It performs a local analysis to characterize images of characters in time and scale space. At each level of decomposition, the wavelet transform provides four sub-images: a smooth or approximation sub-image and three detail sub-images. In handwritten character recognition, the wavelet transform has received much attention, and its performance is related not only to the type of wavelet used but also to the type of sub-image used to provide features. Our objective here is thus to study these two points by conducting several tests using several wavelet families and several combinations of features derived from the sub-images. The tests show that the symlet wavelet of order 8 is the most efficient and that the features derived from the approximation sub-image allow the best discrimination between the handwritten digits.
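
The winning configuration reported above translates directly into a short pipeline: decompose each character image with the order-8 symlet, flatten the approximation sub-image into a feature vector, and feed a single SVM. A PyWavelets/scikit-learn sketch, where the decomposition level and kernel are assumptions:

```python
import pywt
from sklearn.svm import SVC

def wavelet_features(img, wavelet="sym8", level=2):
    # Keep only the approximation sub-image of the 2-D decomposition,
    # the combination found most discriminative in the paper.
    coeffs = pywt.wavedec2(img.astype(float), wavelet, level=level)
    return coeffs[0].ravel()

# With train_imgs/train_labels as handwritten digit images and classes:
# clf = SVC().fit([wavelet_features(i) for i in train_imgs], train_labels)
# pred = clf.predict([wavelet_features(test_img)])
```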

Keywords: Feature extraction, wavelet transform, handwritten character recognition, support vector machine, OCR.

Received June 10, 2015; accepted May 16, 2016
  

Development of a Hindi Named Entity Recognition System without Using Manually Annotated Training Corpus

Sujan Kumar Saha1 and Mukta Majumder2

1Department of Computer Science and Engineering, Birla Institute of Technology, India

2Department of Computer Science and Application, University of North Bengal, India

Abstract: A machine learning based approach to Named Entity Recognition (NER) requires a sufficient annotated corpus to train the classifier. Other NER resources, like gazetteers, are also required to make the classifier more accurate. But in many languages and domains, relevant NER resources are still not available. Creation of adequate and relevant resources is costly and time consuming. However, a large number of resources and several NER systems are available in resource-rich languages, like English. Suitable language adaptation techniques, NER resources of a resource-rich language and minimally supervised learning might help to overcome such scenarios. In this paper we have studied a few such techniques in order to develop a Hindi NER system. Without using any Hindi NE-annotated corpus, the developed system achieves a reasonable F-measure of 73.87.

Keywords: Natural language processing, machine learning, named entity recognition, resource scarcity, language transfer, semi-supervised learning.

Received July 22, 2015; accepted October 7, 2015
  

Combining Instance Weighting and Fine Tuning for Training Naïve Bayesian Classifiers with Scant Training Data

Khalil El Hindi

Department of Computer Science, King Saud University, Saudi Arabia

Abstract: This work addresses the problem of having to train a Naïve Bayesian classifier using limited data. It first presents an improved instance-weighting algorithm that is accurate and robust to noise and then it shows how to combine it with a fine tuning algorithm to achieve even better classification accuracy. Our empirical work using 49 benchmark data sets shows that the improved instance-weighting method outperforms the original algorithm on both noisy and noise-free data sets. Another set of empirical results indicates that combining the instance-weighting algorithm with the fine tuning algorithm gives better classification accuracy than using either one of them alone.
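
To fix ideas, instance weighting enters a Naïve Bayesian classifier by replacing counts with sums of instance weights, as in the self-contained sketch below for categorical attributes. How the weights themselves are set, and the fine tuning step, are the paper's contribution and are not reproduced here; the Laplace-style smoothing constants are assumptions.

```python
import math
from collections import defaultdict

class WeightedNaiveBayes:
    def fit(self, X, y, weights):
        # Class "counts" and conditional "counts" become weight sums.
        self.classes = sorted(set(y))
        self.cls_w = dict.fromkeys(self.classes, 0.0)
        self.cond = defaultdict(float)  # (class, attr index, value) -> weight
        for xi, yi, wi in zip(X, y, weights):
            self.cls_w[yi] += wi
            for j, v in enumerate(xi):
                self.cond[(yi, j, v)] += wi
        self.total = sum(self.cls_w.values())
        return self

    def predict(self, xi):
        def log_post(c):
            lp = math.log(self.cls_w[c] / self.total)
            for j, v in enumerate(xi):
                lp += math.log((self.cond[(c, j, v)] + 1.0) / (self.cls_w[c] + 2.0))
            return lp
        return max(self.classes, key=log_post)
```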

Keywords: Naïve bayesian algorithm, classification, machine learning, noisy data sets, instance weighting.

Received April 4, 2016; accepted June 7, 2016
 