Acta polytechnica Hungarica Vol. 19, No. 10 (2022)

Acta polytechnica HungaricaVol. 19, No. 10 (2022)

Tartalom

Special Issue on Advances in Intelligent Systems - Guest Editors: Ildar Batyrshin, Fernando Gomide, Vladik Kreinovich and Shahnaz Shahbazova

Ildar Batyrshin ,

Fernando Gomide ,

Vladik Kreinovich ,

Shahnaz Shahbazova :

Preface7en [68.69 kB - PDF]EPA-02461-00126-0010

Eliuth E. López-García ,

Ildar Batyrshin ,

Grigori Sidorov :

New Similarity Measures for Item-based Neighborhood Collaborative Filtering9-27en [3.40 MB - PDF]EPA-02461-00126-0020

Abstract: Similarity measures play an important role in many areas to solve a wide variety of problems. In computer science, these measures are used in decision making, information retrieval, data mining, machine learning, and recommender systems. The recommender systems are tools that have proven their utility in filtering large amounts of information and giving recommendations useful for users. Neighborhood collaborative filtering is the most common recommender system approach implemented by cutting-edge companies. A key element of this approach is the similarity measure, which is used to find neighbors with similar tastes to provide recommendations that satisfy users' needs. A drawback of this approach is the lack of user’s information to generate proper recommendations. For this reason, it is important to design new similarity measures that can find the most relevant neighbors to generate more accurate recommendations for users with little information about them. This paper designs two new similarity measures that can generate good recommendations with little information about users. These similarity measures have been tested using MovieLens datasets and different rating prediction methods, and they have shown a good performance in comparison with other similarity measures designed to address the recommendation problem.

Keywords: rating scale; recommender systems; collaborative filtering; neighborhood; itembased; similarity measure; similarity; cold-start

Anabel Martínez-Vargas ,

Ángel G. Andrade ,

Zury J. Santiago-Manzano :

A Joint Algorithm for Base Station Deactivation and Mobile User Reassignment in Green Cellular Networks29-48en [663.73 kB - PDF]EPA-02461-00126-0030

Abstract: The proliferation of smartphones has led to an increase in the cellular infrastructure, due to efforts by mobile operators to meet the rising demand. Given that the planning of cellular networks is carried out according to demand during peak hours, a large number of base stations must be deployed to maintain a constant number of base stations even when traffic intensity is reduced. This strategy has brought about increased energy levels in cellular networks, affecting the networks' operating expenses and contributing to the problem of carbon emissions in the atmosphere. This work shows an algorithm that deactivates base stations for cellular networks and reassigns mobile users. We use the interruption probability to analyze the effect of base-station-deactivation on mobile users. We perform two approaches: one using a homogeneous network and the other a heterogeneous network. The homogeneous network is a macro-cell deployment, whereas the heterogeneous network comprises macro-cells and femto-cells. A genetic algorithm is used to find the set of base stations to deactivate and continue offering the demand services. As the carrier-to-interference ratio increases, the results show that few base stations need deactivating in a heterogeneous network with high traffic demand.

Keywords: Genetic algorithm; Green network; Sleep mode

Julio C. Urenda ,

Olga Kosheleva ,

Shahnaz Shahbazova ,

Vladik Kreinovich :

What Is the Uncertainty of the Result of Data Processing: Fuzzy Analogue of the Central Limit Theorem49-60en [164.75 kB - PDF]EPA-02461-00126-0040

Abstract: It is known that, due to the Central Limit Theorem, the probability distribution of the uncertainty of the result of data processing is, in general, close to Gaussian - or to a distribution from a somewhat more general class known as infinitely divisible. We show that a similar result holds in the fuzzy case: namely, the membership function describing the uncertainty of the result of data processing is, in general, close to Gaussian - or to a membership function from an explicitly described more general class.

Keywords: fuzzy logic; Central Limit Theorem; uncertainty

Shahnaz N. Shahbazova ,

Dursun Ekmekci :

An Input-weighted, Multi-Objective Evolutionary Fuzzy Classifier, for Alcohol Classification61-81en [738.62 kB - PDF]EPA-02461-00126-0050

Abstract: The success of the evolutionary computational methods in scanning at problem's solution space and the ability to produce robust solutions, are important advantages for fuzzy systems, especially in terms of "interpretability" and "accuracy". Many techniques have been introduced for multi-objective evolutionary fuzzy classifiers by considering this advantage. However, these techniques are mostly fuzzy rule-based methods. In this study, instead of designing an optimal rule table or determining optimal rule weights, the inputs are weighted, and no rules are used. The average of the degrees of membership obtained with their Membership Function (MF) is calculated as the "input membership degree (μInp)" for each input. The μInps are then weighted, and a single coefficient is generated to be used for the output. With the output, results are obtained for different objective functions. The weights of the inputs and the MFs parameters of all variables (inputs and outputs) are optimized with NSGA-II. The performance of the method has been tested for alcohol classification. As a result, it has been proven that the method can generate designs that can classify at shallow error levels with different sensors at different gas concentrations. In addition, it has been observed that the proposed method produces more successful solutions for alcohol classification problems when compared to other MOEFC techniques.

Keywords: Multi-Objective Fuzzy Classifier; Multi-Objective Optimization; Input-Weighted Multi-Objective Fuzzy Classifier

Vadim Borisov ,

Margarita Chernovalova ,

Marina Dulyasova ,

Dmitry Morozov ,

Artem Vasiliev :

Fuzzy Methods for Comparing Project Situations and Selecting Precedent Decisions83-98en [502.07 kB - PDF]EPA-02461-00126-0060

Abstract: This paper characterizes project management task aspects substantiating the expediency of applying fuzzy methods for comparing project situations and selecting precedent decisions. It discusses methods for assessing the similarity of the fuzzy features of project situations, based on operations with fuzzy sets, pseudometric distances between fuzzy sets, and the fuzzy distance between fuzzy sets. The paper also describes approaches to comparing fuzzy project situations on the basis of aggregating the results of comparing individual features with the use of various convolutions or fuzzy inference algorithms, as well as by individual priority features. An example of selecting precedent project decisions relevant to project situations is given, where relative pseudometric distance between fuzzy sets is used to estimate the degree of similarity among the fuzzy features of project situations, and the modifiable Mamdani fuzzy inference algorithm is used for comparing fuzzy project situations and selecting precedent project decisions.

Keywords: project management; fuzzy project situations; fuzzy distance; fuzzy logic inference; precedent decision

Vladimir V. Bochkarev ,

Stanislav V. Khristoforov ,

Anna V. Shevlyakova ,

Valery D. Solovyev :

Comparison of the Three Algorithms for Concreteness Rating Estimation of English Words99-121en [401.74 kB - PDF]EPA-02461-00126-0070

Abstract: The paper compares three algorithms for concreteness rating estimation of English words. To train and test the models, we used a number of freely available dictionaries containing concreteness ratings. A feedforward neural network is employed as a regression model. Pre-trained fastText vectors, data on co-occurrence of target words with the most frequent ones, and data on co-occurrence of target words with functional words are used as input data by the considered algorithms. One of the three algorithms was proposed for the first time in this article. We provide detailed explanations of which combinations with functional words are the most informative in terms of concreteness ratings estimation for English words. Although the rest two algorithms have already been used for estimation of concreteness ratings, we consider possible ways to update them and improve the results obtained by a neural network. Thuswise, we use stochastic Spearman’s correlation coefficient as a criterion for stopping of training. All three algorithms provided good results. The best value of Spearman’s correlation coefficient between the value of the concreteness rating and its estimate was 0.906, which exceeds the values achieved in previous works.

Keywords: concreteness rating; abstractness; neural networks; fastText; word co-occurrence; English

Shashirekha Hosahalli Lakshmaiah ,

Fazlourrahman Balouchzahi ,

Mudoor Devadas Anusha ,

Grigori Sidorov :

CoLI-Machine Learning Approaches for Code-mixed Language Identification at the Word Level in Kannada-English Texts123-141en [785.51 kB - PDF]EPA-02461-00126-0080

Abstract: The task of automatically identifying a language used in a given text is called Language Identification (LI). India is a multilingual country and many Indians especially youths are comfortable with Hindi and English, in addition to their local languages. Hence, they often use more than one language to post their comments on social media. Texts containing more than one language are called “code-mixed texts” and are a good source of input for LI. Languages in these texts may be mixed at sentence level, word level or even at sub-word level. LI at word level is a sequence labeling problem where each and every word in a sentence is tagged with one of the languages in the predefined set of languages. For many NLP applications, using code-mixed texts, the first but very crucial preprocessing step will be identifying the languages in a given text. In order to address word level LI in code-mixed Kannada-English (Kn-En) texts, this work presents i) the construction of code-mixed Kn-En dataset called CoLI-Kenglish dataset, ii) code-mixed Kn-En embedding and iii) learning models using Machine Learning (ML), Deep Learning (DL) and Transfer Learning (TL) approaches. Code-mixed Kn-En texts are extracted from Kannada YouTube video comments to construct CoLI-Kenglish dataset and code-mixed Kn-En embedding. The words in CoLI-Kenglish dataset are grouped into six major categories, namely, “Kannada”, “English”, “Mixed-language”, “Name”, “Location” and “Other”. Code-mixed embeddings are used as features by the learning models and are created for each word, by merging the word vectors with sub-words vectors of all the sub-words in each word and character vectors of all the characters in each word. The learning models, namely, CoLI-vectors and CoLI-ngrams based on ML, CoLI-BiLSTM based on DL and CoLI-ULMFiT based on TL approaches are built and evaluated using CoLI-Kenglish dataset. The performances of the learning models illustrated, the superiority of CoLI-ngrams model, compared to other models with a macro average F1-score of 0.64. However, the results of all the learning models were quite competitive with each other.

Keywords: Language Identification; Code-mixed texts; Machine Learning; Deep Learning; Transfer Learning

Maaz Amjad ,

Noman Ashraf ,

Grigori Sidorov ,

Alisa Zhila ,

Liliana Chanona-Hernandez ,

Alexander Gelbukh :

Automatic Abusive Language Detection in Urdu Tweets143-163en [705.44 kB - PDF]EPA-02461-00126-0090

Abstract: Abusive language detection is an essential task in our modern times. Multiple studies have reported this task, in various languages, because it is essential to validate methods in many different languages. In this paper, we address the automatic detection of abusive language for tweets in the Urdu language. The study introduces the first dataset of tweets in the Urdu language, annotated for offensive expressions and evaluates it by comparing several machine learning methods. The Twitter dataset contains 3,500 tweets, all manually annotated by human experts. This research uses three text representation techniques: two count-based feature vectors and the pre-trained fastText word embeddings. The count-based features contain the character and word n-gram, while the pre-trained fastText model comprises word embeddings extracted from the Urdu tweets dataset. Moreover, this study uses four non-neural network models (SVM, LR, RF, AdaBoost) and two neural networks (CNN, LSTM). The study finding reveals that SVM outperforms other classifiers and obtains the best results for any text representation. Character tri-grams perform well with SVM and get an 82.68% of F1 score. The best-performing words n-grams are unigrams with SVM, which obtain 81.85% F1 score. The fastText word embeddings-based representation yields insignificant results.

Keywords: Twitter corpus; Abusive language detection; Urdu language; Machine learning

Obdulia Pichardo-Lagunas ,

Bella Martinez-Seis ,

Miguel Hidalgo-Reyes ,

Sabino Miranda :

Automatic Detection of Opposition Relations in Legal Texts Using Sentiment Analysis Techniques: A Case Study165-184en [804.45 kB - PDF]EPA-02461-00126-0100

Abstract: The documentation that describes the regulations within a Society, is oriented towards specific areas. This fact does not prevent maintaining concordance in the temporality and transversality of the documents. This work defines the concept of "opposition relations" in legal texts. We identify entities and evaluate the polarity of each paragraph with sentiment analysis techniques. If an entity appears in different paragraphs (articles of law) with opposite polarities, we evaluate the entity’s contexts. We look for antonyms between the words that give polarity to the opposite paragraphs. If there is an antonymic relation in words associated with the entity, we have an opposition relation. The described methodology analyzes the relationship of entities in Mexican Environmental Laws, and the study is oriented towards coherence in the legislation for sustainable development. This process was implemented by computational processing, which required the transformation of current Mexican laws, unifying its structure. Eight environmental laws were analyzed, 1920 entities were identified that appear more than once; 44 of them were identified with opposite polarities, due to their context, a detailed analysis of two cases with potential opposite relationships is exemplified.

Keywords: Natural Language Processing; Legal Text; Sentiment Analysis; Opposition Relation

Maaz Amjad ,

Sabur Butt ,

Alisa Zhila ,

Grigori Sidorov ,

Liliana Chanona-Hernandez ,

Alexander Gelbukh :

Survey of Fake News Datasets and Detection Methods in European and Asian Languages185-204en [482.91 kB - PDF]EPA-02461-00126-0110

Abstract: The presence of fake news and “alternative facts” across the web is a global phenomenon that received considerable attention in recent years. Several researchers have made substantial efforts to automatically identify fake news articles based on linguistic features and neural network-based methods. However, automatic classification via machine and deep learning techniques demands a significant amount of annotated data. While several state-of-the-art datasets for the English language are available and commonly utilized for research, fake news detection in low-resource languages gained less attention. This study surveys the publicly available datasets of fake news in low/medium-resourced Asian and European languages. We also highlight the vacuum of datasets and methods in these languages. Moreover, we summarize the proposed methods and the metrics used to evaluate the classifiers in identifying fake news. This study is helpful for analysis of the available sources in the lower resource languages to solve fake news detection challenges.

Keywords: datasets, fake news, low resource languages, deep learning, machine learning, evaluation metrics.

Iris Iddaly Méndez-Gurrola ,

Abdiel Ramírez-Reyes ,

Alejandro Israel Barranco-Gutiérrez :

A Review and Perspective on the Main Machine Learning Methods Applied to Physical Sciences205-220en [829.15 kB - PDF]EPA-02461-00126-0120

Abstract: Several types of numerical simulations have been used over the years in the Physical Sciences, to advance the real-life problems understanding. Among the statistical tools used for this are, for example: Monte Carlo simulations, such mechanisms have been used in various areas, however, today another tool is used, Machine Learning, which is a branch of Artificial Intelligence (AI). This article reviews sets of work that encompass various areas of the Physical Sciences, to mention some such as particle physics, quantum mechanics, condensed matter, among many others that have used some Machine Learning mechanisms to solve part of the problems raised in their research. In turn, a Machine Learning methods classification was carried out and it was identified which are the most used in Physical Sciences, something that is currently done in very few studies, as it requires extensive review work. The analysis carried out also allowed us to glimpse which areas of the Physical Sciences use Machine Learning the most and identify in which types of journals it is published more on the subject. The results obtained, show that there is currently a good number of works that interrelate Machine Learning and the Physical Sciences, and that this interrelation is increasing.

Keywords: Machine Learning; Physical Sciences; review; interdiciplinary