[Paper Translation] Experience: Improving Opinion Spam Detection by Cumulative Relative Frequency Distribution


原文地址:https://arxiv.org/pdf/2012.13905v1.pdf


Experience: Improving Opinion Spam Detection by Cumulative Relative Frequency Distribution

Abstract

Over the last few years, online reviews have become very important, since they can influence the purchase decisions of consumers and the reputation of businesses; therefore, the practice of writing fake reviews can have severe consequences for customers and service providers. Various approaches have been proposed for detecting opinion spam in online reviews, especially ones based on supervised classifiers. In this contribution, we start from a set of effective features used for classifying opinion spam and we re-engineer them by considering the Cumulative Relative Frequency Distribution of each feature. Through an experimental evaluation carried out on real data from Yelp.com, we show that the use of the distributional features improves the performance of the classifiers.

Introduction

As illustrated in a recent survey on the history of Digital Spam, the Social Web has led not only to a participatory, interactive nature of the Web experience, but also to the proliferation of new and widespread forms of spam, among which the most notorious are fake news and spam reviews, i.e., opinion spam. This results in the diffusion of different kinds of disinformation and misinformation, where misinformation refers to inaccuracies that may even originate from acting in good faith, while disinformation is false information deliberately spread to deceive. Over the last few years, online reviews have become very important, since they reflect customers' experience with a product or service and, nowadays, they constitute the basis on which the reputation of an organization is built. Unfortunately, the confidence placed in such reviews is often misplaced, due to the fact that spammers are tempted to write fake information in exchange for some reward or to mislead consumers in order to obtain business advantages. The practice of writing false reviews is not only morally deplorable, as it misleads customers and harms service providers, but it is also punishable by law. Considering both the longevity and the spread of the phenomenon, scholars have for years investigated various approaches to opinion spam detection, mainly based on supervised or unsupervised learning algorithms. Further approaches are based on Multi-Criteria Decision Making. Machine learning approaches rely on input data to build a mathematical model in order to make predictions or decisions. To this aim, data are usually represented by a set of features, which are structured and ideally fully representative of the phenomenon being modeled. An effective feature engineering process, i.e., the process through which an analyst uses domain knowledge of the data under investigation to prepare appropriate features, is a critical and time-consuming task. However, if done correctly, feature engineering increases the predictive power of algorithms by facilitating the machine learning process.

In this paper, we do not aim to contribute by defining novel features suitable for fake review detection; rather, starting from features that have been proven to be very effective by Academia, we re-engineer them by considering the distribution of the occurrences of the feature values in the dataset under analysis. In particular, we focus on the Cumulative Relative Frequency Distribution of a set of basic features already employed for the task of fake review detection. We compute this distribution for each feature and substitute each feature value with the corresponding value of the distribution. To demonstrate the effectiveness of the proposed approach, both the basic features and the distributional ones have been used to train several supervised machine-learning classifiers, and the obtained results have been compared. To the best of the authors' knowledge, this is the first time that the Cumulative Relative Frequency Distribution of a set of features has been considered for the unveiling of fake reviews. The experimental results show that the distributional features improve the performance of the classifiers, at the mere cost of a small computational surplus in the feature engineering phase. The rest of the paper is organized as follows. The next section revises related work in the area. The section on Feature Engineering describes the process of feature engineering. We then present the experimental setup, while the results section reports the outcome of the comparison among the classification algorithms; there, we also assess the importance of the distributional features and discuss the benefits brought by their adoption. Finally, the last section concludes the paper.

Related Work

Social Media represent the perfect means for everyone to spread content in the form of User-Generated Content (UGC), almost without any traditional form of trusted control. For years, Academia, Industry, and platform administrators have been striving to develop automatic solutions to raise users' awareness about the credibility of the news they read online. One of the contexts in which the problem of credibility assessment is receiving the most interest is spam - or fake - review detection. The existence of spam reviews has been known since the early 2000s, when e-commerce and e-advice sites began to be popular. In his seminal work, Liu lists three approaches to automatically identify opinion spam: the supervised, unsupervised, and group approaches. In a standard supervised approach, a ground truth of a priori known genuine and fake reviews is needed. Then, features about the labeled reviews, the reviewers, and the reviewed products are engineered. The first models built on such features achieved good results with common algorithms such as Naive Bayes and Support Vector Machines. As usual, a supervised approach is particularly challenging since it requires the existence of labeled data, that is, in our scenario, a set of reviews with prior knowledge about their (un)trustworthiness. To overcome the frequent issue of lack of labeled data, in the very first phases of investigation in this field, the work done by Jindal et al. exploited the fact that a common practice of fraudulent reviewers was to post almost duplicate reviews: reviews with similar texts were collected as fake instances. As shown in the literature, linguistic features have been proven to be valid for fake review detection, particularly in the early advent of this phenomenon. Indeed, pioneer fake reviewers exhibited precise stylistic features in their texts, such as a marked use of short terms and expressions of positive feelings. Anomaly detection has also been widely employed in this field: analyses of anomalous practices with respect to the average behavior of a genuine reviewer led to good results. Anomalous behavior of a reviewer may be related to general and early rating deviation, as highlighted by Liu, or to temporal dynamics (see Xie et al.). Going further with useful methodologies, human annotators, possibly recruited from crowd-sourcing services like Amazon Mechanical Turk, have also been employed, both 1) to manually label review sets to separate fake from non-fake reviews (e.g., see the very recent survey by Crawford et al.) and 2) to write intentionally false reviews, in order to test the accuracy of existing predictive models on such sets of ad hoc crafted reviews, as nicely reproduced by Ott et al. Recently, an interesting point of view has been offered by Cocarascu and Toni: deception is analysed based on contextual information derivable from review texts, not in a standard way (e.g., considering linguistic features), but by evaluating the influence and interactions that one text has on the others. The new feature, based on bipolar argumentation on the same review, has been shown to outperform more traditional features when used in standard supervised classifiers, even on small datasets.

Supervised learning algorithms usually need diverse examples - and the values of diverse features derived from such examples - for an accurate training phase. Wang et al. investigated the 'cold-start' problem: the identification of a fake review when a new reviewer posts their first review. Without enough data about the stylistic features of the review and the behavioral characteristics of the reviewer, the authors first find similarities between the review text under investigation and other review texts. Then, they consider similar behavior between the reviewer under investigation and the reviewers who posted the identified reviews. A model based on neural networks proves to be effective in approaching the problem of lack of data in cold-start scenarios. Although many years have passed and, as we will see shortly, the problem has been addressed in many research works with different techniques, automatically detecting a false review is an issue not completely solved yet, as stated in the recent survey of Wu et al. This inspiring work examines the phenomenon not only by giving an overview of the various detection techniques used over time, but also by proposing twenty future research questions. Notably, to help scholars find suitable datasets for a supervised classification task, this survey lists the currently available review datasets and their characteristics. A similar work by Hussain et al., aimed at a comparison of different approaches, focuses on the performances obtained by different classification frameworks. The authors also carried out a relevance analysis of six different behavioral features of reviewers. Weighting the features with respect to their relevance, a classification over a baseline dataset obtains an 84.5% accuracy. A quite novel work considers the unveiling of malicious reviewers by exploiting the notion of 'neighborhood of suspiciousness'. Kaghazgaran et al. proposed a system called TwoFace that, starting from identifiable reviewers paid by crowd-sourcing platforms to write fake reviews on well-known e-commerce platforms such as Amazon, studies the similarity between these and other reviewers, based, e.g., on the reviewed products, and shows how it is possible to spot organized fake-review campaigns even when the reviewers alternate genuine and malicious behaviors. Serra et al. developed a supervised approach where the task is to differentiate among different kinds of reviewers, from fraudulent, to uninformative, to reliable. Leveraging a supervised classification approach based on a deep recurrent neural network, the system achieves notable performances over a real dataset where there is a priori knowledge of the fraudulent reviewers. The research work recalled so far lies in supervised learning. However, unsupervised techniques have been employed too, since they are very useful when no tagged data is available.

Fake reviewers' coordination can emerge by mining frequent behavioral patterns and ranking the most suspicious ones. A pioneering work first identifies groups of reviewers that reviewed the same set of products; then, the authors compute and aggregate an ensemble of anomaly scores (e.g., based on the similarity among reviews and the times at which the reviews have been posted): the scores are ultimately used to tag the reviewers as colluding or not. Another interesting approach for the analysis of colluding users is one where the authors check whether a given group of accounts (e.g., a group of reviewers) contains a subset of malicious accounts. The intuition behind this methodology is that the statistical distribution of reputation scores (e.g., number of friends and followers) of the accounts participating in a tampered computation significantly diverges from that of untampered ones. We close this section by referring back to the division made by Liu among supervised, unsupervised, and group approaches to spot fake reviewers and/or reviews. As noted in the literature, these are classification methods, mostly aiming at classifying information items in a binary or multiple way (i.e., credible vs. non-credible), based on the evaluation of a series of credibility features extracted from the data. Notably, approaches based on some prior domain knowledge are promising in providing a ranking of the information item (i.e., in our scenario, of the review) with respect to credibility. This is the case of recent work by Pasi et al., which exploits a Multi-Criteria Decision Making approach to assess the credibility of a review. In this context, a given review, seen as an alternative among others, is evaluated with respect to some credibility criteria. An overall credibility estimate of the review is then obtained by means of a suitable model-driven approach based on aggregation operators. This approach also has the advantage of assessing the contribution that single or interacting criteria/features have in the final ranking. The techniques presented above have their pros and cons and, depending on the particular context, one approach can be preferred over another. The most relevant contribution of our proposal with respect to the state of the art is to improve the effectiveness of solutions based on supervised classifiers, which, as seen above, are a well-known and widely used approach in this context.

Feature Engineering

In this section, we introduce a subset of features that have been adopted in past work to detect opinion spam, and we propose how to modify them in order to improve the performance of classifiers. We emphasize that the listed features have been used effectively for this task by past researchers. Below, we give the rationale for their use in the context of unveiling fake reviews. Finally, it is worth noting that the list of selected features is not intended to be exhaustive.

Basic Features

Following a supervised classification approach, the selection of the most appropriate features plays a crucial role, since they may considerably affect the performance of the machine learning models constructed starting from them. Features can be review-centric or reviewer-centric. The former are features that refer to the review, while the latter refer to the reviewer. In the literature, several reviewer-centric features have been investigated, such as the maximum number of reviews, the percentage of positive reviews, the average review length, and the reviewer rating deviation. According to the outcomes of several works proposed in the context of opinion spam detection, we focused on reviewer-centric features, which have been demonstrated to be more effective for the identification of fake reviews. Thus, we relied on a set of basic features that have already been used proficiently in the literature for the detection of opinion spam in reviews. Specifically, we focused on the following reviewer-centric features:

  • {Photo Count}: This metric measures the number of pictures uploaded by a reviewer and is directly retrieved from the reviewer profile. Past work demonstrated the effectiveness of using photo count, together with other non-verbal features, for detecting fake reviews.
  • {Review Count}: It measures how many reviews have been posted by a reviewer on the platform. Prior work showed that spammers and non-spammers present different behavior regarding the number of reviews they post. In particular, spammers usually post more reviews, since they may get paid. This feature has also been investigated in further studies.
  • {Useful Votes}: The most popular online review platforms allow users to rank reviews as useful or not. This information can be retrieved from the reviewer profile, or computed by summing the total amount of useful votes received by a reviewer. This feature has already been exploited in the literature and has been demonstrated to be effective for opinion spam detection.
  • {Reviewer Expertise}: Past research highlights that reviewers with acquired expertise on the platform are less prone to cheat.

Particularly, Mukherjee et al. report that opinion spammers are usually not longtime members of a site. Genuine reviewers, however, use their accounts from time to time to post reviews. Although this experimental evidence does not mean that no spammer can be a member of a review platform for a long time, the literature has considered it useful to exploit the activity freshness of an account in cheating detection. Reviewer Expertise has been defined by Zhang et al. as the number of days a reviewer has been a member of the platform (the original name was Membership Length).

  • {Average Gap}: The review gap is the time elapsed between two consecutive reviews. This feature was previously introduced in the seminal work by Mukherjee et al., under the name {Activity Window}, and successfully re-adopted for detecting both colluders (i.e., spammers acting with a coordinated strategy) and singleton reviewers (i.e., reviewers with just isolated posting behavior). In the cited work, the Activity Window feature has been proved highly discriminant for demarcating spammers and non-spammers: fake reviewers are likely to review in short bursts and are usually not longtime active members.

On a Yelp dataset where the benign or malicious nature of reviewers was known a priori, previous work proved that, by computing the difference between the timestamps of the last and first reviews of each reviewer, a majority (80%) of spammers were bounded by 2 months of activity, whereas the same percentage of non-spammers remained active for at least 10 months. We define the Average Gap feature as the average time, in days, elapsed between two consecutive reviews of the same reviewer:

$$ AG_i = \frac{1}{N_i - 1} \sum_{j=2}^{N_i} (T_{i,j} - T_{i,j-1}) $$

where $ AG_i $ is the Average Gap for the $ i $-th user, $ N_i $ is the number of reviews written by the user, and $ T_{i,j} $ is the timestamp of the $ j $-th review of the $ i $-th user.

  • {Average Rating Deviation}: The rating deviation measures how far a reviewer's rating is from the average rating of a business. Prior work observed that spammers are more prone to deviate from the average rating than genuine reviewers. However, a bad experience may induce a genuine reviewer to deviate from the mean rating. The Average Rating Deviation is defined as follows:

$$ ARD_i = \frac{1}{N_i} \sum_{j=1}^{N_i} \left| R_{i,j} - \bar{R}_{B(j)} \right| $$

where $ ARD_i $ is the Average Rating Deviation of the $ i $-th user, $ N_i $ is the number of reviews written by the user, $ R_{i,j} $ is the rating given by the $ i $-th user in her/his $ j $-th review, corresponding to the business $ B(j) $, and $ \bar{R}_{B(j)} $ is the average rating obtained by the business $ B(j) $.

  • {First Review}: Spammers are usually paid to write reviews when a new product is placed on the market. This is due to the fact that early reviews have a great impact on consumers' opinions and, in turn, impact the sales, as pointed out in the literature. We compute the time elapsed between each review of a reviewer and the first review posted for the same business. Then, we average the results over all the reviews. Specifically, the First Review value for reviewer $ i $ is given by:

$$ FRT_i = \frac{1}{N_i} \sum_{j=1}^{N_i} (T_{i,j} - F_{B(j)}) $$

where $ FRT_i $ is the First Review value of the $ i $-th user, $ N_i $ is the number of reviews written by the user, $ T_{i,j} $ is the time at which the $ i $-th user wrote her/his $ j $-th review, and $ F_{B(j)} $ is the time at which the first review of the corresponding business $ B(j) $ was posted.

  • {Reviewer Activity}: Several works pointed out that the more active a user is on the online platform, the more likely the user is genuine, in terms of contributing knowledge sharing in a useful way. The usefulness of this feature was demonstrated several years ago. Since the early 2000s, surveys have been conducted on large communities of individuals, trying to understand what drives them to be active and useful on an online social platform in terms of sharing content. Results showed that people contribute their knowledge when they perceive that it enhances their reputation, when they have the experience to share, and when they are structurally embedded in the network. The Activity feature expresses the number of days a user has been active and is computed as:
    $$ A_i = T_{i,L} - T_{i,0} $$
    where $ A_i $ is the activity (expressed in days) of the $ i $-th user, $ T_{i,L} $ is the time of the last review of the $ i $-th user, and $ T_{i,0} $ is the time of the first review of the $ i $-th user. (A code sketch covering the time- and rating-based features above follows this list.)
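To make the time- and rating-based features above concrete, here is a minimal sketch of how they could be computed with pandas. The `reviews` DataFrame, its column names, and the toy values are illustrative assumptions, not part of the original paper; timestamps are assumed to be expressed in days.

```python
import pandas as pd

# Hypothetical input: one row per review, timestamps in days since an epoch.
reviews = pd.DataFrame({
    "reviewer_id": [1, 1, 1, 2, 2],
    "business_id": ["a", "b", "a", "b", "c"],
    "timestamp":   [10, 40, 100, 5, 6],
    "rating":      [5, 1, 4, 3, 3],
})

# |R_ij - R_B(j)|: absolute deviation from the business's average rating.
biz_avg = reviews.groupby("business_id")["rating"].transform("mean")
reviews["rating_dev"] = (reviews["rating"] - biz_avg).abs()

# T_ij - F_B(j): time elapsed since the business's first review.
first_biz = reviews.groupby("business_id")["timestamp"].transform("min")
reviews["first_gap"] = reviews["timestamp"] - first_biz

def per_reviewer(g: pd.DataFrame) -> pd.Series:
    ts = g["timestamp"].sort_values()
    return pd.Series({
        "avg_gap": ts.diff().mean(),               # AG_i (NaN if one review)
        "avg_rating_dev": g["rating_dev"].mean(),  # ARD_i
        "first_review": g["first_gap"].mean(),     # FRT_i
        "activity": ts.iloc[-1] - ts.iloc[0],      # A_i
    })

features = reviews.groupby("reviewer_id").apply(per_reviewer)
print(features)
```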


From Basic to Cumulative Features

The features described so far have been used to train a machine learning algorithm to construct a classifier, in a supervised-learning fashion. In this work, we propose to build on the basic features, with a proper feature engineering process, to assess a possible improvement of the classification performance. The proposed feature engineering process is based on the concept of Cumulative Relative Frequency Distribution. The Relative Frequency is a quantity that expresses how often an event occurs, divided by all outcomes. It can be easily represented by a Relative Frequency table, constructed directly from the data by simply dividing the frequency of a value by the total number of values in the dataset. The Cumulative Relative Frequency is then calculated by adding each frequency from the Relative Frequency table to the sum of its predecessors. In practice, the Cumulative Relative Frequency indicates the percentage of elements in the dataset that lie at or below the current value. In this work, we modify each feature by using its Cumulative Relative Frequency Distribution. In the following, we show an example of how to compute the Cumulative Relative Frequency Distribution. Let us consider the Photo Count feature and assume that, for each photo count value, the corresponding number of occurrences is the one reported in the second column of the table below. Thus, the second column reports the number of reviews associated with a reviewer who uploaded a given number of photos: in our example, there are 7,944 reviews whose reviewers have no photo associated. The third column reports the Relative Frequency, which is computed by dividing the number of occurrences by the total number of reviews. Finally, the fourth column reports the Cumulative Relative Frequency values, which have been obtained by adding each Relative Frequency value to the sum of its predecessors. In our proposal, the process described so far is carried out for each basic feature, and the Cumulative Relative Frequency values are used to train the classifier instead of the raw values. In practice, this means substituting each value of the first column with the corresponding value of the fourth column of the table (a code sketch of this transformation follows the table).

| Photo Count | Frequency (#Reviews) | Relative Freq. (%Reviews) | Cumulative Rel. Freq. |
|---|---|---|---|
| 0 | 7944 | 0.44 | 0.44 |
| 1 | 2301 | 0.13 | 0.57 |
| 2 | 1756 | 0.10 | 0.67 |
| 3 | 1401 | 0.08 | 0.75 |
| 4 | 822 | 0.04 | 0.79 |
| 5 | 1382 | 0.08 | 0.87 |
| 6 | 550 | 0.03 | 0.90 |
| 7 | 347 | 0.02 | 0.92 |
| 8 | 780 | 0.04 | 0.96 |
| 9 | 342 | 0.02 | 0.98 |
| 10 | 460 | 0.02 | 1.00 |
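The transformation described above maps each raw feature value to the fraction of items at or below it. Below is a minimal sketch of this substitution with pandas; the function name and the reconstruction of the photo-count column from the table's frequencies are illustrative assumptions.

```python
import pandas as pd

def cumulative_relative_frequency(col: pd.Series) -> pd.Series:
    """Map each raw feature value to its Cumulative Relative Frequency."""
    # Relative frequency of each distinct value (frequency / total count).
    rel_freq = col.value_counts(normalize=True).sort_index()
    # Cumulative sum over the sorted values: fraction of items <= each value.
    cum_rel_freq = rel_freq.cumsum()
    # Substitute each raw value with its cumulative relative frequency.
    return col.map(cum_rel_freq)

# Example with the photo-count frequencies of the table above:
photo_count = pd.Series([0] * 7944 + [1] * 2301 + [2] * 1756 + [3] * 1401
                        + [4] * 822 + [5] * 1382 + [6] * 550 + [7] * 347
                        + [8] * 780 + [9] * 342 + [10] * 460)
print(cumulative_relative_frequency(photo_count).head())  # 0 -> ~0.44, etc.
```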

Experimental Setup

In this section, we describe the setup of the experiments conducted to evaluate the effectiveness of the proposed features. This is done by comparing the results obtained with the basic features against those obtained with the cumulative ones, using the most widespread supervised machine learning algorithms (a sketch of such a comparison follows).
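As an illustration of this kind of comparison, the following sketch trains the same classifier on both feature representations and compares cross-validated F1 scores. The synthetic data, the choice of scikit-learn's RandomForestClassifier, and the evaluation protocol are assumptions made for illustration only; the classifiers and metrics actually used in the paper may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical data: rows are reviewers, columns are the basic features;
# y holds the fake/non-fake labels.
rng = np.random.default_rng(0)
X_basic = rng.poisson(5.0, size=(500, 8)).astype(float)
y = rng.integers(0, 2, size=500)

def to_cumulative(X: np.ndarray) -> np.ndarray:
    """Replace each column with its Cumulative Relative Frequency values."""
    out = np.empty_like(X, dtype=float)
    for j in range(X.shape[1]):
        values, counts = np.unique(X[:, j], return_counts=True)
        cum = np.cumsum(counts) / counts.sum()
        lookup = dict(zip(values, cum))
        out[:, j] = [lookup[v] for v in X[:, j]]
    return out

X_cum = to_cumulative(X_basic)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
for name, X in [("basic", X_basic), ("cumulative", X_cum)]:
    scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
    print(f"{name:>10s}: mean F1 = {scores.mean():.3f}")
```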

Dataset Construction and Characteristics

The dataset used in this study is composed of 56,317 business reviews, 42,673 businesses (both restaurants and hotels), and 1,429 reviewers. This dataset has been obtained by repopulating the YelpCHI dataset, which included 67,395 reviews of 201 hotels and restaurants written by 38,063 reviewers, with each review tagged with a fake/non-fake label. We repopulated the dataset in order to tag reviewers in YelpCHI and to obtain fresher data to work with. The table below reports summary statistics of the basic features for the repopulated dataset, whereas the correlation matrix, which shows the correlation coefficients among the variables, is presented in the figure below.

| Statistic | photo count | review count | useful votes | reviewer expertise | avg gap | avg rating deviation | first review | reviewer activity |
|---|---|---|---|---|---|---|---|---|
| mean | 170.9 | 201.9 | 502.7 | 3664.7 | 55.2 | 0.01 | 13.9 | 2637.6 |
| std. dev. | 911 | 298.2 | 2089.45 | 579.8 | 110.6 | 0.06 | 50.2 | 991.3 |

[Figure: correlation matrix of the basic features]

Data Labeling

One limitation of supervised classification approaches is the possible lack of labeled data. To overcome this problem, past work relied on Amazon Mechanical Turk workers to generate fake and genuine reviews. Nevertheless, later work highlighted the limits of this approach, since workers could be not always ef