[论文翻译]探索生成式人工智能在教育中的影响:主题分析


原文地址:https://arxiv.org/pdf/2501.10134


Exploring the Impact of Generative Artificial Intelligence in Education: A Thematic Analysis

探索生成式人工智能在教育中的影响:主题分析

Abstract

摘要

The recent advancements in Generative Artificial Intelligence (GenAI) technology have been transformative for the field of education. Large Language Models (LLMs) such as ChatGPT and Bard can be leveraged to automate boilerplate tasks, create content for personalised teaching, and handle repetitive tasks to allow more time for creative thinking. However, it is important to develop guidelines, policies, and assessment methods in the education sector to ensure the responsible integration of these tools. In this article, thematic analysis has been performed on seven essays obtained from professionals in the education sector to understand the advantages and pitfalls of using GenAI models such as ChatGPT and Bard in education. Exploratory Data Analysis (EDA) has been performed on the essays to extract further insights from the text. The study found several themes which highlight benefits and drawbacks of GenAI tools, as well as suggestions to overcome these limitations and ensure that students are using these tools in a responsible and ethical manner.

生成式人工智能 (Generative AI, GenAI) 技术的最新进展对教育领域产生了变革性影响。诸如 ChatGPT 和 Bard 这样的大语言模型 (LLMs) 可以用于自动化模板任务、创建个性化教学内容以及处理重复性任务,从而为创造性思维留出更多时间。然而,在教育领域制定指导方针、政策和评估方法以确保这些工具的负责任整合至关重要。本文通过对教育领域专业人士撰写的七篇文章进行主题分析,以了解在教育中使用 ChatGPT 和 Bard 等生成式人工智能模型的优势和潜在问题。通过对这些文章进行探索性数据分析 (EDA),进一步提取了文本中的见解。研究发现了一些主题,这些主题突出了生成式人工智能工具的优势和缺点,并提出了克服这些局限性的建议,以确保学生以负责任和合乎道德的方式使用这些工具。

Keywords: generative artificial intelligence, large language models, education, AI-assisted coding

关键词:生成式人工智能 (Generative AI),大语言模型 (Large Language Model),教育,AI辅助编程

1. Introduction

1. 引言

The accelerated advancements in Artificial Intelligence (AI) over the past decade have disrupted several fields such as education [1], healthcare [2] [3], finance [4], and law [5]. Natural Language Processing (NLP) is a subfield of AI responsible for understanding, synthesizing, and generating human language [6]. Examples of applications of NLP include sentiment analysis in various languages [7] [8], hate speech detection [9] [10], machine translation [11], and question answering [12]. NLP systems have evolved from early rule-based chatbots, such as ALICE [13] and ELIZA [14], to the advanced transformer-based systems [15] such as Bidirectional Encoder Representations from Transformers (BERT) [16] and A Robustly Optimized BERT Pretraining Approach (RoBERTa) [17], which have found numerous applications [18] [19]. These models are pre-trained on large amounts of data and consist of parameters in the millions or billions, enabling them to capture the context of the conversation and other linguistic complexities [20] [16]. They have gained popularity due to their ability to enhance human productivity, boost creativity [21], and support personalized and continuous learning [22].

过去十年中,人工智能 (AI) 的快速发展已经颠覆了多个领域,如教育 [1]、医疗保健 [2] [3]、金融 [4] 和法律 [5]。自然语言处理 (NLP) 是 AI 的一个子领域,负责理解、合成和生成人类语言 [6]。NLP 的应用示例包括多种语言的情感分析 [7] [8]、仇恨言论检测 [9] [10]、机器翻译 [11] 和问答系统 [12]。NLP 系统已经从早期的基于规则的聊天机器人(如 ALICE [13] 和 ELIZA [14])发展到基于 Transformer 的先进系统 [15],例如双向编码器表示 (BERT) [16] 和鲁棒优化的 BERT 预训练方法 (RoBERTa) [17],这些系统已经找到了许多应用 [18] [19]。这些模型在大规模数据上进行预训练,并包含数百万或数十亿的参数,使它们能够捕捉对话的上下文和其他语言复杂性 [20] [16]。它们因其能够提高人类生产力、增强创造力 [21] 以及支持个性化和持续学习 [22] 而受到欢迎。

Generative AI (GenAI) refers to AI systems capable of creating text, audio, and images, in response to user prompts [23]. In recent years, the outstanding capabilities of GenAI tools and LLMs such as ChatGPT [24], Bing-AI [25], and Bard [26] have highlighted the potential of this technology in education [27]. The ability of ChatGPT to carry out natural sounding conversations and respond in the style requested by the user can be harnessed to develop engaging teaching aids that suit the needs of the students [28] [29]. For software development, students can use AI-assisted coding tools to generate boilerplate templates [30], perform troubleshooting and debugging [31], and generate documentation [32]. GenAI tools can function as a personalized tutor for students, encouraging an adaptive learning environment, and reducing their dependence on educators [1] [33]. OpenAI’s website has provided a student’s guide to writing with ChatGPT, which suggests use cases such as formatting citations, providing foundational knowledge on a new topic, providing relevant research sources, providing answers to specific questions, providing tailored and iterative feedback, and suggesting counterarguments for a thesis [34]. However, there are several pitfalls associated with GenAI technology that can lead to concerns about academic integrity, plagiarism [35] [36], over-reliance on and potential misuse of the technology, and transparency about their operation [37]. Therefore, it is important to understand and address these challenges.

生成式 AI (Generative AI, GenAI) 是指能够根据用户提示生成文本、音频和图像的 AI 系统 [23]。近年来,ChatGPT [24]、Bing-AI [25] 和 Bard [26] 等生成式 AI 工具和大语言模型 (LLM) 的卓越能力凸显了该技术在教育领域的潜力 [27]。ChatGPT 能够进行自然对话并根据用户要求的风格进行响应,这一能力可用于开发适合学生需求的引人入胜的教学辅助工具 [28] [29]。在软件开发方面,学生可以使用 AI 辅助编码工具生成样板代码 [30]、进行故障排除和调试 [31] 以及生成文档 [32]。生成式 AI 工具可以充当学生的个性化导师,鼓励适应性学习环境,并减少学生对教育工作者的依赖 [1] [33]。OpenAI 的网站提供了学生使用 ChatGPT 写作的指南,建议的用例包括格式化引用、提供新主题的基础知识、提供相关研究来源、回答具体问题、提供定制化和迭代反馈,以及为论文提供反论点 [34]。然而,生成式 AI 技术也存在一些缺陷,可能导致对学术诚信、剽窃 [35] [36]、过度依赖和潜在滥用技术以及其操作透明性的担忧 [37]。因此,理解和应对这些挑战至关重要。

Thematic analysis is a qualitative research approach used to identify themes and patterns from data [38] [39]. It involves generating initial codes from the data, aggregating similar codes together, and drawing insights from the resulting themes. Exploratory Data Analysis (EDA) is a data analytics process that also aims to uncover patterns and relationships in a dataset.

主题分析是一种定性研究方法,用于从数据中识别主题和模式 [38] [39]。它包括从数据中生成初始代码,将相似的代码聚合在一起,并从生成的主题中提取见解。探索性数据分析 (EDA) 是一种数据分析过程,同样旨在揭示数据集中的模式和关系。

In this study, opinions, in the form of unstructured essays, were obtained from 7 educators discussing the potential benefits and challenges of integrating GenAI in education. Thematic analysis has been performed on these essays by extracting codes and deriving themes from them. Additionally, EDA has been performed on the text to derive insights from the essays. The rest of the paper is structured as follows: Section 2 details the motivation of the study along with the hypothesis and research questions. Section 3 covers the methodology used to conduct the study and perform thematic analysis on educator opinions. Section 4 includes the opinion essays provided by the 7 educators. In Section 5, the identified themes are discussed in detail, and Section 6 details the results of EDA. Section 7 attempts to answer the research questions in the context of the findings of the analysis. Section 8 concludes the study.

在本研究中,我们以非结构化文章的形式收集了7位教育工作者关于将生成式AI (Generative AI) 融入教育的潜在益处和挑战的观点。通过对这些文章进行主题分析,提取代码并从中推导出主题。此外,还对文本进行了探索性数据分析 (EDA),以从文章中得出见解。本文的其余部分结构如下:第2节详细介绍了研究的动机、假设和研究问题。第3节涵盖了用于进行研究并对教育工作者观点进行主题分析的方法。第4节包含了7位教育工作者提供的观点文章。第5节详细讨论了识别出的主题,第6节详细介绍了EDA的结果。第7节尝试在分析结果的背景下回答研究问题。第8节总结了本研究。

2. Motivation

2. 动机

The impressive capabilities of GenAI tools such as their ability to carry out natural conversations about a wide array of topics [40], perform analysis on multimodal data [24], and generate personalized content [41], come with several risks. Although these tools can be greatly beneficial by serving functions such as automating repetitive tasks [42], and providing personal tutoring [43], they pose significant ethical concerns and can be detrimental to the learning process and development of problem-solving skills [44]. The motivation behind conducting this study is to gain insight from educator opinions about the use of GenAI in the field of education. The individual perspectives of the educators can be a helpful tool in understanding the potential advantages and challenges of this transformative technology. This can help in the effective and ethical integration of GenAI tools in educational practices to harness their potential, while avoiding potential misuse and the limitations presented by the technology.

生成式 AI (GenAI) 工具的令人印象深刻的能力,例如能够就广泛主题进行自然对话 [40]、对多模态数据进行分析 [24] 以及生成个性化内容 [41],也伴随着一些风险。尽管这些工具通过自动化重复性任务 [42] 和提供个性化辅导 [43] 等功能可以带来巨大益处,但它们也引发了重大的伦理问题,并可能对学习过程和问题解决能力的发展产生不利影响 [44]。开展这项研究的动机是从教育者的意见中获取关于生成式 AI 在教育领域应用的见解。教育者的个人观点可以帮助我们理解这一变革性技术的潜在优势和挑战。这有助于在教育实践中有效且合乎道德地整合生成式 AI 工具,以发挥其潜力,同时避免潜在的滥用和技术本身的局限性。

2.1. Hypothesis and Research Questions

2.1. 假设与研究问题

The hypothesis for this study is as follows:

本研究的假设如下:

Hypothesis: Educators perceive both advantages and challenges in the integration of GenAI in education.

假设:教育工作者认为将生成式 AI (Generative AI) 融入教育既有利也有挑战。

The research questions formulated to explore the hypothesis are as follows:

为探索该假设而制定的研究问题如下:

3. Methodology

3. 方法论

In this section, the methodology used to perform thematic analysis and exploratory data analysis has been discussed.

在本节中,讨论了用于执行主题分析和探索性数据分析的方法。

Table 1: Educator Details

表 1: 教育者详情

No. Gender
1 M
2 M
3 F
4 M
5 M
6 F
7 M

Table 1 lists the gender details of the educators who participated in the study. 5 out of the 7 educators are male, representing a notable gender imbalance. The educators have provided lectures in machine learning, digital marketing, programming, databases, distributed systems, statistics, game development, and research methods in machine learning.

表 1 列出了参与研究的教师的性别详细信息。7 名教师中有 5 名是男性,性别比例严重失衡。这些教师教授的课程包括机器学习、数字营销、编程、数据库、分布式系统、统计学、游戏开发以及机器学习研究方法。

3.1. Thematic Analysis

3.1. 主题分析

Thematic analysis is a technique to find patterns and themes within qualitative data to uncover underlying topics and ideas [45]. Figure 1 displays the steps involved in performing thematic analysis as detailed by Braun and Clarke [46]. It consists of the following steps: Familiarize yourself with the data, generate initial codes from the data, search for themes, review themes, define themes, and complete the write-up.

主题分析是一种在定性数据中寻找模式和主题以揭示潜在话题和想法的技术 [45]。图 1 展示了 Braun 和 Clarke [46] 详细描述的主题分析步骤。它包括以下步骤:熟悉数据、从数据中生成初始代码、寻找主题、审查主题、定义主题以及完成撰写。


Figure 1: Thematic Methodology

图 1: 主题方法论
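As an illustration, the code-aggregation step of this process (searching for and grouping themes) can be sketched in Python. The coded extracts and theme labels below are hypothetical stand-ins for illustration only, not the study's actual codes:

```python
from collections import defaultdict

# Hypothetical coded extracts: (initial code, candidate theme) pairs.
# These labels are illustrative stand-ins, not the study's actual data.
coded_extracts = [
    ("students skip problem-solving steps", "over-reliance"),
    ("instant answers expected from chatbots", "over-reliance"),
    ("chatbot fixes bugs without a supervisor", "personalised tutoring"),
    ("AI-assisted plagiarism hard to detect", "academic integrity"),
    ("similarity score misread as plagiarism score", "academic integrity"),
]

def aggregate_codes(extracts):
    """Group similar codes under shared themes (the 'search for themes' step)."""
    themes = defaultdict(list)
    for code, theme in extracts:
        themes[theme].append(code)
    return dict(themes)

themes = aggregate_codes(coded_extracts)
for theme, codes in themes.items():
    print(f"{theme}: {len(codes)} codes")
```

The later phases (reviewing and defining themes) remain a human judgment over the grouped codes; the grouping itself is the only mechanical part.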

3.2. Exploratory Data Analysis

3.2. 探索性数据分析

In order to perform EDA on the opinion essays, pre-processing has been performed by converting the text to lowercase, removing all stopwords, and lemmatizing the tokens. All references, images, and headings were removed from the essays. The most common words and bigrams are then extracted from the text.

为了对意见文章进行探索性数据分析 (EDA),我们进行了预处理,包括将文本转换为小写、移除所有停用词以及对 Token 进行词形还原。文章中所有的引用、图片和标题都被移除。我们从文本中提取了最常见的单词和双词组合。
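A minimal sketch of this pre-processing and frequency-extraction pipeline is shown below. The stopword list is a small illustrative subset and lemmatization is omitted; in practice both would come from a library such as NLTK, and the sample essay text is invented:

```python
import re
from collections import Counter

# Small illustrative stopword list; a real pipeline would use e.g. NLTK's.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "for", "on", "with", "that", "this", "be", "can", "as", "it"}

def preprocess(text):
    """Lowercase, tokenize, and drop stopwords (lemmatization omitted here)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def top_terms(text, n=5):
    """Return the n most common words and bigrams after pre-processing."""
    tokens = preprocess(text)
    bigrams = list(zip(tokens, tokens[1:]))
    return Counter(tokens).most_common(n), Counter(bigrams).most_common(n)

# Invented sample text standing in for an educator essay.
essay = ("Generative AI tools can support students but students may "
         "over-rely on generative AI tools for assignments.")
words, bigrams = top_terms(essay)
```

Running `top_terms` over the cleaned essays yields the most frequent unigrams and bigrams used in the EDA.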

4. Educator Opinions

4. 教育者观点

In the following subsections, the opinion essays provided by the educators are included.

在以下小节中,包含了教育工作者提供的意见文章。

4.1. Educator 1

4.1. 教育者 1

Students should understand that AI chatbots are tools/resources that can help them but cannot do everything. For example, an MSc student was interested in studying a topic but did not have an appropriate dataset. They asked ChatGPT to simulate data for them. The dataset that ChatGPT created was nonsensical and highly inappropriate to answer the questions they wanted to examine. AI chatbots are not able to create simulated data without clear and explicit instructions. A possible exercise in data management and study design would be to ask AI chatbots to simulate data. Writing specific instructions to create data with an appropriate structure could be a useful exercise. Asking chatbots the right questions is the skill that needs to be learned. Concerns related to plagiarism and academic misconduct are valid in my opinion. Even though third level institutions are putting policies in place to deter students from claiming the work of AI chatbots is their own, use is still prevalent, and often difficult to detect or to prove. Several of my colleagues teaching mathematics at third level have noted that many students do not have the patience to learn mathematics. They are used to instant answers from online calculators and AI chatbots. The art of taking time to figure out a problem has been lost. This is worrying as one of the main attributes of maths graduates is problem-solving skills. Universities also need to work with primary and secondary schools so that students are not dependent on AI chatbots when they start third level education.

学生应理解,AI聊天机器人是能够帮助他们但不能完成所有任务的工具/资源。例如,一位理学硕士生对某个研究主题感兴趣,但没有合适的数据集。他们请求ChatGPT为他们模拟数据。ChatGPT创建的数据集毫无意义,完全不适合回答他们想要研究的问题。AI聊天机器人在没有清晰明确指令的情况下无法创建模拟数据。一个关于数据管理和研究设计的可能练习是要求AI聊天机器人模拟数据。编写具体指令以创建具有适当结构的数据可能是一个有用的练习。学会向聊天机器人提出正确的问题是需要掌握的技能。我认为,与抄袭和学术不端行为相关的担忧是合理的。尽管高等教育机构正在制定政策以防止学生将AI聊天机器人的工作成果据为己有,但使用仍然普遍,且往往难以检测或证明。我的一些在大学教数学的同事指出,许多学生没有耐心学习数学。他们习惯于从在线计算器和AI聊天机器人那里获得即时答案。花时间解决问题的艺术已经丧失。这令人担忧,因为数学毕业生的主要特质之一就是解决问题的能力。大学还需要与中小学合作,确保学生在开始高等教育时不会依赖AI聊天机器人。

It is important that students are taught about possible biases in AI-generated content. In many cases, the methodology for producing content is not transparent or easily accessed and the relevance or accuracy of the information must be questioned. It has also been shown that there are concerns around copyright issues when using AI chatbots [47]. Students need to be educated about the potential dangers of this. A module or course on AI chatbots could be a mandatory part of every third-level degree as part of core skills to make sure students are informed about the use of such tools.

重要的是,学生需要了解 AI 生成内容中可能存在的偏见。在许多情况下,生成内容的方法并不透明或易于获取,因此信息的相关性或准确性必须受到质疑。研究还表明,使用 AI 聊天机器人时存在版权问题的担忧 [47]。学生需要了解这些潜在的危险。关于 AI 聊天机器人的模块或课程可以作为每个第三级学位的必修部分,作为核心技能的一部分,以确保学生了解此类工具的使用。

I am teaching programming to a group studying for a master’s degree in data analytics. The lectures are lab-based with a focus on solving practical problems in class. Almost all the students immediately open AI chatbots to help them with the exercises. This can help with minor fixes, but when it is used to write full functions it removes the learning to independently solve problems. It is often the case that the chatbot has written code close to correct, but students do not question the output, and without developing the skills to write functions themselves they are unable to correct and improve the chatbot generated code. Possible exercises and assignments could involve taking human or AI generated code that is partially correct and adapting it.

我正在为一群攻读数据分析硕士学位的学生教授编程课程。课程以实验为基础,重点在于在课堂上解决实际问题。几乎所有学生都会立即打开AI聊天机器人来帮助他们完成练习。这有助于解决一些小问题,但当它被用来编写完整的函数时,就会剥夺学生独立解决问题的学习机会。通常,聊天机器人编写的代码接近正确,但学生不会质疑输出结果,而且由于没有培养自己编写函数的技能,他们无法纠正和改进聊天机器人生成的代码。可能的练习和作业可以包括采用部分正确的人工或AI生成的代码并进行调整。

Universities and students should be careful about adopting the use of AIdriven tools. Students could be frustrated if lecturers use chatbots, but they are not allowed to. It is important to educate students on the weaknesses of ChatGPT. For example, it regularly miscalculates simple arithmetic operations. AI chatbots are excellent at relaying facts and writing text but reduce the possibility for creativity from the learner. There is an inherent struggle when writing an essay or code that I think is a necessary struggle. It is necessary to learn techniques and problem-solving skills and it is necessary to write creatively and to grapple with new concepts. People say it is like when the calculator was introduced – it will become normalised and an accepted part of education. However, as someone with over ten years of experience as an educator at third level, I have seen a very poor standard of mental maths and an over-reliance on calculators. Students could do with having better arithmetic skills in my opinion. Students with better abilities of estimation are better equipped to identify when an answer is clearly wrong and not blindly accept the calculator’s answer. Similarly, an over-reliance on AI chatbots will reduce students’ ability to write clearly and think independently. They will be less able to critique the output from AI chatbots which is by no means perfect.

大学和学生在采用AI驱动工具时应谨慎。如果讲师使用聊天机器人,而学生不被允许使用,学生可能会感到沮丧。教育学生了解ChatGPT的弱点非常重要。例如,它经常错误计算简单的算术运算。AI聊天机器人在传递事实和撰写文本方面表现出色,但减少了学习者的创造力。在撰写文章或代码时,存在一种固有的挣扎,我认为这是必要的。学习技巧和解决问题的技能是必要的,创造性地写作和应对新概念也是必要的。人们说这就像计算器被引入时一样——它将变得正常化,并成为教育中可接受的一部分。然而,作为一名在高等教育领域有十多年经验的教育者,我看到了心算水平非常差和对计算器的过度依赖。在我看来,学生应该具备更好的算术技能。具备更好估算能力的学生更能够识别出答案明显错误的情况,而不是盲目接受计算器的答案。同样,过度依赖AI聊天机器人将降低学生清晰写作和独立思考的能力。他们将更难批判AI聊天机器人的输出,而这些输出远非完美。

4.2. Educator 2

4.2. 教育者 2

Artificial Intelligence was consigned by many to either an academic or science fiction curiosity. Although the founding of the MIT AI lab predates the internet’s inception, Artificial Intelligence has remained largely a niche research pursuit even inside academia. This changed in 2023 when Generative AI, and particularly OpenAI’s Generative Pre-trained Transformer (GPT) Large Language Models (LLMs), attracted significant interest, popularity and familiarity amongst the general public. The conversational interface of ChatGPT introduced many to constructing prompts and refining output for the first time. It was clear that students were ahead of their educators in the uptake of ChatGPT in particular! The archetypical computer science education centres on programming. Assistance was largely confined to initial code template generation and basic refactoring tools, mainly centred around languages such as Java and C# that structurally suited them. Just as Google is supplemented by domain-specific search tools, ChatGPT’s generality has now been augmented by tools such as GitHub Copilot for programming.

人工智能曾被许多人视为学术或科幻的奇思妙想。尽管MIT人工智能实验室的成立早于互联网的诞生,但即使在学术界内部,人工智能也主要是一个小众的研究领域。这一情况在2023年发生了改变,生成式AI(Generative AI),特别是OpenAI的生成式预训练Transformer(GPT)大语言模型(LLMs),引起了公众的广泛兴趣、流行和熟悉。ChatGPT的对话界面首次让许多人接触到了如何构建提示词并优化输出。显然,学生在使用ChatGPT方面比他们的教育者更早地走在了前面!典型的计算机科学教育以编程为中心。辅助工具主要局限于初始代码模板生成和基本的重构工具,主要集中在Java和C#等结构上适合它们的语言。正如Google通过特定领域的搜索工具得到补充一样,ChatGPT的通用性现在也通过GitHub CoPilot等编程工具得到了增强。

These are now integrated into modern code authoring tools, and even traditional text editors such as emacs have interface packages available. As well as programming languages, the computing ecosystem houses a multitude of text-based information: configuration files, Infrastructure-as-code and system administration scripts. I have found that students in diverse fields such as cloud computing, data storage technologies and data architecture have been able to leverage generative AI to produce boilerplate templates. More usefully, they can generate minimal working examples from which to develop and integrate their own solutions, reducing the barrier to entry of many tools, and increasing the breadth of their skillset. Early internet search engines included many operators to fine-tune searches, and whilst Google still supports them, very few users actively take advantage of them. The usefulness of output from GPT models is highly correlated to the quality of the prompts given. Learners will benefit significantly if prompt construction is integrated into information search and retrieval tutorials at an early stage. More specifically, computing students need to see appropriate use of generative AI in coding contexts by their instructors, just as they would encounter the use of refactoring tools by example. Optimal ways to use revolutionary new tools, and knowing when not to use them, is best achieved by experiential practice, not avoidance! Educators are grappling with the impact that generative AI has had on assessment, particularly highlighted by academic integrity concerns. Many essay-type assessments are at risk of being largely the work of LLMs rather than the student, including perhaps some assessments that were not fit-for-purpose in any event.
Practical skill demonstration under examination conditions will probably need to form an increased part of the assessment for many applied subjects, with prohibition or explicit limits on the use of generative AI and other tooling.

这些功能现已集成到现代代码编写工具中,甚至像 emacs 这样的传统文本编辑器也有可用的界面包。除了编程语言,计算生态系统中还包含大量基于文本的信息:配置文件、基础设施即代码 (Infrastructure-as-code) 和系统管理脚本。我发现,云计算、数据存储技术和数据架构等不同领域的学生已经能够利用生成式 AI 生成样板模板。更有用的是,他们可以生成最小可行示例,从中开发和集成自己的解决方案,从而降低许多工具的入门门槛,并扩大他们的技能范围。早期的互联网搜索引擎包含许多用于微调搜索的运算符,虽然 Google 仍然支持这些运算符,但很少有用户积极利用它们。GPT 模型输出的有用性与所提供提示的质量高度相关。如果在早期阶段将提示构建集成到信息搜索和检索教程中,学习者将受益匪浅。更具体地说,计算专业的学生需要通过教师的示例看到生成式 AI 在编码环境中的适当使用,就像他们会遇到重构工具的使用一样。使用革命性新工具的最佳方式,以及知道何时不使用它们,最好通过体验式实践来实现,而不是回避!教育工作者正在努力应对生成式 AI 对评估的影响,特别是学术诚信问题。许多论文型评估可能大部分是由大语言模型而非学生完成的,包括一些可能本身就不适合的评估。在考试条件下展示实践技能可能需要在许多应用学科的评估中占据更大的比重,并对生成式 AI 和其他工具的使用进行禁止或明确限制。

4.3. Educator 3

4.3. 教育者 3

I have seen first-hand the impact ChatGPT can have on a student who is struggling to get code to work in a project: ChatGPT was freely available to fix any bugs the student was struggling with and allowed them to move on to the next part of the project without having to ask for assistance from a supervisor/lecturer. This is an invaluable resource that allows the student more independence in their learning when it should be independent learning. However, this only works well when the student has already achieved the foundational learning needed in the area and is now trying to apply more advanced techniques. The student then has enough knowledge to understand whether the prompts they have given ChatGPT have actually led to a coherent and correct answer.

在亲眼目睹了 ChatGPT 对一个在项目中遇到代码问题的学生的影响后,ChatGPT 可以自由地帮助学生修复他们遇到的任何错误,并让他们能够继续项目的下一部分,而无需向导师/讲师寻求帮助。这是一个宝贵的资源,让学生在应该独立学习的时候获得更多的独立性。然而,这只有在学生已经掌握了该领域所需的基础学习,并且现在正在尝试应用更高级的技术时才能很好地发挥作用。学生有足够的知识来理解他们给 ChatGPT 的提示是否真的导致了连贯且正确的答案。

Academic integrity has been an issue for over a century and will continue to be an issue with the current education and research structures [48]. During and since the pandemic, learning has moved to a more blended online learning environment. Universities already needed to update their policies and procedures to take into account this more fluid learning environment whilst maintaining the integrity of the grades being achieved without formal onsite, externally invigilated exams. The use of these more freely available AI tools has accelerated this need even more so than the pandemic, whether the learning takes place primarily in the classroom or online.

学术诚信问题已存在一个多世纪,在当前的教育和研究结构下,这一问题将持续存在 [48]。疫情期间及之后,学习逐渐转向更加混合的在线学习环境。大学已经需要更新政策和程序,以适应这种更加灵活的学习环境,同时确保在没有正式现场监考的情况下,所获成绩的完整性。这些更易获取的 AI 工具的使用,进一步加速了这一需求,甚至比疫情期间更为迫切,无论学习主要是在课堂还是在线进行。

More inventive forms of assessment are needed so that, even if a student uses an AI tool despite it being explicitly prohibited, you can assess whether the learning has been achieved. This might take place in many different ways, whether it be Q&A sessions with the students on a topic/project, screencasts where the student explains the work, students having to critique the work of AI tools, etc. But it does mean that what has worked in the past to assess a module may not still work now, and it needs a lot of thought from individual lecturers and programme teams to understand what will work for their courses, ideally guided by updated institutional academic integrity policies. Formal onsite exams still have a place in this new age of learning, and they might seem like an easy solution for assessing a student's learning without the use of AI tools. However, for many courses, in particular in the ICT sector of education, formal onsite exams have long been replaced with various continuous assessment strategies, and they should not be brought back in light of these AI challenges after it was previously argued that they are not an appropriate way to measure the student's learning in the area. A diverse set of assessment strategies is the best way to assess students' abilities [49].

需要更具创造性的评估形式,即使学生使用了AI工具(尽管明确禁止),也能评估学习成果是否达成。这可以通过多种方式实现,例如:与学生就某个主题/项目进行问答、学生通过屏幕录制解释其工作、学生必须批判AI工具的工作等。但这意味着过去用于评估该模块的方法可能不再适用,需要讲师和课程团队深入思考,以确定适合其课程的评估方式,最好在更新的学术诚信政策指导下进行。正式的现场考试在这个新的学习时代仍然占有一席之地,它可能看起来是一种无需使用AI工具就能评估学生学习成果的简单解决方案。然而,对于许多课程,特别是ICT教育领域,正式的现场考试早已被各种持续评估策略所取代,鉴于这些AI挑战,不应重新引入正式现场考试,因为此前已有人认为这不是衡量学生在该领域学习的合适方式。多样化的评估策略是评估学生能力的最佳方式[49]。

This is a difficult thing to do, as the pandemic has hindered the learning of a lot of students. Given the current timeframe, I think this needs to be looked at, but only once the effect of the pandemic can be isolated out of the data.

由于疫情阻碍了许多学生的学习,因此在当前的时间框架内,我认为需要对此进行审视,但前提是能够将疫情的影响从数据中隔离出来。

4.4. Educator 4

4.4. 教育者 4

The adoption of any new automation technology is fraught with potential for pitfalls, misunderstandings and misapplications. Before turning attention to Large Language Models (LLMs) such as ChatGPT, I would first choose a more straightforward illustrative example.

采用任何新的自动化技术都充满了潜在的陷阱、误解和误用。在将注意力转向像 ChatGPT 这样的大语言模型 (LLMs) 之前,我会首先选择一个更简单的示例来说明。

4.4.1. Originality Detection

4.4.1. 原创性检测

Even the least tech-savvy of students and educators have some grasp of how originality detectors such as Turnitin operate. On a high level, the system has access to a vast database of text samples (both those gathered online and those submitted to the system in the past) and newly-submitted work is compared against this. Text passages that match items in the database are identified and a “similarity score” is output. Even in this relatively straightforward scenario, misinterpretation and misapplication abound. Firstly, tools of this type are often deceptively described as “plagiarism detection” [50], leading to an over-reliance on a single tool as an arbiter of what constitutes plagiarism. As noted by Meo and Talha [50], plagiarism comes in many forms and plagiarism detection is an academic judgment. “Word-for-word plagiarism” (which originality checkers can effectively discover) is only one aspect. Students who are compelled to submit their work through such systems often come to conflate “similarity score” with “plagiarism score”. Particularly in situations where students can see these scores and resubmit their work, a perception can grow that rephrasing the offending matching sections is sufficient to avoid plagiarism. Where reworded ideas have been taken from other sources, without attribution, a heavily-plagiarised document can yield a low similarity score. Conversely, a relatively high similarity score does not necessarily constitute plagiarism either, and it is incumbent on educators to bear this in mind. There are myriad reasons why sections may match text from a database, particularly quotations and bibliographies. A submitted work that is over-reliant on lengthy quotations without commentary may be of low quality, but if cited correctly it does not constitute plagiarism. Originality checkers should only be used as a tool to identify potential cases of a specific form of plagiarism, with a human investigation necessary to verify whether or not this is the case.

即使是最不精通技术的学生和教育工作者,也对 Turnitin 等原创性检测工具的运作方式有所了解。从高层次来看,该系统可以访问大量的文本样本数据库(包括从网上收集的和过去提交给系统的样本),并将新提交的作品与之进行比较。系统会识别出与数据库中内容匹配的文本段落,并输出一个“相似度分数”。即使在这种相对简单的情况下,误解和误用也屡见不鲜。首先,这类工具常常被误导性地描述为“抄袭检测”工具 [50],导致人们过度依赖单一工具来判断是否构成抄袭。正如 Meo 和 Talha [50] 所指出的,抄袭有多种形式,抄袭检测是一种学术判断。“逐字抄袭”(原创性检测工具可以有效发现)只是其中的一个方面。被迫通过此类系统提交作品的学生常常将“相似度分数”与“抄袭分数”混为一谈。特别是在学生可以看到这些分数并重新提交作品的情况下,他们可能会认为只需重新措辞匹配的部分就足以避免抄袭。如果从其他来源提取并重新措辞的想法未经引用,即使文档中存在大量抄袭内容,也可能产生较低的相似度分数。相反,相对较高的相似度分数也不一定构成抄袭,教育工作者有责任牢记这一点。文本段落与数据库中的内容匹配的原因有很多,尤其是引用和参考文献。过度依赖冗长引用而没有评论的提交作品可能质量较低,但如果引用正确,则不构成抄袭。原创性检测工具应仅作为识别特定形式抄袭潜在案例的工具,必须通过人工调查来验证是否确实存在抄袭。
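The word-for-word matching step described above can be illustrated with a toy n-gram overlap score. This is a deliberate simplification of how commercial tools such as Turnitin work, and the example texts are invented; note how rewording drives the score to zero even though the idea is copied:

```python
def word_ngrams(text, n=3):
    """Split text into overlapping word n-grams, the unit matched against the database."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity_score(submission, source, n=3):
    """Fraction of the submission's n-grams that also appear in the source.

    A crude stand-in for the matching step of an originality checker: it
    flags word-for-word overlap only, not paraphrased or unattributed ideas.
    """
    sub = word_ngrams(submission, n)
    src = word_ngrams(source, n)
    if not sub:
        return 0.0
    return len(sub & src) / len(sub)

# Invented example texts.
source = "the quick brown fox jumps over the lazy dog"
copied = "my essay says the quick brown fox jumps over the lazy dog today"
reworded = "my essay says a fast brown fox leaps over a sleepy dog today"
```

Here `copied` yields a high score while `reworded` yields zero, mirroring the point above that a heavily-plagiarised but reworded document can produce a low similarity score.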

In summary, even an understandable tool of this type can directly contribute to students misunderstanding the concepts of plagiarism, and overzealous educators making accusations of academic misconduct based on a misinterpretation of the significance of the evidence to hand.

总之,即使是这种类型的可理解工具,也可能直接导致学生对抄袭概念的误解,以及教育工作者因对手头证据重要性的误解而过度热情地指控学术不端行为。

4.4.2. Large Language Models

4.4.2 大语言模型 (Large Language Models)

The role and capabilities of LLMs such as ChatGPT and Bard are much more difficult to understand, and as such the challenges of dealing with them in an educational setting are even more pronounced. Firstly, we should endeavour to understand, even on a basic level, how a LLM operates. In essence, it learns patterns and relationships between words, sentences and paragraphs in text, having been exposed to enormous quantities of human-created text to learn from. Then, given a “prompt” from a user, it generates text in response, beginning by matching the context of the prompt against its text store. As it generates the text, it uses a probabilistic approach to choose words one at a time. Based on the text it has generated thus far, it tries to predict what the next word should be. However, to avoid generating the same text in response to the same prompt each time, an element of randomness is introduced so as not to always choose the most likely word. Finally, it has a stopping mechanism that will cause the generation to end as appropriate [51]. One other aspect is that ChatGPT is also trained on actual chat logs between humans, and so it exhibits elements of personality. It is polite to a fault, apologises for perceived mistakes and appears eager to please. This leads to another observation, relating to the language that people use to describe their characteristics, and indeed their shortcomings. Because of the human-like nature of the generated text, people seem to be happy to attribute human-like explanations. It has been widely observed that ChatGPT will generate plausible-looking, incorrect references when asked to provide a bibliography [52]. Other types of referencing errors have also been observed (e.g. in law [53]). Such errors are typically described as “hallucinations”, giving them a distinctly human characteristic that implies real intelligence. 
Contrast this with a hypothetical AI image classifier that, presented with a photograph of a cat, predicts that it is a spaceship. In the latter situation, users are more likely to dismiss the tool’s effectiveness as being simply wrong.

ChatGPT 和 Bard 等大语言模型 (LLM) 的作用和能力更加难以理解,因此在教育环境中应对它们的挑战也更加明显。首先,我们应该努力理解大语言模型的基本运作方式。本质上,它通过学习大量人类创建的文本,学习文本中单词、句子和段落之间的模式和关系。然后,在用户提供“提示”时,它会根据其文本库匹配提示的上下文,生成相应的文本。在生成文本时,它采用概率方法逐个选择单词。根据已生成的文本,它尝试预测下一个单词应该是什么。然而,为了避免每次对相同提示生成相同的文本,引入了一定的随机性,以避免总是选择最可能的单词。最后,它有一个停止机制,可以在适当的时候结束文本生成 [51]。另一个方面是,ChatGPT 还接受了人类实际聊天记录的训练,因此它表现出一定的个性特征。它过于礼貌,会为感知到的错误道歉,并表现出渴望取悦用户的态度。这引出了另一个观察,即人们用来描述其特征甚至缺点的语言。由于生成的文本具有类似人类的特性,人们似乎乐于赋予其类似人类的解释。广泛观察到的是,当要求提供参考文献时,ChatGPT 会生成看似合理但错误的引用 [52]。还观察到其他类型的引用错误(例如在法律领域 [53])。这些错误通常被称为“幻觉”,赋予它们一种明显的人类特征,暗示着真正的智能。与此形成对比的是,假设一个 AI 图像分类器在接收到一张猫的照片时,预测它是一艘宇宙飞船。在后一种情况下,用户更可能认为该工具的效果是错误的。
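The probabilistic word-by-word generation described above can be sketched as temperature-based sampling over a next-word distribution. The probability table below is invented for illustration, and real LLMs operate over subword tokens and far larger vocabularies:

```python
import math
import random

def sample_next_word(probs, temperature=1.0, rng=random):
    """Pick the next word from a model's next-word probability table.

    With temperature 0 the most likely word is always chosen; higher values
    flatten the distribution, introducing the element of randomness that
    keeps the same prompt from always yielding the same text.
    """
    if temperature == 0:
        return max(probs, key=probs.get)
    # Rescale log-probabilities by temperature, then sample proportionally.
    scaled = {w: math.exp(math.log(p) / temperature) for w, p in probs.items()}
    total = sum(scaled.values())
    r = rng.random() * total
    cumulative = 0.0
    for word, weight in scaled.items():
        cumulative += weight
        if r <= cumulative:
            return word
    return word

# Hypothetical next-word distribution after a prompt such as "The cat sat on the".
next_word_probs = {"mat": 0.6, "sofa": 0.25, "keyboard": 0.15}
```

Repeating the generation one word at a time until a stopping condition fires gives the overall behaviour described in the paragraph above.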

The human-like characteristic of ChatGPT ultimately means that users are more likely to trust that its output is correct. An additional issue is that ChatGPT is innumerate. Although it can recognise where a numeric value would be appropriate in the text, the specific value often bears no resemblance to the correct answer. When challenged, it will attempt to “correct” the answer (even for relatively straightforward calculations) and offer an alternative. It is notable that when GPT-4 recognises that a numeric value is required (at present, the free version of ChatGPT is based on the earlier GPT-3.5), it will generate a Python program to perform the calculations, which is a significant advancement. Students may be attracted by reports of ChatGPT passing the bar exam, for example [27], and be tempted to employ it to cheat on university assignments. Due to the limitations outlined above, strategies such as requiring correct referencing, or in some cases complex calculations, may expose work that is not the student’s own. A careless student who simply copy/pastes a ChatGPT-generated essay may find that they have submitted substandard work, even if their use of LLMs cannot be proven.

ChatGPT 的类人特性最终意味着用户更有可能相信其输出的正确性。另一个问题是 ChatGPT 不具备数学能力。尽管它能够识别文本中适合插入数值的位置,但具体数值往往与正确答案相去甚远。当被质疑时,它会尝试“纠正”答案(即使是相对简单的计算)并提供替代方案。值得注意的是,当 GPT-4 识别到需要数值时(目前,免费版本的 ChatGPT 基于早期版本的 GPT-3.5),它将生成一个 Python 程序来执行计算,这是一个显著的进步。学生可能会被 ChatGPT 通过律师资格考试等报道所吸引 [27],并试图利用它在大学作业中作弊。由于上述限制,要求正确引用或在某些情况下进行复杂计算的策略,可能会暴露作业并非学生本人完成。一个粗心的学生如果简单地复制/粘贴 ChatGPT 生成的论文,可能会发现他们提交的是不合格的作品,即使他们使用大语言模型的行为无法被证明。
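The workaround attributed to GPT-4 here — emitting a program instead of predicting digits — works because the interpreter computes exactly, whereas next-token prediction only approximates numbers. Below is a sketch of the kind of short program such a model might generate for a routine numeric question; the loan question and all figures are invented for illustration.

```python
# The sort of program an LLM might emit when asked, e.g.,
# "What is the monthly repayment on a 20,000 loan at 4.5% APR over 5 years?"
# Running the code yields an exact figure instead of a hallucinated number.

def monthly_repayment(principal, annual_rate, years):
    """Standard amortised-loan repayment formula."""
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # total number of payments
    return principal * r / (1 - (1 + r) ** -n)

payment = monthly_repayment(20_000, 0.045, 5)
print(round(payment, 2))  # 372.86
```

The arithmetic is delegated entirely to the interpreter, which is why this pattern sidesteps the innumeracy described above.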

4.4.3. Detection of LLM-Generated Content

4.4.3. 大语言模型生成内容的检测

Educators are understandably concerned at the rise in the use of ChatGPT among students to write essays and assignments. This has led to the launch of a number of products that claim to be able to differentiate AI-generated content from human-generated text. Examples include GPTZero $^3$ [35] and ZeroGPT $^{14}$ . To be fair to the creators of these products, their websites are open about the role their tools are intended to play, and give some detail about how they are created. For example, GPTZero’s website states the following: ‘We test our models on a never-before-seen set of human and AI articles from a section of our large-scale dataset, in addition to a smaller set of challenging articles that are outside its training distribution.’ ZeroGPT’s website states the following: ‘Finally, we employ a comprehensive deep learning methodology, trained on extensive text collections from the internet, educational datasets, and our proprietary synthetic AI datasets produced using various language models.’ Both therefore claim strong accuracy in differentiating between text that is 100% AI-generated and text that is 100% human-generated. However, as with the originality detection software discussed above, it is imperative that educators understand what these tools are designed to do and what they are not designed to do. Only the laziest of students will directly submit a 100% AI-generated piece of work. These tools have not been trained on any dataset that includes proactive efforts to fool them. In some cases, even the addition of a single space can cause a ChatGPT detection tool to be fooled [54]. Similarly, since AI-generated text does not contain spelling or grammatical errors, some trivial manipulations can cause the detection software to fail. This serves to emphasise some inherent challenges in dealing with the problem of students using LLMs to complete their assignments.
Certainly, no LLM detector should be relied upon as definitive evidence of wrongdoing, nor can it definitively exonerate a suspected student. It remains an open question as to whether a reliable AI-detection tool is even possible. At best, an educator may use these in a similar way to originality checkers: a first pass to find suspicious cases that may merit further investigation. However, human judgment and old-fashioned mechanisms like oral examinations should remain part of the process.

教育工作者对学生使用 ChatGPT 撰写论文和作业的现象感到担忧是可以理解的。这促使了一系列产品的推出,这些产品声称能够区分 AI 生成的内容和人类生成的文本。例如 GPTZero $^3$ [35] 和 ZeroGPT $^{14}$。公平地说,这些产品的创建者在他们的网站上公开了这些工具的预期作用,并详细说明了它们的创建方式。例如,GPTZero 的网站声明如下:“我们在一个前所未见的人类和 AI 文章集上测试我们的模型,这些文章来自我们大规模数据集的一部分,此外还包括一些不在其训练分布中的具有挑战性的文章。” ZeroGPT 的网站声明如下:“最后,我们采用了一种全面的深度学习方法,训练数据来自互联网上的大量文本、教育数据集以及我们使用各种语言模型生成的专有合成 AI 数据集。”因此,两者都声称在区分 100% AI 生成的文本和 100% 人类生成的文本方面具有很高的准确性。然而,与上面讨论的原创性检测软件一样,教育工作者必须了解这些工具的设计目的以及它们不设计用于做什么。只有最懒惰的学生才会直接提交 100% AI 生成的作品。这些工具没有在包含主动欺骗它们的任何数据集上进行训练。在某些情况下,甚至添加一个空格也可能导致 ChatGPT 检测工具被欺骗 [54]。同样,由于 AI 生成的文本不包含拼写或语法错误,一些简单的操作可能会导致检测软件失效。这强调了在处理学生使用大语言模型完成作业的问题时存在的一些固有挑战。当然,不应依赖任何大语言模型检测器作为不当行为的明确证据,也不能明确证明被怀疑的学生无罪。一个可靠的 AI 检测工具是否可能仍然是一个悬而未决的问题。充其量,教育工作者可以以类似于原创性检查器的方式使用这些工具:作为初步筛选,以发现值得进一步调查的可疑案例。然而,人类判断和口试等传统机制仍应是这一过程的一部分。
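Why a single space or typo can defeat a detector is easiest to see with a toy classifier. The heuristic below (uniformity of sentence lengths) is invented for illustration and is far simpler than the deep models GPTZero and ZeroGPT describe, but it shares the brittleness discussed above: a one-character perturbation moves the score across the decision threshold.

```python
def ai_score(text):
    """Toy heuristic: AI-like text has very uniform sentence lengths.

    Returns the variance of sentence word counts; a low variance is
    treated as evidence of machine generation.
    """
    sentences = [s.split() for s in text.split('.') if s.strip()]
    lengths = [len(s) for s in sentences]
    mean = sum(lengths) / len(lengths)
    return sum((l - mean) ** 2 for l in lengths) / len(lengths)

THRESHOLD = 0.5  # at or below: flag as AI-generated

essay = ("The model writes clearly. It keeps sentences even. "
         "Each point is made briefly. The tone stays quite flat.")
print(ai_score(essay) <= THRESHOLD)  # True: flagged as AI-generated

# A trivial perturbation: one inserted space splits a word in two,
# shifting the statistics enough to flip the verdict.
evaded = essay.replace("briefly", "brie fly", 1)
print(ai_score(evaded) <= THRESHOLD)  # False: now passes as human
```

Real detectors use richer features, but the underlying problem is the same: they were not trained against inputs deliberately crafted to fool them.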

4.5. Educator 5

4.5. 教育者 5

ChatGPT is a specific software application built on top of Generative AI technology, particularly large language models (LLMs). Generative AI is a broad term that refers to any type of artificial intelligence that can create new content. This can include text, images, music, code, and more. Among these, LLMs stand out as a specialized subset of Generative AI, specifically engineered for text generation. LLMs represent a class of artificial intelligence proficient in both text generation and comprehension. They undergo extensive training on massive datasets containing text and code, enabling them to grasp the intricacies of human language patterns. LLMs find application across a diverse range of tasks [55].

ChatGPT 是基于生成式 AI (Generative AI) 技术构建的特定软件应用,尤其是大语言模型 (LLMs)。生成式 AI 是一个广义术语,指的是任何能够创建新内容的人工智能类型。这可以包括文本、图像、音乐、代码等。其中,大语言模型 (LLMs) 是生成式 AI 的一个专门子集,专门用于文本生成。LLMs 代表了一类在文本生成和理解方面都表现出色的人工智能。它们经过大量包含文本和代码的数据集的训练,能够理解人类语言模式的复杂性。LLMs 被广泛应用于各种任务中 [55]。

This has given rise to Ethical and Privacy Concerns around AI generated content in education. Central to understanding the impact past the hype phase is promoting a broader understanding of what these models really are, and how they are designed. Bard $^{5}$ , ChatGPT, and Bing $^6$ AI are all examples of publicly available large language models (LLMs) that can generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way. While large language models open up many possibilities, there is still much to learn about how people will interact with them [56]. While Bard, Bing and ChatGPT all aim to give human-like answers to questions, each performs differently. Bing starts with the same GPT-4 tech as ChatGPT but goes beyond text and can generate images. Bard uses Google’s own model, called LaMDA, often giving less text-heavy responses [57] [58]. Bard is trained on a dataset of text and code that is specifically designed to improve its dialogue and coding abilities. ChatGPT is trained on a dataset of text that is more general in nature. This means that Bard is better at understanding and responding to natural language, while ChatGPT is better at generating creative text formats.

这引发了关于教育中AI生成内容的伦理和隐私问题。要理解这些模型在炒作阶段之后的影响,核心在于促进对其本质及设计方式的更广泛理解。Bard $^{5}$、ChatGPT和Bing $^6$ AI都是公开可用的大语言模型(LLMs)的例子,它们能够生成文本、翻译语言、撰写各种创意内容,并以信息丰富的方式回答问题。尽管大语言模型开启了众多可能性,但关于人们将如何与之互动,仍有许多需要学习的地方 [56]。虽然Bard、Bing和ChatGPT都旨在提供类似人类的回答,但各自表现不同。Bing与ChatGPT一样基于GPT-4技术,但超越了文本,还能生成图像。Bard使用Google自己的模型LaMDA,通常提供较少文本密集的响应 [57] [58]。Bard在一个专门设计用于提升对话和编码能力的文本和代码数据集上训练。ChatGPT则在一个更为通用的文本数据集上训练。这意味着Bard更擅长理解和回应自然语言,而ChatGPT在生成创意文本格式方面表现更佳。

Consideration must be given to alignment with educational values. This should ensure AI tools align with educational goals and values, such as critical thinking, creativity, and ethical decision-making. The presence of AI-generated content presents new and unique ethical considerations. Firstly, the very notion of authorship blurs, as AI lacks the capacity for genuine creative ownership. Assigning sole credit to authors who merely provide prompts for AI outputs is equally disingenuous. Therefore, establishing transparent attribution guidelines becomes essential. Secondly, the specter of bias is evident throughout AI research, as AI algorithms can unwittingly mirror societal prejudices present in their training data. Mitigating this necessitates employing diverse datasets and vigilantly monitoring outputs for discriminatory content. Thirdly, the potential for manipulating or fabricating information through AI-generated content, exemplified by deepfakes, poses a significant threat. Safeguards emphasizing factual accuracy and transparency are essential to combat this. Finally, the emotional impact of AI content cannot be ignored. Educators must carefully consider the potential psychological effects, particularly on vulnerable populations within educational settings. In conclusion, navigating the ethical minefield surrounding AI-generated content requires a multifaceted approach, encompassing clear attribution, diverse training data, robust safeguards against misinformation, and thoughtful consideration of the emotional impact on users. By addressing these ethical and privacy concerns, we can ensure AI-generated content and chatbots contribute positively to the educational experience, fostering a safe, responsible, and enriching learning environment.

必须考虑与教育价值观的一致性。这应确保AI工具与教育目标和价值观(如批判性思维、创造力和道德决策)保持一致。AI生成内容的存在带来了新的独特伦理问题。首先,作者身份的概念变得模糊,因为AI缺乏真正的创造性所有权。将仅提供AI输出提示的作者单独归功同样是不诚实的。因此,建立透明的归属指南变得至关重要。其次,偏见问题在AI研究中显而易见,因为AI算法可能会无意中反映其训练数据中存在的社会偏见。缓解这一问题需要采用多样化的数据集,并警惕地监控输出中的歧视性内容。第三,通过AI生成内容(如深度伪造)操纵或伪造信息的潜力构成了重大威胁。强调事实准确性和透明度的保障措施对于应对这一问题至关重要。最后,AI内容的情感影响不容忽视。教育者必须仔细考虑潜在的心理影响,特别是在教育环境中对弱势群体的影响。总之,应对AI生成内容带来的伦理困境需要多方面的策略,包括明确的归属、多样化的训练数据、防止错误信息的强大保障措施,以及对用户情感影响的深思熟虑。通过解决这些伦理和隐私问题,我们可以确保AI生成内容和聊天机器人为教育体验做出积极贡献,营造一个安全、负责任且丰富的学习环境。


Figure 2: The GAI landscape: generative models and artifacts [59]

图 2: 生成式 AI (Generative AI) 的格局:生成模型与产物 [59]

4.6. Educator 6

4.6. 教育者 6

Artificial Intelligence (AI) has been increasingly used these days as the prime means to take decisions, solve problems, write reports, and so on, with claims that it will replace human intelligence in future. Since it is typically performance-based, executing commands generously without any perception or misconception of its abilities, it is in increasing demand. But another facet to be pointed out is that it is only a function of human commands, programmed to operate with a set pattern or methods, and will deliver similar results, sometimes overlapping with the same methodical approach. The future of Artificial Intelligence in such light seems crippled without human intelligence. The future of decision-making and problem-solving lies with a careful concoction of number crunching, big data analysis, research tools, and interdisciplinary research, which requires the correct amount of human intelligence at every stage, leading to the formation of Augmented Intelligence. It is an amalgamation of reckoning the correct ingredients: combining the intuitive abilities of human judgement with the methodical skill set of Artificial Intelligence [60]. Also known as Intelligence Augmentation (IA) or cognitive augmentation, it is a new-age marriage between man and machine. Together, AI and IA can pen down the future differently when used with a collaborative approach. While AI has increasingly been posing a threat to replace humans, when it comes to judgement and reckoning, we see humans stepping in for an informed disposition of intelligence. Instead of avoiding or resisting this inevitable change, it is now time to look at it as a joint venture between humans and computers. The IA approach shall bring advances, modernisation, and speed to the working approach of business enterprises, institutions, organisations, students, workers, and media communities.
The idea is to make the most of it by equipping and training human intelligence in its correct and appropriate use. Tasks which are tedious for humans, or repetitive in nature with redundant value, can be done by AI bots, thus removing human errors and biases, while tasks which require interpretation, a visionary approach, a holistic mindset, and judgement can be done by humans with greater efficiency due to the time saved.

人工智能(AI)如今越来越多地被用作决策、解决问题、撰写报告等的主要手段,声称未来将取代人类智能。由于它通常基于性能,慷慨地执行命令,没有任何对其能力的感知或误解,因此需求日益增长。但另一个值得指出的方面是,它只是人类命令的一个函数,按照设定的模式或方法进行编程,并会提供类似的结果,有时会与相同的方法重叠。在这种背景下,人工智能的未来似乎在没有人类智能的情况下是残缺的。未来的决策和问题解决依赖于数字运算、大数据分析、研究工具和跨学科研究的精心结合,这需要在每个阶段都具备正确的人类智能,从而形成增强智能(Augmented Intelligence)。它是将正确的成分或组合与人类判断的直觉能力相结合,同时配备人工智能的方法技能集 [60]。它也被称为智能增强(Intelligence Augmentation, IA)或认知增强,是人与机器之间的新时代结合。AI 和 IA 共同计划在协作方法下以不同的方式书写未来。虽然 AI 越来越多地构成取代人类的威胁,但在判断和推理方面,我们看到人类介入以做出明智的决策。与其避免或试图接受这一不可避免的变化,现在是时候将其视为人类与计算机之间的合资企业。IA 方法将在企业、机构、组织、学生、工人和媒体社区中带来工作方式的进步、现代化和速度。其理念是通过正确和适当的使用来装备和培训人类智能,从而充分利用它。那些对人类来说繁琐或重复且具有冗余价值的任务可以由 AI 机器人完成,从而消除人为错误和偏见。而那些需要解释、远见卓识、整体思维和判断的任务,由于节省了时间,可以由人类以更高的效率完成。

The statistics have shown that IA leads to 99% accuracy in decision-making, leading to enhanced productivity. Alexa and similar bots take commands, recognise voices, and eliminate the trouble of remembering, and in some cases performing, mundane tasks. Students, however, have been using it as a convenient tool to plagiarise, lowering the scope for creative thinking. University experts have now started to use chatbots as assignment providers, where the students are the ones evaluating the assignments created by the bots [61]. The method is a clever precautionary approach rather than a cure for plagiarism. Unlike straight automation, IA shall enhance cognitive abilities. It is IA technology that has to evolve with an open human mindset, creating and consuming content with the help of AI, leaving no room for error and creating a powerful, strengthened approach. It uses the strengths of both man and machine while mitigating the risks and threats. The data has repeatedly shown that in organisations where AI was kept as the sole leader, human participation was ultimately required in 30% of cases [62]. In the uncertain times of the VUCA world, this only seems set to rise. If your process has continuous human input, change management, in terms of adopting AI as a peer or colleague to work alongside, will become a smooth function of any organisation. The student community, meanwhile, needs to act as evaluators to understand the limitations of AI approaches, so that they use it as a tool rather than as a subordinate. IA can redefine the landscape of human performance through the harmonious partnership between man and machine, building a realm of AI-powered humans who increase effectiveness in the workplace by opening new horizons of ideas backed by rationale and vision.

统计数据显示,智能增强 (Intelligence Augmentation, IA) 在决策中达到了 99% 的准确率,从而显著提高了生产力。Alexa 或其他类似的机器人助手能够接收指令、识别语音,并帮助人们摆脱记忆繁琐任务的困扰,甚至在某些情况下直接完成这些任务。然而,学生却将其作为便利工具,用于抄袭创意,降低了思考的空间。大学专家们现在开始将聊天机器人作为作业提供者引入教学,学生则负责评估由机器人生成的作业 [61]。这种方法是一种巧妙的预防措施,而非解决抄袭问题的直接手段。与单纯的自动化不同,智能增强应当提升认知能力。智能增强技术需要与开放的人类思维共同发展,借助 AI 创造和消费内容,确保无差错并形成强大且稳健的方法。它结合了人类和机器的优势,同时降低了风险和威胁。数据反复表明,在那些将 AI 作为唯一领导者的组织中,30% 的情况下最终仍需人类参与 [62]。在 VUCA(易变性、不确定性、复杂性和模糊性)世界的不确定性中,这一比例似乎只会上升。如果您的流程中持续有人类输入,那么在将 AI 作为同事或合作伙伴引入时,变革管理将成为任何组织的顺畅功能。与此同时,学生群体需要成为评估者,以理解 AI 方法带来的局限性,从而将其视为工具而非下属。智能增强可以通过人与机器的和谐合作重新定义人类表现的格局,构建一个由 AI 赋能的人类领域,这些人类通过开拓新的思想视野,在理性与愿景的支持下,提升工作场所的效能。

4.7. Educator 7

4.7. 教育者 7

Assuring the veracity of student outputs has always been of concern, but recent developments in Generative AI (Gen AI) have thrown a curve ball at the processes already in place. Lecturers and administrators across our college have been challenged with the double concern of how to embed Gen AI into our teaching as a tool while also assuring that it is not misused in producing outputs at the assessment level. The usage of these tools by students was at first met with apprehension but then excitement, as it was seen as another important tool within the modern student’s arsenal; it has become increasingly apparent that these tools will be and are being used across industry [63], and we recognised that we would be seriously disadvantaging students by not including their usage in the teaching programmes. Early trials are in place for using Gen AI as a part of teaching and assessment in some modules. But we also need to put in place mechanisms to help prevent their misuse in outputs at assessment level. The Exams Office, along with Programme Coordinators, has developed changes to overall assessment that reflect the need to be aware of the misuse of AI through more authentic forms of assessment. This discussion deals with some of the early ideas and mechanisms proposed and developed around both of these issues as we both battle and welcome AI.

确保学生作业的真实性一直是一个关注点,但生成式 AI (Generative AI) 的最新发展给现有的流程带来了新的挑战。我们学院的讲师和管理人员面临着双重担忧:如何将生成式 AI 作为一种工具嵌入教学中,同时确保在评估层面不被滥用。学生们使用这些工具最初引发了担忧,但随后又带来了兴奋,因为它被视为现代学生工具箱中的另一个重要工具;越来越明显的是,这些工具将在整个行业中使用,并且已经在使用 [63],因此我们注意到,如果不将其纳入教学计划,我们将严重损害学生的利益。一些模块已经开始了将生成式 AI 作为模块教学和评估一部分的早期试验。但我们也需要建立机制,以防止其在评估层面的输出中被滥用。考试办公室与项目协调员一起开发了整体评估的变更,以反映通过更真实的评估形式来防止 AI 滥用的需求。本文讨论了一些围绕这两个问题提出的早期想法和机制,我们既在与 AI 斗争,也在欢迎它。

Many scholars are exploring the ethical usage of Gen AI in the classroom. Some of this research finds high intention by students to use these tools [64]. But the perceived usefulness of these tools is questioned in university settings, contrasting with other research in the area [65], which the authors say may be due to a lack of understanding of these tools. Within our own organization we are making every effort to show students and staff the usefulness of these tools in the classroom as well as in their coursework. One such method has involved staff CPD to instill effective usage, as well as guidance from our corporate headquarters (Kaplan Inc.). We have also been developing guidance at the Quality Assurance level for staff and students, while some staff members are actively including Gen AI tools in their teaching.

许多学者正在探索生成式 AI (Generative AI) 在课堂中的伦理使用。一些研究发现,学生使用这些工具的意愿很高 [64]。但在大学环境中,这些工具的实用性受到质疑,这与该领域的其他研究形成对比 [65],作者认为这可能是由于对这些工具缺乏理解。在我们自己的组织中,我们正在尽一切努力向学生和教职员工展示这些工具在课堂和课程作业中的实用性。其中一种方法是通过教职员工的持续专业发展 (CPD) 来灌输有效使用,同时获得我们公司总部 (Kaplan Inc.) 的指导。我们还在质量保证层面为教职员工和学生制定指导方针,同时一些教职员工正在积极将生成式 AI 工具纳入他们的教学中。

Another interesting study looks at Gen AI adoption across the generations: [65] found that Generation Z students showed an interest in using Gen AI as a tool in their educational pursuits, while Generation X and Y teachers showed optimism towards the tool while expressing concerns about its application. One could indeed map the rise of Gen AI onto many other tools that over the years would have seemed to be cheapening the learning experience (the computer to the page, the page to rhetoric...). We have found this mixture of interest and apprehension across our own staff as we learn to deal with this interesting new tool.

另一项有趣的研究探讨了不同世代对生成式 AI (Generative AI) 的采用情况。[65] 发现,Z 世代学生对将生成式 AI 作为教育工具表现出兴趣,而 X 世代和 Y 世代教师则对该工具持乐观态度,同时也对其应用表示担忧。确实,我们可以将生成式 AI 的兴起与多年来似乎降低了学习体验价值的其他工具(从计算机到纸张,从纸张到修辞……)进行类比。在我们学习应对这一有趣的新工具时,我们的员工中也出现了这种兴趣与担忧并存的现象。

The elephant in the room, though, is plagiarism or academic impropriety (the other AI). Having worked adjunct in two other universities has allowed this researcher multiple viewpoints into the issue as it has arisen. Working in different departments (Humanities and Social Sciences) has also highlighted interesting and varied approaches. The first observation was that Humanities departments heavily reliant on the essay form, such as Literature Studies, initially showed an aggressive zero-tolerance approach to its usage in coursework, while the Social Sciences, such as Communications Studies, showed a more balanced approach, recognising it as a tool but still leaning strongly towards penalizing students for its usage as opposed to actively incorporating it. It was in our Business College where we found a more balanced approach, and this may be due to the prevalence of project-type work, which facilitates the ethical usage of Gen AI but also makes its misuse harder to carry out. The nature of project work is a more authentic type of assessment that involves more interaction between the lecturer and student, which makes Gen AI content more obvious in final productions.

然而,房间里的大象是剽窃或学术不端行为(另一种 AI)。这位研究人员在其他两所大学担任兼职工作的经历,使他对这一问题有了多角度的看法。在不同部门(人文与社会科学)工作的经历也凸显了有趣且多样的应对方式。第一个观察是,像文学研究这样严重依赖论文形式的人文系,最初对其在课程作业中的使用采取了激进的零容忍态度,而像传播学这样的社会科学则表现出更为平衡的态度,将其视为一种工具,但仍然强烈倾向于惩罚使用它的学生,而不是积极将其纳入使用。在我们的商学院,我们发现了一种更为平衡的应对方式,这可能是由于项目型工作的普遍性,这种工作促进了生成式 AI 的合乎道德的使用,同时也使其滥用更难以实施。项目工作的性质是一种更为真实的评估类型,涉及讲师和学生之间更多的互动,这使得生成式 AI 的内容在最终成果中更加明显。

The answer to the negative aspects of Gen AI that lead to academic impropriety is to embrace more authentic forms of assessment like the above: to move away from the essay form and towards more regulated and monitored project-type work that also encourages the use of Gen AI as a tool in that process. At the HECA Research Conference 2023, Gen AI and authentic assessment was the centre of many interesting discussions. Indeed, the conference ended with Danielle Logan Fleming of Griffith University, Australia, presenting ‘A Message of HOPE: Generative AI and Authentic Interactive Oral Assessment’ $^7$ . The idea generated here is that we need much more authentic assessment in our programmes and that we can easily battle the misuse of Gen AI by creating more authentic assessment that engages in a conversation with the students as they develop their work [66]. In short, Gen AI is the future of industry and education and we need to embrace it in our classrooms and our assessment. It cannot be ignored, nor should it be, as any institution that attempts to ride out the storm of Gen AI will fall behind and drag their students with them.

应对生成式 AI (Generative AI) 带来的学术不端问题的答案是采用更真实的评估形式,如上所述:从传统的论文形式转向更受监管和监控的项目型工作,同时鼓励在此过程中将生成式 AI 作为工具使用。在 2023 年 HECA 研究会议上,生成式 AI 和真实评估是许多有趣讨论的核心。事实上,会议以澳大利亚格里菲斯大学的 Danielle Logan Fleming 题为《希望的信息:生成式 AI 与真实互动口头评估》$^7$ 的演讲结束。这里产生的想法是,我们需要在课程中进行更多真实的评估,并且可以通过创建更多真实的评估来应对生成式 AI 的滥用,这些评估在学生开发作品的过程中与他们进行对话 [66]。简而言之,生成式 AI 是工业和教育的未来,我们需要在课堂和评估中拥抱它。它不能被忽视,也不应该被忽视,因为任何试图在生成式 AI 的风暴中按兵不动的机构都会落后,并拖累他们的学生。

5. Thematic Analysis

5. 主题分析

In this section, the results of the thematic analysis approach by Braun and Clarke [46] have been discussed. The framework helped in the identification of 11 themes, which have been discussed in the following subsections. The themes identified were: ‘Academic Integrity and Challenges in Assessment’, ‘Limitations and Misuse of Generative AI’, ‘The Importance of Prompt Construction’, ‘Critical Thinking and Problem-Solving Skills’, ‘Bias, Transparency, and Ethical Concerns’, ‘Responsible use of GenAI’, ‘GenAI for Programming’, ‘Technical Details of AI Tools’, ‘Advantages of GenAI’, ‘Challenges of GenAI’, and ‘Miscellaneous’.

在本节中,讨论了 Braun 和 Clarke [46] 的主题分析方法的结果。该框架帮助识别了 11 个主题,这些主题将在以下小节中讨论。识别的主题包括:“学术诚信与评估中的挑战”、“生成式 AI 的局限性与滥用”、“提示构建的重要性”、“批判性思维与问题解决能力”、“偏见、透明度与伦理问题”、“生成式 AI 的负责任使用”、“生成式 AI 在编程中的应用”、“AI 工具的技术细节”、“生成式 AI 的优势”、“生成式 AI 的挑战”以及“其他”。

Figure 3 displays a Sankey diagram that illustrates the themes discussed by each educator. ‘Academic Integrity and Challenges in Assessment’ is the most prevalent theme, indicating that it is a significant concern amongst educators. Other prevalent themes include ‘Responsible use of GenAI’ discussed by 4 educators, and ‘Challenges of GenAI’, ‘Technical Details of AI Tools’ and ‘Advantages of GenAI’, each discussed by 3 educators. Themes such as ‘Importance of Prompt Construction’ and ‘Bias, Transparency, and Ethical Concerns’ have been specifically discussed by a minority of the educators. Certain educators, such as educators 1 and 2, have discussed a variety of themes.

图 3 展示了一个桑基图,说明了每位教育工作者讨论的主题。“学术诚信与评估挑战”是最普遍的主题,表明这是教育工作者关注的重要问题。其他普遍的主题包括 4 位教育工作者讨论的“生成式 AI 的负责任使用”,以及 3 位教育工作者分别讨论的“生成式 AI 的挑战”、“AI 工具的技术细节”和“生成式 AI 的优势”。少数教育工作者特别讨论了“提示构建的重要性”和“偏见、透明度和伦理问题”等主题。某些教育工作者,如教育工作者 1 和 2,讨论了多种主题。

5.1. Academic Integrity and Challenges in Assessment

5.1. 学术诚信与评估中的挑战

‘Academic Integrity and Challenges in Assessment’ is the most commonly discussed theme, mentioned by 5 out of the 7 educators.

“学术诚信与评估挑战”是最常讨论的主题,7位教育者中有5位提到。

Table 2 lists the final codes for the theme ‘Academic Integrity and Challenges in Assessment’ for each educator. Plagiarism can become rampant due to the free availability of GenAI tools, and AI-generated content can be difficult to detect [67]. Developing tools to detect AI-generated content can be challenging [35], if possible at all, and some can be evaded by simply adding a single space [54]. The development of new and innovative assessment methods, such as interactive oral assessment [66], project-based work, and peer evaluations [68], is essential.

表 2 列出了每位教育工作者关于“学术诚信与评估挑战”主题的最终代码。由于生成式 AI (Generative AI) 工具的免费可用性,抄袭行为可能会变得猖獗,而 AI 生成的内容可能难以检测 [67]。开发检测 AI 生成内容的工具可能具有挑战性 [35],甚至可能根本无法实现,有些工具只需添加一个空格即可规避 [54]。开发新的创新评估方法,如互动式口头评估 [66]、基于项目的作业和同行评估 [68],至关重要。


Figure 3: Thematic mapping

图 3: 主题映射

1 2 关于抄袭和学术不端的担忧,AI生成的作品难以检测,AI对评估和学术诚信的影响带来的挑战
3 论文类评估面临风险,实践技能展示的重要性,学术诚信是一个问题,学术政策必须更新以确保学术诚信,免费可用的AI工具进一步影响诚信
6 需要创新和多样化的评估方法,替代评估方法的重要性,更新机构学术诚信政策的重要性,学生滥用AI进行抄袭,学生评估AI生成的作业以规避抄袭
7 生成式AI加剧了抄袭检测的问题,讲师和学生之间的评估互动,监控的项目类型工作鼓励生成式AI的伦理使用

Table 2: Codes for Academic Integrity and Challenges in Assessment

表 2: 学术诚信与评估挑战的代码

5.2. Limitations and Misuse of Generative AI

5.2. 生成式 AI 的局限性与误用

The theme ‘Limitations and Misuse of Generative AI’ includes codes that discuss general limitations and the potential misuses of these tools, which could not be aggregated into a single theme. Table 3 lists the final codes for the theme for the educators who discussed it. Educator 4 discusses how ChatGPT hallucinates plausible-sounding references [52] and creates other referencing errors [53]. ChatGPT also struggles with basic arithmetic [69].

主题“生成式 AI (Generative AI) 的局限性与滥用”包含讨论这些工具的普遍局限性和潜在滥用的代码,这些内容无法归入单一主题。表 3 列出了讨论该主题的教育工作者的最终代码。Educator 4 讨论了 ChatGPT 如何生成看似合理但实际不存在的参考文献 [52],并产生其他引用错误 [53]。ChatGPT 在处理基本算术时也存在困难 [69]。

Edc. Codes

Edc. 代码

Table 3: Codes for Limitations and Misuse of AI

表 3: AI 的限制与滥用代码

| 1 3 4 | 学生对 AI 聊天机器人在编码中的严重依赖,AI 聊天机器人作为工具的局限性,需要教育学生了解 ChatGPT 的局限性,ChatGPT 经常错误计算简单的算术操作,有效使用 ChatGPT 所需的基础知识,ChatGPT 生成错误的引用,ChatGPT 不擅长数学 |

5.3. The Importance of Prompt Construction

5.3. 提示构建的重要性

The theme ‘Importance of Prompt Construction’ consists of codes that discuss the importance of constructing high-quality and precise prompts to obtain relevant and accurate responses from a GenAI tool. Table 4 lists the final codes for the theme for the educators who discussed it. Educators 1 and 2 suggest that supplying GenAI chatbots with precise prompts will provide optimal results [70], and that tutorials for prompt construction should be included in the curriculum [1].

主题“提示构建的重要性”包含讨论构建高质量和精确提示以获得生成式 AI (Generative AI) 工具相关且准确响应的重要性的代码。表 4 列出了讨论该主题的教育工作者的最终代码。教育工作者 1 和 2 建议为生成式 AI 聊天机器人提供精确的提示将获得最佳结果 [70],并且提示构建的教程应包含在课程中 [1]。

Edc. Codes
1 2 生成模拟数据需要明确的指令,必须构建精确且正确的提示;通过 ChatGPT 介绍提示工程的重要性,高质量提示对于 GPT 的重要性,需要包含信息搜索和检索教程

Table 4: Codes for The Importance of Prompt Construction

表 4: 提示构建重要性的代码
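A precise prompt can be treated as a small piece of engineering rather than a sentence typed in haste. The sketch below assembles one from explicit components (role, task, audience, output format, constraints); the template, field names, and example task are illustrative assumptions, not a prescribed format.

```python
def build_prompt(task, audience, format_spec, constraints):
    """Assemble a precise prompt from explicit components.

    Vague:   "explain recursion"
    Precise: role + task + audience + output format + constraints,
    which leaves the model far less room to guess what is wanted.
    """
    lines = [
        "You are a patient programming tutor.",
        f"Task: {task}",
        f"Audience: {audience}",
        f"Output format: {format_spec}",
        "Constraints:",
    ]
    lines += [f"- {c}" for c in constraints]
    return "\n".join(lines)

prompt = build_prompt(
    task="Explain recursion with one worked example.",
    audience="first-year students who know loops but not recursion",
    format_spec="three short paragraphs followed by a 5-line code sample",
    constraints=["avoid jargon", "use factorial as the example",
                 "end with one self-check question"],
)
print(prompt)
```

Templating prompts like this also makes them easy to reuse and grade, which is one way a prompt-construction tutorial could be built into a curriculum.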

5.4. Critical Thinking and Problem-Solving Skills

5.4. 批判性思维与问题解决能力

The theme ‘Critical Thinking and Problem-Solving Skills’ consists of codes which discuss the impact of GenAI technology on the critical thinking skills of students. Table 5 lists the final codes for the theme for the educators who discussed it. GenAI tools can both promote and hinder the development of these skills [44]. Educator 1 mentions particular concerns such as decline in creativity and independent thinking due to over-reliance on GenAI tools [71].

主题“批判性思维与问题解决能力”包含讨论生成式 AI (Generative AI) 技术对学生批判性思维能力影响的代码。表 5 列出了讨论该主题的教育者的最终代码。生成式 AI 工具既可以促进也可以阻碍这些能力的发展 [44]。教育者 1 提到了一些特别的担忧,例如由于过度依赖生成式 AI 工具而导致创造力和独立思考能力的下降 [71]。

Edc. Codes
1 Decline in problem-solving skills due to AI; AI chatbots diminish problem-solving skills in coding; AI chatbots reduce student creativity; struggle essential for problem-solving skills; over-reliance on AI chatbots will reduce independent thinking

Edc. 代码
1 由于 AI 导致的问题解决能力下降;AI 聊天机器人削弱了编程中的问题解决能力;AI 聊天机器人降低了学生的创造力;挣扎是问题解决能力的关键;过度依赖 AI 聊天机器人将减少独立思考能力

Table 5: Codes for Critical Thinking and Problem-Solving Skills

表 5: 批判性思维和问题解决能力的代码

Table 6 lists the final codes for the theme ‘Bias, Transparency, and Ethical Concerns’ for the educators who discussed it. The theme consists of codes that touch on biases in AI-generated content [72], copyright concerns with AI-generated content [73], and authorship debates around such content [58]. The lack of transparency in the development and deployment of these tools is also a significant barrier to their integration in the educational curriculum [37]. Educator 5 has also expressed concerns regarding the proliferation of AI-generated fake content, which can have a detrimental impact on misinformation in politics and journalism [74] [75]. The potential psychological impact that AI-generated content can have on students, which can be positive or negative, must also be kept in mind [33].

表 6 列出了讨论“偏见、透明度和伦理问题”主题的教育工作者的最终代码。主题“偏见、透明度和伦理问题”包含涉及 AI 生成内容中的偏见 [72]、AI 生成内容的版权问题 [73] 以及围绕此类内容的作者身份争议 [58] 的代码。这些工具在开发和部署过程中缺乏透明度,也是其融入教育课程的重大障碍 [37]。Educator 5 还对 AI 生成的虚假内容的扩散表示担忧,这可能会对政治和新闻领域的错误信息产生不利影响 [74] [75]。还必须牢记 AI 生成内容可能对学生产生的潜在心理影响,这种影响可能是积极的,也可能是消极的 [33]。

Table 6: Codes for Bias, Transparency, and Ethical Concerns

表 6: 偏见、透明度和伦理问题的代码

Edc. Codes
1 5 必须考虑 学生需要了解 AI 生成内容中的偏见、生成式 AI 方法缺乏透明度、AI 生成内容的版权问题、教育中 AI 生成内容的伦理和隐私问题、作者身份、偏见和 AI 生成的虚假内容等伦理考虑、确保准确性和透明性的保障措施的必要性、AI 生成内容的潜在心理影响

5.6. Responsible use of GenAI

5.6. 负责任地使用生成式 AI (Generative AI)

In the theme ‘Responsible use of GenAI’, suggestions for ethical and responsible use of GenAI have been highlighted. Table 7 lists the final codes for the theme for the educators who discussed it. Educator 1 suggests possible changes that can be made to class assignments that are given to students. Educators 1 and 2 suggest that the proper use of GenAI tools must be taught as part of the curriculum. Educator 3 highlights the importance of studying user interaction with LLMs [56]. Educator 6 highlights the importance of human oversight in utilizing GenAI.

在“负责任地使用生成式 AI (GenAI)”主题中,强调了关于道德和负责任使用生成式 AI 的建议。表 7 列出了讨论该主题的教育工作者的最终代码。Educator 1 建议可以对分配给学生的课堂作业进行可能的修改。Educator 1 和 Educator 2 建议必须将正确使用生成式 AI 工具作为课程的一部分进行教授。Educator 3 强调了研究用户与大语言模型 (LLM) 交互的重要性 [56]。Educator 6 强调了在使用生成式 AI 时人类监督的重要性。

Edc. Codes
1 2 3 6 可以通过 l 来执行 AI 的采用 关于教育中 AI 聊天机器人的课程;可能的作业可以适应部分正确的人类或 AI 生成的代码;应指导生成式 AI 的适当使用;必须研究用户与大语言模型的交互;没有人类智能的 AI 是无效的,人类的创造性问题解决能力和持续的人类输入是必要的

Table 7: Codes for Responsible use of GenAI

表 7: 负责任使用生成式 AI (GenAI) 的准则

Edc. Codes
2 3 计算机科学教育侧重于编程,早期的编码助手仅限于模板生成和基本工具,当前的AI编程工具已集成到文本编辑器中;计算环境中存在多样化的文本信息,生成式AI用于样板模板和最小工作示例,ChatGPT作为学生的故障排除工具

Table 8: Codes for GenAI for Programming

表 8: 编程用生成式 AI (GenAI) 代码

5.7. GenAI for Programming

5.7. 编程中的生成式 AI (GenAI)

The theme ‘GenAI for Programming’ includes mentions of the benefits of GenAI tools in software development. Table 8 lists the final codes for the theme for the educators who discussed it. Educator 2 discusses GenAI tools in the context of computer programming, and the tasks, such as template generation and pair programming [76], that can be handled efficiently by these tools. Educator 3 mentions the benefits of GenAI as a troubleshooting tool in programming [77]. Despite the various advantages, the use of AI coding assistants has also led to a concerning decrease in code reuse [78].

主题“生成式 AI 用于编程”包括对生成式 AI 工具在软件开发中的好处的提及。表 8 列出了讨论该主题的教育工作者的最终代码。教育工作者 2 在计算机编程的背景下讨论了生成式 AI 工具,以及这些工具可以有效处理的任务,例如模板生成和结对编程 [76]。教育工作者 3 提到了生成式 AI 作为编程中的故障排除工具的好处 [77]。尽管有各种优势,AI 编码助手的使用也导致了令人担忧的代码重用减少 [78]。
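The template-generation use case mentioned here is where coding assistants help most: generic scaffolding that carries no design insight. Below is a hypothetical example of the kind of minimal command-line skeleton a student might ask a chatbot to produce; the script, its flags, and the line-counting task are invented for illustration.

```python
import argparse

def build_parser():
    """Generic CLI scaffolding a student would otherwise retype by hand."""
    parser = argparse.ArgumentParser(description="Count lines in a text file.")
    parser.add_argument("path", help="file to read")
    parser.add_argument("--strip-blank", action="store_true",
                        help="ignore blank lines")
    return parser

def count_lines(path, strip_blank=False):
    """The one line of actual logic the student still owns."""
    with open(path, encoding="utf-8") as fh:
        lines = fh.readlines()
    if strip_blank:
        lines = [ln for ln in lines if ln.strip()]
    return len(lines)

# Demonstrate parsing without touching the filesystem.
args = build_parser().parse_args(["notes.txt", "--strip-blank"])
print(args.path, args.strip_blank)  # notes.txt True
```

The assistant supplies the boilerplate (argument parsing, file handling), while the student keeps responsibility for the logic that the assignment is actually assessing.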

5.8. Technical Details of AI Tools

5.8. AI 工具的技术细节

The theme ‘Technical Details of AI Tools’ consists of codes that describe the mechanism behind AI tools. Table 9 lists the final codes for the theme ‘Technical Details of AI Tools’ for the educators who discussed it. Educator 4 describes the mechanism behind originality detectors, and suggests that these tools are completely ineffective in detecting AI-generated content. Educators 4, 5, and 6 also describe the mechanism behind LLMs and ChatGPT, and compare a few examples.

主题“AI 工具的技术细节”包含描述 AI 工具背后机制的代码。表 9 列出了讨论该主题的教育工作者关于“AI 工具的技术细节”的最终代码。教育工作者 4 描述了原创性检测器背后的机制,并指出这些工具在检测 AI 生成内容方面完全无效。教育工作者 4、5 和 6 还描述了大语言模型 (LLM) 和 ChatGPT 背后的机制,并比较了一些示例。

Table 9: Codes for Technical Details of AI Tools

表 9: AI 工具技术细节代码

Edc. Codes
4 5 原创性检测器匹配文本段落并生成相似度评分,大语言模型通过概率方法生成文本,ChatGPT 在实际聊天记录上训练。大语言模型是生成式 AI 的子集,擅长文本生成和理解,经过大规模数据集的广泛训练,大语言模型的应用,Bing 也能生成图像,LaMDA 提供较少文本的响应,Bard 在自然语言理解 (NLU) 和自然语言生成 (NLG) 方面表现更好,ChatGPT 在生成创意文本格式方面表现更好
6 由人类编程的 AI

5.9. Advantages of GenAI

5.9. 生成式 AI 的优势

The theme ‘Advantages of GenAI’ is composed of codes that detailed advantages of using GenAI in education that were not covered under a single theme. Table 10 lists the final codes for the theme for the educators who discussed it. Educator 3 explores the potential of GenAI tools such as ChatGPT in promoting personalized and adaptive learning environments, reducing the immediate need for educators in complex learning environments [1] [33]. Educator 4 discusses how the human-like characteristics of GenAI tools, such as politeness, inspire trust and credibility amongst users [79]. Intelligence Augmentation (IA) is a concept that focuses on enhancing human capabilities through the use of technology [80], and has been discussed in detail by Educator 6.

主题“生成式 AI (GenAI) 的优势”由多个代码组成,这些代码详细描述了在教育中使用生成式 AI 的优势,这些优势未在单一主题下涵盖。表 10 列出了讨论该主题的教育者的最终代码。教育者 3 探讨了生成式 AI 工具(如 ChatGPT)在促进个性化和适应性学习环境方面的潜力,减少了教育者立即应对复杂学习环境的需求 [1] [33]。教育者 4 讨论了生成式 AI 工具(如礼貌性)的类人特征如何激发用户的信任和可信度 [79]。智能增强 (Intelligence Augmentation, IA) 是一个通过使用技术来增强人类能力的概念 [80],教育者 6 对此进行了详细讨论。

5.10. Challenges of GenAI

5.10. 生成式 AI (Generative AI) 的挑战

The theme ‘Challenges of GenAI’ consists of all the concerns related to the integration of GenAI into education that could not be aggregated into a single theme. Table 11 lists the final codes for the theme for the educators who discussed it. Educator 1 discusses challenges such as the need to verify AI-generated content and code [81] and potential frustration amongst students due to inconsistent policies for using GenAI tools [37]. Human oversight is essential to ensure the accuracy and reliability of AI-generated content, especially in high-stakes situations [82]. Educators 4 and 7 discuss other challenges such as limited perceived usefulness due to lack of knowledge and skepticism towards new technologies [83].

主题“生成式 AI 的挑战”包含了所有与将生成式 AI 整合到教育中相关的担忧,这些担忧无法归入单一主题。表 11 列出了讨论该主题的教育工作者的最终代码。教育工作者 1 讨论了诸如需要验证 AI 生成的内容和代码 [81] 以及由于使用生成式 AI 工具的政策不一致而可能导致的学生挫败感 [37] 等挑战。人类监督对于确保 AI 生成内容的准确性和可靠性至关重要,尤其是在高风险情况下 [82]。教育工作者 4 和 7 讨论了其他挑战,例如由于缺乏知识和对新技术的怀疑而导致的感知有用性有限 [83]。

Table 10: Codes for Advantages of GenAI

表 10: GenAI 优势代码

| Edc. | 代码 |
| --- | --- |
| 3 | ChatGPT 促进独立学习,减少对教育者的依赖 |
| 4 | ChatGPT 展现出礼貌的个性,GPT-4 在需要数值时生成 Python 脚本 |
| 6 | 增强智能 (Augmented Intelligence) 是决策和问题解决的未来,智能增强 (Intelligence Augmentation) 结合了人类直觉和 AI 的系统化技能,IA 将提高现代化和速度,重复性任务可以由 AI 执行,AI 可以减少人为错误和偏见 |


Table 11: Codes for Challenges of GenAI

表 11: 生成式 AI (GenAI) 挑战的代码

| Edc. | 代码 |
| --- | --- |
| 1 | 缺乏对 AI 生成代码的验证,采用 AI 驱动工具时的谨慎,由于 AI 使用政策不一致导致的学生挫败感,AI 生成内容的准确性和相关性必须得到验证 |
| 4 | 采用新技术的挑战,ChatGPT 被赋予类似人类的特性(幻觉),由于类似人类的特性而信任 ChatGPT |
| 7 | 对有效整合和防止滥用的担忧,对生成式 AI 工具缺乏理解可能影响其感知的有用性 |

5.11. Miscellaneous

5.11. 其他

The ‘Miscellaneous’ theme consists of codes that could not be aggregated into other themes. Table 12 lists the final codes for the theme for the educators who discussed it. Educator 3 discusses the drastic shift towards online learning that occurred during the pandemic [84], and the challenges that were faced in reliably and correctly assessing online tests and assignments [85]. Educator 7 discusses the variability in the adoption of GenAI across different generations and departments of an institute. Age can be a determining factor in an educator’s willingness to adopt a GenAI tool [65]. While younger and middle-aged educators approach the new tool with optimism and concern, older educators are more skeptical of it and prefer to maintain control of the key aspects of teaching [86]. The apprehension demonstrated by educators could be due to a lack of familiarity with the technology, concerns about loss of employment, and resistance to change [83].

“其他”主题包含无法归类到其他主题的代码。表 12 列出了讨论该主题的教师的最终代码。教师 3 讨论了疫情期间向在线学习的急剧转变 [84],以及在可靠且正确地评估在线测试和作业时面临的挑战 [85]。教师 7 讨论了不同世代和部门对生成式 AI (Generative AI) 的采用差异。年龄可能是教师是否愿意采用生成式 AI 工具的决定性因素 [65]。虽然年轻和中年的教师对新工具持乐观和担忧的态度,但年长的教师对此持怀疑态度,并更倾向于保持对教学关键方面的控制 [86]。教师表现出的担忧可能是由于对技术的不熟悉、对失业的担忧以及对变革的抵触 [83]。

Table 12: Codes for Miscellaneous

表 12: 杂项代码

| Edc. | 代码 |
| --- | --- |
| 3 | 在疫情期间转向混合学习环境 |
| 7 | Z世代学生对生成式 AI (Generative AI) 作为工具表现出兴趣,X世代和Y世代教师表现出乐观和担忧,不同部门对生成式 AI 有不同的态度,人文学科部门严重依赖论文生成,商学院由于项目型工作采取了更平衡的方法 |

6. Exploratory Data Analysis

6. 探索性数据分析

In this section, the findings of performing EDA on the opinion essays are discussed.

在本节中,讨论了在观点文章上执行探索性数据分析 (EDA) 的结果。

Table 13: Top 10 most frequent words with count for each educator

表 13: 每位教育工作者使用频率最高的前 10 个词汇及其出现次数

| 教育工作者 | 词汇(频率) |
| --- | --- |
| 1 | student (16), chatbots (14), ai (13), data (6), skill (6), could (5), answer (4), exercise (4), third (4), level (4) |
| 2 | ai (7), tool (7), many (6), generative (6), use (5), student (4), search (4), assessment (4), largely (3), language (3) |
| 3 | student (13), learning (12), work (6), pandemic (5), ai (5), need (5), formal (4), onsite (4), exam (4), tool (4) |
| 4 | text (17), student (11), tool (10), plagiarism (10), even (9), chatgpt (9), work (7), may (7), llm (6), originality (6) |
| 5 | text (12), ai (11), language (10), llm (10), content (9), chat-gpt (6), model (6), code (6), ethical (5), bard (5) |
| 6 | human (17), intelligence (10), ai (9), approach (8), ia (7), student (5), future (5), function (4), set (4), artificial (3) |
| 7 | ai (23), gen (18), tool (15), student (12), assessment (12), usage (8), authentic (7), also (6), staff (5), across (4) |

Table 13 displays the top 10 most frequent words for each educator with their respective counts. Words such as ‘student’, ‘ai’, and ‘tool’ appear in the top 10 most frequent words for almost all educators, indicating that several educators discussed the role of students in the adoption of GenAI tools in education. ‘pandemic’ is amongst the most frequent words for Educator 3, as they discussed the shift towards online learning that occurred during the pandemic. Educator 5 has frequently mentioned examples of LLMs such as ‘chatgpt’ and ‘bard’. ‘ia’, or intelligence augmentation, has been mentioned 7 times by Educator 6, as they have highlighted several advantages of using GenAI tools in collaboration with human creativity and intelligence. Terms related to the theme ‘Academic Integrity and Challenges in Assessment’, such as ‘assessment’, ‘plagiarism’, and ‘exam’, appear amongst the most frequent words for several educators.

表 13 显示了每位教育工作者使用频率最高的前 10 个单词及其相应的计数。诸如“学生”、“AI”和“工具”等词汇几乎出现在所有教育工作者的前 10 个高频词中,这表明几位教育工作者讨论了学生在教育中采用生成式 AI (Generative AI) 工具的角色。“pandemic”是 Educator 3 的高频词之一,因为他们讨论了疫情期间向在线学习的转变。Educator 5 经常提到大语言模型的例子,如“chatgpt”和“bard”。“ia”,即智能增强 (intelligence augmentation),被 Educator 6 提到了 7 次,因为他们强调了使用生成式 AI 工具与人类创造力和智能协作的多个优势。与“学术诚信和评估挑战”主题相关的术语,如“评估”、“剽窃”和“考试”,出现在多位教育工作者的高频词中。
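The frequency counts in Table 13 can be reproduced in outline with a few lines of Python. The stopword list and tokenisation rule below are illustrative assumptions, as the study does not specify its exact preprocessing pipeline.

```python
import re
from collections import Counter

# Illustrative stopword list; the study's actual list is not specified.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are", "for", "on"}

def top_words(essay, k=10):
    """Lowercase, tokenise, drop stopwords, and return the k most common words."""
    tokens = re.findall(r"[a-z][a-z\-]*", essay.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(k)
```

For example, `top_words("AI tools help students. AI tools help educators.", 2)` returns `[('ai', 2), ('tools', 2)]`, since `Counter.most_common` breaks ties by insertion order.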

Table 14: Top 4 frequent bigrams

表 14: 前 4 个最常见的二元组

| 教育工作者 | 二元组(频率) |
| --- | --- |
| 1 | ai chatbots (12), third level (4), simulate data (2), possible exercise (2) |
| 2 | generative ai (5), artificial intelligence (2), refactoring tool (2), use generative (2) |
| 3 | formal onsite (4), ai tool (4), onsite exam (3), student struggling (2) |
| 4 | similarity score (4), language model (3), originality checker (3), large language (2) |
| 5 | large language (4), language model (4), aigenerated content (4), generative ai (3) |
| 6 | human intelligence (4), artificial intelligence (3), man machine (3), ai increasingly (2) |
| 7 | gen ai (18), ai authentic (3), ai tool (3), authentic assessment (3) |

Table 14 displays the top 4 most frequent bigrams for each educator. Again, bigrams such as ‘originality checker’ and ‘authentic assessment’, which correlate to the theme ‘Academic Integrity and Challenges in Assessment’, appear amongst the most frequent bigrams.

表 14 显示了每位教育工作者最常见的 4 个二元组。同样,与主题“学术诚信和评估挑战”相关的二元组,如“原创性检查器”和“真实性评估”,出现在最常见的二元组中。
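The bigram counts in Table 14 follow from the same token streams as the word counts. A minimal sketch, assuming the tokens have already been lowercased and stopword-filtered:

```python
from collections import Counter

def top_bigrams(tokens, k=4):
    """Pair each token with its successor and return the k most common pairs."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(k)
```

For instance, `top_bigrams(["gen", "ai", "gen", "ai"], 1)` returns `[(('gen', 'ai'), 2)]`.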

Table 15: Sentence count and Average word count per sentence

表 15: 句子数量及每句平均单词数

| 教育者 | 句子数量 | 每句平均单词数 |
| --- | --- | --- |
| 1 | | 19.59 |
| 2 | 39 | 20 |
| 3 | 15 | 36.33 |
| 4 | 59 | 24.93 |
| 5 | 34 | 21.85 |
| 6 | 27 | 28.52 |
| 7 | 32 | 30.72 |

Table 15 displays the sentence count and average word count per sentence. Educator 4 has the highest number of sentences (59), and the longest opinion essay, indicating that they discussed several themes (4 as can be seen from Figure 3). Educator 1 has the second highest number of sentences, indicating that they have made several arguments and discussed several themes (5 as can be seen from Figure 3). Educator 3 has the lowest number of sentences (15), but also the highest average word count per sentence (36.33), and has discussed only 2 themes (as can be seen from Figure 3). This suggests that the opinion essays vary greatly in length.

表 15 显示了句子数量和每句的平均单词数。Educator 4 的句子数量最多(59 句),并且拥有最长的意见文章,表明他们讨论了多个主题(从图 3 中可以看到有 4 个主题)。Educator 1 的句子数量位居第二,表明他们提出了多个论点并讨论了多个主题(从图 3 中可以看到有 5 个主题)。Educator 3 的句子数量最少(15 句),但每句的平均单词数最高(36.33),并且只讨论了 2 个主题(从图 3 中可以看到)。这表明意见文章的长度差异很大。
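The Table 15 statistics can be approximated with naive punctuation-based sentence splitting; a sketch under that assumption (real segmenters, such as those in nltk or spaCy, handle abbreviations and edge cases better):

```python
import re

def essay_stats(essay):
    """Return (sentence count, average words per sentence) for an essay."""
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    words_per = [len(s.split()) for s in sentences]
    avg = sum(words_per) / len(sentences) if sentences else 0.0
    return len(sentences), round(avg, 2)
```

For example, `essay_stats("One two three. Four five.")` returns `(2, 2.5)`.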

7. Discussion

7. 讨论

In this section, the findings of the study will be discussed in context with the research questions and hypothesis. Limitations of the study will also be highlighted, along with future scope of the work.

在本节中,将结合研究问题和假设讨论研究结果。同时,还将强调研究的局限性以及未来的工作范围。

in Assessment’, which discussed concerns such as plagiarism, academic misconduct, outdated assessment approaches [66], etc. The development of tools to detect AI-generated content is challenging [67] [35]. Another theme that identified a specific challenge was ‘Critical Thinking and Problem-Solving Skills’, where educators discussed the harm that over-reliance on GenAI tools can do to the development of critical thinking skills [44] [71]. The theme ‘Limitations and Misuse of Generative AI’ discusses limitations of GenAI technology such as their tendency to hallucinate [52], the potential for misuse and over-reliance, and the necessity to possess a foundational knowledge of the subject when using these tools. The theme ‘Bias, Transparency, and Ethical Concerns’ discusses the concerns that arise from unintended biases in AI [72], potential copyright and authorship conflicts [73] [58], the lack of transparency in the mechanism of GenAI tools, and the potential harm from the misinformation caused by deepfakes and other false generated content [74]. The theme ‘Challenges of GenAI’ discusses the challenges that need to be addressed to ensure responsible and ethical use of GenAI tools, such as the need for verification of AI-generated content [81], lack of knowledge about the technology [37], and skepticism towards change [83].

在“评估”中,讨论了诸如剽窃、学术不端、过时的评估方法 [66] 等问题。开发检测 AI 生成内容的工具具有挑战性 [67] [35]。另一个识别出特定挑战的主题是“批判性思维和问题解决能力”,教育工作者讨论了过度依赖生成式 AI 工具对批判性思维能力发展的危害 [44] [71]。主题“生成式 AI 的局限性和滥用”讨论了生成式 AI 技术的局限性,例如其倾向于产生幻觉 [52]、滥用和过度依赖的可能性,以及在使用这些工具时具备基础学科知识的必要性。主题“偏见、透明度和伦理问题”讨论了 AI 中无意偏见 [72]、潜在的版权和作者身份冲突 [73] [58]、生成式 AI 工具机制缺乏透明度,以及由深度伪造和其他虚假生成内容引起的错误信息可能造成的危害 [74]。主题“生成式 AI 的挑战”讨论了为确保负责任和合乎道德地使用生成式 AI 工具而需要解决的挑战,例如需要验证 AI 生成内容 [81]、对该技术缺乏了解 [37],以及对变革的怀疑 [83]。

  1. RQ3: What are the findings of exploratory data analysis on the opinion essays? EDA on the educator responses has provided certain insights into their opinions on the use of GenAI tools in education. The responses varied in length, ranging from 15 to 59 sentences. After preprocessing and stopword removal, the top 10 words and top 4 bigrams for each essay were highlighted. Some of the most frequent tokens, such as ‘assessment’, ‘plagiarism’, and ‘exam’, align with the identified themes, in this case ‘Academic Integrity and Challenges in Assessment’.
  2. RQ3: 关于意见文章的探索性数据分析有哪些发现?
    对教育工作者回应的探索性数据分析(EDA)揭示了他们对在教育中使用生成式 AI (Generative AI) 工具的看法。回应的长度不一,从 15 句到 59 句不等。经过预处理和停用词去除后,每篇文章的前 10 个单词和前 4 个二元词组被突出显示。一些最常见的 Token,如“assessment(评估)”、“plagiarism(抄袭)”和“exam(考试)”,与已确定的主题(如“学术诚信与评估中的挑战”)一致。

Therefore, educators perceive both advantages and drawbacks of GenAI in education. Despite the various challenges that present themselves and the potential for misuse of these tools, effective policy making, guidance for proper use, and updated assessment methods can allow both students and educators to use these tools ethically. The study was limited in the form of feedback taken. The number of educators was only 7, with only two female educators present in the sample. Analysing the opinions of a larger number of educators from different demographics may allow for better insight.

因此,教育工作者认为生成式 AI (Generative AI) 在教育中既有优势也有弊端。尽管存在各种挑战和这些工具可能被滥用的风险,但通过有效的政策制定、正确使用的指导以及更新的评估方法,学生和教育工作者可以以合乎道德的方式使用这些工具。该研究的反馈形式存在局限性。教育工作者的人数仅为 7 人,样本中只有两名女性教育工作者。分析来自不同人口统计背景的大量教育工作者的意见,可能会提供更深入的见解。

8. Conclusion

8. 结论

The advent and free availability of various GenAI tools and LLMs has the potential to significantly impact traditional educational practices. However, it is important to distinguish genuine concerns about the use of this technology from the hype, so that the necessary policies, laws, and frameworks may be developed to ensure its responsible integration in the educational sector. In this study, thematic analysis has been performed on opinion essays about the use of GenAI in education obtained from 7 educators. Several themes emerged from this analysis, which highlighted both the potential benefits and limitations of these tools. They can serve as a personal tutor, handle repetitive tasks with ease, and provide better engagement due to their ability to generate human-like text. However, several limitations and challenges also become apparent, such as academic integrity, plagiarism, the development of new and well-rounded assessment methods, and ethical and copyright concerns. The most frequent tokens obtained by performing EDA on the opinion essays align with the identified themes. The future scope of this study includes obtaining expert feedback through other approaches (questionnaires, interviews, etc.), performing case studies on the subject, and performing sentiment analysis.

各种生成式 AI (Generative AI) 工具和大语言模型的问世及其免费可用性,有可能对传统教育实践产生重大影响。然而,重要的是要区分对这种技术使用的真正担忧与炒作,以便制定必要的政策、法律和框架,确保其在教育领域的负责任整合。在本研究中,我们对从 7 位教育工作者那里获得的关于在教育中使用生成式 AI 的意见文章进行了主题分析。分析中出现了几个主题,这些主题既突出了这些工具的潜在优势,也指出了它们的局限性。它们可以充当个人导师,轻松处理重复性任务,并由于其生成类人文本的能力而提供更好的参与度。然而,一些局限性和挑战也变得显而易见,例如学术诚信、剽窃、开发新的全面评估方法,以及伦理和版权问题。通过对意见文章进行探索性数据分析 (EDA) 获得的最常见的 Token 与已确定的主题一致。本研究的未来范围包括通过其他方法(问卷、访谈等)获取专家反馈,对该主题进行案例研究,并进行情感分析。

References

参考文献
