Semi-supervised Federated Learning for Activity Recognition

半监督活动识别的联邦学习

Abstract

Training deep learning models on in-home IoT sensory data is commonly used to recognise human activities. Recently, federated learning systems that use edge devices as clients to support local human activity recognition have emerged as a new paradigm to combine local (individual-level) and global (group-level) models. This approach provides better scalability and generalisability and also offers better privacy compared with the traditional centralised analysis and learning models. The assumption behind federated learning, however, relies on supervised learning on clients. This requires a large volume of labelled data, which is difficult to collect in uncontrolled IoT environments such as remote in-home monitoring. In this paper, we propose an activity recognition system that uses semi-supervised federated learning, wherein clients conduct unsupervised learning on autoencoders with unlabelled local data to learn general representations, and a cloud server conducts supervised learning on an activity classifier with labelled data. Our experimental results show that using a long short-term memory autoencoder and a Softmax classifier, the accuracy of our proposed system is higher than that of both centralised systems and semi-supervised federated learning using data augmentation. The accuracy is also comparable to that of supervised federated learning systems. Meanwhile, we demonstrate that our system can reduce the number of needed labels and the size of local models, and has faster local activity recognition speed than supervised federated learning does.

摘要

在室内IoT感官数据上训练深度学习模型通常用于识别人类活动。最近，使用边缘设备作为客户端以支持本地人类活动识别的联邦学习系统已经成为结合本地（个人）模型和全局（小组）模型的新范例。与传统的集中式分析和学习模型相比，此方法提供了更好的可伸缩性和通用性，并且还提供了更好的隐私性。但是，联邦学习背后的假设依赖于对客户的监督学习。这需要大量带标签的数据，而这些数据很难在不受控制的IoT环境（如远程家庭监控）中收集。

在本文中，我们提出了一种使用半监督联邦学习的活动识别系统，其中客户在具有未标记本地数据的自动编码器上进行无监督学习以学习一般表示，而云服务器在具有标签数据的活动分类器上进行监督学习。我们的实验结果表明，使用较长的短期内存自动编码器和Softmax分类器，我们提出的系统的准确性要高于集中式系统和使用数据增强的半监督联邦学习的准确性。准确性也可以与监督式联邦学习系统的准确性相提并论。同时，我们证明了我们的系统可以减少需要的标签数量和局部模型的大小，并且比有监督的联邦学习具有更快的局部活动识别速度。

Introduction 简介

Modern smart homes are integrating more and more Internet of Things (IoT) technologies in different application scenarios. The IoT devices can collect a variety of time-series data, including ambient data such as occupancy, temperature, and brightness, and physiological data such as weight and blood pressure. With the help of machine learning (ML) algorithms, these sensory data can be used to recognise people’s activities at home. Human activity recognition (HAR) using IoT data promises to significantly improve the quality of life of people who require in-home care and support. For example, anomaly detection based on recognised activities can raise alerts when an individual’s health deteriorates. The alerts can then be used for early interventions (Enshaeifar et al., 2018; Cao et al., 2015; Queralta et al., 2019). Analysis of long-term activities can help identify behaviour changes, which can be used to support clinical decisions and healthcare plans (Enshaeifar et al., 2018).

An architecture for HAR is to deploy devices with computational resources at the edge of networks, which is normally within people’s homes. Such “edge devices” are capable of communicating with sensory devices to collect and aggregate the sensory data, and running ML algorithms to process the in-home activity and movement data. With the help of a cloud back-end, these edge devices can form a federated learning (FL) system (McMahan et al., 2017; Yang et al., 2019; Zhao et al., 2020), which is increasingly used as a new system to learn at population-level while constructing personalised edge models for HAR. In an FL system, clients jointly train a global Deep Neural Network (DNN) model by sharing their local models with a cloud back-end. This design enables clients to use their data to contribute to the training of the model without breaching privacy. One of the assumptions behind using the canonical FL system for HAR is that data on clients are labelled with corresponding activities so that the clients can use these data to train supervised local DNN models. In HAR using IoT data, due to the large amount of time-series data that are continuously generated from different sensors, it is difficult to guarantee that end-users are capable of labelling activity data at a large scale. Thus, the availability of labelled data on clients is one of the challenges that impede the adoption of FL systems in real-world HAR applications.

Existing solutions for utilising unlabelled data in FL systems rely on data augmentation (Jeong et al., 2020; Liu et al., 2020; Zhang et al., 2020). The server of an FL system keeps some labelled data and uses them to train a global model through supervised learning. The clients of the system receive the global model and use it to generate pseudo labels on augmented local data. However, this approach couples the local training on clients with the specific task (i.e., labels) from the server. If a client accesses multiple FL servers for different tasks, it has to generate pseudo labels for each of them locally, which increases the cost of local training.

现代智能家居正在不同的应用场景中集成越来越多的物联网（IoT）技术。物联网设备可以收集各种时间序列数据，包括环境数据（例如占用率，温度和亮度）以及生理数据（例如体重和血压）。借助机器学习（ML）算法，这些感官数据可用于识别人们在家中的活动。使用物联网数据的人类活动识别（HAR）有望大大改善需要家庭护理和支持的人们的生活质量。例如，当个人的健康状况恶化时，基于已识别活动的异常检测可以发出警报。然后可以将警报用于早期干预（Enshaeifar et al。，2018; 曹等。，2015 ; Queralta等。，2019）。长期活动的分析可以帮助识别行为变化，这可以用于支持临床决策和医疗保健计划（Enshaeifar等，2018）。

HAR的体系结构是将具有计算资源的设备部署在通常位于人们家中的网络边缘。这样的“边缘设备”能够与感官设备进行通信以收集和汇总感官数据，并且能够运行ML算法来处理家庭活动和运动数据。借助云后端，这些边缘设备可以形成联邦学习（FL）系统（McMahan等，2017； Yang等，2019； Zhao等，2020）。，它越来越多地被用作在人口层次上学习的新系统，同时为HAR构建个性化边缘模型。在FL系统中，客户通过与云后端共享本地模型来共同训练全局深度神经网络（DNN）模型。这种设计使客户可以使用他们的数据为模型的训练做出贡献，而不会破坏隐私。将规范的FL系统用于HAR的假设之一是，客户端上的数据被标记有相应的活动，以便客户端可以使用这些数据来训练受监督的本地DNN模型。在使用物联网数据的HAR中，由于从不同传感器连续生成的大量时间序列数据，很难保证最终用户能够大规模标记活动数据。因此，

现有的在FL系统中利用未标记数据的解决方案是通过数据增强（Jeong等，2020； Liu等，2020； Zhang等，2020）。FL系统的服务器保留一些标记的数据，并使用它们通过监督学习来训练全局模型。系统的客户端接收全局模型，并使用它在增强的本地数据上生成伪标签。但是，这种方法将针对客户的本地训练与特定任务结合在一起（即，标签）。如果客户端访问多个FL服务器以执行不同任务，则必须在本地为每个FL服务器生成伪标签，这会增加本地训练的成本。

In centralised ML, unsupervised learning on DNN such as autoencoders (Baldi, 2011) has been widely used to learn general representations from unlabelled data. The learned representations can then be utilised to facilitate supervised learning models with labelled data. A recent study by van Berlo et al. (van Berlo et al., 2020) shows that temporal convolutional networks can be used as autoencoders to learn representations on clients of an FL system. The representations can help with training of the global supervised model of an FL system. The resulting model’s performance is comparable to that of a fully supervised algorithm. Building upon this promising result, we propose a semi-supervised FL system that realises activity recognition using time-series data at the edge, without labelled IoT sensory data on clients, and evaluate how different factors (e.g., choices of models, the number of labels, and the size of representations) affect the performance (e.g., accuracy and inference time) of the system.

In our proposed design, clients locally train autoencoders with unlabelled time-series sensory data to learn representations. These local autoencoders are then sent to a cloud server that aggregates them into a global autoencoder. The server integrates the resulting global autoencoder into the pipeline of the supervised learning process. It uses the encoder component of the global autoencoder to transform a labelled dataset into labelled representations, with which a classifier can be trained. Such a labelled dataset on the cloud back-end server can be provided by service providers without necessarily using any personal data from users (e.g., open data or data collected from laboratory trials with consents). Whenever the server selects a number of clients, both the global autoencoder and the global classifier are sent to the clients to support local activity recognition.

We evaluated our system through simulations on different HAR datasets, with different system component designs and data generation strategies. We also tested the local activity recognition part of our system on a Raspberry Pi 4 model B, which is a low-cost edge device. With the focus on HAR using time-series sensory data, we are interested in answering the following research questions:

Q1. How does semi-supervised FL using autoencoders perform in comparison to supervised learning on a centralised server?
Q2. How does semi-supervised FL using autoencoders perform in comparison to semi-supervised FL using data augmentation?
Q3. How does semi-supervised FL using autoencoders perform in comparison to supervised FL?
Q4. How do the key parameters of semi-supervised FL, including the number of labels on the server and the size of learned representations, affect its performance?
Q5. How efficient is semi-supervised FL on low-cost edge devices?

Our experimental results demonstrate several key findings:

Using long short-term memory autoencoders as local models and a Softmax classifier model as a global classifier, the accuracy of our system is higher than that of a centralised system that only conducts supervised learning in the cloud, which means that learning general representations locally improves the performance of the system.
Our system also has higher accuracy than semi-supervised FL using data augmentation to generate pseudo labels does.
Our system can achieve comparable accuracy to that of a supervised FL system.
By only conducting supervised learning in the cloud, our system can significantly reduce the needed number of labels without losing much accuracy.
By using autoencoders, our system can reduce the size of local models. This can potentially contribute to the reduction of upload traffic from the clients to the server.
The processing time of our system when recognising activities on a low-cost edge device is acceptable for real-time applications and is significantly lower than that of supervised FL.

在集中式机器学习中，在DNN上的无监督学习（例如自动编码器）（Baldi，2011年）已被广泛用于从未标记的数据中学习一般表示。然后，可以利用学习到的表示来促进带有标记数据的监督学习模型。van Berlo等人的最新研究*。*（van Berlo等人，2020年）显示了时间卷积网络可以用作自动编码器，以学习FL系统客户端上的表示形式。这些表示形式可以帮助训练FL系统的全局监督模型。最终模型的性能可与完全监督算法的性能相媲美。基于此有希望的结果，我们提出了一种半监督的FL系统，该系统使用边缘的时间序列数据实现活动识别，而无需在客户端上标记IoT传感数据，并评估不同因素（例如，模型选择，标签和表示的大小）会影响系统的性能（例如，准确性和推断时间）。

在我们提出的设计中，客户在本地使用未标记的时间序列感官数据训练自动编码器，以学习表示形式。然后将这些本地自动编码器发送到云服务器，该服务器将它们聚合为全局自动编码器。服务器将生成的全局自动编码器集成到监督学习过程的管道中。它使用全局自动编码器的编码器组件将带标签的数据集转换为带标签的表示形式，从而可以训练分类器。服务提供商可以在云后端服务器上提供这样的标记数据集，而不必使用用户的任何个人数据（例如，开放数据或在同意的情况下从实验室试验中收集的数据）。每当服务器选择多个客户端时，

我们通过对具有不同系统组件设计和数据生成策略的不同HAR数据集进行仿真来评估我们的系统。我们还在Raspberry Pi 4模型B（一种低成本的边缘设备）上测试了系统的本地活动识别部分。着眼于使用时间序列感官数据的HAR，我们有兴趣回答以下研究问题：

Q1。与集中式服务器上的监督学习相比，使用自动编码器的半监督FL如何执行？
Q2。与使用数据增强的半监督FL相比，使用自动编码器的半监督FL如何执行？
Q3。与监督型FL相比，使用自动编码器的半监督型FL的性能如何？
Q4。半监督FL的关键参数（包括服务器上标签的数量和学习到的表示的大小）如何影响其性能。
Q5。半监督FL在低成本边缘设备上的效率如何？

我们的实验结果表明了几个关键发现：

使用长短期内存自动编码器作为局部模型，并使用Softmax分类器模型作为全局分类器，我们的系统的准确性要高于仅在云中进行监督学习的集中式系统，这意味着在本地学习通用表示会有所改善系统的性能。
与使用数据增强生成伪标签的半监督FL相比，我们的系统还具有更高的准确性。
我们的系统可以达到与监督FL系统相当的精度。
通过仅在云中进行有监督的学习，我们的系统可以在不损失太多准确性的情况下显着减少所需的标签数量。
通过使用自动编码器，我们的系统可以减小局部模型的大小。这可能有助于减少从客户端到服务器的上传流量。
我们的系统在识别低成本边缘设备上的活动时的处理时间对于实时应用是可以接受的，并且比监督式FL的处理时间要短得多。

As one of the key applications of IoT that can significantly improve the quality of people's lives, HAR has attracted an enormous amount of research. Thanks to the ever-growing computational power of different types of edge devices, many HAR systems have been proposed for deployment at the edge of networks.

作为可以显着改善人们生活质量的物联网的关键应用之一，HAR吸引了大量研究。由于不同类型的边缘设备的不断增长的计算能力，许多HAR系统已被提议部署在网络边缘。

HAR at the edge 边缘的HAR

In comparison to having both data and algorithms in the cloud, edge computing (Shi et al., 2016) instead deploys devices closer to end users of services, which means that data generated by the users and computation on these data can stay on the devices locally. Modern edge devices such as Intel Next Unit of Computing (NUC) and Raspberry Pi are capable of running DNN models (Servia-Rodriguez et al., 2018; Chen and Ran, 2019) and providing real-time activity recognition (Liu et al., 2018; Cartas et al., 2019) from videos. Many deep learning models such as long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997; Guan and Plötz, 2017; Hammerla et al., 2016) or convolutional neural network (CNN) (Hammerla et al., 2016) can be applied at the edge for HAR. For example, Zhang et al. (Zhang et al., 2018) proposed an HAR system that utilised both edge computing and back-end cloud computing. One implementation of this kind of HAR edge system was proposed by Cao et al. (Cao et al., 2015), which implemented fall detection both at the edge and in the cloud. Their results show that their system has lower response latency than that of a cloud based system. Queralta et al. (Queralta et al., 2019) also proposed a fall detection system that achieved over 90% precision and recall. Uddin (Uddin, 2019) proposed a system that used more diverse body sensory data including electrocardiography (ECG), magnetometer, accelerometer, and gyroscope readings for activity recognition.

These HAR systems, however, send the personal data of their users to a back-end cloud server to train deep learning models, which poses great privacy threats to the data subjects. Servia-Rodríguez et al. (Servia-Rodriguez et al., 2018) proposed a system in which a small group of users voluntarily share their data to the cloud to train a model. Other users in the system can download this model for local training, which protects the privacy of the majority in the system but does not utilise the well-trained local models from different users to improve the performance of each other's models. To improve the utility of local models and protect privacy at the same time, we apply federated learning (McMahan et al., 2017) to HAR at the edge, which can train a global deep learning model with constant contributions from users but does not require the users to send their personal data to the cloud.

与将数据和算法都存储在云中相比，边缘计算（Shi等人，2016）将设备部署在距离服务最终用户更近的地方，这意味着用户生成的数据以及对这些数据的计算可以保留在设备上本地。英特尔®下一代计算单元（NUC）（15）和树莓派（28）等现代边缘设备能够运行DNN模型（Servia-Rodriguez等人，2018 ; Chen和Ran，2019）并提供实时活动识别（Liu et al。，2018; Cartas等。，2019）从视频中获取。许多深度学习模型，例如长短期记忆（LSTM）（Hochreiter和Schmidhuber，1997 ; Guan和Plötz，2017 ; Hammerla等人，2016）或卷积神经网络（CNN）（Hammerla等人，2016）都可以适用于HAR的边缘。例如，Zhang等。（Zhang等人，2018）提出了一种同时利用边缘计算和后端云计算的HAR系统。Cao等人提出了这种HAR边缘系统的一种实现方式*。*（Cao et al。，2015），该算法在边缘和云中都实现了跌倒检测。他们的结果表明，他们的系统比基于云的系统具有更低的响应延迟。Queralta 等。 Queralta et al。，2019）还提出了一种跌倒检测系统，该系统可实现90％以上的精度和召回率。乌丁（乌丁，2019）提出了一种系统，该系统使用包括心电图（ECG），磁力计，加速度计和陀螺仪读数在内的多种身体感觉数据进行活动识别。

但是，这些HAR系统会将其用户的个人数据发送到后端云服务器以训练深度学习模型，这对数据主体构成了极大的隐私威胁。Servia-Rodríguez 等。（Servia-Rodriguez等人，2018）提出了一个系统，其中一小部分用户自愿将其数据共享到云中以训练模型。系统中的其他用户可以下载此模型以进行本地训练，这可以保护系统中大多数用户的隐私，但不会利用来自不同用户的经过良好训练的本地模型来提高彼此模型的性能。为了提高本地模型的效用并同时保护隐私，我们应用了联邦学习（McMahan等，2017）到边缘的HAR，可以在用户不断贡献的情况下训练全球深度学习模型，但不需要用户将其个人数据发送到云中。

HAR with federated learning HAR与联邦学习

Federated learning (FL) (McMahan et al., 2017; Yang et al., 2019) was proposed as an alternative to traditional cloud based deep learning systems. It uses a cloud server to coordinate different clients to collaboratively train a global model. The server periodically sends the global model to a selection of clients that use their local data to update the global model. The resulting local models from the clients will be sent back to the server and be aggregated into a new global model. By this means, the global model is constantly updated using users’ personal data, without having these data in the server. Since FL was proposed, it has been widely adopted in many applications (Li et al., 2020; Yu et al., 2020) including HAR. Sozinov et al. (Sozinov et al., 2018) proposed an FL based HAR system and they demonstrated that its performance is comparable to that of its centralised counterpart, which suffers from privacy issues. Zhao et al. (Zhao et al., 2020) proposed an FL based HAR system for activity and health monitoring. Their experimental results show that, apart from acceptable accuracy, the inference time of such a system on low-cost edge devices such as Raspberry Pi is marginal. Feng et al. (Feng et al., 2020) introduced locally personalised models in FL based HAR systems to further improve the accuracy for mobility prediction. Specifically, HAR applications that need both utility and privacy guarantees such as smart healthcare can benefit from the accurate recognition and the default privacy by design of FL. For example, the system recently proposed by Chen et al. (Chen et al., 2020) applied FL to wearable healthcare, with a specific focus on the auxiliary diagnosis of Parkinson’s disease.

Existing HAR systems with canonical FL use supervised learning, which relies on the assumption that all local data on clients are properly labelled with activities. This assumption is difficult to satisfy in IoT scenarios that use sensory data. In contrast to the existing FL based HAR systems, we aim to address this issue by utilising semi-supervised machine learning, which does not need locally labelled data.

联邦学习（FL）被提出为传统云的深度学习系统的替代方案（McMahan等人，2017 ; 它使用云服务器来协调不同的客户端，以协作训练全局模型。服务器定期将全局模型发送给使用其本地数据更新全局模型的客户端选择。客户端生成的本地模型将被发送回服务器，并被汇总为新的全局模型。通过这种方式，可以使用用户的个人数据不断更新全局模型，而无需将这些数据存储在服务器中。自从FL提出以来，它已在许多应用中被广泛采用（Li等。，2020 ; Yu等。（2020年），包括HAR。Sozinov 等。 Sozinov et al。，2018）提出了一种基于FL的HAR系统，他们证明了其性能与集中式同类产品的性能相当，后者受到隐私问题的困扰。赵等。（Zhao et al。，2020）提出了一种基于FL的HAR系统，用于活动和健康监测。他们的实验结果表明，除了可接受的精度外，这种系统在诸如Raspberry Pi之类的低成本边缘设备上的推理时间也很短。冯等。（冯等。（2020年）在基于FL的HAR系统中引入了本地个性化模型，以进一步提高移动性预测的准确性。具体来说，需要同时提供实用程序和隐私保证的HAR应用程序（例如智能医疗保健）可以通过FL的设计从准确识别和默认隐私中受益。例如，Chen等人最近提出的系统*。*（Chen et al。，2020）将FL应用于可穿戴医疗保健，特别侧重于帕金森氏病的辅助诊断。

现有的具有规范FL的HAR系统使用监督学习，这种学习依赖于这样的假设，即客户端上的所有本地数据都正确地标记有活动。在使用传感数据的物联网场景中，很难满足这个假设。与现有的基于FL的HAR系统相比，我们旨在通过利用半监督机器学习来解决此问题，该学习不需要本地标记的数据。

Semi-supervised federated learning 半监督联邦学习

Semi-supervised learning combines both supervised learning that requires labelled data and unsupervised learning that does not use labels when training DNN models. Traditional centralised ML has benefited from semi-supervised learning techniques such as transfer learning (Khan and Roy, 2018; Zhang and Ardakanian, 2019) and autoencoders (Baldi, 2011). These techniques have been widely used in centralised ML such as learning time-series representations from videos (Srivastava et al., 2015), learning representations to compress local data (Hu and Krishnamachari, 2020), and learning representations that do not contain sensitive information (Malekzadeh et al., 2018).

The challenge of having available local labels in FL has motivated a number of systems that aim to realise FL in a semi-supervised or self-supervised fashion. The majority of the existing solutions in this area focuses on generating pseudo labels for unlabelled data and using these labels to conduct supervised learning (Jeong et al., 2020; Liu et al., 2020; Zhang et al., 2020; Long et al., 2020; Zhang et al., 2021; Kang et al., 2020; Wang et al., 2020; Yang et al., 2020). For example, Jeong et al. (Jeong et al., 2020) use data augmentation to generate fake labels and keep the consistency of the labels across different FL clients. However, the inter-client consistency requires some clients to share their data with others, which poses privacy issues. Liu et al. (Liu et al., 2020) use labelled data on an FL server to train a model through supervised learning and then send this model to FL clients to generate labels on their local data. These solutions couple the local training on clients with the specific task from the server, which means that a client has to generate pseudo labels for all the servers that have different tasks.

Another direction of semi-supervised FL is to conduct unsupervised learning on autoencoders locally on clients instead of generating pseudo labels. Compared with existing solutions, the trained autoencoders learn general representations from data, which are independent from specific tasks. Preliminary results from the work by van Berlo et al. (van Berlo et al., 2020) show promising potential of using autoencoders to implement semi-supervised FL. Compared to their work, we evaluate different local models (i.e., autoencoders, convolutional autoencoders, and LSTM autoencoders), investigate different design considerations, and test how efficient its local activity recognition is when running on low-cost edge devices.
半监督学习结合了需要标记数据的监督学习和训练DNN模型时不使用标签的无监督学习。传统的集中式机器学习得益于半监督学习技术，例如转移学习（Khan和Roy，2018 ; Zhang和Ardakanian，2019）和自动编码器（Baldi，2011）。这些技术已广泛用于集中式机器学习中，例如从视频中学习时间序列表示（Srivastava et al。，2015），学习表示以压缩本地数据（Hu和Krishnamachari，2020）以及学习不包含敏感信息的表征（Malekzadeh等，2018）。

在FL中拥有可用的本地标签的挑战激发了许多旨在以半监督或自我监督方式实现FL的系统。该领域中的大多数现有解决方案着重于为未标记的数据生成伪标签，并使用这些标签进行监督学习（Jeong等，2020； Liu等，2020； Zhang等，2020； Long等）。，2020 ;张等人，2021 ;康等人。，2020 ;王等，2020 ;杨等。，2020年）。例如Jeong 等。（Jeong等人，2020年）使用数据增强来生成假标签，并在不同的FL客户端之间保持标签的一致性。但是，客户端之间的一致性要求某些客户端与其他客户端共享其数据，这带来了隐私问题。刘等。（Liu et al。，2020）在FL服务器上使用标记的数据通过监督学习来训练模型，然后将此模型发送给FL客户端以在其本地数据上生成标签。这些解决方案将客户端的本地训练与服务器上的特定任务结合在一起，这意味着客户端必须为具有不同任务的所有服务器生成伪标签。

半监督FL的另一个方向是在客户端本地对自动编码器进行无监督学习，而不是生成伪标签。与现有解决方案相比，训练有素的自动编码器从数据中学习通用表示形式，而与特定任务无关。van Berlo等人的工作的初步结果。（van Berlo et al。，2020）展示了使用自动编码器实现半监督FL的潜力。与他们的工作相比，我们评估了不同的本地模型（即自动编码器，卷积自动编码器和LSTM自动编码器），研究了不同的设计注意事项，并测试了在低成本边缘设备上运行时其本地活动识别的效率如何。

3. Methodology 方法

Our goal is to implement HAR using an FL system, without having any labelled data on the edge clients. We first introduce the long short-term memory model, which is a technique for analysing time-series data for HAR. We then introduce autoencoders, which are the key technique for deep unsupervised learning. We finally demonstrate the design of our proposed semi-supervised FL system and describe how unsupervised and supervised learning models are used in our framework.

我们的目标是使用FL系统实现HAR，而边缘客户端上没有任何带标签的数据。我们首先介绍长短期记忆模型（Hochreiter和Schmidhuber，1997年），这是一种分析HAR的时序数据的技术。然后，我们介绍自动编码器，这是深度无监督学习的关键技术。最后，我们演示了我们提出的半监督FL系统的设计，并描述了如何在我们的框架中使用无监督和监督学习模型。

3.1 Long short-term memory 长短期记忆

The long short-term memory (LSTM) model is a recurrent neural network (RNN), a class of DNN that processes sequences of data points such as time-series data. At each time point of the time series, the output of an RNN, which is referred to as the hidden state, is fed to the network together with the next data point in the time-series sequence. As time proceeds, an RNN recurrently takes and processes the current input and the previous output (i.e., the hidden state), and generates a new output for the current time. Specifically for LSTM, Fig. 1 shows the network structure of a basic LSTM unit, which is called an LSTM cell. At each time $ t $ , it takes three input variables, which are the current observed data point $ X_{t} $ , the previous state of the cell $ C_{t-1} $ , and the previous hidden state $ h_{t-1} $ . For the case of applying LSTM to HAR, $ X_{t} $ is a vector of all the observed sensory readings at time $ t $ , and $ h_{t} $ is the hidden state of the activity to be recognised.
长短期内存（LSTM）属于经常性神经网络（RNN）模型，这是一个类别的DNN，其处理诸如时间序列数据的数据点序列。在时间序列的每个时间点，将被称为隐藏状态的RNN的输出与时序序列中的下一个数据点一起馈送到网络。 RNN以时间的方式工作，随着时间的继续，它循环采用并处理当前输入和先前输出（，隐藏状态），并为当前时间生成新的输出。专门针对LSTM，图1S显示了基本LSTM单元的网络结构，称为LSTM * Cell *。在每个时间$ t $时，需要三个输入变量，它是当前观察到的数据点$ X_{t}$，上述单元格$ C_{t-1} $，以及先前的隐藏状态$ h_{t-1} $。对于将LSTM应用于Har的情况，$ X_{t} $是时间$ t $的所有观察到的感觉读数的向量。 $ h_{t} $是要识别的活动的隐藏状态。

Figure 1. Network structure of a long short-term memory (LSTM) cell. At each time point $ t $ , the current cell state $ C_{t} $ and hidden state $ h_{t} $ are dependent on the previous cell state $ C_{t-1} $ , the previous hidden state $ h_{t-1} $ , and the current observed data point $ X_{t} $ .
图1.长短期记忆（LSTM）单元的网络结构。在每个时间点t，当前单元格状态 $ C_{t} $ 和隐藏状态 $ h_{t} $ 取决于先前的电池状态 $ C_{t-1} $个，以前的隐藏状态 $ h_{t} $ ，以及当前观察到的数据点 $ X_{t} $ 。
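The update performed by an LSTM cell can be sketched in a few lines of plain Python. This is a minimal illustration that assumes the standard gate formulation (forget, input, and output gates plus a candidate state); the text above does not spell out the gate equations, and the names `lstm_step`, `affine`, and the `params` dictionary layout are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, U, b, x, h):
    """Return W @ x + U @ h + b using plain Python lists."""
    return [
        sum(w * x_i for w, x_i in zip(W_row, x))
        + sum(u * h_i for u, h_i in zip(U_row, h))
        + b_k
        for W_row, U_row, b_k in zip(W, U, b)
    ]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM cell update: (X_t, h_{t-1}, C_{t-1}) -> (h_t, C_t).

    params maps each of "forget", "input", "output", "candidate"
    to a tuple (W, U, b) (an assumed layout for this sketch).
    """
    f = [sigmoid(v) for v in affine(*params["forget"], x_t, h_prev)]
    i = [sigmoid(v) for v in affine(*params["input"], x_t, h_prev)]
    o = [sigmoid(v) for v in affine(*params["output"], x_t, h_prev)]
    g = [math.tanh(v) for v in affine(*params["candidate"], x_t, h_prev)]
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # New hidden state: gated, squashed view of the cell state.
    h_t = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_t)]
    return h_t, c_t
```

Calling `lstm_step` once per time point, and feeding the returned `h_t` and `c_t` back in as `h_prev` and `c_prev`, reproduces the recurrent behaviour described above.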

LSTM can be used in both supervised learning and unsupervised learning. For supervised learning, each $ X_{t} $ of a time-series sequence has a corresponding label $ Y_{t} $ (i.e., the activity class at time point $ t $ ) as the ground truth. The hidden state $ h_{t} $ can be fed into a Softmax classifier that contains a fully-connected layer and a Softmax layer. By this means, such an LSTM classifier can be trained against the labelled activities through feedforward and backpropagation to minimise the loss (e.g., cross-entropy loss) between the classifications and the ground truth. For unsupervised learning, LSTM can be trained as components of an autoencoder, which we will describe in detail in Sec. 3.2.

lstm可用于监督学习和无人监督的学习。对于受监督学习，时间序列序列的每个$ X_{t} $具有相应的标签$ Y_{t} $（在时间点$ t $的活动类）作为地面真理。隐藏状态$ h_{t} $可以馈送到包含完全连接的图层和软邮件层的软MAX分类器中。通过这种方式，可以通过前馈和反向化对标记的活动训练这种LSTM分类器，以最小化分类和地面真理之间的损耗（，跨熵损失）。对于无监督的学习，LSTM可以被视为AutoEncoder的组件，我们将在Sec.AutoenCoder中详细描述。
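As a hedged illustration of the classifier head described above, the following sketch applies a fully-connected layer and a Softmax layer to a hidden state and computes the cross-entropy loss against a label. The function names and the list-based weight layout are assumptions made for this example, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable Softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(h_t, W, b):
    """Fully-connected layer followed by Softmax over activity classes."""
    logits = [sum(w * h for w, h in zip(row, h_t)) + b_k for row, b_k in zip(W, b)]
    return softmax(logits)


def cross_entropy(probs, label):
    """Loss between predicted class probabilities and the true class index."""
    return -math.log(probs[label])
```

During training, the gradient of `cross_entropy` with respect to `W`, `b`, and the LSTM parameters would be propagated backwards; that step is omitted here.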

3.2 Autoencoder 自动编码器

An autoencoder is a type of neural network that is used to learn latent feature representations from data. Unlike supervised learning, which aims to learn a function $ f(X)\rightarrow Y $ from input variables $ X $ to labels $ Y $ , an autoencoder used in unsupervised learning tries to encode $ X $ to its latent representation $ h $ and to decode $ h $ into a reconstruction of $ X $ , which is denoted as $ X^\prime $ . Fig. 4 demonstrates two types of autoencoders that use different neural networks. The simple autoencoder in Fig. 4 uses fully connected layers to encode $ X $ into $ h $ and then to decode $ h $ into $ X^\prime $ . The convolutional autoencoder in Fig. 4 uses a convolutional layer that moves small kernels alongside the input $ X $ and conducts convolution operations on each part of $ X $ to encode it into $ h $ . The decoder part uses a transposed convolutional layer that moves small kernels on $ h $ to upsample it into $ X^\prime $ .

Figure 4. Network structures of a simple autoencoder and a convolutional autoencoder. The encoder part compresses the input X into a representation h that has fewer dimensions. The decoder part tries to generate a reconstruction X′ from h, which is supposed to be close to X.
图4.简单自动编码器和卷积自动编码器的网络结构。编码器部分压缩输入X 成为代表 H尺寸较小。解码器部分尝试生成重构X′ 从 H，应该接近 X。
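The encode, decode, and compare steps of a simple autoencoder can be sketched as follows. This is an illustrative example only: the `tanh` activation and the mean squared error used for the loss $ L(X,X^\prime) $ are assumptions, since the text does not fix either choice, and all function names are hypothetical.

```python
import math


def dense(x, W, b):
    """Fully connected layer: W @ x + b with plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + b_k for row, b_k in zip(W, b)]


def encode(x, encoder):
    """Compress X into a lower-dimensional representation h."""
    W, b = encoder
    return [math.tanh(v) for v in dense(x, W, b)]


def decode(h, decoder):
    """Reconstruct X' from the representation h."""
    W, b = decoder
    return dense(h, W, b)


def reconstruction_loss(x, x_rec):
    """Mean squared error between X and its reconstruction X'."""
    return sum((a - r) ** 2 for a, r in zip(x, x_rec)) / len(x)
```

Training would adjust the encoder and decoder weights to reduce `reconstruction_loss(x, decode(encode(x, encoder), decoder))` over the unlabelled data.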

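Ideally, $ X^\prime $ is supposed to be as close to $ X $ as possible, based on the assumption that the key representations of $ X $ can be learned and encoded as $ h $ . As the dimensionality of $ h $ is lower than that of $ X $ , there is less information in $ h $ than in $ X $ . Thus the reconstructed $ X^\prime $ is likely to be a distorted version of $ X $ . The goal of training an autoencoder is to minimise the distortion, i.e., minimising a loss function $ L(X,X^\prime) $ , thereby producing an encoder (e.g., fully connected hidden layers or convolutional hidden layers) that can capture the most useful information of $ X $ in its representation $ h $ . As mentioned in Sec. 3.1, LSTM can also be used as components of an LSTM-autoencoder to encode time-series data. As shown in Fig. 5, an LSTM cell is used as the encoder of the autoencoder and takes a time-series sequence as its input. The final hidden state $ h_{3} $ is the representation of $ X_{3} $ in the context of the sequence $ (X_{1},X_{2},X_{3}) $ . Since the hidden state of the LSTM encoder is based on both the input observation and the previous hidden state, the representations generated in this way compress both the features of the observations and the information of the time-series sequence. The decoder is another LSTM cell that reconstructs the original sequence in reversed order. The goal of an LSTM-autoencoder is therefore to minimise the loss between the original sequence and the reconstructed sequence.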

autoencoder是一种神经网络，用于学习来自数据的潜在特征表示。与监督学习不同，旨在从输入变量$ f(X) $到标签$ X $到标签$ Y $，用于无监督学习的autoencoder尝试编码 $ X $到其潜在表示$ h $和将 $ h $解码为$ X $的重建，其呈现为$ X^\prime $。图。演示了使用不同神经网络的两种类型的AutoEncoders。图1中的简单AutoEncoder使用完全连接的层将$ X $编码为$ h $，然后将$ h $解码为$ X^\prime $。图CAE中的卷积AutoEncoder使用卷积层，该卷积层与输入$ X $一起移动小内核，并在$ X $的每个部分上对其进行卷积操作以将其编码为$ h $。解码器部件使用转置的卷积层，该层在$ h $上移动小内核以将其上置为$ X^\prime $。

理想情况下，$ X^\prime $应该基于$ X $的密钥表示和编码为$的假设，如此接近$ X $。 h $。随着$ h $的维度低于$ X $的维度，$ h $中的信息较少而不是$ X $。因此，重建的$ X^\prime $可能是$ X $的扭曲版本。训练AutoEncoder的目标是最小化失真，最小化损耗函数$ L(X,X^\prime) $，从而产生编码器（如，完全连接的隐藏层或卷积隐藏层）可以在其表示$ h $中捕获$ X $最有用的信息。如上所述，STSM，LSTM也可以用作LSTM-AutoEncoder的组件以编码时间序列数据。如图1所示，LSTM小区用作AutoEncoder的编码器，并将时间串联序列作为其输入。最终隐藏状态$ h_{3} $是$ X_{3} $在序列$(X_{1},X_{2},X_{3}) $的上下文中的表示。由于LSTM编码器的隐藏状态基于输入观察和先前的隐藏状态，以这种方式生成的表示压缩了观察中的特征和时间序列序列的信息。解码器是另一个LSTM小区，以反向顺序重建原始序列。因此，LSTM-AutoEncoder的目标是最小化原始序列和重建序列之间的损耗。

Figure 5. Network structure of an LSTM-autoencoder. The time-series sequence $ (X_{1},X_{2},X_{3}) $ is input into an LSTM encoder cell and the final output hidden state $ h_{3} $ (i.e., after $ X_{3} $ is input into the encoder) is the representation of $ X_{3} $ in the context of $ (X_{1},X_{2},X_{3}) $ . A sequence of the learned representation with the same length as that of the original sequence, i.e., $ (h_{3},h_{3},h_{3}) $ , is input into an LSTM decoder cell. The output sequence tries to reconstruct the original sequence in reversed order.
图5. LSTM自动编码器的网络结构。时间序列（X1个，X2个，X3）输入到LSTM编码器单元中，最终输出处于隐藏状态 H3（即，后X3 输入到编码器中）是 X3 在...的背景下（X1个，X2个，X3）。学习的表示的序列，其长度与原始序列的长度相同，即，（H3，H3，H3）被输入到LSTM解码器单元中。输出序列尝试以相反的顺序重建原始序列。
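The data flow around the LSTM decoder in Figure 5 can be made concrete with a small sketch: the final hidden state is repeated once per time step as the decoder input, and the decoder output is compared with the original sequence in reversed order. The helper names are hypothetical and mean squared error is an assumed choice of loss.

```python
def decoder_inputs_and_targets(sequence, final_hidden):
    """Build decoder inputs (h3, h3, h3) and targets (X3, X2, X1)."""
    inputs = [list(final_hidden) for _ in sequence]
    targets = [list(x) for x in reversed(sequence)]
    return inputs, targets


def sequence_loss(reconstruction, targets):
    """Mean squared error over every feature of every time step."""
    squared = [
        (r - t) ** 2
        for rec_t, tgt_t in zip(reconstruction, targets)
        for r, t in zip(rec_t, tgt_t)
    ]
    return sum(squared) / len(squared)
```

In a full implementation, the encoder and decoder LSTM cells would be trained jointly so that the decoder output minimises `sequence_loss` against the reversed targets.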

Since our system runs unsupervised learning locally at the edge and supervised learning in the cloud, we consider simple autoencoders, convolutional autoencoders, and LSTM-autoencoders in our proposed system, in order to understand how the location where time-series information is captured (i.e., in supervised learning or unsupervised learning) affects the performance of our system.

自从我们的系统运行无监督在本地在云端监督学习，我们考虑简单的autoencoders，卷积式自动码器和lstm-autoencoders在我们的建议系统中，以了解如何定位捕获时间序列信息（在监督学习或无监督学习中）影响我们系统的性能。

System design 系统设计

In a canonical FL system, which follows a client-server structure, a cloud server periodically sends a global model to selected clients for updating the model locally. As shown in Fig. 6, in each communication round $ t $ , a global model $ w^g_{t} $ is sent to three selected clients, which conduct supervised learning on $ w^g_{t} $ with their labelled local data. The resulting local models are then sent to the server, which uses the federated averaging (FedAvg) algorithm to aggregate these models into a new global model $ w^g_{t+1} $ . The server and clients repeat this procedure through multiple communication rounds between them, thereby fitting the global model to clients' local data without releasing the data to the server.
在规范的FL系统中，就像在客户端-服务器结构中一样，云服务器会定期向选定的客户端发送全局模型，以在本地更新模型。如图6所示，在每个通信回合中$ t $中，将全局模型$ w^g_{t} $发送到三个选定的客户端，该客户在$ w^g_{t} $上进行监督学习，其中包含标记的本地数据。然后将生成的本地模型发送到服务器，该服务器使用联邦平均（FADVG）算法将这些模型聚合到新的全局型号$ w^{g}_{t+1} $中。服务器和客户端通过它们之间的多个通信轮次重复此过程，从而将全局模型拟合到客户端的本地数据，而不将数据释放到服务器。
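The aggregation step of FedAvg can be sketched as a weighted average of the local models, where each client is weighted by its share of the training samples. This is a minimal illustration that assumes every model is flattened into a list of floats; the function name `fed_avg` is hypothetical.

```python
def fed_avg(local_models, sample_counts):
    """Aggregate local models into a global model.

    local_models: one flat list of parameters per selected client.
    sample_counts: number of local training samples per client (n_k).
    Each client contributes with weight n_k / n, where n = sum of n_k.
    """
    n = sum(sample_counts)
    size = len(local_models[0])
    return [
        sum((n_k / n) * model[j] for model, n_k in zip(local_models, sample_counts))
        for j in range(size)
    ]
```

For example, two clients holding 1 and 3 samples contribute with weights 0.25 and 0.75, so the client with more data has a larger influence on the new global model.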

Figure 6. System structure of a canonical federated learning (FL) system with supervised learning. The server selects 3 clients and sends the global model $ w^g_{t} $ to them, and the clients use their labelled data to update $ w^g_{t} $ into their local models, which are then sent to the server to be aggregated into a new global model using the FedAvg algorithm.
图6.具有监督学习的规范联邦学习（FL）系统的系统结构。服务器选择3个客户端并发送全局$ w^g_{t} \给他们，客户使用他们标记的数据来更新$ w^g_{t} \本地模型，然后使用FedAvg算法将其发送到服务器以汇总为新的全局模型。

In order to address the lack of labels on clients in HAR with IoT sensory data, our proposed system applies semi-supervised learning in an FL system, in which clients use unsupervised learning to train autoencoders with their unlabelled data, and a server uses supervised learning with a labelled dataset to train a classifier that can map encoded representations to activities. As shown in Fig. 7, in each communication round, the server sends a global autoencoder $ w^{a_g}_{t} $ to selected clients. In order to update $ w^{a_g}_{t} $ locally, clients run unsupervised learning on $ w^{a_g}_{t} $ with their unlabelled local data and then send the resulting local autoencoders to the server.

为了解决具有物联网传感数据的HAR中客户端上标签不足的问题，我们提出的系统在FL系统中应用了半监督学习，其中客户端使用无监督学习来训练自动编码器使用其未标记的数据，而服务器则使用监督学习训练一个分类器，该分类器可以将编码的表示形式映射到带有标签数据集的活动。

如图7所示，在每个通信回合中，服务器发送一个全局自动编码器 $ w^ag_{t} $发送到所选客户端。为了在本地更新$ w^ag_{t} $，客户端在$ w^ag_{t} $上运行无监督的学习，并将其未标记的本地数据发送，然后将生成的本地AutoEncoders发送到服务器。

The server follows the standard FedAvg algorithm to generate a new global autoencoder $ w^{a_g}_{t+1} $ , which is then plugged into the pipeline of supervised learning with a labelled dataset $ D=(X,Y) $ . The server first uses the encoder part of $ w^{a_g}_{t+1} $ to encode the original features $ X $ into representations $ X^\prime $ in order to generate a labelled representation dataset $ D^\prime=(X^\prime,Y) $ . Then the server conducts supervised learning with $ D^\prime $ to update a classifier $ w^{s}_{t} $ into $ w^{s}_{t+1} $ . Fig. 8 shows the detailed semi-supervised algorithm of our system.

In each communication round $ t $ , the resulting classifier $ w^{s}_{t} $ is also sent to selected clients together with the global autoencoder $ w^{a_g}_{t} $ . In order to locally recognise activities from its observations $ X $ , a client first uses the encoder part of $ w^{a_g}_{t} $ to transform $ X $ into its representation $ X^\prime $ , and then feeds $ X^\prime $ into the classifier $ w^{s}_{t} $ to recognise the corresponding activities.

Figure 8. Algorithm of semi-supervised FL. $ n_{k} $ and $ n $ are the numbers of unlabelled samples on client $ k $ and on all selected clients, respectively. LocalTraining is unsupervised learning on the global autoencoder $ w^{a_g}_{t} $ on a client. CloudTraining is supervised learning on the classifier $ w^{s}_{t} $ on the server.
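The FedAvg aggregation step that uses $ n_{k} $ and $ n $ can be sketched as a sample-weighted average of the local autoencoder weights. The function name `fed_avg` and the representation of a model as a list of NumPy arrays are assumptions made for illustration.

```python
import numpy as np

def fed_avg(local_weights, n_samples):
    """Average client models layer by layer, weighting client k by n_k / n."""
    n = float(sum(n_samples))
    n_layers = len(local_weights[0])
    return [
        sum((n_k / n) * weights[i] for weights, n_k in zip(local_weights, n_samples))
        for i in range(n_layers)
    ]

# Two toy clients holding 100 and 300 unlabelled samples.
client_a = [np.ones((2, 2)), np.zeros(2)]
client_b = [3 * np.ones((2, 2)), np.ones(2)]
global_weights = fed_avg([client_a, client_b], n_samples=[100, 300])
```

With these toy values the second client contributes three quarters of each averaged parameter, since it holds three quarters of the samples.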

Evaluation

We evaluated our system through simulations on different human activity datasets with different system designs and configurations. In addition, we evaluated the local activity recognition algorithms of our system on a Raspberry Pi 4 Model B. We aim to answer the following research questions:

Q1. How does our system perform in comparison to supervised learning on a centralised server?
Q2. How does our system perform in comparison to semi-supervised FL using data augmentation?
Q3. How does our system perform in comparison to supervised FL?
Q4. How do the key parameters of our system, including the size of labelled samples on the server and the size of learned representations, affect the performance of HAR?
Q5. How efficient is semi-supervised FL on low-cost edge devices?

Datasets

We used three HAR datasets that contain time-series sensory data in our evaluation. The datasets have different numbers of features and activities with different durations and frequencies.

The Opportunity (Opp) dataset (Chavarriaga et al., 2013) contains short-term and non-repeated kitchen activities of 4 participants. The Daphnet Freezing of Gait (DG) dataset (Bachlin et al., 2009) contains freezing-of-gait incidents of Parkinson's Disease patients, collected from 10 participants, which are also short-term and non-repeated. The PAMAP2 dataset (Reiss and Stricker, 2012) contains household and exercise activities collected from 9 participants, which are long-term and repeated. The data pre-processing procedure in our evaluation is the same as that described by Hammerla et al. (2016). Table 1 shows detailed information about the datasets after pre-processing.

Dataset	Activities	Features	Classes	Train	Test
Opp	Kitchen	79	18	651k	119k
DG	Gait	9	3	792k	81k
PAMAP2	Household & Exercise	52	12	473k	83k

Table 1. HAR datasets in our experiments.

Simulation setup

We simulated a semi-supervised FL system that runs unsupervised learning on 100 clients to locally update autoencoders and runs supervised learning on a server to update a classifier. In each communication round $ t $ , the server selects $ 100\cdot C $ clients to participate in the unsupervised learning, where $ C $ is the fraction of clients to be selected. Each selected client uses its local data to train the global autoencoder $ w^{a_g}_{t} $ with a learning rate $ lr_{a} $ for $ e_{a} $ epochs. The server conducts supervised learning to train the classifier $ w^{s}_{t} $ with a learning rate $ lr_{s} $ for $ e_{s} $ epochs. For each individual simulation setup, we conducted 64 replicates with different random seeds. Based on the assumption that a server is more computationally powerful than a client in practice, we set the learning rates $ lr_{a} $ and $ lr_{s} $ to 0.01 and 0.001, respectively. Similarly, we set the numbers of epochs $ e_{a} $ and $ e_{s} $ to 2 and 5, because an individual client is only expected to run a small number of epochs of unsupervised learning, whereas a server can run more epochs of supervised learning. We chose $ e_{s}=5 $ to keep the execution time of our simulation within an acceptable range. Nevertheless, we believe that this parameter can be set to a larger number on the server in real-world applications, where more powerful clusters and graphics processing units (GPUs) can be deployed to accelerate the convergence of performance.
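The setup above can be collected into a small configuration object, with a per-round client selection step. This is a sketch only: the class and function names are hypothetical, the value `fraction=0.1` for $ C $ is an arbitrary placeholder (the text does not fix $ C $ here), and sampling at least one client per round is an added assumption.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class SimulationConfig:
    n_clients: int = 100
    fraction: float = 0.1          # C: placeholder value, not taken from the paper
    lr_autoencoder: float = 0.01   # lr_a, used on clients
    lr_classifier: float = 0.001   # lr_s, used on the server
    epochs_autoencoder: int = 2    # e_a
    epochs_classifier: int = 5     # e_s
    n_replicates: int = 64         # replicates with different random seeds

def select_clients(config, rng):
    """Pick n_clients * C distinct clients for one communication round."""
    n_selected = max(1, round(config.n_clients * config.fraction))  # assumed floor of 1
    return rng.sample(range(config.n_clients), n_selected)

config = SimulationConfig()
# One seeded generator per replicate keeps each replicate reproducible.
selections = [select_clients(config, random.Random(seed)) for seed in range(3)]
selected = selections[0]
```

Seeding a separate generator per replicate is one simple way to make the 64 replicates differ from each other while remaining repeatable.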

Baselines

To answer Q1 and Q2, we consider two baselines to 1) evaluate whether the autoencoders in our system improve the performance of the system and 2) compare the performance of our system to that of data-augmentation-based semi-supervised FL.

Since we assume that labelled data exist on the server of the system, for ablation studies we consider a baseline system that only uses these labelled data to conduct supervised learning on the server and sends trained models to clients for local activity recognition. This system trains an LSTM classifier on labelled data on the server and does not train any autoencoders on clients. We refer to this centralised baseline as CS. Comparing the performance of CS to that of our proposed system indicates whether the autoencoders in our system are effective in improving the performance of the trained model.

To compare our system with the state of the art, we consider as another baseline a semi-supervised FL system that uses data augmentation to generate pseudo labels. We refer to this baseline as DA. It first conducts supervised learning on labelled data on the server to train an LSTM classifier. It then follows standard FL protocols to send the trained global model to clients. Each client uses the received model to generate pseudo labels on its unlabelled local data. To introduce randomness in data augmentation, we feed sequences with randomised lengths into the model when generating labels. The sequences are then paired with t

Original paper: https://arxiv.org/pdf/2011.00851v3.pdf
