Semi-Supervised Federated Learning for Activity Recognition

Abstract

Training deep learning models on in-home IoT sensory data is commonly used to recognise human activities. Recently, federated learning systems that use edge devices as clients to support local human activity recognition have emerged as a new paradigm to combine local (individual-level) and global (group-level) models. This approach provides better scalability and generalisability, and also offers better privacy, compared with traditional centralised analysis and learning models. Federated learning, however, typically assumes supervised learning on clients, which requires a large volume of labelled data that is difficult to collect in uncontrolled IoT environments such as remote in-home monitoring. In this paper, we propose an activity recognition system that uses semi-supervised federated learning, wherein clients conduct unsupervised learning on autoencoders with unlabelled local data to learn general representations, and a cloud server conducts supervised learning on an activity classifier with labelled data. Our experimental results show that, using a long short-term memory autoencoder and a Softmax classifier, the accuracy of our proposed system is higher than that of both centralised systems and semi-supervised federated learning using data augmentation, and is comparable to that of supervised federated learning systems. Meanwhile, we demonstrate that our system can reduce the number of labels needed and the size of local models, and recognises activities locally faster than supervised federated learning does.


1. Introduction

Modern smart homes are integrating more and more Internet of Things (IoT) technologies in different application scenarios. The IoT devices can collect a variety of time-series data, including ambient data such as occupancy, temperature, and brightness, and physiological data such as weight and blood pressure. With the help of machine learning (ML) algorithms, these sensory data can be used to recognise people’s activities at home. Human activity recognition (HAR) using IoT data promises to significantly improve quality of life for people who require in-home care and support. For example, anomaly detection based on recognised activities can raise alerts when an individual’s health deteriorates. The alerts can then be used for early interventions (Enshaeifar et al., 2018; Cao et al., 2015; Queralta et al., 2019). Analysis of long-term activities can help identify behaviour changes, which can be used to support clinical decisions and healthcare plans (Enshaeifar et al., 2018).

A common architecture for HAR deploys devices with computational resources at the edge of the network, normally within people’s homes. Such “edge devices” are capable of communicating with sensory devices to collect and aggregate the sensory data, and of running ML algorithms to process the in-home activity and movement data. With the help of a cloud back-end, these edge devices can form a federated learning (FL) system (McMahan et al., 2017; Yang et al., 2019; Zhao et al., 2020), which is increasingly used to learn at the population level while constructing personalised edge models for HAR. In an FL system, clients jointly train a global Deep Neural Network (DNN) model by sharing their local models with a cloud back-end. This design enables clients to use their data to contribute to the training of the model without breaching privacy. One of the assumptions behind using the canonical FL system for HAR is that data on clients are labelled with corresponding activities so that the clients can use these data to train supervised local DNN models. In HAR using IoT data, due to the large amount of time-series data continuously generated from different sensors, it is difficult to guarantee that end-users are capable of labelling activity data at a large scale. Thus, the availability of labelled data on clients is one of the challenges that impede the adoption of FL systems in real-world HAR applications.

Existing solutions for utilising unlabelled data in FL systems rely on data augmentation (Jeong et al., 2020; Liu et al., 2020; Zhang et al., 2020). The server of an FL system keeps some labelled data and uses them to train a global model through supervised learning. The clients of the system receive the global model and use it to generate pseudo labels on augmented local data. However, this approach couples the local training on clients with the specific task (i.e., labels) from the server. If a client accesses multiple FL servers for different tasks, it has to generate pseudo labels for each of them locally, which increases the cost of local training.
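The pseudo-labelling step described above can be sketched as follows. This is a minimal illustration with a stand-in linear model and random data, not the cited systems' actual implementations:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: a client receives the server's supervised global
# model and uses its predictions on (augmented) unlabelled local data as
# pseudo labels for subsequent local training.
global_model = nn.Linear(9, 6)            # stand-in for the server's model
unlabelled = torch.randn(32, 9)           # a client's unlabelled local data

with torch.no_grad():
    pseudo_labels = global_model(unlabelled).argmax(dim=1)

# The client can now train locally on (unlabelled, pseudo_labels) as if
# the data were labelled, coupling local training to the server's task.
```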


In centralised ML, unsupervised learning on DNN such as autoencoders (Baldi, 2011) has been widely used to learn general representations from unlabelled data. The learned representations can then be utilised to facilitate supervised learning models with labelled data. A recent study by van Berlo et al. (van Berlo et al., 2020) shows that temporal convolutional networks can be used as autoencoders to learn representations on clients of an FL system. The representations can help with training of the global supervised model of an FL system. The resulting model’s performance is comparable to that of a fully supervised algorithm. Building upon this promising result, we propose a semi-supervised FL system that realises activity recognition using time-series data at the edge, without labelled IoT sensory data on clients, and evaluate how different factors (e.g., choices of models, the number of labels, and the size of representations) affect the performance (e.g., accuracy and inference time) of the system.

In our proposed design, clients locally train autoencoders with unlabelled time-series sensory data to learn representations. These local autoencoders are then sent to a cloud server that aggregates them into a global autoencoder. The server integrates the resulting global autoencoder into the pipeline of the supervised learning process. It uses the encoder component of the global autoencoder to transform a labelled dataset into labelled representations, with which a classifier can be trained. Such a labelled dataset on the cloud back-end server can be provided by service providers without necessarily using any personal data from users (e.g., open data or data collected from laboratory trials with consents). Whenever the server selects a number of clients, both the global autoencoder and the global classifier are sent to the clients to support local activity recognition.

We evaluated our system through simulations on different HAR datasets, with different system component designs and data generation strategies. We also tested the local activity recognition part of our system on a Raspberry Pi 4 model B, which is a low-cost edge device. With the focus on HAR using time-series sensory data, we are interested in answering the following research questions:

• Q1. How does semi-supervised FL using autoencoders perform in comparison to supervised learning on a centralised server?
• Q2. How does semi-supervised FL using autoencoders perform in comparison to semi-supervised FL using data augmentation?
• Q3. How does semi-supervised FL using autoencoders perform in comparison to supervised FL?
• Q4. How do the key parameters of semi-supervised FL, including the number of labels on the server and the size of learned representations, affect its performance?
• Q5. How efficient is semi-supervised FL on low-cost edge devices?

Our experimental results demonstrate several key findings:

• Using long short-term memory autoencoders as local models and a Softmax classifier model as a global classifier, the accuracy of our system is higher than that of a centralised system that only conducts supervised learning in the cloud, which means that learning general representations locally improves the performance of the system.
• Our system also achieves higher accuracy than semi-supervised FL that uses data augmentation to generate pseudo labels.
• Our system can achieve comparable accuracy to that of a supervised FL system.
• By only conducting supervised learning in the cloud, our system can significantly reduce the needed number of labels without losing much accuracy.
• By using autoencoders, our system can reduce the size of local models. This can potentially contribute to the reduction of upload traffic from the clients to the server.
• The processing time of our system when recognising activities on a low-cost edge device is acceptable for real-time applications and is significantly lower than that of supervised FL.


2. Related work

As one of the key applications of IoT that can significantly improve the quality of people's lives, HAR has attracted an enormous amount of research. Many HAR systems have been proposed for deployment at the edge of networks, thanks to the ever-growing computational power of different types of edge devices.

2.1 HAR at the edge

In comparison to having both data and algorithms in the cloud, edge computing deploys devices closer to the end users of services, which means that data generated by the users, and computation on these data, can stay on the devices locally. Modern edge devices such as the Intel Next Unit of Computing (NUC) and the Raspberry Pi are capable of running DNN models and providing real-time activity recognition from videos. Many deep learning models such as long short-term memory (LSTM) or convolutional neural networks (CNN) can be applied at the edge for HAR. For example, Zhang et al. proposed an HAR system that utilised both edge computing and back-end cloud computing. One implementation of this kind of HAR edge system was proposed by Cao et al. (Cao et al., 2015), which implemented fall detection both at the edge and in the cloud. Their results show that their system has lower response latency than a cloud-based system. Queralta et al. (Queralta et al., 2019) also proposed a fall detection system that achieved over 90% precision and recall. Uddin et al. proposed a system that used more diverse body sensory data, including electrocardiography (ECG), magnetometer, accelerometer, and gyroscope readings, for activity recognition. These HAR systems, however, send the personal data of their users to a back-end cloud server to train deep learning models, which poses great privacy threats to the data subjects. Servia-Rodríguez et al. proposed a system in which a small group of users voluntarily share their data with the cloud to train a model. Other users in the system can download this model for local training, which protects the privacy of the majority in the system but does not utilise the trained local models from different users to improve the performance of each other's models.
To improve the utility of local models and protect privacy at the same time, we apply federated learning to HAR at the edge, which can train a global deep learning model with constant contributions from users but does not require the users to send their personal data to the cloud.

2.2 HAR with federated learning

Federated learning (FL) (McMahan et al., 2017; Yang et al., 2019) was proposed as an alternative to traditional cloud based deep learning systems. It uses a cloud server to coordinate different clients to collaboratively train a global model. The server periodically sends the global model to a selection of clients that use their local data to update the global model. The resulting local models from the clients will be sent back to the server and be aggregated into a new global model. By this means, the global model is constantly updated using users’ personal data, without having these data in the server. Since FL was proposed, it has been widely adopted in many applications (Li et al., 2020; Yu et al., 2020) including HAR. Sozinov et al. (Sozinov et al., 2018) proposed an FL based HAR system and they demonstrated that its performance is comparable to that of its centralised counterpart, which suffers from privacy issues. Zhao et al. (Zhao et al., 2020) proposed an FL based HAR system for activity and health monitoring. Their experimental results show that, apart from acceptable accuracy, the inference time of such a system on low-cost edge devices such as Raspberry Pi is marginal. Feng et al. (Feng et al., 2020) introduced locally personalised models in FL based HAR systems to further improve the accuracy for mobility prediction. Specifically, HAR applications that need both utility and privacy guarantees such as smart healthcare can benefit from the accurate recognition and the default privacy by design of FL. For example, the system recently proposed by Chen et al. (Chen et al., 2020) applied FL to wearable healthcare, with a specific focus on the auxiliary diagnosis of Parkinson’s disease.

Existing HAR systems with canonical FL use supervised learning, which relies on the assumption that all local data on clients are properly labelled with activities. This assumption is difficult to satisfy in IoT scenarios using sensory data. Compared to the existing FL based HAR systems, we aim to address this issue by utilising semi-supervised machine learning, which does not need locally labelled data.

2.3 Semi-supervised federated learning

Semi-supervised learning combines both supervised learning that requires labelled data and unsupervised learning that does not use labels when training DNN models. Traditional centralised ML has benefited from semi-supervised learning techniques such as transfer learning (Khan and Roy, 2018; Zhang and Ardakanian, 2019) and autoencoders (Baldi, 2011). These techniques have been widely used in centralised ML such as learning time-series representations from videos (Srivastava et al., 2015), learning representations to compress local data (Hu and Krishnamachari, 2020), and learning representations that do not contain sensitive information (Malekzadeh et al., 2018).

The challenge of having available local labels in FL has motivated a number of systems that aim to realise FL in a semi-supervised or self-supervised fashion. The majority of the existing solutions in this area focuses on generating pseudo labels for unlabelled data and using these labels to conduct supervised learning (Jeong et al., 2020; Liu et al., 2020; Zhang et al., 2020; Long et al., 2020; Zhang et al., 2021; Kang et al., 2020; Wang et al., 2020; Yang et al., 2020). For example, Jeong et al. (Jeong et al., 2020) use data augmentation to generate fake labels and keep the consistency of the labels across different FL clients. However, the inter-client consistency requires some clients to share their data with others, which poses privacy issues. Liu et al. (Liu et al., 2020) use labelled data on an FL server to train a model through supervised learning and then send this model to FL clients to generate labels on their local data. These solutions couple the local training on clients with the specific task from the server, which means that a client has to generate pseudo labels for all the servers that have different tasks.

Another direction of semi-supervised FL is to conduct unsupervised learning on autoencoders locally on clients instead of generating pseudo labels. Compared with existing solutions, the trained autoencoders learn general representations from data, which are independent of specific tasks. Preliminary results from the work by van Berlo et al. (van Berlo et al., 2020) show the promising potential of using autoencoders to implement semi-supervised FL. Compared to their work, we evaluate different local models (i.e., simple autoencoders, convolutional autoencoders, and LSTM autoencoders), investigate different design considerations, and test how efficient local activity recognition is when running on low-cost edge devices.

3. Methodology

Our goal is to implement HAR using an FL system, without having any labelled data on the edge clients. We first introduce the long short-term memory model, which is a technique for analysing time-series data for HAR. We then introduce autoencoders, which are the key technique for deep unsupervised learning. We finally demonstrate the design of our proposed semi-supervised FL system and describe how unsupervised and supervised learning models are used in our framework.

3.1 Long short-term memory

The long short-term memory (LSTM) model belongs to the family of recurrent neural network (RNN) models, a class of DNNs that process sequences of data points such as time-series data. At each time point of the time series, the output of an RNN, which is referred to as the hidden state, is fed back into the network together with the next data point in the sequence. As time proceeds, an RNN recurrently takes and processes the current input and the previous output (i.e., the hidden state), and generates a new output for the current time. Specifically for LSTM, Fig. 1 shows the network structure of a basic LSTM unit, which is called an LSTM cell. At each time $t$, it takes three input variables: the current observed data point $X_{t}$, the previous state of the cell $C_{t-1}$, and the previous hidden state $h_{t-1}$. When applying LSTM to HAR, $X_{t}$ is a vector of all the observed sensory readings at time $t$, and $h_{t}$ is the hidden state used to recognise the activity in question.

Figure 1. Network structure of a long short-term memory (LSTM) cell. At each time point $t$, the current cell state $C_{t}$ and hidden state $h_{t}$ depend on the previous cell state $C_{t-1}$, the previous hidden state $h_{t-1}$, and the current observed data point $X_{t}$.

LSTM can be used in both supervised and unsupervised learning. For supervised learning, each $X_{t}$ of a time-series sequence has a corresponding label $Y_{t}$ (i.e., the activity class at time point $t$) as the ground truth. The hidden state $h_{t}$ can be fed into a Softmax classifier that contains a fully connected layer and a Softmax layer. By this means, such an LSTM classifier can be trained against the labelled activities through feedforward and backpropagation to minimise the loss (e.g., the cross-entropy loss) between the classifications and the ground truth. For unsupervised learning, LSTM cells can be trained as components of an autoencoder, which we describe in detail in Sec. 3.2.
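As a sketch, such an LSTM classifier could be written in PyTorch as follows. Layer sizes and input shapes here are illustrative, not the exact configuration used in our experiments:

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """Sketch of an LSTM classifier: an LSTM cell whose final hidden
    state is fed into a fully connected layer; the Softmax is applied
    implicitly by the cross-entropy loss during training."""
    def __init__(self, n_features, n_hidden, n_classes):
        super().__init__()
        self.lstm = nn.LSTM(n_features, n_hidden, batch_first=True)
        self.fc = nn.Linear(n_hidden, n_classes)

    def forward(self, x):                 # x: (batch, time, n_features)
        _, (h_t, _) = self.lstm(x)        # h_t: (1, batch, n_hidden)
        return self.fc(h_t[-1])           # class logits per sequence

model = LSTMClassifier(n_features=9, n_hidden=32, n_classes=3)
logits = model(torch.randn(4, 20, 9))    # 4 sequences of 20 time steps
```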


3.2 Autoencoders

An autoencoder is a type of neural network that is used to learn latent feature representations from data. Different from supervised learning, which aims to learn a function $f(X)\rightarrow Y$ from input variables $X$ to labels $Y$, an autoencoder used in unsupervised learning tries to encode $X$ into its latent representation $h$ and to decode $h$ into a reconstruction of $X$, denoted $X^\prime$. Fig. 4 demonstrates two types of autoencoders that use different neural networks. The simple autoencoder uses fully connected layers to encode $X$ into $h$ and then to decode $h$ into $X^\prime$. The convolutional autoencoder uses a convolutional layer that moves small kernels along the input $X$ and conducts convolution operations on each part of $X$ to encode it into $h$. The decoder part uses a transposed convolutional layer that moves small kernels over $h$ to upsample it into $X^\prime$.

Figure 4. Network structures of a simple autoencoder and a convolutional autoencoder. The encoder part compresses the input $X$ into a representation $h$ that has fewer dimensions. The decoder part tries to generate a reconstruction $X^\prime$ from $h$, which is supposed to be close to $X$.

Ideally, $X^\prime$ is supposed to be as close to $X$ as possible, based on the assumption that the key representations of $X$ can be learned and encoded as $h$. As the dimensionality of $h$ is lower than that of $X$, there is less information in $h$ than in $X$; thus the reconstructed $X^\prime$ is likely to be a distorted version of $X$. The goal of training an autoencoder is to minimise this distortion, i.e., to minimise a loss function $L(X,X^\prime)$, thereby producing an encoder that captures the key representations of $X$.
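A minimal PyTorch sketch of such a simple autoencoder and its reconstruction loss follows; the dimensions are illustrative (79 features corresponds to the Opp dataset used in our evaluation, and the hidden size is an assumption):

```python
import torch
import torch.nn as nn

class SimpleAutoencoder(nn.Module):
    """Minimal fully connected autoencoder sketch: encode X into a
    lower-dimensional h, then decode h back into a reconstruction X'."""
    def __init__(self, n_features, n_hidden):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU())
        self.decoder = nn.Linear(n_hidden, n_features)

    def forward(self, x):
        h = self.encoder(x)
        return self.decoder(h)

# Training minimises the reconstruction loss L(X, X'), here the MSE.
model = SimpleAutoencoder(n_features=79, n_hidden=16)
x = torch.randn(8, 79)
loss = nn.MSELoss()(model(x), x)
loss.backward()
```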


Figure 5. Network structure of an LSTM-autoencoder. The time-series sequence $(X_1,X_2,X_3)$ is input into an LSTM encoder cell and the final output hidden state $h_3$ (i.e., after $X_3$ is input into the encoder) is the representation of $X_3$ in the context of $(X_1,X_2,X_3)$. A sequence of the learned representation with the same length as the original sequence, i.e., $(h_3,h_3,h_3)$, is input into an LSTM decoder cell. The output sequence tries to reconstruct the original sequence in reversed order.

Since our system runs unsupervised learning locally at the edge and supervised learning in the cloud, we consider simple autoencoders, convolutional autoencoders, and LSTM-autoencoders in our proposed system, in order to understand how the stage at which time-series information is captured (i.e., in supervised or unsupervised learning) affects the performance of the system.
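Of these, the LSTM-autoencoder of Fig. 5 can be sketched as follows. This is a simplified PyTorch illustration; the hidden size and sequence length are assumptions:

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    """Sketch of an LSTM-autoencoder: the encoder's final hidden state h
    is repeated to the sequence length and fed into a decoder LSTM, which
    tries to reconstruct the input sequence in reversed order."""
    def __init__(self, n_features, n_hidden):
        super().__init__()
        self.encoder = nn.LSTM(n_features, n_hidden, batch_first=True)
        self.decoder = nn.LSTM(n_hidden, n_hidden, batch_first=True)
        self.output = nn.Linear(n_hidden, n_features)

    def forward(self, x):                      # x: (batch, time, n_features)
        _, (h, _) = self.encoder(x)
        h = h[-1]                              # representation of the sequence
        repeated = h.unsqueeze(1).repeat(1, x.size(1), 1)
        out, _ = self.decoder(repeated)
        return self.output(out), h             # reconstruction and h

model = LSTMAutoencoder(n_features=9, n_hidden=16)
x = torch.randn(4, 20, 9)
x_rec, h = model(x)
# Compare the reconstruction against the time-reversed input sequence.
loss = nn.MSELoss()(x_rec, torch.flip(x, dims=[1]))
```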

3.3 System design

In a canonical FL system, as in a client-server structure, a cloud server periodically sends a global model to selected clients, which update the model locally. As shown in Fig. 6, in each communication round $t$, a global model $w^g_{t}$ is sent to three selected clients, which conduct supervised learning on $w^g_{t}$ with their labelled local data. The resulting local models are then sent to the server, which uses the federated averaging (FedAvg) algorithm to aggregate them into a new global model $w^g_{t+1}$. The server and clients repeat this procedure over multiple communication rounds, thereby fitting the global model to the clients' local data without releasing the data to the server.
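The FedAvg aggregation step can be sketched as follows. This is a minimal illustration operating on model state dictionaries; the helper name is hypothetical, while the weighting by client sample counts follows the FedAvg rule:

```python
import copy

def fedavg(local_models, n_samples):
    """Hypothetical helper sketching FedAvg: average the clients' model
    parameters, weighting each client k by n_k / n, where n_k is the
    number of local samples and n the total over selected clients."""
    total = sum(n_samples)
    global_state = copy.deepcopy(local_models[0])
    for key in global_state:
        global_state[key] = sum(
            (n / total) * model[key] for model, n in zip(local_models, n_samples)
        )
    return global_state
```

For example, two clients holding 1 and 3 samples contribute with weights 0.25 and 0.75 to the new global parameters.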

Figure 6. System structure of a canonical federated learning (FL) system with supervised learning. The server selects 3 clients and sends them the global model $w^g_{t}$; the clients use their labelled data to update $w^g_{t}$ into their local models, which are then sent back to the server to be aggregated into a new global model using the FedAvg algorithm.

In order to address the lack of labels on clients in HAR with IoT sensory data, our proposed system applies semi-supervised learning in an FL system: clients use unsupervised learning to train autoencoders with their unlabelled data, and a server uses supervised learning to train a classifier that maps encoded representations to activities with a labelled dataset. As shown in Fig. 7, in each communication round, the server sends a global autoencoder $w^{a_g}_{t}$ to selected clients. In order to update $w^{a_g}_{t}$ locally, clients run unsupervised learning on $w^{a_g}_{t}$ with their unlabelled local data and then send the resulting local autoencoders to the server.

The server follows the standard FedAvg algorithm to generate a new global autoencoder $w^{a_g}_{t+1}$, which is then plugged into the pipeline of supervised learning with a labelled dataset $D=(X,Y)$. The server first uses the encoder part of $w^{a_g}_{t+1}$ to encode the original features $X$ into representations $X^\prime$ in order to generate a labelled representation dataset $D^\prime=(X^\prime,Y)$. Then the server conducts supervised learning with $D^\prime$ to update a classifier $w^s_{t}$ into $w^s_{t+1}$.
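The server-side supervised step can be sketched as follows. This is a simplified PyTorch illustration: the encoder and classifier architectures are stand-ins, while the learning rate and epoch count follow the simulation setup described later in the evaluation:

```python
import torch
import torch.nn as nn

# Sketch: use the encoder part of the aggregated global autoencoder to
# transform the labelled dataset D = (X, Y) into labelled representations
# D' = (X', Y), then train a Softmax classifier on D'.
encoder = nn.Sequential(nn.Linear(79, 16), nn.ReLU())   # stand-in global encoder
classifier = nn.Sequential(nn.Linear(16, 18))           # FC layer; Softmax via CE loss

X = torch.randn(64, 79)                 # labelled features on the server
Y = torch.randint(0, 18, (64,))         # corresponding activity labels

with torch.no_grad():                   # the encoder is not updated in this step
    X_repr = encoder(X)                 # labelled representations X'

optim = torch.optim.Adam(classifier.parameters(), lr=0.001)
for _ in range(5):                      # e_s = 5 supervised epochs
    optim.zero_grad()
    loss = nn.CrossEntropyLoss()(classifier(X_repr), Y)
    loss.backward()
    optim.step()
```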
Fig. 8 shows the detailed semi-supervised algorithm of our system. In each communication round $t$, the resulting classifier $w^s_{t}$ is also sent to the selected clients together with the global autoencoder $w^{a_g}_{t}$. In order to locally recognise activities from its observations $X$, a client first uses the encoder part of $w^{a_g}_{t}$ to transform $X$ into its representation $X^\prime$, and then feeds $X^\prime$ into the classifier $w^s_{t}$ to recognise the corresponding activities.

Figure 8. Algorithm of semi-supervised FL. $n_k$ and $n$ are the numbers of unlabelled samples on client $k$ and on all selected clients, respectively. LocalTraining is unsupervised learning on the global autoencoder $w^{a_g}_{t}$ on a client. CloudTraining is supervised learning on the classifier $w^s_{t}$ on the server.

4. Evaluation

We evaluated our system through simulations on different human activity datasets with different system designs and configurations. In addition, we evaluated the local activity recognition algorithms of our system on a Raspberry Pi 4 model B. We want to answer the following research questions:

• Q1. How does our system perform in comparison to supervised learning on a centralised server?
• Q2. How does our system perform in comparison to semi-supervised FL using data augmentation?
• Q3. How does our system perform in comparison to supervised FL?
• Q4. How do the key parameters of our system, including the number of labelled samples on the server and the size of learned representations, affect the performance of HAR?
• Q5.
How efficient is semi-supervised FL on low-cost edge devices?

4.1 Datasets

We used three HAR datasets that contain time-series sensory data in our evaluation. The datasets have different numbers of features and activities with different durations and frequencies. The Opportunity (Opp) dataset (Chavarriaga et al., 2013) contains short-term and non-repeated kitchen activities of 4 participants. The Daphnet Freezing of Gait (DG) dataset (Bachlin et al., 2009) contains Parkinson’s Disease patients’ freezing-of-gait incidents collected from 10 participants, which are also short-term and non-repeated. The PAMAP2 dataset (Reiss and Stricker, 2012) contains household and exercise activities collected from 9 participants, which are long-term and repeated. The data pre-processing procedure in our evaluation is the same as described by Hammerla et al. (Hammerla et al., 2016). Table 1 shows detailed information about the datasets after pre-processing.

Dataset   Activities            Features  Classes  Train  Test
Opp       Kitchen               79        18       651k   119k
DG        Gait                  9         3        792k   81k
PAMAP2    Household & Exercise  52        12       473k   83k

Table 1. HAR datasets in our experiments.

4.2 Simulation setup

We simulated a semi-supervised FL system that runs unsupervised learning on 100 clients to locally update autoencoders and runs supervised learning on a server to update a classifier.
In each communication round $t$, the server selects $100\cdot C$ clients to participate in the unsupervised learning, where $C$ is the fraction of clients to be selected. Each selected client uses its local data to train the global autoencoder $w^{a_g}_{t}$ with a learning rate $lr_{a}$ for $e_{a}$ epochs. The server conducts supervised learning to train the classifier $w^{s}_{t}$ with a learning rate $lr_{s}$ for $e_{s}$ epochs. For each individual simulation setup, we conducted 64 replicates with different random seeds. Based on the assumption that a server is more computationally powerful than a client in practice, we set the learning rates $lr_{a}$ and $lr_{s}$ to 0.01 and 0.001, respectively. Similarly, we set the numbers of epochs $e_{a}$ and $e_{s}$ to 2 and 5, because an individual client is only supposed to run a small number of epochs of unsupervised learning, whereas a server is capable of running more epochs of supervised learning. The reason for setting $e_{s}=5$ is to keep the execution time of our simulation in an acceptable range. Nevertheless, we believe that this parameter can be set to a larger number in real-world applications, where more powerful clusters and graphics processing units (GPUs) can be deployed to accelerate convergence.
4.3 Baselines

To answer Q1 and Q2, we consider two baselines to 1) evaluate whether the autoencoders in our system improve the performance of the system and 2) compare the performance of our system to that of data augmentation based semi-supervised FL.

Since we assume that labelled data exist on the server of the system, for ablation studies we consider a baseline system that only uses these labelled data to conduct supervised learning on the server and sends trained models to clients for local activity recognition. This system trains an LSTM classifier on labelled data on the server and does not train any autoencoders on clients. We refer to this centralised baseline as CS. Comparing the performance of CS to that of our proposed system indicates whether the autoencoders in our system are effective in improving the performance of the trained model.

To compare our system with the state of the art, we consider a semi-supervised FL system that uses data augmentation to generate pseudo labels as another baseline. We refer to this baseline as DA. It first conducts supervised learning on labelled data on the server to train an LSTM classifier. It then follows standard FL protocols to send the trained global model to clients. Each client uses the received model to generate pseudo labels on its unlabelled local data.
To introduce randomness in data augmentation, we feed sequences with randomised lengths into the model when generating labels. The sequences are then paired with the labels that are generated from them as a pseudo-labelled local dataset, which is used for locally updating the global model.

Autoencoders and classifiers

We implement three schemes for our system with different autoencoders: simple autoencoders, convolutional autoencoders, and LSTM-autoencoders. The first scheme uses a simple autoencoder with fully connected (FC) layers to learn representations from individual samples in unsupervised learning, and uses a classifier that has an LSTM cell with its output hidden states connected to an FC layer and a Softmax layer; we refer to this scheme as FC+LSTM. The second scheme uses 1-d convolutional and transposed convolutional layers in its autoencoder. The convolutional layer has 8 output channels with kernel size 3 and has both stride and padding sizes equal to 1. The output is batch normalised and then fed into a ReLU layer. To control the size of the encoded representation $h$, after the ReLU layer we flatten the output of the 8 channels and feed it into a fully connected layer that transforms it into an $h$ of the specified size. For the decoder part, we have a fully connected layer whose output is unflattened into 8 channels.
Then we use a 1-d transposed convolutional layer that has 1 output channel with kernel size 3 and has both stride and padding sizes equal to 1, to generate the decoded $X'$. The LSTM classifier of this scheme has the same structure as that in FC+LSTM. We refer to this scheme as CNN+LSTM. For the third scheme, we use an LSTM-autoencoder to capture time-series information in local unsupervised learning. Both the encoder and the decoder have 1 LSTM cell. It uses a Softmax classifier that has a fully connected layer and a Softmax layer. We refer to this scheme as LSTM+FC. For the LSTM classifiers in our experiments, we adopted a bagging (i.e., bootstrap aggregating) strategy similar to that of Guan and Plötz (Guan and Plötz, 2017) to train our models with random batch sizes and sequence lengths. In all schemes, we used the mean square error (MSE) loss function for autoencoders and the cross-entropy loss function for classifiers. We used the Adam optimiser in the training of all models. All the deep learning components in our simulations were implemented using PyTorch libraries (Paszke et al., 2019).
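As an illustration, the convolutional autoencoder described above could be implemented in PyTorch roughly as follows. This is a sketch under our own assumptions for details the text does not state (e.g., a single input channel and the exact layer ordering); it is not the authors' code.

```python
import torch
import torch.nn as nn

class CNNAutoencoder(nn.Module):
    """Sketch of the described CNN autoencoder: Conv1d(8 channels, kernel 3,
    stride 1, padding 1) -> BatchNorm -> ReLU -> flatten -> FC to h, and a
    mirrored decoder producing the reconstruction X'."""
    def __init__(self, n_features, compression_ratio=0.5):
        super().__init__()
        h_dim = round(n_features * compression_ratio)  # size of the representation h
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=3, stride=1, padding=1),  # 8 output channels
            nn.BatchNorm1d(8),
            nn.ReLU(),
            nn.Flatten(),                        # 8 * n_features values
            nn.Linear(8 * n_features, h_dim),    # encode into h
        )
        self.decoder = nn.Sequential(
            nn.Linear(h_dim, 8 * n_features),
            nn.Unflatten(1, (8, n_features)),    # back to 8 channels
            nn.ConvTranspose1d(8, 1, kernel_size=3, stride=1, padding=1),  # decoded X'
        )

    def forward(self, x):                        # x: (batch, 1, n_features)
        h = self.encoder(x)
        return self.decoder(h), h
```

With stride 1 and padding 1, both convolutions preserve the feature length, so the reconstruction has the same shape as the input.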
Label ratio and compression ratio

We adjusted two parameters to control the amount of labelled data and the size of representations. For an original training dataset that has $N^l$ time-series samples with labels, we adjusted the label ratio $r^l\in(0,1)$ and took $r^l\cdot N^l$ samples from it as the labelled training dataset on the server. Since the samples form time-series sequences, to avoid breaking up activities by directly taking random samples from the sequences, we first divided the entire training set into 100 divisions. We then randomly sampled $100\cdot r^l$ divisions and concatenated them as the labelled training dataset on the server. For a training dataset whose observations have $N^f$ features, we adjusted the compression ratio $r^f\in(0,1)$ and used the rounded value of $r^f\cdot N^f$ as the size of the representation when training autoencoders.
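The division-based sampling of labelled data can be sketched as follows; this is a simplified illustration, and the function and parameter names are ours.

```python
import random

def sample_labelled_subset(samples, label_ratio, n_divisions=100, seed=0):
    """Split the labelled training set into contiguous divisions and randomly
    pick 100*label_ratio of them, preserving activity continuity within
    each division rather than sampling individual points."""
    div_len = len(samples) // n_divisions
    divisions = [samples[i * div_len:(i + 1) * div_len] for i in range(n_divisions)]
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(n_divisions), int(n_divisions * label_ratio)))
    subset = []
    for i in chosen:
        subset.extend(divisions[i])   # concatenate the sampled divisions
    return subset
```

For example, with $r^l=1/4$ this keeps 25 of the 100 contiguous divisions.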
IID and Non-IID local data

We used two strategies to generate local training data for clients from the training datasets. In both strategies, the number of samples allocated to each client, $n_k$, equals $\frac{n^o}{n^p}$, where $n^o$ is the number of samples in the original training dataset shown in the datasets table and $n^p$ is the number of participants (e.g., 4 for the Opp dataset) in the original dataset. To generate IID local data, we divided the training dataset into 100 divisions. For a client to be allocated $n_k$ samples, its local training data are evenly distributed across these divisions. In each division, a time window that contains $\frac{n_k}{100}$ continuous samples is randomly selected, without their labels, as the client's sample fragment in this division. The sample fragments from all divisions are then concatenated as the local training dataset for the client in the IID scenario. To generate Non-IID local data, we randomly located a time window of length $n_k$ in the training dataset and used the samples, without labels, in that time window as the local training dataset for the client in the Non-IID scenario. By this means, the local training dataset of each client can only represent the distribution within a single part of the unlabelled dataset.
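The two allocation strategies can be sketched as below; again a simplified illustration with our own names, operating on a flat list of unlabelled samples.

```python
import random

def iid_partition(data, n_k, n_divisions=100, seed=0):
    """IID: take a random contiguous fragment of n_k/100 samples from each of
    the 100 divisions, so the local data span the whole training distribution."""
    rng = random.Random(seed)
    div_len = len(data) // n_divisions
    frag_len = n_k // n_divisions
    local = []
    for d in range(n_divisions):
        start = d * div_len + rng.randrange(div_len - frag_len + 1)
        local.extend(data[start:start + frag_len])
    return local

def non_iid_partition(data, n_k, seed=0):
    """Non-IID: one random contiguous window of n_k samples, so the local data
    reflect only a single part of the distribution."""
    rng = random.Random(seed)
    start = rng.randrange(len(data) - n_k + 1)
    return data[start:start + n_k]
```

The IID client thus sees fragments from every division, while the Non-IID client sees a single contiguous slice.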
Edge device setup

Apart from simulations, to answer Q5, we evaluated the local activity recognition part of our system on a Raspberry Pi 4 Model B. The specifications of the device are shown in Table 2.

Table 2. System specifications of Raspberry Pi 4 Model B.
CPU: Quad core Cortex-A72 (ARM v8) 64-bit SoC @ 1.5GHz
RAM: 4GB LPDDR4-3200 SDRAM
Storage: SanDisk Ultra 32GB microSDHC Memory Card
OS: Ubuntu Server 19.10

Compared with supervised FL, on the one hand, our system introduces local autoencoders that encode samples into representations before feeding them into classifiers, which costs additional processing time. On the other hand, encoded representations are smaller than the original samples, which reduces the processing time of classifiers. To understand how these two factors affect the overall local processing time, we tested both supervised FL and our system on the Raspberry Pi and compared their performance. We divided the testing datasets into one-second-long sequences and measured the overall processing time of the trained models (i.e., autoencoders + classifiers) on each sequence, in order to calculate the overhead for each one-second time window.
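The timing measurement can be sketched as follows; a minimal illustration in which `pipeline` is a list of callable model stages (for our system, a hypothetical autoencoder followed by a classifier; for supervised FL, the classifier alone).

```python
import time

def window_processing_time(pipeline, sequences):
    """Average wall-clock processing time per one-second sequence for a
    local inference pipeline, i.e., the overhead per time window."""
    total = 0.0
    for seq in sequences:
        start = time.perf_counter()
        out = seq
        for stage in pipeline:            # apply each model stage in order
            out = stage(out)
        total += time.perf_counter() - start
    return total / len(sequences)         # seconds of overhead per window
```

A pipeline is viable for real-time recognition when this overhead stays well below the one-second window length.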
Metrics

We evaluated the performance of the global autoencoder and the classifier on the testing datasets at the end of every other communication round. We first used a time window to select 5000 samples each time. As the sampling frequency in the processed datasets is approximately $33Hz$, this time window represents activities in about 2.53 minutes. We then applied the global autoencoder to the samples in the time window to encode them into a sequence of labelled representations. The classifier was applied to the sequence of representations to recognise the activities, which were then compared with the ground truth labels. We calculate the accuracy in the time window, which is the fraction of correctly classified representations among all representations. The accuracies from different time windows are averaged as the accuracy of the system. In every other communication round $t$, we calculate the average value over 64 simulation replicates and its standard error.

Results

We find that our proposed semi-supervised FL system has higher accuracy than the centralised system that only conducts supervised learning on the server. The accuracy is also higher than that of data augmentation based semi-supervised FL, and is comparable to that of supervised FL, which requires more labelled data and bigger local models. In addition, it incurs marginal local activity recognition time on a low-cost edge device.
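The per-window accuracy described in the Metrics subsection can be computed as in this sketch (the names are ours; predictions and labels are aligned per-representation sequences).

```python
def windowed_accuracy(predictions, labels, window=5000):
    """Accuracy per 5000-sample time window (fraction of correctly classified
    representations), averaged over all complete windows."""
    accs = []
    for i in range(0, len(labels) - window + 1, window):
        correct = sum(p == y for p, y in zip(predictions[i:i + window],
                                             labels[i:i + window]))
        accs.append(correct / window)
    return sum(accs) / len(accs)
```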
Analysis of autoencoders and classifiers

We first look at the contribution of the autoencoders in our proposed system. As in our assumption, the server has some labelled data that can be used for supervised learning. If the server has enough labelled data to train a decent model, its accuracy may be higher than that of semi-supervised FL. Thus the centralised system (CS) is a natural baseline that our system needs to surpass.

We adjust the label ratio $r^l$ from $1/2$ to $1/32$ in ablation studies for each scheme and examine whether our system achieves higher accuracy than CS. We find that the FC+LSTM scheme (i.e., using simple autoencoders and LSTM classifiers) has lower accuracy than the CS baseline under all circumstances. We therefore remove it from our analysis. For the other schemes, we keep the two $r^l$ values that lead to the two highest accuracies above that of the CS baseline on each dataset. Thus on the Opp dataset, our schemes perform better than the CS baseline when $r^l\in\{1/16, 1/32\}$. On the DG and PAMAP2 datasets, we have $r^l\in\{1/2, 1/4\}$. We tested our schemes on both IID and Non-IID data but did not find significant differences in accuracy, because our schemes do not use any labels locally. Thus we only show the results on IID data. The accuracy of all schemes converges after 50 communication rounds, so we only show the results during this period.
Ablation study of CNN autoencoder

Fig. 9 shows the accuracy of the CNN+LSTM scheme and the CS baseline, with $r^f=1/2$, on different datasets. As the number of communication rounds increases, the accuracy of all schemes rises and converges. The converged accuracy of the CNN+LSTM scheme (i.e., using a convolutional autoencoder to learn representations locally and an LSTM classifier for supervised learning in the cloud) is higher than that of the CS scheme, which only conducts supervised learning in the cloud. This means that training CNN autoencoders locally indeed contributes to improving the accuracy of the system. When $r^l$ decreases, the converged accuracy of CNN+LSTM goes down on all datasets, which means that it is sensitive to the change of label ratios.

Figure 9. Test accuracy of CNN autoencoders with LSTM classifiers (CNN+LSTM), and a centralised system (CS) using LSTM classifiers without autoencoders. $r^f=1/2$ for both schemes. CNN+LSTM has higher converged accuracy than CS, which means that unsupervised learning on CNN autoencoders helps improve the performance.
Ablation study of LSTM autoencoder

Fig. 10 shows the accuracy of the LSTM+FC scheme and the CS baseline, with $r^f=1/2$. It demonstrates similar trends to Fig. 9. The accuracy of LSTM+FC (i.e., using LSTM autoencoders locally and Softmax classifiers for supervised learning in the cloud) is higher than that of CS, which runs centralised supervised learning without using unlabelled local data. However, LSTM+FC is less sensitive to the change of label ratios. For example, its converged accuracy on the Opp dataset is almost the same when we change $r^l$ from $1/16$ to $1/32$. This would enable us to achieve similar performance while requiring fewer labelled data compared to CNN+LSTM.

Figure 10. Test accuracy of LSTM autoencoders with Softmax classifiers (LSTM+FC), and a centralised system (CS) using LSTM classifiers without autoencoders. $r^f=1/2$ for both schemes. LSTM+FC has higher converged accuracy than CS. It is less sensitive to the change of $r^l$ than CNN+LSTM on the Opp and DG datasets.
The experimental results show that, when implementing a semi-supervised FL system for HAR, both CNN autoencoders and LSTM autoencoders can improve the accuracy of the system. Using LSTM autoencoders is less sensitive to the amount of labelled data available in the cloud. In the rest of the analysis of our results, we only show the accuracy of the LSTM+FC scheme.

5.2 Comparison with different FL schemes

We now analyse the performance of our system in comparison with semi-supervised FL that uses data augmentation (DA) to generate pseudo labels, and with supervised FL that has labelled data available on clients.

Comparison with DA

Fig. 11 shows the accuracy of both LSTM+FC and DA on three datasets. On the Opp and DG datasets, the accuracy of DA increases more slowly than that of LSTM+FC. But once the accuracy of both schemes converges, they do not show significant differences. On the PAMAP2 dataset, the converged accuracy of LSTM+FC is higher than that of DA. We also find that, although the accuracy of DA on the Opp and DG datasets is higher than that of CS in Fig. 10, its accuracy on the PAMAP2 dataset in Fig. 11 converges more slowly than CS does in Fig. 10. This indicates that using the received global LSTM model to generate pseudo labels and then training the model on these pseudo labels may damage the testing accuracy. Although we used time-series sequences with randomised lengths to generate pseudo labels in our experiments, DA may still risk overfitting the model to the training data and consequently be slower to achieve decent accuracy on testing data.
Figure 11. Test accuracy of LSTM autoencoders with Softmax classifiers (LSTM+FC), and semi-supervised FL using data augmentation (DA). There is no significant difference between their converged accuracy on the Opp and DG datasets. LSTM+FC has higher converged accuracy than DA on the PAMAP2 dataset.

Our results indicate that, for semi-supervised FL, using locally trained autoencoders can achieve higher converged accuracy than using data augmentation to generate pseudo labels. In addition, compared with DA, our scheme is independent of the specific tasks provided by the server. For example, if one client uses its unlabelled data to access multiple FL servers that conduct different tasks, with data augmentation the client has to generate pseudo labels for the model of each of these tasks. In our scheme, the client only conducts unsupervised learning locally using unlabelled data to learn general representations, which is independent of the labels in the cloud.

Comparison with supervised FL

Fig. 12 shows the accuracy of LSTM+FC with $r^f=1/2$ and a supervised FL scheme. The supervised FL uses all the information (i.e., 100% of features and 100% of labels) in the training datasets. Therefore it has higher accuracy than LSTM+FC.
However, our scheme enables a trade-off between the performance of the system (i.e., accuracy), the cost of data annotation (i.e., label ratio), and the size of models (i.e., compression ratio). For example, using a larger compression ratio $r^f=3/4$ on the PAMAP2 dataset can lead to a higher accuracy (shown in Fig. 13) that is comparable to that of the supervised FL.

Figure 12. Test accuracy of LSTM autoencoders with Softmax classifiers (LSTM+FC, $r^f=1/2$), and supervised FL using 100% of features and labels. The converged accuracy of LSTM+FC is comparable to that of the supervised FL while requiring fewer labelled data.

The experimental results suggest that we can implement FL systems in a semi-supervised fashion with fewer labels than supervised FL needs, while achieving comparable accuracy. Although one of the motivations of FL is to hold models instead of personal data in the cloud to address potential privacy issues, the data held by the server of our system do not have to come from the users of the system's service. This kind of dataset in the cloud has been used in FL to address other challenges, such as dealing with Non-IID data by creating a small globally shared dataset, and does not necessarily contain private information. We believe that service providers can collect these data from open datasets, or from laboratory trials in controlled environments where data subjects give their consent to contribute their data.

5.3 Analysis of compression ratio

We also investigate how the compression ratio $r^f$ of autoencoders affects the accuracy of our system. It is an important factor that affects the number of parameters and the size of local models.
These local models are regularly uploaded from the clients to the server over the network, hence their sizes affect the outbound traffic. We use $r^f\in\{3/4, 1/2, 1/4, 1/8\}$ on all datasets. We keep $r^l=1/16$ for the Opp dataset and $r^l=1/2$ for both the DG and PAMAP2 datasets. Fig. 13 demonstrates that, on the Opp and DG datasets, our system can compress an original sample into a representation whose size is only $1/4$ of the original sample without significantly affecting the accuracy. On the PAMAP2 dataset, increasing $r^f$ from $1/2$ to $3/4$ can lead to accuracy that is comparable to that of the supervised FL scheme.

Figure 13. Test accuracy with different compression ratios $r^f\in\{3/4, 1/2, 1/4, 1/8\}$. $r^l=1/16$ for Opp and $r^l=1/2$ for both DG and PAMAP2. The accuracy on Opp and DG is not affected much when changing $r^f$ from 3/4 to 1/4. PAMAP2 is more sensitive to the change of $r^f$ than the other two datasets.

Changing the compression ratio in our system allows us to trade accuracy for model size, or vice versa. When the data used are not sensitive to the compression ratio (e.g., Opp and DG), compressing samples into smaller representations may significantly reduce the size of the local models that are uploaded from the clients to the server, which may lead to lower network traffic.

5.4 Running time at the edge

We evaluated local activity recognition using both supervised FL and our system (LSTM+FC) with $r^f=0.5$ on a Raspberry Pi.
As shown in Fig. 14, the processing time of our system is significantly lower ($p<0.001$) than that of supervised FL on all datasets. Although the autoencoder in our system inevitably incurs extra processing time, as it lengthens the local pipeline, the LSTM cell in the autoencoder encodes the input data into smaller representations. In contrast, the LSTM classifier in the supervised FL transforms input data into hidden states that have larger sizes. This reduction in the amount of data leads to a shorter overall processing time than that of the supervised FL.

Combined with the results of Sec. 5.3, our experimental results show that running unsupervised learning on autoencoders can reduce both the size of local models and the size of data processed by classifiers. This can potentially improve not only the outbound network traffic but also the efficiency of local activity recognition.

Discussion

Our experimental results show that HAR with semi-supervised FL can achieve accuracy comparable to that of supervised FL. We now discuss how these results can inform the design of FL systems and suggest possible research topics.

FL servers can do more than FedAvg

In canonical FL systems, servers only hold global models and use the FedAvg algorithm to aggregate received local models into new global models. This design consideration is due to the privacy concerns of holding personal data on the servers. Our findings suggest that running supervised learning with a small amount of labelled data on the servers can relieve individual users of labelling their local data. Therefore, we suggest that FL systems may consider maintaining datasets that do not contain private information on their servers to support semi-supervised learning. Apart from running the FedAvg algorithm in every communication round, servers can conduct more epochs of supervised learning than individual clients can, since they have more computational resources and fewer power constraints than clients do. This can help the performance of the models converge faster.

Learning useful representations, not bias

By training autoencoders locally, semi-supervised FL is not affected by Non-IID data because it does not use any labels locally. This sheds light on a new solution to the Non-IID data issue in FL, different from data augmentation or limiting individual contributions from clients. Although our work focuses on semi-supervised FL where no labels are available on clients, we suggest that supervised FL can also consider learning general representations apart from the mappings from features to labels, and use the learned representations to help alleviate the bias caused by Non-IID data. Another possible application of semi-supervised FL is to defend against malicious users who attack the global model through data poisoning. Labels of local data are a common attack vector in FL. Adversaries can manipulate (e.g., flip) the labels in their local data to degrade the performance of their local models, thereby affecting the performance of the global model. Such an attack vector is removed if local labels are not used. We suggest that security researchers in FL consider semi-supervised FL as a possible scheme to defend against data poisoning attacks.

Smaller models via unsupervised learning

In supervised FL, as the complexity of an ML task goes up, the size of the model for the task increases. This eventually leads to increasing numbers of parameters and increasing network traffic when uploading local models to the server. Semi-supervised FL only uploads trained autoencoders from clients to the server. Our experimental results suggest that the performance of semi-supervised FL can still converge to an acceptable level even if we use high compression rates, which significantly reduce the number of model parameters. The compression rate can be considered a system parameter that can be tuned to reduce the size of models, as long as the key representations can be learned and the performance of the system can be guaranteed. This makes our system suitable for scenarios with demanding network conditions. Although this paper focuses only on the application of HAR, such effects may exist in other applications with different types of data, which should be further investigated.

Conclusions

HAR using IoT sensory data and FL systems can empower many real-world applications, including monitoring the daily activities, and the changes to these activities, of people living with long-term conditions. The difficulty of obtaining labelled data from end users limits the scalability of FL applications for HAR in real-world and uncontrolled environments. In this paper, we propose a semi-supervised FL system to enable HAR in IoT environments. By training LSTM autoencoders through unsupervised learning on FL clients, and training Softmax classifiers through supervised learning on an FL server, our system achieves higher accuracy than both centralised systems and data augmentation based semi-supervised FL. The accuracy is also comparable to that of a supervised FL system, without requiring any locally labelled data. In addition, it has simpler local models with smaller sizes and faster processing speeds. Our future research plans are to investigate fully unsupervised FL systems that can support anomaly detection by analysing the difference between local and global models. We believe that such systems will enable many useful real-time applications in HAR where labels are rare or extremely difficult to collect.