[Paper translation] Semi-supervised Federated Learning for Activity Recognition


Semi-supervised Federated Learning for Activity Recognition



Training deep learning models on in-home IoT sensory data is commonly used to recognise human activities. Recently, federated learning systems that use edge devices as clients to support local human activity recognition have emerged as a new paradigm to combine local (individual-level) and global (group-level) models. This approach provides better scalability and generalisability and also offers better privacy compared with the traditional centralised analysis and learning models. The assumption behind federated learning, however, relies on supervised learning on clients. This requires a large volume of labelled data, which is difficult to collect in uncontrolled IoT environments such as remote in-home monitoring. In this paper, we propose an activity recognition system that uses semi-supervised federated learning, wherein clients conduct unsupervised learning on autoencoders with unlabelled local data to learn general representations, and a cloud server conducts supervised learning on an activity classifier with labelled data. Our experimental results show that using a long short-term memory autoencoder and a Softmax classifier, the accuracy of our proposed system is higher than that of both centralised systems and semi-supervised federated learning using data augmentation. The accuracy is also comparable to that of supervised federated learning systems. Meanwhile, we demonstrate that our system can reduce the number of needed labels and the size of local models, and has faster local activity recognition speed than supervised federated learning does.





Introduction

Modern smart homes are integrating more and more Internet of Things (IoT) technologies in different application scenarios. The IoT devices can collect a variety of time-series data, including ambient data such as occupancy, temperature, and brightness, and physiological data such as weight and blood pressure. With the help of machine learning (ML) algorithms, these sensory data can be used to recognise people’s activities at home. Human activity recognition (HAR) using IoT data promises to significantly improve quality of life for people who require in-home care and support. For example, anomaly detection based on recognised activities can raise alerts when an individual’s health deteriorates. The alerts can then be used for early interventions (Enshaeifar et al., 2018; Cao et al., 2015; Queralta et al., 2019). Analysis of long-term activities can help identify behaviour changes, which can be used to support clinical decisions and healthcare plans (Enshaeifar et al., 2018).

An architecture for HAR is to deploy devices with computational resources at the edge of networks, which is normally within people’s homes. Such “edge devices” are capable of communicating with sensory devices to collect and aggregate the sensory data, and running ML algorithms to process the in-home activity and movement data. With the help of a cloud back-end, these edge devices can form a federated learning (FL) system (McMahan et al., 2017; Yang et al., 2019; Zhao et al., 2020), which is increasingly used as a new system to learn at population-level while constructing personalised edge models for HAR. In an FL system, clients jointly train a global Deep Neural Network (DNN) model by sharing their local models with a cloud back-end. This design enables clients to use their data to contribute to the training of the model without breaching privacy. One of the assumptions behind using the canonical FL system for HAR is that data on clients are labelled with corresponding activities so that the clients can use these data to train supervised local DNN models. In HAR using IoT data, due to the large amount of time-series data that are continuously generated from different sensors, it is difficult to guarantee that end-users are capable of labelling activity data at a large scale. Thus, the availability of labelled data on clients is one of the challenges that impede the adoption of FL systems in real-world HAR applications.

Existing solutions for utilising unlabelled data in FL systems rely on data augmentation (Jeong et al., 2020; Liu et al., 2020; Zhang et al., 2020). The server of an FL system keeps some labelled data and uses them to train a global model through supervised learning. The clients of the system receive the global model and use it to generate pseudo labels on augmented local data. However, this approach couples the local training on clients with the specific task (i.e., labels) from the server. If a client accesses multiple FL servers for different tasks, it has to generate pseudo labels for each of them locally, which increases the cost of local training.


In centralised ML, unsupervised learning on DNN such as autoencoders (Baldi, 2011) has been widely used to learn general representations from unlabelled data. The learned representations can then be utilised to facilitate supervised learning models with labelled data. A recent study by van Berlo et al. (van Berlo et al., 2020) shows that temporal convolutional networks can be used as autoencoders to learn representations on clients of an FL system. The representations can help with training of the global supervised model of an FL system. The resulting model’s performance is comparable to that of a fully supervised algorithm. Building upon this promising result, we propose a semi-supervised FL system that realises activity recognition using time-series data at the edge, without labelled IoT sensory data on clients, and evaluate how different factors (e.g., choices of models, the number of labels, and the size of representations) affect the performance (e.g., accuracy and inference time) of the system.

In our proposed design, clients locally train autoencoders with unlabelled time-series sensory data to learn representations. These local autoencoders are then sent to a cloud server that aggregates them into a global autoencoder. The server integrates the resulting global autoencoder into the pipeline of the supervised learning process. It uses the encoder component of the global autoencoder to transform a labelled dataset into labelled representations, with which a classifier can be trained. Such a labelled dataset on the cloud back-end server can be provided by service providers without necessarily using any personal data from users (e.g., open data or data collected from laboratory trials with consents). Whenever the server selects a number of clients, both the global autoencoder and the global classifier are sent to the clients to support local activity recognition.

We evaluated our system through simulations on different HAR datasets, with different system component designs and data generation strategies. We also tested the local activity recognition part of our system on a Raspberry Pi 4 model B, which is a low-cost edge device. With the focus on HAR using time-series sensory data, we are interested in answering the research questions as follows:

  • Q1. How does semi-supervised FL using autoencoders perform in comparison to supervised learning on a centralised server?
  • Q2. How does semi-supervised FL using autoencoders perform in comparison to semi-supervised FL using data augmentation?
  • Q3. How does semi-supervised FL using autoencoders perform in comparison to supervised FL?
  • Q4. How do the key parameters of semi-supervised FL, including the number of labels on the server and the size of learned representations, affect its performance?
  • Q5. How efficient is semi-supervised FL on low-cost edge devices?

Our experimental results demonstrate several key findings:

  • Using long short-term memory autoencoders as local models and a Softmax classifier model as a global classifier, the accuracy of our system is higher than that of a centralised system that only conducts supervised learning in the cloud, which means that learning general representations locally improves the performance of the system.
  • Our system also has higher accuracy than semi-supervised FL using data augmentation to generate pseudo labels does.
  • Our system can achieve comparable accuracy to that of a supervised FL system.
  • By only conducting supervised learning in the cloud, our system can significantly reduce the needed number of labels without losing much accuracy.
  • By using autoencoders, our system can reduce the size of local models. This can potentially contribute to the reduction of upload traffic from the clients to the server.
  • The processing time of our system when recognising activities on a low-cost edge device is acceptable for real-time applications and is significantly lower than that of supervised FL.


As one of the key applications of IoT that can significantly improve the quality of people's lives, HAR has attracted an enormous amount of research. Many HAR systems have been proposed for deployment at the edge of networks, thanks to the ever-growing computational power of different types of edge devices.


HAR at the edge

In comparison to having both data and algorithms in the cloud, edge computing (Shi et al., 2016) instead deploys devices closer to end users of services, which means that data generated by the users and computation on these data can stay on the devices locally. Modern edge devices such as Intel Next Unit of Computing (NUC) and Raspberry Pi are capable of running DNN models (Servia-Rodriguez et al., 2018; Chen and Ran, 2019) and providing real-time activity recognition (Liu et al., 2018; Cartas et al., 2019) from videos. Many deep learning models such as long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997; Guan and Plötz, 2017; Hammerla et al., 2016) or convolutional neural network (CNN) (Hammerla et al., 2016) can be applied at the edge for HAR. For example, Zhang et al. (Zhang et al., 2018) proposed an HAR system that utilised both edge computing and back-end cloud computing. One implementation of this kind of HAR edge system was proposed by Cao et al. (Cao et al., 2015), which implemented fall detection both at the edge and in the cloud. Their results show that their system has lower response latency than that of a cloud-based system. Queralta et al. (Queralta et al., 2019) also proposed a fall detection system that achieved over 90% precision and recall. Uddin (Uddin, 2019) proposed a system that used more diverse body sensory data including electrocardiography (ECG), magnetometer, accelerometer, and gyroscope readings for activity recognition.

These HAR systems, however, send the personal data of their users to a back-end cloud server to train deep learning models, which poses great privacy threats to the data subjects. Servia-Rodríguez et al. (Servia-Rodriguez et al., 2018) proposed a system in which a small group of users voluntarily share their data with the cloud to train a model. Other users in the system can download this model for local training, which protects the privacy of the majority in the system but does not utilise the well-trained local models from different users to improve the performance of each other's models. To improve the utility of local models and protect privacy at the same time, we apply federated learning (McMahan et al., 2017) to HAR at the edge, which can train a global deep learning model with constant contributions from users but does not require the users to send their personal data to the cloud.

HAR with federated learning

Federated learning (FL) (McMahan et al., 2017; Yang et al., 2019) was proposed as an alternative to traditional cloud based deep learning systems. It uses a cloud server to coordinate different clients to collaboratively train a global model. The server periodically sends the global model to a selection of clients that use their local data to update the global model. The resulting local models from the clients will be sent back to the server and be aggregated into a new global model. By this means, the global model is constantly updated using users’ personal data, without having these data in the server. Since FL was proposed, it has been widely adopted in many applications (Li et al., 2020; Yu et al., 2020) including HAR. Sozinov et al. (Sozinov et al., 2018) proposed an FL based HAR system and they demonstrated that its performance is comparable to that of its centralised counterpart, which suffers from privacy issues. Zhao et al. (Zhao et al., 2020) proposed an FL based HAR system for activity and health monitoring. Their experimental results show that, apart from acceptable accuracy, the inference time of such a system on low-cost edge devices such as Raspberry Pi is marginal. Feng et al. (Feng et al., 2020) introduced locally personalised models in FL based HAR systems to further improve the accuracy for mobility prediction. Specifically, HAR applications that need both utility and privacy guarantees such as smart healthcare can benefit from the accurate recognition and the default privacy by design of FL. For example, the system recently proposed by Chen et al. (Chen et al., 2020) applied FL to wearable healthcare, with a specific focus on the auxiliary diagnosis of Parkinson’s disease.

Existing HAR systems with canonical FL use supervised learning, which relies on the assumption that all local data on clients are properly labelled with activities. This assumption is difficult to satisfy in IoT scenarios that use sensory data. Compared to the existing FL based HAR systems, we aim to address this issue by utilising semi-supervised machine learning, which does not need locally labelled data.



Semi-supervised federated learning

Semi-supervised learning combines both supervised learning that requires labelled data and unsupervised learning that does not use labels when training DNN models. Traditional centralised ML has benefited from semi-supervised learning techniques such as transfer learning (Khan and Roy, 2018; Zhang and Ardakanian, 2019) and autoencoders (Baldi, 2011). These techniques have been widely used in centralised ML such as learning time-series representations from videos (Srivastava et al., 2015), learning representations to compress local data (Hu and Krishnamachari, 2020), and learning representations that do not contain sensitive information (Malekzadeh et al., 2018).

The challenge of having available local labels in FL has motivated a number of systems that aim to realise FL in a semi-supervised or self-supervised fashion. The majority of the existing solutions in this area focuses on generating pseudo labels for unlabelled data and using these labels to conduct supervised learning (Jeong et al., 2020; Liu et al., 2020; Zhang et al., 2020; Long et al., 2020; Zhang et al., 2021; Kang et al., 2020; Wang et al., 2020; Yang et al., 2020). For example, Jeong et al. (Jeong et al., 2020) use data augmentation to generate fake labels and keep the consistency of the labels across different FL clients. However, the inter-client consistency requires some clients to share their data with others, which poses privacy issues. Liu et al. (Liu et al., 2020) use labelled data on an FL server to train a model through supervised learning and then send this model to FL clients to generate labels on their local data. These solutions couple the local training on clients with the specific task from the server, which means that a client has to generate pseudo labels for all the servers that have different tasks.

Another direction of semi-supervised FL is to conduct unsupervised learning on autoencoders locally on clients instead of generating pseudo labels. Compared with existing solutions, the trained autoencoders learn general representations from data, which are independent from specific tasks. Preliminary results from the work by van Berlo et al. (van Berlo et al., 2020) show promising potential of using autoencoders to implement semi-supervised FL. Compared to their work, we evaluate different local models (i.e., autoencoders, convolutional autoencoders, and LSTM autoencoders), investigate different design considerations, and test how efficient its local activity recognition is when running on low-cost edge devices.


Our goal is to implement HAR using an FL system, without having any labelled data on the edge clients. We first introduce the long short-term memory model (Hochreiter and Schmidhuber, 1997), which is a technique for analysing time-series data for HAR. We then introduce autoencoders, which are the key technique for deep unsupervised learning. We finally present the design of our proposed semi-supervised FL system and describe how unsupervised and supervised learning models are used in our framework.

Long short-term memory

The long short-term memory (LSTM) belongs to recurrent neural network (RNN) models, which are a class of DNN that processes sequences of data points such as time-series data. At each time point of the time series, the output of an RNN, which is referred to as the hidden state, is fed to the network together with the next data point in the time-series sequence. An RNN works in a way that, as time proceeds, it recurrently takes and processes the current input and the previous output (i.e., the hidden state), and generates a new output for the current time. Specifically for LSTM, Fig. 1 shows the network structure of a basic LSTM unit, which is called an LSTM cell. At each time $ t $ , it takes three input variables, which are the current observed data point $ X_{t} $ , the previous state of the cell $ C_{t-1} $ , and the previous hidden state $ h_{t-1} $ . For the case of applying LSTM to HAR, $ X_{t} $ is a vector of all the observed sensory readings at time $ t $ . $ h_{t} $ is the hidden state of the activity to be recognised in question.

Figure 1. Network structure of a long short-term memory (LSTM) cell. At each time point $ t $ , the current cell state $ C_{t} $ and hidden state $ h_{t} $ depend on the previous cell state $ C_{t-1} $ , the previous hidden state $ h_{t-1} $ , and the current observed data point $ X_{t} $ .
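The cell update described above can be made concrete with a small sketch. The text does not spell out the gate equations, so the code below assumes the commonly used formulation with forget, input, and output gates; `make_params` is a hypothetical helper that fills every weight with one constant, and the dimensions are illustrative only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W @ v + b, with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_i for row, b_i in zip(W, b)]

def make_params(n_in, n_hidden, value=0.0):
    """Hypothetical helper: set every weight and bias to one constant."""
    params = {}
    for gate in ("f", "i", "o", "c"):
        params["W" + gate] = [[value] * (n_in + n_hidden) for _ in range(n_hidden)]
        params["b" + gate] = [value] * n_hidden
    return params

def lstm_step(x_t, h_prev, c_prev, params):
    """One cell update: (X_t, h_{t-1}, C_{t-1}) -> (h_t, C_t)."""
    z = list(x_t) + list(h_prev)  # concatenate input and previous hidden state
    f = [sigmoid(a) for a in affine(params["Wf"], z, params["bf"])]    # forget gate
    i = [sigmoid(a) for a in affine(params["Wi"], z, params["bi"])]    # input gate
    o = [sigmoid(a) for a in affine(params["Wo"], z, params["bo"])]    # output gate
    g = [math.tanh(a) for a in affine(params["Wc"], z, params["bc"])]  # candidate state
    c_t = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h_t = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_t)]
    return h_t, c_t

# Feed a short sequence of two-feature readings through the cell.
params = make_params(n_in=2, n_hidden=1)
h, c = [0.0], [1.0]
for x_t in ([1.0, 2.0], [0.5, 0.1]):
    h, c = lstm_step(x_t, h, c, params)
```

In practice the weights would be learned rather than set to a constant; the loop only illustrates how the hidden and cell states are carried from one time point to the next.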

LSTM can be used in both supervised and unsupervised learning. For supervised learning, each $ X_{t} $ of a time-series sequence has a corresponding label $ Y_{t} $ (i.e., the activity class at time point $ t $ ) as the ground truth. The hidden state $ h_{t} $ can be fed into a Softmax classifier that contains a fully-connected layer and a Softmax layer. By this means, such an LSTM classifier can be trained against the labelled activities through feedforward and backpropagation to minimise the loss (e.g., cross-entropy loss) between the classifications and the ground truth. For unsupervised learning, LSTM can be trained as a component of an autoencoder, which we describe in detail below.
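As a rough illustration of the supervised path, the sketch below feeds a hidden state through a fully-connected layer and a Softmax layer and computes the cross-entropy loss against an integer class label. The weight values, the function names, and the two-class setup are assumptions made for this example.

```python
import math

def softmax(logits):
    """Numerically stable Softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(h_t, W, b):
    """Fully-connected layer followed by a Softmax layer."""
    logits = [sum(w * h for w, h in zip(row, h_t)) + b_i for row, b_i in zip(W, b)]
    return softmax(logits)

def cross_entropy(probs, label):
    """Loss for one sample whose ground-truth class index is `label`."""
    return -math.log(probs[label])

# Hypothetical hidden state h_t and weights for two activity classes.
h_t = [0.2, -0.1]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
probs = classify(h_t, W, b)
loss = cross_entropy(probs, label=0)
```

Training would adjust `W`, `b`, and the LSTM weights by backpropagating this loss; that step is omitted here.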


An autoencoder is a type of neural network that is used to learn latent feature representations from data. Different from supervised learning that aims to learn a function $ f(X)\rightarrow Y $ from input variables $ X $ to labels $ Y $ , an autoencoder used in unsupervised learning tries to encode $ X $ to its latent representation $ h $ and to decode $ h $ into a reconstruction of $ X $ , which is presented as $ X^\prime $ . Fig. 4 demonstrates two types of autoencoders that use different neural networks. The simple autoencoder in Fig. 4 uses fully connected layers to encode $ X $ into $ h $ and then to decode $ h $ into $ X^\prime $ . The convolutional autoencoder in Fig. 4 uses a convolutional layer that moves small kernels alongside the input $ X $ and conducts convolution operations on each part of $ X $ to encode it into $ h $ . The decoder part uses a transposed convolutional layer that moves small kernels on $ h $ to upsample it into $ X^\prime $ .


Figure 4. Network structures of a simple autoencoder and a convolutional autoencoder. The encoder part compresses the input X into a representation h that has fewer dimensions. The decoder part tries to generate a reconstruction X′ from h, which is supposed to be close to X.
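The encode/decode structure of the simple autoencoder, together with a reconstruction loss $ L(X,X^\prime) $ , can be sketched as follows. This is a minimal illustration under strong assumptions: one linear fully connected layer on each side (no activation functions), mean squared error as the loss, plain stochastic gradient descent, and toy data; none of these choices is taken from the paper.

```python
def matvec(W, v):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def reconstruction_loss(x, x_rec):
    """Mean squared error L(X, X') for one sample."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)

def sgd_step(W_enc, W_dec, x, lr):
    """One gradient step on L(X, X') for a linear autoencoder (in place)."""
    h = matvec(W_enc, x)        # encoder: X -> h
    x_rec = matvec(W_dec, h)    # decoder: h -> X'
    grad_rec = [2.0 * (r - a) / len(x) for r, a in zip(x_rec, x)]
    # Gradient w.r.t. h, computed before the decoder weights change.
    grad_h = [sum(W_dec[i][j] * grad_rec[i] for i in range(len(x)))
              for j in range(len(h))]
    for i in range(len(x)):
        for j in range(len(h)):
            W_dec[i][j] -= lr * grad_rec[i] * h[j]
    for j in range(len(h)):
        for k in range(len(x)):
            W_enc[j][k] -= lr * grad_h[j] * x[k]

def mean_loss(W_enc, W_dec, data):
    losses = [reconstruction_loss(x, matvec(W_dec, matvec(W_enc, x))) for x in data]
    return sum(losses) / len(losses)

# Toy unlabelled data: three features compressed into a one-dimensional h.
data = [[1.0, 2.0, 3.0], [0.5, 1.0, 1.5], [-1.0, -2.0, -3.0]]
W_enc = [[0.1, 0.1, 0.1]]        # 1 x 3 encoder weights
W_dec = [[0.1], [0.1], [0.1]]    # 3 x 1 decoder weights
initial_loss = mean_loss(W_enc, W_dec, data)
for _ in range(200):
    for x in data:
        sgd_step(W_enc, W_dec, x, lr=0.01)
final_loss = mean_loss(W_enc, W_dec, data)
```

Note that no labels appear anywhere in the loop: the input itself is the training target, which is what allows clients to train such a model on unlabelled data.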

Ideally, $ X^\prime $ is supposed to be as close to $ X $ as possible, based on the assumption that the key representations of $ X $ can be learned and encoded as $ h $ . As the dimensionality of $ h $ is lower than that of $ X $ , there is less information in $ h $ than in $ X $ . Thus the reconstructed $ X^\prime $ is likely to be a distorted version of $ X $ . The goal of training an autoencoder is to minimise the distortion, i.e., minimising a loss function $ L(X,X^\prime) $ , thereby producing an encoder (e.g., a fully connected hidden layer or a convolutional hidden layer) that can capture the most useful information of $ X $ in its representation $ h $ .

As mentioned above, LSTM can also be used as a component of an LSTM-autoencoder to encode time-series data. As shown in Fig. 5, an LSTM cell is used as the encoder of the autoencoder and takes a time-series sequence as its input. The final hidden state $ h_{3} $ is the representation of $ X_{3} $ in the context of the sequence $ (X_{1},X_{2},X_{3}) $ . Since the hidden state of the LSTM encoder depends on both the input observation and the previous hidden state, a representation generated in this way compresses both the features of the observation and the temporal information of the sequence. The decoder is another LSTM cell, which reconstructs the original sequence in reversed order. The goal of the LSTM-autoencoder is therefore to minimise the loss between the original sequence and the reconstructed sequence.

Figure 5. Network structure of an LSTM-autoencoder. The time-series sequence $ (X_{1},X_{2},X_{3}) $ is input into an LSTM encoder cell and the final output hidden state $ h_{3} $ (i.e., after $ X_{3} $ is input into the encoder) is the representation of $ X_{3} $ in the context of $ (X_{1},X_{2},X_{3}) $ . A sequence of the learned representation with the same length as that of the original sequence, i.e., $ (h_{3},h_{3},h_{3}) $ , is input into an LSTM decoder cell. The output sequence tries to reconstruct the original sequence in reversed order.
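The data flow in the figure (encode the sequence, repeat the final hidden state, decode, compare against the reversed input) can be sketched as below. This is only a forward pass with untrained random weights; the gate equations are the commonly used ones rather than anything stated in the text, biases are omitted, and the sizes are illustrative. The decoder's hidden size is set to the number of input features so that its outputs can be compared with the inputs directly, which is an assumption of this sketch.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_cell(n_in, n_hidden, rng):
    """Random weights for the four LSTM gates (biases omitted for brevity)."""
    return {name: [[rng.uniform(-0.5, 0.5) for _ in range(n_in + n_hidden)]
                   for _ in range(n_hidden)] for name in "fioc"}

def lstm_step(cell, x, h, c):
    z = list(x) + list(h)
    pre = {name: [sum(w * v for w, v in zip(row, z)) for row in cell[name]]
           for name in "fioc"}
    f = [sigmoid(a) for a in pre["f"]]
    i = [sigmoid(a) for a in pre["i"]]
    o = [sigmoid(a) for a in pre["o"]]
    cand = [math.tanh(a) for a in pre["c"]]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def run(cell, inputs, n_hidden):
    """Run a cell over a sequence from zero states; return all hidden states."""
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    states = []
    for x in inputs:
        h, c = lstm_step(cell, x, h, c)
        states.append(h)
    return states

def reconstruct(encoder, decoder, seq, n_code):
    n_features = len(seq[0])
    code = run(encoder, seq, n_code)[-1]                    # h_3 for a length-3 sequence
    outputs = run(decoder, [code] * len(seq), n_features)   # decoder input (h_3, h_3, h_3)
    return code, outputs

def sequence_loss(seq, outputs):
    target = list(reversed(seq))  # the decoder reconstructs in reversed order
    total = sum((a - b) ** 2 for x, y in zip(target, outputs) for a, b in zip(x, y))
    return total / (len(seq) * len(seq[0]))

rng = random.Random(0)
seq = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.4]]   # (X_1, X_2, X_3), two features each
encoder = make_cell(n_in=2, n_hidden=4, rng=rng)
decoder = make_cell(n_in=4, n_hidden=2, rng=rng)
code, outputs = reconstruct(encoder, decoder, seq, n_code=4)
loss = sequence_loss(seq, outputs)
```

Because the decoder's outputs are bounded by the `tanh` in the cell, this sketch assumes inputs scaled to a similar range; an implementation might instead add a linear output layer after the decoder.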

Since our system runs unsupervised learning locally at the edge and supervised learning in the cloud, we consider simple autoencoders, convolutional autoencoders, and LSTM-autoencoders in our proposed system, in order to understand how the location where time-series information is captured (i.e., in supervised learning or unsupervised learning) affects the performance of our system.


System design

In a canonical FL system, as in a client-server structure, a cloud server periodically sends a global model to selected clients for updating the model locally. As shown in Fig. 6, in each communication round $ t $ , a global model $ w^{g}_{t} $ is sent to three selected clients, which conduct supervised learning on $ w^{g}_{t} $ with their labelled local data. The resulting local models are then sent to the server, which uses the federated averaging (FedAvg) algorithm to aggregate these models into a new global model $ w^{g}_{t+1} $ . The server and clients repeat this procedure through multiple communication rounds between them, thereby fitting the global model to clients' local data without releasing the data to the server.

Figure 6. System structure of a canonical federated learning (FL) system with supervised learning. The server selects 3 clients and sends the global model $ w^{g}_{t} $ to them, and the clients use their labelled data to update $ w^{g}_{t} $ into their local models, which are then sent to the server to be aggregated into a new global model using the FedAvg algorithm.
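The aggregation step can be illustrated with a short sketch of federated averaging. It assumes each local model is flattened into a list of floats and that each client's contribution is weighted by its share of the samples, as in the standard FedAvg rule; the function and argument names are chosen for this example.

```python
def fed_avg(local_models, sample_counts):
    """Weighted average of local models; client k has weight n_k / n."""
    n = sum(sample_counts)
    size = len(local_models[0])
    return [
        sum(n_k / n * model[i] for model, n_k in zip(local_models, sample_counts))
        for i in range(size)
    ]

# Two clients holding 1 and 3 local samples respectively.
new_global = fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

The second client holds three times as much data, so the new global model lies three quarters of the way towards its parameters.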

In order to address the lack of labels on clients in HAR with IoT sensory data, our proposed system applies semi-supervised learning in an FL system, in which clients use unsupervised learning to train autoencoders with their unlabelled data, and a server uses supervised learning with a labelled dataset to train a classifier that maps encoded representations to activities. As shown in Fig. 7, in each communication round, the server sends a global autoencoder $ w^{a_g}_{t} $ to selected clients. In order to update $ w^{a_g}_{t} $ locally, clients run unsupervised learning on $ w^{a_g}_{t} $ with their unlabelled local data and then send the resulting local autoencoders to the server.

The server follows the standard FedAvg algorithm to generate a new global autoencoder $ w^{a_g}_{t+1} $ , which is then plugged into the pipeline of supervised learning with a labelled dataset $ D=(X,Y) $ . The server first uses the encoder part of $ w^{a_g}_{t+1} $ to encode the original features $ X $ into representations $ X^\prime $ in order to generate a labelled representation dataset $ D^\prime=(X^\prime,Y) $ . Then the server conducts supervised learning with $ D^\prime $ to update a classifier $ w^{s}_{t} $ into $ w^{s}_{t+1} $ . Fig. 8 shows the detailed semi-supervised algorithm of our system.


In each communication round $ t $ , the resulting classifier $ w^{s}_{t} $ is also sent to selected clients with the global autoencoder $ w^{a_g}_{t} $ . In order to locally recognise activities from its observations $ X $ , a client first uses the encoder part of $ w^{a_g}_{t} $ to transform $ X $ into its representation $ X^\prime $ , and then feeds $ X^\prime $ into the classifier $ w^{s}_{t} $ to recognise the corresponding activities.

Figure 8. Algorithm of semi-supervised FL. $ n_{k} $ and $ n $ are the numbers of unlabelled samples on client $ k $ and on all selected clients, respectively. LocalTraining is unsupervised learning on the global autoencoder $ w^{a_g}_{t} $ on a client. CloudTraining is supervised learning on the classifier $ w^{s}_{t} $ on the server.
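One communication round of the procedure described above might be organised as in the sketch below. It is not the paper's algorithm listing: the callables passed in play the roles of LocalTraining, the encoder, and CloudTraining, and the `toy_*` functions are deliberately trivial, hypothetical stand-ins (a one-weight "autoencoder" and a nearest-centroid "classifier") so that the control flow can run end to end.

```python
def fed_avg(local_models, sample_counts):
    n = sum(sample_counts)
    return [sum(n_k / n * m[i] for m, n_k in zip(local_models, sample_counts))
            for i in range(len(local_models[0]))]

def semi_supervised_round(global_ae, classifier, clients, labelled,
                          local_training, encode, cloud_training):
    """Clients update the autoencoder without labels; the server trains the classifier."""
    local_models, counts = [], []
    for unlabelled in clients:
        local_models.append(local_training(list(global_ae), unlabelled))
        counts.append(len(unlabelled))          # n_k
    new_ae = fed_avg(local_models, counts)      # aggregate with weights n_k / n
    X, Y = labelled
    X_repr = [encode(new_ae, x) for x in X]     # D' = (X', Y)
    new_classifier = cloud_training(classifier, X_repr, Y)
    return new_ae, new_classifier

def recognise(ae, classifier, x, encode, predict):
    """Local recognition on a client: encode first, then classify."""
    return predict(classifier, encode(ae, x))

# Hypothetical toy stand-ins, for illustration only.
def toy_local_training(ae, data):
    target = 1.0 / (sum(abs(x) for x in data) / len(data))
    return [ae[0] + 0.5 * (target - ae[0])]

def toy_encode(ae, x):
    return ae[0] * x

def toy_cloud_training(classifier, X_repr, Y):
    sums, counts = {}, {}
    for r, y in zip(X_repr, Y):
        sums[y] = sums.get(y, 0.0) + r
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def toy_predict(classifier, r):
    return min(classifier, key=lambda y: abs(classifier[y] - r))

clients = [[1.0, 3.0], [2.0, 2.0, 2.0, 2.0]]                    # unlabelled client data
labelled = ([1.0, 1.2, 5.0, 5.5], ["sit", "sit", "walk", "walk"])  # server-side dataset
ae, clf = semi_supervised_round([1.0], {}, clients, labelled,
                                toy_local_training, toy_encode, toy_cloud_training)
activity = recognise(ae, clf, 4.8, toy_encode, toy_predict)
```

The point of the structure is that labels are touched only inside `cloud_training`, while clients see only unlabelled data and receive both models back for local inference.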


We evaluated our system through simulations on different human activity datasets with different system designs and configurations. In addition, we evaluated the local activity recognition algorithms of our system on a Raspberry Pi 4 Model B. We want to answer the following research questions:

  • Q1. How does our system perform in comparison to supervised learning on a centralised server?
  • Q2. How does our system perform in comparison to semi-supervised FL using data augmentation?
  • Q3. How does our system perform in comparison to supervised FL?
  • Q4. How do the key parameters of our system, including the size of labelled samples on the server and the size of learned representations, affect the performance of HAR?
  • Q5. How efficient is semi-supervised FL on low-cost edge devices?



Datasets

We used three HAR datasets that contain time-series sensory data in our evaluation. The datasets have different numbers of features and activities with different durations and frequencies.

The Opportunity (Opp) dataset (Chavarriaga et al., 2013) contains short-term and non-repeated kitchen activities of 4 participants. The Daphnet Freezing of Gait (DG) dataset (Bachlin et al., 2009) contains freezing-of-gait incidents of Parkinson's Disease patients collected from 10 participants, which are also short-term and non-repeated. The PAMAP2 dataset (Reiss and Stricker, 2012) contains household and exercise activities collected from 9 participants, which are long-term and repeated. The data pre-processing procedure in our evaluation is the same as that described by Hammerla et al. (2016). Table 1 shows detailed information about the datasets after pre-processing.


| Dataset | Activities | Features | Classes | Train | Test |
|---|---|---|---|---|---|
| Opp | Kitchen | 79 | 18 | 651k | 119k |
| DG | Gait | 9 | 3 | 792k | 81k |
| PAMAP2 | Household & Exercise | 52 | 12 | 473k | 83k |

Table 1. HAR datasets in our experiments.

Simulation setup

We simulated a semi-supervised FL system that runs unsupervised learning on 100 clients to locally update autoencoders and runs supervised learning on a server to update a classifier. In each communication round $ t $ , the server selects $ 100\cdot C $ clients to participate in the unsupervised learning, where $ C $ is the fraction of clients to be selected. Each selected client uses its local data to train the global autoencoder $ w^{a_g}_{t} $ with a learning rate $ lr_{a} $ for $ e_{a} $ epochs. The server conducts supervised learning to train the classifier $ w^{s}_{t} $ with a learning rate $ lr_{s} $ for $ e_{s} $ epochs. For each individual simulation setup, we conducted 64 replicates with different random seeds. Based on the assumption that a server is more computationally powerful than a client in practice, we set the learning rates $ lr_{a} $ and $ lr_{s} $ as 0.01 and 0.001, respectively. Similarly, we set the numbers of epochs $ e_{a} $ and $ e_{s} $ as 2 and 5, because an individual client is only supposed to run a small number of epochs of unsupervised learning and a server is capable of running more epochs of supervised learning. The reason for setting $ e_{s}=5 $ is to keep the execution time of our simulation in an acceptable range. Nevertheless, we believe that this parameter on the server can be set to a larger number in real-world applications where more powerful clusters and graphics processing units (GPUs) can be deployed to accelerate the convergence of performance.
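The hyperparameters above, and the per-round client selection, can be collected in a small configuration sketch. The class and function names are hypothetical, the default value of `client_fraction` is only a placeholder because the text does not fix $ C $ here, and taking at least one client per round is an assumption borrowed from common FedAvg practice.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SimulationConfig:
    num_clients: int = 100
    client_fraction: float = 0.1   # C; placeholder value, not from the paper
    lr_autoencoder: float = 0.01   # lr_a, used on clients
    lr_classifier: float = 0.001   # lr_s, used on the server
    epochs_autoencoder: int = 2    # e_a, local unsupervised epochs
    epochs_classifier: int = 5     # e_s, server supervised epochs
    replicates: int = 64           # runs with different random seeds


def select_clients(cfg: SimulationConfig, rng: random.Random) -> List[int]:
    """Pick num_clients * C distinct client ids for one communication round."""
    m = max(1, round(cfg.num_clients * cfg.client_fraction))
    return sorted(rng.sample(range(cfg.num_clients), m))


cfg = SimulationConfig()
# One independent random generator per replicate keeps runs reproducible.
first_round = [select_clients(cfg, random.Random(seed))
               for seed in range(cfg.replicates)]
```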


Baselines

To answer Q1 and Q2, we consider two baselines to 1) evaluate whether the autoencoders in our system improve the performance of the system and 2) compare the performance of our system to that of data augmentation based semi-supervised FL.

Since we assume that labelled data exist on the server of the system, for ablation studies we consider a baseline system that only uses these labelled data to conduct supervised learning on the server and sends trained models to clients for local activity recognition. This system trains an LSTM classifier on labelled data on the server and does not train any autoencoders on clients. We refer to this baseline of a centralised system as CS. Comparing the performance of CS to that of our proposed system will indicate whether the autoencoders in our system are effective in improving the performance of the trained model.

To compare our system with the state of the art, we consider a semi-supervised FL system that uses data augmentation to generate pseudo labels as another baseline. We refer to this baseline as DA. It first conducts supervised learning on labelled data on the server to train an LSTM classifier. It then follows standard FL protocols to send the trained global model to clients. Each client uses the received model to generate pseudo labels on its unlabelled local data. To introduce randomness in data augmentation, we feed sequences with randomised lengths into the model when generating labels. The sequences are then paired with the labels that are generated from them as a pseudo-labelled local dataset, which is used for locally updating the global model.
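The pseudo-labelling step of the DA baseline can be sketched as follows. This is only an illustration under stated assumptions: `predict` stands in for the received global LSTM classifier and is assumed to return one label per sample, and the length bounds `min_len` and `max_len` are hypothetical parameters that the text does not specify.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sequence1D = List[float]


def pseudo_label(samples: Sequence[float],
                 predict: Callable[[Sequence1D], List[int]],
                 min_len: int,
                 max_len: int,
                 rng: random.Random) -> List[Tuple[Sequence1D, List[int]]]:
    """Cut unlabelled samples into random-length sequences and label each one."""
    dataset = []
    start = 0
    while start < len(samples):
        length = rng.randint(min_len, max_len)   # randomised sequence length
        seq = list(samples[start:start + length])
        dataset.append((seq, predict(seq)))      # pair sequence with its pseudo labels
        start += length
    return dataset


# Toy stand-in for the global model: label 1 for positive readings.
def toy_predict(seq: Sequence1D) -> List[int]:
    return [int(x > 0) for x in seq]


local_data = pseudo_label([-1.0, 2.0, -3.0, 4.0, 5.0], toy_predict, 1, 3,
                          random.Random(0))
```

The resulting pairs would then be used for local supervised updates of the global model, which is the step that distinguishes DA from the autoencoder-based scheme.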




Autoencoders and classifiers

We implement three schemes for our system with different autoencoders, including simple autoencoders, convolutional autoencoders, and LSTM-autoencoders.

The first scheme uses a simple autoencoder with fully connected (FC) layers to learn representations from individual samples in unsupervised learning. Its classifier has an LSTM cell whose output hidden states are connected to an FC layer and a Softmax layer. We refer to this scheme as FC+LSTM.

The second scheme uses 1-d convolutional and transposed convolutional layers in its autoencoder. The convolutional layer has 8 output channels with kernel size 3 and has both stride and padding sizes equal to 1. The output is batch normalised and then fed into a ReLU layer. To control the size of the encoded representation $ h $ , after the ReLU layer we flatten the output of the 8 channels and feed it into a fully connected layer that transforms it into $ h $ with a specific size. For the decoder part, we have a fully connected layer whose output is unflattened into 8 channels. Then we use a 1-d transposed convolutional layer that has 1 output channel with kernel size 3 and has both stride and padding sizes equal to 1, to generate the decoded $ X^\prime $ . The LSTM classifier of this scheme has the same structure as that in FC+LSTM. We refer to this scheme as CNN+LSTM.
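The layer sizes of the convolutional autoencoder follow from the usual output-length formulas for 1-d convolution and transposed convolution (no dilation, no output padding). The sketch below works through that arithmetic. It assumes that the convolution runs over the feature axis of a single sample with one input channel, which the text does not state explicitly, and all function names are illustrative.

```python
from typing import Dict, Tuple, Union


def conv1d_out_len(length: int, kernel: int = 3, stride: int = 1,
                   padding: int = 1) -> int:
    """Output length of a 1-d convolution."""
    return (length + 2 * padding - kernel) // stride + 1


def conv_transpose1d_out_len(length: int, kernel: int = 3, stride: int = 1,
                             padding: int = 1) -> int:
    """Output length of a 1-d transposed convolution."""
    return (length - 1) * stride - 2 * padding + kernel


def cnn_autoencoder_shapes(num_features: int, compression_ratio: float,
                           channels: int = 8
                           ) -> Dict[str, Union[int, Tuple[int, int]]]:
    conv_len = conv1d_out_len(num_features)       # kernel 3, stride 1, padding 1
    flattened = channels * conv_len               # input size of the encoder FC layer
    h = round(compression_ratio * num_features)   # size of the representation
    decoded_len = conv_transpose1d_out_len(conv_len)
    return {"conv": (channels, conv_len), "flattened": flattened,
            "h": h, "decoded": (1, decoded_len)}


# PAMAP2 has 52 features; a compression ratio of 1/2 gives h = 26.
shapes = cnn_autoencoder_shapes(52, 0.5)
```

With kernel size 3 and stride and padding of 1, both layers preserve the sequence length, so the size of the representation is controlled entirely by the fully connected layer.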

For the third scheme, we use an LSTM-autoencoder to capture time-series information in local unsupervised learning. Both the encoder and the decoder have 1 LSTM cell. It uses a Softmax classifier that has a fully connected layer and a Softmax layer. We refer to this scheme as LSTM+FC.

For the LSTM classifiers in our experiments, we adopted a bagging (i.e., bootstrap aggregating) strategy similar to that of Guan and Plötz (2017) to train our models with random batch sizes and sequence lengths. In all schemes, we used the mean square error (MSE) loss function for autoencoders and the cross-entropy loss function for classifiers. We used the Adam optimiser in the training of all models. All the deep learning components in our simulations were implemented using PyTorch libraries (Paszke et al., 2019).




Label ratio and compression ratio

We adjusted two parameters to control the amount of labelled data and the size of representations. For an original training dataset that has $ N^{l} $ time-series samples with labels, we adjusted the label ratio $ r^{l}\in(0,1) $ and took $ r^{l}\cdot N^{l} $ samples from it as the labelled training dataset on the server. Since the samples are formed as time-series sequences, to avoid breaking the activities by directly taking random samples from the sequences, we first divided the entire training set into 100 divisions. We then randomly sampled $ 100\cdot r^{l} $ divisions and concatenated them as the labelled training dataset on the server. For a training dataset whose observations have $ N^{f} $ features, we adjusted the compression ratio $ r^{f}\in(0,1) $ and used the rounded value of $ r^{f}\cdot N^{f} $ as the size of the representation when training autoencoders.
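The division-based sampling of labelled data and the size of the representation can be sketched in plain Python. The helper names are hypothetical. Rounding $ 100\cdot r^{l} $ to the nearest integer and concatenating the chosen divisions in temporal order are assumptions, since the text does not say how fractional division counts or ordering are handled.

```python
import random
from typing import List, Tuple


def split_into_divisions(num_samples: int,
                         num_divisions: int = 100) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of contiguous, near-equal divisions."""
    bounds = [i * num_samples // num_divisions for i in range(num_divisions + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def labelled_indices(num_samples: int, label_ratio: float, rng: random.Random,
                     num_divisions: int = 100) -> List[int]:
    """Sample whole divisions so that activities are not broken apart."""
    divisions = split_into_divisions(num_samples, num_divisions)
    k = round(num_divisions * label_ratio)           # number of divisions to keep
    chosen = sorted(rng.sample(range(num_divisions), k))
    return [i for d in chosen for i in range(*divisions[d])]


def representation_size(num_features: int, compression_ratio: float) -> int:
    """Rounded value of r^f * N^f, used as the size of the representation."""
    return round(compression_ratio * num_features)


server_idx = labelled_indices(1000, 0.25, random.Random(1))
```

Note that Python's built-in `round` rounds halves to the nearest even integer, so a different rounding rule could change the representation size by one for odd feature counts.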


IID and Non-IID local data

We used two strategies to generate local training data for clients from training datasets. In both strategies, the number of allocated samples for each client, $ n_{k} $ , equals $ \frac{n^{o}}{n^{p}} $ , where $ n^{o} $ is the number of samples in the original training dataset shown in Table 1 and $ n^{p} $ is the number of participants (e.g., 4 for the Opp dataset) of the original dataset. To generate IID local data, we divided the training dataset into 100 divisions. For a client to be allocated $ n_{k} $ samples, its local training data are evenly distributed across these divisions. In each division, a time window that contains $ \frac{n_{k}}{100} $ continuous samples is randomly selected, without their labels, as the client's sample fragment in this division. The sample fragments from all divisions are then concatenated as the local training dataset for the client in the IID scenario. For Non-IID local data, we randomly located a time window with length $ n_{k} $ in the training dataset and used the samples without labels in the time window as the local training dataset for the client in the Non-IID scenario. By this means, the local training dataset of each client can only represent the distribution within a single part of the unlabelled dataset.
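The two allocation strategies can be sketched as index selection over the training set. The function names are hypothetical, and integer division is assumed wherever the sizes do not divide evenly.

```python
import random
from typing import List


def iid_indices(num_samples: int, n_k: int, rng: random.Random,
                num_divisions: int = 100) -> List[int]:
    """IID case: one random window of n_k / 100 samples from every division."""
    fragment = n_k // num_divisions
    bounds = [i * num_samples // num_divisions for i in range(num_divisions + 1)]
    indices: List[int] = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        if fragment > hi - lo:
            raise ValueError("fragment is longer than a division")
        start = rng.randint(lo, hi - fragment)   # window stays inside the division
        indices.extend(range(start, start + fragment))
    return indices


def non_iid_indices(num_samples: int, n_k: int, rng: random.Random) -> List[int]:
    """Non-IID case: a single contiguous window of n_k samples."""
    start = rng.randint(0, num_samples - n_k)
    return list(range(start, start + n_k))


# Toy numbers: 10,000 training samples from 4 participants, so n_k = 2,500.
rng = random.Random(0)
iid_client = iid_indices(10_000, 10_000 // 4, rng)
non_iid_client = non_iid_indices(10_000, 10_000 // 4, rng)
```

In both cases only the indices of the observations are kept; the labels of the selected samples are discarded before the data reach a client.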


Edge device setup

Apart from simulations, to answer Q5, we evaluated the local activity recognition part of our system on a Raspberry Pi 4 Model B. The specifications of the device are shown in Table 2.

| Component | Specification |
|---|---|
| CPU | Quad core Cortex-A72 (ARM v8) 64-bit SoC @ 1.5GHz |
| Storage | SanDisk Ultra 32GB microSDHC Memory Card |
| OS | Ubuntu Server 19.10 |

Table 2. System specifications of Raspberry Pi 4 Model B.

Compared with supervised FL, on the one hand, our system introduces local autoencoders that encode samples into representations before feeding them into classifiers, which costs additional processing time. On the other hand, encoded representations have smaller sizes than original samples do, which reduces the processing time of classifiers. To understand how these two factors affect the overall local processing time, we tested both supervised FL and our system on the Raspberry Pi and compared their performance. We divided the testing datasets into one-second-long sequences and measured the overall processing time of the trained models (i.e., autoencoders + classifiers) on each sequence, in order to calculate the overhead for each one-second time window.
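A measurement of this kind can be sketched with the standard library timer. The sketch assumes a sampling rate of about 33 Hz, so that one second corresponds to 33 samples, and treats `pipeline` as any callable that runs the encoder followed by the classifier; both names are illustrative.

```python
import time
from typing import Callable, List, Sequence


def split_one_second(samples: Sequence[float],
                     sampling_hz: int = 33) -> List[Sequence[float]]:
    """Cut a test set into consecutive one-second sequences."""
    return [samples[i:i + sampling_hz]
            for i in range(0, len(samples), sampling_hz)]


def time_per_sequence(pipeline: Callable[[Sequence[float]], object],
                      sequences: List[Sequence[float]]) -> List[float]:
    """Wall-clock processing time, in seconds, for each sequence."""
    timings = []
    for seq in sequences:
        t0 = time.perf_counter()
        pipeline(seq)                    # e.g. encoder followed by classifier
        timings.append(time.perf_counter() - t0)
    return timings


# Toy pipeline standing in for the trained models.
sequences = split_one_second(list(range(100)))
timings = time_per_sequence(lambda seq: sum(x * x for x in seq), sequences)
```

Comparing these per-sequence timings between the two systems gives the overhead for each one-second window of sensor data.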


Metrics

We evaluated the performance of the global autoencoder and the classifier with the testing datasets at the end of every other communication round. We first used a time window to select 5000 samples each time. As the sampling frequency in the processed datasets is approximately 33 Hz, this time window represents activities in about 2.53 minutes. We then applied the global autoencoder to the samples in the time window to encode them into a sequence of labelled representations. The classifier was applied to the sequence of representations to recognise the activities, which were then compared with the ground truth labels. We calculated the accuracy in the time window, which is the fraction of correctly classified representations among all representations. The accuracies from different time windows were averaged as the accuracy of the system. In every other communication round $ t $ , we calculated the average value from 64 simulation replicates and its standard error.
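The metric can be written down as a short sketch. It assumes non-overlapping windows and keeps a shorter final window if the test set is not a multiple of the window size; neither detail is specified in the text, and the function names are illustrative. The window length itself follows from $ 5000 / 33 \approx 151.5 $ seconds, i.e. about 2.53 minutes.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def windowed_accuracy(predictions: Sequence[int], labels: Sequence[int],
                      window: int = 5000) -> float:
    """Mean of the per-window fractions of correctly classified samples."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    accuracies = []
    for start in range(0, len(labels), window):
        p = predictions[start:start + window]
        y = labels[start:start + window]
        accuracies.append(sum(a == b for a, b in zip(p, y)) / len(y))
    return mean(accuracies)


def mean_and_standard_error(values: Sequence[float]) -> Tuple[float, float]:
    """Average over replicates and the standard error of that average."""
    return mean(values), stdev(values) / sqrt(len(values))


acc = windowed_accuracy([1, 1, 0, 0], [1, 0, 0, 0], window=2)
avg, sem = mean_and_standard_error([0.5, 0.7])
```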



Results

We find that our proposed semi-supervised FL system has higher accuracy than the centralised system that only conducts supervised learning on the server. The accuracy is also higher than that of data-augmentation-based semi-supervised FL and is comparable to that of supervised FL, which requires more labelled data and bigger local models. In addition, its local activity recognition time on a low-cost edge device is marginal.


Analysis of autoencoders and classifiers

We first look at the contribution of the autoencoders in our proposed system. As in our assumption, the server has some labelled data that can be used for supervised learning. If the server has enough labelled data to train a decent model, its accuracy may be higher than that of a semi-supervised FL system. Thus the centralised system (CS) is a natural baseline that our system needs to surpass.


We adjust the label ratio $ r^l $ from $ 1/2 $ to $ 1/32 $ in ablation studies for each scheme to find out whether our system has higher accuracy than CS. We find that the scheme FC+LSTM (i.e., using simple autoencoders and LSTM classifiers) has lower accuracy than the CS baseline under all circumstances. Therefore we remove it from our analysis. For the other schemes, we keep the two $ r^l $ values that lead to the two highest accuracies that are higher than that of the CS baseline on each dataset. Thus on the Opp dataset, our schemes perform better than the CS baseline when $ r^l\in\{1/16, 1/32\} $ . On the DG and PAMAP2 datasets, we have $ r^l\in\{1/2, 1/4\} $ . We test our schemes on both IID and Non-IID data but find no significant differences in accuracy, because our schemes do not use any labels locally. Thus we only show the results on IID data. The accuracy of all schemes converges after 50 communication rounds and we only show the results during this period.


Ablation study of CNN autoencoder

Fig. 9 shows the accuracy of the scheme CNN+LSTM and the scheme CS, with $ r^f=1/2 $ on different datasets. As the number of communication rounds increases, the accuracy of all schemes goes up and converges. The converged accuracy of the CNN+LSTM schemes (i.e., using a convolutional autoencoder to learn representations locally and using an LSTM classifier for supervised learning in the cloud) is higher than that of the CS schemes that only conduct supervised learning in the cloud. This means that training CNN autoencoders locally indeed contributes to improving the accuracy of the system. When $ r^l $ decreases, the converged accuracy of CNN+LSTM goes down on all datasets, which means that it is sensitive to the change of label ratios.

Figure 9. Test accuracy of CNN autoencoders with LSTM classifiers (CNN+LSTM), and a centralised system (CS) using LSTM classifiers without autoencoders. $ r^f=1/2 $ for both schemes. CNN+LSTM has higher converged accuracy than CS, which means that unsupervised learning on CNN autoencoders helps improve the performance.

Ablation study of LSTM autoencoder

Fig. 10 shows the accuracy of the scheme LSTM+FC and the scheme CS, with $ r^{f}=1/2 $ . It demonstrates similar trends to those in Fig. 9. The accuracy of LSTM+FC (i.e., using LSTM autoencoders locally and using Softmax classifiers for supervised learning in the cloud) is higher than that of CS, which runs centralised and supervised learning without using unlabelled local data. However, LSTM+FC is less sensitive to the change of label ratios. For example, its converged accuracy on the Opp dataset is almost the same when we change $ r^l $ from $ 1/16 $ to $ 1/32 $ . This would enable us to achieve similar performance with fewer labelled data compared to CNN+LSTM.

Figure 10. Test accuracy of LSTM autoencoders with Softmax classifiers (LSTM+FC), and a centralised system (CS) using LSTM classifiers without autoencoders. $ r^{f}=1/2 $ for both schemes. LSTM+FC has higher converged accuracy than CS. It is less sensitive to the change of $ r^l $ than CNN+LSTM on the Opp and DG datasets.

The experimental results show that, when implementing a semi-supervised FL system for HAR, both CNN autoencoders and LSTM autoencoders can improve the accuracy of the system. A system that uses LSTM autoencoders is less sensitive to the change of available labelled data in the cloud. In the rest of our analysis, we only show the accuracy of the LSTM+FC scheme.


5.2 Comparison with different FL schemes

We now analyse the performance of our system in comparison with semi-supervised FL using data augmentation (DA) to generate pseudo labels and supervised FL having labelled data available on clients.


Comparison with DA

Fig. 11 shows the accuracy of both LSTM+FC and DA on three datasets. On the Opp and DG datasets, the accuracy of DA increases more slowly than that of LSTM+FC. But once the accuracy of both schemes converges, they do not show significant differences. On the PAMAP2 dataset, the converged accuracy of LSTM+FC is higher than that of DA. We also find that, although the accuracy of DA on the Opp and DG datasets is higher than that of CS in Fig. 10, its accuracy on the PAMAP2 dataset in Fig. 11 converges more slowly than CS does in Fig. 10. This indicates that using the received global LSTM model to generate pseudo labels and then training the model on these pseudo labels may damage the testing accuracy. Although we used time-series sequences with randomised lengths to generate pseudo labels in our experiments, DA may still risk overfitting the model to the training data and consequently is slower to achieve decent accuracy on testing data.

Figure 11. Test accuracy of LSTM autoencoders with Softmax classifiers (LSTM+FC), and semi-supervised FL using data augmentation (DA). There is no significant difference between their converged accuracy on the Opp and DG datasets. LSTM+FC has higher converged accuracy than DA on the PAMAP2 dataset.

Our results indicate that, for semi-supervised FL, using locally trained autoencoders can achieve higher converged accuracy than using data augmentation to generate pseudo labels. In addition, compared with DA, our scheme is independent of the specific tasks provided by the server. For example, if one client uses its unlabelled data to access multiple FL servers that conduct different tasks, with data augmentation the client has to generate pseudo labels for each of the models for these tasks. In our scheme, the client only conducts unsupervised learning locally using unlabelled data to learn general representations, which is independent of the labels in the cloud.


Comparison with supervised FL

Fig. 12 shows the accuracy of LSTM+FC with $ r^f=1/2 $ and a supervised FL scheme. The supervised FL uses all the information (i.e., 100% features and 100% labels) in the training datasets. Therefore it has higher accuracy than LSTM+FC. However, our scheme enables a trade-off between the performance of the system (i.e., accuracy), the cost of data annotation (i.e., label ratio), and the size of models (i.e., compression ratio). For example, having a larger compression ratio $ r^f=3/4 $ on the PAMAP2 dataset can lead to a higher accuracy (shown in Fig. 13) that is comparable to that of the supervised FL.

Figure 12. Test accuracy of LSTM autoencoders with Softmax classifiers (LSTM+FC, $ r^f=1/2 $ ), and supervised FL using 100% features and labels. The converged accuracy of LSTM+FC is comparable to that of the supervised FL and requires fewer labelled data.

The experimental results suggest that we can implement FL systems in a semi-supervised fashion with fewer labels than supervised FL needs, while achieving comparable accuracy. Although one of the motivations of FL is to hold models instead of personal data in the cloud to address potential privacy issues, the data held by the server of our system do not have to come from the users of the service. This kind of dataset in the cloud has been used in FL to address other challenges, such as dealing with Non-IID data by creating a small globally shared dataset, and does not necessarily contain private information. We believe that service providers can collect these data from open datasets, or from laboratory trials in controlled environments where data subjects consent to contributing their data.


5.3 Analysis of compression ratio

We also investigate how the compression ratio $ r^{f} $ of autoencoders affects the accuracy of our system. It is an important factor that can affect the number of parameters and the size of local models. These local models are regularly uploaded from the clients to the server over the network, hence their sizes affect the outbound traffic. We use $ r^{f}\in\{3/4, 1/2, 1/4, 1/8\} $ on all datasets. We keep $ r^{l}=1/16 $ for the Opp dataset and $ r^{l}=1/2 $ for both the DG and PAMAP2 datasets. Fig. 13 demonstrates that, on the Opp and the DG datasets, our system can compress an original sample into a representation whose size is only $ 1/4 $ of the original sample without significantly affecting the accuracy. On the PAMAP2 dataset, increasing $ r^f $ from $ 1/2 $ to $ 3/4 $ can lead to accuracy that is comparable to that of the supervised FL scheme.

Figure 13. Test accuracy with different compression ratios $ r^{f}\in\{3/4, 1/2, 1/4, 1/8\} $ . $ r^{l}=1/16 $ for Opp and $ r^{l}=1/2 $ for both DG and PAMAP2. The accuracy on Opp and DG is not affected much when changing $ r^f $ from 3/4 to 1/4. PAMAP2 is more sensitive to the change of $ r^f $ than the other two datasets.

Changing the compression ratio in our system allows us to trade accuracy for model size. When the data used are not sensitive to the compression ratio (e.g., Opp and DG), compressing samples into smaller representations may significantly reduce the size of local models that are uploaded from the clients to the server, which may lead to lower network traffic.
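To give a feel for how the compression ratio changes the size of an uploaded model, the sketch below counts the parameters of a single-layer LSTM autoencoder. It relies on two assumptions that the text does not confirm: the encoder LSTM maps $ N^{f} $ features to a hidden state of size $ h $ and the decoder LSTM maps $ h $ back to $ N^{f} $, and each LSTM carries two bias vectors per gate, as in PyTorch's `nn.LSTM`. The function names are hypothetical.

```python
def lstm_params(input_size: int, hidden_size: int) -> int:
    """Parameters of one LSTM layer: 4 gates, input and recurrent weights, 2 biases."""
    return 4 * (input_size * hidden_size
                + hidden_size * hidden_size
                + 2 * hidden_size)


def lstm_autoencoder_params(num_features: int, compression_ratio: float) -> int:
    """Encoder (N_f -> h) plus decoder (h -> N_f) parameter count."""
    h = round(compression_ratio * num_features)
    return lstm_params(num_features, h) + lstm_params(h, num_features)


# PAMAP2 has 52 features; compare three of the compression ratios used above.
sizes = {r: lstm_autoencoder_params(52, r) for r in (0.75, 0.5, 0.25)}
```

Under these assumptions the parameter count falls steadily as the compression ratio decreases, which is the effect on upload size that the paragraph above describes.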


5.4 Running time at the edge

We evaluated the local activity recognition using both supervised FL and our system (LSTM+FC) with $ r^{f}=0.5 $ on a Raspberry Pi. As shown in Fig. 14, the processing time of our system is significantly lower ( $ p<0.001 $ ) than that of supervised FL on all datasets. Although the autoencoder in our system inevitably causes extra processing time as it increases the length of the local pipeline, the LSTM cell in the autoencoder encodes the input data into smaller representations. In contrast, the LSTM classifier in the supervised FL transforms input data into hidden states that have larger sizes. This reduction of the amount of data leads to a shorter overall processing time than that of the supervised FL.


Combined with the results of Sec. 5.3, our experimental results show that running unsupervised learning on autoencoders can reduce both the size of local models and the size of data processed by classifiers. This can potentially reduce outbound network traffic and also improve the efficiency of local activity recognition.


Discussion

Our experimental results show that HAR with semi-supervised FL can achieve comparable accuracy to that of supervised FL. We now discuss how these results can contribute to the system design of FL systems and possible research topics.


FL servers can do more than FedAvg

In canonical FL systems, servers only hold global models and use the FedAvg algorithm to aggregate received local models into new global models. This design consideration is due to the privacy concerns of having personal data on the servers. Our findings suggest that running supervised learning with a small amount of labelled data on the servers can relieve individual users of labelling their local data. Therefore, we suggest that FL systems may consider maintaining datasets that do not contain private information on their servers to support semi-supervised learning. Apart from implementing the FedAvg algorithm in every communication round, servers can conduct more epochs of supervised learning than individual clients can, since they have more computational resources and fewer power constraints than clients do. This can help the performance of the models converge faster.


Learning useful representations, not bias

By training autoencoders locally, semi-supervised FL is not affected by Non-IID data because it does not use any labels locally. This sheds light on a new solution, which is different from data augmentation or limiting individual contributions from clients (Kairouz et al., 2019), to address the Non-IID data issue in FL. Although our work focuses on semi-supervised FL where no labels are available on clients, we suggest that supervised FL can also consider learning general representations apart from the mappings from features to labels, and use the learned representations to help alleviate the bias caused by Non-IID data.

Another possible application of semi-supervised FL is to defend against malicious users who attack the global model through data poisoning (Kairouz et al., 2019). Labels of local data are a common attack vector in FL. Adversaries can manipulate (e.g., flip) the labels in their local data to affect the performance of their local models, thereby affecting the performance of the global model. Such an attack is eliminated if we do not use local labels. We suggest that security researchers in FL should consider semi-supervised FL as a possible scheme to defend against data poisoning attacks.

Smaller models via unsupervised learning

In supervised FL, as the complexity of an ML task goes up, the size of the model of the task increases. This eventually leads to increasing numbers of parameters and increasing network traffic when uploading local models to the server. Semi-supervised FL only uploads trained autoencoders from clients to the server. Our experimental results suggest that the performance of semi-supervised FL can still converge to an acceptable level even if we use high compression rates, which helps to significantly reduce the number of model parameters. The compression rate can be considered as a system parameter that can be tuned to reduce the size of models, as long as the key representations can be learned and the performance of the system can be guaranteed. This makes our system suitable for scenarios that have demanding network conditions. Although in this paper we only focus on the application of HAR, such effects may exist in other applications with different types of data, which should be further investigated.


Conclusions

HAR using IoT sensory data and FL systems can empower many real-world applications, including processing the daily activities and the changes to these activities in people living with long-term conditions. The difficulty of obtaining labelled data from end users limits the scalability of FL applications for HAR in real-world and uncontrolled environments. In this paper, we propose a semi-supervised FL system to enable HAR in IoT environments. By training LSTM autoencoders through unsupervised learning on FL clients, and training Softmax classifiers through supervised learning on an FL server, our system can achieve higher accuracy than both centralised systems and data-augmentation-based semi-supervised FL. The accuracy is also comparable to that of a supervised FL system, but our system does not require any locally labelled data. In addition, it has simpler local models with smaller size and faster processing speed. Our future research plans are to investigate fully unsupervised FL systems that can support anomaly detection through analysing the difference between local and global models. We believe that such systems will enable many useful real-time applications in HAR where useful labels are rare or extremely difficult to collect.