Man-in-the-Middle Attacks on MQTT-based IoT Using BERT Based ...€¦ · Message Queuing Telemetry Transport (MQTT) protocol for com-munications. This attack scheme consists of an

Man-in-the-Middle Attacks on MQTT-based IoT Using BERTBased Adversarial Message Generation

Henry Wong Tie Luo∗Department of Computer Science

Missouri University of Science and Technology, MO 65401, USAE-mail: {henrycwong, tluo}@mst.edu

ABSTRACTWith IoT devices becoming more ingrained into everyday life andbusiness, attacks on Internet-of-Things (IoT) systems can be costlyand, in extreme cases, cause life-threatening situations and hugeeconomic loss. Denial-of-Service (DoS) attacks have been well stud-ied in cybersecurity, partly because of the Mirai malware which hasshown extensive damage to an insecure network. However, Man-in-the-Middle (MITM) attacks have been largely overlooked especiallyin IoT networks. In this paper, we introduce a new scheme of aMan-in-the-Middle (MITM) attack on IoT devices that utilize theMessage Queuing Telemetry Transport (MQTT) protocol for com-munications. This attack scheme consists of an MQTT Parser thatis created to dissect and alter MQTT messages at the bit level, and anovel BERT-based adversarial model that generates malicious mes-sages using an approach inspired by GAN. We present the designof this attack in order to show how a sophisticated attack couldlead to serious damage that is difficult for typical security defensemechanisms to detect. We set up a test-bed using IoT hardware andsoftware including Raspberry Pi, WiFi Pineapple, Mosquitto, etc. toconduct experiments. We show that our designed attack schemesuccessfully evades logistic regression, random forest, K-nearestneighbor, and support vector machine (SVM) based anomaly de-tection models. Multi-Layer Perceptron fares better against ourmodel, but such use of deep neural networks on typical IoT devicesis rather restricted due to the computation cost. In summary, theresults show that the MITM attack is effective against a wide rangeof typical anomaly detection mechanisms.

KEYWORDSInternet of things (IoT) security, man-in-the-middle attack, denial-of-service attack, anomaly detection, MQTT, BERT.

1 INTRODUCTIONRecent years have seen a sharp increase of the popularity of Internet-of-Things (IoT) devices due to their various benefits and low cost.The popularity, however, comes at a price with an large number ofattacks targeting IoT devices, such as Mirai [2] and BrickerBot [9]as well as more conventional DDoS, spoofing, and Sybil attacks.Among these, direct hacking such as malware takes up 45% of allsecurity breaches [3], which demonstrates that firewalls are ofteninadequate to protect networked systems.

Message Queuing Telemetry Transport (MQTT) is a lightweightapplication-layer protocol that uses a publish-subscribe model to

KDD’20Workshops: the 3rd InternationalWorkshop onArtificial Intelligence of Things(AIoT), August 24, 2020, San Diego, CA.∗Corresponding author.

enable communication among low-resource IoT devices. A pub-lisher is typically a data-generating device (e.g., a sensor) that sendsmessages with a topic to a broker. Then a subscriber is typically auser device (e.g., smartphone) that receives messages from the bro-ker on the topics the user device subscribes to. The elegant designand performance has gained MQTT tremendous support and wideuse to the extent of becoming “the de facto standard of IoT” [23],surpassing other popular protocols including ZigBee, CoAP, andLoRa (regardless of layers in a protocol stack).

However, MQTT does not have a sophisticated security mecha-nism on its own, but relies on SSL/TLS (Secure Socket Layer/TransportLayer Security) to encrypt MQTT messages. Unfortunately, thisis poorly enforced in IoT deployments. To demonstrate this, werun a Shodan query1 of “port: 1883”, the default port for MQTT,and the result shows over 416,618 ports on (unsecured) TCP con-nections; on the other hand, a query for “port: 8883”, the defaultSSL/TSL port for MQTT, only shows 38 ports on (secured) TCPconnections (as of May 16, 2020). This vast ratio of 12,625:1 presentsa serious security vulnerability in IoT systems that use MQTT. Notethat this vulnerability is not easy to fix, because of legacy reasons(an enormous number of existing deployments), practical reasons(users lack of technical know-how on SSL/TLS configuration), andtechnical reasons (SSL/TLS consumes extra resources which can beunfriendly to some IoT devices).

In this paper, we show that this vulnerability can indeed beexploited by adversarial users and such exploitation can be hard todetect. To this end, we design a Man-in-the-Middle (MITM) attackscheme. MITM attacks, also known as hijack attack, are a cyberattack where an attacker intercepts a network connection to alterthe data being transferred between the two ends. For example, inFigure 1, an attacker hijacks the wireless connection between anaircraft and a control tower, in such a way that the aircraft deemsthe attacker as the control tower and the control tower deems theattacker as the aircraft; thus, the attacker serves as a malicious relaythat can both eavesdrop and alter communications without beingnoticed.

The reason we choose MITM attacks is because it grants theattacker a large degree of freedom for manipulation, by havingboth read and write access (within a network of IoT devices) tosensitive or critical information without being noticed. In compar-ison, eavesdropping has no write access, and both jamming andDenial of Service (DoS) attacks can be easily detected. On the otherhand, MITM attacks are technically challenging to design becausethe attacks need in-depth understanding of protocol and networkdetails.1Shodan is dubbed as the “Google of IoT” which allows users to search the Internetfor various IoT devices.

H. Wong and T. Luo

Figure 1: Man-in-the-Middle attack: an illustration.

To make our design sophisticated, in the sense that the attackcan be automated and is hard to be detected (typically by anomalydetection mechanisms), we are inspired by generative adversarialnetworks (GAN) [7] and use the Bidirectional Encoder Represen-tations from Transformers (BERT) model[6], an NLP pre-trainingtechnique, to construct a discriminator in our attack model. Thedesign also involves the creation of a new MQTT Parser written inPython. We verify the effectiveness of our attack scheme againstfive representative anomaly detection algorithms using real IoTdevices. In summary, this paper makes the following contributions:• We design a Man-in-the-Middle attack to show the vulnera-bility of the “the de facto standard of IoT”, MQTT, and howthe vulnerability can be exploited.• We develop the first open-source MQTT Parser2 that candissect and modify MQTT packets as per need.• We are the first to apply an NLP technique (BERT) to thedesign of MITM attacks; in particular, we show how to auto-mate the alteration of MQTT messages by constructing anadversarial model, which is inspired by GAN and based onBERT. We have also open-sourced this adversarial model.3• We conduct experiments with real IoT devices and proto-cols (Raspberry Pi’s, WiFi Pineapple, Eclipse Mosquitto, andEclipse Paho). The results show that our designed MITMattack is elusive to detect, and evades most representativeanomaly detection algorithms.

2 RELATEDWORK2.1 MQTT SecurityWith an increasing number of cyber attacks each year, the field ofIoT security continues to expand, mostly towards Intrusion Detec-tion Systems (IDS). The ARTEMIS Intrusion Detection System [5]uses multiple machine learning techniques to detect maliciousMQTT messages with a promising accuracy of 0.9998 when us-ing OneClassSVM. However, the tested attacks were generatedwith the malaria toolkit [18] which leaves it up to the user to cre-ate custom malicious messages, allowing the possibility of biaseddata, and the messages that were tested only contain numerical(temperature) values.

Besides IDS, authentication is another method to bolster security.A typical example is the use of OAuth 1.0, which has been exploredfor MQTT-based IoT devices [17]. This method allows low-powered2https://github.com/HenryCWong/MQTTPython3https://github.com/HenryCWong/adversarialBERTMessages

devices that cannot use SSL/TLS to connect to an OAuth server.However, the configuration and implementation are technicallychallenging for most IoT users, leaving the method only useful toexpert users. OAuth 1.0 specializes in protecting against Denial-of-Servie (DoS) and flooding attacks only, while MITM attacks can belaunched if the attacker can replicate the authentication certificate.

Both of the above approaches, as well as most work done in theMQTT field, recommend using Secure Socket Layer and TransportLayer Security (SSL/TLS) to secure MQTT communications if pos-sible. SSL and TLS are cryptographic protocols that allow commu-nications to be encrypted, thereby making attacks like sniffing andMITM more difficult. However, as mentioned by Niruntasukrat etal. [17], some devices do not have the resources to support SSL/TLS.To this end, one can use a layer based approach [21] which offloadsthe processing (e.g., cryptographic) task to other devices that haveenough resources. However, this approach specializes in preventingDoS and flooding attacks rather than Man-in-the-Middle attacks.There are other IoT security approaches such as JPAKE [10] (ondevice) and DIoT [16] (federated learning based), but they alsoprioritize protecting against DoS attacks.

2.2 BERT and Feature-Based LearningBERT was introduced in 2018 by Google [6], which is an appli-cation of feature based learning to Natural Language Processing.It is a multi-layer bidirectional transformer encoder utilizing thetransformer introduced by Vaswani el al. [24], and improves on topof left-to-right [15], right-to-left, and bidirectional feature models[19] in terms of speed and accuracy.

Since the introduction of BERT to the NLP community, manyvariations of the pre-training model has surfaced, including BioBert[12], SciBERT [4], and ClinicalBERT [1, 8]. There are also BERTvariations in terms of size of the model, such as DistilBert [22]and Albert[11] which are smaller and faster versions of BERT thatreduce the number of feature in BERT but as a result sacrificeperformance. Other distinct variations of BERT with extended fea-tures include ERNIE [25] which incorporates a knowledge graph topre-training, M-BERT [20] which is a model that pre-trains on 104different languages, and ViLBERT [14] that uses both textual and vi-sual inputs to train models. Models that reportedly claim to have tobetter results than Bert like RoBERTa [13] have surfaced. RoBERTautilizes dynamic masking, Full-Sentence input (to achieve zero lossin Next-Sentence Prediction (NSP)), training in larger batches, Byte-Pair Text Encoding, and training on data sets ten times larger thanthe data BERT uses. However, RoBERTa requires much more re-sources, with an entire day of training on state-of-the-art GPUs,while BERT only requires a few hours.

3 DESIGN OF ATTACKTo facilitate understanding, and without loss of generality, supposethe victim is an aircraft engine or data center (hardware), whichneeds to be operated within a certain temperature range. The hard-ware is monitored and controlled by a thermostat over a wirelessnetwork via the MQTT protocol (see Figure 2). The goal of theattack is to disrupt this monitoring and control system such thatmalicious temperature can be injected to cause the aircraft engineor the data center servers to shutdown or explode, which can resultin life or huge economic loss.

https://github.com/HenryCWong/MQTTPython

https://github.com/HenryCWong/adversarialBERTMessages

KDD’20 Workshops (AIoT), August 24, 2020, San Diego, CA

Figure 2: Victim system (an example): Temperaturemonitor-ing and control for an aircraft engine or a data center.

3.1 OverviewWe consider a representative case where the victim network con-nection goes through a WiFi router. In fact, one of the main reasonsthat MQTT has gained such significant popularity is because itcan operate over 802.11 or WiFi, which is ubiquitously availableand offers higher data transmission rate than other protocols suchas Zigbee or Bluetooth. For attacking purposes, we use a WiFiPineapple, which is a penetration testing tool created to allow cy-ber security professionals to test the vulnerability of a computernetwork. In our case, we use the WiFi Pineapple to de-authenticateand spoof the legitimate WiFi router (by spoofing the WiFi’s SSID)and leverage the Pineapple’s capability of injecting raw packets todevices connected to it.

Figure 3 provides an overview of our MITM attack. The ther-mostat (publisher) connects to the aircraft engine or data center(subscriber) via the WiFi router, mediated by a MQTT broker.

(1) The WiFi Pineapple first masquerades as the WiFi router byusing the same SSID and waits for devices to connect to thePineapple.

(2) Once the devices (are fooled to) connect to the WiFi Pineap-ple, the MQTT packets are dismantled and altered (Section3.2) into malicious messages. Note that the WiFi Pineappleis still connected to a legitimate WiFi connection to provideconnection for the rest of the network, so that the networkstill appears normal to the devices being attacked.

(3) The malicious messages dictate the temperature to be sentto the MQTT subscriber (climate control units of the air-craft engine or data center), where the temperature variesstrategically based on an adversarial model (Section 4).

(4) The temperature is received by the victim (MQTT subscriber)which then regulates temperature in the attacker-desiredmanner and causes damage.

3.2 MQTT ParserDespite MQTT being a popular IoT protocol, there does not exist alibrary that can parse MQTT packets, which is important becausesuch a library is the foundation to enable protocol analysis forMQTT. This motivated us to write a MQTT parser in Python, whichcan dissect MQTT packets into bytes and modify them as per need,and we have open-sourced it.

Figure 3: Overview of the MITM attack.

Our MQTT parser functions as follows. First, WiFi Pineappleuses the TCPDump module to dump all the received TCP packetsinto .pcap files (a format originally created for TCPDump and anabbreviation for packet capture), each containing multiple (in ourcase 4000) packets. Next, each .pcap file is passed to the MQTTparser, which will create a MQTT Object, then call its internal func-tions to decode the file at the bit level to ASCII into MQTT ObjectTypes (messageType, messageLength, topicLength, topicName, mes-sageWords); see Figure 4. Following that, a function translates thehexadecimal message into ASCII to send to an adversarial modeldescribed in Section 4 to generate a malicious message. Anotherfunction then takes this malicious message and translates it backinto hexadecimal format. Finally, the WiFi Pineapple injects thisaltered (and malicious) message to the subscriber (victim).

Figure 4: Structure of a MQTT packet.

4 ADVERSARIAL MESSAGE GENERATIONThis section describes an adversarial model that can be used to notonly automate the malicious message generation process, but alsoachieve the following objectives:

H. Wong and T. Luo

• The malicious messages need to conform to standard pro-tocol specification and the message content needs to makesense to the receiver (so that the device will take the attacker-intended actions to cause damage);• The malicious messages need to be prudently designed so asto bypass the incumbent anomaly detection systems.

In this paper, we cast the problem as an adversarial game, inwhich twomodels confront each other to create amaliciousmessage.This idea is inspired by Generative Adversarial Networks (GAN),but we do not use GAN because its complexity is not necessary tohandle our particular case of MQTT.

Resembling GAN, we construct one model to act as a generatorto generate malicious messages, and construct the other model toact as a discriminator that separates real messages from maliciousmessages. Algorithm 1 outlines the overall process between theGenerator and the Discriminator.

Algorithm 1: GAN-Inspired Adversarial ModelInput: 𝑋 , a set of words from captured MQTT MessagesOutput: 𝛾 , to be used by Generator

1 𝛾 ← Arbitrary Large Number2 𝜌 ← a small number between (−𝛾, 0)3 𝑠𝑐𝑜𝑟𝑒 ← 04 while optimal (lowest) 𝑠𝑐𝑜𝑟𝑒 is not found do5 𝑋 ← Generator(𝑋 ,𝛾 )6 𝑠𝑐𝑜𝑟𝑒 ← Discriminator(𝑋 ,𝛾 ,𝜌)7 𝛾 ← 𝛾 − 𝜌

4.1 GeneratorThe generator uses two algorithms. For numerical values such astemperature (Algorithm 2), it takes a range of random numbersbetween one standard deviation, 𝑆𝑥 , below and above the mean,(̄𝑋 ), scaled by a weight 𝛾 . The value 𝛾 is determined by Algorithm4.

Algorithm 2: Generator for Numeric ValuesInput: 𝑋 : A set of numbers from captured MQTT Messages

𝛾 : Adjusted WeightOutput: 𝛽 : A malicious set of numbers

1 𝑋 ←𝑚𝑒𝑎𝑛(𝑋 )2 𝑆𝑋 ← 𝑠𝑡𝑑 (𝑋 )3 𝛽 ← ∅4 for all 𝑋𝑖 ∈ 𝑋 do5 𝛽𝑖 ← 𝑟𝑎𝑛𝑑𝑜𝑚((𝑋 − (𝛾 · 𝑆𝑋 ), 𝑋 + (𝛾 · 𝑆𝑋 ))

For non-numeric (i.e., alphabetic) messages, we need a differentapproach. We choose a word (or multiple words depending on thesize of the message) and find the antonym of that word as outlinedin Algorithm 3. In choosing the word, we prioritize adjectives be-cause most MQTT use-cases are IoT devices that send messagesto describe an event or object. The chosen word is determined by

𝛾 , and hence each iteration will choose a different word. To iden-tify adjectives for the experiment, the Natural Language Toolkit(NLTK) Library’s Wordnet corpus for Python was used. The methodof processing non-numeric messages is formulated in Algorithm 3.

Algorithm 3: Generator for Non-Numeric ValuesInput: 𝑋 : A set of words from captured MQTT Messages

𝛾 : Adjusted WeightOutput: 𝛽 : A malicious set of numbers

1 𝛿 ← 02 \ ← ℎ𝑎𝑠ℎ(𝛾) ⊲ hash() converts 𝛾 into a unique integer3 for all 𝑋𝑖 ∈ 𝑋 do4 𝛿 ← Number of Words mod \

5 𝛽𝑖 ← Antonym of 𝛿-th word in the message 𝑋𝑖(prioritizing adjectives)

4.2 DiscriminatorThe discriminator utilizes BERT and a Neural Network to differen-tiate between malicious messages made by the Generator and realmessages that were received from the original sender. BERT allowsmodels to learn from all the words in a sentence without havingto suffer from long-term dependencies. It has a self-attention unit,which calculates the pair-wise correlation between all the words ineach sentence in order to tune a weight on each word. BERT is bidi-rectional, in the sense that it can calculate from both left-to-rightand right-to-left.

In this paper, we use BERT to pre-train the MQTT messagesto check for bi-directional context which allows the model to findcorrelations between words. In particular, we use DistilBERT [22]which is a lighter version of BERT: BERT pre-trains with 12 layerswhile DistilBert only uses 10 layers and removes other featuressuch as token-type embeddings and pooling, while retaining 97% ofBERT’s performance. With marginal performance loss, DistilBertis faster (hence helpful in avoiding being detected) and is reportedto perform better than ELMo [19].

To further expedite the process, we batch all themessages throughpadding and tokenization before running the messages throughDistilBERT. After pre-training, a Multi-Layer Perceptron was usedto classify whether the messages were malicious messages from theGenerator or original messages from the captured MQTT packets.Algorithm 4 illustrates how the Discriminator works.

Algorithm 4: DiscriminatorInput: 𝑋 : A set of words from captured MQTT Messages

𝛾 : Adjusted WeightOutput: 𝑠𝑐𝑜𝑟𝑒 : Accuracy of MLPClassifier()

1 𝑧 ← BERT(𝑋 )2 𝑠𝑐𝑜𝑟𝑒 ←MLPClassifier(𝑧) ⊲ Classifier returns the accuracy

KDD’20 Workshops (AIoT), August 24, 2020, San Diego, CA

5 EXPERIMENTS5.1 Testbed SetupOur testbed uses two Raspberry Pi 3B’s, one as a Publisher and theother as a Subscriber. The testbed also consists of a WiFi Pineappleto performMain-in-the-Middle attack, and aWiFi access point (AP).We use MQTT version 1.5.7 and Mosquitto version 1.6.9 to sendand receive MQTT messages, where the control scripts are writtenin Python using the Eclipse Paho 3.1.1 library. The messages con-tain various sensory data including temperature, weather forecasts,wind direction, and humidity. To interact with the WiFi Pineappleremotely, we use Secure Shell (SSH) from a laptop terminal.5.2 ResultsOur primary goal was to evaluate the performance of our adver-sarial model, by testing a range of anomaly detection (AD) modelsagainst the malicious messages generated by the adversarial model(from 250 messages used to train). These messages were chosenfrom messages that were classified as false negatives in the Dis-criminator. We choose five typical AD models that use Logistic Re-gression, Random Forest, K-Nearest Neighbor (KNN), Multi-LayerPerceptron (MLP), and Support Vector Classification (SVC). In par-ticular, random forest and SVC are also used in IDS systems likeARTEMIS [5]; MLP is relatively more resource-consuming andhence not always suitable for IoT devices, but we still include it tosee how our model performs against such an “upperbound”.

In Table 1, the second column gives the results for our adver-sarial model with BERT removed, in the format of number of FalseNegatives (mis-detection) over the total number of malicious mes-sages (False Negatives + True Positives). The third column (FN)gives the false negatives when BERT is used (the complete model).The fourth column represents the improvement of the third columnover the second. We can see that our adversarial model fares wellagainst Logistic Regression, Random Forest, and SVC. In the caseof KNN, it is better to disable BERT. However, in either case (withor without BERT), our model is able to achieve a very high falsenegative rate (more than 63%), indicating that it is effective even inthe case of KNN. In the case of MLP, using BERT is not desirableas there is a substantial decrease in FN rate. However, MLP as adeep neural network model does not fit most IoT devices due to theamount of required resource; furthermore, even in (rare) cases suchdeep models are used, we can simply remove BERT to still achievea high (more than 50%) false negative rate, which is desired bythe attacker. Finally, Table 1 demonstrates that BERT significantlyimproves the malicious messages against the SVC model with animprovement of 92.5%; in fact, it completely evades the AD model,resulting in a 100% mis-detection (FN) rate.

Figure 5 shows how the choice of 𝛾 values affects accuracy andfalse negatives within the adversarial model. The accuracy andfalse negatives shown are results of the Discriminator using anMLP classifier. The figure shows the optimal 𝛾 to be around 1.8since the accuracy is low and the number of false negatives ishigh. Since even the Discriminator cannot classify these maliciousmessages well at these 𝛾 values, it is unlikely for IoT devices withless resources than the attacker running the adversarial model toachieve a better accuracy or false negative rate.

Figures 5, 6, and 7 are based on information recorded while theadversarial model was running. Both Figures 6 and 7 are results of

Classification Model FN w/o BERT FN ImprovementLogistic Regression 58/102 81/90 58.3%Random Forest 69/102 89/90 46.2%

KNN 69/102 57/90 -6.4%MLP 99/102 12/90 -86.3%SVC 53/102 90/90 92.5%

Table 1: Results of using different classification models todetect malicious messages generated from our adversarialmodel. The FN (third) column represents results for ouradversarial model using BERT. The total numbers of mali-cious messages in columns two and three are different (102vs 90) because the total number of malicious messages areonly messages that fooled the Discriminator, which variesdepending on whether BERT is used.

Figure 5: Effect of 𝛾 on Accuracy and False Negatives on theDiscriminator (the FN here is not to be confused with the FNin Table 1).

running the multiple classification models against messages thatwere output by the Generator. Both had no direct interaction withthe Discriminator, unlike Figure 5 which records the values from theDiscriminator. The results in Table 1 were collected after runningthe adversarial model and were tested with the malicious messagesat optimal 𝛾 values.

Figure 6 shows the number of false negatives of each AD modelwhile the Adversarial model was running. All models except MLPand KNN show a slight inverse parabolic curve as the number offalse negatives peaks at an approximate optimal 𝛾 (around 1.8).An optimal 𝛾 is further visualized in Figure 7 which describes theaccuracy of the same situation as Figure 6. This Figure instead repre-sents a slight parabolic curve (with the exception being MLP) witha similar optimal 𝛾 . This proves that for the generated maliciousmessages, a 𝛾 is not just optimal for the Discriminator model, butalso in multiple AD models.

6 CONCLUSIONIn this paper, we present a well-designed MITM attack on MQTT-based IoT devices, due to the popularity and wide deployment ofMQTT in IoT networks. This was intended to motivate follow-upresearch on such less studied attacks in the IoT context. Our MITMattack scheme consists a MQTT Parser created for dissecting andaltering MQTT messages, and an adversarial model to generate

H. Wong and T. Luo

Figure 6: Number of false negatives in the evaluation againstmultiple anomaly detection classifiers. The highlightedcurves indicate the best and the worst performance.

Figure 7: Accuracy in the evaluation against multiple anom-aly detection classifiers. The highlighted curves indicate thebest and the worst performance.

malicicous messages. The adversarial model is inspired by GANand uses BERT in the discriminator to generate messages that aremalicious but elusive to possible anomaly detection mechanisms.Our experiments have shown promising results against commonlyused classification based anomaly detection algorithms.

Future work could include exploring different attack mechanicsfor Man-in-the-Middle attacks without using a WiFi Pineapple inorder to show more attack possibilities. Another interesting venuethat could be explored is the use of BERT in Intrusion DetectionSystems within IoT devices that transmit complex messages.

REFERENCES[1] Emily Alsentzer, John R Murphy, Willie Boag, Wei-Hung Weng, Di Jin, Tristan

Naumann, and Matthew McDermott. 2019. Publicly available clinical BERTembeddings. arXiv preprint arXiv:1904.03323 (2019).

[2] Manos Antonakakis, Tim April, Michael Bailey, Matt Bernhard, Elie Bursztein,Jaime Cochran, Zakir Durumeric, J Alex Halderman, Luca Invernizzi, MichalisKallitsis, et al. 2017. Understanding the mirai botnet. In 26th {USENIX} Security

Symposium ({USENIX} Security 17). 1093–1110.[3] G Bassett. 2015. Database Breach Investigations Report, Verizon.[4] Iz Beltagy, Kyle Lo, and Arman Cohan. 2019. SciBERT: A pretrained language

model for scientific text. In Proceedings of the 2019 Conference on Empirical Meth-ods in Natural Language Processing and the 9th International Joint Conference onNatural Language Processing (EMNLP-IJCNLP). 3606–3611.

[5] E. Ciklabakkal, A. Donmez, M. Erdemir, E. Suren, M. K. Yilmaz, and P. Angin.2019. ARTEMIS: An Intrusion Detection System for MQTT Attacks in Internet ofThings. In 2019 38th Symposium on Reliable Distributed Systems (SRDS). 369–3692.

[6] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert:Pre-training of deep bidirectional transformers for language understanding. arXivpreprint arXiv:1810.04805 (2018).

[7] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley,Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversar-ial Networks. In Proceedings of International Conference on Neural InformationProcessing Systems (NIPS). 2672–2680.

[8] Kexin Huang, Jaan Altosaar, and Rajesh Ranganath. 2019. Clinicalbert: Modelingclinical notes and predicting hospital readmission. arXiv:1904.05342 (2019).

[9] ICS-CERT. 2017. BrickerBot Permanent Denial-of-Service Attack (Update A).https://ics-cert.us-cert.gov/alerts/ICS-ALERT-17-102-01A.

[10] Jun Young Kim, Ralph Holz, Wen Hu, and Sanjay Jha. 2017. Automated Analysisof Secure Internet of Things Protocols. In Proceedings of the 33rd Annual ComputerSecurity Applications Conference (Orlando, FL, USA) (ACSAC 2017). Associationfor Computing Machinery, New York, NY, USA, 238–249. https://doi.org/10.1145/3134600.3134624

[11] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, PiyushSharma, and Radu Soricut. 2019. Albert: A lite bert for self-supervised learningof language representations. arXiv preprint arXiv:1909.11942 (2019).

[12] Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim,Chan Ho So, and Jaewoo Kang. 2020. BioBERT: a pre-trained biomedical languagerepresentation model for biomedical text mining. Bioinformatics 36, 4 (2020),1234–1240.

[13] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, OmerLevy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: Arobustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692(2019).

[14] Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. 2019. ViLBERT: PretrainingTask-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. InAdvances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle,A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.). Curran Associates,Inc., 13–23. http://papers.nips.cc/paper/8297-vilbert-pretraining-task-agnostic-visiolinguistic-representations-for-vision-and-language-tasks.pdf

[15] Andriy Mnih and Geoffrey E Hinton. 2009. A scalable hierarchical distributedlanguage model. InAdvances in neural information processing systems. 1081–1088.

[16] T. D. Nguyen, S. Marchal, M. Miettinen, H. Fereidooni, N. Asokan, and A. Sadeghi.2019. DÏoT: A Federated Self-learning Anomaly Detection System for IoT. InIEEE 39th International Conference on Distributed Computing Systems (ICDCS).756–767.

[17] A. Niruntasukrat, C. Issariyapat, P. Pongpaibool, K. Meesublak, P. Aiumsupucgul,and A. Panya. 2016. Authorization mechanism for MQTT-based Internet ofThings. In 2016 IEEE International Conference on Communications Workshops(ICC). 290–295.

[18] Karl Palsson. 2019. Malaria Toolkit. Retrieved May 31, 2020 from https://github.com/etactica/mqtt-malaria

[19] Matthew E Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, ChristopherClark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized wordrepresentations. arXiv preprint arXiv:1802.05365 (2018).

[20] Telmo Pires, Eva Schlinger, and Dan Garrette. 2019. How multilingual is Multi-lingual BERT? arXiv preprint arXiv:1906.01502 (2019).

[21] G. Potrino, F. de Rango, and A. F. Santamaria. 2019. Modeling and evaluation ofa new IoT security system for mitigating DoS attacks to the MQTT broker. In2019 IEEE Wireless Communications and Networking Conference (WCNC). 1–6.

[22] Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. Dis-tilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXivpreprint arXiv:1910.01108 (2019).

[23] I Skerrett. Oct. 25, 2019. Why MQTT Has Become the De-Facto IoT Standard.https://dzone.com/articles/why-mqtt-has-become-the-de-facto-iot-standard

[24] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones,Aidan N Gomez, Ł ukasz Kaiser, and Illia Polosukhin. 2017. Attention is Allyou Need. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V.Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.).Curran Associates, Inc., 5998–6008. http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf

[25] Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, and Qun Liu.2019. ERNIE: Enhanced language representation with informative entities. arXivpreprint arXiv:1905.07129 (2019).

https://doi.org/10.1145/3134600.3134624

https://doi.org/10.1145/3134600.3134624

http://papers.nips.cc/paper/8297-vilbert-pretraining-task-agnostic-visiolinguistic-representations-for-vision-and-language-tasks.pdf

http://papers.nips.cc/paper/8297-vilbert-pretraining-task-agnostic-visiolinguistic-representations-for-vision-and-language-tasks.pdf

https://github.com/etactica/mqtt-malaria

https://github.com/etactica/mqtt-malaria

https://dzone.com/articles/why-mqtt-has-become-the-de-facto-iot-standard

http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf

http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf

Documents

Man-in-the-Middle Attacks on MQTT-based IoT Using BERT Based ...€¦ · Message Queuing Telemetry Transport (MQTT) protocol for com-munications. This attack scheme consists of an