A Collaborative AI-enabled Pretrained Language Model for AIoT Domain Question Answering

Hongyin Zhu, Prayag Tiwari, Ahmed Ghoneim, M. Shamim Hossain

Research output: Contribution to journal › Article › Scientific › peer-review


Large-scale knowledge in the Artificial Intelligence of Things (AIoT) field urgently needs effective models to understand human language and automatically answer questions. Pre-trained language models (PLMs) achieve state-of-the-art performance on some question answering (QA) datasets, but few models can answer questions about AIoT domain knowledge. Currently, the AIoT domain lacks sufficient QA datasets and large-scale pre-training corpora. We propose RoBERTa-AIoT to address the lack of high-quality, large-scale labeled AIoT QA datasets. We construct an AIoT corpus to further pre-train RoBERTa and BERT. RoBERTa-AIoT and BERT-AIoT leverage unsupervised pre-training on a large corpus of AIoT-oriented Wikipedia webpages to learn more domain-specific context and improve performance on AIoT QA tasks. To fine-tune and evaluate the models, we construct three AIoT QA datasets from community QA websites. We evaluate our approach on these datasets, and the experimental results show that it yields significant improvements.
Original language: English
Journal: IEEE Transactions on Industrial Informatics
Publication status: Accepted/In press - 14 Jul 2021
MoE publication type: A1 Journal article-refereed


  • AIoT
  • Question answering
  • RoBERTa
  • BERT
  • Domain-specific

