An Outlook for AI Innovation in Multimodal Communication Research

In Vincent G. Duffy (ed.), Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management (HCII 2024), pp. 182–234 (2024).

Abstract

In the rapidly evolving landscape of multimodal communication research, this paper explores the transformative role of machine learning (ML), particularly multimodal large language models, in tracking, augmenting, annotating, and analyzing multimodal data. Building on the foundations laid in our previous work, we survey the capabilities that have emerged over the past years. The integration of ML allows researchers to gain richer insights from multimodal data, enabling a deeper understanding of human (and non-human) communication across modalities. In particular, augmentation methods have become indispensable because they facilitate the synthesis of multimodal data and increase the diversity and richness of training datasets. In addition, ML-based tools have accelerated annotation processes, reducing human effort while improving accuracy. Continued advances in ML and the proliferation of more powerful models promise even more sophisticated analyses of multimodal communication, e.g., through models like ChatGPT, which can now “understand” images. This makes it all the more important to assess what these models can achieve now or in the near future, and what will remain unattainable. We also acknowledge the ethical and practical challenges associated with these advances, emphasizing the importance of responsible AI and data privacy: benefits must be shared equitably, and technology must respect individual rights. In this paper, we highlight advances in ML-based multimodal research and discuss what the near future holds. Our goal is to provide insights into this research stream for both the multimodal research community, especially in linguistics, and the broader ML community. In this way, we hope to foster collaboration in an area that is likely to shape the future of technologically mediated human communication.


Similar books and articles

Multimodal AI: Teaching Machines to See, Hear, and Understand. Shruti Anita Yadav & Meera Komal Saxena - 2022 - International Journal of Multidisciplinary and Scientific Emerging Research 10 (2).
Emergence of Multimodal Solutions. Deshmukh Dev Kiran - 2025 - International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering (IJAREEIE) 14 (2):489–494.
Transformer Architectures Beyond NLP: Applications in Vision and Multimodal Learning. Ashok Mehta Gupta, Jai Patel Iyer & Kirti Kumar Yadav - 2023 - International Journal of Multidisciplinary and Scientific Emerging Research 11 (2).
Beyond ChatGPT: The Evolution of Conversational AI. Isha Pramila Gupta & Kavya Deepa Agarwal - 2022 - International Journal of Multidisciplinary and Scientific Emerging Research 10 (2).


Author Profiles

Reetu Bhattacharjee
University of Münster
Jens Lemanski
University of Tübingen
