With the rapid development of unmanned aerial vehicle (UAV) technology, the application of UAVs to task offloading has attracted increasing interest in academia. However, processing the tasks of mobile end users requires real-time interaction between a single UAV and the mobile edge computing node, which significantly increases system overhead and cannot meet the demands of large-scale artificial intelligence (AI)-based applications. To tackle this problem, in this article we propose a new architecture for UAV clustering that enables efficient multi-modal multi-task offloading. With the proposed architecture, the computing, caching, and communication resources are collaboratively optimized using AI-based decision making. This not only increases the efficiency of UAV clusters, but also provides insight into the fusion of computation and communication.