Multimodal Text - Search News

Multimodal text guided network for chest CT pneumonia classification

Pneumonia is a prevalent and serious respiratory disease, responsible for a significant number of cases globally. With advancements in deep learning, the automatic diagnosis of pneumonia has attracted ...

Nature

Hierarchical cross-modal attention and dual audio pathways for enhanced multimodal sentiment analysis

This paper presents a new architecture for multimodal sentiment analysis exploiting hierarchical cross-modal attention mechanisms, as well as two parallel lanes for audio analysis. Traditional ...

Forbes

Beyond Large Language Models: How Multimodal AI Is Unlocking Human-Like Intelligence

The AI industry has long been dominated by text-based large language models (LLMs), but the future lies beyond the written word. Multimodal AI represents the next major wave in artificial intelligence ...

InfoQ

NVIDIA Unveils NVLM 1.0: Open-Source Multimodal LLM with Improved Text and Vision Capabilities

Frontiers

Multimodal Annotation for Intangible Cultural Heritage: Embodied Knowledge and Technology

The field of Intangible Cultural Heritage (ICH) preservation increasingly depends on multimodal data, ranging from motion ...

Mashable

French startup Mistral unveils Pixtral 12B, its first multimodal AI model

French AI startup Mistral has dropped its first multimodal model, Pixtral 12B, capable of processing both images and text. The 12-billion-parameter model, built on Mistral’s existing text-based model ...

VentureBeat

Meta introduces Chameleon, a state-of-the-art multimodal model

As competition in the generative AI field ...

InfoWorld

Microsoft’s Phi-4-multimodal AI model handles speech, text, and video

Microsoft has introduced a new AI model that, it says, can process speech, vision, and text locally on-device using less compute capacity than previous models. Innovation in generative artificial ...

Frontiers

Multimodal World Models, Embodiment, and Cognitive Amplification

Multimodal models and world models are emerging as promising frameworks for extending language-based AI beyond text, towards ...

VentureBeat

Meta Introduces Spirit LM open source model that combines text and speech inputs/outputs

Just in time for Halloween 2024, Meta has unveiled Meta Spirit LM, the company’s first open-source multimodal language model capable of seamlessly integrating text and speech inputs and outputs.

Google unveils Gemma 4 12B, bringing advanced multimodal AI to 16 GB laptops

Google has launched Gemma 4 12B, a new open-weight artificial intelligence model that can run locally on laptops with as little as 16 GB of RAM while handling text, images and audio. The company says ...

moneycontrol.com

Google's new AI tool can create videos from text. Here's how Gemini Omni works

Google has launched Gemini Omni in India, giving users access to its newest artificial intelligence tool for creating and editing videos. Announced at Google I/O 2026, the ...