Danny Weber
04:58 04-04-2026
Microsoft launches three AI models for multimodal data processing, covering speech recognition, audio generation, and visual content handling, and is integrating them into its platforms.
Microsoft is strengthening its position in artificial intelligence by introducing three new models designed for text, voice, and image processing. The company is betting on its own technologies as competition among major market players intensifies.
As reported by the Central News Service, these new solutions share a multimodal approach: they can process different types of data within a single ecosystem. Among them are a speech recognition model supporting dozens of languages, an audio generation tool capable of creating custom voices, and a system for handling visual content, including image and video generation.
All three models are already being integrated into Microsoft platforms, including Foundry and the Playground test environment. The company emphasizes that its primary focus is the practical application of AI in users' daily tasks, and that future development will combine proprietary technologies with partner solutions.