Mistral Announces Pixtral 12B Multimodal AI Model With 'Computer Vision' Feature

By: gadgets.ndtv.com

Sep 12 2024


Mistral released its first multimodal artificial intelligence (AI) model, dubbed Pixtral 12B, on Wednesday. The AI firm, known for its open-source large language models (LLMs), has also made the model available on GitHub and Hugging Face for users to download and test. Notably, despite being multimodal, Pixtral only accepts images as input: it processes them using computer vision technology and answers queries about them. A vision encoder and a vision adapter have been added for this functionality. It cannot generate images the way image-generation tools such as Stable Diffusion or Midjourney do.

Mistral Releases Pixtral 12B

In keeping with its reputation for minimalist announcements, Mistral released the AI model through a post from its official account on X (formerly known as Twitter) that simply shared a magnet link. The total file size of Pixtral 12B is 24GB, and running the model will require an NPU-enabled PC or one with a powerful GPU.
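The 24GB figure is roughly what one would expect for a model of this size. As a back-of-the-envelope sketch, assuming the "12B" in the name means about 12 billion parameters and that each parameter is stored in 16 bits (the storage precision is an assumption here, not something stated in the announcement), the arithmetic looks like this. The function name is purely illustrative.

```python
def weights_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of raw model weights in gigabytes (1 GB = 10**9 bytes).

    bytes_per_param=2 corresponds to 16-bit weights, which is an assumption.
    """
    return num_params * bytes_per_param / 1e9


# About 12 billion parameters at 2 bytes each
size_gb = weights_size_gb(12e9)
print(f"Estimated weights size: {size_gb:.0f} GB")
```

The same estimate also hints at why a powerful GPU or NPU is needed: the weights alone would occupy about that much memory before any working memory for inference is counted.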

Pixtral 12B comes with 12 billion parameters and is built on the company's existing Nemo 12B AI model. Mistral highlights that the model uses the Gaussian Error Linear Unit (GeLU) activation function in its vision adapter and 2D Rotary Position Embedding (RoPE) in its vision encoder.
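For readers unfamiliar with GeLU, it is an activation function that scales an input x by the standard normal cumulative distribution function evaluated at x, that is, GeLU(x) = x · Φ(x). The snippet below is a minimal sketch of that textbook definition for a single number; it is not taken from Mistral's code, and the function name is illustrative only.

```python
import math


def gelu(x: float) -> float:
    """Exact GeLU: x multiplied by the standard normal CDF at x."""
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


# Large positive inputs pass through almost unchanged,
# large negative inputs are pushed towards zero.
for value in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"gelu({value:+.1f}) = {gelu(value):+.4f}")
```

Unlike a hard cutoff at zero, the curve is smooth, so small negative inputs are damped rather than discarded outright.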


Notably, users can give Pixtral 12B image files or image URLs, and it should be able to answer queries about the image, such as identifying objects, counting them, and sharing additional information. Since it is built on Nemo, the model should also be adept at the typical text-based tasks.

A Reddit user posted an image of Pixtral 12B's benchmark scores, and it appears that the model outperforms Claude-3 Haiku and Phi-3 Vision in multimodal capabilities on the ChartQA benchmark. It also appears to outperform both rival AI models on the Massive Multi-discipline Multimodal Understanding (MMMU) benchmark for multimodal knowledge and reasoning.

Citing a company spokesperson, TechCrunch reports that the Mistral AI model can be fine-tuned and used under an Apache 2.0 license. This means the model and its outputs can be used for personal or commercial purposes under the terms of that license. Additionally, Sophia Yang, the Head of Developer Relations at Mistral, clarified in a post that Pixtral 12B will soon be available on Le Chat and La Plateforme.

For now, users can directly download the AI model using the magnet link provided by the company. Alternatively, the model weights are also hosted on Hugging Face and GitHub.
