Microsoft launches Phi-3, its smallest AI model to date


Microsoft has rolled out Phi-3 Mini, the latest iteration of its compact AI model and the first of three small models slated for release. Phi-3 Mini has 3.8 billion parameters and was trained on a dataset that is small relative to those used for large language models like GPT-4. It is currently available on Azure, Hugging Face, and Ollama.

Microsoft’s roadmap includes Phi-3 Small, with 7 billion parameters, and Phi-3 Medium, with 14 billion parameters. Parameter count is a rough indicator of how complex an instruction a model can understand. In December, the company launched Phi-2, which performed on par with larger models such as Llama 2. Microsoft claims that Phi-3 outperforms its predecessor and can deliver responses comparable to those of a model ten times its size.

Eric Boyd, corporate vice president of Microsoft Azure AI Platform, told The Verge that Phi-3 Mini possesses the same capabilities as large language models like GPT-3.5, just in a more compact size.

Compared with their larger counterparts, small AI models are typically cheaper to run and perform better on personal devices such as phones and laptops. Earlier this year, The Information reported that Microsoft was building a team dedicated to developing lightweight AI models. In addition to Phi, the company has built Orca-Math, a model focused on solving math problems.

Microsoft’s rivals offer their own compact AI models, most of which target simpler tasks such as document summarization or coding assistance. Google’s Gemma 2B and 7B are suited to simple chatbots and language-related work. Anthropic’s Claude 3 Haiku can read dense research papers with graphs and summarize them quickly. Meanwhile, Meta’s recently released Llama 3 8B can be used for chatbot support and coding assistance.

Boyd says developers trained Phi-3 with a “curriculum.” They were inspired by how children learn from bedtime stories and from books that use simpler words and sentence structures yet cover broader subjects.

“There aren’t enough children’s books out there, so we took a list of more than 3,000 words and asked an LLM to make ‘children’s books’ to teach Phi,” Boyd says.

He added that Phi-3 built on what its predecessors learned. While Phi-1 focused on coding and Phi-2 began to learn to reason, Phi-3 is better at both coding and reasoning. Although the Phi-3 family has some general knowledge, it cannot match the breadth of GPT-4 or another large language model (LLM). There is a significant difference between the answers available from an LLM trained on the entirety of the internet and those from a smaller model like Phi-3.

Boyd says companies often find that smaller models like Phi-3 work better for their custom applications, in part because many companies’ internal datasets are relatively small. And since these models need less computing power, they are often far cheaper to run.

Rohit Arora