Small Language Models: The Solution for Deploying Language Models at the Edge?

While Large Language Models (LLMs) like GPT-3 and GPT-4 have quickly become synonymous with AI, mass deployments of LLMs for both training and inference have, to date, been predominantly cloud-based. This is primarily due to the sheer size of the models; the resulting processing and memory requirements often overwhelm the capabilities of edge-based systems. Although the efficiency of NPUs such as Expedera's Origin continues to increase dramatically, most experts believe that memory will remain the bottleneck in large-scale edge deployments of LLMs for some time. In Expedera's customer engagements, we have yet to find an instance where a desired edge LLM deployment was not memory-bound; the memory required for reasonable LLM training or inference performance pushes the design well beyond the edge device's power, cost, and size budgets.

While memory technology catches up, the AI industry has turned its attention to another model type, the Small Language Model (SLM). Like LLMs, SLMs are language models. However, unlike LLMs, which can have hundreds of billions of parameters, SLMs have significantly fewer, often only a few million to several hundred million.
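
To see why parameter count dominates the edge-memory question, here is a rough, weights-only back-of-the-envelope sketch in Python. The parameter counts, precisions, and the 8 GiB edge-memory budget are illustrative assumptions, and real deployments also need memory for activations and KV caches, so treat this as an order-of-magnitude comparison rather than a sizing tool.

```python
# Rough weights-only memory estimate for language models of different sizes.
# Parameter counts and the 8 GiB edge-DRAM budget are illustrative assumptions.

GIB = 1024 ** 3

models = {
    "GPT-3-class LLM": 175e9,   # ~175B parameters
    "7B-parameter LLM": 7e9,
    "DistilBERT-class SLM": 66e6,
    "TinyBERT-class SLM": 14.5e6,
}

precisions = {"FP16": 2, "INT8": 1, "INT4": 0.5}  # bytes per parameter

EDGE_DRAM_BUDGET_GIB = 8  # assumed memory available on a typical edge SoC

for name, params in models.items():
    for prec, bytes_per_param in precisions.items():
        size_gib = params * bytes_per_param / GIB
        verdict = "fits" if size_gib < EDGE_DRAM_BUDGET_GIB else "exceeds"
        print(f"{name:20s} {prec:4s}: {size_gib:10.3f} GiB "
              f"({verdict} the {EDGE_DRAM_BUDGET_GIB} GiB budget)")
```

Even at INT4, a GPT-3-class model needs tens of gigabytes just for its weights, while the SLM-scale models fit in tens of megabytes, which is the gap the rest of this article is about.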

SLMs offer several significant advantages compared to LLMs when considering deployment on edge devices, including:

  • Power Efficiency: SLMs can run much more efficiently on edge devices, which have limited computational capacity and memory.
  • Faster Responses: SLMs are much faster at generating responses due to their smaller size, making them ideal for real-time applications like chatbots, voice assistants, and other interactive systems where latency is critical.
  • Lower Costs: SLMs require significantly fewer resources to train, store, and deploy. They use less memory, processing power, and energy, making them more affordable to operate.
  • Privacy-Friendly: As SLMs can be more easily deployed locally, they eliminate the need for external servers, reducing privacy risks.
  • Greater Control and Customization: SLMs are generally easier to fine-tune and specialize for narrow domains or specific tasks compared to LLMs, given their smaller size (see the fine-tuning sketch after this list).
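
To illustrate how lightweight that fine-tuning can be, here is a minimal sketch using the Hugging Face transformers and datasets libraries to adapt DistilBERT to a binary sentiment task. The dataset choice, subset sizes, and hyperparameters are illustrative assumptions, not a recommended recipe.

```python
# Minimal fine-tuning sketch: adapting DistilBERT (~66M parameters) to a
# binary sentiment task. Requires: pip install transformers datasets
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Illustrative dataset; any labeled text-classification set works the same way.
dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # Fixed-length padding keeps the default data collator happy.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

encoded = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

args = TrainingArguments(
    output_dir="distilbert-sentiment",   # hypothetical output directory
    num_train_epochs=1,
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,
    args=args,
    # Small illustrative subsets so the sketch runs quickly on modest hardware.
    train_dataset=encoded["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=encoded["test"].select(range(500)),
)

trainer.train()
trainer.save_model("distilbert-sentiment")
```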

However, SLMs do have drawbacks compared to their larger brethren, including:

  • Reduced Accuracy and Language Comprehension: SLMs typically have fewer parameters, which limits their ability to understand complex language, nuances, and detailed context.
  • Limited Generalization: With fewer parameters and often smaller training datasets, SLMs may struggle to generalize across diverse topics, making them less versatile.
  • Increased Bias and Reduced Robustness: Smaller models are more prone to bias since they may lack the depth and diversity of data exposure that LLMs benefit from.
  • Inability to Handle Complex or Multi-Step Reasoning: SLMs may have shorter context windows and are less capable of handling complex reasoning, logic-based tasks, or multi-step processes, limiting their use in applications requiring advanced problem-solving.

Even with the drawbacks, SLMs are seen as a near-term ‘path forward’ for edge deployment of language models. Here is a breakdown of some example SLMs:

  • DistilBERT (66M parameters): A smaller, faster, and lighter version of BERT, reportedly retaining about 95% of BERT’s language understanding while being 60% faster and smaller. Use cases: text classification, sentiment analysis, and question-answering tasks.
  • TinyBERT (14.5M parameters): Optimized for efficient inference, with further compression than DistilBERT. Use cases: intent recognition, voice assistants, and contextual search in apps.
  • ALBERT (12M parameters): Reduces BERT’s size by sharing parameters across layers and using factorized embedding parameterization, making it more lightweight and memory-efficient. Use cases: document classification, named entity recognition (NER), and others.
  • MiniLM (22M parameters): A distilled version of Microsoft’s Transformer models. Use cases: text summarization, machine translation, and search engines.
  • Reformer and Longformer (41M parameters for Longformer): Optimized to handle long text sequences more efficiently than traditional transformers, allowing small- to medium-sized models to handle large inputs without major memory usage. Use cases: document analysis, summarization, and handling long transcripts or legal documents in customer service and content moderation.
  • Ada and Babbage (350M parameters for Ada): Smaller versions of OpenAI’s language models. Use cases: classification, text completion, and basic conversational AI tasks.
  • T5-Small and T5-Base (60M parameters for T5-Small): Smaller variants of the T5 (Text-To-Text Transfer Transformer) models. Use cases: summarization, translation, and other language generation tasks.
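
To ground the use cases above, here is a minimal inference sketch for one of them (sentiment analysis), assuming the Hugging Face transformers library and its publicly hosted distilbert-base-uncased-finetuned-sst-2-english checkpoint; the model choice and example sentences are illustrative.

```python
# Minimal inference sketch: sentiment analysis with a DistilBERT checkpoint,
# the kind of SLM workload that fits comfortably on an edge device.
# Requires: pip install transformers
from transformers import pipeline

# Illustrative checkpoint; any similarly sized fine-tuned SLM could be swapped in.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

examples = [
    "The device paired instantly and setup took under a minute.",
    "Battery life was far worse than advertised.",
]

for text, result in zip(examples, classifier(examples)):
    print(f"{result['label']:>8s} ({result['score']:.2f})  {text}")
```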

Small Language Models may be preferred over Large Language Models for edge deployments, where cost, efficiency, speed, and ease of deployment are prioritized. SLMs offer enhanced privacy by enabling on-device processing, requiring significantly less energy and prolonging battery life. They are also easier to fine-tune for specific tasks and more manageable to maintain, making them ideal for high-volume, real-time, or specialized edge applications where advanced language comprehension isn’t essential.

Paul Karazuba

Paul Karazuba is vice president of marketing at Expedera.
