LIBRISTO
LIBROAMANTO
obvezno
Postanite del skupnosti ljubiteljev knjig z vsega sveta in uživajte v številnih ugodnostih. Ustvarite brezplačen račun
0
Brezplačna dostava Zásilkovna nad 69.99 €
Zbirna točka GLS 4.49 Zbirna točka DPD 2.99 Kurirska služba GLS 5.49 Kurir DPD 3.49 Kurirska služba 3.49 Zbirno mesto 3.49 Zbirno mesto 3.49 Dostava preko Pošte Slovenije 3.49

Brezplačna dostava za naročila nad 69.99 € na paketomatih Pošte Slovenije.

AI Inference Optimization Engineering

Quantization, Speculative Decoding, and Hardware-Specific LLM Deployment

Jezik AngleščinaAngleščina
Knjiga Mehka
Knjiga AI Inference Optimization Engineering ChatVariety Team
Koda Libristo: 52770465
Založba Independently published, junij 2026
Slash LLM Deployment Costs and LatencyDeploying Large Language Models (LLMs) in production is a mass... Celoten opis
? points 26 b Kmalu Kmalu Novo Novo
10.80
Pričakovana zaloga Naselitev 07. 06. 2026

30 dni za vračilo blaga

Slash LLM Deployment Costs and Latency

Deploying Large Language Models (LLMs) in production is a massive economic and engineering hurdle. AI Inference Optimization Engineering is your comprehensive, hands-on guide to mastering the full stack of modern LLM optimization techniques. From memory-bandwidth solutions to hardware-specific compilation, this book bridges the gap between research-level models and enterprise-grade execution.

What you will master inside this book:
  • Hardware-Aware Optimization: Dive deep into KV cache mechanics, autoregressive decoding, and GPU memory hierarchies to eliminate latency bottlenecks.
  • State-of-the-Art Quantization: Apply GPTQ, AWQ, and GGUF compression algorithms to scale down massive neural networks without sacrificing model accuracy.
  • Advanced Acceleration Methods: Implement speculative decoding with draft models (like Medusa and Eagle), PagedAttention, and FlashAttention to boost throughput by 2-3x.
  • Production-Grade Serving: Build ultra-low-latency deployment infrastructures using vLLM, Triton Inference Server, and continuous batching.
  • Cross-Platform Deployment: Optimize models for specific target hardware, including NVIDIA H100 (TensorRT-LLM), Apple Silicon (llama.cpp/Metal), and Qualcomm mobile/edge accelerators.

Whether you are an ML infrastructure engineer, an AI platform architect, or a technical leader looking to scale LLMs cost-effectively, this book provides the production-ready code, equations, and architectural patterns you need to build hyper-efficient AI pipelines.

Igralka & Poliglotka
EWA KASP za
Predvajaj video
Ewa Kasp
Libristo ima največjo izbiro tujejezične literature. Zato svoje knjige kupujem tukaj.

O knjigi

Polni naslov AI Inference Optimization Engineering
Jezik Angleščina
Vezava Knjiga - Mehka
Datum izida 2026
Število strani 96
EAN 9798199720021
Koda Libristo 52770465
Teža 142
Mere 152 x 229 x 5
Podarite to knjigo še danes
To je povsem preprosto
1 Dodajte knjigo v košarico in izberite dostavo kot darilo 2 V zameno vam bomo poslali kupon 3 Knjiga bo dostavljena na naslov obdarovanca

Prijava

Prijavite se v svoj račun. Še nimate računa Libristo? Ustvarite ga zdaj!

 
obvezno
obvezno

Še nimate računa? Izkoristite prednosti računa Libristo!

Z računom Libristo boste imeli vedno vse pod nadzorom.

Ustvarite račun Libristo