
Local LLM Inference Optimization

A Comprehensive Guide to Quantization, Hardware Acceleration, and Efficient Private AI Deployment

Language: English
Format: Paperback
Author: Thomas O. Greene
Libristo code: 52120727
Publisher: Independently published, April 2026

Stop Renting Intelligence. Start Optimizing Your Own.
Do you want to run 70B parameter models on a single consumer GPU? Are you tired of high API costs, network latency, and the privacy risks of cloud-based AI?
The "Local LLM Revolution" is here, but running Large Language Models (LLMs) privately is only half the battle. To make them truly useful, you must master Inference Optimization.
In Local LLM Inference Optimization, you will move beyond basic "out-of-the-box" setups and dive into the high-performance engineering required to squeeze every drop of power from your hardware. Whether you are using NVIDIA CUDA, Apple Silicon (MLX), or AMD ROCm, this comprehensive guide provides the technical blueprint for the sovereign engineer.

What You Will Master:

  • The Quantization Deep-Dive: Learn to navigate the "Quantization Tax" using GGUF, EXL2, AWQ, and GPTQ. Move from FP32 to 4-bit and even 1.58-bit (BitNet) without losing the model's "mind."
  • Advanced Memory Management: Defeat "Out of Memory" (OOM) errors by mastering KV Cache Management, PagedAttention, and FlashAttention 2 & 3.
  • The Speed Multipliers: Double your Tokens Per Second (TPS) using Speculative Decoding, Continuous Batching, and Lookahead Heuristics.
  • Hardware Architecture: Architect high-performance local servers using Multi-GPU Pipeline Parallelism and CPU/GPU offloading strategies.
  • Context Window Expansion: Use RoPE Scaling, YaRN, and LongRoPE to push 8k models to 128k+ context on consumer hardware.
  • The Full Local Stack: Step-by-step guides for llama.cpp, Ollama, vLLM, and TGI (Text Generation Inference).
  • Security & Privacy: Deploy Air-Gapped AI environments and secure your infrastructure using Safetensors and local sandboxing.

Why This Book?

This book focuses on Deployment and Efficiency. It is written for the Lead Engineer, the Privacy-Conscious CTO, and the Prosumer Hobbyist who demands low Time to First Token (TTFT) and maximum Perf/Watt.
Stop paying for tokens. Own your weights. Optimize your future.


About the book

Full title: Local LLM Inference Optimization
Language: English
Binding: Paperback
Publication date: 2026
Number of pages: 170
EAN: 9798258375193
Libristo code: 52120727
Weight: 237 g
Dimensions: 152 x 229 x 9 mm