Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)
Sources: http://bair.berkeley.edu/blog/2025/04/11/prompt-injection-defense, http://bair.berkeley.edu/blog/2025/04/11/prompt-injection-defense/, BAIR Blog
Seeded from: BAIR Blog Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications. However, as LLMs have improved, so have the attacks against them. Prompt injection attack is listed as the #1 threat by OWASP to LLM-integrated applications, where an LLM input contains a trusted prompt (ins Read more: http://bair.berkeley.edu/blog/2025/04/11/prompt-injection-defense/
More resources
Cut Model Deployment Costs While Keeping Performance With GPU Memory Swap
Leverage GPU memory swap (model hot-swapping) to share GPUs across multiple LLMs, reduce idle GPU costs, and improve autoscaling while meeting SLAs.
Fine-Tuning gpt-oss for Accuracy and Performance with Quantization Aware Training
Guide to fine-tuning gpt-oss with SFT + QAT to recover FP4 accuracy while preserving efficiency, including upcasting to BF16, MXFP4, NVFP4, and deployment with TensorRT-LLM.
How Small Language Models Are Key to Scalable Agentic AI
Explores how small language models enable cost-effective, flexible agentic AI alongside LLMs, with NVIDIA NeMo and Nemotron Nano 2.
How to Scale Your LangGraph Agents in Production From a Single User to 1,000 Coworkers
Guidance on deploying and scaling LangGraph-based agents in production using the NeMo Agent Toolkit, load testing, and phased rollout for hundreds to thousands of users.
Introducing NVIDIA Jetson Thor: The Ultimate Platform for Physical AI
Jetson Thor combines edge AI compute, MIG virtualization, and multimodal sensors for flexible, real-time robotics at the edge, with FP4/FP8 acceleration and support for Isaac GR00T and large language/vision models.
NVFP4 Trains with Precision of 16-Bit and Speed and Efficiency of 4-Bit
NVFP4 is a 4-bit data format delivering FP16-level accuracy with the throughput and memory efficiency of 4-bit precision, extended to pretraining for large language models. This profile covers 12B-scale experiments, stability, and industry collaborations.