- Introduces container image caching for SageMaker Inference, which pre‑pulls large model images to cut scale‑out startup latency by up to half.
- Enables up to 2× faster provisioning of new instances during generative AI scale‑out, with no changes required to existing endpoints.
- Available in all commercial AWS Regions; works across accelerator types and with both single‑model and inference component endpoints.