Meet Sibyl – DoorDash’s New Prediction Service
DoorDash’s prediction service, Sibyl, was created to handle real-time predictions in its machine learning infrastructure. Sibyl retrieves models and features from independent stores to make predictions, and it supports batch and shadow predictions so that multiple models can be tested on the same data in the background. Sibyl was built to be scalable and fast, so other services can call it for predictions instead of making predictions themselves. To achieve this, Sibyl caches models in memory, fetches features during each prediction, and asynchronously makes predictions for each feature set. A typical request follows a simple lifecycle: the service grabs the model and its config from an in-memory cache, identifies missing features and attempts to retrieve them from the feature store, makes predictions, and finally constructs a response. These design decisions and the overall architecture make the Sibyl prediction service highly scalable, able to handle hundreds of thousands of predictions per second.
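The request lifecycle described above can be sketched in a few lines. This is a hypothetical illustration, not DoorDash’s actual API: the model/config shapes, the dict standing in for the Redis feature cache, and all names here are assumptions.

```python
# Illustrative sketch of a Sibyl-style request lifecycle; all names and the
# model/config shapes are hypothetical, not DoorDash's actual code.

MODEL_CACHE = {
    "eta_model": {
        "config": {"features": ["f1", "f2"], "defaults": {"f2": 0.0}},
        "predict": lambda feats: sum(feats.values()),
    }
}

FEATURE_STORE = {"f1": 1.5}  # stand-in for the Redis feature cache

def handle_request(model_id, provided_features):
    entry = MODEL_CACHE[model_id]          # 1. model + config from the in-memory cache
    cfg = entry["config"]
    feats = dict(provided_features)
    for name in cfg["features"]:           # 2. fill missing features from the store,
        if name not in feats:              #    falling back to configured defaults
            feats[name] = FEATURE_STORE.get(name, cfg["defaults"].get(name))
    prediction = entry["predict"](feats)   # 3. compute the prediction
    return {"model": model_id, "prediction": prediction}  # 4. build the response
```

Each of the four steps maps to a stage described below in more detail.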
More detail on Sibyl
- Model caching: To ensure that predictions are as fast as possible, the Sibyl service caches all models in memory when it starts up. This way, instead of fetching the model from the model store for every request, the service can retrieve it from memory, which is much faster. The model cache is a thread-safe in-memory data structure accessible to all worker threads.
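A minimal sketch of such a cache, assuming a lock-protected dict loaded once at startup (the `ModelCache` class and `load_all` name are illustrative, not from the article):

```python
import threading

class ModelCache:
    """Thread-safe in-memory model cache, populated once at service startup."""

    def __init__(self):
        self._lock = threading.Lock()
        self._models = {}

    def load_all(self, model_store):
        # One bulk fetch from the model store at startup, instead of
        # one fetch per prediction request.
        with self._lock:
            self._models = dict(model_store)

    def get(self, model_id):
        # Reads are served from memory and are safe across worker threads.
        with self._lock:
            return self._models.get(model_id)
```

In practice a production service might use a lock-free read path or a copy-on-write structure, since the cache is written only at startup; the lock here just keeps the sketch obviously safe.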
- Asynchronous prediction computation: Another critical component of the Sibyl service is its asynchronous prediction computation. Predictions are computed in the background on separate worker threads instead of blocking the request thread while they are made. This allows the request thread to continue processing other requests while predictions are computed, improving the overall performance and scalability of the service.
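One common way to get this behavior is a worker thread pool; the sketch below assumes `concurrent.futures` and is only an illustration of the pattern, not Sibyl’s actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor

# Worker threads compute predictions so the request thread never blocks
# on model evaluation.
executor = ThreadPoolExecutor(max_workers=4)

def predict_all(model_fn, feature_sets):
    # One task per feature set; submit() returns immediately, so the
    # request thread stays free while workers compute.
    futures = [executor.submit(model_fn, fs) for fs in feature_sets]
    # Gather results only when it is time to build the response.
    return [f.result() for f in futures]
```

Because each feature set becomes its own task, a batch request with many feature sets is computed in parallel rather than sequentially.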
- Shadow predictions: The Sibyl service also supports shadow predictions, which are predictions made in the background using models other than the official prediction model. This allows teams to test multiple candidate models on the same data without switching the official model each time. Shadow predictions are also made asynchronously, so they do not impact the performance of official predictions.
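The fire-and-forget shape of shadow predictions can be sketched as below. The names, the separate shadow pool, and the in-memory log standing in for wherever shadow results are recorded are all assumptions for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

shadow_pool = ThreadPoolExecutor(max_workers=2)
shadow_log = []  # stand-in for the metrics/log sink used to compare models offline

def _run_shadow(name, model_fn, features):
    # Shadow results are recorded for later comparison, never returned to callers.
    shadow_log.append((name, model_fn(features)))

def predict_with_shadows(official_model, shadow_models, features):
    # Fire and forget: shadow work is submitted asynchronously so it cannot
    # slow down the official prediction path.
    for name, model_fn in shadow_models.items():
        shadow_pool.submit(_run_shadow, name, model_fn, features)
    return official_model(features)  # only the official result reaches the caller
```

The caller sees the same latency whether zero or five candidate models are shadowing the request, which is what makes this safe to run continuously in production.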
- Feature and model fetching: The Sibyl service fetches features and models from separate stores. Features are supplied with each prediction request, and any missing ones are retrieved from a Redis cache of feature values. Models are retrieved from the model store when the service starts up and cached in memory, eliminating the need to fetch them for each request.
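Per-request feature retrieval might look like the following sketch, which batches missing features into a single lookup; the `FakeRedis` stub stands in for a real Redis client, and the function name and batching choice are assumptions, not details from the article:

```python
class FakeRedis:
    """Stand-in for a Redis client; mimics only the MGET call shape used below."""

    def __init__(self, data):
        self.data = data

    def mget(self, keys):
        return [self.data.get(k) for k in keys]

def fetch_features(cache, required, provided):
    feats = dict(provided)
    missing = [f for f in required if f not in feats]
    if missing:
        # A single batched lookup (Redis MGET) keeps feature retrieval to
        # one round trip per request instead of one per missing feature.
        values = cache.mget(missing)
        feats.update({k: v for k, v in zip(missing, values) if v is not None})
    return feats
```

Features the caller already supplied are never re-fetched, so the cache round trip only pays for what is actually missing.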
- Model configs: Besides caching the models, the Sibyl service also caches model configs, which contain information about each model, such as its required features, default fallback values for those features, and the model type. The model configs determine which features are needed for each prediction and what to do if a feature is missing.
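A config like this could be modeled as a small structure that knows how to fill in fallback values; the field names below are illustrative guesses at the kind of information described, not the actual config schema:

```python
from dataclasses import dataclass, field

@dataclass
class ModelConfig:
    # Hypothetical config shape: model type, required features, and
    # per-feature fallback defaults, as described in the article.
    model_type: str
    required_features: list
    feature_defaults: dict = field(default_factory=dict)

    def resolve(self, features):
        """Fill any missing required feature with its configured fallback value."""
        resolved = dict(features)
        for name in self.required_features:
            if name not in resolved:
                resolved[name] = self.feature_defaults.get(name)
        return resolved
```

Keeping defaults in the config rather than in model code means a missing feature degrades a prediction gracefully instead of failing the request.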
In conclusion, the Sibyl prediction service was built with scalability, performance, and flexibility in mind. By computing predictions asynchronously, caching models and model configs in memory, and supporting shadow predictions, the service can handle hundreds of thousands of predictions per second without breaking a sweat. And by focusing on prediction alone, it provides a simple, fast, and reliable service for other components in the DoorDash ML infrastructure ecosystem.
The article is “Meet Sibyl – DoorDash’s New Prediction Service – Learn about its Ideation, Implementation and Rollout.”