There is a saying in meteorology that you can be accurate more often than not if you predict tomorrow’s weather to be the same as today’s weather. Of course, that approach will not always be right, unless you live in a place like San Diego or you use data to make your predictions.
Forecasting in business requires data, lots of data, and it requires specialized data science skills, time, and tools to wrangle, prepare, and analyze the data.
Cloud solution providers such as AWS enable organizations to collect and host all that business data, and they provide tools to integrate and analyze it for analytics and forecasting. Amazon Forecast is a managed service that consumes time series data and makes predictions without requiring the user to have any machine learning knowledge or experience.
Determine Use Case and Acceptable Accuracy
It is important to identify use cases and accuracy criteria before generating forecasts. As machine learning tools become easier to use, hasty predictions that are ostensibly accurate become more common.
Without a prepared use case and a definition of acceptable accuracy, the usefulness of any data analysis will be in question. A common use case is predicting the customer demand for inventory items to ensure adequate supply. For inventory, a common accuracy criterion is that the predicted demand should be higher than the actual demand 90% of the time, which ensures adequate supply without overstocking.
In statistical terms, that prediction is the 90th percentile (the 0.9 quantile) of demand, denoted P90, and its accuracy is measured with a quantile loss.
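As a rough illustration of what a P90 target means in practice, the sketch below computes a plain quantile (pinball) loss and the share of periods in which the forecast met or exceeded actual demand. The function names and the demand figures are made up for this example, and the loss shown is the simple unweighted form, which may differ from the weighted metric a forecasting service reports.

```python
def quantile_loss(actuals, predictions, q):
    """Average pinball loss of quantile-q predictions against actual values."""
    total = 0.0
    for y, y_hat in zip(actuals, predictions):
        diff = y - y_hat
        # Under-prediction (diff > 0) is weighted by q, over-prediction by 1 - q.
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(actuals)


def coverage(actuals, predictions):
    """Fraction of periods in which the prediction was at or above actual demand."""
    hits = sum(1 for y, y_hat in zip(actuals, predictions) if y_hat >= y)
    return hits / len(actuals)


# Hypothetical data: ten days of actual demand and a P90 forecast for each day.
actual_demand = [100, 120, 90, 110, 130, 95, 105, 115, 125, 140]
p90_forecast = [125, 135, 115, 130, 145, 120, 125, 135, 140, 138]

print("coverage:", coverage(actual_demand, p90_forecast))
print("P90 loss:", quantile_loss(actual_demand, p90_forecast, 0.9))
```

With q set to 0.9, falling short of demand by one unit costs nine times as much as overshooting by one unit, which is why a P90 forecast leans toward overstocking rather than running out.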
How Amazon Forecast Works
To get started, you collect the historical and related time series data and upload it to Amazon Forecast. Amazon Forecast automatically inspects the data, identifies the key attributes, selects the appropriate machine learning algorithm, trains the model, and generates the forecasts. Forecasts can be visualized, or the data can be exported for downstream processing.
Aggregate and Prepare Data
Time series data is often more granular than necessary for many use cases. If transaction data is collected across multiple locations (or device readings or inventory items) and the use case requires only a prediction of the total amount, the data will need to be aggregated before attempting any predictions.
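A minimal sketch of that aggregation step, using only the Python standard library, might look like the following. The field names (timestamp, location, item_id, quantity) and the sample rows are assumptions for illustration; real transaction data will have its own schema.

```python
from collections import defaultdict


def aggregate_daily(transactions):
    """Sum quantities across all locations for each (date, item_id) pair."""
    totals = defaultdict(float)
    for row in transactions:
        day = row["timestamp"][:10]  # keep YYYY-MM-DD, drop the time of day
        totals[(day, row["item_id"])] += row["quantity"]
    return dict(sorted(totals.items()))


# Hypothetical transactions recorded at two store locations.
transactions = [
    {"timestamp": "2020-03-01 09:15:00", "location": "store_a", "item_id": "widget", "quantity": 3},
    {"timestamp": "2020-03-01 14:40:00", "location": "store_b", "item_id": "widget", "quantity": 5},
    {"timestamp": "2020-03-01 16:05:00", "location": "store_a", "item_id": "gadget", "quantity": 2},
    {"timestamp": "2020-03-02 10:30:00", "location": "store_b", "item_id": "widget", "quantity": 4},
]

daily = aggregate_daily(transactions)
for (day, item_id), quantity in daily.items():
    print(day, item_id, quantity)
```

Here the location is dropped entirely because the assumed use case only needs a total per item per day; keeping it in the key would preserve per-location series instead.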
Inconsistencies in time series data are common and should be analyzed and corrected as much as possible before attempting any predictions. In many cases, perfect corrections are impossible due to missing or inaccurate data, and methods to smooth, fill, or interpolate the data will need to be employed.
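One simple way to fill gaps is linear interpolation between the nearest known values. The sketch below assumes a regularly spaced series in which missing readings are marked with None; the function name and sample data are illustrative, and whether interpolation is appropriate at all depends on why the data is missing.

```python
def interpolate_gaps(values):
    """Return a copy of values with None entries filled by linear interpolation.

    Leading and trailing gaps are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no known values to interpolate from")

    filled = list(values)
    # Gaps before the first or after the last known value copy that value.
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]
    # Interior gaps are filled on a straight line between their neighbors.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical daily demand with missing readings.
daily_demand = [None, 10, None, None, 16, 18, None]
print(interpolate_gaps(daily_demand))
```

For demand data, a missing value sometimes means zero sales rather than a lost reading, in which case filling with zero would be the more honest correction.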
Amazon Forecast Forecasts
Generating a forecast from Amazon Forecast is much easier than doing the prerequisite work. Amazon Forecast provides half a dozen predefined algorithms and an option for AutoML, which will evaluate all algorithms and choose one it determines to fit best.
Simple CSV files are uploaded, a Predictor is trained, and a Forecast is created. The end-to-end process usually takes several hours depending on the size of the data and the parameter settings. Once generated, you can see the results in a Forecast Lookup or export them back to CSV to be consumed by a data visualization service such as Amazon QuickSight.
If you skipped the prerequisites, you would look at the Forecast results and ask, “Now what?” If your results satisfy your use case and accuracy requirements, you can start working on other use cases and/or create an Amazon Forecast pipeline that delivers regular predictions.
Improving Forecast Accuracy
The most important factor affecting forecast accuracy is the quality and quantity of the data. Larger datasets are the first thing that should be tried. Data analysis might also be needed to ensure that the data is consistent.
If the generated Forecast has not satisfied the accuracy requirements you defined, it’s time to adjust some of the (hyper)parameters, include additional data, or both.
Parameters and Hyperparameters
Reducing the forecast horizon can increase accuracy; it’s easier to make shorter-term predictions. Manually setting the Predictor’s algorithm to DeepAR+ will enable an advanced option called HPO, which stands for Hyperparameter Optimization. Enabling HPO will cause the Predictor to train multiple times with different hyperparameter values in an attempt to increase the accuracy.
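For readers who script these settings rather than use the console, the sketch below assembles a request for the boto3 create_predictor call with DeepAR+ selected, hyperparameter optimization turned on, and a short horizon. The predictor name, the dataset group ARN, and the helper function names are placeholders invented for this example, and the call itself is wrapped in a function that is not executed here because it needs boto3 and AWS credentials.

```python
# ARN of the built-in DeepAR+ algorithm in Amazon Forecast.
DEEP_AR_PLUS_ARN = "arn:aws:forecast:::algorithm/Deep_AR_Plus"


def build_predictor_request(name, dataset_group_arn, horizon, frequency="D", perform_hpo=True):
    """Assemble keyword arguments for the Forecast CreatePredictor API."""
    if horizon < 1:
        raise ValueError("forecast horizon must be at least one period")
    return {
        "PredictorName": name,
        "AlgorithmArn": DEEP_AR_PLUS_ARN,
        "ForecastHorizon": horizon,  # number of periods to predict
        "PerformAutoML": False,      # the algorithm is chosen manually
        "PerformHPO": perform_hpo,   # hyperparameter optimization
        "InputDataConfig": {"DatasetGroupArn": dataset_group_arn},
        "FeaturizationConfig": {"ForecastFrequency": frequency},
    }


def create_predictor(request):
    """Submit the request. Requires boto3 and AWS credentials; not run here."""
    import boto3

    return boto3.client("forecast").create_predictor(**request)


# Placeholder name and ARN, for illustration only.
request = build_predictor_request(
    name="demand_deepar_hpo",
    dataset_group_arn="arn:aws:forecast:us-east-1:123456789012:dataset-group/demo",
    horizon=7,
)
print(request)
```

Keeping the request builder separate from the API call makes it easy to review or log the settings before committing to a training run that can take hours.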
Related Time Series and Metadata
Related Time Series data (e.g., weather data, holidays) and Metadata (e.g., sub-categories of inventory items) can be added to the Dataset Group to attempt to increase the accuracy. Matching item_ids and making sure the beginning and ending timestamps match the dataset add overhead that may not be necessary depending on your accuracy requirements.
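Part of that overhead can be reduced with a quick check before uploading. The sketch below compares item_ids and timestamp ranges between a target series and a related series; the function name, the row layout of (item_id, timestamp) tuples, and the sample data are assumptions for illustration, and the exact alignment rules the service enforces should be confirmed against its documentation.

```python
def check_related_alignment(target_rows, related_rows):
    """Compare item_ids and timestamp ranges between target and related series.

    Rows are (item_id, timestamp) tuples with ISO 8601 timestamps, which sort
    correctly as plain strings.
    """
    def span_by_item(rows):
        spans = {}
        for item_id, ts in rows:
            lo, hi = spans.get(item_id, (ts, ts))
            spans[item_id] = (min(lo, ts), max(hi, ts))
        return spans

    target, related = span_by_item(target_rows), span_by_item(related_rows)
    problems = []
    for item_id, (start, end) in sorted(target.items()):
        if item_id not in related:
            problems.append(f"{item_id}: missing from related series")
            continue
        r_start, r_end = related[item_id]
        if r_start > start:
            problems.append(f"{item_id}: related data starts late ({r_start} > {start})")
        if r_end < end:
            problems.append(f"{item_id}: related data ends early ({r_end} < {end})")
    return problems


# Hypothetical rows: the related series for "gadget" starts one day late.
target = [("widget", "2020-03-01"), ("widget", "2020-03-03"),
          ("gadget", "2020-03-01"), ("gadget", "2020-03-03")]
related = [("widget", "2020-03-01"), ("widget", "2020-03-03"),
           ("gadget", "2020-03-02"), ("gadget", "2020-03-03")]

for problem in check_related_alignment(target, related):
    print(problem)
```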
-Joey Brown, Sr Cloud Consultant