Once you understand the benefits and structure of data science and machine learning (ML), it’s time to start implementation. While it’s not an overly complicated process, planning change management from implementation through replication can help mitigate potential pitfalls. We recommend following this 3-step process.
Step 1: Find Your Purpose
It can be fun to tinker around with shiny, new technology toys, but without specific goals, the organization suffers. Time and resources are wasted, and without proof of added value, leadership won’t give the buy-in you need. Why are you implementing this solution, and what do you hope to get out of the data you put in?
ML projects can support data-driven decisions in several ways. For example, insights into customer buying behavior can be used to optimize the sales cycle with new marketing campaigns. Other uses include predictive search to improve user experience, image processing to streamline warehouse inventory, real-time fraud detection, predictive maintenance, and voice-to-text speech recognition to elevate customer service.
ML projects are typically led by a data scientist, who is responsible for understanding the business requirements. The data scientist uses data to train a computer model that learns patterns in very large volumes of data, predicts outcomes, and improves those predictions over time.
Successful ML solutions can generate 4-5% higher profit margins, so identify benchmarks, set growth goals, and integrate regular progress measurements to make sure you’re always on track with your purpose in mind.
Step 2: Apply Machine Learning
The revolutionary appeal of ML is that it does not require an explicitly programmed set of rules to deliver analytics and predictions. Instead, it leverages a computer model that can be trained to predict outcomes and improve over time. After the data scientist’s analysis defines the business requirements, they wrangle the necessary data and train the ML model with an algorithm, which is the engine that turns the data into a model.
Data preparation is critical to the success of the ML project because it is the foundation of everything that follows. Garbage in equals garbage out, but value in produces more value.
Raw data can be tempting, but data that isn’t clean, governed, and appropriate for business use corrupts the model and invalidates the outcome. Data needs to be prepared and ready, meaning it has been reviewed for accuracy, and it’s available and accessible to all users. Data is typically stored in a cloud data warehouse or data lake and it must be maintained with ongoing governance.
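As a rough illustration of what "prepared and ready" can mean in practice, here is a minimal Python sketch that validates, normalizes, and de-duplicates records before they reach a model. The `raw_orders` records, the field names, and the validation rules are all hypothetical; real rules should come from your own data governance standards.

```python
from typing import Optional

# Hypothetical raw order records; the field names are illustrative only.
raw_orders = [
    {"order_id": "1001", "amount": "25.50", "region": "west"},
    {"order_id": "1002", "amount": "", "region": "East"},        # missing amount
    {"order_id": "1001", "amount": "25.50", "region": "west"},   # duplicate
    {"order_id": "1003", "amount": "-4.00", "region": "north"},  # negative amount
    {"order_id": "1004", "amount": "12.00", "region": " South "},
]


def clean_record(record: dict) -> Optional[dict]:
    """Return a normalized record, or None if it fails validation."""
    try:
        amount = float(record["amount"])
    except (KeyError, ValueError):
        return None  # amount is missing or not a number
    if amount < 0:
        return None  # assumed rule: order amounts cannot be negative
    return {
        "order_id": record["order_id"].strip(),
        "amount": amount,
        "region": record["region"].strip().lower(),
    }


def clean_dataset(records: list) -> list:
    """Drop invalid rows and repeated order IDs, keeping the first occurrence."""
    seen = set()
    cleaned = []
    for record in records:
        normalized = clean_record(record)
        if normalized is None or normalized["order_id"] in seen:
            continue
        seen.add(normalized["order_id"])
        cleaned.append(normalized)
    return cleaned


clean_orders = clean_dataset(raw_orders)
for order in clean_orders:
    print(order)
```

Only orders 1001 and 1004 survive in this sample. Keeping rules like these in a shared, governed pipeline means every downstream model starts from the same clean data.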
A common mistake organizations make is relying on data scientists to clean the data. Studies have found that data scientists spend 70% of their time wrangling data and only 30% of the time implementing the solution and delivering business value. These highly paid and skilled professionals are scarce resources trained for innovation and analyzing data, not cleaning data. Only after the data is clean should data scientists start their analysis.
The data scientist’s core expertise is in selecting the appropriate algorithm to process and analyze the data. The science in ML is figuring out which algorithm to use and how to optimize it to deliver accurate and reliable results.
Thankfully, ML algorithms are available today on all the major service provider platforms and in many Python and R libraries. The general use cases within reach include:
- Classification (is this a cat or is this not a cat), used in anomaly detection, marketing segmentation, and recommendation engines.
- NLP (natural language processing), used in autocomplete, sentiment analysis, and language understanding (e.g., chatbots).
- Time series analysis, used in forecasting.
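To make the forecasting use case a little more concrete, here is a minimal Python sketch of a moving-average baseline. It is a naive reference point rather than a trained ML model, and the `monthly_sales` figures and the `moving_average_forecast` helper are hypothetical, made up for illustration.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or len(history) < window:
        raise ValueError("need at least `window` observations")
    recent = history[-window:]
    return sum(recent) / window


# Hypothetical monthly sales figures, oldest first.
monthly_sales = [100, 110, 120, 130, 125, 135]

forecast = moving_average_forecast(monthly_sales, window=3)
print(f"Next month forecast: {forecast:.1f}")
```

A simple baseline like this gives you a benchmark: a trained forecasting model earns its keep only if it beats the baseline on data it hasn’t seen.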
Most ML algorithms fall into one of two groups: supervised or unsupervised. Supervised learning algorithms start with labeled training data, meaning each example comes with the correct answer. The algorithm uses those labels, plus ongoing feedback, to train the model. Think texting and autocorrect: the algorithm is always learning new words based on your interaction with autocorrect. That feedback is delivered to the live model for updates, and the feedback loop never ends.
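Here is a minimal sketch of supervised learning in plain Python, using a nearest-centroid classifier chosen purely for brevity. The `training_data` values, the risk labels, and the `train` and `predict` helpers are hypothetical; in practice you would more likely reach for an established library.

```python
from collections import defaultdict

# Hypothetical labeled training data: (feature vector, correct answer).
training_data = [
    ((1.0, 1.2), "low_risk"),
    ((0.8, 1.0), "low_risk"),
    ((1.2, 0.8), "low_risk"),
    ((4.0, 4.2), "high_risk"),
    ((4.2, 3.8), "high_risk"),
    ((3.8, 4.0), "high_risk"),
]


def train(examples):
    """Learn one centroid (mean feature vector) per label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(model, key=lambda label: squared_distance(model[label]))


model = train(training_data)
prediction = predict(model, (3.5, 3.9))
print(prediction)  # the point sits nearest the high_risk centroid

# Feedback loop: add a newly labeled example and retrain.
training_data.append(((3.4, 3.6), "high_risk"))
model = train(training_data)
```

The last two lines mimic the feedback loop described above: each corrected answer becomes a new labeled example, and retraining folds it into the model.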
Unsupervised learning algorithms start with unlabeled data. The algorithm divides the data into meaningful clusters used to make inferences about the records. These algorithms are useful for segmentation of click stream data or email lists.
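The clustering idea can be sketched in a few lines of plain Python with k-means on one-dimensional data. The `pages_per_session` values are hypothetical click stream numbers, and the deterministic starting centroids are a simplification for this sketch; library implementations usually use smarter initialization.

```python
def k_means(points, k, iterations=10):
    """Cluster 1-D points with k-means; returns (centroids, assignments)."""
    # Simplified, deterministic start: spread initial centroids across
    # the sorted values (smallest to largest).
    ordered = sorted(points)
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: abs(p - centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assignments


# Hypothetical click stream feature: pages viewed per session, no labels.
pages_per_session = [2, 3, 4, 20, 22, 25, 3, 21]

centroids, assignments = k_means(pages_per_session, k=2)
print(centroids)    # [3.0, 22.0]
print(assignments)  # [0, 0, 0, 1, 1, 1, 0, 1]
```

No one told the algorithm which sessions were casual browsers and which were heavy users; the two segments emerge from the data alone.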
Some popular algorithms include convolutional neural networks (CNNs), a deep learning approach; k-means clustering; principal component analysis (PCA); support vector machines; decision trees; and logistic regression.
With everything in place, it’s time to see if the model is doing what you need it to do. When evaluating model quality, consider bias and variance. Bias quantifies the algorithm’s limited flexibility to learn the pattern. Variance quantifies the algorithm’s sensitivity to specific sets of training data.
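For readers who want the math behind those two terms, the standard bias-variance decomposition is sketched below. It assumes squared-error loss and data generated as $y = f(x) + \varepsilon$, where the noise $\varepsilon$ has zero mean and variance $\sigma^2$; the symbols $f$, $\hat{f}$, and $\sigma^2$ are notation introduced here, not taken from the text above.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The expectation is taken over different training sets (and the noise). Reducing one of the first two terms often increases the other, which is why tuning a model is a balancing act.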
Three things can happen when optimizing the model:
- Over-fitting: Low bias + high variance. The model is too tightly fitted to the training data, and it won’t generalize to data it hasn’t seen before.
- Under-fitting: High bias + low variance. The model is too simple, or too new, to capture the pattern, so it hasn’t reached a point of accuracy. Get to over-fitting first, then back up and iterate until the model fits.
- Limiting/preventing under/over-fitting: There are too many features in the model (i.e., the input variables used to build the model), and you need to either reduce them or create new features from existing features.
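One common way to spot these situations is to compare the model’s error on the training data with its error on held-out validation data. The Python sketch below encodes that rule of thumb. The `diagnose_fit` helper and its `target_error` and `gap_tolerance` thresholds are hypothetical and should be tuned to your own problem.

```python
def diagnose_fit(train_error, validation_error, target_error=0.10, gap_tolerance=0.05):
    """Rough heuristic: classify a model's fit from its two error rates.

    The default thresholds are illustrative assumptions, not standards.
    """
    if train_error > target_error:
        # The model cannot even fit the data it was trained on: high bias.
        return "under-fitting"
    if validation_error - train_error > gap_tolerance:
        # Good on training data, much worse on unseen data: high variance.
        return "over-fitting"
    return "good fit"


# Hypothetical error rates for three candidate models.
print(diagnose_fit(0.30, 0.32))  # under-fitting
print(diagnose_fit(0.02, 0.25))  # over-fitting
print(diagnose_fit(0.05, 0.07))  # good fit
```

Tracking both numbers after every change, such as dropping or engineering features, shows whether you are moving toward a model that fits.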
Before unleashing your ML project on customers, experiment first with employees. Customer-facing solutions like virtual assistants and chatbots can jeopardize your reputation if they don’t add value to interactions with customers. Because ML influences decision-making, accuracy is a must before real-world implementation.
Step 3: Experiment and Push into Production
With software projects, it either works or it crashes. With data science projects, you have to see, touch, and feel the results to know if it’s working. Reach out to users for feedback and to ensure any changes to user experience are positive. Luckily, with the cloud, the cost of experimentation is low, so don’t be afraid to beta test before a full launch.
Once the model fits and you’ve pushed the project into production, make noise about it around the organization. Promote that you’re implementing something new and garner the attention of executive leadership. Unfortunately, 70% of data projects fail because they don’t have an executive champion.
Share your learnings internally using data, charts, and results, and emphasize company-wide impact. You’re not going to get buy-in on day one, but as you move up the chain of command, earning more and more supporters, your budget will allow for more machine learning solutions. Utilize buzzwords and visual representations of the project: remember, data science needs to be seen, touched, and felt.
Ensure ML and data science success with best practices for introducing, completing, and repeating implementation. 2nd Watch Data and Analytic Solutions help your organization realize the power of ML with proper data cleaning, the right algorithm selection, and quality model deployment. Contact Us to see how you can do more with the data you have.
-Sam Tawfik, Sr Marketing Manager, Data & Analytics