Machine Learning (ML) is one of the latest IT trends being touted by companies like AWS and Microsoft. Its applications are many, from predicting in which animal species new coronaviruses may develop to matching shippers and truckers to move freight more efficiently. Before you jump into your new ML project, though, be realistic: it is not going to be applicable to every business problem you have.
Read on for a definition of ML and a few tips to help make sure your
plans / business goals for your ML solution are well researched and make sense.
What is Machine Learning?
ML is an area of artificial
intelligence (AI) that gives systems or services the ability to automatically
learn and improve from experience without being explicitly programmed. Machine
learning focuses on the development of systems that can be presented with data
and can use this data to make inferences.
What are the typical business problems that ML can solve?
Machine Learning can be used in many areas where a solution cannot easily be programmed as a set of explicit rules. Typical business problems arise in the areas of Forecasting, Document Analysis, Fraud Detection, Image Detection and Code Review.
Here are some detailed use cases: ML Customer References
Tips for a successful ML project
1. Realistic Expectations
Not all business problems are well suited to Machine Learning, and even if they are, there is no guarantee that you will be able to collect enough data at a high enough quality and at reasonable cost to develop an effective and reliable model. Even with plenty of high-quality data you will likely see challenges with the model algorithm that you select, either initially or some time after the model goes into production.
2. Data Collection / Visualisation Challenges
You will need to spend considerable time on understanding and collecting data. Bear in mind the following:
- Collecting observations and exploring Data Features
- Dealing with various data Structures such as CSV, JSON, Text, Image, Video, Audio
- Categorizing between Structured (i.e. with a Schema) and Unstructured Data
- Importance of distinguishing between labelled and unlabelled data, numerical or categorical features and choice of Supervised, Unsupervised or Reinforcement Learning.
- Understanding the concept of Ground Truth Data and its implications with respect to overall system performance and reliability
- Importance of feature engineering and splitting data into training, validation, testing data sets
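To make the last point concrete, here is a minimal sketch of splitting a data set into training, validation and testing sets. The function name `split_dataset`, the 70/15/15 proportions and the fixed seed are assumptions chosen for illustration, not a recommendation for every project.

```python
import random

def split_dataset(rows, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle rows and split them into train / validation / test lists."""
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave room for a test set")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains becomes the test set
    return train, val, test

# Hypothetical data: 100 observation IDs standing in for real records
train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

The model is fitted on the training set, tuned against the validation set, and scored once on the testing set, so the final score reflects data the model has never seen.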
3. Model Selection and Training
The importance of visualizing and charting data to look for patterns, corruption, outliers, imbalances and relationships cannot be overemphasized.
Look at the advantages of using popular charting tools (AWS Insights, TensorBoard, Tableau) and chart types such as KPI, Scatter, Bubble, Bar, Table, Column, Line, Pie and Stacked.
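Charts are the main tool here, but a couple of quick numeric checks can tell you what to chart first. The sketch below is an illustration only: it reports class imbalance as each label's share of the data, and flags outliers with Tukey's interquartile-range rule. The function names and the sample values are made up for this example.

```python
from collections import Counter
import statistics

def class_balance(labels):
    """Return each label's share of the data set, to expose imbalance."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical sample: 9 normal transactions for every fraudulent one
labels = ["ok"] * 9 + ["fraud"]
amounts = [10, 12, 11, 13, 12, 11, 10, 95]

print(class_balance(labels))
print(iqr_outliers(amounts))
```

A heavily skewed label share or a handful of flagged values is a prompt to look closer with a bar or scatter chart, not a verdict on the data by itself.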
Understanding the 3 Machine Learning model types:
- Supervised Learning – Using a Target Variable for Regression or Classification
- Unsupervised Learning – No Target Variable, for Clustering and Dimensionality Reduction
- Reinforcement Learning – Using Reward (Positive or Negative) Feedback for Decisions
A critical decision for your project will be the choice of a specific algorithm for the chosen model type, along with familiarity with its associated hyperparameters.
Finally, consider the management of training jobs, resource levels and model deployment.
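As a rough illustration of what a hyperparameter is and how one might be chosen, the sketch below uses a toy k-nearest-neighbours classifier (a supervised algorithm) on one-dimensional data. The number of neighbours `k` is the hyperparameter, and it is picked by comparing accuracy on a validation set. The data, the function names and the candidate values are all assumptions for this example; a real project would normally rely on an ML library or a managed tuning service.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, held_out, k):
    """Share of held-out points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == label for x, label in held_out)
    return hits / len(held_out)

def pick_k(train, validation, candidates=(1, 3, 5)):
    """Return the candidate k with the best validation accuracy."""
    return max(candidates, key=lambda k: accuracy(train, validation, k))

# Hypothetical labelled data: (feature value, target label)
train = [(1.0, "low"), (1.1, "low"), (1.2, "low"),
         (5.0, "high"), (5.1, "high"), (5.2, "high")]
validation = [(0.9, "low"), (5.3, "high")]

best_k = pick_k(train, validation)
print(best_k, accuracy(train, validation, best_k))
```

The same pattern (train with each candidate setting, score on validation data, keep the best) applies to most algorithms, only with more hyperparameters and far more data.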
4. Data and Environment Security
Your ML system needs to be as secure as any production environment. Remember that you are dealing with large amounts of data, so make sure the environment is set up by an Infrastructure Specialist before it is handed over to the team. Your secure ML environment should cover:
- Creation of Model Endpoints and Endpoint Hosting
- Endpoint Scaling and Resilience
- Continuous Monitoring of models for drift
- Production Model Debugging
- In-Transit and At Rest Encryption Options
- Data Protection using Network Isolation
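To give a feel for what monitoring for drift can involve, here is a minimal sketch of one common measure, the Population Stability Index (PSI), which compares the distribution of a value at training time with its distribution in recent production traffic. This is only one of several possible drift measures, and the function name, bin count and sample data are assumptions for illustration.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline and a recent sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # a small floor keeps log() defined for empty bins
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical feature values: training baseline vs. shifted production data
baseline = [i / 100 for i in range(100)]
shifted = [x + 0.5 for x in baseline]

print(psi(baseline, baseline))
print(psi(baseline, shifted))
```

A PSI of zero means the two samples fall into the bins identically. A frequently quoted rule of thumb treats values above roughly 0.25 as a significant shift, but the right alert threshold depends on your data.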
5. ML Pipeline Automation
It's important to build a continuous integration and continuous delivery (CI/CD) service for ML and to consider the entire ML workflow at scale.
Consider using tools such as AWS Step Functions to design and implement ML workflows and incorporate tracking using Amazon SageMaker ML Lineage Tracking.
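A Step Functions workflow is described in the Amazon States Language, a JSON document that lists states and the transitions between them. The sketch below builds such a definition as a Python dictionary for a hypothetical three-step pipeline. The state names and the Lambda ARNs are placeholders invented for this example (REGION and ACCOUNT_ID must be replaced), and a real pipeline would usually add retries, error handling and direct SageMaker integrations.

```python
import json

def build_pipeline_definition():
    """Return an Amazon States Language definition as a Python dict."""
    def task(function_name, next_state=None):
        state = {
            "Type": "Task",
            # Placeholder ARN: replace REGION, ACCOUNT_ID and the function name
            "Resource": f"arn:aws:lambda:REGION:ACCOUNT_ID:function:{function_name}",
        }
        if next_state:
            state["Next"] = next_state
        else:
            state["End"] = True  # last state in the workflow
        return state

    return {
        "Comment": "Hypothetical ML pipeline: prepare data, train, deploy",
        "StartAt": "PrepareData",
        "States": {
            "PrepareData": task("prepare-data", "TrainModel"),
            "TrainModel": task("train-model", "DeployModel"),
            "DeployModel": task("deploy-model"),
        },
    }

def dangling_transitions(definition):
    """Return names referenced by StartAt/Next that are not defined states."""
    states = definition["States"]
    targets = [definition["StartAt"]]
    targets += [s["Next"] for s in states.values() if "Next" in s]
    return [name for name in targets if name not in states]

definition = build_pipeline_definition()
print(json.dumps(definition, indent=2))
```

Keeping the definition in code means it can be reviewed, versioned and checked (for example with `dangling_transitions`) in the same CI/CD process as the rest of the project.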
6. ML Service Selection
Using built-in Machine Learning services can help you accelerate your ML journey. AWS has been named a Leader in Gartner's Magic Quadrant for Cloud AI Developer Services. Services include:
- Amazon Rekognition – Image and Video analysis
- Amazon Comprehend – A natural language processing (NLP) service used to find insights and relationships in text
- Amazon Lex – For building voice and text conversational interfaces for Applications
- Amazon Polly – Turns text into lifelike speech
- Amazon Transcribe – Converts audio to text
- Amazon Textract – Understands and extracts printed text, handwriting and table data from scanned documents
- Amazon Translate – Provides language translation for text
- Amazon Forecast – Provides forecasts from time series data
- Amazon Kendra – Intelligent Search Service for websites and applications
- Amazon Fraud Detector – Used to identify potentially fraudulent activity
- Amazon CodeGuru – Provides intelligent recommendations to improve your code quality
Additional ML services that work in conjunction with AWS can be purchased through the AWS Marketplace.
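To show how little code a built-in service can need, here is a minimal sketch that asks Amazon Comprehend for the sentiment of a piece of text through the AWS SDK for Python (boto3). It assumes boto3 is installed and AWS credentials are configured. The helper names `detect_sentiment` and `make_comprehend_client`, and the region shown in the usage comment, are choices made for this example.

```python
def detect_sentiment(client, text, language_code="en"):
    """Return (sentiment label, score dict) for text using Amazon Comprehend."""
    response = client.detect_sentiment(Text=text, LanguageCode=language_code)
    return response["Sentiment"], response["SentimentScore"]

def make_comprehend_client(region_name):
    """Create a Comprehend client; needs boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the helpers load without the AWS SDK
    return boto3.client("comprehend", region_name=region_name)

# Example usage (makes a billable AWS call, so it is left commented out):
# client = make_comprehend_client("eu-west-1")
# label, scores = detect_sentiment(client, "The delivery arrived early.")
# print(label, scores)
```

Passing the client in as a parameter keeps the function easy to test with a stand-in object, without calling AWS.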
7. Cost Control
Costs can easily spiral out of control with large amounts of data, so make sure costs are reviewed regularly and budget alerts are set up for your environment.
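The logic behind a budget alert is simple, as the sketch below shows: keep a running total of spend and raise an alert each time it crosses a percentage of the monthly budget. In practice you would configure this in your cloud provider's billing tools rather than write it yourself. The function name, thresholds and cost figures are invented for illustration.

```python
def crossed_thresholds(daily_costs, monthly_budget, thresholds=(0.5, 0.8, 1.0)):
    """Return (day, threshold) pairs for each threshold the running total crosses."""
    alerts = []
    pending = sorted(thresholds)
    total = 0.0
    for day, cost in enumerate(daily_costs, start=1):
        total += cost
        # one day's spend can cross several thresholds at once
        while pending and total >= pending[0] * monthly_budget:
            alerts.append((day, pending.pop(0)))
    return alerts

# Hypothetical spend for the first five days against a budget of 1000
daily_costs = [100, 150, 300, 300, 200]
for day, threshold in crossed_thresholds(daily_costs, monthly_budget=1000):
    print(f"Day {day}: spend has reached {threshold:.0%} of the budget")
```

Alerting at several levels, rather than only at 100%, leaves time to scale down training jobs or endpoints before the budget is exhausted.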
This article should give you a good idea of how you need to prepare for your ML journey. Good luck with your project.