by Joche Ojeda | Jan 2, 2024 | A.I
This article not only demonstrates the process of creating, training, saving, and loading a spam detection AI model using ML.NET, but also emphasizes the reusability of the trained model. By following the steps in the article, you will be able to create a model that can be easily reused and integrated into your .NET applications to identify and filter out spam emails.
Prerequisites
- Basic understanding of C#
- Familiarity with ML.NET and machine learning concepts
Code Overview
- Import necessary namespaces:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using NUnit.Framework;
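- Define the Email class, which holds the input, and the SpamPrediction class, which receives the model's prediction:
public class Email
{
    public string Content { get; set; }

    // The binary classification trainers look for a column named "Label" by default.
    [ColumnName("Label")]
    public bool IsSpam { get; set; }
}

public class SpamPrediction
{
    // Maps the trainer's "PredictedLabel" output column to this property.
    [ColumnName("PredictedLabel")]
    public bool IsSpam { get; set; }
}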
- Create a sample dataset for training the model:
var sampleData = new List<Email>
{
    new Email { Content = "Buy cheap products now", IsSpam = true },
    new Email { Content = "Meeting at 3 PM", IsSpam = false },
};
- Initialize a new MLContext, which is the main entry point to ML.NET:
var mlContext = new MLContext();
- Load the sample data into an IDataView:
var trainData = mlContext.Data.LoadFromEnumerable(sampleData);
- Define the data processing pipeline and the training algorithm (SdcaLogisticRegression):
var pipeline = mlContext.Transforms.Text.FeaturizeText("Features", nameof(Email.Content))
.Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression());
- Train the model:
var model = pipeline.Fit(trainData);
- Save the trained model, together with the schema of its input data, to a .zip file:
mlContext.Model.Save(model, trainData.Schema, "model.zip");
- Load the saved model:
var newMlContext = new MLContext();
DataViewSchema modelSchema;
ITransformer trainedModel = newMlContext.Model.Load("model.zip", out modelSchema);
- Create a prediction engine from the loaded model:
var predictionEngine = newMlContext.Model.CreatePredictionEngine<Email, SpamPrediction>(trainedModel);
- Test the model with a sample email:
var sampleEmail = new Email { Content = "Special discount, buy now!" };
var prediction = predictionEngine.Predict(sampleEmail);
- Output the prediction:
Debug.WriteLine($"Email: '{sampleEmail.Content}' is {(prediction.IsSpam ? "spam" : "not spam")}");
- Assert that the prediction is correct (Assert.That works in both NUnit 3 and NUnit 4, which moved Assert.IsTrue to ClassicAssert):
Assert.That(prediction.IsSpam, Is.True);
- Verify that the model file was saved:
Assert.That(File.Exists("model.zip"), Is.True);
Conclusion
In this article, we built a simple spam detection model in ML.NET and demonstrated how to train it, save it to disk, reload it, and test it. This code can be extended to build more complex models and can be used as a starting point for exploring machine learning in .NET.
GitHub Repo
by Joche Ojeda | Dec 13, 2023 | A.I
Introduction to Machine Learning in C#: Spam using Binary Classification
This example demonstrates the basics of machine learning in C# using ML.NET, Microsoft’s machine learning framework specifically designed for .NET applications. ML.NET offers a versatile, cross-platform framework that simplifies integrating machine learning into .NET applications, making it accessible for developers familiar with the .NET ecosystem.
Technologies Used
- C#: A modern, object-oriented programming language developed by Microsoft, which is widely used for a variety of applications. In this example, C# is used to define data models, process data, and implement the machine learning pipeline.
- ML.NET: An open-source and cross-platform machine learning framework for .NET. It is used in this example to create a machine learning model for classifying emails as spam or not spam. ML.NET simplifies the process of training, evaluating, and consuming machine learning models in .NET applications.
- .NET Core: A cross-platform version of .NET for building applications that run on Windows, Linux, and macOS. It provides the runtime environment for our C# application.
The example focuses on a simple spam detection system. It utilizes text data processing and binary classification, two common tasks in machine learning, to classify emails into spam and non-spam categories. This is achieved through the use of a logistic regression model, a fundamental algorithm for binary classification problems.
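To make the logistic regression idea concrete, here is a minimal sketch in plain C# with no ML.NET dependency. It assumes binary word-presence features and trains with ordinary batch gradient descent on the log-loss; that optimizer is a swapped-in technique, since ML.NET's SdcaLogisticRegression uses SDCA and its own text featurization. The class TinyLogisticRegression and its methods are hypothetical names for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: logistic regression over word-presence features, trained with
// plain batch gradient descent. This is NOT how ML.NET's SdcaLogisticRegression works
// internally; it only illustrates the model: P(spam) = sigmoid(bias + sum of weights).
public class TinyLogisticRegression
{
    private readonly Dictionary<string, double> _weights = new Dictionary<string, double>();
    private double _bias;

    // Simplified featurization: lowercase, then keep distinct alphanumeric tokens.
    public static HashSet<string> Tokenize(string text) =>
        new HashSet<string>(Regex.Split(text.ToLowerInvariant(), "[^a-z0-9]+")
                                 .Where(token => token.Length > 0));

    // The logistic (sigmoid) function maps any score to a probability in (0, 1).
    private static double Sigmoid(double z) => 1.0 / (1.0 + Math.Exp(-z));

    public double PredictProbability(string text)
    {
        double z = _bias;
        foreach (var token in Tokenize(text))
            if (_weights.TryGetValue(token, out var weight)) z += weight; // unknown words add nothing
        return Sigmoid(z);
    }

    // Binary decision: spam when the predicted probability exceeds 0.5.
    public bool IsSpam(string text) => PredictProbability(text) > 0.5;

    public void Train(IReadOnlyList<(string Content, bool IsSpam)> data, int epochs = 100, double learningRate = 0.5)
    {
        // One weight per word seen in the training data, starting at zero.
        foreach (var example in data)
            foreach (var token in Tokenize(example.Content))
                if (!_weights.ContainsKey(token)) _weights[token] = 0.0;

        for (int epoch = 0; epoch < epochs; epoch++)
        {
            var gradients = _weights.Keys.ToDictionary(key => key, key => 0.0);
            double biasGradient = 0.0;
            foreach (var (content, isSpam) in data)
            {
                // Derivative of the log-loss with respect to the score: prediction - label.
                double error = PredictProbability(content) - (isSpam ? 1.0 : 0.0);
                foreach (var token in Tokenize(content)) gradients[token] += error;
                biasGradient += error;
            }
            // Step against the average gradient.
            foreach (var (token, gradient) in gradients)
                _weights[token] -= learningRate * gradient / data.Count;
            _bias -= learningRate * biasGradient / data.Count;
        }
    }
}
```

Trained on the two emails used in this article, the words of the spam example ("buy", "cheap", "products", "now") end up with positive weights and the words of the other example with negative ones, so IsSpam("Special discount, buy now!") returns true. ML.NET's trainer reaches a comparable decision rule with a more robust optimizer and richer features.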
Creating an NUnit Test Project in Visual Studio Code
Setting up NUnit for DecisionTreeDemo
Install .NET Core SDK
Download and install the .NET Core SDK from the .NET official website.
Install Visual Studio Code
Download and install Visual Studio Code (VS Code) from its official website, then install Microsoft's C# extension for VS Code.
Create a New .NET Core Project
Open VS Code, and in the terminal, create a new .NET Core project:
dotnet new console -n DecisionTreeDemo
cd DecisionTreeDemo
Add the ML.NET Package
Add the ML.NET package to your project:
dotnet add package Microsoft.ML
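Create the Test Project
Return to the parent directory and create a sibling directory for the test project (so the main project does not compile the test files as part of its own build), then initialize a new NUnit test project:
cd ..
mkdir DecisionTreeDemo.Tests
cd DecisionTreeDemo.Tests
dotnet new nunit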
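Add Required Packages to Test Project
The nunit template already references NUnit, NUnit3TestAdapter, and Microsoft.NET.Test.Sdk, so the first three commands only update those packages to their latest versions; Microsoft.ML is the package you must add:
dotnet add package NUnit
dotnet add package Microsoft.NET.Test.Sdk
dotnet add package NUnit3TestAdapter
dotnet add package Microsoft.ML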
Reference the Main Project
From the DecisionTreeDemo.Tests directory, reference the main project:
dotnet add reference ../DecisionTreeDemo/DecisionTreeDemo.csproj
Write Test Cases
Write NUnit test cases within your test project to test different functionalities of your ML.NET application.
Define the Data Model for the Email
Include the content of the email and whether it’s classified as spam.
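using Microsoft.ML.Data;

public class Email
{
    // LoadColumn is used when loading from a text file; LoadFromEnumerable ignores it.
    [LoadColumn(0)]
    public string Content { get; set; }

    // ColumnName("Label") marks this property as the label column the trainer expects.
    [LoadColumn(1), ColumnName("Label")]
    public bool IsSpam { get; set; }
}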
Define the Model for Spam Prediction
This model is used to determine whether an email is spam.
public class SpamPrediction
{
    [ColumnName("PredictedLabel")]
    public bool IsSpam { get; set; }
}
Write the test case
using System.Collections.Generic;
using System.Diagnostics;
using Microsoft.ML;
using NUnit.Framework;

// Illustrative fixture and method names; Email and SpamPrediction are the classes defined above.
[TestFixture]
public class SpamDetectionTests
{
    [Test]
    public void SampleEmailIsClassifiedAsSpam()
    {
        // Create a new ML context for the application, which is a starting point for ML.NET operations.
        var mlContext = new MLContext();
        // Example dataset of emails. In a real-world scenario, this would be much larger and possibly loaded from an external source.
        var data = new List<Email>
        {
            new Email { Content = "Buy cheap products now", IsSpam = true },
            new Email { Content = "Meeting at 3 PM", IsSpam = false },
            // Additional data can be added here...
        };
        // Load the data into an ML.NET IDataView.
        var trainData = mlContext.Data.LoadFromEnumerable(data);
        // Define the data processing pipeline: featurize the text (convert it into numeric features), then apply a logistic regression model.
        var pipeline = mlContext.Transforms.Text.FeaturizeText("Features", nameof(Email.Content))
            .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression());
        // Train the model on the loaded data.
        var model = pipeline.Fit(trainData);
        // Create a prediction engine for making predictions on individual data samples.
        var predictionEngine = mlContext.Model.CreatePredictionEngine<Email, SpamPrediction>(model);
        // Create a sample email to test the model.
        var sampleEmail = new Email { Content = "Special discount, buy now!" };
        var prediction = predictionEngine.Predict(sampleEmail);
        // Write the prediction to the debug output.
        Debug.WriteLine($"Email: '{sampleEmail.Content}' is {(prediction.IsSpam ? "spam" : "not spam")}");
        // Assert.That works in both NUnit 3 and NUnit 4 (NUnit 4 moved Assert.IsTrue to ClassicAssert).
        Assert.That(prediction.IsSpam, Is.True);
    }
}
Running Tests
Run the tests with the following command:
dotnet test
The test passes because the sample email contains the word “buy”, which also appears in the training email labeled as spam.
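Why does a shared word matter? The sketch below uses a deliberately simplified tokenizer (lowercase, then split on anything that is not a letter or digit) to list the words a new email has in common with a training email. It is an illustration under that assumption, not ML.NET's featurizer: FeaturizeText applies its own normalization and also produces word and character n-grams, so its features differ in detail. The TokenOverlap class is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: compares the word sets of two emails with a simplified
// tokenizer (this is not how ML.NET's FeaturizeText tokenizes text).
public static class TokenOverlap
{
    // Lowercase the text and split on runs of characters that are not a-z or 0-9.
    public static HashSet<string> Tokenize(string text) =>
        new HashSet<string>(Regex.Split(text.ToLowerInvariant(), "[^a-z0-9]+")
                                 .Where(token => token.Length > 0));

    // Words present in both texts, sorted so the result is deterministic.
    public static List<string> SharedWords(string first, string second) =>
        Tokenize(first).Intersect(Tokenize(second))
                       .OrderBy(word => word, StringComparer.Ordinal)
                       .ToList();
}
```

With this tokenizer, SharedWords("Special discount, buy now!", "Buy cheap products now") returns "buy" and "now", while comparing the sample with "Meeting at 3 PM" returns an empty list: every overlap points toward the spam example.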
You can download the source code for this article here.
This article has explored the fundamentals of machine learning in C# using the ML.NET framework. By defining specific data models and utilizing ML.NET’s powerful features, we demonstrated how to build a simple yet effective spam detection system. This example serves as a gateway into the vast world of machine learning, showcasing the potential for integrating AI technologies into .NET applications. The skills and concepts learned here lay the groundwork for further exploration and development in the exciting field of machine learning and artificial intelligence.
by Joche Ojeda | Dec 5, 2023 | A.I
Brief History and Early Use Cases of Machine Learning
Machine learning began to take shape in the mid-20th century, with Alan Turing’s 1950 paper “Computing Machinery and Intelligence” raising the idea that machines could learn as humans do. This period marked the start of algorithms based on statistical methods.
The first documented attempts at machine learning focused on pattern recognition and basic learning algorithms. In the 1950s and 1960s, early models like the perceptron emerged, capable of simple learning tasks such as visual pattern differentiation.
Three Early Use Cases of Machine Learning:
- Checker-Playing Program: One of the earliest practical applications was in the late 1950s when Arthur Samuel developed a program that could play checkers, improving its performance over time by learning from each game.
- Speech Recognition: In the 1970s, Carnegie Mellon University developed “Harpy,” a speech recognition system that could comprehend approximately 1,000 words, showcasing early success in machine learning for speech recognition.
- Optical Character Recognition (OCR): Early OCR systems in the 1970s and 1980s used machine learning to recognize text and characters in images, a significant advancement for digital document processing and automation.
How Machine Learning Works
- Data Collection: The process starts with the collection of diverse data.
- Data Preparation: This data is cleaned and formatted for use in algorithms.
- Choosing a Model: A model like decision trees or neural networks is chosen based on the problem.
- Training the Model: The model is trained with a portion of the data to learn patterns.
- Evaluation: The model is evaluated using a separate dataset to test its effectiveness.
- Parameter Tuning: The model is adjusted to improve its performance.
- Prediction or Decision Making: The trained model is then used for predictions or decision-making.
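The Training and Evaluation steps rely on keeping some data away from the model. As a small, library-free C# sketch: a deterministic hold-out split (every k-th example is set aside for testing) and the accuracy, precision, and recall of a binary classifier's predictions. The Evaluation class, the every-k-th split rule, and the choice of these three metrics are illustrative assumptions; real projects usually shuffle before splitting, and ML.NET provides its own mlContext.Data.TrainTestSplit and mlContext.BinaryClassification.Evaluate for this.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helpers for the "Evaluation" step.
public static class Evaluation
{
    // Deterministic hold-out split: the k-th, 2k-th, ... examples form the test set.
    public static (List<T> Train, List<T> Test) HoldOut<T>(IReadOnlyList<T> data, int everyKth)
    {
        if (everyKth < 2) throw new ArgumentOutOfRangeException(nameof(everyKth));
        var train = new List<T>();
        var test = new List<T>();
        for (int i = 0; i < data.Count; i++)
        {
            var target = i % everyKth == everyKth - 1 ? test : train;
            target.Add(data[i]);
        }
        return (train, test);
    }

    // Accuracy, precision, and recall for a binary "is spam" classifier.
    public static (double Accuracy, double Precision, double Recall) Score(
        IReadOnlyList<bool> actual, IReadOnlyList<bool> predicted)
    {
        if (actual.Count == 0 || actual.Count != predicted.Count)
            throw new ArgumentException("Need the same, non-zero number of actual and predicted labels.");

        int truePositives = 0, falsePositives = 0, falseNegatives = 0, trueNegatives = 0;
        for (int i = 0; i < actual.Count; i++)
        {
            if (predicted[i] && actual[i]) truePositives++;   // spam caught
            else if (predicted[i]) falsePositives++;          // legitimate email flagged as spam
            else if (actual[i]) falseNegatives++;             // spam that slipped through
            else trueNegatives++;                             // legitimate email let through
        }

        double accuracy = (double)(truePositives + trueNegatives) / actual.Count;
        double precision = truePositives + falsePositives == 0
            ? 0.0 : (double)truePositives / (truePositives + falsePositives);
        double recall = truePositives + falseNegatives == 0
            ? 0.0 : (double)truePositives / (truePositives + falseNegatives);
        return (accuracy, precision, recall);
    }
}
```

For instance, with actual labels { true, true, true, false, false } and predictions { true, true, false, true, false }, there are two true positives, one false positive, one false negative, and one true negative, giving an accuracy of 0.6 and a precision and recall of 2/3 each. Precision matters for spam filtering because a false positive hides a legitimate email.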
A Simple Example: Email Spam Detection
Let’s consider an email spam detection system as an example of machine learning in action:
- Data Collection: Emails are collected and labeled as “spam” or “not spam.”
- Data Preparation: Features such as word presence and email length are extracted.
- Choosing a Model: A decision tree or Naive Bayes classifier is selected.
- Training the Model: The model learns to associate features with spam or non-spam.
- Evaluation: The model’s accuracy is assessed on a different set of emails.
- Parameter Tuning: The model is fine-tuned for improved performance.
- Prediction: Finally, the model is used to identify spam in new emails.
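The spam-detection steps above can be sketched in plain C# with a Bernoulli Naive Bayes classifier over word-presence features, one of the two model choices the list mentions. This is a minimal sketch under stated assumptions: the tokenizer is simplified, add-one (Laplace) smoothing is used, and the BernoulliNaiveBayes class and the example emails below are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical Bernoulli Naive Bayes spam classifier: each email is represented only
// by which vocabulary words it contains (word presence).
public class BernoulliNaiveBayes
{
    private readonly HashSet<string> _vocabulary = new HashSet<string>();
    private readonly Dictionary<string, int> _spamEmailsWithWord = new Dictionary<string, int>();
    private readonly Dictionary<string, int> _hamEmailsWithWord = new Dictionary<string, int>();
    private int _spamEmails, _hamEmails;

    // Data preparation: lowercase the text and keep the distinct alphanumeric tokens.
    public static HashSet<string> Tokenize(string text) =>
        new HashSet<string>(Regex.Split(text.ToLowerInvariant(), "[^a-z0-9]+")
                                 .Where(token => token.Length > 0));

    // Training: count how many spam and non-spam emails contain each word.
    public void Train(IEnumerable<(string Content, bool IsSpam)> emails)
    {
        foreach (var (content, isSpam) in emails)
        {
            if (isSpam) _spamEmails++; else _hamEmails++;
            var counts = isSpam ? _spamEmailsWithWord : _hamEmailsWithWord;
            foreach (var word in Tokenize(content))
            {
                _vocabulary.Add(word);
                counts.TryGetValue(word, out int seen); // seen is 0 for a new word
                counts[word] = seen + 1;
            }
        }
    }

    // log P(class) + sum over the vocabulary of log P(word present or absent | class),
    // with add-one smoothing so no probability is ever exactly 0 or 1.
    private double LogScore(HashSet<string> words, bool spam)
    {
        int classEmails = spam ? _spamEmails : _hamEmails;
        var counts = spam ? _spamEmailsWithWord : _hamEmailsWithWord;
        double score = Math.Log((classEmails + 1.0) / (_spamEmails + _hamEmails + 2.0));
        foreach (var word in _vocabulary)
        {
            counts.TryGetValue(word, out int withWord);
            double pPresent = (withWord + 1.0) / (classEmails + 2.0);
            score += Math.Log(words.Contains(word) ? pPresent : 1.0 - pPresent);
        }
        return score;
    }

    // Prediction: pick the class with the higher score; words outside the vocabulary are ignored.
    public bool IsSpam(string content)
    {
        var words = Tokenize(content);
        return LogScore(words, spam: true) > LogScore(words, spam: false);
    }
}
```

Trained on six hypothetical emails ("Buy cheap products now", "Cheap pills, buy today", and "Win money now" as spam; "Meeting at 3 PM", "Lunch meeting tomorrow", and "Project report attached" as not spam), IsSpam("Buy now and win money") returns true and IsSpam("Meeting tomorrow at 3 PM") returns false. Working in log space avoids numeric underflow when the vocabulary grows.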
Conclusion
Machine learning, from its theoretical inception to its contemporary applications, has undergone significant evolution. It encompasses the preparation of data, selection and training of a model, and the utilization of that model for prediction or decision-making. The example of email spam detection is just one of the many practical applications of machine learning that impact our daily lives.