Run A.I models locally with Ollama

by Joche Ojeda | Jan 4, 2024 | A.I

Ollama AI Framework

Ollama is an advanced AI framework designed for running large language models (LLMs) locally on personal computers. It simplifies the deployment of these models by integrating model weights, configurations, and data into a single, user-friendly package. The framework is known for two key features: its Command Line Interface (CLI) Read-Eval-Print Loop (REPL) and its REST API.

CLI Read-Eval-Print Loop (REPL)

The CLI REPL is a significant aspect of Ollama, providing an interactive shell for executing and managing models. This feature enhances usability for users who prefer command-line tools for development, testing, and interaction with LLMs.

REST API

Additionally, Ollama’s REST API expands its usability across different programming languages. This API facilitates interaction with Ollama from various environments, allowing developers to integrate LLMs into a wide range of applications.

List of Available Models in Ollama

The Ollama framework supports a variety of large language models (LLMs). Here’s a list of some of the models that Ollama can run:

Llama 2: A versatile model with 7 billion parameters, suitable for a variety of applications.
Code Llama: Tailored for coding-related tasks, with 7 billion parameters.
Mistral: A general-purpose 7 billion parameter model.
Dolphin Phi: A smaller model with 2.7 billion parameters, for less resource-intensive applications.
Phi-2: Similar to Dolphin Phi, with 2.7 billion parameters.
Neural Chat: Focused on conversational tasks, with 7 billion parameters.
Starling: A general-purpose model with 7 billion parameters.
Llama 2 Uncensored: An uncensored version of Llama 2 with 7 billion parameters.
Llama 2 (13B): An upscaled version with 13 billion parameters for more demanding tasks.
Llama 2 (70B): The largest variant with 70 billion parameters, aimed at complex applications.
Orca Mini: A smaller model with 3 billion parameters for applications with limited resources.
Vicuna: Another 7 billion parameter model for various tasks.
LLaVA: With 7 billion parameters, suitable for general-purpose applications.

Note: These models have different computational and memory requirements. It’s recommended to have at least 8 GB of RAM for the 7 billion parameter models, 16 GB for the 13 billion models, and 32 GB for the 70 billion models.

Overall, Ollama is distinguished by its ability to run LLMs locally, leading to advantages like reduced latency, no data transfer costs, increased privacy, and extensive customization of models. Its support for a variety of open-source models and adaptability for use with different programming languages, including Python, make it versatile for various applications, ranging from Python development to web development.

For more information, visit the official Ollama website here and the GitHub page here.

Understanding LLM Limitations and the Advantages of RAG

by Joche Ojeda | Jan 3, 2024 | A.I

Navigating the Limitations of Large Language Models: Understanding Outdated Information, Lack of Data Sources, and the Comparative Advantages of Retrieval-Augmented Generation (RAG)

Introduction

In the rapidly evolving field of artificial intelligence, Large Language Models (LLMs) like OpenAI’s GPT series have become central to various applications. However, despite their impressive capabilities, these models exhibit certain undesirable behaviors that can impact their effectiveness. This article delves into two significant limitations of LLMs – outdated information and the absence of data sources – and compares their functionality with Retrieval-Augmented Generation (RAG), highlighting the advantages of RAG over traditional fine-tuning approaches in LLMs.

1. Outdated Information in Large Language Models

A prominent issue with LLMs is their reliance on pre-existing datasets that may not include the most current information. Since these models are trained on data available up to a certain point in time, any developments post-training are not captured in the model’s responses. This limitation is particularly noticeable in fields with rapid advancements like technology, medicine, and current affairs.

2. Lack of Data Source Attribution

LLMs generate responses based on patterns learned from their training data, but they do not provide references or sources for the information they present. This lack of transparency can be problematic in academic, professional, and research settings where source verification is crucial. Users may find it challenging to distinguish between factual information, well-informed guesses, and outright fabrications.

Comparing LLMs with Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) presents a solution to some of the limitations faced by LLMs. RAG combines the generative capabilities of LLMs with the information retrieval aspect, pulling in data from external sources in real-time. This approach allows RAG to access and integrate the most recent information, overcoming the outdated information issue inherent in LLMs.

Why RAG Excels Over Fine-Tuning in LLMs

Fine-tuning involves additional training of a pre-trained model on a specific dataset to tailor it to particular needs or improve its performance in certain areas. While effective, fine-tuning does not address the core issues of outdated information and source attribution.

Dynamic Information Update: Unlike fine-tuned LLMs, RAG can access the latest information, ensuring responses are more current and relevant.
Source Attribution: RAG provides the ability to trace back the information to its source, enhancing credibility and reliability.
Customizability and Flexibility: RAG can be customized to pull information from specific databases or sources, catering to niche requirements more effectively than a broadly fine-tuned LLM.

Conclusion

While Large Language Models have transformed the AI landscape, their limitations, particularly regarding outdated information and lack of data source attribution, pose challenges. Retrieval-Augmented Generation offers a promising alternative, addressing these issues by integrating real-time data retrieval with generative capabilities. As AI continues to advance, the synergy between generative models and information retrieval systems like RAG is likely to become increasingly significant, paving the way for more accurate, reliable, and transparent AI-driven solutions.

Enhancing AI Language Models with Retrieval-Augmented Generation

by Joche Ojeda | Jan 3, 2024 | A.I

Enhancing AI Language Models with Retrieval-Augmented Generation

Introduction

In the world of natural language processing and artificial intelligence, researchers and developers are constantly searching for ways to improve the capabilities of AI language models. One of the latest innovations in this field is Retrieval-Augmented Generation (RAG), a technique that combines the power of language generation with the ability to retrieve relevant information from a knowledge source. In this article, we will explore what RAG is, how it works, and its potential applications in various industries.

What is Retrieval-Augmented Generation?

Retrieval-Augmented Generation is a method that enhances AI language models by allowing them to access external knowledge sources to generate more accurate and contextually relevant responses. Instead of relying solely on the model’s internal knowledge, RAG enables the AI to retrieve relevant information from a database or a knowledge source, such as Wikipedia, and use that information to generate a response.

How does Retrieval-Augmented Generation work?

RAG consists of two main components: a neural retriever and a neural generator. The neural retriever is responsible for finding relevant information from the external knowledge source. It does this by searching for documents that are most similar to the input text or query. Once the relevant documents are retrieved, the neural generator processes the retrieved information and generates a response based on the context provided by the input text and the retrieved documents.

The neural retriever and the neural generator work together to create a more accurate and contextually relevant response. This combination allows the AI to produce higher-quality outputs and reduces the likelihood of generating incorrect or nonsensical information.

Potential Applications of Retrieval-Augmented Generation

Retrieval-Augmented Generation has a wide range of potential applications in various industries. Some of the most promising use cases include:

Customer service: RAG can be used to improve the quality of customer service chatbots, allowing them to provide more accurate and relevant information to customers.
Education: RAG can be used to create educational tools that provide students with accurate and up-to-date information on a wide range of topics.
Healthcare: RAG can be used to develop AI systems that can assist doctors and healthcare professionals by providing accurate and relevant medical information.
News and media: RAG can be used to create AI-powered news and media platforms that can provide users with accurate and contextually relevant information on current events and topics.

Conclusion

Retrieval-Augmented Generation is a powerful technique that has the potential to significantly enhance the capabilities of AI language models. By combining the power of language generation with the ability to retrieve relevant information from external sources, RAG can provide more accurate and contextually relevant responses. As the technology continues to develop, we can expect to see a wide range of applications for RAG in various industries.

The Steps to Create, Train, Save, and Load a Spam Detection AI Model Using ML.NET

by Joche Ojeda | Jan 2, 2024 | A.I

This article demonstrates the process of creating, training, saving, and loading a spam detection AI model using ML.NET, but also emphasizes the reusability of the trained model. By following the steps in the article, you will be able to create a model that can be easily reused and integrated into your .NET applications, allowing you to effectively identify and filter out spam emails.

Prerequisites

Basic understanding of C#
Familiarity with ML.NET and machine learning concepts

Code Overview

1. Import necessary namespaces:


      using System;
      using System.IO;
      using System.Linq;
      using Microsoft.ML;
      using Microsoft.ML.Data;

1. Define the Email class and its properties:


      public class Email
      {
        public string Content { get; set; }
        public bool IsSpam { get; set; }
      }

1. Create a sample dataset for training the model:


      var sampleData = new List<Email>
      {
        new Email { Content = "Buy cheap products now", IsSpam = true },
        new Email { Content = "Meeting at 3 PM", IsSpam = false },
      };

1. Initialize a new MLContext, which is the main entry point to ML.NET:


      var mlContext = new MLContext();

1. Load the sample data into an IDataView:


      var trainData = mlContext.Data.LoadFromEnumerable(sampleData);

1. Define the data processing pipeline and the training algorithm (SdcaLogisticRegression):


      var pipeline = mlContext.Transforms.Text.FeaturizeText("Features", nameof(Email.Content))
        .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression());

1. Train the model:


      var model = pipeline.Fit(trainData);

1. Save the trained model as a .NET binary:


      mlContext.Model.Save(model, trainData.Schema, "model.zip");

1. Load the saved model:


      var newMlContext = new MLContext();
      DataViewSchema modelSchema;
      ITransformer trainedModel = newMlContext.Model.Load("model.zip", out modelSchema);

1. Create a prediction engine:


      var predictionEngine = mlContext.Model.CreatePredictionEngine<Email, SpamPrediction>(trainedModel);

1. Test the model with a sample email:


      var sampleEmail = new Email { Content = "Special discount, buy now!" };
      var prediction = predictionEngine.Predict(sampleEmail);

1. Output the prediction:


      Debug.WriteLine($"Email: '{sampleEmail.Content}' is {(prediction.IsSpam ? "spam" : "not spam")}");

1. Assert that the prediction is correct:


      Assert.IsTrue(prediction.IsSpam);

1. Verify that the model was saved:


      if(File.Exists("model.zip"))
        Assert.Pass();
      else
        Assert.Fail();

Conclusion

In this article, we explained a simple spam detection model in ML.NET and demonstrated how to train and test the model. This code can be extended to build more complex models, and can be used as a starting point for exploring machine learning in .NET.

Github Repo

The Meme: A Cultural A.I Embedding

by Joche Ojeda | Dec 31, 2023 | A.I

Unpacking Memes and AI Embeddings: An Intriguing Intersection

The Essence of Embeddings in AI

In the realm of artificial intelligence, the concept of an embedding is pivotal. It’s a method of converting complex, high-dimensional data like text, images, or sounds into a lower-dimensional space. This transformation captures the essence of the data’s most relevant features.

Imagine a vast library of books. An embedding is like a skilled librarian who can distill each book into a single, insightful summary. This process enables machines to process and understand vast swathes of data more efficiently and meaningfully.

The Meme: A Cultural Embedding

A meme is a cultural artifact, often an image with text, that encapsulates a collective experience, emotion, or idea in a highly condensed format. It’s a snippet of culture, distilled down to its most essential and relatable elements.

The Intersection: AI Embeddings and Memes

The connection between AI embeddings and memes lies in their shared essence of abstraction and distillation. Both serve as compact representations of more complex entities. An AI embedding abstracts media into a form that captures its most relevant features, just as a meme condenses an experience or idea into a simple format.

Implications and Insights

This intersection offers fascinating implications. For instance, when AI learns to understand and generate memes, it’s tapping into the cultural and emotional undercurrents that memes represent. This requires a nuanced understanding of human experiences and societal contexts – a significant challenge for AI.

Moreover, the study of memes can inform AI research, leading to more adaptable and resilient AI models.

Conclusion

In conclusion, while AI embeddings and memes operate in different domains, they share a fundamental similarity in their approach to abstraction. This intersection opens up possibilities for both AI development and our understanding of cultural phenomena.

ONNX: Revolutionizing Interoperability in Machine Learning

by Joche Ojeda | Dec 18, 2023 | A.I

ONNX: Revolutionizing Interoperability in Machine Learning

The field of machine learning (ML) and artificial intelligence (AI) has witnessed a groundbreaking innovation in the form of ONNX (Open Neural Network Exchange). This open-source model format is redefining the norms of model sharing and interoperability across various ML frameworks. In this article, we explore the ONNX models, the history of the ONNX format, and the role of the ONNX Runtime in the ONNX ecosystem.

What is an ONNX Model?

ONNX stands as a universal format for representing machine learning models, bridging the gap between different ML frameworks and enabling models to be exported and utilized across diverse platforms.

The Genesis and Evolution of ONNX Format

ONNX emerged from a collaboration between Microsoft and Facebook in 2017, with the aim of overcoming the fragmentation in the ML world. Its adoption by major frameworks like TensorFlow and PyTorch was a key milestone in its evolution.

ONNX Runtime: The Engine Behind ONNX Models

ONNX Runtime is a performance-focused engine for running ONNX models, optimized for a variety of platforms and hardware configurations, from cloud-based servers to edge devices.

Where Does ONNX Runtime Run?

ONNX Runtime is cross-platform, running on operating systems such as Windows, Linux, and macOS, and is adaptable to mobile platforms and IoT devices.

ONNX Today

ONNX stands as a vital tool for developers and researchers, supported by an active open-source community and embodying the collaborative spirit of the AI and ML community.

ONNX and its runtime have reshaped the ML landscape, promoting an environment of enhanced collaboration and accessibility. As we continue to explore new frontiers in AI, ONNX’s role in simplifying model deployment and ensuring compatibility across platforms will be instrumental in advancing the field.

« Older Entries

Next Entries »

Run A.I models locally with Ollama

Ollama AI Framework

CLI Read-Eval-Print Loop (REPL)

REST API

List of Available Models in Ollama

Understanding LLM Limitations and the Advantages of RAG

Navigating the Limitations of Large Language Models: Understanding Outdated Information, Lack of Data Sources, and the Comparative Advantages of Retrieval-Augmented Generation (RAG)

Introduction

1. Outdated Information in Large Language Models

2. Lack of Data Source Attribution

Comparing LLMs with Retrieval-Augmented Generation (RAG)

Why RAG Excels Over Fine-Tuning in LLMs

Conclusion

Enhancing AI Language Models with Retrieval-Augmented Generation

Enhancing AI Language Models with Retrieval-Augmented Generation

Introduction

What is Retrieval-Augmented Generation?

How does Retrieval-Augmented Generation work?

Potential Applications of Retrieval-Augmented Generation

Conclusion

The Steps to Create, Train, Save, and Load a Spam Detection AI Model Using ML.NET

Prerequisites

Code Overview

Conclusion

The Meme: A Cultural A.I Embedding

Unpacking Memes and AI Embeddings: An Intriguing Intersection

The Essence of Embeddings in AI

The Meme: A Cultural Embedding

The Intersection: AI Embeddings and Memes

Implications and Insights

Conclusion

ONNX: Revolutionizing Interoperability in Machine Learning

ONNX: Revolutionizing Interoperability in Machine Learning

What is an ONNX Model?

The Genesis and Evolution of ONNX Format

ONNX Runtime: The Engine Behind ONNX Models

Where Does ONNX Runtime Run?

ONNX Today

Search

Recent Posts

Categories

Archives