
A Beginner's Guide to Using Open Source Quantized LLMs for Generative AI Apps

In the realm of Generative AI, the emergence of quantized models has made it easier to run open source models, especially for businesses, researchers, and developers seeking to unlock new potential without delving deep into technical complexities.


Introduction

In this blog we build a simple Gradio application that takes in customer feedback, analyzes it, and returns a structured JSON output that can be stored and analyzed directly. Then we will generate a reply based on the feedback provided by the customer.


Gradio:

Gradio is a Python library that allows developers to quickly create customizable UI components for machine learning models. It's designed to make AI more accessible and interactive, enabling users to test and showcase their models through a web interface with minimal coding. Gradio's simplicity and versatility make it a popular choice for both seasoned AI practitioners and newcomers.
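To give a feel for how little code a Gradio interface needs, here is a minimal, self-contained sketch; the greet function and its labels are placeholders for this illustration, not part of the app we build below:

import gradio as gr

def greet(name):
    # A toy function standing in for a real model call
    return f"Hello, {name}!"

# Wire a text input and a text output to the function and serve it
gr.Interface(fn=greet, inputs="text", outputs="text").launch()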


Mistral 7B LLM: Mistral 7B is a 7.3 billion parameter language model that represents a major advance in large language model (LLM) capabilities. It has outperformed the 13 billion parameter Llama 2 model on all benchmarks, and for coding-related tasks it approaches the performance of CodeLlama 7B. This makes it suitable for a myriad of Generative AI applications.



Hosting LLM API Locally


Go to https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF, download any one of the quantized models listed there, and place it in the folder where you want to develop the Gradio app. I have downloaded the mistral-7b-instruct-v0.2.Q4_K_M.gguf model as it can run using about 7GB of RAM. Now, in order to create an endpoint similar to the OpenAI API, we can use the llama-cpp-python library, which we went through in the last blog.


The Mistral-7B model can be hosted as a local API endpoint using a simple command line instruction that leverages llama-cpp-python, a tool that facilitates running LLMs like Mistral-7B efficiently. Run the following command:

python -m llama_cpp.server --model mistral-7b-instruct-v0.2.Q4_K_M.gguf --port 1234

This command starts a server on your local machine, making the Mistral-7B model accessible for processing requests such as analyzing text. The endpoint will be hosted at http://localhost:1234/v1 on your local machine.
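Once the server is running, you can sanity-check the endpoint with the same OpenAI client the app uses later. This is a minimal sketch; the model name passed here is a placeholder that the local server ignores (if the server command fails, make sure the server extra is installed, e.g. pip install 'llama-cpp-python[server]'):

from openai import OpenAI

# Point the OpenAI client at the local llama-cpp-python server
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",  # placeholder; ignored by the local server
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)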



Building the Gradio App


The Gradio app is designed to analyze customer feedback using LLMs and then use the LLM to give a response back to the user based on the feedback. To compare the performance of the open source model with a commercial LLM, we will give the user the option to choose between the Open Source Model and the OpenAI Model and let them compare and see the difference.


Before we begin with the coding part, make sure you have an OpenAI API key. If you already have one, skip to the next section; otherwise, go through this section to set up the OpenAI API key properly.


  • Go to the OpenAI API Key page and log in.

  • Once you log in, go to the API key section, click "Create new secret key", copy the API key, and save it somewhere safe, as you will not be able to view the key again.


Now that we have created the API key, create a new folder that will hold the app and its other files. Make sure the previously downloaded model is in this folder as well. In the same folder, create a file called config.ini and paste the below contents into it:

[openai]
api_key = your-api-key-here

Now create a new Python file called app.py for our Gradio app. Let us import all the required libraries. First we import Gradio to build the app, then the OpenAI client, which we will use to call the endpoints for accessing both the OpenAI GPT models and the quantized Mistral 7B model running locally via llama-cpp-python. Next we import ConfigParser to read the OpenAI API key from config.ini, and finally we import the json library to help us work with JSON objects.

import gradio as gr
from openai import OpenAI
from configparser import ConfigParser
import json

Now we read the config file so that we can directly access the API key:

config = ConfigParser()
config.read('config.ini')

Now we define the JSON format we want the LLM to generate so that we can capture it easily and then store or process it. Here we look for the product the customer is referring to, the reason the customer liked or disliked the product, the impact of that reason (be it positive or negative), and finally the sentiment of the feedback so that we can easily classify it and treat it accordingly.

product_json_format = {
    "Product": "",
    "Reason": "Reason for Liking the product to disliking the product",
    "Impact": "Impact due to Reason",
    "Sentiment": "Sentiment of Feedback"
}
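For example, given feedback like "The headphones sound great but the battery dies in two hours", we would expect the LLM to fill the template roughly as follows (a hypothetical illustration; actual outputs will vary):

{
    "Product": "Headphones",
    "Reason": "Battery dies in two hours",
    "Impact": "Customer cannot use the product for long sessions",
    "Sentiment": "Negative"
}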

Next we define the function we will use to make the LLM generate a custom message based on the feedback provided by the user. Here we consider only two possibilities: Positive and Negative.

def generate_custom_message(client, sentiment):
    # Generate a response based on sentiment using the chosen AI model
    if sentiment == "Positive":
        prompt = "Generate a thank you message for positive feedback. Company Name is Codersarts"
    else:
        prompt = "Generate a message assuring the user that their issue will be addressed. Company Name is Codersarts"

    response = client.chat.completions.create(
        model="gpt-4-turbo-preview",  # Open Source LLM ignores this but we need it for OpenAI
        messages=[{"role": "system", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

The above function takes two inputs: client, the client we use to connect with the LLM, and sentiment, which we identify earlier when generating the JSON response. Note that we pass the OpenAI GPT-4 Turbo Preview model name, which is required when using the OpenAI client; the locally running Mistral 7B server ignores this field, so it is convenient to leave the GPT-4 model as the default. Once the response is generated, we extract the message from it.


Next we define the function to analyze the feedback and generate the JSON response. Here is the code for it.

def analyze_feedback(feedback, model_choice):
    # Choose the client based on model_choice
    if model_choice == "OpenAI GPT":
        client = OpenAI(api_key=config.get('openai', 'api_key'))
    else:
        client = OpenAI(base_url="http://localhost:1234/v1",
                        api_key="not-needed")

    response = client.chat.completions.create(
        model="gpt-4-turbo-preview",
        messages=[{"role": "system", "content": f"Act as Customer Feedback Analyzer and Find all the meaning full information from the feedback and provide it in Following JSON format {product_json_format}"},
                  {"role": "user", "content": f"{feedback}"}],
    )

    json_output = response.choices[0].message.content.strip()
    json_output = json_output.replace('```', '').replace('json', '')
    json_output = json.loads(json_output)

    sentiment = json_output["Sentiment"]
    custom_message = generate_custom_message(client, sentiment)

    # Convert the response to a visually appealing HTML format
    response_html = f"""
    <div style='margin: 10px; padding: 20px; border: 1px solid #ddd; border-radius: 8px;'>
        <h2>Feedback Analysis</h2>
        <p><strong>Product:</strong> {json_output['Product']}</p>
        <p><strong>Reason:</strong> {json_output['Reason']}</p>
        <p><strong>Impact:</strong> {json_output['Impact']}</p>
        <p><strong>Sentiment:</strong> <span style='color: {"green" if sentiment == "Positive" else "red"};'>{sentiment}</span></p>
    </div>
    """

    return response_html, custom_message

Let us go through the code line by line. The function accepts two inputs: the feedback and the model to use. Based on the model choice, we define the client accordingly: for the OpenAI model we use the config file to pass in the API key, and for the Mistral 7B model we point the client at the local URL where the model is hosted, which follows the same design as the OpenAI API.


Next we call the client with the chosen model. We use the "system" role to tell the LLM what it should do, then the "user" role to provide the feedback as input so the LLM can generate the JSON response. The system prompt was fine-tuned to make sure the LLM does only what it is asked to do.


Next we receive the response and extract the generated message. Models of this kind are usually trained to output in markdown format, so the response may include extra text around the JSON, as shown below, which we remove using the string replace methods. This is essential, as otherwise the loads method of the json library will throw an error about invalid syntax in the string. We then use the json library to convert the string to a JSON object using the loads function, as shown below.

# Text that may be present in the output which we need to remove
# ```json
# json
# ```

json.loads(json_output) # Converts the String to JSON
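Note that replace('json', '') is blunt: it would also erase the word "json" if it ever appeared inside a field value. A slightly more careful sketch using a regular expression could look like this (extract_json is a hypothetical helper for illustration, not part of the app code above):

import json
import re

def extract_json(text: str) -> dict:
    # Pull the content out of an optional ```json ... ``` fence,
    # falling back to the raw text if no fence is present.
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    payload = match.group(1) if match else text
    return json.loads(payload)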

Now we extract the sentiment from the JSON object and pass it to the generate_custom_message function along with the client.

sentiment = json_output["Sentiment"]
custom_message = generate_custom_message(client, sentiment)

Now we use the JSON object to create simple HTML that displays its contents in a nice view in the Gradio app, and then we return response_html and the custom message.

# Convert the response to a visually appealing HTML format
response_html = f"""
    <div style='margin: 10px; padding: 20px; border: 1px solid #ddd; border-radius: 8px;'>
        <h2>Feedback Analysis</h2>
        <p><strong>Product:</strong> {json_output['Product']}</p>
        <p><strong>Reason:</strong> {json_output['Reason']}</p>
        <p><strong>Impact:</strong> {json_output['Impact']}</p>
        <p><strong>Sentiment:</strong> <span style='color: {"green" if sentiment == "Positive" else "red"};'>{sentiment}</span></p>
    </div>
    """

return response_html, custom_message

Now, we define the Gradio app interface for our app; a sketch of the full block follows the list below.

  1. Initializing the Gradio Interface: The code begins with with gr.Blocks() as iface:. This line sets up a new Gradio app using the Blocks API, which allows for more flexible and complex interfaces. The iface variable represents our Gradio interface, where we'll add all our components.

  2. Adding a Title with Markdown: gr.Markdown() is used to add a title to our app. We use Markdown formatting to center the title "Feedback Analyzer" on the webpage. This is a simple way to make the interface more informative and visually appealing.

  3. Creating a Dropdown for Model Selection: With gr.Dropdown(), we add a dropdown menu labeled "Choose Model". This lets the user select between "OpenAI GPT" and "Non OpenAI Model". This feature provides flexibility, allowing the user to compare feedback analysis from different AI models.

  4. Textbox for Customer Feedback: The gr.Textbox(label="Customer Feedback") line creates a textbox where users can input the customer feedback they want to analyze. This is a key part of the interface, as it's where the user interacts directly with the app by entering their data.

  5. Displaying Analysis Results: We use gr.HTML(label='Analysis Output') and gr.Textbox(label='Custom Message') to display the results of the feedback analysis. The former is intended for HTML-formatted output, which can include styled text or images, while the latter provides a simple textbox for any custom messages we might want to show the user.

  6. Analyzing Feedback Button: gr.Button("Analyze Feedback") adds a button that users click to start the analysis of the feedback they've entered. This is an essential interactive element, as it triggers the analysis process.

  7. Connecting the Button to Functionality: The .click() method connects our "Analyze Feedback" button to the analyze_feedback function, specifying what should happen when the button is clicked. It defines that the function takes the user's input and model choice, processes them, and then outputs the analysis and any custom message. The inputs and outputs parameters specify which components feed into the function and where the results will be displayed, respectively. api_name="Analyze Feedback" gives a name to this specific API endpoint, useful for reference or when integrating with external applications.

  8. Launching the App: Finally, iface.launch() outside the with block tells Gradio to start the app. This makes the interface accessible to users, allowing them to interact with the Feedback Analyzer.
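Putting those eight steps together, the interface block could look roughly like the sketch below. The component variable names and the centered-title markup are my own assumptions; the wiring mirrors the description above:

with gr.Blocks() as iface:
    # 2. Centered title rendered via Markdown/HTML
    gr.Markdown("<h1 style='text-align: center;'>Feedback Analyzer</h1>")
    # 3. Model selection dropdown
    model_choice = gr.Dropdown(["OpenAI GPT", "Non OpenAI Model"], label="Choose Model")
    # 4. Textbox for the customer feedback
    feedback = gr.Textbox(label="Customer Feedback")
    # 5. Outputs: HTML analysis and a custom message textbox
    analysis_output = gr.HTML(label='Analysis Output')
    custom_message = gr.Textbox(label='Custom Message')
    # 6. Button that triggers the analysis
    analyze_btn = gr.Button("Analyze Feedback")
    # 7. Connect the button to the analyze_feedback function
    analyze_btn.click(
        analyze_feedback,
        inputs=[feedback, model_choice],
        outputs=[analysis_output, custom_message],
        api_name="Analyze Feedback",
    )

# 8. Launch the app outside the with block
iface.launch()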


Now we can start the Gradio app by running the below command in the terminal.

python app.py

The app will now be running on the following URL

http://127.0.0.1:7860/

The Gradio app will have the following UI, where you can choose the model, write the feedback for the product, and click Analyze Feedback to start generating output.

APP UI

The output will look like this:

Mistral 7B Output

Now, for comparison, let us try the same using the OpenAI GPT-4 Turbo Preview model and see what the result is (note: you might incur API costs for this action).

OpenAI Model Output

As we observe, both models demonstrate similar performance in accurately understanding the essence of the feedback and extracting meaningful insights, as well as in their capability to craft a custom message. The primary distinction lies in the cost implications. Utilizing the API allows for a pay-per-use model, freeing you from the concerns of maintenance and other hosting-related responsibilities associated with managing your own model.


However, hosting your own model comes with its own set of benefits, including full control over the model's customization, updates, and privacy. This autonomy allows for tailored adjustments and improvements based on specific needs or new insights, such as fine-tuning the model for a specific use case, and it offers enhanced data security when you manage the model yourself, a level of flexibility and control that can be critical for certain applications.


To create such a Proof of Concept/MVP for similar use cases using open source or OpenAI models, feel free to contact us at contact@codersarts.com

