Welcome to the fascinating world of graph machine learning! Today, we’re going to explore an exciting and practical application: building a chatbot that integrates OpenAI’s Large Language Models (LLMs) with Neo4j for seamless, user-friendly interaction with a graph database.
Before we dive into the details, let's quickly recap some of the key topics we've covered in the Lighthouse Projects Program Season 2:
🔸 For details on the graph data and Graph Neural Network, Read here.
🔸 For details on Graph ML Practical Applications, Read here.
🔸 Tab2Graph-1: Conversion of Tabular to Graph using Python, Read here.
🔸 Tab2Graph: Conversion of Tabular to Graph using Neo4j, Read here.
🔸 Graph Building - Neo4j, Read here.
Agenda
This article will guide you through the process of integrating powerful language models with graph databases, enabling you to create a chatbot capable of natural language interactions with Neo4j. By the end of this guide, you'll have all the tools you need to build your own chatbot that leverages the power of LLMs and Neo4j for intuitive data exploration and analysis.
Introduction to Large Language Models (LLMs)
Applications of LLMs
Overview of LangChain
Advantages and Applications of LangChain
Installation Guidelines
Step-by-Step Guide to Building a Chatbot using LLM, LangChain, and Neo4j
Large Language Models
LLM stands for "Large Language Model." It refers to an artificial intelligence model trained on vast amounts of text data to understand and generate human-like language. These models are capable of understanding context, generating coherent and contextually relevant text, and performing various natural language processing (NLP) tasks.
Large language models, like OpenAI's GPT-3 (Generative Pre-trained Transformer 3), are part of the transformer architecture family. They are pre-trained on diverse datasets containing parts of the internet, books, articles, and other text sources. Once pre-trained, they can be fine-tuned for specific tasks or applications, such as chatbots, language translation, summarisation, and more.
Types of LLMs
Several families of Large Language Models are available; the most notable are listed below.
1. GPT (Generative Pre-trained Transformer) Series by OpenAI
GPT-4: The latest version. OpenAI has not officially disclosed its parameter count, and the model is still under active research and development.
GPT-3.5: The previous version of GPT, and still a very powerful LLM. With 175 billion parameters, GPT-3.5 can generate text that is often difficult to distinguish from human-written text, and it can be used for a wide variety of tasks. GPT-3.5 is a mature product and is available through OpenAI's API.
GPT-3: With 175 billion parameters, it is capable of performing a wide range of natural language tasks.
GPT-2: The predecessor to GPT-3 is also known for its large-scale language generation capabilities.
2. BERT (Bidirectional Encoder Representations from Transformers) by Google
BERT is designed for natural language understanding tasks. It considers the context of words in both directions, improving its understanding of context in sentences.
3. XLNet by Google and Carnegie Mellon University
XLNet is an extension of the transformer model that incorporates elements of autoregressive and autoencoding language modelling. It aims to capture bidirectional context while maintaining the advantages of autoregressive models.
4. T5 (Text-to-Text Transfer Transformer) by Google
T5 is designed to frame all NLP tasks as converting input text to output text. This unified approach simplifies the training process for various language tasks.
5. RoBERTa (Robustly optimized BERT approach) by Facebook
RoBERTa builds on BERT but modifies key hyperparameters and removes certain training objectives, achieving better performance on certain benchmarks.
6. ERNIE (Enhanced Representation through kNowledge Integration) by Baidu
ERNIE incorporates knowledge learned during pre-training to enhance its understanding of language.
7. ALBERT (A Lite BERT) by Google
ALBERT reduces the number of parameters in BERT while maintaining its performance, making it more efficient.
8. Turing-NLG by Microsoft
Turing-NLG is a large-scale language model known in particular for its text generation capabilities.
🔸 How to do Fine-Tuning of LLM
Fine-tuning a Large Language Model (LLM) involves taking a pre-trained model and training it on a specific dataset or task to adapt it for a particular application.
Below are the general steps for fine-tuning an LLM; a minimal code sketch follows the list of steps.
Choose a Pre-trained Model
Select a pre-trained LLM that fits your needs. Popular choices include GPT-3, GPT-2, BERT, T5, etc.
Define the Task
Clearly define the task for which you want to fine-tune the model. Tasks include sentiment analysis, text classification, named entity recognition, etc.
Prepare the Dataset
Gather and preprocess a dataset specific to your task. Ensure that the dataset is annotated or labelled with the necessary information for training.
Tokenisation
Tokenise the dataset to match the tokenisation used by the pre-trained model. This ensures consistency in the input format.
Model Architecture
Depending on the task, you may need to modify the architecture of the pre-trained model. For example, you might add a classification layer for text classification tasks.
Loss Function
Choose an appropriate loss function for your task. Common choices include categorical cross-entropy for classification tasks or mean squared error for regression tasks.
Training
Train the model on your specific dataset. Use the pre-trained weights as the starting point. Adjust the learning rate, batch size, and other hyperparameters as needed.
Validation and Evaluation
Validate the model on a separate validation set to monitor performance during training. Evaluate the fine-tuned model on a test set to assess its generalisation to new data.
Hyperparameter Tuning
Experiment with different hyperparameters to optimise the model's performance. This may involve adjusting the learning rate, dropout rate, or other settings.
Regularisation
Apply regularisation techniques such as dropout or weight decay to prevent overfitting, especially if the fine-tuned dataset is small.
Save the Fine-Tuned Model
Save the weights and configuration of the fine-tuned model for future use. This allows you to deploy the model for inference on new data.
Deployment
Integrate the fine-tuned model into your application or system for real-world use.
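To make these steps concrete, below is a minimal sketch of fine-tuning for text classification with the Hugging Face transformers library (a toolkit chosen here purely for illustration). The dataset files, model name, and hyperparameters are assumptions; adjust them for your own task.
Python
# Minimal fine-tuning sketch (illustrative assumptions: a labelled CSV
# dataset with "text" and "label" columns, binary classification).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Choose a pre-trained model and its matching tokeniser
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# A classification head is added on top of the pre-trained encoder
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Prepare and tokenise the task-specific dataset
dataset = load_dataset("csv", data_files={"train": "train.csv",
                                          "validation": "val.csv"})
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")
dataset = dataset.map(tokenize, batched=True)

# Training: cross-entropy loss comes with the classification head;
# weight decay provides regularisation, and validation runs each epoch
args = TrainingArguments(output_dir="finetuned-model",
                         num_train_epochs=3,
                         per_device_train_batch_size=16,
                         learning_rate=2e-5,
                         weight_decay=0.01,
                         evaluation_strategy="epoch")
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"],
                  eval_dataset=dataset["validation"])
trainer.train()

# Save the fine-tuned model for deployment
trainer.save_model("finetuned-model")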
Note that fine-tuning requires a labelled dataset relevant to your task. Additionally, ethical considerations, data privacy, and compliance with relevant regulations should be taken into account when working with large language models, especially when deploying them in real-world applications.
🔸 LangChain
LangChain is an open-source framework for developing applications powered by large language models (LLMs).
It provides a modular and flexible architecture that makes it easy to connect LLMs to a variety of data sources and external services, enabling developers to create a wide range of applications.
🔸 Features of LangChain
Chaining
LangChain allows developers to "chain" together different components to create more complex applications. This modular approach makes it easier to reuse code and build upon existing components (a minimal example follows at the end of this section).
Context Awareness
LangChain supports the use of context to provide additional information to LLMs, which can improve their responses and decision-making. This context can come from various sources, such as prompts, few-shot examples, or external data.
Reasoning
LangChain enables LLMs to reason about the provided context and generate more informed and nuanced responses. This capability is particularly useful for tasks like question answering, summarisation, and chatbot interactions.
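As a concrete illustration of chaining, here is a minimal sketch that pipes a PromptTemplate into an LLM. The prompt text and topic are hypothetical, and the sketch assumes your OpenAI API key is set in the OPENAI_API_KEY environment variable.
Python
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

# A reusable prompt component with a single input variable
prompt = PromptTemplate(input_variables=["topic"],
                        template="Explain {topic} in one sentence.")

# Chain the prompt into the LLM; the chain fills in the template,
# calls the model, and returns the model's text
chain = LLMChain(llm=ChatOpenAI(temperature=0), prompt=prompt)
print(chain.run(topic="graph databases"))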
🔸 Applications of LangChain
Several applications can be built using LangChain; a few are mentioned below.
Chatbots
Document Analysis
Summarisation
Code Analysis and Generation
Text Generation
Creative Writing and Content Creation
Educational Applications
Chatbots
LangChain can be used to develop sophisticated chatbots that can engage in natural conversations, provide customer support, and even offer personalised recommendations.
Document Analysis and Summarisation
LangChain can be employed to analyse and summarise large documents, extract key information, and generate concise summaries.
Code Analysis and Generation
LangChain can be leveraged to analyse code, identify potential errors, and even generate code snippets.
Creative Writing and Content Creation
LangChain can be used to generate creative text formats, such as poems, scripts, musical pieces, emails, and letters.
Educational Applications
LangChain can be integrated into educational platforms to provide personalised learning experiences, generate interactive exercises, and assess student understanding.
Advantages of LangChain
There are several advantages of using LangChain to build applications, including:
Reduced Development Effort
LangChain simplifies the development of LLM-powered applications by providing a standardised framework and reusable components.
Improved Scalability
LangChain's modular design makes it easy to scale applications to handle increasing data volumes and user traffic.
Enhanced Efficiency
LangChain's ability to connect LLMs to external data sources and services can improve application performance and accuracy.
Fostered Innovation
LangChain provides a platform for developers to experiment with new LLM-based applications and push the boundaries of what's possible.
🔸 Installing LangChain
There are two main ways to install LangChain for Python: using pip or from source.
Installing LangChain using pip
The easiest way to install LangChain is using pip, the Python package installer. To do this, open a terminal window and type the following command:
pip install langchain
This will install the latest stable version of LangChain. If you want to install a specific version of LangChain, you can use the following command:
pip install langchain==<version_number>
Installing OpenAI
Installing the OpenAI Python library is a straightforward process that allows you to interact with the OpenAI API effectively. Here's a step-by-step guide on how to install OpenAI using Python for its API key:
Create a Virtual Environment (Recommended): Create a virtual environment to isolate the OpenAI Python library from other Python projects and avoid potential dependency conflicts. To create a virtual environment, open a terminal window and type the following command:
python -m venv openai-env
This will create a virtual environment named "openai-env" in the current directory.
Activate the Virtual Environment: To activate the virtual environment, use the following command:
# On Windows
openai-env\Scripts\activate
# On Unix or macOS
source openai-env/bin/activate
Install the OpenAI Python Library: Once the virtual environment is activated, install the OpenAI Python library using pip:
pip install openai
This will install the latest stable version of the OpenAI Python library.
Set Your OpenAI API Key: Obtain your OpenAI API key from your OpenAI account. You can find it under the "API Keys" section in your account settings.
Store your API key in a secure environment variable named OPENAI_API_KEY. You can do this using the following command:
export OPENAI_API_KEY=<your_api_key>
Replace <your_api_key> with your actual API key.
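If you are on Windows, set the variable with setx OPENAI_API_KEY <your_api_key> instead; it takes effect in newly opened terminal sessions.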
Verify Installation: To verify that the OpenAI Python library is installed and your API key is set correctly, you can run the following code:
Python
import openai

# With the classic (pre-1.0) openai library, openai.api_key is populated
# automatically from the OPENAI_API_KEY environment variable on import
print(openai.api_key is not None)

This will print True if the library found your API key. If it does, the installation and configuration are successful.
Remember to keep your API key confidential and avoid exposing it in any publicly accessible code.
Installation of Neo4J
Connecting to Neo4j from Python involves two steps: first, installing the Neo4j Python driver, and second, configuring Neo4j to accept connections from Python applications.
Installing the Neo4j Python Driver
The Neo4j Python driver is a package that allows Python applications to connect to and interact with Neo4j databases. To install the driver, open a terminal window and type the following command:
pip install neo4j
This will install the latest stable version of the Neo4j Python driver.
Configuring Neo4j
By default, Neo4j only accepts Bolt connections from the local machine. If your Python application runs on a different host, you need to modify the Neo4j configuration file (neo4j.conf). Open the neo4j.conf file in a text editor and locate the following line (the setting name shown here is for Neo4j 4.x; in Neo4j 5.x the equivalent setting is server.bolt.listen_address):
# Allow remote connections to Neo4j
#dbms.connector.bolt.listen_address=0.0.0.0:7687
Uncomment the setting by removing the '#' at the beginning. This allows remote applications to connect to Neo4j on port 7687.
Restart the Neo4j server for the changes to take effect. Once the server has restarted, you can start using the Neo4j Python driver to connect to and interact with your Neo4j database.
Here is an example of how to connect to a Neo4j database using the Neo4j Python driver:
Python
from neo4j import GraphDatabase

# Connect to a local Neo4j instance over the Bolt protocol
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Run a Cypher query inside a session and print each returned node
with driver.session() as session:
    result = session.run("MATCH (n) RETURN n")
    for record in result:
        print(record["n"])

driver.close()
This code connects to a Neo4j database running on localhost on port 7687 using the username "neo4j" and the password "password", executes a Cypher query that returns all nodes in the database, and prints each node.
Installing Chainlit
Chainlit is an open-source Python framework that makes it incredibly fast to build ChatGPT-like applications with your own business logic and data. It integrates closely with LangChain, a framework for building applications powered by large language models (LLMs), and provides a user-friendly interface for creating and deploying LLM-based applications.
Key Features of Chainlit
Fast development: Chainlit enables rapid development of LLM-based applications by providing a streamlined development process and pre-built components.
Customisable interface: Chainlit allows you to create a custom frontend for your application, giving you the flexibility to design a unique user experience.
Seamless integrations: Chainlit integrates seamlessly with various tools and frameworks, making it easy to extend its functionality.
Applications of Chainlit
Chainlit can be used to build a wide range of applications, including:
Chatbots: Chainlit enables the creation of interactive chatbots that can engage in natural conversations with users.
Prompt playgrounds: Chainlit provides a platform for experimenting with different prompts and observing the responses of LLMs.
Data exploration tools: Chainlit can be used to build tools that help users explore and analyse large datasets.
Educational applications: Chainlit can be used to create engaging educational applications that utilise the power of LLMs.
Getting Started with Chainlit
To get started with Chainlit, follow these steps:
Install Chainlit: You can install Chainlit using pip:
pip install chainlit
Create a Chainlit application: Create a new Python file and import Chainlit:
Python
import chainlit as cl
Define your application logic: Write your application's logic using Chainlit's components and methods.
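For example, here is a minimal, hypothetical app that simply echoes each user message back (the handler style matches the chatbot code later in this article):
Python
import chainlit as cl

# Echo each incoming chat message back to the user
@cl.on_message
async def main(message: str):
    await cl.Message(content=f"You said: {message}").send()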
Run your application: Run the Python file using the “chainlit” command:
chainlit run my_app.py
This will launch the Chainlit application in your web browser.
Chainlit is a powerful tool for building LLM-based applications, and its ease of use and flexibility make it a great choice for developers of all levels of experience.
With that, we have met all the basic requirements for connecting an LLM to Neo4j using LangChain.
Creating ChatBot using LLM with Neo4j
Welcome to the Climax !!!
Here, we will create a chatbot that provides a conversational interface for users to interact with a Neo4j graph database using natural language.
The GraphCypherQAChain component seamlessly integrates the user's natural language questions, the ChatOpenAI language model, and the Neo4j graph database.
By leveraging a predefined Cypher query template and ChatOpenAI's language processing capabilities, the code generates relevant Cypher queries based on the user's questions. It retrieves the corresponding data from the database.
ChatOpenAI then processes the retrieved data and presents it to the user in a natural language response, enabling an intuitive and user-friendly interaction with the graph database.
Let’s Jump into the coding part.
1. Library Imports:
from langchain.graphs import Neo4jGraph
from langchain.prompts.prompt import PromptTemplate
from langchain.chains import GraphCypherQAChain
from langchain.chat_models import ChatOpenAI
import chainlit as cl
Imports the necessary libraries and modules, including components from the langchain and chainlit packages.
2. Graph Database Setup:
api_key = "YOUR-API-KEY-HERE"
graph_data = Neo4jGraph(
    url="YOUR-DB-URL-HERE",
    username="neo4j",
    password="PASSWORD-HERE",
    database="neo4j"
)
schema = graph_data.get_schema
Sets up a connection to a Neo4j graph database using the Neo4jGraph class from the langchain library, and retrieves the database schema via the get_schema property.
3. Cypher Query Generation Template:
CYPHER_QUERY_TEMPLATE = """Task: Generate Cypher statement to query a graph database. ... {question}"""
Defines a template for generating Cypher queries. It includes instructions, examples, and a placeholder for the user's question.
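The "..." above stands for the body of the template, which you write yourself. Purely as a hypothetical illustration, a template of this shape might look like the following; tailor the instructions and add few-shot examples that match your own schema:
Python
# Hypothetical illustration only; the actual wording and examples are up to you.
CYPHER_QUERY_TEMPLATE = """Task: Generate Cypher statement to query a graph database.
Instructions:
Use only the relationship types and properties provided in the schema below.
Do not include any explanations or apologies in your response.
Schema:
{schema}
The question is:
{question}"""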
4. Prompt Template for Cypher Generation:
CYPHER_GENERATION_PROMPT = PromptTemplate(
    input_variables=["schema", "question"],
    template=CYPHER_QUERY_TEMPLATE
)
Creates a PromptTemplate object, specifying the input variables (schema and question) and using the previously defined Cypher generation template.
5. Main Function (on_chat_start):
@cl.on_chat_start
def main():
    # Build the QA chain: the LLM turns questions into Cypher via the prompt
    llm_chain = GraphCypherQAChain.from_llm(
        ChatOpenAI(temperature=0, openai_api_key=api_key, model='gpt-4'),
        graph=graph_data,
        verbose=True,
        cypher_prompt=CYPHER_GENERATION_PROMPT
    )
    # Store the chain in the user session so it persists across messages
    cl.user_session.set("llm_chain", llm_chain)
Defines the main function, which is executed when the chat starts. It instantiates a GraphCypherQAChain using a ChatOpenAI language model, the Neo4j graph connection, and the Cypher generation prompt, then stores the chain in the user session using cl.user_session.set.
6. Message Processing Function (on_message):
@cl.on_message
async def main(message: str):
    # Retrieve the chain stored at chat start
    llm_chain = cl.user_session.get("llm_chain")  # type: GraphCypherQAChain
    # Run the chain asynchronously on the user's question
    res = await cl.make_async(llm_chain)(
        message, callbacks=[cl.LangchainCallbackHandler()]
    )
    # Send the chain's answer back to the user
    await cl.Message(content=res['result']).send()
Defines the main function for processing messages. It retrieves the chain from the user session, calls the chain asynchronously with the user's message and a callback handler, and sends the result of the chain back to the user as a message.
So, we have successfully created a ChatBot for user-friendly interaction with Graph DB !!!
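To try it out, save the pieces above in a single file (for example, chatbot.py, a name chosen here for illustration), run chainlit run chatbot.py, and ask a natural language question about your graph in the browser window that opens.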
Note:
Ensure you replace placeholders like "YOUR-API-KEY-HERE," "YOUR-DB-URL-HERE," and "PASSWORD-HERE" with your actual API key, database URL, and password.
Thank you very much, and stay tuned for the upcoming articles on Graph Machine Learning !!!