Foreword

Instead of trying out ChatGPT when it came out like a normal person, I immediately started looking for alternatives I could run at home on my server cluster. Lo and behold, just such a language model exists. GPT-J is a 6 billion parameter model and does a shockingly good job at writing stories, answering questions, and generally giving people existential crises regarding their job security.

Although I feel it’s slightly below its capabilities, I’ve been trying to use the model to power a simple chatbot. I figured it was a good starter project to learn both Python and GPTs, given I have little to no experience in either. I’ll talk briefly about the main problems I encountered, and the code I’ve produced so far. I warn you though, it would probably make any self-respecting software dev throw up.

The Main Hurdles

Because GPT-J is stateless, it doesn’t remember what you’ve told it previously. You have to feed the chat history back into it every time you want a new response so it can remember a) what you’re talking about and b) the formatting of the conversation. It also needs an initial example so it knows how to start. I managed to get around this by storing the example and subsequent responses in a SQL database, then recalling them and using them as the prompt for the next round of generation. This means if you ask it about something, you can later refer to it indirectly and it will infer what you’re talking about from the chat history. Insanely cool. The first time I saw this work, it was like code meth.
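
If that’s hard to picture, the gist is something like this (a stripped-down sketch with a made-up history, not the actual code further down):

# GPT-J has no memory, so the prompt IS the memory: rebuild it from the
# stored (speaker, text) rows every single turn.
history = [
    ("User: ", "Hey"),
    ("GPT-J: ", "Hi! I'm GPT-J, it's nice to meet you"),
    ("User: ", "What's my favourite colour?"),
    ("GPT-J: ", ""),  # empty response cues the model to answer as GPT-J
]
prompt = "\n".join(speaker + text for speaker, text in history)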

I’ve also had a nightmare with the responses it was giving. The problem is that it’s too good. It will see the conversation format, and then respond to its own response, having a full conversation with itself until it reaches its word limit. Very cool, but it messes up my database format – which is then fed back into the generation and causes it to subsequently start spewing garbage. In commercial APIs for the model you can specify a stop sequence, i.e. it will stop generating when it reaches a certain word (“User:”). But mine doesn’t have this, so the way I’ve got around it is to let it generate a full conversation, then trim the bits either side of the relevant response. Really irritating, as it takes around a minute to generate a response, and half that time is spent generating stuff that I don’t need. Although, I’ve recently managed to minimise this by upping the repetition penalty, which seems to help.
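
The trimming itself boils down to a couple of string partitions. Here’s a simplified sketch of the trick (my real code below cuts at the first newline rather than at “User:”, but it’s the same idea):

prompt = "User: Hey\nGPT-J:"
# The model echoes the prompt, answers, then happily keeps chatting to itself
generated = prompt + " Hi there!\nUser: Nice\nGPT-J: Yep"

new_text = generated.partition(prompt)[2]       # drop the echoed prompt
reply = new_text.partition("User:")[0].strip()  # DIY stop sequence
print(reply)  # -> Hi there!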

The last problem I had was that the model is extremely large. It takes 24GB of memory to load (in low memory mode!) – so every time I forgot a bracket in my code and had to re-run the script, it had to reload the whole model into memory, which takes about 5 minutes. Massively annoying, and it seriously slowed my progress.

I can’t believe it took me so long to do this, but I finally learnt how to make an API for the model. Now I can load the model once, keep it running in the background, and just send API requests with the relevant parameters from my local machine. This way I can just keep trying to get it working via trial and error, instead of trial, wait a long time, error… fuck! It massively sped up my rate of progress, and suddenly I was having a riveting conversation about how much we both love Willem Dafoe (?).

Point being, it accelerated my progress and allowed me to get the chatbot store/recall code working, as well as try different generation parameters to see what yielded the best results. The API is probably a really obvious thing to anyone who knows what they’re doing, but I was/am very proud.

GPT-J API Server Code:

This is the code for the API server, which runs in the background, does as it’s told, and never sees the light of day. It sends the generated text back in the API response. Heavily influenced by vicgalle and adapted to suit my slightly simpler needs. Universal to any generation task, not just the chatbot.

from fastapi import FastAPI
import uvicorn
import time
from typing import Optional
from transformers import GPTJForCausalLM, AutoTokenizer

print("Loading GPT-J-6B...")
start_time = time.time()
model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
end_time = time.time() - start_time
print("Model Loaded in", end_time)

app = FastAPI()

# Generation parameters arrive as query parameters on the POST request
@app.post("/generate")
async def generate(
    prompt: Optional[str] = "peepeepoopoo",
    imax_new_tokens: Optional[int] = 50,
    itemperature: Optional[float] = 1.0,
    itop_p: Optional[float] = 1.0,
    itop_k: Optional[int] = 50,
    irepetition_penalty: Optional[float] = 1.0,
):
    start = time.time()
    tokens = tokenizer(prompt, return_tensors="pt")
    input_ids = tokens["input_ids"]
    output = model.generate(
        input_ids,
        max_new_tokens=imax_new_tokens,
        temperature=itemperature,
        top_p=itop_p,
        top_k=itop_k,
        repetition_penalty=irepetition_penalty,
        attention_mask=tokens["attention_mask"],
        do_sample=True,
        use_cache=True,
    )

    text = tokenizer.decode(output[0])

    # Echo the generation parameters back alongside the generated text
    response = {
        "model": "GPT-J-6B",
        "compute_time": time.time() - start,
        "text": text,
        "prompt": prompt,
        "max_new_tokens": imax_new_tokens,
        "temperature": itemperature,
        "top_p": itop_p,
        "top_k": itop_k,
        "repetition_penalty": irepetition_penalty,
    }

    print(response)
    return response


print("GPT-J-6B serving!")
uvicorn.run(app, host="0.0.0.0", port=5000)
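
Once it’s up, you can hit it from any machine on the network with a quick smoke test like this (adjust the IP and port to wherever your server lives – this mirrors what the chatbot code below does):

import requests

r = requests.post(
    "http://10.11.11.112:5000/generate",
    params={"prompt": "Once upon a time", "imax_new_tokens": 20},
)
print(r.json()["text"])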

Chatbot Code

This code makes the actual chatbot work. Lots of jiggery-pokery to get the database formatting right. Comments are sparse, so good luck.

################################################### load all the shit

import time
import mysql.connector
import requests

################################################### database setup begin

print("Establishing connection with database...")
mydb = mysql.connector.connect(
 host="69.69.4.20",
 port="2844",
 user="security",
 password="through_obscurity",
 database="chatbot1"
)
mycursor = mydb.cursor()
print("Connection established!")

sql = "CREATE TABLE IF NOT EXISTS chatids (chatid INT NOT NULL)"
mycursor.execute(sql)
mydb.commit()

sql = "CREATE TABLE IF NOT EXISTS chathistory (chatid INT NOT NULL, user TEXT NULL, response TEXT NULL)"
mycursor.execute(sql)
mydb.commit()

mycursor.execute("SELECT COUNT(*) FROM chatids")
rawrowcount = mycursor.fetchone()
rowcountstr = str(rawrowcount).strip('[](),')
rowcountint = int(rowcountstr)
chatsessionid = rowcountint + 1
sql = "INSERT INTO chatids (chatid) VALUES (%s)"
val = (chatsessionid)
mycursor.execute(sql, (val,))
mydb.commit()
print("chat session ID: ", chatsessionid)

# insert the initial chat prompt (the few-shot conversation example)

sql = "INSERT INTO chathistory (chatid, user, response) VALUES (%s, %s, %s)"
val = [
    (chatsessionid, '', "This is a conversation between you (GPT-J) and User."),
    (chatsessionid, '', ''),
    (chatsessionid, "User: ", "Hey"),
    (chatsessionid, "GPT-J: ", "Hi! I'm GPT-J, it's nice to meet you"),
    (chatsessionid, "User: ", "It's nice to meet you too, GPT-J. How are you"),
    (chatsessionid, "GPT-J: ", "I'm ok, feeling cold today"),
    (chatsessionid, "User: ", "Yeah it's really cold, I heard it's going down to -5 degrees tonight"),
    (chatsessionid, "GPT-J: ", "That is cold! Anyway, what do you want to talk about"),
]
mycursor.executemany(sql,val)
mydb.commit()

########################################### database setup end

########################################### chatbot begin
print("GPT-J: What do you want to talk about?")
while True:
    userinput = input("User: ")

    sql = "INSERT INTO chathistory (chatid, user, response) VALUES (%s, %s, %s)"
    val = (chatsessionid, "User: ", userinput)
    mycursor.executemany(sql, (val,))
    mydb.commit()
    
    sql = "INSERT INTO chathistory (chatid, user, response) VALUES (%s, %s, %s)"
    val = (chatsessionid, "GPT-J:", "")
    mycursor.executemany(sql, (val,))
    mydb.commit()
    
    sql = "SELECT user, response FROM chathistory WHERE chatid = '%s'"
    val = (chatsessionid)
    mycursor.execute(sql, (val,))
    fetchallchat = mycursor.fetchall()
    chathistory = ''
    loopcount = 0
    for row in fetchallchat:
        loopcount += 1
        userfetch = str(row[0])
        responsefetch = str(row[1])
        line = userfetch + responsefetch
        if loopcount < mycursor.rowcount:
            chathistory += str(line + "\n")
        elif loopcount == mycursor.rowcount:
            chathistory += str(line)
            
    payload = {
        "prompt": chathistory,
        "imax_new_tokens": 40,
        "itemperature": 1.3,
        "itop_p": 0.7,
        "itop_k": 40,
        "irepetition_penalty": 1.5,
    }
    print("Sending payload to API... Generating response...")
    response = requests.post("http://10.11.11.112:5000/generate", params=payload).json()
    genout = response["text"]
    head1, sep1, tail1 = genout.partition(chathistory)
    head2, sep2, tail2 = tail1.partition("\n")
    genonlyclean = head2.strip()
    print("GPT-J: ", genonlyclean)
    
    sql = "DELETE FROM chathistory WHERE user = 'GPT-J: ' AND response = ''"
    mycursor.execute(sql)
    mydb.commit()
    sql = "INSERT INTO chathistory (chatid, user, response) VALUES (%s, %s, %s)"
    val = (chatsessionid, "GPT-J: ", genonlyclean)
    mycursor.executemany(sql, (val,))
    mydb.commit()

Zero Shot Update:

I found I got better results when I didn’t use the initial conversation example shown above. GPT-J is much better than you might think at understanding what you mean and inferring what you want it to do, and is therefore much better at zero shot (i.e. no examples) than previous models. It knows what a conversation is, so why tell it and then confuse it by changing the subject straight away? It’s tricky not to treat this thing like Alexa! By providing it with the following initial prompt, I found it gave much more cohesive answers.

This is a conversation between you (GPT-J) and User.

User: *user input*
GPT-J:

Results

I’m still refining the generation parameters to get the response to make sense, but this is a sample of what I’ve got so far. Note that there are unfinished sentences where it runs out of new tokens.

Generation parameters:
"imax_new_tokens": 40,
"itemperature": 0.7,
"itop_p": 1,
"itop_k": 30,
"irepetition_penalty": 1.3,

Output:


User: What do you think of Elon Musk?

GPT-J: I like him because he’s rich, has money to spend on his projects & he doesn’t take any shit from anybody when they say “No.” You know what that means right here! That would

User: Good point! You need to be ruthless to be someone like him. Did you hear that be bought twitter?

GPT-J: No way?! He was going bankrupt for sure but not anymore huh? It looks pretty good now though… But then again it could go downhill as well if the other shareholders don’t agree with whatever decisions

User: Yeah I think he’s being pretty rough on the employees! He’s living his best life I guess

GPT-J: The things people will put up with just so they can live their dream…. Well there are some rules in this world no doubt about it 😉 They should’ve made more room at Tesla headquarters too.. Oh

User: Makes you think… what are we chasing in life? Is it worth the struggle to achieve our dreams? You’re right though, he should just focus on Tesla and making electric cars viable.

GPT-J: Don’t get me wrong…I’m all for dreaming big, making your own luck or even using somebody else’s hard work, talent, experience etc; BUT once success hits, you have to keep

User: You’re so wise. You really don’t know what you’ve got until it’s gone!

GPT-J: There were always many who came before us saying similar stuff lol


Interesting note: Notice that it brought up Tesla a few lines after mentioning Elon Musk, proving that the chat history function is working, and that it’s incredibly clever.

More interesting note: You’ll also notice my last user input didn’t really give it a talking point, so looking at the raw un-trimmed output from the model, you see this:


User: You’re so wise. You really don’t know what you’ve got until it’s gone!
GPT-J: There were always many who came before us saying similar stuff lol
User: Yes, true enough 🙂 At least you said something positive unlike most people lol
GPT-J: Aye


Everything after that first user line was generated by the model! It carried on the conversation itself because my chat was too dead – which is what I was talking about above. Because I trim off those parts and store only the first relevant response, you can still pick up the conversation. Weirdly, I found that the longer the conversation goes on, the weirder the stuff it produces gets, and it seems to love using more and more emojis too?


User: Like who? The Buddah?
GPT-J: Who cares haha :p Just kidding obviously!! Let’s start talking seriously shall we? How did you end up liking him anyway?? 🙂
User: Well I guess he’s just a cool guy, I don’t know much about him though. What are his beliefs?
GPT-J: Ahh man where can one begin!? This seems kinda long winded tho’ I’ll try my best! Ok lets first talk about him as an individual okay? I mean why does anyone give a


As you can see it’s difficult to keep a conversation going sometimes without increasing the response length, so I gave up here, but you get the idea.

To solve the weird emojis and anomalies in generation, I think it would be better to only retrieve the last 10 or so rows of the conversation. This would also help with generation time, as it increases as the input prompt lengthens (and would allow the chat to continue indefinitely as the prompt limit is 2048 tokens).
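
Something like this should do it, assuming I add an AUTO_INCREMENT id column to chathistory so the rows actually have an order (they currently don’t – this is a sketch, not tested code):

# Hypothetical: needs an AUTO_INCREMENT `id` column on chathistory.
# Grab the 10 newest rows, then flip them back into chronological order.
sql = (
    "SELECT user, response FROM ("
    "    SELECT id, user, response FROM chathistory"
    "    WHERE chatid = %s ORDER BY id DESC LIMIT 10"
    ") AS recent ORDER BY id"
)
mycursor.execute(sql, (chatsessionid,))
chathistory = "\n".join(str(r[0]) + str(r[1]) for r in mycursor.fetchall())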

That’s it so far… I might try and make a Twitter bot with what I’ve learned making this, but I’d really like to try fine tuning the model for specific tasks (would help with tweet generation anyway). This is problematic with my anaemic CPUs, but I’ll give it a go!
