How to pass prior conversation over LLaMa 2 7B chat API? How to increase output length? | C2C Community
Question

How to pass prior conversation over LLaMa 2 7B chat API? How to increase output length?

  • 19 October 2023
  • 2 replies
  • 175 views

Userlevel 1
Badge

Hello. I have deployed and been successfully hitting an endpoint for the LLaMa 2 7B chat model on Vertex AI. However, I am having a couple of issues. I sent this body in a request:

 

{

  "instances": [

    { "prompt": "this is the prompt"}

  ],

"parameters": {

    "temperature": 0.2,

    "maxOutputTokens": 256,

    "topK": 40,

    "topP": 0.95

  }

}

 

And received this response: 

 

{

    "predictions": [

        "Prompt:\nthis is the prompt\nOutput:\n for class today:\n\nPlease write a 1-2 page reflection on"

    ],

    "deployedModelId": "8051409189878104064",

    "model": "projects/563127813488/locations/us-central1/models/llama2-7b-chat-base",

    "modelDisplayName": "llama2-7b-chat-base",

    "modelVersionId": "1"

}

 

 Why is this response cutting off mid-sentence? I have adjusted the maxOutputTokens parameter, but no matter what I set it to, the response cuts off in roughly the same place. How can I fix this?

 

I would also like to pass prior conversation to the LLaMa model. I can do this to chat-bison with a body like this:

 

{

    "instances": [

        {

            "context": "",

            "examples": [],

            "messages": [

                {

                    "author": "user",

                    "content": "hello my name is tim"

                },

                {

                    "author": "bot",

                    "content": " Hello Tim, how can I help you today?

",

                    "citationMetadata": {

                        "citations": []

                    }

                },

                {

                    "author": "user",

                    "content": "what is my name"

                }

            ]

        }

    ],

    "parameters": {

        "candidateCount": 1,

        "maxOutputTokens": 1024,

        "temperature": 0.2,

        "topP": 0.8,

        "topK": 40

    }

}

 

The model will "remember" that my name is Tim. What is the syntax for doing the equivalent with LLaMa? Right now I am constrained to a singular "prompt" field like this:

 

{

  "instances": [

    { "prompt": "this is the prompt"}

  ],

"parameters": {

    "temperature": 0.2,

    "maxOutputTokens": 256,

    "topK": 40,

    "topP": 0.95

  }

}

 

How can I additionally pass prior queries and responses, or even a system prompt? Thank you in advance for your help!


2 replies

Userlevel 7
Badge +35

Hello @wbalkan,

Thank  you for your question. Did you try to fix the issue of the response cutting off mid-sentence, you can try the following:

  • Increase the maxOutputTokens parameter. The default value is 256, but you can try increasing it to 512, 1024, or even higher.
  • Set the stop_sequences parameter to a list of words or phrases that you want the model to stop generating text at. For example, you could set it to [".", "?", "!", ";"].
  • Set the temperature parameter to a higher value. This will make the model more likely to generate longer responses.
  • Set the topP parameter to a lower value. This will make the model more likely to generate more creative and diverse responses.
  • Experiment with different values for the temperature, topK, and topP parameters to find a combination that works well for your needs.

Also, you can check the link below to relevent answer that might be help you for your additional questions.

 

Userlevel 1
Badge

Hi @malamin - Thanks for your response! Unfortunately, the issue of the model cutting off mid-sentence is persisting even with these suggestions. Could there be something else limiting the output?

 

Also, in the link you sent, when adding the system prompt like below:

{
"instances": [
{
"prompt": "hello world",
"system": "This is a system prompt."
}
]
}

I get an Internal Server Error. Is there another way to pass the system prompt? What about a way to pass conversation history as a parameter? I have not found anywhere that has this information specific to the LLaMa models. Thanks so much for your help!

Reply