"TypeError: Object of type Undefined is not JSON serializable" when tokenizing tool_call inputs

#104
by ztgeng - opened

Issue

I was playing around with the tutorial example of tool use on the Meta-Llama-3.1-8B-Instruct model and noticed something off. Here’s what I ran (mostly lifted from the tutorial page):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

def get_current_temperature(location: str, unit: str) -> float:
    """
    Get the current temperature at a location.
    
    Args:
        location: The location to get the temperature for, in the format "City, Country"
        unit: The unit to return the temperature in. (choices: ["celsius", "fahrenheit"])
    Returns:
        The current temperature at the specified location in the specified units, as a float.
    """
    return 22.  # A real function should probably actually get the temperature!

def get_current_wind_speed(location: str) -> float:
    """
    Get the current wind speed in km/h at a given location.
    
    Args:
        location: The location to get the wind speed for, in the format "City, Country"
    Returns:
        The current wind speed at the given location in km/h, as a float.
    """
    return 6.  # A real function should probably actually get the wind speed!

tools = [get_current_temperature, get_current_wind_speed]

messages = [
  {"role": "system", "content": "You are a bot that responds to weather queries. You should reply with the unit used in the queried location."},
  {"role": "user", "content": "Hey, what's the temperature in Paris right now?"}
]

inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):]))

The model's response at this step was:

<|python_tag|>{"name": "get_current_temperature", "parameters": {"location": "Paris, France", "unit": "celsius"}}<|eom_id|>

...which is a bit different from what's documented on the tutorial page:

<tool_call>
{"arguments": {"location": "Paris, France", "unit": "celsius"}, "name": "get_current_temperature"}
</tool_call><|im_end|>

The key difference: the tutorial's example output uses the key "arguments", while the actual response from the Llama model uses "parameters".

Continuing with the example, I hit a snag at this step:

import json

out_str = tokenizer.decode(out[0][len(inputs["input_ids"][0]):], skip_special_tokens=True)
tool_call = json.loads(out_str)
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})

messages.append({"role": "tool", "name": "get_current_temperature", "content": "22.0"})

inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):]))

This call to tokenizer.apply_chat_template() raises a TypeError: Object of type Undefined is not JSON serializable.

Cause

Diving a bit deeper, I discovered that the issue stems from how the tokenizer renders the Jinja2 chat template defined in tokenizer_config.json. Here's the juicy part:

{%- for message in messages %}
    ...
    {%- elif 'tool_calls' in message %}
        ...
        {%- set tool_call = message.tool_calls[0].function %}
        ...
        {%- else %}
            {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
            {{- '{"name": "' + tool_call.name + '", ' }}
            {{- '"parameters": ' }}
            {{- tool_call.arguments | tojson }}
            {{- "}" }}
        {%- endif %}

The crux of the problem: the template reads tool_call.arguments from the message containing tool_calls, but the Llama model emits "parameters" instead. If you append the model's output as-is, tool_call.arguments resolves to Jinja2's Undefined, and the tojson filter then fails with exactly the TypeError above.
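
To make the failure mode concrete, here's a minimal standalone sketch using plain jinja2 (the dictionary below is just a stand-in mirroring the shape of the model's output):

from jinja2 import Environment

# The template reads tool_call.arguments, but the dict only has "parameters",
# so the lookup yields Jinja2's Undefined, which tojson cannot serialize.
env = Environment()
tmpl = env.from_string("{{ tool_call.arguments | tojson }}")

tmpl.render(tool_call={
    "name": "get_current_temperature",
    "parameters": {"location": "Paris, France", "unit": "celsius"},
})
# TypeError: Object of type Undefined is not JSON serializable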

And the model’s use of "parameters" instead of "arguments" is actually prompted by the very same template:

{{- '<|start_header_id|>user<|end_header_id|>\n\n' -}}
{{- "Given the following functions, please respond with a JSON for a function call " }}
{{- "with its proper arguments that best answers the given prompt.\n\n" }}
{{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
{{- "Do not use variables.\n\n" }}

It explicitly instructs the model to respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.
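
If you want to see that instruction actually reaching the model, you can render the chat template as plain text instead of token IDs. This is just a quick inspection sketch; it reuses the tools list from the first snippet and deliberately passes only the original system/user messages:

# Render the prompt as a string (no tokenization) to inspect what the
# template injects, including the "parameters" formatting instruction.
# Only the original system/user messages are passed here, before any
# tool-call message is appended (otherwise rendering hits the same TypeError).
prompt_text = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": "You are a bot that responds to weather queries."},
        {"role": "user", "content": "Hey, what's the temperature in Paris right now?"},
    ],
    tools=tools, add_generation_prompt=True, tokenize=False,
)
print(prompt_text)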

Interestingly, the tutorial doesn't hit this error because it appends a hand-written mock tool call rather than a live model response, and that mock happens to use the expected key:

tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France", "unit": "celsius"}}
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})

Fix

Modify the chat template in tokenizer_config.json so the two keys agree: replace "parameters" with "arguments" (so the model is told to emit the key the template reads), or vice versa (make the template read "parameters").
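
If you'd rather not edit the file on disk, one rough way to try the second direction is to patch the template string in memory; this is just a sketch, assuming the template contains the literal tool_call.arguments expression quoted above:

# Patch the loaded chat template in memory instead of editing
# tokenizer_config.json, so the renderer reads the key the model
# actually produces. A rough local workaround, not an official fix.
tokenizer.chat_template = tokenizer.chat_template.replace(
    "tool_call.arguments", "tool_call.parameters"
)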

Also, the tokenizer_config.json files for Meta-Llama-3.1-70B-Instruct and Meta-Llama-3.1-405B-Instruct seem to share this glitch.

If you’re curious to see the fix in action, check out this notebook I whipped up.

Hope this helps someone out there!

Hey @Rocketknight1, I saw you were the last to tweak that file (tokenizer_config.json), and something quirky caught my eye. I took a stab at a fix; it might be off the mark, but I'd really value your input! 😅🛠️

Hi @ztgeng, thanks for raising this! There are a couple of issues here. Firstly, the example output is wrong (I think it was copied from a NousHermes model, my bad).

Secondly, the chat templates should standardize on the input format described here, which uses "arguments", even if the actual prompt that reaches the model uses "parameters". The template should handle converting from the universal format with "arguments" to the model-specific format.

I'll open some PRs today to rectify this and credit you. Thank you for pointing it out!

Update: I investigated a little further! The example code here uses a NousHermes model, and is correct. The problem is a little more subtle: Even though Llama-3.1 emits a dictionary containing parameters, you should always append the tool-call to the chat in the standard format from the tutorial document, using arguments. We don't actually have example code for the full tool calling process in the Llama-3.1 model cards, but maybe we should!

The reason for replacing parameters with arguments is that we want a single, universal format for chats. Unfortunately, this means that sometimes you have to translate the model's tool outputs into the universal format. We're working on "inverse templates" that can do that for you, but for now it's a manual step! However, you're definitely right that this is confusing, so I opened a PR to update the docs here.
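
Concretely, the manual translation step looks something like this (a sketch reusing the out_str from your earlier snippet; Llama-3.1 emits "parameters", while the universal chat format expects "arguments"):

import json

# Parse the model's raw tool call and rename the model-specific
# "parameters" key to the universal "arguments" key before appending it.
raw_call = json.loads(out_str)  # e.g. {"name": ..., "parameters": {...}}
tool_call = {"name": raw_call["name"], "arguments": raw_call["parameters"]}
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})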

Thank you @Rocketknight1 for the swift follow-up!

I totally see the value in having a universal format for consistency's sake. It definitely helps keep things streamlined across different models. On that note, I found something interesting while comparing the NousHermes and Meta-Llama models:

  1. The tokenizer_config.json for NousHermes sets up the chat template like this:
{{- '{\"name\": <function-name>, \"arguments\": <args-dict>}\n' }}\n

In contrast, Meta-Llama's tokenizer_config.json suggests:

{{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}

This difference in prompting seems to be a major factor in the discrepancy you pointed out. Aligning the two keys isn't a 100% reliable solution, since the model's output can be unpredictable, but I believe it will help mitigate the issue. (There's a quick snippet after this list for dumping both templates to compare them directly.)

  2. I actually saw the error vanish after manually swapping "parameters" with "arguments" in tokenizer_config.json on my local machine. Here's a link to my gist showing that the model now works fine with the tool_call output from the last step.
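
For anyone who wants to reproduce the comparison in point 1, here's a quick sketch that dumps both chat templates (the NousHermes checkpoint name is my guess at the one the tutorial uses; swap in whichever models you're comparing):

from transformers import AutoTokenizer

# Print both chat templates to see the "arguments" vs. "parameters"
# wording directly in each template's source.
for model_id in [
    "NousResearch/Hermes-2-Pro-Llama-3-8B",   # assumed NousHermes checkpoint
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
]:
    tok = AutoTokenizer.from_pretrained(model_id)
    print(f"----- {model_id} -----")
    print(tok.chat_template)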

(BTW, I'm glad you brought it up! Feel free to copy the code from my gist for the full tool-calling example if it helps.)
