Logging & monitoring LLM

In this guide, you will learn how to log different LLM metrics, such as metadata, latency, score, LLM response, and economics (prompt_tokens, completion_tokens, total_tokens), to Orquesta.

The reserved keywords are:

  1. Scoring

  2. Metadata

  3. Latency

  4. LLM response

  5. Economics

You need to have the Orquesta Python or NodeJS SDK installed:

# Python installation
pip install orquesta-sdk

# NodeJS installation
npm install @orquesta/node
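
Every snippet in this guide assumes you have an initialized client and a prompt object to call add_metrics (or addMetrics) on. A minimal setup sketch in Python is shown below; the class names, import paths, and the prompts.query call follow the orquesta-sdk package, but treat the exact signatures as assumptions and verify them against the SDK reference:

import time

# Import paths below follow orquesta-sdk's layout; treat them as assumptions
from orquesta_sdk import OrquestaClient, OrquestaClientOptions
from orquesta_sdk.prompts import OrquestaPromptMetrics

# Initialize the client with your workspace API key
options = OrquestaClientOptions(api_key="ORQUESTA_API_KEY")
client = OrquestaClient(options)

# Fetch the prompt variant you want to log metrics against
# ("my-prompt-key" is a placeholder)
prompt = client.prompts.query(key="my-prompt-key")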

  1. Scoring

Scoring serves as the compass that guides the LLM's responses toward coherence, relevance, and accuracy. Common ways to collect it include thumbs up/down, rating stars, and other end-user feedback. The score is of type int and represents feedback provided by the end user, ranging between 0 and 100. You can implement your own logic for what good looks like; see the sketch after the examples below.

Using Python SDK:

metrics = OrquestaPromptMetrics(
    score=85
)
prompt.add_metrics(metrics)

Using NodeJS SDK:

prompt.addMetrics({
  score: 85,
});
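
How you turn raw feedback into the 0-100 value is up to you. Here is a minimal sketch, assuming a thumbs up/down widget and a 1-5 star rating as the feedback sources (the mapping itself is illustrative):

# Illustrative mapping from common feedback widgets to a 0-100 score
def feedback_to_score(kind, value):
    if kind == "thumbs":  # value is True (up) or False (down)
        return 100 if value else 0
    if kind == "stars":  # value is a 1-5 star rating
        return int((value - 1) / 4 * 100)
    raise ValueError(f"unknown feedback kind: {kind}")

metrics = OrquestaPromptMetrics(score=feedback_to_score("stars", 4))
prompt.add_metrics(metrics)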

  2. Metadata

Metadata refers to additional information or context associated with a text input or output, typically describing the text, its source, or its purpose. It is of type dict and holds key-value pairs of custom fields to attach to the generated logs.

Using Python SDK:

metrics = OrquestaPromptMetrics(
    metadata={
        "custom": "custom_metadata",
        "domain": "ecommerce",
        "total_interactions": 200,
        "finish_reason": completion.choices[0].finish_reason,
    }
)
prompt.add_metrics(metrics)

Using NodeJS SDK:

prompt.addMetrics({
  metadata: {
    custom: 'custom_metadata',
    chain_id: 'ad1231xsdaABw',
    total_interactions: 200,
  },
});

  3. Latency

Latency refers to the delay between sending a request to the LLM and receiving the corresponding response. It is of type int and represents the total time, in milliseconds, of the request to the LLM provider API.

Note: Logging latency is only needed when using Orquesta Prompts. When using Orquesta Endpoints, this is all handled by us.

Using Python SDK:

First, capture the start and end times of the completion request. The difference between them, in milliseconds, is the latency. You must import the OrquestaPromptMetrics class into your project to log the latency metric to Orquesta.

# Start time of the completion request
start_time = time.time()

# ... your completion request to the LLM provider goes here ...

# End time of the completion request
end_time = time.time()

# Calculate the difference (latency) in whole milliseconds
latency = int((end_time - start_time) * 1000)

Log the metric:

metrics = OrquestaPromptMetrics(
    latency=latency
)
prompt.add_metrics(metrics)
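
time.time() works, but it can jump when the system clock is adjusted. For measuring a duration, a monotonic clock is safer; a small sketch using time.perf_counter():

import time

start = time.perf_counter()
# ... your completion request to the LLM provider goes here ...
latency = int((time.perf_counter() - start) * 1000)  # milliseconds

prompt.add_metrics(OrquestaPromptMetrics(latency=latency))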

Using NodeJS SDK:

// Log the latency metric.
prompt.addMetrics({
  latency: 4000,
});

  4. LLM response

LLM response refers to the generated text returned by your LLM provider for a given prompt or query. It is of type str: essentially what the model produces as a reply or completion based on the input it receives.

Note: Logging the LLM response is only needed when using Orquesta Prompts. When using Orquesta Endpoints, this is all handled by us.

Using Python SDK:

metrics = OrquestaPromptMetrics(
    llm_response=completion.choices[0].message.content
)
prompt.add_metrics(metrics)

Using NodeJS SDK:

prompt.addMetrics({
  llm_response: 'Orquesta is awesome!',
});
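
If you stream the completion, there is no single response field to log; collect the chunks and log the concatenated text once the stream ends. A sketch assuming OpenAI-style streaming chunks (delta objects with an optional content field):

# Assumes the completion was created with stream=True (OpenAI-style chunks)
chunks = []
for chunk in completion:
    delta = chunk.choices[0].delta.get("content")
    if delta:
        chunks.append(delta)

prompt.add_metrics(OrquestaPromptMetrics(llm_response="".join(chunks)))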

  5. Economics

Economics is logged through the economics field of OrquestaPromptMetrics; its shape follows the OrquestaPromptMetricsEconomics class and contains the prompt tokens, completion tokens, and total tokens.

Note: Logging economics is only needed when using Orquesta Prompts. When using Orquesta Endpoints, this is all handled by us. The tokens_per_second metric gets logged automatically by Orquesta.

  • prompt_tokens are the total tokens input into the model.

  • completion_tokens are the tokens output by the model.

  • total_tokens are the sum of the prompt tokens and the completion tokens.

Using Python SDK:

metrics = OrquestaPromptMetrics(
    economics={
        "prompt_tokens": completion.usage.get("prompt_tokens"),
        "completion_tokens": completion.usage.get("completion_tokens"),
        "total_tokens": completion.usage.get("total_tokens"),
    },
)
prompt.add_metrics(metrics=metrics)

Using NodeJS SDK:

prompt.addMetrics({
  economics: {
    prompt_tokens: 1200,
    completion_tokens: 750,
    total_tokens: 1950,
  },
});
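
Not every provider response includes all three fields. Because total_tokens is simply the sum of the other two, you can derive it when it is missing; a hedged sketch:

usage = completion.usage or {}
prompt_tokens = usage.get("prompt_tokens", 0)
completion_tokens = usage.get("completion_tokens", 0)

metrics = OrquestaPromptMetrics(
    economics={
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        # Fall back to the sum when the provider omits total_tokens
        "total_tokens": usage.get("total_tokens", prompt_tokens + completion_tokens),
    },
)
prompt.add_metrics(metrics)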

Full Code:

Using Python SDK:

metrics = OrquestaPromptMetrics(
    economics={
        "total_tokens": completion.usage.get("total_tokens"),
        "completion_tokens": completion.usage.get("completion_tokens"),
        "prompt_tokens": completion.usage.get("prompt_tokens"),
    },
    llm_response=completion.choices[0].message.content,
    latency=latency,
    score=85,
    metadata={
        "custom": "custom_metadata",
        "domain": "ecommerce",
        "total_interactions": 200,
        "finish_reason": completion.choices[0].finish_reason,
    },
)
prompt.add_metrics(metrics=metrics)

Using NodeJS SDK:

prompt.addMetrics({
  score: 85,
  latency: 4000,
  llm_response: 'Orquesta is awesome!',
  economics: {
    prompt_tokens: 1200,
    completion_tokens: 750,
    total_tokens: 1950,
  },
  metadata: {
    custom: 'custom_metadata',
    chain_id: 'ad1231xsdaABw',
    total_interactions: 200,
  },
});
