Logging & monitoring LLM
In this guide, you will learn how you can log different LLM metrics such as metadata, latency, score, LLM response, and economics (prompt_tokens, completion_tokens, total_tokens) to Orquesta.
The reserved keywords are:
Scoring
Metadata
Latency
LLM response
Economics
You need to have the Orquesta Python or NodeJS SDK installed:
# Python Installation
pip install orquesta-sdk
# NodeJS installation
npm
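Throughout this guide, prompt refers to a prompt object fetched through the Orquesta client, against which the metrics are logged. A minimal Python sketch of that setup follows; the import names, the prompts.query signature, and the placeholder key and context are assumptions based on the SDK's examples and may differ per SDK version:
# Sketch of fetching a prompt to log metrics against; names and signatures are
# assumptions based on the Orquesta Python SDK examples and may differ per version.
from orquesta_sdk import OrquestaClient, OrquestaClientOptions

options = OrquestaClientOptions(api_key="YOUR_ORQUESTA_API_KEY")
client = OrquestaClient(options)

# Query the prompt you want to log metrics for (key and context are placeholders)
prompt = client.prompts.query(
    key="customer_service_prompt",
    context={"environments": "production"},
)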
Scoring
Scoring serves as the compass that guides the LLM's responses toward coherence, relevance, and accuracy. Some of the methods used to collect it include thumbs up/down, rating stars, and end-user feedback. It is of type int and represents feedback provided by the end user, ranging between 0 and 100. You can implement your own logic for what good looks like; a small mapping sketch follows the SDK examples below.
Using Python SDK:
metrics = OrquestaPromptMetrics(
score=85
)
prompt.add_metrics(metrics)
Using NodeJS SDK:
prompt.addMetrics({
score: 85,
});
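How you translate end-user feedback into this 0-100 value is up to you. As an illustration, here is a minimal Python sketch that maps thumbs up/down or a 1-5 star rating onto a score before logging it; the mapping values are assumptions, not part of the SDK:
# Hypothetical mapping of end-user feedback to a 0-100 score (values are illustrative)
def feedback_to_score(thumbs_up=None, stars=None):
    if thumbs_up is not None:
        return 100 if thumbs_up else 0
    if stars is not None:
        # Map a 1-5 star rating linearly onto 0-100
        return int((stars - 1) / 4 * 100)
    return 50  # neutral default when no feedback was given

metrics = OrquestaPromptMetrics(
    score=feedback_to_score(stars=4)  # logs a score of 75
)
prompt.add_metrics(metrics)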
Metadata
Metadata refers to additional information or context that is associated with a text input or output. This information typically details the text, its source, or its purpose. It has a datatype of Dict and holds key-value pairs of custom fields to attach to the generated logs.
Using Python SDK:
metrics = OrquestaPromptMetrics(
metadata={
"custom": "custom_metadata",
"domain": "ecommerce",
"total_interactions": 200,
"finish_reason": completion.choices[0].finish_reason,
}
)
prompt.add_metrics(metrics)
Using NodeJS SDK:
prompt.addMetrics({
metadata: {
custom: 'custom_metadata',
chain_id: 'ad1231xsdaABw',
total_interactions: 200,
},
});
Latency
Latency refers to the time lag or delay between sending a request to the LLM and receiving the corresponding response. It is of type int and represents the total time in milliseconds of the request to the LLM provider API.
Note: Logging latency is only needed when using Orquesta Prompts. When using Orquesta Endpoints, this is all handled by us.
Using Python SDK:
First, you need to record the start and end time of the completion request. The difference between the start and end time is the latency, calculated in milliseconds. You must import the OrquestaPromptMetrics class into your project to log the latency metric to Orquesta.
import time

# Start time of the completion request
start_time = time.time()

# ... make the completion request to your LLM provider here ...

# End time of the completion request
end_time = time.time()

# Calculate the difference (latency) in milliseconds
latency = (end_time - start_time) * 1000
Log the metric:
metrics = OrquestaPromptMetrics(
latency=latency
)
prompt.add_metrics(metrics)
Using NodeJS SDK:
// Log the latency metric.
prompt.addMetrics({
latency: 4000,
});
LLM response
LLM response refers to the output or generated text returned by your LLM provider in response to a given prompt or query. It is of type str and is essentially what the model generates as a reply or completion based on the input it receives.
Note: Logging the LLM response is only needed when using Orquesta Prompts. When using Orquesta Endpoints, this is all handled by us.
Using Python SDK:
metrics = OrquestaPromptMetrics(
llm_response=completion.choices[0].message.content
)
prompt.add_metrics(metrics)
Using NodeJS SDK:
prompt.addMetrics({
llm_response: 'Orquesta is awesome!',
});
Economics
This is made available with the help of the OrquestaPromptMetricsEconomics class, and it contains information about the prompt tokens, completion tokens, and total tokens.
Note: Logging economics is only needed when using Orquesta Prompts. When using Orquesta Endpoints, this is all handled by us. The tokens_per_second metric gets logged automatically by Orquesta.
prompt_tokens are the total tokens input into the model.
completion_tokens are the tokens output by the model.
total_tokens are the sum of the prompt tokens and the completion tokens.
A sketch for deriving the total when a provider omits it follows the SDK examples below.
Using Python SDK:
metrics = OrquestaPromptMetricsEconomics(
economics={
"prompt_tokens": completion.usage.get("prompt_tokens"),
"completion_tokens": completion.usage.get("completion_tokens"),
"total_tokens": completion.usage.get("total_tokens"),
},
)
prompt.add_metrics(metrics=metrics)
Using NodeJS SDK:
prompt.addMetrics({
economics: {
prompt_tokens: 1200,
completion_tokens: 750,
total_tokens: 1950,
},
});
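Some providers return only the prompt and completion token counts. In that case the total can be derived before logging; here is a minimal Python sketch, assuming an OpenAI-style usage dictionary:
# Build the economics payload from an OpenAI-style usage dict (assumed shape)
def build_economics(usage):
    prompt_tokens = usage.get("prompt_tokens", 0)
    completion_tokens = usage.get("completion_tokens", 0)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        # total_tokens is the sum of prompt and completion tokens
        "total_tokens": usage.get("total_tokens", prompt_tokens + completion_tokens),
    }

# Pass the result as the economics value when logging metrics, as in the examples above.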
Full Code:
Using Python SDK:
metrics = OrquestaPromptMetrics(
economics={
"total_tokens": completion.usage.get("total_tokens"),
"completion_tokens": completion.usage.get("completion_tokens"),
"prompt_tokens": completion.usage.get("prompt_tokens"),
},
llm_response=completion.choices[0].message.content,
latency=latency,
score=85,
metadata={
"custom": "custom_metadata",
"domain": "ecommerce",
"total_interactions": 200,
"finish_reason": completion.choices[0].finish_reason,
},
)
prompt.add_metrics(metrics=metrics)
Using NodeJS SDK:
prompt.addMetrics({
score: 85,
latency: 4000,
llm_response: 'Orquesta is awesome!',
economics: {
prompt_tokens: 1200,
completion_tokens: 750,
total_tokens: 1950,
},
metadata: {
custom: 'custom_metadata',
chain_id: 'ad1231xsdaABw',
total_interactions: 200,
},
});