clj-llm: a Simple Clojure Library for LLMs
Quick Iteration with LLMs
When coding for fun, I enjoy using Clojure. More than that, I enjoy writing Clojure in an environment (CIDER) that allows immediate, interactive iteration. Being able to write code and evaluate it against a live image as I build up my understanding of a problem is an enjoyable way to get into a flow state.
I also enjoy working with LLMs, and exploring the many nuances and ins-and-outs of working with smaller or newer models.
I wanted an ergonomic way to combine these two things.
At work I use Python, and simonw's llm library is a very nice way to start hacking on and poking around with a model or prompt. For Clojure I could not find a similar library that fit my thinking and felt as fun to use, so I started hacking on clj-llm.
Core aspects
- A core interface wrapping many providers, allowing for swapping out providers and models easily.
- Compatibility with both babashka and the jvm, to allow for both scripting and app development.
- Malli as the basis for defining structured outputs and tool call signatures. JSON Schemas are just way too clunky to work with interactively.
- Providers should be values: extendable, inspectable, and easy to pass around. They form the basis of the API.
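To give a sense of why Malli feels lighter at the REPL, here is a small shape as a Malli schema, with the equivalent JSON Schema shown as a comment (the JSON Schema is illustrative, not emitted by the library):

```clojure
;; Malli: terse, data-driven, easy to build and tweak interactively
(def person-schema
  [:map
   [:name :string]
   [:age :int]])

;; The equivalent JSON Schema is considerably noisier:
;; {"type": "object",
;;  "properties": {"name": {"type": "string"},
;;                 "age":  {"type": "integer"}},
;;  "required": ["name", "age"]}
```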
Providers
Providers hold configuration for backends like API keys, base URLs, and default options. Let's create one:
(require '[co.poyo.clj-llm.core :as llm]
'[co.poyo.clj-llm.backend.openai :as openai])
(def openai-provider (openai/backend))
This returns an implementation of the LLMProvider protocol with:

- :api-base set to "https://api.openai.com/v1"
- :api-key-fn set to retrieve OPENAI_API_KEY from the environment
You can customize the provider by passing options:
(def ollama (openai/backend {:api-base "http://localhost:11434/v1"
:api-key false}))
:api-key can be:

- a string, setting the API key directly
- a function that retrieves the API key
- false, for backends that require no Authorization header, such as a local Ollama server
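For example, a provider whose key is fetched lazily at call time. This is a sketch assuming :api-key accepts a no-arg function, as listed above; the `pass` invocation and the OpenRouter endpoint are illustrative:

```clojure
(require '[clojure.java.shell :as sh]
         '[clojure.string :as str])

(def openrouter
  (openai/backend
   {:api-base "https://openrouter.ai/api/v1"
    ;; fetch the key from the `pass` password manager when needed (illustrative)
    :api-key #(str/trim (:out (sh/sh "pass" "show" "openrouter/api-key")))}))
```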
Blocking Generation
Once you have a provider, the generate function is the primary way to interact with LLMs. It takes a provider, an optional map of options, and the user input.
Example call
;; No opts provided
(llm/generate openai-provider "What is the capital of France?")

(llm/generate
 openai-provider        ;; provider
 {:model "gpt-4o-mini"} ;; call opts (optional)
 "What is the capital of France?") ;; user input
Example Result
{:text "The capital of France is Paris.",
:timings {:duration-ms 1879, :text {:start-ms 1877, :duration-ms 2}},
:usage {:prompt-tokens 13,
:completion-tokens 16,
:total-tokens 29,
:prompt-tokens-details {:cached-tokens 0, :audio-tokens 0},
:completion-tokens-details
{:reasoning-tokens 0,
:audio-tokens 0,
:accepted-prediction-tokens 0,
:rejected-prediction-tokens 0},
:finish-reason "stop",
:model "gpt-4o-mini"}}
Apart from the :text key containing the result, we have :timings for time to first token and generation duration, as well as :usage data.
Options for calls
The second argument to generate is a map of call options. These apply to this specific call:
(llm/generate openai-provider
{:model "gpt-4o-mini"
:temperature 0.9
:max-tokens 50}
"Name three colors")
Common options:
- :model: which model to use. Not set by default; a call will error unless set.
- :temperature: randomness of token selection (0 = deterministic, 2 = very creative)
- :max-tokens: maximum tokens to generate
- :system-prompt: instructions that persist across calls, setting behavior
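For instance, :system-prompt can steer tone without touching the user input. A sketch using the options above:

```clojure
(llm/generate openai-provider
              {:model "gpt-4o-mini"
               :max-tokens 100
               :system-prompt "You are a terse assistant. Answer in one sentence."}
              "Why is the sky blue?")
```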
For less common or model-dependent options, anything under the :provider-opts key is passed through directly to the API as JSON. For example, to set the reasoning effort on gpt-5 family models:
(def gpt-5-mini
(assoc openai-provider
:defaults {:model "gpt-5-mini"
:provider-opts {:reasoning-effort "low"}}))
Provider Defaults
Passing options on every call is tedious. Providers can carry :defaults that apply to every generate call:
(def gpt-4o-mini
(assoc openai-provider
:defaults {:model "gpt-4o-mini"
:temperature 0.7}))
(llm/generate gpt-4o-mini "Hello!") ; uses defaults
You can always override them per-call:
(llm/generate gpt-4o-mini
{:temperature 0.2} ; overrides just temperature
"Be very precise")
Using opts for comparison
Already we can see how driving LLM calls with data makes it easy to compare model behavior.
Comparison of behavior and timings across models
(for [model [gpt-4o-mini gpt-5-mini]]
(-> (llm/generate model "What is a good slogan for the Asakusa district in Tokyo?")
((juxt :text :timings))))
Results
(["\"Experience Tradition, Embrace Culture: Explore Asakusa!\""
{:duration-ms 881,
:text {:start-ms 698,
:duration-ms 183}}]
["Here are several slogan options for Asakusa, each with a different tone. Pick one you like or tell me the mood and I’ll refine it.\n\nTraditional / historic\n- \"Asakusa: Tokyo’s Heart of Tradition\"\n- \"Walk the Old Tokyo — Discover Asakusa\"\n\nCultural / spiritual\n- \"Where Blessings and Stories Meet\"\n- \"Asakusa: Where Tradition Prays and Streets Tell\"\n\nVibrant / lively\n- \"Asakusa: Timeless Energy in Every Street\"\n- \"Taste the Past. Feel the Festival.\"\n\nCharming / tourist-friendly\n- \"Step Into Old Tokyo\"\n- \"Asakusa: Snapshots of Old Tokyo\"\n\nShort & punchy\n- \"Asakusa: Tradition Alive\"\n- \"Asakusa — Old Soul, Bright Streets\"\n\nIf you want Japanese versions, a shorter set for signage, or variants for tourism campaigns (families, nightlife, food), I can refine them."
{:duration-ms 4534,
:text {:start-ms 2222,
:duration-ms 2312}}])
We can clearly see the difference in timing and output between the two models!
Conversations
Instead of just text, input may be a vector of maps representing a conversation:
(def conversation
[{:role :user :content "What is a couplet?"}
{:role :assistant :content "A couplet is a pair of successive lines of poetry that typically rhyme and have the same meter"}
{:role :user :content "Give me an example about cats"}])
(-> (llm/generate gpt-4o-mini conversation) :text)
Since generate results auto-unwrap to :text when passed back in as input, you can thread calls without extracting it yourself:
(->> "Write a haiku"
(llm/generate provider)
(llm/generate provider {:system-prompt "Translate to Japanese"}))
A simple chat REPL

This can be used to build a very simple chat REPL:
(def history (atom [{:role :system
:content "You are a helpful assistant. Be concise."}]))
(defn chat! [msg]
(swap! history conj {:role :user :content msg})
(let [result (llm/generate openai-provider @history)]
(swap! history conj {:role :assistant :content (:text result)})
(:text result)))
(loop []
(print "> ")
(flush)
(when-let [input (read-line)]
(when-not (empty? input)
(println (chat! input)))
(recur)))
Structured Output with Malli
You can use Malli schemas to get precise data shapes back:
(llm/generate openai-provider
{:schema [:map
[:name :string]
[:age :int]
[:occupation :string]]}
"Marie Curie was a 66 year old physicist")
The result includes both the raw JSON text and the structured data as a Clojure map, validated against the Malli schema.
{:text "{\"name\":\"Marie Curie\",\"age\":66,\"occupation\":\"physicist\"}"
:structured {:name "Marie Curie", :age 66, :occupation "physicist"}
:usage {...}}
Put the schema in defaults to create reusable extractors:
(def person-extractor
(update gpt-5-mini
:defaults merge
{:schema [:map [:name :string] [:age :int]]
:system-prompt "Extract the person's name and age from the text."}))
(:structured (llm/generate person-extractor "John is 30 years old"))
{:name "John", :age 30}
Tools (Function Calling)
Tools are functions instrumented with Malli schemas:
(require '[cheshire.core :as json])

(defn geocode
{:malli/schema [:=>
[:cat [:map [:city :string]]]
[:map [:name :string] [:country :string]
[:latitude :double] [:longitude :double]]]}
[{:keys [city]}]
(let [geo (-> (slurp (str "https://geocoding-api.open-meteo.com/v1/search?name="
(java.net.URLEncoder/encode city "UTF-8") "&count=1"))
(json/parse-string true))
loc (first (:results geo))]
{:name (:name loc) :country (:country loc)
:latitude (:latitude loc) :longitude (:longitude loc)}))
Since these are just regular functions with extra metadata, they can be easily tested before use with the LLM.
(geocode {:city "Tokyo"})
;; => {:name "Tokyo", :country "Japan", :latitude 35.6895, :longitude 139.69171}
Use them with generate by passing the vars in a vector under :tools:
(llm/generate openai-provider
{:tools [#'geocode]}
"What are the coordinates of Tokyo?")
The result includes two additional keys:

- :tool-calls: the arguments passed to each tool
- :tool-results: the results of calling the functions associated with those tool calls
{:tool-calls [{:id "call_abc"
:name "geocode"
:arguments {:city "Tokyo"}}]
:tool-results [{:name "Tokyo" :country "Japan"
:latitude 35.6895 :longitude 139.6917}]
:text "Tokyo is ...."
:usage {...}
:timings {...}
}
The :text and :usage keys are still present in the response.
Content (Images & PDFs)
Input can include images and PDFs using the content namespace:
(require '[co.poyo.clj-llm.content :as content])
;; Single image
(llm/generate gpt-4o-mini [(content/text "Describe this image")
(content/image "photo.jpg")])
;; Image with resize (controls cost/size)
(llm/generate gpt-4o-mini ["What's in this?"
(content/image "chart.png" {:max-edge 512})])
;; PDF
(llm/generate gpt-4o-mini ["Summarize"
(content/pdf "invoice.pdf")])
content/image accepts:

- File path (base64-encoded)
- URL string (passed by reference, or downloaded if resizing)
- Raw bytes + media-type
Resize options: :max-edge, :max-width, :max-height, :format.
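For example, passing an image by URL. This is a sketch assuming, per the list above, that content/image accepts a URL string the same way it accepts a file path; the URL is illustrative:

```clojure
;; A URL is passed by reference unless a resize option forces a download
(llm/generate gpt-4o-mini
              [(content/text "What landmark is this?")
               (content/image "https://example.com/landmark.jpg"
                              {:max-edge 768})])
```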
Streaming
You can receive chunks as they're generated using :on-text and :on-reasoning callbacks:
(llm/generate gpt-5-mini
{:on-text (fn [chunk] (print chunk) (flush))
:on-reasoning (fn [chunk] (print "[reasoning]") (flush))}
"Write a haiku about Clojure")
These fire as tokens arrive. generate still blocks until complete, returning the full response when done.
There are also callbacks for tools:
- :on-tool-call — fires when the LLM requests a tool call
- :on-tool-result — fires after the tool function returns
(llm/generate gpt-5-mini
{:tools [#'geocode]
:on-tool-call (fn [call] (println "Calling" (:name call)))
:on-tool-result (fn [result] (println "Got result" result))}
"What's the weather in Tokyo?")
The events function gives lower-level access via a core.async channel:
(require '[clojure.core.async :as a])
(let [ch (llm/events gpt-5-mini "Count to 5")]
  (loop []
    (when-let [event (a/<!! ch)]
      (println event)
      (recur))))
Close the channel to cancel the request and clean up HTTP resources.
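For example, to consume only the first few events and then cancel (a/close! is standard core.async; per the note above, closing the channel cancels the request):

```clojure
(let [ch (llm/events gpt-5-mini "Write a long story")]
  ;; take the first three events, then cancel the in-flight request
  (dotimes [_ 3]
    (println (a/<!! ch)))
  (a/close! ch))
```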
run-agent
run-agent wraps generate in a loop, repeatedly calling the LLM until no more tools are requested. It's like generate, but it automatically handles tool execution and conversation history.
(llm/run-agent provider [#'search #'geocode] "Find a nice cafe in Tokyo and get its coordinates")
Returns the final response along with the full conversation history and a trace of each step:
{:text "The cafe is at..."
:history [{:role :user :content ...}
{:role :assistant :tool-calls ...}
{:role :tool :content ...}
{:role :assistant :content ...}]
:steps [{:step 0
:tool-calls [...]
:tool-results [...]}
{:step 1
...}]
:usage {...}}
Key options:
- :max-steps: max iterations (default 10)
- :stop-when: a function called after each step; return truthy to stop early
- :on-text, :on-tool-calls, :on-tool-result: the same callbacks as generate
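A sketch of early stopping. The placement of the options map between the provider and the tools vector is an assumption based on generate's signature, and the shape of the step map passed to :stop-when is assumed to match the :steps entries shown above; consult the actual signatures:

```clojure
(llm/run-agent openai-provider
               {:max-steps 5
                ;; stop as soon as any tool result contains coordinates
                :stop-when (fn [{:keys [tool-results]}]
                             (some :latitude tool-results))}
               [#'geocode]
               "Find the coordinates of Tokyo")
```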