The easiest way to use a [[Large Language Model]] locally is with [[Ollama]], which manages model download, configuration and serving.

## Managing models

Using the `ollama` CLI, we can download any model with:

```shell
ollama pull llama3.2
```

Here `llama3.2` is the model identifier, which can be found through the [search on the official website](https://ollama.com/search).

We can start a model with:

```shell
ollama run llama3.2
```

which starts a chat session within the CLI. To delete a model, simply use:

```shell
ollama rm llama3.2
```

Each time a request arrives, `ollama` launches an instance of the model, which stays loaded in memory for the next five minutes. To see all active instances, run:

```shell
ollama ps
```

## Serving a model

Any running model can be queried via an [[HTTP|HTTP request]]; a basic example using [[cURL]] from the CLI would be:

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```

There are several more examples of how to perform requests to the [[Ollama]] server in [their official documentation](https://github.com/ollama/ollama/blob/main/docs/api.md#examples).
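The same server can also be called from code instead of the CLI. A minimal sketch in Python, assuming the `requests` library is installed, the server is running on its default port and `llama3.2` has already been pulled; it uses the `/api/chat` endpoint from the same official documentation:

```python
import requests

# Assumes a local Ollama server on its default port (11434)
# and that the llama3.2 model has already been pulled.
OLLAMA_URL = "http://localhost:11434/api/chat"

payload = {
    "model": "llama3.2",
    "messages": [
        {"role": "user", "content": "Why is the sky blue?"},
    ],
    "stream": False,  # return one JSON object instead of a stream of chunks
}

response = requests.post(OLLAMA_URL, json=payload, timeout=120)
response.raise_for_status()

# The non-streaming response carries the answer under message.content
print(response.json()["message"]["content"])
```

Setting `"stream"` to `true` instead makes the server return the answer as a sequence of JSON objects, one per generated chunk, which is useful for printing tokens as they arrive.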