The easiest way to use a [[Large Language Model]] locally is with [[Ollama]], which manages model downloads, settings and serving.
## Managing models
Using the `ollama` CLI, we can download any model using:
```shell
ollama pull llama3.2
```
Here `llama3.2` is the model identifier, which can be found using the [search on the official website](https://ollama.com/search). We can start a model using:
```shell
ollama run llama3.2
```
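This opens an interactive chat within the CLI. Assuming the installed CLI version accepts a trailing prompt argument (recent releases do), we can also run a single prompt non-interactively:
```shell
# One-shot prompt: print the answer and exit instead of opening a chat session
ollama run llama3.2 "Why is the sky blue?"
```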
To delete any model, simply use:
```shell
ollama rm llama3.2
```
Each time a request arrives, `ollama` loads the model into memory and, by default, keeps it there for five minutes after the last request. To see which models are currently loaded, run:
```shell
ollama ps
```
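To unload a model before the five-minute window expires, newer CLI versions ship a `stop` subcommand (assuming the installed version includes it; check `ollama --help`):
```shell
# Unload the model from memory immediately, without waiting for the keep-alive timeout
ollama stop llama3.2
```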
## Serving a model
Any running model can be queried via an [[HTTP|HTTP request]]; a basic example using [[cURL]] from the CLI would be:
```shell
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Why is the sky blue?",
"stream": false
}'
```
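For multi-turn conversations the server also exposes a chat-style endpoint; a minimal sketch against the `/api/chat` route described in the API documentation linked below:
```shell
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "Why is the sky blue?" }
  ],
  "stream": false
}'
```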
There are several more examples of how to perform requests to the [[Ollama]] server in [their official documentation](https://github.com/ollama/ollama/blob/main/docs/api.md#examples).