Alex Cheema | f051ebe6e0 | remove accidentally added files | 1 year ago
Mark Kockerbeck | 5eafd5a305 | try/except for decode, #75 | 1 year ago
Mark Kockerbeck | d2fa7b247e | Showing the message only if successfully decoded, #75 | 1 year ago
Mark Kockerbeck | f1cd5ae7a6 | Merge branch 'main' of github.com:xeb/exo | 1 year ago
Mark Kockerbeck | 4f5ab78d9d | Addressing issue #75 to avoid decoding binary packets | 1 year ago
Alex Cheema | 5a23376059 | add log_request middleware if DEBUG>=2 to chatgpt api to debug api issues, default always to llama-3.1-8b | 1 year ago
Alex Cheema | 2084784470 | per-request kv cache, remove all explicit reset functionality as it wasnt used. fixes #67 | 1 year ago
Alex Cheema | dd8c5d63a9 | add support for mistral nemo and mistral large | 1 year ago
Alex Cheema | 03fe7a058c | more robust message parsing fixes #81 | 1 year ago
Alex Cheema | 942012577a | styling for tinychat model selector | 1 year ago
Alex Cheema | 5ac6b6a717 | clearer documentation on accessing web UI and chatgpt-api | 1 year ago
Alex Cheema | 9a373c2bb0 | make configurable discovery timeout | 1 year ago
Alex Cheema | 63a05d5b4f | make configurable discovery timeout | 1 year ago
Alex Cheema | 8d2bb819bf | add llama-3.1 notice to README | 1 year ago
Alex Cheema | 7a2fbf22b9 | add model selection to tinychat | 1 year ago
Alex Cheema | bbfd5adc20 | add support for llama3.1 (8b, 70b, 405b). bump mlx up to 0.16.0 and mlx-lm up to 0.16.1. fixes #66 | 1 year ago
Alex Cheema | 5496cd85f5 | Revert "smart model downloading for mlx #16" | 1 year ago
Alex Cheema | 3a230f3b44 | smart model downloading for mlx #16 | 1 year ago
Alex Cheema | 174cff071e | Merge pull request #58 from jakobdylanc/main | 1 year ago
Alex Cheema | b0e7dd9d2d | add max-generate-tokens flag fixes #54 | 1 year ago
JakobDylanC | f2f61ccee6 | inference engine selection improvements | 1 year ago
Alex Cheema | 4e46232364 | add simple prometheus metrics collection, with a prometheus / grafana instance for live dashboard. related: #22 | 1 year ago
Alex Cheema | 2e419ba211 | Merge pull request #48 from itsknk/intel-mac | 1 year ago
itsknk | e934664168 | implement dynamic inference engine selection | 1 year ago
Alex Cheema | 1fcbe18baa | fix m2 ultra flops | 1 year ago
Alex Cheema | 9d9d257eb2 | reduce chatgpt api response timeout in test | 1 year ago
Alex Cheema | 8850187b8a | tell the mofo in the workflow to keep responses concise | 1 year ago
Alex Cheema | 052ee1c7e9 | cache isolation per workflow job | 1 year ago
Alex Cheema | ce41e653c0 | check cached files in workflow | 1 year ago
Alex Cheema | 3d82338c21 | debug cached files in workflow | 1 year ago