Alex Cheema
|
581856897a
clean up unused, formatting
|
8 months ago |
Alex Cheema
|
85279007b3
hotfix edge case where we try to render before tokenizer is set
|
8 months ago |
Alex Cheema
|
ea70c9fb76
reformat with yapf format.py
|
8 months ago |
Alex Cheema
|
647ffb94eb
increase cli generation timeout
|
8 months ago |
Alex Cheema
|
dd24e7db1e
only ignore CancelledError inside stop
|
8 months ago |
Alex Cheema
|
2e1233357c
ignore CancelledError when stopping the server
|
8 months ago |
Alex Cheema
|
ae35ada19b
fix headless mode with --disable-tui
|
8 months ago |
Alex Cheema
|
b95916e0b5
show prompts and outputs in tui
|
8 months ago |
Alex Cheema
|
e84304317c
add a cli that can be triggered with --run-model <model> --prompt <prompt>
|
8 months ago |
Alex Cheema
|
7ddb80e245
f-string expression part cannot include a backslash fixes #142
|
8 months ago |
Alex Cheema
|
6c1bf127b3
add --max-parallel-downloads flag that limits the number of downloads at a time with asyncio.semaphore
|
8 months ago |
Alex Cheema
|
e6902b2fcf
add --download-quick-check flag to bypass the hf api calls / remote file checks
|
8 months ago |
Alex Cheema
|
71591d2ebc
display all interfaces web chat and chatgpt api are available on fixes #134
|
8 months ago |
Alex Cheema
|
6bddb2a9dc
download edge cases
|
9 months ago |
Alex Cheema
|
f29963f41e
preemptively start downloads when any node starts processing a prompt. this fixes #104
|
9 months ago |
Alex Cheema
|
476a714bbb
make a separate ShardDownloader abstract class w HFShardDownloader. this opens up plugging in different methods of downloading model shards e.g. #79 / #16
|
9 months ago |
Alex Cheema
|
d22ed12e7b
bring tinygrad to parity with mlx on llama models, show progress of each download file
|
9 months ago |
Alex Cheema
|
545a486ed3
separate hf_helpers, make extra dir with download_hf script, unify downloading so tinygrad uses the same method as mlx and interoperable model formats
|
9 months ago |
Alex Cheema
|
0bfb8e3b6d
sticky node ids #16
|
9 months ago |
Alex Cheema
|
d6a7e46324
async model downloading with download progress. fixes #102. related: #16 #104
|
9 months ago |
Alex Cheema
|
57b2f2a4e2
fix ruff lint errors
|
9 months ago |
Alex Cheema
|
9a373c2bb0
make configurable discovery timeout
|
9 months ago |
Alex Cheema
|
63a05d5b4f
make configurable discovery timeout
|
9 months ago |
Alex Cheema
|
174cff071e
Merge pull request #58 from jakobdylanc/main
|
9 months ago |
Alex Cheema
|
b0e7dd9d2d
add max-generate-tokens flag fixes #54
|
9 months ago |
JakobDylanC
|
f2f61ccee6
inference engine selection improvements
|
9 months ago |
Alex Cheema
|
4e46232364
add simple prometheus metrics collection, with a prometheus / grafana instance for live dashboard. related: #22
|
9 months ago |
Alex Cheema
|
2e419ba211
Merge pull request #48 from itsknk/intel-mac
|
9 months ago |
itsknk
|
e934664168
implement dynamic inference engine selection
|
9 months ago |
Alec Potluri
|
db583a863f
disable tui flag
|
9 months ago |