Alex Cheema
|
d22ed12e7b
bring tinygrad to parity with mlx on llama models, show progress of each download file
|
9 months ago |
Alex Cheema
|
545a486ed3
separate hf_helpers, make extra dir with download_hf script, unify downloading so tinygrad uses the same method as mlx and interoperable model formats
|
9 months ago |
Alex Cheema
|
0bfb8e3b6d
sticky node ids #16
|
9 months ago |
Alex Cheema
|
d6a7e46324
async model downloading with download progress. fixes #102. related: #16 #104
|
9 months ago |
Alex Cheema
|
57b2f2a4e2
fix ruff lint errors
|
9 months ago |
Alex Cheema
|
9a373c2bb0
make configurable discovery timeout
|
9 months ago |
Alex Cheema
|
63a05d5b4f
make configurable discovery timeout
|
9 months ago |
Alex Cheema
|
174cff071e
Merge pull request #58 from jakobdylanc/main
|
9 months ago |
Alex Cheema
|
b0e7dd9d2d
add max-generate-tokens flag fixes #54
|
9 months ago |
JakobDylanC
|
f2f61ccee6
inference engine selection improvements
|
9 months ago |
Alex Cheema
|
4e46232364
add simple prometheus metrics collection, with a prometheus / grafana instance for live dashboard. related: #22
|
9 months ago |
Alex Cheema
|
2e419ba211
Merge pull request #48 from itsknk/intel-mac
|
9 months ago |
itsknk
|
e934664168
implement dynamic inference engine selection
|
9 months ago |
Alec Potluri
|
db583a863f
disable tui flag
|
9 months ago |
Alex Cheema
|
e49924e1b9
add chatgpt-api-response-timeout-secs flag, set this to 20 mins in test
|
9 months ago |
Alex Cheema
|
a342e1abd8
add web url and chatgpt api endpoint to panel (fixes #43), fix a rounding error in the partition to shard mapping implementation
|
9 months ago |
Alex Cheema
|
d9484906a3
remove the spammy logs
|
9 months ago |
Alex Cheema
|
4b592f9d45
exo topology visualisation that shows the topology of the network, device capabilities and the currently active node using opaque statuses. fixes #36. ready for #33
|
9 months ago |
Alex Cheema
|
35177690bd
by default find an ephemeral node port fixes #35, more robust topology updates. both fix #15 and #14
|
9 months ago |
Alex Cheema
|
945f90f676
allow overriding inference_engine and separate flag for TINYGRAD_DEBUG
|
9 months ago |
Alex Cheema
|
72fe293729
exo text on start and stop
|
9 months ago |
Alex Cheema
|
1e1e11cdc6
check if inference_engine has tokenizer before printing with it
|
9 months ago |
Alex Cheema
|
eb92da2c3e
cleaner chatgpt api impl with async callbacks
|
9 months ago |
Alex Cheema
|
71e00745cc
fix tokenizer inconsistencies
|
9 months ago |
Alex Cheema
|
ce46f00059
linux device capabilities
|
9 months ago |
Alex Cheema
|
dbbc7be57f
remove hard dependency on MLX fixes #8
|
9 months ago |
Alex Cheema
|
dd8d18128c
add an opaque inference_state that inference engines can use to pass around small state to other devices
|
9 months ago |
Alex Cheema
|
f2895cbcee
revive the chatgpt api endpoint on :8000
|
9 months ago |
Alex Cheema
|
05b9fa497d
initialize node id to uuid4 if not set
|
9 months ago |
Alex Cheema
|
32f2e36fd3
main rename
|
9 months ago |