Alex Cheema
|
5112f53a37
trigger ci
|
1 year ago |
Alex Cheema
|
047ef48c1d
use separate hf cache dirs for chatgpt api integration test. it's an unusual setup where we're running 2 exo instances on the same device, which share a disk and hf cache
|
1 year ago |
Alex Cheema
|
2be446546f
refactor tinygrad, only load necessary layers for each shard (fixes #128), enable JIT (much faster), prefill all layers not just the first shard (fixes #12), use new ShardDownloader for more robust, parallel downloads
|
1 year ago |
Alex Cheema
|
357331c55f
remove some logs, make get_allow_patterns out of class
|
1 year ago |
Alex Cheema
|
b1eb05ed47
debug level 7 for tests
|
1 year ago |
Alex Cheema
|
09a9abc065
fix inference engine test
|
1 year ago |
Alex Cheema
|
67c269076b
Merge branch 'main' into refactor_model_download
|
1 year ago |
Alex Cheema
|
dd41026c5b
cache completed download paths
|
1 year ago |
Alex Cheema
|
706488732f
disable prefix matching on prompts. it causes subsequent requests to fail with a "cannot be broadcast" error. hotfix for #130
|
1 year ago |
Alex Cheema
|
35b7042e70
upgrade mlx to 0.16.1
|
1 year ago |
Alex Cheema
|
b181f8aa82
handle errors when writing responses
|
1 year ago |
Alex Cheema
|
7ec660bba6
fix shard download
|
1 year ago |
Alex Cheema
|
29f154597c
init active_downloads
|
1 year ago |
Alex Cheema
|
6bddb2a9dc
download edge cases
|
1 year ago |
Alex Cheema
|
f29963f41e
preemptively start downloads when any node starts processing a prompt. this fixes #104
|
1 year ago |
Alex Cheema
|
7a65a96e52
download progress styling
|
1 year ago |
Alex Cheema
|
c59ceab821
viz spacing
|
1 year ago |
Alex Cheema
|
0a588d0443
viz styles
|
1 year ago |
Alex Cheema
|
d9f232b313
cleaner download progress ui
|
1 year ago |
Alex Cheema
|
476a714bbb
make a separate ShardDownloader abstract class w HFShardDownloader. this opens up plugging in different methods of downloading model shards e.g. #79 / #16
|
1 year ago |
Alex Cheema
|
d22ed12e7b
bring tinygrad to parity with mlx on llama models, show progress of each download file
|
1 year ago |
Alex Cheema
|
45142dab26
tests
|
1 year ago |
Alex Cheema
|
545a486ed3
separate hf_helpers, make extra dir with download_hf script, unify downloading so tinygrad uses the same method as mlx and interoperable model formats
|
1 year ago |
Alex Cheema
|
9014efae86
minimal script to download from hf async with progress
|
1 year ago |
Alex Cheema
|
4b4dfb7fd0
Merge pull request #117 from inksong/main
|
1 year ago |
Alex Cheema
|
55bcad98e3
standardise tinygrad models/tokenizers so it can handle mlx hf
|
1 year ago |
jinke
|
6b1960baec
fix nvidia capabilities
|
1 year ago |
Alex Cheema
|
4a5c6cc580
t
|
1 year ago |
Alex Cheema
|
f93ae2b545
disable tinygrad test for now. need a larger runner or smaller model
|
1 year ago |
Alex Cheema
|
201996af8a
macos
|
1 year ago |