Alex Cheema
|
eb92da2c3e
cleaner chatgpt api impl with async callbacks
|
1 year ago |
Alex Cheema
|
71e00745cc
fix tokenizer inconsistencies
|
1 year ago |
Alex Cheema
|
ce46f00059
linux device capabilities
|
1 year ago |
Alex Cheema
|
dbbc7be57f
remove hard dependency on MLX fixes #8
|
1 year ago |
Alex Cheema
|
dd8d18128c
add an opaque inference_state that inference engines can use to pass around small state to other devices
|
1 year ago |
Alex Cheema
|
f2895cbcee
revive the chatgpt api endpoint on :8000
|
1 year ago |
Alex Cheema
|
05b9fa497d
initialize node id to uuid4 if not set
|
1 year ago |
Alex Cheema
|
32f2e36fd3
main rename
|
1 year ago |
Alex Cheema
|
5bbde22a23
move everything under exo module
|
1 year ago |
Alex Cheema
|
36b8456798
collect global topology with local peer visibility, ring memory weighted partitioning strategy
|
1 year ago |
Alex Cheema
|
563dcb56b0
mlx sharded implementation with example of distributed inference
|
1 year ago |
Alex Cheema
|
a21f59ff45
scaffolding for networking, inference and orchestration
|
1 year ago |