Alex Cheema
|
ca6095c04d
a generic test for every inference engine
|
1 year ago |
Alex Cheema
|
850b72d3ea
make StatefulShardedModel callable, add some tests for mlx sharded inference
|
1 year ago |
Alex Cheema
|
6ee0547eff
fix layer calculation for sharded llama
|
1 year ago |
Alex Cheema
|
445eda156c
dynamically assign shards to nodes deterministically weighted by memory
|
1 year ago |
Alex Cheema
|
36b8456798
collect global topology with local peer visibility, ring memory weighted partitioning strategy
|
1 year ago |
Alex Cheema
|
3a66a0a4a8
add requirements.txt
|
1 year ago |
Alex Cheema
|
ee96c6b023
add another test for device capabilities on MacBook Air
|
1 year ago |
Alex Cheema
|
6c8c9ee7b1
topology with partitioning strategy
|
1 year ago |
Alex Cheema
|
563dcb56b0
mlx sharded implementation with example of distributed inference
|
1 year ago |
Alex Cheema
|
a21f59ff45
scaffolding for networking, inference and orchestration
|
1 year ago |