Alex Cheema 10 months ago
parent
commit
ba7abb9896
1 changed file with 6 additions and 4 deletions
+ 6 - 4
README.md

@@ -53,9 +53,11 @@ Unlike other distributed inference frameworks, exo does not use a master-worker
 
 Exo supports different partitioning strategies to split up a model across devices. The default partitioning strategy is [ring memory weighted partitioning](exo/topology/ring_memory_weighted_partitioning_strategy.py). This runs inference in a ring where each device runs a number of model layers proportional to its memory.
 
-<picture>
-  <img alt="ring topology" src="docs/ring-topology.png" width="30%" height="30%">
-</picture>
+<p>
+    <picture>
+        <img alt="ring topology" src="docs/ring-topology.png" width="30%" height="30%">
+    </picture>
+</p>
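
To make the proportional split concrete, here is a minimal Python sketch of memory-weighted layer assignment. The `Device` type and `partition_layers` helper are hypothetical illustrations, not the actual code in `exo/topology/ring_memory_weighted_partitioning_strategy.py`:

```python
# Hypothetical sketch: assign each device a contiguous slice of layers
# proportional to its share of the ring's total memory.
from dataclasses import dataclass

@dataclass
class Device:
    node_id: str
    memory_bytes: int

def partition_layers(devices: list[Device], num_layers: int) -> dict[str, range]:
    total_memory = sum(d.memory_bytes for d in devices)
    assignments: dict[str, range] = {}
    start = 0
    for i, device in enumerate(devices):
        # The last device absorbs any rounding remainder so all layers are covered.
        end = num_layers if i == len(devices) - 1 else start + round(
            num_layers * device.memory_bytes / total_memory
        )
        assignments[device.node_id] = range(start, end)
        start = end
    return assignments

# A 16 GB and an 8 GB device split a 32-layer model roughly 2:1.
print(partition_layers([Device("mac-studio", 16 << 30), Device("mac-mini", 8 << 30)], 32))
# {'mac-studio': range(0, 21), 'mac-mini': range(21, 32)}
```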
 
 
 ## Installation
@@ -98,7 +100,7 @@ That's it! No configuration required - exo will automatically discover the other
 
 The native way to access models running on exo is to use the exo library with peer handles. See how in [this example for Llama 3](examples/llama3_distributed.py).
 
-exo also starts a ChatGPT-compatible API endpoint on http://localhost:8000. Note: this is currently only supported by tail nodes (i.e. nodes selected to be at the end of the ring topology). If you want to force a node to be the tail, set its node-id to be sorted last alphabetically on start e.g. `python3 main.py --node-id xxxnode-mac-mini" Example request:
+exo also starts a ChatGPT-compatible API endpoint on http://localhost:8000. Note: this is currently only supported by tail nodes (i.e. nodes selected to be at the end of the ring topology). Example request:
 
 ```
 curl http://localhost:8000/v1/chat/completions \
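  -H "Content-Type: application/json" \
  -d '{
     "model": "llama-3-8b",
     "messages": [{"role": "user", "content": "What is the meaning of exo?"}],
     "temperature": 0.7
   }'
```

(The request body above follows the standard OpenAI chat-completions schema; the `llama-3-8b` model name and prompt are illustrative completions of the truncated example.)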