Author | Commit | Message | Date
Rory Clear | 3384fc7294 | update tinygrad version | 5 months ago
Nel Nibcord | 8b71d57da7 | Removed inference state entirely | 5 months ago
Nel Nibcord | 65fdc99ccc | Call no longer needs request_id | 5 months ago
Nel Nibcord | 90518a3bbe | Hoisted caching to a wrapper class | 5 months ago
Nel Nibcord | 8205a5aebc | Implemented per-request caching in tinygrad | 5 months ago
Nel Nibcord | 13572e6a40 | Some stability improvements for tinygrad inference | 5 months ago
Nel Nibcord | 527c7a6e49 | Applied new interface to tinygrad and dummy inference engines | 5 months ago
Ogden Wells | fbec1d2b10 | formatted changes | 5 months ago
Ogden Wells | af01b23a07 | added rope_scaling and tie_word_embeddings to llama transformer | 5 months ago
Alex Cheema | f53056dede | more compact operator formatting | 8 months ago
Alex Cheema | 14f2846a9c | yapf set blank_line_before_nested_class_or_def to false | 8 months ago
Alex Cheema | ea70c9fb76 | reformat with yapf format.py | 8 months ago
Alex Cheema | 803dffd1c4 | always call convert_from_huggingface with tinygrad models. this was broken by shard layer filtering which made the check sometimes fail. fixes #144 | 8 months ago
Alex Cheema | 2be446546f | refactor tinygrad, only load necessary layers for each shard fixes #128, enable JIT (much faster), prefill all layers not just the first shard fixes #12, use new ShardDownloader for more robust, parallel downloads | 8 months ago
Alex Cheema | 55bcad98e3 | standardise tinygrad models/tokenizers so it can handle mlx hf | 9 months ago
Alex Cheema | 4cb36a7f55 | increase max line length to 200 | 9 months ago
Alex Cheema | ce761038ac | formatting / linting | 9 months ago
Alex Cheema | 46d618abed | tiny fixes | 9 months ago
Alex Cheema | dd8d18128c | add an opaque inference_state that inference engines can use to pass around small state to other devices | 9 months ago
Alex Cheema | 5bbde22a23 | move everything under exo module | 9 months ago