• planish@sh.itjust.works
    10 months ago

    Don’t they provide the source for the code to actually run the model? Otherwise how are people loading it up and running it? Are they shipping executables along with model weights?

    • h3ndrik@feddit.de
      10 months ago

      What they mean by that is probably the fact that you can download the model, run it on your own hardware and adapt it. That’s in contrast to OpenAI, which only offers a service and doesn’t give access to the model itself; you can only use ChatGPT through their servers.

      Most of the models come with a GitHub repo containing benchmarks and code to run them. But it’s more or less just boilerplate code to get the model running in one of the well-established machine learning frameworks, maybe with a few customizations and the exact setup needed for a new model architecture. Usually that framework is something like Hugging Face’s Transformers library, though there are a few other big projects people use as well. When researchers come up with new maths, concepts and architectures, those eventually get implemented there.
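      For a sense of what “getting it running in a well-established framework” usually looks like, here is a minimal sketch using the generic Auto classes from Hugging Face Transformers. It assumes `transformers` and PyTorch are installed and that the model is a causal language model whose architecture Transformers already supports; the function name `generate_text` is just made up for this example.

```python
def generate_text(model_id: str, prompt: str, max_new_tokens: int = 50) -> str:
    """Load a supported causal LM by its Hub ID and generate a continuation."""
    # Imported inside the function so this file loads even without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # The same two generic calls work for any architecture Transformers implements;
    # the downloaded files supply the config, tokenizer and weights, not the code.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Tokenize the prompt into PyTorch tensors, generate, then turn IDs back into text.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

      You’d call it as `generate_text("some-org/some-model", "Once upon a time")`, where the model ID is a hypothetical placeholder. The point is that none of this comes from the research repo; it’s the framework’s own, maintained code.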

      But the code released alongside new models is usually meant for scientific reproducibility, not necessarily for actual use. It might contain customizations that make it hard to incorporate into other things, it usually isn’t maintained after the release, and most of the time it’s based on old versions of libraries that were state of the art when the research started. So that’s usually not what people end up using.

      Interestingly enough, companies all use different phrasing. Mistral AI claims to be committed to being “open & transparent”, yet they like to drop torrent files for new models that come with zero explanation and no code. And OpenAI still carries the word “open” in its company name, but at this point openness is more a hint of an idea from their very early days.

      Anyways, inference code and the model aren’t the same thing. It’s as if we were talking about cake recipes and you handed me the schematics of a KitchenAid.
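      To make the distinction concrete, here is a deliberately tiny, hypothetical sketch (nothing like a real LLM in size or format): the “model” is just a file of numbers, and the inference code is a separate function that knows how to compute with numbers of that shape. The architecture name `tiny-relu-layer` and all values are made up for illustration.

```python
import json

# The "model": nothing but data. A made-up two-input linear layer plus ReLU,
# serialized to JSON the way real weights get serialized to some file format.
MODEL_FILE = json.dumps({
    "architecture": "tiny-relu-layer",
    "weights": [[0.5, -1.0], [2.0, 0.25]],
    "bias": [0.1, -0.2],
})


def load_model(text: str) -> dict:
    """Reading the model is just parsing data; no behavior lives in the file."""
    return json.loads(text)


def run_inference(params: dict, x: list[float]) -> list[float]:
    """Generic inference code: runs ANY weights that match this architecture."""
    if params["architecture"] != "tiny-relu-layer":
        raise ValueError(f"unsupported architecture: {params['architecture']}")
    outputs = []
    for row, b in zip(params["weights"], params["bias"]):
        z = sum(w * xi for w, xi in zip(row, x)) + b  # dot product plus bias
        outputs.append(max(0.0, z))                   # ReLU activation
    return outputs


params = load_model(MODEL_FILE)
print(run_inference(params, [1.0, 2.0]))
```

      Swapping in a different weights file of the same shape changes the results without touching `run_inference`, and sharing only `run_inference` tells you nothing about what any particular set of weights will do. That’s the recipe vs. kitchen appliance split.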