As a FOSS enthusiast, I can configure most of my software in way too many ways. However, I noticed that this is not true for most compilers, which got me thinking: why isn’t that the case? In gcc (or your favorite compiler) I have a shitload of options about what counts as an error or a warning, how the code should be compiled, and tons of other things. But there are none for how the code should be interpreted or what the code should look like.

Why can’t I simply add a module to a build process to make it [object-oriented | use indentation instead of brackets | automatically allocate memory | automatically infer types | auto forward-declare | some other thing that differentiates one language from another]*? It’s so weird that I have a PDF reader with an option to set the window icon, a mail client that lets me specify a regex to catch a mentioned but forgotten attachment, and a game that lets me set my texture picmip resolution, but the tool (gcc) that builds all of these doesn’t even have a built-in config file. Instead, we have build tools wrapped around it just to supply arguments.

This could look like the following (oversimplified):

  1. preprocess
  2. compile
  3. assemble
  4. link

becomes:

  1. add brackets from indentation
  2. preprocess
  3. check that all object-oriented constraints are satisfied
  4. do something else
  5. compile
  6. assemble
  7. run the assembly through, for example, an AI for antivirus scanning
  8. link
  9. run tests
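The extended pipeline can be modeled as an ordered list of pluggable stages. The following is a minimal Python sketch, assuming every stage is a function from source text to source text; preprocess and check_no_goto are hypothetical toy plugins, not real gcc passes, and a real toolchain would pass richer artifacts (token streams, object files) between stages.

```python
from typing import Callable, List

# A stage takes the current artifact (plain source text in this sketch)
# and returns the transformed artifact.
Stage = Callable[[str], str]


def preprocess(src: str) -> str:
    """Hypothetical stand-in for the preprocessor: drop '#' directive lines."""
    return "\n".join(line for line in src.splitlines() if not line.startswith("#"))


def check_no_goto(src: str) -> str:
    """Hypothetical constraint-check plugin: reject code that uses goto."""
    if "goto" in src.split():
        raise ValueError("constraint violated: goto is not allowed")
    return src


def insert_after(stages: List[Stage], anchor: Stage, new_stage: Stage) -> List[Stage]:
    """Return a new stage list with new_stage placed right after anchor."""
    i = stages.index(anchor)
    return stages[: i + 1] + [new_stage] + stages[i + 1 :]


def run_pipeline(src: str, stages: List[Stage]) -> str:
    """Feed each stage's output into the next stage, in order."""
    for stage in stages:
        src = stage(src)
    return src


default_pipeline: List[Stage] = [preprocess]
extended_pipeline = insert_after(default_pipeline, preprocess, check_no_goto)
```

Adding a step is then just a change to the stage list rather than a change to the compiler itself.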

There could also be a fork in this process: for example, sending the source code to both a compiler and an interpreter to detect edge-case behavior while compiling. Or compiling with both automatic typing and your own declared types, so that when rounding errors get big you can instantly compare against a dynamically typed version of your program. Or the other way around: maybe you want different parts of your code to be handled by different preprocessors.
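The fork idea resembles differential testing: feed one input to two implementations and flag disagreements. A minimal Python sketch, assuming the "two typings" can be modeled as two numeric types; program and fork_and_compare are hypothetical names, and Python's float and fractions.Fraction stand in for a fixed-type build and an exact reference build.

```python
from fractions import Fraction


def program(values, num):
    """The 'program' under test: sum the inputs using the numeric type num."""
    total = num(0)
    for v in values:
        total += num(v)
    return total


def fork_and_compare(values, tolerance):
    """Run the same program under two backends and report whether they agree.

    The float backend stands in for 'your declared types'; the exact Fraction
    backend stands in for a reference version with no rounding at all.
    """
    fast = program(values, float)
    exact = program(values, Fraction)
    error = abs(Fraction(fast) - exact)
    return fast, exact, error <= tolerance
```

For example, summing "0.1" ten times gives 0.9999999999999999 with float but exactly 1 with Fraction, so a tolerance of 0 reports a mismatch while a tolerance of 1e-9 accepts it.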

The build process should be configured per project for input-related things like syntax, and per computer for output-related things like optimizations.
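One way to read the per-project/per-computer split is as two configuration layers with disjoint responsibilities. A minimal Python sketch under that assumption; every key name here (syntax_plugins, opt_level, and so on) is hypothetical.

```python
# Hypothetical key sets: which settings belong to the project (about the
# input language) and which belong to the machine (about the output).
PROJECT_KEYS = {"syntax_plugins", "preprocessors"}
MACHINE_KEYS = {"opt_level", "target_cpu"}


def merge_config(project: dict, machine: dict) -> dict:
    """Combine both layers, refusing settings that sit in the wrong layer."""
    misplaced = (set(project) - PROJECT_KEYS) | (set(machine) - MACHINE_KEYS)
    if misplaced:
        raise ValueError(f"settings in the wrong layer: {sorted(misplaced)}")
    return {**project, **machine}
```

Rejecting misplaced keys keeps a project from silently forcing optimization flags onto every machine that builds it.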

There are of course some drawbacks. One is trust: someone could pull in an obscure module to build malicious releases. It would probably also be harder to maintain stability when you have to keep in mind that your preprocessor isn’t the first one to run. And compilation can take a lot longer if it has to go through multiple pre-, post-, or even compilation phases.

If you know of such a build tool, or c (: haha :) some obvious reasons why this should not exist, please let me know. Thank you for reading this lengthy post.

Thanks for the comments; based on them, I think I can better explain what I want. I would like a language with a minimal specification, so that its preprocessor, compiler, assembler, and linker are a collection of plugins rather than one chunky program.

So the compiler reads, for example, the line int main(int argc, char *argv[]), and then all main-body plugins receive an event_newline. The function plugin reads this and creates a new object that contains the function main. It then emits an event_functionBody, which is caught by other plugin(s) that read the contents of main and return what it has to do.
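The event flow described above could look roughly like this publish/subscribe sketch in Python. The event names event_newline and event_functionBody come from the post; EventBus, FunctionPlugin, BodyPlugin, and the line-based signature regex are hypothetical, and the body collection is deliberately naive (it assumes braces on their own lines and no nested blocks).

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List

# Naive C-like signature: "<type> <name>(<params>)" alone on a line.
SIGNATURE = re.compile(r"^\s*\w+\s+(\w+)\s*\((.*)\)\s*$")


class EventBus:
    """Delivers each emitted event to every handler subscribed to it."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Callable]] = defaultdict(list)

    def on(self, event: str, handler: Callable) -> None:
        self.handlers[event].append(handler)

    def emit(self, event: str, payload) -> None:
        for handler in self.handlers[event]:
            handler(payload)


class FunctionPlugin:
    """Spots a signature, collects the body, then emits event_functionBody."""

    def __init__(self, bus: EventBus) -> None:
        self.bus = bus
        self.current = None
        self.body: List[str] = []
        bus.on("event_newline", self.handle_line)

    def handle_line(self, line: str) -> None:
        stripped = line.strip()
        if self.current is None:
            match = SIGNATURE.match(line)
            if match:
                self.current, self.body = match.group(1), []
        elif stripped == "}":
            self.bus.emit("event_functionBody", (self.current, self.body))
            self.current = None
        elif stripped != "{":
            self.body.append(stripped)


class BodyPlugin:
    """Receives finished function bodies; a real plugin would lower them."""

    def __init__(self, bus: EventBus) -> None:
        self.bodies: Dict[str, List[str]] = {}
        bus.on("event_functionBody", self.handle_body)

    def handle_body(self, payload) -> None:
        name, body = payload
        self.bodies[name] = body


def compile_source(src: str) -> Dict[str, List[str]]:
    """Feed the source line by line to every plugin subscribed on the bus."""
    bus = EventBus()
    FunctionPlugin(bus)  # kept alive by its handler registered on the bus
    bodies = BodyPlugin(bus)
    for line in src.splitlines():
        bus.emit("event_newline", line)
    return bodies.bodies
```

The compiler core here only knows how to split lines and emit events; what a "function" is lives entirely in the plugins.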

  • jeffhykin@lemm.ee
    1 year ago

    Does this compiler exist?

    TL;DR: 65% of what you want exists as the Rust compiler, which is probably as close as you’re going to get at the moment (edit: I was wrong, see the comment about Racket for a less practical but more flexible system). Take a look at macros like view! on this page. Rust doesn’t support HTML-like syntax, but it does within that view! because someone made a macro that supports it. Rustc doesn’t directly have a config file AFAIK, but it also doesn’t need any build tools (no make, cmake, autoconf, etc.) because everything can be done with Rust itself (its macro system is Turing complete with full file access).

    Full Response:

    I agree with the general idea, but I think there are lots of misconceptions. Gcc does allow doing things before the preprocessing step, after the preprocessing step, before the linking step, etc. It’s possible, but not easy, to run your own programs in between those steps. As for why there’s no config file, it’s probably because gcc is really old, but I’ll have to let someone else comment on that.

    However, syntax support is effectively a completely different feature request. For example, “adding brackets from indentation” couldn’t really (or correctly) come before the preprocessing step. I mean, a really hacky solution like my indent experiment from a long time ago can, but it will never be even slightly reliable because of the preprocessor, multi-line strings, comments, and other edge cases. Let me explain.

    • The syntax cannot be parsed without running the preprocessor. Things like unmatched brackets are completely allowed before the preprocessing step, so it would be literally impossible for the parser to run before preprocessing.
    • So let’s talk preprocessing. The preprocessor is so stupid it won’t even notice the difference between C, Haskell, or Ada. It’s just looking for strings, comments, ints, and preprocessor directives. That’s it. It has no idea about scopes or brackets or anything like that.
    • So for “adding brackets from indentation” to work, it would need to run its own preprocessor step, then do some parsing of its own, and only then run the indent-to-bracket conversion.
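To make the edge-case argument concrete, here is a deliberately naive, purely line-based indentation-to-braces pass in Python (a hypothetical sketch, not the commenter's indent experiment). It works on simple input but, because it knows nothing about the preprocessor, it also drops a brace into the continuation line of a multi-line #define.

```python
def naive_indent_to_braces(src: str) -> str:
    """Open a brace when indentation grows, close one when it shrinks."""
    out = []
    stack = [0]  # indentation widths of the currently open blocks
    for line in src.splitlines():
        if not line.strip():
            continue  # skip blank lines
        indent = len(line) - len(line.lstrip(" "))
        if indent > stack[-1]:
            out.append("{")
            stack.append(indent)
        while indent < stack[-1]:
            stack.pop()
            out.append("}")
        out.append(line.strip())
    while len(stack) > 1:  # close anything still open at end of file
        stack.pop()
        out.append("}")
    return "\n".join(out)
```

On "#define LOG(x) \" followed by an indented printf(x), the trailing backslash splices the inserted { into the macro and leaves printf(x) outside it, which is exactly the kind of breakage described above.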

    But note that preprocessor strings just coincidentally parse the same as C strings. The preprocessor already fails on, let’s say, Python, because Python has triple-quoted strings.
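A small Python sketch of C-style string-literal scanning shows the mismatch. c_string_literals is a hypothetical helper that follows only the basic C rules (double quotes, backslash escapes, no raw newline inside a literal) and ignores comments and character literals. On a one-line Python triple-quoted string it "works" by coincidence, producing an empty literal on each side; on a multi-line one it fails.

```python
def c_string_literals(src: str) -> list:
    """Extract double-quoted literals using (simplified) C lexing rules."""
    literals = []
    i = 0
    while i < len(src):
        if src[i] != '"':
            i += 1
            continue
        j = i + 1
        buf = []
        while True:
            # C forbids a raw newline inside a string literal.
            if j >= len(src) or src[j] == "\n":
                raise ValueError(f"unterminated string literal at offset {i}")
            if src[j] == "\\" and j + 1 < len(src):
                buf.append(src[j : j + 2])  # keep the escape sequence as-is
                j += 2
                continue
            if src[j] == '"':
                break
            buf.append(src[j])
            j += 1
        literals.append("".join(buf))
        i = j + 1  # continue after the closing quote
    return literals
```

So '"""doc"""' comes back as three literals, "", "doc", and "", while a docstring spanning two lines raises an error.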

    That said, preprocessing is actually highly unusual in that it can be done as a separate step. Usually parsing needs to be done as a unified operation. That’s not to say it can’t be modular, but rather that each module must be handed to a central controller that knows about everything, instead of just being a code-transformation step.
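The "central controller" idea can be sketched as a parser that owns the token stream and lets modules register for a trigger token; a module parses its own construct but hands nested expressions back to the controller. This is a hypothetical Python sketch (Controller, register, and parse_list are illustrative names), with a single list-literal module.

```python
import re

# Tokens: integers, identifiers, or any single non-space character.
TOKEN = re.compile(r"\s*(\d+|\w+|\S)")


class Controller:
    """Central parser: owns the token stream and dispatches to syntax modules."""

    def __init__(self, text: str) -> None:
        self.tokens = TOKEN.findall(text)
        self.pos = 0
        self.modules = {}  # trigger token -> module parse function

    def register(self, trigger: str, parse_fn) -> None:
        self.modules[trigger] = parse_fn

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_expr(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok in self.modules:
            return self.modules[tok](self)  # a module takes over
        self.advance()
        return int(tok) if tok.isdigit() else tok


def parse_list(ctl: Controller) -> list:
    """Module for [a, b, ...]; each element is parsed by the controller."""
    ctl.advance()  # consume "["
    items = []
    while ctl.peek() != "]":
        if ctl.peek() is None:
            raise SyntaxError("unterminated list")
        items.append(ctl.parse_expr())
        if ctl.peek() == ",":
            ctl.advance()
    ctl.advance()  # consume "]"
    return items
```

Because parse_list delegates back to parse_expr, a nested [2, 3] goes through the same dispatch, which a standalone text-to-text transformation step could not offer.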

    With those misconceptions out of the way, now I want to talk about the parts I agree with.

    IMO the perfect language is one that has an “engine” completely separate from the syntax, and the language/compiler then allows for different syntaxes. LLVM IR could be argued to be “an engine”, but man, is it a messy, edge-case-y engine with no unified front end.
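A toy version of the "engine separate from syntax" idea: two front ends with different surface syntax lower to the same tiny IR, and one evaluator (the engine) runs it. This is a hypothetical Python sketch; the tuple IR and both front ends are illustrative, and the infix front end handles only + and * without parentheses.

```python
import re

# Surface operators shared by both front ends, mapped to IR node kinds.
OPS = {"+": "add", "*": "mul"}


def evaluate(ir):
    """The shared engine: it understands only the IR, never any syntax."""
    kind = ir[0]
    if kind == "num":
        return ir[1]
    left, right = evaluate(ir[1]), evaluate(ir[2])
    return left + right if kind == "add" else left * right


def tokenize(text):
    return re.findall(r"\d+|[()+*]", text)


def sexpr_frontend(text):
    """Front end 1: Lisp-style prefix syntax, e.g. (+ 1 (* 2 3))."""
    tokens = tokenize(text)

    def parse(i):
        # Returns (ir_node, index of the next unread token).
        if tokens[i] == "(":
            op = OPS[tokens[i + 1]]
            left, i = parse(i + 2)
            right, i = parse(i)
            return (op, left, right), i + 1  # i + 1 skips the ")"
        return ("num", int(tokens[i])), i + 1

    ir, _ = parse(0)
    return ir


def infix_frontend(text):
    """Front end 2: C-style infix syntax where * binds tighter than +."""
    tokens = tokenize(text)
    pos = 0

    def number():
        nonlocal pos
        node = ("num", int(tokens[pos]))
        pos += 1
        return node

    def term():  # a product of numbers
        nonlocal pos
        node = number()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            node = (OPS["*"], node, number())
        return node

    def expr():  # a sum of terms
        nonlocal pos
        node = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = (OPS["+"], node, term())
        return node

    return expr()
```

Both "(+ 1 (* 2 3))" and "1 + 2 * 3" lower to the same IR, so the engine never needs to know which syntax the programmer chose.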

    The closest current thing to what you’re talking about is almost certainly Rust macros. Unlike the preprocessor, Rust macros fully understand Rust and are part of the parsing process. They are decently close to what you’re describing: instead of compiler flags, it’s just imports within Rust. You can write HTML, SQL, and other code right in the middle of a Rust program (and it’s not a string either, it’s actual syntax support). Not only is it possible, but I have been eagerly waiting for someone to create a garbage-collected syntax within a Rust macro. People have already created garbage collectors; it’s just a matter of making a nice wrapper and interop.

    That said, even though Rust macros are head and shoulders above what basically every other language offers, I personally still think they don’t go far enough. Indent-based code isn’t really possible within Rust macros, Rust macros can’t have unbalanced braces, and there can be escaping issues that prevent things like YAML syntax from ever being possible. They also can’t allow for extensions like units, e.g. 10gallons, without wrapping them in some kind of delimiter (which defeats the point).

    AFAIK there is currently no compiler that supports a composable syntax like that. I’ve worked on designing such a system, and while I don’t think it’s impossible, it is extremely hard. There are a lot of complications, like parsing precedence, lookahead, and operator precedence. Two syntax modules that don’t know about each other can easily break each other. Like I said, I don’t think it’s impossible, but it is difficult.
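Here is a small illustration of two syntax modules breaking each other. Both modules and their regexes are hypothetical: one adds unit suffixes like 10gallons, the other adds exponent literals like 10e3. Each is fine on its own, but once composed, 10e3 is claimed by both, and the controller has to reject it or impose an order.

```python
import re

# Two hypothetical syntax modules, each written without knowing the other.
MODULES = {
    "units": re.compile(r"(\d+)([A-Za-z]\w*)"),  # 10gallons -> 10 of "gallons"
    "exponent": re.compile(r"(\d+)[eE](\d+)"),   # 10e3 -> 10 * 10**3
}


def claimants(literal: str) -> list:
    """Names of every module whose pattern matches the whole literal."""
    return sorted(name for name, pat in MODULES.items() if pat.fullmatch(literal))


def parse_literal(literal: str):
    """Parse a literal only if exactly one module claims it."""
    owners = claimants(literal)
    if len(owners) != 1:
        raise SyntaxError(f"{literal!r} is claimed by {owners or 'no module'}")
    match = MODULES[owners[0]].fullmatch(literal)
    if owners[0] == "units":
        return (int(match.group(1)), match.group(2))
    return int(match.group(1)) * 10 ** int(match.group(2))
```

Rejecting ambiguity is only one policy; letting the first-registered module win would silently change what 10e3 means depending on import order.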

    • porgamrer@programming.dev
      1 year ago

      I mentioned it elsewhere here, but I think the Terra research language has explored this area more thoroughly than Rust, just because that’s its only purpose. The website and academic papers are definitely worth a skim: https://terralang.org/

      It’s basically a powerful LLVM-based compilation library where everything is exposed through Lua bindings. The default Terra compiler is just a Lua script that you can pull apart, extend, rearrange, etc. It’s all designed for ease of experimentation, whereas Rust has to worry about being a rock-solid production compiler.

      Honourable mention to C# source generators too. They are janky as hell but very effective.

    • bizdelnick
      1 year ago

      There’s nothing new in Rust that was not already possible with C++. It is possible to change language syntax using macros and templates… if you want to write code that nobody will understand.

      • deur@feddit.nl
        1 year ago

        Yeaaah, except that rust-analyzer can, honest to god, inspect macro codegen.

        And the fact that macros are made to retain “span” information…

        And that macros aren’t a huge hack…

      • Miaou@jlai.lu
        1 year ago

        I like how you addressed the problem with this approach in C++ but somehow still clicked the “post” button