Derw status: December 2021 - parsing, packages, and testing
This month we've made great steps towards 1.0.0, improving everything from parsing to code generation to packages.
Welcome to this month’s roundup of all changes made to Derw. I’ll try to post these on a regular basis, at some point towards the end of each month. A tl;dr version will also be on Twitter, and you can put the same story together from my GitHub commits. There’ll be at least a changelog and a soup of my thoughts in the thought cauldron.
Changelog
Advent of Code
To give myself a good target and mini-goal, I started writing some Advent of Code solutions in Derw, fixing blockers so that the code can actually be generated and run correctly. These solutions can be found in the examples/advent_of_code directory. While they’re not how I would typically solve the problems, having real working examples in Derw itself does ensure that the compiler keeps working.
Introducing a lexer
I originally planned to implement Derw in two steps: first, write a compiler in TypeScript. Second, rewrite the compiler in Derw itself. I intended for the first to be messy, and the second to be a standard of code that would be a shining example of Derw done correctly. However, I think it may take longer to get to the second phase than I planned. There’s stuff required to get there, like a standard library and side effects represented in types. So, since the TypeScript compiler will be around for longer than expected, I rewrote it to have a proper lexing step. Most of the parser has been shifted over to use that instead.
Debug mode
I needed a way to debug the parsed code blocks, so I added `--debug` as a flag. Debug mode will pretty-print the module’s AST, so that you can double check whether the contents were parsed correctly. I figured out that I would sometimes only be interested in the AST of a specific block, so I added `--only`, which will filter the output to only show the named block.
I don’t think this will be super helpful to most people, as it requires familiarity with the AST, but it should help with bug reports and my own personal debugging.
Type aliases and object literals
Previously Derw only supported union types, where the constructors and destructors were simply unparsed plain text. This did not scale well. When I added type aliases, I also added object literals so that values of a type alias could actually be created. Then, during my lexer rewrite, I switched constructors and destructors over to using object literals instead.
Let blocks
Since I didn’t have proper brackets support when I implemented the Advent of Code solutions, I added support for let blocks in functions. They work as you’d expect from Elm or Haskell, though there is the strict requirement of needing a type. Consts currently don’t support let, but maybe they should. That said, all values in a const’s let block could live at the top level anyway, since there is no variable for them to depend on.
List ranges
One of the typical tools within FP code is to generate a list of values based on some numbers. In Derw, both fixed numbers (e.g. `[ 1..5 ]`) and variables (e.g. `[ low..high ]`) are supported. In the generated TS/JS, these translate to `Array.from`.
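As a rough sketch of what that translation might look like, a range such as `[ low..high ]` could end up as an `Array.from` call along these lines. The helper name `range` and the inclusive upper bound are assumptions made for illustration, not necessarily what the Derw generator actually emits.

```typescript
// Hypothetical helper: builds the list low..high (assumed inclusive on both ends).
function range(low: number, high: number): number[] {
    // Clamp the length so that a reversed range (high < low) is explicitly empty.
    const length = Math.max(0, high - low + 1);
    return Array.from({ length }, (_, index) => low + index);
}

// Fixed numbers, as in [ 1..5 ]
const fixed = range(1, 5);

// Variables, as in [ low..high ]
const low = 3;
const high = 6;
const fromVariables = range(low, high);
```

Using `Array.from` with a `length` object avoids writing an explicit loop in the generated output.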
Run Derw code directly
Previously, to run Derw, you’d need to compile Derw and then separately run ts-node/node. Now you can use the `--run` flag, and after generating the TS or JS output, it’ll run the entry point files you gave.
Generate Derw from Derw source
While looking for a better way to verify that the parser had parsed things correctly, I decided to add a test that would generate Derw from a parsed file, then compare the original file to the generated code. The benefit of generating Derw is that we can have a formatter built into the compiler. Simply use `--target derw` or `--format` and the compiler will generate Derw if the file was correctly parsed.
Type checking and namespace checking
Type checking has been implemented in two parts: first, type inference of the body of consts/functions. Second, checking the stated types against those inferred. It’s not completely implemented, and there’s no type checking between TypeScript and Derw. Not everything is properly inferred, so it’s still possible to write incorrect code, but I’ve got a decent foundation.
Names of types, consts and functions now have collision checks, to make sure there aren’t two things with the same name in the current namespace. Let blocks aren’t checked yet, however.
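To make the namespace check concrete, here is a minimal sketch of how duplicate names could be detected. The `Block` shape and the `findCollisions` function are hypothetical and far simpler than the real Derw AST; they only illustrate the idea.

```typescript
// Hypothetical, simplified block shape; the real Derw AST is richer than this.
type Block = {
    kind: "UnionType" | "TypeAlias" | "Const" | "Function";
    name: string;
};

// Returns every name that is defined more than once in the given namespace.
function findCollisions(blocks: Block[]): string[] {
    const seen = new Set<string>();
    const collisions = new Set<string>();

    for (const block of blocks) {
        if (seen.has(block.name)) {
            collisions.add(block.name);
        }
        seen.add(block.name);
    }

    return Array.from(collisions);
}

const exampleBlocks: Block[] = [
    { kind: "TypeAlias", name: "Person" },
    { kind: "Function", name: "add" },
    { kind: "Const", name: "add" },
];
const exampleCollisions = findCollisions(exampleBlocks);
```

Types, consts and functions share one set here, which matches the idea of a single namespace per module.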
Published on npm
You can run Derw via npx now: `npx @eeue56/derw`. It’s a few features behind master, so if you want the latest and greatest, grabbing the repo is probably the best way forward. My plan is to publish the latest version at least once a month, on the first Saturday of the month.
Imports
When writing Derw code, there are a few different things you might want to import. For example:
Global TypeScript modules (e.g. those in node_modules)
Relative TypeScript modules (e.g. those in your src folder)
Global Derw modules (e.g. those installed via a package manager)
Relative Derw modules (e.g. those in your src folder)
As a result I ended up with two different kinds of imports, global and relative. Global imports are done with an identifier, for example `import fs`. Relative imports are done with a string, e.g. `import "./other"`. I’m not completely sold on this syntax yet, but it seems like a good starting point.
Additionally, imports may expose names of variables or types. They might also be aliased to other names. For example, `import "./other" as something exposing (isTrue)`. It might be a requirement that relative imports use aliases, to simplify namespace collisions. In Elm, you might have a global module “List”, and a relative module “List”. These are not allowed to coexist, as there’s no syntax for saying global vs relative. In Derw, you’re able to import both, provided one is aliased.
Relative imports first check whether a Derw file exists and, if not, whether a TypeScript or JavaScript file exists.
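That lookup order could be sketched roughly as follows. The function name, the injected `exists` callback, and the choice to try `.ts` before `.js` are all assumptions for illustration; only "Derw first, then TypeScript or JavaScript" comes from the behaviour described above.

```typescript
// Hypothetical resolver: `exists` is injected so the sketch does not depend on a real file system.
function resolveRelativeImport(
    importPath: string,
    exists: (path: string) => boolean
): string | null {
    // Derw is tried first; the order of .ts before .js is an assumption.
    const extensions = [ ".derw", ".ts", ".js" ];

    for (const extension of extensions) {
        const candidate = importPath + extension;
        if (exists(candidate)) {
            return candidate;
        }
    }

    // Nothing matched, so the import cannot be resolved.
    return null;
}
```

For example, `resolveRelativeImport("./other", (path) => path === "./other.ts")` would return `"./other.ts"`, since no Derw file was found first.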
Comments
Both single line and multiline comments are now supported. Like Elm, you use `--` for single line comments and `{- -}` for multiline. Multiline comments are only allowed at the top level, and single line comments are only allowed on their own lines. I’d like to build in support for doctests, so we’ll handle multiline comments a little differently later on.
There is also the consideration of whether or not comments should be stripped from output. Right now they are, but there’s a valid case for generated code to remain as close to the original Derw as possible. For example, if you want to generate some TypeScript from Derw, and then work off of the generated code rather than the original Derw, it might be handy to retain comments. I’m not sure if that’s a valid case though.
Standard library
Before I figure out how to deal with packages, I need a package to start from. The stdlib will be that, closely mirroring Elm’s, though with fewer types and functions. Some basic types like Lists and Maybes are represented here. After I figured out how I’d like packages to look, I moved it out to its own repo at derw-lang/stdlib.
Packages and testing
While thinking about packages, one thing has become clear to me: I want it to be as simple as possible. There should be only one obvious way to test a package, there should be a standardized directory name for putting your files, and the JSON file should be minimal and consistent. The `--init` flag will now initialize a directory with a config file. Modules can be exposed by using the “exposing” key. Dependencies can be named with particular versions. That’s it, so far. Dependencies aren’t yet implemented, but the config key is there. Packages can be compiled by just running the derw binary in your package directory.
Testing is possible via the Test module, found in the stdlib. It is just a light wrapper around the assert module, but with arguments flipped to suit left-pipes. Once compiled, you can run the tests via bach. A nice thing about moving over to a packages way of thinking is that I’ve now added a way of testing an entire package at once, with `--test`. This will first compile the package, then run any tests via bach.
Bug fixes and small things
Piping into lambdas now works
Add CI to the repo and stdlib
Add benchmarking for parsers and generators
Support for equalities, boolean operators
Changes to code generation output
A quiet flag
Parser now handles functions as arguments to functions
Allow function calls as predicates in case..of
Thought cauldron
Generating Elm
It occurred to me that I could support Elm generation from Derw. I’m not sure of the exact use case, but in theory that would allow you to write Derw, and have it work with 3 different codebases: JavaScript, TypeScript and Elm. There are some blockers like mismatching type names (e.g. boolean is actually Bool), globals not existing (e.g. console.log must become Debug.log), and handling of strings (e.g. format strings not existing).
Mismatching type names is pretty straightforward to solve with a lookup table. Globals not existing are likewise easy to solve with lookups, but the type of Debug.log is actually String → a → a, whereas console.log is a → void. Elm has no void, and if you try to substitute a general type (i.e. a lowercase type variable), you’ll get an error saying it’s too general. Therefore any function that needs to return void will actually need to look up the type of the final body. So for `Debug.log "" "hello"`, we’d look up the type of "hello" and change the const or function’s type to be string. Format strings have a similar problem. There is no general “toString” function in Elm any more, meaning that any time you wish to combine a value with a string, you must use the specific toString function required, for example String.fromInt.
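The lookup-table part is the easy bit, and might look something like this sketch. Only the boolean to Bool and console.log to Debug.log pairs come from the examples above; the other entries and all of the names here are assumptions for illustration.

```typescript
// Assumed mapping from Derw/TypeScript-style type names to Elm type names.
const elmTypeNames = new Map<string, string>([
    [ "boolean", "Bool" ],
    [ "string", "String" ], // assumption
    [ "number", "Float" ], // assumption: Elm also has Int, so this is lossy
]);

// Assumed mapping from JavaScript globals to their closest Elm equivalents.
const elmGlobals = new Map<string, string>([
    [ "console.log", "Debug.log" ],
]);

// Falls back to the original name when there is no entry in the table.
function toElmTypeName(name: string): string {
    return elmTypeNames.get(name) ?? name;
}

function toElmGlobal(name: string): string {
    return elmGlobals.get(name) ?? name;
}
```

A plain rename like this does nothing for the void problem: the return type would still have to be rewritten based on the final body.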
I’m not sure how much value there is in providing this functionality, and it’d need a rewrite of the generation code which is currently really clean and straightforward. So it will probably be put on hold until I get some clear indicators that it will be useful.
Typechecking Derw and TypeScript
Most of the compiler API for TypeScript is undocumented and a little painful to use. The objects are giant, so autocomplete can’t give you useful suggestions. There’s no actual documentation outside of a wiki page recommending against using it. So I have this problem: how can Derw provide type hints and errors to developers as they are writing code? The answer is probably to dive into the TS compiler and make it work (or do something like use generated .d.ts files). The other alternative is to just always generate TS, and let the developers see errors when separately compiling the TS. That is the current behavior I’m following, but there is the `--verify` flag that will run generated TS through the TS compiler. One of the great benefits of using VSCode with TypeScript is that the autocomplete and type hints are mostly great, especially when working with unfamiliar or old code.
Package management
There are a few routes that could be taken for package management: rely on npm, rely on GitHub, or use a package server. npm is what most TypeScript projects use today, with a package.json defining the dependencies. Deno has a central package listing, but relies on GitHub for actually hosting the code itself. Elm uses a package server which stores package metadata and uses GitHub for hosting.
For Derw, I’m leaning towards just using GitHub. There’ll be a simple dependency file, and those dependencies will be fetched from GitHub at a certain tag or hash. That seems like a reasonable step prior to investing time into setting up some other packaging solution.
Meta
The big one this month was the announcement post. One thing that I’m trying to do with this project is to manage expectations, something that can often be difficult with open source projects. Writing down why I have gone down this path helped me settle in my head where exactly Derw fits in. It sat at the top of Lobsters for a while, which was pretty good. There’d been some discussion about Derw before I released my announcement, and I hope that I addressed all the comments decently. That being said, it didn’t hit as big on Hacker News as I thought it might’ve, which raises the question of whether my USP is clear or exciting enough. But perhaps it’s okay to have a boring project. If it’s valuable, it’ll prove itself over time.