# Proposal for Generated Filter Code
I am starting work on a new netCDF utility that takes a
simplified filter specification and uses it to generate a
complete HDF5 filter wrapper [2] plus an NCZarr filter wrapper.
The raison d'être is that building an HDF5 filter
wrapper [2,3] from scratch is complex, time-consuming, and
error-prone. A code generator is likely to simplify this
process. At the very least, it will produce base code that a
filter builder can modify to build the desired wrapper.

This program is analogous to, say, the yacc parser generator,
which converts an annotated BNF grammar into a full-blown parser.

**What I need:** I have a simple prototype working, but I need
some community input on this idea. Would anyone use it? Is the
proposed specification (Appendix A) reasonably simple to construct?
If you want to participate, use this
[GitHub discussion](https://github.com/Unidata/netcdf-c/discussions/2288).
# Specification Overview
The filter specification is written in JSON, although it is
highly stylized. It was derived from the NumCodecs [4] format
but with significant extensions to support the netCDF-4/HDF5
wrapper format.

Two visible extensions with respect to JSON are:

1. Single-line comments are supported, beginning with '#'.
2. An alternate string delimiter is provided using the '`'
character, chosen because occurrences of that character in C
code are very uncommon.

The basic specification is a JSON dictionary with very specific
keys that are used to control code generation.
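
To make the two extensions concrete, here is a minimal sketch of how a front end might turn the stylized text into strict JSON before handing it to an ordinary JSON parser. The function name `normalize_spec` is hypothetical, and the sketch assumes that '#' inside a string is not a comment and that a backquoted string has no way to contain a backquote; the proposal does not settle either point.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the proposal): convert the stylized
   specification text into strict JSON. Returns a malloc'd string that
   the caller must free, or NULL if allocation fails. */
char* normalize_spec(const char* src)
{
    /* Worst case: every input byte becomes a 6-byte \u00XX escape. */
    size_t cap = 6 * strlen(src) + 1;
    char* out = malloc(cap);
    char* q = out;
    enum { PLAIN, DQUOTE, BACKTICK, COMMENT } state = PLAIN;
    static const char hex[] = "0123456789abcdef";

    if(out == NULL) return NULL;
    for(const char* p = src; *p; p++) {
        unsigned char c = (unsigned char)*p;
        switch (state) {
        case PLAIN:
            if(c == '#') state = COMMENT;            /* drop the comment */
            else if(c == '`') {*q++ = '"'; state = BACKTICK;}
            else {*q++ = (char)c; if(c == '"') state = DQUOTE;}
            break;
        case DQUOTE:                                 /* ordinary JSON string */
            *q++ = (char)c;
            if(c == '\\' && p[1] != '\0') *q++ = *++p; /* keep escape pairs */
            else if(c == '"') state = PLAIN;
            break;
        case BACKTICK:                               /* raw text, e.g. C code */
            if(c == '`') {*q++ = '"'; state = PLAIN;}
            else if(c == '"' || c == '\\') {*q++ = '\\'; *q++ = (char)c;}
            else if(c == '\n') {*q++ = '\\'; *q++ = 'n';}
            else if(c < 0x20) {
                *q++ = '\\'; *q++ = 'u'; *q++ = '0'; *q++ = '0';
                *q++ = hex[c >> 4]; *q++ = hex[c & 0xF];
            }
            else *q++ = (char)c;
            break;
        case COMMENT:                                /* skip to end of line */
            if(c == '\n') {*q++ = '\n'; state = PLAIN;}
            break;
        }
    }
    *q = '\0';
    return out;
}
```

Keeping the newline that ends a comment means line numbers in the normalized text still match the original, which should help when reporting parse errors.
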

A draft example specifying the Zstandard filter wrapper is shown in
Appendix A. The dictionary keys provide the following filter information.

* **"id"** -- specifies the NumCodecs name (Zstd) and the HDF5-assigned
identifier (32015); it also specifies an alternate preferred name.
* **"parameters"** -- a dictionary whose keys are the parameter names
as specified by NumCodecs and whose values are keywords
indicating the type of the corresponding parameter.
The allowable types are "integer", "float", or an enumeration
(not described here).
* **"initialize"** -- the value is a piece of code to initialize the
filter before use.
* **"finalize"** -- the value is a piece of code to shut down the filter
after all use is complete.
* **"prefix"** -- arbitrary code to insert at the front of the filter
wrapper; typically used to include library-specific headers.
* **"suffix"** -- arbitrary code to insert at the end of the filter
wrapper; typically used to include library-specific utility
functions.
* **"encode"** -- a function name plus the code for a user-provided
function to invoke the filter's encoding/compression capability; this
has a very specific signature.
* **"decode"** -- a function name plus the code for a user-provided
function to invoke the filter's decoding/decompression capability; this
has a very specific signature.
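
To illustrate how the "encode" and "decode" entries might be tied together, here is a rough sketch of the kind of callback the generator could emit. Its argument list mirrors HDF5's `H5Z_func_t` filter callback, and `FLAG_REVERSE` carries the same value as HDF5's `H5Z_FLAG_REVERSE`. Everything else is an assumption made for illustration: the names `codec_fn`, `copy_codec`, `spec_encode`, `spec_decode`, and `generated_filter` are hypothetical, the user functions are assumed to return zero on success, and a stand-in codec that copies its input replaces the real library calls so that the sketch compiles without HDF5 or Zstandard.

```c
#include <stdlib.h>
#include <string.h>

#define FLAG_REVERSE 0x0100 /* same value as HDF5's H5Z_FLAG_REVERSE */

/* Assumed shape of the user-provided "encode"/"decode" functions:
   zero on success, output returned through dstlenp and dstbufp. */
typedef int (*codec_fn)(size_t srclen, void* srcbuf, size_t* dstlenp,
                        void** dstbufp, size_t cd_nelmts,
                        const unsigned int* cd_values);

/* Stand-in codec that copies the input unchanged. A real specification
   would supply functions such as zstd_compress here. */
static int copy_codec(size_t srclen, void* srcbuf, size_t* dstlenp,
                      void** dstbufp, size_t cd_nelmts,
                      const unsigned int* cd_values)
{
    void* dst = malloc(srclen ? srclen : 1);
    (void)cd_nelmts; (void)cd_values; /* unused by the stand-in */
    if(dst == NULL) return -1;
    memcpy(dst, srcbuf, srclen);
    *dstlenp = srclen;
    *dstbufp = dst;
    return 0;
}

static codec_fn spec_encode = copy_codec;
static codec_fn spec_decode = copy_codec;

/* Hypothetical generated callback, shaped like HDF5's H5Z_func_t:
   returns the length of the new data, or 0 on failure. */
size_t generated_filter(unsigned int flags, size_t cd_nelmts,
                        const unsigned int cd_values[], size_t nbytes,
                        size_t* buf_size, void** buf)
{
    /* The reverse flag selects decoding; otherwise encode. */
    codec_fn fn = (flags & FLAG_REVERSE) ? spec_decode : spec_encode;
    size_t dstlen = 0;
    void* dstbuf = NULL;

    if(fn(nbytes, *buf, &dstlen, &dstbuf, cd_nelmts, cd_values) != 0)
        return 0;  /* the input buffer is left untouched on failure */
    free(*buf);    /* replace the caller's buffer with the codec output */
    *buf = dstbuf;
    *buf_size = dstlen;
    return dstlen;
}
```

A real plugin would also have to allocate and release buffers with whatever allocator the HDF5 library expects, which this sketch ignores by using plain `malloc` and `free`.
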
# References
[1] [HDF5 Filter
Specification](https://support.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/HDF5DynamicallyLoadedFilters.pdf)<br>
[2] [Registered HDF5 Filter
Plugins](https://portal.hdfgroup.org/display/support/Registered+Filter+Plugins)<br>
[3] [User Contributed
Filters](https://support.hdfgroup.org/services/contributions.html#filters)<br>
[4] [NumCodecs](https://numcodecs.readthedocs.io/en/stable/)<br>
# Appendix A. Zstandard Draft Example
````
{
"id": {"zstd": 32015, "preferred": "zstandard"},
"parameters": [{"level": "integer"}],
"encode": {"name": "zstd_compress",
           "code": # The signature is standardized
`
int zstd_compress(size_t srclen, void* srcbuf, size_t* dstlenp,
                  void** dstbufp, size_t cd_nelmts, const unsigned int* cd_values)
{
    int ret = NC_NOERR;
    size_t dstlen;
    void* dstbuf = NULL;
    /* Compute the worst-case size of the compressed output. */
    dstlen = (size_t)ZSTD_compressBound(srclen);
    if(ZSTD_isError(dstlen)) {ret = NC_EFILTER; goto cleanup;}
    /* Prepare the destination buffer. */
    if((dstbuf = malloc(dstlen))==NULL) {ret = NC_ENOMEM; goto cleanup;}
    dstlen = ZSTD_compress(dstbuf, dstlen, srcbuf, srclen, /*level*/(int)cd_values[0]);
    if(ZSTD_isError(dstlen)) {ret = NC_EFILTER; goto cleanup;}
    /* Hand the buffer and its length to the caller, who now owns them. */
    if(dstlenp) *dstlenp = dstlen;
    if(dstbufp) {*dstbufp = dstbuf; dstbuf = NULL;}
cleanup:
    free(dstbuf); /* no-op when ownership was transferred above */
    return ret;
}`},
"decode": {"name": "zstd_decompress",
           "code": # The signature is standardized
`...`},
"prefix": `...`,
"suffix": `...`
}
````
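
The encode function above reads the compression level from `cd_values[0]`. Since HDF5 passes filter parameters as an array of `unsigned int`, the generator has to map each typed entry in "parameters" onto such slots. The following is only a sketch of one possible mapping; the helper names are hypothetical, the one-slot-per-parameter layout is my assumption rather than something the proposal fixes, and it assumes that `int`, `float`, and `unsigned int` are all 32 bits wide.

```c
#include <string.h>

/* Hypothetical helpers: names and layout are assumptions for illustration. */
_Static_assert(sizeof(float) == sizeof(unsigned int),
               "sketch assumes float and unsigned int have the same size");

/* Store an "integer" parameter (such as "level") in one slot. */
unsigned int pack_int(int value)
{
    return (unsigned int)value; /* well-defined modular conversion */
}

/* Recover the signed value by copying the bit pattern back. */
int unpack_int(unsigned int slot)
{
    int value;
    memcpy(&value, &slot, sizeof(value));
    return value;
}

/* Store a "float" parameter by copying its bit pattern into one slot. */
unsigned int pack_float(float value)
{
    unsigned int slot = 0;
    memcpy(&slot, &value, sizeof(value));
    return slot;
}

/* Recover the float from the slot's bit pattern. */
float unpack_float(unsigned int slot)
{
    float value;
    memcpy(&value, &slot, sizeof(value));
    return value;
}
```

With helpers like these, a generated wrapper could fill `cd_values` when the filter is defined and hand decoded values, such as `unpack_int(cd_values[0])`, to the user-provided code.
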