As UTF-8 is now part of Nit, the Unicode standard requires conforming implementations to properly handle edge cases such as overlong sequences and other ill-formed input.
The codec defined here sanitizes input before Nit processes it, avoiding potential security [issues](https://www.owasp.org/index.php/Canonicalization,_locale_and_Unicode).
The codec architecture can also be reused later to handle other encodings for source files or text (unless we decide that anything that is not UTF-8 is to be rejected or misinterpreted).
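
To illustrate what sanitizing ill-formed UTF-8 involves, here is a minimal, language-neutral sketch in Python. It is not the Nit codec. The function name `sanitize_utf8` is hypothetical, and the policy of replacing each rejected sequence with U+FFFD is an assumption chosen for illustration. The decoder rejects overlong encodings (a code point encoded with more bytes than it needs), UTF-16 surrogates, code points above U+10FFFF, stray continuation bytes and truncated sequences.

```python
REPLACEMENT = "\ufffd"  # assumed policy: substitute U+FFFD for rejected input


def sanitize_utf8(data: bytes) -> str:
    """Hypothetical sketch: decode UTF-8, replacing ill-formed sequences."""
    out = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:  # ASCII byte, always valid on its own
            out.append(chr(b))
            i += 1
            continue
        # Sequence length, payload bits of the lead byte, and the smallest
        # code point that really needs this many bytes (below it: overlong).
        if 0xC0 <= b <= 0xDF:
            length, cp, minimum = 2, b & 0x1F, 0x80
        elif 0xE0 <= b <= 0xEF:
            length, cp, minimum = 3, b & 0x0F, 0x800
        elif 0xF0 <= b <= 0xF7:
            length, cp, minimum = 4, b & 0x07, 0x10000
        else:
            # Stray continuation byte (0x80..0xBF) or invalid lead byte.
            out.append(REPLACEMENT)
            i += 1
            continue
        # Consume continuation bytes (10xxxxxx), 6 payload bits each.
        j = i + 1
        while j < i + length and j < n and 0x80 <= data[j] <= 0xBF:
            cp = (cp << 6) | (data[j] & 0x3F)
            j += 1
        if j < i + length:
            # Truncated sequence: replace it and resume at the offending byte.
            out.append(REPLACEMENT)
            i = j
            continue
        if cp < minimum or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            # Overlong encoding, surrogate, or out-of-range code point.
            out.append(REPLACEMENT)
        else:
            out.append(chr(cp))
        i = j
    return "".join(out)
```

For instance, `sanitize_utf8(b"..\xc0\xaf..")` yields `"..\ufffd.."`: the overlong two-byte form of `/` never reaches the program as a path separator, which is the canonicalization problem described in the OWASP link. Unlike Python's built-in `decode("utf-8", errors="replace")`, this sketch emits one U+FFFD per rejected sequence rather than per maximal subpart, so the two can differ on the number of replacement characters.
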
Pull-Request: #1628
Reviewed-by: Jean Privat <jean@pryen.org>
Reviewed-by: Alexandre Terrasa <alexandre@moz-code.org>