Is there some formal way(s) of quantifying potential flaws, or risk, and ensuring there’s sufficient spread of tests to cover them? Perhaps using some kind of complexity measure? Or a risk assessment of some kind?
Experience tells me I need to be extra careful around certain things - user input, code generation, anything with a publicly exposed surface, third-party libraries/services, financial data, personal information (especially of minors), batch data manipulation/migration, and so on.
But is there any accepted means of formally measuring a system and ensuring that some level of test quality exists?


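Mutation testing. Someone else mentioned it as PIT testing, but PIT is just one tool for it (on the JVM); the technique itself is called mutation testing. It accomplishes exactly what you’re looking for here: it makes small, deliberate changes to your code and checks whether your tests fail as a result. A change that no test catches points to a gap in your test set.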
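To make the idea concrete, here is a minimal hand-rolled sketch in Python. Everything in it (`is_adult`, the three mutants, the two suites, `mutation_score`) is hypothetical and written only for illustration; real tools such as PIT or mutmut generate the mutants automatically from your source code rather than from a hand-written list.

```python
from typing import Callable, List

Predicate = Callable[[int], bool]


def is_adult(age: int) -> bool:
    """The code under test (hypothetical example)."""
    return age >= 18


# Hand-written mutants: each one is a small deliberate change to is_adult.
MUTANTS: List[Predicate] = [
    lambda age: age > 18,   # boundary change: >= became >
    lambda age: age < 18,   # condition negated
    lambda age: True,       # body replaced by a constant
]


def weak_suite(impl: Predicate) -> bool:
    """Passes if impl behaves correctly on two values far from the boundary."""
    return impl(30) is True and impl(5) is False


def strong_suite(impl: Predicate) -> bool:
    """Same checks as weak_suite, plus the boundary value 18."""
    return weak_suite(impl) and impl(18) is True


def mutation_score(suite: Callable[[Predicate], bool],
                   mutants: List[Predicate]) -> float:
    """Fraction of mutants the suite 'kills' (i.e. the suite fails on them)."""
    killed = sum(1 for mutant in mutants if not suite(mutant))
    return killed / len(mutants)


if __name__ == "__main__":
    # Both suites must pass on the original code, or the score is meaningless.
    assert weak_suite(is_adult) and strong_suite(is_adult)
    print("weak suite score:  ", mutation_score(weak_suite, MUTANTS))
    print("strong suite score:", mutation_score(strong_suite, MUTANTS))
```

Both suites pass against the original code, but the weak one lets the `age > 18` mutant survive because it never exercises the boundary. That surviving mutant is the kind of signal mutation testing gives you about where tests are missing, which plain line coverage would not show.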
Something tells me that if a team struggles to put together working tests, extending their test sets to support mutation tests will also offer dubious returns on investment.