The main metrics of testing quality are coverage and the number of rejected defect reports. In terms of software testing, we have to move toward minimizing reported defects that turn out to be invalid, minimizing valid defects that go unfound, and maximizing test coverage. The same is valid for automated testing, but its measurements are basically performed relative to the manual effort, while manual testing is benchmarked against UAT and against defects found in production by users and stakeholders.
Statisticians interpret the same notions in terms of errors:
Error type I – false positive (false alarm) – a reported defect that turns out to be invalid.
Error type II – false negative (a miss) – a valid defect that was not revealed in testing.
See more details on this theory on Wikipedia: http://en.wikipedia.org/wiki/Type_I_and_type_II_errors
There are special statistical measures that operate with these notions, such as sensitivity, specificity, precision, accuracy, and false discovery rate. Full details: http://en.wikipedia.org/wiki/Sensitivity_(tests)
So one or two of these metrics can be employed for both manual and automated testing analysis over time (for instance, showing up in QA reports as curves on a graph). I would select Accuracy (ACC), Precision (PPV; see http://en.wikipedia.org/wiki/Accuracy), and False Discovery Rate (FDR).
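For illustration, here is a minimal Python sketch of computing those three metrics per test cycle (the counter names tp/fp/tn/fn and the sample figures below are my own assumptions, not taken from the Wikipedia article); plotting the returned values cycle by cycle gives the curves mentioned above:

# tp - valid defects reported, fp - invalid defects reported (false alarms),
# fn - valid defects missed, tn - checks that correctly passed.
def quality_metrics(tp, fp, tn, fn):
    """Return Accuracy (ACC), Precision (PPV) and False Discovery Rate (FDR)."""
    acc = (tp + tn) / (tp + tn + fp + fn)  # overall correctness of verdicts
    ppv = tp / (tp + fp)                   # share of reported defects that are valid
    fdr = fp / (tp + fp)                   # share of reports that are false alarms; FDR = 1 - PPV
    return acc, ppv, fdr

# Hypothetical cycle: 40 valid defects reported, 5 false alarms,
# 3 missed defects, 152 checks correctly passed.
acc, ppv, fdr = quality_metrics(tp=40, fp=5, tn=152, fn=3)
print(f"ACC={acc:.2f}  PPV={ppv:.2f}  FDR={fdr:.2f}")  # ACC=0.96  PPV=0.89  FDR=0.11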
For automated testing, I would suggest measuring against manual testing within the test automation coverage only, not overall (manual + UAT + production). A false positive (false alarm) is then when test automation reports a wrong defect; a false negative is when an automated test fails to find a defect it was expected to catch, because the auto test covers that functionality. A sketch of this classification follows.
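Here is a hypothetical Python sketch of that classification. It assumes two dicts keyed by check id (the automated verdicts and the manual-testing baseline) plus the set of checks automation actually covers; all names and sample data are mine:

def classify_automation_results(auto_verdicts, manual_baseline, automation_coverage):
    """Count TP/FP/TN/FN for automation, measured only within its coverage."""
    tp = fp = tn = fn = 0
    for check in automation_coverage:
        auto_failed = auto_verdicts.get(check) == "fail"      # automation reported a defect
        real_defect = manual_baseline.get(check) == "defect"  # manual testing confirmed one
        if auto_failed and real_defect:
            tp += 1
        elif auto_failed:
            fp += 1  # false positive: automation revealed a wrong defect
        elif real_defect:
            fn += 1  # false negative: functionality is covered, yet the defect was missed
        else:
            tn += 1
    return tp, fp, tn, fn

auto = {"login": "pass", "search": "fail", "cart": "fail"}
manual = {"login": "defect", "search": "defect", "cart": "ok"}
print(classify_automation_results(auto, manual, {"login", "search", "cart"}))
# -> (1, 1, 0, 1): one valid defect caught, one false alarm, one miss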
To test your auto tests :), you may run a pilot analysis using the so-called defect seeding technique: deliberately plant a known set of defects and check how many of them the automation reveals.
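A toy Python illustration of defect seeding (every name and number below is hypothetical); the returned rate estimates the suite's sensitivity on the planted defects:

import random

def seed_defects(all_faults, n_seeded, run_suite):
    """Seed n_seeded known faults, run the suite, report the detection rate."""
    seeded = random.sample(all_faults, n_seeded)  # defects we deliberately plant
    reported = run_suite(seeded)                  # ids of defects the suite reveals
    caught = [f for f in seeded if f in reported]
    missed = set(seeded) - set(caught)            # type II errors on known defects
    return len(caught) / n_seeded, missed

# Mock suite that happens to catch only even-numbered seeded defects.
mock_suite = lambda seeded: {d for d in seeded if d % 2 == 0}
rate, missed = seed_defects(list(range(100)), 10, mock_suite)
print(f"detection rate: {rate:.0%}, missed seeded defects: {sorted(missed)}")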