Misc comments:
- Life Cycle Stages: these should reference the stages already defined in ISO. They are very close as written, and aligning with the existing definitions makes the most sense.
- Design and Development stage:
a. Explicitly call out that mid- and post-training are included in this phase (rather than just pre-training); this is a growing component.
b. Explicitly call out that it shouldn't cover just the final training run, but also any test runs (you mention the entire training duration, but there are often many training runs before the final one).
- Consumer vs. Provider: in practice there is a lot of gray area in these definitions. I recommend checking out the EU AI Act definitions and aligning with what's already out there.
- Consumer Functional Unit: for LLMs, I think Query should be the FU rather than Token. LLMs behave very differently from one another, especially with the rise of reasoning models, and they can generate massively different token counts for the same request from model to model. The query is what the end user actually wants.
- Provider Functional Units: I don't think normalizing training by anything really makes sense or provides useful information. Instead, a useful addition here would be to think about how this phase can be properly attributed (amortized?) into the per-inference CO2 value.
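To make the amortization suggestion concrete, here is a minimal sketch; all figures and the lifetime-query estimate are hypothetical placeholders, not measurements, and the attribution method itself is an open question in the spec:

```python
# Sketch: amortizing one-time training emissions into a per-query CO2 figure.
# All numbers below are hypothetical placeholders, not measurements.

def per_query_co2(training_co2_kg: float,
                  expected_lifetime_queries: float,
                  inference_co2_per_query_kg: float) -> float:
    """Total CO2 attributed to one query: its own inference emissions
    plus a share of the one-time training emissions, spread over the
    model's expected lifetime query volume."""
    amortized_training = training_co2_kg / expected_lifetime_queries
    return inference_co2_per_query_kg + amortized_training

# Example: 500 t of training CO2 amortized over 10 billion lifetime queries
# adds about 0.05 g to each query's own 2 g of inference emissions.
total = per_query_co2(training_co2_kg=500_000,
                      expected_lifetime_queries=10_000_000_000,
                      inference_co2_per_query_kg=0.002)
print(total)  # ~0.00205 kg, i.e. ~2.05 g CO2 per query
```

The hard part, of course, is estimating the lifetime query volume up front; any guidance the spec gives there would be valuable.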
- Examples: Have you done any real-world testing of this spec? The reality is much messier, more complicated, and more varied, depending on small decisions, than what's in your examples. For instance, you say "Carbon emitted of inference servers"; that one line alone raises about 50 questions in my head, ranging from what's included in the energy measurement (GPU, CPU, RAM, PUE, idle energy), to how to calculate carbon (LB vs. MB, regional vs. global), to how the model is configured (data precision, batching, quantization, etc.).
I understand you are trying to be general, but there is such a wide range of variability across these choices that results produced under this spec won't be comparable as currently written. I highly recommend integrating learnings from real-world implementations before proceeding.
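To show why those choices matter for comparability, here is a rough sketch of two defensible readings of "carbon emitted of inference servers"; every figure (energy split, PUE, grid intensities) is a hypothetical illustration, not data:

```python
# Sketch: how measurement-boundary choices change the "carbon emitted of
# inference servers" number. All figures are hypothetical illustrations.

def server_carbon_g(gpu_kwh: float, cpu_kwh: float, ram_kwh: float,
                    idle_kwh: float, pue: float,
                    grid_intensity_g_per_kwh: float) -> float:
    """Operational carbon (grams CO2e) for one measurement window."""
    it_energy = gpu_kwh + cpu_kwh + ram_kwh + idle_kwh
    return it_energy * pue * grid_intensity_g_per_kwh

# Narrow boundary: GPU only, no idle energy, ideal PUE, low-carbon regional grid.
narrow = server_carbon_g(gpu_kwh=1.0, cpu_kwh=0, ram_kwh=0,
                         idle_kwh=0, pue=1.0, grid_intensity_g_per_kwh=50)
# Wide boundary: full server, idle included, typical PUE, global-average grid.
wide = server_carbon_g(gpu_kwh=1.0, cpu_kwh=0.3, ram_kwh=0.1,
                       idle_kwh=0.4, pue=1.4, grid_intensity_g_per_kwh=480)
print(narrow, wide)  # the two readings differ by more than 20x
```

Both results are "the carbon emitted of inference servers" under a reasonable interpretation, yet they differ by more than an order of magnitude; without the spec pinning these choices down, reported numbers won't be comparable.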
Overall thought:
The European Committee for Standardization (CEN) is currently working, with leading experts, on a "Guidelines and metrics for the environmental impact of artificial intelligence systems and services" standard which goes into significant technical depth. ISO standardization is a goal of theirs, I believe, and I fear the space could become confused and fragmented if your spec beats them to the punch. I therefore discourage you from proceeding with ISO standardization of this spec as currently written: it would confuse the industry and ultimately be detrimental to the greening of software. I still think it could be useful if grounded in real-world numbers.