Anthropic’s models show signs of introspection

Anthropic says its most advanced systems may be learning not just to reason, but to reflect internally on how they reason.

Why it matters: These introspective capabilities could make the models safer — or, possibly, just better at pretending to be safe.


This material is based upon work supported by the U.S. National Science Foundation under Grant No. 2300188. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

