Widely considered a cornerstone of human morality, trust shapes many aspects of human social interactions. In this work, the authors present a theoretical analysis of the trust game, the canonical task for studying trust in behavioral and brain sciences, along with simulation results supporting their analysis. Specifically, leveraging reinforcement learning (RL) to train their AI agents, they systematically investigate learning trust under various parameterizations of this task.
Also, $U[0, 1]$ denotes the uniform distribution with support $[0, 1]$, and $\mathrm{Beta}(\alpha, \beta)$ is the Beta distribution. Algorithm 1 can be described in simple terms as follows. Recall that Trustor transfers $rT$, $r \in [0, 1]$, to Trustee, who returns a zero amount with probability $1 - p(r)$ and returns $K r T \rho(r)$ to Trustor with probability $p(r)$. At the start, i.e., prior to any learning, $S_r$ and $F_r$ are both set to zero.
Figure 1: Mean frequency of RL Trustors' transfers, with $\rho(r) = \rho_0 r^m$ and $p(r) = p_0 r^n$; see Proposition 1. (top row) Simulating the $\rho_0 p_0 K < 1$ condition ($\rho_0 = 0.5$, $p_0 = 0.5$, $K = 3$) when $\rho(r)$ and $p(r)$ are: (left) constant ($m = n = 0$), (middle) linear ($m = n = 1$), or (right) quadratic ($m = n = 2$). (bottom row) Simulating the $\rho_0 p_0 K > 1$ condition ($\rho_0 = 1$, $p_0 = 0.5$, $K = 3$) when $\rho(r)$ and $p(r)$ are: (left) constant ($m = n = 0$), (middle) linear ($m = n = 1$), or (right) quadratic ($m = n = 2$). As a visual aid, the dynamics for the first 200 trials are shown in a smaller inset plot at the center of each subplot. Note that these simulation results are fully consistent with Proposition 1, supporting our mathematical analysis.
Upon this transfer, TG Trustee either returns a positive amount to Trustor, in which case $S_r$ is incremented by one (Line 5), or returns a zero amount to Trustor, in which case $F_r$ is incremented by one (Line 7). Note that the former happens with probability $p(r)$ and the latter with probability $1 - p(r)$.
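The success/failure bookkeeping described here can be sketched in Python. This is a minimal illustration and not a reproduction of Algorithm 1: it assumes a Thompson-sampling-style Trustor that draws a belief about the return probability from Beta(S_r + 1, F_r + 1) for each candidate fraction, that Trustor keeps T - rT of the endowment, that the returned fraction is written rho(r) = rho0 * r^m, and that r is restricted to a small grid. The function name `simulate_trustor`, its parameters, and the grid are all hypothetical.

```python
import random


def simulate_trustor(rho0, p0, K, m, n, T=1.0, n_trials=3000, n_actions=3, seed=0):
    """Hypothetical Thompson-sampling-style Trustor over a grid of fractions r."""
    rng = random.Random(seed)
    # Candidate transfer fractions, evenly spaced on [0, 1] (an assumption).
    grid = [i / (n_actions - 1) for i in range(n_actions)]
    S = [0] * n_actions  # S_r: number of trials with a positive return
    F = [0] * n_actions  # F_r: number of trials with a zero return
    for _ in range(n_trials):
        best_i, best_value = 0, float("-inf")
        for i, r in enumerate(grid):
            # Sampled belief about p(r); Beta(1, 1) is U[0, 1] before any learning.
            theta = rng.betavariate(S[i] + 1, F[i] + 1)
            # Assumed payoff: amount kept plus the believed expected return.
            value = T * (1 - r) + theta * K * r * T * rho0 * r**m
            if value > best_value:
                best_i, best_value = i, value
        r = grid[best_i]
        # Trustee returns a positive amount with probability p(r) = p0 * r^n.
        if rng.random() < p0 * r**n:
            S[best_i] += 1
        else:
            F[best_i] += 1
    return grid, S, F
```

Calling `simulate_trustor(1.0, 0.5, 3, 0, 0)` returns the grid together with the final counts, and `S[i] + F[i]` gives how often fraction `grid[i]` was transferred. The coarse three-point grid is chosen only to keep the sketch fast; a finer grid needs more trials to separate neighboring fractions.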
In Figure 1, we simulate 10 RL Trustors and report the mean frequency with which each fraction is transferred to TG Trustee over the past trials, for a total of $N = 20{,}000$ trials. The simulation results reported in Figure 1 support our mathematical analysis in Proposition 1. As Proposition 1 indicates, if $\rho_0 p_0 K < 1$, the optimal strategy for Trustor is to transfer nothing to Trustee (i.e., $r = 0$); consistent with this result, Figure 1 (top row) shows that the RL Trustor eventually arrives at the decision to transfer nothing to Trustee. Likewise, according to Proposition 1, if $\rho_0 p_0 K > 1$, the optimal strategy for Trustor is to transfer all of their endowment (i.e., $r = 1$); consistent with this result, Figure 1 (bottom row) shows that the RL Trustor eventually arrives at the decision to transfer all of their endowment to Trustee.
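The two regimes can also be checked numerically without any learning. The sketch below assumes that Trustor keeps T - rT and receives K r T rho(r) with probability p(r), with rho(r) = rho0 * r^m and p(r) = p0 * r^n, so that the expected payoff is T(1 - r) + rho0 p0 K T r^(m+n+1). This payoff form is an assumption consistent with the description of the task, not a quotation of Proposition 1, and the function names are hypothetical.

```python
def expected_payoff(r, rho0, p0, K, m, n, T=1.0):
    """Assumed expected payoff to Trustor for transferring fraction r."""
    kept = T * (1 - r)                                   # part of the endowment not transferred
    expected_return = (p0 * r**n) * K * r * T * (rho0 * r**m)
    return kept + expected_return


def best_transfer(rho0, p0, K, m, n, steps=100):
    """Grid search for the payoff-maximizing fraction on [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda r: expected_payoff(r, rho0, p0, K, m, n))


# Parameter settings taken from Figure 1: constant, linear, quadratic cases.
for m in (0, 1, 2):
    low = best_transfer(0.5, 0.5, 3, m, m)   # rho0 * p0 * K = 0.75 < 1
    high = best_transfer(1.0, 0.5, 3, m, m)  # rho0 * p0 * K = 1.5 > 1
    print(m, low, high)
```

Under this assumed payoff the function is convex in r on [0, 1], so the maximum sits at an endpoint: r = 0 yields T, r = 1 yields rho0 p0 K T, and comparing the two gives the threshold rho0 p0 K = 1.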