Playing Repeated Security Games with No Prior Knowledge by Xu et al. [1] is an interesting take on security games in which both the payoffs and the attacker's behavior are unknown. Repeated security games between a defender and a bounded-rational attacker have many practical uses; however, previous forays into these games have each had certain disadvantages. Models that do not remember previous attacks, or that assume a specific attacker behavior, fail to account for the attacker's adaptability, and although the model by Kar et al. [2] accounts for adaptability, it makes assumptions that are unrealistic for many applications and is computationally intractable. Xu et al. propose the Follow the Perturbed Leader with Uniform Exploration (FPL-UE) defender strategy, a variant of the Follow the Perturbed Leader algorithm [3], to overcome these disadvantages. FPL-UE is the "first defender strategy for repeated security games that has provable theoretical performance guarantees" [1]. It is also noteworthy that, for zero-sum games, an approximate equilibrium and the minimax value of the game can be computed by applying the strategy to both sides.
Under this strategy, the defender protects n targets with k resources (k < n), while the attacker can attack at most m targets per round. The defender's utility for a target i lies in the range [-0.5, 0.5] and is denoted u_c(i) when i is attacked while covered and u_u(i) when attacked while uncovered; it is assumed that u_c(i) > u_u(i) for the defender, so covering an attacked target gains the defender u_c(i) - u_u(i) > 0. The strategy assumes that the defender observes the utilities only at targets where its resources are located and maximizes the corresponding expected utility. It also assumes that the attacker cannot observe the defender's move in the current round. No behavioral model is assumed for the attacker, however.
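To make the setup concrete, here is a minimal sketch of the payoff model in Python; the instance sizes and the names n, k, m, u_unc, u_cov, and defender_payoff are illustrative choices for this review, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n, k, m = 5, 2, 1   # n targets, k defender resources (k < n), at most m attacks per round

# Per-target defender utilities in [-0.5, 0.5]: u_unc[i] if target i is attacked
# while uncovered, u_cov[i] if attacked while covered, with u_cov[i] > u_unc[i].
u_unc = rng.uniform(-0.5, 0.0, size=n)   # attacked and uncovered: a loss
u_cov = rng.uniform(0.0, 0.5, size=n)    # attacked but covered: a gain

def defender_payoff(covered, attacked):
    """Defender's round utility given the covered and attacked target sets."""
    return sum(u_cov[i] if i in covered else u_unc[i] for i in attacked)

print(defender_payoff(covered={0, 3}, attacked={3}))  # target 3 attacked while covered
```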
The FPL-UE algorithm augments the FPL algorithm with uniform exploration: at each round, with probability γ the defender plays a uniformly random coverage, and otherwise it follows the FPL strategy, best-responding to perturbed cumulative reward estimates. The algorithm then estimates the round's per-target rewards with the Geometric Resampling (GR) subroutine, which uses a geometric distribution to estimate the probability of a target being covered. A target's reward estimate is updated only when a resource (guard) is actually placed there, since that is the only feedback the defender receives.
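The round structure can be sketched as follows, reusing u_cov, u_unc, and defender_payoff from the snippet above. This is only a schematic reading of FPL-UE under simplifying assumptions of mine: the per-target "reward" is taken to be the coverage gain u_cov[i] - u_unc[i] when i is attacked, the FPL perturbation is exponential noise, and the parameter values (gamma, eta, the GR cap) are placeholders rather than the paper's tuned settings.

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_coverage(r_hat, n, k, gamma, eta):
    """One FPL-UE decision: explore uniformly w.p. gamma, else follow the perturbed leader."""
    if rng.random() < gamma:
        return set(rng.choice(n, size=k, replace=False).tolist())  # uniform exploration
    noise = rng.exponential(scale=1.0 / eta, size=n)               # FPL perturbation
    return set(np.argsort(r_hat + noise)[-k:].tolist())            # cover the k best perturbed targets

def gr_weight(i, r_hat, n, k, gamma, eta, cap=200):
    """Geometric Resampling: redraw actions until target i is covered; the (capped)
    geometric count estimates 1 / P(i covered) and serves as an importance weight."""
    for trials in range(1, cap + 1):
        if i in draw_coverage(r_hat, n, k, gamma, eta):
            return trials
    return cap

def fpl_ue(attacker, T=500, n=5, k=2, gamma=0.1, eta=1.0):
    r_hat = np.zeros(n)   # cumulative importance-weighted reward estimates
    total = 0.0
    for t in range(T):
        covered = draw_coverage(r_hat, n, k, gamma, eta)
        attacked = attacker(t)   # the attacker does not observe `covered` this round
        total += defender_payoff(covered, attacked)
        # Feedback arrives only at covered targets, so only those estimates update.
        weights = {i: gr_weight(i, r_hat, n, k, gamma, eta) for i in covered}
        for i in covered:
            gain = (u_cov[i] - u_unc[i]) if i in attacked else 0.0
            r_hat[i] += gain * weights[i]
    return total

# Example run against a uniformly random single-target attacker.
print(fpl_ue(lambda t: {int(rng.integers(n))}))
```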
This FPL-UE defender strategy was evaluated against five styles of attacker: Uniform, Adversarial, Stackelberg, BestResponse, and QuantalResponse. Uniform attacks with a uniformly random mixed strategy; Adversarial attacks with the maximin mixed strategy; Stackelberg always plays the optimal follower pure strategy of the Strong Stackelberg Equilibrium; BestResponse plays a best response to the defender's empirical average strategy (1/t) Σ_{i=1..t} v_i, where v_i is the defender's strategy at round i and t is the current round; and QuantalResponse responds to that same average strategy, but through a quantal response model rather than an exact best response. The evaluation found that the algorithm converges at different rates against different attackers, but its regret always stays within the proven upper bound. (A sketch of the last two attacker profiles follows below.)
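As an illustration of the BestResponse and QuantalResponse profiles, the following hypothetical sketch computes each response to the defender's empirical average coverage probabilities. The attacker utilities a_cov and a_unc and the rationality parameter lam are invented for the example; the paper's exact parameterization may differ.

```python
import numpy as np

rng = np.random.default_rng(2)

def attacker_eu(avg_coverage, a_cov, a_unc):
    """Expected attacker utility per target under the defender's empirical
    average strategy (1/t) * (v_1 + ... + v_t), given as coverage probabilities."""
    return avg_coverage * a_cov + (1.0 - avg_coverage) * a_unc

def best_response_attack(avg_coverage, a_cov, a_unc):
    """BestResponse profile: attack the target with the highest expected utility."""
    return int(np.argmax(attacker_eu(avg_coverage, a_cov, a_unc)))

def quantal_response_attack(avg_coverage, a_cov, a_unc, lam=4.0):
    """QuantalResponse profile: sample a target with probability proportional to
    exp(lam * expected utility) -- a noisy rather than exact best response."""
    p = np.exp(lam * attacker_eu(avg_coverage, a_cov, a_unc))
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

avg_cov = np.array([0.4, 0.4, 0.2])    # defender's average coverage so far
a_cov = np.array([-0.3, -0.2, -0.4])   # attacker utility if caught (target covered)
a_unc = np.array([0.3, 0.1, 0.25])     # attacker utility if successful (uncovered)
print(best_response_attack(avg_cov, a_cov, a_unc),
      quantal_response_attack(avg_cov, a_cov, a_unc))
```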
Finally, it was shown that FPL-UE can "provably achieve low regret bounds, against both the best fixed strategy on hindsight, and the optimal adaptive strategy" [1]. It was also shown that FPL-UE is efficient against typical attacker profiles, outperforming a version of FPL improved by Neu and Bartók, and it is comparable to the state-of-the-art SUQR-based algorithm, which requires more prior knowledge than FPL-UE does.
[1] H. Xu, L. Tran-Thanh, and N. R. Jennings. Playing repeated security games with no prior knowledge. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2016), pp. 104-112, 2016.
[2] D. Kar, F. Fang, F. Delle Fave, N. Sintov, and M. Tambe. "A game of thrones": When human behavior models compete in repeated Stackelberg security games. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015), 2015.
[3] A. Kalai and S. Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291-307, Oct. 2005.
Questions:
- What other possible attacker algorithms could have been used to evaluate the FPL-UE algorithm?
- What are some more real-world applications for this research?
- Why use Uniform Exploration? (Why not use another form of exploration?)