SIAM News Blog

Green Security Games Along Trails

By Nicolas Betancourt and Mauricio Velasco

Mitigating the negative impacts of the current climate crisis and the ongoing loss of Earth’s biodiversity is a compelling and important effort. However, the associated political, economic, and technical dimensions pose an unprecedented challenge. An essential avenue in the fight against climate change and the preservation of Earth’s biodiversity involves the protection of large natural areas, which shelter a wide variety of species and serve as carbon sinks and water reservoirs. Unfortunately, protected natural areas around the world are under constant threat from poaching, illegal logging, mining, and the wildlife trade. As such, the individuals who take care of these areas must optimally allocate a limited number of rangers and other resources across extensive tracts of land.

Figure 1. Graph of trails in Ecuador’s Jama-Coaque Ecological Reserve. Figure courtesy of Third Millennium Alliance.
Computer scientists have made important contributions to such problems in the research field of Green Security Games. The field’s main objective is to design decision rules for rangers in protected areas so that surveillance occurs in the most critical spots [2]. Although numerous techniques exist for solving such problems, the main output of these algorithms is the construction of sequential patrolling routes within the reserve.

Much of the work in this realm has focused on open spaces like the African savannah, which requires a discretization of the protected area. Yet such approaches do not capture the reality of South American parks, wherein the dense vegetation and rugged terrain mean that both rangers and poachers can only travel on a limited collection of available trails (i.e., the graph in Figure 1). In joint work with the Third Millennium Alliance—a non-governmental organization that is devoted to the conservation of the last remaining acres of the formerly immense Ecuadorian Chocó—we adapted existing ideas [2, 4] to advise the rangers of the Jama-Coaque Ecological Reserve as to what routes they should follow each day. Our methods combine ideas from reinforcement learning and combinatorial optimization, learn from previous experiences, and adapt routes to increase the frequency at which rangers thwart environmental crimes [3].

On a daily basis, every available ranger must visit some of the trails to look for illegal activity, and all of the feasible routes must start and end at a shelter within the reserve. In our problem, a patrolling agenda is a choice of one of the many feasible cycle-shaped routes along the graph of trails in the reserve; the goal is to build a decision rule that sequentially selects a route for patrollers to follow (see Figure 2).

Figure 2. A feasible route that starts and ends at the starred node (a bamboo house) in the graph of trails. Figure courtesy of Nicolas Betancourt.
To achieve that goal, we begin with a foundational method of Green Security Games that involves fixing a probability distribution on the set of patrolling agendas and sampling from it at the beginning of each patrolling stage. The underlying assumption is that poachers are strategically sophisticated actors who will spy on the rangers and learn whichever probability distribution they have previously used. The poachers will then choose the routes that are most likely to avoid the rangers. If the rangers grant the poachers this informational advantage, they can choose a distribution that forces the poachers to adopt the least convenient optimal response: the rangers fix the probability distribution that maximizes the expected number of encounters with poachers under the constraint that the adversaries respond in a fully strategic manner.
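In schematic form (using our own notation, not the article's), the rangers solve the max-min problem

\[\max_{p \in \Delta(\mathcal{R})} \ \min_{\rho \in \mathcal{P}} \ \mathbb{E}_{R \sim p}\big[\#\{\textrm{encounters between } R \textrm{ and } \rho\}\big],\]

where \(\mathcal{R}\) is the set of feasible ranger routes, \(\Delta(\mathcal{R})\) is the set of probability distributions over those routes, and \(\mathcal{P}\) is the set of routes available to the poacher.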

However, this game-theoretical approach—known as a Stackelberg game—presents two serious difficulties. First, the assumptions about the poachers might not be true, which would yield overly conservative and poorly performing suggested routes. And second, optimizing over the space of probability distributions on the cycles of a given graph of trails is equivalent to solving a mixed-integer quadratic problem on the circuit polytope of this graph; in turn, the latter is similar to solving a large instance of the traveling salesman problem and is thus very computationally intensive.

Our proposal focuses on a second method that is computationally more practical, makes weaker—and in our view, more realistic—assumptions about the poachers’ behavior, and applies tools from reinforcement learning [4]. Suppose that we have a collection of independent Bernoulli random variables indexed by the trails in the reserve; each variable models the event of identifying illegal activity on its trail. Provided that we know the success probabilities, every available ranger should pick the circuit-shaped route that maximizes the expected number of encounters with the poacher (see Figure 3). Since the team of rangers is attempting to cover as many trails as possible using only circuits that pass through the shelter node, the resulting optimization problem closely resembles an instance of a combinatorial optimization problem called the probabilistic maximum coverage problem [1].

Figure 3. The heat map of success probabilities on each trail determines a unique optimal route. Figure courtesy of Nicolas Betancourt.

More specifically, suppose that there are \(B\) available rangers at the reserve, each of whom will choose one feasible route. Let \(C_G  \in \mathbb{R}^{n\times M}\) (where \(n\) is the number of trails and \(M\) is the number of circuits) be the cycle matrix of the graph of trails \(G\), i.e., the matrix whose \(j\)th column \(C^j \in \{0,1\}^n\) is the indicator vector of the \(j\)th circuit:

\[C_i^j=\begin{cases} 1 & \textrm{Circuit }j \textrm{ crosses the trail } i\\ 0 & \textrm{Otherwise}  \end{cases}.\]
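As a concrete (and admittedly brute-force) illustration of how such a matrix can be built, the following Python sketch enumerates the simple circuits through a designated shelter node of a small, simple trail graph and assembles \(C_G\). The adjacency-dictionary input and the function name cycle_matrix are our own choices rather than the authors' code, and the exhaustive enumeration is only practical for modest graphs with comparable node labels.

import numpy as np

def cycle_matrix(adj, shelter):
    # adj: dict mapping each node to the set of its neighbors (undirected, simple graph).
    # Returns the 0/1 cycle matrix C (trails x circuits) and the list of trail edges.
    edges = sorted({frozenset((u, v)) for u in adj for v in adj[u]},
                   key=lambda e: tuple(sorted(e)))
    index = {e: i for i, e in enumerate(edges)}
    circuits = set()

    def dfs(node, visited, path_edges):
        for nxt in adj[node]:
            e = frozenset((node, nxt))
            if e in path_edges:
                continue                        # never reuse a trail within one route
            if nxt == shelter and len(path_edges) >= 2:
                circuits.add(path_edges | {e})  # closed a circuit back at the shelter
            elif nxt not in visited:
                dfs(nxt, visited | {nxt}, path_edges | {e})

    dfs(shelter, {shelter}, frozenset())        # both traversal directions yield the same edge set
    C = np.zeros((len(edges), len(circuits)), dtype=int)
    for j, circ in enumerate(circuits):
        for e in circ:
            C[index[e], j] = 1
    return C, edges

Since the trail network of a reserve is fixed, such an enumeration would only need to run once.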

With this notation, we can describe the rangers’ problem as choosing the \(B\) columns of \(C_G\) that maximize the covered terrain while accounting for the probability of finding a poacher. If \(t_i\) is the Bernoulli random variable that corresponds to the \(i\)th trail with success probability \(\mu_i\), then the number of patrolled trails on which illegal activity is encountered is given by

\[U_G(t, \eta)= \sum_{j=1}^M \eta_j \sum_{i=1}^n t_{i} C_i^j= t^T C_G \eta.\]

Here, \(\eta\in \{0,1\}^M\) is the choice of the team of rangers

\[\eta_j=\begin{cases} 1 & \textrm{Some ranger picked circuit }j \\ 0 & \textrm{Otherwise}  \end{cases}.\]

Therefore, the expected number of encounters with illegal activity (given the choice \(\eta\)) is

\[r_\mu(\eta):= \mathbb{E}[ U_G(t, \eta)]=\sum_{j=1}^M \eta_j \sum_{i=1}^n \mu_{i} C_i^j.\]

One could charge a penalty of \(\varepsilon\) for each visited trail on which no illegal activity was found, which regularizes the objective; the rangers then seek the choice \(\eta\) that maximizes the penalized expected number of encounters with illegal activity:

\[r_\mu(\eta)= \sum_{j=1}^M \eta_j \sum_{i=1}^n \Big[\mu_{i} C_i^j- \varepsilon C_i^j(1-\mu_{i})\Big].\]
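Regrouping this sum trail by trail and writing \(y_i:=\sum_{j=1}^M C_i^j \eta_j\) for the number of chosen circuits that cover trail \(i\), the penalized reward becomes

\[r_\mu(\eta)= \sum_{i=1}^n \big[(1+\varepsilon)\mu_i - \varepsilon\big] y_i = (1+\varepsilon)\sum_{i=1}^n \mu_i y_i - \varepsilon \lVert C_G \eta \rVert_1,\]

since every \(y_i\) is nonnegative; this is exactly the objective that appears in the program below.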

If the rangers know the vector of probabilities \(\mu\), then finding the choice \(\eta\) that maximizes \(r_\mu(\eta)\) is equivalent to solving the following mixed-integer linear program (MILP):

\[(PG)_\mu \begin{cases}
    \underset{ \eta \in \mathbb{R}^M, y \in \mathbb{R}^n } \max &(1+\varepsilon)\sum_{i=1}^n \mu_i y_i -\varepsilon \lVert C_G \eta \rVert_1 \\
    &\text{ s.t.:}  \\
    & \sum_{j=1}^M \eta_j=B \\
    & y_i=\sum_{j=1}^M C_{i}^j\eta_j \quad i=1,\dots,n \\
    & \eta_j \in \{0,1\}, \quad y_i \in \{0,1,\dots,B\}
    \end{cases}. \tag1\]

This formula’s resemblance to a maximum coverage problem (MCP) suggests the implementation of randomized rounding algorithms to efficiently find approximate solutions. But \((1)\) has a computational advantage over MCP because its relaxation without the integrality constraints is in fact equivalent to the original MILP.
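For readers who wish to experiment, here is a minimal sketch of \((PG)_\mu\) written with the PuLP modeling library in Python. The cycle matrix C, probability vector mu, number of rangers B, and penalty eps correspond to the quantities defined above; the function name solve_pg is our own, and this is an illustration under these assumptions rather than the authors' implementation.

import numpy as np
import pulp

def solve_pg(C, mu, B, eps=0.1):
    # C: (n x M) 0/1 cycle matrix, mu: length-n probability estimates,
    # B: number of rangers, eps: penalty for fruitless visits.
    n, M = C.shape
    prob = pulp.LpProblem("PG_mu", pulp.LpMaximize)
    eta = [pulp.LpVariable(f"eta_{j}", cat="Binary") for j in range(M)]
    y = [pulp.LpVariable(f"y_{i}", lowBound=0, upBound=B, cat="Integer") for i in range(n)]
    # Objective: (1 + eps) * sum_i mu_i * y_i - eps * sum_i y_i
    # (the l1 norm of C_G eta equals sum_i y_i because every y_i is nonnegative).
    prob += pulp.lpSum(float((1 + eps) * mu[i] - eps) * y[i] for i in range(n))
    prob += pulp.lpSum(eta) == B                   # exactly B circuits are chosen
    for i in range(n):                             # y_i counts how many chosen circuits cover trail i
        prob += y[i] == pulp.lpSum(int(C[i, j]) * eta[j] for j in range(M))
    prob.solve()
    return np.array([pulp.value(v) for v in eta])  # 0/1 vector of selected circuits

Per the observation above, declaring the variables as continuous instead of integer should yield the same optimum.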

In practice, however, the rangers do not know the probabilities of illegal activity along the various trails; while we can estimate these probabilities from historical data, they may change gradually over time as a result of the poachers’ strategic behavior. We therefore implement a combinatorial upper confidence bound (CUCB) algorithm that allows us to sequentially choose patrolling routes and gather new information to improve future decisions [1]. In CUCB, users keep a historical record of two quantities for each trail: (i) the number of times that the trail has been patrolled, \(T_i\), and (ii) the fraction of those patrols during which illegal activity was detected, \(\overline{X}_i\). Using these quantities, we estimate the probability \(\mu_i\) with the CUCB rule

\[\overline{\mu}_i= \overline{X}_i+ \sqrt{\frac{\ln(k)}{T_i}},\]

where \(k\) is the number of elapsed patrolling stages. The estimation vector \(\overline{\mu}\) then becomes the input to the problem \((PG)_{\overline{\mu}}\). The ranger follows the resulting optimal route and updates the variables \(T_i\) and \(\overline{X}_i\) for each visited trail. This way of estimating the probabilities guarantees that the algorithm gives more weight to trails that have either been visited only a few times or have a high observed frequency of illegal activity. A second advantage of CUCB is that we can pre-train it with historical data or start it with no initial data at all. Theoretical guarantees (known as regret bounds) show that when we implement CUCB for this problem, the expected number of times that rangers patrol a suboptimal route grows at the lowest possible asymptotic rate.
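To make the procedure concrete, here is a minimal, simplified sketch of the resulting patrolling loop (our own rendering, not the authors' code). It reuses the solve_pg routine sketched above and assumes a hypothetical observe callback that reports, for each trail patrolled on a given day, whether illegal activity was found there.

import numpy as np

def cucb_patrol(C, B, num_stages, observe, eps=0.1):
    # C: (n x M) cycle matrix; observe(covered) -> length-n 0/1 array of detections,
    # nonzero only on trails that were actually patrolled (covered[i] == True).
    n, _ = C.shape
    T = np.zeros(n)        # number of times each trail has been patrolled
    X_bar = np.zeros(n)    # empirical detection frequency on each trail
    for k in range(1, num_stages + 1):
        # Optimistic CUCB estimates; never-visited trails get the maximal estimate 1.
        bonus = np.sqrt(np.log(k) / np.maximum(T, 1))
        mu_bar = np.where(T > 0, np.minimum(X_bar + bonus, 1.0), 1.0)
        eta = solve_pg(C, mu_bar, B, eps)       # today's B routes
        covered = (C @ eta) > 0                 # trails visited today
        hits = observe(covered)                 # which visited trails had illegal activity
        # Update running statistics only for the visited trails.
        X_bar[covered] = (X_bar[covered] * T[covered] + hits[covered]) / (T[covered] + 1)
        T[covered] += 1
    return T, X_bar

Keeping never-visited trails maximally attractive mirrors the exploration bonus in the CUCB rule above: routes through unexplored terrain are tried early, and the estimates sharpen as data accumulate.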

Animation 1. For this computational demonstration, we used an actual map of trails in the Jama-Coaque Ecological Reserve in Ecuador. We simulated a poacher who attempts to hunt and log within the reserve by following one of three routes, each starting from a node on the map’s boundary; the poacher picks among these routes (shown in red on the lower portion of the animation) with uniform probability. A single ranger \((B=1)\) watches over the reserve with no historical information; the ranger does not know which trails belong to the poacher’s routes or how frequently the poacher visits them. The optimal route, assuming knowledge of the poacher's activity, is in the upper left corner. In the upper middle, a heat map shows the number of times that the ranger has visited every trail at the given stage of the combinatorial upper confidence bound algorithm; brighter colors correspond to a higher number of visits. The upper right corner depicts the routes followed by the ranger (green) and the poacher (red), as well as the trails on which they meet (purple). Animation courtesy of Nicolas Betancourt.

Animation 1 shows that after several patrolling stages, our algorithm helps the ranger select the true optimal route more frequently than any other route. The three boxes in the bottom row of the video display the circuits that a hypothetical poacher follows, each chosen with uniform probability. The ranger knows neither the frequency at which the poacher visits the trails nor the specific trails that he visits. Given the probability distribution of this poacher, the optimal ranger route is outlined in green in the upper left corner. The center box in the top row provides a heat map of the number of times that the ranger has visited every trail so far, with brighter colors corresponding to a higher number of visits. The upper right corner depicts the ranger’s routes in green and the poacher’s routes in red; trails where the two coincide are in purple.

The number of protected areas around the world is expanding, but the personnel allocated to monitor and safeguard these spaces are often insufficient. This limitation poses a challenge to the people in charge of these areas, who must optimally allocate patrolling resources across extensive tracts of land and are often at a great disadvantage against the attackers. Our online learning approach to decision support for the patrolling of extensive tracts of protected land yielded an algorithm that accounts for a vast combinatorial structure of alternatives along with statistical features that describe the history of illegal activity. We hope that the early outcomes of our research can be part of a solution to the ongoing problem of monitoring large protected areas with limited patrolling resources.


Nicolas Betancourt presented this research during a contributed presentation at the 2022 SIAM Conference on Mathematics of Planet Earth, which took place concurrently with the 2022 SIAM Annual Meeting in Pittsburgh, Pa., this July.

References
[1] Chen, W., Wang, Y., & Yuan, Y. (2013). Combinatorial multi-armed bandit: General framework and applications. In S. Dasgupta & D. McAllester (Eds.), Proceedings of the 30th international conference on machine learning (Vol. 28) (pp. 151-159). Atlanta, GA.
[2] Kar, D., Nguyen, T.H., Fang, F., Brown, M., Sinha, A., Tambe, M., & Jiang, A.X. (2018). Trends and applications in Stackelberg security games. In T. Başar & G. Zaccour (Eds.), Handbook of dynamic game theory (pp. 1223-1269). Cham, Switzerland: Springer.
[3] Velasco, M., & Betancourt, N. (2022). Green security games in graphs. In preparation.
[4] Xu, L., Bondi, E., Fang, F., Perrault, A., Wang, K., & Tambe, M. (2020). Dual-mandate patrols: Multi-armed bandits for green security. Preprint, arXiv:2009.06560.

  Nicolas Betancourt is a master's student at the Universidad de los Andes in Colombia. His interests lie within the interactions of mathematics and computational methods with sustainability, especially natural resource management.  
  Mauricio Velasco is an associate professor at the Universidad de los Andes and an associate editor for the SIAM Journal on Applied Algebra and Geometry. His research work focuses on algebraic methods in optimization, coding theory, and machine learning. 

 
