Application of Big Data Analytics to Support Homeland Security Investigations Targeting Human Smuggling Networks

Thomas Hodge


The continuous pressure from large volumes of aliens attempting to enter the country illegally creates a persistent challenge for the 20,000 Office of Border Patrol (OBP) agents, who attempt to apprehend hundreds of thousands of aliens annually.[1] Human smuggling organizations (HSOs) that facilitate the smuggling of aliens into the United States rely on unlawful networks to support their illicit transnational activities. Identifying those networks and their key facilitators is challenging because of the high volume of disparate data.

The research question for this thesis is: How can big data analytics improve the effectiveness and efficiency of Homeland Security Investigations (HSI) targeting human smuggling networks? The purpose of this thesis is to determine whether applying big data analytics to data associated with human smuggling will make the identification of human smuggling networks more efficient while producing the articulable facts needed to substantiate probable cause for subsequent investigative actions.

An experimental data analytics application called Citrus was used to examine the efficiency and effectiveness of data analytics supporting criminal investigations. Citrus is a free-for-government-use software tool designed by Sandia National Laboratories. The Department of Homeland Security Science and Technology provided this application to HSI for testing and evaluation. Citrus was built to discover, trend, and link disparate data, and it is being used as an analytics application developed and implemented by HSI Phoenix.

The research compares the results of queries conducted manually with the results of the same set of problems worked in Citrus. The comparison measures the total number of targets, network identification, whether enough documentation is obtained to justify probable cause, and the timeliness of the results. To test the efficiency and effectiveness of data analytics with Citrus, two types of tests were conducted: search and discovery.

  • The search query consists of extracting the most frequently occurring phone numbers from 45, 90, and 180 days of Arizona border patrol station reports and correlating those results with other indexes relating to financial and communication data.
  • The discovery query tests two reports created with Citrus. The quality of evidence report measures levels of probable cause against known and unknown entities within a large dataset. The co-occurrence report is designed to determine the total number of phone number pairings across thousands of OBP reports, which has led to the identification of potential human smuggling networks.
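The two query types can be illustrated with a minimal sketch. This is not how Citrus works internally, which the summary does not describe. The record layout, the phone numbers, and the function names below are all hypothetical and exist only to show the counting logic.

```python
from collections import Counter
from datetime import date, timedelta
from itertools import combinations

# Hypothetical report records: (report date, phone numbers found in that report).
# These values are invented for illustration and are not study data.
REPORTS = [
    (date(2017, 6, 1), {"555-0101", "555-0102"}),
    (date(2017, 6, 15), {"555-0101", "555-0102", "555-0103"}),
    (date(2017, 4, 20), {"555-0101", "555-0104"}),
    (date(2017, 1, 5), {"555-0104", "555-0105"}),
]


def reports_in_window(reports, as_of, days):
    """Keep only reports dated within `days` days before `as_of` (inclusive)."""
    start = as_of - timedelta(days=days)
    return [(d, phones) for d, phones in reports if start <= d <= as_of]


def top_phone_numbers(reports, as_of, days, n=3):
    """Search-style query: the n phone numbers appearing in the most reports."""
    counts = Counter()
    for _, phones in reports_in_window(reports, as_of, days):
        # Each report counts a number once; sorting keeps tie order stable.
        counts.update(sorted(phones))
    return counts.most_common(n)


def co_occurrence_counts(reports):
    """Discovery-style query: how many reports contain each pair of numbers."""
    pairs = Counter()
    for _, phones in reports:
        pairs.update(combinations(sorted(phones), 2))
    return pairs


if __name__ == "__main__":
    as_of = date(2017, 6, 30)
    for days in (45, 90, 180):
        print(days, top_phone_numbers(REPORTS, as_of, days))
    print(co_occurrence_counts(REPORTS).most_common(3))
```

Pairs that recur across many reports are the candidates an analyst would then correlate with communication and financial records.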

Efficiency is measured by comparing the time it takes to identify subjects and human smuggling networks with a search and a discovery query conducted manually versus with the Citrus application. Effectiveness is measured by comparing the number of subjects and networks identified manually versus with the Citrus application for the search and discovery queries.
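The two measures reduce to simple arithmetic, sketched below. The `QueryResult` type, the `compare` function, and the example figures are hypothetical placeholders, not results from the study.

```python
from dataclasses import dataclass


@dataclass
class QueryResult:
    """Outcome of one query run (hypothetical structure)."""
    minutes: float   # time taken to complete the query
    subjects: int    # subjects identified
    networks: int    # networks identified


def compare(manual: QueryResult, tool: QueryResult) -> dict:
    """Efficiency as a time ratio; effectiveness as differences in counts."""
    if tool.minutes <= 0:
        raise ValueError("tool.minutes must be positive")
    return {
        "speedup": manual.minutes / tool.minutes,
        "extra_subjects": tool.subjects - manual.subjects,
        "extra_networks": tool.networks - manual.networks,
    }


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    print(compare(QueryResult(120, 4, 1), QueryResult(6, 10, 3)))
```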

Whether probable cause can be established while also increasing the efficacy of human smuggling analysis is a further measure of effectiveness. Understanding the level of probable cause required to substantiate a warrant is important in targeting HSO networks, as it becomes the basis for proving human smuggling criminal violations. Probable cause exists when reasonable suspicion related to human smuggling activities can be articulated in a legal sense. For this study, probable cause exists when specific phone numbers are recorded frequently in OBP reports, are linked through communications records to other phone numbers also associated with OBP reports, and are tied to financial transactions.
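The study's working definition combines three conditions, which can be written as a simple predicate. The sketch below is illustrative only: the `PhoneEvidence` structure, the `min_reports` threshold, and the sample numbers are assumptions, because the summary does not state what counts as "frequently." It is not a legal test.

```python
from dataclasses import dataclass, field


@dataclass
class PhoneEvidence:
    """Evidence gathered for one phone number (hypothetical structure)."""
    number: str
    obp_report_count: int                              # OBP reports naming this number
    call_contacts: set = field(default_factory=set)    # numbers it communicated with
    financial_transaction_count: int = 0               # transactions tied to this number


def meets_study_criteria(evidence, numbers_in_obp_reports, min_reports=3):
    """True only when all three conditions of the working definition hold.

    min_reports is an assumed threshold, not a value taken from the study.
    """
    frequent = evidence.obp_report_count >= min_reports
    # Linked to at least one OTHER number that also appears in OBP reports.
    others = evidence.call_contacts - {evidence.number}
    linked = bool(others & numbers_in_obp_reports)
    financial = evidence.financial_transaction_count > 0
    return frequent and linked and financial


if __name__ == "__main__":
    known = {"555-0101", "555-0102"}
    sample = PhoneEvidence("555-0101", 5, {"555-0102"}, 2)
    print(meets_study_criteria(sample, known))
```

Keeping the three conditions as separate named checks makes it easy to report which element is missing for a given number.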

Big data analytics generally refers to the analysis of large volumes of data that require large computing capacity and specialized applications to yield meaningful insights.[2] The value of big data is realized when it is applied to understanding large datasets in ways that eventually lead to better decisions.[3] Big data analytics as a research topic is relatively new and has the interest of several industries, including the government.[4]

Data is growing at exponential rates.[5] Scores of authors promote the use of big data analytics and its potential value. From garnering greater insights to potentially altering the standard scientific method, big data analytics is a growing technology that has great benefits. The public sector, however, is developing the technology at a slower rate than the private sector.[6] The application of data analytics may allow large datasets to be converted into insights that result in better decisions for the public sector.

Although big data analytics presents value and opportunity for the public sector, academic literature supporting big data analytics in practice for public entities is scarce, according to Gamage.[7] Theoretical frameworks for big data analytics are also lacking. Additionally, literature that describes for managers how best to develop and integrate big data analytics is scant.[8] Moreover, a framework specifically for law enforcement or for federal investigations is nonexistent. HSI has a broad mission, and validating how analytics is applied against different criminal programmatic areas is necessary before applying analytics nationally.[9]

The results of this research demonstrate that Citrus works well for triaging large amounts of data. Citrus was markedly more efficient at sifting through voluminous reporting, communication, and financial data. Its effectiveness also proved to be substantially better when compared with the same analysis process conducted manually. Citrus was decisively more effective in the number of additional reports it surfaced and in its capability to calculate probable cause, even though the dataset was limited.

Investigative discoveries may be made more efficient and effective with data analytics. The implications for HSI are significant, particularly relating to changing analytical tradecraft, revamping data systems, and increasing investigative process capacities, as summarized below.

(1) Analytical Tradecraft

The application of data analytics may reshape analytical tradecraft. Citrus demonstrates that analysts are able to create and answer hypotheses at a deeper level, which leads to the identification of more networks. With data analytics, new forms of analytical tradecraft can be produced, as data analytics potentially creates unlimited means of reviewing and analyzing data at greater scale.

(2) Merging Data

Advancing data analytics requires HSI to remove barriers between data systems, a step that is imperative to maximizing the value of data analytics. HSI should move beyond systems designed to work well against one particular dataset and toward data aggregated from across the breadth of its systems. Revamping the current HSI systems architecture may be necessary in evolving to a more data-driven organization through analytics.

(3) Investigative Processes

With increases in efficiency through data analytics, analysis and production may outpace investigative processes. If analytics can immediately identify which entities or persons within the data already have enough probable cause associated with them, the HSI investigation process can theoretically be accelerated. This acceleration will have an impact on the judicial process, particularly on the processing capacities of the courts. Upgrading the processing capacities for obtaining warrants will become vital as analytics becomes more prevalent.

In summary, HSI can be more effective and efficient in investigating and targeting criminal networks with data analytics. Exploring and investing further in the technology should be a high priority, as data analytics offers HSI enormous potential.


[1] “Stats and Summaries,” U.S. Customs and Border Protection, 1, accessed August 21, 2017, https://

[2] Emerging Technologies Big Data Community of Interest, HM Government Horizon Scanning Programme Emerging Technologies: Big Data (London: HM Government, 2014), 2,

[3] Amir Gandomi and Murtaza Haider, “Beyond the Hype: Big Data Concepts, Methods, and Analytics,” International Journal of Information Management 35, no. 2 (April 2015): 140, https://

[4] Jonathan Seddon and Wendy L. Currie, “A Model for Unpacking Big Data Analytics in High-Frequency Trading,” Journal of Business Research 70, no. C (2017): 301.

[5] Ase Dragland, “Big Data, for Better or Worse: 90% of World’s Data Generated over Last Two Years,” ScienceDaily, 1, May 22, 2013,

[6] Pandula Gamage, “New Development: Leveraging ‘Big Data’ Analytics in the Public Sector,” Public Money & Management 36, no. 5 (2016): 385.

[7] Gamage, 385.

[8] Gerard George, Martine R. Haas, and Alex Pentland, “Big Data and Management,” Academy of Management Journal 57, no. 2 (2014): 321.

[9] “Homeland Security Investigations,” U.S. Immigration and Customs Enforcement, 1, accessed October 15, 2017,
