
Backstory
My world was upturned in Dec 2019 when I found out my niece, Tylee Ryan, and her little brother, JJ Vallow, were missing. Keith Morrison from Dateline described my family’s case as one of the darkest he’s ever covered.
One thing that makes this criminal case unique, in all the worst ways, is its complexity. There were so many players, combined with complicated blended families, name similarities (e.g., Melanie Gibb vs. Melani Boudreaux), and a lexicon wholly unique to this case (e.g., ‘zombie’, ‘translated beings’, ‘seven gatherers’, ‘casting’). I started with a timeline of the case, but I went back to school to study data science specifically to gain the skills I needed to create resources that were less analog than a timeline (though I do have it on good authority that the FBI used my timeline in their research of the case).
Now that we’re three trials in—with one more to go before we start round ✌️ with appeals—I wanted to create an app that law enforcement, reporters, documentarians, and amateur sleuths could use to search across these trials. Originally I built it for myself and was running it locally on my computer, but after talking to a reporter who was having a difficult time tracking down facts for a podcast series, I decided to make it public.
I launched the Daybell Case App last week following Lori Vallow Daybell’s trial in Arizona, where she was found guilty of conspiracy in the death of her fourth husband, Charles Vallow. He was shot and killed July 11, 2019. I attended that trial and spent it building the app.

Why Natural Language Processing
I started using natural language processing (NLP) in client dashboards years ago—before generative AI became all the rage—because of its ability to extend the usefulness of a search field. Traditional search fields rely on exact matches, forcing users to know the precise terms or filters the data requires. NLP eliminates this friction by interpreting the user’s intent, not just the literal words they type. This allows dashboards to handle synonyms, related concepts, misspellings, and even natural language queries like questions or phrases.
For example, if a user searches for URLs or page titles containing ‘lipstick’, NLP can apply lemmatization to also match variations like ‘lipsticks’, ensuring plural forms aren’t overlooked. More powerfully, synonym recognition can expand the search to include related products such as ‘lip stain’, ‘chapstick’, ‘lip gloss’, or ‘lip liner’. This means users don’t need to guess the exact terminology used in URLs or page titles—NLP bridges that gap by mapping user language to the site’s actual content structure. Even broader queries like ‘best products for dry lips’ can be interpreted to surface relevant pages, whether they mention ‘lip balm’ or ‘hydrating lip treatments’.
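To make that concrete, here is a minimal sketch of the idea using NLTK’s WordNet lemmatizer plus a hand-maintained synonym map. The product groupings below are illustrative only, not pulled from any real catalog, and the function name is a placeholder.

```python
# A minimal sketch: lemmatize the query, then expand it with a
# hand-maintained synonym map. Requires the WordNet corpus
# (nltk.download('wordnet')).
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

LIP_SYNONYMS = {  # hypothetical synonym map for the lipstick example
    "lipstick": ["lip stain", "chapstick", "lip gloss", "lip liner"],
}

def expand_query(term: str) -> list[str]:
    base = lemmatizer.lemmatize(term.lower())   # 'lipsticks' -> 'lipstick'
    return [base] + LIP_SYNONYMS.get(base, [])

print(expand_query("Lipsticks"))
# ['lipstick', 'lip stain', 'chapstick', 'lip gloss', 'lip liner']
```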
The result is a far more intuitive, flexible way to explore e-commerce content through dashboards, giving users richer, more relevant search results without the need for precision-perfect queries. By turning static search bars into intelligent content discovery tools, NLP makes dashboards exponentially more useful for both casual browsing and targeted research.
How I Incorporated NLP in the App
I added a toggle (‘Include related keywords’) that allows a user to expand search results to include different tenses of verbs, plural/singular versions of a search term, and lemmatized versions of a word.
The app handles two categories of phrase matching:
- NLTK’s dictionaries: This category applies NLP to keywords using the dictionaries bundled with NLTK, the Python NLP library I used.
- Case-specific dictionaries: If a keyword matches any of my custom dictionaries, the app uses the custom dictionary instead of NLTK (see the sketch after this list).
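In code, that precedence rule looks roughly like the sketch below. This is not the app’s actual implementation; the helper name and the ckg argument are placeholders, and the fallback simply pulls lemma names from WordNet via NLTK.

```python
# A rough sketch of the precedence rule: case-specific dictionaries win,
# otherwise fall back to NLTK's WordNet.
from nltk.corpus import wordnet

def related_terms(term: str, ckg: dict[str, list[str]]) -> set[str]:
    term = term.lower()
    # Case-specific dictionaries take precedence over NLTK's
    for canonical, synonyms in ckg.items():
        group = {canonical.lower(), *(s.lower() for s in synonyms)}
        if term in group:
            return group
    # Otherwise fall back to WordNet's general-purpose relations
    return {
        lemma.name().replace("_", " ")
        for synset in wordnet.synsets(term)
        for lemma in synset.lemmas()
    } or {term}
```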
NLTK’s Dictionaries
In the example below, a search for the keyword ‘fight’ doesn’t just return articles containing that specific word; it also includes related terms like ‘fighting’ and ‘struggle’. Behind the scenes, the app leverages NLP techniques such as lemmatization to account for variations like ‘fighting’ and synonym mapping to surface conceptually similar terms like ‘struggle’. This ensures users can discover relevant content even if different language is used in the articles, reducing the need to guess the exact phrasing. By intelligently interpreting search intent, the app delivers a more comprehensive and intuitive search experience for complex trial coverage.


Case-Specific Dictionary
I created a custom dictionary that is specific to this grisly case. For example, these cult members believed each of the victims pictured on the home screen (as well as others) had been taken over by evil spirits. But they didn’t just refer to them as evil spirits; they also used terms like ‘demon’ and ‘zombie’—and even gave these evil spirits names, like ‘Ned’, ‘Hiplos’, ‘Hillary’, ‘Elroy’, and ‘Viola’.
What I wanted was to be able to return matches for all of these terms if a user searched for any one of them. Under the hood, this is what that looked like.

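In rough terms, the structure is a Python dictionary that maps a canonical term to its accepted synonyms. Here is a simplified, partial sketch; the real dictionary is larger and the grouping shown is illustrative.

```python
# custom_keyword_groups.py -- a simplified sketch of the case-specific
# dictionary described above; the grouping here is illustrative only.
custom_keyword_groups = {
    "zombie": [
        "zombies", "demon", "demons", "evil spirit", "evil spirits",
        "ned", "hiplos", "hillary", "elroy", "viola",
    ],
    # ...additional case-specific groups...
}
```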
When I imported this custom_keyword_groups dictionary into my app.py file (a common naming convention for the file that serves as a Python app’s engine), I assigned it an alias of ‘ckg’ for ease of reference and wrote a loop that cycles through the dictionary to gather the acceptable synonyms.

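A minimal sketch of that import-and-loop pattern follows. Only the ckg alias and the use of lower() come from the description here; the function name and loop body are illustrative.

```python
# app.py -- a sketch of the import-and-loop logic described above.
from custom_keyword_groups import custom_keyword_groups as ckg

def gather_synonyms(search_term: str) -> set[str]:
    term = search_term.lower()   # lower() maximizes matching potential
    matches = {term}
    # Cycle through the custom dictionary to gather acceptable synonyms
    for canonical, synonyms in ckg.items():
        group = {canonical.lower(), *(s.lower() for s in synonyms)}
        if term in group:
            matches |= group
    return matches
```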
Note: The lower() method converts search terms to lowercase to maximize matching potential.
Now if a user searches for a term included in one of these custom dictionaries (which are detailed in the app’s Lexicon page), all of the related terms will be highlighted on the transcript page they open.
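Under the hood, that highlighting step can be as simple as wrapping every related term found in the transcript in a tag the page can style and jump between. This is a hypothetical sketch, not the app’s actual code.

```python
# A hypothetical sketch of the highlighting step: wrap each related term
# found in the transcript text in <mark> tags.
import re

def highlight(transcript_text: str, terms: set[str]) -> str:
    # Longest terms first so multi-word phrases win over their substrings
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub(lambda m: f"<mark>{m.group(0)}</mark>", transcript_text)
```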

Note: The arrows in the bottom-right corner allow a user to quickly cycle through the matches as these transcripts are quite long. A user with access to a keyboard can also use their left and right arrow keys.
And because I curate a timeline for the case, I also included what I call one-to-many match lists. So if a user searches ‘all months’, for example, the app will return matches for all dates included in the app.

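The ‘all months’ list is essentially a one-to-many mapping. A sketch of what such a list might look like:

```python
# A sketch of a one-to-many match list: searching 'all months' expands to
# every month name, so any dated entry in the app becomes a match.
one_to_many_groups = {
    "all months": [
        "january", "february", "march", "april", "may", "june",
        "july", "august", "september", "october", "november", "december",
    ],
}
```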
I have similar lists for ‘group members’, ‘kids’ (which includes the names of the four kids who were identified as zombies as well as general terms like ‘kid’, ‘child’, ‘kids’, ‘children’, etc.), ‘days of the week’, and ‘social media’.
Why These Applications of NLP
There are many ways NLP can be incorporated into an app’s search field. I wanted to find related keywords and keywords that are related by root term (e.g., ‘run’, ‘running’, ‘ran’). That required three steps:
Perform Part-of-Speech (POS) Tagging
Part-of-speech tagging is the process of figuring out how a word is being used in a sentence, i.e., whether it’s a noun, verb, adjective, or adverb. This is important because many words can take on different meanings depending on their role. For example, the word ‘fight’ could be a noun (‘a fight broke out’) or a verb (‘they fight daily’). In the context of search, knowing the correct part of speech helps the app find more relevant related terms. The app uses POS categories (like noun and verb) when querying related words to avoid mismatched words that don’t fit the user’s intent.
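Here is a minimal sketch of how NLTK’s tagger distinguishes the two uses of ‘fight’; it requires the punkt and averaged_perceptron_tagger data packages.

```python
# A minimal POS-tagging sketch with NLTK. Requires the 'punkt' and
# 'averaged_perceptron_tagger' data packages (via nltk.download).
import nltk

print(nltk.pos_tag(nltk.word_tokenize("A fight broke out")))
# 'fight' is typically tagged NN (noun) here
print(nltk.pos_tag(nltk.word_tokenize("They fight daily")))
# 'fight' is typically tagged VBP (verb) here
```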
Set Up WordNet Synonym and Related Word Lookup
WordNet is a huge database of English words that groups similar words into collections called synsets. These synsets capture relationships between words, including synonyms, broader or narrower terms, and even derivationally related forms.
When the app receives a search term, it queries WordNet for related words that match the correct part of speech. For instance, searching for ‘fight’ as a verb might return related words like ‘struggle’, ‘brawl’, or ‘combat’. This process expands the search beyond exact matches, allowing users to find content even when different wording is used. WordNet adds depth and flexibility to a search field by teaching the app how words are connected in meaning.
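A sketch of that lookup, restricted here to the verb senses of ‘fight’; the function name is illustrative, and the WordNet corpus must be downloaded.

```python
# A sketch of a WordNet lookup restricted to one part of speech.
# Requires the WordNet corpus: nltk.download('wordnet').
from nltk.corpus import wordnet

def wordnet_related(term: str, pos: str = wordnet.VERB) -> set[str]:
    related = set()
    for synset in wordnet.synsets(term, pos=pos):
        for lemma in synset.lemmas():
            related.add(lemma.name().replace("_", " "))
    return related

print(wordnet_related("fight"))
# includes 'struggle', 'contend', 'fight back', 'crusade', among others
```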
Include Stemming to Broaden Matches
The SnowballStemmer is a tool that simplifies words down to their root form, helping the app recognize different variations of the same word. For example, ‘fights’, ‘fighting’, and ‘fought’ are all variations of the word ‘fight’ (though, as the sketch below shows, a rule-based stemmer only catches regular forms like ‘fights’ and ‘fighting’; irregular forms like ‘fought’ keep their own stem). Rather than storing every possible form manually, the stemmer algorithm strips words down to a shared core (the ‘stem’) so they can be treated as equivalent in search.
Stemming is especially useful in content-heavy dashboards where articles might use different tenses or plural forms. By applying stemming, the app ensures that a search for ‘fight’ still returns content containing ‘fights’ or ‘fighting’, giving users more complete and accurate results.
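A quick sketch of the stemmer in action:

```python
# A minimal stemming sketch: regular inflections collapse to a shared
# stem, while the irregular past tense 'fought' keeps its own.
from nltk.stem import SnowballStemmer

stemmer = SnowballStemmer("english")
print([stemmer.stem(w) for w in ["fight", "fights", "fighting", "fought"]])
# ['fight', 'fight', 'fight', 'fought']
```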
Not Just for Criminal Science
Although I see more and more examples of AI being used in criminal justice (I’ve collected examples in my AI Timeline app), this technology isn’t limited to crime fighting. Apps like the one I built here could be used by any organization that is bogged down with large corpora of text (think HR docs, legal filings, university courses, etc.).
If money had been no object, I could have made the app infinitely more useful by adding the thousands of pages of documents released in compliance with the Freedom of Information Act (frequently referred to as ‘FOIA docs’), transcripts I’ve gathered from the many news magazines that have featured the case, my timeline, transcripts from the trials, etc. Now imagine adding a chat component to the app, visualizations, and machine learning capabilities such as topic modeling. (Learn more about topic modeling in my free, interactive Machine Learning Model Picker.)