
When building predictive models, raw variables (i.e., columns from your dataset) rarely tell the whole story. That’s where feature engineering comes in. It’s simply the process of transforming your existing variables into more meaningful features that help your model identify patterns it would otherwise miss.
This lesson was reinforced for me while using XGBoost to help an energy client predict which customer complaints could be resolved without sending out technicians. Instead of relying on raw data like exact timestamps and account age, I converted select features into a variety of flags (True/False or Yes/No values) or bins. The result? Some of these engineered features outperformed the original variables, and one of the contrived columns ended up being the strongest predictor for one of our models.
Feature engineering helps whether your model is already performing well (by making it more robust in production) or struggling to find the signal in noisy data. I’ll even go so far as to say it’s a good arrow to add to your quiver whether you’re running predictive analytics or not. For example, say stakeholders want to know if their hip, youth-oriented ads are generating enough yeet at the register. They may want to distill their age column down to two groups: under 30 vs. 30 and over.
Conversions from Current Project
One conversion I made was rolling up customer call dates into seasons. For complaints about high bills, the season variable ended up being the strongest predictor of cases that could safely be closed out (after communicating the update to the customer, of course).
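If you’re curious what that roll-up looks like in code, here’s a minimal pandas sketch. The column names are placeholders, not the client’s actual schema:

import pandas as pd

# Hypothetical call records (the real data came from a SQL query)
df = pd.DataFrame({
    'call_date': pd.to_datetime(['2024-01-15', '2024-07-04', '2024-10-31'])
})

# Roll each call's month up into a season
season_map = {12: 'Winter', 1: 'Winter', 2: 'Winter',
              3: 'Spring', 4: 'Spring', 5: 'Spring',
              6: 'Summer', 7: 'Summer', 8: 'Summer',
              9: 'Fall', 10: 'Fall', 11: 'Fall'}
df['Season'] = df['call_date'].dt.month.map(season_map)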
Another conversion I performed was a boolean flag for new accounts, on the assumption that newer customers might be less aware of seasonal spikes in utility costs. I used the date of the call as the reference point and returned True if the account was less than 12 months old and False if it wasn’t.
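Continuing the sketch above, the flag itself is a one-line date comparison (column names are still illustrative):

# Hypothetical account start dates for the same calls
df['account_start'] = pd.to_datetime(['2023-09-01', '2018-03-20', '2024-06-10'])

# True if the account was under 12 months old at the time of the call
account_age_days = (df['call_date'] - df['account_start']).dt.days
df['Is_New_Account'] = account_age_days < 365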
In addition to the new customer boolean flag, I created bins for account age (e.g., < 1 yr, 1-2 yr, 3-5 yr, etc.) because who knows? Maybe one of those values is particularly predictive. (One was.) A coworker and I pulled together a monster of a SQL query with a host of built-in and contrived variables. The model was solid right out of the gate, but we intend to improve it after getting through the backlog.
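For the bins themselves, pd.cut does the heavy lifting. The cut points below are illustrative; pick whatever bucket edges make sense for your data:

# Bin account age into buckets like the ones mentioned above
account_age_years = account_age_days / 365.25
df['Account_Age_Bin'] = pd.cut(account_age_years,
                               bins=[0, 1, 2, 5, float('inf')],
                               labels=['< 1 yr', '1-2 yr', '2-5 yr', '5+ yr'])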

Note: You can learn what each of these evaluation benchmarks means in the app I made using the Titanic dataset by hovering over the info icons.
Examples from Web Analytics Data
I rarely work with analytics data these days, but here are example features you can create from Google Analytics, Adobe Analytics, Chartbeat, Mixpanel, etc. (a few of these are sketched in code after the lists).
Device Info
- Is_Apple_and_Mobile: mobile traffic from Apple devices
- Is_Desktop: desktop-only flag
Traffic Source
- Is_Direct_Traffic: brand-aware visitors who typed your URL
- Is_Paid_Traffic: visitors from paid campaigns (cpc, ppc, cpm)
- Is_Organic_Traffic: search engine visitors
- Is_Email_Traffic: visitors from emails
Geographic
- Is_Major_Metro: visitors from top X cities
- Is_Target_Country_Traffic: domestic vs international
- Is_English_Speaking: language barrier indicator
Datetime
- Is_Weekend: Saturday/Sunday vs weekdays
- Is_Business_Hours: 9-5 weekday hours
- Is_Morning_Shopper: 6am-12pm visitors
- Is_Evening_Shopper: 6pm-11pm visitors
- Is_Holiday_Season: the date ranges for holidays significant to your business/visitors
Content
- Is_Homepage_Landing: entered via homepage
- Is_Product_Page_Landing: entered via product page
- Is_Sale_Page_Landing: entered via promotional page
- Is_Cart_Page_View: viewed any of your cart pages during the session
Engagement
- Is_Bounced: single-page sessions
- Is_Long_Session: sessions over 3 minutes
- Is_Multi_Product_Viewer: viewed multiple products
- Has_Made_Purchase: converting visitors
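Most of these flags are one-line comparisons once you have session-level data exported. A hedged sketch, since the actual column names depend on your analytics tool:

import pandas as pd

# Illustrative session-level export; real column names vary by tool
sessions = pd.DataFrame({
    'device_category': ['mobile', 'desktop', 'tablet'],
    'medium': ['cpc', 'organic', 'email'],
    'pageviews': [1, 5, 3],
})

sessions['Is_Desktop'] = sessions['device_category'] == 'desktop'
sessions['Is_Paid_Traffic'] = sessions['medium'].isin(['cpc', 'ppc', 'cpm'])
sessions['Is_Organic_Traffic'] = sessions['medium'] == 'organic'
sessions['Is_Bounced'] = sessions['pageviews'] == 1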
Numeric
This technique isn’t limited to categorical data. You can create powerful features from continuous variables by applying thresholds (see the sketch after this list).
- Session duration: Is_Long_Session (>180 seconds)
- Transaction revenue: Is_High_Value_Customer (>$100)
- Page views: Is_High_Engagement (>3 pages)
- Product views: Is_Multi_Product_Viewer (>1 product)
- Day of month: Is_End_Of_Month (≥26th)
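Here’s what those threshold flags might look like, with made-up cut points and column names:

import pandas as pd

sessions = pd.DataFrame({
    'session_duration': [45, 320, 610],  # seconds
    'revenue': [0.0, 129.99, 54.50],
    'visit_date': pd.to_datetime(['2024-03-02', '2024-03-28', '2024-03-30']),
})

# Simple threshold flags from continuous variables
sessions['Is_Long_Session'] = sessions['session_duration'] > 180
sessions['Is_High_Value_Customer'] = sessions['revenue'] > 100
sessions['Is_End_Of_Month'] = sessions['visit_date'].dt.day >= 26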
Interaction
You can even combine elements to capture complex behaviors; there’s a short sketch below the list.
- Is_Mobile_Return_Visitor: mobile users who return
- Is_Paid_Traffic_Bounce: paid visitors who immediately left
- Is_Holiday_Campaign: pageview for dedicated holiday page or within x days of a holiday
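A sketch of combining base signals with boolean logic (columns are illustrative again):

import pandas as pd

sessions = pd.DataFrame({
    'device_category': ['mobile', 'mobile', 'desktop'],
    'visit_number': [1, 3, 2],
    'medium': ['cpc', 'organic', 'cpc'],
    'pageviews': [1, 4, 1],
})

# Interaction flags combine two or more base features
sessions['Is_Mobile_Return_Visitor'] = (
    (sessions['device_category'] == 'mobile') & (sessions['visit_number'] > 1)
)
sessions['Is_Paid_Traffic_Bounce'] = (
    sessions['medium'].isin(['cpc', 'ppc', 'cpm']) & (sessions['pageviews'] == 1)
)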
Misc Examples
The real world offers many more possibilities than what I’ve hit on with customer-centric energy data or web analytics data. I’ll drop a few more examples below for inspiration.
Numeric
- High_Debt_Ratio: Flag applicants with debt-to-income ratios above 40%
- Property_Age_Cat: Calculate property age from the year-built value and break it into intuitive bins
- Has_Large_Sample: Flag studies with sample sizes over 1,000 participants
- Lexile_Score_Grp: Create bins from Lexile scores to group students by reading level
Datetime
Below are just a few examples of how you can transform datetime values to be more useful to your model; a quick sketch follows the list.
- Peak hours flags: Instead of storing exact timestamps, create a simple ‘Yes/No’ flag for whether a call came in during peak hours when your team is swamped
- Weekend indicator: A true/false flag for weekend calls, since weekend issues might be more urgent or different in nature
- Days since last contact: Rather than tracking multiple interaction dates, just count how many days it’s been since you last heard from this customer and then bin these values
- Holiday proximity: Flag customers who called right before or after major holidays, as their issues might be time-sensitive
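A quick pandas sketch of a few of these, assuming a timestamp column and made-up peak hours of 8am-6pm:

import pandas as pd

calls = pd.DataFrame({
    'call_ts': pd.to_datetime(['2024-03-02 14:30', '2024-03-04 07:15']),
    'last_contact': pd.to_datetime(['2024-01-10', '2024-02-28']),
})

# Peak hours and weekend flags (cutoffs are illustrative)
calls['Is_Peak_Hours'] = calls['call_ts'].dt.hour.between(8, 17)
calls['Is_Weekend'] = calls['call_ts'].dt.dayofweek >= 5

# Days since last contact, then binned
days_since = (calls['call_ts'] - calls['last_contact']).dt.days
calls['Days_Since_Contact_Bin'] = pd.cut(days_since,
                                         bins=[0, 7, 30, 90, float('inf')],
                                         labels=['< 1 wk', '1-4 wk', '1-3 mo', '3+ mo'])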
Categorical
I’m a big fan of binning (i.e., grouping values to make a categorical field less granular), but keep in mind not all values need their own category. You could group inconsequential values into an ‘Other’ bucket, as sketched after this list.
- Grouping rare cases: If you have 20 different complaint types but 15 of them only happen occasionally, lump those rare ones into an ‘Other’ bucket so your model can focus on the common patterns
- High-value customer flag: Instead of storing exact customer tiers, just flag your VIP customers as ‘Yes’ or ‘No’
- Problem complexity score: Convert detailed issue descriptions into simple ratings like ‘Simple,’ ‘Medium,’ or ‘Complex’
- Is_First_Time_Buyer: Home buyers with no previous mortgage history
- Customer language: If your website only offers English and Spanish, for example, group all other languages into ‘Other’
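Here’s one way to handle the ‘Other’ bucketing in pandas, keeping only the most common categories (the top-three cutoff is arbitrary):

import pandas as pd

complaints = pd.DataFrame({
    'complaint_type': ['High Bill', 'High Bill', 'Outage', 'Meter Read',
                       'Billing Error', 'Tree Trimming', 'Street Light']
})

# Keep the top N complaint types; lump the rest into 'Other'
top_types = complaints['complaint_type'].value_counts().nlargest(3).index
complaints['Complaint_Grp'] = complaints['complaint_type'].where(
    complaints['complaint_type'].isin(top_types), 'Other'
)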
Content
You can create boolean flags from unstructured text by checking whether cells contain certain keywords. This is where you can get more bang for your buck using regex, and even more using Natural Language Processing. (There’s a sketch after the list below.)
- Urgency keywords: Flag complaints that contain words like ’emergency,’ ‘urgent,’ or ‘ASAP’
- Complaint length: Short complaints might be simple issues, while really long ones might indicate frustrated customers who need more attention
- Emotion indicators: Simple flags for whether the customer sounds angry, confused, or satisfied based on their word choice
- Is_Breaking_News: Flag articles published within 2 hours of a major event or containing ‘BREAKING:’ in the page title
- Property_Age_Bucket: Group properties into ‘New’ (0-10 years), ‘Established’ (10-30 years), ‘Older’ (30+ years)
- Is_Long_Form: Content that exceeds 1,500 words
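A hedged example of keyword flagging with regex via pandas’ str.contains (the keywords and the 200-character cutoff are just for illustration):

import pandas as pd

complaints = pd.DataFrame({
    'description': ['URGENT: no power since last night',
                    'Question about my bill',
                    'Need this fixed ASAP please']
})

# Flag descriptions containing urgency keywords (case-insensitive regex)
pattern = r'\b(?:emergency|urgent|asap)\b'
complaints['Has_Urgency_Keyword'] = complaints['description'].str.contains(
    pattern, case=False, regex=True
)

# Length-based flag: very long complaints may signal frustrated customers
complaints['Is_Long_Complaint'] = complaints['description'].str.len() > 200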
Why This Works
Tree-based models like XGBoost naturally create splits on features, making boolean flags and clear categories especially effective for these algorithms. XGBoost also has built-in handling for null (missing) values, which I appreciate. And if you monitor the feature importance scores for your built-in and custom columns (using the feature_importances_ attribute in XGBoost), you can squeeze even more value out of your prediction models through feature engineering.
Here’s what that snippet of code looked like for my model:
import pandas as pd

# Get feature importance scores from the trained model
importance = model.feature_importances_

# Create a dataframe for plotting
importance_df = pd.DataFrame({
    'Feature': feature_names,
    'Importance': importance * 100
}).sort_values('Importance', ascending=True)
Notes:
- See the bar chart above for how I visualized these feature importance scores (a minimal plotting sketch follows these notes). What it tells me is that, of the 21 features I trained the model with, 16 actually impacted the outcomes.
- Don’t multiply the scores by 100 unless you’re creating the visualizations with a Python library, like I did with Plotly. (Even then I normally don’t. I was being lazy.) But if you pull those whole numbers into a desktop dashboard tool like Power BI (as I do with this client), Tableau, etc., you’re going to need to divide by 100 first.
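For reference, a minimal Plotly Express version of that bar chart might look something like this (importance_df comes from the snippet above):

import plotly.express as px

# Horizontal bar chart of the importance scores (Plotly Express sketch)
fig = px.bar(importance_df, x='Importance', y='Feature', orientation='h')
fig.show()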

Try It Yourself
Look at your categorical variables and ask: ‘What’s the one distinction here that probably matters most?’ Even if you end up with some overlap, like I did with the new account flag (i.e., T/F) and the account age buckets, you may find that bins and particular boolean values are more predictive than any of your built-in features.
Image credit: Jametlene Reskp