Fusing MongoDB and Databricks to Deliver AI-Augmented Search

Vittal Pai, Francesco Baldissera, and Ashwin Gangadhar

#genAI

With customers' attention increasingly dispersed across channels, platforms, and devices, competition in the retail industry is relentless. The customer’s search experience on your storefront is the cornerstone of capitalizing on your Zero Moment of Truth, the point in the buying cycle where the consumer's impression of a brand or product is formed.

Imagine a customer, Sarah, eager to buy a new pair of hiking boots. Instead of wandering aimlessly through pages and pages of search results, she expects to find her ideal pair easily. The smoother her search, the more likely she is to buy. Yet, achieving this seamless experience isn't a walk in the park for retailers.

Enter the dynamic duo of MongoDB and Databricks. By equipping their teams with this powerful tech stack, retailers can harness the might of real-time in-app analytics. This not only streamlines the search process but also infuses AI and advanced search functionalities into e-commerce applications. The result? An app that not only meets Sarah's current expectations but anticipates her future needs.

In this blog, we’ll walk you through the main reasons to implement an AI-augmented search solution by integrating both platforms. Let’s get started!

Check out our AI resource page to learn more about building AI-powered apps with MongoDB.

A solid foundation for your data model

For an e-commerce site built around the principles of an event-driven and MACH architecture, the data layer needs to ingest and transform data from a number of different sources.

Heterogeneous data, such as the product catalog, user behavior on the e-commerce front end, comments and ratings, search keywords, and customer lifecycle segmentation, is all necessary to personalize search results in real time. This increases the need for a flexible model, such as MongoDB’s documents, and for a platform that can easily take in data from a number of different sources: APIs, CSV files, and Kafka topics through the MongoDB Kafka Connector.

MongoDB's translytical capabilities, which combine transactional (OLTP) and analytical (OLAP) workloads, offer real-time data processing and analysis, enabling you to simplify your workloads while ensuring timely responsiveness and cost-effectiveness.

Now that the data platform is servicing the operational needs of the application, what about adding in AI? Combining MongoDB with Databricks through the MongoDB Spark Connector allows you to train your models on operational data from MongoDB and to trigger them in real time, augmenting your application as the customer is using it.
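As a rough illustration of that first step, here is a minimal sketch of loading a MongoDB collection into a Spark DataFrame from a Databricks notebook. It assumes the 10.x series of the MongoDB Spark Connector is installed on the cluster (where the data source is named "mongodb"); the URI, database, and collection names in the usage note are placeholders, not values from the reference architecture.

```python
from typing import Any, Dict


def mongo_read_options(uri: str, database: str, collection: str) -> Dict[str, str]:
    """Build the read options used by the MongoDB Spark Connector (10.x)."""
    return {
        "connection.uri": uri,
        "database": database,
        "collection": collection,
    }


def load_collection(spark: Any, options: Dict[str, str]) -> Any:
    """Read a MongoDB collection into a Spark DataFrame.

    `spark` is an existing SparkSession; Databricks notebooks provide one.
    It is passed in as a parameter so this sketch has no hard import of pyspark.
    """
    return spark.read.format("mongodb").options(**options).load()
```

In a notebook, something like `load_collection(spark, mongo_read_options("<your Atlas URI>", "ecommerce", "click_logs"))` would return a DataFrame that can feed feature engineering and model training; `ecommerce` and `click_logs` are hypothetical names.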

The source, which includes SLP/PLP, Merchant Product Listing, and other sources, connects to MongoDB utilizing the Kafka Stream Processor. MongoDB collections include click logs, attributes, ATP Status, Price, and Product Search Catalog Corpus.
Centralization of heterogeneous data in a robust yet flexible Operational Data Layer

The foundation of an effective e-commerce data layer is a solid yet flexible operational data platform. With it, orchestrating ML models to run at specific times or in response to different events, which enables crucial data transformation, metadata enrichment, and data featurization, becomes a simple, automated task for optimizing search result pages and delivering a frictionless purchasing process.

Check out this blog for a tutorial on achieving near real-time ingestion using the Kafka Connector with MongoDB Atlas, and data processing with Databricks Spark User Defined Functions.

Adding relevance to your search engine results pages

To achieve optimal product positioning on the Search Engine Results Page (SERP) after a user performs a query, retailers are challenged with creating a business score for their products' relevance. This score incorporates various factors such as stock levels, competitor prices, and price elasticity of demand.

These business scores are complex real-time analyses calibrated against many factors, which makes them a perfect use case for AI. Adding AI-generated relevance to your SERPs can accurately predict and display the search results that are most relevant to users' queries, leading to higher engagement and increased click-through rates, while also helping businesses optimize their content based on the operational context of their markets.
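To make the idea of a business score concrete, here is a deliberately simple, hand-written heuristic that combines the three factors mentioned above. It is only a stand-in for the trained model the architecture calls for: the formula, the `stock_cap` parameter, and every field name are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProductSignals:
    stock_level: int          # units currently available
    price: float              # our current price
    competitor_price: float   # lowest known competitor price
    price_elasticity: float   # elasticity of demand (usually negative)


def business_score(s: ProductSignals, stock_cap: int = 100) -> float:
    """Hypothetical heuristic returning a score in [0, 1]."""
    if s.price <= 0:
        raise ValueError("price must be positive")
    if s.stock_level <= 0:
        return 0.0  # do not promote products that cannot be sold

    # Share of the cap that is in stock, saturating at 1.0.
    availability = min(s.stock_level, stock_cap) / stock_cap

    # 1.0 when we match or undercut the competitor, lower when we are pricier.
    price_competitiveness = max(0.0, min(1.0, s.competitor_price / s.price))

    # The more elastic the demand, the more the price gap should matter.
    sensitivity = min(abs(s.price_elasticity), 2.0) / 2.0

    return availability * (1.0 - sensitivity * (1.0 - price_competitiveness))
```

A learned model would replace the fixed formula with weights fitted to click and purchase data, but the contract stays the same: signals in, one number per product out, stored alongside the product so that search can use it.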

The ingestion into the MongoDB Atlas document-based model laid the groundwork for this challenge. By leveraging the MongoDB Apache Spark Streaming Connector, companies can persist their data into Databricks and take advantage of its capabilities for data cleansing and complex data transformations, making it an ideal framework for batch training and inference.

Diagram of the full architecture integrating MongoDB Atlas and Databricks for an e-commerce store, real-time analytics, and search

MongoDB App Services acts as the mortar of our solution, overlaying the intelligence layer on the operational data in an event-driven way. This makes the solution not only real-time but also cost-effective, and keeps both your applications and business processes nimble.

Make sure to check out this GitHub repository to understand in depth how this is achieved.

Data freshness

Once the business score can be calculated, the next challenge is delivering it through the search feature of your application.

With MongoDB Atlas native workload isolation, operational data is continuously available on dedicated analytics nodes deployed in the same distributed cluster, and exposed to analysts within milliseconds of being stored in the database.

But data freshness is not only important for your analytics use cases. By combining operational data with the analytics layer, retailers can power in-app analytics and build amazing user experiences across their customer touch points.

With MongoDB Atlas Search's advanced features, such as faceted search, autocomplete, and spell correction, retailers can rest assured of a more intuitive and user-friendly search experience, not only for their customers but also for their developers: bundling all of these functionalities in the same platform minimizes the tax of operational complexity.
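The three features named above map onto Atlas Search operators that are expressed as ordinary aggregation stages. The sketch below only builds the pipelines as Python data; running them would require an Atlas cluster with a search index whose field mappings match (for example, an autocomplete-typed field and a facet-enabled string field). The index name and the `title` and `brand` field names are assumptions.

```python
from typing import Any, Dict, List

Pipeline = List[Dict[str, Any]]


def fuzzy_text_search(query: str, index: str = "default",
                      path: str = "title", limit: int = 10) -> Pipeline:
    """Typo-tolerant text search: allows one character edit per term."""
    return [
        {"$search": {
            "index": index,
            "text": {"query": query, "path": path, "fuzzy": {"maxEdits": 1}},
        }},
        {"$limit": limit},
    ]


def autocomplete_search(prefix: str, index: str = "default",
                        path: str = "title", limit: int = 5) -> Pipeline:
    """Search-as-you-type suggestions for a partially typed query."""
    return [
        {"$search": {
            "index": index,
            "autocomplete": {"query": prefix, "path": path},
        }},
        {"$limit": limit},
        {"$project": {"_id": 0, path: 1}},
    ]


def facet_counts(query: str, index: str = "default", path: str = "title",
                 facet_path: str = "brand") -> Pipeline:
    """Per-brand result counts for the filter sidebar."""
    return [
        {"$searchMeta": {
            "index": index,
            "facet": {
                "operator": {"text": {"query": query, "path": path}},
                "facets": {"brandFacet": {"type": "string", "path": facet_path}},
            },
        }},
    ]
```

With a driver such as PyMongo, each pipeline would be passed to `collection.aggregate(...)`; because search runs through the same aggregation interface as every other query, the application needs no separate search client.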

App-driven analytics is a competitive advantage against traditional warehouse analytics

Additionally, the search functionality is optimized for performance, enabling businesses to handle high search query volumes without compromising user experience. The business score generated by the AI models trained and deployed with Databricks acts as the discriminator that determines where in the SERPs each product appears, fueling the relevance of your search engine and securing a high-quality user experience.
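One way to let a stored business score influence ranking is the Atlas Search `score` option with a `function` expression, which can multiply the text relevance score by a numeric field on the document. The sketch below builds such a pipeline; the field name `business_score`, the index name, and the choice of multiplication rather than another combination are assumptions, and the linked repository may wire this up differently.

```python
from typing import Any, Dict, List


def scored_search(query: str, index: str = "default", path: str = "title",
                  score_field: str = "business_score",
                  limit: int = 20) -> List[Dict[str, Any]]:
    """Rank results by text relevance multiplied by a per-product score."""
    return [
        {"$search": {
            "index": index,
            "text": {
                "query": query,
                "path": path,
                "score": {"function": {"multiply": [
                    {"score": "relevance"},
                    # Fall back to 1 so unscored products keep plain relevance.
                    {"path": {"value": score_field, "undefined": 1}},
                ]}},
            },
        }},
        {"$limit": limit},
        {"$project": {
            "_id": 0,
            path: 1,
            score_field: 1,
            "searchScore": {"$meta": "searchScore"},
        }},
    ]
```

For this to work, the score field has to be indexed as a number in the search index. Since the score lives on the product document, each model run that rewrites it changes the ordering of subsequent searches without any reindexing step in the application.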

Conclusion

Search is a key part of the buying process for any customer. Showing customers exactly what they are looking for, without making them spend too much time browsing, reduces friction in the buying process, but as we’ve seen, it is not always easy to achieve technically.

Empower your teams with the right tech stack to take advantage of the power of real-time in-app analytics with MongoDB and Databricks. It’s the simplest way to build AI and search capabilities into your e-commerce app, to respond to current and future market expectations.

Check out the video below and this GitHub repository for all the code needed to integrate MongoDB and Databricks and deliver a real-time machine-learning solution for AI-augmented Search.