Skip to main content

MongoDB Atlas

Compatibility

Only available on Node.js.

You can still create API routes that use MongoDB with Next.js by setting the runtime variable to nodejs like so:

export const runtime = "nodejs";

You can read more about Edge runtimes in the Next.js documentation here.

LangChain.js supports MongoDB Atlas as a vector store, and supports both standard similarity search and maximal marginal relevance search, which takes a combination of documents are most similar to the inputs, then reranks and optimizes for diversity.

Setup

Installation

First, add the Node MongoDB SDK to your project:

npm install -S mongodb

Initial Cluster Configuration

Next, you'll need create a MongoDB Atlas cluster. Navigate to the MongoDB Atlas website and create an account if you don't already have one.

Create and name a cluster when prompted, then find it under Database. Select Collections and create either a blank collection or one from the provided sample data.

Note The cluster created must be MongoDB 7.0 or higher. If you are using a pre-7.0 version of MongoDB, you must use a version of langchainjs<=0.0.163.

Creating an Index

After configuring your cluster, you'll need to create an index on the collection field you want to search over.

Switch to the Atlas Search tab and click Create Search Index. From there, make sure you select Atlas Vector Search - JSON Editor, then select the appropriate database and collection and paste the following into the textbox:

{
"fields": [
{
"numDimensions": 1024,
"path": "embedding",
"similarity": "euclidean",
"type": "vector"
}
]
}

Note that the dimensions property should match the dimensionality of the embeddings you are using. For example, Cohere embeddings have 1024 dimensions, and by default OpenAI embeddings have 1536:

Note: By default the vector store expects an index name of default, an indexed collection field name of embedding, and a raw text field name of text. You should initialize the vector store with field names matching your index name collection schema as shown below.

Finally, proceed to build the index.

Usage

npm install @langchain/community

Ingestion

import { MongoDBAtlasVectorSearch } from "@langchain/mongodb";
import { CohereEmbeddings } from "@langchain/cohere";
import { MongoClient } from "mongodb";

const client = new MongoClient(process.env.MONGODB_ATLAS_URI || "");
const namespace = "langchain.test";
const [dbName, collectionName] = namespace.split(".");
const collection = client.db(dbName).collection(collectionName);

const vectorstore = await MongoDBAtlasVectorSearch.fromTexts(
["Hello world", "Bye bye", "What's this?"],
[{ id: 2 }, { id: 1 }, { id: 3 }],
new CohereEmbeddings({model: "embed-english-v3.0",}),
{
collection,
indexName: "default", // The name of the Atlas search index. Defaults to "default"
textKey: "text", // The name of the collection field containing the raw content. Defaults to "text"
embeddingKey: "embedding", // The name of the collection field containing the embedded text. Defaults to "embedding"
}
);

const assignedIds = await vectorstore.addDocuments([
{ pageContent: "upsertable", metadata: {} },
]);

const upsertedDocs = [{ pageContent: "overwritten", metadata: {} }];

await vectorstore.addDocuments(upsertedDocs, { ids: assignedIds });

await client.close();

API Reference:

import { MongoDBAtlasVectorSearch } from "@langchain/mongodb";
import { CohereEmbeddings } from "@langchain/cohere";
import { MongoClient } from "mongodb";

const client = new MongoClient(process.env.MONGODB_ATLAS_URI || "");
const namespace = "langchain.test";
const [dbName, collectionName] = namespace.split(".");
const collection = client.db(dbName).collection(collectionName);

const vectorStore = new MongoDBAtlasVectorSearch(new CohereEmbeddings({model: "embed-english-v3.0",}), {
collection,
indexName: "default", // The name of the Atlas search index. Defaults to "default"
textKey: "text", // The name of the collection field containing the raw content. Defaults to "text"
embeddingKey: "embedding", // The name of the collection field containing the embedded text. Defaults to "embedding"
});

const resultOne = await vectorStore.similaritySearch("Hello world", 1);
console.log(resultOne);

await client.close();

API Reference:

Maximal marginal relevance

import { MongoDBAtlasVectorSearch } from "@langchain/mongodb";
import { CohereEmbeddings } from "@langchain/cohere";
import { MongoClient } from "mongodb";

const client = new MongoClient(process.env.MONGODB_ATLAS_URI || "");
const namespace = "langchain.test";
const [dbName, collectionName] = namespace.split(".");
const collection = client.db(dbName).collection(collectionName);

const vectorStore = new MongoDBAtlasVectorSearch(new CohereEmbeddings({model: "embed-english-v3.0",}), {
collection,
indexName: "default", // The name of the Atlas search index. Defaults to "default"
textKey: "text", // The name of the collection field containing the raw content. Defaults to "text"
embeddingKey: "embedding", // The name of the collection field containing the embedded text. Defaults to "embedding"
});

const resultOne = await vectorStore.maxMarginalRelevanceSearch("Hello world", {
k: 4,
fetchK: 20, // The number of documents to return on initial fetch
});
console.log(resultOne);

// Using MMR in a vector store retriever

const retriever = await vectorStore.asRetriever({
searchType: "mmr",
searchKwargs: {
fetchK: 20,
lambda: 0.1,
},
});

const retrieverOutput = await retriever.invoke("Hello world");

console.log(retrieverOutput);

await client.close();

API Reference:

Metadata filtering

MongoDB Atlas supports pre-filtering of results on other fields. They require you to define which metadata fields you plan to filter on by updating the index. Here's an example:

{
"fields": [
{
"numDimensions": 1024,
"path": "embedding",
"similarity": "euclidean",
"type": "vector"
},
{
"path": "docstore_document_id",
"type": "filter"
}
]
}

Above, the first item in fields is the vector index, and the second item is the metadata property you want to filter on. The name of the property is path, so the above index would allow us to search on a metadata field named docstore_document_id.

Then, in your code you can use MQL Query Operators for filtering. Here's an example:

import { MongoDBAtlasVectorSearch } from "@langchain/mongodb";
import { CohereEmbeddings } from "@langchain/cohere";
import { MongoClient } from "mongodb";

import { sleep } from "langchain/util/time";

const client = new MongoClient(process.env.MONGODB_ATLAS_URI || "");
const namespace = "langchain.test";
const [dbName, collectionName] = namespace.split(".");
const collection = client.db(dbName).collection(collectionName);

const vectorStore = new MongoDBAtlasVectorSearch(new CohereEmbeddings({model: "embed-english-v3.0",}), {
collection,
indexName: "default", // The name of the Atlas search index. Defaults to "default"
textKey: "text", // The name of the collection field containing the raw content. Defaults to "text"
embeddingKey: "embedding", // The name of the collection field containing the embedded text. Defaults to "embedding"
});

await vectorStore.addDocuments([
{
pageContent: "Hey hey hey",
metadata: { docstore_document_id: "somevalue" },
},
]);

const retriever = vectorStore.asRetriever({
filter: {
preFilter: {
docstore_document_id: {
$eq: "somevalue",
},
},
},
});

// Mongo has a slight processing delay between ingest and availability
await sleep(2000);

const results = await retriever.invoke("goodbye");

console.log(results);

await client.close();

API Reference:


Was this page helpful?


You can also leave detailed feedback on GitHub.