1. What is the primary purpose of a vector database?
A) To store relational data efficiently
B) To store and search high-dimensional vector embeddings
C) To visualize neural network models
D) To convert images into text
Answer: B) To store and search high-dimensional vector embeddings
2. Which of the following is NOT typically used to generate embeddings?
A) BERT
B) OpenAI’s CLIP
C) PostgreSQL
D) Word2Vec
Answer: C) PostgreSQL
3. What is an embedding in the context of AI and machine learning?
A) A type of SQL query
B) A low-dimensional vector representation of data
C) A database table
D) A visualization tool
Answer: B) A low-dimensional vector representation of data
4. In vector search, which metric is most commonly used to measure similarity?
A) Jaccard Index
B) Cosine similarity
C) Chi-squared distance
D) Manhattan distance
Answer: B) Cosine similarity
5. Which of the following is a popular vector database?
A) MySQL
B) SQLite
C) Pinecone
D) MongoDB
Answer: C) Pinecone
6. What is Approximate Nearest Neighbor (ANN) search used for in vector databases?
A) Running SQL queries on text
B) Sorting images by date
C) Finding similar vectors efficiently in large datasets
D) Reducing dimensionality of data
Answer: C) Finding similar vectors efficiently in large datasets
7. Which of the following is a valid use case for a vector database?
A) Inventory management
B) Financial transaction storage
C) Semantic search in documents
D) User role access control
Answer: C) Semantic search in documents
8. What role does dimensionality reduction play in vector storage?
A) Increases search time
B) Makes vectors non-unique
C) Reduces storage and speeds up similarity search
D) Deletes unnecessary vectors
Answer: C) Reduces storage and speeds up similarity search
9. Embeddings from models like OpenAI or HuggingFace can represent which of the following data types?
A) Only numerical data
B) Only SQL tables
C) Text, images, and audio
D) Only structured data
Answer: C) Text, images, and audio
10. What kind of indexing is used in vector databases for fast retrieval?
A) B-Tree indexing
B) Hash indexing
C) Inverted indexing
D) HNSW (Hierarchical Navigable Small World) indexing
Answer: D) HNSW (Hierarchical Navigable Small World) indexing
Section A: Embeddings Basics
- What is a vector embedding?
A) A type of database schema
B) A compressed image format
C) A numerical representation of data in vector space
D) A cloud-based file system
Answer:C
- Embeddings are commonly used to represent:
A) Colors
B) Raw bytes
C) Semantic information (e.g., words, images, documents)
D) SQL queries
Answer:C
- Which of the following is NOT typically used to generate embeddings?
A) BERT
B) Word2Vec
C) FAISS
D) OpenAI Embedding APIs
Answer:C
- High-dimensional embeddings help in capturing:
A) Simple arithmetic
B) Complex semantic relationships
C) Network latency
D) Disk space
Answer:B
- In NLP, vector embeddings are primarily used to:
A) Sort documents
B) Encode semantic meaning of words/sentences
C) Create relational tables
D) Define regex rules
Answer:B
Section B: Vector Similarity and Distance Metrics
- Which of the following is commonly used as a similarity metric in vector databases?
A) Hamming distance
B) Cosine similarity
C) Alphabetical order
D) Boolean logic
Answer:B
- Euclidean distance is most suitable for measuring:
A) Direction similarity
B) Angular similarity
C) Straight-line distance between vectors
D) Document frequency
Answer:C
- Cosine similarity measures:
A) The angle between two vectors
B) The speed of query execution
C) The length of documents
D) The color of data points
Answer:A
- A lower cosine distance between two vectors indicates:
A) Greater dissimilarity
B) Greater similarity
C) Data corruption
D) Indexing error
Answer:B
- Dot product is equivalent to cosine similarity when:
A) Vectors are normalized
B) Data is binary
C) Database is indexed
D) GPU is used
Answer:A
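The metrics in this section can be checked by hand on small vectors. The following is a minimal sketch using only the Python standard library; the helper names (dot, l2_norm, normalize, cosine_similarity, euclidean_distance) and the sample vectors are illustrative choices, not part of any vector database API.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_norm(a):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(dot(a, a))

def normalize(a):
    """Scale a vector to unit length by dividing by its L2 norm."""
    n = l2_norm(a)
    return [x / n for x in a]

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores magnitude."""
    return dot(a, b) / (l2_norm(a) * l2_norm(b))

def euclidean_distance(a, b):
    """Straight-line distance between a and b; sensitive to magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [3.0, 4.0]
b = [6.0, 8.0]   # same direction as a, twice the length
c = [4.0, -3.0]  # orthogonal to a

print(cosine_similarity(a, b))          # same direction, so close to 1
print(cosine_similarity(a, c))          # orthogonal, so close to 0
print(euclidean_distance(a, b))         # non-zero even though the direction matches
print(dot(normalize(a), normalize(b)))  # matches the cosine similarity of a and b
```

Because a and b point the same way, their cosine similarity is 1 even though their Euclidean distance is not 0, and the dot product of the normalized vectors matches the cosine similarity.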
Section C: Vector Databases Fundamentals
- Which of the following is a specialized vector database?
A) PostgreSQL
B) MongoDB
C) Pinecone
D) SQLite
Answer:C
- What is a key feature of a vector database?
A) Joins and foreign keys
B) Vector-based similarity search
C) Relational table structure
D) Blockchain storage
Answer:B
- ANN in vector search stands for:
A) Artificial Node Network
B) Approximate Nearest Neighbor
C) Active Neural Node
D) Average Neural Network
Answer:B
- FAISS is a library developed by:
A) Google
B) OpenAI
C) Facebook (Meta)
D) Microsoft
Answer:C
- Milvus, Weaviate, and Qdrant are examples of:
A) Web servers
B) Key-value stores
C) Vector databases
D) Text encoders
Answer:C
Section D: Applications
- A key application of vector databases is:
A) Creating bar charts
B) Semantic search
C) Sorting numbers
D) File compression
Answer:B
- In recommendation systems, embeddings help by:
A) Sorting users alphabetically
B) Mapping users/items into vector space for similarity
C) Encrypting user preferences
D) Running SQL triggers
Answer:B
- Which industry benefits significantly from vector-based search?
A) Retail
B) Healthcare
C) Finance
D) All of the above
Answer:D
- Vector search outperforms traditional search when dealing with:
A) Numerical queries
B) Fuzzy, semantic queries
C) Exact matching
D) Sorted lists
Answer:B
- Hybrid search combines:
A) AI with cryptocurrency
B) Vector similarity with keyword-based search
C) SQL and NoSQL databases
D) Embeddings with charts
Answer:B
Section E: Indexing & Storage
- The primary purpose of indexing in vector databases is to:
A) Compress vectors
B) Speed up similarity search
C) Apply encryption
D) Store raw images
Answer:B
- What does IVF stand for in the context of FAISS?
A) Inverted File Index
B) Indexed Vector Function
C) Internal Vector Filter
D) Immediate Vector Fetch
Answer:A
- Which of the following is an index type used in FAISS?
A) B-Tree
B) PQ (Product Quantization)
C) Hash Map
D) LRU Cache
Answer:B
- The HNSW algorithm stands for:
A) Hierarchical Neural Search Window
B) Hyper Network Signal Watch
C) Hierarchical Navigable Small World
D) Hybrid Node Search Weight
Answer:C
- HNSW is known for its:
A) Slow insert speed
B) Fast, high-recall approximate nearest neighbor search
C) Linear scan performance
D) Use of SQL triggers
Answer:B
- Product Quantization (PQ) helps by:
A) Encrypting embeddings
B) Reducing vector size for faster search
C) Creating backup indexes
D) Clustering search results
Answer:B
- Scalar quantization is a type of:
A) Embedding generation
B) Approximation technique for vectors
C) Database schema
D) Network protocol
Answer:B
- In vector search, the brute-force method refers to:
A) Index-less exact similarity search
B) Using GPU for quantization
C) Encrypting vectors
D) Running SQL scripts
Answer:A
- Which hardware significantly accelerates vector search operations?
A) HDD
B) GPU
C) Router
D) SSD
Answer:B
- Which of the following is a disadvantage of brute-force search?
A) High accuracy
B) Low latency
C) Poor scalability on large datasets
D) Vector corruption
Answer:C
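Brute-force (index-less) search, as described in this section, can be sketched in a few lines. The sketch below assumes Euclidean distance and a tiny in-memory list of vectors; the function name brute_force_search and the sample data are hypothetical. It is exact, but it touches every stored vector on every query, which is why it scales poorly.

```python
import heapq
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_search(query, vectors, k):
    """Return the k closest (distance, index) pairs by scanning every vector."""
    scored = ((euclidean(query, v), i) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored)

vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
results = brute_force_search([1.0, 1.0], vectors, k=2)
print([index for _, index in results])  # indices of the two nearest vectors
```

The cost grows linearly with the number of stored vectors, which is the scalability problem that ANN indexes such as IVF, PQ, and HNSW are meant to address.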
Section F: Hybrid Search & Applications
- Hybrid search combines vector search with:
A) Graph databases
B) Relational logic
C) Keyword or symbolic search
D) Temporal clustering
Answer:C
- A benefit of hybrid search is:
A) Limited query types
B) Full support only for SQL
C) Ability to combine semantic and keyword matching
D) Data compression
Answer:C
- Which platform natively supports hybrid search?
A) Elasticsearch + dense vector plugin
B) SQLite
C) Hadoop
D) GitHub
Answer:A
- In hybrid search, vector search improves:
A) Spelling correction
B) Semantic relevance
C) Page load speed
D) Data backup
Answer:B
- A use case for hybrid search is:
A) Transactional banking systems
B) Semantic + keyword legal document search
C) Color rendering
D) Encryption
Answer:B
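One simple way to picture hybrid search is a weighted blend of a semantic score and a keyword score. The sketch below is an assumption-laden illustration, not how any particular engine does it: keyword_score is a crude term-overlap stand-in for BM25, the vector similarities are made-up precomputed numbers, and alpha is a hypothetical blending weight.

```python
def keyword_score(query, text):
    """Fraction of query terms found in the text (a crude stand-in for BM25)."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0

def hybrid_score(vector_sim, kw_score, alpha=0.5):
    """Blend semantic and keyword evidence; alpha=1.0 means vector-only."""
    return alpha * vector_sim + (1 - alpha) * kw_score

def hybrid_rank(query, docs, vector_sims, alpha=0.5):
    """Return document indices ordered by blended score, best first."""
    scored = [
        (hybrid_score(vector_sims[i], keyword_score(query, doc), alpha), i)
        for i, doc in enumerate(docs)
    ]
    return [i for _, i in sorted(scored, reverse=True)]

docs = [
    "termination clause in employment contract",
    "ending an agreement early",
    "parking rules",
]
vector_sims = [0.70, 0.85, 0.10]  # hypothetical precomputed similarities to the query

print(hybrid_rank("termination clause", docs, vector_sims, alpha=0.5))
print(hybrid_rank("termination clause", docs, vector_sims, alpha=1.0))
```

With alpha=0.5 the exact-term match in the first document lifts it above the semantically closest document; with alpha=1.0 the ranking falls back to pure vector similarity.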
Section G: Vector Database Architecture
- In a vector database, the embedding dimension affects:
A) Query language
B) Vector space resolution
C) Transaction throughput
D) Query syntax
Answer:B
- A high-dimensional embedding space can lead to:
A) Lower latency
B) Curse of dimensionality
C) Exact keyword match
D) Simplified database schema
Answer:B
- Vector databases often rely on which type of architecture?
A) Monolithic
B) Serverless
C) Microservices with storage and compute separation
D) Blockchain
Answer:C
- Real-time vector search requires:
A) Batch indexing only
B) Low-latency infrastructure
C) File transfer protocol
D) Legacy SQL engines
Answer:B
- Cloud-native vector databases are typically designed for:
A) Offline search
B) On-prem analytics only
C) Scalability and distributed search
D) Graph rendering
Answer:C
Section H: Real-World Use Cases
- In e-commerce, vector databases can improve:
A) Checkout processing
B) Product recommendations based on user intent
C) Payment encryption
D) Invoice generation
Answer:B
- In image search, embeddings are typically generated by:
A) Word2Vec
B) CNNs (Convolutional Neural Networks)
C) SQL queries
D) QR scanners
Answer:B
- A typical vector embedding for a sentence might have dimensions of:
A) 3
B) 10
C) 768 or more
D) 1,000,000
Answer:C
- In personalized search engines, vector embeddings help to:
A) Increase advertisement cost
B) Predict user preferences semantically
C) Break down session cookies
D) Encrypt results
Answer:B
- One use case of vector search in legal tech is:
A) Matching client names
B) Identifying similar legal clauses across documents
C) Rendering HTML contracts
D) Printing legal forms
Answer:B
- In genomics, vector embeddings are useful for:
A) Identifying similar gene sequences
B) Formatting DNA files
C) Password protecting data
D) Streaming videos
Answer:A
- In customer support systems, vector search enhances:
A) Response time through semantic FAQ matching
B) Ticket creation speed
C) Dashboard design
D) Login security
Answer:A
- Embeddings in cybersecurity can be used for:
A) Visualizing passwords
B) Semantic anomaly detection in logs
C) Compressing malware
D) Generating CAPTCHAs
Answer:B
- Vector similarity can help in detecting:
A) Semantic duplicates
B) File corruption
C) Primary keys
D) Cloud latency
Answer:A
- Vector databases are NOT ideal for:
A) Full-text semantic search
B) Social media content similarity
C) Inventory accounting systems
D) Image similarity search
Answer:C
Section I: Deployment & Scaling
- Which cloud provider offers managed vector database services?
A) AWS
B) Azure
C) GCP
D) All of the above
Answer:D
- A critical factor for scaling vector databases is:
A) Table joins
B) Efficient memory usage and index sharding
C) Audit logging
D) Backup frequency
Answer:B
- Embedding models are usually deployed:
A) Separately from the vector database
B) Inside the database engine
C) Only during batch jobs
D) In spreadsheets
Answer:A
- Horizontal scaling of a vector DB involves:
A) Adding more indexes to the same machine
B) Adding more nodes to handle increased load
C) Shrinking embedding sizes
D) Removing indexes
Answer:B
- Vector data is often stored in:
A) Relational schema
B) Flat files
C) Columnar format or binary blobs
D) CSV only
Answer:C
- Which of these databases provides built-in distributed vector search?
A) SQLite
B) Redis with Vector extension
C) Neo4j
D) Notepad
Answer:B
- Which tool allows REST or gRPC API access to vector DBs?
A) Weaviate
B) Excel
C) Hive
D) PowerPoint
Answer:A
- Embeddings are usually updated when:
A) Schema changes
B) New data or model updates occur
C) SQL indexes are rebuilt
D) The database is restarted
Answer:B
- To persist vector data across restarts, a vector DB must support:
A) Auto-scaling
B) Disk-based storage or checkpointing
C) Dark mode
D) Sorting by key
Answer:B
- Milvus uses which architecture?
A) Blockchain
B) Monolithic binary
C) Microservices with separate components for storage, query, and indexing
D) Serverless Lambda only
Answer:C
Section J: Embedding Generation & Model Integration
- Which of the following can generate sentence embeddings?
A) GPT models
B) BERT variants (e.g., Sentence-BERT)
C) OpenAI Embedding API
D) All of the above
Answer:D
- When using OpenAI’s text-embedding-ada-002, the output is:
A) A PDF document
B) A SQL table
C) A 1536-dimensional vector
D) A CSV file
Answer:C
- Embedding models are usually trained using:
A) Supervised learning only
B) Unsupervised or contrastive learning techniques
C) Decision trees
D) SQL triggers
Answer:B
- In multi-modal vector search, what type of data can be embedded together?
A) Images only
B) Text only
C) Text and images/audio combined
D) Only structured tables
Answer:C
- Open-source embedding models can be deployed using:
A) Hugging Face Transformers
B) Docker containers
C) ONNX format
D) All of the above
Answer:D
- A drawback of large embedding models is:
A) Low accuracy
B) High latency and resource consumption
C) Inability to scale
D) Lack of documentation
Answer:B
- For privacy-sensitive embeddings, companies often:
A) Host models locally
B) Use API gateways
C) Avoid third-party APIs
D) All of the above
Answer:D
- Normalizing embeddings before indexing is useful for:
A) Faster compression
B) Consistent similarity calculations
C) Cloud billing
D) SQL querying
Answer:B
- Vector normalization typically involves:
A) Resizing the database
B) Dividing the vector by its L2 norm
C) Reversing cosine similarity
D) Adding random noise
Answer:B
- Fine-tuning embedding models can improve:
A) File download speed
B) Search relevance in a specific domain
C) Battery life
D) Index rebuilding
Answer:B
Section K: Evaluation & Quality Control
- A common metric to evaluate vector search quality is:
A) SQL response time
B) Accuracy@K (Top-K accuracy)
C) Ping latency
D) File size
Answer:B
- Recall@10 measures:
A) Database restart time
B) Fraction of true nearest neighbors that appear in the top 10 results
C) Index rebuilding time
D) Vector corruption rate
Answer:B
- Precision in vector search refers to:
A) Frequency of vector indexing
B) Proportion of relevant results among those retrieved
C) File formatting
D) Embedding size
Answer:B
- Embedding drift refers to:
A) Data storage loss
B) Change in vector meaning over time or model version
C) GPU overheating
D) SQL replication failure
Answer:B
- How can one prevent embedding drift issues?
A) Use static embeddings
B) Re-index when models are updated
C) Version embeddings
D) All of the above
Answer:D
- Garbage in, garbage out applies to vector search because:
A) Indexes sort bad data
B) Poor quality input data leads to poor semantic matches
C) Data isn’t compressed
D) Embeddings are stored alphabetically
Answer:B
- To validate semantic search, use:
A) A/B testing
B) Human-in-the-loop review
C) Evaluation benchmarks
D) All of the above
Answer:D
- Embedding evaluation typically involves:
A) Comparing cosine similarity scores
B) Checking file timestamps
C) SQL command audits
D) Querying for NULL values
Answer:A
- If similar queries produce inconsistent results, it may indicate:
A) Hardware failure
B) Inconsistent embeddings or index issues
C) WiFi problems
D) Outdated fonts
Answer:B
- Best practice before deploying vector search in production:
A) Run SQL backups
B) Perform offline vector quality evaluation
C) Clear browser cache
D) Build a dashboard first
Answer:B
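Recall@K and precision@K from this section can be computed offline against a labeled set of relevant results. The following is a small sketch; the function names and the document IDs are hypothetical, and recall is taken as the fraction of known relevant items that show up in the top K.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant items that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved items that are relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / len(top_k)

retrieved = ["d1", "d7", "d3", "d9", "d4"]  # ranked output of a search
relevant = {"d1", "d3", "d5", "d8"}         # ground-truth relevant documents

print(recall_at_k(retrieved, relevant, k=5))     # 2 of 4 relevant items found
print(precision_at_k(retrieved, relevant, k=5))  # 2 of 5 results are relevant
```

Running such a check on a held-out query set before deployment is one form of the offline vector quality evaluation mentioned above.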
Section L: Advanced Concepts
- Vector databases often support filtering based on:
A) Vector length only
B) Metadata fields (e.g., tags, categories)
C) File size
D) File format
Answer:B
- Combining vector similarity with metadata filters enables:
A) Random result generation
B) Contextual semantic search
C) Slower performance
D) More database joins
Answer:B
- What is the role of score_threshold in vector search?
A) Limits the number of documents stored
B) Filters results by minimum similarity score
C) Encrypts query vectors
D) Compresses database indexes
Answer:B
- Vector recall can be improved by:
A) Removing embeddings
B) Increasing the number of probes in ANN
C) Using HTTP instead of gRPC
D) Disabling filtering
Answer:B
- Which trade-off is common in ANN search?
A) Speed vs. accuracy
B) GPU vs. CPU
C) Storage vs. font size
D) SQL vs. NoSQL
Answer:A
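Metadata filtering and a minimum-score cutoff can be pictured with a toy in-memory search. This is only a sketch: filtered_search, its where argument, and the record layout are hypothetical and do not mirror any specific database client, although the score_threshold and top_k names follow the parameters discussed in this quiz.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def filtered_search(query, records, top_k, score_threshold=None, where=None):
    """Rank records by cosine similarity after applying metadata and score filters."""
    results = []
    for rec in records:
        # Metadata filter: every key in `where` must match exactly.
        if where and any(rec["metadata"].get(key) != val for key, val in where.items()):
            continue
        score = cosine(query, rec["vector"])
        # Score filter: drop matches below the minimum similarity.
        if score_threshold is not None and score < score_threshold:
            continue
        results.append((score, rec["id"]))
    results.sort(reverse=True)
    return [rec_id for _, rec_id in results[:top_k]]

records = [
    {"id": "a", "vector": [1.0, 0.0], "metadata": {"category": "legal"}},
    {"id": "b", "vector": [0.9, 0.1], "metadata": {"category": "finance"}},
    {"id": "c", "vector": [0.8, 0.6], "metadata": {"category": "legal"}},
    {"id": "d", "vector": [0.0, 1.0], "metadata": {"category": "legal"}},
]
query = [1.0, 0.0]

print(filtered_search(query, records, top_k=2))
print(filtered_search(query, records, top_k=3, where={"category": "legal"}))
print(filtered_search(query, records, top_k=3, score_threshold=0.5, where={"category": "legal"}))
```

The three calls show the effect of each control in turn: top_k alone, top_k with a metadata filter, and both combined with a similarity cutoff that removes the weak match.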
Section M: Integration with LLMs & RAG
- Vector databases are often used in:
A) CMS systems
B) Retrieval-Augmented Generation (RAG) pipelines
C) Gaming physics engines
D) DNS lookup tables
Answer:B
- RAG architecture typically retrieves context via:
A) SQL joins
B) Semantic search from a vector DB
C) HTML scrapers
D) Python list sorting
Answer:B
- In a RAG pipeline, LLMs use the retrieved context to:
A) Generate more relevant and grounded responses
B) Sort search indexes
C) Train new embeddings
D) Ignore context
Answer:A
- Pinecone, Weaviate, and Qdrant all support:
A) Direct fine-tuning of LLMs
B) Integration into RAG applications
C) GPU training only
D) Blockchain consensus
Answer:B
- An important step in building a RAG system is:
A) Generating embeddings from chunks of documents
B) Formatting data in CSV only
C) Creating bar charts
D) Using relational joins
Answer:A
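Chunking documents before embedding them, the RAG step named in this section, might look like the sketch below. It assumes word-based chunks with a fixed overlap; chunk_text, chunk_size, and overlap are illustrative names and defaults, and real pipelines often split on tokens or sentences instead.

```python
def chunk_text(text, chunk_size=5, overlap=2):
    """Split text into word-based chunks that share `overlap` words with the previous chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(words):
            break
    return chunks

text = "one two three four five six seven eight"
for chunk in chunk_text(text):
    print(chunk)
```

Each chunk would then be passed to an embedding model and stored with its source metadata; the overlap helps keep sentences that straddle a boundary retrievable.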
Section N: Real-Time & Streaming Use Cases
- Real-time vector search is essential in:
A) Log ingestion pipelines
B) Fraud detection systems
C) Static reports
D) Batch ETL pipelines
Answer:B
- In real-time settings, ingestion latency affects:
A) SQL schema
B) Relevance of search results
C) Vector dimensions
D) GPU cooling
Answer:B
- Event-driven architectures for vector search often use:
A) Kafka or pub/sub systems
B) Word processors
C) XML files
D) Paint apps
Answer:A
- Vector databases with real-time indexing must support:
A) High write throughput
B) Manual uploads only
C) Offline indexing
D) Zero concurrency
Answer:A
- Which is a performance bottleneck in real-time vector search?
A) Embedding generation time
B) File download speed
C) Admin dashboard design
D) Login frequency
Answer:A
Section O: Trends & Future Outlook
- A growing trend in vector databases is:
A) Cloud-only monoliths
B) Hybrid semantic+keyword search
C) Elimination of embeddings
D) Return to relational-only models
Answer:B
- As embedding models improve, vector DBs must:
A) Reduce file size
B) Keep embeddings versioned and re-indexed
C) Migrate to Excel
D) Use fewer dimensions
Answer:B
- New vector search methods are exploring:
A) LLM-guided retrieval
B) Color-based filtering
C) PDF-to-CSV pipelines
D) MD5-based hashing
Answer:A
- Open source vector databases are often preferred because:
A) They run in Microsoft Word
B) They allow full customization and local deployment
C) They reduce vector length
D) They eliminate neural nets
Answer:B
- One emerging challenge with vector databases is:
A) Lack of color support
B) Scalability with high-dimensional and large-scale data
C) Slow SQL query execution
D) File extension conflicts
Answer:B
- Qdrant uses which underlying search algorithm?
A) Inverted index
B) HNSW (Hierarchical Navigable Small World)
C) KD-Tree
D) R-Tree
Answer:B
- Weaviate allows module integrations with:
A) Hugging Face
B) OpenAI
C) Cohere
D) All of the above
Answer:D
- LangChain is used to:
A) Build relational tables
B) Connect LLMs with vector stores and chains
C) Generate QR codes
D) Parse SQL
Answer:B
- LlamaIndex is a tool for:
A) PDF compression
B) Creating vector indexes from data sources for LLMs
C) SQL tuning
D) Firewall setup
Answer:B
- Embeddings can be encrypted before storage to:
A) Reduce dimensionality
B) Enhance security and privacy
C) Speed up rendering
D) Allow SQL compatibility
Answer:B
Section P: Cost Optimization & Efficiency
- One way to reduce storage costs in vector DBs is:
A) Use longer vectors
B) Apply quantization techniques like PQ or SQ
C) Store vectors as plain text
D) Avoid indexing
Answer:B
- Query cost in vector DBs increases with:
A) Lower vector dimensionality
B) More restrictive metadata filters
C) More probes or higher recall settings
D) Using static embeddings
Answer:C
- To reduce inference latency, embeddings can be:
A) Generated in real-time only
B) Pre-computed and cached
C) Ignored completely
D) Stored on blockchain
Answer:B
- Which factor contributes most to compute cost in semantic search pipelines?
A) Index refresh rate
B) Embedding generation using large models
C) SQL joins
D) File imports
Answer:B
- Fine-tuning models on small datasets may lead to:
A) Lower inference cost
B) Higher risk of overfitting
C) Faster indexing
D) Increased vector length
Answer:B
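The storage saving from scalar quantization (SQ) comes from replacing each floating-point component with a small integer code. The sketch below assumes 8-bit codes and a known value range [lo, hi]; the quantize and dequantize helpers are hypothetical and much simpler than what production libraries do. Going from 4-byte float32 components to 1-byte codes cuts raw vector storage by roughly 4x, at the cost of a small reconstruction error.

```python
def quantize(vector, lo, hi, bits=8):
    """Map each component in [lo, hi] to an integer code in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels
    # Clamp out-of-range values, then round to the nearest code.
    return [round((min(max(x, lo), hi) - lo) / scale) for x in vector]

def dequantize(codes, lo, hi, bits=8):
    """Approximately reconstruct the original components from integer codes."""
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels
    return [lo + c * scale for c in codes]

vector = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes = quantize(vector, lo=-1.0, hi=1.0)
restored = dequantize(codes, lo=-1.0, hi=1.0)

max_error = max(abs(x - y) for x, y in zip(vector, restored))
print(codes)
print(max_error)  # bounded by half of one quantization step
```

Product quantization (PQ) goes further by encoding groups of components with learned codebooks, which this sketch does not attempt to show.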
Section Q: Vector DB Tuning & Customization
- Changing the value of nprobe in FAISS affects:
A) Query language
B) Search accuracy and latency
C) Vector shape
D) SQL syntax
Answer:B
- Custom scoring functions in some vector DBs allow:
A) Arbitrary reshuffling of data
B) Fine-grained control over ranking logic
C) Ignoring similarity
D) Rewriting embeddings
Answer:B
- FAISS index type IVF+PQ provides:
A) Full brute-force accuracy
B) Compressed, approximate search with fast retrieval
C) Keyword-only search
D) Multi-language tokenization
Answer:B
- Rebalancing index shards in distributed DBs helps:
A) Reduce cosine similarity
B) Improve query load distribution
C) Eliminate high-dimensional embeddings
D) Sort data alphabetically
Answer:B
- Query pre-warming is used to:
A) Increase database size
B) Reduce cold start latency in production systems
C) Sort results by length
D) Extend index lifetime
Answer:B
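The effect of nprobe on accuracy and latency can be seen in a toy inverted-file (IVF) index. This is a pure-Python illustration, not FAISS code: build_ivf and ivf_search are hypothetical helpers, and the two centroids are hand-picked here, whereas a real IVF index learns them with k-means.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, centroids):
    """Assign each vector index to the inverted list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: dist(v, centroids[c]))
        lists[nearest].append(i)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe inverted lists whose centroids are closest to the query."""
    probe_order = sorted(range(len(centroids)), key=lambda c: dist(query, centroids[c]))
    candidates = [i for c in probe_order[:nprobe] for i in lists[c]]
    return sorted(candidates, key=lambda i: dist(query, vectors[i]))[:k]

centroids = [[0.0, 0.0], [10.0, 0.0]]  # hand-picked for illustration
vectors = [[0.0, 0.0], [1.0, 0.0], [4.9, 0.0], [5.2, 0.0], [10.0, 0.0]]
lists = build_ivf(vectors, centroids)
query = [5.1, 0.0]

print(ivf_search(query, vectors, centroids, lists, k=2, nprobe=1))  # misses vector 2
print(ivf_search(query, vectors, centroids, lists, k=2, nprobe=2))  # finds both true neighbors
```

With nprobe=1 only one list is scanned, so a true neighbor that sits just across the cell boundary is missed; raising nprobe recovers it at the cost of scanning more vectors.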
Section R: Multilingual Embeddings
- Multilingual embeddings map sentences from different languages into:
A) Isolated vector spaces
B) A shared embedding space
C) Separate databases
D) Binary formats
Answer:B
- Which model is designed for multilingual embeddings?
A) mBERT
B) DALL·E
C) YOLOv5
D) InstructGPT
Answer:A
- One challenge in multilingual vector search is:
A) GPU memory limits
B) Loss of semantic alignment across languages
C) Token length mismatch
D) Cloud billing issues
Answer:B
- CLIP embeddings can work across:
A) SQL tables
B) Text and images
C) Datetime formats
D) Blockchain logs
Answer:B
- In multilingual settings, it is recommended to:
A) Use isolated models per language
B) Use universal sentence encoders or multilingual models
C) Encode only in English
D) Use SQL collation
Answer:B
Section S: Model-Specific Behaviors
- OpenAI’s text-embedding-ada-002 is optimized for:
A) Low-latency SQL queries
B) High-dimensional semantic representation
C) Image generation
D) File upload
Answer:B
- Sentence Transformers are built on top of:
A) CNNs
B) BERT-based architectures
C) FAISS indexes
D) JavaScript
Answer:B
- When using embeddings in LLM workflows, chunking long documents helps:
A) Compress data
B) Improve retrieval accuracy
C) Avoid semantic understanding
D) Remove vector metadata
Answer:B
- Transformer-based embedding models usually scale poorly with:
A) Short input strings
B) Very long documents
C) Binary inputs
D) PNG images
Answer:B
- Vector similarity can degrade if:
A) Tokenization is inconsistent
B) Metadata is present
C) Model is multilingual
D) Index is GPU-based
Answer:A
Section T: Privacy, Security & Compliance
- What’s a common method to protect sensitive embeddings?
A) Caching them in browsers
B) Encrypting embeddings before storing in DB
C) Storing them in plaintext
D) Disabling indexes
Answer:B
- Which regulation may apply to embeddings containing personal data?
A) GDPR
B) HTTP
C) DNS
D) SSH
Answer:A
- Embeddings that indirectly contain personal identifiers must be:
A) Compressed
B) Audited and privacy-protected
C) Ignored
D) Skipped during inference
Answer:B
- One security risk in vector DBs is:
A) Embedding reversal attacks (to infer original content)
B) File compression errors
C) CSV injection
D) Color misrepresentation
Answer:A
- Using private, local models reduces:
A) Vector dimensionality
B) Dependence on external APIs and privacy risks
C) SQL join latency
D) File corruption
Answer:B
Section U: Enterprise & Production Deployment
- High availability in vector DBs is ensured by:
A) Single-node setup
B) Replication and failover clusters
C) GPU-only inference
D) Query caching only
Answer:B
- When embedding models are updated, vector indexes must be:
A) Renamed
B) Rebuilt to reflect new vector semantics
C) Duplicated
D) Shortened
Answer:B
- Logging vector queries in production should be:
A) Disabled
B) Secure and anonymized
C) Stored in clear text
D) Shared with model vendors
Answer:B
- A good practice for large-scale ingestion:
A) Load everything in memory
B) Batch upload embeddings in chunks
C) Use FTP
D) Build GUI first
Answer:B
- Version control in embedding pipelines ensures:
A) UI updates
B) Reproducibility and model auditability
C) Real-time search
D) Embedding compression
Answer:B
Section V: LLM Limitations & Challenges
- LLM-generated embeddings may sometimes be:
A) Perfectly consistent
B) Sensitive to input phrasing
C) Always multilingual
D) Always 1024-dimensional
Answer:B
- Hallucination in LLMs can occur even with:
A) Vector search
B) Accurate retrieval (if context is misinterpreted)
C) Metadata filters
D) Small vector size
Answer:B
- Retrieval-Augmented Generation cannot fix:
A) Outdated context
B) Poor reasoning from LLM itself
C) Broken indexes
D) REST APIs
Answer:B
- If retrieval quality is poor, RAG outputs will be:
A) More accurate
B) Contextually weaker and potentially incorrect
C) LLM-guided
D) Fact-checked automatically
Answer:B
- One way to improve RAG quality is:
A) Increasing top-K retrieval
B) Using smaller vectors
C) Disabling chunking
D) Reducing batch size
Answer:A
Section W: Embedding Lifecycle & Management
- Embedding lifecycle includes:
A) Creation → Normalization → Indexing → Retrieval → Versioning
B) Training → SQL → PDF
C) Download → Upload → Rewrite
D) HTML → JS → Vector
Answer:A
- Vector drift is caused by:
A) AI bias
B) Changes in domain or semantics over time
C) GPU errors
D) File formatting
Answer:B
- One way to monitor vector drift:
A) Measure similarity between old and new embeddings for the same input
B) Monitor disk usage
C) Check query response time
D) Count SQL rows
Answer:A
- Vector databases must support embedding versioning to:
A) Sort results by time
B) Compare different embedding models and rerank
C) Convert them to CSV
D) Index images
Answer:B
- Regular re-indexing is essential when:
A) Metadata changes
B) Embedding models are updated or context shifts
C) Tables are renamed
D) Colors are added
Answer:B
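Monitoring drift by comparing old and new embeddings for the same inputs could be sketched as follows. The mean_drift helper and the sample embeddings are hypothetical, and the comparison assumes both versions have the same dimensionality and live in comparable spaces, which is not guaranteed when switching to an unrelated model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_drift(old, new):
    """Average (1 - cosine similarity) over inputs present in both embedding sets."""
    shared = [key for key in old if key in new]
    if not shared:
        raise ValueError("no shared inputs to compare")
    sims = [cosine(old[key], new[key]) for key in shared]
    return 1.0 - sum(sims) / len(sims)

# Hypothetical embeddings of the same two probe sentences under two model versions.
old = {"q1": [1.0, 0.0], "q2": [0.0, 1.0]}
new = {"q1": [1.0, 0.0], "q2": [1.0, 0.0]}

print(mean_drift(old, new))  # q1 unchanged, q2 rotated by 90 degrees
```

A value near 0 suggests the probe inputs still embed the same way; a rising value is a signal to consider re-indexing.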
Section X: Performance Optimization & Search Tuning
- Increasing the ef parameter in HNSW improves:
A) Search accuracy
B) Write throughput
C) File download speed
D) Tokenization speed
Answer:A
- Which FAISS index is best for small datasets with high precision?
A) HNSW
B) Flat (Brute-force)
C) PQ
D) LSM-Tree
Answer:B
- Which technique helps balance speed and memory usage in FAISS?
A) LRU caching
B) Product Quantization (PQ)
C) Reverse indexing
D) Tokenization
Answer:B
- Batch querying in vector DBs improves:
A) Latency for single queries
B) Throughput by reducing overhead
C) Token count
D) Index rebuild time
Answer:B
- For high QPS (queries per second), a system must prioritize:
A) UI design
B) Low-latency index lookup and hardware parallelism
C) File size
D) JSON formatting
Answer:B
Section Y: Evaluation Metrics
- NDCG (Normalized Discounted Cumulative Gain) measures:
A) Vector length
B) Ranking quality with position-based weighting
C) Query latency
D) Metadata sort accuracy
Answer:B
- Recall@K is primarily used to evaluate:
A) Storage format
B) Retrieval effectiveness
C) SQL sorting
D) Chart rendering
Answer:B
- Cosine similarity is commonly used in vector search to measure:
A) Text overlap
B) Angular closeness of two vectors
C) File structure
D) GPU usage
Answer:B
- Euclidean distance differs from cosine similarity by:
A) Ignoring vector magnitude
B) Considering absolute distance between vectors
C) Using text overlap
D) Requiring normalization
Answer:B
- AUC-ROC is more applicable to:
A) Classification problems
B) Vector search
C) Embedding generation
D) Chunking documents
Answer:A
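The position-based weighting behind NDCG can be made concrete with a short sketch. It uses the linear-gain form of DCG (relevance divided by log2 of rank + 1); an exponential-gain variant also exists, so treat this as one common formulation. The function names and the graded relevance lists are illustrative.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: later positions contribute less."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances, k):
    """DCG of the top-k ranking divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0

perfect = [3, 2, 1]   # most relevant result ranked first
reversed_ = [1, 2, 3]  # same results, worst ordering

print(ndcg(perfect, k=3))    # ideal ordering scores 1.0
print(ndcg(reversed_, k=3))  # same items in a worse order score lower
```

Unlike Recall@K, which only checks whether relevant items appear in the top K, NDCG also rewards placing the most relevant items first.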
Section Z: Domain-Specific Applications
- In healthcare, vector embeddings help:
A) Encrypt billing records
B) Retrieve similar patient histories or medical documents
C) Run blood tests
D) Compress MRI images
Answer:B
- Financial institutions can use vector DBs for:
A) Loan disbursement
B) Semantic analysis of analyst reports
C) ATM coordination
D) Barcode scanning
Answer:B
- In scientific research, vector search can:
A) Perform chemical analysis
B) Retrieve similar research papers and findings
C) Store lab reports
D) Replace lab notebooks
Answer:B
- For HR or recruiting systems, embeddings can match:
A) Employee ID
B) Candidate resumes to job descriptions semantically
C) Payroll tax IDs
D) Timesheet logs
Answer:B
- Retail search using embeddings improves:
A) Inventory count
B) Semantic product discovery across categories
C) Store locations
D) Price updates
Answer:B
Section AA: Zero-Shot & Few-Shot Capabilities
- Zero-shot retrieval works by:
A) Fine-tuning for each use case
B) Using generalized embeddings for unseen queries
C) Disabling filters
D) Keyword lookup
Answer:B
- Few-shot learning involves:
A) Massive datasets
B) Small task-specific examples to guide LLMs or embeddings
C) Blocking vector access
D) File compression
Answer:B
- Embedding models like text-embedding-ada-002 support zero-shot tasks by:
A) Matching input queries semantically without labeled training data
B) Using only synonyms
C) Hardcoding rules
D) Building SQL indexes
Answer:A
- Zero-shot vector search is useful when:
A) Data is labeled
B) No annotated training examples are available
C) SQL is required
D) Filters are missing
Answer:B
- One limitation of zero-shot search is:
A) Total accuracy
B) Lack of domain-specific tuning
C) Fast latency
D) Overuse of GPU
Answer:B
Section AB: Multi-modal Embeddings
- Multi-modal embeddings can represent:
A) Only structured text
B) Images, text, audio in a shared vector space
C) SQL queries
D) File sizes
Answer:B
- OpenAI’s CLIP model can embed:
A) Audio files
B) Text and images into a common embedding space
C) Databases
D) CSS files
Answer:B
- A practical use case for multi-modal search is:
A) Code compilation
B) Searching images using text queries
C) Sorting CSV rows
D) Sending emails
Answer:B
- In a multi-modal vector DB, one challenge is:
A) Too many tables
B) Aligning vector dimensions across different modalities
C) Lack of users
D) HTTP errors
Answer:B
- Audio embeddings can be used for:
A) Encrypting calls
B) Matching similar voice recordings or music
C) Creating firewalls
D) OCR scanning
Answer:B
Section AC: LLM-Agent + Vector DB Integration
- LLM agents use vector DBs to:
A) Sort URLs
B) Retrieve relevant context or facts for reasoning
C) Download content
D) Manage GPU drivers
Answer:B
- LangChain and LlamaIndex provide:
A) Training loops
B) Pipelines to integrate LLMs with vector databases
C) SQL joins
D) GPU benchmarks
Answer:B
- A vector store in an LLM agent workflow acts as:
A) Memory or long-term knowledge base
B) A CSS loader
C) A JSON parser
D) None of the above
Answer:A
- Agents benefit from vector retrieval because it:
A) Blocks hallucination
B) Grounds outputs in factual, contextual data
C) Deletes metadata
D) Rewrites prompts
Answer:B
- LangChain memory components may use vector DBs to:
A) Format prompt syntax
B) Store conversation history for retrieval
C) Rename sessions
D) Avoid tokenization
Answer:B
Section AD: Hardware Acceleration (GPU & Parallelism)
- GPU acceleration in vector search is useful for:
A) Faster brute-force (exact) and ANN searches
B) Coloring dashboards
C) Writing CSV files
D) Compressing PDFs
Answer:A
- FAISS has GPU support via:
A) CUDA
B) HTML
C) REST API
D) USB
Answer:A
- Vector DBs like Milvus support GPU usage to:
A) Improve visualization
B) Accelerate search and indexing
C) Format text
D) Replace models
Answer:B
- High-dimensional vector search on CPU may cause:
A) Memory leaks
B) Latency and performance bottlenecks
C) Better speed
D) SQL errors
Answer:B
- A drawback of relying heavily on GPU is:
A) Increased cost and resource consumption
B) Lower similarity
C) Lack of normalization
D) Metadata conflicts
Answer:A
Section AE: Low-resource or Edge Scenarios
- In mobile or edge environments, vector DBs must be:
A) Cloud-only
B) Lightweight and memory-efficient
C) JS-based
D) SQL-compatible only
Answer:B
- Sentence transformers can be optimized for edge use with:
A) ONNX or quantized versions
B) JavaScript only
C) QR codes
D) RESTful logs
Answer:A
- A trade-off in edge-based embedding is:
A) Higher precision, lower compute
B) Lower precision due to model size constraints
C) Higher GPU usage
D) Unlimited memory
Answer:B
- For offline semantic search, a good stack is:
A) SQLite + MiniLM embeddings
B) RedisGraph
C) Tableau
D) GPT-4 streaming
Answer:A
- One challenge of embedding on-device is:
A) Data privacy
B) Hardware limitations for real-time embedding
C) Lack of API access
D) Missing images
Answer:B
Section AF: Retrieval Strategies & Search Behavior
- The top_k parameter in vector search determines:
A) Number of queries sent
B) Number of nearest neighbors returned
C) Chunk size
D) Database ports
Answer:B
- A high top_k value may lead to:
A) Faster search
B) Better recall but lower precision
C) Compressed results
D) Better keyword matches
Answer:B
- To improve semantic coverage, a good practice is to:
A) Increase vector length
B) Chunk documents strategically
C) Use random queries
D) Embed metadata separately
Answer:B
- Dense retrieval refers to:
A) Brute-force SQL
B) Using embeddings for semantic similarity search
C) Filtering based on numbers
D) HTML parsing
Answer:B
- Sparse retrieval refers to:
A) Embedding-based search
B) Keyword/token-based search (e.g., BM25)
C) Vector compression
D) GraphQL
Answer:B
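The role of top_k in dense retrieval can be illustrated with a minimal brute-force sketch. It assumes a toy in-memory index; the names `cosine_similarity`, `top_k_search`, and `toy_index` are hypothetical and do not belong to any particular vector database.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)

def top_k_search(query, vectors, top_k):
    """Exact (brute-force) search: score every vector, keep the best top_k."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical 2-dimensional embeddings, for illustration only.
toy_index = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.9, 0.1],
    "doc_c": [0.0, 1.0],
}
results = top_k_search([1.0, 0.0], toy_index, top_k=2)
```

Raising top_k returns more neighbors, which tends to help recall while admitting lower-scoring results.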
Section AG: Embedding Model Selection & Management
- An embedding model’s dimension determines:
A) Color output
B) Length of its vector output
C) Number of GPU cores
D) API rate limits
Answer:B
- Choosing a larger embedding model usually gives:
A) Shorter vectors
B) Better semantic representation but higher cost
C) Lower quality
D) SQL joins
Answer:B
- Using domain-specific embedding models improves:
A) Generalization
B) Search relevance in that specific context
C) File conversion
D) Token overhead
Answer:B
- You should not switch embedding models without:
A) Changing your CSS
B) Recomputing and re-indexing existing vectors
C) Saving the HTML
D) Disabling vector search
Answer:B
- Embedding model drift can cause:
A) Better relevance
B) Degraded search performance over time
C) Smaller files
D) Vector shortening
Answer:B
Section AH: Hybrid Search (Keyword + Vector)
- Hybrid search combines:
A) SQL + HTML
B) Vector (dense) and keyword (sparse) retrieval
C) YAML + JSON
D) REST + WebSocket
Answer:B
- Vector DBs like Weaviate support hybrid search via:
A) BM25 + cosine similarity scoring
B) XML tags
C) Local file sorting
D) Token rewriting
Answer:A
- A benefit of hybrid search is:
A) Full memory usage
B) Improved relevance across ambiguous or misspelled queries
C) Eliminating embeddings
D) Slower performance
Answer:B
- Hybrid scoring typically uses:
A) Simple token counts
B) Weighted combination of sparse and dense scores
C) Vector concatenation
D) GPU logs
Answer:B
- A downside of hybrid search can be:
A) Lack of results
B) Complex tuning of score weighting
C) No metadata
D) Slower embeddings
Answer:B
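A weighted combination of sparse and dense scores can be sketched as follows. This is one possible scheme under stated assumptions (min-max normalization, then a linear blend); the names `min_max_normalize`, `hybrid_scores`, and the weight `alpha` are hypothetical and do not describe how any specific product fuses scores.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def hybrid_scores(sparse, dense, alpha=0.5):
    """Blend normalized scores: alpha weights dense, (1 - alpha) weights sparse."""
    sparse_n = min_max_normalize(sparse)
    dense_n = min_max_normalize(dense)
    doc_ids = set(sparse_n) | set(dense_n)
    # A document missing from one result list contributes 0 for that side.
    return {
        doc_id: alpha * dense_n.get(doc_id, 0.0)
        + (1 - alpha) * sparse_n.get(doc_id, 0.0)
        for doc_id in doc_ids
    }

# Hypothetical raw scores: BM25-style (sparse) and cosine-style (dense).
sparse = {"doc_a": 12.0, "doc_b": 3.0}
dense = {"doc_b": 0.92, "doc_c": 0.80}
combined = hybrid_scores(sparse, dense, alpha=0.7)
ranked = sorted(combined, key=combined.get, reverse=True)
```

The tuning difficulty mentioned in this section shows up in `alpha`: the best value depends on the data and the query mix, so it usually has to be chosen empirically.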
Section AI: Storage & Troubleshooting
- Vectors are typically stored as:
A) Strings
B) Float arrays or binary-encoded formats
C) HTML tags
D) PDF streams
Answer:B
- Slow query responses in a vector DB could be caused by:
A) High-dimensional vectors and poorly tuned ANN index parameters
B) Fast disk
C) Image data
D) JSON formatting
Answer:A
- A broken embedding pipeline may result in:
A) Vector drift
B) Empty or irrelevant retrieval results
C) Faster indexing
D) Duplicate logs
Answer:B
- One sign of misaligned vector indexing is:
A) Perfect recall
B) Frequent retrieval of irrelevant documents
C) High cosine similarity
D) Clean logs
Answer:B
- A best practice before deploying a vector DB to production is:
A) Manual testing + relevance evaluation
B) DNS flushing
C) JSON formatting
D) Increasing image resolution
Answer:A
Section AJ: Data Preprocessing & Chunking
- Document chunking improves:
A) File size
B) Embedding granularity and retrieval precision
C) Token pricing
D) Batch sorting
Answer:B
- A common chunking method is:
A) Sentence-wise or sliding window with overlap
B) File-splitting by color
C) MIME-type detection
D) PDF page numbers
Answer:A
- Overlapping chunks help preserve:
A) Index size
B) Context across adjacent sections
C) Metadata fields
D) Token uniqueness
Answer:B
- Preprocessing before embedding usually involves:
A) Compression
B) Cleaning, lowercasing, and removing stopwords (optional)
C) SQL parsing
D) IP masking
Answer:B
- Overly aggressive preprocessing may:
A) Reduce embedding latency
B) Harm semantic richness of embeddings
C) Improve accuracy
D) Increase storage cost
Answer:B
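Sliding-window chunking with overlap can be sketched in a few lines. The example below assumes whitespace-separated words as the unit; real pipelines often count tokens or sentences instead. The function name `chunk_words` and its parameters are hypothetical.

```python
def chunk_words(text, chunk_size, overlap):
    """Split text into word windows of chunk_size, sharing `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks

chunks = chunk_words("a b c d e f g", chunk_size=4, overlap=2)
# chunks == ["a b c d", "c d e f", "e f g"]
```

Each chunk repeats the last two words of the previous one, which is how overlap preserves context across adjacent sections.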
Section AK: Embedding Techniques & Tokenization
- Tokenization is required before embedding because:
A) Vectors are only binary
B) Embedding models operate on tokens, not raw text
C) It prevents file corruption
D) It optimizes JSON parsing
Answer:B
- Byte Pair Encoding (BPE) is used for:
A) Tokenizing text efficiently for LLMs and embeddings
B) Compressing CSVs
C) Sorting documents
D) Counting files
Answer:A
- Long text truncation before embedding may lead to:
A) Better performance
B) Loss of context
C) More accurate scores
D) Longer vectors
Answer:B
- An embedding vector’s meaning is tied to:
A) Its position in the DB
B) The context and model used during generation
C) The file name
D) SQL schema
Answer:B
- Embeddings from different models:
A) Are always identical
B) Should not be mixed in the same vector index
C) Can be concatenated for better results
D) Must be re-tokenized
Answer:B
Section AL: APIs and Query Patterns
- Most vector DBs expose APIs via:
A) REST and gRPC
B) FTP
C) SMTP
D) Bluetooth
Answer:A
- In an API query to a vector DB, you typically send:
A) Raw text
B) A precomputed vector or embedding
C) Python bytecode
D) Metadata only
Answer:B
- Query filters in vector DB APIs allow:
A) Content moderation
B) Metadata-based narrowing of search results
C) HTML editing
D) Chunk reprocessing
Answer:B
- To paginate large search results, vector DBs may offer:
A) Scrolling views
B) Cursor-based or offset-based pagination
C) XML schema
D) File truncation
Answer:B
- Many vector DBs support client libraries in:
A) Python, JavaScript, Go
B) C++, COBOL
C) HTML
D) SQL only
Answer:A
Section AM: Open Source vs. Managed Services
- One advantage of managed vector DBs:
A) Complete control over disk I/O
B) Reduced ops overhead and auto-scaling
C) Offline-only access
D) Manual index updates
Answer:B
- Open-source vector DBs offer:
A) Full control, auditability, and local deployment
B) Less customization
C) No integrations
D) Always higher speed
Answer:A
- Pinecone and Weaviate differ in that:
A) Pinecone is fully managed; Weaviate can be open-source or managed
B) Pinecone runs on-prem by default
C) Weaviate lacks vector support
D) Both only work on AWS
Answer:A
- When should you choose a self-hosted vector DB?
A) When latency doesn’t matter
B) For sensitive data, regulatory requirements, or full control
C) For mobile apps
D) When using images
Answer:B
Section AN: Real-World Failure Modes
- A typical cause of low retrieval quality in vector search is:
A) Using exact match queries
B) Poor or incorrect embedding strategy
C) Chunk size optimization
D) High cosine similarity
Answer:B
Section AO: Consistency & Index Maintenance
- Vector DB consistency means:
A) Data and index stay in sync after updates
B) Vectors never change
C) All queries return zero results
D) Database size remains constant
Answer:A
- Incremental index updates help:
A) Avoid full re-indexing on new data
B) Compress vectors
C) Increase query latency
D) Break filters
Answer:A
- Index rebuilding is necessary when:
A) Embedding model changes
B) File formats change
C) User interface updates
D) Only when queries fail
Answer:A
- Vector DBs typically handle concurrent writes by:
A) Locking entire DB
B) Multi-version concurrency control (MVCC) or optimistic concurrency
C) Halting queries
D) Data deletion
Answer:B
- Periodic index optimization improves:
A) Search speed and memory footprint
B) Tokenization rate
C) JSON parsing
D) Embedding quality
Answer:A
Section AP: Embedding Privacy & Security
- Embedding vectors can leak:
A) Information about the original content (for example, user queries) if left unprotected
B) IP addresses
C) Metadata fields only
D) File permissions
Answer:A
- Encrypting vectors at rest helps:
A) Prevent unauthorized access to sensitive embeddings
B) Increase vector length
C) Speed up indexing
D) Replace API keys
Answer:A
- GDPR compliance with vector DBs involves:
A) Anonymizing data before embedding
B) Using SQL queries only
C) Disabling vector search
D) Running on-prem only
Answer:A
- Access control in vector DBs is important because:
A) Anyone can modify vectors
B) Vectors often represent sensitive or proprietary data
C) It reduces GPU usage
D) It speeds queries
Answer:B
- Differential privacy techniques applied to embeddings:
A) Add noise to vectors to protect individual data points
B) Compress vectors
C) Encrypt tokens
D) Split databases
Answer:A
Section AQ: Query Optimization
- Pre-filtering queries with metadata improves:
A) Vector dimension
B) Search speed and relevance
C) API throughput
D) Tokenization
Answer:B
- Using approximate nearest neighbor (ANN) search trades:
A) Accuracy (recall) for speed and memory efficiency
B) GPU for CPU cycles
C) Data size for color
D) REST for gRPC
Answer:A
- Caching frequent query results reduces:
A) Index rebuild times
B) Query latency
C) Vector length
D) Metadata size
Answer:B
- Query rewriting for better embeddings includes:
A) Adding context or clarifying ambiguous terms
B) Compressing vectors
C) Encrypting queries
D) Adding HTML tags
Answer:A
- Early stopping in ANN search can:
A) Increase speed with some loss of recall
B) Increase index size
C) Delete vectors
D) Format JSON
Answer:A
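Metadata pre-filtering, as covered in this section, narrows the candidate set before any similarity scoring takes place. The sketch below assumes records held in a plain Python list and exact-match filters; `filtered_search` and the record layout are hypothetical, not a real vector DB API.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def filtered_search(query, records, metadata_filter, top_k):
    """Keep only records whose metadata matches, then rank by dot product."""
    candidates = [
        r for r in records
        if all(r["metadata"].get(key) == value
               for key, value in metadata_filter.items())
    ]
    # Similarity is computed only for the records that survived the filter.
    candidates.sort(key=lambda r: dot(query, r["vector"]), reverse=True)
    return [r["id"] for r in candidates[:top_k]]

# Hypothetical records, for illustration only.
records = [
    {"id": "faq-1", "vector": [0.9, 0.1], "metadata": {"lang": "en"}},
    {"id": "faq-2", "vector": [1.0, 0.0], "metadata": {"lang": "de"}},
    {"id": "faq-3", "vector": [0.2, 0.8], "metadata": {"lang": "en"}},
]
hits = filtered_search([1.0, 0.0], records, {"lang": "en"}, top_k=2)
# hits == ["faq-1", "faq-3"]; faq-2 scores highest but is filtered out
```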
Section AR: Scaling & Architecture
- Horizontal scaling of vector DBs involves:
A) Adding more nodes to distribute load and data
B) Increasing vector dimension
C) Using bigger GPUs
D) Adding more tokens
Answer:A
- Vertical scaling means:
A) Increasing the resources (CPU, RAM) of a single node
B) Adding more machines
C) Reducing index size
D) Compressing data
Answer:A
- Sharding vector data can:
A) Help manage very large datasets by splitting vectors across servers
B) Slow down queries
C) Increase data loss
D) Reduce API calls
Answer:A
- Replication in vector DBs ensures:
A) High availability and fault tolerance
B) Lower vector dimension
C) Slower indexing
D) Reduced metadata
Answer:A
- Load balancing in vector DB clusters:
A) Distributes query traffic evenly across nodes
B) Compresses vectors
C) Deletes unused vectors
D) Encrypts queries
Answer:A
- Hybrid cloud vector DB deployments allow:
A) Sensitive data to remain on-prem while leveraging cloud scalability
B) Cloud-only usage
C) Only on-premise usage
D) No scaling
Answer:A
- Monitoring vector DB performance includes tracking:
A) Query latency, throughput, and index health
B) File size
C) Usernames
D) CSS files
Answer:A
- Alerts on vector DB anomalies help detect:
A) Sudden drops in retrieval accuracy or performance
B) Token counts
C) API key expirations
D) Disk formatting
Answer:A
- Logging queries and embeddings helps with:
A) Debugging and auditing vector search behavior
B) Faster index rebuilds
C) File compression
D) GPU management
Answer:A
- The best practice for scaling vector DBs is:
A) Start small, monitor performance, and scale incrementally
B) Buy the largest GPU immediately
C) Store only metadata
D) Disable vector search
Answer:A
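Sharded search is commonly organized as scatter-gather: each shard returns its own local top-k, and a coordinator merges the partial lists. The sketch below simulates shards as in-memory dictionaries in a single process; `search_shard` and `scatter_gather` are hypothetical names, and a real cluster would issue these calls over the network, typically in parallel.

```python
import heapq

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def search_shard(query, shard, top_k):
    """Local search on one shard: returns up to top_k (score, doc_id) pairs."""
    scored = [(dot(query, vec), doc_id) for doc_id, vec in shard.items()]
    return heapq.nlargest(top_k, scored)

def scatter_gather(query, shards, top_k):
    """Query every shard, then merge partial results into a global top_k."""
    partial = []
    for shard in shards:  # sequential here; assumed parallel in practice
        partial.extend(search_shard(query, shard, top_k))
    return [doc_id for _, doc_id in heapq.nlargest(top_k, partial)]

# Two hypothetical shards holding 2-dimensional vectors.
shards = [
    {"a": [1.0, 0.0], "b": [0.1, 0.9]},
    {"c": [0.8, 0.2], "d": [0.0, 1.0]},
]
merged = scatter_gather([1.0, 0.0], shards, top_k=2)
# merged == ["a", "c"]
```

Because every shard returns its full local top-k, the merged list matches what a single-node exact search over all vectors would return.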
Section AS: Embedding Fine-tuning & Customization
- Fine-tuning an embedding model can:
A) Tailor vectors for specific domain vocabulary and semantics
B) Reduce vector dimension automatically
C) Break tokenization
D) Remove metadata fields
Answer:A
- Transfer learning in embeddings involves:
A) Starting from a pretrained model and adapting it to new data
B) Copying vectors directly
C) Using SQL joins
D) Random embedding initialization
Answer:A
- Fine-tuned embeddings typically require:
A) Reindexing all existing vectors for consistency
B) Only updating metadata
C) No changes to the DB
D) Switching APIs
Answer:A
- Embedding customization helps improve:
A) Semantic relevance for niche applications
B) Disk storage efficiency
C) GPU usage
D) Index size
Answer:A
- A downside of fine-tuning is:
A) Increased training cost and complexity
B) Reduced vector dimension
C) No impact on search quality
D) Data loss
Answer:A
Section AT: Vector Compression & Storage
- Vector compression techniques:
A) Reduce storage size at potential cost of precision
B) Increase index size
C) Delete metadata
D) Encrypt data
Answer:A
- Quantization is a common compression method that:
A) Converts float vectors to lower-bit representations
B) Expands vectors to higher dimensions
C) Splits vectors
D) Deletes tokens
Answer:A
- Product quantization helps:
A) Compress large vector datasets efficiently for ANN search
B) Format JSON
C) Encrypt embeddings
D) Improve CPU speed only
Answer:A
- Compressed vectors require:
A) Compatible index structures to ensure search quality
B) Manual decompression only
C) More metadata
D) Tokenization
Answer:A
- Compression trade-offs include:
A) Faster search but possible accuracy loss
B) Larger vectors
C) Higher memory use
D) No speed changes
Answer:A
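The precision cost of quantization can be seen in a minimal sketch of scalar (min-max) quantization, which maps each float to an 8-bit code. This is the simplest variant and is not product quantization; the function names `quantize` and `dequantize` are hypothetical.

```python
def quantize(vector, num_bits=8):
    """Map floats to integer codes in [0, 2**num_bits - 1] via min-max scaling."""
    lo, hi = min(vector), max(vector)
    levels = (1 << num_bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in vector]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Approximate reconstruction of the original floats from the codes."""
    return [lo + c * scale for c in codes]

original = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, lo, scale = quantize(original)
restored = dequantize(codes, lo, scale)

# Rounding to the nearest level bounds the error by about half a step.
max_error = max(abs(a - b) for a, b in zip(original, restored))
```

Each value now needs one byte instead of four (for float32), and `max_error` is the precision given up in exchange.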
Section AU: Query Latency & Trade-offs
- Lower query latency can be achieved by:
A) Reducing index complexity and using ANN search
B) Increasing vector dimension
C) Querying all data at once
D) Using exact match only
Answer:A
- Higher recall in vector search often leads to:
A) Higher query latency
B) Lower index size
C) No change in latency
D) No relevance improvements
Answer:A
- Batch queries can:
A) Improve throughput but may increase per-query latency
B) Reduce index size
C) Increase GPU costs only
D) Format logs
Answer:A
- Query caching is effective when:
A) Queries repeat frequently
B) Queries are all unique
C) Index is small
D) Metadata is missing
Answer:A
- A/B testing retrieval parameters helps:
A) Optimize balance between speed and accuracy
B) Remove vectors
C) Increase vector dimension
D) Disable caching
Answer:A
Section AV: Multi-Tenancy & Access Control
- Multi-tenancy in vector DBs enables:
A) Multiple users or clients to share the same infrastructure securely
B) Single-user access only
C) No access control
D) Token mixing
Answer:A
- Tenant isolation prevents:
A) Data leakage across customers
B) Vector compression
C) GPU sharing
D) API usage
Answer:A
- Role-based access control (RBAC) allows:
A) Granular permission settings for users
B) Data duplication
C) Removing metadata
D) Token encryption only
Answer:A
- Auditing in multi-tenant vector DBs is important for:
A) Compliance and security monitoring
B) Index rebuilding
C) Vector dimension scaling
D) Cache clearing
Answer:A
- An API key scoped to a tenant:
A) Limits operations to only that tenant’s data
B) Gives full DB access
C) Disables vector search
D) Reduces vector length
Answer:A
Section AW: Future Trends & Innovations
- One emerging trend in vector DBs is:
A) Integration with LLMs for real-time reasoning
B) Only SQL joins
C) Replacing embeddings with images
D) Manual indexing only
Answer:A
- Neural search combines:
A) Deep learning with vector similarity search
B) Keyword search only
C) Encryption and compression
D) Data deletion
Answer:A
- Zero-shot retrieval leverages:
A) Pretrained embeddings without task-specific fine-tuning
B) SQL tables
C) Manual queries
D) Metadata filters only
Answer:A
- Federated vector search enables:
A) Querying multiple distributed vector stores without centralizing data
B) Only on-prem usage
C) Single-node operation
D) Disabling caching
Answer:A
- Quantum computing might impact vector search by:
A) Accelerating high-dimensional similarity computations in the future
B) Replacing GPUs today
C) Removing the need for embeddings
D) Compressing vectors instantly
Answer:A
Section AX: Multi-Modal & Cross-Domain Embeddings
- Multi-modal embeddings combine:
A) Text, images, audio, and other data types into a unified vector space
B) Only text vectors
C) SQL and NoSQL data
D) Metadata fields only
Answer:A
- Multi-modal search allows:
A) Querying with text to find images or vice versa
B) Keyword-only retrieval
C) Vector concatenation only
D) Metadata filtering only
Answer:A
- Cross-lingual embeddings enable:
A) Searching in one language and retrieving relevant results in another
B) Only English queries
C) Vector compression
D) Metadata translation
Answer:A
- Multi-modal embedding models require:
A) Large, diverse training datasets
B) Only textual data
C) No fine-tuning
D) Manual indexing
Answer:A
- A challenge with multi-modal embeddings is:
A) Aligning different data modalities in a shared vector space
B) Slower tokenization
C) Reducing vector dimension
D) Losing metadata
Answer:A
Section AY: Real-Time Updates & Streaming
- Real-time vector DB updates require:
A) Low-latency embedding and indexing pipelines
B) Batch reindexing only
C) Manual refreshes
D) Disabling filters
Answer:A
- Streaming data ingestion in vector DBs is useful for:
A) Continuously updating search indexes with new data
B) Static datasets only
C) Deleting old vectors only
D) Manual backups
Answer:A
- Event-driven architectures help vector DBs by:
A) Triggering updates on data changes automatically
B) Disabling API calls
C) Reducing index size
D) Increasing token length
Answer:A
- Latency targets for real-time vector search are typically:
A) Under 100 milliseconds for user-facing applications
B) Minutes to hours
C) Days
D) No latency requirements
Answer:A
- A downside of real-time updates is:
A) Increased system complexity and resource usage
B) Reduced query accuracy
C) Data loss
D) Metadata corruption
Answer:A
Section AZ: Vector DB Benchmarks & Metrics
- Common benchmarks for vector DBs include:
A) Recall@k, query latency, throughput, and index build time
B) File size only
C) Metadata count
D) User login speed
Answer:A
- Recall@k measures:
A) Fraction of relevant items retrieved in the top-k results
B) Disk space used
C) Number of API calls
D) Metadata accuracy
Answer:A
- Latency benchmarking helps evaluate:
A) Speed of query processing under load
B) Vector dimension
C) Token length
D) Metadata storage
Answer:A
- Throughput in a vector DB context means:
A) Number of queries processed per second
B) Number of vectors stored
C) Compression ratio
D) Cache size
Answer:A
- Index build time impacts:
A) How quickly new or updated data becomes searchable
B) Tokenization speed only
C) GPU utilization only
D) API response format
Answer:A
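Recall@k, as defined in this section, is the fraction of relevant items that appear in the top-k results. A minimal sketch follows; the function name `recall_at_k` and the sample IDs are hypothetical. In ANN benchmarking, the relevant set is often taken to be the true nearest neighbors found by an exact search.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant items that appear in the first k retrieved results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # convention chosen here for an empty ground-truth set
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)

# Hypothetical ranked results and ground truth.
retrieved = ["d1", "d4", "d2", "d9"]
relevant = {"d1", "d2", "d3"}

recall_3 = recall_at_k(retrieved, relevant, k=3)  # 2 of 3 relevant items found
recall_1 = recall_at_k(retrieved, relevant, k=1)  # 1 of 3 relevant items found
```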
Section BA: Emerging Vector DB Architectures
- Graph-based vector indexes (e.g., HNSW) use:
A) Navigable small-world graphs for efficient nearest neighbor search
B) Trees only
C) SQL joins
D) Manual index lookups
Answer:A
- Product quantization is often combined with:
A) Inverted file structures to scale to billions of vectors
B) SQL databases
C) Metadata removal
D) Manual compression
Answer:A
- Vector DBs using GPUs can:
A) Accelerate embedding generation and ANN search
B) Replace CPU entirely
C) Remove the need for indexes
D) Compress metadata
Answer:A
- Serverless vector DB architectures:
A) Scale automatically without manual infrastructure management
B) Require dedicated servers
C) Have fixed capacity
D) Disable API access
Answer:A
- Federated vector search is suitable for:
A) Data privacy scenarios where data cannot be centralized
B) Single-user apps only
C) Only on-prem setups
D) Static datasets
Answer:A
- Auto-scaling vector DBs:
A) Dynamically adjust resources based on workload
B) Require manual intervention
C) Never change capacity
D) Disable filters
Answer:A
- Approximate search algorithms trade:
A) Perfect accuracy for speed and scalability
B) Security for latency
C) Metadata for vector dimension
D) Compression for indexing
Answer:A
- Hybrid indexing can combine:
A) Multiple index types (graph + quantization) for optimized performance
B) Only single index types
C) No indexing
D) Manual sorting
Answer:A
- Cloud-native vector DBs often provide:
A) Easy integration with other cloud services and managed infrastructure
B) Only local deployment
C) No APIs
D) Static scaling
Answer:A
- The future of vector databases will likely emphasize:
A) Better integration with AI models, real-time updates, and privacy guarantees
B) Replacing vector search with SQL only
C) Manual indexing
D) Fixed hardware requirements
Answer:A