Vector Database MCQ

1. What is the primary purpose of a vector database?

A) To store relational data efficiently
B) To store and search high-dimensional vector embeddings
C) To visualize neural network models
D) To convert images into text

Answer: B) To store and search high-dimensional vector embeddings

2. Which of the following is NOT typically used to generate embeddings?

A) BERT
B) OpenAI’s CLIP
C) PostgreSQL
D) Word2Vec

Answer: C) PostgreSQL

3. What is an embedding in the context of AI and machine learning?

A) A type of SQL query
B) A dense vector representation of data, usually much lower-dimensional than the raw input
C) A database table
D) A visualization tool

Answer: B) A dense vector representation of data, usually much lower-dimensional than the raw input

4. In vector search, which metric is most commonly used to measure similarity?

A) Jaccard Index
B) Cosine similarity
C) Chi-squared distance
D) Manhattan distance

Answer: B) Cosine similarity

5. Which of the following is a popular vector database?

A) MySQL
B) SQLite
C) Pinecone
D) MongoDB

Answer: C) Pinecone

6. What is Approximate Nearest Neighbor (ANN) search used for in vector databases?

A) Running SQL queries on text
B) Sorting images by date
C) Finding similar vectors efficiently in large datasets
D) Reducing dimensionality of data

Answer: C) Finding similar vectors efficiently in large datasets

7. Which of the following is a valid use case for a vector database?

A) Inventory management
B) Financial transaction storage
C) Semantic search in documents
D) User role access control

Answer: C) Semantic search in documents

8. What role does dimensionality reduction play in vector storage?

A) Increases search time
B) Makes vectors non-unique
C) Reduces storage and speeds up similarity search
D) Deletes unnecessary vectors

Answer: C) Reduces storage and speeds up similarity search

9. Embeddings from models such as those offered by OpenAI or Hugging Face can represent which of the following data types?

A) Only numerical data
B) Only SQL tables
C) Text, images, and audio
D) Only structured data

Answer: C) Text, images, and audio

10. What kind of index is commonly used in vector databases for fast similarity retrieval?

A) B-Tree indexing
B) Hash indexing
C) Inverted indexing
D) HNSW (Hierarchical Navigable Small World) indexing

Answer: D) HNSW (Hierarchical Navigable Small World) indexing

Section A: Embeddings Basics

  1. What is a vector embedding?
    A) A type of database schema
    B) A compressed image format
    C) A numerical representation of data in vector space
    D) A cloud-based file system
    Answer:C
  2. Embeddings are commonly used to represent:
    A) Colors
    B) Raw bytes
    C) Semantic information (e.g., words, images, documents)
    D) SQL queries
    Answer:C
  3. Which of the following is NOT typically used to generate embeddings?
    A) BERT
    B) Word2Vec
    C) FAISS
    D) OpenAI Embedding APIs
    Answer:C
  4. High-dimensional embeddings help in capturing:
    A) Simple arithmetic
    B) Complex semantic relationships
    C) Network latency
    D) Disk space
    Answer:B
  5. In NLP, vector embeddings are primarily used to:
    A) Sort documents
    B) Encode semantic meaning of words/sentences
    C) Create relational tables
    D) Define regex rules
    Answer:B

Section B: Vector Similarity and Distance Metrics

  1. Which of the following is the most common similarity metric for dense embeddings in vector databases?
    A) Hamming distance
    B) Cosine similarity
    C) Alphabetical order
    D) Boolean logic
    Answer:B
  2. Euclidean distance is most suitable for measuring:
    A) Direction similarity
    B) Angular similarity
    C) Straight-line distance between vectors
    D) Document frequency
    Answer:C
  3. Cosine similarity measures:
    A) The angle between two vectors
    B) The speed of query execution
    C) The length of documents
    D) The color of data points
    Answer:A
  4. A lower cosine distance between two vectors indicates:
    A) Greater dissimilarity
    B) Greater similarity
    C) Data corruption
    D) Indexing error
    Answer:B
  5. Dot product is equivalent to cosine similarity when:
    A) Vectors are normalized
    B) Data is binary
    C) Database is indexed
    D) GPU is used
    Answer:A
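
The metrics in this section can be checked with a few lines of plain Python. This is a minimal sketch using only the standard library (vector databases compute the same quantities with optimized native kernels), and the example vectors are made up. It shows that two vectors pointing the same way have cosine similarity 1 even though their Euclidean distance is nonzero, and that the dot product of L2-normalized vectors equals their cosine similarity.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_norm(v):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(dot(v, v))

def normalize(v):
    """Scale v to unit length by dividing it by its L2 norm."""
    n = l2_norm(v)
    if n == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    return dot(a, b) / (l2_norm(a) * l2_norm(b))

def cosine_distance(a, b):
    """1 - cosine similarity: lower means more similar."""
    return 1.0 - cosine_similarity(a, b)

def euclidean_distance(a, b):
    """Straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, twice as long

print(cosine_similarity(a, b))          # about 1.0: identical direction
print(euclidean_distance(a, b))         # about 3.742: the magnitudes differ
print(dot(normalize(a), normalize(b)))  # same value as cosine_similarity(a, b)
```

Dividing by the L2 norm is the normalization step described in Section J; once vectors are stored normalized, ranking by dot product gives the same order as ranking by cosine similarity.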

Section C: Vector Databases Fundamentals

  1. Which of the following is a specialized vector database?
    A) PostgreSQL
    B) MongoDB
    C) Pinecone
    D) SQLite
    Answer:C
  2. What is a key feature of a vector database?
    A) Joins and foreign keys
    B) Vector-based similarity search
    C) Relational table structure
    D) Blockchain storage
    Answer:B
  3. ANN in vector search stands for:
    A) Artificial Node Network
    B) Approximate Nearest Neighbor
    C) Active Neural Node
    D) Average Neural Network
    Answer:B
  4. FAISS is a library developed by:
    A) Google
    B) OpenAI
    C) Facebook (Meta)
    D) Microsoft
    Answer:C
  5. Milvus, Weaviate, and Qdrant are examples of:
    A) Web servers
    B) Key-value stores
    C) Vector databases
    D) Text encoders
    Answer:C

Section D: Applications

  1. A key application of vector databases is:
    A) Creating bar charts
    B) Semantic search
    C) Sorting numbers
    D) File compression
    Answer:B
  2. In recommendation systems, embeddings help by:
    A) Sorting users alphabetically
    B) Mapping users/items into vector space for similarity
    C) Encrypting user preferences
    D) Running SQL triggers
    Answer:B
  3. Which industry benefits significantly from vector-based search?
    A) Retail
    B) Healthcare
    C) Finance
    D) All of the above
    Answer:D
  4. Vector search outperforms traditional search when dealing with:
    A) Numerical queries
    B) Fuzzy, semantic queries
    C) Exact matching
    D) Sorted lists
    Answer:B
  5. Hybrid search combines:
    A) AI with cryptocurrency
    B) Vector similarity with keyword-based search
    C) SQL and NoSQL databases
    D) Embeddings with charts
    Answer:B

Section E: Indexing & Storage

  1. The primary purpose of indexing in vector databases is to:
    A) Compress vectors
    B) Speed up similarity search
    C) Apply encryption
    D) Store raw images
    Answer:B
  2. What does IVF stand for in the context of FAISS?
    A) Inverted File Index
    B) Indexed Vector Function
    C) Internal Vector Filter
    D) Immediate Vector Fetch
    Answer:A
  3. Which of the following is an index type used in FAISS?
    A) B-Tree
    B) PQ (Product Quantization)
    C) Hash Map
    D) LRU Cache
    Answer:B
  4. The HNSW algorithm stands for:
    A) Hierarchical Neural Search Window
    B) Hyper Network Signal Watch
    C) Hierarchical Navigable Small World
    D) Hybrid Node Search Weight
    Answer:C
  5. HNSW is known for its:
    A) Slow insert speed
    B) Fast, high-recall approximate nearest neighbor search
    C) Linear scan performance
    D) Use of SQL triggers
    Answer:B
  6. Product Quantization (PQ) helps by:
    A) Encrypting embeddings
    B) Reducing vector size for faster search
    C) Creating backup indexes
    D) Clustering search results
    Answer:B
  7. Scalar quantization is a type of:
    A) Embedding generation
    B) Approximation technique for vectors
    C) Database schema
    D) Network protocol
    Answer:B
  8. In vector search, brute-force method refers to:
    A) Index-less exact similarity search
    B) Using GPU for quantization
    C) Encrypting vectors
    D) Running SQL scripts
    Answer:A
  9. Which hardware significantly accelerates vector search operations?
    A) HDD
    B) GPU
    C) Router
    D) SSD
    Answer:B
  10. Which of the following is a disadvantage of brute-force search?
    A) High accuracy
    B) Low latency
    C) Poor scalability on large datasets
    D) Vector corruption
    Answer:C
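
Brute-force (flat) search, from questions 8 and 10 above, is simple enough to write out directly. The sketch below is a plain-Python illustration with hypothetical document IDs and vectors: it scores the query against every stored vector and keeps the best k. It is exact, but its cost grows linearly with the number of stored vectors, which is the scalability problem that IVF, PQ, and HNSW indexes trade some accuracy to avoid.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def brute_force_top_k(query, vectors, k):
    """Exact k-NN: compare the query with every stored vector (O(N * d))."""
    scored = ((cosine_similarity(query, v), vid) for vid, v in vectors.items())
    # nlargest keeps only k items in a heap instead of sorting all N scores.
    return heapq.nlargest(k, scored)

vectors = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.1, 0.9, 0.0],
    "doc3": [0.7, 0.3, 0.1],
    "doc4": [0.0, 0.1, 0.9],
}
for score, vid in brute_force_top_k([1.0, 0.0, 0.0], vectors, k=2):
    print(vid, round(score, 3))  # doc1 0.994, then doc3 0.911
```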

Section F: Hybrid Search & Applications

  1. Hybrid search combines vector search with:
    A) Graph databases
    B) Relational logic
    C) Keyword or symbolic search
    D) Temporal clustering
    Answer:C
  2. A benefit of hybrid search is:
    A) Limited query types
    B) Full support only for SQL
    C) Ability to combine semantic and keyword matching
    D) Data compression
    Answer:C
  3. Which platform natively supports hybrid search?
    A) Elasticsearch (BM25 keyword search plus its built-in dense_vector field)
    B) SQLite
    C) Hadoop
    D) GitHub
    Answer:A
  4. In hybrid search, vector search improves:
    A) Spelling correction
    B) Semantic relevance
    C) Page load speed
    D) Data backup
    Answer:B
  5. A use case for hybrid search is:
    A) Transactional banking systems
    B) Semantic + keyword legal document search
    C) Color rendering
    D) Encryption
    Answer:B
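
Hybrid search needs some way to merge a keyword ranking with a vector ranking. One widely used option, shown here as a sketch, is Reciprocal Rank Fusion (RRF). It only looks at ranks, so it sidesteps the fact that BM25 scores and cosine similarities live on different scales. Products differ in how they actually fuse results (some use weighted score blending instead), and the clause IDs below are made up to echo the legal-search use case in question 5.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several best-first lists of document IDs into one ranking.

    A document's fused score is the sum of 1 / (k + rank) over every list it
    appears in (rank starts at 1). k = 60 is a commonly used default.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

keyword_hits = ["clause_17", "clause_03", "clause_42"]  # e.g., a BM25 ranking
vector_hits = ["clause_03", "clause_88", "clause_17"]   # e.g., an ANN ranking
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# ['clause_03', 'clause_17', 'clause_88', 'clause_42']
```

Documents found by both retrievers (clause_03, clause_17) rise to the top, while documents found by only one retriever still appear further down.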

Section G: Vector Database Architecture

  1. In a vector database, the embedding dimension affects:
    A) Query language
    B) Representational capacity of each vector (and its memory footprint)
    C) Transaction throughput
    D) Query syntax
    Answer:B
  2. A high-dimensional embedding space can lead to:
    A) Lower latency
    B) Curse of dimensionality
    C) Exact keyword match
    D) Simplified database schema
    Answer:B
  3. Distributed, cloud-native vector databases often rely on which type of architecture?
    A) Monolithic
    B) Serverless
    C) Microservices with storage and compute separation
    D) Blockchain
    Answer:C
  4. Real-time vector search requires:
    A) Batch indexing only
    B) Low-latency infrastructure
    C) File transfer protocol
    D) Legacy SQL engines
    Answer:B
  5. Cloud-native vector databases are typically designed for:
    A) Offline search
    B) On-prem analytics only
    C) Scalability and distributed search
    D) Graph rendering
    Answer:C

Section H: Real-World Use Cases

  1. In e-commerce, vector databases can improve:
    A) Checkout processing
    B) Product recommendations based on user intent
    C) Payment encryption
    D) Invoice generation
    Answer:B
  2. In image search, embeddings are typically generated by:
    A) Word2Vec
    B) CNNs (Convolutional Neural Networks)
    C) SQL queries
    D) QR scanners
    Answer:B
  3. A typical vector embedding for a sentence might have dimensions of:
    A) 3
    B) 10
    C) Several hundred to a few thousand (e.g., 384, 768, or 1536)
    D) 1,000,000
    Answer:C
  4. In personalized search engines, vector embeddings help to:
    A) Increase advertisement cost
    B) Predict user preferences semantically
    C) Break down session cookies
    D) Encrypt results
    Answer:B
  5. One use case of vector search in legal tech is:
    A) Matching client names
    B) Identifying similar legal clauses across documents
    C) Rendering HTML contracts
    D) Printing legal forms
    Answer:B
  6. In genomics, vector embeddings are useful for:
    A) Identifying similar gene sequences
    B) Formatting DNA files
    C) Password protecting data
    D) Streaming videos
    Answer:A
  7. In customer support systems, vector search enhances:
    A) Response time through semantic FAQ matching
    B) Ticket creation speed
    C) Dashboard design
    D) Login security
    Answer:A
  8. Embeddings in cybersecurity can be used for:
    A) Visualizing passwords
    B) Semantic anomaly detection in logs
    C) Compressing malware
    D) Generating CAPTCHAs
    Answer:B
  9. Vector similarity can help in detecting:
    A) Semantic duplicates
    B) File corruption
    C) Primary keys
    D) Cloud latency
    Answer:A
  10. Vector databases are NOT ideal for:
    A) Full-text semantic search
    B) Social media content similarity
    C) Inventory accounting systems
    D) Image similarity search
    Answer:C

Section I: Deployment & Scaling

  1. Which cloud provider offers managed vector database services?
    A) AWS
    B) Azure
    C) GCP
    D) All of the above
    Answer:D
  2. A critical factor for scaling vector databases is:
    A) Table joins
    B) Efficient memory usage and index sharding
    C) Audit logging
    D) Backup frequency
    Answer:B
  3. Embedding models are usually deployed:
    A) Separately from the vector database
    B) Inside the database engine
    C) Only during batch jobs
    D) In spreadsheets
    Answer:A
  4. Horizontal scaling of a vector DB involves:
    A) Adding more indexes to the same machine
    B) Adding more nodes to handle increased load
    C) Shrinking embedding sizes
    D) Removing indexes
    Answer:B
  5. Vector data is often stored in:
    A) Relational schema
    B) Flat files
    C) Columnar format or binary blobs
    D) CSV only
    Answer:C
  6. Which of these databases provides built-in distributed vector search?
    A) SQLite
    B) Redis (with vector similarity search via Redis Stack / RediSearch)
    C) Neo4j
    D) Notepad
    Answer:B
  7. Which tool allows REST or gRPC API access to vector DBs?
    A) Weaviate
    B) Excel
    C) Hive
    D) PowerPoint
    Answer:A
  8. Embeddings are usually updated when:
    A) Schema changes
    B) New data or model updates occur
    C) SQL indexes are rebuilt
    D) The database is restarted
    Answer:B
  9. To persist vector data across restarts, a vector DB must support:
    A) Auto-scaling
    B) Disk-based storage or checkpointing
    C) Dark mode
    D) Sorting by key
    Answer:B
  10. Milvus uses which architecture?
    A) Blockchain
    B) Monolithic binary
    C) Microservices with separate components for storage, query, and indexing
    D) Serverless Lambda only
    Answer:C
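
Horizontal scaling and index sharding (questions 2 and 4) usually rely on a scatter-gather query pattern: each node searches only its own shard, and a coordinator merges the partial results. The sketch below is a simplified, single-process illustration of that pattern under stated assumptions: shards are in-memory dicts, each shard search is exact, and the shard contents and query are invented. It is not a description of how any specific database, Milvus included, implements it.

```python
import heapq

def search_shard(shard, score_fn, k):
    """Exact local search: this shard's top-k as (score, doc_id) pairs."""
    return heapq.nlargest(k, ((score_fn(vec), doc_id) for doc_id, vec in shard.items()))

def scatter_gather(shards, score_fn, k):
    """Fan the query out to every shard and merge the partial top-k lists.

    Each shard contributes its own top-k, so the global top-k is always
    contained in the k * len(shards) candidates collected here.
    """
    candidates = []
    for shard in shards:  # a real cluster would query shards in parallel over the network
        candidates.extend(search_shard(shard, score_fn, k))
    return heapq.nlargest(k, candidates)

query = [1.0, 0.0]

def dot_with_query(vec):
    """Similarity score: dot product with the (already normalized) query."""
    return sum(q * v for q, v in zip(query, vec))

shards = [
    {"a": [0.9, 0.1], "b": [0.2, 0.8]},   # node 1
    {"c": [0.95, 0.0], "d": [0.1, 0.9]},  # node 2
    {"e": [0.5, 0.5]},                    # node 3
]
print(scatter_gather(shards, dot_with_query, k=2))  # [(0.95, 'c'), (0.9, 'a')]
```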

Section J: Embedding Generation & Model Integration

  1. Which of the following can generate sentence embeddings?
    A) GPT models
    B) BERT variants (e.g., Sentence-BERT)
    C) OpenAI Embedding API
    D) All of the above
    Answer:D
  2. When using OpenAI’s text-embedding-ada-002, the output is:
    A) A PDF document
    B) A SQL table
    C) A 1536-dimensional vector
    D) A CSV file
    Answer:C
  3. Embedding models are usually trained using:
    A) Supervised learning only
    B) Unsupervised or contrastive learning techniques
    C) Decision trees
    D) SQL triggers
    Answer:B
  4. In multi-modal vector search, what type of data can be embedded together?
    A) Images only
    B) Text only
    C) Text and images/audio combined
    D) Only structured tables
    Answer:C
  5. Open-source embedding models can be deployed using:
    A) Hugging Face Transformers
    B) Docker containers
    C) ONNX format
    D) All of the above
    Answer:D
  6. A drawback of large embedding models is:
    A) Low accuracy
    B) High latency and resource consumption
    C) Inability to scale
    D) Lack of documentation
    Answer:B
  7. For privacy-sensitive embeddings, companies often:
    A) Host models locally
    B) Use API gateways
    C) Avoid third-party APIs
    D) All of the above
    Answer:D
  8. Normalizing embeddings before indexing is useful for:
    A) Faster compression
    B) Consistent similarity calculations
    C) Cloud billing
    D) SQL querying
    Answer:B
  9. Vector normalization typically involves:
    A) Resizing the database
    B) Dividing the vector by its L2 norm
    C) Reversing cosine similarity
    D) Adding random noise
    Answer:B
  10. Fine-tuning embedding models can improve:
    A) File download speed
    B) Search relevance in a specific domain
    C) Battery life
    D) Index rebuilding
    Answer:B

Section K: Evaluation & Quality Control

  1. A common metric to evaluate vector search quality is:
    A) SQL response time
    B) Accuracy@K (Top-K accuracy)
    C) Ping latency
    D) File size
    Answer:B
  2. Recall@10 measures:
    A) Database restart time
    B) Fraction of the true nearest neighbors that appear in the top 10 results
    C) Index rebuilding time
    D) Vector corruption rate
    Answer:B
  3. Precision in vector search refers to:
    A) Frequency of vector indexing
    B) Proportion of relevant results among those retrieved
    C) File formatting
    D) Embedding size
    Answer:B
  4. Embedding drift refers to:
    A) Data storage loss
    B) Change in vector meaning over time or model version
    C) GPU overheating
    D) SQL replication failure
    Answer:B
  5. How can one prevent embedding drift issues?
    A) Use static embeddings
    B) Re-index when models are updated
    C) Version embeddings
    D) All of the above
    Answer:D
  6. Garbage in, garbage out applies to vector search because:
    A) Indexes sort bad data
    B) Poor quality input data leads to poor semantic matches
    C) Data isn’t compressed
    D) Embeddings are stored alphabetically
    Answer:B
  7. To validate semantic search, use:
    A) A/B testing
    B) Human-in-the-loop review
    C) Evaluation benchmarks
    D) All of the above
    Answer:D
  8. Embedding evaluation typically involves:
    A) Comparing cosine similarity scores
    B) Checking file timestamps
    C) SQL command audits
    D) Querying for NULL values
    Answer:A
  9. If similar queries produce inconsistent results, it may indicate:
    A) Hardware failure
    B) Inconsistent embeddings or index issues
    C) WiFi problems
    D) Outdated fonts
    Answer:B
  10. Best practice before deploying vector search in production:
    A) Run SQL backups
    B) Perform offline vector quality evaluation
    C) Clear browser cache
    D) Build a dashboard first
    Answer:B
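
Recall@K and precision@K (questions 2 and 3) are straightforward to compute once ground truth is available. A minimal sketch; the document IDs are hypothetical. When benchmarking an ANN index, the ground-truth set is typically the exact neighbors returned by brute-force search.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant items that show up in the top-k results."""
    if not relevant:
        raise ValueError("relevant must be non-empty")
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k slots that hold a relevant item."""
    if k < 1:
        raise ValueError("k must be >= 1")
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / k

retrieved = ["d3", "d7", "d1", "d9", "d4"]  # ranked output of the search system
relevant = {"d1", "d3", "d5"}               # ground-truth relevant items
print(recall_at_k(retrieved, relevant, 5))     # 2 of 3 relevant found
print(precision_at_k(retrieved, relevant, 5))  # 2 of 5 slots relevant: 0.4
```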

Section L: Advanced Concepts

  1. Vector databases often support filtering based on:
    A) Vector length only
    B) Metadata fields (e.g., tags, categories)
    C) File size
    D) File format
    Answer:B
  2. Combining vector similarity with metadata filters enables:
    A) Random result generation
    B) Contextual semantic search
    C) Slower performance
    D) More database joins
    Answer:B
  3. What is the role of score_threshold in vector search?
    A) Limits the number of documents stored
    B) Filters results by minimum similarity score
    C) Encrypts query vectors
    D) Compresses database indexes
    Answer:B
  4. Vector recall can be improved by:
    A) Removing embeddings
    B) Increasing the number of probes in ANN
    C) Using HTTP instead of gRPC
    D) Disabling filtering
    Answer:B
  5. Which trade-off is common in ANN search?
    A) Speed vs. accuracy
    B) GPU vs. CPU
    C) Storage vs. font size
    D) SQL vs. NoSQL
    Answer:A
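
Metadata filtering and a score threshold (questions 1 to 3) can be illustrated on a brute-force scan. This sketch assumes a made-up record layout and exact-match filters; each database has its own filter syntax and parameter names, and ANN indexes have to handle filters more carefully (pre-filtering versus post-filtering) than a full scan does.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def filtered_search(query, records, k, where=None, score_threshold=None):
    """Top-k cosine search restricted by metadata and a minimum score.

    records: list of dicts with "id", "vector", and "metadata" keys
    (a hypothetical layout, not any particular database's schema).
    where: dict of metadata field -> required value (exact match).
    """
    results = []
    for rec in records:
        # Metadata filter: skip records that do not match every condition.
        if where and any(rec["metadata"].get(f) != v for f, v in where.items()):
            continue
        score = cosine_similarity(query, rec["vector"])
        # The threshold drops weak matches instead of padding out the top-k.
        if score_threshold is not None and score < score_threshold:
            continue
        results.append((score, rec["id"]))
    results.sort(reverse=True)
    return results[:k]

records = [
    {"id": "p1", "vector": [0.9, 0.1], "metadata": {"category": "shoes"}},
    {"id": "p2", "vector": [0.8, 0.3], "metadata": {"category": "bags"}},
    {"id": "p3", "vector": [0.6, 0.6], "metadata": {"category": "shoes"}},
    {"id": "p4", "vector": [0.1, 0.9], "metadata": {"category": "shoes"}},
]
# p2 is removed by the category filter and p4 by the 0.5 threshold.
print(filtered_search([1.0, 0.0], records, k=3,
                      where={"category": "shoes"}, score_threshold=0.5))
```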

Section M: Integration with LLMs & RAG

  1. Vector databases are often used in:
    A) CMS systems
    B) Retrieval-Augmented Generation (RAG) pipelines
    C) Gaming physics engines
    D) DNS lookup tables
    Answer:B
  2. RAG architecture typically retrieves context via:
    A) SQL joins
    B) Semantic search from a vector DB
    C) HTML scrapers
    D) Python list sorting
    Answer:B
  3. In a RAG pipeline, LLMs use the retrieved context (the text chunks whose embeddings matched the query) to:
    A) Generate more relevant and grounded responses
    B) Sort search indexes
    C) Train new embeddings
    D) Ignore context
    Answer:A
  4. Pinecone, Weaviate, and Qdrant all support:
    A) Direct fine-tuning of LLMs
    B) Integration into RAG applications
    C) GPU training only
    D) Blockchain consensus
    Answer:B
  5. An important step in building a RAG system is:
    A) Generating embeddings from chunks of documents
    B) Formatting data in CSV only
    C) Creating bar charts
    D) Using relational joins
    Answer:A
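
Chunking documents before embedding them (question 5) can be as simple as a sliding window of words with some overlap. A minimal sketch: splitting on whitespace and the 200/50 defaults are illustrative assumptions, and real pipelines often count model tokens instead of words and prefer sentence or section boundaries. Each chunk would then be embedded and stored with a reference to its source document.

```python
def chunk_words(text, chunk_size=200, overlap=50):
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words, so text near a boundary keeps
    some surrounding context in both neighboring chunks.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

print(chunk_words("the quick brown fox jumps over the lazy dog again",
                  chunk_size=4, overlap=1))
# ['the quick brown fox', 'fox jumps over the', 'the lazy dog again']
```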

Section N: Real-Time & Streaming Use Cases

  1. Real-time vector search is essential in:
    A) Log ingestion pipelines
    B) Fraud detection systems
    C) Static reports
    D) Batch ETL pipelines
    Answer:B
  2. In real-time settings, ingestion latency affects:
    A) SQL schema
    B) Relevance of search results
    C) Vector dimensions
    D) GPU cooling
    Answer:B
  3. Event-driven architectures for vector search often use:
    A) Kafka or pub/sub systems
    B) Word processors
    C) XML files
    D) Paint apps
    Answer:A
  4. Vector databases with real-time indexing must support:
    A) High write throughput
    B) Manual uploads only
    C) Offline indexing
    D) Zero concurrency
    Answer:A
  5. Which is a performance bottleneck in real-time vector search?
    A) Embedding generation time
    B) File download speed
    C) Admin dashboard design
    D) Login frequency
    Answer:A

Section O: Trends & Future Outlook

  1. A growing trend in vector databases is:
    A) Cloud-only monoliths
    B) Hybrid semantic+keyword search
    C) Elimination of embeddings
    D) Return to relational-only models
    Answer:B
  2. As embedding models improve, vector DBs must:
    A) Reduce file size
    B) Keep embeddings versioned and re-indexed
    C) Migrate to Excel
    D) Use fewer dimensions
    Answer:B
  3. New vector search methods are exploring:
    A) LLM-guided retrieval
    B) Color-based filtering
    C) PDF-to-CSV pipelines
    D) MD5-based hashing
    Answer:A
  4. Open source vector databases are often preferred because:
    A) They run in Microsoft Word
    B) They allow full customization and local deployment
    C) They reduce vector length
    D) They eliminate neural nets
    Answer:B
  5. One emerging challenge with vector databases is:
    A) Lack of color support
    B) Scalability with high-dimensional and large-scale data
    C) Slow SQL query execution
    D) File extension conflicts
    Answer:B
  6. Qdrant uses which underlying search algorithm?
    A) Inverted index
    B) HNSW (Hierarchical Navigable Small World)
    C) KD-Tree
    D) R-Tree
    Answer:B
  7. Weaviate allows module integrations with:
    A) Hugging Face
    B) OpenAI
    C) Cohere
    D) All of the above
    Answer:D
  8. LangChain is used to:
    A) Build relational tables
    B) Connect LLMs with vector stores and chains
    C) Generate QR codes
    D) Parse SQL
    Answer:B
  9. LlamaIndex is a tool for:
    A) PDF compression
    B) Creating vector indexes from data sources for LLMs
    C) SQL tuning
    D) Firewall setup
    Answer:B
  10. Embeddings can be encrypted before storage to:
    A) Reduce dimensionality
    B) Enhance security and privacy
    C) Speed up rendering
    D) Allow SQL compatibility
    Answer:B

Section P: Cost Optimization & Efficiency

  1. One way to reduce storage costs in vector DBs is:
    A) Use longer vectors
    B) Apply quantization techniques like PQ or SQ
    C) Store vectors as plain text
    D) Avoid indexing
    Answer:B
  2. Query cost in vector DBs increases with:
    A) Lower vector dimensionality
    B) More restrictive metadata filters
    C) More probes or higher recall settings
    D) Using static embeddings
    Answer:C
  3. To reduce inference latency, embeddings can be:
    A) Generated in real-time only
    B) Pre-computed and cached
    C) Ignored completely
    D) Stored on blockchain
    Answer:B
  4. Which factor contributes most to compute cost in semantic search pipelines?
    A) Index refresh rate
    B) Embedding generation using large models
    C) SQL joins
    D) File imports
    Answer:B
  5. Fine-tuning models on small datasets may lead to:
    A) Lower inference cost
    B) Higher risk of overfitting
    C) Faster indexing
    D) Increased vector length
    Answer:B
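
Scalar quantization (SQ), mentioned in question 1, in its simplest 8-bit form turns each float32 component into one byte, cutting raw vector storage roughly fourfold at the cost of a small reconstruction error. This is a minimal sketch with per-dimension min/max scaling and invented example vectors; FAISS and other libraries ship SQ variants with their own training and encoding details, and PQ is a different, more aggressive scheme not shown here.

```python
def fit_scalar_quantizer(vectors):
    """Learn per-dimension min and max values from a sample of vectors."""
    dims = range(len(vectors[0]))
    mins = [min(v[i] for v in vectors) for i in dims]
    maxs = [max(v[i] for v in vectors) for i in dims]
    return mins, maxs

def quantize(vector, mins, maxs):
    """Encode each float as one byte (0..255) instead of a 4-byte float32."""
    codes = []
    for x, lo_i, hi_i in zip(vector, mins, maxs):
        span = (hi_i - lo_i) or 1.0          # guard against constant dimensions
        q = round((x - lo_i) / span * 255)
        codes.append(min(255, max(0, q)))    # clamp values outside the fitted range
    return bytes(codes)

def dequantize(codes, mins, maxs):
    """Approximately reconstruct the floats from their byte codes."""
    return [lo_i + c / 255 * ((hi_i - lo_i) or 1.0)
            for c, lo_i, hi_i in zip(codes, mins, maxs)]

vectors = [[0.1, -0.5, 0.9], [0.4, 0.2, -0.3], [-0.2, 0.0, 0.5]]
mins, maxs = fit_scalar_quantizer(vectors)
codes = quantize(vectors[0], mins, maxs)
approx = dequantize(codes, mins, maxs)
print(len(codes), "bytes:", [round(x, 3) for x in approx])
```

The error per component is at most half a quantization step, (max - min) / 510, for values inside the fitted range.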

Section Q: Vector DB Tuning & Customization

  1. Changing nprobe (the number of inverted lists searched) in a FAISS IVF index affects:
    A) Query language
    B) Search accuracy and latency
    C) Vector shape
    D) SQL syntax
    Answer:B
  2. Custom scoring functions in some vector DBs allow:
    A) Arbitrary reshuffling of data
    B) Fine-grained control over ranking logic
    C) Ignoring similarity
    D) Rewriting embeddings
    Answer:B
  3. The FAISS index type IVF+PQ provides:
    A) Full brute-force accuracy
    B) Fast, approximate search over compressed vectors
    C) Keyword-only search
    D) Multi-language tokenization
    Answer:B
  4. Rebalancing index shards in distributed DBs helps:
    A) Reduce cosine similarity
    B) Improve query load distribution
    C) Eliminate high-dimensional embeddings
    D) Sort data alphabetically
    Answer:B
  5. Query pre-warming is used to:
    A) Increase database size
    B) Reduce cold start latency in production systems
    C) Sort results by length
    D) Extend index lifetime
    Answer:B
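
The effect of nprobe (question 1) can be seen in a toy inverted-file (IVF) index written in plain Python. This is a sketch of the idea, not FAISS's implementation: the centroids are hand-picked instead of trained, there is no PQ compression, and the data is invented. Each query scans only the nprobe inverted lists whose centroids are nearest, so a small nprobe is faster but can miss true neighbors that landed in an unprobed list.

```python
import heapq

def sq_dist(a, b):
    """Squared Euclidean distance (sufficient for ranking)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Put each vector ID into the inverted list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for vid, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
        lists[nearest].append(vid)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    probed = heapq.nsmallest(nprobe, range(len(centroids)),
                             key=lambda c: sq_dist(query, centroids[c]))
    candidates = [vid for c in probed for vid in lists[c]]
    return heapq.nsmallest(k, candidates, key=lambda vid: sq_dist(query, vectors[vid]))

vectors = {"a": [0.0, 0.0], "b": [0.2, 0.1], "c": [5.0, 5.0],
           "d": [5.1, 4.8], "e": [0.0, 5.0], "f": [0.3, 4.9]}
# Hand-picked centroids; a FAISS IVF index learns them with k-means in train().
centroids = [[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]]
lists = build_ivf(vectors, centroids)

query = [2.4, 2.4]
for nprobe in (1, 2):
    print(nprobe, ivf_search(query, vectors, centroids, lists, k=2, nprobe=nprobe))
# 1 ['b', 'a']   true neighbor 'f' sits in an unprobed list
# 2 ['b', 'f']   matches exact brute-force search
```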

Section R: Multilingual Embeddings

  1. Multilingual embeddings map sentences from different languages into:
    A) Isolated vector spaces
    B) A shared embedding space
    C) Separate databases
    D) Binary formats
    Answer:B
  2. Which model is designed for multilingual embeddings?
    A) mBERT
    B) DALL·E
    C) YOLOv5
    D) InstructGPT
    Answer:A
  3. One challenge in multilingual vector search is:
    A) GPU memory limits
    B) Loss of semantic alignment across languages
    C) Token length mismatch
    D) Cloud billing issues
    Answer:B
  4. CLIP embeddings can work across:
    A) SQL tables
    B) Text and images
    C) Datetime formats
    D) Blockchain logs
    Answer:B
  5. In multilingual settings, it is recommended to:
    A) Use isolated models per language
    B) Use universal sentence encoders or multilingual models
    C) Encode only in English
    D) Use SQL collation
    Answer:B

Section S: Model-Specific Behaviors

  1. OpenAI’s text-embedding-ada-002 is optimized for:
    A) Low-latency SQL queries
    B) High-dimensional semantic representation
    C) Image generation
    D) File upload
    Answer:B
  2. Sentence Transformers are built on top of:
    A) CNNs
    B) BERT-based architectures
    C) FAISS indexes
    D) JavaScript
    Answer:B
  3. When using embeddings in LLM workflows, chunking long documents helps:
    A) Compress data
    B) Improve retrieval accuracy
    C) Avoid semantic understanding
    D) Remove vector metadata
    Answer:B
  4. Transformer-based embedding models usually scale poorly with:
    A) Short input strings
    B) Very long documents
    C) Binary inputs
    D) PNG images
    Answer:B
  5. Vector similarity can degrade if:
    A) Tokenization is inconsistent
    B) Metadata is present
    C) Model is multilingual
    D) Index is GPU-based
    Answer:A

Section T: Privacy, Security & Compliance

  1. What’s a common method to protect sensitive embeddings?
    A) Caching them in browsers
    B) Encrypting embeddings before storing in DB
    C) Storing them in plaintext
    D) Disabling indexes
    Answer:B
  2. Which regulation may apply to embeddings containing personal data?
    A) GDPR
    B) HTTP
    C) DNS
    D) SSH
    Answer:A
  3. Embeddings that indirectly contain personal identifiers must be:
    A) Compressed
    B) Audited and privacy-protected
    C) Ignored
    D) Skipped during inference
    Answer:B
  4. One security risk in vector DBs is:
    A) Embedding reversal attacks (to infer original content)
    B) File compression errors
    C) CSV injection
    D) Color misrepresentation
    Answer:A
  5. Using private, local models reduces:
    A) Vector dimensionality
    B) Dependence on external APIs and privacy risks
    C) SQL join latency
    D) File corruption
    Answer:B

Section U: Enterprise & Production Deployment

  1. High availability in vector DBs is ensured by:
    A) Single-node setup
    B) Replication and failover clusters
    C) GPU-only inference
    D) Query caching only
    Answer:B
  2. When embedding models are updated, vector indexes must be:
    A) Renamed
    B) Rebuilt to reflect new vector semantics
    C) Duplicated
    D) Shortened
    Answer:B
  3. Logging vector queries in production should be:
    A) Disabled
    B) Secure and anonymized
    C) Stored in clear text
    D) Shared with model vendors
    Answer:B
  4. A good practice for large-scale ingestion:
    A) Load everything in memory
    B) Batch upload embeddings in chunks
    C) Use FTP
    D) Build GUI first
    Answer:B
  5. Version control in embedding pipelines ensures:
    A) UI updates
    B) Reproducibility and model auditability
    C) Real-time search
    D) Embedding compression
    Answer:B
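
Batch uploading embeddings in chunks (question 4) keeps memory bounded and request sizes predictable. A minimal sketch: `upsert` stands in for whatever bulk-insert call a real client provides and is a hypothetical callable here, and the batch size of 500 is an arbitrary illustration value; suitable sizes depend on the database's request limits and vector dimension.

```python
from itertools import islice

def batched(iterable, batch_size):
    """Yield lists of up to batch_size items without loading everything at once."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(iterable)
    while batch := list(islice(it, batch_size)):
        yield batch

def ingest(records, upsert, batch_size=500):
    """Send records to `upsert` in fixed-size batches; return the batch count.

    `upsert` is a hypothetical stand-in for a vector DB client's bulk insert.
    """
    n_batches = 0
    for batch in batched(records, batch_size):
        upsert(batch)  # production code would wrap this call in retries/backoff
        n_batches += 1
    return n_batches

sent = []
# A generator, so the 1,203 records are streamed rather than held in memory.
records = ((f"id{i}", [float(i)], {"source": "demo"}) for i in range(1203))
print(ingest(records, sent.append, batch_size=500), [len(b) for b in sent])
# 3 [500, 500, 203]
```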

Section V: LLM Limitations & Challenges

  1. LLM-generated embeddings may sometimes be:
    A) Perfectly consistent
    B) Sensitive to input phrasing
    C) Always multilingual
    D) Always 1024-dimensional
    Answer:B
  2. Hallucination in LLMs can occur even with:
    A) Vector search
    B) Accurate retrieval (if context is misinterpreted)
    C) Metadata filters
    D) Small vector size
    Answer:B
  3. Retrieval-Augmented Generation cannot fix:
    A) Outdated context
    B) Poor reasoning from LLM itself
    C) Broken indexes
    D) REST APIs
    Answer:B
  4. If retrieval quality is poor, RAG outputs will be:
    A) More accurate
    B) Contextually weaker and potentially incorrect
    C) LLM-guided
    D) Fact-checked automatically
    Answer:B
  5. One way to improve RAG quality is:
    A) Increasing top-K retrieval
    B) Using smaller vectors
    C) Disabling chunking
    D) Reducing batch size
    Answer:A

Section W: Embedding Lifecycle & Management

  1. Embedding lifecycle includes:
    A) Creation → Normalization → Indexing → Retrieval → Versioning
    B) Training → SQL → PDF
    C) Download → Upload → Rewrite
    D) HTML → JS → Vector
    Answer:A
  2. Vector drift is caused by:
    A) AI bias
    B) Changes in domain or semantics over time
    C) GPU errors
    D) File formatting
    Answer:B
  3. One way to monitor vector drift:
    A) Measure similarity between old and new embeddings for the same input
    B) Monitor disk usage
    C) Check query response time
    D) Count SQL rows
    Answer:A
  4. Vector databases must support embedding versioning to:
    A) Sort results by time
    B) Compare different embedding models and rerank
    C) Convert them to CSV
    D) Index images
    Answer:B
  5. Regular re-indexing is essential when:
    A) Metadata changes
    B) Embedding models are updated or context shifts
    C) Tables are renamed
    D) Colors are added
    Answer:B

Section X: Performance Optimization & Search Tuning

  1. Increasing the ef (search-time candidate list size) parameter in HNSW improves:
    A) Search accuracy
    B) Write throughput
    C) File download speed
    D) Tokenization speed
    Answer:A
  2. Which FAISS index is best for small datasets with high precision?
    A) HNSW
    B) Flat (Brute-force)
    C) PQ
    D) LSM-Tree
    Answer:B
  3. Which technique helps balance speed and memory usage in FAISS?
    A) LRU caching
    B) Product Quantization (PQ)
    C) Reverse indexing
    D) Tokenization
    Answer:B
  4. Batch querying in vector DBs improves:
    A) Latency for single queries
    B) Throughput by reducing overhead
    C) Token count
    D) Index rebuild time
    Answer:B
  5. For high QPS (queries per second), a system must prioritize:
    A) UI design
    B) Low-latency index lookup and hardware parallelism
    C) File size
    D) JSON formatting
    Answer:B

Section Y: Evaluation Metrics

  1. NDCG (Normalized Discounted Cumulative Gain) measures:
    A) Vector length
    B) Ranking quality with position-based weighting
    C) Query latency
    D) Metadata sort accuracy
    Answer:B
  2. Recall@K is primarily used to evaluate:
    A) Storage format
    B) Retrieval effectiveness
    C) SQL sorting
    D) Chart rendering
    Answer:B
  3. Cosine similarity is commonly used in vector search to measure:
    A) Text overlap
    B) Angular closeness of two vectors
    C) File structure
    D) GPU usage
    Answer:B
  4. Euclidean distance differs from cosine similarity by:
    A) Ignoring vector magnitude
    B) Taking vector magnitude into account (absolute straight-line distance)
    C) Using text overlap
    D) Requiring normalization
    Answer:B
  5. AUC-ROC is more applicable to:
    A) Classification problems
    B) Vector search
    C) Embedding generation
    D) Chunking documents
    Answer:A
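
NDCG (question 1) rewards putting the most relevant results near the top. A minimal sketch using linear gains and a log2 position discount; some references use the gain 2^rel - 1 instead, and this version builds the ideal ranking from the retrieved items' own grades rather than from every relevant document in the collection. The relevance grades are invented.

```python
import math

def dcg_at_k(relevances, k):
    """Sum of graded relevance, discounted by log2(position + 1)."""
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k):
    """DCG of the actual ranking divided by DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Graded relevance (0-3) of the results, in the order the system returned them.
ranking = [3, 2, 3, 0, 1]
print(round(ndcg_at_k(ranking, 5), 3))  # about 0.972: the second 3 should rank higher
print(ndcg_at_k([3, 3, 2, 1, 0], 5))    # 1.0: already in ideal order
```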

Section Z: Domain-Specific Applications

  1. In healthcare, vector embeddings help:
    A) Encrypt billing records
    B) Retrieve similar patient histories or medical documents
    C) Run blood tests
    D) Compress MRI images
    Answer:B
  2. Financial institutions can use vector DBs for:
    A) Loan disbursement
    B) Semantic analysis of analyst reports
    C) ATM coordination
    D) Barcode scanning
    Answer:B
  3. In scientific research, vector search can:
    A) Perform chemical analysis
    B) Retrieve similar research papers and findings
    C) Store lab reports
    D) Replace lab notebooks
    Answer:B
  4. For HR or recruiting systems, embeddings can match:
    A) Employee ID
    B) Candidate resumes to job descriptions semantically
    C) Payroll tax IDs
    D) Timesheet logs
    Answer:B
  5. Retail search using embeddings improves:
    A) Inventory count
    B) Semantic product discovery across categories
    C) Store locations
    D) Price updates
    Answer:B

Section AA: Zero-Shot & Few-Shot Capabilities

  1. Zero-shot retrieval works by:
    A) Fine-tuning for each use case
    B) Using generalized embeddings for unseen queries
    C) Disabling filters
    D) Keyword lookup
    Answer:B
  2. Few-shot learning involves:
    A) Massive datasets
    B) Small task-specific examples to guide LLMs or embeddings
    C) Blocking vector access
    D) File compression
    Answer:B
  3. Embedding models like text-embedding-ada-002 support zero-shot tasks by:
    A) Matching input queries semantically without labeled training data
    B) Using only synonyms
    C) Hardcoding rules
    D) Building SQL indexes
    Answer:A
  4. Zero-shot vector search is useful when:
    A) Data is labeled
    B) No annotated training examples are available
    C) SQL is required
    D) Filters are missing
    Answer:B
  5. One limitation of zero-shot search is:
    A) Total accuracy
    B) Lack of domain-specific tuning
    C) Fast latency
    D) Overuse of GPU
    Answer:B

Section AB: Multi-modal Embeddings

  1. Multi-modal embeddings can represent:
    A) Only structured text
    B) Images, text, and audio in a shared vector space
    C) SQL queries
    D) File sizes
    Answer:B
  2. OpenAI’s CLIP model can embed:
    A) Audio files
    B) Text and images into a common embedding space
    C) Databases
    D) CSS files
    Answer:B
  3. A practical use case for multi-modal search is:
    A) Code compilation
    B) Searching images using text queries
    C) Sorting CSV rows
    D) Sending emails
    Answer:B
  4. In a multi-modal vector DB, one challenge is:
    A) Too many tables
    B) Aligning embeddings from different modalities so they are comparable in one vector space
    C) Lack of users
    D) HTTP errors
    Answer:B
  5. Audio embeddings can be used for:
    A) Encrypting calls
    B) Matching similar voice recordings or music
    C) Creating firewalls
    D) OCR scanning
    Answer:B

Section AC: LLM-Agent + Vector DB Integration

  1. LLM agents use vector DBs to:
    A) Sort URLs
    B) Retrieve relevant context or facts for reasoning
    C) Download content
    D) Manage GPU drivers
    Answer:B
  2. LangChain and LlamaIndex provide:
    A) Training loops
    B) Pipelines to integrate LLMs with vector databases
    C) SQL joins
    D) GPU benchmarks
    Answer:B
  3. A vector store in an LLM agent workflow acts as:
    A) Memory or long-term knowledge base
    B) A CSS loader
    C) A JSON parser
    D) None of the above
    Answer:A
  4. Agents benefit from vector retrieval because it:
    A) Blocks hallucination
    B) Grounds outputs in factual, contextual data
    C) Deletes metadata
    D) Rewrites prompts
    Answer:B
  5. LangChain memory components may use vector DBs to:
    A) Format prompt syntax
    B) Store conversation history for retrieval
    C) Rename sessions
    D) Avoid tokenization
    Answer:B

Section AD: Hardware Acceleration (GPU & Parallelism)

  1. GPU acceleration in vector search is useful for:
    A) Faster brute-force (exact) and ANN searches
    B) Coloring dashboards
    C) Writing CSV files
    D) Compressing PDFs
    Answer:A
  2. FAISS has GPU support via:
    A) CUDA
    B) HTML
    C) REST API
    D) USB
    Answer:A
  3. Vector DBs like Milvus support GPU usage to:
    A) Improve visualization
    B) Accelerate search and indexing
    C) Format text
    D) Replace models
    Answer:B
  4. High-dimensional vector search on CPU may cause:
    A) Memory leaks
    B) Latency and performance bottlenecks
    C) Better speed
    D) SQL errors
    Answer:B
  5. A drawback of relying heavily on GPU is:
    A) Increased cost and resource consumption
    B) Lower similarity
    C) Lack of normalization
    D) Metadata conflicts
    Answer:A

Section AE: Low-resource or Edge Scenarios

  1. In mobile or edge environments, vector DBs must be:
    A) Cloud-only
    B) Lightweight and memory-efficient
    C) JS-based
    D) SQL-compatible only
    Answer:B
  2. Sentence transformers can be optimized for edge use with:
    A) ONNX or quantized versions
    B) JavaScript only
    C) QR codes
    D) RESTful logs
    Answer:A
  3. A trade-off in edge-based embedding is:
    A) Higher precision, lower compute
    B) Lower precision due to model size constraints
    C) Higher GPU usage
    D) Unlimited memory
    Answer:B
  4. For offline semantic search, a good stack is:
    A) SQLite (with a vector extension or a brute-force scan) + MiniLM embeddings
    B) RedisGraph
    C) Tableau
    D) GPT-4 streaming
    Answer:A
  5. One challenge of embedding on-device is:
    A) Data privacy
    B) Hardware limitations for real-time embedding
    C) Lack of API access
    D) Missing images
    Answer:B
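Python's built-in `sqlite3` module can approximate the offline stack from question 4. Embeddings are stored as float32 BLOBs and ranked by a brute-force cosine scan. This is a sketch under several assumptions. No embedding model is called; the short vectors in the usage example are hand-written stand-ins for MiniLM-style output (all-MiniLM-L6-v2 produces 384-dimensional vectors). The table schema and helper names are hypothetical.

```python
import math
import sqlite3
from array import array

def to_blob(vec):
    return array("f", vec).tobytes()  # float32, native byte order

def from_blob(blob):
    arr = array("f")
    arr.frombytes(blob)
    return arr.tolist()

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def open_store(path=":memory:"):
    # Hypothetical schema: one row per document with its embedding as a BLOB.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, text TEXT, emb BLOB)")
    return conn

def add_doc(conn, text, embedding):
    conn.execute("INSERT INTO docs (text, emb) VALUES (?, ?)", (text, to_blob(embedding)))
    conn.commit()

def search(conn, query_vec, top_k=3):
    # Brute-force scan: acceptable for small on-device corpora, no ANN index.
    rows = conn.execute("SELECT text, emb FROM docs").fetchall()
    scored = [(cosine(query_vec, from_blob(emb)), text) for text, emb in rows]
    scored.sort(reverse=True)
    return scored[:top_k]
```

Usage: `add_doc(conn, "cats", [1.0, 0.0, 0.0])`, then `search(conn, query_vec)` returns `(score, text)` pairs. For larger corpora, an ANN index or a SQLite vector extension would replace the linear scan.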

Section AF: Retrieval Strategies & Search Behavior

  1. The top_k parameter in vector search determines:
    A) Number of queries sent
    B) Number of nearest neighbors returned
    C) Chunk size
    D) Database ports
    Answer:B
  2. A high top_k value may lead to:
    A) Faster search
    B) Better recall but lower precision
    C) Compressed results
    D) Better keyword matches
    Answer:B
  3. To improve semantic coverage, a good practice is to:
    A) Increase vector length
    B) Chunk documents strategically
    C) Use random queries
    D) Embed metadata separately
    Answer:B
  4. Dense retrieval refers to:
    A) Brute-force SQL
    B) Using embeddings for semantic similarity search
    C) Filtering based on numbers
    D) HTML parsing
    Answer:B
  5. Sparse retrieval refers to:
    A) Embedding-based search
    B) Keyword/token-based search (e.g., BM25)
    C) Vector compression
    D) GraphQL
    Answer:B
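Sparse retrieval (question 5) scores documents by token overlap rather than by embedding similarity. The sketch below implements Okapi BM25 over pre-tokenized documents and includes a `top_k` helper that plays the same role as the parameter in question 1. It uses the non-negative IDF variant common in Lucene-style implementations. The defaults k1=1.5 and b=0.75 are typical values, not requirements, and the function names are hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each doc (a list of tokens) for a tokenized query."""
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue  # sparse: terms absent from the doc contribute nothing
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def top_k(scores, k):
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

A document that shares no query terms scores exactly 0. A dense retriever, by contrast, would still return it if its embedding were close to the query's.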

Section AG: Embedding Model Selection & Management

  1. An embedding model’s dimension determines:
    A) Color output
    B) Length of its vector output
    C) Number of GPU cores
    D) API rate limits
    Answer:B
  2. Choosing a larger embedding model usually gives:
    A) Shorter vectors
    B) Better semantic representation but higher cost
    C) Lower quality
    D) SQL joins
    Answer:B
  3. Using domain-specific embedding models improves:
    A) Generalization
    B) Search relevance in that specific context
    C) File conversion
    D) Token overhead
    Answer:B
  4. You should not switch embedding models without:
    A) Changing your CSS
    B) Recomputing and re-indexing existing vectors
    C) Saving the HTML
    D) Disabling vector search
    Answer:B
  5. Embedding model drift can cause:
    A) Better relevance
    B) Degraded search performance over time
    C) Smaller files
    D) Vector shortening
    Answer:B

Section AH: Hybrid Search (Keyword + Vector)

  1. Hybrid search combines:
    A) SQL + HTML
    B) Vector (dense) and keyword (sparse) retrieval
    C) YAML + JSON
    D) REST + WebSocket
    Answer:B
  2. Vector DBs like Weaviate support hybrid search via:
    A) Fusing BM25 keyword scores with vector similarity scores
    B) XML tags
    C) Local file sorting
    D) Token rewriting
    Answer:A
  3. A benefit of hybrid search is:
    A) Full memory usage
    B) Improved relevance across ambiguous or misspelled queries
    C) Eliminating embeddings
    D) Slower performance
    Answer:B
  4. Hybrid scoring typically uses:
    A) Simple token counts
    B) Weighted combination of sparse and dense scores
    C) Vector concatenation
    D) GPU logs
    Answer:B
  5. A downside of hybrid search can be:
    A) Lack of results
    B) Complex tuning of score weighting
    C) No metadata
    D) Slower embeddings
    Answer:B
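Questions 4 and 5 describe a weighted combination of sparse and dense scores, and the tuning difficulty that comes with it. The sketch below shows one common approach. Each score set is min-max normalized so the two are on comparable scales, then the sets are blended with a weight `alpha`. The choice of normalization, the `alpha` convention (1.0 means pure vector, 0.0 means pure keyword), and the function names are assumptions. Other systems use alternatives such as reciprocal rank fusion.

```python
def min_max(scores):
    """Rescale a {doc_id: score} dict to [0, 1] so BM25 and cosine scales match."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def hybrid_rank(sparse, dense, alpha=0.5, top_k=10):
    """Weighted fusion: alpha * dense + (1 - alpha) * sparse, after normalization."""
    sparse_n, dense_n = min_max(sparse), min_max(dense)
    docs = set(sparse_n) | set(dense_n)
    # A doc missing from one retriever gets 0 for that component.
    fused = {d: alpha * dense_n.get(d, 0.0) + (1 - alpha) * sparse_n.get(d, 0.0)
             for d in docs}
    return sorted(fused, key=lambda d: (-fused[d], d))[:top_k]
```

Usage: with BM25 scores `{"a": 12.0, "b": 3.0, "c": 0.0}` and cosine scores `{"a": 0.1, "b": 0.9, "d": 0.8}`, `alpha=0.0` ranks "a" first and `alpha=1.0` ranks "b" first. The ordering depends on a single weight, which is why question 5 flags tuning as a downside.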

Section AI: Storage & Troubleshooting

  1. Vectors are typically stored as:
    A) Strings
    B) Float arrays or binary-encoded formats
    C) HTML tags
    D) PDF streams
    Answer:B
  2. Slow query response in vector DB could be caused by:
    A) High-dimensional vectors combined with poorly tuned ANN index parameters
    B) Fast disk
    C) Image data
    D) JSON formatting
    Answer:A
  3. A broken embedding pipeline may result in:
    A) Vector drift
    B) Empty or irrelevant retrieval results
    C) Faster indexing
    D) Duplicate logs
    Answer:B
  4. One sign of misaligned vector indexing is:
    A) Perfect recall
    B) Frequent retrieval of irrelevant documents
    C) High cosine similarity
    D) Clean logs
    Answer:B
  5. A best practice before deploying a vector DB to production is:
    A) Manual testing + relevance evaluation
    B) DNS flushing
    C) JSON formatting
    D) Increasing image resolution
    Answer:A

Section AJ: Data Preprocessing & Chunking

  1. Document chunking improves:
    A) File size
    B) Embedding granularity and retrieval precision
    C) Token pricing
    D) Batch sorting
    Answer:B
  2. A common chunking method is:
    A) Sentence-wise or sliding window with overlap
    B) File-splitting by color
    C) MIME-type detection
    D) PDF page numbers
    Answer:A
  3. Overlapping chunks help preserve:
    A) Index size
    B) Context across adjacent sections
    C) Metadata fields
    D) Token uniqueness
    Answer:B
  4. Preprocessing before embedding usually involves:
    A) Compression
    B) Cleaning, lowercasing, and removing stopwords (optional)
    C) SQL parsing
    D) IP masking
    Answer:B
  5. Too aggressive preprocessing may:
    A) Reduce embedding latency
    B) Harm semantic richness of embeddings
    C) Improve accuracy
    D) Increase storage cost
    Answer:B
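The sliding-window chunking with overlap from questions 2 and 3 is easy to sketch. Consecutive windows share `overlap` words, so a sentence that straddles a boundary still appears whole in at least one chunk. This version counts whitespace-separated words for simplicity. That is an assumption: many pipelines count model tokens instead. The function name and defaults are hypothetical.

```python
def chunk_words(text, chunk_size=200, overlap=50):
    """Split text into windows of chunk_size words; neighbours share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

For the ten words "0 1 ... 9" with `chunk_size=4, overlap=2`, the chunks are "0 1 2 3", "2 3 4 5", "4 5 6 7", and "6 7 8 9". Each chunk repeats the last two words of the previous one.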

Section AK: Embedding Techniques & Tokenization

  1. Tokenization is required before embedding because:
    A) Vectors are only binary
    B) Embedding models operate on tokens, not raw text
    C) It prevents file corruption
    D) It optimizes JSON parsing
    Answer:B
  2. Byte Pair Encoding (BPE) is used for:
    A) Tokenizing text efficiently for LLMs and embeddings
    B) Compressing CSVs
    C) Sorting documents
    D) Counting files
    Answer:A
  3. Long text truncation before embedding may lead to:
    A) Better performance
    B) Loss of context
    C) More accurate scores
    D) Longer vectors
    Answer:B
  4. An embedding vector’s meaning is tied to:
    A) Its position in the DB
    B) The context and model used during generation
    C) The file name
    D) SQL schema
    Answer:B
  5. Embeddings from different models:
    A) Are always identical
    B) Should not be mixed in the same vector index
    C) Can be concatenated for better results
    D) Must be re-tokenized
    Answer:B

Section AL: APIs and Query Patterns

  1. Most vector DBs expose APIs via:
    A) REST and gRPC
    B) FTP
    C) SMTP
    D) Bluetooth
    Answer:A
  2. In an API query to a vector DB, you typically send:
    A) Raw text
    B) A precomputed vector or embedding
    C) Python bytecode
    D) Metadata only
    Answer:B
  3. Query filters in vector DB APIs allow:
    A) Content moderation
    B) Metadata-based narrowing of search results
    C) HTML editing
    D) Chunk reprocessing
    Answer:B
  4. To paginate large search results, vector DBs may offer:
    A) Scrolling views
    B) Cursor-based or offset-based pagination
    C) XML schema
    D) File truncation
    Answer:B
  5. Many vector DBs support client libraries in:
    A) Python, JavaScript, Go
    B) C++, COBOL
    C) HTML
    D) SQL only
    Answer:A
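Questions 2 to 4 describe a typical query shape: a precomputed query vector, metadata filters that narrow the candidates, and a page of results plus something to fetch the next page. The in-memory toy below shows that flow. It is not any vendor's API. The record schema, the exact-match `where` filter, and the integer offset used as a cursor are all hypothetical simplifications; real services often return an opaque cursor token instead.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def query(records, vector, top_k=10, where=None, offset=0):
    """Filter on metadata, rank by cosine, then return one page of ids.

    records: list of dicts with "id", "vector", "metadata" keys (hypothetical schema).
    where:   {metadata_field: required_value}, exact match only.
    Returns (ids, next_offset), where next_offset is None on the last page.
    """
    where = where or {}
    candidates = [r for r in records
                  if all(r["metadata"].get(k) == v for k, v in where.items())]
    ranked = sorted(candidates, key=lambda r: cosine(vector, r["vector"]), reverse=True)
    page = ranked[offset:offset + top_k]
    next_offset = offset + top_k if offset + top_k < len(ranked) else None
    return [r["id"] for r in page], next_offset
```

The filter runs before ranking here (pre-filtering), so records that fail the metadata check never consume a slot in the page.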

Section AM: Open Source vs. Managed Services

  1. One advantage of managed vector DBs is:
    A) Complete control over disk I/O
    B) Reduced ops overhead and auto-scaling
    C) Offline-only access
    D) Manual index updates
    Answer:B
  2. Open-source vector DBs offer:
    A) Full control, auditability, and local deployment
    B) Less customization
    C) No integrations
    D) Always higher speed
    Answer:A
  3. Pinecone and Weaviate differ in that:
    A) Pinecone is fully managed; Weaviate is open source and can be self-hosted or managed
    B) Pinecone runs on-prem by default
    C) Weaviate lacks vector support
    D) Both only work on AWS
    Answer:A
  4. When should you choose a self-hosted vector DB?
    A) When latency doesn’t matter
    B) For sensitive data, regulatory requirements, or full control
    C) For mobile apps
    D) When using images
    Answer:B

Section AN: Real-World Failure Modes

  1. A typical cause of low retrieval quality in vector search is:
    A) Using exact match queries
    B) Poor or incorrect embedding strategy
    C) Chunk size optimization
    D) High cosine similarity
    Answer:B

Section AO: Consistency & Index Maintenance

  1. Vector DB consistency means:
    A) Data and index stay in sync after updates
    B) Vectors never change
    C) All queries return zero results
    D) Database size remains constant
    Answer:A
  2. Incremental index updates help:
    A) Avoid full re-indexing on new data
    B) Compress vectors
    C) Increase query latency
    D) Break filters
    Answer:A
  3. Index rebuilding is necessary when:
    A) The embedding model changes
    B) File formats change
    C) The user interface is updated
    D) Only after queries fail
    Answer:A
  4. Vector DBs typically handle concurrent writes by:
    A) Locking entire DB
    B) Multi-version concurrency control (MVCC) or optimistic concurrency
    C) Halting queries
    D) Data deletion
    Answer:B
  5. Periodic index optimization improves:
    A) Search speed and memory footprint
    B) Tokenization rate
    C) JSON parsing
    D) Embedding quality
    Answer:A

Section AP: Embedding Privacy & Security

  1. Embedding vectors can leak:
    A) User query content if not encrypted
    B) IP addresses
    C) Metadata fields only
    D) File permissions
    Answer:A
  2. Encrypting vectors at rest helps:
    A) Prevent unauthorized access to sensitive embeddings
    B) Increase vector length
    C) Speed up indexing
    D) Replace API keys
    Answer:A
  3. GDPR compliance with vector DBs involves:
    A) Anonymizing data before embedding
    B) Using SQL queries only
    C) Disabling vector search
    D) Running on-prem only
    Answer:A
  4. Access control in vector DBs is important because:
    A) Anyone can modify vectors
    B) Vectors often represent sensitive or proprietary data
    C) It reduces GPU usage
    D) It speeds queries
    Answer:B
  5. Differential privacy techniques applied to embeddings:
    A) Add noise to vectors to protect individual data points
    B) Compress vectors
    C) Encrypt tokens
    D) Split databases
    Answer:A
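Question 5 describes adding noise to vectors to protect individual data points. A common pattern, sketched below, is to first clip each vector to a maximum L2 norm, which bounds how much any single record can shift the output, and then add Gaussian noise scaled to that bound. This only shows the mechanics. Choosing `sigma` to meet a specific (epsilon, delta) guarantee requires a privacy accountant that is not shown, and the function name and defaults are hypothetical.

```python
import math
import random

def privatize(vec, clip_norm=1.0, sigma=0.1, rng=None):
    """Clip vec to L2 norm <= clip_norm, then add N(0, (sigma * clip_norm)^2) noise per coordinate."""
    rng = rng or random.Random()
    norm = math.hypot(*vec)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0  # clipping bounds sensitivity
    return [x * scale + rng.gauss(0.0, sigma * clip_norm) for x in vec]
```

Larger `sigma` gives stronger protection but noisier similarity scores, so retrieval quality has to be re-measured after applying it.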

Section AQ: Query Optimization

  1. Pre-filtering queries with metadata improves:
    A) Vector dimension
    B) Search speed and relevance
    C) API throughput
    D) Tokenization
    Answer:B
  2. Using approximate nearest neighbor (ANN) search trades:
    A) Some accuracy (recall) for speed and memory efficiency
    B) GPU for CPU cycles
    C) Data size for color
    D) REST for gRPC
    Answer:A
  3. Caching frequent query results reduces:
    A) Index rebuild times
    B) Query latency
    C) Vector length
    D) Metadata size
    Answer:B
  4. Query rewriting for better embeddings includes:
    A) Adding context or clarifying ambiguous terms
    B) Compressing vectors
    C) Encrypting queries
    D) Adding HTML tags
    Answer:A
  5. Early stopping in ANN search can:
    A) Increase speed with some loss of recall
    B) Increase index size
    C) Delete vectors
    D) Format JSON
    Answer:A
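Question 3's latency win from caching can be shown with the standard library's `functools.lru_cache`. In this sketch, `expensive_search` is a hypothetical stand-in for embedding the query and running ANN search, and it counts how often it actually runs. Light text normalization lets near-duplicate queries share one cache entry. The normalization rule and all names are assumptions.

```python
from functools import lru_cache

CALLS = {"search": 0}  # how often the expensive path actually executed

def expensive_search(query_text, top_k):
    """Hypothetical stand-in for embedding + ANN search."""
    CALLS["search"] += 1
    return tuple(f"{query_text}-doc{i}" for i in range(top_k))

@lru_cache(maxsize=1024)
def cached_search(query_text, top_k):
    # Tuples are immutable, so callers cannot corrupt cached results.
    return expensive_search(query_text, top_k)

def search(query_text, top_k=5):
    # Lowercasing and collapsing whitespace raises the hit rate for repeat queries.
    return cached_search(" ".join(query_text.lower().split()), top_k)
```

Cached results go stale when the index changes, so a real deployment would call `cached_search.cache_clear()` (or use a TTL cache) after upserts or deletes.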

Section AR: Scaling & Architecture

  1. Horizontal scaling of vector DBs involves:
    A) Adding more nodes to distribute load and data
    B) Increasing vector dimension
    C) Using bigger GPUs
    D) Adding more tokens
    Answer:A
  2. Vertical scaling means:
    A) Increasing the resources (CPU, RAM) of a single node
    B) Adding more machines
    C) Reducing index size
    D) Compressing data
    Answer:A
  3. Sharding vector data can:
    A) Help manage very large datasets by splitting vectors across servers
    B) Slow down queries
    C) Increase data loss
    D) Reduce API calls
    Answer:A
  4. Replication in vector DBs ensures:
    A) High availability and fault tolerance
    B) Lower vector dimension
    C) Slower indexing
    D) Reduced metadata
    Answer:A
  5. Load balancing in vector DB clusters:
    A) Distributes query traffic evenly across nodes
    B) Compresses vectors
    C) Deletes unused vectors
    D) Encrypts queries
    Answer:A
  6. Hybrid cloud vector DB deployments allow:
    A) Sensitive data to remain on-prem while leveraging cloud scalability
    B) Cloud-only usage
    C) Only on-premise usage
    D) No scaling
    Answer:A
  7. Monitoring vector DB performance includes tracking:
    A) Query latency, throughput, and index health
    B) File size
    C) Usernames
    D) CSS files
    Answer:A
  8. Alerts on vector DB anomalies help detect:
    A) Sudden drops in retrieval accuracy or performance
    B) Token counts
    C) API key expirations
    D) Disk formatting
    Answer:A
  9. Logging queries and embeddings helps with:
    A) Debugging and auditing vector search behavior
    B) Faster index rebuilds
    C) File compression
    D) GPU management
    Answer:A
  10. The best practice for scaling vector DBs is:
    A) Start small, monitor performance, and scale incrementally
    B) Buy the largest GPU immediately
    C) Store only metadata
    D) Disable vector search
    Answer:A
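Sharding (question 3) raises a practical question: how a query is answered when the vectors are split across servers. The usual answer is scatter-gather. Each shard returns its local top-k, and a coordinator merges those partial lists into a global top-k. The single-process toy below shows that flow, with stable hash-based placement. In a real cluster the shards would be separate nodes queried in parallel. The class and function names are hypothetical.

```python
import hashlib
import heapq
import math

def shard_for(doc_id, num_shards):
    """Stable hash placement: the same id always maps to the same shard."""
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

class ShardedIndex:
    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def upsert(self, doc_id, vector):
        self.shards[shard_for(doc_id, len(self.shards))][doc_id] = vector

    def search(self, query, top_k):
        # Scatter: each shard computes its own top_k candidates.
        partial = []
        for shard in self.shards:
            partial.extend(heapq.nlargest(
                top_k, ((cosine(query, v), d) for d, v in shard.items())))
        # Gather: merging the local winners yields the exact global top_k.
        return [d for _, d in heapq.nlargest(top_k, partial)]
```

Asking each shard for the full `top_k` guarantees the merge is exact, because a global top-k result must also rank within the top k on its own shard.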

Section AS: Embedding Fine-tuning & Customization

  1. Fine-tuning an embedding model can:
    A) Tailor vectors for specific domain vocabulary and semantics
    B) Reduce vector dimension automatically
    C) Break tokenization
    D) Remove metadata fields
    Answer:A
  2. Transfer learning in embeddings involves:
    A) Starting from a pretrained model and adapting it to new data
    B) Copying vectors directly
    C) Using SQL joins
    D) Random embedding initialization
    Answer:A
  3. Fine-tuned embeddings typically require:
    A) Reindexing all existing vectors for consistency
    B) Only updating metadata
    C) No changes to the DB
    D) Switching APIs
    Answer:A
  4. Embedding customization helps improve:
    A) Semantic relevance for niche applications
    B) Disk storage efficiency
    C) GPU usage
    D) Index size
    Answer:A
  5. A downside of fine-tuning is:
    A) Increased training cost and complexity
    B) Reduced vector dimension
    C) No impact on search quality
    D) Data loss
    Answer:A

Section AT: Vector Compression & Storage

  1. Vector compression techniques:
    A) Reduce storage size at potential cost of precision
    B) Increase index size
    C) Delete metadata
    D) Encrypt data
    Answer:A
  2. Quantization is a common compression method that:
    A) Converts float vectors to lower-bit representations
    B) Expands vectors to higher dimensions
    C) Splits vectors
    D) Deletes tokens
    Answer:A
  3. Product quantization helps:
    A) Compress large vector datasets efficiently for ANN search
    B) Format JSON
    C) Encrypt embeddings
    D) Improve CPU speed only
    Answer:A
  4. Compressed vectors require:
    A) Compatible index structures to ensure search quality
    B) Manual decompression only
    C) More metadata
    D) Tokenization
    Answer:A
  5. Compression trade-offs include:
    A) Faster search but possible accuracy loss
    B) Larger vectors
    C) Higher memory use
    D) No speed changes
    Answer:A
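Quantization, as described in question 2, converts float vectors to lower-bit codes. The simplest form is symmetric scalar quantization to int8, sketched below. Each float32 component (4 bytes) becomes one signed byte, plus a single float scale per vector, so storage shrinks about 4x. The price is a rounding error of at most half the scale per component. The function names are hypothetical, and production systems often use per-dimension or trained ranges instead of a single per-vector maximum.

```python
from array import array

def quantize_int8(vec):
    """Map floats in [-max|x|, max|x|] onto int8 codes in [-127, 127]."""
    max_abs = max(abs(x) for x in vec) or 1.0  # avoid dividing by zero for all-zero vectors
    scale = max_abs / 127.0
    codes = array("b", (round(x / scale) for x in vec))  # "b" = signed char, 1 byte each
    return codes, scale

def dequantize_int8(codes, scale):
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]
```

Usage: for `[0.5, -1.0, 0.25, 0.0]`, the codes occupy 4 bytes instead of the 16 bytes of `array("f", vec)`. Every reconstructed value lies within `scale / 2` of the original, which is the accuracy side of question 5's trade-off.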

Section AU: Query Latency & Trade-offs

  1. Lower query latency can be achieved by:
    A) Reducing index complexity and using ANN search
    B) Increasing vector dimension
    C) Querying all data at once
    D) Using exact match only
    Answer:A
  2. Higher recall in vector search often leads to:
    A) Higher query latency
    B) Lower index size
    C) No change in latency
    D) No relevance improvements
    Answer:A
  3. Batch queries can:
    A) Improve throughput but may increase per-query latency
    B) Reduce index size
    C) Increase GPU costs only
    D) Format logs
    Answer:A
  4. Query caching is effective when:
    A) Queries repeat frequently
    B) Queries are all unique
    C) Index is small
    D) Metadata is missing
    Answer:A
  5. A/B testing retrieval parameters helps:
    A) Optimize balance between speed and accuracy
    B) Remove vectors
    C) Increase vector dimension
    D) Disable caching
    Answer:A

Section AV: Multi-Tenancy & Access Control

  1. Multi-tenancy in vector DBs enables:
    A) Multiple users or clients to share the same infrastructure securely
    B) Single-user access only
    C) No access control
    D) Token mixing
    Answer:A
  2. Tenant isolation prevents:
    A) Data leakage across customers
    B) Vector compression
    C) GPU sharing
    D) API usage
    Answer:A
  3. Role-based access control (RBAC) allows:
    A) Granular permission settings for users
    B) Data duplication
    C) Removing metadata
    D) Token encryption only
    Answer:A
  4. Auditing in multi-tenant vector DBs is important for:
    A) Compliance and security monitoring
    B) Index rebuilding
    C) Vector dimension scaling
    D) Cache clearing
    Answer:A
  5. An API key scoped to a tenant:
    A) Limits operations to only that tenant’s data
    B) Gives full DB access
    C) Disables vector search
    D) Reduces vector length
    Answer:A

Section AW: Future Trends & Innovations

  1. One emerging trend in vector DBs is:
    A) Integration with LLMs for real-time reasoning
    B) Only SQL joins
    C) Replacing embeddings with images
    D) Manual indexing only
    Answer:A
  2. Neural search combines:
    A) Deep learning with vector similarity search
    B) Keyword search only
    C) Encryption and compression
    D) Data deletion
    Answer:A
  3. Zero-shot retrieval leverages:
    A) Pretrained embeddings without task-specific fine-tuning
    B) SQL tables
    C) Manual queries
    D) Metadata filters only
    Answer:A
  4. Federated vector search enables:
    A) Querying multiple distributed vector stores without centralizing data
    B) Only on-prem usage
    C) Single-node operation
    D) Disabling caching
    Answer:A
  5. Quantum computing might impact vector search by:
    A) Accelerating high-dimensional similarity computations in the future
    B) Replacing GPUs today
    C) Removing the need for embeddings
    D) Compressing vectors instantly
    Answer:A

Section AX: Multi-Modal & Cross-Domain Embeddings

  1. Multi-modal embeddings combine:
    A) Text, images, audio, and other data types into a unified vector space
    B) Only text vectors
    C) SQL and NoSQL data
    D) Metadata fields only
    Answer:A
  2. Multi-modal search allows:
    A) Querying with text to find images or vice versa
    B) Keyword-only retrieval
    C) Vector concatenation only
    D) Metadata filtering only
    Answer:A
  3. Cross-lingual embeddings enable:
    A) Searching in one language and retrieving relevant results in another
    B) Only English queries
    C) Vector compression
    D) Metadata translation
    Answer:A
  4. Multi-modal embedding models require:
    A) Large, diverse training datasets
    B) Only textual data
    C) No fine-tuning
    D) Manual indexing
    Answer:A
  5. A challenge with multi-modal embeddings is:
    A) Aligning different data modalities in a shared vector space
    B) Slower tokenization
    C) Reducing vector dimension
    D) Losing metadata
    Answer:A

Section AY: Real-Time Updates & Streaming

  1. Real-time vector DB updates require:
    A) Low-latency embedding and indexing pipelines
    B) Batch reindexing only
    C) Manual refreshes
    D) Disabling filters
    Answer:A
  2. Streaming data ingestion in vector DBs is useful for:
    A) Continuously updating search indexes with new data
    B) Static datasets only
    C) Deleting old vectors only
    D) Manual backups
    Answer:A
  3. Event-driven architectures help vector DBs by:
    A) Triggering updates on data changes automatically
    B) Disabling API calls
    C) Reducing index size
    D) Increasing token length
    Answer:A
  4. Latency targets for real-time vector search are typically:
    A) Under 100 milliseconds for user-facing applications
    B) Minutes to hours
    C) Days
    D) No latency requirements
    Answer:A
  5. A downside of real-time updates is:
    A) Increased system complexity and resource usage
    B) Reduced query accuracy
    C) Data loss
    D) Metadata corruption
    Answer:A

Section AZ: Vector DB Benchmarks & Metrics

  1. Common benchmarks for vector DBs include:
    A) Recall@k, query latency, throughput, and index build time
    B) File size only
    C) Metadata count
    D) User login speed
    Answer:A
  2. Recall@k measures:
    A) Fraction of all relevant items that appear in the top-k results
    B) Disk space used
    C) Number of API calls
    D) Metadata accuracy
    Answer:A
  3. Latency benchmarking helps evaluate:
    A) Speed of query processing under load
    B) Vector dimension
    C) Token length
    D) Metadata storage
    Answer:A
  4. Throughput in vector DB context means:
    A) Number of queries processed per second
    B) Number of vectors stored
    C) Compression ratio
    D) Cache size
    Answer:A
  5. Index build time impacts:
    A) How quickly new or updated data becomes searchable
    B) Tokenization speed only
    C) GPU utilization only
    D) API response format
    Answer:A

Section BA: Emerging Vector DB Architectures

  1. Graph-based vector indexes (e.g., HNSW) use:
    A) Hierarchical navigable small-world graphs for efficient nearest neighbor search
    B) Trees only
    C) SQL joins
    D) Manual index lookups
    Answer:A
  2. Product quantization is often combined with:
    A) Inverted file structures to scale to billions of vectors
    B) SQL databases
    C) Metadata removal
    D) Manual compression
    Answer:A
  3. Vector DBs using GPUs can:
    A) Accelerate embedding generation and ANN search
    B) Replace CPU entirely
    C) Remove the need for indexes
    D) Compress metadata
    Answer:A
  4. Serverless vector DB architectures:
    A) Scale automatically without manual infrastructure management
    B) Require dedicated servers
    C) Have fixed capacity
    D) Disable API access
    Answer:A
  5. Federated vector search is suitable for:
    A) Data privacy scenarios where data cannot be centralized
    B) Single-user apps only
    C) Only on-prem setups
    D) Static datasets
    Answer:A
  6. Auto-scaling vector DBs:
    A) Dynamically adjust resources based on workload
    B) Require manual intervention
    C) Never change capacity
    D) Disable filters
    Answer:A
  7. Approximate search algorithms trade:
    A) Perfect accuracy for speed and scalability
    B) Security for latency
    C) Metadata for vector dimension
    D) Compression for indexing
    Answer:A
  8. Hybrid indexing can combine:
    A) Multiple index types (graph + quantization) for optimized performance
    B) Only single index types
    C) No indexing
    D) Manual sorting
    Answer:A
  9. Cloud-native vector DBs often provide:
    A) Easy integration with other cloud services and managed infrastructure
    B) Only local deployment
    C) No APIs
    D) Static scaling
    Answer:A
  10. The future of vector databases will likely emphasize:
    A) Better integration with AI models, real-time updates, and privacy guarantees
    B) Replacing vector search with SQL only
    C) Manual indexing
    D) Fixed hardware requirements
    Answer:A
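Question 2 mentions inverted file (IVF) structures. The idea is to partition the vectors with a coarse k-means quantizer and, at query time, scan only the `nprobe` closest partitions instead of the whole dataset. FAISS exposes the same concept as `IndexIVFFlat`, and as `IndexIVFPQ` when combined with product quantization, each with an `nprobe` setting. The pure-Python toy below is not FAISS code. The class name, the training loop, and the defaults are hypothetical, and the stored vectors are kept uncompressed.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class IVFIndex:
    """Toy inverted-file index: each vector is filed under its nearest coarse centroid."""

    def __init__(self, nlist, iters=10, seed=0):
        self.nlist, self.iters, self.seed = nlist, iters, seed

    def train(self, vectors):
        # Coarse quantizer from a few rounds of Lloyd's k-means.
        rng = random.Random(self.seed)
        self.centroids = [list(v) for v in rng.sample(vectors, self.nlist)]
        for _ in range(self.iters):
            groups = [[] for _ in range(self.nlist)]
            for v in vectors:
                groups[self._nearest(v, 1)[0]].append(v)
            for j, g in enumerate(groups):
                if g:  # keep the old centroid if a list ends up empty
                    self.centroids[j] = [sum(col) / len(g) for col in zip(*g)]
        self.lists = [[] for _ in range(self.nlist)]
        return self

    def _nearest(self, v, n):
        return sorted(range(self.nlist), key=lambda j: sq_dist(v, self.centroids[j]))[:n]

    def add(self, doc_id, vector):
        self.lists[self._nearest(vector, 1)[0]].append((doc_id, vector))

    def search(self, query, top_k, nprobe=1):
        # Only the nprobe closest lists are scanned; a higher nprobe buys recall with time.
        candidates = [item for j in self._nearest(query, nprobe) for item in self.lists[j]]
        candidates.sort(key=lambda item: sq_dist(query, item[1]))
        return [doc_id for doc_id, _ in candidates[:top_k]]
```

With `nprobe=1`, a query scans roughly 1/`nlist` of the data. A true neighbour that sits just across a partition boundary can be missed, which is the accuracy-for-speed trade described in question 7.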

 

 
