Because very little text results in weak, noisy, and ambiguous vector representations, cosine similarity difficulties with brief queries.
Simply put, a three-word question lacks sufficient semantic signal for stable embedding geometry.
For instance:
"reset password"
This might imply:
-
API endpoint
-
database credential reset
-
forgot password flow
-
admin reset
-
Resetting Active Directory
-
SaaS account recovery with the Linux passwd command
Underspecification occurs in the embedding.
Fundamental Issue
The angle between vectors is measured via cosine similarity:
A × B = cos(θ)
When vectors include rich semantic information, this is effective.
Brief queries result in:
A lot of unconnected pieces seem "similar enough."