All clustered indexes are unique. They have to be, since the row
data is stored in the index's leaf node. If a clustered index is
not unique by itself, SQL Server automatically adds a
"uniquifier". The potential advantage to including the PK in the
lastname, firstname clustered index would be for a query like a
lookup list (Select id, lastname +', ' + firstname ...), where all
fields could now come from the index entry without needing to read
the leaf node data. If the PK is an id, there's not even an index
size penalty since SQL Server would be adding a unique integer
anyway.
The real-world scenario under contemplation here is a an existing
clustered index on 2 fields that together are not unique vs. a
3-column clustered index that is unique. There is no scenario where
my app would ever retrieve just those 3 columns, as there isn't
enough information there to be useful -- consider that if one were
crazy enough to populate a combo box from a 350K-record table, with
just those three fields, you'd be unable to distinguish the
duplicate names.
The clustered index should be a big improvement if the table is
large enough, and if you are retrieving a subset of the data in
that order.
But only if the subset of data is just the three fields in the
clustered index, right?
So a
query like the lookup list above gets a modest benefit because the
sorting no longer needs to be done after retrieval. A search query
on last name (Select... Where lastname like 'Smi%') gets a big
improvement because SQL Server does not have to lookup the row
data once it finds the index values that qualify for the
condition- the data is right there in the index leaf node,
substantially reducing the number of pages SQL Server needs to
read.
Given that I always ask for more fields in all my SELECT statements,
all of which have a WHERE clause (either on one or both of the name
fields or on the PK alone), is there going to be any benefit to
retrieving those 3 fields from the clustered index when there are
other fields that have to be retrieved from the actual data pages?
If not, in regard to sorting, is there going to be a benefit to
adding the PK field? That is, is eliminating the overhead of the
"uniquifier" going to speed up the sort?