At the top level, a GPU is subdivided into a number of shader cores. A small GPU in a notebook or tablet may have only a few cores, while a high-end desktop GPU may have dozens. In addition to the shader cores, there are also texture units. They may be grouped together with one texture unit per shader core, or one texture unit shared among two or three shader cores, depending on the GPU. The whole chip shares a single L2 cache, but the different units have individual L1 caches. Texture units have texture caches, and shader units have caches for instructions and constants/uniforms, and maybe a separate cache for buffer data, depending on whether buffer loads take a separate path from texture loads (this varies by GPU architecture).
Texture units operate independently and asynchronously from shader cores. When a shader performs a texture read, it sends a request to the texture unit across a little bus between them. The shader can then continue executing if possible, or it may get suspended and allow other shader threads to run while it waits for the texture read to finish. The texture unit batches up a bunch of requests and performs the addressing math on them: selecting mip levels and anisotropy, converting UVs to texel coordinates, applying clamp/wrap modes, etc.
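
To make the addressing math mentioned above a bit more concrete, here is a rough sketch in Python of what it might look like for a single point-sampled request. This is only an illustration under simplifying assumptions, not how any particular GPU implements it: the function names (`select_mip_level`, `texel_address`, `wrap_coord`) and the `texels_per_pixel` footprint parameter are hypothetical, the mip level is simply rounded to the nearest integer, and anisotropy and filtering are ignored entirely.

```python
import math


def wrap_coord(i, size, mode):
    """Map an integer texel index into [0, size) using an address mode."""
    if mode == "wrap":
        # Python's % always returns a non-negative result for a positive size.
        return i % size
    if mode == "clamp":
        return max(0, min(size - 1, i))
    raise ValueError(f"unknown address mode: {mode}")


def select_mip_level(texels_per_pixel, num_levels):
    """Pick a mip level from the footprint (texels covered per pixel, per axis).

    Assumption: round log2 of the footprint to the nearest level and clamp
    to the levels that exist. Footprints below 1 (magnification) use level 0.
    """
    lod = math.log2(max(texels_per_pixel, 1.0))
    return min(int(lod + 0.5), num_levels - 1)


def texel_address(u, v, base_width, base_height, level, mode="wrap"):
    """Convert normalized UVs to integer texel coordinates at a mip level."""
    # Each mip level halves the dimensions, down to a minimum of 1 texel.
    width = max(1, base_width >> level)
    height = max(1, base_height >> level)
    # Scale UVs to texel space, then take the texel containing the point.
    x = math.floor(u * width)
    y = math.floor(v * height)
    return wrap_coord(x, width, mode), wrap_coord(y, height, mode)
```

For example, on a 256x256 texture at mip level 0, `texel_address(1.25, -0.25, 256, 256, 0, "wrap")` gives `(64, 192)`, while the same call with `"clamp"` gives `(255, 0)`.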