Prosody covers the properties of spoken language that belong to units larger than a single phoneme (you'll often see terms like 'non-segmental', i.e. larger than single segments).
Prosody is usually categorized as part of phonetics, though it reaches into other areas as well,
largely because "anything larger" covers a variety of things that different languages put to different uses.
That said, a lot of work on prosody focuses on syllables and the way we lay intonation (in non-tonal languages), stress, and rhythm onto them, probably because that is what many of the commonly studied languages do (not because that is all there is to prosody).
When you study prosody, it may be practical to sort things into three views:
- Physical - what we can measure
- Perceptual - how we experience
- Linguistic - what linguistic mechanisms use this
...also because a number of terms can be thought of differently from each view
- For example
- the fundamental frequency of vocal fold vibration is physical, pitch is perceptual, intonation is linguistic
- similarly, intensity is physical, loudness is perceptual, and its effect on e.g. stress is linguistic
- mode of vocal fold vibration is physical, laryngeal voice quality is more perceptual
- duration is physical, perceived duration is perceptual
- tone and stress are on the linguistic side, and what does and doesn't determine them varies with language
- many individual observations, such as embedded sentences often having a lower fundamental frequency, are linguistic
...and it can help to keep those three angles separated in your head even if the terms sometimes fail to do so.
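To make the physical/perceptual split a little more concrete, here is a minimal sketch that re-expresses physical measurements on logarithmic scales, which are often used as a rough stand-in for how we experience them. The function names and the 100 Hz reference are assumptions chosen for illustration, and semitones and decibels only approximate perceived pitch and loudness.

```python
import math

def hz_to_semitones(f_hz, ref_hz=100.0):
    """Express a frequency as semitones relative to a reference (ref_hz is an arbitrary assumption)."""
    return 12.0 * math.log2(f_hz / ref_hz)

def intensity_ratio_to_db(intensity, ref_intensity):
    """Express an intensity as decibels relative to a reference intensity."""
    return 10.0 * math.log10(intensity / ref_intensity)

# The same physical step of 50 Hz is a different interval depending on where it starts.
low_step = hz_to_semitones(150.0) - hz_to_semitones(100.0)
high_step = hz_to_semitones(250.0) - hz_to_semitones(200.0)
print(round(low_step, 2), round(high_step, 2))  # → 7.02 3.86
```

So a 50 Hz rise starting at 100 Hz is roughly a musical fifth, while the same rise starting at 200 Hz is closer to a major third, which is one reason to keep "frequency" and "pitch" apart.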
There are also edge cases. Pauses technically do not fit that description directly, but because they act as the whitespace around the content, they can be considered prosodic boundaries, and so are typically considered part of prosody as well: the spoken equivalent of "this sentence changes meaning depending on where you put the comma".
Also consider the argument over whether pauses are part of rhythm or not. In rehearsed speech, pauses tend to be intentional and a creative use of rhythm, while in spontaneous speech they are often primarily hesitation, which carries little to no content. Also, longer pauses are often none of the above, and at best are considered separation from the last utterance.
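On the physical side, pauses are usually approached as stretches of low intensity, and their durations are what any later interpretation (boundary, rhythm, hesitation) would start from. The following is a minimal sketch of that first step only. The function name `find_pauses`, the 10 ms frame length, the -40 dB silence threshold, and the 0.2 s minimum duration are all assumptions for illustration, not established values.

```python
def find_pauses(frame_db, frame_s=0.01, silence_db=-40.0, min_pause_s=0.2):
    """Return (start_s, duration_s) for each run of quiet frames lasting at least min_pause_s.

    frame_db: per-frame intensity levels in dB (hypothetical input format).
    All thresholds are illustrative assumptions.
    """
    pauses = []
    run_start = None
    # A sentinel frame that is never quiet closes a pause that runs to the end of the signal.
    for i, level in enumerate(list(frame_db) + [float("inf")]):
        quiet = level < silence_db
        if quiet and run_start is None:
            run_start = i
        elif not quiet and run_start is not None:
            duration = (i - run_start) * frame_s
            if duration >= min_pause_s:
                pauses.append((run_start * frame_s, duration))
            run_start = None
    return pauses

# Speech, a 0.3 s silence, speech, a 0.05 s dip (too short to count), speech.
levels = [-20.0] * 10 + [-60.0] * 30 + [-20.0] * 10 + [-60.0] * 5 + [-20.0] * 10
pauses = find_pauses(levels)
```

Note that this only measures pauses; deciding whether a given pause is a boundary, a rhythmic choice, or hesitation is a linguistic question that duration alone does not settle.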