As a design concept, Challenge Rating is broken nonsense.
At first I thought it was just me running it wrong. Then I found the "discussions" on Paizo, Reddit and elsewhere of GMs' problems balancing encounters with RAW CR.
It no work.
The reality is that action economy will always trump the best CR calculations. And CR can't account for the rest of the table either: How were the PCs built (rolls, array or point buy)? How much 'arcane drip'? How good or bad is the GM at tactics? Then the math rocks enter the chat ...
So I just dumped CR and went back to eyeballing encounters, just like we do with old-school D&D.