If you are new to quantum theory you might wonder what the big deal about entanglement is. You have heard the story: we create two qubits in an entangled state and then separate them spatially. To make the point, theoretical physicists sometimes exaggerate and say: send one qubit to the Andromeda galaxy and keep the other with you. Then you measure your qubit and, the instant you measure it, you know what the state of the other qubit will be. Your friend can make the measurement in Andromeda and will get the opposite result. So what is so special about this? I can take two boxes, put a coin that is heads up in one and a coin that is tails up in the other, and then send one box, chosen at random, to Andromeda. I don’t know the state of the coin in the box I am left with, but once I open it I will know, and my friend is sure to get the opposite result. So what is the big fuss about?
We made two assumptions here, which together go by the name local realism. Assumption 1 (reality): we assumed the state of the coin (heads or tails) existed independently of our observations. Assumption 2 (locality): observing the state of one coin cannot change the state of the other coin instantaneously (such an instantaneous influence is what is referred to as spooky action at a distance).
Now to get a little more technical and precise, substitute heads or tails with spin and suppose the spins of the two qubits were governed by anti-correlated waveforms like below:

Again, given this configuration there is no mystery. When we measure one spin, we know what the other would be. There is no paradox. The two spins are always opposite, s_A = -s_B (writing s_A and s_B for the two spin values), and herein lies the thesis of the EPR paper: we have to introduce a hidden variable (conventionally written λ) to explain the observations, and λ is the element of reality, namely that the spin existed whether you chose to measure it or not. But Bell showed that the results of QM, the predictions it makes, are inconsistent with any local hidden variable theory. And that is what is so special about entanglement.
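Bell's result can be made concrete with the CHSH form of the inequality: any local hidden variable theory keeps a particular combination S of four correlations within |S| <= 2, while QM predicts up to 2√2 for the singlet state. The sketch below is only an illustration under my own assumptions: the function names (spin_along, correlation, chsh) and the choice of detector angles are mine, and the detectors are restricted to the X-Z plane for simplicity.

```python
import numpy as np

# Pauli matrices needed for directions in the X-Z plane
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Singlet state (|01> - |10>)/sqrt(2): perfectly anti-correlated spins
singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)

def spin_along(theta):
    """Spin observable for a detector tilted by theta from Z (in the X-Z plane)."""
    return np.cos(theta) * Z + np.sin(theta) * X

def correlation(a, b):
    """Expectation value of (Alice's spin along a) * (Bob's spin along b)."""
    op = np.kron(spin_along(a), spin_along(b))
    return float(np.real(np.vdot(singlet, op @ singlet)))

def chsh(a, a2, b, b2):
    """CHSH combination; local hidden variable theories require |S| <= 2."""
    return (correlation(a, b) - correlation(a, b2)
            + correlation(a2, b) + correlation(a2, b2))

# Assumed detector settings: a standard choice that maximises |S|
S = chsh(0.0, np.pi / 2, np.pi / 4, 3 * np.pi / 4)
print(abs(S), 2 * np.sqrt(2))
```

With the same detector aligned on both sides, correlation(0, 0) comes out as -1, which is just the "opposite result" of the coin story. The disagreement with local hidden variables only shows up when the two detectors are set at different angles.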
Re: the element of reality, I think the Stern-Gerlach experiments show that the spin does not exist independently of observations. If the spin had a definite direction independent of our observations, then measuring it along an orthogonal direction would give zero. But as we know, you can choose any direction you want and you will get +1 or -1, and you can keep repeating this as long as you like (e.g., I measure along Z, then Y, then X, then Z again, then X, then Y and so on ad infinitum): you will keep getting +1s and -1s. Remember that when you measure along Z, the spin collapses to +1 or -1 (assuming it started in a superposition state, say |+>). But the resulting state (|0> or |1>) is itself an equal superposition in the X basis (|0> = (|+> + |->)/√2 and |1> = (|+> - |->)/√2), so when you now measure in the X basis you again get +1 or -1 with equal probability! Then if you measure along Z or Y you again get +1 or -1, because in those bases the spin is once more in a superposition. You don’t necessarily have to measure along X, Y or Z. The detector could be oriented at an arbitrary angle theta, but in that case the two outcomes are in general no longer equally likely: for a spin prepared in |0> and a detector tilted by theta from Z, the probabilities are cos²(theta/2) and sin²(theta/2).
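The repeated Z, Y, X measurements described above are easy to simulate for a single qubit. This is a minimal sketch, assuming ideal projective measurements; the helper names (measure, run_sequence, prob_plus) and the fixed random seed are my own choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

PAULI = {
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def measure(state, axis):
    """Ideal projective spin measurement; returns (outcome, collapsed state)."""
    eigvals, eigvecs = np.linalg.eigh(PAULI[axis])  # eigenvalues come out as -1, +1
    probs = np.abs(eigvecs.conj().T @ state) ** 2   # Born rule
    k = rng.choice(2, p=probs / probs.sum())
    return int(round(eigvals[k])), eigvecs[:, k]

def run_sequence(axes, state):
    """Measure the same spin along each axis in turn, collapsing as we go."""
    outcomes = []
    for axis in axes:
        outcome, state = measure(state, axis)
        outcomes.append(outcome)
    return outcomes

def prob_plus(theta):
    """P(+1) for a spin prepared in |0> and a detector tilted theta from Z."""
    op = np.cos(theta) * PAULI["Z"] + np.sin(theta) * PAULI["X"]
    _, eigvecs = np.linalg.eigh(op)
    zero = np.array([1, 0], dtype=complex)
    return float(np.abs(np.vdot(eigvecs[:, 1], zero)) ** 2)  # column 1 is the +1 eigenvector

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)  # start in |+>
outcomes = run_sequence(["Z", "Y", "X"] * 2000, plus)
print(set(outcomes), np.mean(outcomes))
```

Every outcome is +1 or -1 and the average sits close to zero, since each measurement in the cycle is made in a basis in which the current state is an equal superposition. prob_plus(theta) reproduces cos²(theta/2) for a tilted detector.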
So SG showed that QM violates reality: the spin doesn’t even exist in reality, and it is the act of measurement (observation) that materializes it. Bell showed that QM violates locality. And that is what is so special about QM in a nutshell: it violates both reality and locality. You only need a single qubit to demonstrate the violation of reality, and you need two qubits in an entangled state to demonstrate the violation of locality. A simple way to understand the violation of locality is that in an entangled state the two qubits act logically as a single inseparable entity: their fate is entangled (the joint pdf is not factorizable). But the fact that we can separate them physically is the root cause of the problem.
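One way to make "inseparable" concrete at the level of the state vector: a two-qubit pure state can be written as (state of qubit A) ⊗ (state of qubit B) exactly when its 2x2 matrix of amplitudes has rank 1. The sketch below checks this with a singular value decomposition; the helper names and the tolerance are my own assumptions.

```python
import numpy as np

def schmidt_coefficients(state):
    """Singular values of the 2x2 amplitude matrix c[a, b] of a two-qubit pure state."""
    amplitudes = np.asarray(state, dtype=complex).reshape(2, 2)
    return np.linalg.svd(amplitudes, compute_uv=False)

def is_product_state(state, tol=1e-12):
    """True if the state factorizes into (qubit A) x (qubit B), i.e. is not entangled."""
    return bool(np.sum(schmidt_coefficients(state) > tol) == 1)

singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)  # (|01> - |10>)/sqrt(2)
zero = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
separable = np.kron(zero, plus)                                 # |0>|+>

print(is_product_state(singlet), is_product_state(separable))
```

The singlet has two equal non-zero coefficients (1/√2 each), so no assignment of individual states to the two qubits reproduces it, whereas |0>|+> has only one.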
PS: Susskind in his book (p. 223, Section 7.9) defends locality, but I think he cheats a little. He confines himself to unitary evolution, and granted, in that case nothing that Bob does to his qubit can influence Alice’s density matrix. The problem arises if a measurement is performed and an outcome is known: that instantly changes Alice’s density matrix, and that is the violation of locality. Otherwise the whole issue would have been settled long ago and we would not be having so many debates and YouTube videos on this. If you read the section in his book carefully, he extends Alice and Bob’s system to include their apparatus as well, so that a measurement can be absorbed into the unitary evolution of the combined system and the issue is worked around.
Disclaimer: Please don’t take anything you read here for granted. I make these notes for myself and I study QM purely as a hobby. I am by no means an expert on this subject. Pick up a good book if you really want to learn QM.
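The two cases in this PS can be compared numerically. This is a rough sketch under my own assumptions (Alice holds the first qubit of a singlet, Bob the second; Bob's unitary is a Hadamard gate and his measurement is along Z; the function names are mine), not a reproduction of Susskind's calculation.

```python
import numpy as np

def alice_density_matrix(state):
    """Reduced density matrix of the first qubit (Alice) of a two-qubit pure state."""
    m = np.asarray(state, dtype=complex).reshape(2, 2)  # m[a, b]: Alice index a, Bob index b
    return m @ m.conj().T                                # trace out Bob

def after_bob_outcome(state, outcome):
    """Project Bob's qubit onto |outcome> (Z basis); returns (probability, new state)."""
    proj = np.zeros((2, 2), dtype=complex)
    proj[outcome, outcome] = 1
    projected = np.kron(np.eye(2), proj) @ state
    prob = float(np.real(np.vdot(projected, projected)))
    return prob, projected / np.sqrt(prob)

singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)
rho_before = alice_density_matrix(singlet)

# Case 1: Bob only applies a unitary to his qubit
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
rho_after_unitary = alice_density_matrix(np.kron(np.eye(2), H) @ singlet)

# Case 2: Bob measures along Z and the outcome is known
p0, state0 = after_bob_outcome(singlet, 0)
p1, state1 = after_bob_outcome(singlet, 1)
rho_given_0 = alice_density_matrix(state0)
rho_given_1 = alice_density_matrix(state1)
rho_average = p0 * rho_given_0 + p1 * rho_given_1

print(np.round(rho_before.real, 3))
print(np.round(rho_given_0.real, 3))
```

In this sketch Alice's matrix is I/2 before and after Bob's unitary, while conditioning on a known outcome turns it into a pure state (|1><1| when Bob finds 0). Averaged over the two outcomes with their probabilities, it comes back to I/2.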