Monday, July 4, 2011

... or close up the wall with our English dead

So I'm writing a PayPal module. Again. Third time. Once more unto the breach, indeed. Maybe this time I'll get it right.

See, first go I tried doing it the 'right' way. I read all the documentation, ran all the sample code, and got a sandbox account. Everything was lovely, until the day came to deal with real transactions.

The second system was born from the still-burning ashes of the first in a very phoenix-like way. (I assume most phoenixes start life confused and terrified by the burning, collapsing buildings they tend to wake up inside...) I never want to be patching a live database's table structure before an IPN message retry deadline again.

Here's the sum of everything I wish I'd known before starting:

PayPal Lies.

With a little more experience, I realize that they are at least consistent liars, which is a small mercy.

Now, for some of you, the word 'lie' may seem a little strong. Perhaps it is. Consider the following definition of the "receiver_id" IPN field from the PayPal documentation:


receiver_id
Unique account ID of the payment recipient (i.e., the merchant). This is the same as the recipient's referral ID.
Length: 13 characters

On nearly every transaction (certainly every one you'll see from the sandbox), this field contains our own merchant ID, and we are encouraged to check that it is indeed our own shop identifier before accepting the record.

It's only when you finally get an "adjustment" record (extremely rare) that you see the mistake. On an adjustment, "receiver_id" is missing entirely. But wait... there's our merchant code. What's it doing over in "payer_id"?

That's when it twigs. PayPal means the receiver of the payment, not the receiver of the IPN message (whose listening endpoint is also called a 'receiver'). So when PayPal reverses a payment, it also swaps the ID fields.

This might sound sensible... for about five seconds. Swapping the 'From' and 'To' fields depending on some other 'Type' field is really not a great way to build foolproof systems, especially when one of those fields is supposed to be used as a primary fraud-protection measure.
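To make the trap concrete, here is a minimal Python sketch comparing the check the documentation encourages with one that allows for the swap. It assumes the IPN fields have already been verified and parsed into a dict, and that adjustments are flagged by a `txn_type` value of "adjustment"; that exact value, the merchant ID, and both function names are hypothetical placeholders, so check them against your own IPN logs.

```python
from typing import Mapping

OUR_MERCHANT_ID = "ABCDEFGHIJKLM"   # hypothetical placeholder, not a real account
ADJUSTMENT_TXN_TYPE = "adjustment"  # assumption: confirm the exact value in your logs


def naive_is_ours(ipn: Mapping[str, str]) -> bool:
    # What the documentation suggests: receiver_id is always the merchant.
    return ipn.get("receiver_id") == OUR_MERCHANT_ID


def swap_aware_is_ours(ipn: Mapping[str, str]) -> bool:
    # On an adjustment the IDs are reversed, so our ID turns up in payer_id.
    if ipn.get("txn_type") == ADJUSTMENT_TXN_TYPE:
        return ipn.get("payer_id") == OUR_MERCHANT_ID
    return ipn.get("receiver_id") == OUR_MERCHANT_ID


adjustment = {"txn_type": "adjustment", "payer_id": OUR_MERCHANT_ID}
print(naive_is_ours(adjustment))       # False: a genuine record gets rejected
print(swap_aware_is_ours(adjustment))  # True
```

The naive version silently drops the one record type you most need to see, which is exactly the failure described above.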

A clever person might ask, "What happens if you buy something from your own shop? How does it handle the same ID in both fields?" At least that question is easily answered: PayPal will not allow you to transact with yourself. Apparently it sends you blind.


So when they literally say "(i.e., the merchant)" in the specification text, they are badly mischaracterizing the API and leading you up the garden path. It's not as if there's other documentation to cross-check this against, either. At best, they're being confusing in an API intended to process millions of dollars' worth of other people's money.

So here goes. I'll spend thirty seconds rewriting that definition to say what it actually means:


receiver_id
Unique ID of the PayPal account that received this payment.
Length: 13 characters

And it's shorter, too. But even better would be a design that doesn't mix concepts in the first place. Instead of a payer_id and a receiver_id which swap places depending on, y'know, whatever, they really should have a merchant_id and a customer_id which always hold the corresponding data.
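Until PayPal does that, you can do it yourself at the edge of your own system. Below is a minimal sketch that maps the direction-dependent fields onto fixed roles once, so nothing downstream ever has to care which way the money moved. The `NormalizedPayment` class, the `normalize` function, and the "adjustment" `txn_type` value are all hypothetical illustrations, not part of any PayPal library. Because receiver_id is missing on adjustments, the customer ID is allowed to be absent.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: the message type that flips the IDs is identified by this
# txn_type value; the exact string is illustrative, check your own logs.
ADJUSTMENT_TXN_TYPE = "adjustment"


@dataclass(frozen=True)
class NormalizedPayment:
    """Fixed roles, regardless of which way the money moved."""
    merchant_id: Optional[str]  # our shop, always
    customer_id: Optional[str]  # the other party, always (may be absent)
    is_reversal: bool           # True when money moved from merchant to customer


def normalize(ipn: Mapping[str, str]) -> NormalizedPayment:
    is_reversal = ipn.get("txn_type") == ADJUSTMENT_TXN_TYPE
    if is_reversal:
        # On a reversal we are the payer and the customer is the receiver.
        merchant, customer = ipn.get("payer_id"), ipn.get("receiver_id")
    else:
        merchant, customer = ipn.get("receiver_id"), ipn.get("payer_id")
    return NormalizedPayment(merchant_id=merchant, customer_id=customer,
                             is_reversal=is_reversal)
```

Storing the normalized record rather than the raw fields means the swap logic lives in exactly one place.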

Even though payer_id and receiver_id are exactly the same data type, they are semantically different, and the content of one should never appear in the other. The consequences are just too terrifying.

I remember reading about a similar design error in the US Army's Cruise Missile Targeting Request software form, which had the "Target Destination" and "Request Source" GPS coordinate fields right next to each other, with predictably unfortunate results. Sometimes data schema validation has to go beyond just making sure the value is the right length.

Also, which 13 characters? Numbers? ASCII? UTF-16? A spec shouldn't be this ambiguous. Any "specification" which doesn't say things specifically is, to my mind, a big fat lie in a binder.
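Since the spec won't say, you have to pick an answer and enforce it yourself. Here is a minimal sketch of a validator that checks both the format and the meaning of an ID pair. The character set is an assumption (13 uppercase ASCII letters or digits, based on IDs seen in practice, not on anything the documentation promises), and `validate_ids` is a hypothetical helper. The identical-ID check follows from the point above that PayPal won't let you transact with yourself.

```python
import re
from typing import List

# Assumption: account IDs are 13 uppercase ASCII letters/digits. The docs
# only say "13 characters", so this pattern is a guess; adjust if needed.
ACCOUNT_ID_PATTERN = re.compile(r"[A-Z0-9]{13}")


def validate_ids(merchant_id: str, customer_id: str) -> List[str]:
    """Return a list of problems; an empty list means the pair looks sane."""
    problems = []
    for name, value in (("merchant_id", merchant_id), ("customer_id", customer_id)):
        if not ACCOUNT_ID_PATTERN.fullmatch(value):
            problems.append(f"{name} is not 13 uppercase ASCII letters/digits: {value!r}")
    # A value of the right length can still be the wrong value: the same
    # account should never appear on both sides of a payment.
    if merchant_id == customer_id:
        problems.append("merchant_id and customer_id are identical")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to log everything that was wrong with a suspicious message before rejecting it.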
