EBCDIC -vs- The World

My good friend and former co-worker Mike told me today about his struggles with 'autoconf' on MVS.
He has a better grasp of cross-platform issues, such as 'make' logic that works as well on z/OS as on Unix, than most people I know personally. Mike tells me the './configure' step works fine, but a specific package that uses it refuses to support EBCDIC, and it sounds like a religious matter. [sigh]

When I first encountered "Open Edition" (as it was called then), I was delighted and dismayed.
First I launched a shell and found all those Unix commands that I had seen on other platforms. But when I brought in a TAR file with my own bag of tricks, it failed. The archive was intact, but my scripts crashed. When I tried to eyeball one of them, I saw only garbage. Then it hit me: they were all in ASCII. But more significantly, the system was EBCDIC. Duh!

I assumed what so many others assume: If it's Unix, it's ASCII. But I was wrong.
It took several months before I could accept that OVM and OMVS being EBCDIC was not only okay but was, and is, "the right thing". But developers who do not know our beloved mainframe environments have not walked this path and may react against it, as the authors of the package Mike is wrestling with appear to have done.

The designers of the POSIX specification and of the X/Open Unix brand were very careful about what is defined and where, what is required and how.  Just what makes a system Unix?  For ten years, MVS has passed the test and is Unix branded.  But surely none of us expect "a Unix person" to accept MVS that way.  The single biggest difference between OpenMVS and "real Unix" is the character set.  It is a curse and a blessing.

Let me first mention the blessing.
CMS and z/OS, even with a POSIX face, must be EBCDIC for the sake of the legacy code they run. For all their faults,  this is one place IBM is exceptional:  They support historical workloads.  (They do it better than a certain other vendor of operating systems which shall remain nameless.)  The old code works.  But the old code uses EBCDIC for character representation.  After chewing on this for more than half a year, I realized that it must be so for the POSIX side as well, or there would be grossly confusing results.

In theory, the character set should be as easily replaced as most other aspects of the system. (For example, we let users run CSH rather than requiring the Bourne shell, even though that has grave consequences if they want to write scripts.) In practice, the character set is more deeply entrenched. When moving from one Unix to another, the theory was "just recompile". In practice, we know it doesn't work so smoothly. This is bad. This is sad!

Programmers make assumptions. I know: I'm a programmer, so I'm just as guilty.
There are ways to make any application source "character set agnostic". Such techniques take time and practice. Is it worth the hassle? Yes! Today, the unnamed authors of the unidentified package Mike is wrestling with reject EBCDIC. It's not so much that they can't as that they won't. What is heartbreaking is that they have already done the tough part: they deal with differing character encodings. Supporting EBCDIC would be no extra mile for them (IMHO), and their attitude paints them into a corner where they'll have trouble with any new-and-wonderful encoding yet to be devised.
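One classic example of such a technique, offered here as a minimal C sketch rather than anything from the package in question: code that tests for letters with a range check quietly assumes the alphabet is one contiguous run of code points. That holds in ASCII but not in EBCDIC, where 'A' through 'Z' are split into three groups with non-letters in between. The function names below are hypothetical, and the sketch assumes the default "C" locale.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Non-portable: assumes 'A'..'Z' occupy one contiguous run of code points.
   True in ASCII; false in EBCDIC, where the range from 'A' (0xC1) to
   'Z' (0xE9) also contains non-letters, e.g. the code points between
   'I' (0xC9) and 'J' (0xD1). */
static bool is_upper_naive(char c)
{
    return c >= 'A' && c <= 'Z';
}

/* Portable: the C library knows the execution character set. */
static bool is_upper_portable(char c)
{
    return isupper((unsigned char)c) != 0;
}

/* Portable alphabet position (0 for 'A' through 25 for 'Z', -1 otherwise):
   look the letter up in a string literal instead of computing c - 'A',
   which gives wrong answers when the letters are not contiguous. */
static int alphabet_index(char c)
{
    static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *p;

    if (c == '\0')              /* strchr would match the terminator */
        return -1;
    p = strchr(letters, toupper((unsigned char)c));
    return p ? (int)(p - letters) : -1;
}
```

The string-literal lookup is slower than subtraction, but it lets the compiler encode each letter for the target system, which is the whole point.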

Thankfully, compiler writers tend to be more disciplined than the rest of us. The foundation is strong: any special character is represented by a well-defined and always expected meta-character or escape sequence. Notably, newline is always coded as "\n", never as 0x0A, so the compiler, not the programmer, decides which byte it becomes in the target character set. Even the most ASCII-entrenched Unix hack will chastise the programmer who uses the hexadecimal value rather than the symbolic one. We all need to be more consistent.
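To make the "\n" versus 0x0A point concrete, here is a small C sketch with two hypothetical line counters that differ only in how they spell newline. The EBCDIC value cited in the comments assumes the IBM-1047 code page commonly used by z/OS UNIX; other code pages may differ.

```c
#include <stddef.h>

/* Portable: compares against '\n', so the compiler supplies whatever byte
   the execution character set uses for newline (0x0A in ASCII; 0x15 in
   IBM-1047 EBCDIC, assumed here as the z/OS UNIX default). */
static size_t count_lines_portable(const char *text)
{
    size_t lines = 0;
    for (; *text != '\0'; ++text)
        if (*text == '\n')
            ++lines;
    return lines;
}

/* Non-portable: hard-codes the ASCII line-feed byte, so on a system where
   newline is 0x15 it never matches a real newline. */
static size_t count_lines_ascii_only(const char *text)
{
    size_t lines = 0;
    for (; *text != '\0'; ++text)
        if (*text == 0x0A)
            ++lines;
    return lines;
}
```

On an ASCII system both functions return the same count; only the first one keeps working after a recompile on an EBCDIC system.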

The problem does not simply go away when we are more diligent.
There continue to be situations where character encoding bites us.
But as source code grows more robust, we can make progress.

-- R;


by sirsanta November 21, 2005 in Application Development

Comments

And now it gets even more interesting with Unicode: for example, Unicode columns now appear in the DB2 (Version 8 NFM) catalog.

Posted by: Martin Packer | Nov 21, 2005 3:45:39 PM

Funny...

There's an EBCDIC machine that was one of the early targets of some open source projects, and it went very well.

One of the earliest ports of Apache was to (then) Siemens-Nixdorf-Informationssysteme's BS2000/OSD. This was well before any port to an IBM EBCDIC platform.

Several other open-source projects have been ported to BS2000/OSD since then as well.

Perl used to support an EBCDIC build, but with 5.8 (which works internally with Unicode) they stopped; still, I think you can build 5.8 under z/OS USS or z/VM. (z/VSE, probably not.)

Posted by: Ray Mullins | Nov 21, 2005 4:42:47 PM

I'm always fascinated by software vendors who respond "not interested" when a customer approaches them about opening up a new market opportunity for their product. Especially a market opportunity on IBM's flagship enterprise server. Didn't they ever read about Digital Research and Microsoft?

Posted by: Timothy Sipples | Nov 23, 2005 4:52:12 PM




The postings on this site are our own and don’t necessarily represent the positions, strategies or opinions of our employers.
© Copyright 2005 the respective authors of the Mainframe Weblog.