Posts Tagged ‘UniData’

Security is not Obscurity. Even in U2 [Part 3]

March 1, 2010 1 comment

A few years ago I read an interesting article titled Denial of Service via Algorithmic Complexity Attacks. When I started working with UniData, it never crossed my mind that U2 had the same class of vulnerabilities, but it does.

If you develop for a U2 system where you cannot afford for malicious internal/external entities to adversely affect system performance, then I highly suggest you read the above linked paper.

I’ll divide this into two sections:

Hash file vulnerability
Dynamic Array vulnerability

The first place I’ll draw your attention to is the humble hash file at the core of UniData and UniVerse. As you probably know, each record is placed in a group dependent on the hash value of its record ID, along with the modulo and hashing algorithm of the file. There are 2 hashing algorithms that a hashed file can use: Type 0 or ‘GENERAL’ is the default, general-use hashing algorithm, whereas Type 1 or ‘SEQ.NUM’ is an alternative you can specify, designed to handle sequential keys. The hash file is basically a hash table with chaining.
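
To make the mechanics concrete, here is a toy sketch of how an ID-to-group mapping of this general shape behaves. To be clear, this is not the actual GENERAL or SEQ.NUM algorithm (those are internal to U2); it only illustrates that whoever controls the record IDs controls which group they land in.

* Toy illustration only - NOT the real U2 hashing algorithm.
MODULO = 4013
ID = 'HARRY123'
HASH.VALUE = 0
FOR I = 1 TO LEN(ID)
   HASH.VALUE = MOD((HASH.VALUE * 31) + SEQ(ID[I, 1]), MODULO)
NEXT I
GROUP = HASH.VALUE + 1
CRT 'Record ' : ID : ' lands in group ' : GROUP

Any two IDs that produce the same GROUP value end up chained together in the same group, which is exactly what the attack below exploits.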

Let’s assume we’re working at the HackMe Ltd company that has made a public website to integrate with their existing backend system, which is UniData driven. It is decided that people can pick their own usernames when signing up. Since these usernames are unique, they have been used as the record ID.

Ever since he was not hired after interviewing at HackMe Ltd, Harry has wanted to show them up. Knowing from his interview (and their job ads) that they used UniData on the backend, he installed UniData, made some initial guesses at the modulo of their ‘users’ file, and calculated a few sets of usernames for the different moduli.

Now, by going to their website and taking timings of the “Check username availability” feature, Harry was able to become reasonably sure of the modulo of the file. He set his computer to run all night generating keys that hashed to a single group, and set up his email server to automatically run wget on the confirmation URL in each received email (thereby getting around the “Confirm email address” emails).

The next day he runs a script to sign up all the usernames gradually over the day. After they have all been signed up, Harry simply scripts a few “Check username availability” calls for the last username he generated to start his Denial of Service attack. Essentially, he has taken the non-matching lookup performance of the hash file from O(1 + k/n) to O(k) (where k is the number of keys and n is the modulo). Even worse, because of how level 1 overflows work, each lookup now requires multiple disk reads as well (UniData only, I believe). Continual random access to a file that is this heavily weighted into one group is O(k^2).

Now, to give you a visual example, I have run a test on my home machine and produced 2 graphs.

Test specs:

CPU: Core Duo T7250 (2.0GHz)
OS: Vista SP2 (32-bit)
DB: UniData 7.2 PE (Built 3771)
Hash File: Modulo 4013 – Type 0

The test:
Pre-generate 2 sets of numbers. One is of sequential keys, the other is of keys chosen because they all hash to a single group. Timings are recorded for the total time in milliseconds for:

  1. Write null records for all the keys and
  2. read in all the records.

Separate timings for sequential and chosen keys are taken. The test is repeated for different key counts from 1000 to 59000 in 1000 increments.

DOSAC UniBasic Code
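
If you just want the shape of the test rather than the download, it is roughly the sketch below. TEST.FILE and the pre-generated key list KEY.LIST are illustrative names assumed to exist already, and SYSTEM(12) is used as a sub-second timer (check its exact units on your release).

* Rough sketch of the timing harness - names are illustrative.
OPEN 'TEST.FILE' TO F.TEST ELSE STOP
START.MS = SYSTEM(12)
KEYS = KEY.LIST            ;* either the sequential or the chosen set
LOOP
   REMOVE ID FROM KEYS SETTING MORE
   WRITE '' ON F.TEST, ID  ;* step 1: write null records for all the keys
WHILE MORE DO REPEAT
WRITE.MS = SYSTEM(12) - START.MS
KEYS = KEY.LIST
LOOP
   REMOVE ID FROM KEYS SETTING MORE
   READ REC FROM F.TEST, ID ELSE NULL   ;* step 2: read them all back
WHILE MORE DO REPEAT
READ.MS = SYSTEM(12) - START.MS - WRITE.MS
CRT 'Write: ' : WRITE.MS : 'ms  Read: ' : READ.MS : 'ms'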

First Graph – Sequential key timings by themselves:

Sequential Timings Only

Second Graph – Chosen key alongside sequential key timings:

Sequential and Chosen Timings

Naturally, timings are rough, but they are accurate enough to paint the picture.

Actually, now that I’ve mentioned painting…

Have you heard of Schlemiel the Painter?

Schlemiel gets a job as a street painter, painting the dotted lines down the middle of the road. On the first day he takes a can of paint out to the road and finishes 300 yards of the road. “That’s pretty good!” says his boss, “you’re a fast worker!” and pays him a kopeck.

The next day Schlemiel only gets 150 yards done. “Well, that’s not nearly as good as yesterday, but you’re still a fast worker. 150 yards is respectable,” and pays him a kopeck.

The next day Schlemiel paints 30 yards of the road. “Only 30!” shouts his boss. “That’s unacceptable! On the first day you did ten times that much work! What’s going on?”

“I can’t help it,” says Schlemiel. “Every day I get farther and farther away from the paint can!”

(Credit: Joel Spolsky, 2001)

When looking at dynamic arrays in U2, you should see how they can be exactly like a computerised version of Schlemiel the Painter. In fact, a public article on PickWiki pointed this out quite some time ago. UniData is affected more than UniVerse, in that UniVerse has an internal hint mechanism for attributes. The problem is, if an uncontrolled (e.g. external) entity has control over the number of items in the dynamic array, you could be vulnerable to a Denial of Service attack. It could even be unintentional.

So, let’s see what all the fuss is about. Firstly, a quick recap on the issue with dynamic arrays.

Essentially, when doing an operation like CRT STRING&lt;X,Y,Z&gt;, it has to scan the string character by character, counting attribute, multi-value and sub-value marks as it goes. If you increment Y or Z (or X, in UniData’s case) and do the same operation, it has to re-scan the string from the start all over again. As the number of elements increases, the flaw in this method becomes more and more noticeable. In fact, cycling through each element in this manner is an O(k^2) algorithm.
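
As a minimal illustration, the classic loop below re-scans DYN.ARRAY from the first character on every single pass:

* Anti-pattern sketch: each DYN.ARRAY<I> extraction scans from the
* start of the string, so the loop as a whole is O(k^2).
DYN.ARRAY = 'A' : @AM : 'B' : @AM : 'C'   ;* now imagine 50,000 elements
COUNT = DCOUNT(DYN.ARRAY, @AM)
FOR I = 1 TO COUNT
   CRT DYN.ARRAY<I>
NEXT I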

I’ve seen this issue bite and bite hard. It involved 2 slightly broken programs finding just the right (or wrong) timing.

The first program was a record-lock monitoring program. It used the GETREADU() UniBasic function, after which it looped over every entry and generated a report on all locks over 10 minutes old. This process was automatically scheduled to run at regular intervals. It had been operating for months without issues.

The second program was a once-off update program. Basically, it read each record in a large file and locked it; then, if certain complex conditions were met, it updated an attribute and moved on to the next record. See the problem? It didn’t release a record if it didn’t need updating. The processing was estimated to take about 30 minutes and, as it turns out, not many records met the complex conditions.

See the bigger problem now? Yup, that’s right: the dynamic array returned by GETREADU() was astronomical! This resulted in the monitoring program saturating a CPU core. The same core the update program was running on. Uh oh! System performance issues ensued until the culprit was found and dealt with.

So, what do we do about these issues? You want a stable system right? One that is less easy to bring to its knees by malicious users and unfortunate timings of buggy code?

Hashed files:

DO NOT use external input as record keys! Place it in attribute 1, build a D-type dictionary item and index it if you need to, but do not use it as the @ID!
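
One common way to achieve this is a system-assigned sequential key, with the external value stored as ordinary data. A minimal sketch, using hypothetical USERS and CONTROL files (the latter holding a NEXT.USER.ID counter record, and USERNAME holding the name the user asked for):

* Sketch: surrogate sequential @ID; the username is just data.
OPEN 'USERS' TO F.USERS ELSE STOP
OPEN 'CONTROL' TO F.CONTROL ELSE STOP
READU NEXT.ID FROM F.CONTROL, 'NEXT.USER.ID' ELSE NEXT.ID = 1
WRITE NEXT.ID + 1 ON F.CONTROL, 'NEXT.USER.ID'   ;* WRITE also releases the lock
USER.REC = ''
USER.REC<1> = USERNAME   ;* externally supplied name lives here, not in the @ID
WRITE USER.REC ON F.USERS, NEXT.ID

As a bonus, purely sequential keys are exactly what the Type 1 (‘SEQ.NUM’) hashing algorithm is designed for.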

A further option would be for hashed files and their hashing algorithms to be updated to deal with this type of malicious edge case. Other languages (take Perl, for example) have updated their hash tables so that the hashing algorithm is seeded at run-time. This means you cannot prepare ‘attack’ keys ahead of time, and you cannot replicate how the hashing works on another computer, since the hash algorithm will be seeded differently. Obviously, this cannot be done in exactly the same way with hashed files, as they are a persistent data store. It could, however, be done on each CREATE.FILE. That way, even if a malicious party can determine the modulo of a file, they won’t be able to duplicate its behaviour on their own system, as each file will be seeded differently. Doing this would bring UniData and UniVerse in line with the security improvements made in other modern stacks.

Dynamic arrays:

This one is simple: use REMOVE, don’t use simple FOR loops. Think through your data and where it is being sourced from. Is it from external entities? Is it from internal entities whose behaviour cannot be guaranteed to remain within safe bounds? If the answer to either of those questions is even a ‘maybe’, stay safe and use REMOVE.
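
For completeness, here is the REMOVE version of the earlier loop. REMOVE keeps an internal pointer into the string, so the traversal is O(k) instead of O(k^2):

* REMOVE remembers where the previous scan stopped, so each
* element is visited exactly once.
DYN.ARRAY = 'A' : @AM : 'B' : @AM : 'C'
LOOP
   REMOVE ELEMENT FROM DYN.ARRAY SETTING MORE
   CRT ELEMENT
WHILE MORE DO REPEAT

The rewrite is mechanical, and the payoff grows with the size of the array.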


U2 Dictionaries [Part 2]

February 10, 2010 1 comment

In the last post I suggested that each piece of information in a file record needed an associated dictionary item.

Some may look at their files and realise it just cannot be done. In that case, “you’re doing it wrong”.

Common case: you have a file that logs transactions of some sort, and for each transaction the program simply appends it to the record, creating a new attribute.

There are several issues with this style of record structure.

Firstly, you cannot create dictionary items to reference any of the information (unless, of course, you create a subroutine and call it from the dictionary). For example, if each transaction has a time-stamp, you cannot use UniQuery/RetrieVe to select all records with a certain time-stamp.

Secondly, any time you read in the record and need to count how many transactions it holds, you need to parse the entire record. If, instead, each piece of information is stored in its own attribute (say, time-stamps in &lt;1&gt;, amounts in &lt;2&gt;, etc.) with one multi-value per transaction, you would only need to parse the first attribute, potentially cutting down on the CPU expense greatly.

So, if you must store some sort of transaction/log-style data in a U2 record, please reconsider the traditional approach of appending the whole transaction to the end, and take a more U2 perspective by splitting each piece of information into its own attribute, as sketched below. This way, it will be much easier to use U2’s inbuilt features when manipulating and reporting on your data.
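
A minimal sketch of that layout (file and variable names are illustrative):

* One attribute per field, one multi-value per transaction.
OPEN 'TXN.LOG' TO F.LOG ELSE STOP
READU LOG.REC FROM F.LOG, ACCT.ID ELSE LOG.REC = ''
LOG.REC<1, -1> = TXN.TIME   ;* attribute 1: time-stamps
LOG.REC<2, -1> = TXN.AMT    ;* attribute 2: amounts
WRITE LOG.REC ON F.LOG, ACCT.ID

* Counting transactions now only parses attribute 1:
TXN.COUNT = DCOUNT(LOG.REC<1>, @VM)

And because each field now sits in a fixed attribute, plain D-type dictionary items (and therefore UniQuery/RetrieVe selections) work again.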

U2 Dictionaries [Part 1]

February 5, 2010 Leave a comment

Something that often gets overlooked in the U2 world is best practice regarding dictionaries.

Before I get into it however, a very brief introduction to dictionaries for those who are new to UniVerse and UniData.

SQL databases have a schema which defines what data can be stored and where it is stored in relation to the rest of the data. This means every bit of data has a related section in the schema which gives it a name and a type.

UniVerse and UniData do not do this. The schema (dictionary) is simply there to describe the data (as opposed to define it). You can give sections of the data arbitrary names and/or data types. In fact, you can give the same location multiple names and types, or even create a dictionary item that describes multiple other sections! Each file uses another file, called a dictionary, to hold its ‘schema’ (which, for the rest of this post, will no longer be called a schema, since that term is misleading).

The UniData “Using UniData” manual describes a dictionary as containing “a set of records that define the structure of the records in the data file, called D-type records”. Now, it is very important to remember this next point: the manual is at best overly optimistic and at worst flat-out lying.

In SQL (excluding implementations such as SQLite), if you get a table schema and it informs you that the third column is an INTEGER called ‘Age’, then it is safe to assume it is what it says it is. At the very least, you can be certain it won’t ever return a value of “Fred”. In UniVerse and UniData, the dictionary doesn’t even need to contain a record describing the third attribute (an attribute is sort of like a column, but different).

Also of note to new players is that D-type records are not the only records in a dictionary file. There are 3 other types of records to consider. Once again, straight from the manual: ‘A dictionary may also contain phrases, called PH-type records, and items that calculate or manipulate data, called virtual fields, or V-type records. A user may also define a dictionary item to store user-defined data, called X-type records’.

What does this mean for you? Well, like most of U2, when looking at the records in a dictionary file, anything goes. Some could be accurately describing the file structure, while others could be getting fields from sections of a completely different file. Others again could have nothing to do with the data in the file at all and are merely there because a programmer has used the dictionary as a convenient dumping ground. You also have to consider that an item may simply be wrong.

There are 2 sides to this: 1) it can make development faster, as you can just tack on extra bits of data with no maintenance work required; 2) as a result of 1, you can quickly find systems in a state where your records hold mystery data, and you cannot even begin to work out what it is without scouring through many programs and manually inspecting the data.

Even more confusing, is that you can have multiple records referring to the exact same location but describing the data differently.

If you were to describe the U2 system holistically, you could call it a weakly-typed system. Whereas some other databases, query languages and programming languages are strongly-typed.

This is where best practices come in. Here are several simple rules, that if followed, should go a long way to ensure your dictionary files are useful, accurate and easier to maintain.

  1. If you ever add new data to a record, then you MUST create, at a minimum, a D-type record to describe each piece of data (see the sketch after this list).
  2. Always check that an appropriate dictionary item doesn’t already exist before creating a new one to reference a section of data in an existing file.
  3. If you come across a missing dictionary item, don’t ignore it. Either create it or add it to whatever bug-tracking system you use.
  4. Remember, after the type in attribute 1, you can write anything. Use this to describe what the data is if the name isn’t sufficiently self-descriptive.
  5. Also, if the data is effectively a foreign key for another file, use the end of attribute 1 to mark it as such (including the main file it references).
  6. Use the X-type record ability to add a single record that describes the general purpose/usage of the overall file. Give it a simple, recognisable name like README or FILEINFO.
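
To illustrate rule 1, a plain D-type item describing a username stored in attribute 1 might look something like the following in AE (the layout follows UniData’s conventions; USERS and USERNAME are hypothetical names):

AE DICT USERS USERNAME
001: D
002: 1
003:
004: Username
005: 20L
006: S

Attribute 2 is the location being described, 3 is the conversion code, 4 the display name, 5 the format, and 6 the single/multi-value flag.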

Security is not Obscurity. Even in U2 [Part 1]

December 6, 2009 2 comments

Sure, we may benefit from whatever shelter is derived from running a less widely used/understood system, but relying solely on security through obscurity with your U2 system is, to put it nicely, extremely naive in this day and age. It just doesn’t add up when you pay thousands of dollars for firewalls and other network security paraphernalia and wouldn’t dream of allowing raw user input through in your SQL-based applications.

U2 may have different syntactical spices and a different method of representing data than its mainstream counterparts, but the core principles behind secure coding practices still apply.

So, what specific vulnerabilities should we look out for?

Let us start with the humble EXECUTE/PERFORM statements in UniData and UniVerse. SQL Injection is a widely known subject, but how many U2 developers have considered UniQuery/RetrieVe Injection? Did you know that, in some cases, malformed UniQuery in an EXECUTE can drop you to ECL?

As developers in the U2 world, the same lessons learnt from SQL Injection can and should be applied when using EXECUTE/PERFORM, etc. Sanitise your input!

Do you have any statements that work like this?
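
Something along these lines, where the file and dictionary names are of course illustrative:

* Hypothetical example - raw external input concatenated into the query:
EXECUTE 'SELECT CUSTOMERS WITH FIRST.NAME = "' : TAINTED.INPUT : '"'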


In this case, TAINTED.INPUT is either supplied by a user or comes from an external source. The results of the SELECT statement are now compromised and can contain any data. Take, for instance, the following input for this contrived example:

" OR WITH CC.NUMBER = "4657000000000000

Essentially, this converts the innocent SELECT statement which, for example, was used to search customers’ first names to get contact numbers, into one which can be used to find credit card numbers (hopefully, your CC numbers are encrypted in some manner). Even worse, if your program displays error messages that reveal record names when they cannot be read, then an attacker with patience can reveal almost any data they want from your system.

Remember, in UniData you can use the ‘USING’ keyword to make any file the source of the dictionaries (UniVerse does not have this, I believe). Aside from all the usual manipulation of results, USING means that if someone can control the first few lines of a record (temp data dumping file, anyone?) then, via the SUBR() call, they can even cause programs and subroutines to be executed!

Before you EVER use input from a user or an external source, make sure it is validated and sanitised. Expecting a number? Use MATCH and 1N0N. Expecting a name? Make sure it doesn’t contain double quotes. Don’t want to allow ‘searching’ with your SELECT? For example, I use a check like the following to ensure the user input doesn’t escape the SELECT string with double quotes or attempt a wildcard search with [ or ].
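
Here is a sketch of such a check (USER.INPUT being whatever arrived from outside; adapt to taste):

* Reject anything that could escape the quoted literal or act as a wildcard.
IF INDEX(USER.INPUT, '"', 1) OR INDEX(USER.INPUT, '[', 1) OR INDEX(USER.INPUT, ']', 1) THEN
   CRT 'Invalid characters in input'
   STOP
END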


Further to this UniQuery/RetrieVe Injection vulnerability, earlier I mentioned that in certain situations you could cause it to crash to ECL.

Are you running UniData in PICK mode? If you are, I suggest you type ‘UDT.OPTIONS’ at ECL right now and scan down until you see whether ‘41 U_UDT_SERVER’ is set to ON or OFF. Did it say OFF? If so, read on, because you may be vulnerable.

While that option is turned off, certain malformed UniQuery statements can cause you to crash straight to ECL, even if you are in a program called by a program.

Let’s see an example. First, compile the following program in PICK mode; the INPUT and EXECUTE lines shown are an illustrative reconstruction (any query spliced together from raw input will behave the same way).

CRT "Enter the program to select"
CRT "Executing query..."
CRT "Checking results..."
   CRT "Program FOUND!"
   CRT "Program doesn't exist"

Now, when you run the program (I called it CRASHTEST), put in a record ID that exists in BP. Try again with a record ID that doesn’t exist. Your results should be something like this:

Normal Program Operation

Looks good. The program works as expected. Underneath this simple program, though, lies a bug (sorry, “feature”) in the PICK parser. To show this, I will use specially formed input that makes the program’s SELECT malformed in a manner that gains ECL access. This time when you run the program (with UDT.OPTIONS 41 off), type in this input, including the quotes:
" @ID="
In this case, the program will crash out before ever returning.

Crashed from EXECUTE

There are 2 ways to deal with this. The first is to set UDT.OPTIONS 41 to ON. This will result in the EXECUTE returning, so we can handle it in whatever way we wish.

The other way is to set ON.ABORT. I created a VOC paragraph for this, called EXCEPTION, as follows (attribute 1 of a paragraph record is PA):

PA
DISPLAY This program aborted
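
With the paragraph in place, it is registered at ECL along these lines (check the exact syntax against your manual):

ON.ABORT EXCEPTION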

Running the same test above now results in:

Aborted program

Personally, I believe the method that returns control to the program (UDT.OPTIONS 41) and handles the failure accordingly (always check what was set by RETURNING) is the safest, since it doesn’t give away that the program has a compromised EXECUTE statement. However, this may not always be a viable option, so make sure you at least have ON.ABORT set.
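
A sketch of that handling, assuming UDT.OPTIONS 41 is ON and QUERY holds the statement built from the (sanitised!) input:

* With U_UDT_SERVER on, a malformed statement returns to the program
* instead of dropping the session to ECL.
EXECUTE QUERY RETURNING ERR.CODES
IF ERR.CODES NE '' THEN
   CRT 'Query rejected'   ;* log the details; don't echo them to the user
   STOP
END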

Xalan errors in U2 XML

December 1, 2009 Leave a comment

Just a quick note about errors from U2 XML. I’m assuming UniVerse uses the same XML parser as UniData.

I was playing around with converting XML to HTML using XDOMTransform yesterday when an error got returned. After using XMLGetError, I could see it was an ‘uncaught XalanDOMException’ (code 14 to be exact; UD error code 16). That meant exactly zilch to me, so I looked up the Xalan documentation and found the following:

XalanDOMException Class Reference

Turns out it was unhappy about the namespace attribute in the html tag inside my XSLT file.

Maybe this reference will be useful to someone else if you encounter other XalanDOMException errors from U2.

Statement Code Coverage Testing

November 27, 2009 1 comment

On the right-hand side you will see a link to the “Unibasic Code Coverage” project on SourceForge. This tool enables you to perform statement-level code coverage tests on UniData code. The results are then tabulated and saved as a colour-coded HTML file.

This is based on the original prototype I wrote before further developing it for my current employer. Although this open source version is relatively simplistic, I will be progressing it by back-porting features as well as trialling new ideas.

If you try it out, leave a comment or send me an email and let me know how it went. If you are interested in helping with the development of it, or making the minor changes needed to port it to UniVerse then get in touch!
