UniData, like other weakly-typed systems, makes some programming tasks easier by not needing the developer to declare and adhere to a data type with variables. The general pros and cons of this have been debated many times across many languages and hence will not discussed here. What will be discussed is specific cases where this can cause you unexpected headaches.
A posting was made on the u2ug forums by Koglan Naicker about some unexpected issues when finding duplicates in a data-set.
In short, he found that when strings contained large numbers, it would sometimes incorrectly evaluate two different strings as equal. For example:
IF '360091600172131297' EQ '360091600172131299' THEN CRT "Equal"
The above code results in “Equal” being displayed on the screen. This is caused by a combination of 2 factors.
The first being that UniData is weakly typed. This means that it does not explicitly distinguish between strings and numbers, but attempts to determine the data type by examining the data. In this case, since the strings are numeric, it automatically treats them as numbers.
The second part of this issue is because now that it is treating those 2 strings as numbers, it needs to handle them in an appropriate data type on the CPU. Since the 2 strings are too large to be treated as an integer, they get converted to a floating-point number. Due to rounding that occurs, this actually results in both of these strings being converted to the same float-point representation! A better method may have been to use something such as Bignum instead of converted to floating-point. There would be a speed trade-off, but surely that would have been better than potentially incorrect programs.
Some people suggest prefixing or appending a non-number character to each string to force them to be treated as a string. Not entirely elegant and can have performance implications. Fortunately, UniData does have proper functions to handle these situations. In the case where you will be comparing strings that may consist of only numeric characters, you should use the SCMP function. This function compares two strings as strings, regardless of the actual data in them. Using this when you need to be certain how the comparison is performed can save you a lot of headaches in the future.
Also of interest is that this issue doesn’t just apply to UniBasic, but can also affect UniQuery!
It should be noted though, this only affects UniQuery when the dictionary item is right-aligned with the format field (eg, 20R in attribute 5).
You can tested this by creating a file and creating 3 records with the @ID of ‘360091600172130474’, ‘360091600172131297’ and ‘360091600172131299’.
Now, select upon the file where the @ID = ‘360091600172131297″ and you can see that 2 records are returned!
When explicitly selected a record via a unique key, this isn’t the result a database should return.
So, when dealing with large, potentially numeric fields with UniQuery, you may need 2 dictionary items. A left-aligned one for selecting on and a right-aligned one if you require numerical sorting.
One of the benefits with U2 Data servers is that it can be extremely quick to turn-around a new system. The unfortunate downside is that this makes it extremely easy to ignore the architecture of your system. This can lead to future system performance issues and harder to maintain programs.
Here I’ll be looking at the set up of your files and records (tables and columns for those still grasping UniData/UniVerse). Your system revolves around your data, so if you don’t get it right to start with it inevitably leads to a sub-optimal system. What I won’t be discussing here is the usual modulo/block-size related maintenance of your files; there is already literature in the manuals for this topic.
To start with, you should have already read my previous post about correctly setting up the layout of your files and the need to create all the relevant D-type dictionary items. With that in mind, I have a story for you…
This story is about Johnny and Alicia, who are both admin staff working for a sales company back in the 1930’s. Both have a large set of contracts that they store in folders in a filing cabinet.
Occasionally their managers will ask them to find a contract that is being handled by a certain sales rep. Although they hate this task, each time they manually search through the stack of contracts to retrieve it. Funnily enough, in the time it takes Johnny to find one, Alicia can usually find at least two.
Curiosity gets the better of Johnny who eventually asks Alicia how she was so fast.
“It’s easy, I have moved the page with the sale rep’s name to the front of the contract”
Dang! So simple! Johnny realised having to dig ten pages deep on each contract was so senseless!
Fortunately, admin staff can now use digital retrieval systems, so they don’t have to think about this sort of small detail any more. The need to pay attention to this detail hasn’t gone away though. Now it rests with us.
Not only should you ensure the layout of data is in the correct format, but you should also pay attention to the order of your data. It should be organised with the most frequently searched upon and utilised data earlier on in the record. Since the record fields are separated by delimiters, using and querying later attributes requires the engine to scan every character up until to the requested attribute to determine where it starts. By moving the most frequently used data to the being of a record, you reduce the amount of work required to initial find the data.
Here are some timings from a simple test run I performed on our system.
The setup: A file with modulo 10007, pre-filled with records keyed from 10000 to 99999. Attributes 1, 2, … up until 29 are each set to the key. I have created a D-type attribute for each one timed (D1, D2 & D29).
The test: Perform a select on the file with the attribute equal to a value (E.g. SELECT TIMINGS WITH D1=”12345″). Repeat this 1000 times for each attribute tested.
Data in <1>: 338655 (100.00%)
Data in <2>: 342134 (101.03%)
Data in <29>: 471811 (139.32%)
Even with these small records, you can see the difference you can achieve by having your data in the correct order. Scale this up to larger files with bigger records, more complex select statements combined with the processing of these records in your subroutine and it can provide a significant difference in the execution times across a system.
The general IT knowledge of security has come along way in the last 20 years. Even more dramatically when considering the last 10 years.
People are generally aware that unless due care is taken, their computer could be injected with a virus, have personal information stolen from it or even be used to facilitate crime. Major OS Vendors have picked up their game and now are putting in a better attempt to prevent compromises from the OS level. Sure, you still hear the odd story about the latest privilege escalation, but compared to what it use to be…
Network level security has been given most of the attention (and IT budget funding) and is *generally* fairly secure these days. Application level is where most of the major hacks are happening now, but unfortunately, corporate uptake on securing their systems at the Application level hasn’t been as good as it was with the Networks.
Let’s be honesty and not undersell ourselves, securing complex applications is no mean feat. It takes knowledge, planning, lots of time & patience and sometimes out-of-the-box thinking. Thankfully, most modern programming languages and Database Management Systems do the heavy lifting for us. From the security features built into C# and Java to the vastly improved safety net found in SQL engines with fine-grained access control and in-built functions for preventing SQL injection, a lot of the basics have been solved.
This is where the U2 family has a few gaps to be filled. UniBasic needs some inbuilt functions for sanitisation, UniObjects needs some form of access control built around it and UniQuery/RetrieVe prepared statements/stored procedures would be nice.
With the increase push in integrating U2 servers as databases for modern front-ends such as web applications, data sanitisation is going to become a prevalent topic in the community. Built-in functions for UniQuery/RetrieVe, SQL and HTML sanitisation/encoding would be welcome additions to the UniBasic command family. Even better would be some form of prepared statements for the query languages. This make it simpler and easier to obtain better program security.
UniObjects is touted as a standard method of connecting GUI application front-ends to a U2 back-end. However, due to the limited access control supported by UniObjects, it is a dangerous hole in your system to have the required port open for anything other than back-end servers. Take into considering user ‘X’. User ‘X’ has appropriate login credentials for the old green screen system. IT brings out a new Windows GUI application, lets say for reporting, that runs on the user’s machine and uses UniObjects to connect to U2. In the old green screen system, User ‘X’ was limited to set menus and programs to run and could not get access to ECL/TCL. With enough knowledge (and malice), User ‘X’ can now freely use his green screen login credentials to log into the U2 system via UniObjects read/write records directly and even execute raw ECL/TCL commands.
So what exactly is the problem with UniObjects? Quite simply put, it has no fine-grained server-side control of what actions can be done, or commands issued via UniObjects. As long as you can log in, you can get a free pass to the back-end’s data. Let’s take MsSQL as a counter example. You can create views, stored procedures, grant or deny users a suite of privileges to tables and commands. Essentially, UniData needs to be able to have some access control scheme for UniQuery that allow you to define whether the users and read/write records in certain files. Ideally, all read/writes would be done through U2 UniBasic subroutines, with RPC daemon having the ability to have a command ‘white-list’ setup. That way, all data access can be moderated with UniBasic code and the RPC daemon having a white-list that only allows access to calling those subroutines.
All this highlights an issue we need to overcome as a community. The lack of U2 specific security literature. Where is the UniData/UniVerse security manual? Where is the “Top 10 common security mistakes” for U2? Sadly, security does seem to be an afterthought. Sometimes even a ‘neverthought’.
In the last post I suggested that each piece of information in a file record needed an associated dictionary item.
Some may look at their files and realise it just cannot be done. In that case, “you’re doing it wrong”.
Common case: You have a file that logs transactions of some sort. For each transaction, it just appends it to the record, creating a new attribute.
There are several issues with this style of record structure.
Firstly. You cannot create dictionary items to reference any information (except of course, unless you create subroutine and call it from the dictionary). For example, if each transaction has a time-stamp, you cannot use UniQuery/RetrieVe to select all records with a certain time-stamp.
Secondly, any time you read in the record and need to count how many transactions are in the record, it needs to parse the entire record. Now, if you have each bit of information in a record stored in its own attribute (say time-stamp in , amount in , etc) it would only need to parse the first attribute, potentially cutting down on the CPU expense greatly.
So, if you must store some sort of transaction/log style data in a U2 record, please reconsider the traditional approach of appending the whole transaction to the end and take a more U2 perspective by splitting each bit of information into its own attribute. This way, it will be much easier to use U2’s inbuilt features when manipulating and reporting on your data.
Something that often gets overlooked in the U2 world is best practice regarding dictionaries.
Before I get into it however, a very brief introduction to dictionaries for those who are new to UniVerse and UniData.
SQL databases have a schema which defines what data can be stored and where it is stored in relation to the rest of the data. This means every bit of data has a related section in the schema with gives it a name and a type.
UniVerse and UniData do not do this. The schema (dictionary) is simply there to describe the data (as opposed to define). You can give sections of the data arbitrary names and/or data types. In fact, you can give the same location multiple names and types, or even create a dictionary item that describes multiple other sections! Each file uses another file called a dictionary to hold its ‘schema’ (Which, for the rest of this post, will no longer be called a schema since it is misleading).
According to the UniData “Using UniData” manual, it describes a dictionary as containing “a set of records that define the structure of the records in the data file, called D-type records”. Now, it is very important to remember this next point: The manual is at best overly optimistic and at worst flat-out lying.
In SQL (excluding implementations such as SQLITE), if you get a table schema and it informs you that the third column is an INTEGER and it was called ‘Age’, then it would be safe to assume it was what it said it was. In the worst case, you can be certain it won’t ever return a value of “Fred”. In UniVerse and UniData, the dictionary doesn’t even need to contain a record to describe the third attribute (an attribute is sort of like a column, but different).
Also of note to new players is that D-type records are not the only records in a dictionary file. There are 3 other types of records to consider. Once again, straight from the manual: ‘A dictionary may also contain phrases, called PH-type records, and items that calculate or manipulate data, called virtual fields, or V-type records. A user may also define a dictionary item to store user-defined data, called X-type records’.
What does this mean for you? Well, like most of U2, when looking at the records in a dictionary file, anything goes. Some could be accurately describing the file structure, while others could be getting fields from sections in a completely different file. Some again could have nothing to do with the data in the file at all and are merely there because a programmer has used it as a convenient dumping ground. Also to consider is the item being completely wrong.
There are 2 sides to this. 1) It can make development faster as you can just tack on extra bits of data with no maintenance work required. 2) As a result of 1, you can quickly find systems in a state where your records have mystery data and you cannot even begin to work it out without scouring through many programs and manually inspecting the data.
Even more confusing, is that you can have multiple records referring to the exact same location but describing the data differently.
This is where best practices come in. Here are several simple rules, that if followed, should go a long way to ensure your dictionary files are useful, accurate and easier to maintain.
- If you ever add new data to a record, then you MUST create at a minimum, a D-type record to describe each piece of data.
- Always check that an appropriate dictionary item doesn’t already exist before creating a new one to reference a section of data in an existing file
- If you come across a missing dictionary item, don’t ignore. Either create it or add it to whatever bug-tracking system you use.
- Remember, after the type in attribute 1, you can write anything. Use this to describe what the data is if the name isn’t sufficiently self descriptive.
- Also, if the data is effectively a foreign key for another file, use the end of attribute 1 to mark it as such (including the main file it references).
- Use the User-type record ability to add a single record that describes the general purpose/usage of the overall file. Give it a simple/recognisable name like README or FILEINFO
Sure, we may benefit from whatever shelter is derived from running a less widely used/understood system, but relying solely on security through obscurity with your U2 system is, to put it nicely, extremely naive in this day and age. It just doesn’t add up when you pay thousands of dollars for firewalls and other network security paraphernalia and wouldn’t dream of allowing raw user input through in your SQL-based applications.
U2 may have different syntactical spices and a different method of representing data than its mainstream counterparts but the core principles behind secure coding practices still apply:
- Reduce your attack surface.
- Assume all externally sourced data may be malicious.
- Apply the Principle of Least Privilege.
- Principle of Defence in Depth.
- Fail Securely.
The list goes on.
So, what specific vulnerabilities should we look out for?
Let us start with the humble EXECUTE/PERFORM statements in UniData and UniVerse. SQL Injection is a widely know subject, but how many U2 developers have considered UniQuery/RetrieVe Injection? Did you know that in some cases, malformed UniQuery in a EXECUTE can drop you to ECL?
As developers in the U2 world, the same lessons learnt in SQL Injection can and should be applied when using *EXECUTE/*PERFORM, etc. Sanitise your input!
Do you have any statements that work like this?
EXECUTE 'SELECT MYFILE WITH FIRST.NAME = "':TAINTED.INPUT:'"'
In this case, TAINTED.INPUT is either supplied by a user or is comes from an external source. The results from the SELECT statement are now compromised and can contain any data. Take for instance the following input for this contrived example:
" OR WITH CC.NUMBER = "4657000000000000
Essentially, this converts the innocent SELECT statement which, for example, was used to search for customer’s first names to get contact numbers, into one which can be used to find Credit Card numbers (hopefully though, your CC numbers are encrypted in some manner). Even worse, if your program displays error messages that reveal record names when they cannot be read, then an attacker with patience can reveal almost any data they want from your system.
Remember, in UniData you can use the ‘USING’ keyword to specify any FILE to source the dictionaries (UniVerse does not have this, I believe). Aside the all the usual manipulation of results, the USING means that if someone can control the first few lines of a record (temp data dumping file anyone?) then using the SUBR() call, they can even cause programs and subroutines to be called!
Before you EVER use input from a user or an external source, make sure it is validated and sanitised. Expecting a number? Use MATCH and 1N0N. Expecting a name? Make sure it is doesn’t contain double quotes. Don’t want to allow ‘searching’ with your SELECT? For example, I use the following check to ensure the user input doesn’t escape the SELECT string with double quotes or attempt a wildcard search with [ or ].
IF TAINTED.INPUT MATCH "~'['0X~']'" AND NOT(INDEX(PCNAME, '"', 1)) THEN
Further to this UniQuery/RetrieVe Injection vulnerability, earlier I mentioned that in certain situations you could cause it to crash to ECL.
Are you running UniData in PICK mode? If you are, I suggest you type ‘UDT.OPTIONS’ at ECL right now and continue down until you see whether ’41 U_UDT_SERVER’ is set to ON or OFF. Now, did it say OFF? If so, then read on because you may be vulnerable.
While that option is turned off, certain malformed UniQuery statements can cause you to crash straight to ECL, even if you are in a program called by a program.
Lets see an example. First, compile the following program in Pick Mode.
CRT "Enter the program to select"
CRT "Executing query..."
EXECUTE 'SELECT BP WITH @ID="':PROG.NAME:'"' CAPTURING INFO RETURNING RESULT
CRT "Checking results..."
IF SYSTEM(11) > 0 THEN
CRT "Program FOUND!"
CRT "Program doesn't exist"
Now, when you run the program (I called it CRASHTEST), put a record ID that exists in BP. Try again with a record ID that doesn’t exist. Your results should be something like this:
Looks good. Program works as expected. Underneath this simple program though, lies a
bug feature in the Pick Parser. To show this, I will use a specifically formed input that will make the programs SELECT malformed in a manner to gain ECL access. This time when you run the program (with UDT.OPTIONS 41 off) type in this input, including quotes:
In this case, the program will crash out before ever returning.
There are 2 ways to deal with this. The first way is to set UDT.OPTIONS 41 to ON. This will result in the EXECUTE returning so we can handle it whatever way we wish.
The other way is to set ON.ABORT. I created a VOC paragraph for this, called EXCEPTION as follows:
DISPLAY This program aborted
Running the same test above now results in:
Personally, I believe the method that returns control to the program (UDT.OPTIONS 41) and handles it accordingly (always check what was set by RETURNING) is the safest since it doesn’t give away that the program has a compromised EXECUTE statement. However, this may not always be a viable option, so make sure you at least have ON.ABORT set.