One of the benefits of U2 data servers is that it can be extremely quick to turn around a new system. The unfortunate downside is that this makes it extremely easy to ignore the architecture of your system, which can lead to future performance issues and harder-to-maintain programs.
Here I’ll be looking at the setup of your files and records (tables and columns for those still grasping UniData/UniVerse). Your system revolves around your data, so if you don’t get it right to start with, it inevitably leads to a sub-optimal system. What I won’t be discussing here is the usual modulo/block-size related maintenance of your files; there is already literature in the manuals for this topic.
To start with, you should have already read my previous post about correctly setting up the layout of your files and the need to create all the relevant D-type dictionary items. With that in mind, I have a story for you…
This story is about Johnny and Alicia, who are both admin staff working for a sales company back in the 1930s. Both have a large set of contracts that they store in folders in a filing cabinet.
Occasionally their managers will ask them to find a contract that is being handled by a certain sales rep. Although they hate this task, each time they manually search through the stack of contracts to retrieve it. Funnily enough, in the time it takes Johnny to find one, Alicia can usually find at least two.
Curiosity gets the better of Johnny, who eventually asks Alicia how she is so fast.
“It’s easy. I have moved the page with the sales rep’s name to the front of the contract.”
Dang! So simple! Johnny realised having to dig ten pages deep on each contract was so senseless!
Fortunately, admin staff can now use digital retrieval systems, so they don’t have to think about this sort of small detail any more. The need to pay attention to this detail hasn’t gone away though. Now it rests with us.
Not only should you ensure the layout of data is in the correct format, but you should also pay attention to the order of your data. It should be organised with the most frequently searched and used data earlier in the record. Since the record fields are separated by delimiters, using and querying later attributes requires the engine to scan every character up to the requested attribute to determine where it starts. By moving the most frequently used data to the beginning of a record, you reduce the amount of work required to initially find the data.
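To make the cost concrete, here is a minimal Python sketch of a delimiter scan. It is a toy model only, not how the U2 engine is actually implemented: the function name `extract_attribute` is hypothetical, and it assumes attributes are separated by the attribute mark, CHAR(254). It returns the value together with the number of characters it had to examine.

```python
AM = chr(254)  # attribute mark; assumed here to be the field delimiter


def extract_attribute(record, n):
    """Return (value of attribute n, number of characters examined).

    Toy model: walks the record one character at a time, counting
    attribute marks until attribute n has been fully read.
    """
    if n < 1:
        raise ValueError("attribute numbers start at 1")
    current = 1  # attribute the scan is currently inside
    start = 0    # index where the current attribute begins
    for pos, ch in enumerate(record):
        if ch == AM:
            if current == n:
                # Reached the delimiter that closes attribute n.
                return record[start:pos], pos + 1
            current += 1
            start = pos + 1
    # Ran off the end: attribute n is either the last one or missing.
    if current == n:
        return record[start:], len(record)
    return "", len(record)


# A record shaped like the test file described in this post:
# 29 attributes, each holding the key.
record = AM.join(["12345"] * 29)

first_value, first_cost = extract_attribute(record, 1)
last_value, last_cost = extract_attribute(record, 29)
print("attribute 1 :", first_value, "after", first_cost, "characters")
print("attribute 29:", last_value, "after", last_cost, "characters")
```

In this model, reaching attribute 29 means examining the whole record, while attribute 1 is found after its own characters plus one delimiter.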
Here are some timings from a simple test run I performed on our system.
The setup: A file with modulo 10007, pre-filled with records keyed from 10000 to 99999. Attributes 1, 2, … up until 29 are each set to the key. I have created a D-type attribute for each one timed (D1, D2 & D29).
The test: Perform a select on the file with the attribute equal to a value (e.g. SELECT TIMINGS WITH D1="12345"). Repeat this 1000 times for each attribute tested.
Data in <1>: 338655 (100.00%)
Data in <2>: 342134 (101.03%)
Data in <29>: 471811 (139.32%)
Even with these small records, you can see the difference you can achieve by having your data in the correct order. Scale this up to larger files with bigger records, more complex select statements combined with the processing of these records in your subroutine and it can provide a significant difference in the execution times across a system.
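The percentages quoted with the timings are simply each total relative to the attribute 1 total. A short Python sketch reproduces them from the raw figures above (the units of the raw timings are not stated, so they are treated as plain numbers here):

```python
# Raw totals from the test run above, keyed by attribute number.
timings = {1: 338655, 2: 342134, 29: 471811}

baseline = timings[1]  # attribute 1 is the 100% reference

# Each timing as a percentage of the baseline, rounded to two decimals.
relative = {attr: round(100 * total / baseline, 2)
            for attr, total in timings.items()}

for attr, pct in relative.items():
    print(f"Data in <{attr}>: {timings[attr]} ({pct:.2f}%)")
```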
General IT knowledge of security has come a long way in the last 20 years, and even more dramatically in the last 10.
People are generally aware that unless due care is taken, their computer could be infected with a virus, have personal information stolen from it or even be used to facilitate crime. Major OS vendors have picked up their game and are now making a better attempt to prevent compromises at the OS level. Sure, you still hear the odd story about the latest privilege escalation, but compared to what it used to be…
Network level security has been given most of the attention (and IT budget funding) and is *generally* fairly secure these days. Application level is where most of the major hacks are happening now, but unfortunately, corporate uptake on securing their systems at the Application level hasn’t been as good as it was with the Networks.
Let’s be honest and not undersell ourselves: securing complex applications is no mean feat. It takes knowledge, planning, lots of time and patience, and sometimes out-of-the-box thinking. Thankfully, most modern programming languages and Database Management Systems do the heavy lifting for us. From the security features built into C# and Java to the vastly improved safety net found in SQL engines with fine-grained access control and in-built functions for preventing SQL injection, a lot of the basics have been solved.
This is where the U2 family has a few gaps to be filled. UniBasic needs some inbuilt functions for sanitisation, UniObjects needs some form of access control built around it and UniQuery/RetrieVe prepared statements/stored procedures would be nice.
With the increased push to integrate U2 servers as databases for modern front-ends such as web applications, data sanitisation is going to become a prevalent topic in the community. Built-in functions for UniQuery/RetrieVe, SQL and HTML sanitisation/encoding would be welcome additions to the UniBasic command family. Even better would be some form of prepared statements for the query languages. This would make it simpler and easier to obtain better program security.
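Until something like that exists, the front-end has to do the checking itself. As a rough illustration, here is a Python sketch of a front-end helper that refuses unsafe input before it is embedded in a query string. Everything in it is an assumption for illustration: `quote_literal` and `build_select` are hypothetical names, not part of any U2 API, and the set of rejected characters (quotes, backslash, control characters and CHAR(251) to CHAR(255)) is a conservative guess rather than an official list. It rejects rather than escapes, because the post does not establish an escaping mechanism for UniQuery/RetrieVe literals.

```python
import re

# Assumption: CHAR(251)..CHAR(255) are U2 system delimiters and must
# never appear inside a query literal.
MARKS = {chr(code) for code in range(251, 256)}
# Characters that could open or close a string literal (rejected conservatively).
QUOTES = {'"', "'", "\\"}
# File and dictionary names: letters, digits, dots and underscores only.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_.]*")


def quote_literal(value):
    """Wrap value in double quotes, refusing anything that could end the literal."""
    for ch in value:
        if ch in QUOTES or ch in MARKS or ord(ch) < 32:
            raise ValueError(f"unsafe character {ch!r} in query value")
    return f'"{value}"'


def build_select(file_name, dict_name, value):
    """Build a SELECT statement from validated names and a checked literal."""
    for name in (file_name, dict_name):
        if not NAME_PATTERN.fullmatch(name):
            raise ValueError(f"unsafe name {name!r}")
    return f"SELECT {file_name} WITH {dict_name} = {quote_literal(value)}"


print(build_select("TIMINGS", "D1", "12345"))

try:
    # A value that tries to close the literal and append its own clause.
    build_select("TIMINGS", "D1", '1" OR WITH D2 = "1')
except ValueError as err:
    print("rejected:", err)
```

A check like this on the client is only a stopgap; it does nothing for a caller who talks to the server without going through the helper.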
UniObjects is touted as a standard method of connecting GUI application front-ends to a U2 back-end. However, due to the limited access control supported by UniObjects, it is a dangerous hole in your system to have the required port open for anything other than back-end servers. Consider user ‘X’. User ‘X’ has appropriate login credentials for the old green screen system. IT brings out a new Windows GUI application, let’s say for reporting, that runs on the user’s machine and uses UniObjects to connect to U2. In the old green screen system, User ‘X’ was limited to set menus and programs and could not get access to ECL/TCL. With enough knowledge (and malice), User ‘X’ can now freely use his green screen login credentials to log into the U2 system via UniObjects, read/write records directly and even execute raw ECL/TCL commands.
So what exactly is the problem with UniObjects? Quite simply put, it has no fine-grained server-side control over what actions can be performed or what commands can be issued via UniObjects. As long as you can log in, you get a free pass to the back-end’s data. Let’s take MsSQL as a counter-example. You can create views and stored procedures, and grant or deny users a suite of privileges to tables and commands. Essentially, UniData needs some access control scheme for UniQuery that allows you to define whether users can read/write records in certain files. Ideally, all reads and writes would be done through U2 UniBasic subroutines, with the RPC daemon able to have a command ‘white-list’ set up. That way, all data access can be moderated by UniBasic code, with the RPC daemon’s white-list allowing calls only to those subroutines.
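The white-list idea can be sketched in a few lines of Python. This is purely illustrative of the check itself: no such feature is described here as existing in the RPC daemon, and the subroutine names, `CommandRejected` and `dispatch` are all hypothetical. The back-end call is passed in as a plain function so the sketch runs on its own.

```python
# Hypothetical subroutine names that the gatekeeper is willing to call.
ALLOWED_SUBROUTINES = {"GET.CONTRACT", "LIST.CONTRACTS.BY.REP"}


class CommandRejected(Exception):
    """Raised when a request names something outside the white-list."""


def dispatch(subroutine, args, call_backend):
    """Forward the call only if the subroutine is on the white-list."""
    if subroutine not in ALLOWED_SUBROUTINES:
        raise CommandRejected(f"{subroutine!r} is not an allowed subroutine")
    return call_backend(subroutine, args)


def fake_backend(subroutine, args):
    # Stand-in for the real server call, for demonstration only.
    return f"called {subroutine} with {args}"


print(dispatch("GET.CONTRACT", ["C1001"], fake_backend))

try:
    # Anything not explicitly listed, such as a raw command, is refused.
    dispatch("EXECUTE", ["LIST VOC"], fake_backend)
except CommandRejected as err:
    print("rejected:", err)
```

The point of the design is that the default is denial: a new subroutine is unreachable from the client until someone deliberately adds it to the list.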
All this highlights an issue we need to overcome as a community: the lack of U2-specific security literature. Where is the UniData/UniVerse security manual? Where is the “Top 10 common security mistakes” for U2? Sadly, security does seem to be an afterthought. Sometimes even a ‘neverthought’.