
Archive for the ‘Applications’ Category

Replacing Legacy Reporting with U2 DataVu

November 5, 2011

International Spectrum has published the first article I have ever written for a magazine.

The title of the article is “Replacing Legacy Reporting with U2 DataVu” and you can find it here on page 12.

Here is a quick tease:

We all know what they look like: hard-to-read reports with mono-spaced fonts and — aside from the columns of text and the occasional company header — completely barren. More often than not, customers must log into a terminal session in order to generate, print, or view the reports. These reports are almost never available in easily consumable or distributable formats such as PDF.

Let me know what you think!


Application Level Caching

August 26, 2011

Everyone here probably knows the various levels of caching that exist on a modern computer, from multiple CPU caches through to the disk cache and even caching in the database engine itself. If you want to quickly brush up on some caching concepts and terminology, check out this short slide deck from Serhiy Oplakanets on Caching Basics.

What I’m going to do shortly is outline some other methods of gaining significant performance improvements on your UniData and UniVerse systems.

You don’t really need to do anything special outside of U2 to get the benefits of this, although a few extra tricks that do require either additional hardware or OS-level work can give quite a boost.

First, just to make sure everyone is on the same page: since UniData and UniVerse use hash tables as their file (table) structure, you can simply use a file as a glorified key-value store. Key-value stores are ideal for caching.
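As a quick illustration, here is a minimal sketch of using a hashed file that way; the file name KV.STORE and the key are made up for the example:

 * Treat a hashed file as a simple key-value store.
 OPEN 'KV.STORE' TO F.KV ELSE STOP 'Cannot open KV.STORE'

 WRITE 'SOME VALUE' ON F.KV, 'SOME.KEY'        ;* put
 READ KV.VALUE FROM F.KV, 'SOME.KEY' THEN      ;* get
    CRT 'Found: ' : KV.VALUE
 END ELSE
    CRT 'Key not found'
 END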

I’ve divided this post into 4 sections:

  1. Session Level Caching
  2. Account Level Caching
  3. Improving the above (SSD and RAM Disk)
  4. In Summary

Let me know what you think.

Session Level Caching

COMMON provides a method of keeping a small in-memory cache for the entire duration of a session. Simply declare an array in a named common block and away you go.

A real-world example: I’ve seen this used where a dictionary item made a SUBR call to a subroutine that, in turn, read a multitude of control items in order to process the original record. This dictionary item was invoked nightly by an external reporting tool against a large number of records.

The original solution had an unacceptable run-time, and after some profiling it was determined that the READs of the control items were the main culprit. Since the control items would not (and should not) change during processing, caching them in memory after the first read was the obvious way to reduce the run-time.

The solution involved an array of ‘x’ elements held in a named common block. When a control item was needed, the subroutine first checked this array via a simple look-up; if the item was there, it was used directly. If not, it was read from disk and stored in the array for next time.
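Here is a minimal sketch of that approach, assuming a hypothetical GET.CONTROL.ITEM subroutine, a CONTROL.FILE file and a cache of 500 slots (none of these names or numbers are from the original system):

 SUBROUTINE GET.CONTROL.ITEM(ITEM.ID, ITEM.REC)
 * Read-through cache of control items in a named common block.
    COMMON /CTRL.CACHE/ CACHE.INIT, NUM.CACHED, CACHE.IDS, CACHE.RECS(500), F.CONTROL

    IF UNASSIGNED(CACHE.INIT) THEN CACHE.INIT = 0
    IF NOT(CACHE.INIT) THEN
       NUM.CACHED = 0 ; CACHE.IDS = ''
       OPEN 'CONTROL.FILE' TO F.CONTROL ELSE STOP 'Cannot open CONTROL.FILE'
       CACHE.INIT = 1
    END

    LOCATE ITEM.ID IN CACHE.IDS<1> SETTING POS THEN
       ITEM.REC = CACHE.RECS(POS)                ;* cache hit: no disk READ
    END ELSE
       READ ITEM.REC FROM F.CONTROL, ITEM.ID ELSE ITEM.REC = ''
       IF NUM.CACHED LT 500 THEN                 ;* cache miss: READ once, keep it
          NUM.CACHED = NUM.CACHED + 1
          CACHE.IDS<NUM.CACHED> = ITEM.ID
          CACHE.RECS(NUM.CACHED) = ITEM.REC
       END
    END
    RETURN
 END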

The result: a run-time of over 10 hours was cut to less than 1 hour.

Account Level Caching

Alright, so you have a system that needs to handle some messages (perhaps via some form of SOAP/REST web service). The majority are read requests, with a few write requests for good measure.

One of these messages is ‘Get Products’. This message returns a list of products (ID, name and latest available version) that a customer currently has.

In your system, there are 2 files used by this request: ‘CUSTOMERS’ and ‘PRODUCTS’. CUSTOMERS<30> is a multivalued list of record ids for ‘PRODUCTS’. PRODUCTS<1> is the name of the product and PRODUCTS<11> is the latest available version.

Traditionally, for each ‘Get Products’ request your system would read in the appropriate CUSTOMERS record, then read in all the linked records from PRODUCTS to compile the response to the query. Assuming an average customer has 10 products, the average number of disk reads for this query is 11.
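Here is a minimal sketch of that traditional flow. The subroutine name and the response layout are illustrative assumptions; only the CUSTOMERS/PRODUCTS layout comes from the description above:

 SUBROUTINE BUILD.GETPRODUCTS.RESPONSE(CUST.ID, RESPONSE)
 * Build the 'Get Products' response the traditional way:
 * 1 CUSTOMERS read plus one PRODUCTS read per linked product.
    OPEN 'CUSTOMERS' TO F.CUST ELSE STOP 'Cannot open CUSTOMERS'
    OPEN 'PRODUCTS' TO F.PROD ELSE STOP 'Cannot open PRODUCTS'

    RESPONSE = ''
    READ CUST.REC FROM F.CUST, CUST.ID ELSE RETURN

    PROD.IDS = CUST.REC<30>                     ;* multivalued list of PRODUCTS ids
    FOR I = 1 TO DCOUNT(PROD.IDS, @VM)
       READ PROD.REC FROM F.PROD, PROD.IDS<1,I> THEN
          RESPONSE<-1> = PROD.IDS<1,I> : @VM : PROD.REC<1> : @VM : PROD.REC<11>
       END
    NEXT I
    RETURN
 END

In real code you would open the files once (for example, in named common) rather than on every call, but this keeps the sketch self-contained.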

Now this query is being called a lot, and all these extra disk reads and processing are beginning to have a performance impact. Thankfully, because your database supports key-value storage, you can quickly implement a cache to sit between the receipt of the message and the processing.

All that is needed is a new file called ‘CACHE.GETPRODUCTS’. @ID is the CUSTOMERS id requested in the query, <1> is the date of the request, <2> is the time of the request and <3> is the generated response.

Now, when a ‘Get Products’ query is received, it will first do a read of the cache file and, if an entry exists, simply return <3>. If the entry doesn’t exist, it will hand the request off to the usual processing routine. The resulting response will then be stored in the cache before being returned.
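A minimal sketch of that cache layer, reusing the hypothetical BUILD.GETPRODUCTS.RESPONSE routine sketched above:

 SUBROUTINE GETPRODUCTS.CACHED(CUST.ID, RESPONSE)
 * Check CACHE.GETPRODUCTS first; fall back to the full processing on a miss.
    OPEN 'CACHE.GETPRODUCTS' TO F.CACHE ELSE STOP 'Cannot open CACHE.GETPRODUCTS'

    READ CACHE.REC FROM F.CACHE, CUST.ID THEN
       RESPONSE = RAISE(CACHE.REC<3>)           ;* cache hit: 1 disk read
    END ELSE
       CALL BUILD.GETPRODUCTS.RESPONSE(CUST.ID, RESPONSE)
       CACHE.REC = ''
       CACHE.REC<1> = DATE()                    ;* internal date of the request
       CACHE.REC<2> = TIME()                    ;* internal time of the request
       CACHE.REC<3> = LOWER(RESPONSE)           ;* lower the marks so a multi-attribute response fits in <3>
       WRITE CACHE.REC ON F.CACHE, CUST.ID      ;* store the response for the next caller
    END
    RETURN
 END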

Assuming the averages declared above, a cache hit will result in 1 disk read and a cache miss will result in 12 disk reads and 1 write. If – for ease of math – we treat a write as equal to a read, a hit costs 1 operation and a miss costs 13, versus a flat 11 without the cache, so you only need a 16.7% cache hit rate for it to perform better (the break-even point is where 1 × h + 13 × (1 − h) = 11, i.e. h = 1/6). That isn’t even taking into consideration the reduced CPU usage, better disk cache performance, etc.

How you handle cache invalidation is dependent on your situation. It could be as simple as clearing the file every ‘x’ period, as straightforward as ignoring a cache record if it is older than ‘y’ time, or as complex as individually invalidating records based on when the appropriate records in CUSTOMERS or PRODUCTS change.
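For instance, here is a minimal sketch of the middle option, where any cache record older than a threshold is simply treated as a miss (the 4-hour figure is purely illustrative):

 MAX.AGE = 4 * 60 * 60                          ;* illustrative threshold: 4 hours, in seconds
 CACHE.HIT = 0
 READ CACHE.REC FROM F.CACHE, CUST.ID THEN
    AGE = (DATE() - CACHE.REC<1>) * 86400 + (TIME() - CACHE.REC<2>)
    IF AGE LE MAX.AGE THEN CACHE.HIT = 1        ;* young enough to use
 END
 IF CACHE.HIT THEN
    RESPONSE = RAISE(CACHE.REC<3>)
 END ELSE
    CALL BUILD.GETPRODUCTS.RESPONSE(CUST.ID, RESPONSE)
    * ... write the cache record back as before ...
 END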

What has been implemented here is a cache that is available not only to the current session, but to any program running – or yet to be run – in the account(s) that have access to this cache file.

Improving the above

Okay, so you have a more intensive system than the above and you have determined that caching can help you out. The problem is that, even with the caching, it still doesn’t meet your requirements, and disk has been identified as the biggest bottleneck.

You have 2 next steps that can be implemented easily.

The Disk Approach

Simply drop in a shiny new SSD or a WD Raptor and move the cache files over there. No need to back them up, mirror them or anything else, as cache files are temporary data. As long as your system is set up to recreate them if they are missing on start-up, and to treat it as a cache miss if they are unavailable during operation, you are all set.
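A minimal sketch of the ‘recreate it if missing on start-up’ part (the CREATE.FILE parameters here are illustrative, and the exact syntax differs between UniData and UniVerse):

 * Start-up check: if the cache file has gone missing, just recreate it.
 OPEN 'CACHE.GETPRODUCTS' TO F.CACHE ELSE
    EXECUTE 'CREATE.FILE CACHE.GETPRODUCTS 101,1'   ;* illustrative modulo/block size
    OPEN 'CACHE.GETPRODUCTS' TO F.CACHE ELSE
       STOP 'Unable to recreate CACHE.GETPRODUCTS'
    END
 END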

The benefit here is faster disk access as well as moving the activity off on to another device/bus.

The RAM Approach

Instead of adding new hardware, perhaps you’d prefer to spare 64MB of RAM for the cause. In this case, you would simply create a RAM drive and move the cache files there. You have now essentially created a RAM-based key-value store to use as your heart desires.

For an example of the kind of improvement this can give, I took the DOSAC test I previously created and ran it twice: once with the file on traditional disk and once with the file on a RAM disk. The system stats are identical to the last time I ran the test, except that it was on Fedora (which comes with multiple 16MB RAM disks pre-configured).

[Image: RAM DOSAC Results]

That’s right: Massive improvements, as expected (excuse the display text bug).

In Summary

So, keep this in mind. U2 Databases give you some great flexibility in how you implement your software. Knowing the options available is crucial to being able to get the best results.

As the saying goes, measure twice, cut once. Work out what your performance bottlenecks are, then determine the best solution. Sometimes it is better hardware, sometimes it is code clean-up. Sometimes… it might just call for caching.

The problem with numbers

October 11, 2010

UniData, like other weakly-typed systems, makes some programming tasks easier by not requiring the developer to declare and adhere to a data type for variables. The general pros and cons of this have been debated many times across many languages and hence will not be discussed here. What will be discussed are specific cases where this can cause you unexpected headaches.

A posting was made on the u2ug forums by Koglan Naicker about some unexpected issues when finding duplicates in a data-set.

In short, he found that when strings contained large numbers, UniData would sometimes incorrectly evaluate two different strings as equal. For example:

IF '360091600172131297' EQ '360091600172131299' THEN CRT "Equal"

The above code results in “Equal” being displayed on the screen. This is caused by a combination of 2 factors.

The first is that UniData is weakly typed. This means that it does not explicitly distinguish between strings and numbers, but attempts to determine the data type by examining the data. In this case, since the strings are numeric, it automatically treats them as numbers.

The second part of the issue is that, now that it is treating those 2 strings as numbers, it needs to handle them in an appropriate data type on the CPU. Since the 2 strings are too large to be treated as integers, they get converted to floating-point numbers. A double-precision float only holds about 15–16 significant decimal digits, so due to the rounding that occurs, both of these 18-digit strings end up converted to the same floating-point representation! A better method may have been to use something such as a Bignum library instead of converting to floating-point. There would be a speed trade-off, but surely that would have been better than potentially incorrect programs.

Some people suggest prefixing or appending a non-numeric character to each string to force them to be treated as strings. That is not entirely elegant and can have performance implications. Fortunately, UniData does have proper functions to handle these situations. In the case where you will be comparing strings that may consist of only numeric characters, you should use the SCMP function. This function compares two strings as strings, regardless of the actual data in them. Using it when you need to be certain how the comparison is performed can save you a lot of headaches in the future.
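For example, here is the earlier comparison rewritten with SCMP, which returns 0 only when the two strings are identical as strings:

 A = '360091600172131297'
 B = '360091600172131299'

 IF A EQ B THEN CRT 'EQ thinks they are equal'       ;* numeric comparison: misleading
 IF SCMP(A, B) = 0 THEN
    CRT 'SCMP thinks they are equal'
 END ELSE
    CRT 'SCMP thinks they are different'             ;* string comparison: correct
 END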

Also of interest is that this issue doesn’t just apply to UniBasic, but can also affect UniQuery!

It should be noted, though, that this only affects UniQuery when the dictionary item’s format field is right-aligned (e.g. 20R in attribute 5).

You can test this by creating a file and creating 3 records with the @IDs ‘360091600172130474’, ‘360091600172131297’ and ‘360091600172131299’.

Now, select upon the file where @ID = ‘360091600172131297’ and you can see that 2 records are returned!

[Image: Non Unique Select – results of selection]

When explicitly selecting a record via a unique key, this isn’t the result a database should return.

So, when dealing with large, potentially numeric fields in UniQuery, you may need 2 dictionary items: a left-aligned one for selecting on and a right-aligned one if you require numerical sorting.
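As a purely hypothetical illustration (the item names, the column heading and the 20-character width are all made up), the pair of dictionary items might look something like this:

 ID.SEL (for selecting)      ID.SORT (for sorting)
 001: D                      001: D
 002: 0                      002: 0
 003:                        003:
 004: Customer No            004: Customer No
 005: 20L                    005: 20R
 006: S                      006: S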

Crouching Null, Hidden Bug

May 2, 2010

Null (actually just an empty string “” in U2) is a valid value. Normally it is treated exactly the same as other normal values, such as 1 or “1”, but it isn’t always.

I’ve seen a few bugs that have been created by not understanding the differences in how nulls are treated. When debugging, the code can look completely valid as well, meaning it takes even longer to identify and rectify the issue.

Okay, let’s see how you go. I’ll give you a few series of records, each built from the same data as its control record. Your job is to work out which records in each series will exactly match their ‘CONTROL.REC’. Good luck, and try to do it without needing to compile the code!

Series 1:

 CONTROL.REC = "" : @AM : "A"

 DIM REC(3)

 REC(0) = ""
 REC(0)<2> = "A"

 REC(1) = ""
 REC(1)<-1> = "A"

 REC(2) = ""
 INS "A" BEFORE REC(2)<2>

 REC(3) = "A"
 INS "" BEFORE REC(3)<1>


Series 2:

 CONTROL.REC = "A" : @AM : ""

 DIM REC(3)

 REC(0) = "A"
 REC(0)<2> = ""

 REC(1) = "A"
 REC(1)<-1> = ""

 REC(2) = "A"
 INS "" BEFORE REC(2)<2>

 REC(3) = ""
 INS "A" BEFORE REC(3)<1>


Series 3:

 CONTROL.REC = "" : @AM : "A" : @AM : ""

 DIM REC(3)

 REC(0) = ""
 REC(0)<2> = "A"
 REC(0)<3> = ""

 REC(1) = ""
 REC(1)<-1> = "A"
 REC(1)<-1> = ""

 REC(2) = ""
 INS "A" BEFORE REC(2)<1>
 INS "" BEFORE REC(2)<1>

 REC(3) = ""
 INS "A" BEFORE REC(3)<2>
 INS "" BEFORE REC(3)<3>


Series 4:

 CONTROL.REC = "A" : @AM : "" : @AM : "A"

 DIM REC(3)

 REC(0) = "A"
 REC(0)<2> = ""
 REC(0)<3> = "A"

 REC(1) = "A"
 REC(1)<-1> = ""
 REC(1)<-1> = "A"

 REC(2) = "A"
 INS "" BEFORE REC(2)<1>
 INS "A" BEFORE REC(2)<1>

 REC(3) = "A"
 INS "" BEFORE REC(3)<2>
 INS "A" BEFORE REC(3)<3>


Series 5:

 CONTROL.REC = "A" : @AM : "" : @AM : ""

 DIM REC(4)

 REC(0) = "A"
 REC(0)<2> = ""
 REC(0)<3> = ""

 REC(1) = "A"
 REC(1)<-1> = ""
 REC(1)<-1> = ""

 REC(2) = "A"
 INS "" BEFORE REC(2)<2>
 INS "" BEFORE REC(2)<2>

 REC(3) = "A"
 INS "" BEFORE REC(3)<2>
 INS "" BEFORE REC(3)<3>
 
 REC(4) = ""
 INS "" BEFORE REC(4)<1>
 INS "A" BEFORE REC(4)<1>


Series 6:

 CONTROL.REC = "A" : @AM : "B" : @AM : "C"

 DIM REC(4)

 REC(0) = "A"
 REC(0)<2> = "B"
 REC(0)<3> = "C"

 REC(1) = "A"
 REC(1)<-1> = "B"
 REC(1)<-1> = "C"

 REC(2) = "A"
 INS "C" BEFORE REC(2)<2>
 INS "B" BEFORE REC(2)<2>

 REC(3) = "A"
 INS "B" BEFORE REC(3)<2>
 INS "C" BEFORE REC(3)<3>
 
 REC(4) = "C"
 INS "B" BEFORE REC(4)<1>
 INS "A" BEFORE REC(4)<1>


Did you get them all? I’ve put the answers as a comment so you can check them if you want. I’d be surprised if you got them all right…

So, what should you take away from this? <-1> and INS can give you (or others!) a world of headaches if you don’t understand their peculiarities with null values.

Final Note: In UniVerse, INS behaviour in some cases is dependent on the flavour your account is running in and on the $OPTIONS EXTRA.DELIM setting.
