This is very similar to the previous version column method, but it uses the base data itself to compute a “virtual” version column. I’ll quote the Oracle Database PL/SQL Packages and Types Reference manual (before showing how to use one of the supplied packages) to help explain the goal and concepts behind a checksum or hash function:
A one-way hash function takes a variable-length input string, the data, and converts it to a fixed-length (generally smaller) output string called a hash value. The hash value serves as a unique identifier (like a fingerprint) of the input data. You can use the hash value to verify whether data has been changed or not.
Note that a one-way hash function is a hash function that isn’t easily revers-ible. It is easy to compute a hash value from the input data, but it is hard to generate data that hashes to a particular value.
We can use these hashes or checksums in the same way that we used our version column. We simply compare the hash or checksum value we obtain when we read data out of the database with that we obtain before modifying the data. If someone modified the row’s values after we read it out, but before we updated it, then the hash or checksum will almost certainly be different.
There are many ways to compute a hash or checksum. I’ll list several of these and demonstrate one in this section. All of these methods are based on supplied database functionality:
•\ DBMS_CRYPTO.HASH: This method is capable of computing a Secure Hash Algorithm 1 (SHA-1) or MD4/MD5 message digests. It is recommended that you use the SHA-1 algorithm.
•\ DBMS_SQLHASH.GETHASH: This method supports hash algorithms of SHA-1, MD4, and MD5. As a SYSDBA privileged user, you must grant execute on this package to a user before they can access it. This package is documented in the Oracle Database Security Guide.
•\ ORA_HASH: This method is a built-in SQL function that takes a VARCHAR2 value as input and (optionally) another pair of inputs that control the return value. The returned value is a number—by default a number between 0 and 4294967295.
•\ STANDARD_HASH: This is a built-in SQL function that computes a hash value on an expression using standard hash algorithms such as SHA1 (default), SHA256, SHA384, SHA512, and MD5. The returned value is a RAW datatype.
Note An array of hash and checksum functions are available in many programming languages, so there may be others at your disposal outside the database. That said, if you use built-in database capabilities, you will have increased your portability (to new languages, new approaches) in the future.
The following example shows how you might use the ORA_HASH built-in function to compute these hashes/checksums. The technique would also be applicable for the other listed approaches; the logic would not be very much different, but the APIs you call would be. First, we’ll start by removing the column we used in the previous example:
$ sqlplus eoda/foo@PDB1
SQL> alter table dept drop column last_mod; Table altered.
And then have our application query and display the information for department 10. Note that while we query the information, we compute the hash using the ORA_HASH built-in. This is the version information that we retain in our application. The following is our code to query and display:
SQL> variable deptno number
SQL> variable dname varchar2(14)
SQL> variable loc varchar2(13)
SQL> variable hash number
SQL> begin select deptno, dname, loc, ora_hash( dname || ‘/’ || loc ) hash into :deptno, :dname, :loc, :hash from dept where deptno = 10;end; /
PL/SQL procedure successfully completed.
SQL> select :deptno, :dname, :loc, :hash from dual;
:DEPTNO :DNAME :LOC :HASH
10 Accounting NEW YORK 2721972020
As you can see, the hash is just some number. It is the value we would want to use before updating. To update that row, we would lock the row in the database as it exists right now and then compare the hash value of that row with the hash value we computed when we read the data out of the database. The logic for doing so could look like the following:
SQL> exec :dname := lower(:dname); PL/SQL procedure successfully completed.
SQL> update deptset dname = :dnamewhere deptno = :deptnoand ora_hash( dname || ‘/’ || loc ) = :hash/1 row updated.
SQL> select dept.*,ora_hash( dname || ‘/’ || loc ) hash from dept where deptno = :deptno;
DEPTNO DNAME LOC HASH
10 accounting NEW YORK 2818855829
Upon requerying the data and computing the hash again after the update, we can see that the hash value is different. If someone had modified the row before we did, our hash values would not have compared. We can see this by attempting our update again, using the old hash value we read out the first time:
SQL> update deptset dname = :dnamewhere deptno = :deptnoand ora_hash( dname || ‘/’ || loc ) = :hash/0 rows updated.
As you can see, there were zero rows updated, since our hash value did not match the data currently in the database.
In order for this hash-based approach to work properly, we must ensure every application uses the same approach when computing the hash; specifically, they must concatenate dname with “/” with loc—in that order. To make that approach universal, I would suggest adding a virtual column to the table or using a view to add a column, so that the function is hidden from the application itself. Adding a column would look like this:
SQL> alter table dept add hash as( ora_hash(dname || ‘/’ || loc ) ); Table altered.
SQL> select * from dept where deptno = :deptno;
DEPTNO DNAME LOC HASH
10 accounting NEW YORK 2818855829
The added column is a virtual column and as such incurs no storage overhead. The value is not computed and stored on disk. Rather, it is computed upon retrieval of the data from the database.
This example showed how to implement optimistic locking with a hash or checksum. You should bear in mind that computing a hash or checksum is a somewhat CPU- intensive operation; it is computationally expensive. On a system where CPU bandwidth is a scarce resource, you must take this fact into consideration. However, this approach is much more network-friendly because the transmission of a relatively small hash instead of a before and after image of the row (to compare column by column) over the network will consume much less of that resource.
Leave a Reply