[Phpgroupware-cvs] phpgwapi/inc/adodb/docs/tips_portable

phpgroupware-cvs
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Phpgroupware-cvs] phpgwapi/inc/adodb/docs/tips_portable_sql.htm, 1.2

From:	nomail
Subject:	[Phpgroupware-cvs] phpgwapi/inc/adodb/docs/tips_portable_sql.htm, 1.2
Date:	Thu, 30 Dec 2004 07:47:32 +0100
Update of /phpgwapi/inc/adodb/docs
Added Files:
        Branch: 
          tips_portable_sql.htm

date: 2004/12/30 06:47:32;  author: skwashd;  state: Exp;  lines: +362 -0

Log Message:
new HEAD
=====================================================================
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<html>
<head>
        <title>Tips on Writing Portable SQL for Multiple Databases for 
PHP</title>
</head>

<body bgcolor=white>
<table width=100% border=0><tr><td><h2>Tips on Writing Portable SQL 
&nbsp;</h2></td><td>
 <div align=right><img 
src="cute_icons_for_site/adodb.gif"></div></td></tr></table>
  <p>Updated 18 Sep 2003. Added Portable Native SQL section.
<p>

 If you are writing an application that is used in multiple environments and 
  operating systems, you need to plan to support multiple databases. This 
article 
  is based on my experiences with multiple database systems, stretching from 
4th 
  Dimension in my Mac days, to the databases I currently use, which are: 
Oracle, 
  FoxPro, Access, MS SQL Server and MySQL. Although most of the advice here 
applies 
  to using SQL with Perl, Python and other programming languages, I will focus 
on PHP and how 
  the <a href="http://adodb.sourceforge.net/";>ADOdb</a> database abstraction 
library 
  offers some solutions.<p></p>
<p>Most database vendors practice product lock-in. The best or fastest way to 
  do things is often implemented using proprietary extensions to SQL. This 
makes 
  it extremely hard to write portable SQL code that performs well under all 
conditions. 
  When the first ANSI committee got together in 1984 to standardize SQL, the 
database 
  vendors had such different implementations that they could only agree on the 
  core functionality of SQL. Many important application specific requirements 
  were not standardized, and after so many years since the ANSI effort began, 
  it looks as if much useful database functionality will never be standardized. 
  Even though ANSI-92 SQL has codified much more, we still have to implement 
portability 
  at the application level.</p>
<h3><b>Selects</b></h3>
<p>The SELECT statement has been standardized to a great degree. Nearly every 
  database supports the following:</p>
<p>SELECT [cols] FROM [tables]<br>
  &nbsp;&nbsp;[WHERE conditions]<br>
  &nbsp; [GROUP BY cols]<br>
  &nbsp; [HAVING conditions] <br>
  &nbsp; [ORDER BY cols]</p>
<p>But so many useful techniques can only be implemented by using proprietary 
  extensions. For example, when writing SQL to retrieve the first 10 rows for 
  paging, you could write...</p>
<table width="80%" border="1" cellspacing="0" cellpadding="0" align="center">
  <tr> 
        <td><b>Database</b></td>
        <td><b>SQL Syntax</b></td>
  </tr>
  <tr> 
        <td>DB2</td>
        <td>select * from table fetch first 10 rows only</td>
  </tr>
  <tr> 
        <td>Informix</td>
        <td>select first 10 * from table</td>
  </tr>
  <tr> 
        <td>Microsoft SQL Server and Access</td>
        <td>select top 10 * from table</td>
  </tr>
  <tr> 
        <td>MySQL and PostgreSQL</td>
        <td>select * from table limit 10</td>
  </tr>
  <tr> 
        <td>Oracle 8i</td>
        <td>select * from (select * from table) where rownum &lt;= 10</td>
  </tr>
</table>
<p>This feature of getting a subset of data is so useful that in the PHP class 
  library ADOdb, we have a SelectLimit( ) function that allows you to hide the 
  implementation details within a function that will rewrite your SQL for 
you:</p>
<pre>$connection-&gt;SelectLimit('select * from table', 10);
</pre>
<p><b>Selects: Fetch Modes</b></p>
<p>PHP allows you to retrieve database records as arrays. You can choose to 
have 
  the arrays indexed by field name or number. However different low-level PHP 
  database drivers are inconsistent in their indexing efforts. ADOdb allows you 
  to determine your prefered mode. You set this by setting the variable 
$ADODB_FETCH_MODE 
  to either of the constants ADODB_FETCH_NUM (for numeric indexes) or 
ADODB_FETCH_ASSOC 
  (using field names as an associative index).</p>
<p>The default behaviour of ADOdb varies depending on the database you are 
using. 
  For consistency, set the fetch mode to either ADODB_FETCH_NUM (for speed) or 
  ADODB_FETCH_ASSOC (for convenience) at the beginning of your code. </p>
<p><b>Selects: Counting Records</b></p>
<p>Another problem with SELECTs is that some databases do not return the number 
  of rows retrieved from a select statement. This is because the highest 
performance 
  databases will return records to you even before the last record has been 
found. 
</p>
<p>In ADOdb, RecordCount( ) returns the number of rows returned, or will 
emulate 
  it by buffering the rows and returning the count after all rows have been 
returned. 
  This can be disabled for performance reasons when retrieving large recordsets 
  by setting the global variable $ADODB_COUNTRECS = false. This variable is 
checked 
  every time a query is executed, so you can selectively choose which 
recordsets 
  to count.</p>
<p>If you prefer to set $ADODB_COUNTRECS = false, ADOdb still has the 
PO_RecordCount( 
  ) function. This will return the number of rows, or if it is not found, it 
will 
  return an estimate using SELECT COUNT(*):</p>
<pre>$rs = $db-&gt;Execute(&quot;select * from table where state=$state&quot;);
$numrows = $rs-&gt;PO_RecordCount('table', &quot;state=$state&quot;);</pre>
<p><b>Selects: Locking</b> </p>
<p>SELECT statements are commonly used to implement row-level locking of 
tables. 
  Other databases such as Oracle, Interbase, PostgreSQL and MySQL with InnoDB 
  do not require row-level locking because they use versioning to display data 
  consistent with a specific point in time.</p>
<p>Currently, I recommend encapsulating the row-level locking in a separate 
function, 
  such as RowLock($table, $where):</p>
<pre>$connection-&gt;BeginTrans( );
$connection-&gt;RowLock($table, $where); </pre>
<pre><font color=green># some operation</font></pre>
<pre>if ($ok) $connection-&gt;CommitTrans( );
else $connection-&gt;RollbackTrans( );
</pre>
<p><b>Selects: Outer Joins</b></p>
<p>Not all databases support outer joins. Furthermore the syntax for outer 
joins 
  differs dramatically between database vendors. One portable (and possibly 
slower) 
  method of implementing outer joins is using UNION.</p>
<p>For example, an ANSI-92 left outer join between two tables t1 and t2 could 
  look like:</p>
<pre>SELECT t1.col1, t1.col2, t2.cola <br>  FROM t1 <i>LEFT JOIN</i> t2 ON 
t1.col = t2.col</pre>
<p>This can be emulated using:</p>
<pre>SELECT t1.col1, t1.col2, t2.cola FROM t1, t2 <br>     WHERE t1.col = 
t2.col 
   UNION ALL
SELECT col1, col2, null FROM t1 <br>       WHERE t1.col not in (select distinct 
col from t2)
</pre>
<p>Since ADOdb 2.13, we provide some hints in the connection object as to legal 
  join variations. This is still incomplete and sometimes depends on the 
database 
  version you are using, but is useful as a general guideline:</p>
<p><font face="Courier New, Courier, mono">$conn-&gt;leftOuter</font>: holds 
the 
  operator used for left outer joins (eg. '*='), or false if not known or not 
  available.<br>
  <font face="Courier New, Courier, mono">$conn-&gt;rightOuter</font>: holds 
the 
  operator used for right outer joins (eg '=*'), or false if not known or not 
  available.<br>
  <font face="Courier New, Courier, mono">$conn-&gt;ansiOuter</font>: boolean 
  that if true means that ANSI-92 style outer joins are supported, or false if 
  not known.</p>
<h3><b>Inserts</b> </h3>
<p>When you create records, you need to generate unique id's for each record. 
  There are two common techniques: (1) auto-incrementing columns and (2) 
sequences. 
</p>
<p>Auto-incrementing columns are supported by MySQL, Sybase and Microsoft 
Access 
  and SQL Server. However most other databases do not support this feature. So 
  for portability, you have little choice but to use sequences. Sequences are 
  special functions that return a unique incrementing number every time you 
call 
  it, suitable to be used as database keys. In ADOdb, we use the GenID( ) 
function. 
  It has takes a parameter, the sequence name. Different tables can have 
different 
  sequences. </p>
<pre>$id = 
$connection-&gt;GenID('sequence_name');<br>$connection-&gt;Execute(&quot;insert 
into table (id, firstname, lastname) <br>                       values ($id, 
$firstname, $lastname)&quot;);</pre>
<p>For databases that do not support sequences natively, ADOdb emulates 
sequences 
  by creating a table for every sequence.</p>
<h3><b>Binding</b></h3>
<p>Binding variables in an SQL statement is another tricky feature. Binding is 
  useful because it allows pre-compilation of SQL. When inserting multiple 
records 
  into a database in a loop, binding can offer a 50% (or greater) speedup. 
However 
  many databases such as Access and MySQL do not support binding natively and 
  there is some overhead in emulating binding. Furthermore, different databases 
  (specificly Oracle!) implement binding differently. My recommendation is to 
  use binding if your database queries are too slow, but make sure you are 
using 
  a database that supports it like Oracle. </p>
<p>ADOdb supports portable Prepare/Execute with:</p>
<pre>$stmt = $db-&gt;Prepare('select * from customers where custid=? and 
state=?');
$rs = $db-&gt;Execute($stmt, array($id,'New York'));</pre>
<p>Oracle uses named bind placeholders, not "?", so to support portable 
binding, we have Param() that generates 
the correct placeholder (available since ADOdb 3.92):
<pre><font color="#000000">$sql = <font color="#993300">'insert into table 
(col1,col2) values ('</font>.$DB-&gt;Param('a').<font 
color="#993300">','</font>.$DB-&gt;Param('b').<font color="#993300">')'</font>;
<font color="#006600"># generates 'insert into table (col1,col2) values (?,?)'
# or        'insert into table (col1,col2) values (:a,:b)</font>'
$stmt = $DB-&gt;Prepare($sql);
$stmt = $DB-&gt;Execute($stmt,array('one','two'));
</font></pre>
<a name="native"></a>
<h2>Portable Native SQL</h2>
<p>ADOdb provides the following functions for portably generating SQL functions 
  as strings to be merged into your SQL statements (some are only available 
since 
  ADOdb 3.92): </p>
<table width="75%" border="1" align=center>
  <tr> 
    <td width=30%><b>Function</b></td>
    <td><b>Description</b></td>
  </tr>
  <tr> 
    <td>DBDate($date)</td>
    <td>Pass in a UNIX timestamp or ISO date and it will convert it to a date 
      string formatted for INSERT/UPDATE</td>
  </tr>
  <tr> 
    <td>DBTimeStamp($date)</td>
    <td>Pass in a UNIX timestamp or ISO date and it will convert it to a 
timestamp 
      string formatted for INSERT/UPDATE</td>
  </tr>
  <tr> 
    <td>SQLDate($date, $fmt)</td>
    <td>Portably generate a date formatted using $fmt mask, for use in SELECT 
      statements.</td>
  </tr>
  <tr> 
    <td>OffsetDate($date, $ndays)</td>
    <td>Portably generate a $date offset by $ndays.</td>
  </tr>
  <tr> 
    <td>Concat($s1, $s2, ...)</td>
    <td>Portably concatenate strings. Alternatively, for mssql use mssqlpo 
driver, 
      which allows || operator.</td>
  </tr>
  <tr> 
    <td>IfNull($fld, $replaceNull)</td>
    <td>Returns a string that is the equivalent of MySQL IFNULL or Oracle 
NVL.</td>
  </tr>
  <tr>
    <td>Param($name)</td>
    <td>Generates bind placeholders, using ? or named conventions as 
appropriate.</td>
  </tr>
  <tr><td>$db->sysDate</td><td>Property that holds the SQL function that 
returns today's date</td>
</tr>
<tr><td>$db->sysTimeStamp</td><td>Property that holds the SQL function that 
returns the current
timestamp (date+time).
</td>
</tr>
<tr>
<td>$db->concat_operator</td><td>Property that holds the concatenation operator
</td>
</tr>
<tr><td>$db->length</td><td>Property that holds the name of the SQL strlen 
function.
</td></tr>

<tr><td>$db->upperCase</td><td>Property that holds the name of the SQL 
strtoupper function.
</td></tr>
<tr><td>$db->random</td><td>Property that holds the SQL to generate a random 
number between 0.00 and 1.00.
</td>
</tr>
<tr><td>$db->substr</td><td>Property that holds the name of the SQL substring 
function.
</td></tr>
</table>
<p>&nbsp; </p>
<h2>DDL and Tuning</h2>
There are database design tools such as ERWin or Dezign that allow you to 
generate data definition language commands such as ALTER TABLE or CREATE INDEX 
from Entity-Relationship diagrams. 
<p>
However if you prefer to use a PHP-based table creation scheme, adodb provides 
you with this feature. Here is the code to generate the SQL to create a table 
with: 
<ol>
        <li> Auto-increment primary key 'ID', </li>
        <li>The person's 'NAME' VARCHAR(32) NOT NULL and defaults to '', </li>
        <li>The date and time of record creation 'CREATED', </li>
        <li> The person's 'AGE', defaulting to 0, type NUMERIC(16). </li>
</ol>
<p>
Also create a compound index consisting of 'NAME' and 'AGE': 
<pre>
$datadict = <strong>NewDataDictionary</strong>($connection);
$flds = " 
<font color="#660000">  ID I AUTOINCREMENT PRIMARY,
  NAME C(32) DEFAULT '' NOTNULL,
  CREATED T DEFTIMESTAMP,
  AGE N(16) DEFAULT 0</font>
";
$sql1 = $datadict-><strong>CreateTableSQL</strong>('tabname', $flds);
$sql2 = $datadict-><strong>CreateIndexSQL</strong>('idx_name_age', 'tabname', 
'NAME,AGE');
</pre>

<h3>Data Types</h3>
<p>Stick to a few data types that are available in most databases. Char, 
varchar 
  and numeric/number are supported by most databases. Most other data types 
(including 
  integer, boolean and float) cannot be relied on being available. I recommend 
  using char(1) or number(1) to hold booleans. </p>
<p>Different databases have different ways of representing dates and 
timestamps/datetime. 
  ADOdb attempts to display all dates in ISO (YYYY-MM-DD) format. ADOdb also 
provides 
  DBDate( ) and DBTimeStamp( ) to convert dates to formats that are acceptable 
  to that database. Both functions accept Unix integer timestamps and date 
strings 
  in ISO format.</p>
<pre>$date1 = $connection-&gt;DBDate(time( ));<br>$date2 = 
$connection-&gt;DBTimeStamp('2002-02-23 13:03:33');</pre>
<p>We also provide functions to convert database dates to Unix timestamps:</p>
<pre>$unixts = $recordset-&gt;UnixDate('#2002-02-30#'); <font color="green"># 
MS Access date =&gt; unix timestamp</font></pre>
<p>The maximum length of a char/varchar field is also database specific. You 
can 
  only assume that field lengths of up to 250 characters are supported. This is 
  normally impractical for web based forum or content management systems. You 
  will need to be familiar with how databases handle large objects (LOBs). 
ADOdb 
  implements two functions, UpdateBlob( ) and UpdateClob( ) that allow you to 
  update fields holding Binary Large Objects (eg. pictures) and Character Large 
  Objects (eg. HTML articles):</p>
<pre><font color=green># for oracle </font>
$conn->Execute('INSERT INTO blobtable (id, blobcol) VALUES (1,empty_blob())'); 
$conn->UpdateBlob('blobtable','blobcol',$blobvalue,'id=1'); 
   
<font color=green># non-oracle databases</font>
$conn->Execute('INSERT INTO blobtable (id, blobcol) VALUES (1, null)'); 
$conn->UpdateBlob('blobtable','blobcol',$blobvalue,'id=1');
</pre>
<p>Null handling is another area where differences can occur. This is a 
mine-field, 
  because 3-value logic is tricky.
<p>In general, I avoid using nulls except for dates and default all my numeric 
  and character fields to 0 or the empty string. This maintains consistency 
with 
  PHP, where empty strings and zero are treated as equivalent, and avoids SQL 
  ambiguities when you use the ANY and EXISTS operators. However if your 
database 
  has significant amounts of missing or unknown data, using nulls might be a 
good 
  idea. 
  <p>
  ADOdb also supports a portable <a 
href=http://phplens.com/adodb/reference.functions.concat.html#ifnull>IfNull</a> 
function, so you can define what to display
  if the field contains a null.
<h3><b>Stored Procedures</b></h3>
<p>Stored procedures are another problem area. Some databases allow recordsets 
  to be returned in a stored procedure (Microsoft SQL Server and Sybase), and 
  others only allow output parameters to be returned. Stored procedures 
sometimes 
  need to be wrapped in special syntax. For example, Oracle requires such code 
  to be wrapped in an anonymous block with BEGIN and END. Also internal sql 
operators 
  and functions such as +, ||, TRIM( ), SUBSTR( ) or INSTR( ) vary between 
vendors. 
</p>
<p>An example of how to call a stored procedure with 2 parameters and 1 return 
  value follows:</p>
<pre>   switch ($db->databaseType) {
        case '<font color="#993300">mssql</font>':
          $sql = <font color="#000000"><font color="#993333">'<font 
color="#993300">SP_RUNSOMETHING</font>'</font></font>; break;
        case '<font color="#993300">oci8</font>':
          $sql = 
<font color="#993300">    </font><font color="#000000"><font 
color="#993300">&quot;declare RETVAL integer;begin :RETVAL := </font><font 
color="#000000"><font color="#993333"><font 
color="#993300">SP_RUNSOMETHING</font></font></font><font 
color="#993300">(:myid,:group);end;&quot;;
</font>   break;</font>
        default:
          die('<font color="#993300">Unsupported feature</font>');
        }
<font color="#000000"><font color="green">      # @RETVAL = SP_RUNSOMETHING 
@myid,@group</font>
        $stmt = $db-&gt;PrepareSP($sql);        <br>    
$db-&gt;Parameter($stmt,$id,'<font color="#993300">myid</font>'); 
        $db-&gt;Parameter($stmt,$group,'<font color="#993300">group</font>');
        <font color="green"># true indicates output parameter<br>       
</font>$db-&gt;Parameter($stmt,$ret,'<font 
color="#993300">RETVAL</font>',true); 
        $db-&gt;Execute($stmt); </font></pre>
<p>As you can see, the ADOdb API is the same for both databases. But the stored 
  procedure SQL syntax is quite different between databases and is not 
portable, 
  so be forewarned! However sometimes you have little choice as some systems 
only 
  allow data to be accessed via stored procedures. This is when the ultimate 
portability 
  solution might be the only solution: <i>treating portable SQL as a 
localization 
  exercise...</i></p>
<h3><b>SQL as a Localization Exercise</b></h3>
<p> In general to provide real portability, you will have to treat SQL coding 
  as a localization exercise. In PHP, it has become common to define separate 
  language files for English, Russian, Korean, etc. Similarly, I would suggest 
  you have separate Sybase, Intebase, MySQL, etc files, and conditionally 
include 
  the SQL based on the database. For example, each MySQL SQL statement would be 
  stored in a separate variable, in a file called 'mysql-lang.inc.php'.</p>
<pre>$sqlGetPassword = '<font color="#993300">select password from users where 
userid=%s</font>';
$sqlSearchKeyword = &quot;<font color="#993300">SELECT * FROM articles WHERE 
match (title,body) against (%s</font>)&quot;;</pre>
<p>In our main PHP file:</p>
<pre><font color=green># define which database to load...</font>
<b>$database = '<font color="#993300">mysql</font>';
include_once(&quot;<font 
color="#993300">$database-lang.inc.php</font>&quot;);</b>

$db = &amp;NewADOConnection($database);
$db->PConnect(...) or die('<font color="#993300">Failed to connect to 
database</font>');

<font color=green># search for a keyword $word</font>
$rs = $db-&gt;Execute(sprintf($sqlSearchKeyWord,$db-&gt;qstr($word)));</pre>
<p>Note that we quote the $word variable using the qstr( ) function. This is 
because 
  each database quotes strings using different conventions.</p>
<p>
<h3>Final Thoughts</h3>
<p>The best way to ensure that you have portable SQL is to have your data 
tables designed using 
sound principles. Learn the theory of normalization and entity-relationship 
diagrams and model 
your data carefully. Understand how joins and indexes work and how they are 
used to tune performance.
<p> Visit the following page for more references on database theory and 
vendors: 
  <a 
href="http://php.weblogs.com/sql_tutorial";>http://php.weblogs.com/sql_tutorial</a>.
 
  Also read this article on <a 
href=http://phplens.com/lens/php-book/optimizing-debugging-php.php>Optimizing 
PHP</a>.
<p>
<font size=1>(c) 2002-2003 John Lim.</font>

</body>
</html>
[Prev in Thread]
Current Thread
[Next in Thread]
[Phpgroupware-cvs] phpgwapi/inc/adodb/docs/tips_portable_sql.htm, 1.2, nomail <=
- [Phpgroupware-cvs] phpgwapi/inc/adodb/docs/tips_portable_sql.htm, 1.2, nomail, 2004/12/30
Prev by Date: [Phpgroupware-cvs] phpgwapi/inc/adodb/tests/testgenid.php, 1.2
Next by Date: [Phpgroupware-cvs] phpgwapi/inc/adodb/contrib/toxmlrpc.inc.php, 1.2
Previous by thread: [Phpgroupware-cvs] phpgwapi/templates/base/images/up.button.png, 1.2
Next by thread: [Phpgroupware-cvs] phpgwapi/inc/adodb/docs/tips_portable_sql.htm, 1.2
Index(es):
- Date
- Thread