Things not to believe in the MS documentation

From Jet Replication Wiki -- Microsoft Office Access Replication

Jump to: navigation, search

Microsoft's own documentation on Jet Replication is a patchwork of marvelous information and ill-conceived misinformation. This article is an effort to sort through some of the problems with MS's Jet Replication docs.

Contents

Microsoft's most current documentation

If you go to the top-level page for Microsoft Office Access, you'll find that a search for the term "replication" will not return any articles that tell you how to use replication in Access 2007. This is despite the fact that replication is fully supported by Access 2007 and the new ACE as long as you contintue to use MDB file format (i.e., the ACCDB does not support replication). The articles that come up in the search mostly cast replication as a problem that has to be overcome in Access 2007. This is a despicable approach for Microsoft to use, and convinces me that Microsoft has active animus towards Jet Replication and doesn't want you to use it.

This is also demonstrated by the fact that it's so difficult to find the top-level overview articles about Jet Replication as a whole, even when you're on the Access 2003 home page. If you search there for "replication" you get a list of articles that are directly about replication, but not a single one of them is the top-level article that they provide as the overview of Jet Replication in Access 2003:

Database Replication

This is the table of contents for four sub-articles that are themselves tables of contents that organize a collection of about two dozen articles on specific replication topics. But the information in these articles is shallow and not linked to more detailed information that is found in the replication white papers referred to on the Jet Replication Wiki's Resources page. And it is filled with errors and misleading statements. Indeed, it recapitulates many of the mistakes that were in the original documentation that were later corrected in the Jet 4 Replication FAQ. Several of the following items are direct attempts to correct the record in regard to the information found there.

Outright errors in the Database Replication articles

1. In Changes made to your database when you use replication (MDB) under the heading "Additional size limitations on a replicated database", this is stated as fact:

The Microsoft Jet database engine supports files up to a maximum of 1 gigabyte in size...

While Jet 3.x databases were limited to 1GB, Jet 4 databases doubled that limitation.

2. in About choosing a synchronization method this outright false statement is made:

Indirect synchronization can only be configured by using Replication Manager.

This is just not true. It can be accomplished by running the synchronizer and creating a handful of registry keys (full instructions in the Jet Replication Wiki FAQ at question 9. That's all the setup wizard in ReplMan does, but for some reason Microsoft wants to hide this. The de facto result of this kind of statement is that you can't use indirect replication, since since the Access 2002 developer tools, there has been no way to legally acquire a copy of ReplMan at all.

In regard to the statement in the same article saying the same thing about Internet Replication, I expect that the same thing applies -- it can no doubt be accomplished without ReplMan, but since I've never contemplated using Internet Replication, I can't say. Certainly the hard-wired dependency on IIS is a crucial problem, and perhaps the wizard in ReplMan makes it possible to configure it in ways that aren't otherwise practical.

3. I have debated over whether this next one belongs as an outright error or just a misleading statement, but while writing it up as misleading, I realized there was so many things wrong with it that it should be treated as completely erroneous. I'm referring to the article on conflicts, particularly the explanation of the conflict type referred to as "Update referential integrity conflict." Here's the definition and advice:

An update referential integrity conflict occurs when the primary key is updated at one replica and new child records that reference the original primary key value are added at a different replica. During synchronization, the new child records are deleted and logged in the conflict table. To reduce the number of these types of errors during synchronization, you may want to use the s_GUID feld as the primary key in some or all of your replicated tables.

There are so many things wrong with this that it's frightening. First of all, this shows that natural keys with CASCADE UPDATE have special problems in replicated applications. While it's perfectly valid in a non-replicated app to have a lookup table with a natural primary key, in a replicated app, this would be safe only in cases where you either entirely prohibit edits to the natural key field, or where you have CASCADE UPDATE turned off. The former can be done only via the user interface of your app, while the latter is not fail-safe.

Consider the situation with CASCADE UPDATE turned OFF where in Replica1 you change PK ABC to XYS. If there are records in the related table in Replica1, you will be prevented from making the change. But if there are no records in Replica1 to prohibit the change, and there are unsynchronized records pointing to ABC in Replica2, you'll end up with precisely the conflict described here.

Thus, replication makes natural keys, with or without CASCADE UPDATE, more problematic, because completely proper edits to the data can result in this precise kind of conflict. Because of that, natural keys should probably be completely avoided in replicated apps in order to avoid this type of replication error.

Now, the advice that is given in the MS article is to stop using the natural key as the PK and instead use a surrogate. But recommending that the s_GUID field is the best surrogate keys is very problematic precisely because of the issues Access has with GUIDs in general, as outlined by Michael Kaplan in his article on Replication and GUIDs, the Good, the Bad, and the Ugly. While using the s_GUID field as the PK will definitely avoid the problem with breaking the conflict viewer, it's still going to cause problems in all sorts of places (as outlined in the article).

It's also enormously inefficient in terms of data storage. It seems to me that a random Autonumber is by far the best surrogate key here, and that the advice is best considered to just be wrong.

Other misleading statements in the Database Replication articles

1. The article About choosing a tool for replication (MDB) again suggests that Briefcase replication is viable. See below for this page's comment on that subject as well Briefcase Replication for a more detailed explanation of the problems with it.

2. The About choosing a tool for replication (MDB) article also recommends JRO for programmatic control, despite the fact that JRO is a wholly inadequate replacement for tools like Replication Manager and the TSI Synchronizer that provide full control of Jet Replication functionality and despite the fact that its only purpose for being was as one of the ugly step-children of ADO, which Microsoft mistakenly pushed as the preferred data interface layer for Jet data stores despite the superiority of DAO as the native data interface layer. Microsoft has since back tracked on this and now recommends DAO instead of ADO (which is an orphan technology still actively supported only in Access and classic ASP -- for other platforms it's been replaced by the vastly superior ADO.NET, which Access cannot use), but the Jet Replication documentation still recommends the woefully underpowered JRO for programmatic control of replication operations. The only redeeming feature of JRO in comparison to DAO is that it offers the capability to initiate an indirect synch in code. But this can be accomplished with the TSI Synchronizer, which itself offers so much more functionality that there is really no reason not to use the TSI Synchronizer and ignore JRO entirely.

3. In About changing the design of a replicated database (MDB), the following statement is incomplete:

If you create a linked table in the Design Master, it's created as a local object and remains local until you make it replicable.

In fact, ODBC linked tables that are created after replication of a front-end MDB can never be made replicable, as the REPLICABLE property of the table link is disables.

4. This one is nothing but a quibble in terms of emphasis, but it's an important one. In regard to retention period, the article titled "About the replica set retention period setting (MDB)" says this:

The replica set should have a large retention period if the replicas do not synchronize frequently. However, if the replicas synchronize frequently and you want to keep the replica size small, specify a shorter retention period.

It seems to me that this is rather misleading, as it encourages setting a short retention period. There is only one practical result of setting a short retention period (other than making your replica expire sooner) and that is to reduce the size of the MSysTombstons table, which tracks deletions. Now, in a replicated database, there oughtn't be an overly large number of record deletions -- if there are regular batch deletions, it suggests that you've mistakenly replicated what amounts to a temporary table. The number of records in the MSysTombstones table should not be overly great and thus cannot really contribute significantly to the size of a replica.

5. The documentation continues to promote the use of GUIDs, despite the shortcomings of using them in Access applications (and see below), as in About AutoNumber field size and replicated databases (MDB):

If you plan to replicate your database, you need to consider what field size is appropriate for the AutoNumber field used as a table's primary key. If you use an AutoNumber field as the primary key for a table in the replicated database, set its FieldSize property to either Long Integer or Replication ID.
- If fewer than 100 records are routinely added between synchronizing replicas, use a Long Integer setting for the FieldSize property to take up less disk space.
- If more than 100 records are routinely added between synchronizing replicas, you should use Replication ID for the FieldSize property setting to prevent records from being assigned the same primary key value in each replica. Note, however, that an AutoNumber field with a Replication ID field size produces a 128-bit value that will require more disk space.

For what it's worth, since I started using Jet Replication in 1997, I've never used anything but regular random AutoNumbers (i.e., no Replication IDs) for any of my apps, and have never once had a random AutoNumber collision.

6. The instructions for various operations are rather dangerous if followed literally. For instance, in the article on troubleshooting replication, there is this suggestion for one problem:

I can't make my replica the Design Master.
If you still can't make your replica the Design Master and haven't received another error message, your replica may be damaged. Delete the replica and create a new one.

That should really recommend trying to create a new replica from the problematic replica before deleting it (or, at the very least, synching with another replica). While that is certainly implied by previous pieces of advice, the setup of the article is such that by default, the other text doesn't appear (only the headings), and so someone looking for advice regarding this specific problem might not see the other suggestions.

7. The advice on how to avoid replicating MDBs when dragged to the Briefcase is not exactly wrong, but not really right either. In Troubleshoot replication the article provides this suggestion:

Briefcase replicates Microsoft Access database files that you drag into My Briefcase. If you don't want to replicate the database file, change the file extension from .mdb (or .mde) to another extension.

While this will work, it will also make it hard for a user to launch the files with Access. And it requires knowledge that some users may not have (unfortunately). The way to turn off Briefcase Replication of MDBs/MDEs is to uninstall Briefcase Replication support with the Office/Access setup program. The solution MS offers is really only helpful if you have the occasional file you don't want replicated but still want to replicate others using Briefcase Replication.

Of course, as outlined below, I don't believe Briefcase Replication is useful to begin with.

8. On the topic of converting replica sets to new version, the same troubleshooting article suggests:

You can convert a replica set that you created in Microsoft Access 97 by synchronizing your replica set and then converting every replica individually in Access 2000 or later versions.

While it is true that you can upgrade in this manner and it's pretty quick and easy, I have always found it better when upgrading to synch all your replicas, unreplicate, then recreate the replica set from scratch. While this is much more work, it results in a much cleaner replica set as well as allowing you to take full advantage of new features (such as column-level conflict resolution).

9. The article on how to replicate programatically gives instructions only with JRO. JRO is an idiotic library created only because of the problems created by MS's now-abandoned move to try to replace DAO with ADO. It wouldn't exist if MS hadn't made that stupid mistake. Unfortunately, the legacy of that bad move is that JRO offers one piece of functionality that DAO lacks, and that's the ability to initiate an indirect synch in code. However, rather than add a dependency in your app on JRO, I recommend using the TSI Synchronizer instead. While that is also an outside dependency, it gets you programatic control of a whole host of replication functions that are unavailable with DAO/JRO.

10. While the article on unreplicating is OK so far as it goes, it ignores much easier methods that are outlined in Question 10 of the Jet Replication Wiki FAQ.

11. The articles on Replication Visibility and the comparison of visibility types are misleading in that they don't outline any of the potential downsides of using anything other than a global replica (except by implication). Michael Kaplan has discussed the problems of anonymous replicas in his article Sometimes it doesn't pay to be anonymous!

12. The article on conflicts is rather optimistic in its assessment of how well conflict resolution works with anything other than data edit conflicts. That is, with synchronization errors reported along with conflicts, you'd expect the conflict resolver to be able to resolve the errors, but this is not true in all cases. In fact, it depends on the kind of error, and not all of them can be dismissed by simply deleting the conflict record (which is what accepting the "winning" record does in a data conflict).

13. A more nebulous issue is that the documentation doesn't really explain what happens with conflict resolution. When there's a conflict during a synch, Jet applies one of the changes to both replicas. That's the "winning" record. The "losing" record is saved in the conflict table. When you are told that your replica has conflicts that need to be resolved, this is actually fales -- the conflicts have been resolved at the time you're presented with the notice. All you're doing is checking that Jet guessed right in picking the winning record. This has never really been made very clear in any of the Jet Replication documentation, and it took me years to understand it!

14. A minor issue in the article on conflicts is what is said about implementing column-level conflict resolution:

You can set the conflict tracking on a database or an individual table prior to making it replicable. Once you create a replica, you can’t change the setting. However, you can change the value of individual tables to row-level tracking.

While this is 100% true for replicas created from Jet 4 databases, it's not true of replicas that have been converted to Jet 4 from Jet 3.x. Instructions for how to get column-level conflict resolution in such a replica are found in the Jet Replication Wiki's FAQ Question 7. This is one of the reasons why in Item 7 above I recommend not converting individual replicas, but instead recreating the replica set from scratch after conversion.

15. Further in the article on conflicts, we have this description of the "Foreign key violation conflict."

A foreign key violation conflict occurs when there is an invalid primary key record. This could be caused by any of the other conflict types.

What is not said here is that this is the main type of replication error that cannot possibly be resolved with the conflict viewer (and very often creates the kind of problems that cause it to not even show up in the conflict viewer at all). These are very difficult to resolve because they require intimate knowledge of the replica set's referential integrity rules, as well as access to the replicas on both sides of the synch to figure out what actually happened. This can be extremely difficult with remote replicas when you don't have remote access (such as through Remote Desktop Protocol). This is the main case where I feel the move to treat replication errors the same as conflicts has not really been helpful.

16. This one is just sloppiness, I think, not reflecting updates to the Conflict Resolver that have happened in the last several years. In the conflicts article, the explanation under "Resolve synchronization conflicts between replica set members" uses terminology that has nothing at all to do with the conflict resolver that has been in use in Access since Access 2003 (or perhaps earlier, as it's a component shared with SQL Server). The article says:

3. For each conflict shown in the Conflict Viewer, choose one of the following options.
To keep the data in Replica 1, click Keep existing data.
To modify the data in Replica 1 and copy it to Replica 2 on the next synchronization, click Keep revised data.
To keep the data in Replica 2 and overwrite the data in Replica 1 on the next synchronization, click Overwrite with conflicting data.
To modify the data in Replica 2 and overwrite the data in Replica 1 on the next synchronization, click Overwrite with revised data.
4. Click Resolve.

The problem with this is that all the text in bold (from the original) refers to wordings that are not used in the conflict resolver at all. Here's what they say, and what's actually there:

Keep existing data : Keep winning change
Keep revised data : [does not exist -- it's simply part of the UI]
Overwrite with conflicting data: Resolve With This Data (under the "conflict loser" dropdown)
Overwrite with revised data : [does not exist -- it's simply part of the UI]
Resolve: [irrelevant -- command buttons take care of it]

What is described in the article represents the state of the Conflict Resolver interface in versions previous to Access 2003. On one of my PCs I have Access 2003 installed, and on another I have Access 2000. On the latter, the conflict resolver has the interface referred to in this article. On the PC with A2003, it's completely different.

This article is badly in need of revision. It has information that is specific to a particular version of the conflict resolver, which is actually independent of your version of Access, as it's a component outside Access itself. My A2003 PC has A2000 installed and the newer conflict resolver is what I get on that machine -- it's only on the older machine with only A2000 installed that I see the conflict resolver interface outlined in this article.

This is just really sloppy documentation on MS's part -- it can only serve to confuse current users of the conflict viewer.

Jet Replication works well for pushing out front end updates to end users

CLAIM: "Microsoft Jet replication is a good solution if you want to...automate the distribution of new features and updates to multiple users." -- Jet 3.5 White Paper and Jet 4 White Paper, p. 3

TRUTH: Jet Replication works well only for pure Jet objects, i.e., Tables and Queries, and does not work well over the long term for Access objects (forms, reports, macros, modules). See Why is replication not a good way to distribute changes to a front end application? in the Replication Wiki Frequently Asked Questions (FAQ).

Note that both the Jet 3.5 and Jet 4.0 FAQs correct this in their Question 12:

You can replicate the application database, the table database, or both. However, you may find it more efficient to replicate only the table database. Although design changes are replicable, synchronizing them can cause project stability issues. There have been reported instances of design changes to forms and modules that do not propagate successfully to all replicas in a replica set. If it is possible, develop your application fully before you replicate and distribute it. If you have to make design changes later, distribute the front-end database separately from the tables. Replicating only the table database typically offers reduced synchronization times as an added benefit.

Briefcase Replication is going to be useful

TRUTH: Briefcase Replication can only ever really work for a single-user database, moving data back and forth between two PCs. No single person can ever be using two PCs at once, so it's better to just copy over top of the MDB file based on editing dates. Any multi-user database will end up with major problems with this, as no multi-user database should be anything but a Split Architecture, and so the end user won't have the data file on his or her desktop anyway. The only scenario where it might be usable would be if there really only are two computers involved and both are in use at once by a single user each (two offices, for example). Briefcase Replication seems to be orphaned by Microsoft as it's never been given much play after its introduction as part of Windows 95.

A Replication ID (GUID) has only one downside when used as a primary key

CLAIM: "You can use the s_GUID field as the primary key in the database. The advantage of doing so is that it virtually eliminates the possibility of duplicate keys; the potential disadvantage is that the GUID field does not convey any meaning to the user." -- Jet 3.5 White Paper, p. 8, and Jet 4 White Paper, p. 12

TRUTH: As explained by Michael Kaplan in Replication and GUIDs, the Good, the Bad, and the Ugly, GUIDs are problematic for two reasons that the White Papers don't mention:
1. Access doesn't deal very well with GUID data types.
2. If you use a Replication ID as your primary key, the built-in conflict resolver will no longer work.
See Michael Kaplan's article for a fuller explanation of these points.
As to the asseration about meaningfulness to the end user, this is no different from using an Autonumber in a replicated database, because once replicated, all Autonumber fields are converted to Random Autonumbers which also won't have any real meaning to a user. That is, in a replicated database, it's even more inadvsiable than in an ordinary database to treat the value of an Autonumber field as having any meaning for end users -- it should only ever be used behind the scenes anyway, so there is no relative advantage/disadvantage either way for Replication ID vs. Autonumber.
It's important to note that Microsoft's Jet 4.0 FAQ gets it right (questions 2 & 3), even though the Jet 3.5 FAQ doesn't, while neither of the white papers makes the downside of GUIDs clear.
Personal tools